Transcript
Compression in Open Source Databases
Peter Zaitsev, CEO, Percona
Percona Technical Webinars
January 27th, 2016
About the Talk 2
A bit of history
Approaches to data compression
What some of the popular systems implement
Let's Define the Term 3
Compression: any technique to make data size smaller
A bit of History 4
Early computers were too slow to compress data in software
Hardware compression (i.e. tape)
Compression first appears for non-performance-critical data
We did not need it much for space… 5
Welcome to the modern age 6
Data Growth outpaces HDD improvements
Powerful CPUs
Cloud
Flash
Data we store now
Exponential Data Size Growth 7
Powerful CPUs 8
High Performance
Multiple Cores
Compression and Decompression Performance 9
• From https://github.com/inikep/lzbench

Compressor          Compress     Decompress   Ratio
memcpy              8368 MB/s    8406 MB/s    100%
Brotli level 2        66 MB/s     207 MB/s     45%
Lz4                  487 MB/s    2452 MB/s     62%
Lz4fast level 17     964 MB/s    3112 MB/s     74%
Snappy               326 MB/s    1147 MB/s     62%
Lzma level 2          10 MB/s      37 MB/s     39%
Zlib level 1          39 MB/s     201 MB/s     49%
Zstd level 1         249 MB/s     537 MB/s     49%
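The numbers above come from lzbench. A minimal sketch of the same kind of measurement, using only codecs from the Python standard library, might look like the following. The `measure` helper and the log-like sample data are assumptions made for illustration; absolute speeds depend heavily on the machine and the data, so no expected output is shown. Ratio is compressed size divided by original size, as in the table.

```python
import lzma
import time
import zlib


def measure(name, compress, decompress, data):
    """Return (name, compress MB/s, decompress MB/s, ratio) for one codec."""
    start = time.perf_counter()
    packed = compress(data)
    compress_secs = max(time.perf_counter() - start, 1e-9)

    start = time.perf_counter()
    restored = decompress(packed)
    decompress_secs = max(time.perf_counter() - start, 1e-9)

    assert restored == data  # lossless round trip
    megabytes = len(data) / 1e6
    ratio = len(packed) / len(data)  # smaller is better, 1.0 means no gain
    return name, megabytes / compress_secs, megabytes / decompress_secs, ratio


# Hypothetical sample: repetitive, log-like text (compresses well).
sample = b"".join(
    b"2016-01-27 12:00:%02d INFO request id=%d status=200\n" % (i % 60, i)
    for i in range(50000)
)

results = [
    measure("zlib level 1", lambda d: zlib.compress(d, 1), zlib.decompress, sample),
    measure("zlib level 9", lambda d: zlib.compress(d, 9), zlib.decompress, sample),
    measure("lzma preset 2", lambda d: lzma.compress(d, preset=2), lzma.decompress, sample),
]

for name, c_speed, d_speed, ratio in results:
    print(f"{name:15s} {c_speed:8.1f} MB/s {d_speed:8.1f} MB/s {ratio:6.1%}")
```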
Flash (Solid State) 10
Disk space is more costly than for HDDs
Write endurance is expensive
Want to write less data
Decent at handling fragmentation
Cloud 11
Pay for Space
Pay for IOPS
More limited Storage Performance
Network Performance may be limited
Data we store in Databases 12
Modern Data Compresses Well!
• Text
• JSON
• XML
• Time Series Data
• Log Files
13
Introduction to ways of making your data smaller
COMPRESSION BASICS
Lossy and Lossless 14
Databases generally use lossless compression
Lossy compression is done at the application level
Some Ways of Making Data Smaller 15
Layout Optimizations
“Encoding”
Dictionary Compression
Block Compression
Layout Optimizations 16
Column store versus row store
Hybrid formats
Variable block sizes
Encoding 17
Depends on data type and domain
Delta encoding, run-length encoding (RLE)
Can be faster than reading uncompressed data
UTF-8 (strings) and VLQ (integers)
Index prefix compression
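A minimal sketch of three of the encodings named above: delta encoding, run-length encoding, and VLQ. The function names and the sample timestamps are assumptions made for illustration and are not taken from any particular database.

```python
def delta_encode(values):
    """Keep the first value, then store each difference from the previous one."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


def rle_encode(values):
    """Collapse runs of equal values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return [(v, n) for v, n in runs]


def rle_decode(runs):
    return [v for v, n in runs for _ in range(n)]


def vlq_encode(n):
    """Variable-length quantity: 7 bits per byte, most significant group first.
    Every byte except the last has its high bit set."""
    if n < 0:
        raise ValueError("VLQ as sketched here handles non-negative integers only")
    groups = [n & 0x7F]
    n >>= 7
    while n:
        groups.append((n & 0x7F) | 0x80)
        n >>= 7
    return bytes(reversed(groups))


def vlq_decode(data):
    n = 0
    for byte in data:
        n = (n << 7) | (byte & 0x7F)
    return n


# Hypothetical time series: one sample per second.
timestamps = [1453852800 + i for i in range(6)]
deltas = delta_encode(timestamps)   # [1453852800, 1, 1, 1, 1, 1]
runs = rle_encode(deltas)           # [(1453852800, 1), (1, 5)]
assert delta_decode(rle_decode(runs)) == timestamps
assert vlq_encode(300) == b"\x82\x2c" and vlq_decode(b"\x82\x2c") == 300
```

Small deltas also fit in a single VLQ byte, which is why these encodings tend to be combined for time series data.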
Dictionary Compression 18
Replacing frequent values with dictionary pointers
Kind of like STL String
ENUM type in MySQL
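A minimal sketch of dictionary compression as described on this slide: each distinct value is stored once, and the column keeps only small integer codes. The function names and sample values are assumptions made for illustration; this is not how MySQL stores ENUM internally.

```python
def dictionary_encode(values):
    """Return (dictionary, codes): each distinct value stored once, in first-seen order."""
    dictionary = []       # code -> value
    codes_by_value = {}   # value -> code
    codes = []
    for value in values:
        code = codes_by_value.get(value)
        if code is None:
            code = len(dictionary)
            codes_by_value[value] = code
            dictionary.append(value)
        codes.append(code)
    return dictionary, codes


def dictionary_decode(dictionary, codes):
    return [dictionary[code] for code in codes]


# Hypothetical low-cardinality column.
countries = ["United States", "Germany", "United States",
             "United States", "Germany", "Brazil"]
dictionary, codes = dictionary_encode(countries)
print(dictionary)  # ['United States', 'Germany', 'Brazil']
print(codes)       # [0, 1, 0, 0, 1, 2]
assert dictionary_decode(dictionary, codes) == countries
```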
Block Compression 19
Compress a “block” of data so it is smaller for storage
Finding patterns in data and efficiently encoding them
Many algorithms exist: Snappy, Zlib, LZ4, LZMA
Block Compression Details 20
Compression rate highly depends on data
Compression rate depends on block size
Speed depends on block size and data
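A minimal sketch of how compression rate depends on block size, using zlib from the Python standard library. The `compressed_size` helper and the CSV-like sample are assumptions made for illustration; real ratios depend on the data and the algorithm.

```python
import zlib


def compressed_size(data, block_size, level=6):
    """Total bytes after compressing each block_size chunk independently."""
    return sum(
        len(zlib.compress(data[i:i + block_size], level))
        for i in range(0, len(data), block_size)
    )


# Hypothetical sample: repetitive, row-like text.
sample = b"".join(
    b"user_%06d,status=active,region=eu\n" % (i,) for i in range(20000)
)

for block_size in (1024, 4096, 16384, 65536):
    ratio = compressed_size(sample, block_size) / len(sample)
    print(f"block size {block_size:6d}: compressed to {ratio:.1%} of original")
```

Each block is compressed on its own, so small blocks pay the per-block overhead many times and give the compressor less history in which to find repeated patterns.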
Block Size Dependence (by Leif Walsh) 21
There is no one size fits all 22
Typically the compression algorithm can be selected
Often with additional settings
23
Where do we compress data, and how do we do it?
WHERE AND HOW
Where to Compress Data 24
In memory?
In the database data store?
As part of the file system?
Storage hardware?
Application?
Compression in Memory 25
Reduce amount of memory needed for same working set
Reduce IO for Fixed amount of Memory
Typically an in-memory performance hit
Encoding/dictionary compression are a good fit
Database Data Store 26
Reduce Database Size on Disk
Works with all file systems and storage
With the OS cache, can be used as an in-memory compression variant
Dealing with fragmentation is a common issue
Compression on File System Level 27
Works with all Databases/ Storage Engines
Performance Impact can be significant
Logical Space on disk is not reduced
ZFS
Compression on Storage Hardware 28
Hardware Dependent
Does not reduce space on disk
Can result in Performance Gains rather than free space (SSD)
Can become a choke point
By Application 29
No database support needed
Reduce database load and network traffic
Application may know more about data
More complexity
Give up many DBMS features (search, index)
30
What makes a database system do well with compression
DESIGN CONSIDERATIONS
The Goal 31
Minimize Negative Impact on User Operations (Reads and Writes)
Design Principles 32
Fast Decompression
Compression in Background
Parallel Compression/Decompression
Reduce need for Re-Compression on Update
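A minimal sketch of the parallel compression/decompression principle: data is split into independent blocks and a thread pool compresses them concurrently. The function names, block size, and worker count are assumptions made for illustration. To my understanding CPython's zlib module releases the interpreter lock while it compresses, so threads can overlap here; a real engine would use its own worker threads.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_blocks(data, block_size):
    """Cut data into independent chunks of at most block_size bytes."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_parallel(data, block_size=16384, workers=4):
    """Compress each block independently; results keep the original block order."""
    blocks = split_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, blocks))


def decompress_parallel(blocks, workers=4):
    """Decompress blocks concurrently and stitch them back together."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, blocks))


# Hypothetical payload.
payload = b"INSERT INTO t VALUES (1, 'compression');\n" * 20000
compressed_blocks = compress_parallel(payload)
assert decompress_parallel(compressed_blocks) == payload
```

Because blocks are independent, the same layout also allows decompressing only the blocks a query actually touches.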
Choosing Block Size 33
Large Blocks
• Most efficient for compression
• Bulky reads and writes
Small Blocks
• Fastest to decompress
• Best for point lookups
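A minimal sketch of the trade-off above: in a block-compressed store, a point lookup has to decompress the whole block that holds the row. The `BlockStore` class is a toy model invented for illustration, not any engine's actual format.

```python
import zlib


class BlockStore:
    """Toy store: a fixed number of newline-free text rows per compressed block."""

    def __init__(self, rows, rows_per_block):
        self.rows_per_block = rows_per_block
        self.blocks = [
            zlib.compress("\n".join(rows[i:i + rows_per_block]).encode("utf-8"))
            for i in range(0, len(rows), rows_per_block)
        ]

    def get(self, row_number):
        """Return (row, rows_decompressed) for a single-row lookup."""
        block = self.blocks[row_number // self.rows_per_block]
        rows = zlib.decompress(block).decode("utf-8").split("\n")
        return rows[row_number % self.rows_per_block], len(rows)

    def size(self):
        return sum(len(block) for block in self.blocks)


# Hypothetical data set.
rows = ["row-%05d" % i for i in range(10000)]
small = BlockStore(rows, rows_per_block=16)
large = BlockStore(rows, rows_per_block=1024)

print(small.get(5000))  # ('row-05000', 16)
print(large.get(5000))  # ('row-05000', 1024)
print(small.size(), large.size())
```

The large-block store is smaller on disk, but a lookup of one row decompresses 1024 rows instead of 16.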
34
What database systems really do with compression
IMPLEMENTATION EXAMPLES
MySQL “Packed” MyISAM 35
Compress table “offline” with myisampack
Table becomes read only
A variety of compression methods are used
Only data is compressed, not indexes
Note: MyISAM supports index prefix compression for all indexes
MySQL Archive Storage Engine 36
Does not support indexes
Essentially a file of rows with sequential access
Uses zlib compression
InnoDB Table Compression 37
Available since MySQL 5.1
Pages compressed using zlib
Compressed page target (1K, 4K, 8K) has to be set
Both compressed and uncompressed pages can be cached in the Buffer Pool
Per-page “log” of changes to avoid recompression
Externally stored BLOBs are compressed as a single entity
InnoDB Transparent Page Compression 38
Available in MySQL 5.7
Zlib and LZ4 compression
Compresses pages as they are written to disk
Free space on the page is given back using “hole punching”
Originally designed to work with FusionIO NVMFS
Can cause problems for current filesystems due to the very high number of holes
Disk usage (Linkbench data set by Sunny Bains) 39
Performance on Fast SSD (FusionIO NVMFS) 40
Results on Slower SSD (Intel 730*2, EXT4) 41
Fractal Trees Compression 42
Available as a storage engine for MySQL and MongoDB
Can use many compression libraries
Tunable compression block size
Reduces re-compression due to message buffering
Can get a lot of compression 43
MongoDB WiredTiger Storage Engine 44
Engine has many compression settings
Indexes use index prefix compression
Data pages can be compressed using zlib, lz4 or Snappy
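A minimal sketch of the general idea behind index prefix compression: in a sorted index, each key stores only the number of leading characters it shares with the previous key plus the remaining suffix. The function names and sample keys are assumptions made for illustration; this is not WiredTiger's actual on-disk format.

```python
def prefix_encode(sorted_keys):
    """Encode each key as (shared_prefix_length, suffix) relative to the previous key."""
    encoded = []
    previous = ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(key), len(previous))
        while shared < limit and key[shared] == previous[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        previous = key
    return encoded


def prefix_decode(encoded):
    """Rebuild full keys by reusing the prefix of the previously decoded key."""
    keys = []
    previous = ""
    for shared, suffix in encoded:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


# Hypothetical sorted index keys.
keys = ["customer:1001", "customer:1002", "customer:1017", "customer:2000"]
encoded = prefix_encode(keys)
print(encoded)
# [(0, 'customer:1001'), (12, '2'), (11, '17'), (9, '2000')]
assert prefix_decode(encoded) == keys
```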
Compression Size (results by Asya Kamsky) 45
Compression in RocksDB 46
RocksDB: LSM-based storage engine for MongoDB and MySQL
LSM works very well with compression
Supports zlib, lz4, bzip2 compression
Can use different compression methods for different levels in the LSM
Compression results from Mike Kania 47
PostgreSQL 48
Uses compression by default with TOAST
2KB (default) or longer strings, BLOBs
Unlike InnoDB, external storage is not required for compression
Recommended to use file system compression, i.e. ZFS, if compression is desired
Summary 49
Compression is important in the modern age
Consider it for your system
Many different techniques are used by databases to make data smaller
Compression support is rapidly changing and improving
Percona Live Data Performance Conference
• April 18-21 in Santa Clara, CA at the Santa Clara Convention Center
• Register with code “WebinarPL” to receive 15% off at registration
• MySQL, NoSQL, Data in the Cloud
www.perconalive.com
www.percona.com
51
Thank You! Peter Zaitsev
[email protected]
https://www.linkedin.com/in/peterzaitsev