Compression In Open Source Databases


Transcript

Slide 1: Compression in Open Source Databases
Peter Zaitsev, CEO, Percona
Percona Technical Webinars
January 27th, 2016

Slide 2: About the Talk
• A bit of the history
• Approaches to data compression
• What some of the popular systems implement

Slide 3: Let's Define the Term
• Compression: any technique to make data size smaller

Slide 4: A Bit of History
• Early computers were too slow to compress data in software
• Hardware compression (e.g. tape)
• Compression first appears for non-performance-critical data

Slide 5: We did not need it much for space…

Slide 6: Welcome to the Modern Age
• Data growth outpaces HDD improvements
• Powerful CPUs
• Cloud
• Flash
• Data we store now

Slide 7: Exponential Data Size Growth

Slide 8: Powerful CPUs
• High performance
• Multiple cores

Slide 9: Compression and Decompression Performance
From https://github.com/inikep/lzbench
(Ratio is the compressed size as a percentage of the original size.)

Compressor          Compress     Decompress   Ratio
memcpy              8368 MB/s    8406 MB/s    100%
Brotli level 2        66 MB/s     207 MB/s     45%
LZ4                  487 MB/s    2452 MB/s     62%
LZ4fast level 17     964 MB/s    3112 MB/s     74%
Snappy               326 MB/s    1147 MB/s     62%
LZMA level 2          10 MB/s      37 MB/s     39%
Zlib level 1          39 MB/s     201 MB/s     49%
Zstd level 1         249 MB/s     537 MB/s     49%

Slide 10: Flash (Solid State)
• Disk space is more costly than for HDDs
• Write endurance is expensive
• Want to write less data
• Decent at handling fragmentation

Slide 11: Cloud
• Pay for space
• Pay for IOPS
• More limited storage performance
• Network performance may be limited

Slide 12: Data We Store in Databases
• Text
• JSON
• XML
• Time series data
• Log files
Modern data compresses well!

Slide 13: COMPRESSION BASICS
An introduction to ways of making your data smaller

Slide 14: Lossy and Lossless
• Databases generally use lossless compression
• Lossy compression is done at the application level

Slide 15: Some Ways of Getting Data Smaller
• Layout optimizations
• “Encoding”
• Dictionary compression
• Block compression

Slide 16: Layout Optimizations
• Column store versus row store
• Hybrid formats
• Variable block sizes

Slide 17: Encoding
• Depends on data type and domain
• Delta encoding, run-length encoding (RLE)
• Can be faster than reading uncompressed data
• UTF8 (strings) and VLQ (integers)
• Index prefix compression

Slide 18: Dictionary Compression
• Replacing frequent values with dictionary pointers
• Kind of like STL String
• ENUM type in MySQL

Slide 19: Block Compression
• Compress a “block” of data so it is smaller for storage
• Finding patterns in data and efficiently encoding them
• Many algorithms exist: Snappy, Zlib, LZ4, LZMA

Slide 20: Block Compression Details
• Compression rate highly depends on data
• Compression rate depends on block size
• Speed depends on block size and data

Slide 21: Block Size Dependence (by Leif Walsh)

Slide 22: There Is No One Size Fits All
• Typically the compression algorithm can be selected
• Often with additional settings

Slide 23: WHERE AND HOW
Where do we compress data and how do we do that

Slide 24: Where to Compress Data
• In memory?
• In the database data store?
• As part of the file system?
• Storage hardware?
• Application?
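The claim on the Block Compression Details slide that compression rate depends on block size can be tried out with a small experiment. The following is a minimal sketch, not a benchmark from the talk: it assumes Python's standard `zlib` module, and the synthetic log-like input and the helper names `compressed_size` and `make_sample_log` are made up for illustration.

```python
import zlib


def compressed_size(data: bytes, block_size: int, level: int = 6) -> int:
    """Compress data in independent blocks and return the total compressed size."""
    total = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        # Each block is compressed on its own, so patterns seen in earlier
        # blocks cannot be reused; this is what penalizes small blocks.
        total += len(zlib.compress(block, level))
    return total


def make_sample_log(lines: int = 20000) -> bytes:
    """Build synthetic, log-like text (an assumption, not data from the talk)."""
    rows = []
    for i in range(lines):
        rows.append(
            f"2016-01-27 12:{i % 60:02d}:{(i * 7) % 60:02d} "
            f"INFO user={i % 500} action=login status=ok\n"
        )
    return "".join(rows).encode("utf-8")


if __name__ == "__main__":
    data = make_sample_log()
    for block_size in (1024, 4096, 16384, 65536):
        size = compressed_size(data, block_size)
        print(f"block={block_size:6d} bytes  compressed/original={size / len(data):.1%}")
```

On repetitive input like this, larger blocks should give a smaller total size, at the cost of having to decompress more data to read a single row. Real results depend heavily on the data and the algorithm.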
Slide 25: Compression in Memory
• Reduce the amount of memory needed for the same working set
• Reduce IO for a fixed amount of memory
• Typically an in-memory performance hit
• Encoding/dictionary compression are a good fit

Slide 26: Database Data Store
• Reduce database size on disk
• Works with all file systems and storage
• With the OS cache, can be used as an in-memory compression variant
• Dealing with fragmentation is a common issue

Slide 27: Compression on File System Level
• Works with all databases/storage engines
• Performance impact can be significant
• Logical space on disk is not reduced
• ZFS

Slide 28: Compression on Storage Hardware
• Hardware dependent
• Does not reduce space on disk
• Can result in performance gains rather than free space (SSD)
• Can become a choke point

Slide 29: By Application
• No database support needed
• Reduce database load and network traffic
• Application may know more about the data
• More complexity
• Give up many DBMS features (search, index)

Slide 30: DESIGN CONSIDERATIONS
What makes a database system do well with compression

Slide 31: The Goal
• Minimize negative impact on user operations (reads and writes)

Slide 32: Design Principles
• Fast decompression
• Compression in background
• Parallel compression/decompression
• Reduce the need for re-compression on update

Slide 33: Choosing Block Size
Large blocks:
• Most efficient for compression
• Bulky reads and writes
Small blocks:
• Fastest to decompress
• Best for point lookups

Slide 34: IMPLEMENTATION EXAMPLES
What database systems really do with compression

Slide 35: MySQL “Packed” MyISAM
• Compress table “offline” with myisampack
• Table becomes read only
• A variety of compression methods are used
• Only data is compressed, not indexes
• Note: MyISAM supports index prefix compression for all indexes
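Index prefix compression, mentioned on the Encoding slide and again for MyISAM, relies on neighboring keys in a sorted index often sharing a leading substring. The following is a minimal sketch of the general idea only. It is not MyISAM's on-disk format, and the function names `prefix_compress` and `prefix_decompress` are hypothetical.

```python
from typing import List, Tuple


def prefix_compress(sorted_keys: List[str]) -> List[Tuple[int, str]]:
    """Store each key as (length of prefix shared with the previous key, remaining suffix)."""
    entries = []
    previous = ""
    for key in sorted_keys:
        shared = 0
        limit = min(len(previous), len(key))
        # Count how many leading characters match the previous key.
        while shared < limit and previous[shared] == key[shared]:
            shared += 1
        entries.append((shared, key[shared:]))
        previous = key
    return entries


def prefix_decompress(entries: List[Tuple[int, str]]) -> List[str]:
    """Rebuild the original keys by reusing the prefix of the previously decoded key."""
    keys = []
    previous = ""
    for shared, suffix in entries:
        key = previous[:shared] + suffix
        keys.append(key)
        previous = key
    return keys


if __name__ == "__main__":
    keys = sorted(["percona", "percona live", "percolate", "performance"])
    entries = prefix_compress(keys)
    print(entries)
    print(prefix_decompress(entries) == keys)
```

Because each entry depends on the one before it, decoding is sequential. Real implementations typically restart the prefix at block or page boundaries so that lookups do not have to scan from the start of the index.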
Slide 36: MySQL Archive Storage Engine
• Does not support indexes
• Essentially a file of rows with sequential access
• Uses zlib compression

Slide 37: InnoDB Table Compression
• Available since MySQL 5.1
• Pages compressed using zlib
• Compressed page target (1K, 4K, 8K) has to be set
• Both compressed and uncompressed pages can be cached in the buffer pool
• Per-page “log” of changes to avoid recompression
• Externally stored BLOBs are compressed as a single entity

Slide 38: InnoDB Transparent Page Compression
• Available in MySQL 5.7
• Zlib and LZ4 compression
• Compresses pages as they are written to disk
• Free space on the page is given back using “hole punching”
• Originally designed to work with FusionIO NVMFS
• Can cause problems for current filesystems due to a very high number of holes

Slide 39: Disk Usage (Linkbench data set by Sunny Bains)

Slide 40: Performance on Fast SSD (FusionIO NVMFS)

Slide 41: Results on Slower SSD (Intel 730*2, EXT4)

Slide 42: Fractal Trees Compression
• Available as a storage engine for MySQL and MongoDB
• Can use many compression libraries
• Tunable compression block size
• Reduce re-compression due to message buffering

Slide 43: Can Get a Lot of Compression

Slide 44: MongoDB WiredTiger Storage Engine
• Engine has many compression settings
• Indexes use index prefix compression
• Data pages can be compressed using zlib, LZ4 or Snappy

Slide 45: Compression Size (results by Asya Kamsky)

Slide 46: Compression in RocksDB
• RocksDB: LSM-based storage engine for MongoDB and MySQL
• LSM works very well with compression
• Supports zlib, LZ4 and bzip2 compression
• Can use different compression methods for different levels in the LSM

Slide 47: Compression Results from Mike Kania

Slide 48: PostgreSQL
• Uses compression by default with TOAST
• 2KB (default) or longer strings, BLOBs
• Unlike InnoDB, external storage is not required for compression
• Recommended to use file system compression, e.g. ZFS, if compression is desired

Slide 49: Summary
• Compression is important in the modern age
• Consider it for your system
• Databases use many different techniques to make data smaller
• Compression support is rapidly changing and improving

Slide 50: Percona Live Data Performance Conference
• April 18-21 in Santa Clara, CA at the Santa Clara Convention Center
• Register with code “WebinarPL” to receive 15% off at registration
• MySQL, NoSQL, Data in the Cloud
www.perconalive.com
www.percona.com

Slide 51: Thank You!
Peter Zaitsev
[email protected]
https://www.linkedin.com/in/peterzaitsev