Evaluating Cloud Storage Strategies
James Bottomley; CTO, Server Virtualization
Introduction to Storage
• Attachments:
  - Local (Direct – cheap)
    • SAS, SATA
  - Remote (SAN, NAS – expensive)
    • FC
    • net
• Types
  - Block
    • Spinning Disk Drive
    • SSD
    • RAID unit
  - File
    • NFS
    • CEPH
  - Object
    • RADOS
    • PCS
Profit from the cloud™
Storage Performance Comparison
Storage Cost Comparison
A Closer Look at the Terms
• Block device
  - A unit of storage
  - May be divided inflexibly (by partitioning)
  - Usually locally attached, but may be on a SAN
• File based Storage
  - Exports views of a filesystem via NFS, CIFS or other protocols
  - Is flexible
    • storage in views can be expanded and contracted on the fly
  - Suffers from metadata issues on the server
• Object Storage
  - Really just means a flexible block device
  - May be expanded and contracted on the fly
  - Easily administrable (unlike LUN partitioning in SANs)
Storage Types Comparison
[Chart: storage types positioned by Cloud Utility and Hosting Utility. The label for each type was not captured; the annotations attached to each type follow.]
• Simple Web API; no easy way to update objects; slow
• CEPH, Gluster; object size tuning problem
• Tuned to disk image size objects; designed for rapid update; scalable B/W
• Inelastic; hard to aggregate; attached to individual systems
• Slightly elastic; fixed size; good B/W; dedicated network
• Based on SAN; limited scaling
Object vs File and the Metadata Problem
• A large number of Cloud storage systems are file based
  - CEPH, Gluster
• The specific problem is that updating any file requires a change in the metadata
  - This produces a hot spot in the journal
  - As well as locking hierarchy issues
  - And communication with the metadata server
  - All of which slow the operations down
• Object storage only uses metadata when objects are resized, created or destroyed
  - Using a fixed size object incurs no metadata overhead whatsoever
• So objects providing virtual environment roots allow efficient embedded filesystems with zero metadata overhead
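The metadata argument above can be illustrated with a toy model. This is only a sketch under stated assumptions: `FileStore`, `FixedObjectStore` and `count_metadata_updates` are hypothetical names invented here, and the counter stands in for the journal, locking and metadata-server traffic the slide describes. Real filesystems batch and journal metadata in more complicated ways.

```python
class FileStore:
    """Toy file-based store: every write also touches per-file metadata."""

    def __init__(self):
        self.data = {}
        self.metadata_updates = 0

    def write(self, name, offset, payload):
        buf = self.data.setdefault(name, bytearray())
        end = offset + len(payload)
        if end > len(buf):
            buf.extend(b"\0" * (end - len(buf)))  # file grows on demand
        buf[offset:end] = payload
        self.metadata_updates += 1  # assumed: mtime/size change per update


class FixedObjectStore:
    """Toy object store: metadata changes only when an object is created."""

    def __init__(self):
        self.objects = {}
        self.metadata_updates = 0

    def create(self, name, size):
        self.objects[name] = bytearray(size)
        self.metadata_updates += 1  # creation is a metadata operation

    def write(self, name, offset, payload):
        buf = self.objects[name]
        end = offset + len(payload)
        if end > len(buf):
            raise ValueError("write beyond the fixed object size")
        buf[offset:end] = payload  # in-place update, no metadata touched


def count_metadata_updates(n_writes, size=4096):
    """Apply the same write pattern to both stores and compare the counters."""
    files = FileStore()
    objects = FixedObjectStore()
    objects.create("root.img", size)
    for i in range(n_writes):
        payload = bytes([i % 256])
        files.write("root.img", i % size, payload)
        objects.write("root.img", i % size, payload)
    return files.metadata_updates, objects.metadata_updates
```

In this model the file store's metadata counter grows with the number of writes, while the fixed size object pays once at creation and never again.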
FUSE Issues
• FUSE is the Linux Filesystem in Userspace
• Main problem is it’s incredibly SLOW
• However, it is very useful, so a large number of cloud filesystems use it
  - Gluster
• Parallels originally avoided using it.
• However, now we’ve decided we’ll fix it for everyone
• Parallels engineers are currently interacting with the Linux filesystems and FUSE mailing lists
• Objective is to add write caching and mtime fixes to accelerate FUSE
• Tests show we can get ~95% of the performance of a natively written filesystem
Consistency
• Strong Consistency is hard to achieve in clusters
  - Strong Consistency means that all updates are seen immediately after they are committed
  - Strong consistency is most often violated across cluster reconfigurations
  - Ironically, this is precisely when you usually need it (HA)
  - Sheepdog, CEPH, PStorage
• Eventual Consistency is the norm
  - Means that all updates are eventually seen, but may not be immediately visible after they are committed
  - SWIFT, Gluster (which does have a much slower quorum enforcement mode for strong consistency)
• Weak Consistency
  - Does not guarantee write ordering and visibility
  - Too weak to be useful for most cloud storage
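The difference between strong and eventual consistency can be sketched with a generic read/write quorum over N replicas. This is a textbook illustration, not a description of how Sheepdog, CEPH, PStorage, SWIFT or Gluster are implemented; `ReplicatedRegister` and its methods are hypothetical names. A read is guaranteed to see the last committed write only when the read and write quorums overlap (R + W > N).

```python
class ReplicatedRegister:
    """Toy single-value register replicated across n nodes."""

    def __init__(self, n):
        self.replicas = [(0, None)] * n  # (version, value) per replica
        self.version = 0

    def write(self, value, w):
        """Commit once the first w replicas have stored the new version."""
        if not 1 <= w <= len(self.replicas):
            raise ValueError("w must be between 1 and the replica count")
        self.version += 1
        for i in range(w):
            self.replicas[i] = (self.version, value)

    def read(self, r):
        """Worst case: contact the r replicas least likely to be up to date."""
        if not 1 <= r <= len(self.replicas):
            raise ValueError("r must be between 1 and the replica count")
        contacted = self.replicas[-r:]
        return max(contacted, key=lambda vv: vv[0])[1]

    def sync(self):
        """Background propagation: every replica converges on the latest write."""
        latest = max(self.replicas, key=lambda vv: vv[0])
        self.replicas = [latest] * len(self.replicas)
```

With three replicas and `w=2`, a read with `r=2` overlaps the write quorum and returns the committed value (strong consistency). A read with `r=1` can hit the one stale replica until `sync()` has run, which is the eventual consistency case.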
Performance and Scalability
• Cloud storage must be designed to scale not just per node, but also per Virtual Environment per node
• This requires that there be no bottlenecks connecting a virtual environment to storage
  - Sheepdog problem: it uses a single-threaded per-node gateway process, which makes its scalability per VE poor
• Ideally, a direct connection should be made between the virtual environment using the object and the storage providing it, with no intermediate broker
  - Or using an intermediate broker tuned for scalability
• Chunking (large block size for objects) also improves performance
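One way to see why chunking helps is to count how many chunks a single request touches, since each chunk is a separate lookup and possibly a separate connection. The sketch below is illustrative only: the function names are hypothetical and the 64 MiB chunk size is an assumed value, not one stated in the talk.

```python
def chunk_span(offset, length, chunk_size):
    """Return (first, last) chunk indices covered by a byte range."""
    if offset < 0 or length <= 0 or chunk_size <= 0:
        raise ValueError("offset must be >= 0; length and chunk_size must be > 0")
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    return first, last


def chunks_touched(offset, length, chunk_size):
    """Number of distinct chunks a request has to locate and contact."""
    first, last = chunk_span(offset, length, chunk_size)
    return last - first + 1


KIB = 1024
MIB = 1024 * KIB

# A 1 MiB sequential read with small and large chunks (sizes are assumptions).
small = chunks_touched(0, 1 * MIB, 4 * KIB)
large = chunks_touched(0, 1 * MIB, 64 * MIB)
```

Under these assumptions the same 1 MiB read is spread over 256 small chunks but stays inside one large chunk, so the client resolves a location once and can then stream directly from the storage that holds it.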
Requirements for Hosting Storage
• The cardinal hosting requirement is that existing local storage should be repurposed as generic object based storage for
  1. Supporting Existing Hosting Environments and additional services
  2. Enabling the provision of Cloud Services
• Equating to the technical requirements
  1. Performance must be wire speed SATA (100MB/s)
     • Tuned exactly for GB objects containing small files
  2. Storage must be object based to avoid metadata issues
  3. Objects should be capable of rapid random read/write updates
  4. Storage bandwidth should scale linearly with the cluster
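As a rough plausibility check on the 100MB/s figure, the usable TCP payload rate of a gigabit Ethernet link can be estimated from standard framing overheads. The sketch assumes a 1500 byte MTU, IPv4 and TCP headers without options, and no retransmissions; the function names are hypothetical, and the linear scaling function simply restates requirement 4 rather than measuring anything.

```python
ETH_OVERHEAD = 8 + 14 + 4 + 12  # preamble+SFD, header, FCS, inter-frame gap (bytes)
IP_HEADER = 20                  # IPv4 header without options
TCP_HEADER = 20                 # TCP header without options


def tcp_goodput_mb_per_s(link_bits_per_s=1_000_000_000, mtu=1500):
    """Estimate TCP payload throughput in MB/s (10**6 bytes) for a given link."""
    payload = mtu - IP_HEADER - TCP_HEADER  # application bytes per frame
    on_wire = mtu + ETH_OVERHEAD            # bytes the frame occupies on the wire
    return link_bits_per_s / 8 * payload / on_wire / 1e6


def aggregate_mb_per_s(nodes, per_node_mb_per_s=100.0):
    """Idealised linear scaling: total bandwidth grows with the node count."""
    return nodes * per_node_mb_per_s


gige = tcp_goodput_mb_per_s()
```

Under these assumptions a single gigabit link carries somewhat more than 100MB/s of payload, so the stated target leaves a little headroom for protocol and replication traffic without requiring 10GE.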
Simple Requirements for Additional Benefits
• Hosting Enhancements
  1. Free storage from individual nodes
     • Easy, fast migration of Virtual Environments
     • High Availability
  2. Simple and Efficient resizing with assist for legacy roots (ext3)
     • Makes storage easier to sell in increments
  3. Cloning and Snapshotting
     • Value add for templating block based roots
     • Permits easy backup
  4. Redundancy
     • Allows different storage SLAs for different prices
• Cloud Enhancements (Ideal Storage Solution)
  1. Dropbox like services
  2. Storage as a Service (like S3)
  3. Storage on Demand
  4. Tiered Storage Pricing
Ideal solution
• Technical Specs
  - Metadata is the key to improving performance
  - Large Static objects with rapid updates have fixed metadata
  - 100MB/s performance over gigabit ethernet (no 10GE requirement)
• Avoid
  - Anything like a filesystem (CEPH, Gluster) because of
    • Locking problems
    • Speed issues from the per-file need to consult metadata
  - Anything using FUSE (Gluster)
    • At least anything using FUSE without the Parallels acceleration patches
  - Anything with a single-threaded connection multiplexor (Sheepdog)
    • Per cluster is worse (kills all scalability)
    • Per node is still bad (kills VE scalability)
Introducing Parallels Cloud Storage
• Why Choose Us?
  - We’re the experts in the field (we studied the problem)
  - We fixed FUSE
  - We redid the Linux loop device to work efficiently for virtual environment roots
    • In collaboration with Oracle who did the Direct I/O patches
  - Loop device also modified to do snapshotting and legacy filesystem resizing.
  - All the necessary infrastructure patches are upstream in Linux
    • Or are moving that way
• What we provide
  - Complete leverage of existing local node storage
  - Strong Consistency and Redundancy
  - Wire speed transfers because of optimised data architecture
    • Up to 100MB/s/node over 1GigE
  - Hot object tiering and SSD caching
Parallels Cloud Storage Architecture
Future Features
• Chunk Server based snapshotting
• De-duplication
• Thin Provisioning
  - Actual storage size can appear much larger than in-use backing store because of sparsity of objects
  - Also provides ability to do dynamic in-place upgrades of actual storage capacity
• Innovative redundancy algorithms
• Geographic Object Replication for advanced disaster recovery
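The thin provisioning item in the list above can be sketched as a volume that advertises a large logical size but allocates backing chunks only on first write. This is a minimal in-memory model, not the Parallels design; `ThinVolume` and the sizes used with it are hypothetical.

```python
class ThinVolume:
    """Toy sparse volume: chunks are allocated lazily, holes read as zeros."""

    def __init__(self, logical_size, chunk_size):
        self.logical_size = logical_size
        self.chunk_size = chunk_size
        self.chunks = {}  # chunk index -> bytearray, only for written chunks

    def write(self, offset, payload):
        if offset < 0 or offset + len(payload) > self.logical_size:
            raise ValueError("write outside the logical volume")
        pos = 0
        while pos < len(payload):
            index, within = divmod(offset + pos, self.chunk_size)
            n = min(self.chunk_size - within, len(payload) - pos)
            chunk = self.chunks.setdefault(index, bytearray(self.chunk_size))
            chunk[within:within + n] = payload[pos:pos + n]
            pos += n

    def read(self, offset, length):
        if offset < 0 or offset + length > self.logical_size:
            raise ValueError("read outside the logical volume")
        out = bytearray()
        pos = 0
        while pos < length:
            index, within = divmod(offset + pos, self.chunk_size)
            n = min(self.chunk_size - within, length - pos)
            chunk = self.chunks.get(index)
            # Unallocated chunks behave like a hole and return zeros.
            out += chunk[within:within + n] if chunk is not None else bytes(n)
            pos += n
        return bytes(out)

    @property
    def allocated(self):
        """Bytes of backing store actually in use."""
        return len(self.chunks) * self.chunk_size
```

A volume created as `ThinVolume(1 << 40, 1 << 20)` reports a 1 TiB logical size while consuming backing store only for the chunks that have been written.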
Conclusions
• Getting Cloud storage right for current hosting needs is not a simple problem
  - The basic construction of many cloud storage offerings is unsuitable for hosting provider environments
• Parallels has devoted considerable study and effort to mapping the needs of hosters onto cloud storage
• Parallels has studied the strengths and weaknesses of current cloud storage offerings and incorporated the best into our cloud storage offerings
  - While attempting to eliminate all the negative issues
  - And improve performance
• Parallels will leverage (and enhance) open source to achieve the best cloud storage system for hosters