Transcript
SUSE Enterprise Storage: Highly Scalable Software-Defined Storage
Māris Smilga
Storage Today
• Traditional storage
  ‒ Arrays of disks with RAID for redundancy
  ‒ SANs based on Fibre Channel connectivity
• Total system purchase
  ‒ Control integrated with capacity
  ‒ Hardware and software bundled
  ‒ System costs amortized against capacity
Gartner, "Magic Quadrant for General-Purpose Disk Arrays," 21 March 2013, G00237599
Storage Tomorrow
• Software-defined storage
  ‒ Separate control from capacity
  ‒ Enable self-service provisioning
• Commoditization
  ‒ Use standard servers for storage control
  ‒ Off-the-shelf drives
  ‒ Example: a 2 TB 7200 rpm "enterprise drive" ‒ NetApp-certified: $699; Seagate: $199
• Pervasive flash
  ‒ Performance gets cheaper
  ‒ No need for 15K rpm hard drives
Enterprise Data Capacity Utilization (share of enterprise data)
• Tier 0 ‒ Ultra High Performance: 1-3%
• Tier 1 ‒ High-value, OLTP, Revenue Generating: 15-20%
• Tier 2 ‒ Backup/Recovery, Reference Data, Bulk Data: 20-25%
• Tier 3 ‒ Object, Archive, Compliance Archive, Long-term Retention: 50-60%
Source: Horison Information Strategies, Fred Moore
SUSE Enterprise Storage
SUSE Storage Product Positioning
• High-end disk array
• Mid-range array
• Fully featured NAS device
• Mid-range NAS
• Entry-level disk array
• JBOD storage
SUSE Storage Feature Overview
• Scalability: upper limit in the exabyte range, with incremental expansion
• 100% software-defined storage
• No single point of failure
• Self-managing, self-healing
Traditional Storage vs. SUSE Storage

Traditional storage:
‒ Proprietary hardware
‒ Proprietary software
‒ Hardware life-cycle enforced by vendor
‒ Hard scale limit
‒ $$$$

Ceph:
‒ Commodity hardware
‒ Open Source software
‒ Exabyte scale
‒ $$
Ceph architecture (diagram):
• librados ‒ a library used by applications to directly access RADOS (C, C++, Java, Python, Ruby and PHP)
• RADOSGW ‒ a REST gateway compatible with Amazon S3 and OpenStack Swift
• RBD ‒ an object-based distributed block device, with support for the Linux kernel (host/VM) and QEMU/KVM
• CephFS ‒ a POSIX-compliant distributed file system, with support for Linux clients
• RADOS ‒ the Reliable Autonomous Distributed Object Store, comprised of self-healing, self-managing, intelligent storage nodes
The Components of the Ceph Cluster: the Object Storage Daemon (OSD)
► Responsible for storing objects on a local file system and providing access to them over the network
► File system: XFS, Btrfs or Ext4
► Disk: a local SATA or SAS disk
[Diagram: several OSDs, each backed by its own file system and disk, alongside three monitors (M)]
The Components of the Ceph Cluster: the Ceph Monitor (M)
► Maintains maps of the cluster state, including:
  ► the monitor map,
  ► the OSD map,
  ► the Placement Group (PG) map,
  ► the CRUSH map.
Accessing the RADOS Cluster
Voilà, a small RADOS cluster
[Diagram: a handful of OSD nodes and three monitors (M)]
Application Access
[Diagram: an application links librados and talks to the cluster directly over a socket]
[Diagram: alternatively, an application speaks REST to radosgw, which itself uses librados and a socket to reach the cluster]
RADOS Block Device
[Diagram: a Linux host accesses the cluster through the krbd kernel module; librados provides the user-space path]
RADOS Block Device
• Disk images are striped across (parts of) the cluster
• Supports:
  ‒ Snapshot and rollback
  ‒ COW cloning
  ‒ Thin provisioning
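The last two features go hand in hand, and a toy model makes the mechanics clear. The sketch below is not Ceph/RBD code ‒ the class and method names are invented for illustration ‒ but it shows the two ideas: blocks are only allocated when written (thin provisioning), and a clone shares its parent's blocks until it overwrites them (copy-on-write).

```python
# Toy thin-provisioned, copy-on-write image. Illustrative only; not RBD.

class ThinImage:
    def __init__(self, size_blocks, parent=None):
        self.size_blocks = size_blocks
        self.parent = parent          # COW parent image, if any
        self.blocks = {}              # only written blocks consume space

    def write(self, idx, data):
        self.blocks[idx] = data       # allocate on first write

    def read(self, idx):
        if idx in self.blocks:
            return self.blocks[idx]
        if self.parent is not None:   # fall through to the COW parent
            return self.parent.read(idx)
        return b"\x00"                # unallocated blocks read as zeros

    def snapshot_clone(self):
        # A clone stores nothing up front; it merely references the parent.
        return ThinImage(self.size_blocks, parent=self)

base = ThinImage(size_blocks=1000)
base.write(0, b"boot")
clone = base.snapshot_clone()
clone.write(0, b"BOOT")               # COW: only the clone sees this

print(base.read(0))                   # b'boot' -- parent is untouched
print(clone.read(0))                  # b'BOOT' -- clone's private copy
print(clone.read(1))                  # b'\x00' -- still unallocated
```

Note that the 1000-block image stores only the blocks actually written ‒ that is the thin-provisioning payoff.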
CRUSH: Data Placement
Several ingredients
• Basic idea
  ‒ Coarse-grained partitioning of storage supports policy-based mapping (don't put all copies of my data in one rack)
  ‒ A topology map and rules allow clients to "compute" the exact location of any storage object
• Three conceptual components
  ‒ Pools
  ‒ Placement groups
  ‒ CRUSH: a deterministic, decentralized placement algorithm
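A minimal sketch of how those three components fit together, under simplifying assumptions: Ceph really does hash an object's name onto a placement group within its pool, but the PG-to-OSD step below stands in for the actual CRUSH algorithm with plain rendezvous hashing, and all names and parameters are illustrative.

```python
# Simplified CRUSH-style placement: object -> PG -> OSDs, all computed
# client-side from shared state, with no lookup server. Not Ceph's
# actual algorithm (CRUSH walks a weighted topology tree).
import hashlib

def stable_hash(s):
    # A deterministic hash shared by every client.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def object_to_pg(obj_name, pg_num):
    # Pools are split into a fixed number of placement groups;
    # the object's name picks one.
    return stable_hash(obj_name) % pg_num

def pg_to_osds(pool, pg_id, osds, replicas=3):
    # Rank every OSD by a hash of (pool, pg, osd) and keep the top
    # `replicas` -- a rendezvous-hashing stand-in for CRUSH. Every
    # client with the same OSD list computes the same answer.
    ranked = sorted(osds,
                    key=lambda osd: stable_hash(f"{pool}/{pg_id}/{osd}"),
                    reverse=True)
    return ranked[:replicas]

osds = [f"osd.{i}" for i in range(8)]
pg = object_to_pg("rubberduck", pg_num=64)
print(pg_to_osds("swimmingpool", pg, osds))
```

The design point this illustrates: because placement is a pure function of the object name and the cluster map, any client can locate any object without asking a central metadata server.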
Self-Healing
Example: pool swimmingpool, object rubberduck, placement group 38.b0b
1. Monitors detect a dead OSD.
2. Monitors allocate other OSDs to the PG and update the CRUSH map.
3. Monitors initiate recovery.
4. Future writes update the new replica.
Redundancy Levels
• RADOS offers standard 1:N replication
  ‒ Cheap in terms of compute power
  ‒ Expensive in terms of disk space and write bandwidth
• In addition, RADOS offers erasure coding
  ‒ Essentially, you compute "parity" data over M disks that allows you to recover the data from a lost disk
  ‒ A bit like RAID 5, which uses XOR to derive 1 block of parity for 2 blocks of data
  ‒ Except that erasure-code algorithms are more space-efficient and have better recovery properties
  ‒ Reasonably cheap in the read case
  ‒ Very expensive in the write and recovery cases
  ‒ Best suited for read-only/cold data (e.g. archiving)
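The RAID 5 analogy can be made concrete: one XOR parity block over the data blocks is enough to rebuild any single lost block. Real erasure codes (such as the Reed-Solomon family) generalize this to survive the loss of several disks, but the toy below shows why the space overhead beats replication.

```python
# XOR parity over data blocks: lose any one block, rebuild it from the
# rest plus the parity. Toy illustration of the RAID-5 analogy only.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for b in blocks:
        for i, byte in enumerate(b):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]  # 3 data blocks
parity = xor_blocks(data)           # 1 parity block: ~33% overhead,
                                    # vs. 200% for 3 full replicas

lost = data.pop(1)                  # a disk dies, taking one block
recovered = xor_blocks(data + [parity])
print(recovered == lost)            # True: the block is rebuilt
```

The cost asymmetry in the slide also falls out of this picture: reads of intact data never touch parity, but every write must update the parity block, and recovery must read all surviving blocks to rebuild one.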
Summary
SAVINGS: total cost of ownership
• Reduced capital expenditure (CAPEX)
• Reduced operating expenditure (OPEX)
• Ease of management
FLEXIBILITY: adaptability to evolving business needs
• Reduced dependency upon proprietary, "locked-in" storage
CONFIDENCE: reliability and availability
• Leverage SUSE's world-class support and services
Questions?
Thank you.
For more information, please visit our website: www.suse.com
Unpublished Work of SUSE. All Rights Reserved. This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.
General Disclaimer This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.
Backup Slides