Transcript
Unistore: A Unified Storage Architecture for Cloud Computing (Progress and Proposal)
Presenter: Wei Xie
Project Members: Wei Xie, Jiang Zhou, and Yong Chen
Data-Intensive Scalable Computing Laboratory (DISCL), Computer Science Department, Texas Tech University
We are grateful to Nimboxx and the Cloud and Autonomic Computing site at Texas Tech University for their valuable support of this project.
Unistore Overview
- Goal: build a unified storage architecture (Unistore) for cloud storage systems with the co-existence and efficient integration of heterogeneous HDDs and SCM (Storage Class Memory) devices
- Provide scalable, high-performance storage for virtual machines
- Design goals: scalability, manageability, elasticity, performance, data reliability, fault tolerance, energy efficiency, low cost
- Components: Data Characterization, Data Placement (with heterogeneous placement and replication algorithms), VM Provisioning, Deduplication, and Data Migration/Cache
Project Highlights
- Two papers published:
  - SUORA paper published at IEEE NAS'16
  - Hierarchical Consistent Hashing published at IEEE ISPA'16
- One paper submitted:
  - Strategy Consistent Hashing submitted to PPoPP'17
- Ongoing work and new proposal:
  - Elasticity in decentralized storage
  - Deduplication of SSD cache in cloud storage
Consistent Hashing for Heterogeneous Storage (Strategy CH)
[Diagram: a unified hash ring with virtual nodes A1, A2, B1, B2, C1, C2; each node carries capacity and bandwidth attributes]
- Adopt a unified hashing ring to manage heterogeneous nodes
- Maintain attributes (capacity, bandwidth) of each node
- Use a selection strategy for mapping data to nodes
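The slides do not spell out how a selection strategy consults the node attributes, so here is a minimal Python sketch of the idea: one unified ring of virtual nodes, per-node capacity/bandwidth attributes, and a pluggable strategy that picks among the first few distinct nodes clockwise from the key. The class name, the candidate count, and the exact strategy rules are illustrative assumptions, not taken from the papers; the slide's "location" strategy is omitted because its definition is not given, and a capacity-weighted choice stands in as a fallback.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a point on the ring (MD5 chosen arbitrarily)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class StrategyRing:
    """One unified ring; each physical node keeps capacity/bandwidth attributes."""

    def __init__(self, strategy="uniform"):
        self.strategy = strategy
        self.attrs = {}   # node -> {"capacity": ..., "bandwidth": ...}
        self.ring = []    # sorted (hash, node) pairs, one per virtual node

    def add_node(self, node, capacity, bandwidth, vnodes=64):
        self.attrs[node] = {"capacity": capacity, "bandwidth": bandwidth}
        for i in range(vnodes):
            bisect.insort(self.ring, (_hash(f"{node}#{i}"), node))

    def _candidates(self, key, k=3):
        """Walk clockwise from the key's position, collecting distinct nodes."""
        found, i = [], bisect.bisect(self.ring, (_hash(key), ""))
        while len(found) < min(k, len(self.attrs)):
            _, node = self.ring[i % len(self.ring)]
            if node not in found:
                found.append(node)
            i += 1
        return found

    def select(self, key):
        cands = self._candidates(key)
        if self.strategy == "uniform":
            return cands[0]  # plain consistent hashing: nearest node wins
        if self.strategy == "performance":
            return max(cands, key=lambda n: self.attrs[n]["bandwidth"])
        # assumed capacity-weighted fallback: favor the largest candidate
        return max(cands, key=lambda n: self.attrs[n]["capacity"])
```

With a "performance" strategy, a fast node is chosen whenever it appears among the candidates, while the candidate walk itself still follows the hash ring, so the layout remains mostly stable as nodes come and go.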
Hierarchical CH (HiCH)
[Diagram of the HiCH algorithm: an SSD ring and an HDD ring, with a MetaTree recording the objects held on SSDs]
- When new data objects arrive, the first choice is to place the data (the first copy) on the SSD ring
- Data placement strategies: location strategy, uniform strategy, performance strategy
- Data moves between the SSD and HDD rings: when the SSDs are full, objects are demoted to the HDD ring; when data becomes hot, it is promoted back to the SSD ring
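A rough sketch of the HiCH tiering behavior described above. All names are illustrative, plain dictionaries stand in for the SSD/HDD consistent-hashing rings and the MetaTree, and the eviction choice (oldest inserted object) is an assumption; the real algorithm's demotion policy is not specified on the slide.

```python
class HiCHStore:
    """Sketch: new objects go to the SSD tier first; when the SSD tier is
    full, a cold object is demoted to the HDD tier; hot objects are
    promoted back. 'meta' stands in for the MetaTree of objects on SSDs."""

    def __init__(self, ssd_capacity):
        self.ssd_capacity = ssd_capacity
        self.ssd, self.hdd = {}, {}
        self.meta = set()  # MetaTree: object ids currently on the SSD ring

    def put(self, oid, data):
        if len(self.ssd) >= self.ssd_capacity:
            # SSD full: demote the oldest-inserted (assumed coldest) object
            victim = next(iter(self.ssd))
            self.hdd[victim] = self.ssd.pop(victim)
            self.meta.discard(victim)
        self.ssd[oid] = data  # first copy always lands on the SSD ring
        self.meta.add(oid)

    def promote(self, oid):
        """Called when an object on the HDD ring becomes hot."""
        if oid in self.hdd:
            self.put(oid, self.hdd.pop(oid))

    def get(self, oid):
        return self.ssd.get(oid, self.hdd.get(oid))
```

The MetaTree lookup is what lets reads go straight to the SSD tier when the object is cached there, falling back to the HDD ring otherwise.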
Elasticity in Scale-out Storage
- Size-up and size-down in elastic storage:
  - Size up to meet I/O demand
  - Size down to save energy or free up resources for other use
- Agility to scale is critical to better satisfy these goals
[Figure: a 4-hour Facebook trace showing fluctuating I/O demand [1]]
[1] L. Xu et al., "SpringFS: Bridging Agility and Performance in Elastic Distributed Storage," FAST'14
Consistent Hashing based Storage
- Consistent hashing:
  - Designed for decentralized distributed systems
  - Supports sizing up and down without completely changing the data layout
  - Sizing down multiple nodes requires migrating their data first
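To make the "sizing up without completely changing the data layout" point concrete, a small self-contained sketch (hash function, virtual-node count, and node names are arbitrary choices): after adding a node, the only keys that relocate are the ones whose clockwise successor is now a virtual node of the new node.

```python
import bisect
import hashlib


def h(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class Ring:
    """Plain consistent hashing with virtual nodes."""

    def __init__(self, nodes=(), vnodes=128):
        self.vnodes, self.ring = vnodes, []
        for n in nodes:
            self.add(n)

    def add(self, node):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (h(f"{node}/{i}"), node))

    def locate(self, key):
        # first virtual node clockwise from the key's hash (wraps around)
        i = bisect.bisect(self.ring, (h(key), ""))
        return self.ring[i % len(self.ring)][1]


keys = [f"obj{i}" for i in range(1000)]
r = Ring(["n1", "n2", "n3"])
before = {k: r.locate(k) for k in keys}
r.add("n4")  # size up by one node
moved = sum(1 for k in keys if r.locate(k) != before[k])
# every moved key now maps to n4; keys owned by n1..n3 otherwise stay put
```

This is exactly why sizing down is the harder direction: removing a node means its keys must be migrated to their new successors before the node can go away.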
Elasticity in CH based Storage
- Challenges to improve agility:
  - Scaling up: data movement is needed to redistribute data; if a node was turned off (but did not fail), there is no need to move its data again when it is turned back on
  - Scaling down: keep data available and reduce data movement
  - Data consistency and failure handling: the redundancy level may decrease when scaling down, and data replicas must be kept consistent across multiple nodes
Elasticity in CH based Storage
- Our solution:
  - Primary and secondary nodes: primary nodes keep one copy of the whole data set, while secondary nodes keep the replicas
  - Size-down: re-replicate to ensure data availability; track modified data so that the data on shut-down nodes can be brought up to date when they are turned back on
  - Size-up: consistent hashing sizes up automatically; migrate only the updated data to the new nodes, with no need to transfer existing data
  - Data consistency: data must be kept consistent when migrating modified data; this is addressed by version consistent hashing
[Diagram: a hash ring of primary and secondary nodes holding data object D1]
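A toy sketch of the size-down bookkeeping described above, under stated assumptions: the class and method names are invented for illustration, and replication itself is elided. The point is only the tracking: writes made while a secondary node is powered off are recorded, so turning the node back on migrates just the modified keys rather than rebuilding the node.

```python
class ElasticReplicaSet:
    """Sketch: primary nodes hold a full copy of the data set; secondary
    nodes can power off to save energy. While a secondary is off, writes
    are logged per node so it can catch up cheaply on restart."""

    def __init__(self, primaries, secondaries):
        self.online = set(primaries) | set(secondaries)
        self.offline_dirty = {}  # node -> keys modified while it was off

    def size_down(self, node):
        """Power a node off; start tracking writes it will miss."""
        self.online.discard(node)
        self.offline_dirty[node] = set()

    def write(self, key):
        # the write lands on the online replicas (elided); additionally
        # remember the key for every powered-off node
        for dirty in self.offline_dirty.values():
            dirty.add(key)

    def size_up(self, node):
        """Power a node back on; return only the keys needing migration."""
        stale = self.offline_dirty.pop(node, set())
        self.online.add(node)
        return stale
```

Because unmodified data on the powered-off node is still valid, only the returned `stale` set needs to move, which is what gives the scheme its resizing agility.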
Version Consistent Hashing Scheme
- Build versions into the virtual nodes
- Avoid data migration when adding nodes or when a node fails
- Maintain efficient data lookup
[Diagram: the data lookup algorithm walks the version history of an object; e.g., with versions v1: {1, 2}, v2: {4, 1}, v3: {4, 6}, v4: {4, 6} (some committed, some uncommitted), the lookup locations for data object D1 are {4, 6, 1, 2}]
[Figure: performance improvement shown as average lookup amplification versus the number of nodes (half of them added), comparing plain Consistent Hashing, Commit CH, and Version CH]
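The lookup walk can be illustrated with the slide's own example. This sketch covers only the location walk over the version history; the full scheme also embeds version numbers into the virtual nodes and distinguishes committed from uncommitted versions, which is elided here.

```python
def lookup_locations(version_history):
    """version_history: list of (version, nodes) from oldest to newest,
    where nodes are the replica locations that version's ring computes
    for the object. Walking from the newest version backwards finds an
    object written under any older version without ever migrating it."""
    seen, order = set(), []
    for _, nodes in reversed(version_history):
        for n in nodes:
            if n not in seen:
                seen.add(n)
                order.append(n)
    return order


# the slide's example for data object D1
history = [("v1", [1, 2]), ("v2", [4, 1]), ("v3", [4, 6]), ("v4", [4, 6])]
assert lookup_locations(history) == [4, 6, 1, 2]
```

The cost of this scheme is lookup amplification (more candidate locations to probe as versions accumulate), which is exactly the metric the figure above compares across Consistent Hashing, Commit CH, and Version CH.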
On-going/Future Work
- Implement and evaluate the elastic consistent hashing in Sheepdog, deployed on the DISCL cluster at TTU
- Use the Microsoft Enterprise I/O trace to compare the agility of resizing against existing techniques
- Prepare a submission for the IPDPS'17 conference
Proposal of New Project
- Deduplication is critical, especially in storage systems with an SSD cache, due to SSDs' limited endurance and capacity
- Current studies focus on single-machine deduplication, but redundancy can occur across machines
- We propose distributed deduplication in SSD caches in the cloud storage environment
[Diagram: a typical scenario in cloud-scale data centers: virtual machine guests run on hosts with local SSD caches, connected over a SAN to a storage array where de-duplication is applied]
Motivation of the Study
- VMs across hosts may share the same operating system and dataset
- The SSD cache is usually host-local, so it is challenging to de-duplicate across the local storage of multiple nodes (how can data on a remote host be referenced?)
[Diagram: a typical cloud-scale data center scenario: VM guests, hosts with local SSD caches, SAN, and storage array]
Proposed Solution
[Diagram: VM guests, hosts with local SSD caches, SAN, and storage array]
- Metadata (fingerprints) is distributed across hosts using memcached
- Data itself is not de-duplicated across hosts
- Consider a global LRU instead of a local LRU:
  - Recently used data on one host may be reused on another host
  - If duplicate data is written back to primary storage, only a reference is created on primary storage
  - If duplicate data is inserted, a reference to the remote host is created
- Fingerprint table maps cache address to fingerprint (e.g., addr 1 -> fingerprint 0ae31fq)
- LRU table maps host to fingerprint (e.g., host 1 -> 0ae31fq, host 2 -> 13ews19)
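A minimal sketch of the metadata-only sharing proposed above. In the real design the fingerprint table would live in memcached and a global LRU would govern eviction; here a plain dictionary stands in for the shared table, eviction is elided, and all class, method, and host names are illustrative assumptions.

```python
import hashlib


class DedupCache:
    """Sketch: host-local SSD caches with a shared fingerprint table.
    Only metadata is global; if a block's fingerprint is already cached
    on another host, we store a remote reference instead of a copy."""

    def __init__(self):
        self.fingerprints = {}  # shared table: fingerprint -> (host, addr)
        self.caches = {}        # host -> {addr: block or ("ref", host, addr)}

    def insert(self, host, addr, block):
        fp = hashlib.sha1(block).hexdigest()
        cache = self.caches.setdefault(host, {})
        owner = self.fingerprints.get(fp)
        if owner is None:
            self.fingerprints[fp] = (host, addr)  # first copy: store the data
            cache[addr] = block
        else:
            cache[addr] = ("ref",) + owner        # duplicate: remote reference

    def read(self, host, addr):
        entry = self.caches[host][addr]
        if isinstance(entry, tuple) and entry[0] == "ref":
            _, owner, oaddr = entry
            return self.caches[owner][oaddr]  # costs one extra network hop
        return entry
```

The trade-off the proposal highlights is visible here: duplicate inserts cost no SSD space or write endurance, but reading through a reference requires contacting the remote host, which is why a global LRU that keeps shared blocks resident matters.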
Summary
- Elasticity is an important feature in cloud computing and should be considered in scale-out storage systems
- We propose techniques that allow agile resizing with minimal performance degradation
- These techniques can benefit data centers by optimizing resource utilization and/or reducing power consumption
- We will continue investigating workload characterization based on statistical techniques
- We have proposed a de-duplication system for cloud storage as a new project, and we are seeking sponsorship for either continuing the Unistore project or starting the new one
Thank You
Please visit: http://cac.ttu.edu/, http://discl.cs.ttu.edu/
Acknowledgement: The CAC@TTU is funded by the National Science Foundation under grants IIP-1362134 and IIP-1238338.
Please take a moment to fill out your L.I.F.E. forms at http://www.iucrc.com: select "Cloud and Autonomic Computing Center", then select the "IAB" role. What do you like about this project? What would you change? (Please include all relevant feedback.)