
Isilon Fundamentals


Welcome to the Isilon Fundamentals course. Copyright ©2016 EMC Corporation. All Rights Reserved. Published in the USA. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. The trademarks, logos, and service marks (collectively "Trademarks") appearing in this publication are the property of EMC Corporation and other parties. Nothing contained in this publication should be construed as granting any license or right to use any Trademark without the prior written permission of the party that owns the Trademark. EMC, EMC² AccessAnywhere Access Logix, AdvantEdge, AlphaStor, AppSync ApplicationXtender, ArchiveXtender, Atmos, Authentica, Authentic Problems, Automated Resource Manager, AutoStart, AutoSwap, AVALONidm, Avamar, Bus-Tech, Captiva, Catalog Solution, C-Clip, Celerra, Celerra Replicator, Centera, CenterStage, CentraStar, EMC CertTracker. CIO Connect, ClaimPack, ClaimsEditor, Claralert ,cLARiiON, ClientPak, CloudArray, Codebook Correlation Technology, Common Information Model, Compuset, Compute Anywhere, Configuration Intelligence, Configuresoft, Connectrix, Constellation Computing, EMC ControlCenter, CopyCross, CopyPoint, CX, DataBridge , Data Protection Suite. 
Data Protection Advisor, DBClassify, DD Boost, Dantz, DatabaseXtender, Data Domain, Direct Matrix Architecture, DiskXtender, DiskXtender 2000, DLS ECO, Document Sciences, Documentum, DR Anywhere, ECS, elnput, E-Lab, Elastic Cloud Storage, EmailXaminer, EmailXtender , EMC Centera, EMC ControlCenter, EMC LifeLine, EMCTV, Enginuity, EPFM. eRoom, Event Explorer, FAST, FarPoint, FirstPass, FLARE, FormWare, Geosynchrony, Global File Virtualization, Graphic Visualization, Greenplum, HighRoad, HomeBase, Illuminator , InfoArchive, InfoMover, Infoscape, Infra, InputAccel, InputAccel Express, Invista, Ionix, ISIS,Kazeon, EMC LifeLine, Mainframe Appliance for Storage, Mainframe Data Library, Max Retriever, MCx, MediaStor , Metro, MetroPoint, MirrorView, Multi-Band Deduplication,Navisphere, Netstorage, NetWorker, nLayers, EMC OnCourse, OnAlert, OpenScale, Petrocloud, PixTools, Powerlink, PowerPath, PowerSnap, ProSphere, ProtectEverywhere, ProtectPoint, EMC Proven, EMC Proven Professional, QuickScan, RAPIDPath, EMC RecoverPoint, Rainfinity, RepliCare, RepliStor, ResourcePak, Retrospect, RSA, the RSA logo, SafeLine, SAN Advisor, SAN Copy, SAN Manager, ScaleIO Smarts, EMC Snap, SnapImage, SnapSure, SnapView, SourceOne, SRDF, EMC Storage Administrator, StorageScope, SupportMate, SymmAPI, SymmEnabler, Symmetrix, Symmetrix DMX, Symmetrix VMAX, TimeFinder, TwinStrata, UltraFlex, UltraPoint, UltraScale, Unisphere, Universal Data Consistency, Vblock, Velocity, Viewlets, ViPR, Virtual Matrix, Virtual Matrix Architecture, Virtual Provisioning, Virtualize Everything, Compromise Nothing, Virtuent, VMAX, VMAXe, VNX, VNXe, Voyence, VPLEX, VSAM-Assist, VSAM I/O PLUS, VSET, VSPEX, Watch4net, WebXtender, xPression, xPresso, Xtrem, XtremCache, XtremSF, XtremSW, XtremIO, YottaYotta, Zero-Friction Enterprise Storage. Publish Date: February 2016 Copyright 2016 EMC Corporation. All rights reserved. Isilon Fundamentals 1 This course introduces EMC’s Big Data storage product, Isilon. 
Isilon is a scale-out NAS storage solution with an architecture that differs substantially from other EMC storage. This e-Learning introduces the architecture, features, and capabilities of Isilon to audiences who have not encountered it previously. This course is for you if you will be promoting, selling, or using Isilon in any way. You’ll also find value in the course if you are an EMC employee seeking familiarity with EMC’s entire product portfolio. When you have completed the course, you’ll know how Isilon differs from other storage products; you’ll understand Isilon’s node-based architecture; you’ll know what Isilon can do; and you’ll be well prepared for deeper, more technical training about Isilon.

Because Isilon Fundamentals is a foundational course, there are no formal prerequisites: you do not need to have passed any other class before taking it. However, note that Isilon Fundamentals is itself a prerequisite course that you must pass before you can proceed along most other learning paths related to Isilon.

While there are no formal course prerequisites for this training, it was created for professionals who have some experience with networking and data storage.
You will have difficulty following this course unless you already know the following:
• The meaning of basic computer terms; for example, processor, RAM, and hard drive
• Basic Ethernet and TCP/IP networking concepts, such as IP addressing, routing, DNS, VLANs, and switch management
• The basics of administering a Windows-based local area network
• The basics of UNIX-style user and file management
• Concepts and technologies used in backup and recovery solutions
• Basic awareness of applications that EMC customers typically use on their business networks, such as file servers, databases, and email systems

If you are already familiar with other data storage systems besides Isilon, that could work for or against you. You’ll be able to understand the terminology because you’ve already been using it. However, Isilon’s scale-out architecture differs enough from some other storage systems that you may also have to unlearn concepts you thought pertained to all storage. Keep an open mind, and you’ll probably feel intrigued by the innovations Isilon offers. If you are not already familiar with all these concepts, you are not part of the intended audience for this training. Consult with your manager about appropriate courses to take before taking this one.

Here’s what to expect from this course. Module 1 sets the context for today’s data storage offerings by reminding you of what came before. How did we go from slow computers the size of refrigerators to today’s modern data centers? Why is some data just called “data” but other data is “Big Data”? What are DAS, SAN, and NAS? How do scale-up NAS and scale-out NAS differ? Module 1 explores those topics. Module 2 delves into what the Isilon product is, in terms of hardware and architecture. Are there models of Isilon products? If so, what are they for? How would you know when to use one model versus another?
Module 2 tells you what Isilon is, while Module 3 starts explaining what Isilon does. What protocols does Isilon support? How do you authenticate to the cluster? What’s the management interface like? Module 4 concludes the course by explaining how you manage and protect data using Isilon. To give these issues real-world context, Modules 3 and 4 are written from the perspective of a fictitious storage administrator who is retiring. He briefs you, his successor, on what you need to know in order to take over his Isilon clusters. We hope you find this training enjoyable and enlightening. Let’s get started!

Isilon is the industry-leading scale-out data storage system made up of multiple servers called ‘nodes’. Nodes are combined with software to create a ‘cluster’, which behaves as a single, central storage system for a company’s unstructured data.

The advantages of Isilon include a single file system, a single volume, fast and easy scalability, simplified management, different levels of data protection and security, a rich feature set of enterprise-ready components, and the ability to reduce silos of storage by using Isilon as a data lake. In the next slides, we’ll show how the Isilon storage system implements these features.

Isilon offers industry-leading enterprise features including security, compliance, replication, snapshots, tiering, and CloudPools. Having all of your storage infrastructure in one data lake brings ease of management and new insight into your data, which you can use to expand your business or improve your processes.

This module focuses on setting the context in which Isilon operates.
It briefly considers the history of digital storage, trends in the way corporations use digital data today, and what separates conventional data from Big Data. You’ll also learn about major categories of data. The module concludes by explaining how the most popular storage architectures came to be, and Isilon’s position within that context.

After completing this lesson, you will be able to explain what Isilon is, examine the history of computer storage, illustrate changes in data storage needs, define Big Data, and identify what makes a data lake.

The first general-purpose computer became operational in 1946. It was called the Electronic Numerical Integrator And Computer (ENIAC) and used more than 17,000 vacuum tubes and 70,000 resistors to hold a ten-digit decimal number in memory. The data was output as IBM punch cards, a format that continued well into the 1960s. In the 1960s, magnetic tape eclipsed punch cards as the way to store corporate computer data. Later, magnetic tape gave way to the hard disk drive. The first IBM hard drive was the size of two refrigerators and required 50 disks to store less than four megabytes of data. In the 1980s, the personal computer revolution introduced miniaturization, bringing a wide array of storage form factors. Less than 30 years after two refrigerator-sized units stored less than four megabytes, the average consumer could store a good portion of that, about one-third, on a three-and-a-half-inch plastic disk.

Unstructured data continues to grow by greater amounts every year. An IDC study published in 2008 showed that the amount of digital data created, captured, and replicated worldwide grew tenfold in just five years.
This finding was based on the proliferation of then-new technologies such as Voice over IP, RFID, smartphones, and consumer use of GPS, and on the continued growth of data generators such as digital cameras, HD TV broadcasts, digital games, ATMs, email, videoconferencing, medical imaging, and so on. A 2012 study from IDC found that the digital universe is still expanding at a breathtaking pace. To understand the results, it helps to realize that the prefix “exa” means one billion billion, or one quintillion. An exabyte (EB) is one quintillion bytes. Another way to say it is that an exabyte is one billion gigabytes. In 1986, the entire world had the technical capacity to store merely 2.6 exabytes. By 2020, the world will need to store more than 40,000 exabytes. Much of this growth occurs because a person formerly had to sit in an office to use a computer, but today, billions of individuals generate data, all day, everywhere they go, from mobile devices. Thus, studies document that the world’s data storage needs are not merely growing; they are mushrooming.

When talking about data storage, there are two main types of data: structured data and unstructured data. Structured data often resides in a fixed field inside a larger record or file. A large file usually requires a data model that defines the type of data (for example, is the data numeric or alphanumeric?), how the data will be accessed, and how it will be processed. Today, structured data is most often stored in a relational database. The rigid table structure makes structured data easy to query. Spreadsheets, library catalogs, inventory sheets, phone directories, and customer contact information are all examples of structured data that would fit neatly into the rows and columns of a database. Unstructured data does not fit into neat rows and columns because it has little or no classification data.
Image files, photographs, graphics files, and video and audio files are all examples of unstructured data. Imagine that you have a spreadsheet with information about your pet dog. The spreadsheet might have the dog’s name, birthdate, breed, color, weight, parents’ names and information, breeder information, location, and so on. This data would be very easy to plug into the predefined fields of a database, because the information classifies individual traits. Now imagine what would happen if you tried to fit a photograph of your dog into those same fields: it wouldn’t fit. There is no way to classify an image in the same way that we list names, birthdates, and weights. According to industry analysts, the creation rate of unstructured data outpaces that of structured data, with unstructured data comprising 80 to 90% of all digital data. Isilon specializes in storing unstructured data.

Another way of categorizing data storage systems is to describe them as block-based or file-based. Block data is usually found in SAN (storage area network) technology, for example, the VNX, whereas file data is usually associated with NAS (network attached storage) technology, such as Celerra and Isilon. A block of data is a sequence of bits or bytes of a fixed length; the length is determined by the file system. To save a file, the operating system, or OS, breaks it into blocks and writes each block to a particular sector (area) of the drive. A single file may be assembled from a great many blocks. Block data is especially useful when working with small pieces of information that need to be accessed or written frequently; for example, a large database full of postal codes. Someone querying the database probably wants only one or a few of the postal codes, but rarely wants all of them.
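To make the idea of fixed-length blocks concrete, here is a minimal Python sketch. It is an illustration only, not a description of how any particular file system or EMC product lays out data. The 8-byte block size, the sample postal codes, and the function names split_into_blocks and read_block are all assumptions chosen for readability; real block sizes are typically measured in kilobytes.

```python
BLOCK_SIZE = 8  # bytes; hypothetical value, far smaller than real block sizes


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Break data into fixed-length blocks, zero-padding the final block."""
    blocks = []
    for offset in range(0, len(data), block_size):
        chunk = data[offset:offset + block_size]
        blocks.append(chunk.ljust(block_size, b"\x00"))
    return blocks


def read_block(blocks: list[bytes], index: int) -> bytes:
    """Return one block without reading any of the others."""
    return blocks[index]


# Five made-up postal codes, one per line (30 bytes in total).
postal_codes = b"10001\n94105\n60601\n73301\n02108\n"
blocks = split_into_blocks(postal_codes)

print(len(blocks))            # number of blocks needed for the whole file
print(read_block(blocks, 1))  # a single block, fetched on its own
```

The point of the sketch is the last line: one block can be fetched by its position without reassembling the whole file, which is the partial-access property that suits small, frequent transactions.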
Block data makes it easy to gather information in partial sets and is particularly adept at handling high volumes of small transactions, such as stock trading data, which could generate one billion 18 KB files in only a few hours. Block format is the go-to choice for flexibility and for workloads that need intensive input and output speed. File data is created depending upon the application and protocol being used. Some applications store data as a whole file, which is broken up and sent across the network as packets. All of the data packets are required to reassemble the file. Unlike block storage, where you can retrieve a single postal code, in file storage you need the whole file content in order for it to be useful. For example, a PDF file is generally not readable unless you have downloaded all of it; having only part of the file will generate an error and the file will not open. File-based data is organized in chunks too large to work well in a database or in an application that handles an intense volume of transactions. In IT applications, block data usually relates to structured data while file data usually relates to unstructured data. Isilon specializes in handling file-based data. Can Isilon do block-based storage? Technically, yes, but if you are looking for a block-based solution, there are other EMC products that specialize in block and would best handle that type of workflow.

The term Big Data is being used across the technology industry, but what exactly is Big Data? Big Data is defined as any collection of data sets so large, diverse, and fast changing that it is difficult for traditional technology to efficiently process and manage. What exactly makes computer data “Big Data”? The storage industry says that Big Data is digital data having too much volume, velocity, or variety to be stored traditionally.
To make sure the three Vs of Big Data are perfectly clear, let’s consider some examples.

What do we mean by volume? Consider any global website that works at scale. YouTube’s press page says YouTube ingests 100 hours of video every minute. That is one example of Big Data volume. What’s an example of velocity? Machine-generated workflows produce massive volumes of data. For example, the longest stage of designing a computer chip is physical verification, where the chip design is tested in every way to see not only if it works, but also if it works fast enough. Each time researchers fire up a test on a graphics chip prototype, sensors generate many terabytes of data per second. Storing terabytes of data in seconds is an example of Big Data velocity. Perhaps the best example of variety is the world’s migration to social media. On a platform such as Facebook, people post all kinds of file formats: text, photos, video, polls, and more. According to a CNET article from June 2012, Facebook was taking in more than 500 terabytes of data per day, including 2.7 billion Likes and 300 million photos. Every day. That many kinds of data at that scale represents Big Data variety. The “Three Vs” (volume, velocity, and variety) often arrive together. When they combine, administrators truly feel the need for higher-performance, higher-capacity storage. The three Vs generate the challenges of managing Big Data. Growing data has also forced an evolution in storage architecture over the years due to the amount of data that needs to be maintained, sometimes for years on end. Isilon is a Big Data solution because it can handle the volume, velocity, and variety that define the fundamentals of Big Data. These topics will be addressed as the course continues.
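The figures quoted in this slide are easier to compare once they are converted to sustained rates. The short Python sketch below does that arithmetic. It assumes decimal units (1 terabyte = 10^12 bytes) and a perfectly even spread of traffic over 24 hours, neither of which the cited sources state; the function and variable names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
TERABYTE = 10**12  # decimal terabyte, an assumption
GIGABYTE = 10**9   # decimal gigabyte, an assumption


def sustained_rate_gb_per_s(terabytes_per_day: float) -> float:
    """Average ingest rate in GB/s for a given daily volume in TB."""
    return terabytes_per_day * TERABYTE / SECONDS_PER_DAY / GIGABYTE


# Facebook, June 2012 (as quoted above): 500 TB and 300 million photos per day.
facebook_gb_per_s = sustained_rate_gb_per_s(500)
photos_per_second = 300_000_000 / SECONDS_PER_DAY

# YouTube (as quoted above): 100 hours of video ingested every minute.
youtube_hours_per_day = 100 * 60 * 24

print(f"Facebook ingest: about {facebook_gb_per_s:.1f} GB/s")
print(f"Facebook photos: about {photos_per_second:.0f} per second")
print(f"YouTube ingest:  {youtube_hours_per_day} hours of video per day")
```

Under these assumptions, 500 terabytes per day works out to roughly 5.8 gigabytes arriving every second, around the clock, and 100 hours per minute is 144,000 hours of video per day.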
Let’s start with the first of the three Vs, managing Big Data volume.

Challenge: Complex data architecture. SAN and scale-up NAS data storage architectures encounter a logical limit at 16 terabytes, meaning that no matter what volume of data arrives, a storage administrator has to subdivide it into partitions smaller than 16 terabytes. This is part of why customers wind up with silos of data. To simplify this challenge, scale-out NAS such as an Isilon cluster holds everything in one single volume with one LUN. Isilon is like one gigantic bucket for your data, and it can scale seamlessly without architectural hard stops forcing subdivisions on the data.

Challenge: Low utilization of raw capacity. SAN and scale-up NAS architectures must reserve much of the raw capacity of the system for management and administrative overhead, such as RAID parity disks, metadata for all those LUNs and mega-LUNs, duplicate copies of the file system, and so on. As a result, conventional SAN and NAS architectures often use only half of the raw capacity available, because you have to leave headroom on each separate stack of storage. Suppose you have seven different silos of data. As soon as you put them in one big volume, you immediately get back the headroom from six of the seven stacks. In that way, Isilon offers high utilization. Isilon customers routinely use 80% or more of raw disk capacity.

Challenge: Non-flexible data protection. When you have Big Data volumes of information to store, it had better be there, dependably. If an organization relies on RAID to protect against data loss or corruption, the failure of a single disk drive causes disproportionate inconvenience. The most popular RAID implementation scheme allows the failure of only two drives before data loss. (A sizable Big Data installation will easily have more than 100 individual hard drives, so odds are at least one drive is down at any given time.)
The simpler answer is to protect data using a different scheme. Shortly you’ll learn about Isilon’s clustered architecture based on nodes that do not use RAID. Nodes full of hard drives rely less on any single drive and can recover a failed drive as a non-emergency.

What advantages does scale-out NAS offer for administrators coping with high velocity, the second V of Big Data? Here are some examples.

Challenge: Difficult to scale performance. Some data storage architectures use two controllers, sometimes referred to as servers or filers, to run a stack of many hard drives. You can scale capacity by adding more hard drives, but it’s difficult to scale performance. In a given storage stack, the hard drives offer nothing but capacity; all the intelligence of the system, including computer processing and RAM, must come from the two filers. If the horsepower of the two filers becomes insufficient, the architecture does not allow you to pile on more filers. You have to start over with another stack and two more filers. In contrast, every node in an Isilon cluster contains capacity PLUS compute power PLUS memory. The nodes can work in parallel, so the cluster scales out linearly with each node you add. In other words, capacity and performance grow together.

Challenge: Silos of data. Due to the architectural restrictions we just discussed, SAN and scale-up NAS end up with several isolated stacks of storage. Many customer sites have a different storage stack for each application or department. If the R&D department performs product testing that generates results at Big Data velocity, the company may establish an HPC stack, which could reach capacity rapidly. Other departments or workflows may have independent storage stacks that have lots of capacity left, but there’s no automated way for R&D to offload their HPC overflow to, for example, a backup storage stack.
Instead, an administrator has to manually arrange a data migration. In contrast, an Isilon cluster distributes data across all its nodes to keep them all at equal capacity. You don’t have one node taking a pounding while other nodes sit idle. There are no hot spots, and thus, no manual data migrations. Automated balancing makes much more sense if the goal is to keep pace with Big Data velocity.

Challenge: Concurrency. In conventional storage, a file is typically confined to a RAID stripe. That means that the maximum throughput of reading that file is limited to how fast those drives can deliver the file. But in modern workflows, you may have a hundred engineers or a thousand digital artists all needing access to a file, and those RAID drives can’t keep up. Perhaps the two filers on that stack can’t process that many requests efficiently. Isilon’s answer is that every node has at least a dozen drives, plus more RAM and more computer processing, for more caching and better concurrent access. When there is heavy demand for a file, several nodes can deliver it.

Challenge: Many manual processes. Besides manual data migrations, conventional storage has many more manual processes. An administrator over a SAN or a scale-up NAS product spends a significant amount of time creating and managing LUNs, partitioning storage, establishing mounts, launching jobs, and so on. In contrast, Isilon is policy-driven. Once you define your policies, the cluster does the rest automatically.

A scale-out data lake is a large storage system in which enterprises can consolidate vast amounts of data from other solutions or locations into a single store. This helps address the variety issue with Big Data. The data can be secured, analysis can be performed, and actions can be taken based on the insights that surface.
Enterprises can then eliminate the cost of having silos or “islands” of information spread across their enterprises. The scale-out data lake further enhances this paradigm by providing scaling capabilities in terms of capacity, performance, security, and protection.

This lesson briefly reviewed how businesses have stored digital data since it first came into the mainstream in the 1950s. We examined several trends in data storage, including the fact that the ever-growing amount of stored data forces changes in storage formats. We learned that Big Data is defined by its volume, velocity, and variety, all of which occur at too great a scale for traditional means to process, store, and manage, and we took a quick look at what makes a data lake.

In this lesson, we introduce Isilon's OneFS operating system and what makes it unique. We then delve into the challenges of using Big Data and the uses of multi-protocol access. We'll also explore the Edge-to-Core-to-Cloud solution and where Isilon fits into the overall storage landscape. Let’s get started!

In the early days of computer data, corporations stored data on hard drives in the server. The company’s intellectual property depended entirely upon that hard drive continuing to work. Thus, to minimize risk, corporations mirrored the data on a Redundant Array of Independent Disks, or RAID for short. RAID disks were directly attached to a server so that the server thought the hard drives were part of it. This technique is called Direct Attached Storage, or DAS. As applications proliferated, soon there were many servers, each with its own DAS. This worked fine, with some drawbacks.
If one server’s DAS was full while another server’s DAS was half empty, the empty DAS couldn’t share its space with the full DAS. People thought, “What if we took all these individual storage stacks and put them in one big stack, then used the network to let all the servers access that one big pool of storage? Then our servers could share capacity!” Accomplishing that approach required a traffic cop to keep track of what data went with what application. Thus, the Volume Manager was invented. Adding a volume manager to the storage system created the Storage Area Network, or SAN. SAN was optimized for block data. It worked fine until employers began giving their employees computers. Employees then needed to get to the stored data, but they couldn’t: SAN was set up for servers, not personal computers. PCs worked differently from the storage file server, and network communication takes place only from one file system to another. The answer arrived when corporations put employee computers on the network and added a file system to the storage to communicate with users. Thus, Network Attached Storage, or NAS, was born. NAS works pretty well. But it could be improved. For example, now the server is spending as much time servicing employee requests as it is doing the application work it was meant for. The file system doesn’t know where data is supposed to go, because that’s the volume manager’s job. The volume manager doesn’t know how the data is protected; that’s RAID’s job. If high-value data needs more protection than other data, you need to migrate the data to a different volume that has the protection level that data needs. So there is opportunity to improve NAS. To alleviate these issues, Isilon combined the file system, the volume manager, and the data protection into one seamless, self-aware OS: OneFS. Some advantages of this approach include the simplicity of having all data in a single file system and a single volume.
When you have storage capacity without hard architectural limitations, your system is easier to manage and grow. Isilon was designed to work in a mixed environment. Even if the clients attached to the server are a mix of Windows, UNIX, and Mac OS X operating systems, Isilon offers a single unified platform for all.

Though we’ve defined DAS, SAN, and NAS, let’s draw attention to the distinction between two kinds of NAS architectures: scale-up and scale-out. Scale-up NAS came first, represented here with a green line. In this architecture, a pair of controllers or filers manages a stack of disk trays. You can readily add capacity: if you need more storage, you simply add more drives. But the architecture doesn’t let you pile on more filers. As disk space grows, computing resources do not. In contrast, scale-out NAS, represented here with a blue line, uses nodes. Each node contains drives, but also more processing and more memory. By adding nodes, performance and capacity scale out in proportion. The green line shows that over time, the filers must work harder and harder to manage the growing capacity. Result: performance slows. The blue line shows that as you add nodes, performance improves, because every node can exploit all the resources of every other node. DAS, SAN, and scale-up NAS have their places, but were invented before the Big Data era. Scale-out NAS systems were built for Big Data. Thus, in many regards, scale-out NAS architecture makes managing Big Data less challenging. The next three slides give examples of how.

Isilon scale-out NAS architecture simplifies managing Big Data.
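The scale-up versus scale-out comparison described with the green and blue lines can be sketched as a toy model in Python. This is not a performance model of any real product. The core counts, the capacities per tray and per node, and the function names are all hypothetical, and the sketch deliberately ignores networking, software overhead, and drive types. It only shows how the ratio of compute to capacity behaves under each architecture.

```python
def scale_up_cores_per_tb(trays: int, filers: int = 2,
                          cores_per_filer: int = 16,
                          tb_per_tray: int = 100) -> float:
    """Compute available per terabyte when only disk trays can be added."""
    total_cores = filers * cores_per_filer  # fixed: no extra filers allowed
    total_tb = trays * tb_per_tray          # grows with every tray
    return total_cores / total_tb


def scale_out_cores_per_tb(nodes: int, cores_per_node: int = 16,
                           tb_per_node: int = 100) -> float:
    """Compute available per terabyte when whole nodes are added."""
    total_cores = nodes * cores_per_node    # grows with every node
    total_tb = nodes * tb_per_node          # grows with every node
    return total_cores / total_tb


for units in (1, 2, 4, 8):
    print(units,
          round(scale_up_cores_per_tb(units), 3),
          round(scale_out_cores_per_tb(units), 3))
```

In this simplified picture, the scale-up ratio shrinks each time a tray is added because the two filers are fixed, while the scale-out ratio stays constant because every node brings its own processing along with its drives.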
Isilon’s innovative operating system, OneFS, gets its name from the fact that it can scale to more than 50 petabytes of storage in one volume, one namespace, and one file system. Isilon was purpose-built to ease the challenges of processing, storing, managing, and delivering data at scale. Isilon’s positioning is to provide simple yet powerful answers for Big Data storage administrators.

Isilon's centralized file storage system scales easily for each customer’s needs and provides access to data by using standardized protocols. The single central repository of data can be accessed from all of these protocols simultaneously. Even the Hadoop Distributed File System, or HDFS, is offered as a protocol, eliminating the need for separate Name Nodes and Storage Nodes for Hadoop functionality.

EMC IsilonSD Edge is a new software-defined storage product that expands the data lake by bringing in data from the edge (remote and branch offices), enabling you to consolidate unstructured data, simplify its management, and protect it. This new product will enable you to consolidate data from edge locations to your core data center and then leverage the multiprotocol capabilities to support a wide range of 2nd and 3rd platform applications, including Big Data analytics, so that you can gain value and insight from the enterprise edge data. Management at the edge is simplified by using the familiar software tools and the automated features found in OneFS. By leveraging off-the-shelf hardware and virtual server environments located at your remote locations, you can deploy an economical software-defined storage solution with the power of Isilon OneFS. You can also increase efficiency and storage utilization at the edge to over 80% by aggregating unused storage capacity.
Finally, by using IsilonSD Edge, you can increase data protection by automatically replicating data to your core data center, eliminating the need for manual data backup processes and protecting data consistently at all of your remote locations.
The new Isilon CloudPools software, which enables your customers to select from a number of public cloud services or use a private cloud based on EMC Elastic Cloud Storage (ECS), provides the policy-based, automated tiering that enables your customers to seamlessly integrate with the cloud as an additional storage tier from the Isilon cluster at their data center. CloudPools lets your customers address rapid data growth and optimize data center storage resources by using the cloud as a highly economical storage tier with massive storage capacity for cold or frozen data that is rarely used or accessed. This enables more valuable on-premises storage resources to be used for more active data and applications. To secure data that is archived in the cloud, CloudPools encrypts data that is transmitted from the Isilon cluster at the core data center to the cloud storage service. This data remains encrypted in the cloud until it is retrieved and returned to the Isilon cluster at the data center.
You can use EMC Isilon to consolidate file-based, unstructured data into a data lake that can eliminate costly storage silos, simplify management, increase data protection, and acquire more value from your data assets. With built-in multi-protocol capabilities, Isilon can support a wide range of traditional and next-generation applications on a single platform, including powerful Big Data analytics that provide you with better insight and use of your stored information. Data at edge locations (e.g., remote or branch offices) is growing.
These edge locations are often inefficient islands of storage, running with limited IT resources and inconsistent data protection practices. Data at the edge generally lives outside of the business data lake, making it difficult to incorporate into data analytics projects. The new edge-to-core-to-cloud approach can extend your Isilon data lake to edge locations and out into the cloud, thus enabling consolidation, protection, management, and backups of remote edge location data.
Isilon provides the industry-leading scale-out clustered storage solution. It provides a single volume of data storage at a massive scale that is easy to use and manage, offering linear scalability and readiness for a customer’s performance applications, Hadoop analytics, and other workflows. A data lake is a single central data repository that can store data from a variety of sources, such as file shares, web apps, and the cloud. It enables businesses to access the same data for a variety of uses and enables the data to be manipulated using a variety of clients, analyzers, and applications. The data is real-time production data and does not need to be copied or moved from an external source, like another Hadoop cluster, into the data lake. The data is stored on the central storage solution, which provides secure access to the data. The data lake provides tiers based on data usage, and the ability to instantly increase the storage capacity when needed. The above slide identifies the key characteristics of a scale-out data lake.
This lesson covered how Isilon OneFS came to be. We identified Isilon's Big Data position, examined IsilonSD Edge and the Isilon CloudPools feature, and discussed how to use these features to provide data management from Edge-to-Core-to-Cloud.
This module focused on setting the context in which Isilon operates. We briefly considered the history of digital storage, trends in the way corporations use digital data today, and what separates conventional data from Big Data. You also learned about major categories of data. The module concluded by explaining how the most popular storage architectures came to be, and Isilon’s position within that context.
In this module, we turn our focus from abstract ideas about storage to the tangible parts of the Isilon product and how they’re arranged. After this module, you should understand how the parts fit together. You’ll also become familiar with our principal node models, and understand, at a high level, which node typically goes with what uses.
This lesson explains what pieces make up an Isilon cluster, and how the pieces fit together to communicate with one another. Let’s start right in.
The basic building block of an Isilon NAS cluster is a node. Our smallest cluster begins with three nodes; combine them and you have a cluster. Clusters can scale from 3 to 144 nodes and can be composed of different node types. Architecturally, every Isilon node is equal to every other Isilon node in a cluster. There is NO node that is THE controller or THE filer. Instead, OneFS unites the entire cluster in a globally coherent pool of memory, CPU, and capacity. OneFS writes files in stripes across the nodes for built-in high availability. For reads, whichever node receives the request for a file can manage the other nodes in re-assembling the file, then delivering it.
This slide provides an overview of the Isilon scale-out NAS architecture.
• Starting at the Client/Application layer, the Isilon NAS architecture supports mixed modes. Windows, UNIX, and OS X operating systems can all connect to an Isilon cluster and access the same files.
• At the networking level, the Isilon OneFS operating system supports key industry-standard protocols over Ethernet, including Network File System (NFS), Server Message Block (SMB), HTTP, FTP, Hadoop Distributed File System (HDFS) for data analytics, and SWIFT and REST for object and cloud computing requirements. As a file-based storage system, Isilon does not support protocols associated with block data.
• Nodes are combined into one volume by the OneFS operating system. All information is shared among nodes, thus allowing a client to connect to any node in the cluster and access any directory in the file system.
• On the back end, all the nodes are connected with an InfiniBand fabric switch for low-latency internal communication with one another.
That’s the overview of the system. Next, let’s look at each portion in further depth.
The external networking components of an EMC Isilon cluster provide client access over a variety of protocols. Each storage node connects to one or more external Ethernet networks using 1 Gigabit Ethernet (1 GbE) or 10 GbE connections. The 1 GbE and 10 GbE interfaces support link aggregation. Link aggregation creates a logical interface that clients connect to. In the event of a NIC or connection failure, clients do not lose their connection to the cluster. For stateful protocols, such as SMB and NFSv4, this prevents client-side timeouts and unintended reconnection to another cluster. Instead, clients maintain their connection to the logical interface and continue operating normally. Support for Continuous Availability, or CA, for stateful protocols like SMB and NFSv4 is available with OneFS 8.0.
The nodes in the cluster communicate internally using InfiniBand, which was designed as a high-speed interconnect for high performance computing. The reliability and performance of the interconnect are very important in creating a true scale-out storage system. The interconnect needs to provide both high throughput and very low latency. InfiniBand meets this need, acting as the backplane of the cluster, enabling each node to contribute to the whole. A single front-end operation can generate multiple messages on the back end, since the nodes coordinate work among themselves when they write or read data. Thus, the dual back-end InfiniBand switches handle all intra-cluster communication and provide redundancy in the event that one switch fails.
Isilon does not use RAID. This section details what Isilon does instead. In a single cluster, how many hardware failures can the system withstand while offering the customer 100% data availability? That depends. The answer requires understanding N + M. N + M comes from the Reed-Solomon algorithm, an industry standard developed to enhance data integrity when it’s undesirable to have data retransmitted from another source. Most DVDs and television broadcasts use Reed-Solomon codes so that you can view video data without interruption. In the N + M data model, N represents the number of nodes, and M represents the number of simultaneous hardware failures that the cluster can withstand without incurring data loss. “Hardware failures” refers to drives, nodes, or a combination of drives AND nodes. As the system writes the data, it also protects the data with parity bits. The OneFS operating system spreads the data across numerous drives in multiple nodes so that if part of the data goes missing, the missing data can be recalculated and restored. This involves complex mathematics, but to illustrate the concept, we’ll use a basic example.
In this calculation, 5 plus 3 plus 1 represents data stored on three different drives. What is the sum of 5 plus 3 plus 1? Obviously, 9. Nine represents the parity: a value that OneFS sets to show what total should result when the binary data is added together. Suppose the drive holding the “3” stops working. Knowing that five plus something plus one must equal nine, OneFS can easily rebuild the missing data. With the aid of the parity value, any one value could vanish, and OneFS could readily recalculate and restore it.
This Reed-Solomon approach of using parity bits to reconstruct data is called Forward Error Correction, or FEC. FEC allows the customer to choose how many bits of parity to implement. One bit of parity for many disks is known as N + 1; two bits of parity for many disks is known as N + 2, and so on. With N + 1 protection, data is 100% available even if a single drive or node fails. With N + 2 protection, two components can fail, but the data will still be 100% available. OneFS supports up to N + 4: users can organize their cluster so that as many as four drives, or four entire nodes, can fail without loss of data or of access to the data. RAID is disk-based, so when you choose a level of protection (that is, how many parity bits) you’ve chosen it for the entire RAID volume, and it cannot be changed without reformatting the disks. With Isilon’s FEC approach, you can set different levels of protection for different nodes, directories, or even different files. You can also change protection levels on the fly, non-disruptively. When a client connects through a single node and saves data, the write operation occurs across multiple nodes in the cluster. This is also true for read operations.
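Returning to the parity illustration above (5 + 3 + 1 = 9), the idea can be sketched in a few lines of Python. This is only an illustration of single-parity recovery, not how OneFS implements FEC (the course states that OneFS uses Reed-Solomon codes, which are more involved). The function names are hypothetical, and the XOR variant is included as an assumption because it is a common way to compute single parity over binary data.

```python
from functools import reduce
from operator import xor


def sum_parity(values):
    """Additive parity from the course example: the total of all data values."""
    return sum(values)


def rebuild_from_sum(surviving, parity):
    """Recover the one missing value: parity minus everything that survived."""
    return parity - sum(surviving)


def xor_parity(blocks):
    """Bytewise XOR of equal-length blocks (a common single-parity scheme)."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


def rebuild_from_xor(surviving, parity):
    """XOR-ing the parity with all surviving blocks yields the missing block."""
    return xor_parity(list(surviving) + [parity])


# The course example: three drives hold 5, 3 and 1; the parity value is 9.
data = [5, 3, 1]
parity = sum_parity(data)

# The drive holding "3" fails; rebuild it from the survivors and the parity.
recovered = rebuild_from_sum([5, 1], parity)
```

With one parity value, exactly one missing value can be rebuilt, which corresponds to the N + 1 case; tolerating more simultaneous failures requires additional, independent parity values.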
When a client connects to a node and requests a file from the cluster, the node that the client has connected to uses the back-end InfiniBand network to coordinate with other nodes to retrieve, rebuild, and deliver the file back to the client.
During a write operation, when OneFS stripes data across nodes, the system breaks the file data into smaller logical sections called stripe units. The smallest element in a stripe unit is an 8 KB block, and each stripe unit is 128 KB, or sixteen 8 KB blocks. If the data file is larger than 128 KB, the next part of the file is written to a second node. If the file is larger than 256 KB, the third part is written to a third node, and so on. OneFS stripes these 128 KB units across the cluster, using advanced algorithms to determine data layout for maximum efficiency and performance. The process of striping spreads all write operations from a client across the nodes of a cluster. The example in this animation demonstrates how a file is broken down into chunks, after which it is striped across disks in the cluster along with parity, also known as forward error correction (FEC). Though a client connects to only one node, when that client saves data to the cluster, the write operation occurs in multiple nodes in the cluster. Each node contains between 12 and 60 hard disk drives, or a combination of SSDs and disk drives. As the system lays data across the cluster, it sequences stripe units across nodes, and each node in turn may utilize numerous drives, if the file is large enough. This method minimizes the role of any specific drive or node. If that piece of hardware stops working, the data it contains can be reconstructed.
A client connects to a single node at a time.
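The stripe-unit arithmetic described for write operations above (8 KB blocks, 128 KB stripe units, one unit per node in turn) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names are hypothetical, placement is plain round-robin, and FEC units are left out. The course says OneFS uses advanced algorithms to decide the real layout.

```python
BLOCK_SIZE = 8 * 1024          # smallest element: one 8 KB block
STRIPE_UNIT = 16 * BLOCK_SIZE  # one stripe unit = sixteen blocks = 128 KB


def split_into_stripe_units(file_size):
    """Return the byte length of each stripe unit for a file of file_size bytes."""
    full_units, remainder = divmod(file_size, STRIPE_UNIT)
    units = [STRIPE_UNIT] * full_units
    if remainder:
        units.append(remainder)  # the last unit may be only partly filled
    return units


def assign_to_nodes(file_size, node_count):
    """Pair each stripe unit with a node index, round-robin (illustrative only)."""
    units = split_into_stripe_units(file_size)
    return [(index % node_count, length) for index, length in enumerate(units)]


# A 300 KB file on a three-node cluster: 128 KB + 128 KB + 44 KB,
# landing on nodes 0, 1 and 2 in this simplified model.
layout = assign_to_nodes(300 * 1024, node_count=3)
```

A file of 128 KB or less fits in a single stripe unit in this model, which matches the text: only when the file exceeds 128 KB does the next part go to a second node.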
When that client requests a file from the cluster, the node that the client has connected to will not have the entire file on its local drives. The node with the requesting client then uses the back-end InfiniBand network to coordinate with other nodes to retrieve, rebuild, and deliver the file.
This lesson explained what pieces make up an Isilon cluster and how the pieces fit together to communicate with one another.
In this lesson, we dive deeper into various models of Isilon nodes. You’ll get an overview of the intended uses for each model, then some of the technical detail supporting the overview. You’ll also see how users can add nodes to scale their Isilon cluster. The lesson concludes with an explanation of optional enhancements to nodes, such as encryption and accelerators.
Nodes comprise the storage capacity and processing power of the Isilon scale-out NAS platform. Each node provides network connectivity, storage, memory, non-volatile RAM (or NVRAM) and processing power (or CPUs). There are also different types of nodes that can be mixed and matched to meet specific requirements.
The Isilon product family consists of four storage node series: S-Series, X-Series, NL-Series, and the new HD-Series. The S-Series is for ultra-performance primary storage and is designed for high-transactional and IO-intensive tier 1 workflows. The X-Series strikes a balance between large capacity and high-performance storage. X-Series nodes are best for high-throughput and high-concurrency tier 2 workflows and also for larger files with fewer users. The NL-Series is designed to provide a cost-effective solution for tier 3 workflows, such as nearline storage and data archiving.
It is ideal for nearline archiving and for disk-based backups. The HD-Series is the new high-density, deep archival platform. This platform is used for archival level data that must be retained for long, if not indefinite, periods of time.
A primary purpose of scale-out NAS is the “scale-out” part. An administrator should be able to expand the storage at will by adding a new node. In Isilon’s case, once the node is racked and cabled, adding it to the cluster takes about one minute. That’s because automated policies discover the node, set up addresses for the node, incorporate the node into the cluster, and begin rebalancing capacity on all nodes to take advantage of the new space. In that brief time, the node fully configures itself, is ready for new data writes, and begins taking on data from the other nodes to AutoBalance the entire cluster. The video on this slide was shot by an enthused Isilon customer who posted it on YouTube with the comment that adding a node was “insanely fast.” It’s not especially exciting to watch, until you realize that accomplishing the same tasks with another NAS solution takes 26 steps and multiple hours. If you look at the free space available in the pie chart, this Isilon customer took his system from 280 terabytes to 403 terabytes, adding roughly 120 terabytes of storage in a minute.
In addition to the base node models, Isilon offers some optional add-ons. DARE stands for “Data At Rest Encryption.” Data at rest is any data sitting on your drives. DARE is used for confidential or sensitive information.
If somehow a hostile party infiltrated your network, and IF they were able to access your Isilon cluster, and even IF they somehow acquired the various levels of permissions and access to see the data striped on the cluster, with DARE they still could not read the data, because it’s encrypted. In fact, many vertical markets require DARE. For instance, federal governments, financial services, and HIPAA-compliant health care providers all must encrypt stored data. A less obvious benefit of DARE occurs when it’s time to upgrade your hardware. If you run a corporation in one of the regulated industries, and you need to dispose of an old drive, you have a problem. The data on it is readable. Once you pull the drive from your systems, you can’t just issue a GUI command to erase it. Further, many “erase” programs don’t literally delete the data; they just mark the sector of the drive that holds the data as available for overwrite. Unless something overwrites the data, it is still there, and can be recovered with hacker tools. For that reason, a whole industry has sprung up around physically destroying retired hardware. DARE provides an easier solution. It means that all data at rest has been encrypted. If the data has been encrypted with a 256-bit key for all its life, you can recycle the drive as is, without fear that anyone can read it. Isilon implements DARE by offering optional Self-Encrypting Drives, or SEDs.
An Isilon accelerator is made for customers who don’t need more capacity but need certain workflows to go faster. An accelerator contains processors and memory, but no storage. It dedicates performance to a single client or group of clients. In essence, an accelerator acts as a large cache to increase single stream performance. The accelerator is based on a single hardware platform, but comes in two variations: a model for pure performance, and a model to speed up backups.
Both models have Intel Sandy Bridge Hex-Core 2 GHz processors. The main difference between the models is that the pure performance accelerator has more memory (256 GB of RAM, used for L1 cache), while the model for accelerating backups is the only Isilon node that offers Fibre Channel ports.
This lesson dove deeper into various models of Isilon nodes. You got an overview of the intended uses for each model, then some of the technical detail supporting the overview. You saw how users can add nodes to scale their Isilon cluster. The lesson concluded with an explanation of optional enhancements to nodes, such as encryption and accelerators.
In this module, we turned our focus from abstract ideas about storage to the tangible parts of the Isilon product and how they’re arranged. You should now be able to understand how the parts fit together. You became familiar with our principal node models, and now understand, at a high level, which node typically goes with what uses.
In this module, we'll go through an overview of OneFS, the management interface, authentication, multiprotocol support, and your options for getting remote technical support.
In this lesson, we will look more deeply at what OneFS is, understand the management options, discuss authentication, understand how we do multiprotocol support, and identify options for getting remote technical support.
The key to Isilon’s scale-out NAS is the architecture of OneFS, which is a distributed clustered file system. That means one file system spans every node in the storage cluster.
If you add more nodes, that file system automatically redistributes content to take advantage of the entire cluster. When the system needs to write a file, it breaks data into smaller sections called “stripes” and then, for performance reasons as well as for data protection, OneFS stripes data across all the nodes and drives in a cluster. As the system writes the data, it also protects the data. We already spoke about N + M protection and parity bits. The technical way to describe that kind of fault tolerance protection level is to say that OneFS uses Reed-Solomon forward error correction algorithms. The system can continuously reallocate data and make storage space more usable and efficient. Isilon calls this ability FlexProtect. Unlike scale-up NAS, in scale-out NAS there is no single master node or device that controls the cluster. Each node is a peer that shares the management workload and acts independently as a point of access for incoming data requests. That way, you don’t get bottlenecks when there are many simultaneous requests. Thus, there is a copy of OneFS on every node in the cluster. This approach prevents downtime, since every node can take over activities for another node that happens to go offline. Internally in the cluster, OneFS coordinates all the nodes on the back end, across an InfiniBand switch.
The OneFS architecture is designed to optimize processes and applications across the cluster. Usually one node doesn’t do a service or function unless it gets more nodes to help. The shared infrastructure permits access to resources on any node in the cluster from any other node in the cluster, and you get the performance benefits of parallel processing. The results are improved utilization of cluster resources for compute power, disk, memory and networking. Because all the nodes work together, the more nodes you add, the more powerful the cluster gets.
OneFS supports access to the same file using different protocols and authentication methods at the same time. SMB clients that are authenticating using AD, and NFS clients that are authenticating using LDAP, can access the same file with their appropriate permissions applied. The permissions activities are seamless to the client. To enable multiprotocol file access, Isilon translates Windows Security Identifiers, or SIDs, and UNIX User Identifiers, or UIDs, into a common identity format. OneFS stores these identities on the cluster, tracking all the user IDs from the various authentication sources. OneFS also stores the appropriate permissions for each identity or group. We call this common identity format stored on the cluster the “on-disk representation” of users and groups. So, for instance, the SMB protocol exclusively uses SIDs (that is, Security Identifiers) for authorization data. If a user needs to retrieve a file for a Windows client, as OneFS starts to retrieve the file, it converts the on-disk identity into a Windows-friendly SID and checks permissions. Or, if the user is saving a file, OneFS would do the same kind of translation, from the on-disk representation to SIDs, before saving. This works the same way on the UNIX side, only using UIDs and GIDs instead of SIDs. And that’s how all users can access OneFS files in our mixed-platform client environment.
Authentication services offer a layer of security by verifying users’ credentials before allowing them to access and modify files. Authentication answers the question, “Are you really who you say you are?” OneFS supports four methods for authenticating users: Active Directory, LDAP, NIS, and Local (or file provider) accounts on the cluster. You are likely already familiar with Active Directory.
While Active Directory can serve many functions, the primary reason for joining the cluster to an AD domain is to let the AD domain controller perform user and group authentication. Each node in the cluster shares the same Active Directory computer account, making it easy to administer and manage. You probably know LDAP, too. A primary advantage of LDAP is the open nature of its directory services and the ability to use LDAP across many platforms. OneFS can use LDAP to authenticate user and group access to the cluster. Network Information Service (NIS) is Sun Microsystems’ directory access protocol. By the way, NIS differs from NIS+, which the Isilon cluster does not support. Local / File Provider: Isilon supports local user and group authentication using the web administration interface. If you want to enable multiple authentication methods to the cluster, be sure you test how they interact. You have to work methodically, or you can get confused about what is authenticating whom.
Each and every node contains a copy of the OneFS operating system and the cluster’s configuration files. To execute its functions, OneFS creates automated policies. Managing by automated policies makes processes repeatable, which decreases the time you spend manually managing the cluster. You can change any policy as required. If you change a configuration, the configuration updates on every node. The cluster executes policies as one cohesive system. Policies drive every process in OneFS. That includes the way data is distributed across the cluster (and on each node); how client connections get distributed among the nodes; and when and how maintenance tasks execute. Policies are kind of a big deal in OneFS, because they enable so many automated activities.
You have three options for managing the cluster.
You can use the web administration interface, the CLI, or the Platform Application Programming Interface, or PAPI. The web administration interface is pretty robust, but if you’re willing to dive into the CLI, you can do a bit more advanced configuration. Some management functionality is only available in the web administration interface. Conversely, sometimes the CLI offers a feature that’s not available in the web administration interface.
The third way to access the cluster is through the application programming interface, or API. The API enables customer and third-party applications to programmatically execute management commands and directly access data on the cluster. The APIs are separated into two types of operations: cluster management and access to data. Cluster management is performed using the platform API, or PAPI. A limited number of OneFS commands are currently PAPI enabled. The other API is RESTful access to the namespace, or RAN. An API that adheres to the principles of REST does not require the client to know anything about the structure of the API. Rather, the server provides whatever information the client needs to interact with the service. Our internal developers really like accessing the cluster using RAN; it’s one of the reasons the API accesses the cluster using good ol’ HTTP.
There are three different types of support available. You can manually upload log files to the EMC Isilon support FTP site as needed. The log files provide detailed information about the cluster activities if an issue arises and a client needs technical support. EMC’s support personnel request these files at the beginning of a support call. The second option, called SupportIQ, is integrated into OneFS. The cluster automatically generates and uploads log files to the EMC Isilon support site, on a schedule.
This saves clients time and provides multiple log files to EMC, which is very useful if a technical issue has changed over time. System alerts called events are sent to EMC Isilon Support as part of the service. Isilon provides some proactive support based on events, and Isilon tech support may reach out to clients when they see that a cluster is filled to 90% of its capacity, for example. If they choose to, clients can grant permission for Isilon support to remotely access the cluster through a secure connection. For the third option, OneFS can use the EMC Secure Remote Support service (ESRS). This is similar to SupportIQ and uses the same secure remote administration service. Data centers that already have EMC devices under remote support can choose this option to keep their support unified.
In this lesson, we looked more deeply at what OneFS is, discussed the management options and authentication, understood how to provide multiprotocol support, and identified options for getting remote technical support.
This module covered Isilon general functionality, including an overview of OneFS, the management interface, authentication, multiprotocol support, and three remote support options.
This module goes into more detail about how to manage the data on Isilon clusters, and it outlines various ways OneFS assures data security, integrity, and availability.
After completing this lesson, you should be able to understand connection management; explain data distribution, I/O optimization, and data protection; configure management roles; manage the cluster’s capacity; identify data visibility and analytics; and examine deduplication.
Clients can connect to a cluster in various ways. If you want to keep it simple, OneFS lets you give the cluster a virtual host name and clients can access the cluster using DNS. If you have clients that must connect to specific interfaces into the cluster, you can assign static IP addresses to the clients, and you can assign static IP addresses to nodes in the cluster. You can also use both DNS and direct access to static IP addresses, depending on the workflows. Connectivity is based on standard networking and DNS principles. You can assign multiple subnets to the cluster. Isilon refers to these as SmartConnect zones. Each zone is assigned a group of external interfaces from a set of nodes. Using the virtual host name, clients are assigned an IP address to connect to when they access the cluster. The standard distribution policy uses round-robin, assigning clients to the next available node interface IP address.
Other enhanced functionality includes additional policies for distributing connections in addition to the standard round-robin style. With the enhanced license, you can have clients connect to a given node based on criteria you define. For example, you can have clients connect to the IP address with the lowest number of connections at the moment. Or, you can direct client connections to the interface currently showing the least throughput, or the least CPU usage. The enhanced functionality includes continuous availability for SMB, NFSv3, and NFSv4. This feature allows SMB, NFSv3, and NFSv4 clients to dynamically move to another node in the event the node they are connected to goes down. This feature applies to Microsoft Windows 8, Windows 10 and Windows Server 2012 R2 clients. This feature is part of Isilon's non-disruptive operation initiative to give customers more options for continuous work and less down time.
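The connection-distribution policies described above (standard round-robin, plus the licensed options based on connection count, throughput, or CPU usage) can be sketched as a small selection routine. This is a hypothetical model, not SmartConnect's implementation or configuration syntax: the `Interface` class, the policy labels, and the sample addresses (taken from the 192.0.2.0/24 documentation range) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Interface:
    ip: str                 # external IP address of a node interface
    connections: int        # current client connection count
    throughput_mbps: float  # current network throughput
    cpu_percent: float      # current CPU usage of the owning node


def round_robin(interfaces):
    """Standard policy: hand out interfaces in turn, wrapping around forever."""
    return cycle(interfaces)


def pick(interfaces, policy):
    """Enhanced policies: choose the interface with the lowest chosen metric."""
    metric = {
        "connection_count": lambda i: i.connections,
        "throughput": lambda i: i.throughput_mbps,
        "cpu": lambda i: i.cpu_percent,
    }[policy]
    return min(interfaces, key=metric)


pool = [
    Interface("192.0.2.10", connections=12, throughput_mbps=300.0, cpu_percent=40.0),
    Interface("192.0.2.11", connections=4, throughput_mbps=900.0, cpu_percent=75.0),
    Interface("192.0.2.12", connections=9, throughput_mbps=120.0, cpu_percent=20.0),
]

rr = round_robin(pool)
first_four = [next(rr).ip for _ in range(4)]   # wraps back to the first address
least_busy = pick(pool, "connection_count")    # fewest current connections
```

Round-robin ignores load entirely, which is why the metric-based policies can give a different answer for the same pool: here the interface with the fewest connections is not the one with the least throughput or CPU usage.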
The CA option allows seamless movement from one node to another with no manual intervention on the client side. This enables a continuous workflow from the client side with no apparent disruption to their working time. CA supports home directory workflows as well.
Let’s start with data distribution, meaning how OneFS spreads data across the cluster. You can have various models of Isilon nodes, or “node types,” in a cluster. Nodes are assigned to “node pools” based on model type. A single cluster can have multiple node pools, and groups of node pools can be combined to form tiers of storage. The data target can be a tier or, if SmartPools is licensed, a specific individual node pool. Several policies determine how the data distribution occurs. The default policy is for data to write anywhere in the cluster. Data is distributed among the different node pools based on which has the highest percentage of available space.
You can optimize data input/output (I/O) to match the workflows in the environment. Optimization can be managed cluster-wide, which is the default, and at the level of individual directories and even individual files. The data access pattern can be optimized for random access, sequential access or concurrent access. Pre-fetch is a guess at what you’ll need before you ask for it. When clients open larger files, especially streaming formats like video and audio, the cluster assumes that you will generally watch minute 4 of that video after minute 3. So it proactively loads minutes 4, 5, and maybe even 6 into memory ahead of when your computer requests them. Then delivering those minutes will be faster than if the cluster had to go to the hard drive repeatedly upon each request.
That’s the concept behind “pre-fetch.” With OneFS, you can configure the pre-fetch cache characteristics to work best with the selected access pattern.
Data protection refers to how many components in a cluster can malfunction without loss of data. Data protection on OneFS is manageable and flexible. The system is enabled by default for virtual hot spares (VHS). VHS enables you to allocate disk space to hold the data to be rebuilt when a disk drive fails. Your earlier training acquainted you with Forward Error Correction (FEC). RAID works at the disk level: once you’ve chosen a RAID type, that whole RAID volume can only be that type of RAID, and if you want to change the RAID type, you’d have to move all the data off the RAID disks before you can reformat. But since Isilon uses FEC for protection, you can set the protection level differently by tier, node pool, directory, and even individual file. Extra protection creates extra overhead because OneFS writes more parity bits. You decide how to trade off extra capacity (meaning, less protection) against greater redundancy (meaning, less capacity). So based on the value of the data, you can adjust the protection level. For example, an R&D department has a node pool dedicated to testing. Test data is not true production data, so they’ve set minimal +1 protection. But their customer database is the company’s most valuable asset. Customer data hits a different node pool set to +4 protection. Protection is flexible with OneFS. They could even mirror files, keeping up to eight mirrors of each file. This is not space efficient, but for very frequently read files it can really speed things up. The standard functionality is called unlicensed SmartPools or, sometimes, SmartPools Basic. If you license SmartPools, you get enhanced capabilities.
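To put rough numbers on the capacity-versus-redundancy trade-off described above, here is a short Python sketch. It uses a deliberately simplified model in which a stripe holds a fixed number of data units plus a number of parity (FEC) units; the stripe width of 8 data units is an assumption for illustration only, since the actual OneFS stripe layout depends on the cluster and is not covered here. The function names are hypothetical.

```python
def fec_overhead(data_units: int, parity_units: int) -> float:
    """Fraction of raw capacity spent on parity in one stripe (simplified model)."""
    return parity_units / (data_units + parity_units)


def mirror_overhead(copies: int) -> float:
    """Fraction of raw capacity spent on redundant copies when a file is mirrored."""
    return (copies - 1) / copies


# Assumed stripe width of 8 data units, purely for illustration.
test_pool_overhead = fec_overhead(data_units=8, parity_units=1)      # +1 protection
customer_pool_overhead = fec_overhead(data_units=8, parity_units=4)  # +4 protection
eight_mirror_overhead = mirror_overhead(copies=8)                    # eight mirrors

for label, value in [
    ("+1 protection", test_pool_overhead),
    ("+4 protection", customer_pool_overhead),
    ("8x mirroring", eight_mirror_overhead),
]:
    print(f"{label}: {value:.1%} of raw capacity used for protection")
```

Under these assumptions, moving from +1 to +4 protection triples the share of capacity spent on parity, and eight-way mirroring spends most of the raw capacity on copies, which matches the point that mirroring is not space efficient.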
In addition to the built-in root and admin roles, OneFS provides the capability for role-based access control (RBAC). RBAC means you can define privileges for an administrator to customize management access. The privileges apply to the web administration interface and the CLI, and a smaller set of privileges applies to PAPI management. There are four built-in roles, or you can create custom roles to fit your needs or your business model. A user can be assigned to more than one role at a time.
One of the ways you can subdivide capacity is by assigning storage quotas to users or groups. You manage the quotas by policy. Quotas can be set by user, by group, or by directory or path. You can also nest quotas, that is, apply quotas within quotas. For example, you can place a quota on a whole department, then a smaller quota on each user within that department, and even a different quota on a file share they all use, and yet another on the sub-directories of that file share. All these are flexible and can be applied or modified on the fly. Quotas let you implement thin provisioning. For example, the day you tell a group “You can use up to one terabyte of storage!” that group won’t instantly fill up the full terabyte. They may NEVER fill it. But with quota-based thin provisioning, you can keep showing the group an available terabyte of storage, even if you don’t have a full terabyte actually available on the cluster currently. OneFS has three primary types of quotas: accounting or advisory quotas, plus two levels of enforcement quotas, soft limit and hard limit. Advisory quotas are informational only. If a user exceeds their advisory storage quota, OneFS lets them, but the cluster provides a comparison between the quota allocation and actual usage. In contrast, if a user exceeds a soft limit quota, the system notifies the user with a warning email.
If the user exceeds a hard limit quota, the system will from then on deny the user the ability to write. It also notifies the user that the quota has been violated. You can customize the quota notifications in OneFS so that they meet your requirements. Quotas are enhanced functionality that requires licensing. To get the feature, you must purchase a SmartQuotas license for each node in the cluster.
Another enhanced function provides advanced data visibility and analytics. InsightIQ is a powerful tool that monitors one or more clusters, then presents data visually in a robust graphical interface with reports you can export. You can customize the reports. You can drill down into the information and break out specific information as desired, and even take advantage of usage growth and prediction features. The tool monitors many aspects of system performance, such as CPU utilization and interface throughput. The tool also reports on file system analytics, including quota usage, files per user, files per directory, average file size, and more, and lets you export the analytics if you want them. An external VMware system or standalone Linux server is required for this enhanced functionality. The separate server runs external to the cluster and collects data from the cluster at scheduled intervals. To enable these capabilities, you get a free InsightIQ license for each cluster.
OneFS can implement deduplication. Deduplication provides an automated way to increase storage efficiency. OneFS achieves deduplication by analyzing stored data to find duplicate sets of data blocks, then storing only a single copy of any data block that is duplicated. Deduplication runs as a post-process job; in other words, it runs on data already stored on the cluster. Deduplication works at the 8K block level on files over 32K in size.
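As a rough picture of block-level, post-process deduplication, the following Python sketch splits already-stored files into 8K blocks, skips files that are not over 32K, and keeps one copy of each distinct block. It is a minimal sketch under stated assumptions: using SHA-256 fingerprints to detect identical blocks, the in-memory `store` and `layout` structures, and the function names are all illustrative choices, not a description of how OneFS finds or tracks duplicates.

```python
import hashlib

BLOCK_SIZE = 8 * 1024      # deduplication works at the 8K block level
MIN_FILE_SIZE = 32 * 1024  # only files over 32K are considered


def dedupe(files):
    """Post-process pass: keep a single stored copy of each distinct block.

    `files` maps a file name to its bytes. Returns (store, layout), where
    `store` maps a block fingerprint to the block and `layout` maps each
    deduplicated file to its ordered list of fingerprints.
    """
    store = {}
    layout = {}
    for name, data in files.items():
        if len(data) <= MIN_FILE_SIZE:
            continue  # too small to be considered in this sketch
        fingerprints = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()  # assumed matching method
            store.setdefault(fingerprint, block)  # first copy wins; repeats are not stored
            fingerprints.append(fingerprint)
        layout[name] = fingerprints
    return store, layout


def rebuild(store, fingerprints):
    """Reassemble a file from the shared block store."""
    return b"".join(store[fp] for fp in fingerprints)


def block(byte):
    """Helper that builds one 8K block filled with a single byte value."""
    return byte * BLOCK_SIZE


files = {
    "a.bin": block(b"A") + block(b"B") + block(b"A") + block(b"A") + block(b"C"),
    "b.bin": block(b"A") + block(b"B") + block(b"A") + block(b"A") + block(b"C"),
    "small.bin": block(b"A") * 2,  # 16K, below the size threshold
}

store, layout = dedupe(files)
logical_blocks = sum(len(fps) for fps in layout.values())
print(f"{logical_blocks} logical blocks stored as {len(store)} unique blocks")
```

The same counting approach, run without actually discarding anything, is one way to estimate potential savings before committing to a real deduplication pass.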
You can run deduplication jobs against a specific directory path or on the entire directory structure. OneFS also provides a dry-run deduplication assessment tool as standard functionality. This allows you to test drive deduplication to see how much capacity you would save if you ran the actual deduplication process. Enabling full deduplication functionality requires a SmartDedupe license for each node in the cluster.
Having completed this lesson, you can now understand connection management, explain data distribution, I/O optimization and data protection, configure management roles, manage the cluster’s capacity, identify data visibility and analytics, and examine deduplication.
After completing this lesson, you should be able to define data integrity, understand data resiliency, and explain data recovery and data retention.
We’ve already discussed data protection and that OneFS uses the Reed-Solomon algorithm for forward error correction, or FEC, instead of RAID. In earlier training, you saw how Isilon breaks data into stripes and spreads it across nodes. Each stripe is protected separately with FEC blocks, or parity. Stripes are spread across the nodes and not contained in a single node. Only 1 or 2 data or protection stripe units are contained on a single node for any given data stripe. Protecting at this granular level allows you to vary your protection levels and set them separately for node pools, directories, or even individual files. What’s the point of all this? Well, in most popular SANs and in typical scale-up NAS, you have a pair of heads so that one can back up the other, and that’s what provides high availability.
With OneFS, you could say that high availability is baked right into every data transaction, because the data is spread onto many drives and multiple nodes, all of them ready to pitch in and help reassemble the data if a component fails. This approach creates an amazingly resilient platform.
OneFS protects all metadata by mirroring it. Up to 8 copies of the metadata are maintained for any file, depending upon the file’s protection level. OneFS also allows directories to be protected one level higher than the data. This creates additional metadata copies and ensures the data is always accessible. In addition to FEC protection, OneFS checks the block and file integrity by implementing a cyclic redundancy check, or CRC. CRCs are often referred to as checksums. Checksums run at each stage of the storage process, ensuring each block’s integrity. Since OneFS checks all the data as it is being written or read, you don’t have to do a show-stopping, entire-system integrity check, as with some rival platforms. All of the FEC protection, metadata, and CRC data integrity functionality comes standard in OneFS.
OneFS supports the Internet Content Adaptation Protocol (ICAP) for integration with major anti-virus provider applications. ICAP is an HTTP-like protocol used to manage the off-cluster anti-virus scan engines. The ICAP vendors that OneFS supports include Symantec, McAfee, Trend Micro and Kaspersky. OneFS supports different types of anti-virus scans. You can scan files when they are accessed, when a file is opened for reading or modifying, and when a file is closed or saved. OneFS also supports scheduled scans, based on policies. The policies let you refine how and when scans occur.
OneFS also empowers you to decide how to treat a threat once it’s detected. You can set policies to merely record an event, attempt to repair the file, quarantine the file, or truncate the file. You can also configure OneFS to send you an alert when a threat is detected, or if issues occur with an ICAP server. OneFS logs its scans, and you can pull reports from those logs. ICAP support is a standard OneFS functionality.
Data resiliency refers to the ability to recover past versions of a file that has changed over time. Sooner or later, every storage admin gets asked to roll back to a previous “known good” version of a file. OneFS provides this capability using snapshots. Snapshots capture the changed blocks and metadata information for the file. OneFS uses the copy-on-write snapshot methodology. This approach keeps the live version of data intact while storing differences in a snapshot. Because the system is only writing changes, the writes are very fast. Snapshot policies are used to determine the snapshot schedule, the path to the snapshot location, and snapshot retention periods. Snapshot deletions happen as part of a scheduled job, or you can delete them manually. Yes, you can delete them out of chronological order. Some OneFS system processes use snapshots internally. No license is required for system-based snapshot usage. However, using snapshots for data resiliency requires a SnapshotIQ license for each node in the cluster.
When we think of data recovery, first we think of data backup. Isilon supports Network Data Management Protocol (NDMP) for integration with backup applications provided by major manufacturers such as Symantec, EMC, CommVault, and IBM. A backup application external to the cluster manages the backup process.
You can set this up in one of two ways: send cluster data over your LAN to the backup device, or send data directly from the cluster to the backup device using Isilon backup accelerators. Depending upon the amount of data and the interfaces selected on the external network, backing up across a network might not be as efficient as using the backup accelerator. The backup accelerator provides access to the data across the fast InfiniBand internal network and delivers it to the backup device over Fibre Channel ports. This NDMP support comes standard with OneFS.
While NDMP backup comes standard with OneFS, replication is an enhanced data recovery option. Replication keeps a copy of one cluster’s data on another cluster. OneFS performs replication during normal operations, from one Isilon cluster to another. Replication can occur over a LAN or over a WAN. Replication may be from one to one, or from one to many Isilon clusters. OneFS supports two types of replication: copy and synchronization. With copy, any new files on the source are copied over to the target, while files that have been deleted on the source remain unchanged on the target. With synchronization, both the source and target clusters maintain identical file sets, except that files on the target are read-only. When synchronization occurs, the changed data blocks and the associated file metadata are sent to the target. Synchronization only works in one direction, from source to target.
Replication is policy-based and runs as a synchronization job. You can set replication policies to run synchronization jobs whenever you want, or to replicate automatically if the source data changes. The policies can be set up per directory, and for specific data types. You can set up exceptions to include or exclude specific files.
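The difference between the copy and synchronization replication types can be shown with a toy Python model in which each cluster is just a dictionary of path to file contents. This is a sketch of the semantics only: real replication transfers changed blocks and metadata between clusters, and the function names, paths, and contents below are hypothetical.

```python
def replicate_copy(source, target):
    """Copy: files on the source are written to the target; nothing is removed there."""
    result = dict(target)
    result.update(source)
    return result


def replicate_sync(source, target):
    """Synchronization: the target ends up with exactly the source's file set.

    The flow is one-way (source to target), so the old target state is simply replaced.
    """
    return dict(source)


# Target as it looked after an earlier replication run.
target_before = {"/data/report.txt": "v1", "/data/old.txt": "v1"}

# Since then, the source deleted old.txt, changed report.txt, and added new.txt.
source_now = {"/data/report.txt": "v2", "/data/new.txt": "v1"}

after_copy = replicate_copy(source_now, target_before)
after_sync = replicate_sync(source_now, target_before)

print("copy keeps deleted file:", "/data/old.txt" in after_copy)
print("sync keeps deleted file:", "/data/old.txt" in after_sync)
```

In this model, copy behaves like an accumulating archive on the target, while synchronization behaves like a mirror of the source's current state.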
OneFS also empowers you to limit the bandwidth used for replication, in order to optimize the traffic for more important workflows. Replication requires a license with OneFS. The license is called SyncIQ.
Data retention is the ability to prevent data from being deleted or modified before some future date. In OneFS, you can configure data retention at the directory level, so different directories can have different retention policies. You can also use policies to automatically commit certain types of files for retention. OneFS offers two types of data retention: Enterprise and Compliance. Enterprise is more flexible than Compliance, and meets most companies’ retention requirements. It can allow privileged deletes by an administrator. The Compliance level of retention is even more secure, designed to meet SEC regulatory requirements. In Compliance mode, once data is committed to disk, no one can change or delete the data until the retention clock expires. A common hacker ploy for beating retention safeguards is to temporarily change the system clock to some date way in the future, thus releasing all files. Compliance mode defeats this approach by relying upon a specialized clock that prohibits clock changes. You can still use SyncIQ to replicate the files that have retention policies applied. Retention in OneFS is an enhanced function and requires a license called SmartLock.
Having completed this lesson, you should be able to define data integrity, understand data resiliency, and explain data recovery and data retention.
This module went into more detail about how to manage the data on Isilon clusters, and outlined various ways OneFS assures data security, integrity, and availability.
This concludes the Isilon Fundamentals training course. Thank you for your participation.