
Cisco UCS Integrated Infrastructure for Big Data with MapR


Highlights

Ease of deployment
• Cisco UCS® Manager automates deployment and scaling, reducing risk of configuration errors that can cause downtime.

Scalability for big data workloads
• The Cisco UCS Integrated Infrastructure for Big Data solution offers linear scalability and simplification of essential operations for single-rack and multiple-rack deployments.

Comprehensive integrated infrastructure
• Cisco UCS Integrated Infrastructure for Big Data solutions include computing, storage, connectivity, and unified management.

Simplified management
• Cisco UCS Director Express for Big Data offers one-click provisioning, installation, and configuration.

Multi-tenancy with MapR
• The MapR Distribution including Apache Hadoop offers multi-tenancy with no need for additional setup. It supports logical partitions in a physical cluster for separate administrative control, data placement, and job processing.

Simplified management through MapR Control System (MCS)
• MCS gives Hadoop administrators a single place for configuring, monitoring, and managing their clusters. Two major features exposed by MCS, heatmaps and job metrics, dramatically simplify administration of a cluster.

Cisco and MapR Deliver Performance and Multi-tenancy to Help Tame Big Data

Big data provides an enormous wealth of information to your organization. But to gain the most benefit, you need to manage it efficiently. And you must make sure that all this data is separated and isolated so that each set of users can see and work on only the data that they are authorized to use.

Challenges of Multi-tenancy for Big Data

Organizations seek to share IT resources cost efficiently and securely among multiple applications, data, and user groups. Platforms that support this architecture are commonly known as multitenant technologies. Multi-tenancy is the capability of a single instance of software to serve multiple tenants.
A tenant is a group of users that have the same view of the system. Hadoop is an enterprise data hub, and it demands multi-tenancy. Big data platforms are increasingly expected to support multi-tenancy by default. Multi-tenancy requires isolation of the distinct tenants, covering both the data in the data platform and the computing resources. To support multi-tenancy, solutions need to:
• Help ensure that service-level agreements (SLAs) are met
• Help guarantee data and compute isolation
• Enforce quotas
• Establish security and delegation
• Help ensure low-cost operations and simpler manageability

The Solution: Cisco UCS Integrated Infrastructure for Big Data with MapR

The Cisco UCS® Integrated Infrastructure for Big Data solution includes computing, storage, connectivity, and unified management capabilities to help companies manage the dramatically increasing volume of data that they must cope with today. It is built on Cisco Unified Computing System™ (Cisco UCS) infrastructure using Cisco UCS 6200 Series Fabric Interconnects, optional Cisco Nexus® 2200 platform fabric extenders, and Cisco UCS C-Series Rack Servers. Installed in pairs, the fabric interconnects offer redundant, active-active connectivity and embedded management using Cisco UCS Manager.

© 2015 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public Information.

MapR is a complete distribution for Apache Hadoop that packages more than a dozen projects from the Hadoop ecosystem to provide you with a broad set of big data capabilities. The MapR platform provides enterprise-class features such as high availability, disaster recovery, security, and full data protection. It also allows Hadoop to be easily accessed as traditional network-attached storage (NAS) with read-write capabilities and multi-tenancy. The MapR Distribution offers multi-tenancy from the start.
It provides powerful features to logically partition a physical cluster to provide separate administrative control, data placement, job processing, user quotas, and network access. Volumes, a feature unique to MapR, are the foundation of multi-tenancy. Volumes provide a way to organize data and apply different policies to different data sets, applications, users, and groups. A single cluster can have many volumes: up to hundreds of thousands.

Together, Cisco and MapR provide enterprises with transparent, simplified data integration as well as management integration with an enterprise application ecosystem. They work together to provide a uniquely capable, industry-leading architectural platform for Hadoop-based applications.

Cisco UCS Solution for MapR

The Cisco UCS solution for MapR is based on Cisco UCS Integrated Infrastructure for Big Data, a highly scalable architecture that includes computing, storage, connectivity, and unified management capabilities and is designed to meet a variety of scale-out application demands. It achieves this with transparent data integration and management integration capabilities built using the components described here and shown in Figure 1.

Cisco UCS 6200 Series Fabric Interconnects

Fabric interconnects establish a single point of connectivity and management for the entire system. They provide high-bandwidth, low-latency connectivity for servers, with integrated, unified management for all connected devices provided by Cisco UCS Manager. Deployed in redundant pairs, the interconnects offer the full active-active redundancy, performance, and exceptional scalability needed to support the large number of nodes that are typical in clusters serving big data applications. Cisco UCS Manager enables rapid and consistent server configuration using service profiles, automating ongoing system maintenance activities such as firmware updates across the entire cluster as a single operation.
It also offers advanced monitoring with options to raise alarms and send notifications about the health of the entire cluster.

Cisco UCS C240 M4 Rack Server

The rack server supports a wide range of computing, I/O, and storage-capacity demands in a compact design. The server is based on the Intel® Xeon® E5 v3 family of processors and supports 12-Gbps SAS throughput, delivering significant performance and efficiency gains over the previous generation of servers. The server uses dual Intel Xeon processor E5-2600 v3 series CPUs and supports up to 768 GB of main memory (128 or 256 GB is typical for big data applications) and a range of disk drive and SSD options. Twenty-four small-form-factor (SFF) disk drives are supported in the performance-optimized option, and 12 large-form-factor (LFF) disk drives are supported in the capacity-optimized option, along with two 1 Gigabit Ethernet embedded LAN-on-motherboard (LOM) ports. The Cisco UCS Virtual Interface Card (VIC) 1227 is designed for the M4 generation of Cisco UCS C-Series Rack Servers. The VIC is optimized for high-bandwidth and low-latency cluster connectivity, with support for up to 256 virtual devices that are configured on demand through Cisco UCS Manager.

Figure 1. Cisco UCS Integrated Infrastructure for Big Data: A 64-Node Cluster (figure labels: 2 x Cisco UCS 6296 Fabric Interconnects; 16 x Cisco UCS C240 M4 Servers)

MapR Distribution Including Apache Hadoop: Complete Hadoop Platform

As one of the technology leaders in Hadoop, MapR provides an enterprise-class Hadoop solution that can be quickly developed and easily administered. With significant investment in critical technologies, MapR offers a comprehensive Hadoop platform fully optimized for performance and scalability. The MapR Distribution includes over 20 tested and validated Hadoop software modules on an advanced data platform, offering exceptional ease of use, reliability, and performance for Hadoop deployments (see Figure 2).

Figure 2. Apache Hadoop and Operations Support System Ecosystem. The figure shows the MapR Control System (management) above the ecosystem projects, grouped as batch, SQL, NoSQL and search, streaming, ML and graph, data integration and access, security, workflow, provisioning and coordination, and data governance and operations. Projects shown include MapReduce v1 and v2, YARN, Tez, Spark, Spark SQL, Spark Streaming, GraphX, MLlib, Drill, Impala, Hive, Pig, Cascading, Mahout, HBase, Solr, Storm, Flume, Sqoop, HttpFS, Hue, Oozie, and ZooKeeper. These run on the MapR Data Platform, which comprises the MapR File System (MapR-FS) and MapR-DB.

The benefits of the MapR Distribution include:
• Performance: Ultra-fast throughput
• Scalability: Up to a trillion files, with no restrictions on the number of nodes in a cluster
• Standards-based APIs and tools: Standard Hadoop APIs, including Open Database Connectivity (ODBC), Java Database Connectivity (JDBC), Lightweight Directory Access Protocol (LDAP), and Linux Pluggable Authentication Modules (PAM)
• MapR Direct Access Network File System (NFS): Random read-write high-speed operations, real-time data flows, and transparent support for existing non-Java applications
• Manageability: Advanced management console, rolling upgrades, and support for Representational State Transfer (REST) API
• Integrated security: Kerberos and non-Kerberos options with wire-level encryption
• Advanced multi-tenancy: Volumes, data placement control, job placement control, and queues
• Consistent snapshots: Full data protection with point-in-time recovery
• High availability: Ubiquitous high availability with no-NameNode architecture, YARN high availability, and NFS high availability
• Disaster recovery: Cross-site replication with mirroring
• MapR-DB: Integrated enterprise-class NoSQL database

Main Benefits of Multi-tenancy in MapR with UCS

Volumes (unique to MapR) form the foundation of multi-tenancy as offered by MapR. In a typical deployment, the data for each user, group, application, or business unit is placed in a single volume so that it can be managed separately from the data of other users, groups, applications, and business units. Other Hadoop distributions do not support volumes, so policies can be defined only at the file or directory level (too fine-grained) or at the cluster level (too coarse). As a workaround, organizations using other Hadoop distributions create separate physical clusters for each tenant, which adds architectural complexity and thus a higher risk of errors and failure.

Multi-tenancy in MapR also has significant total cost of ownership (TCO) advantages. It allows organizations to use a single cluster for multiple use cases rather than having to maintain a large number of isolated clusters. This approach reduces overall administrative overhead. It also enables the higher efficiency of a common resource pool.

Here are some of the unique features of multi-tenancy in Cisco UCS Integrated Infrastructure for Big Data with MapR:
• Data placement control: MapR provides the ability to restrict a volume to a subset of a cluster’s nodes. This feature allows administrators to isolate sensitive data and applications and to use heterogeneous hardware. For example, data placement control can be used to keep specific data on separate nodes with different configurations, or to keep Apache Spark data on nodes that have SSDs.
It can also be used for more advanced storage tiering policies, such as keeping old data on nodes that have a higher storage capacity and less computing power (such as Cisco UCS C3160 servers), and hence a lower cost per terabyte (TB) of storage. In combination with the MapR warden pluggable services, data placement control also enables administrators to designate specific nodes for a given application or service, such as Spark, effectively creating a minicluster within the larger cluster to help guarantee SLAs and resource availability.
• Job placement control: MapR provides the ability to restrict a specific job, or jobs from a specific user or group, to a subset of the nodes in the cluster. This feature enables administrators to help guarantee SLAs for specific applications and to create separation between different applications or business units. This feature also allows administrators to designate a small subset of the nodes for low-priority jobs or jobs that require access to external systems through the corporate firewall.
• Access control and security: MapR provides fine-grained, role-based access control (RBAC) with access control expressions (ACEs) for tables, column families, and columns in MapR-DB; Unix permissions for files; and field-level access control through Apache Drill views. MapR also provides cryptographically secure wire-level authentication and encryption. Organizations that have a Kerberos infrastructure can use it for authentication. Organizations that do not have a Kerberos infrastructure can use an integrated and simpler scheme that provides the same security without the complexity associated with Kerberos deployment and management. This scheme uses Linux Pluggable Authentication Modules (PAM) to enable integration with any PAM-supported registry.
• Administration and reporting: MapR allows organizations to define and enforce storage, CPU, and memory quotas at the volume, user, and group levels. To help enable service providers to provide accurate usage and billing information, MapR offers resource usage reports encompassing more than 60 different metrics. These metrics are available through the MCS browser-based user interface and, for upstream integration, through the command-line interface (CLI) and the REST API.

Reference Architecture

The current version of the Cisco UCS Integrated Infrastructure for Big Data offers the configurations listed in Table 1. The configuration used depends on the computing and storage requirements of the Hadoop workload.

For More Information

For more information about Cisco UCS big data solutions, please visit http://www.cisco.com/go/bigdata_design.
For more information about Cisco UCS Integrated Infrastructure for Big Data, please visit http://blogs.cisco.com/datacenter/cpav3/.
For more information about MapR, please visit www.MapR.com.
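
As a rough sizing aid for the configurations listed in Table 1, the following Python sketch computes the raw disk capacity of one 16-server rack and checks a rack count against the per-domain server limits. The drive counts, drive sizes, and limits (80 servers per domain, or 160 with fabric extenders) are taken from Table 1. The names RackConfig, racks_needed, and fits_in_one_domain are hypothetical helpers introduced only for illustration, and the results are raw capacity before any replication or file system overhead.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RackConfig:
    """One rack of a Table 1 configuration (illustrative model only)."""
    name: str
    servers: int            # servers per rack
    drives_per_server: int  # data drives per server (boot SSDs excluded)
    drive_gb: int           # capacity of one data drive, in decimal GB

    def raw_tb(self) -> float:
        """Raw data-drive capacity of the whole rack, in decimal TB."""
        return self.servers * self.drives_per_server * self.drive_gb / 1000


# Values from Table 1: 24 x 1.2-TB SFF drives, or 12 x 4-TB LFF drives.
PERFORMANCE = RackConfig("performance optimized", 16, 24, 1200)
CAPACITY = RackConfig("capacity optimized", 16, 12, 4000)


def racks_needed(target_raw_tb: float, rack: RackConfig) -> int:
    """Smallest number of racks whose raw capacity covers the target."""
    return math.ceil(target_raw_tb / rack.raw_tb())


def fits_in_one_domain(racks: int, rack: RackConfig,
                       with_fabric_extenders: bool) -> bool:
    """Check the server count against the per-domain limits in Table 1."""
    limit = 160 if with_fabric_extenders else 80
    return racks * rack.servers <= limit
```

For example, racks_needed(1000, CAPACITY) returns 2, because one capacity-optimized rack provides 768 TB of raw capacity, and fits_in_one_domain(6, CAPACITY, with_fabric_extenders=False) returns False, because six racks hold 96 servers.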
Table 1: Cisco UCS Integrated Infrastructure for Big Data Configuration Details

Performance Optimized
Connectivity:
• 2 Cisco UCS 6296UP 96-Port Fabric Interconnects
Scaling:
• Up to 80 servers per domain
• Up to 160 servers per domain with Cisco Nexus 2232PP 10GE Fabric Extender
16 Cisco UCS C240 M4 Rack Servers (SFF), each with:
• 2 Intel Xeon processor E5-2680 v3 CPUs
• 256 GB of memory
• Cisco 12-Gbps SAS modular RAID controller with 2-GB flash-based write cache (FBWC)
• 24 x 1.2-TB 10,000-rpm SFF SAS drives (460 TB total)
• 2 x 120-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for bootup
• Cisco UCS VIC 1227 (with 2 x 10 Gigabit Ethernet SFP+ ports)

Capacity Optimized
Connectivity:
• 2 Cisco UCS 6296UP 96-Port Fabric Interconnects
Scaling:
• Up to 80 servers per domain
• Up to 160 servers per domain with Cisco Nexus 2232PP 10GE Fabric Extender
16 Cisco UCS C240 M4 Rack Servers (LFF), each with:
• 2 Intel Xeon processor E5-2620 v3 CPUs
• 128 GB of memory
• Cisco 12-Gbps SAS modular RAID controller with 2-GB FBWC
• 12 x 4-TB 7200-rpm LFF SAS drives (768 TB total)
• 2 x 120-GB 6-Gbps 2.5-inch Enterprise Value SATA SSDs for bootup
• Cisco UCS VIC 1227 (with 2 x 10 Gigabit Ethernet SFP+ ports)

Scale to tens of thousands of servers with Cisco ACI

For more information about the Cisco® SmartPlay program, please visit http://www.cisco.com/go/smartplay.
For more information on the Cisco Validated Design (CVD) for the solution, please visit: http://www.cisco.com/c/dam/en/us/td/docs/unified_computing/ucs/UCS_CVDs/Cisco_UCS_Integrated_Infrastructure_for_Big_Data_with_MapR.pdf.

Americas Headquarters: Cisco Systems, Inc., San Jose, CA
Asia Pacific Headquarters: Cisco Systems (USA) Pte. Ltd., Singapore
Europe Headquarters: Cisco Systems International BV, Amsterdam, The Netherlands

Cisco has more than 200 offices worldwide. Addresses, phone numbers, and fax numbers are listed on the Cisco Website at www.cisco.com/go/offices.
Cisco and the Cisco Logo are trademarks of Cisco Systems, Inc. and/or its affiliates in the U.S. and other countries. A listing of Cisco’s trademarks can be found at www.cisco.com/go/trademarks. Third party trademarks mentioned are the property of their respective owners. The use of the word partner does not imply a partnership relationship between Cisco and any other company. (1005R)