Competency - Media Development Authority

NATIONAL COMPETENCY STANDARD

Framework             : National Infocomm Competency Framework
Competency Category   : Big Data Analytics
Competency Code       : IT-BDA-302S-1
Competency Unit       : Prepare data for big data analytics
Competency Descriptor : The unit defines the competency to ingest and prepare data for big data analytics. It involves reviewing the data requirements, then ingesting and cleansing the data required for analytics projects.
Complexity Level      : 3 (Entrant level)
Credit Value          : 3
Version Number        : 1
Effective Date        : 2013
Review Date           : NA
Developer             : WDA & IDA
Custodian             : WDA & IDA

Copyright © 2013 Singapore Workforce Development Agency & Infocomm Development Authority of Singapore. All rights reserved.

This document is provided for the explicit use and guidance of parties approved by WDA & IDA as an information resource only. Any other use of this document or parts thereof, including reproduction, publication, distribution, transmission, re-transmission or public showing, or storage in a retrieval system in any form, electronic or otherwise, for purposes other than that expressly stated above without the express permission of WDA & IDA is strictly prohibited.

National Competency Standard

Competency Unit Code  : IT-BDA-302S-1
Competency Unit Title : Prepare data for big data analytics
Complexity Level      : 3 (Entrant level)

Relevant Job Roles/Occupations
The job role(s) / occupations that this unit would be relevant to may include:

Specialist (Technical)
• Big Data Analyst

Assumed Skills and Knowledge
Knowledge and skills that the individual should preferably have to confidently undertake the unit and to be successful subsequently on the job.

Learners are assumed to:
1. Possess basic knowledge of big data technology and a programming language
2. Possess knowledge of databases and SQL
3. Possess knowledge of designing extract, transform and load (ETL) processes
4. Be aware of the data governance policies in the organisation

Performance Statements
The critical aspects of job performance, stating the evaluative criterion and expected outcome of tasks.

A competent individual must be able to successfully perform the following:
1. Review the data requirements for the analytics project
2. Ingest data from different data sources into the analytics platform using the tools / programming language
3. Cleanse and transform the data according to the data requirements to support the analytics project
4. Resolve and follow up on any issues arising during data preparation

Underpinning Knowledge
Knowledge that is acquired during the course of training and is essential to support competent performance. May include principles, processes, methods, procedures, legislative/legal requirements, and interactions with others.

A competent individual needs to know and understand:
1. Tools / programming languages for ingesting / transforming / cleansing big data
2. Nature of the data to be prepared and of its data sources
3. Organisation's data collection process
4. The concepts of data quality
5. Data modelling

Range of Application
Types of contexts or circumstances under which competent performance may be demonstrated. It gives further references to specific areas or terms in the Performance Statements.

Analytics project
Analytics project may include:
• Web analytics
• Mobile analytics
• Social media analytics
• Voice & image analytics
• Digital marketing analytics
• Predictive analytics

Analytics platform
Analytics platform may refer to:
• Database (RDBMS)
• Non-relational systems (e.g. Hadoop)
• Sandbox
• Analytical tool

Tools / programming language
Tools / programming language for ingesting / transforming / cleansing big data may include:
• Hadoop (HDFS, MapReduce, Pig, Hive)
• NoSQL / SQL Extensions (e.g. database engines, extension framework)
• Statistical packages (e.g. R, SAS, SPSS)
• Business intelligence (BI) tools
• R-based tools (e.g. RevoScaleR)
• ETL tools (e.g. EMC, Greenplum, Informatica)
• R programming language

Nature of data
Nature of data may include:
• Volume of data
• Volatility of data
• Variety of data
• Confidentiality of data

Data sources
Data sources may include:
• HBase
• NoSQL database
• MapReduce / HDFS
• Relational database
• Data warehouse
• Web logs
• Operational systems

Cleanse and transform
Cleanse and transform the data may refer to:
• Populating blank values
• Normalising the data
• Making data types consistent

Evidence Sources
Types of proof (product, process and knowledge evidence) an individual may produce to demonstrate competent performance.

Product evidence:
• Data loaded onto the analytics platform / analytical tool
• Report documenting the issues identified and follow-up actions taken

Process evidence:
• Demonstrate ability to review the data requirements for the analytics project
• Demonstrate ability to ingest data from different data sources into the analytics platform using the tools / programming language
• Demonstrate ability to cleanse and transform the data according to the data requirements to support the analytics project

Knowledge evidence:
• Written report describing the organisation's data collection process
• Test on the knowledge of the tools / programming languages for ingesting data

Version Control Record

Version   Effective Date   Changes           Author
1.0                        Initial version   WDA & IDA
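
Illustrative note (not part of the standard): the three operations listed under "Cleanse and transform" in the Range of Application can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a prescribed method. The function names, the sample records and the choice of min-max scaling as the meaning of "normalising" are all assumptions made for illustration; the standard does not define them, and a real project would more likely use one of the tools named in the Range of Application.

```python
# Hypothetical helpers illustrating the three "Cleanse and transform" items.
# All names and sample data are assumptions for illustration only.

def populate_blanks(rows, defaults):
    """Populate blank values: replace missing or empty fields with a default."""
    cleaned = []
    for row in rows:
        new_row = dict(row)
        for column, default in defaults.items():
            value = new_row.get(column)
            if value is None or str(value).strip() == "":
                new_row[column] = default
        cleaned.append(new_row)
    return cleaned


def make_types_consistent(rows, column, cast):
    """Make data types consistent: cast one column to a single type."""
    return [{**row, column: cast(row[column])} for row in rows]


def normalise(rows, column):
    """Normalise a numeric column to the range 0..1 (min-max scaling).

    Assumption: "normalising" is read here as numeric rescaling; it could
    equally mean standardising formats or database normalisation.
    """
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    span = high - low
    return [
        {**row, column: (row[column] - low) / span if span else 0.0}
        for row in rows
    ]


# Sample records as they might arrive from a web log (made-up data).
raw = [
    {"user": "a", "visits": "10"},  # number stored as text
    {"user": "b", "visits": ""},    # blank value
    {"user": "c", "visits": 30},    # number stored as int
]

filled = populate_blanks(raw, {"visits": "0"})
typed = make_types_consistent(filled, "visits", float)
prepared = normalise(typed, "visits")

for row in prepared:
    print(row)
```

The steps are applied in this order because scaling needs every value to be present and numeric first. Each helper returns new records rather than modifying its input, so the raw data stays available for the issue report named under Product evidence.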