Hadoop Data Analytics Training
The Hadoop Data Analytics training course explains how to apply data analytics and business intelligence skills to Big Data. The training emphasizes the use of Apache Pig, Hive, and Cloudera Impala. It walks you through the distributed processing of large data sets across clusters of computers and the administration of Hadoop. Participants will learn how to handle heterogeneous data coming from different sources. This data may be structured or unstructured and can include communication records, log files, audio files, pictures, and videos.
By the end of the Hadoop Data Analytics training course, participants will be able to:
- Explain the fundamentals of Apache Hadoop, data ETL (extract, transform, load), and data processing using Hadoop tools
- Perform data analysis and process complex data using Pig
- Perform data management and text processing using Hive
- Extend, troubleshoot, and optimize Pig and Hive performance
- Analyze data with Impala
- Compare MapReduce, Pig, Hive, Impala, and relational databases
The training is suited to the following roles:
- Data architects
- Data integration architects
- Data scientists
- Data analysts
- Decision makers
- Hadoop administrators and developers
Candidates with working experience in SQL or basic Linux commands are ideal for this training.
- 1. Introduction
- 2. Hadoop Fundamentals
- 3. Introduction to Pig
- 4. Basic Data Analysis with Pig
- 5. Processing Complex Data with Pig
- 6. Multi-Dataset Operations with Pig
- 7. Extending Pig
- 8. Pig Troubleshooting and Optimization
- 9. Introduction to Hive
- 10. Relational Data Analysis with Hive
- 11. Hive Data Management
- 12. Text Processing with Hive
- 13. Hive Optimization
- 14. Extending Hive
- 15. Introduction to Impala
- 16. Analyzing Data with Impala
- 17. Choosing the Best Tool for the Job
1. Introduction
- About this Course
- About Big Data
- Course Logistics
- Introductions
2. Hadoop Fundamentals
- The Motivation for Hadoop
- Hadoop Overview
- HDFS
- MapReduce
- The Hadoop Ecosystem
- Lab Scenario Explanation
- Hands-On Exercise: Data Ingest with Hadoop Tools
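To give a flavor of the MapReduce model listed in this module, here is a minimal single-machine sketch in Python. It is not Hadoop code and is not taken from the course material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real Hadoop job would run the map and reduce steps in parallel across a cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    print(word_count(["big data", "Big Data analytics"]))
```

The three functions mirror the three stages of a MapReduce job; Pig and Hive, covered in later modules, generate this kind of job from higher-level scripts and queries.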
3. Introduction to Pig
- What Is Pig?
- Pig’s Features
- Pig Use Cases
- Interacting with Pig
4. Basic Data Analysis with Pig
- Pig Latin Syntax
- Loading Data
- Simple Data Types
- Field Definitions
- Data Output
- Viewing the Schema
- Filtering and Sorting Data
- Commonly Used Functions
- Hands-On Exercise: Using Pig for ETL Processing
5. Processing Complex Data with Pig
- Storage Formats
- Complex/Nested Data Types
- Grouping
- Built-in Functions for Complex Data
- Iterating Grouped Data
- Hands-On Exercise: Analyzing Ad Campaign Data with Pig
6. Multi-Dataset Operations with Pig
- Techniques for Combining Data Sets
- Joining Data Sets in Pig
- Set Operations
- Splitting Data Sets
- Hands-On Exercise: Analyzing Disparate Data Sets with Pig
7. Extending Pig
- Adding Flexibility with Parameters
- Macros and Imports
- UDFs
- Contributed Functions
- Using Other Languages to Process Data with Pig
- Hands-On Exercise: Extending Pig with Streaming and UDFs
8. Pig Troubleshooting and Optimization
- Troubleshooting Pig
- Logging
- Using Hadoop’s Web UI
- Optional Demo: Troubleshooting a Failed Job with the Web UI
- Data Sampling and Debugging
- Performance Overview
- Understanding the Execution Plan
- Tips for Improving the Performance of Your Pig Jobs
9. Introduction to Hive
- What Is Hive?
- Hive Schema and Data Storage
- Comparing Hive to Traditional Databases
- Hive vs. Pig
- Hive Use Cases
- Interacting with Hive
10. Relational Data Analysis with Hive
- Hive Databases and Tables
- Basic HiveQL Syntax
- Data Types
- Joining Data Sets
- Common Built-in Functions
- Hands-On Exercise: Running Hive Queries from the Shell, Scripts, and Hue
11. Hive Data Management
- Hive Data Formats
- Creating Databases and Hive-Managed Tables
- Loading Data into Hive
- Altering Databases and Tables
- Self-Managed Tables
- Simplifying Queries with Views
- Storing Query Results
- Controlling Access to Data
- Hands-On Exercise: Data Management with Hive
12. Text Processing with Hive
- Overview of Text Processing
- Important String Functions
- Using Regular Expressions in Hive
- Sentiment Analysis and N-Grams
- Hands-On Exercise (Optional): Gaining Insight with Sentiment Analysis
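As an illustration of the n-gram idea named in this module, the following Python sketch counts the most frequent word sequences in a handful of sentences. It is an assumption-laden toy, not the course's Hive exercise: the helper names (`ngrams`, `top_ngrams`) are hypothetical, tokenization is a plain whitespace split, and Hive's own text functions are not used here.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(sentences, n, k):
    """Count n-grams across all sentences and return the k most frequent."""
    counts = Counter()
    for sentence in sentences:
        # Lowercase and split on whitespace; real text needs better tokenization.
        counts.update(ngrams(sentence.lower().split(), n))
    return counts.most_common(k)

if __name__ == "__main__":
    reviews = ["the service was great", "The service was slow"]
    for gram, count in top_ngrams(reviews, 2, 2):
        print(" ".join(gram), count)
```

Frequent n-grams such as these are a common starting point for sentiment analysis, because they show which phrases recur around a product or topic.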
13. Hive Optimization
- Understanding Query Performance
- Controlling the Job Execution Plan
- Partitioning
- Bucketing
- Indexing Data
14. Extending Hive
- SerDes
- Data Transformation with Custom Scripts
- User-Defined Functions
- Parameterized Queries
- Hands-On Exercise: Data Transformation with Hive
15. Introduction to Impala
- What Is Impala?
- How Impala Differs from Hive and Pig
- How Impala Differs from Relational Databases
- Limitations and Future Directions
- Using the Impala Shell
16. Analyzing Data with Impala
- Basic Syntax
- Data Types
- Filtering, Sorting, and Limiting Results
- Joining and Grouping Data
- Improving Impala Performance
- Hands-On Exercise: Interactive Analysis with Impala
17. Choosing the Best Tool for the Job
- Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
- Which to Choose?