Hadoop Data Analytics Training

The Hadoop Data Analytics training course explains how to apply data analytics and business intelligence skills to Big Data. It emphasizes the use of Apache Pig, Hive, and Cloudera Impala. The course walks you through distributed processing of large data sets across clusters of computers, and through administering Hadoop. Participants will learn how to handle heterogeneous data coming from different sources. This data may be structured or unstructured, and can include communication records, log files, audio files, pictures, and videos.

By the end of the Hadoop Data Analytics training course, participants will be able to:

  • Explain the fundamentals of Apache Hadoop, data ETL (extract, transform, load), and data processing using Hadoop tools
  • Perform data analysis and process complex data using Pig
  • Perform data management and text processing using Hive
  • Extend, troubleshoot, and optimize Pig and Hive
  • Analyze data with Impala
  • Compare MapReduce, Pig, Hive, Impala, and relational databases

Target audience

  • Data architects
  • Data integration architects
  • Data scientists
  • Data analysts
  • Decision makers
  • Hadoop administrators and developers

Prerequisites

Candidates with working experience of SQL or basic Linux commands are best suited to this training.

1. Introduction

  • About this Course
  • About Big Data
  • Course Logistics
  • Introductions

2. Hadoop Fundamentals

  • The Motivation for Hadoop
  • Hadoop Overview
  • HDFS
  • MapReduce
  • The Hadoop Ecosystem
  • Lab Scenario Explanation
  • Hands-On Exercise: Data Ingest with Hadoop Tools

3. Introduction to Pig

  • What Is Pig?
  • Pig’s Features
  • Pig Use Cases
  • Interacting with Pig

4. Basic Data Analysis with Pig

  • Pig Latin Syntax
  • Loading Data
  • Simple Data Types
  • Field Definitions
  • Data Output
  • Viewing the Schema
  • Filtering and Sorting Data
  • Commonly-Used Functions
  • Hands-On Exercise: Using Pig for ETL Processing

5. Processing Complex Data with Pig

  • Storage Formats
  • Complex/Nested Data Types
  • Grouping
  • Built-in Functions for Complex Data
  • Iterating Grouped Data
  • Hands-On Exercise: Analyzing Ad Campaign Data with Pig

6. Multi-Dataset Operations with Pig

  • Techniques for Combining Data Sets
  • Joining Data Sets in Pig
  • Set Operations
  • Splitting Data Sets
  • Hands-On Exercise: Analyzing Disparate Data Sets with Pig

7. Extending Pig

  • Adding Flexibility with Parameters
  • Macros and Imports
  • UDFs
  • Contributed Functions
  • Using Other Languages to Process Data with Pig
  • Hands-On Exercise: Extending Pig with Streaming and UDFs

8. Pig Troubleshooting and Optimization

  • Troubleshooting Pig
  • Logging
  • Using Hadoop’s Web UI
  • Optional Demo: Troubleshooting a Failed Job with the Web UI
  • Data Sampling and Debugging
  • Performance Overview
  • Understanding the Execution Plan
  • Tips for Improving the Performance of Your Pig Jobs

9. Introduction to Hive

  • What Is Hive?
  • Hive Schema and Data Storage
  • Comparing Hive to Traditional Databases
  • Hive vs. Pig
  • Hive Use Cases
  • Interacting with Hive

10. Relational Data Analysis with Hive

  • Hive Databases and Tables
  • Basic HiveQL Syntax
  • Data Types
  • Joining Data Sets
  • Common Built-in Functions
  • Hands-On Exercise: Running Hive Queries on the Shell, Scripts, and Hue

11. Hive Data Management

  • Hive Data Formats
  • Creating Databases and Hive-Managed Tables
  • Loading Data into Hive
  • Altering Databases and Tables
  • Self-Managed Tables
  • Simplifying Queries with Views
  • Storing Query Results
  • Controlling Access to Data
  • Hands-On Exercise: Data Management with Hive

12. Text Processing with Hive

  • Overview of Text Processing
  • Important String Functions
  • Using Regular Expressions in Hive
  • Sentiment Analysis and N-Grams
  • Hands-On Exercise (Optional): Gaining Insight with Sentiment Analysis

13. Hive Optimization

  • Understanding Query Performance
  • Controlling Job Execution Plan
  • Partitioning
  • Bucketing
  • Indexing Data

14. Extending Hive

  • SerDes
  • Data Transformation with Custom Scripts
  • User-Defined Functions
  • Parameterized Queries
  • Hands-On Exercise: Data Transformation with Hive

15. Introduction to Impala

  • What Is Impala?
  • How Impala Differs from Hive and Pig
  • How Impala Differs from Relational Databases
  • Limitations and Future Directions
  • Using the Impala Shell

16. Analyzing Data with Impala

  • Basic Syntax
  • Data Types
  • Filtering, Sorting, and Limiting Results
  • Joining and Grouping Data
  • Improving Impala Performance
  • Hands-On Exercise: Interactive Analysis with Impala

17. Choosing the Best Tool for the Job

  • Comparing MapReduce, Pig, Hive, Impala, and Relational Databases
  • Which to Choose?
