20775A: Performing Data Engineering on Microsoft HDInsight Training

Multisoft Systems offers the Performing Data Engineering on Microsoft HDInsight training course for professionals who plan to implement big data engineering workflows on HDInsight. The program covers the subject in depth so that candidates can plan and implement big data workflows on HDInsight with confidence.

After completing the Performing Data Engineering on Microsoft HDInsight Certification course, you will be able to:

  • Deploy HDInsight Clusters
  • Load the data into HDInsight
  • Troubleshoot HDInsight
  • Analyze Data with Hive, Spark SQL, and Phoenix
  • Create Big Data Real-Time Processing Solutions by using Apache Storm
  • Define Stream Analytics
Target Audience
  • Data Engineers
  • Data Scientists
  • Data Architects
  • Data Developers
Prerequisites

Candidates should have the following skills before attending this training program:

  • Strong grasp of relational databases
  • Basic knowledge of the Microsoft Windows operating system and its main functionality
  • Experience programming in R and knowledge of the common R packages
  • Understanding of common statistical techniques and knowledge of the best practices used in data analysis

Module 1: Getting Started with HDInsight - This module introduces Hadoop, the MapReduce paradigm, and HDInsight.

  • What is Big Data?
  • Introduction to Hadoop
  • Working with the MapReduce function
  • Introducing HDInsight
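
To give a feel for the MapReduce paradigm that Module 1 introduces, here is a minimal word-count sketch in plain Python. It is an illustration only, not course material and not Hadoop code: the function names (map_phase, shuffle_phase, reduce_phase, word_count) and the sample input are assumptions made up for this example, and everything runs in a single process rather than across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of text lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, invented for this illustration.
sample_lines = ["big data on HDInsight", "big clusters process big data"]
counts = word_count(sample_lines)
print(counts)
```

On a real cluster the map and reduce steps run in parallel on many nodes and the shuffle moves data between them; the sketch only shows how the three phases fit together.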

Module 2: Deploying HDInsight Clusters - This module provides an overview of the Microsoft Azure HDInsight cluster types, in addition to the creation and maintenance of the HDInsight clusters.

  • Identifying HDInsight cluster types
  • Managing HDInsight clusters by using the Azure portal
  • Managing HDInsight Clusters by using Azure PowerShell

Module 3: Authorizing Users to Access Resources - This module provides an overview of non-domain and domain-joined Microsoft HDInsight clusters, in addition to the creation and configuration of domain-joined HDInsight clusters.

  • Non-domain Joined clusters
  • Configuring domain-joined HDInsight clusters
  • Manage domain-joined HDInsight clusters

Module 4: Loading data into HDInsight - This module provides an introduction to loading data into Microsoft Azure Blob storage and Microsoft Azure Data Lake storage.

  • Storing data for HDInsight processing
  • Using data loading tools
  • Maximizing value from stored data

Module 5: Troubleshooting HDInsight - In this module, you will learn how to interpret logs associated with the various services of the Microsoft Azure HDInsight cluster to troubleshoot any issues you might have with these services.

  • Analyze HDInsight logs
  • YARN logs
  • Heap dumps
  • Operations Management Suite

Module 6: Implementing Batch Solutions - In this module, you will look at implementing batch solutions in Microsoft Azure HDInsight by using Hive and Pig.

  • Apache Hive storage
  • HDInsight data queries using Hive and Pig
  • Operationalize HDInsight

Module 7: Design Batch ETL solutions for big data with Spark - This module provides an overview of Apache Spark, describing its main characteristics and key features.

  • What is Spark?
  • ETL with Spark
  • Spark performance

Module 8: Analyze Data with Spark SQL - This module describes how to analyze data by using Spark SQL. After completing it, you will be able to explain the differences between RDDs, Datasets, and DataFrames, identify the use cases for iterative and interactive queries, and describe best practices for caching, partitioning, and persistence.

  • Implementing iterative and interactive queries
  • Perform exploratory data analysis

Module 9: Analyze Data with Hive and Phoenix - In this module, you will learn about running interactive queries using Interactive Hive (also known as Hive LLAP, or Live Long and Process) and Apache Phoenix.

  • Implement interactive queries for big data with Interactive Hive
  • Perform exploratory data analysis by using Hive
  • Perform interactive processing by using Apache Phoenix

Module 10: Stream Analytics - The Microsoft Azure Stream Analytics service has built-in features and capabilities that make it an easy-to-use, flexible stream processing service in the cloud.

  • Stream analytics
  • Process streaming data from stream analytics
  • Managing stream analytics jobs

Module 11: Implementing Streaming Solutions with Kafka and HBase - In this module, you will learn how to use Kafka to build streaming solutions.

  • Building and Deploying a Kafka Cluster
  • Publishing, Consuming, and Processing data using the Kafka Cluster
  • Using HBase to store and Query Data

Module 12: Develop big data real-time processing solutions with Apache Storm - This module explains how to develop big data real-time processing solutions with Apache Storm.

  • Persist long term data
  • Stream data with Storm
  • Create Storm topologies
  • Configure Apache Storm

Module 13: Create Spark Streaming Applications - This module describes Spark Streaming; explains how to use discretized streams (DStreams); and explains how to apply the concepts to develop Spark Streaming applications.

  • Working with Spark Streaming
  • Creating Spark Structured Streaming Applications
  • Persistence and Visualization
