Module 1: Introduction to AWS in Big Data
- Introduction to Cloud Computing
- Cloud Computing Deployment Models
- Amazon Web Services Cloud Platform
- The Cloud Computing Difference
- AWS Cloud Economics
- AWS Virtuous Cycle
- AWS Cloud Architecture Design Principles
- Why AWS for Big Data - Reasons
- Why AWS for Big Data - Challenges
- Databases in AWS
- Relational vs Non-Relational Databases
- Data Warehousing in AWS
- Services for Collecting, Processing, Storing, and Analyzing Big Data
- Amazon Redshift
- Amazon Kinesis
- Amazon EMR
- Amazon DynamoDB
- Amazon Machine Learning
- AWS Lambda
- Amazon Elasticsearch Service
- Amazon EC2 (big data analytics software on EC2 instances)
- Key Takeaway
- Knowledge Checks
- Lesson End Project
Module 2: Collection
- Objectives
- Amazon Kinesis Fundamentals
- Loading Data into Kinesis Stream
- Kinesis Data Stream High-Level Architecture
- Kinesis Stream Core Concepts
- Kinesis Stream Emitting Data to AWS Services
- Kinesis Connector Library
- Kinesis Firehose
- Transferring Data Using Lambda
- Amazon SQS
- IoT and Big Data
- IoT Framework
- AWS Data Pipeline
- AWS Data Pipeline Components
- Key Takeaway
- Knowledge Checks
- Lesson End Project
Module 3: Storage
- Objectives
- Introduction to AWS Big Data Storage Services
- Amazon Glacier
- Glacier and Big Data
- DynamoDB Introduction
- The Architecture of the DynamoDB Table
- DynamoDB in AWS Ecosystem
- DynamoDB Partitions
- Data Distribution
- Local Secondary Index (LSI)
- Global Secondary Index (GSI)
- DynamoDB GSI vs LSI
- DynamoDB Stream
- Cross-Region Replication in DynamoDB
- Partition Key Selection
- Snowball & AWS Big Data
- AWS DMS
- Amazon Aurora in Big Data
- Key Takeaway
- Knowledge Checks
- Lesson End Project
Module 4: Processing I
- Objectives
- Introduction to AWS Big Data Processing Services
- Amazon Elastic MapReduce (EMR)
- Apache Hadoop
- EMR Architecture
- Storage Options
- EMR File Storage and Compression
- Supported File Format and File Size
- Single-AZ Concept
- EMR Operations
- EMR Releases
- AWS Cluster
- Launching a Cluster
- Advanced EMR Setting Option
- Choosing Instance Type
- Number of Instances
- Monitoring EMR
- Resizing of Cluster
- Using Hue with EMR
- Setup Hue for LDAP
- Hive on EMR
- Hive Use Cases
- Key Takeaway
- Knowledge Checks
- Lesson End Project
Module 5: Processing II
- HBase with EMR
- HBase Use Cases
- Comparison of HBase with Redshift and DynamoDB
- HBase Architecture
- HBase on S3
- HBase and EMRFS
- HBase Integration
- HCatalog
- Presto with EMR
- Advantages of Presto
- Presto Architecture
- Spark with EMR
- Spark Use Cases
- Spark Components
- Spark Integration With EMR
- AWS Lambda in AWS Big Data Ecosystem
- Limitations of Lambda
- Lambda and Kinesis Stream
- Lambda and Redshift
- Key Takeaway
- Knowledge Checks
- Lesson End Project
Module 6: Analysis I
- Objectives
- Introduction to AWS Big Data Analysis Services
- Redshift
- Redshift Architecture
- Redshift in the AWS Ecosystem
- Columnar Databases
- Redshift Table Design
- Redshift Workload Management
- Redshift Loading Data
- Redshift Maintenance and Operations
- Key Takeaway
- Knowledge Checks
- Lesson End Project
Module 7: Analysis II
- Machine Learning
- Machine Learning - Use Cases
- Algorithms
- Amazon SageMaker
- Elasticsearch
- Amazon Elasticsearch Service
- Loading of Data into Elasticsearch
- Logstash
- Kibana
- RStudio
- Characteristics
- Athena
- Presto and Hive
- Integration with AWS Glue
- Comparison of Athena with Other AWS Services
- Lab: Run Query on S3 Using Serverless Athena
- Key Takeaway
- Knowledge Checks
- Lesson End Project
Module 8: Visualization
- Objectives
- Introduction to AWS Big Data Visualization Services
- Amazon QuickSight
- Amazon QuickSight - Use Cases
- Lab: Create an Analysis with a Single Visual Using Sample Data
- Working with Data
- Assisted Practice: TBD
- QuickSight Visualization
- Big Data Visualization
- Apache Zeppelin
- Jupyter Notebook
- Comparison Between Notebooks
- D3.js (Data-Driven Documents)
- MicroStrategy
- Key Takeaway
- Knowledge Checks
- Lesson End Project
Module 9: Security
- Objectives
- Introduction to AWS Big Data Security Services
- EMR Security
- Roles
- Private Subnet
- Encryption At Rest and In Transit
- Redshift Security
- KMS Overview
- CloudHSM
- Limit Data Access
- STS and Cross Account Access
- CloudTrail
- Key Takeaway
- Knowledge Checks
- Lesson End Project