Module 1: Data Engineering Roles and Key Concepts
- The role of a data engineer
- Data discovery for a data analytics system
- AWS services for data workflows
- Continuous integration and continuous delivery
- Networking considerations
Module 2: Designing and Implementing Data Lakes
- Data lake introduction
- Data lake storage
- Ingest data
- Catalog data
- Transform data
- Serve data for consumption
- Lab: Setting up a Data Lake on AWS
Module 3: Optimizing and Securing Data Lake Solutions
- Optimizing performance
- Security using Lake Formation
- Setting permissions with Lake Formation
- Security and governance
- Troubleshooting
- Lab: Automating Data Lake Creation using AWS Lake Formation Blueprints
Module 4: Data Warehouse Architecture and Design Principles
- Introduction to data warehouses
- Amazon Redshift overview
- Ingesting data into Amazon Redshift
- Processing data
- Serving data for consumption
- Lab: Setting up a Data Warehouse using Amazon Redshift Serverless
Module 5: Performance Optimization Techniques for Data Warehouses
- Monitoring and optimization options
- Data optimization in Amazon Redshift
- Query optimization in Amazon Redshift
- Data orchestration
Module 6: Security and Access Control for Data Warehouses
- Authentication and access control in Amazon Redshift
- Data security in Amazon Redshift
- Lab: Working with Amazon Redshift
Module 7: Designing Batch Data Pipelines
- Introduction to batch data pipelines
- Designing a batch data pipeline
- Ingesting batch data
Module 8: Implementing Strategies for Batch Data Pipelines
- Processing and transforming data
- Transforming data formats
- Integrating your data
- Cataloging data
- Serving data for consumption
- Lab: A Day in the Life of a Data Engineer
Module 9: Optimizing, Orchestrating, and Securing Batch Data Pipelines
- Optimizing the batch data pipeline
- Orchestrating the batch data pipeline
- Securing the batch data pipeline
- Lab: Orchestrating Data Processing in Spark using AWS Step Functions
Module 10: Streaming Data Architecture Patterns
- Introduction to streaming data pipelines
- Ingesting data from stream sources
- Storing streaming data
- Processing streaming data
- Analyzing streaming data
- Lab: Streaming Analytics with Amazon Managed Service for Apache Flink
Module 11: Optimizing and Securing Streaming Solutions
- Optimizing a streaming data solution
- Securing a streaming data pipeline
- Lab: Access Control with Amazon Managed Streaming for Apache Kafka
Module 12: Compliance and Cost Optimization
- Compliance considerations
- Cost optimization tools
Module 13: Course Wrap-Up