Introduction to Data Engineering on Google Cloud
- Course Code GO9093
- Duration 1 Day
Delivery Method
This course is available in the following formats:
- Scheduled class: traditional classroom learning
- Virtual learning
Target Audience
- Data engineers
- Database administrators
- System administrators
Course Objectives
- Understand the role of a data engineer.
- Identify data engineering tasks and core components used on Google Cloud.
- Understand how to create and deploy data pipelines of varying patterns on Google Cloud.
- Identify and utilize various automation techniques on Google Cloud.
Content
Module 01: Data Engineering Tasks and Components
Topics M01:
- The role of a data engineer
- Data sources versus data sinks
- Data formats
- Storage solution options on Google Cloud
- Metadata management options on Google Cloud
- Sharing datasets using Analytics Hub
Objectives M01:
- Explain the role of a data engineer.
- Understand the differences between a data source and a data sink.
- Explain the different types of data formats.
- Explain the storage solution options on Google Cloud.
- Learn about the metadata management options on Google Cloud.
- Understand how to share datasets with ease using Analytics Hub.
- Understand how to load data into BigQuery using the Google Cloud console or the gcloud CLI.
Activities M01:
- Lab: Loading Data into BigQuery
- Quiz
Module 02: Data Replication and Migration
Topics M02:
- Replication and migration architecture
- The gcloud command-line tool
- Moving datasets
- Datastream
Objectives M02:
- Explain the baseline Google Cloud data replication and migration architecture.
- Understand the options and use cases for the gcloud command-line tool.
- Explain the functionality and use cases for Storage Transfer Service.
- Explain the functionality and use cases for Transfer Appliance.
- Understand the features and deployment of Datastream.
Activities M02:
- Lab: Datastream: PostgreSQL Replication to BigQuery (optional for ILT)
- Quiz
Module 03: The Extract and Load Data Pipeline Pattern
Topics M03:
- Extract and load architecture
- The bq command-line tool
Objectives M03:
- Explain the baseline extract and load architecture diagram.
- Understand the options of the bq command-line tool.
- Explain the functionality and use cases for BigQuery Data Transfer Service.
- Explain the functionality and use cases for BigLake as a non-extract-load pattern.
Activities M03:
- Lab: BigLake: Qwik Start
- Quiz
Module 04: The Extract, Load, and Transform Data Pipeline Pattern
Topics M04:
- Extract, load, and transform (ELT) architecture
- SQL scripting and scheduling with BigQuery
- Dataform
Objectives M04:
- Explain the baseline extract, load, and transform architecture diagram.
- Understand a common ELT pipeline on Google Cloud.
- Learn about BigQuery’s SQL scripting and scheduling capabilities.
- Explain the functionality and use cases for Dataform.
Activities M04:
- Lab: Create and Execute a SQL Workflow in Dataform
- Quiz
Module 05: The Extract, Transform, and Load Data Pipeline Pattern
Topics M05:
- Extract, transform, and load (ETL) architecture
- Google Cloud GUI tools for ETL data pipelines
- Batch data processing using Dataproc
- Streaming data processing options
- Bigtable and data pipelines
Objectives M05:
- Explain the baseline extract, transform, and load architecture diagram.
- Learn about the GUI tools on Google Cloud used for ETL data pipelines.
- Explain batch data processing using Dataproc.
- Learn how to use Dataproc Serverless for Spark for ETL.
- Explain streaming data processing options.
- Explain the role Bigtable plays in data pipelines.
Activities M05:
- Lab: Use Dataproc Serverless for Spark to Load BigQuery (optional for ILT)
- Lab: Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow
- Quiz
Module 06: Automation Techniques
Topics M06:
- Automation patterns and options for pipelines
- Cloud Scheduler and Workflows
- Cloud Composer
- Cloud Run functions
- Eventarc
Objectives M06:
- Explain the automation patterns and options available for pipelines.
- Learn about Cloud Scheduler and Workflows.
- Learn about Cloud Composer.
- Learn about Cloud Run functions.
- Explain the functionality and automation use cases for Eventarc.
Activities M06:
- Lab: Use Cloud Run Functions to Load BigQuery (optional for ILT)
- Quiz
Prerequisites
- Prior Google Cloud experience at the fundamental level using Cloud Shell and accessing products from the Google Cloud console.
- Basic proficiency with a common query language such as SQL.
- Experience with data modeling and ETL (extract, transform, load) activities.
- Experience developing applications using a common programming language such as Python.