Course
digicode: IDEGC
Introduction to Data Engineering on Google Cloud («IDEGC»)
Learn about data engineering on Google Cloud, the roles and responsibilities of data engineers, and how those map to offerings provided by Google Cloud. You will also learn about ways to address data engineering challenges.
Duration
1 day
Price
850.–
Course documents
Official Google Cloud courseware
Course facts
- Understanding the role of a data engineer
- Identifying data engineering tasks and core components used on Google Cloud
- Understanding how to create and deploy data pipelines of varying patterns on Google Cloud
- Identifying and utilizing various automation techniques on Google Cloud
Content
Data engineering tasks and components
- The role of a data engineer
- Data sources versus data sinks
- Data formats
- Storage solution options on Google Cloud
- Metadata management options on Google Cloud
- Sharing datasets using Analytics Hub
- Explaining the role of a data engineer
- Understanding the differences between a data source and a data sink
- Explaining the different types of data formats
- Explaining the storage solution options on Google Cloud
- Learning about the metadata management options on Google Cloud
- Understanding how to share datasets with ease using Analytics Hub
- Understanding how to load data into BigQuery using the Google Cloud console or the gcloud CLI (see the sketch after the lab below)
- Lab: Loading Data into BigQuery
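To give a flavour of what the lab covers, here is a minimal sketch of loading a CSV file from Cloud Storage into BigQuery with the Python client library. The project, dataset, table, and bucket names are placeholders, and the lab itself works through the console and command line rather than this client.

```python
from google.cloud import bigquery

client = bigquery.Client()  # relies on Application Default Credentials

table_id = "my-project.my_dataset.raw_orders"  # hypothetical identifiers
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the header row
    autodetect=True,       # let BigQuery infer the schema
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/orders.csv", table_id, job_config=job_config
)
load_job.result()  # block until the load job completes

print(client.get_table(table_id).num_rows, "rows loaded")
```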
Data replication and migration
- Replication and migration architecture
- The gcloud command-line tool
- Moving datasets (see the sketch after the lab below)
- Datastream
- Explaining the baseline Google Cloud data replication and migration architecture
- Understanding the options and use cases for the gcloud command-line tool
- Explaining the functionality and use cases for Storage Transfer Service
- Explaining the functionality and use cases for Transfer Appliance
- Understanding the features and deployment of Datastream
- Lab: Datastream: PostgreSQL Replication to BigQuery (optional)
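For the simplest case of moving datasets, an object-level copy between Cloud Storage buckets can be scripted with the Python client, as sketched below with placeholder bucket and prefix names. Large or recurring transfers are what Storage Transfer Service and Transfer Appliance address, and continuous database replication is Datastream's job.

```python
from google.cloud import storage

client = storage.Client()

source = client.bucket("legacy-exports")          # hypothetical bucket names
destination = client.bucket("analytics-landing")

# Copy every object under a prefix into the destination bucket.
for blob in client.list_blobs(source, prefix="exports/2024/"):
    source.copy_blob(blob, destination, new_name=blob.name)
    # blob.delete()  # uncomment to turn the copy into a move
```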
The extract and load pipeline pattern
- Extract and load architecture
- The bq command-line tool
- BigQuery Data Transfer Service
- BigLake
- Explaining the baseline extract and load architecture diagram
- Understanding the options of the bq command-line tool
- Explaining the functionality and use cases for BigQuery Data Transfer Service
- Explaining the functionality and use cases for BigLake as a non-extract-load pattern (see the sketch after the lab below)
- Lab: BigLake: Qwik Start
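The non-extract-load idea of leaving data in Cloud Storage and querying it in place can be sketched with an external table definition via the BigQuery Python client; all identifiers below are placeholders. A BigLake table follows the same pattern but additionally routes access through a Cloud resource connection.

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.sales_external"  # hypothetical identifiers

external_config = bigquery.ExternalConfig("CSV")
external_config.source_uris = ["gs://my-bucket/sales/*.csv"]
external_config.autodetect = True
external_config.options.skip_leading_rows = 1

table = bigquery.Table(table_id)
table.external_data_configuration = external_config
client.create_table(table)

# The files stay in Cloud Storage; BigQuery reads them at query time.
for row in client.query(f"SELECT COUNT(*) AS n FROM `{table_id}`").result():
    print(row.n)
```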
The extract, load, and transform (ELT) pipeline pattern
- Extract, load, and transform (ELT) architecture
- SQL scripting and scheduling with BigQuery
- Dataform
- Explaining the baseline extract, load, and transform architecture diagram
- Understanding a common ELT pipeline on Google Cloud
- Learning about BigQuery’s SQL scripting and scheduling capabilities (see the sketch after the lab below)
- Explaining the functionality and use cases for Dataform
- Lab: Creating and Executing a SQL Workflow in Dataform
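To make the ELT idea concrete, the sketch below runs a single transformation statement inside BigQuery from Python. The source and destination tables are placeholders; in the course this kind of logic is expressed as BigQuery SQL scripts, scheduled queries, or Dataform workflows rather than ad-hoc client calls.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Transform data that has already been loaded, entirely inside BigQuery.
elt_sql = """
CREATE OR REPLACE TABLE `my-project.my_dataset.orders_clean` AS
SELECT
  CAST(order_id AS INT64) AS order_id,
  LOWER(TRIM(customer_email)) AS customer_email,
  SAFE_CAST(order_total AS NUMERIC) AS order_total
FROM `my-project.my_dataset.orders_raw`
WHERE order_id IS NOT NULL
"""

client.query(elt_sql).result()  # runs as a single BigQuery job
```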
The extract, transform, and load (ETL) pipeline pattern
- Extract, transform, and load (ETL) architecture
- Google Cloud GUI tools for ETL data pipelines
- Batch data processing using Dataproc
- Streaming data processing options
- Bigtable and data pipelines
- Explaining the baseline extract, transform, and load architecture diagram
- Learning about the GUI tools on Google Cloud used for ETL data pipelines
- Explaining batch data processing using Dataproc
- Learning how to use Dataproc Serverless for Spark for ETL (see the sketch after the labs below)
- Explaining streaming data processing options
- Explaining the role Bigtable plays in data pipelines
- Lab: Using Dataproc Serverless for Spark to Load BigQuery (optional)
- Lab: Creating a Streaming Data Pipeline for a Real-Time Dashboard with Dataflow
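As a sketch of the Dataproc Serverless path, the PySpark job below reads CSV files from Cloud Storage, applies a small transformation, and writes to BigQuery through the Spark BigQuery connector (assumed to be available on the serverless runtime). Paths and table names are placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-etl").getOrCreate()

# Extract: read raw CSV files from Cloud Storage (placeholder path).
raw = spark.read.option("header", True).csv("gs://my-bucket/raw/orders/*.csv")

# Transform: drop incomplete rows and fix types.
orders = (
    raw.dropna(subset=["order_id"])
       .withColumn("order_total", F.col("order_total").cast("double"))
)

# Load: write to BigQuery via the Spark BigQuery connector.
(orders.write
    .format("bigquery")
    .option("writeMethod", "direct")
    .save("my_dataset.orders"))
```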
Automation techniques for pipelines
- Automation patterns and options for pipelines
- Cloud Scheduler and Workflows
- Cloud Composer
- Cloud Run functions
- Eventarc
- Explaining the automation patterns and options available for pipelines
- Learning about Cloud Scheduler and Workflows
- Learning about Cloud Composer
- Learning about Cloud Run functions (see the sketch after the lab below)
- Explaining the functionality and automation use cases for Eventarc
- Lab: Using Cloud Run Functions to Load BigQuery (optional)
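A minimal event-driven automation sketch: a Python Cloud Run function, triggered through Eventarc when a new object lands in Cloud Storage, loads that object into BigQuery. The destination table is a placeholder, and the payload fields assume the storage object-finalized event type.

```python
import functions_framework
from google.cloud import bigquery

TABLE_ID = "my-project.my_dataset.events"  # hypothetical destination table

@functions_framework.cloud_event
def load_new_object(cloud_event):
    """Runs when Eventarc delivers a Cloud Storage object-finalized event."""
    data = cloud_event.data
    uri = f"gs://{data['bucket']}/{data['name']}"

    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    bigquery.Client().load_table_from_uri(
        uri, TABLE_ID, job_config=job_config
    ).result()
```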
Target audience
- Data engineers
- Database administrators
- System administrators
Prerequisites
- Prior Google Cloud experience at the fundamental level using Cloud Shell and accessing products from the Google Cloud console
- Basic proficiency with a common query language such as SQL
- Experience with data modeling and ETL (extract, transform, load) activities
- Experience developing applications using a common programming language such as Python
Google Cloud products covered
- Analytics Hub
- BigQuery
- Storage Transfer Service
- Transfer Appliance
- Datastream
- BigQuery Data Transfer Service
- BigLake
- Dataform
- Dataproc
- Bigtable
- Dataflow
- Cloud Scheduler
- Workflows
- Cloud Composer
- Cloud Run functions
- Eventarc