Course
digicode: DP750V
Implement Data Engineering Solutions Using Azure Databricks – Flexible Training
DP-750
Course facts
- Provisioning the Databricks workspace and establishing comprehensive data governance using Unity Catalog and Microsoft Purview
- Organizing data assets (tables, views, and volumes) using catalogs and schemas within Unity Catalog, applying effective naming conventions
- Implementing access strategies, including fine-grained control (row filtering/column masking), and securely managing credentials via service principals and managed identities
- Selecting and configuring compute types, enabling performance features like Photon acceleration, and managing autoscaling and Databricks Runtime versions for various workloads
- Designing ingestion for batch and streaming data using tools like Lakeflow Connect, SQL commands (COPY INTO), Auto Loader, or Spark Structured Streaming
- Profiling and transforming data (joins, aggregations), managing data types, enforcing schema, and validating data quality using pipeline expectations
- Building and scheduling data pipelines using Lakeflow Spark Declarative Pipelines or notebooks, managed by Lakeflow Jobs with triggers, dependencies, and error handling
- Using Git for version control, automating deployment with Databricks Asset Bundles, and monitoring performance via Spark UI and centralized logging (Azure Log Analytics)
1 Explore Azure Databricks
Azure Databricks is a cloud service that provides a scalable platform for data analytics using Apache Spark.
2 Understand Azure Databricks architecture
This module details the hierarchical architecture of Azure Databricks, covering the separation of control and compute planes, account hierarchy, and various storage options, including Unity Catalog managed storage.
3 Understand Azure Databricks integrations
Learn how Azure Databricks integrates with multiple Microsoft services, such as Fabric, Power BI, and Copilot Studio, to provide end-to-end data engineering, analytics, and AI solutions.
4 Select and configure compute in Azure Databricks
Explore selecting and configuring Azure Databricks compute options to optimize for different workloads, manage performance settings and access permissions, and secure serverless and classic compute resources.
5 Create and organize objects in Unity Catalog
This module covers using Unity Catalog's three-layer namespace (catalogs, schemas, and objects) to organize data assets, create tables and volumes, and configure AI/BI Genie instructions to enhance data discoverability.
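The three-layer namespace described above can be sketched in plain Python (no Databricks API); the catalog, schema, and object names below (`main`, `sales`, `orders`) are illustrative only:

```python
# Minimal sketch of Unity Catalog's three-level namespace: every
# securable object is addressed as catalog.schema.object. These helpers
# only build and split such names; they do not talk to a workspace.

def full_name(catalog: str, schema: str, obj: str) -> str:
    """Build a three-level Unity Catalog object name."""
    return f"{catalog}.{schema}.{obj}"

def parse_name(name: str) -> tuple:
    """Split a three-level name back into (catalog, schema, object)."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.object, got {name!r}")
    return tuple(parts)
```

Consistent naming at all three levels is what makes conventions such as `<env>_<domain>` catalogs enforceable across a workspace.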
6 Secure Unity Catalog objects
Explore securing Unity Catalog objects using centralized governance and security features like access control, fine-grained permissions, row/column filtering, and authenticating data access with service principals.
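Row filtering and column masking can be illustrated with a plain-Python stand-in (Unity Catalog implements these as SQL row filter and column mask functions attached to tables; the rows, columns, and policy below are purely illustrative):

```python
# Sketch of row-filter and column-mask semantics: the filter decides
# which rows a caller may see, and the mask redacts a sensitive column
# for non-privileged callers.

rows = [
    {"region": "EU", "email": "a@example.com", "amount": 10},
    {"region": "US", "email": "b@example.com", "amount": 20},
]

def row_filter(row, user_region):
    # Keep only rows belonging to the caller's region.
    return row["region"] == user_region

def mask_email(value, is_admin):
    # Non-admins see a redacted value, mirroring a column mask function.
    return value if is_admin else "***"

def secured_view(user_region, is_admin):
    return [
        {**r, "email": mask_email(r["email"], is_admin)}
        for r in rows
        if row_filter(r, user_region)
    ]
```

The key point the module makes is that these policies live on the object itself, so every query path sees the same filtered, masked result.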
7 Govern Unity Catalog objects
Essential governance practices in Unity Catalog are covered here, including implementing fine-grained access control, tracking data lineage, configuring audit logs, and securely sharing data, so you can monitor and manage your data estate.
8 Design and implement data modeling with Azure Databricks
This module explores effective data modeling in Azure Databricks with Unity Catalog, covering ingestion logic design, selection of tools/formats, implementation of partitioning and clustering, and management of slowly changing dimensions.
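The slowly changing dimension handling mentioned above can be sketched in plain Python; on Databricks a Type 2 dimension is typically maintained with a `MERGE` into a Delta table, and the column layout below (`key`, `value`, `valid_from`, `valid_to`, `is_current`) is illustrative:

```python
from datetime import date

# Sketch of a Type 2 slowly changing dimension: when a tracked value
# changes, the current row is closed out and a new current row is
# appended, preserving full history.

def scd2_upsert(dim, key, value, as_of):
    current = next((r for r in dim if r["key"] == key and r["is_current"]), None)
    if current and current["value"] == value:
        return dim  # unchanged: nothing to do
    if current:
        current["valid_to"] = as_of   # close out the old version
        current["is_current"] = False
    dim.append({"key": key, "value": value, "valid_from": as_of,
                "valid_to": None, "is_current": True})
    return dim
```

Each key therefore has exactly one current row, and the `valid_from`/`valid_to` range lets queries reconstruct the dimension as of any date.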
9 Ingest data into Unity Catalog
Explore comprehensive data ingestion techniques in Azure Databricks for loading data into Unity Catalog tables, including managed connectors, custom code, SQL batch loading, streaming ingestion, Auto Loader, and orchestration with Lakeflow Spark Declarative Pipelines.
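The core idea behind incremental ingestion tools such as Auto Loader is exactly-once file processing with checkpointed progress; a plain-Python stand-in (file names and the `load` callback are illustrative, not the Databricks API) looks like this:

```python
# Sketch of incremental file ingestion: each discovered file is loaded
# exactly once, and progress is recorded in a checkpoint so reruns skip
# already-ingested files.

def discover_new_files(all_files, checkpoint):
    """Return files not yet ingested, in deterministic order."""
    return sorted(f for f in all_files if f not in checkpoint)

def ingest(all_files, checkpoint, load):
    for f in discover_new_files(all_files, checkpoint):
        load(f)              # e.g. append the file's rows to a table
        checkpoint.add(f)    # record progress so reruns skip this file
```

Running the same ingestion twice loads nothing new the second time, which is what makes the pipeline safely re-runnable.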
10 Cleanse, transform, and load data into Unity Catalog
This module explores essential data engineering techniques for cleansing and transforming raw data, covering data quality profiling, value resolution, filtering, aggregation, combining/reshaping datasets, and loading transformed data using append, overwrite, and merge strategies.
11 Implement and manage data quality constraints with Azure Databricks
Strategies for maintaining high data quality in Azure Databricks are explored, focusing on implementing validation checks, enforcing schemas, managing schema drift, and using pipeline expectations for data integrity.
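Pipeline expectations pair a named predicate with an action taken on failing rows; the sketch below mirrors that idea in plain Python (the action names and column are illustrative, not the declarative-pipeline syntax itself):

```python
# Sketch of pipeline expectations: rows failing a named predicate are
# kept (warn), dropped, or cause the run to fail, depending on the
# configured action.

def apply_expectation(rows, name, predicate, action="drop"):
    good = [r for r in rows if predicate(r)]
    bad = len(rows) - len(good)
    if bad and action == "fail":
        raise ValueError(f"expectation {name!r} failed for {bad} row(s)")
    if action == "drop":
        return good
    return rows  # "warn": keep all rows, only the count is recorded
```

Choosing between warn, drop, and fail is the practical trade-off the module explores: how strictly to enforce quality versus how much data to let through.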
12 Design and implement data pipelines with Azure Databricks
Learn to design and implement robust data pipelines in Azure Databricks using notebooks and Lakeflow Spark Declarative Pipelines, covering orchestration, error handling, and task logic.
13 Implement Lakeflow Jobs with Azure Databricks
Implementing Lakeflow Jobs in Azure Databricks is the focus of this module, which guides you through creating jobs, configuring triggers/schedules, setting up alerts, and managing automatic restarts for reliable data pipeline execution.
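The automatic-restart behavior mentioned above boils down to bounded retries with a pause between attempts; Lakeflow Jobs configures this declaratively per task, and the plain-Python loop below (retry count and delay are illustrative defaults) shows the semantics:

```python
import time

# Sketch of task retry semantics: rerun a failing task up to
# max_retries additional times, pausing between attempts, and re-raise
# if it never succeeds.

def run_with_retries(task, max_retries=2, delay_seconds=0.0):
    attempt = 0
    while True:
        try:
            return task()
        except Exception:
            attempt += 1
            if attempt > max_retries:
                raise
            time.sleep(delay_seconds)
```

Retries like this only make pipelines more reliable when each task is idempotent, which is why the merge-style loads covered earlier matter here too.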
14 Implement development lifecycle processes in Azure Databricks
This module explores implementing development lifecycle processes in Azure Databricks using Git folders for version control and Databricks Asset Bundles for infrastructure-as-code deployments, including branching workflows, testing, and CLI-based deployment.
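An Asset Bundle is declared in a `databricks.yml` file at the project root; the fragment below is a hedged sketch, and the bundle name, workspace host, job name, and notebook path are all placeholders rather than a definitive configuration:

```yaml
# Illustrative databricks.yml for a Databricks Asset Bundle.
bundle:
  name: my_pipeline_bundle

targets:
  dev:
    mode: development
    workspace:
      host: https://adb-1234567890.0.azuredatabricks.net

resources:
  jobs:
    nightly_load:
      name: nightly_load
      tasks:
        - task_key: run_notebook
          notebook_task:
            notebook_path: ./src/load.py
```

A bundle like this is validated and deployed from the CLI (e.g. `databricks bundle deploy -t dev`), which is the infrastructure-as-code workflow the module walks through.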
15 Monitor, troubleshoot, and optimize workloads in Azure Databricks
Discover how to monitor, troubleshoot, and optimize data workloads in Azure Databricks for reliability and cost-effectiveness by analyzing cluster consumption, diagnosing Spark jobs, optimizing performance, and streaming logs to Azure Log Analytics.
Component of the following courses
- Implement Data Engineering Solutions Using Azure Databricks – Flexible Training
This course is aimed at data engineers who have fundamental knowledge of data analytics concepts, a basic understanding of cloud storage, and familiarity with data organization principles.
- Experience working with SQL and Python, including the use of notebooks, and familiarity with SQL-based data organization and access patterns
- Good understanding of Azure Databricks workspaces and Unity Catalog concepts
- Foundational knowledge of Azure security, including Microsoft Entra ID, and a basic understanding of cloud storage concepts
- Fundamental knowledge of data analytics and data engineering concepts
- Familiarity with Git version control fundamentals