ADB Course Catalog
UPDATED: FEBRUARY 2022
Contents
Training
Credentials
Learning paths
Platform administration
Data analysis
Data engineering
Certification Prep Course for the Databricks Certified Associate Developer for Apache Spark Exam
Introduction to Jobs
Introduction to Photon
Structured Streaming
Credential descriptions
Training
● Self-paced online courses - asynchronous virtual training available to individuals through the Databricks Academy website. This training is free for Databricks customers. Each course is typically 1-2 hours in length.
● Workshops - live 1-3 hour trainings made available to groups, typically in a virtual format. Please reach out to a CSE / Databricks Account manager to request a Workshop.
Current pathways are available for business leaders, data analysts, data engineers,
data scientists, and platform administrators. The credential milestones for each step
within these pathways are shown in the image below.
Please note that we are actively making updates to the data analyst, data scientist,
and platform administration pathways. These updates will result in new certification
exams, as shown in the image below:
Below, you’ll find a breakdown of the courses required for each of these steps. We
will update these regularly, as new courses are released.
Data engineering
Instructor-led course descriptions
For a full list of available instructor-led courses, along with their descriptions, please click
here.
Duration: 12 hours
Duration: 1 hour
● Prerequisites
○ Basic SQL commands
○ Experience working with SQL in a Databricks notebook
Learning objectives
● Use optional arguments in CREATE TABLE to define data format and location
in a Databricks database
● Efficiently copy, modify, and create new tables from existing ones
● Use built-in functions and features of Spark SQL to manage and manipulate nested objects
● Use roll-up, cube, and window functions to aggregate data and pivot tables (see the sketch below)
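To make the last two objectives concrete, here is a minimal, hedged sketch of a CUBE aggregation and a window function in Spark SQL. It assumes a Databricks notebook, where `spark` is predefined; the `sales` table and its columns are hypothetical.

```python
# Hypothetical `sales` table with columns: region, product, amount.
# `spark` is predefined in Databricks notebooks.

# CUBE produces subtotals for every combination of the grouping columns,
# plus a grand-total row.
spark.sql("""
    SELECT region, product, SUM(amount) AS total_amount
    FROM sales
    GROUP BY CUBE (region, product)
""").show()

# A window function ranks rows within each region without collapsing them.
spark.sql("""
    SELECT region, product, amount,
           RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rank_in_region
    FROM sales
""").show()
```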
Duration: 1 hour
Prerequisites:
Learning objectives:
● Describe use cases for Databricks in an enterprise cloud architecture.
● Configure secure connections from Databricks to data in S3.
● Configure connections between Databricks and various first-party tools in an
enterprise cloud architecture, including Redshift and Kinesis.
● Deploy an MLflow model to a Sagemaker endpoint for serving online model
predictions.
● Configure Glue as an enterprise data catalog.
Course description: In this course, you will first define computation resources
(clusters, jobs, and pools) and determine which resources to use for different
workloads. Then, you will learn cluster provisioning strategies for several use cases to
maximize usability and cost-effectiveness. You will also identify best practices for
cluster governance, including cluster policies. This course also covers capacity
limits, cost management, and chargeback analysis.
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: In this course, you will learn about the Databricks File System
and Hive Metastore concepts. Then, you will apply best practices to secure access
to Amazon S3 from Databricks. Next, you will configure access control for data
objects including tables, databases, views, and functions. You will also apply column
and row-level permissions and data masking with dynamic views for multiple users
and groups. Lastly, you will identify methods for data isolation within your
organization on Databricks.
Prerequisites:
Learning objectives:
● Describe fundamental concepts about the Databricks File System and Hive
Metastore.
● Apply best practices to secure access to Amazon S3 from Databricks.
● Configure access control for data objects including tables, databases, views,
and functions.
● Apply column and row-level permissions and data masking with dynamic
views for multiple users and groups.
● Identify methods for data isolation within your organization on Databricks.
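As a hedged illustration of the dynamic-view objective above, the sketch below combines column masking and row-level filtering. The table, view, and group names are hypothetical; `spark` is assumed predefined in a Databricks notebook, and `is_member()` is the Databricks SQL function that checks the current user's group membership.

```python
# Hypothetical dynamic view for column masking and row-level permissions.
spark.sql("""
    CREATE OR REPLACE VIEW hr.employees_redacted AS
    SELECT
      id,
      name,
      -- Column-level masking: only members of hr_admins see raw salaries.
      CASE WHEN is_member('hr_admins') THEN salary ELSE NULL END AS salary,
      region
    FROM hr.employees
    -- Row-level permission: non-admins see only rows whose region matches
    -- a group they belong to.
    WHERE is_member('hr_admins') OR is_member(region)
""")
```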
AWS Databricks Identity Access
Management
Click here for the customer enrollment link.
Course description: In this course, you will learn how to manage user accounts and
groups in the Admin Console. You will also learn how to manage token-based
authentication and settings for your workspace, such as workspace storage and
additional cluster configurations. Lastly, this course covers access control for
workspace objects, such as notebooks and folders, in addition to clusters, pools, and
jobs.
Prerequisites:
Learning objectives:
Duration: 1 hour
Prerequisites:
● Beginning-level knowledge of basic AWS cloud computing terms (e.g., S3, VPC, IAM)
● Beginning-level knowledge of basic Databricks concepts (e.g., workspace, clusters, notebooks)
Learning objectives:
Duration: 1 hour
Course description: In this course, you will learn how to set up and configure access
to the Databricks SQL Analytics user interface. The administrative tasks in this
course will be done using the Databricks Workspace and Databricks SQL Analytics
UI, and will not include instruction for API access. By the end of this course, you will
be able to set up computational resources for users, grant and revoke access to
specific data, manage users and groups, and set up alert destinations.
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: This course will walk you through setting up your Databricks
account including setting up billing, configuring your AWS account, and adding users
with appropriate permissions. At the end of this course, you'll find guidance and
resources for additional setup options and best practices.
Prerequisites:
Learning objectives:
Duration: 1 hour
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: In this course, you will first define computation resources
(clusters, jobs, and pools) and determine which resources to use for different
workloads. Then, you will learn cluster provisioning strategies for several use cases to
maximize usability and cost-effectiveness. You will also identify best practices for
cluster governance, including cluster policies. This course also covers capacity
limits, cost management, and chargeback analysis.
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: In this course, you will learn about the Databricks File System
and Hive Metastore concepts. Then, you will apply best practices to secure access
to Azure data storage from Azure Databricks. Next, you will configure access control
for data objects including tables, databases, views, and functions. You will also apply
column and row-level permissions and data masking with dynamic views for
multiple users and groups. Lastly, you will identify methods for data isolation within
your organization on Azure Databricks.
Prerequisites:
Learning objectives:
Duration: 45 minutes
Course description: In this course, you will learn how to manage user accounts and
groups in the Admin Console. You will also learn how to manage token-based
authentication and settings for your workspace, such as workspace storage and
additional cluster configurations. Lastly, this course covers access control for
workspace objects, such as notebooks and folders, in addition to clusters, pools, and
jobs.
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: In this course, you will learn how to set up and configure access
to the Databricks SQL Analytics user interface. The administrative tasks in this
course will be done using the Databricks Workspace and Databricks SQL Analytics
UI, and will not include instruction for API access. By the end of this course, you will
be able to set up computational resources for users, grant and revoke access to
specific data, manage users and groups, and set up alert destinations.
Prerequisites:
Learning objectives:
Duration: 10 minutes
Course description: In this course, you will identify the prerequisites for creating an
Azure Databricks workspace, deploy an Azure Databricks workspace in the Azure
portal, launch the workspace, and access the Admin Console.
Prerequisites:
● To complete the actions outlined in this course, you must have access to an
Azure subscription.
Learning objectives:
Duration: 1 hour
Learning objectives
● Write basic SQL queries to subset gold-level tables using Databricks SQL
Queries
● Join multiple tables together to create a new table.
● Aggregate data columns using SQL functions to answer defined business
questions.
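A minimal sketch of the kind of query these objectives describe, assuming a Databricks notebook (`spark` predefined) and two hypothetical gold-level tables:

```python
# Hypothetical tables: orders(customer_id, total) and customers(id, segment).
spark.sql("""
    SELECT c.segment,
           COUNT(*)               AS order_count,
           ROUND(SUM(o.total), 2) AS revenue
    FROM orders o
    JOIN customers c ON o.customer_id = c.id
    GROUP BY c.segment
    ORDER BY revenue DESC
""").show()
```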
Duration: 2 hours
Course description: Prepare to take the Databricks Certified Associate Developer for
Apache Spark Exam. This course will cover the format and structure of the exam,
skills needed for the exam, tips for exam preparation, and the parts of the
DataFrame API and Spark architecture covered in the exam.
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: Databricks has extensive access control lists (ACLs) for
workspace assets to help administrators restrict and grant access to appropriate
users. This course includes a set of instructions and caveats for configuring many of
these settings, as well as a video walkthrough showing this configuration and the
resultant user experience.
Prerequisites:
Learning objectives:
Duration: 12 hours
Duration: 1 hour
Course description: This course will provide an overview of the features and
functionality within the Unified Data Analytics Platform that enable data
practitioners to follow data science and machine learning workflows. Aside from an
overview of features and functionality, the course will provide learners with
hands-on experience using the Unified Data Analytics Platform to execute basic
tasks and solve a real-world problem.
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: In this course, we’ll show you how to use scikit-learn on
Databricks, along with some core statistical and data science principles, to select a
family of machine learning models for deployment.
This course is the first in a series of three courses developed to show you how to
use Databricks to work with a single data set from experimentation to
production-scale machine learning model deployment. The other courses in this
series include:
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: In this course, you will learn how to use Databricks SQL, an
integrated SQL editing and dashboarding tool, from your Databricks workspace.
Working with Databricks SQL allows you to easily query your data lake, or other data
sources, and build dashboards that can be easily shared across your organization.
You will learn how to parameterize queries so that users can easily modify
dashboard views to target specific results. Also, we will make use of alerts for
ongoing monitoring so that you can be notified when certain events occur or when
particular attributes of a data set reach a certain threshold.
Prerequisites:
Learning objectives:
● Describe how you can use SQL from your Databricks workspace.
● Execute queries and create visualizations using Databricks SQL.
● Write parameterized queries so that users can easily customize their results
and visualizations.
● Create and share dashboards that hold a collection of visualizations.
Duration: 35 minutes
Course description: In this short course, you’ll learn how to create databases, tables,
and views on Databricks. Special attention will be given to differences in scope and
persistence for these various entities, allowing any user responsible for creating or
managing databases, tables, or views to make informed decisions for their
organization. While the syntax for creating and working with databases, tables, and
views will be familiar to most SQL users, some default behaviors may surprise users
new to Databricks.
Prerequisites:
Learning objectives:
Duration: 45 minutes
Prerequisites:
Learning objectives:
● Install and configure the Databricks CLI to securely interact with the
Databricks Workspace.
● Configure workspace secrets using the CLI for more secure sharing and use of
string-based credentials in notebooks.
● Sync notebooks and libraries between the Databricks workspace and other
environments using the CLI.
● Perform a variety of tasks including interacting with clusters, jobs, and runs
using the CLI.
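To complement the CLI objectives above, here is a hedged sketch of consuming a workspace secret from a notebook once it has been created with the CLI (for example, with `databricks secrets create-scope` and `databricks secrets put`). The scope, key, and JDBC details are hypothetical; `spark` and `dbutils` are predefined in Databricks notebooks.

```python
# Fetch a credential stored in a workspace secret scope; the value is
# redacted if echoed in notebook output.
password = dbutils.secrets.get(scope="jdbc-scope", key="warehouse-password")

# Use the secret to read from a (hypothetical) external database over JDBC.
df = (spark.read
          .format("jdbc")
          .option("url", "jdbc:postgresql://example-host:5432/warehouse")
          .option("dbtable", "public.events")
          .option("user", "analyst")
          .option("password", password)
          .load())
df.show(5)
```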
Duration: 1 hour
Prerequisites:
Learning objectives:
Course description: This course dives into the platform architecture and key
security features of Databricks on Google Cloud. You will start with an overview of
Databricks on Google Cloud and how it integrates with the Google Cloud ecosystem.
Then, you will define core components of the platform architecture and deployment
model on Databricks on Google Cloud. You will also learn about key security features
to consider when provisioning and managing workspaces, as well as guidelines on
network security, identity and access management, and data protection.
● Prerequisites
○ Basic familiarity with Databricks concepts (workspace, notebooks, clusters, DBFS, etc.)
○ Basic familiarity with Google Cloud concepts (projects, IAM, GCS, VPC, subnets, GKE, etc.)
Learning objectives
● Prerequisites
○ Familiarity with the Databricks on Google Cloud workspace
○ Beginning knowledge of Spark programming (reading/writing data,
batch and streaming jobs, transformations and actions)
○ Beginning-level experience using Python or Scala to perform basic
control flow operations.
○ Familiarity with navigation and resource configuration in the Databricks
on Google Cloud Console.
Learning objectives
Course description: This course covers essential cluster configuration features and
provisioning guidelines for Databricks on Google Cloud. In this course, you will start
by defining core computation resources (clusters, jobs, and pools) and determine
which resources to use for different workloads. Then, you will learn cluster
provisioning strategies for several use cases to maximize manageability. Lastly, you
will learn how to manage cluster usage and cost for your Databricks on Google Cloud
workspaces.
● Prerequisites
○ Beginning experience using the Databricks workspace
○ Beginning experience with Databricks administration
Learning objectives
Duration: 20 minutes
Course description: This is a short course that shows new customers how to set up
a Databricks account and deploy a workspace on Google Cloud. This will cover
accessing the Account Console and adding account admins, provisioning and
accessing workspaces, and adding users and admins to a workspace.
Learning objectives
Databricks with R
Click here for the customer enrollment link.
Duration: 7 hours
Course description: In this seven-hour course, you will analyze clickstream data from
an imaginary mattress retailer called Bedbricks. In this case study, you'll explore the
fundamentals of Spark Programming with R on Databricks, including Spark
architecture, the DataFrame API, and Machine Learning.
● Prerequisites
○ Beginning experience working with R.
Learning objectives
Course description: Apache Spark™ is the dominant processing framework for big
data. Delta Lake is a robust storage solution designed specifically to work with
Apache Spark™. It adds reliability to Spark so your analytics and machine learning
initiatives have ready access to quality, reliable data. Delta Lake makes data lakes
easier to work with and more robust. It is designed to address many of the problems
commonly found with data lakes. This course covers the basics of working with
Delta Lake, specifically with Python, on Databricks.
Prerequisites:
Learning objectives:
Duration: 2 hours
Course description: Apache Spark™ is the dominant processing framework for big
data. Delta Lake is a robust storage solution designed specifically to work with
Apache Spark™. It adds reliability to Spark so your analytics and machine learning
initiatives have ready access to quality, reliable data. Delta Lake makes data lakes
easier to work with and more robust. It is designed to address many of the problems
commonly found with data lakes. This course covers the basics of working with
Delta Lake, specifically with Spark SQL, on Databricks.
Prerequisites:
● How to upload data into a Databricks Workspace
● How to visualize data using Databricks
● Intermediate-level Spark SQL usage, including the CTAS pattern and Spark SQL functions such as from_unixtime, lag, lead, and partitioning.
Learning objectives:
● Use Delta Lake to create a new Delta table and to convert an existing
Parquet-based data lake table
● Differentiate between a batch append and an upsert to a Delta table
● Use Delta Lake Time Travel to view different versions of a Delta table
● Execute a MERGE command to upsert data into a Delta table
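To illustrate the upsert and Time Travel objectives above, a minimal hedged sketch follows. The table names are hypothetical, and it assumes a Databricks cluster with Delta Lake and a predefined `spark`.

```python
# Upsert: update matching customer rows and insert new ones from a staging table.
spark.sql("""
    MERGE INTO customers AS t
    USING customer_updates AS s
      ON t.id = s.id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")

# Time Travel: query the table as it looked at an earlier version.
previous = spark.sql("SELECT * FROM customers VERSION AS OF 0")
previous.show(5)
```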
Duration: 2 hours
Course description: In this course, we’ll show you how to train and deploy a large-scale machine learning model using MLflow and Apache Spark. This course is the
third in a series of three courses developed to show you how to use Databricks to
work with a single data set from experimentation to production-scale machine
learning model deployment. We recommend taking the first two courses in this
series before continuing with this course:
Prerequisites:
Learning objectives:
● Summarize Databricks best practices for deploying machine learning projects
with MLflow.
● Explain local development strategies for writing software with Databricks.
● Use Databricks to write production-grade machine learning software.
Duration: 1 hour
Course description: Databricks Auto Loader is the preferred method for ingesting
incremental data landing in cloud object storage into your Lakehouse. This course
introduces Auto Loader and demonstrates some of the newer features added to this
product. Included are recommended patterns for data ingestion with Auto Loader.
Prerequisites:
Learning objectives:
Course description: This course teaches experienced SQL analysts and engineers
how to complete common ELT tasks using Spark SQL on Databricks. Students will
extract data from multiple data sources, load data into Delta Lake tables, and apply
data quality checks and transformations. Students will also learn how to leverage
existing tables in a Lakehouse for last-mile ETL to support dashboards and
reporting.
Prerequisites:
Learning objectives:
● Extract data from a variety of common data sources using Spark SQL in the
Databricks Data Science and Engineering workspace
● Load data into Delta Lake tables using the Databricks Data Science and
Engineering workspace
● Apply transformations to complete common cleaning tasks and data quality
checks using the Databricks Data Science and Engineering workspace
● Reshape datasets with advanced functions to derive analytical insights using
the Databricks Data Science and Engineering workspace
Duration: 7 hours
Course description: In this course, you’ll learn how business leaders, admins, and architects use Databricks in their architecture. We’ll cover fundamental concepts about key players (data engineers, data scientists, and platform administrators) and raw data forms (structured and unstructured data, batch and streaming data) to set the stage for our discussion of how end users help businesses create data assets like machine learning models, reports, and dashboards. Then, we’ll discuss where components of Azure Databricks fit into an organization’s big data ecosystem. Finally, we’ll review real-world business use cases and create enterprise-level architecture infrastructure diagrams.
Prerequisites:
● Beginning knowledge about characteristics that define big data (3 of the Vs of
big data - velocity, volume, variety)
● Beginning knowledge about how organizations process and manage big data
(Relational/SQL vs NoSQL, cloud vs. on-premise, open-source database vs.
closed-source database as a service)
● Beginning knowledge about the roles that data practitioners play on data
science teams (can distinguish between database administrators and data
scientists, data analysts and machine learning engineers, data engineers and
platform administrators)
Learning objectives:
● Prerequisites
○ Beginning-level knowledge of the Databricks Lakehouse platform
(high-level knowledge of the structure and benefits of the Lakehouse
platform)
○ Intermediate-level knowledge of Python (good understanding of the
language as well as ability to read and write code)
○ Beginning-level knowledge of SQL (ability to understand and construct
basic queries)
Learning objectives
Duration: 1 hour
Course description: Databricks Machine Learning offers data scientists and other
machine learning practitioners a platform for completing and managing the
end-to-end machine learning lifecycle. This course guides practitioners through a
basic machine learning workflow using Databricks Machine Learning. Along the way,
students will learn how each of Databricks Machine Learning’s features better enable
data scientists and machine learning engineers to complete their work effectively
and efficiently.
● Prerequisites
○ Beginning-level knowledge of the Databricks Lakehouse platform
○ Intermediate-level knowledge of Python
○ Intermediate-level knowledge of machine learning workflows
Learning objectives
Duration: 1 hour
Course description: This course is an introductory course for SQL analysts that
demonstrates the entire data analysis process on Databricks SQL, from introducing
the Databricks SQL workspace (Workspace) to creating a dashboard. The course will
focus on what analysts can do, as opposed to what administrators can do, and it will
use the Workspace without administrator permissions.
● Prerequisites
○ Beginning knowledge of SQL
○ Access to Databricks SQL
○ Access to an empty database set up by an administrator
○ Access to a SQL endpoint set up by an administrator
Learning objectives
Course description: Learn the basics of Google Cloud and how to configure various
resources using the Cloud Console. This course begins with an overview of the
platform, key terminology, and core services. You will then learn essential IAM
concepts and how service accounts can be used to manage resources. You will also
learn about the function and use cases of several storage services, such as Cloud
Storage, Cloud SQL, and BigQuery. This course also covers virtual machine and
networking concepts in Compute Engine and VPC services. The course ends with an
overview of GKE clusters and Kubernetes concepts.
Prerequisites:
Learning objectives:
● Define basic concepts and core services in the Google Cloud Platform.
● Describe IAM concepts and how service accounts can be used to manage
resources.
● Identify use cases for storage services, such as Cloud Storage, Cloud SQL,
and BigQuery.
● Define virtual machine and networking concepts in Compute Engine and VPC
services.
● Describe Google Kubernetes Engine and the core components of Kubernetes
clusters.
Duration: 30 minutes
Course description: Before an analyst can analyze data, that data needs to be
ingested into the Lakehouse. This course shows three different ways to ingest data: 1.
Using the Data Science & Engineering UI, 2. Using SQL, and 3. Using Partner Connect.
● Prerequisites
○ Intermediate knowledge of Databricks SQL
○ Administrator privileges
Learning objectives
Duration: 1 hour
Course description: In this course, you will explore how Apache Spark executes a
series of queries. Examples will include simple narrow transformations and more
complex wide transformations.
This course will give developers a working understanding of how to write code that
leverages the power of Apache Spark for even the simplest of queries.
Prerequisites:
● Familiarity with basic information about Apache Spark (what it is, what it is
used for)
Learning objectives:
● Explain how Apache Spark applications are divided into jobs, stages, and
tasks.
● Explain the major components of Apache Spark's distributed architecture.
Duration: 1 hour
Course description: Linear modeling is a popular starting point for machine learning
studies for a number of reasons. Generally, these models are relatively easy to
interpret and explain, and they can be applied to a broad range of problems. In this
course, you will learn how to choose, apply, and evaluate commonly used linear
modeling techniques. As you work through the course, you can put your new skills into practice in five hands-on labs.
● Prerequisites
○ Intermediate experience with machine learning (experience using
machine learning and data science libraries like scikit-learn and
Pandas, knowledge of linear models).
○ Intermediate experience using the Databricks Workspace to perform
data analysis (using Spark DataFrames, Databricks notebooks, etc.).
○ Beginning experience with statistical concepts commonly used in data
science.
Learning objectives
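To make the linear modeling workflow concrete, here is a small, self-contained scikit-learn sketch on synthetic data (it is not taken from the course labs):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: y depends linearly on two of three features, plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit a linear model and evaluate it on held-out data.
model = LinearRegression().fit(X_train, y_train)
print("R^2 on test data:", round(r2_score(y_test, model.predict(X_test)), 3))
print("Learned coefficients:", model.coef_)
```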
Course description: In this course you’ll learn, both in theory and in practice, about
statistical techniques that are fundamental to many data science projects.
Throughout the course, videos will guide you through the conceptual information
you need to know about these statistical concepts, and hands-on lab activities will
give you the chance to apply the concepts you learn using the Databricks
Workspace. This course is divided into three modules: Introduction to Statistics and
Probability, Probability Distributions, and Applying Statistics to Learn from Data.
● Prerequisites
○ Beginning experience using the Databricks Data Science Workspace
(familiarity with Spark SQL, experience importing files into the
Databricks Data Science Workspace)
○ Beginning experience using Python (ability to follow guided use of the
SciPy library)
Learning objectives
Duration: 3 hours
Course description: In this course, you’ll learn how to solve complex supervised
learning problems using tree-based models. First, we’ll explain how decision trees
can be used to identify complex relationships in data. Then, we’ll show you how to
develop a random forest model to build upon decision trees and improve model
generalization. Finally, we’ll introduce you to various techniques that you can use to
account for class imbalances in a dataset. Throughout the course, you’ll have the
opportunity to practice concepts learned in hands-on labs.
Prerequisites:
Learning objectives:
● Describe how decision trees are used to solve supervised learning problems.
● Identify complex relationships in data using decision trees.
● Develop a random forest model to build upon decision trees and improve
model generalization.
● Employ common techniques to account for class imbalances in a dataset.
Duration: 3 hours
Course description: In this course, we will describe and demonstrate how to learn
from data using unsupervised learning techniques during exploratory data analysis.
The course is divided into two sections: one focuses on K-means clustering, and the other describes principal components analysis, commonly
referred to as PCA. Each section includes demonstrations of important concepts, a
quiz to solidify your understanding, and a lab to practice your skills.
Prerequisites:
● Intermediate experience with machine learning (experience using machine
learning and data science libraries like scikit-learn and Pandas, knowledge of
linear models)
● Intermediate experience using the Databricks Workspace to perform data
analysis (using Spark DataFrames, Databricks notebooks, etc.)
● Beginning experience with machine learning concepts.
Learning objectives:
Duration: 30 minutes
Course description: The addition of clone to Delta Lake empowers data engineers
and administrators to easily replicate data stored in the Lakehouse. Organizations
can use deep clone to archive versions of their production tables for regulatory
compliance. Developers can easily create development datasets isolated from
production data with shallow clone. In this course, you’ll learn the basics of cloning
with Delta Lake and get hands-on experience working with the syntax.
Prerequisites:
Learning objectives:
● Describe the basic execution of deep and shallow clones with Delta Lake.
● Use deep clones to create full incremental backups of tables.
● Use shallow clones to create development datasets.
● Describe strengths, limitations, and caveats of each type of clone.
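A hedged sketch of both clone types described above (table names are hypothetical; assumes Databricks with Delta Lake and a predefined `spark`):

```python
# Deep clone: a full, independent copy of data and metadata. Re-running the
# statement synchronizes the backup incrementally.
spark.sql("CREATE OR REPLACE TABLE prod.events_backup DEEP CLONE prod.events")

# Shallow clone: copies only metadata and references the source's data files,
# which makes it a cheap, isolated development dataset.
spark.sql("CREATE OR REPLACE TABLE dev.events_dev SHALLOW CLONE prod.events")
```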
Introduction to Databricks Connect
Click here for the customer enrollment link.
Duration: 40 minutes
Prerequisites:
Learning objectives:
Duration: 30 minutes
Course description: Repos aims to make Databricks simple to use by giving data
scientists and engineers the familiar tools of git repositories and file systems. These
tools enable a more laptop-like developer experience for customers. Repos is the
new, top-level, customer-facing feature that packages these tools together in the
Databricks user interface. This course teaches how to get started with Repos.
Prerequisites:
Course description: Delta Lake is a powerful tool created by Databricks. Delta Lake
is an open, reliable, performant and secure data storage and management layer for
your data lake that enables you to create a true single source of truth. Since it is
built upon Apache Spark, you’re able to build high performance data pipelines to
clean your data from raw ingestion to business-level aggregates. Finally, given the open format, it allows you to avoid unnecessary replication and proprietary lock-in. Ultimately, it provides the reliability, performance, and security you need to serve your downstream data use cases.
Prerequisites:
Learning objectives:
Duration: 30 minutes
Course description: Delta Live Tables enables data teams to innovate rapidly with
simple development, using declarative tools to build and manage batch or streaming
data pipelines. Built-in quality controls and data quality monitoring ensure accurate
and useful BI, Data Science, and ML built on top of quality data. Delta Live Tables is
designed to scale with rapidly growing companies and provides clear observability
into pipeline operations and automatic error handling. This course will cover the
basics of this new product, including syntax, configuration, and deployment.
Prerequisites:
Learning objectives:
In this course, you’ll learn how to perform both of these tasks. This course is divided
into two modules - in the first, you’ll explore feature engineering. In the second, you’ll
explore feature selection. Both modules will start with an introduction to these
topics - what they are and why they’re used. Then, you’ll review techniques that help
data practitioners perform these tasks. Finally, you’ll have the chance to perform two
hands-on lab activities - one where you will engineer features and another where
you will select features for a fictional machine learning scenario.
Prerequisites:
Learning objectives:
Duration: 30 minutes
Prerequisites:
Learning objectives:
Introduction to Hyperparameter
Optimization
Click here for the customer enrollment link.
Duration: 2 hours
Course description: In this course, you’ll learn how to apply hyperparameter tuning
strategies to optimize machine learning models for unseen data. First, you’ll work
within a balanced binary classification problem setting where you’ll use random
forest to predict the correct class. You’ll learn to tune the hyperparameters of a
random forest to improve a model. Then, you’ll again work with a binary classification
problem using random forest and a technique known as cross-validation to
generalize a model.
Prerequisites:
Learning objectives:
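As a hedged, self-contained sketch of these ideas (synthetic data and scikit-learn, rather than any specific course lab):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# A balanced synthetic binary classification problem.
X, y = make_classification(n_samples=500, n_features=10, weights=[0.5, 0.5],
                           random_state=42)

# Tune two random forest hyperparameters with 5-fold cross-validation so the
# chosen settings generalize beyond a single training split.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, None]}
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

print("Best hyperparameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))
```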
Introduction to Jobs
Click here for the customer enrollment link.
Duration: 30 minutes
Prerequisites:
Learning objectives:
● Describe jobs and motivations for using jobs in the workflow of data
practitioners.
● Create single task jobs with a scheduled trigger.
● Orchestrate multiple notebook tasks with the Jobs UI.
● Discuss common use cases and patterns for Jobs.
Introduction to MLflow Model Registry
Click here for the customer enrollment link.
Duration: 30 minutes
Course description: This course will introduce you to MLflow Model Registry. Model
Registry is a centralized model management tool that allows you to track metrics,
parameters, and artifacts as part of experiments, package models and reproducible
ML projects, and deploy models to batch or real-time serving platforms. You will
learn how your team can use Model Registry as a central place to share ML models,
collaborate on moving them from experimentation to testing and production, and
implement approval and governance workflows.
Prerequisites:
Learning objectives:
Duration: 1 hour
● Prerequisites
○ Experience developing machine learning models in scikit-learn
○ Experience and comfort using Python and data science libraries (e.g., writing functions, using attributes and methods, instantiating classes, basic file I/O with Pandas)
○ Comfort building classification and regression models in scikit-learn
Learning objectives
Duration: 30 minutes
Course description: After a recap of single-task jobs, as well as the directed acyclic graph (DAG) model, you will learn how to create, trigger, or schedule multi-task jobs
in Databricks.
Prerequisites:
Learning objectives:
Duration: 4 hours
Course description: This course will introduce you to natural language processing
with Databricks. You will learn how to generate
term-frequency-inverse-document-frequency (TFIDF) vectors for your datasets
and how to perform latent semantic analysis using the Databricks Machine Learning
Runtime.
Prerequisites:
Learning objectives:
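One hedged way to build TFIDF vectors with the Spark ML feature APIs is sketched below, on a tiny synthetic corpus; it assumes a Databricks notebook with `spark` predefined.

```python
from pyspark.ml.feature import HashingTF, IDF, Tokenizer

# A tiny, synthetic corpus.
docs = spark.createDataFrame(
    [(0, "spark makes big data simple"),
     (1, "delta lake makes data reliable")],
    ["id", "text"])

# Tokenize, hash term frequencies, then weight by inverse document frequency.
tokens = Tokenizer(inputCol="text", outputCol="words").transform(docs)
tf = HashingTF(inputCol="words", outputCol="tf", numFeatures=1024).transform(tokens)
tfidf = IDF(inputCol="tf", outputCol="tfidf").fit(tf).transform(tf)
tfidf.select("id", "tfidf").show(truncate=False)
```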
Introduction to Photon
Click here for the customer enrollment link.
Duration: 30 minutes
Course description: In this course, you’ll learn how Photon can be used to reduce
Databricks total cost of ownership (TCO) and dramatically improve query
performance. You’ll also learn best practices for when to use and not use Photon. Finally, the course will include a demonstration of a query run with and without Photon to show the improvement in query performance.
Prerequisites:
● Administrator privileges
● Introductory knowledge about the Databricks Lakehouse Platform (what the
Databricks Lakehouse Platform is, what it does, main components, etc.)
Learning objectives:
Duration: 1 hour
● Prerequisites
○ Familiarity with SQL
Learning objectives
Duration: 6 hours
NOTE: This is an e-learning version of the Just Enough Python for Apache Spark
instructor-led course. It is an on-demand recording available via the Databricks
Academy and covers the same content as the instructor-led course. For more
information about what’s in the course itself, please visit this link.
Duration: 3 hours
Prerequisites:
Learning objectives:
● Identify the core components of Delta Lake that make a Lakehouse possible.
● Define commonly used optimizations available in Delta Engine.
● Build end-to-end batch and streaming OLAP data pipelines using Delta Lake.
● Make data available for consumption by downstream stakeholders using
specified design patterns.
● Document data at the table level to promote data discovery and cross-team
communication.
● Apply Databricks’ recommended best practices in engineering a single source
of truth Delta architecture.
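A minimal hedged sketch of one incremental hop in such a pipeline (the table names and checkpoint path are hypothetical; assumes Databricks with Delta Lake and a predefined `spark`):

```python
from pyspark.sql import functions as F

# Incrementally read new records from a bronze Delta table, apply a simple
# quality filter, and append the results to a silver table.
(spark.readStream
      .table("bronze_events")
      .where(F.col("event_type").isNotNull())
      .writeStream
      .option("checkpointLocation", "/tmp/checkpoints/silver_events")
      .trigger(once=True)  # process what's available, then stop
      .toTable("silver_events"))
```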
Duration: 30 minutes
Course description: This course will enable experienced SAS developers to quickly
learn how to translate familiar SAS statements and functions into code that can be
run on Databricks. It begins with an introduction to the Databricks environment and
the different approaches to coding in Databricks, followed by an overview of how
SAS PROC and DATA steps can be performed in Databricks. You will learn about how
you can use Spark SQL, PySpark, and other tools to read .sas7bdat files and perform
common operations. Finally, you will see code examples and gain hands-on practice
performing some of the most common SAS operations in Databricks.
Prerequisites:
Learning objectives:
● Read data stored in .sas7bdat files using Spark SQL and PySpark.
● Explain the conceptual and syntactical relationships between SAS DATA and PROC statements and their counterparts on Databricks.
● Leverage Python to augment ANSI SQL to create reusable Spark SQL code.
● Translate common PROC functions to Databricks.
● Translate common DATA steps to Databricks.
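One hedged approach (not necessarily the course's exact method) to reading a .sas7bdat file into Spark is to load it with pandas and convert to a Spark DataFrame for SQL access. The file path is hypothetical; `spark` is assumed predefined.

```python
import pandas as pd

# pandas ships a native SAS reader; no SAS installation is required.
pdf = pd.read_sas("/dbfs/tmp/claims.sas7bdat", format="sas7bdat")

# Convert to a Spark DataFrame and expose it to Spark SQL.
df = spark.createDataFrame(pdf)
df.createOrReplaceTempView("claims")
spark.sql("SELECT COUNT(*) AS n FROM claims").show()
```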
Natural Language Processing at Scale with
Databricks
Click here for the customer enrollment link.
Duration: 5 hours
Course description: This five-hour course will teach you how to do natural language
processing at scale on Databricks. You will apply libraries such as NLTK and Gensim
in a distributed setting as well as SparkML/MLlib to solve classification, sentiment
analysis, and text wrangling tasks. You will learn how to remove stop words, when to
lemmatize vs stem your tokens, and how to generate
term-frequency-inverse-document-frequency (TFIDF) vectors for your dataset. You
will also use dimensionality reduction techniques to visualize word embeddings with
Tensorboard and apply and visualize basic vector arithmetic to embeddings.
Prerequisites:
Learning objectives:
Duration: 1 hour
Course description: In this course, learners will practice using the Databricks Feature
Store. From creating and updating feature store tables to searching across the
Feature Store, functionality is accessible through Databricks notebooks and Jobs.
Feature Store enables data practitioners to share and discover features across their
organization, as well as ensure that the same feature computation code is used for
model training and inference.
● Prerequisites
○ Creating models with scikit-learn or MLlib
○ Hardening for security concerns like handling data in flight, CORS or
SQL injection
○ API architecture beyond REST (e.g. SOAP or graph models will not be
discussed)
○ Optimizing clusters for serving (e.g. latency, SLAs, and throughput
concerns)
○ How the MLflow Model Registry works (ideally, the learner can already log models to the registry)
○ Monitoring model drift and performance
Learning objectives
Duration: 12 hours
Duration: 1 hour
Prerequisites:
Learning objectives:
● Describe how Delta Change Data Feed emits change data records.
● Use appropriate syntax and settings to set up Change Data Feed.
● Propagate inserts, updates, and deletes with Change Data Feed.
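A hedged sketch of the Change Data Feed objectives above (the table name is hypothetical; assumes Databricks with Delta Lake and a predefined `spark`):

```python
# Enable the change feed on an existing Delta table.
spark.sql("""
    ALTER TABLE customers
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# Read the change records (inserts, updates, deletes) emitted since version 1.
changes = (spark.read.format("delta")
               .option("readChangeFeed", "true")
               .option("startingVersion", 1)
               .table("customers"))
changes.select("_change_type", "_commit_version").show()
```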
Duration: 30 minutes
Learning objectives:
Duration: 30 minutes
Course description: Apache Spark™ is a unified analytics engine for large scale data
processing known for its speed, ease and breadth of use, ability to access diverse
data sources, and APIs built to support a wide range of use cases. Databricks builds
on top of Spark and adds many performance and security enhancements. This
course is meant to provide an overview of Spark’s internal architecture.
Prerequisites:
Learning objectives:
Duration: 3 hours
Course description: In this course, learners integrate machine learning solutions with
scalable production pipelines backed by Apache Spark. Learners will start by
investigating common inefficiencies in machine learning. Next, students will learn to
scale the development and tuning of the machine learning workflow using tools like
Spark ML and Hyperopt. Finally, learners will finish by using Pandas UDFs and the
Pandas Function APIs to create and apply group-specific machine learning models.
By the end of this course, learners will be capable of scaling the entirety of a
machine learning pipeline.
Prerequisites:
Learning objectives:
NOTE: This is an e-learning version of the Scalable Machine Learning with Apache
Spark instructor-led course. It is an on-demand recording available via the
Databricks Academy and covers the same content as the instructor-led course. For
more information about what’s in the course itself, please visit this link.
Duration: 1 hour
Course description: Taking this course will familiarize you with the content and
format of the Associate SQL Analyst Accreditation, as well as provide you with some
practical exercises that you can use to improve your skills or cement newly learned
concepts. We recommend that you complete Fundamentals of SQL on Databricks
and Applications of SQL on Databricks before using this guide.
Prerequisites:
Learning objectives:
Structured Streaming
Click here for the customer course enrollment link.
Duration: 1 hour
Learning objectives:
Duration: 2 hours
Course description: In this course, we’ll show you how to design an MLflow experiment to identify the best machine learning model for deployment. This course is the
second in a series of three courses developed to show you how to use Databricks to
work with a single data set from experimentation to production-scale machine
learning model deployment. The other courses in this series include:
Prerequisites:
Learning objectives:
● Create and explore an augmented sample from user event and profile data.
● Design an MLflow experiment and write notebook-based software to run the
experiment to assess various linear models.
● Examine experimental results to decide which model to develop for
production.
Duration: 1 hour
Course description: Whether your organization is moving to the cloud for the first
time or reevaluating its current approach, making decisions about the technology
used when storing your data can have huge implications for costs and performance
in downstream analytics. As a platform focused on computation and analytics,
Databricks seeks to help our customers make choices that unlock new
opportunities, reduce redundancies, and connect data teams. In this course, you’ll
start by exploring the characteristics of data lakes and data warehouses, two
popular data storage technologies. Then, you’ll learn about the Lakehouse, a new
data storage system invented and made popular by Databricks.
Prerequisites:
Learning objectives:
● Describe the strengths and limitations of data lakes, related to data storage.
● Describe the strengths and limitations of data warehouses, related to data
storage.
● Contrast data lake and data warehouse characteristics.
● Compare the features of a Lakehouse to the features of popular data storage
management solutions.
What is Big Data?
Click here for the customer course enrollment link.
Duration: 1 hour
Course description: This course was created for individuals who are new to the big
data landscape and want to become conversant with big data terminology. It will
cover foundational concepts related to the big data landscape including:
characteristics of big data; the relationship between big data, artificial intelligence,
and data science; how individuals on data science teams work with big data; and
how organizations can use big data to enable better business decisions.
Prerequisites:
Learning objectives:
Duration: 30 minutes
Prerequisites:
Learning objectives:
Duration: 30 minutes
Course description: Databricks Machine Learning offers data scientists and other
machine learning practitioners a platform for completing and managing the
end-to-end machine learning lifecycle. This course guides business leaders and
practitioners through a basic overview of Databricks Machine Learning, the benefits
of using Databricks Machine Learning, its fundamental components and
functionalities, and examples of successful customer use.
Prerequisites:
Learning objectives:
Duration: 30 minutes
Course description: Databricks SQL offers SQL users a platform for querying,
analyzing, and visualizing data in their organization’s Lakehouse. This course explains
how Databricks SQL processes queries and guides users through how to use the
interface. Then, this course will explain how you can connect Databricks SQL to
your favorite business intelligence tool, so that you can query your Lakehouse
without making changes to your analytical and dashboarding workflows.
Prerequisites:
● None.
Learning objectives:
Duration: 30 minutes
Course description: Delta Lake is an open format storage layer that sits on top of
your organization’s data lake. It is the foundation of a cost-effective, highly scalable
Lakehouse and is an integral part of the Databricks Lakehouse Platform.
In this course, we’ll break down the basics behind Delta Lake - what it does, how it works, and why it is valuable, from a business perspective, to any organization with big data and AI projects.
Prerequisites:
Learning objectives:
● Describe how Delta Lake fits into the Databricks Lakehouse Platform.
● Explain the four elements encompassed by Delta Lake.
● Summarize high-level Delta Lake functionality that helps organizations solve
common challenges related to enterprise-scale data analytics.
● Articulate examples of how organizations have employed Delta Lake on
Databricks to improve business outcomes.
Duration: 1 hour
Course description: In this course you’ll learn fundamental concepts about machine
learning. First, we’ll review machine learning basics - what it is, why it’s used, and
how it relates to data science. Then, we’ll explore the two primary categories of machine learning problems: supervised and unsupervised
learning. Finally, we’ll review how the machine learning workflow fits into the data
science process.
● Prerequisites
○ Beginning knowledge about concepts related to the big data landscape is helpful but not required (e.g., big data types, analysis techniques, processing techniques)
○ We recommend taking the Databricks Academy course "Introduction to
Big Data" before taking this course.
Learning objectives
● Explain how machine learning is used as an analysis tool in data science.
● Summarize the relationship between the data science process and the
machine learning workflow.
● Describe the two primary categories that machine learning problems are
categorized into.
● Describe popular machine learning techniques within the two primary
categories of machine learning.
● Determine the machine learning technique that should be used to analyze
data in a given real-world scenario.
Duration: 1 hour
● Prerequisites
○ Beginning knowledge about the Databricks Unified Data Analytics
Platform (what it is, what it is used for)
○ Beginning knowledge about concepts related to the big data landscape
(for example: structured streaming, batch processing, data pipelines)
○ Note: We recommend taking the following two Databricks Academy
courses to help you prepare for this course: Fundamentals of Big Data
and Fundamentals of Unified Data Analytics with Databricks.
Learning objectives
● Explain the benefits of Structured Streaming for working with streaming data.
● Distinguish where Structured Streaming fits into an organization’s big data
ecosystem.
● Articulate examples of real-world business use cases for Structured
Streaming.
Duration: 30 minutes
Course description: This course is designed for everyone who is brand new to the Databricks Lakehouse Platform and wants to learn more about what it is, why it was developed, what it
does, and the components that make it up.
Our goal is that by the time you finish this course, you’ll have a better understanding
of the Platform in general and be able to answer questions like: What is Databricks?
Where does Databricks fit into my workflow? How have other customers been
successful with Databricks?
NOTE: This course does not contain hands-on practice with the Databricks
Lakehouse Platform.
Prerequisites:
Learning objectives:
Duration: 30 minutes
Course description: This course was created to teach Databricks users about the
major improvements to Spark in the 3.0 release. It will give an overview of new
features meant to improve performance and usability. Students will also learn about
backwards compatibility with 2.x and some of the considerations required for
updating to Spark 3.0.
Prerequisites:
Learning objectives:
Credential descriptions
Duration: 2 hours
Certification exam description: The Azure Databricks Certified Associate Platform
Administrator certification exam assesses an understanding of network
infrastructure and security with Databricks, including workspace deployment, Azure
cloud concepts, and network security. The exam also assesses the understanding of
identity and access on Azure Databricks, including identity management, workspace
access control, data access control, and fine-grained security. In addition, the exam
assesses cluster configuration and usage management. Lastly, developer tools and
automation processes are assessed.
Prerequisites:
Duration: 2 hours
Prerequisites:
It is expected that developers who have been using the Spark DataFrame API for six months or more should be able to pass this certification exam.
While it will not be explicitly tested, the candidate must have a working
knowledge of either Python or Scala. The exam is available in both languages.
Price: $150
Prerequisites:
● Understand how to use the Databricks Lakehouse Platform and its tools, and the benefits of using them, including:
○ Data Lakehouse (architecture, descriptions, benefits)
○ Data Science and Engineering workspace (clusters, notebooks, data
storage)
○ Delta Lake (general concepts, table management and manipulation,
optimizations)
● Build ETL pipelines using Apache Spark SQL and Python, including:
○ Relational entities (databases, tables, views)
○ ELT (creating tables, writing data to tables, cleaning data, combining
and reshaping tables, SQL UDFs)
○ Python (facilitating Spark SQL with string manipulation and control flow,
passing data between PySpark and Spark SQL)
● Incrementally process data, including:
○ Structured Streaming (general concepts, triggers, watermarks)
○ Auto Loader (streaming reads)
○ Multi-hop Architecture (bronze-silver-gold, streaming applications)
○ Delta Live Tables (benefits and features)
● Build production pipelines for data engineering applications and Databricks
SQL queries and dashboards, including:
○ Jobs (scheduling, task orchestration, UI)
○ Dashboards (endpoints, scheduling, alerting, refreshing)
● Understand and follow best security practices, including:
○ Entity Permissions (team-based permissions, user-based permissions)
Individuals who pass this certification exam can be expected to complete data
engineering tasks using Databricks and its associated tools.
Prerequisites:
● Understand how to use the Databricks platform and its tools, and the benefits of using them, including:
○ Platform (notebooks, clusters, Jobs, Databricks SQL, relational entities,
Repos)
○ Apache Spark (PySpark, DataFrame API, basic architecture)
○ Delta Lake (SQL-based Delta APIs, basic architecture, core functions)
○ Databricks CLI (deploying notebook-based workflows)
○ Databricks REST API (configure and trigger production pipelines)
● Build data processing pipelines using the Spark and Delta Lake APIs, including:
○ Building batch-processed ETL pipelines
○ Building incrementally processed ETL pipelines
○ Optimizing workloads
○ Deduplicating data
○ Using Change Data Capture (CDC) to propagate changes
● Model data management solutions, including:
○ Lakehouse (bronze/silver/gold architecture, databases, tables, views,
and the physical layout)
○ General data modeling concepts (keys, constraints, lookup tables,
slowly changing dimensions)
● Build production pipelines using best practices around security and
governance, including:
○ Managing notebook and jobs permissions with ACLs
○ Creating row- and column-oriented dynamic views to control
user/group access
○ Securely storing personally identifiable information (PII)
○ Securely deleting data as requested, in accordance with GDPR and CCPA
● Configure alerting and storage to monitor and log production jobs, including:
○ Setting up notifications
○ Configuring SparkListener
○ Recording logged metrics
○ Navigating and interpreting the Spark UI
○ Debugging errors
● Follow best practices for managing, testing and deploying code, including:
○ Managing dependencies
○ Creating unit tests
○ Creating integration tests
○ Scheduling Jobs
○ Versioning code/notebooks
○ Orchestrating Jobs
It is expected that candidates with at least 1-2 years of experience in data engineering with Databricks should be able to pass this exam.
Duration: 2 hours
Prerequisites:
Duration: 30 minutes
This accreditation is the beginning step in most of the Databricks Academy learning
plans - SQL Analysts, Data Scientists, Data Engineers, and Platform Administrators.
Business leaders are also welcome to take this assessment.
Prerequisites:
● We recommend that you take the following courses to prepare for this
accreditation exam:
○ What is the Databricks Lakehouse Platform?
○ What are Enterprise Data Management Systems? (particularly the
section on Lakehouse architecture)
○ What is Delta Lake?
○ What is Databricks SQL?
○ What is Databricks Machine Learning?
Duration: 1 hour
Prerequisites: