Introduction to Databricks
Databricks
Lakehouse
Kevin Barlow
Data Analytics Practitioner
The Data Warehouse
Pros
Highly performant
Cons
Very expensive
1 https://www.databricks.com/blog/2021/05/19/evolution-to-the-data-lakehouse.html
The Data Lake
Pros
Very flexible
Cost effective
Cons
1 https://www.databricks.com/blog/2021/05/19/evolution-to-the-data-lakehouse.html
Birth of the Lakehouse
1 https://www.databricks.com/blog/2021/05/19/evolution-to-the-data-lakehouse.html
The Databricks Lakehouse
The Databricks Lakehouse Platform
Simplified architecture
1 https://www.databricks.com/blog/2021/05/19/evolution-to-the-data-lakehouse.html
Databricks Architecture Benefits
Unification: benefits of both the data warehouse and the data lake
Multi-Cloud: no lock-in to a specific cloud platform
Databricks Development Benefits
Collaborative: ability to work in the same platform in real time
Open-Source: support for the most popular languages (Python, R, Scala, SQL)
Let's practice!
Core features of the
Databricks
Lakehouse Platform
Kevin Barlow
Data Practitioner
Apache Spark
Apache Spark is an open-source data processing framework and is the engine underneath
Databricks.
DataCamp Courses
Introduction to PySpark
Benefits of Spark
Key Benefits:
4. Databricks optimizations
1 https://spark.apache.org/docs/latest/cluster-overview.html
Cloud computing basics
Databricks Compute
Clusters
SQL Warehouses (SQL only, BI use cases)
Photon
Cloud data storage
Delta
Delta is an open-source data storage format that provides:
ACID transactions
Schema evolution
Table history
Time-travel
1 delta.io
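As a minimal sketch of what table history and time travel look like in practice, assuming a hypothetical Delta table named customers:

# Review the transaction history of a Delta table (table name is hypothetical)
display(spark.sql("DESCRIBE HISTORY customers"))

# Time travel: query the table as it existed at an earlier version
df_v0 = spark.sql("SELECT * FROM customers VERSION AS OF 0")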
Unity Catalog
Unity Catalog is an open data governance solution that controls access to all data assets in the Databricks Lakehouse Platform.
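A minimal sketch of how that access control is expressed in SQL, assuming a hypothetical main.sales.orders table and an analysts group; the same statements can be run from a notebook or the SQL editor:

# Grant a group read access to a Unity Catalog table (catalog, schema, and group names are hypothetical)
spark.sql("GRANT USE CATALOG ON CATALOG main TO `analysts`")
spark.sql("GRANT USE SCHEMA ON SCHEMA main.sales TO `analysts`")
spark.sql("GRANT SELECT ON TABLE main.sales.orders TO `analysts`")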
Databricks UI
Designed for easier access to capabilities
based on your data workload.
Let's review!
Administering a
Databricks
workspace
Kevin Barlow
Data Practitioner
Account Admin
Key Responsibilities:
Account Console
https://accounts.cloud.databricks.com/
Account Console - Workspaces
https://accounts.cloud.databricks.com/
Account Console - Data
https://accounts.cloud.databricks.com/
Account Console - Users & Groups
https://accounts.cloud.databricks.com/
Account Console - Settings
https://accounts.cloud.databricks.com/
Workspace Admin
Key Responsibilities:
Data Plane
Contains all of the customer's assets needed for computation with Databricks.
Control Plane
The portion of the platform that is managed and hosted by Databricks.
Databricks Platform Architecture
Each cloud will have the same general
options to create a workspace:
Account Console
1 https://docs.databricks.com/getting-started/overview.html
Let's review!
Setting up a
Databricks
workspace example
Kevin Barlow
Data Practitioner
Let's practice!
Getting started with
Databricks
Kevin Barlow
Data Practitioner
Compute cluster refresh
Create your first cluster
The first step is to create a cluster for your
data processing!
Configuration options:
Cluster Access
Create your first cluster
The first step is to create a cluster for your
data processing!
Configuration options:
Databricks Runtime
Photon Acceleration
Auto-scaling / Auto-termination
Data Explorer
Get familiar with the Data Explorer! In this UI,
you can:
Create a notebook
Databricks notebooks:
Built-in visualizations
Let's practice!
Data Engineering
foundations in
Databricks
Kevin Barlow
Data Practitioner
Medallion architecture
Reading data
Spark is a highly flexible framework and can read from various data sources/types.

Common data sources and types:
Delta tables
File formats (CSV, JSON, Parquet, XML)
Databases (MySQL, Postgres, EDW)
Streaming data
Images / Videos

# Delta table
spark.read.table("table_name")

# CSV files
spark.read.format('csv').load('*.csv')

# Postgres table
(spark.read.format("jdbc")
    .option("driver", driver)
    .option("url", url)
    .option("dbtable", table)
    .option("user", user)
    .option("password", password)
    .load())
Structure of a Delta table
A Delta table provides table-like qualities to an open file format.
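A minimal sketch of what that looks like in storage, assuming a Delta table already written to the hypothetical path /tmp/delta/events:

# A Delta table directory holds Parquet data files plus a _delta_log folder of JSON commits
display(dbutils.fs.ls("/tmp/delta/events"))             # part-*.parquet files and _delta_log/
display(dbutils.fs.ls("/tmp/delta/events/_delta_log"))  # 00000000000000000000.json, 00000000000000000001.json, ...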
Explaining the Delta Lake structure
DataFrames
DataFrames are two-dimensional representations of data.
Look and feel similar to tables
Similar concept for many different data tools: Spark (default), pandas, dplyr, SQL queries
Underlying construct for most data processes

id  customerName  bookTitle
1   John Data     Guide to Spark
2   Sally Bricks  SQL for Data Engineering
3   Adam Delta    Keeping Data Clean

df = (spark.read
    .format("csv")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("/data.csv"))
Writing data
Kinds of tables in Databricks

1. Managed tables
Default type
Stored with Unity Catalog
Databricks managed

df.write.saveAsTable(table_name)

CREATE TABLE table_name
USING delta
AS ...

2. External (unmanaged) tables
Set LOCATION
Customer managed

CREATE TABLE table_name
USING delta
LOCATION "<path>"
AS ...
Let's practice!
Data
transformations in
Databricks
Kevin Barlow
Data Practitioner
SQL for data engineering
SQL

-- Creating a new table in SQL
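A minimal sketch of the kind of statement the comment above refers to, using hypothetical table names; the same SQL can be run in a %sql notebook cell or through spark.sql():

# Hypothetical tables; illustrates CREATE TABLE ... AS SELECT in Databricks SQL
spark.sql("""
    CREATE TABLE sales_summary
    USING delta
    AS
    SELECT region, SUM(amount) AS total_sales
    FROM raw_sales
    GROUP BY region
""")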
Other languages for data engineering
Python, R, Scala

# Creating a new table in PySpark
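And a minimal PySpark equivalent of that statement, assuming a hypothetical DataFrame raw_sales_df:

# Aggregate and save the result as a managed Delta table (names are hypothetical)
(raw_sales_df
    .groupBy("region")
    .sum("amount")
    .withColumnRenamed("sum(amount)", "total_sales")
    .write
    .format("delta")
    .saveAsTable("sales_summary_py"))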
Common transformations
Schema manipulation
Filtering
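The original slide pairs each of these with a PySpark snippet; as a minimal sketch, assuming a hypothetical DataFrame df with price and category columns:

from pyspark.sql.functions import col

# Schema manipulation: rename, cast, and derive columns
df_clean = (df
    .withColumnRenamed("price", "unit_price")
    .withColumn("unit_price", col("unit_price").cast("double"))
    .withColumn("is_expensive", col("unit_price") > 20))

# Filtering: keep only the rows of interest
df_filtered = df_clean.filter(col("category") == "fantasy")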
Common transformations (continued)
Nested data
Arrays or Struct data
Expand or contract

from pyspark.sql.functions import col, explode, flatten, sum

df.select(explode(col('arrayCol')))   # wide to long
df.select(flatten(col('items')))      # long to wide

Aggregation
Group data based on columns
Calculate data summarizations

(df.groupBy(col('region'))
   .agg(sum(col('sales'))))
Auto Loader
Auto Loader processes new data files as they
land in a data lake.
Incremental processing
Efficient processing
Automatic
spark.readStream
.format("cloudFiles")
.option("cloudFiles.format", "json")
.load(file_path)
1 https://www.databricks.com/blog/2020/02/24/introducing-databricks-ingest-easy-data-ingestion-into-delta-lake.html
Structured Streaming
(spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "<host:port>")
    .option("subscribe", "<topic>")
    .load()
    .join(table_df, on="<id>", how="left")
    .writeStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "<host:port>")
    .option("topic", "<topic>")
    .option("checkpointLocation", "<path>")
    .start())
Let's practice!
Orchestration in
Databricks
Kevin Barlow
Data Analytics Practitioner
What is data orchestration?
Data orchestration is a form of automation!
Databricks Workflows
Databricks Workflows is a collection of built-in capabilities to orchestrate all your data
processes, at no additional cost!
1 https://docs.databricks.com/workflows
What can we orchestrate?
Data engineers / data scientists
Data analysts
Databricks Jobs
Workflows UI
Users can create jobs directly from the
Databricks UI:
1 https://docs.databricks.com/workflows/jobs
Databricks Jobs
Programmatic

Users can also programmatically create jobs using the Jobs CLI or Jobs API with the Databricks platform.

{
  "name": "A multitask job",
  "tags": {},
  "tasks": [],
  "job_clusters": [],
  "format": "MULTI_TASK"
}
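A minimal sketch of submitting that payload to the Jobs API from Python; the workspace URL and personal access token below are hypothetical placeholders:

import requests

host = "https://<workspace-url>"        # hypothetical workspace URL
token = "<personal-access-token>"       # hypothetical token

job_spec = {
    "name": "A multitask job",
    "tasks": [],                        # task definitions would go here
    "format": "MULTI_TASK",
}

resp = requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json=job_spec,
)
print(resp.json())                      # returns the new job_id on success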
Delta Live Tables
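Delta Live Tables (DLT) is presented visually here; as a minimal sketch of what a DLT pipeline definition can look like in Python, with the table names, landing path, and expectation rule all hypothetical:

import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Raw orders ingested from cloud storage")
def orders_bronze():
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/landing/orders"))

@dlt.table(comment="Cleaned orders")
@dlt.expect_or_drop("valid_amount", "amount > 0")   # simple data quality rule
def orders_silver():
    return dlt.read_stream("orders_bronze").where(col("order_id").isNotNull())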
Let's practice!
End-to-end data
pipeline example in
Databricks
Kevin Barlow
Data Practitioner
Let's practice!
Overview of
Databricks SQL
Kevin Barlow
Data Practitioner
Databricks for SQL Users
Databricks SQL
Data Warehousing for the Lakehouse
Databricks SQL vs. other databases
Databricks SQL: open file format (Delta)
Other data warehouses: proprietary data format
SQL in the Lakehouse Architecture
Let's review!
Getting started with
Databricks SQL
Kevin Barlow
Data Practitioner
SQL Compute vs. General Compute
Designing compute clusters for data science or data engineering workloads is inherently different than designing compute for SQL workloads.
SQL Warehouse
SQL Warehouse Configuration Options
1. Cluster Name
4. Cluster Type
SQL Warehouse Types
Different types provide different benefits:
Classic: in customer cloud
Pro
Serverless
SQL Editor
Common SQL Commands
COPY INTO: grab raw data and put into Delta
CREATE <entity> AS: create a Table or View
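A minimal sketch of both commands, assuming a hypothetical bronze_orders Delta table and landing path; the SQL can run in the SQL editor or through spark.sql():

# Load raw files into an existing Delta table (table and path are hypothetical)
spark.sql("""
    COPY INTO bronze_orders
    FROM '/landing/orders/'
    FILEFORMAT = CSV
    FORMAT_OPTIONS ('header' = 'true')
""")

# Create a view on top of it
spark.sql("""
    CREATE OR REPLACE VIEW orders_by_region AS
    SELECT region, COUNT(*) AS order_count
    FROM bronze_orders
    GROUP BY region
""")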
Let's practice!
Databricks SQL
queries and
dashboards
Kevin Barlow
Data Practitioner
Databricks SQL Assets
Visualizations
Lightweight, in-platform visualizations
Support for standard visual types
Dashboards
Lightweight, easily created dashboards
Ability to share and govern across your organization
Query Filters
Filters
Query Parameters
Parameters
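A minimal sketch of a parameterized query as it would appear in the SQL editor; the double-brace token is an assumed parameter name that renders as a widget above the query:

# Illustrative only: {{ start_date }} becomes a parameter widget in the Databricks SQL editor
parameterized_query = """
SELECT order_date, SUM(amount) AS total_sales
FROM sales.orders
WHERE order_date >= '{{ start_date }}'
GROUP BY order_date
"""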
Let's practice!
Creating a
Databricks SQL
Dashboard
Kevin Barlow
Data Practitioner
Let's practice!
Overview of
Lakehouse AI
Kevin Barlow
Data Practitioner
Lakehouse AI
Why the Lakehouse for AI / ML?
1. Reliable data and files in the Delta lake
1 https://www.databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html
MLOps Lifecycle
MLOps in the Lakehouse
DataOps
Integrating data across different sources
(AutoLoader)
MLOps in the Lakehouse
ModelOps
Develop and train different models
(Notebooks)
MLOps in the Lakehouse
DevOps
Govern access to different models (Unity
Catalog)
Continuous Integration and Continuous
Deployment (CI/CD) for model versions
(Model Registry)
Let's review!
Using Databricks for
machine learning
Kevin Barlow
Data Practitioner
Machine Learning Lifecycle
1 https://www.datacamp.com/blog/machine-learning-lifecycle-explained
Planning and preparation
Planning for machine learning
What do I have? What do I want?
ML Runtime
Extension of Databricks compute
Optimized for machine learning
applications
MLFlow
Exploratory Data Analysis
# pandas DataFrame
import pandas as pd
pandas_df = df.toPandas()
pandas_df.describe()

# Spark DataFrame
df.summary()
dbutils.data.summarize(df)
Feature tables and feature stores
Raw Data:
count  category  price  shelf_loc  rating
4      horror    12.50  end        3
6      romance   13.99  top        4.5
12     sci-fi    16.50  bottom     5
31     romance    9.99  bottom     3.5
23     fantasy   24.99  top        4
18     horror    19.99  end        2.5
19     cooking   17.50  end        4.5
7      fantasy   12.99  top        3
37     sci-fi    14.99  bottom     5

Feature table (categorical columns encoded as integers):
count  category  price  shelf_loc  rating
4      1         12.50  1          3
6      2         13.99  2          4.5
12     3         16.50  3          5
31     2          9.99  3          3.5
23     4         24.99  2          4
18     1         19.99  1          2.5
19     5         17.50  1          4.5
7      4         12.99  2          3
37     3         14.99  3          5
Databricks Feature Store
Centralized storage for featurized datasets
Easily discover and re-use features for machine learning models

from databricks import feature_store

fs = feature_store.FeatureStoreClient()
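A minimal sketch of registering a feature table with that client, assuming a hypothetical features_df keyed by book_id:

# Register a feature table in the Feature Store (names are hypothetical)
fs.create_table(
    name="main.features.book_features",
    primary_keys=["book_id"],
    df=features_df,
    description="Encoded book features for recommendation models",
)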
Let's practice!
Model training with
MLFlow in
Databricks
Kevin Barlow
Data Practitioner
Machine Learning Lifecycle
1 https://www.datacamp.com/blog/machine-learning-lifecycle-explained
Model training and development
Single-node vs. Multi-node
Single-node machine learning
Multi-node machine learning
AutoML
"Glass box" approach to automated
machine learning
1 https://www.databricks.com/product/automl
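A minimal sketch of launching an AutoML run from a notebook, assuming a hypothetical training DataFrame and target column:

from databricks import automl

# Kick off an AutoML classification experiment (dataset and target are hypothetical)
summary = automl.classify(
    dataset=train_df,
    target_col="churned",
    timeout_minutes=30,
)
print(summary.best_trial.model_path)   # MLflow model URI for the best trial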
MLFlow
Open-source framework
End-to-end machine learning lifecycle management
Track, evaluate, manage, and deploy
Pre-installed on ML Runtime!

import mlflow

with mlflow.start_run() as run:
    # machine learning training
    mlflow.autolog()
    mlflow.log_metric('accuracy', acc)
    mlflow.log_param('k', kNum)
MLFlow Experiments
Collect information across multiple runs in a single location
Sort and compare model runs
MLFlow Experiments
Let's practice!
Deploying a model
in Databricks
Kevin Barlow
Data Practitioner
Machine Learning Lifecycle
1 https://www.datacamp.com/blog/machine-learning-lifecycle-explained
Model Deployment and Operations
Concerns with deploying models
Availability
How will my end users or application use the model?
Where do I need to put my model to access it?
Will the model be easy to understand or use?

Evaluation
Are my users actually using my model?
Is my model still performing well?
Do I need to retrain my model?
Do I need a new model that is better?
Model Deployment Process
Model Flavors
MLFlow Models can store a model from any
machine learning framework
pyfunc
spark
tensorflow
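A minimal sketch of how flavors come into play when logging and loading, assuming a hypothetical Spark ML model object spark_model:

import mlflow

# Log a Spark ML model under the spark flavor (a generic pyfunc flavor is saved alongside it)
with mlflow.start_run():
    mlflow.spark.log_model(spark_model, artifact_path="model")

# Later, load it back through the framework-agnostic pyfunc flavor
loaded = mlflow.pyfunc.load_model("runs:/<run_id>/model")   # <run_id> left as a placeholder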
Model Registry
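The Model Registry is shown through the UI here; a minimal sketch of the same actions from code, with the run ID and model name hypothetical:

import mlflow
from mlflow.tracking import MlflowClient

# Register a logged model under a name in the Model Registry
result = mlflow.register_model("runs:/<run_id>/model", "book_recommender")

# Promote that version through lifecycle stages
client = MlflowClient()
client.transition_model_version_stage(
    name="book_recommender",
    version=result.version,
    stage="Production",
)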
Model Serving
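Model Serving is shown through the UI here; a minimal sketch of querying a serving endpoint over REST, with the endpoint name, workspace URL, and token all hypothetical:

import requests

host = "https://<workspace-url>"
token = "<personal-access-token>"

response = requests.post(
    f"{host}/serving-endpoints/book_recommender/invocations",
    headers={"Authorization": f"Bearer {token}"},
    json={"dataframe_records": [{"count": 4, "category": 1, "price": 12.50}]},
)
print(response.json())   # model predictions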
1 https://www.databricks.com/product/model-serving
Let's practice!
Example end-to-end
machine learning
pipeline
Kevin Barlow
Data Practitioner
Let's practice!
Wrap Up
Kevin Barlow
Data Practitioner
Why the Lakehouse?
1 https://www.databricks.com/blog/2020/01/30/what-is-a-data-lakehouse.html
Databricks for data engineering
Apache Spark
Delta
Delta Live Tables
Auto Loader
Structured Streaming
Workflows
Databricks for data warehousing
SparkSQL
ANSI SQL
SQL Warehouses
Queries
Visualizations
Dashboards
Databricks for machine learning
Congratulations!