dbt Zero to Hero Guide
1. Prerequisites
Python 3.x
PyCharm IDE
Snowflake Trial Account
Sample Project
Requirements: Link
Solved Queries: Link
2. Folder Structure
(Created automatically by the “dbt init” command)
Create the basic dbt model
1. dbt_project.yml
name: 'my_new_project'
version: '0.1.0'
config-version: 2
require-dbt-version: ">=0.17.2"
# This setting configures which "profile" dbt uses for this project.
profile: 'datawarehouse'
2. profiles.yml
config:
  send_anonymous_usage_stats: false

datawarehouse:
  target: trial_acc
  outputs:
    trial_acc:
      type: snowflake
      account: um36278.ap-southeast-1
      user: athangachamy
      password: **********
      #authenticator: externalbrowser
      database: dbt_database_dev
      warehouse: compute_wh
      schema: dbt_pract_schema
      role: accountadmin
      threads: 3
      client_session_keep_alive: False
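Once the profile is saved, the connection can be verified before running any models. A minimal sketch (dbt looks for profiles.yml in ~/.dbt/ by default; the --profiles-dir path below assumes the file is kept inside the project, as in the run commands later in this guide):

dbt debug
dbt debug --profiles-dir profiles/snowflake

`dbt debug` checks that the profile parses, the target is valid, and a connection to Snowflake can be established.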
3. src_sample_db.yml
version: 2
sources:
- name: tpch_sf1
database: SNOWFLAKE_SAMPLE_DATA
schema: TPCH_SF1
tables:
- name: customer
4. sample_select_demo_records.sql
{{ config(
    alias = 'new_customer',
    materialized = 'table'
) }}

with sample_customer as
(
    select * from {{ source('tpch_sf1', 'customer') }}
)
select * from sample_customer
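When dbt compiles this model, the `source()` call resolves to the fully qualified table name, and the `table` materialization wraps the query in a CREATE TABLE statement named after the `alias`. The compiled SQL looks roughly like this (a sketch; the exact DDL dbt generates may differ):

create or replace table dbt_database_dev.dbt_pract_schema.new_customer as (
    with sample_customer as (
        select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.customer
    )
    select * from sample_customer
);

The database and schema come from the profile (profiles.yml), while the relation name comes from the `alias` config instead of the model's file name.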
dbt Commands
help
version
init
deps
clean
debug
run
Description: compiles the project's models and runs them against the target database.
Ex: Running all the models:
dbt run --profiles-dir profiles/snowflake
Running a specific model:
dbt run --profiles-dir profiles/snowflake --models demo_snowflake_db.sample_select_demo_records
test
seed
snapshot
build
Assume you want to run an entire DAG of models together instead of running them one after another. We could trigger `dbt run` followed by `dbt test` on the DAG, but that runs all the models end to end and only then tests the data end to end, letting data flow from one node to the next until the end of the DAG. If an intermediate model produces bad data, that invalid data flows downstream and corrupts the downstream nodes as well. To avoid this, it is better to test each model as soon as it finishes running, i.e. before its downstream models run. The `dbt build` command achieves this: it runs each model and tests it immediately. When building a DAG of models, if a test catches bad data the job fails immediately and stops running the downstream nodes. Once the data is fixed, the job can be retried from the point where it failed.
https://learn.getdbt.com/learn/course/dbt-fundamentals/tests-30min/building-tests?page=7
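For example, to run and test a model together with everything downstream of it in a single pass (a sketch, using the sample model from this project; the `+` suffix is dbt's graph-selection syntax for "this node and its descendants"):

dbt build --select sample_select_demo_records+

If any test on an upstream node fails, the downstream nodes are skipped, and a later `dbt retry` can resume from the point of failure.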
compile
parse
clone
docs
environment
ls (list)
retry
rpc
run-operation
show
source
Sources are the tables/views consumed by models; their database and schema names are configured in the `sources` section of a YAML file, and they can be referenced from multiple models.
Advantages of sources over hard-coding table names in the model:
Ability to select models based on their sources
dbt run --select "source:snowplow+"
Ability to check source freshness
Documentation
Data Lineage
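Source freshness is configured in the same sources YAML and checked with `dbt source freshness`. A minimal sketch for the source above (the `_loaded_at` column is hypothetical — the TPCH sample data has no load-timestamp column, so this is for illustration only):

version: 2

sources:
  - name: tpch_sf1
    database: SNOWFLAKE_SAMPLE_DATA
    schema: TPCH_SF1
    loaded_at_field: _loaded_at   # hypothetical load-timestamp column
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: customer

dbt compares max(_loaded_at) against the current time and warns or errors when the data is older than the configured thresholds.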