
dbt Zero to Hero Guide

The document outlines the prerequisites for setting up dbt, including Python, PyCharm, and a Snowflake account. It details the initial project setup, including commands for project initialization, folder structure, and configuration files. Additionally, it explains dbt commands for running models and emphasizes the importance of testing models immediately after execution to maintain data integrity.


dbt Setup Prerequisites

- Python 3.x
- PyCharm IDE
- Snowflake Trial Account

Sample Project
- Requirements: Link
- Solved Queries: Link

Initial Project Setup


1. Project Initialization
Outcome of the `dbt init` command (screenshot not reproduced here)

2. Folder Structure
Created automatically by the `dbt init` command; a typical layout is sketched below.
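As a rough sketch (the exact set of folders varies by dbt version), `dbt init my_new_project` produces a layout like:

my_new_project/
├── dbt_project.yml
├── README.md
├── analyses/
├── macros/
├── models/
│   └── example/
├── seeds/
├── snapshots/
└── tests/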
Create the basic dbt model
1. dbt_project.yml
name: 'my_new_project'
version: '0.1.0'
config-version: 2
require-dbt-version: ">=0.17.2"
# This setting configures which "profile" dbt uses for this project.
profile: 'datawarehouse'
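The same file can also set project-wide model defaults. A minimal sketch (the `models:` block is optional; the `+materialized` syntax requires config-version 2, which this project uses):

models:
  my_new_project:
    +materialized: view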
2. profiles.yml
datawarehouse:
  target: trial_acc
  outputs:
    trial_acc:
      type: snowflake
      account: um36278.ap-southeast-1
      user: athangachamy
      password: **********
      #authenticator: externalbrowser
      database: dbt_database_dev
      warehouse: compute_wh
      schema: dbt_pract_schema
      role: accountadmin
      threads: 3
      client_session_keep_alive: False

# the usage-stats opt-out belongs in a top-level config block, not under the profile
config:
  send_anonymous_usage_stats: false
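Before running any models, `dbt debug` verifies that this profile can actually connect to Snowflake (assuming profiles.yml is kept under profiles/snowflake/, as in the run commands later in this guide):

dbt debug --profiles-dir profiles/snowflake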
3. src_sample_db.yml
version: 2

sources:
  - name: tpch_sf1
    database: SNOWFLAKE_SAMPLE_DATA
    schema: TPCH_SF1
    tables:
      - name: customer
4. sample_select_demo_records.sql
{{ config(
    alias = 'new_customer',
    materialized = 'table'
) }}

with sample_customer as
(
    select * from {{ source('tpch_sf1', 'customer') }}
)
select * from sample_customer
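Once dbt resolves the `source()` and `config()` calls, the statement it sends to Snowflake looks roughly like the following (a sketch; the exact DDL wrapper varies by adapter version):

create or replace table dbt_database_dev.dbt_pract_schema.new_customer as (
    with sample_customer as (
        select * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.customer
    )
    select * from sample_customer
);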

Preparing the Snowflake Environment
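As a minimal sketch, the Snowflake objects referenced in profiles.yml above can be created with the following statements (assuming the accountadmin role has the required privileges):

use role accountadmin;
create warehouse if not exists compute_wh warehouse_size = 'XSMALL';
create database if not exists dbt_database_dev;
create schema if not exists dbt_database_dev.dbt_pract_schema;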

dbt Commands
help
version
init
deps
clean
debug
run
Description - Runs the dbt models
Ex: Running all the models:
dbt run --profiles-dir profiles/snowflake
Running a specific model:
dbt run --profiles-dir profiles/snowflake --models demo_snowflake_db.sample_select_demo_records
test
seed
snapshot
build
Assume you want to run an entire DAG of models together (instead of running them one after another). We could trigger `dbt run` followed by `dbt test` on the whole DAG, but that runs every model end to end and only then tests the data end to end, letting data flow from one node to the next until the end of the DAG. If an intermediate model produces bad data, that invalid data flows downstream and infects the downstream nodes as well. To avoid this, it is better to test each model as soon as it finishes running (i.e., before running its downstream models). The `dbt build` command achieves this: it runs each model and tests it immediately. When we build a DAG of models, any test failure stops the job right away, so downstream nodes never run on bad data. Once the data is fixed, we can restart the job from the point where it failed.
https://learn.getdbt.com/learn/course/dbt-fundamentals/tests-30min/building-tests?page=7
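For example (using the same flags as the run commands above):

dbt build --profiles-dir profiles/snowflake

`dbt build` runs models, tests, seeds, and snapshots together in DAG order, testing each node before its children are run.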
compile
parse
clone
docs
environment
ls (list)
retry
rpc
run-operation
show
source
Sources are the tables/views consumed by models; their database and schema names are configured in the sources section of a YAML file. They can be referenced from multiple models.
Advantages of sources over hard-coding table names in the model:
- Ability to select models based on their sources, e.g.:
  dbt run --select "source:snowplow+"
- Ability to check source freshness (see the sketch below)
- Documentation
- Data lineage
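A minimal sketch of a freshness check, assuming the source table had a load-timestamp column (hypothetically named `loaded_at` here; the TPCH sample tables have no such column, so this is illustrative only):

sources:
  - name: tpch_sf1
    database: SNOWFLAKE_SAMPLE_DATA
    schema: TPCH_SF1
    loaded_at_field: loaded_at  # hypothetical column
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: customer

The check is then run with `dbt source freshness` (older dbt versions used `dbt source snapshot-freshness`).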
