Session-10: Data Loading in Snowflake

Agenda:
--------
- Load types
- Bulk loading / Continuous loading
- COPY command
- Transforming data

There are two types of data loading in Snowflake:

1. Bulk loading using the COPY command
2. Continuous loading using Snowpipe

1. Bulk loading using the COPY command:

- This option loads batches of data from files that are already available in cloud
  storage (external stages).
- We have to create storage integration objects to read data from these cloud
  storages (a minimal sketch follows below).
  (or)
- Copy data files from a local machine to an internal stage (i.e. inside Snowflake)
  before loading the data into a table.
- Bulk loading uses virtual warehouses.
- Users are required to size the warehouse appropriately to accommodate the expected
  load when using the COPY command.
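A minimal storage integration sketch for AWS (the object names and the role ARN are
placeholders, not values from this session; the stage at the end only illustrates how
a stage references the integration instead of credentials):

// The role ARN below is a placeholder; replace it with the IAM role granted to Snowflake
CREATE OR REPLACE STORAGE INTEGRATION s3_int
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = 'S3'
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/snowflake_access_role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://bucketsnowflake3/');

// An external stage can then reference the integration instead of credentials
CREATE OR REPLACE STAGE aws_int_stage
  STORAGE_INTEGRATION = s3_int
  URL = 's3://bucketsnowflake3';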

2. Continuous loading using Snowpipe:

- Designed to load small volumes of data (i.e. micro-batches) and incrementally make
  them available for analysis.
- Suited for live or near-real-time data.
- Snowpipe loads data within minutes after files are added to a stage and submitted
  for ingestion.
- This ensures users have the latest data for business analysis.
- Snowpipe uses compute resources provided by Snowflake; it is serverless, and there
  is a separate charge for this serverless compute.
- The COPY statement in the pipe definition supports the same COPY transformation
  options as when bulk loading data (a minimal pipe sketch follows below).
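A minimal pipe sketch, assuming the LOAN_PAYMENT table and the aws_ext_stage external
stage created later in this session (the pipe name is a placeholder, and AUTO_INGEST
additionally requires event notifications to be configured on the bucket):

CREATE OR REPLACE PIPE MYDB.PUBLIC.loan_payment_pipe
  AUTO_INGEST = TRUE
AS
COPY INTO MYDB.PUBLIC.LOAN_PAYMENT
FROM @MYDB.external_stages.aws_ext_stage
file_format = (type = csv field_delimiter = ',' skip_header = 1);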

COPY Command:

COPY INTO TABLENAME
FROM @STAGE
file_format = (...)
files = ('filename1','filename2')
(or)
pattern = '.*filepattern.*'
other_optional_props;
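For example, a minimal sketch of the pattern option (TABLENAME, STAGE and the file
pattern are placeholders): instead of listing files explicitly, a regular expression
selects every matching file in the stage:

COPY INTO TABLENAME
FROM @STAGE
pattern = '.*Loan_payments.*[.]csv'
file_format = (type = csv field_delimiter = ',' skip_header = 1);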

***********************************************************************************
COPY COMMAND:

Location of files:
Local environment      ----------> Files are first staged in a Snowflake stage, then
                                   loaded into a table (see the PUT sketch after this list).
Amazon S3              ----------> Files can be loaded directly from any user-supplied S3 bucket.
Google Cloud Storage   ----------> Files can be loaded directly from any user-supplied Cloud Storage container.
Microsoft Azure        ----------> Files can be loaded directly from any user-supplied Azure container.
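A minimal sketch of the local-machine path (the internal stage name and local file
path are placeholders; PUT must be run from a client such as SnowSQL, not from the
web UI worksheet; the target is the LOAN_PAYMENT table created later in this session):

// Create an internal (Snowflake-managed) stage
CREATE OR REPLACE STAGE my_int_stage;

// Upload the local file into the internal stage (run from SnowSQL on the local machine)
PUT file:///tmp/Loan_payments_data.csv @my_int_stage;

// Load from the internal stage into the target table
COPY INTO PUBLIC.LOAN_PAYMENT
FROM @my_int_stage
file_format = (type = csv field_delimiter = ',' skip_header = 1);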

File formats:
Delimited files (CSV, TSV, etc.) --------> Any valid delimiter is supported; the default is comma (i.e. CSV).
JSON
AVRO                             --------> Includes automatic detection and processing of staged AVRO files that were compressed using Snappy.
ORC                              --------> Includes automatic detection and processing of staged ORC files that were compressed using Snappy or zlib.
Parquet                          --------> Includes automatic detection and processing of staged Parquet files that were compressed using Snappy.
XML                              --------> Supported as a preview feature.
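The inline format options used throughout this session can also be defined once as a
named file format object and reused across COPY statements; a minimal sketch (the
format name is a placeholder):

CREATE OR REPLACE FILE FORMAT my_csv_ff
  TYPE = CSV
  FIELD_DELIMITER = ','
  SKIP_HEADER = 1;

// Reference the named format instead of repeating the inline options
COPY INTO TABLENAME
FROM @STAGE
file_format = (format_name = 'my_csv_ff');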

Another way to load data is by using ETL tools such as:

- MATILLION
- DATASTAGE
- INFORMATICA
- HEVO
- AZURE DATA FACTORY
- AZURE SYNAPSE, ETC.

***********************************************************************************
Simple transformations during data load:

Snowflake supports transforming data while loading it into a table using the COPY
command. Options include:
- Column reordering
- Column omission
- String operations
- Other functions
- Sequence numbers
- Auto-increment fields

1. Create a database

CREATE DATABASE MYDB;

2. Use the database

USE DATABASE MYDB;

3. Create a table

CREATE OR REPLACE TABLE MYDB.PUBLIC.LOAN_PAYMENT
(
"Loan_ID" STRING,
"Loan_status" STRING,
"Principal" STRING,
"terms" STRING,
"effective_date" STRING,
"due_date" STRING,
"paid_off_time" STRING,
"past_due_days" STRING,
"age" STRING,
"education" STRING,
"Gender" STRING
);

SELECT * FROM PUBLIC.LOAN_PAYMENT;

-- Loading the data from the S3 bucket

COPY INTO PUBLIC.LOAN_PAYMENT
FROM 's3://bucketsnowflake3/Loan_Payments_data.csv'
file_format = (type = csv field_delimiter = ',' skip_header = 1);

- Validate the data

SELECT * FROM PUBLIC.LOAN_PAYMENT;

- check the count

SELECT COUNT(*) FROM PUBLIC.LOAN_PAYMENT;

***********************************************************************************
// Create a schema for external stages

CREATE OR REPLACE SCHEMA MYDB.external_stages;

// Publicly accessible staging area

CREATE OR REPLACE STAGE MYDB.external_stages.aws_ext_stage
url = 's3://bucketsnowflake3';

// Listing the files in the external stage

list @MYDB.external_stages.aws_ext_stage;
list @aws_ext_stage;

// CASE 1: Just viewing data from the external stage
------------------------------------------

select $1,$2,$3,$4,$5,$6 from @MYDB.external_stages.aws_ext_stage/Orderdetails.csv;

// Giving alias names to fields

select $1 as OID, $2 as AMT, $3 as PFT, $4 as QNT, $5 as CAT, $6 as SUBCAT
from @MYDB.external_stages.aws_ext_stage/Orderdetails.csv;

select $1 as OID, $4 as QNT, $2 as AMT
from @MYDB.external_stages.aws_ext_stage/Orderdetails.csv;

//Transforming the data while loading

CASE 2: Load only the required fields
-------------------------------
CREATE OR REPLACE TABLE MYDB.PUBLIC.ORDERS_EX
(
ORDER_ID VARCHAR(30),
AMOUNT INT
);

COPY INTO MYDB.PUBLIC.ORDERS_EX
FROM (select s.$1, s.$2 from @MYDB.external_stages.aws_ext_stage s)
file_format = (type = csv field_delimiter = ',' skip_header = 1)
files = ('OrderDetails.csv');

SELECT * FROM MYDB.PUBLIC.ORDERS_EX;

CASE 3: Applying basic transformations by using functions
-------------------------------------------------------

CREATE OR REPLACE TABLE MYDB.PUBLIC.ORDERS_EX
(
ORDER_ID VARCHAR(30),
PROFIT INT,
AMOUNT INT,
CAT_SUBSTR VARCHAR(5),
CAT_CONCAT VARCHAR(60),
PFT_OR_LOSS VARCHAR(10)
);

//Copy command using a SQL function

COPY INTO MYDB.PUBLIC.ORDERS_EX FROM
(
select
s.$1,
s.$3,
s.$2,
substring(s.$5,1,5),
concat(s.$5,s.$6), -- or simply s.$5||s.$6
CASE WHEN s.$3 <= 0 THEN 'LOSS' ELSE 'PROFIT' END
from @MYDB.external_stages.aws_ext_stage s
)
file_format=(type = csv field_delimiter=',' skip_header=1)
FILES=('OrderDetails.csv');

SELECT * FROM MYDB.PUBLIC.ORDERS_EX;

CASE 4: Loading sequence numbers into columns
------------------------------------------
//Create a sequence

create sequence seq1;

CREATE OR REPLACE TABLE MYDB.PUBLIC.LOAN_PAYMENT
(
"SEQ_ID" number default seq1.nextval,
"LOAN_ID" STRING,
"LOAN_STATUS" STRING,
"PRINCIPAL" STRING,
"TERMS" STRING,
"EFFECTIVE_DATE" STRING,
"DUE_DATE" STRING,
"PAID_OFF_TIME" STRING,
"PAST_DUE_DAYS" STRING,
"AGE" STRING,
"EDUCATION" STRING,
"GENDER" STRING
);

// Loading the data from the S3 bucket

COPY INTO PUBLIC.LOAN_PAYMENT
("LOAN_ID","LOAN_STATUS","PRINCIPAL","TERMS","EFFECTIVE_DATE","DUE_DATE","PAID_OFF_TIME","PAST_DUE_DAYS","AGE","EDUCATION","GENDER")
FROM 's3://bucketsnowflake3/Loan_payments_data.csv'
file_format = (type = csv field_delimiter = ',' skip_header = 1);

// validate the data

SELECT * FROM PUBLIC.LOAN_PAYMENT;

CASE 5: Using auto-increment
----------------------------

CREATE OR REPLACE TABLE MYDB.PUBLIC.LOAN_PAYMENT2
(
"LOAN_SEQ_ID" number autoincrement start 1001 increment 1,
"LOAN_ID" STRING,
"LOAN_STATUS" STRING,
"PRINCIPAL" STRING,
"TERMS" STRING,
"EFFECTIVE_DATE" STRING,
"DUE_DATE" STRING,
"PAID_OFF_TIME" STRING,
"PAST_DUE_DAYS" STRING,
"AGE" STRING,
"EDUCATION" STRING,
"GENDER" STRING
);

// Loading the data from the S3 bucket

COPY INTO PUBLIC.LOAN_PAYMENT2
("LOAN_ID","LOAN_STATUS","PRINCIPAL","TERMS","EFFECTIVE_DATE","DUE_DATE","PAID_OFF_TIME","PAST_DUE_DAYS","AGE","EDUCATION","GENDER")
FROM 's3://bucketsnowflake3/Loan_payments_data.csv'
file_format = (type = csv field_delimiter = ',' skip_header = 1);

// validate the data

SELECT * FROM PUBLIC.LOAN_PAYMENT2;
