
ETL Notes

The document provides an overview of data types, database management, and cloud services including IaaS, PaaS, and SaaS. It discusses OLTP and OLAP systems, ETL processes, and roles such as Database Administrator and Data Engineer. Additionally, it covers SQL commands, indexing, and various Azure storage solutions like Azure Table Storage, Blob Storage, and Cosmos DB.

Uploaded by

srikiarya123
Copyright
© All Rights Reserved

What is Data : - Data is a collection of facts and figures.

OLTP (Online Transaction Processing) : - This is the current business environment. Data that is being modified as part of day-to-day business is part of online transaction processing.

Data that is used for transactional purposes is called OLTP data.

Data is classified as structured, semi-structured, or unstructured.

Structured : - A database that holds tables in the form of rows and columns is called a relational database.

Semi-Structured : - JSON files, XML, Azure Table (schema free)

Unstructured : - BLOB, CLOB, BFILE, etc. (photos, images, videos, audio)

Transactional Workload : - Write-heavy; as a rule of thumb, about 80% write operations and 20% read operations.

OLAP (Online Analytical Processing) : - This is the data warehouse, which is used to generate reports for analysis purposes. Here the workload is read-heavy: roughly 80% read operations and 20% write operations.

ETL (Extract, Transform and Load) : -

E (Data Ingestion) : - We extract the data from different sources (multiple databases or any other source).

T (Data Transformation or Processing) : - We transform the data in the staging area (removing any issues in the data).

L (Loading) : - We load the data into the data warehouse for reporting and analysis purposes.
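The three ETL steps can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the two source lists, the table name emp_dw, and the cleaning rules (drop rows with a missing salary, drop duplicate ids) are made up for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Hypothetical source rows, standing in for data extracted from several systems.
SOURCE_A = [
    {"id": 101, "ename": "Smith", "sal": 2342},
    {"id": 102, "ename": "Martin", "sal": None},   # missing salary
]
SOURCE_B = [
    {"id": 101, "ename": "Smith", "sal": 2342},    # duplicate of a row in SOURCE_A
    {"id": 103, "ename": "King", "sal": 2422},
]


def extract():
    """E: pull rows from every source into one list."""
    return SOURCE_A + SOURCE_B


def transform(rows):
    """T: clean the data in the staging area (drop nulls and duplicates)."""
    seen = set()
    clean = []
    for row in rows:
        if row["sal"] is None or row["id"] in seen:
            continue
        seen.add(row["id"])
        clean.append(row)
    return clean


def load(conn, rows):
    """L: write the cleaned rows into the warehouse table."""
    conn.execute("create table emp_dw (id int, ename text, sal int)")
    conn.executemany(
        "insert into emp_dw (id, ename, sal) values (:id, :ename, :sal)", rows
    )
    conn.commit()


warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse
load(warehouse, transform(extract()))
loaded_ids = [r[0] for r in warehouse.execute("select id from emp_dw order by id")]
print(loaded_ids)
```

Only the rows that survive the transform step reach the warehouse table, which is the point of transforming in a staging area before loading.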

ETL (Extract, Transform and Load) and ELT (Extract, Load and Transform)

ELT (Extract, Load and Transform) : - We extract the data, load it into the data warehouse first, and transform it later, inside the warehouse.

ETL is a good fit for big organizations because it supports incremental loads: the data is transformed in the staging area and only then loaded into the data warehouse.

Data Visualization : - Data can be represented in tables (rows and columns) or as a document. We can use Power BI to design a report on top of the data warehouse.

Roles in Database

-------------------

Database Administrator : - The database administrator is responsible for the design, implementation, maintenance, and operational aspects of on-premises and cloud-based database solutions built on Azure data services.

-- Managing databases

-- Improving the performance of the database

-- Managing backup and recovery

-- Creating and managing alerts and notifications

-- Restoring the database in case of failure

-- Making the database highly available

Data Engineer : -

Compute and Storage : - The resources (CPU and memory) that we provide to the Azure SQL database.

====================

Two Modes of Authentication

------------------------------

SQL authentication (SQL login)

Active Directory authentication

Azure Database : - Azure SQL Database supports only these two modes: a SQL login or an Active Directory connection.

Indexing

-----------

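Indexes are used to fetch data as fast as possible.

An index entry holds the key value together with the location of the row:

-- row number

-- page number

-- extent number

-- data file number

With an index on id, the following query can locate the row directly instead of scanning the whole table:

select sal from Test where id=101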
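The effect of an index on this query can be observed with a small experiment. The sketch below uses SQLite from Python as a stand-in for SQL Server (an assumption made only so that the example runs anywhere); the index name idx_test_id and the generated rows are hypothetical. SQLite's "explain query plan" reports a table scan before the index exists and an index search afterwards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Test (id int, ename text, sal int, deptno int)")
conn.executemany(
    "insert into Test values (?, ?, ?, ?)",
    [(i, f"emp{i}", 1000 + i, 10) for i in range(1, 501)],
)


def query_plan(connection):
    rows = connection.execute(
        "explain query plan select sal from Test where id = 101"
    ).fetchall()
    # The last column of each plan row is the human-readable step description.
    return " | ".join(row[-1] for row in rows)


plan_without_index = query_plan(conn)   # full scan of the table
conn.execute("create index idx_test_id on Test (id)")
plan_with_index = query_plan(conn)      # search through idx_test_id

sal = conn.execute("select sal from Test where id = 101").fetchone()[0]
print(plan_without_index)
print(plan_with_index)
```

The query result is the same in both cases; only the access path changes.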

View

----

A view is a schema object that provides access to specific records. A view is a virtual table based on the result set of a query.

A view is a window onto specified rows of an underlying table.

Views are commonly used to provide security at the object level.

A view does not hold any data. Whenever a user fires a query on the view, it fetches the data from the base table and returns it to the user.

For example, two users can be given different windows onto the same table:

User1 (id,sal,deptno)

User2 (id,deptno)

We can also create a view on multiple tables to give the user a single location for data from several tables.
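A minimal sketch of the User2 case, again using SQLite from Python as a stand-in database. The view name Test_for_user2 is hypothetical, and SQLite has no GRANT statement, so the permission step (granting User2 access to the view but not to the table) is assumed rather than shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Test (id int, ename text, sal int, deptno int)")
conn.execute("insert into Test values (101, 'Smith', 2342, 10)")

# Hypothetical view for User2: the sal column is not exposed.
conn.execute("create view Test_for_user2 as select id, deptno from Test")

cursor = conn.execute("select * from Test_for_user2")
view_columns = [col[0] for col in cursor.description]
view_rows_before = cursor.fetchall()

# The view stores no data of its own: a change to the base table is visible at once.
conn.execute("insert into Test values (102, 'Martin', 2342, 20)")
view_rows_after = conn.execute("select * from Test_for_user2 order by id").fetchall()
print(view_columns, view_rows_after)
```

The second query returns the new row even though nothing was inserted through the view, which shows that the view always reads from the base table.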

==========================================

==========================================

IaaS (Infrastructure as a Service) : - A virtual machine is a good example of IaaS.

-- We have control over the entire virtual machine

-- We can configure backup policies (full backup + differential backup + transaction log backup) for SQL Server

-- We can choose the storage

-- We can select the O/S as per our requirement

-- We can select any specific version of SQL Server

-- Some applications require a specific version of SQL Server

-- We can install SSIS/SSRS/SSAS/Power BI or any other application in the virtual machine

-- We can create our required virtual network

PaaS (Platform as a Service) : - Azure SQL Database and Managed Instance

-- Azure SQL Database gives us a managed database rather than a full server

-- We cannot perform a manual full backup of the database; the service takes backups automatically

-- We cannot perform a manual differential backup of the database

-- We cannot perform a manual transaction log backup

-- We cannot select a specific version of SQL Server; the service always runs the latest version

-- We can monitor the Azure SQL DB

-- We can create alerts and notifications for the SQL DB

-- A native copy-only backup is available in Managed Instance; for Azure SQL Database we can export the database or create a database copy instead


Managed Instance is also part of PaaS.

-- It is almost the same as on-premises SQL Server; about 99% of the functionality of the full version of SQL Server is available

-- We can only perform a "copy only backup"

-- Replication, the SQL Server log, system databases, and SQL Server Agent jobs are accessible

-- We cannot access the O/S and data files

SaaS (Software as a Service) : - In terms of the database it is almost the same as PaaS: the provider manages everything and we only use the service.

==============================================================

==============================================================
Day02

------

A database is a collection of data. A database can be as simple as a desktop spreadsheet or as complex as a global system holding petabytes of highly structured information.

A database can also be semi-structured or unstructured, comprising a mass of raw, unprocessed data.

Service Tier for The Database

-------------------------------

vCore (CPU) Purchasing Model

-----------------------

General Purpose : - We can use up to 80 vCPUs for the database, with storage limited to 4 TB.

Hyperscale : - Hyperscale supports up to 100 TB of storage or even more, at a cost of approx. 7.20/- per GB per month.

Business Critical : - Up to 80 vCPUs can be used, as in General Purpose, but it provides faster storage along with high availability through secondary replicas that hold copies of the data.

DTU Purchasing Model

------------------------

DTU (Database Transaction Unit) : - A blended measure of CPU, memory, and I/O (reads and writes).

Basic : - 5 DTUs is the maximum capacity

Standard Tier : - up to 3000 DTUs

Premium Tier : - up to 4000 DTUs

IaaS (Infrastructure as a Service) : -

SQL Server in Virtual Machine

------------------------------

-- A virtual machine is an example of IaaS.

-- We can access the O/S; we can even install any O/S that we require.

-- We can install any supported version of SQL Server.

-- We can install any application or server software as per our requirement: SSAS, SSIS, SSRS, Power BI.

-- We can configure full backup + differential backup + transaction log backup.

-- We can configure a Windows cluster along with a SQL Server cluster.

-- We have full access to the SQL Server features, as the complete product is available to us.

-- We have access to the virtual storage where our data files are stored, so we can access the data files themselves.


PaaS (Platform as a Service) : -

Azure SQL DB

---------------

-- We always get the latest version of the database engine.

-- We cannot install a version of the database or server of our own choice.

-- We can monitor the database server using the Azure portal.

-- We can configure alerts and notifications using the Azure portal.

-- We can use a runbook to configure automation tasks, but we cannot use SQL Server Agent jobs.

-- We cannot access all the features of SQL Server.

-- We cannot configure full backup + differential backup + transaction log backup ourselves; the service takes these backups automatically.

-- A native copy-only backup is not available here (it is in Managed Instance); we can export the database or create a database copy instead.

-- We can use Query Store to monitor execution plans and statistics.

-- In Azure SQL DB, the only system database we can see is master.

-- We cannot access the data files, the O/S, or O/S files.

Managed Instance

-----------------

Managed Instance is also an example of a PaaS offering, but it differs from Azure SQL DB in the following ways.

-- In Managed Instance we have access to about 99% of the features of SQL Server.

-- In Managed Instance we can access all the system databases (master, model, tempdb, msdb).

-- We cannot access the database files, the O/S, or O/S files.

-- We cannot configure full backup + differential backup + transaction log backup ourselves.

-- We can configure a copy-only backup.

-- In Managed Instance, SQL Server Agent jobs, operators, proxies, credentials, SQL Server logins, replication, and SSIS package deployment are available.

SaaS (Software as a Service) : - Ready-to-use software that is fully managed by the provider. In terms of SQL databases, Azure SQL DB and Managed Instance are PaaS offerings; typical examples of SaaS are:

-- Outlook email

-- Gmail


--------------------------------------------------------------

--------------------------------------------------------------

SQL (Structured Query Language)

--------------------

DDL (Data Definition Language) : - Create/Alter/Drop/Truncate

DML (Data Manipulation Language) : - Insert/Update/Delete/Merge/Select

DCL (Data Control Language) : - Grant/Revoke/Deny

TCL (Transaction Control Language) : - Commit/Rollback/Savepoint

-- DDL : - Create/Alter/Drop/Truncate

-- Creating a table

create table Test(id int, ename varchar(10), sal int, deptno int)

select * from Test

-- Adding a column to the table

alter table Test
add job varchar(10)

select * from Test

-- Dropping a column

alter table Test
drop column job

select * from Test

-- Dropping a table

drop table Test

-- This select now fails because the table no longer exists

select * from Test

-- DML (Data Manipulation Language) : -

create table Test(id int, ename varchar(10), sal int, deptno int)

-- Inserting data into the table

insert into Test(id, ename, sal, deptno)
values (101, 'Smith', 2342, 10),
       (102, 'Martin', 2342, 20),
       (103, 'King', 2422, 30)

select * from Test

-- Updating table data

--begin Tran

update Test
set sal = 9000
where id = 101

select * from Test

-- Deleting data

delete from Test where id = 101

select * from Test

-- Difference between Truncate and Delete

-- Delete can be rolled back, and it does not release the space immediately

-- Delete can be executed on a single row, but Truncate always works on the entire table

truncate table Test -- DDL statement; in Oracle it auto-commits, while in SQL Server it can still be rolled back inside an explicit transaction

-- Truncate is usually much faster than Delete: Delete does not free the pages, whereas Truncate deallocates them and releases the space immediately

select * from Test

-- DCL (Data Control Language) : - Grant/Revoke/Deny

-- TCL (Transaction Control Language) : - Commit/Rollback/Savepoint
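The TCL commands can be tried out on a small table. The sketch below is an illustration only: it uses SQLite from Python instead of SQL Server, and the savepoint name after_first_delete is made up. SQLite spells the commands "savepoint" and "rollback to"; T-SQL uses "save transaction" and "rollback transaction" with a savepoint name for the same idea.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that
# BEGIN / SAVEPOINT / ROLLBACK / COMMIT are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("create table Test (id int, ename text, sal int, deptno int)")
conn.execute(
    "insert into Test values (101, 'Smith', 2342, 10), (102, 'Martin', 2342, 20)"
)


def count_rows():
    return conn.execute("select count(*) from Test").fetchone()[0]


conn.execute("begin")
conn.execute("delete from Test where id = 101")
conn.execute("savepoint after_first_delete")
conn.execute("delete from Test where id = 102")
rows_inside_tran = count_rows()                  # both deletes are visible here
conn.execute("rollback to after_first_delete")   # undo only the second delete
rows_after_savepoint_rollback = count_rows()
conn.execute("rollback")                         # undo the whole transaction
rows_after_rollback = count_rows()

conn.execute("begin")
conn.execute("delete from Test where id = 101")
conn.execute("commit")                           # the delete is now permanent
rows_after_commit = count_rows()
print(rows_inside_tran, rows_after_savepoint_rollback, rows_after_rollback, rows_after_commit)
```

A deleted row comes back after a rollback and stays gone after a commit, which is the behaviour the Delete notes above rely on.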

Indexes : - An index is very useful for fetching data as fast as possible.

Types of Indexes

------------------

Clustered Index : - It stores the data in ascending key order and keeps the data rows at the leaf-level pages. It follows the B-tree structure (DP-300). It is a rowstore: all columns of a row are stored together on the same page.

Columnstore Index : - It stores each column in separate pages and in compressed form, which is why it is much faster to read for queries that touch only a few columns of many rows.

Non-clustered Index : - It is also stored in a B-tree structure, and its leaf level points to the clustered index key (or to the row itself when the table has no clustered index).

Clustered Columnstore Index

Non-clustered Columnstore Index

Virtual Machine

----------------

When creating a virtual machine we can choose between Azure Availability Zones and Availability Sets.

Availability Zone : - We have up to 3 zones in a single region

Zone1(Datacenter1)

Zone2(Datacenter2)

Zone3(Datacenter3)

Availability Sets : - We have multiple nodes in the same datacenter.

The nodes have separate power, cooling, network, etc. We can rely on these nodes while we perform patching or upgrades, or for high availability.

All Open Source Databases come under the PaaS offering.

--------------------------------------------------------

Functionality (service and management of the database) is almost the same as in Azure SQL Database.

Elastic Pool

--------------

DB1 : - This database is used in the daytime and occupies 2 CPUs (9-5)

DB2 : - This database is used at night and occupies 2 CPUs (8-4)

2 databases and 4 CPUs allocated.

Elastic Pool (2 CPU)

---------------

We allocate resources to the elastic pool, and the elastic pool gives the CPU and other resources to the database that needs them most.

DB1 : - It will get about 1.5 of the 2 CPUs in the daytime.

DB2 : - It will get about 1.5 of the 2 CPUs at night.

-- Both databases must not be at peak utilization at the same time.
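The saving can be checked with simple arithmetic. The sketch below is a back-of-the-envelope model, not an Azure sizing tool: the hourly demand figures (1.5 CPUs when busy, 0.5 when idle) are assumptions chosen to match the example, and PROVISIONED_PER_DB is the 2 CPUs each standalone database was given.

```python
# Hypothetical hourly CPU demand for two databases over 24 hours.
# DB1 is busy during the day (09:00-17:00), DB2 at night (20:00-04:00).
PROVISIONED_PER_DB = 2  # CPUs given to each standalone database in the notes


def demand(busy_hours, peak=1.5, idle=0.5):
    return [peak if hour in busy_hours else idle for hour in range(24)]


db1 = demand(set(range(9, 17)))
db2 = demand(set(range(20, 24)) | set(range(0, 4)))

# Standalone: every database is provisioned separately.
standalone_cpus = 2 * PROVISIONED_PER_DB

# Pooled: the pool only has to cover the largest combined demand in any one hour.
pool_cpus = max(a + b for a, b in zip(db1, db2))
print(standalone_cpus, pool_cpus)
```

Because the two peaks never overlap, a 2-CPU pool covers both databases. If both were busy in the same hour, the combined demand would rise to 3 CPUs and the pool would be too small.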


In Virtual Machine : - We have automated backup, so we can configure automated backups for SQL Server, which are stored in the storage account for up to 10 years.

====================================================

====================================================

Module03

--------

Data comes in all shapes and sizes and can be used for a large number of purposes.

Azure Table : - Azure Table can be used to implement the NoSQL key-value model. In this model (Azure Table), the data for an item is stored as a set of fields.

What is Azure Table Storage

-----------------------------

This is a non-relational data management system. In Azure Table Storage, items are referred to as rows and fields are called columns.

Use Case and management of Azure Table Storage

-----------------------------------------------

-- Row insertion is very fast.

-- Data retrieval is fast if we specify the partition key and row key as query criteria.

-- A table can hold semi-structured data.

-- It is schema free and relationship free, which is why it is much faster.

-- It is simple to scale: it takes the same time to insert data into an empty table as into a table with billions of entries.

Disadvantage

------------

-- Consistency needs to be given consideration, as transactional updates across multiple entities are not guaranteed.

-- It is difficult to filter and sort on non-key data; queries that search on non-key fields can result in full table scans.
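Why key lookups are fast and non-key filters are slow can be seen in a toy model. The sketch below is plain Python, not the Azure Tables SDK: a dictionary keyed by (PartitionKey, RowKey) plays the role of the table, and all function names and sample entities are made up for illustration.

```python
# A toy in-memory model of a key-value table: entities are addressed by
# (PartitionKey, RowKey). This is an illustration, not the Azure SDK.
table = {}


def insert_entity(partition_key, row_key, **fields):
    # Schema free: each entity may carry a different set of fields.
    table[(partition_key, row_key)] = fields


def point_lookup(partition_key, row_key):
    # Both keys known: a single dictionary access, independent of table size.
    return table.get((partition_key, row_key))


def filter_by_field(name, value):
    # Non-key filter: every entity has to be inspected (a full table scan).
    return [key for key, fields in table.items() if fields.get(name) == value]


insert_entity("dept10", "101", ename="Smith", sal=2342)
insert_entity("dept20", "102", ename="Martin", sal=2342, job="Sales")
insert_entity("dept30", "103", ename="King", sal=2422)

smith = point_lookup("dept10", "101")
same_salary = filter_by_field("sal", 2342)
print(smith, same_salary)
```

The second entity carries an extra job field, which a fixed relational schema would not allow without altering the table.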

Azure BLOB Storage

----------------------

Block Blobs : - Each block in a block blob can vary in size, up to 100 MB. A block blob can contain up to 50,000 blocks, giving a maximum size of over 4.7 TB.

Page Blobs : - A page blob is organized as a collection of fixed-size 512-byte pages. A page blob is optimized to support random read and write operations.

Azure mainly uses page blobs to implement virtual disk storage for virtual machines.

Append Blobs : - An append blob supports append operations: we can only add blocks to the end of an append blob; updating or deleting existing blocks is not supported.

The maximum capacity of an append blob is about 195 GB.
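The size limits above follow from simple multiplication, which the short calculation below reproduces. The block and page sizes are the ones quoted in these notes; the 4 MiB append-block size is an assumption added here because it is the value that makes 50,000 blocks come out at roughly 195 GiB.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

# Limits quoted in the notes above.
MAX_BLOCKS = 50_000
MAX_BLOCK_SIZE = 100 * MIB
PAGE_SIZE = 512

# Block blob: number of blocks times the largest block.
max_block_blob_tib = MAX_BLOCKS * MAX_BLOCK_SIZE / TIB


def pages_needed(size_in_bytes):
    # Page blobs are written in whole 512-byte pages, so round up.
    return -(-size_in_bytes // PAGE_SIZE)


# Assumption: an append blob block of at most 4 MiB.
max_append_blob_gib = MAX_BLOCKS * 4 * MIB / GIB

print(round(max_block_blob_tib, 2), pages_needed(1000), max_append_blob_gib)
```

A 1000-byte write to a page blob therefore occupies two pages (1024 bytes), since pages cannot be partially allocated.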

Types of Disk

---------------

Standard HDD Disk : - Latency of more than 20 ms. This type of storage is good for backup files.

Standard SSD Disk : - Latency of 10-20 ms. This type of storage is good for test and dev.

Premium SSD Disk : - Latency of 5-10 ms. This type of storage is good for a SQL Server configuration (storing data files and system data files).

Ultra Disk : - Latency of 1-2 ms, for a fully configured SQL Server. We should store TempDB and the transaction log on the Ultra Disk.

Use cases and management benefits of using Azure Blob Storage

-------------------------

-- Streaming video and audio

-- Storing files for distributed access

-- Storing data for analysis by an on-premises or Azure hosted service

File Share

------------

A file share enables us to store a file on one computer and grant access to that file to users and applications running on other computers. This approach can work well for computers in the same local area network, but it doesn't scale well as the number of users increases.

Azure File Share Storage

--------------------------

Azure File Share storage lets us create file shares in the cloud and access them from anywhere with an internet connection.

The application can be running on-premises or in the cloud. We can control access to shares in Azure File storage using authentication and authorization methods.

Azure Queue Storage

--------------------

Azure Queue storage can be used to store a large number of messages from any source. We can access these messages from anywhere in the world.

We can use HTTP or HTTPS to access them.

A queue message can be up to 64 KB in size. The number of messages in a queue is limited only by the capacity of the storage account.

Data Lake Storage

------------------

Data Lake Storage is a massively scalable and secure data lake for high-performance analytics workloads. Azure Data Lake Storage was formerly known as, and is sometimes still called, Azure Data Lake Store; the current generation is built on top of a storage account.

Azure Data Lake Storage provides a single storage platform for data analytics, which we can use from Synapse Analytics through PolyBase.

We can store our data in a folder structure along with ACLs.

We can store a far larger amount of data in the data lake store than in an ordinary storage account.

Azure Data Lake Analytics

----------------------------

It is an on-demand analytics platform for big data. Users can develop and run massively parallel data transformation and processing programs in U-SQL.

Two Types of Keys

----------------

1. Vendor-provided key (Azure-provided key)

2. Customer-provided key, which we can also use (BYOK, bring your own key)


Azure Cosmos DB

---------------------

Azure Cosmos DB is a multi-model NoSQL database. We can store semi-structured data in it. Cosmos DB manages data as a partitioned set of documents.

A document is a collection of fields, identified by a key.

Many document databases use the JSON format.

{
"CustomerId" : "101",
"Customername" : "Allen"
}

A document can hold up to 2 MB of data, including small binary objects. If we need to store larger blobs as part of a document, we can use Azure Blob Storage and add a reference to the blob in the document.
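The "reference to the blob" idea can be sketched with a plain JSON document. Everything in the example below is hypothetical: the field names, the placeholder blob URL, and the helper fits_in_document are illustrations, and no Cosmos DB SDK call is involved. The size check measures the serialized JSON text against the 2 MB limit mentioned above.

```python
import json

MAX_DOCUMENT_BYTES = 2 * 1024 * 1024  # the 2 MB limit mentioned above

# Hypothetical customer document; the blob URL is a made-up placeholder.
customer = {
    "id": "101",
    "CustomerId": "101",
    "Customername": "Allen",
    "photo": {"blobUrl": "https://example.invalid/photos/101.jpg"},
}


def fits_in_document(doc):
    # Measure the size of the serialized JSON text in bytes.
    return len(json.dumps(doc).encode("utf-8")) <= MAX_DOCUMENT_BYTES


serialized = json.dumps(customer, indent=2)
small_enough = fits_in_document(customer)

# Embedding a large payload directly would exceed the limit.
too_big = fits_in_document({"id": "x", "payload": "a" * (3 * 1024 * 1024)})
print(serialized)
print(small_enough, too_big)
```

Storing only the URL keeps the document small, while the photo itself lives in Blob Storage.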

DP-420 (Cosmos DB)

https://www.examtopics.com/exams/microsoft/dp-900/ (For certification)

Module04

--------

Data Warehouse : - A data warehouse gathers data from many different sources within an organization. This data is then used as the source for analysis.

ETL (Extract, Stage and Transform, Load)

ELT (Extract, Load and Transform)

Data Warehouse : - This is a relational database management system used for OLAP (Online Analytical Processing). A data warehouse has to handle big data. Big data is the term used for large quantities of data collected in escalating volumes.

Normally we ingest the data from the different sources, then stage the data in the staging area, and finally load the data into the data warehouse.

-- If it is an on-prem environment

SSIS

-- If it is Azure cloud

Azure Data Factory

Azure Synapse Analytics

Azure Databricks cluster

Combined Batch and stream processing

----------------------------------------

Data Lake

------------

A data lake is a repository for large quantities of raw data. Because the data is raw and unprocessed, it is very fast to load and update.

-- Data Lake Storage organizes our files into directories and subdirectories for improved file organization. Blob storage can only mimic a directory structure.

-- Data Lake Storage supports Portable Operating System Interface (POSIX) file and directory permissions.

-- Azure Data Lake Storage is compatible with the Hadoop Distributed File System (HDFS).

-- Azure Data Lake Storage can be used from Azure Synapse Analytics.

Azure Synapse Analytics

--------------------------

Azure Synapse Analytics is an analytics engine. It is designed to process very large amounts of data very fast.

We can ingest data from external sources such as flat files, Azure Data Lake, or other database management systems, and then transform and aggregate the data as well.

ETL : - Extract, Transform and Load

E : - Extracting the data from different sources (the source can be Azure SQL DB, an on-prem SQL Server, a CSV file, Oracle, MySQL, or a website).

T : - Transforming the data: removing duplicate values, inconsistent values, invalid values, and nulls from the data.

(Data Quality Services and Master Data Services)

The staging area can be a database or any other location.

L : - Loading the data into the data warehouse (the target database system).

ELT : - Extract, Load and Transform

E : - Extracting the data from different sources (the source can be Azure SQL DB, an on-prem SQL Server, a CSV file, Oracle, MySQL, or a website).

L : - Loading the data into the data warehouse first.

T : - After loading the data into the data warehouse, we transform it there: removing duplicate values, inconsistent values, invalid values, and nulls from the data.

(Data Quality Services and Master Data Services)
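The ELT order (load first, transform inside the warehouse) can be sketched with plain SQL. In the example below an in-memory SQLite database stands in for the data warehouse, and the table names emp_raw and emp_clean, the sample rows, and the cleaning rules are all assumptions made for illustration.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")  # stand-in for the data warehouse

# E + L: load the extracted rows as they are into a raw table.
raw_rows = [
    (101, "Smith", 2342),
    (101, "Smith", 2342),   # duplicate
    (102, "Martin", None),  # null salary
    (103, "King", -5),      # invalid salary
    (104, "Allen", 2422),
]
warehouse.execute("create table emp_raw (id int, ename text, sal int)")
warehouse.executemany("insert into emp_raw values (?, ?, ?)", raw_rows)

# T: transform inside the warehouse with plain SQL.
warehouse.execute(
    """
    create table emp_clean as
    select distinct id, ename, sal
    from emp_raw
    where sal is not null and sal > 0
    """
)
clean_rows = warehouse.execute("select * from emp_clean order by id").fetchall()
raw_count = warehouse.execute("select count(*) from emp_raw").fetchone()[0]
print(raw_count, clean_rows)
```

The raw table keeps every loaded row, so the transformation can be rerun or changed later without extracting the data again, which is the main practical difference from ETL.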

SSIS : - SQL Server Integration Services.

This is the on-premises way to perform ETL (using Visual Studio to create a pipeline).

Power BI Desktop

Power BI Service

Power BI Apps

Azure Data Factory Uses

---------------------------

1. Creating a storage account -- Done

2. Uploading the emp.csv file -- Done

3. Creating an Azure SQL Database -- Done

4. Creating a blank table in Azure SQL DB -- Done

5. Creating an Azure Data Factory -- Done

6. Creating a pipeline to ingest data from the storage account into Azure SQL DB. --
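What the pipeline in step 6 does can be imitated locally to make the data flow concrete. The sketch below is not Azure Data Factory code: a string stands in for emp.csv in the storage account, an in-memory SQLite database stands in for Azure SQL DB, and the column layout of emp.csv and the function copy_csv_to_table are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Stand-in for emp.csv in the storage account (hypothetical contents).
EMP_CSV = (
    "id,ename,sal,deptno\n"
    "101,Smith,2342,10\n"
    "102,Martin,2342,20\n"
    "103,King,2422,30\n"
)


def copy_csv_to_table(csv_text, conn, table="emp"):
    """Copy every CSV row into an existing, empty table; return the row count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        (int(r["id"]), r["ename"], int(r["sal"]), int(r["deptno"])) for r in reader
    ]
    conn.executemany(f"insert into {table} values (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


sink = sqlite3.connect(":memory:")  # stand-in for Azure SQL DB
# Step 4: the blank target table.
sink.execute("create table emp (id int, ename text, sal int, deptno int)")
# Step 6: the copy from source file to sink table.
copied = copy_csv_to_table(EMP_CSV, sink)
total_sal = sink.execute("select sum(sal) from emp").fetchone()[0]
print(copied, total_sal)
```

In Data Factory the same roles are played by a source dataset (the CSV file), a sink dataset (the table), and a copy step that maps the columns between them.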
