Internal and Architecture
Source: Microsoft
Azure Storage and Distribution
Round Robin tables
• Generally used to load staging tables
• Distributes rows evenly across all distributions without additional optimization
• Joins are slow, because they require reshuffling data
• Default distribution type
CREATE TABLE [dbo].[Dates](
    [Date] [datetime2](3) ,
    [DateKey] [decimal](38, 0) ,
    ..
    ..
    [WeekDay] [nvarchar](100) ,
    [Day Of Month] [decimal](38, 0)
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX
,   DISTRIBUTION = ROUND_ROBIN
)
;
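Because round robin is the default, a table created without a DISTRIBUTION option should end up round-robin distributed. The sketch below illustrates this with a hypothetical staging table ([dbo].[Stage_Dates] and its columns are made up for illustration) and then reads the distribution policy back from the sys.pdw_table_distribution_properties catalog view.

```sql
-- Hypothetical staging table: no DISTRIBUTION option is given,
-- so the default distribution type (round robin) applies.
CREATE TABLE [dbo].[Stage_Dates](
    [Date]    [datetime2](3) ,
    [DateKey] [decimal](38, 0)
)
;

-- Read back the distribution policy of the table just created.
SELECT  t.[name]                      AS table_name
,       dp.[distribution_policy_desc] AS distribution_policy
FROM    sys.tables AS t
JOIN    sys.pdw_table_distribution_properties AS dp
        ON t.[object_id] = dp.[object_id]
WHERE   t.[name] = 'Stage_Dates'
;
```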
Hash Distribution Tables
The distribution type determines how Azure SQL Data Warehouse spreads the data across multiple nodes.
Azure SQL Data Warehouse uses up to 60 distributions when loading data into the system.
Good Hash Key
• Distributes rows evenly
• Avoids data skew
• Is used as a join condition
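As a minimal sketch of the points above, a hash-distributed table names its hash key in the DISTRIBUTION option. The table [dbo].[FactSales] and its columns are hypothetical, and [CustomerKey] is only assumed to be evenly spread and commonly used in joins. DBCC PDW_SHOWSPACEUSED then reports rows per distribution, which is one way to look for skew.

```sql
-- Hypothetical fact table; names and types are illustrative only.
CREATE TABLE [dbo].[FactSales](
    [SalesKey]    [bigint]         NOT NULL ,
    [CustomerKey] [int]            NOT NULL ,
    [DateKey]     [int]            NOT NULL ,
    [Amount]      [decimal](18, 2) NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX
,   DISTRIBUTION = HASH([CustomerKey])  -- assumed join column with many distinct values
)
;

-- One result row per distribution; large differences in row counts
-- between distributions suggest data skew on the chosen hash key.
DBCC PDW_SHOWSPACEUSED('dbo.FactSales');
```

If the row counts differ widely, a different hash key (or round robin) may be a better fit for that table.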
What Data Distribution to Use?
Type | Great fit for | Watch out if…
Data types
• Use the smallest data type that will support your data
• The goal is not only to save space but also to move data as efficiently as possible
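To illustrate, the Dates table from the round robin slide could be declared with narrower types. This is a sketch under stated assumptions, not a recommendation for any particular dataset: it assumes [DateKey] is a yyyymmdd integer, that no time-of-day is needed, and that weekday names are short ASCII strings. The table name [dbo].[Dates_Narrow] is hypothetical, and the columns elided in the original are left out.

```sql
CREATE TABLE [dbo].[Dates_Narrow](
    [Date]         [date]        NOT NULL , -- assumes no time portion is needed
    [DateKey]      [int]         NOT NULL , -- assumes yyyymmdd; 4 bytes instead of 17 for decimal(38, 0)
    [WeekDay]      [varchar](10) NOT NULL , -- assumes short ASCII weekday names
    [Day Of Month] [tinyint]     NOT NULL   -- values 1 to 31 fit in 1 byte
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX
,   DISTRIBUTION = ROUND_ROBIN
)
;
```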
Clustered columnstore
Heap
• No compression
• Allows secondary indexes
Clustered B-Tree
• Sorted index on the data
• Fast singleton lookup
• No compression
• Allows secondary indexes
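The index type is chosen in the WITH clause of CREATE TABLE. The following sketch shows a heap and a clustered B-tree index, plus a secondary index on the latter. All table, column, and index names are hypothetical.

```sql
-- Heap: no index and no compression; a hypothetical staging table.
CREATE TABLE [dbo].[Stage_Sales](
    [SalesKey] [bigint]         NOT NULL ,
    [Amount]   [decimal](18, 2) NOT NULL
)
WITH
(
    HEAP
,   DISTRIBUTION = ROUND_ROBIN
)
;

-- Clustered B-tree: rows are kept sorted on [CustomerKey],
-- which suits lookups of a single key value.
CREATE TABLE [dbo].[DimCustomer](
    [CustomerKey] [int]          NOT NULL ,
    [Email]       [varchar](256) NOT NULL
)
WITH
(
    CLUSTERED INDEX ([CustomerKey])
,   DISTRIBUTION = ROUND_ROBIN
)
;

-- Secondary (nonclustered) index on the clustered B-tree table.
CREATE INDEX [IX_DimCustomer_Email] ON [dbo].[DimCustomer] ([Email]);
```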
Table Partitioning
• Table partitions enable you to divide your data into smaller groups
• Improve the efficiency and performance of loading data through partition deletion, switching and merging
• Data is usually partitioned on a date column tied to when the data is loaded into the database
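A minimal sketch of partition switching, assuming a hypothetical fact table partitioned by a yyyymmdd integer [DateKey] and a staging table with the same columns, distribution, and partition boundaries. All names and boundary values are illustrative.

```sql
-- Target table. With RANGE RIGHT and these boundaries:
--   partition 1: DateKey <  20230101
--   partition 2: 20230101 <= DateKey < 20240101
--   partition 3: DateKey >= 20240101
CREATE TABLE [dbo].[FactOrders](
    [OrderKey] [bigint]         NOT NULL ,
    [DateKey]  [int]            NOT NULL ,
    [Amount]   [decimal](18, 2) NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX
,   DISTRIBUTION = HASH([OrderKey])
,   PARTITION ( [DateKey] RANGE RIGHT FOR VALUES (20230101, 20240101) )
)
;

-- Staging table with an identical definition, so partitions line up.
CREATE TABLE [dbo].[FactOrders_Stage](
    [OrderKey] [bigint]         NOT NULL ,
    [DateKey]  [int]            NOT NULL ,
    [Amount]   [decimal](18, 2) NOT NULL
)
WITH
(
    CLUSTERED COLUMNSTORE INDEX
,   DISTRIBUTION = HASH([OrderKey])
,   PARTITION ( [DateKey] RANGE RIGHT FOR VALUES (20230101, 20240101) )
)
;

-- After loading 2023 rows into the staging table, switch its partition 2
-- into the target. The switch is a metadata operation; the target
-- partition is expected to be empty here.
ALTER TABLE [dbo].[FactOrders_Stage] SWITCH PARTITION 2 TO [dbo].[FactOrders] PARTITION 2;
```

Switching in the opposite direction, from the fact table into an empty staging table, is one way to remove an old partition's data without a row-by-row delete.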