
Unit 4 Databases, Cloud & Snowflake: Prof. Thushara Weerawardane

Snowflake is a cloud-based data warehouse that allows organizations to store and analyze their data without having to manage the underlying infrastructure. It runs entirely on cloud platforms like AWS and Azure, eliminating the need for hardware and complex software management. Snowflake uses a centralized data repository that can be accessed from compute nodes, providing scalability. It offers features like SQL support, a web GUI, security, and connectors to load and analyze large volumes of data with no upfront costs or long-term commitments.

UNIT 4

DATABASES, CLOUD & SNOWFLAKE

PROF. THUSHARA WEERAWARDANE


Outcomes

¡ Define database types and categories

¡ Describe modern database architectures

¡ Study the progression of modern database platforms

¡ Explore applications and use cases of the Snowflake cloud system


Structured Vs Unstructured Data

¡ Structured Data ?
¡ Structured data is neat, has a known schema, and fits into fixed fields in a table.

¡ Unstructured Data ?
¡ Unstructured data has no schema or structure
Key Database Categories

RDBMS ?

Data Warehouse ?

Data Lake ?
Data Warehouse
¡ A data warehouse sits on top of several databases and is used for business
intelligence.
¡ A data warehouse consumes data from all these databases and creates a layer
optimized to perform data analytics.
¡ The schema is applied on import (schema on write).
Why Data Lakes?

¡ Data lakes were created to address the challenges that “big data”
introduced. Those challenges were often identified with the “3 Vs” of big
data: volume, variety and velocity.
Data Lake
¡ A data lake is a centralized repository where
you can store all of your structured, semi-
structured and unstructured data on any scale
for very low cost
¡ Data lake could be used to store raw data as is
without any structure (schema)
¡ There is no need to perform ETL or
transformation jobs on it
¡ You can store many types of data such as
image, text, files, videos, etc.
¡ You can store ML models artifacts, real time
data, and analytics outputs in data lakes.
¡ Processing can be done on export, so the
schema is defined on read.
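
The contrast between a schema applied on import (data warehouse) and a schema defined on read (data lake) can be sketched in a few lines of Python. This is only an illustration under assumed names: `SCHEMA`, `load_into_warehouse`, `load_into_lake` and `read_from_lake` are hypothetical and do not describe any particular product.

```python
import json

# Hypothetical schema used only for this illustration.
SCHEMA = {"id": int, "name": str, "amount": float}

def load_into_warehouse(record, table):
    """Schema on write: validate and convert BEFORE storing."""
    row = {col: cast(record[col]) for col, cast in SCHEMA.items()}
    table.append(row)

def load_into_lake(raw_line, lake):
    """Schema on read: store the raw text as is, with no checks."""
    lake.append(raw_line)

def read_from_lake(lake):
    """Apply the schema only when reading; skip rows that do not fit."""
    rows = []
    for raw_line in lake:
        try:
            record = json.loads(raw_line)
            rows.append({col: cast(record[col]) for col, cast in SCHEMA.items()})
        except (ValueError, KeyError, TypeError):
            continue
    return rows

warehouse_table, lake = [], []
good = '{"id": "1", "name": "Ann", "amount": "9.5"}'
bad = "free-form text with no structure"

load_into_warehouse(json.loads(good), warehouse_table)
load_into_lake(good, lake)
load_into_lake(bad, lake)   # accepted: the lake stores anything

typed_rows = read_from_lake(lake)
```

The lake accepts both lines at load time; the unstructured line is only rejected later, when a reader imposes the schema.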
ETL vs ELT
Extract Transform Load (ETL)

[Figure: Raw Data → Extract → Transform → Load → Insight]
Extract Transform Load (ETL) cont.

Extract
- From sources
- Structured data
- Unstructured data

Transform
- Data cleaning/organizing
- Single system format
- Improving data quality

Load
- Data sent to warehouse
- Batch load
- Incremental loading
- Full loading
Logical and Physical extraction
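
The three ETL stages can be sketched as three small Python functions. This is a minimal sketch, assuming made-up source records and using an in-memory SQLite database as a stand-in for a real warehouse; the names `raw_sources`, `extract`, `transform` and `load` are hypothetical.

```python
import sqlite3

# Hypothetical raw records from source systems (assumed for illustration).
raw_sources = [
    {"customer": " alice ", "amount": "10.50"},
    {"customer": "BOB", "amount": "n/a"},      # bad value, dropped in transform
    {"customer": "Carol", "amount": "7"},
]

def extract(sources):
    """Extract: pull raw records from the source systems."""
    return list(sources)

def transform(records):
    """Transform: clean values and convert them to a single format."""
    cleaned = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # improve data quality by dropping unparseable rows
        cleaned.append((rec["customer"].strip().title(), amount))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned batch into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")   # stand-in for a real warehouse
load(transform(extract(raw_sources)), conn)
loaded = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
```

In an ELT variant the raw records would be loaded first and the cleaning step would run inside the target system afterwards.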
Extract Load Transform (ELT)

Staging and transformation


Source System
Example for ETL
Types of Databases

- Centralized Database
- Relational Database
- Cloud Database
- Distributed Database
- NoSQL Database
- Object Oriented Database
- Network Database
- Hierarchical Database
- Operational Database
- Enterprise Database


Centralized vs Distributed Databases
Centralized
- Better data quality
- Easy data access
- Data consistency
- Lower risk of data management
- Centralized database is large
- Server failures

Distributed
- Data is distributed
- Low cost
- Ideal for big data
- Example: Apache Cassandra, HBase, Ignite
Relational Database

¡ Databases are typically structured with a
defined schema.
¡ Items are organized as a set of tables
with columns and rows.
¡ Columns hold attributes and rows
represent an object or entity.
¡ The database is designed to be transactional;
it is not designed to perform
data analytics.
Properties of Relational Database
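ACID

Atomicity:
A data operation completes either entirely with success or entirely with failure

Consistency:
Value preservation (data stays valid before and after the operation)

Isolation:
Concurrent operations remain isolated from one another

Durability:
A committed data change is permanent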
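
Atomicity is the easiest of these properties to demonstrate. The following is a small sketch using Python's built-in `sqlite3` module; the `accounts` table and the `transfer` helper are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money in one transaction: both updates apply, or neither does."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "a", "b", 30)       # succeeds
failed = transfer(conn, "a", "b", 500)  # would overdraw "a", so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

In the failed transfer the credit to `b` has already been executed when the debit violates the `CHECK` constraint; the rollback undoes that credit, so no half-finished change is left behind.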
SQL vs NoSQL?

SQL | NoSQL

Data uses schemas (defined) | Schema-less (dynamic)

Relations | No (very few) relations

Data is distributed across multiple tables | Data is typically merged / nested in a few collections

Horizontal scaling is difficult/impossible, only vertical scaling | Both horizontal and vertical scaling are possible

Limitation for lots of (thousands) read & write queries per second | Great performance for mass (simple) read & write requests
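
The difference in data modelling can be illustrated with plain Python structures. This is a sketch with made-up data, not the API of any database: the relational style spreads data over linked tables, while the document style nests related data in one record.

```python
# Relational style: data is split across tables and linked by keys.
customers = [{"customer_id": 1, "name": "Ann"}]
orders = [
    {"order_id": 10, "customer_id": 1, "item": "book"},
    {"order_id": 11, "customer_id": 1, "item": "pen"},
]

def orders_for(customer_id):
    """Emulate a join: look up the customer, then collect matching orders."""
    name = next(c["name"] for c in customers if c["customer_id"] == customer_id)
    items = [o["item"] for o in orders if o["customer_id"] == customer_id]
    return {"name": name, "items": items}

# Document style: related data is nested inside one record in one collection.
customer_docs = {
    1: {"name": "Ann", "orders": [{"item": "book"}, {"item": "pen"}]},
}

def orders_from_doc(customer_id):
    """One lookup by key returns everything, with no join needed."""
    doc = customer_docs[customer_id]
    return {"name": doc["name"], "items": [o["item"] for o in doc["orders"]]}

relational_view = orders_for(1)
document_view = orders_from_doc(1)
```

Both functions return the same answer; the document version reads a single record, which is one reason simple reads and writes scale well in NoSQL stores.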
NoSQL Databases

Store a wide range of data sets

¡ Key-value store: Stores every single item as a key-value pair (DynamoDB, Redis, etc.)

¡ Document-oriented database: Stores data as JSON-like documents (MongoDB)

¡ Graph database: Stores vast amounts of data in a graph-like structure (Neo4j)

¡ Wide-column store: Data is stored in large columns (Cassandra, HBase, Bigtable)


Advantages of NoSQL Databases

- Good productivity
- Manage & handle large datasets
- High scalability
- Quick access
Example – Relational and Non-relational Databases
Example AWS Services

Data Ingestion

Data Analyze
Cloud Databases
Data is stored in a virtual environment and is processed using cloud computing. Some examples are:

- AWS (Amazon Web Services)
- Microsoft Azure
- Google Cloud
- Kamatera
- ScienceSoft
- phoenixNAP
Cloud Architectures
Examples of Data Lake Architectures
Example for Traditional Data lake Architecture

An ODS (operational data store) is often an intermediary or staging area for a data warehouse. The ODS differs in that its data is overwritten and changes frequently. In contrast, a
data warehouse contains static data for archiving, storage, historical analysis, and reporting. A data mart serves the same purpose but comprises only one subject area.
Example Data lake Architecture
Example of Big Data Platform Architecture
Example for Modern Data Lake Pattern
Modern data architectures (simple)
Database Progression
RDBMS
(Structured/SQL)

Digital World
(Internet, decentralized, data silos)

Data Warehouse
(Data integration, Audit & Governance)

Big Data
(Distributed, batch & stream, data lakes)

Cloud Adoption
(Agility, Scalability, Cost effective, ease of use)

Cloud Efficiency
(intelligent Efficiencies, better economics)
Logical Data Zones
Modern Directions of Data Zones – Virtual Space
Data Lake on Cloud

- Different processing capabilities
- Single source of truth
- Schema on read
- Flexible and quicker ingestion and consumption
- Decouple storage and compute
- Scalability
- Reliability
- Data security
- Future ready
- Low-cost storage
- Varied data access pattern
Snowflake

Snowflake is a data warehouse built on top of cloud infrastructure
(e.g., AWS and Azure).
It is a SaaS offering that is ideal for organizations that don’t want to dedicate
resources to the setup, maintenance and support of in-house servers.
Why Snowflake ?
Snowflake cont.

- No hardware
- Virtually no software
- Runs completely on cloud infrastructure
- Not a packaged software
- Internal handling of management, maintenance, upgrades & tuning
- Uses virtual compute instances for compute needs
Snowflake Key Features

- Standard & extended SQL
- Web-based GUI
- Command line interface
- Rich set of client connectors
- Bulk loading & unloading of data
- Adequate data protection & security
Data Solution – Enterprise Data
Snowflake cont.
¡ No software, No Hardware, No maintenance. Snowflake is provided as Software-as-
a-Service (SaaS) that runs completely on cloud infrastructure
¡ Snowflake uses a central data repository for persisted data that is accessible from
all compute nodes in the data warehouse. (Shared Disk)
¡ Similar to shared-nothing architectures, Snowflake processes queries using MPP
(massively parallel processing) compute clusters where each node in the cluster
stores a portion of the entire data set locally (Shared Nothing)
¡ Snowflake therefore combines a shared-disk architecture and a shared-nothing
architecture (SDA/SNA) in a multi-cluster architecture.
¡ This approach offers the data management simplicity of a shared-disk architecture,
but with the performance and scale-out benefits of a shared-nothing architecture.
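
The hybrid idea can be sketched in Python as a toy model: one central data set visible to all nodes, with each node working in parallel on its own slice. This is only an analogy under stated assumptions; `CENTRAL_STORAGE`, `partition`, `node_sum` and `mpp_sum` are hypothetical names, and threads stand in for cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Central repository: one copy of the persisted data, visible to every node
# (the shared-disk part). The values here are made up for illustration.
CENTRAL_STORAGE = list(range(1, 101))

def partition(data, n_nodes):
    """Give each compute node its own slice of the data set."""
    return [data[i::n_nodes] for i in range(n_nodes)]

def node_sum(local_slice):
    """Work done by one node on its local portion only (shared-nothing part)."""
    return sum(local_slice)

def mpp_sum(n_nodes):
    """Run the partial sums in parallel, then combine the partial results."""
    slices = partition(CENTRAL_STORAGE, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(node_sum, slices))
    return sum(partials)

total_small = mpp_sum(2)   # a small cluster
total_large = mpp_sum(8)   # a larger cluster gives the same answer
```

Changing the number of nodes changes how the work is divided but not the stored data or the result, which mirrors the scale-out benefit described above.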
Snowflake Architecture
¡ Snowflake’s unique architecture
consists of three key layers:
¡ Cloud Services (The brain of the system)
¡ Query Processing (The muscles of the
system)
¡ Database Storage

¡ Snowflake manages everything for the
customer
¡ Authentication
¡ Configuration
¡ Resource Management
¡ Data Protection
¡ Availability (it has built in redundancy)
¡ Optimization
Database Storage
¡ When data is loaded into Snowflake, Snowflake reorganizes that data into
its internal optimized, compressed, columnar format.
¡ All data is encrypted with AES-256 strong encryption.
¡ Snowflake stores this optimized data in cloud storage.

¡ Snowflake manages all aspects of how this data is stored: the
organization, file size, structure, compression, metadata, statistics, and
other aspects of data storage are handled by Snowflake.
¡ The data objects stored by Snowflake are not directly visible nor accessible
by customers; they are only accessible through SQL query operations run
using Snowflake.
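
Why a columnar, compressed layout suits analytics can be shown with a generic sketch. The code below is not Snowflake's internal format, which customers cannot see; it only illustrates the general idea of pivoting rows into columns and compressing a repetitive column with run-length encoding. All names and data are assumptions.

```python
from itertools import groupby

# Hypothetical rows as they might arrive from a source system.
rows = [
    ("2024-01-01", "LK", 5),
    ("2024-01-01", "LK", 7),
    ("2024-01-01", "US", 3),
    ("2024-01-02", "US", 4),
]

def to_columns(rows, names):
    """Pivot row tuples into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def run_length_encode(values):
    """Compress a column by storing each run as (value, count)."""
    return [(value, len(list(group))) for value, group in groupby(values)]

columns = to_columns(rows, ["day", "country", "qty"])
encoded_day = run_length_encode(columns["day"])
total_qty = sum(columns["qty"])   # an aggregate touches only one column
```

Values in one column tend to repeat, so they compress well, and an aggregate such as a total only needs to read the single column it uses.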
Query Processing

¡ Snowflake lets you create multiple, independent compute clusters for
query processing; these are called virtual warehouses.

¡ They all access the same data source without any contention (unlimited scale).

¡ When a virtual warehouse is resized, all subsequent queries take advantage
of the new resources.
Cloud Services
¡ The services layer is fully maintained by
Snowflake and distributed across
multiple availability zones to ensure high
availability.
¡ The cloud services layer is a collection
of services that coordinate activities
across Snowflake.
¡ These services tie together all of the
different components of Snowflake in
order to process user requests, from
login to query dispatch.
¡ The cloud services layer also runs on
compute instances provisioned by
Snowflake from the cloud provider.

¡ Snowflake service layer: unique
features
¡ Zero-copy cloning, Time Travel, data
sharing
¡ Authentication & session management
¡ Infrastructure management
¡ Metadata management
¡ Query parsing and optimization
¡ Access control
Simplified Architecture
Snowflake as a Data Lake
Cloud Data Security
Authentication
Is the user who they claim to be?
(The process through which a user/process confirms their identity)

Authorization
What data can the user rightfully access?
(The process through which the system grants a user/process the ability to access data
and carry out actions)

Auditability
Later, can we see who accessed what and when?
(The ability to trace and review actions in a
system)
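
The three security questions can be tied together in a short Python sketch. It is a teaching sketch only, with an in-memory user store, role grants and audit log that are all hypothetical; a real system would rely on the platform's own identity and access controls.

```python
import hashlib
import hmac
import os
from datetime import datetime, timezone

def hash_password(password, salt):
    """Derive a salted hash so plain passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

# Hypothetical in-memory user store, role grants, and audit log.
_salt = os.urandom(16)
USERS = {"ann": {"salt": _salt, "hash": hash_password("s3cret", _salt), "role": "analyst"}}
GRANTS = {"analyst": {"sales": {"read"}}, "admin": {"sales": {"read", "write"}}}
AUDIT_LOG = []

def authenticate(user, password):
    """Authentication: is the user who they claim to be?"""
    entry = USERS.get(user)
    if entry is None:
        return False
    candidate = hash_password(password, entry["salt"])
    return hmac.compare_digest(candidate, entry["hash"])

def authorize(user, table, action):
    """Authorization: may this user perform this action on this data?"""
    role = USERS[user]["role"]
    return action in GRANTS.get(role, {}).get(table, set())

def access(user, password, table, action):
    """Check identity, then permission, and record the attempt for auditing."""
    allowed = authenticate(user, password) and authorize(user, table, action)
    AUDIT_LOG.append((datetime.now(timezone.utc), user, table, action, allowed))
    return allowed

read_ok = access("ann", "s3cret", "sales", "read")
write_ok = access("ann", "s3cret", "sales", "write")
bad_login = access("ann", "wrong", "sales", "read")
```

Every attempt, allowed or denied, ends up in `AUDIT_LOG` with a timestamp, which is what makes it possible to answer "who accessed what and when" later.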
ANY QUESTION??
THANK YOU!!
Data Lake Vs Data Warehouse

A Traditional Data Warehouse

¡ A data warehouse is a repository of many kinds of data and is highly
modelled
¡ Data you find in a data warehouse is carefully related to all of the other data
in the warehouse
¡ In addition, data in a warehouse tends to be highly standardized and cleansed

A Traditional Data Lake

¡ A data lake is a centralized repository where you can store all of your
structured, semi-structured and unstructured data on any scale for very
low cost
¡ In a data lake, you are able to store your data “as is,” without needing to
structure it beforehand.
¡ Once stored you can run different types of analytics against it.
