Data Engineering - Session 01

Course Curriculum

• Session 01 – Theory
• Introduction to Enterprise Data, Data Engineering,
Modern Data Applications & Patterns, Data
Frameworks, Components & Best Practices
• Session 02 – Theory & Lab Demos
• Introduction to Data stores: SQL, NoSQL, File Systems,
Data Lakes, Data Warehouses, Data Mesh Cloud Data
Products, Lab Demos of select data stores
• Session 03 – Theory & Lab Demos
• Data Architecture Layers, Data Pipelines,
Transformation, Orchestration, Data Aggregation vs
Federation, Lab Demos of select Data Pipeline
Products
• Session 04 – Theory & In-Class Design
• Data Governance: Data Catalogs, Data Quality,
Lineage, Provenance, Data Security, Regulatory
Compliance, Real-World Application Data Design
• Tutorials
Enterprise Data

Enterprise Data refers to the collection of structured, semi-structured, and unstructured data that is generated, collected, and utilized across an entire organization to support business operations, decision-making, and strategic planning.
Source-Driven Enterprise Data Classification

Internal Enterprise Data
• Operational data: Transaction data about sales and purchases; inventory data regarding raw materials, finished goods and inventory levels; and financial data regarding revenues, costs and profits.
• Human resources data: Employee information including profiles, performance, payroll, attendance and training records.
• Internal infrastructure data: Details on the company's physical assets, properties and IT infrastructure.
• Communication data: Records of internal emails, notes, memos and minutes of meetings.
• Research and development data: Information from research projects, product development stages and testing results.
• Customer data: Contact details, demographic information, purchase history, loyalty program data, customer service feedback and other interactions.

External Enterprise Data
• Market data: Competitor analysis; data on product offerings, pricing and marketing campaigns; and trend analysis of the market, emerging sectors and negative issues.
• Customer data: Social media postings, user reviews and other data generated by customer inputs.
• Economic indicators: Data related to the economy, both domestic and global.
• Regulatory and compliance data: Data relevant to industry regulations, standards and compliance requirements.
• Environmental data: Data concerning environmental conditions pertinent to certain industries and activities.
• Third-party data: Data from, for example, analysts or market research firms.
• Social and news media feeds: Data from news channels, blogs, forums and social media platforms.
• Demographic and geographic data: Information on populations, age groups, cultural nuances and geographic distribution, which is essential for market segmentation and targeting.
Behaviour-Driven Enterprise Data Classification

Data-Sensitivity-Driven Enterprise Data Classification
• Public Data
  Description: Non-sensitive data intended for public use.
  Examples: Company brochures, publicly available reports, press releases.
  Access: Open to everyone, no restrictions.
• Internal Data
  Description: Data used within the organization but not intended for public access.
  Examples: Internal emails, internal process documents, and employee handbooks.
  Access: Accessible to all employees; minimal security controls.
• Confidential Data
  Description: Sensitive data that could harm the organization if exposed.
  Examples: Business strategies, customer data, internal financial reports.
  Access: Restricted to authorized employees; requires moderate security measures.
• Restricted/Sensitive Data
  Description: Highly sensitive data that, if compromised, could result in severe damage to the organization.
  Examples: Personally Identifiable Information (PII), trade secrets, financial transactions, intellectual property.
  Access: Strictly limited to a few authorized personnel; highest level of security controls.
• Regulated Data
  Description: Data governed by specific laws, regulations, or industry standards that mandate how it should be handled, stored, and protected.
  Examples: Financial records (SOX), healthcare information (HIPAA), and credit card details (PCI-DSS).
  Requirements: Strict access controls, encryption, regular audits, and compliance reporting.
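The five-tier scheme above can be made operational in code. The sketch below maps each classification level to a set of handling controls; the specific control values (encryption flags, access groups) are illustrative assumptions, not an official standard.

```python
# Hypothetical mapping from classification level to handling controls.
# The five levels follow the scheme above; the control values are
# illustrative assumptions only.
HANDLING_POLICY = {
    "public":       {"encrypt_at_rest": False, "access": "everyone"},
    "internal":     {"encrypt_at_rest": False, "access": "all_employees"},
    "confidential": {"encrypt_at_rest": True,  "access": "authorized_employees"},
    "restricted":   {"encrypt_at_rest": True,  "access": "named_individuals"},
    "regulated":    {"encrypt_at_rest": True,  "access": "named_individuals",
                     "audit_log": True},
}

def controls_for(level: str) -> dict:
    """Return the handling controls for a classification level."""
    try:
        return HANDLING_POLICY[level.lower()]
    except KeyError:
        # Fail closed: an unknown label gets the strictest treatment.
        return HANDLING_POLICY["regulated"]
```

Note the fail-closed default: a record with a missing or unrecognized label is treated as regulated rather than public.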
Key Data Compliance
Frameworks & Regulations
• GDPR (General Data Protection Regulation): Governs data privacy in the EU.
• CCPA (California Consumer Privacy Act): Protects consumer data in California.
• HIPAA (Health Insurance Portability and Accountability Act): Regulates healthcare data in the US.
• PCI-DSS (Payment Card Industry Data Security Standard): Ensures secure handling of credit card
information.
• SOX (Sarbanes-Oxley Act): Mandates financial data integrity for publicly traded companies.
Analyzing Key Enterprise Data Characteristics
Non-Functional Requirements

• Volumes and Scale (incremental) requirements
• Historical data retention requirements
• Computation requirements
• Data Security requirements
• Data Audit requirements
• Data Lineage requirements
• Data Quality requirements
• Data lifecycle management requirements
• Regulatory requirements
• Data Access Patterns (Data Consumers)
• Data Producers (ETL requirements)
• Data Archival & Purging requirements
Let’s review real-world data requirements!
Artifact 01
Introduction to
Data Engineering
Practice of designing, building, and
maintaining the infrastructure, architecture,
and processes that enable the collection,
storage, transformation, and delivery of data
across an organization. It involves creating
data pipelines and workflows that allow raw
data from various sources to be processed
into structured, usable formats for analytics,
reporting, and machine learning.

Importance
Data engineering provides the foundation for data-driven decision-making by ensuring that data is accessible,
reliable, and ready for use by data scientists, analysts, and business stakeholders.
Data Engineering: The
Backbone of Modern
Enterprises

Data engineering is the essential bridge


between raw data and actionable insights.
It empowers organizations to leverage
their data effectively, drive innovation, and
gain a competitive edge in the market.

• Data as a Strategic Asset: In today's digital age, data is the new oil. Effective data engineering is crucial for extracting value from this valuable resource.
• Enabling Data-Driven Decisions: Data engineers build the infrastructure that empowers organizations to make informed decisions across various departments, from marketing to operations.
• Driving Innovation: By providing clean, accessible, and reliable data, data engineering fuels innovation and competitive advantage.
• Overcoming Data Challenges: Data engineers address common challenges such as data quality, scalability, security, and compliance to ensure data integrity and trustworthiness.
• Foundation for Advanced Analytics: Data engineering provides the foundation for advanced analytics techniques like machine learning and artificial intelligence, enabling organizations to uncover hidden insights and predict future trends.
Traditional Applications vs Modern Data-Driven Applications

Advanced Data Patterns

Historically, we had only two kinds of data models: RDBMS and Data Warehouses.

• The RDBMS served as the Operational Data Store, holding the most current data, which is subject to the most change (Create, Update, Delete). These are typically centralized, shared-everything, row-store databases.
• The Data Warehouse served the purpose of data marts, holding years of historical data that changed less often but was used heavily for views and business intelligence. These are typically column stores.

These systems scale vertically only, and processing was mostly offline, batch, or intraday driven.

How do these approaches address today's changing requirements of scale, performance, consistency, availability, etc.?

Time: 3 mins
The Paradigm Shift

• Horizontal vs Vertical Scalability for Data
• Centralized vs Distributed vs Decentralized Data
• Traditional structured stores vs NoSQLs
• Data Aggregation vs Data Federation/Virtualization
• Standard ETLs vs Versatile Data Integration
• Polyglot & Tiered Databases (Fit-for-purpose)
• Data Lakes: Where is my Enterprise Data?
• Bridging Unstructured & Structured Data with AI techniques
• Data-as-a-Service
• What about Data Governance?
Pattern 1: Vertical vs Horizontal Scaling

• Scaling up (vertical scaling) involves adding more resources to a single unit, such as a server or application pod, by adding more memory, CPU, or disk capacity. Scaling up can be easier to manage and more cost-effective because you only need to manage one larger server. However, it is limited by the maximum capacity of a single machine.
• Scaling out (horizontal scaling) involves adding more servers or systems to distribute the workload across multiple machines. This can improve performance and redundancy by using a network of systems. Scaling out is a good choice when a workload needs to be distributed across many nodes, such as when large numbers of independent requests must be served in parallel.
Pattern 2: Centralized vs Distributed vs Decentralized Data Storage Patterns

• Centralized databases have a single point of access, which makes them easier to secure, monitor, and control, and can help reduce errors. However, they may not perform well as server load increases. They are best suited for organizations with specialized roles and standardized procedures.
• Distributed databases spread data across multiple nodes, which can increase reliability and speed. They are designed to scale horizontally by adding more nodes to the network, allowing for increased storage capacity and processing power, and are ideal for applications that require large amounts of data to be processed quickly and efficiently. However, securing data becomes more difficult, as each node needs to be protected.
• Decentralized databases have no single controlling authority and are better suited for organizations with a generalist workforce that performs complex tasks.
• SQLs
• NoSQLs
• File stores
• Distributed Caches
• Storage & Compute

Examples of a few NoSQL, File and Distributed Cache Databases
Polyglot Persistence

Polyglot persistence involves using different data storage technologies to support the unique needs of different types of data within an application.

Benefits
• Flexibility: Different types of data can be adapted to specific requirements.
• Scalability: Different database technologies can handle different scaling requirements.
• Loosely coupled services: Each service can use a different type of database than other services.
• Avoiding monolith applications: Different services can use different data stores to avoid a single database failure taking down the entire business.

E-Commerce Example
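A minimal sketch of the e-commerce case: durable order records go to a relational store while short-lived session state lives in a key-value cache. Here `sqlite3` and a plain dict stand in for, say, PostgreSQL and Redis; the schema and function names are illustrative assumptions.

```python
import sqlite3

# Polyglot persistence sketch: two stores, each fit for purpose.
orders = sqlite3.connect(":memory:")   # relational: transactional order data
orders.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, sku TEXT, qty INTEGER)")

session_cache = {}                     # key-value: ephemeral session state

def place_order(order_id, sku, qty, session_id):
    # The durable fact goes to the relational store...
    orders.execute("INSERT INTO orders VALUES (?, ?, ?)", (order_id, sku, qty))
    orders.commit()
    # ...while the session pointer lives in the fast, disposable cache.
    session_cache[session_id] = {"last_order": order_id}

place_order(1, "SKU-123", 2, "sess-abc")
```

The design point: losing the cache costs only convenience, while the relational store keeps the transactional record, so each store's durability and scaling profile matches the data it holds.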
Multi-Model Databases

A multi-model database is a database system that can store and process different types of data in a single database, instead of requiring multiple specialized databases. This flexibility allows businesses to handle a variety of data types without reformatting data or switching databases. Multi-model databases can be a good fit for dynamic environments where business needs are constantly changing.

Benefits
• Flexibility: By supporting multiple data models, such as relational, document, graph, key-value, and columnar, they allow for diverse data representation within a single database.
• Unified Query Language: A unified query language to access and manipulate different data models, simplifying interactions and application development.
• Data Consistency: Maintaining consistency across multiple single-model databases can be challenging. Multi-model databases can ensure ACID (Atomicity, Consistency, Isolation, Durability) properties across various data models.
• Reduced Data Redundancy: By centralizing various data models, multi-model databases can reduce the redundancy that might arise from storing similar data in multiple databases.
• Simplified Architecture: Reduces the need to manage and integrate multiple standalone databases, leading to a more simplified architecture.
• Reduced Complexity: Multi-model databases can simplify data integration between different parts of an application or between different applications, as the data is stored in a common database.
• Cost Effective: Operating and maintaining multiple database systems can be expensive. By consolidating these into one multi-model database, organizations can realize significant operational and infrastructure savings.
A few examples of Multi-Model Databases
Data Duplication vs Deduplication

Data duplication: The same data is stored in multiple locations within a system or across different systems. This can lead to inconsistencies, inefficiencies, and increased storage costs.

Data deduplication: The process of identifying and removing duplicate data. It involves comparing data blocks or chunks to find identical copies and then storing only a single instance, while maintaining references to the duplicates. This can significantly reduce storage requirements and improve data management efficiency.

Benefits of Data Deduplication
• Reduced storage costs: By eliminating duplicate data, organizations can significantly reduce their storage requirements and associated costs.
• Improved performance: Deduplication can improve system performance by reducing the amount of data that needs to be processed and stored.
• Enhanced data integrity: Deduplication helps ensure data consistency and accuracy by eliminating conflicting copies.
• Increased data availability: Deduplicated data can be more easily accessed and retrieved, improving data availability and reducing downtime.
Key Deduplication Methods

Source: Oracle
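The block-level method described above can be sketched in a few lines: split data into chunks, store each unique chunk once under its content hash, and keep a list of hash references per object. Fixed-size chunking is the simplest variant; real systems often use variable-size (content-defined) chunking instead. The chunk size here is artificially small for illustration.

```python
import hashlib

CHUNK_SIZE = 4            # artificially small, for illustration
store = {}                # hash -> chunk bytes; each unique chunk stored once

def dedup_write(data: bytes) -> list[str]:
    """Store data chunk-by-chunk; return hash references for reassembly."""
    refs = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        h = hashlib.sha256(chunk).hexdigest()
        store.setdefault(h, chunk)   # duplicate chunks are not stored again
        refs.append(h)
    return refs

def dedup_read(refs: list[str]) -> bytes:
    """Reassemble the original data from its chunk references."""
    return b"".join(store[h] for h in refs)
```

Writing `b"ABCDABCDABCD"` produces three references but only one stored chunk, which is exactly the storage saving the benefits list describes.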
Data Aggregation
Data aggregation is the process of collecting and organizing data
from multiple sources into a single format for analysis and
decision-making. It can be applied at any scale, from pivot
tables to data lakes.

Key Areas of Focus


Improved Efficiency and Data Quality
Better Decision-Making
Integrating Different Types of Data
Producing Quality Results
Ensuring Legal, Regulatory, and Privacy Compliance

Source: brightdata.com
Data Federation

• It is a technique that integrates data from multiple sources by executing queries across the sources and combining the results. Data federation uses a federated query engine that translates the user or application queries into subqueries that are sent to the source systems and then merges the subquery results into a final output. Data federation allows for flexible and scalable data integration, preserves the autonomy and security of the source systems, and supports complex queries and transformations.

Data Virtualization

• It is a technique that creates a unified view of data from multiple sources without physically moving or copying the data. Data virtualization uses a middleware layer that connects to the source systems and provides a virtual schema that can be queried by the users or applications. Data virtualization enables real-time access to the latest data, reduces data duplication and storage costs, and simplifies data management and governance.
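The federation pattern can be sketched with two in-memory SQLite databases standing in for independent source systems: the "engine" pushes a subquery to each source and merges the results, without ever copying data into a central store. The schemas and data are illustrative assumptions.

```python
import sqlite3

# Source system 1: a hypothetical CRM holding customer names.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])

# Source system 2: a hypothetical billing system holding invoices.
billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, total REAL)")
billing.executemany("INSERT INTO invoices VALUES (?, ?)",
                    [(1, 99.0), (1, 1.0), (2, 40.0)])

def federated_totals():
    """Push one subquery to each source, then join the merged results."""
    names = dict(crm.execute("SELECT id, name FROM customers"))
    totals = dict(billing.execute(
        "SELECT customer_id, SUM(total) FROM invoices GROUP BY customer_id"))
    return {names[cid]: amount for cid, amount in totals.items()}
```

Note that the aggregation (SUM) is pushed down to the billing source; only the small subquery results cross the wire, which is the key efficiency argument for federation over bulk copying.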
Data Engineering Components
Data Ingestion
• Definition: The process of collecting and importing data from various sources into a centralized data repository.
• Sources: Databases, APIs, IoT devices, social media, cloud storage.
• Tools: Apache NiFi, Apache Kafka, Flume, AWS Glue.
• Considerations: Scalability, data format (structured, semi-structured, unstructured), security.

Data Storage
• Definition: Storing data in a way that is accessible, scalable, and secure for future processing and analysis.
• Types: Data Warehouses (e.g., Amazon Redshift, Google BigQuery), Data Lakes (e.g., Azure Data Lake, AWS S3), NoSQL Databases (e.g., MongoDB, Cassandra).

Data Processing & Transformation
• Definition: Converting raw data into a structured, analyzable format by cleaning, transforming, and enriching it.
• Techniques: ETL (Extract, Transform, Load), ELT (Extract, Load, Transform).
• Tools: Apache Spark, Apache Beam, Talend, Dataflow, Databricks.

Data Integration
• Definition: Combining data from different sources into a unified view to create a complete, consistent data set.
• Methods: Batch Processing, Real-Time Streaming, Data Virtualization.
• Tools: Apache Camel, MuleSoft, Informatica, Stitch.

Data Quality & Governance
• Data Quality: Ensures that data is accurate, consistent, complete, and reliable.
• Data Governance: Establishes policies, standards, and procedures for data management across the organization.
• Tools: Great Expectations, Talend Data Quality, Collibra, Alation.

Data Orchestration
• Definition: Coordinating and managing the execution of data workflows, ensuring that data moves through the pipelines smoothly and efficiently.
• Tools: Apache Airflow, Luigi, Prefect, Control-M.

Data Security & Compliance
• Definition: Protecting data from unauthorized access and breaches, and ensuring compliance with regulations (e.g., GDPR, CCPA).
• Components: Data encryption, access controls, data masking, and auditing.
• Tools: AWS Identity and Access Management (IAM), Azure Active Directory, Vault by HashiCorp.

Data Monitoring & Observability
• Definition: Tracking data flow, identifying issues, and ensuring data pipelines function correctly.
• Tools: Prometheus, Grafana, Splunk, Datadog.

Data Access & Delivery
• Definition: Providing data in a consumable format to end-users, analysts, data scientists, and applications.
• Methods: APIs, SQL queries, dashboards, data exports.
• Tools: Looker, Tableau, Power BI, Jupyter Notebooks.
Q&A
