Week 2 - The Data Engineering Ecosystem

The document provides an overview of the Data Engineering Ecosystem, highlighting the importance of automated tools, frameworks, and processes in data analytics. It categorizes data into structured, semi-structured, and unstructured types, each with distinct characteristics and storage methods. Additionally, it discusses various data sources, file formats, and programming languages essential for data professionals in managing and analyzing data.

Uploaded by

amine bouzidi

Overview of the Data Engineering Ecosystem

Conclusion
Automated tools, frameworks, and processes for all stages of the data analytics process are part of the Data Engineer’s ecosystem.
It’s a diverse, rich, and challenging ecosystem.

Types of Data

Structured data
Has a well-defined structure

Can be stored in well-defined schemas

Can be represented in a tabular manner with rows and columns


Semi-Structured data
Has some organizational properties but lacks a fixed or rigid schema

Cannot be stored in the form of rows and columns as in databases

Contains tags and elements, or metadata, which are used to group data and organize it in a hierarchy

Unstructured data
Does not have an easily identifiable structure

Cannot be organized in a mainstream relational database in the form of rows and columns

Does not follow any particular format, sequence, semantics, or rules
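To make the three categories concrete, here is a minimal Python sketch that holds similar information in each form. The sample records, field names, and the `describe` helper are hypothetical illustrations and do not come from the course material.

```python
import csv
import io
import json

# Structured: fixed columns, every row has the same fields (hypothetical sample data).
structured_csv = "id,name,city\n1,Ada,London\n2,Linus,Helsinki\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: keys act as tags and nesting gives a hierarchy,
# but the two records do not share one rigid schema.
semi_structured_json = """
[
  {"id": 1, "name": "Ada", "contacts": {"email": "ada@example.com"}},
  {"id": 2, "name": "Linus", "tags": ["kernel", "git"]}
]
"""
records = json.loads(semi_structured_json)

# Unstructured: free text with no schema; any structure has to be inferred.
unstructured_text = "Ada moved to London last year. Linus still lives in Helsinki."


def describe(rows, records, text):
    """Summarize how much structure each representation exposes."""
    return {
        "structured_columns": sorted(rows[0].keys()),
        "semi_structured_keys": [sorted(r.keys()) for r in records],
        "unstructured_word_count": len(text.split()),
    }


summary = describe(rows, records, unstructured_text)
print(summary)
```

The structured rows all expose the same columns, the JSON records expose different keys per record, and the free text exposes nothing beyond its words.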


Conclusion
Structured data is well organized in formats that can be stored in databases, and it lends itself to standard data analysis methods and tools.

Semi-structured data is somewhat organized and relies on meta tags for grouping and hierarchy.

Unstructured data is not conventionally organized in the form of rows and columns in a particular format. In the next video, we will learn about the different types of file structures.

Understanding Different Types of File Formats

Sources of Data

Languages for Data Professionals

Reading: Metadata and Metadata Management

https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0100EN-SkillsNetwork/readings/Reading_Metadata_and_Metadata_Management.md.html?origin=www.coursera.org

Summary and Highlights

A Data Engineer’s ecosystem includes the infrastructure, tools, frameworks, and processes for extracting data, architecting and managing data pipelines and data repositories, managing workflows, developing applications, and managing BI and Reporting tools.
Based on how well-defined the structure of the data is, data can be categorized as:

Structured data, which is well organized in formats that can be stored in databases.

Semi-structured data, which is partially organized and partially free-form.

Unstructured data, which cannot be organized conventionally into rows and columns.

Data comes in a wide variety of file formats, such as delimited text files, spreadsheets, XML, PDF, and JSON, each with its own benefits and limitations.
Data is extracted from multiple data sources, ranging from relational and non-relational databases to APIs, web services, data streams, social platforms, and sensor devices.
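As an illustration of how format affects handling, the sketch below loads the same two records from a delimited text file, a JSON payload (such as an API might return), and an XML document, using only the Python standard library. The sample data, the XML layout, and the function names are assumptions made for this example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample payloads carrying the same two records.
CSV_TEXT = "id,name\n1,Ada\n2,Linus\n"
JSON_TEXT = '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]'
XML_TEXT = (
    "<people>"
    "<person id='1'><name>Ada</name></person>"
    "<person id='2'><name>Linus</name></person>"
    "</people>"
)


def from_csv(text):
    # Delimited text: every value arrives as a string, so types are converted by hand.
    reader = csv.DictReader(io.StringIO(text))
    return [{"id": int(r["id"]), "name": r["name"]} for r in reader]


def from_json(text):
    # JSON: numbers and strings are already typed after parsing.
    return [{"id": int(r["id"]), "name": r["name"]} for r in json.loads(text)]


def from_xml(text):
    # XML: values live in attributes and child elements of a tag hierarchy.
    root = ET.fromstring(text)
    return [
        {"id": int(p.get("id")), "name": p.findtext("name")}
        for p in root.findall("person")
    ]


print(from_csv(CSV_TEXT))
print(from_json(JSON_TEXT))
print(from_xml(XML_TEXT))
```

Each parser normalizes its input to the same list of dictionaries, which is roughly what a staging step does before data lands in a repository.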

Once the data is identified and gathered from different sources, it needs to be staged in a data repository so that it can be prepared for analysis. The type, format, and sources of data influence the type of data repository that can be used.
Data professionals need a host of languages that can help them extract, prepare, and analyse data. These can be classified as:

Querying languages, such as SQL, used for accessing and manipulating data from databases.

Programming languages, such as Python, R, and Java, for developing applications and controlling application behavior.

Shell and scripting languages, such as Unix/Linux Shell and PowerShell, for automating repetitive operational tasks.
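The first two categories often work together: a programming language controls the application while a querying language retrieves the data. The following is a small sketch using Python's built-in `sqlite3` module with an in-memory database; the `orders` table, its rows, and the `top_customers` function are hypothetical and exist only for illustration.

```python
import sqlite3


def top_customers(min_total):
    """Return (customer, total) pairs whose summed order amount is at least min_total."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        conn.executemany(
            "INSERT INTO orders VALUES (?, ?)",
            [("ada", 20.0), ("linus", 5.0), ("ada", 15.0)],
        )
        # SQL does the accessing and aggregating; Python only passes a parameter.
        cursor = conn.execute(
            "SELECT customer, SUM(amount) AS total "
            "FROM orders GROUP BY customer "
            "HAVING SUM(amount) >= ? "
            "ORDER BY total DESC",
            (min_total,),
        )
        return cursor.fetchall()
    finally:
        conn.close()


print(top_customers(10))
```

A shell or PowerShell script could then schedule such a program to run repeatedly, which is the kind of operational automation the third category covers.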

Quiz
Practice Quiz
Question 1

Automated tools, frameworks, and processes for all stages of the data analytics process are part of the Data Engineer’s ecosystem. What role do data integration tools play in this ecosystem?

Store high-volume day-to-day operational data in data repositories

Cover the entire journey of data from source to destination

Combine data from multiple sources into a unified view that is accessed by data consumers to query and manipulate data

Conduct complex data analytics

Question 2
Which of these data sources is an example of semi-structured data?

Documents

Social media feeds

Emails

Network and web logs

Question 3
Which one of the provided file formats is commonly used by APIs and Web Services to return data?

XML

Delimited file

JSON

XLS

Question 4
What is one example of a relational database discussed in the video?

Spreadsheet

XML

Flat files

SQL Server

Question 5
Which of the following languages is one of the most popular querying languages in use today?

SQL

Java

Python

Graded Quiz
Question 1
There are two main types of data repositories – Transactional and Analytical. For high-volume day-to-day operational data such as banking transactions, Transactional, or OLTP, systems are the ideal choice.

True

False

Transactional, or OLTP, systems are designed and optimized for handling high-volume transactions.

Question 2
Which of the following is an example of unstructured data?

Zipped files

Video and Audio files

XML

Spreadsheets

Question 3
Which one of these file formats is independent of software, hardware, and operating systems, and can be viewed the same way on any device?

XML

XLSX

PDF

Delimited text file

PDF format is independent of software, hardware, and operating systems, and can be viewed the same way on any device.

Question 4
Which data source can return data in plain text, XML, HTML, or JSON among others?

APIs

Delimited text file

XML

PDF

APIs can return data in a wide variety of formats such as plain text, XML, HTML, or JSON among others.

Question 5
In the data engineer’s ecosystem, languages are classified by type. What are shell and scripting languages most commonly used for?

Manipulating data

Building apps

Automating repetitive operational tasks

Querying data
