Week 2 - The Data Engineering Ecosystem

The document provides an overview of the Data Engineering Ecosystem, highlighting the importance of automated tools, frameworks, and processes in data analytics. It categorizes data into structured, semi-structured, and unstructured types, each with distinct characteristics and storage methods. Additionally, it discusses various data sources, file formats, and programming languages essential for data professionals in managing and analyzing data.

Uploaded by

amine bouzidi

Overview of the Data Engineering Ecosystem

Conclusion
Automated tools, frameworks, and processes for all stages of the data analytics process are part of the Data Engineer’s
ecosystem.
It’s a diverse, rich, and challenging ecosystem.

Types of Data

Structured data
Has a well-defined structure

Can be stored in well-defined schemas

Can be represented in a tabular manner with rows and columns


Semi-Structured data
Has some organizational properties but lacks a fixed or rigid schema

Cannot be stored in the form of rows and columns as in databases

Contains tags and elements, or metadata, which are used to group data and organize it in a hierarchy

Unstructured data
Does not have an easily identifiable structure

Cannot be organized in a mainstream relational database in the form of rows and columns

Does not follow any particular format, sequence, semantics, or rules
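
The three categories above can be illustrated with a minimal Python sketch. The sample values, field names, and the `field_names` helper are hypothetical and chosen only for illustration; they do not come from the course material.

```python
import csv
import io
import json

# Structured: fixed columns, and every row follows the same schema.
structured_csv = "id,name,city\n1,Ana,Lisbon\n2,Ben,Oslo\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: keys (tags) group the data into a hierarchy,
# but the records do not have to share the same fields.
semi_structured_json = (
    '[{"id": 1, "name": "Ana", "contact": {"email": "ana@example.com"}},'
    ' {"id": 2, "name": "Ben", "tags": ["vip"]}]'
)
records = json.loads(semi_structured_json)

# Unstructured: free text with no schema and no tags to group it.
unstructured_text = "Ana called on Monday about her order; Ben will follow up."


def field_names(record_list):
    """Return the sorted field names used by each record."""
    return [sorted(record.keys()) for record in record_list]
```

Calling `field_names(rows)` gives the same field list for every row, while `field_names(records)` gives a different list per record. That difference is what separates a fixed schema from a flexible one. The free text has no fields to list at all.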


Conclusion
Structured data is well organized in formats that can be stored in databases, and it lends itself to standard data analysis methods and tools.

Semi-structured data is somewhat organized and relies on meta tags for grouping and hierarchy.

Unstructured data is not conventionally organized in the form of rows and columns in a particular format.

In the next video, we will learn about the different types of file structures.

Understanding Different Types of File Formats

Sources of Data

Languages for Data Professionals

Reading: Metadata and Metadata Management

https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBM-DB0100EN-SkillsNetwork/readings/Reading_Metadata_and_Metadata_Management.md.html?origin=www.coursera.org

Summary and Highlights


A Data Engineer’s ecosystem includes the infrastructure, tools, frameworks, and processes for extracting data,
architecting and managing data pipelines and data repositories, managing workflows, developing applications, and
managing BI and Reporting tools.
Based on how well-defined the structure of the data is, data can be categorized as:

Structured data, that is data which is well organized in formats that can be stored in databases.

Semi-structured data, that is data which is partially organized and partially free-form.

Unstructured data, that is data which cannot be organized conventionally into rows and columns.

Data comes in a wide variety of file formats, such as delimited text files, spreadsheets, XML, PDF, and JSON,
each with its own benefits and limitations.
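
As a rough illustration of how formats differ in practice, the sketch below reads the same hypothetical record from a delimited text file, JSON, and XML using only the Python standard library. The record, its field names, and the helper functions are assumptions made for this example.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# The same hypothetical record expressed in three file formats.
delimited = "name,age\nAna,34\n"
as_json = '{"name": "Ana", "age": 34}'
as_xml = "<person><name>Ana</name><age>34</age></person>"


def from_delimited(text):
    """Parse the first data row; delimited files carry every value as text."""
    row = next(csv.DictReader(io.StringIO(text)))
    return {"name": row["name"], "age": int(row["age"])}


def from_json(text):
    """Parse JSON; numbers keep their numeric type."""
    return json.loads(text)


def from_xml(text):
    """Parse XML; element text is a string and must be converted."""
    root = ET.fromstring(text)
    return {"name": root.findtext("name"), "age": int(root.findtext("age"))}
```

All three functions return the same dictionary, but only JSON preserves the number type without an explicit conversion. This is one small example of the trade-offs between formats.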
Data is extracted from multiple data sources, ranging from relational and non-relational databases, to APIs, web services,
data streams, social platforms, and sensor devices.

Once the data is identified and gathered from different sources, it needs to be staged in a data repository so that it can be
prepared for analysis. The type, format, and sources of data influence the type of data repository that can be used.
Data professionals need a host of languages that can help them extract, prepare, and analyse data. These can be
classified as:

Querying languages, such as SQL, used for accessing and manipulating data from databases.

Programming languages such as Python, R, and Java, for developing applications and controlling application
behavior.

Shell and Scripting languages, such as Unix/Linux Shell, and PowerShell, for automating repetitive operational tasks.
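
The split between querying and programming languages can be sketched as follows: Python controls the application flow while SQL does the data access. This is a minimal example using Python's built-in `sqlite3` module and an in-memory database; the `orders` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def total_by_city(rows):
    """Load (city, amount) rows into an in-memory table and total them per city with SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (city TEXT, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT city, SUM(amount) FROM orders GROUP BY city ORDER BY city"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical sample data.
sample = [("Lisbon", 10.0), ("Oslo", 5.0), ("Lisbon", 2.5)]
```

Calling `total_by_city(sample)` returns one `(city, total)` tuple per city. The grouping and summing are expressed in SQL, and Python handles only the connection, loading, and cleanup.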

Quiz
Practice Quiz
Question 1


Automated tools, frameworks, and processes for all stages of the data analytics process are part of the Data Engineer’s
ecosystem. What role do data integration tools play in this ecosystem?

Store high-volume day-to-day operational data in data repositories

Cover the entire journey of data from source to destination

Combine data from multiple sources into a unified view that is accessed by data consumers to query and
manipulate data

Conduct complex data analytics

Question 2
Which of these data sources is an example of semi-structured data?

Documents

Social media feeds

Emails

Network and web logs

Question 3
Which one of the provided file formats is commonly used by APIs and Web Services to return data?

XML

Delimited file

JSON

XLS

Question 4
What is one example of the relational databases discussed in the video?

Spreadsheet

XML

Flat files

SQL Server

Question 5
Which of the following languages is one of the most popular querying languages in use today?

SQL

Java

Python

Graded Quiz
Question 1
There are two main types of data repositories – Transactional and Analytical. For high-volume day-to-day operational data
such as banking transactions, Transactional, or OLTP, systems are the ideal choice.

True

False

Transactional, or OLTP, systems are designed and optimized for handling high-volume transactions.

Question 2
Which of the following is an example of unstructured data?

Zipped files


Video and Audio files

XML

Spreadsheets

Question 3
Which one of these file formats is independent of software, hardware, and operating systems, and can be viewed the
same way on any device?

XML

XLSX

PDF

Delimited text file

PDF format is independent of software, hardware, and operating systems, and can be viewed the same way on any
device.

Question 4
Which data source can return data in plain text, XML, HTML, or JSON among others?

APIs

Delimited text file

XML

PDF

APIs can return data in a wide variety of formats such as plain text, XML, HTML, or JSON among others.

Question 5
In the data engineer’s ecosystem, languages are classified by type. What are shell and scripting languages most
commonly used for?

Manipulating data

Building apps

Automating repetitive operational tasks

Querying data
