
Data Ecosystem and Life Cycle

The document outlines the concepts of data ecosystems and the data life cycle, detailing the components and steps involved in each. It defines big data, describes the processes of data sensing, collection, wrangling, analysis, and storage, and presents the eight stages of the data life cycle. Additionally, it includes review questions and a summary of the main points covered.

CT127-3-2 Programming for Data Analysis

DATA ECOSYSTEM AND LIFE CYCLE

Module Code & Module Title Slide Title SLIDE 1


TOPIC LEARNING OUTCOMES

At the end of this topic, you should be able to:

1. Understand what a data ecosystem is
2. Understand the steps involved in the data life cycle



Contents & Structure

• What is Big Data?
• What is a Data Ecosystem?
• Components of the Data Ecosystem
• Data Life Cycle
• Steps in the Data Life Cycle



What is Big Data?

Big data refers to data sets that are too large, complex, and dynamic for conventional data tools to capture, store, manage, and analyse.
BIG DATA ECOSYSTEM

• A data ecosystem is the combination of enterprise infrastructure and applications used to aggregate and analyze information.

• It includes the programming languages, packages, algorithms, cloud-computing services, and general infrastructure an organization uses to collect, store, analyze, and leverage data.



COMPONENTS OF DATA ECOSYSTEM

The concept of the data ecosystem is explored through the lens of key stages in the data project life cycle: sensing, collection, wrangling, analysis, and storage.



COMPONENTS OF DATA ECOSYSTEM
1. Sensing
Sensing refers to the process of identifying data sources for your
project. It involves evaluating the quality of data so you can better
understand whether it’s valuable.
This evaluation includes asking such questions as:
- Is the data accurate?
- Is the data recent and up to date?
- Is the data complete?
- Is the data valid? Can it be trusted?
Data can be sourced from:
• Internal data sources: proprietary databases, spreadsheets, and other resources that originate from within your organization
• External data sources: databases, spreadsheets, websites, and other data sources that originate from outside your organization
• Software: custom software that exists for the sole purpose of data sensing
• Algorithms: sets of steps or rules that automate the process of evaluating data for accuracy and completeness before it’s used
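The quality questions above (complete? recent? valid?) can be automated. The following is a minimal sketch, assuming a small list of hypothetical records; the field names, the 365-day recency window, and the 0-120 age rule are illustrative choices, not part of the original material.

```python
from datetime import date

# Hypothetical records from a candidate data source (illustrative only).
records = [
    {"id": 1, "email": "a@example.com", "age": 34, "updated": date(2024, 5, 1)},
    {"id": 2, "email": None, "age": 29, "updated": date(2024, 4, 12)},
    {"id": 3, "email": "c@example.com", "age": -5, "updated": date(2019, 1, 3)},
]

def completeness(rows, fields):
    """Fraction of (row, field) cells that are not missing (None)."""
    total = len(rows) * len(fields)
    filled = sum(1 for r in rows for f in fields if r.get(f) is not None)
    return filled / total if total else 0.0

def recent_fraction(rows, as_of, max_age_days):
    """Fraction of rows updated within max_age_days of the as_of date."""
    fresh = sum(1 for r in rows if (as_of - r["updated"]).days <= max_age_days)
    return fresh / len(rows) if rows else 0.0

def invalid_ids(rows):
    """IDs of rows failing a simple, hypothetical validity rule (age in 0-120)."""
    return [r["id"] for r in rows if r["age"] is None or not 0 <= r["age"] <= 120]

as_of = date(2024, 6, 1)
print(f"completeness: {completeness(records, ['email', 'age']):.2f}")
print(f"recent (<= 365 days): {recent_fraction(records, as_of, 365):.2f}")
print(f"invalid rows: {invalid_ids(records)}")
```

Scores like these help decide whether a source is valuable enough to collect from; the thresholds would depend on the project.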
2. Collection
• Once a potential data source has been identified, data must be collected, through either manual or automated processes.
• Data scientists often use programming languages to write software that automates the data collection process, e.g., a web scraper.
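A web scraper fetches a page and extracts the data of interest from its HTML. The sketch below uses only Python's standard library (`urllib.request` and `html.parser`); the `TableCellParser` class, the `scrape` helper, and the sample table are hypothetical, and the parsing step is shown on an inline HTML string so it runs without network access.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class TableCellParser(HTMLParser):
    """Collects the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip header rows that contain only <th> cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()

def parse_table(html):
    parser = TableCellParser()
    parser.feed(html)
    return parser.rows

def scrape(url):
    """Fetch a page and parse its table cells (requires network access)."""
    with urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return parse_table(html)

sample = """
<table>
  <tr><th>City</th><th>Temp</th></tr>
  <tr><td>Kuala Lumpur</td><td>31</td></tr>
  <tr><td>Penang</td><td>30</td></tr>
</table>
"""
print(parse_table(sample))
```

Separating fetching (`scrape`) from parsing (`parse_table`) keeps the extraction logic testable offline.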

3. Wrangling
• Data wrangling is a set of processes designed to transform raw data into a more usable format.
• It may involve merging multiple datasets, identifying and filling gaps in data, deleting unnecessary or incorrect data, and “cleaning” and structuring data for future analysis.
• Examples of data wrangling tools: OpenRefine, DataWrangler, and …
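The wrangling operations listed above (merging, filling gaps, deleting incorrect data, cleaning) can be illustrated in a few lines. This is a minimal sketch in plain Python, assuming two hypothetical datasets keyed by customer ID; the "UNKNOWN" default and the rule that a row without a name is incorrect are illustrative assumptions.

```python
# Two hypothetical raw datasets keyed by customer ID (illustrative only).
customers = [
    {"id": 1, "name": " Alice ", "country": "MY"},
    {"id": 2, "name": "Bob", "country": None},
    {"id": 3, "name": "", "country": "SG"},
]
orders = {1: 120.0, 2: 75.5, 4: 10.0}  # customer id -> total spend

def wrangle(customers, orders, default_country="UNKNOWN"):
    cleaned = []
    for c in customers:
        name = (c["name"] or "").strip()  # clean: trim stray whitespace
        if not name:                       # delete: a row with no name is treated as incorrect
            continue
        cleaned.append({
            "id": c["id"],
            "name": name,
            "country": c["country"] or default_country,  # fill gaps with a default
            "spend": orders.get(c["id"], 0.0),           # merge: left join on id
        })
    return cleaned

for row in wrangle(customers, orders):
    print(row)
```

Order 4 has no matching customer and is dropped by the left join; tools such as OpenRefine perform the same kinds of steps interactively.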
4. Analysis
• After raw data has been inspected and transformed into a readily usable state, it can be analyzed.
• Analysis can be done using algorithms, statistical models, and visualization tools.
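As a small example of statistical analysis, the sketch below summarizes a hypothetical week of daily sales with Python's `statistics` module and flags values more than two standard deviations from the mean. The data and the z-score threshold of 2.0 are assumptions for illustration.

```python
import statistics

daily_sales = [120, 135, 128, 150, 410, 142, 138]  # hypothetical values

def summarize(values):
    """Basic descriptive statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

def outliers(values, z=2.0):
    """Values lying more than z sample standard deviations from the mean."""
    m = statistics.mean(values)
    s = statistics.stdev(values)
    return [v for v in values if abs(v - m) > z * s]

print(summarize(daily_sales))
print("outliers:", outliers(daily_sales))
```

The median is far less affected by the single large value than the mean, which is one reason to report both.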

5. Storage

• Throughout all of the data life cycle stages, data must be stored in a way that’s both secure and accessible. Common options include:
- Cloud-based storage solutions: store data off-site and access it remotely
- On-site servers: give organizations a greater sense of control over how data is stored and used
- Other storage media: hard drives, USB devices, CD-ROMs, and floppy disks
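A database is one common way to keep data both structured and accessible. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `sales` table and its rows are hypothetical, and passing a file path to `sqlite3.connect` instead of `":memory:"` would persist the data on local storage.

```python
import sqlite3

def store_and_query(rows):
    """Store (region, amount) rows in SQLite, then return totals per region."""
    conn = sqlite3.connect(":memory:")  # use a file path for persistent storage
    try:
        conn.execute("CREATE TABLE sales (region TEXT NOT NULL, amount REAL NOT NULL)")
        # Parameterized inserts avoid building SQL strings by hand.
        conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)", rows)
        conn.commit()
        cur = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        )
        return cur.fetchall()
    finally:
        conn.close()

rows = [("North", 100.0), ("South", 50.0), ("North", 25.5)]
print(store_and_query(rows))
```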
DATA LIFE CYCLE
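• Data projects generally follow the same basic life cycle from start to finish. This life cycle can be split into eight common stages, steps, or phases: generation, collection, processing, storage, management, analysis, visualization, and interpretation.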



1. Generation
• For the data life cycle to begin, data must first be generated.
• Data generation occurs regardless of whether you’re aware of it,
especially in our increasingly online world.
• Some of this data is generated by your organization, some by your
customers, and some by third parties you may or may not be aware
of.
• Every sale, purchase, hire, communication, and interaction generates data.
2. Collection
You can collect data in a variety of ways, including:
- Forms
- Surveys
- Interviews
- Direct observation
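Responses gathered through forms or surveys are often exported as CSV files. The following sketch, assuming a hypothetical export with `respondent`, `satisfaction`, and `channel` columns, loads the responses with Python's `csv` module and converts the rating to an integer.

```python
import csv
import io
from collections import Counter

# Hypothetical export of survey/form responses (illustrative only).
raw = """respondent,satisfaction,channel
r1,4,online
r2,5,in-store
r3,3,online
"""

def load_responses(text):
    """Parse CSV text into dicts, converting the rating column to int."""
    reader = csv.DictReader(io.StringIO(text))
    responses = []
    for row in reader:
        row["satisfaction"] = int(row["satisfaction"])
        responses.append(row)
    return responses

responses = load_responses(raw)
print(len(responses), Counter(r["channel"] for r in responses))
```

For a real file, `open(path, newline="")` would replace `io.StringIO`.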
3. Processing
• Once data has been collected, it must be processed. Data processing
can refer to various activities, including:
• Data wrangling, in which a data set is cleaned and transformed from its
raw form into something more accessible and usable. This is also known
as data cleaning, data munging, or data remediation.
• Data compression, in which data is transformed into a format that can
be more efficiently stored.
• Data encryption, in which data is encoded into another form so that it is protected from unauthorized access.
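Data compression can be demonstrated with Python's built-in `zlib` module. This sketch serializes a hypothetical, highly repetitive dataset to JSON, compresses it, and checks that decompression restores the original exactly (lossless compression). Python's standard library does not include general-purpose encryption, so that step is not shown here.

```python
import json
import zlib

# Hypothetical, repetitive records; repetition is what makes compression effective.
records = [{"id": i, "status": "active"} for i in range(1000)]

def compress_records(rows):
    """Serialize rows to JSON bytes and compress them with zlib."""
    raw = json.dumps(rows).encode("utf-8")
    return raw, zlib.compress(raw, level=9)

def decompress_records(blob):
    """Reverse compress_records: decompress, then parse the JSON."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

raw, packed = compress_records(records)
print(f"{len(raw)} bytes -> {len(packed)} bytes")
```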
4. Storage
• After data has been collected and processed, it must be stored for
future use.
• This is most commonly achieved through the creation of databases or
datasets.
• These datasets may then be stored in the cloud, on servers, or on other physical storage media.
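A dataset can also be stored as a plain file. The sketch below writes records to a JSON Lines file (one JSON object per line) inside a temporary directory and reads them back; the file name `customers.jsonl` and the records are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def save_dataset(rows, path):
    """Write each row as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

def load_dataset(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

rows = [{"id": 1, "region": "North"}, {"id": 2, "region": "South"}]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "customers.jsonl"  # hypothetical file name
    save_dataset(rows, path)
    loaded = load_dataset(path)
print(loaded == rows)
```

One record per line makes the file easy to append to and to read incrementally.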
5. Management

• Data management, also called database management, involves organizing, storing, and retrieving data as necessary over the life of a data project.
• It includes tasks such as encrypting data and tracking who has accessed the data and what changes they may have made.
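Tracking who accessed data and what they changed is often done with an audit log. The sketch below is a hypothetical `AuditedStore` class (not a real library API) that wraps a dictionary and records each read and write with a UTC timestamp, the user, and the old and new values.

```python
from datetime import datetime, timezone

class AuditedStore:
    """Key-value store that records who read or changed each record."""

    def __init__(self):
        self._data = {}
        self.log = []

    def _record(self, user, action, key, old=None, new=None):
        self.log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user, "action": action, "key": key, "old": old, "new": new,
        })

    def read(self, user, key):
        self._record(user, "read", key)
        return self._data.get(key)

    def write(self, user, key, value):
        old = self._data.get(key)  # capture the previous value before overwriting
        self._data[key] = value
        self._record(user, "write", key, old, value)

store = AuditedStore()
store.write("alice", "customer:1", {"tier": "silver"})
store.write("bob", "customer:1", {"tier": "gold"})
store.read("carol", "customer:1")
for entry in store.log:
    print(entry["user"], entry["action"], entry["old"], "->", entry["new"])
```

A production system would typically write the log to append-only storage rather than keep it in memory.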
6. Analysis

• Data analysis refers to processes that attempt to extract meaningful insights from raw data.
• Commonly used methods for analysis include statistical modeling, algorithms, artificial intelligence, data mining, and machine learning.
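As one example of statistical modeling, ordinary least squares fits a straight line y = a + b·x, with slope b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept a = ȳ − b·x̄. The sketch below implements this directly; the advertising-spend and revenue figures are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx       # slope
    return my - b * mx, b  # (intercept, slope)

ad_spend = [1, 2, 3, 4, 5]           # hypothetical
revenue = [3.1, 4.9, 7.2, 8.8, 11.0]  # hypothetical
a, b = fit_line(ad_spend, revenue)
print(f"revenue approx {a:.2f} + {b:.2f} * ad_spend")
```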

7. Visualization

• Data visualization refers to the process of creating graphical representations of your information, typically through the use of one or more visualization tools.
• Visualizing data makes it easier to quickly communicate the analysis.
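Dedicated visualization tools produce charts, but the core idea of mapping values to visual length can be shown in plain text. The sketch below is a hypothetical `bar_chart` helper that scales each bar to the largest value; the regional sales figures are illustrative.

```python
def bar_chart(data, width=30, char="#"):
    """Render a horizontal bar chart as text lines, scaled to the largest value."""
    if not data:
        return []
    peak = max(data.values())
    label_w = max(len(k) for k in data)
    lines = []
    for label, value in data.items():
        length = round(value / peak * width) if peak > 0 else 0
        lines.append(f"{label.ljust(label_w)} | {char * length} {value}")
    return lines

sales_by_region = {"North": 125.5, "South": 50.0, "East": 90.0}  # hypothetical
print("\n".join(bar_chart(sales_by_region)))
```

Scaling every bar against the same maximum is what lets a reader compare categories at a glance.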

8. Interpretation

• The interpretation phase of the data life cycle provides the opportunity to make sense of your analysis and visualization.
• Beyond simply presenting the data, this is when you investigate it through the lens of your expertise and understanding.



Review Questions

-What is Big Data?
-What is a Data Ecosystem?
-What are the components of a Data Ecosystem?
-What is the Data Life Cycle?
-What are the steps involved in the Data Life Cycle?



Summary / Recap of Main Points

• Data ecosystems and their components
• The data life cycle and the steps/phases involved in it



What To Expect Next Week

In Class Preparation for Class


• Overview of Data Analytics
