
Data Science Methodology



M.Sc (IT - AI/CC/Security) Semester I

DATA SCIENCE AND ANALYTICS (DSA)

Ms. Pooja R. Tupe

Visiting Faculty, UDIT, University of Mumbai.
TOPICS TO COVER
WHAT IS METHODOLOGY IN DATA SCIENCE?

• Methodology: the best way to organize your work, do it better, and avoid wasting time.
• It provides the data scientist with a framework for how to proceed with whatever methods, processes, and heuristics will be used to obtain answers or results.

• It consists of 10 stages that form an iterative process for using data to uncover insights.
• Each stage plays a vital role in the context of the overall methodology.

The 10 stages are grouped into five pairs:

1. From Problem to Approach
2. From Requirements to Collection
3. From Understanding to Preparation
4. From Modeling to Evaluation
5. From Deployment to Feedback

• The process is highly iterative and never truly ends: in a real case study, some steps have to be repeated to improve the model.
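As a rough illustration, the 10 stages and their five pairs can be held in a simple ordered structure. The wrap-around from Feedback back to Business Understanding in `next_stage` is a simplifying assumption made for this sketch; in practice any earlier stage may be revisited. All names here are hypothetical.

```python
# The 10 stages in order, grouped into the five pairs of the methodology.
STAGES = [
    ("From Problem to Approach", ["Business Understanding", "Analytic Approach"]),
    ("From Requirements to Collection", ["Data Requirements", "Data Collection"]),
    ("From Understanding to Preparation", ["Data Understanding", "Data Preparation"]),
    ("From Modeling to Evaluation", ["Modeling", "Evaluation"]),
    ("From Deployment to Feedback", ["Deployment", "Feedback"]),
]

# Flatten the pairs into one ordered list of stage names.
ORDERED = [stage for _, pair in STAGES for stage in pair]


def next_stage(current: str) -> str:
    """Return the stage after `current`.

    Assumption for this sketch: after Feedback the cycle wraps around to
    Business Understanding, modelling a process that never really ends.
    """
    i = ORDERED.index(current)
    return ORDERED[(i + 1) % len(ORDERED)]


print(next_stage("Modeling"))   # Evaluation
print(next_stage("Feedback"))   # Business Understanding
```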
FROM PROBLEM TO APPROACH
STAGE 1: BUSINESS UNDERSTANDING

• Every customer’s request starts with a problem.

• The data scientist's first job is to understand that problem and then approach it with statistical and machine-learning techniques.

• Business understanding helps clarify the customer's goal.

• Ask the customer many questions about every aspect of the problem.
• Clearly defined questions start with understanding the goals.
• By the end of this stage, we will have a list of business requirements.

• Goals:
• Specify the key variables that serve as the model targets, and the metrics on those targets that determine the success of the project.
• Identify the relevant data sources that the business has access to or needs to obtain.
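One way to make the outcome of this stage concrete is to record the business requirements in a small structure: the question, the target variable, the success metric with its threshold, and the data sources. This is a minimal sketch; the churn question, column name, metric, threshold, and file names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class BusinessRequirements:
    question: str             # the business question, in the customer's words
    target_variable: str      # the key variable the model will predict
    success_metric: str       # how success on the target is measured
    success_threshold: float  # minimum metric value agreed with the customer
    data_sources: list[str] = field(default_factory=list)

    def is_met(self, measured: float) -> bool:
        """True when a measured metric value reaches the agreed threshold."""
        return measured >= self.success_threshold


# Hypothetical example agreed with a customer
reqs = BusinessRequirements(
    question="Which customers are likely to cancel next month?",
    target_variable="churned",
    success_metric="recall",
    success_threshold=0.70,
    data_sources=["crm_accounts.csv", "billing_history.csv"],
)
print(reqs.is_met(0.75))  # True
```

Writing the threshold down at this stage gives the later Evaluation stage an explicit, agreed-upon bar to test against.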
STAGE 2: ANALYTIC APPROACH

• Analytic Approach: Define the analytic approach to solve the business problem.
• Contextual Expression: Express the problem in terms of statistical and machine-learning techniques.
• Pattern Identification: Identify the type of patterns needed to address the question effectively.
• Predictive Model: used when the question calls for the probability of an outcome.
• Descriptive Approach: used when the question is about relationships in the data.
• Statistical Analysis: used when the question requires counts.
• Algorithm Selection: choose different algorithms depending on the type of approach.
FROM REQUIREMENTS TO COLLECTION

Once we have found a way to solve our problem, we need to discover the correct data for our model.
STAGE 3: DATA REQUIREMENTS

• Identify the necessary data content, formats, and sources for initial data collection; this data will feed the algorithm of the chosen approach.
• The chosen analytic approach determines the data requirements.
• Specifically, the analytic methods to be used require certain data content, formats, and representations, guided by domain knowledge.
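Data requirements can be written down as an explicit specification and checked against incoming records. This minimal sketch assumes a hypothetical churn-classification approach; the column names and types are illustrative, not taken from the course material.

```python
# Hypothetical requirements derived from a churn-classification approach:
# each column the chosen method needs, and the Python type it must have.
REQUIRED_COLUMNS = {
    "customer_id": str,
    "tenure_months": int,
    "monthly_charge": float,
    "churned": int,  # the target: 1 = left, 0 = stayed
}


def missing_requirements(record: dict) -> list[str]:
    """Return the requirement violations for one record (empty list = OK)."""
    problems = []
    for column, expected_type in REQUIRED_COLUMNS.items():
        if column not in record:
            problems.append(f"missing column: {column}")
        elif not isinstance(record[column], expected_type):
            problems.append(f"{column}: expected {expected_type.__name__}")
    return problems


good = {"customer_id": "c1", "tenure_months": 12, "monthly_charge": 29.9, "churned": 0}
bad = {"customer_id": "c2", "tenure_months": "12"}
print(missing_requirements(good))  # []
print(missing_requirements(bad))
```

The violations it reports are exactly the gaps that the next stage, Data Collection, has to close.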
STAGE 4: DATA COLLECTION

• Identify and gather structured, unstructured, and semi-structured data relevant to the
problem domain.
• Data Scientists identify the available data resources relevant to the problem domain.
• Decide whether to invest in obtaining less-accessible data elements.
• To retrieve data, we can scrape a related website or use a repository of premade, ready-to-use datasets.
• Premade datasets are usually CSV or Excel files. Whether the data comes from a website or a repository, Pandas is a useful tool to download (its readers accept a URL as well as a file path), convert, and modify these datasets.
• Revise data requirements and collect new or additional data if there are gaps.
• Incorporating more data can help predictive models better represent rare events, such as
disease incidence or system failure.
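A minimal Pandas sketch of loading a premade CSV dataset and then adding data collected later to fill a gap. An in-memory string stands in for the file so the example runs offline; `pd.read_csv` also accepts a file path or URL, and `pd.read_excel` plays the same role for Excel workbooks (it needs an engine such as openpyxl installed). The column names are hypothetical.

```python
import io

import pandas as pd

# Stand-in for a premade CSV file; pd.read_csv also accepts a path or URL.
csv_text = """customer_id,tenure_months,monthly_charge
c1,12,29.9
c2,3,49.5
"""
accounts = pd.read_csv(io.StringIO(csv_text))

# Stand-in for additional data collected later to fill a gap
extra = pd.DataFrame(
    {"customer_id": ["c3"], "tenure_months": [30], "monthly_charge": [19.0]}
)

# Append the new rows; ignore_index renumbers the combined index 0..n-1
combined = pd.concat([accounts, extra], ignore_index=True)
print(combined.shape)  # (3, 3)
```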
FROM UNDERSTANDING TO PREPARATION

Data scientists:
• use descriptive statistics and visualization techniques to understand the data better;
• explore the dataset to understand its content, determine whether additional data is needed to fill any gaps, and verify the quality of the data.
STAGE 5: DATA UNDERSTANDING

• Data scientists use descriptive statistics and visualization techniques to understand the data content, assess data quality, and discover initial insights about the data.
• Additional data collection may be necessary to fill gaps.
• Check the data type of each attribute and learn more about the attributes and their names.
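The checks listed above map onto a handful of Pandas calls. This sketch uses a small hypothetical DataFrame: `dtypes` shows the type of each attribute, `describe()` gives descriptive statistics for numeric columns, `isna().sum()` counts missing values per column as a first data-quality check, and `value_counts()` summarises a categorical attribute.

```python
import pandas as pd

# Hypothetical sample with one missing value in each column
df = pd.DataFrame({
    "tenure_months": [12, 3, 30, None, 7],
    "monthly_charge": [29.9, 49.5, 19.0, 35.0, None],
    "plan": ["basic", "premium", "basic", "basic", None],
})

print(df.dtypes)                  # type of each attribute
print(df.describe())              # count, mean, std, min, quartiles, max
print(df.isna().sum())            # missing values per column
print(df["plan"].value_counts())  # frequency of each category
```

Note that `tenure_months` is reported as a float column: Pandas stores the missing entry as NaN, which forces the whole column to floating point. Spotting surprises like this is part of understanding the data.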
STAGE 6: DATA PREPARATION

• It lays the foundation for the subsequent modeling stage, since it covers all the activities used to construct the dataset.
• Data Cleaning: Handling missing or invalid values, removing duplicates, and
ensuring proper formatting.
• Combining Data: Integrating data from various sources like files, tables, and
platforms.
• Transforming Data: Creating more useful variables through feature engineering,
which involves using domain knowledge and existing variables.
• Text Analytics: Converting unstructured or semi-structured text data into
structured variables to enhance model accuracy.
• Automation: Automating common data preparation steps to save time and
improve efficiency.
• High-Performance Systems: Utilizing advanced systems and analytics to handle
large datasets more effectively.
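The first three preparation activities can be sketched with Pandas on hypothetical data: cleaning (removing a duplicate row and filling a missing value with the column median), combining (a left join with a second source), and transforming (an engineered feature). Filling with the median and the `total_spend` feature are illustrative choices for this sketch, not prescribed by the methodology.

```python
import pandas as pd

accounts = pd.DataFrame({
    "customer_id": ["c1", "c2", "c2", "c3"],
    "tenure_months": [12, 3, 3, None],
    "monthly_charge": [29.9, 49.5, 49.5, 19.0],
})
support = pd.DataFrame({"customer_id": ["c1", "c3"], "tickets": [4, 1]})

# Data cleaning: drop the exact duplicate row, fill missing tenure with the median
clean = accounts.drop_duplicates().copy()
clean["tenure_months"] = clean["tenure_months"].fillna(clean["tenure_months"].median())

# Combining data: left-join support tickets; customers with no tickets get 0
merged = clean.merge(support, on="customer_id", how="left")
merged["tickets"] = merged["tickets"].fillna(0).astype(int)

# Transforming data: hypothetical engineered feature, total spend so far
merged["total_spend"] = merged["tenure_months"] * merged["monthly_charge"]
print(merged)
```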
STAGE 7: MODELING

• Model Development: This stage involves creating predictive or descriptive models based on the analytic approach defined earlier.
• Training Set: Predictive models are built using historical data where the
outcome is known.
• Iterative Process: The modeling process is iterative, with intermediate
insights leading to refinements in both data preparation and model
specification.
• Algorithm Selection: Data scientists experiment with multiple algorithms
and their parameters to identify the best model for the given variables.
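To illustrate building models from historical data with known outcomes and trying more than one candidate, this pure-Python sketch compares a majority-class baseline against a one-parameter threshold rule, searching every observed value for the parameter. The data and both models are deliberately tiny and hypothetical. Choosing by training accuracy alone, as done here, is only a first pass; the Evaluation stage checks the choice on data the model has not seen.

```python
# Hypothetical historical records with known outcomes: (support_tickets, churned)
train = [(1, 0), (2, 0), (5, 1), (6, 1), (3, 0), (4, 1), (0, 0), (7, 1), (2, 0)]


def accuracy(predict, data):
    """Fraction of records whose outcome the model predicts correctly."""
    return sum(predict(x) == y for x, y in data) / len(data)


# Candidate 1: majority-class baseline, always predicts the most common outcome
majority = max((0, 1), key=lambda c: sum(y == c for _, y in train))


def baseline(x):
    return majority


# Candidate 2: threshold rule "churn if tickets >= t"; try each observed value as t
def fit_threshold(data):
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in data}):
        acc = accuracy(lambda x, t=t: int(x >= t), data)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


t = fit_threshold(train)


def threshold_model(x):
    return int(x >= t)


print(accuracy(baseline, train))         # 0.5555555555555556
print(t, accuracy(threshold_model, train))  # 4 1.0
```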
STAGE 8: EVALUATION

• In the Evaluation stage, data scientists can evaluate the model in two ways: Hold-Out and Cross-Validation.
• In the Hold-Out method, the dataset is divided into three subsets:
• a training set, used to build the model, as described in the Modeling stage;
• a validation set, used to assess the performance of the model built in the training phase;
• a test set, used to estimate the likely future performance of the model.
• In Cross-Validation, the data is split into k folds; each fold serves once as the validation set while the model is trained on the remaining folds, and the k results are averaged.
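Both evaluation schemes come down to how records are assigned to subsets. The pure-Python sketch below shows a three-way hold-out split (the 60/20/20 proportions and fixed seed are assumptions for illustration) and k-fold index generation in which every record is used for validation exactly once. Libraries such as scikit-learn provide tested versions of both; this is only meant to show the mechanics.

```python
import random


def holdout_split(records, train_frac=0.6, val_frac=0.2, seed=42):
    """Shuffle once, then cut into training, validation and test subsets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder is held out for the final check
    return train, val, test


def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs; every index is validated exactly once."""
    # Spread any remainder over the first folds so sizes differ by at most one
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n))
        yield train_idx, val_idx
        start += size


train, val, test = holdout_split(range(10))
print(len(train), len(val), len(test))  # 6 2 2
for train_idx, val_idx in k_fold_indices(10, 3):
    print(val_idx)
```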
FROM DEPLOYMENT TO FEEDBACK

Data scientists have to make stakeholders familiar with the tool produced in different scenarios. Once the model has been evaluated and the data scientist is confident it will work, it is deployed and put to the ultimate test.
STAGE 9: DEPLOYMENT

• Approval and Deployment: Once the model is approved by business sponsors, it is deployed into a production or test environment.
• Limited Deployment: Initially, the model is deployed in a limited manner to fully
evaluate its performance.
• Deployment Methods: Deployment can range from generating reports with
recommendations to embedding the model into applications or systems.
• This stage ensures that the model is effectively integrated into the business
processes and its performance is closely monitored.
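A limited deployment can be sketched as routing only a small, fixed share of users to the new model while everyone else stays on the current process. This sketch assumes string user IDs and uses a CRC32 hash to place each user in one of 100 buckets, so the same user always gets the same treatment. The function names, the 10% rollout, and the bucketing scheme are hypothetical choices, not part of the original material.

```python
import zlib


def use_new_model(user_id: str, rollout_percent: int) -> bool:
    """Deterministically place a user in one of 100 buckets; only users in
    the first `rollout_percent` buckets are served by the new model."""
    bucket = zlib.crc32(user_id.encode("utf-8")) % 100
    return bucket < rollout_percent


def serve(user_id, features, new_model, current_process, rollout_percent=10):
    """Route one request to the new model or to the existing process."""
    if use_new_model(user_id, rollout_percent):
        return new_model(features)
    return current_process(features)


# Usage with stand-in callables for the two decision paths
users = [f"user-{i}" for i in range(1000)]
routed = sum(use_new_model(u, 10) for u in users)
print(routed)  # roughly 100 of 1000 users
```

Raising `rollout_percent` step by step, while monitoring performance at each step, widens the deployment once the limited trial looks healthy.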
STAGE 10: FEEDBACK

• Feedback Collection: After deploying the model, organizations collect results to evaluate its performance and impact. For example, response rates to a promotional campaign can provide valuable feedback.
• Model Refinement: Analyzing this feedback helps data scientists refine the
model to enhance its accuracy and usefulness.
• Automation: Automating feedback collection, model assessment, refinement,
and redeployment can accelerate the process, leading to more timely and
effective model updates.
• This continuous loop of feedback and refinement ensures that the model
remains relevant and effective in changing environments.
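Using the promotional-campaign example, an automated feedback check might compute the live response rate and flag the model for refinement when it falls clearly below the rate expected at evaluation time. The expected rate and the 0.05 tolerance below are hypothetical parameters chosen for illustration.

```python
def response_rate(responses: list[int]) -> float:
    """Share of contacted customers who responded (1) rather than not (0)."""
    return sum(responses) / len(responses)


def needs_refinement(observed_rate: float, expected_rate: float,
                     tolerance: float = 0.05) -> bool:
    """Flag the model when the live rate falls short of the expected rate
    by more than `tolerance`."""
    return observed_rate < expected_rate - tolerance


# Hypothetical week of campaign feedback: 1 = responded, 0 = did not
week = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
rate = response_rate(week)
print(rate)                                        # 0.3
print(needs_refinement(rate, expected_rate=0.40))  # True -> refine and redeploy
```

Running a check like this on every new batch of results is one simple way to automate the assessment step of the feedback loop.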
