5 Data Science Project Lifecycle
347,222 tweets
204,000,000 emails
17,361,111 pictures
Transport
Government
In Data Science projects, we extract knowledge and insights
from data by using scientific methods.
How do Data Scientists get useful insights
from data?
Curiosity: Only by asking questions will you gain a better understanding of the
business problem.
Common Sense: To identify new ways to solve business problems and to detect
priority problems.
Communication Skills: A Data Scientist needs to communicate their findings to
business teams so that they can act upon the insights.
Skills required for a Data Scientist
Domain Knowledge:
• To get useful information out of raw data that
benefits a company’s business.
• Know about the business model of the
company.
• Ask the right questions to produce valuable
results.
Math Skills:
• Linear Algebra, Calculus, and other concepts
of mathematics help us to understand the
complex behavior of Machine Learning
algorithms.
• Probability and statistics are mainly used in
predictive modeling and clustering.
Computer Science:
• To implement Data Science techniques using programming
languages like Python, R, SQL, Scala, Julia, JavaScript, etc.
• To deal with varied databases and cloud networks to process the
data.
• Knowledge about algorithms, relational and non-relational
databases, Distributed Computing, and Machine Learning.
Communication Skills:
• To communicate well when working in a team.
• To draw conclusions from the data analysis and present
them.
Step 2: Data Collection
missing fields
improper values
setting the right format of the data
structuring data from raw files, etc.
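The checks listed above can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the sample data, the column names (customer_id, age, signup_date), the accepted age range, and the helper find_issues are all assumptions made for the example, not part of the original slides.

```python
import csv
import io
from datetime import datetime

# Hypothetical raw export; columns and values are invented for illustration.
RAW_CSV = """customer_id,age,signup_date
101,34,2023-01-15
102,,2023-02-30
103,-5,2023-03-10
104,41,
"""

def find_issues(text):
    """Return a list of (row_number, column, problem) tuples."""
    issues = []
    reader = csv.DictReader(io.StringIO(text))
    for row_number, row in enumerate(reader, start=1):
        # Missing fields: empty or absent values.
        for column, value in row.items():
            if value is None or value.strip() == "":
                issues.append((row_number, column, "missing"))
        # Improper values: age must be a whole number in an assumed plausible range.
        age = row["age"]
        if age and not (age.isdigit() and 0 < int(age) < 120):
            issues.append((row_number, "age", "improper value"))
        # Right format: dates must parse as a real YYYY-MM-DD date.
        date = row["signup_date"]
        if date:
            try:
                datetime.strptime(date, "%Y-%m-%d")
            except ValueError:
                issues.append((row_number, "signup_date", "invalid date"))
    return issues

for issue in find_issues(RAW_CSV):
    print(issue)
```

Reporting issues instead of fixing them silently keeps the decision (drop, correct, or re-collect) with the person who knows the business context.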
Step 3: Cleaning Data
Format the data into the desired structure, remove
unwanted columns and features.
Data preparation is the most time-consuming yet
arguably the most important step in the entire life
cycle.
Data collection, data understanding, and data
preparation take up 70% to 90% of the overall
project time.
If you feel the data is not proper or enough for you to
proceed, you can go back to the data collection step.
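As a rough sketch of the cleaning step, the snippet below keeps only the wanted columns, drops incomplete rows, and converts each value to the desired type. The records, the column names, and the helper clean_records are hypothetical; dropping incomplete rows is only one possible policy (filling in a default value is another).

```python
# Hypothetical raw records; "debug_flag" stands in for an unwanted column.
raw_records = [
    {"id": "1", "age": " 34 ", "city": "pune", "debug_flag": "x"},
    {"id": "2", "age": "", "city": "Delhi", "debug_flag": "y"},
    {"id": "3", "age": "29", "city": " MUMBAI", "debug_flag": "z"},
]

def clean_records(records, converters):
    """Keep only the columns named in `converters`, drop rows with a
    missing value, and convert every remaining value."""
    cleaned = []
    for record in records:
        # Select the wanted columns and trim stray whitespace.
        values = {col: (record.get(col) or "").strip() for col in converters}
        if "" in values.values():
            continue  # incomplete row: dropped here, could be imputed instead
        cleaned.append({col: convert(values[col]) for col, convert in converters.items()})
    return cleaned

cleaned = clean_records(raw_records, {"id": int, "age": int, "city": str.title})
print(cleaned)
```

If too many rows are dropped at this point, that is a signal to go back to the data collection step.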
Step 4: Analyzing Data
💡 Predictive Analytics
(what could happen in the future?)
Use statistical methods and other forecast techniques
💡 Prescriptive Analytics
(what should we do?)
Use optimization and simulation methods, what-if and
if-what analysis
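The two kinds of analytics above can be illustrated together. The sketch below first fits a straight-line trend by ordinary least squares to forecast the next month (predictive), then compares a few what-if price scenarios and picks the most profitable one (prescriptive). All figures, the unit cost, and the assumed price sensitivity of demand are invented for the example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Predictive: what could happen in the future?
months = [1, 2, 3, 4, 5]
units_sold = [100, 110, 120, 130, 140]  # hypothetical sales history
slope, intercept = fit_line(months, units_sold)
forecast_month_6 = slope * 6 + intercept

# Prescriptive: what should we do?
# Assumption: the forecast holds at a price of 8.0, and each extra unit of
# price removes 25% of that demand.
def profit_at(price, base_units, unit_cost=6.0):
    units = base_units * (1 - 0.25 * (price - 8.0))
    return units * (price - unit_cost)

scenarios = {price: profit_at(price, forecast_month_6) for price in (8.0, 9.0, 10.0)}
best_price = max(scenarios, key=scenarios.get)

print(forecast_month_6, scenarios, best_price)
```

Real projects would replace the straight line with a forecasting model suited to the data and the scenario loop with a proper optimization or simulation method.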
Step 5: Data Modeling/ Machine Learning modeling
Supervised Learning:
• A technique to train the machine using labeled data.
• A few examples of Supervised Algorithms:
o Naive Bayes
o Random Forest
o Neural Network Algorithms
o k-Nearest Neighbor (kNN)
o Linear Regression
o Logistic Regression
o Support Vector Machines (SVM)
o Decision Trees
o Boosting
o Bagging
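To make "training with labeled data" concrete, here is a minimal sketch of one algorithm from the list, k-Nearest Neighbor, written with the standard library only. The points, the labels, and the function name knn_predict are made up for illustration; in practice a library implementation would normally be used.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest
    labeled neighbors (Euclidean distance)."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Labeled data: every training point comes with a known answer.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 7.0), (6.5, 8.0)]
train_labels = ["small", "small", "small", "large", "large", "large"]

print(knn_predict(train_points, train_labels, (1.2, 1.5)))
print(knn_predict(train_points, train_labels, (6.8, 6.9)))
```

The labels are what make this supervised: the prediction for a new point is derived from answers that were already known for the training points.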
Unsupervised Learning:
• It involves training by using unlabeled data and allowing
the model to act on that information without guidance.
• Examples of Unsupervised Algorithms
o K-Means / K-Means++
o Hierarchical Clustering
o Density-Based Spatial Clustering of Applications with
Noise (DBSCAN)
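A minimal sketch of K-Means shows what "no guidance" means in practice: the algorithm receives points without labels and groups them on its own. The data and the function name kmeans are invented for the example, and the starting centroids are passed in by hand to keep the result reproducible (K-Means++ is a method for choosing those starting centroids more carefully).

```python
import math

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning each point to its nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        centroids = [
            # An empty cluster keeps its previous centroid.
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Unlabeled data: only coordinates, no known answers.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = kmeans(points, centroids=[(1.0, 1.0), (8.0, 8.0)])

print(centroids)
print(clusters)
```

The number of clusters is still chosen by the analyst; only the grouping itself is discovered from the data.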
Step 6: Model Evaluation