
Fundamentals of Data Science
We typically use data science to answer five types of questions (a small sketch after the list maps each question type to a typical estimator):
1. How much or how many? (regression)
2. Which category? (classification)
3. Which group? (clustering)
4. Is this weird? (anomaly detection)
5. Which option should be taken? (recommendation)
https://www.import.io/wp-content/uploads/2019/09/data-analysis-blog.jpg
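As a rough illustration (not part of the original slides), the sketch below maps each question type to a commonly used scikit-learn estimator; the specific model choices are assumptions, not the only options.

# Illustrative mapping of the five question types to scikit-learn estimators.
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.cluster import KMeans
from sklearn.ensemble import IsolationForest

regressor = LinearRegression()        # 1. How much or how many? (regression)
classifier = LogisticRegression()     # 2. Which category?       (classification)
clusterer = KMeans(n_clusters=3)      # 3. Which group?          (clustering)
anomaly_detector = IsolationForest()  # 4. Is this weird?        (anomaly detection)
# 5. Which option should be taken? (recommendation) is usually built with
#    collaborative-filtering techniques rather than a single sklearn estimator.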

Data analysis is the process of collecting and organizing data in order to draw helpful conclusions from it.

https://www.techrepublic.com/article/why-your-data-analysis-may-be-doomed-from-the-start/
What is Data Science?
Data Science is a combination of various tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data.
▶ A Data Analyst usually explains what is going on by processing the history of the data.
▶ A Data Scientist not only does exploratory analysis to discover insights from the data, but also uses various advanced machine learning algorithms to predict the occurrence of a particular event in the future.
https://data-flair.training/blogs/data-science-applications/
Data Analysis consists of the following phases:
1. Data Requirement Gathering
2. Data Collection
3. Data Cleaning
4. Data Analysis
5. Data Interpretation
6. Data Visualization
1. Data Requirement Gathering:
Think about why you want to do this data analysis and decide which type of analysis you want to perform.
In this phase, you have to decide what to analyze and how to measure it: understand why you are investigating and which measures you will use to do this analysis.
To boost the sales of Mr. X’s retail store, factors affecting the sales could be:

● Store location
● Staff
● Working hours
● Promotions
● Product placement
● Product pricing
● Competitors’ location and promotions, and so on
Every domain and business works with a set of rules and goals.
Asking questions about the dataset will help in narrowing it down to the correct data acquisition.
● Uber: What percentage of time do drivers actually drive? How steady is their income?
● Oyo Hotels: What is the average occupancy of mediocre hotels?
2. Data Collection
• Now it is time to collect your data based on the requirements.
• The collected data must be processed or organized for analysis.
• As the data is collected from various sources, you must keep a log with the collection date and source of the data.
https://www.globalpatron.com/images/10-best-data-collection-forms-1024x576.png
Data might need to be collected from multiple types of data sources (a minimal pandas sketch of reading a few of these follows the list):
● File-format data (spreadsheets, CSV, text files, XML, JSON)
● Relational databases
● Non-relational databases (NoSQL)
● Scraping website data using tools
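A minimal sketch, assuming pandas is installed; the file names ("sales.csv", "sales.json", "store.db") and the table name are hypothetical placeholders.

# Reading data from a few common source types with pandas.
import sqlite3
import pandas as pd

df_csv = pd.read_csv("sales.csv")          # spreadsheet / CSV / text files
df_json = pd.read_json("sales.json")       # JSON files
with sqlite3.connect("store.db") as conn:  # relational database
    df_sql = pd.read_sql("SELECT * FROM sales", conn)
# Non-relational stores (e.g. MongoDB) and web scraping typically use their
# own client libraries (pymongo, requests + BeautifulSoup, etc.).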

Big data is characterized by 4 different properties, defined by the 4 V's:
● Volume: data in terabytes
● Velocity: streaming data with high throughput
● Variety: structured, semi-structured, and unstructured data
● Veracity: quality of the data that is being analyzed
3. Data Cleaning:
Collected data may be irrelevant or not directly useful for the analysis, hence it should be cleaned.
The collected data may contain duplicate records, white spaces, or errors.
This phase must be done before the analysis, because the output of the analysis depends on the quality of the data cleaning.
In this step, we understand more about the data and prepare it for further analysis.
Data collection, data understanding, and data preparation can take 70-90% of the overall project time.
Some of the standard practices used to understand, clean, and prepare your data for building a predictive model (a minimal pandas sketch of a few of these steps follows the list):
1. Variable Identification
2. Univariate Analysis
3. Bi-variate Analysis
4. Missing values treatment
5. Outlier treatment
6. Variable transformation
7. Variable creation
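A minimal pandas sketch of a few of the practices above (duplicate removal, whitespace cleanup, missing-value and outlier treatment, variable transformation and creation); the column names are hypothetical.

import numpy as np
import pandas as pd

df = pd.read_csv("sales.csv")

df = df.drop_duplicates()                                 # duplicate records
df["store"] = df["store"].str.strip()                     # stray white space
df["sales"] = df["sales"].fillna(df["sales"].median())    # missing values treatment
low, high = df["sales"].quantile([0.01, 0.99])
df["sales"] = df["sales"].clip(low, high)                 # outlier treatment
df["log_sales"] = np.log1p(df["sales"])                   # variable transformation
df["sales_per_hour"] = df["sales"] / df["working_hours"]  # variable creation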
4. Data Analysis:
Now that the data has been prepared, it is time to analyze whether it contains the exact information we want, or whether we need to collect more data.
Use data analysis tools and software which will help you to understand, interpret, and derive conclusions based on the requirements.
https://www.searchenginejournal.com/top-data-analysis-mistakes-digital-marketing/407040/#close


To understand the data, a lot of people look at summary statistics such as the mean and median, plot the data, and look at its distribution through plots such as histograms, spectrum analysis, population distributions, etc.
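A minimal sketch of these summary statistics and a histogram, assuming pandas and matplotlib; "sales" is a hypothetical numeric column.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sales.csv")
print(df.describe())        # mean, median (50%), std, quartiles, etc.
df["sales"].hist(bins=30)   # distribution of a single variable
plt.title("Sales distribution")
plt.show()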

Different types of analytics include:
● Descriptive Analytics (what happened?)
● Diagnostic Analytics (why did it happen?)
● Predictive Analytics (what will happen?)
● Prescriptive Analytics (what should we do?)
Data Modelling / Machine Learning Modeling

Modeling is used to find patterns or behaviors in data. These patterns help us in one of two ways:

1. Descriptive modeling (unsupervised learning): for example, recommender systems that suggest that if a person liked the movie Matrix, they would also like Inception or Avengers.

2. Predictive modeling (supervised learning): this involves predicting future trends, e.g. linear regression where we might want to predict stock exchange values, as in the sketch below.
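A minimal predictive-modeling sketch with linear regression, as mentioned in point 2 above; the toy numbers are made up purely for illustration.

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])        # e.g. day index
y = np.array([10.0, 12.1, 13.9, 16.2, 18.1])   # e.g. observed values

model = LinearRegression().fit(X, y)           # learn the trend from history
print(model.predict([[6]]))                    # predicted value for the next day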
Supervised Learning:
Supervised learning is a technique in which we teach or train the machine using data which is well labeled.
e.g.
1. Naive Bayes
2. Random Forest
3. Neural Network Algorithms
4. k-Nearest Neighbor (kNN)
5. Linear Regression
6. Logistic Regression
7. Support Vector Machines (SVM)
8. Decision Trees
9. Boosting
10. Bagging

Unsupervised Learning:
Unsupervised learning involves training by using unlabeled data and allowing the model to act on that information without guidance. Think of unsupervised learning as a smart kid that learns without any guidance. (A minimal K-Means sketch follows these lists.)
e.g.
1. PCA
2. KMeans/KMeans++
3. Hierarchical Clustering
4. DBSCAN
5. Market Basket Analysis
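A minimal unsupervised-learning sketch: K-Means grouping unlabeled 2-D points; the data are made up and no labels are given to the algorithm.

import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.8, 8.2]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # learned group centers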
Model Evaluation
● Based on the business problem, models can be selected.
● It is essential to identify the task: is it a classification problem, a regression or prediction problem, time-series forecasting, or a clustering problem?
A few examples of classification metrics (a scikit-learn sketch computing a couple of these follows below):
1. Classification Accuracy
2. Confusion Matrix
3. Logarithmic Loss (Log Loss)
4. Area Under Curve (AUC)
5. F-Measure (F1 Score)
6. Precision
7. Recall

A few examples of regression metrics:
1. Mean Absolute Error (MAE)
2. Mean Squared Error (MSE)
3. Root Mean Squared Error (RMSE)
4. Mean Absolute Percentage Error (MAPE)

The model should be robust and not overfitted. If the model is overfitted, its predictions on future data will not be accurate.
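A minimal sketch computing a few of the metrics above with scikit-learn; the true and predicted values are made up for illustration.

from sklearn.metrics import (accuracy_score, confusion_matrix,
                             mean_absolute_error, mean_squared_error)

# Classification example
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
print(accuracy_score(y_true, y_pred))     # classification accuracy
print(confusion_matrix(y_true, y_pred))   # confusion matrix

# Regression example
y_true_r = [3.0, 5.0, 2.5]
y_pred_r = [2.8, 5.4, 2.9]
print(mean_absolute_error(y_true_r, y_pred_r))        # MAE
print(mean_squared_error(y_true_r, y_pred_r) ** 0.5)  # RMSE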
Data Interpretation

Choose a way to express or communicate the results, either simply in words or with a table or chart.
Then use the results of your data analysis process to decide your best course of action.
https://www.datapine.com/blog/data-interpretation-methods-benefits-problems/
Importance of Data Interpretation
1. Make better decisions
2. Find trends and take action
3. Better resource allocation
Steps:
1. Gather the data
2. Develop your discoveries
3. Draw conclusions
4. Give recommendations

Data interpretation is an essential factor in data-driven decision-making. It should be performed on a regular basis as part of an iterative interpretation process.
Data Visualization
Data visualizations are very common in your day-to-day life; they often appear in the form of charts and graphs.
Data visualization is often used to discover unknown facts and trends.
By observing relationships and comparing datasets, you can find meaningful information.
https://www.techfunnel.com/martech/data-visualization/
Common visualization techniques
● Tables
● Pie charts and stacked bar charts
● Line charts and area charts
● Histograms
● Scatter plots
● Heat maps
● Tree maps
Heat Maps and Tree Maps (example images; a small matplotlib sketch of two such plots follows)
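A minimal matplotlib sketch, not from the original slides, showing two of the techniques listed above: a scatter plot and a heat map drawn from random data.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
x, y = rng.normal(size=100), rng.normal(size=100)
grid = rng.random((10, 10))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.scatter(x, y)                        # scatter plot of two variables
ax1.set_title("Scatter plot")
im = ax2.imshow(grid, cmap="viridis")    # simple heat map of a value grid
ax2.set_title("Heat map")
fig.colorbar(im, ax=ax2)
plt.show()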
Driving insights and BI reports

In this process, technical skills alone are not sufficient. One essential skill you need is the ability to tell a clear and actionable story.
Model Deployment

After building a model, it is first deployed in a pre-production or test environment before actually being deployed into production.

Whatever shape or form your data model is deployed in, it must be exposed to the real world. Once real humans use it, you are bound to get feedback. Capturing this feedback can make or break any project.
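As a rough sketch of the idea (not from the original slides), a model saved with joblib could be exposed through a small Flask service in a test environment; the file name "model.joblib" and the request format are assumptions.

import joblib
from flask import Flask, request, jsonify

app = Flask(__name__)
model = joblib.load("model.joblib")  # model trained and saved earlier

@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"features": [[1.0, 2.0]]}
    features = request.get_json()["features"]
    prediction = model.predict(features).tolist()
    return jsonify({"prediction": prediction})

if __name__ == "__main__":
    app.run(port=5000)  # test/pre-production run; use a WSGI server in production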
Taking actions

Actionable insights from the model show how data science has the power of doing predictive analytics and prescriptive analytics.

A few examples:
1. How much stock of item X do we need to have in inventory? How much discount should be given on item X to boost its sales while maintaining the trade-off between discount and profit?
2. How much attrition is predicted, and what can be done to avoid it?

Applications of data analytics – Other applications

• Policing/Security
• Manage Risk
• Delivery Logistics
• Web Provision
• Customer Interactions
• Energy Management
• Gaming
• Speech Recognition
