Data Analytics Part 3 (1)

The document discusses the complexities of the machine learning application software development lifecycle (MLASDLC) and highlights the differences between traditional software engineering and machine learning methodologies. It emphasizes the importance of data analysis in driving business success and outlines the data analysis process, including identification, collection, cleaning, analysis, and interpretation. Additionally, it covers various machine learning algorithms and models, including supervised, unsupervised, and reinforcement learning, while providing examples of predictive data analysis using linear regression.


THE MACHINE LEARNING LIFECYCLE
Linear Regression
Decision Tree Analysis
Read the article titled: Agile Software Development Lifecycle Model for Machine Learning (ASDLMML)

Machine learning application software development life cycle (MLASDLC)
■ The complexity of building and integrating machine learning
applications is challenging to software engineering teams.
■ The inherent differences between software engineering and machine
learning do not allow software engineering methodologies to be
applied uniformly.
■ Whereas software engineering is dependent on software design,
development and testing, machine learning model development is
based on data and model design, training, evaluation, deployment,
and monitoring. Machine learning systems are non-deterministic and
are therefore difficult to build using sequential development methods.
■ Data, hidden technical debt and the need for iterative experimentation
are the main technical challenges of machine learning development.
ML & Data Analytics
■ In this data-rich age, understanding how to analyze and extract
true meaning from the business’s digital insights is one of the
primary drivers of success.
■ Despite the colossal volume of data created every day, a mere
0.5% is actually analyzed and used for data discovery,
improvement, and intelligence.
■ First there is data analytics – then there is the challenge of applying
the analytics in an automated manner – which is where ML comes in
■ ML has a pivotal dependence on data analytics – because the
machine learning models are informed by data analytics
processes and models, the ML lifecycle model is different from
a traditional systems lifecycle model
Traditional vs ML
■ The traditional lifecycle model…

■ The ML lifecycle model…


The Integrated Model
■ Read the article by Ranuwana and Karunananda (2021) for a
full explanation of this model
Business meets Science
■ In business, project management takes precedence – activities are
highly manageable
■ In the world of science, project management is the least of the concerns –
data analysis follows a more complex approach, using advanced
techniques to explore and experiment with data with little or no
timeline specification
■ On the other hand, in a business context, data is used to create
management protocols so that there is an optimal and efficient use of
resources that will enable the company to improve its overall
performance and profit margin – science, by contrast, improves the quality of
knowledge
■ Ideally, we examine data analysis from a business point of view
while still going through the scientific and statistical foundations that
are fundamental to understanding the basics of data analysis.
Why Is Data Analysis Important?
■ Informed decision-making: From a management perspective, you can benefit from
analyzing your data as it helps you make decisions based on facts and not simple
intuition. For instance, you can understand where to invest your capital, detect growth
opportunities, predict your income, or tackle uncommon situations before they become
problems…explicit vs tacit knowledge?
■ Reduce costs: Another great benefit is to reduce costs. With the help of advanced
technologies such as predictive analytics, businesses can spot improvement
opportunities, trends, and patterns in their data and plan their strategies accordingly. In
time, this will help you save money and resources on implementing the wrong strategies.
And not just that, by predicting different scenarios such as sales and demand you can
also anticipate production and supply.
■ Target customers better: Customers are arguably the most crucial element in any
business. By using analytics to get a 360° view of all aspects related to your customers,
you can understand which channels they use to communicate with you, their
demographics, interests, habits, purchasing behaviors, and more. In the long run, this
insight helps you target and serve your customers better.
What Is The Data Analysis Process?
The DA Process
■ Identify: Before you get your hands dirty with data, you first need to identify why you need it in the
first place. Identification is the stage in which you establish the questions you will need to answer.
For example, what is the customer's perception of our brand? Or what type of packaging is more
engaging to our potential customers? Once the questions are outlined you are ready for the next step.
■ Collect: As its name suggests, this is the stage where you start collecting the needed data. Here, you
define which sources of information you will use and how you will use them. The collection of data can
come in different forms such as internal or external sources, surveys, interviews, questionnaires, focus
groups, among others. An important note here is that the way you collect the information will be
different in a quantitative and qualitative scenario.
■ Clean: Once you have the necessary data it is time to clean it and leave it ready for analysis. Not all the
data you collect will be useful: when collecting large amounts of information in different formats it is very
likely that you will find yourself with duplicate or badly formatted data. To avoid this, before you start
working with your data you need to make sure to remove any white spaces, duplicate records, or
formatting errors. This way you avoid hurting your analysis with incorrect data.
■ Analyze: With the help of various techniques such as statistical analysis, regressions, neural networks,
text analysis, and more, you can start analyzing and manipulating your data to extract relevant
conclusions. At this stage, you find trends, correlations, variations, and patterns that can help you
answer the questions you first thought of in the identify stage. Various technologies on the market assist
researchers and average business users with the management of their data. Some of them include
business intelligence and visualization software, predictive analytics, and data mining, among others.
■ Interpret: Last but not least you have one of the most important steps: it is time to interpret your results.
This stage is where the researcher comes up with courses of action based on the findings. For example,
here you would understand if your clients prefer packaging that is red or green, plastic or paper, etc.
Additionally, at this stage, you can also find some limitations and work on them.
Quantitative Data Analysis
■ In Research
– 2 main types are descriptive and inferential
– Data analysis entails a population (the entire
group of people/subjects you’re interested in)
and a sample (a subset of the population)
– Descriptive statistics focus on describing the
sample, while inferential statistics aim to make
predictions about the population.
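■ As a quick illustration of this distinction, here is a minimal sketch (a small
illustrative sample of ages and the scipy library, not taken from the slides):
descriptive statistics summarise the sample itself, while inferential statistics
estimate a plausible range for the population mean.

import numpy as np
from scipy import stats

# Illustrative sample of ages drawn from a larger population
sample = np.array([28, 27, 39, 34, 43, 48, 41, 52, 39, 48])

# Descriptive: summarise the sample itself
print("Sample mean:", sample.mean())
print("Sample std:", sample.std(ddof=1))

# Inferential: a 95% confidence interval for the population mean
ci = stats.t.interval(0.95, len(sample) - 1,
                      loc=sample.mean(), scale=stats.sem(sample))
print("95% CI for the population mean:", ci)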
The Essential Types Of Data Analysis
Methods
■ 1) Descriptive analysis - What happened.
– The descriptive analysis method is the starting point to any analytic
reflection, and it aims to answer the question of what happened?
– It does this by ordering, manipulating, and interpreting raw data
from various sources to turn it into valuable insights for your
organization.
– Performing descriptive analysis is essential, as it allows us to
present our insights in a meaningful way.
– Although it is relevant to mention that this analysis on its own will
not allow you to predict future outcomes or tell you the answer to
questions like why something happened, it will leave your data
organized and ready to conduct further investigations.
Descriptive
■ Mean – this is simply the mathematical average of a range of numbers.
■ Median – this is the midpoint in a range of numbers when the numbers are
arranged in numerical order. If the data set makes up an odd number, then the
median is the number right in the middle of the set. If the data set makes up
an even number, then the median is the midpoint between the two middle
numbers.
■ Mode – this is simply the most commonly occurring number in the data set.
■ Standard deviation – this metric indicates how dispersed a range of numbers
is. In other words, how close all the numbers are to the mean (the average).
■ In cases where most of the numbers are quite close to the average, the
standard deviation will be relatively low.
■ Conversely, in cases where the numbers are scattered all over the place, the
standard deviation will be relatively high.
■ Visualisation – Pie, Bar, Histograms
Popular options for analysis
■ MsExcel
■ SPSS
■ Python
■ PowerBi

Descriptive Sample
Number Gender Age Weight
1      Male   28  65
2      Male   27  65
3      Female 39  61
4      Female 34  50
5      Female 43  65
6      Male   48  72
7      Female 41  55
8      Female 52  55
9      Male   39  68
10     Female 48  68
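■ As a minimal sketch (assuming the sample above is keyed in directly rather than
read from a file), Python with pandas – one of the options listed – can produce
the descriptive statistics discussed earlier:

import pandas as pd
import matplotlib.pyplot as plt

# The descriptive sample shown above, entered directly as a DataFrame
df = pd.DataFrame({
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Female',
               'Male', 'Female', 'Female', 'Male', 'Female'],
    'Age':    [28, 27, 39, 34, 43, 48, 41, 52, 39, 48],
    'Weight': [65, 65, 61, 50, 65, 72, 55, 55, 68, 68]
})

print(df['Age'].mean())     # mean
print(df['Age'].median())   # median
print(df['Age'].mode())     # mode (can return more than one value)
print(df['Age'].std())      # standard deviation
print(df.describe())        # full summary of the numeric columns

# Simple visualisations: bar chart of the categorical column, histogram of a numeric one
df['Gender'].value_counts().plot(kind='bar')
plt.show()
df['Age'].plot(kind='hist')
plt.show()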
Machine learning Algorithms
■ Supervised learning: Supervised learning occurs when an algorithm is trained using
“labeled data,” or data that is tagged with a label so that an algorithm can successfully
learn from it. Training labels help the eventual machine learning model know how to
classify data in the manner that the researcher desires.
■ Unsupervised learning: Unsupervised algorithms use unlabeled data to train an algorithm.
In this process, the algorithm finds patterns in the data itself and creates its own data
clusters. Unsupervised learning and pattern recognition are helpful for researchers who
are looking to find patterns in data that are currently unknown to them.
■ Semi-supervised learning: Semi-supervised learning uses a mix of labeled and unlabeled
data to train an algorithm. In this process, the algorithm is first trained with a small
amount of labeled data before being trained with a much larger amount of unlabeled
data.
■ Reinforcement learning: Reinforcement learning is a machine learning technique in which
positive and negative values are assigned to desired and undesired actions. The goal is to
encourage programs to avoid the negative training examples and seek out the positive,
learning how to maximize rewards through trial and error. Reinforcement learning can be
used to direct unsupervised machine learning.
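■ To illustrate the supervised/unsupervised distinction above, here is a minimal
scikit-learn sketch (tiny made-up data, not from the slides): a classifier is
trained on labelled points, while a clustering algorithm is left to find groups
on its own.

from sklearn.tree import DecisionTreeClassifier
from sklearn.cluster import KMeans

# Tiny made-up dataset: two features per sample
X = [[1, 2], [2, 1], [8, 9], [9, 8], [1, 1], [9, 9]]
y = [0, 0, 1, 1, 0, 1]   # labels, used only in the supervised case

# Supervised learning: the labels guide what the model learns
clf = DecisionTreeClassifier().fit(X, y)
print(clf.predict([[2, 2], [8, 8]]))   # expected: [0 1]

# Unsupervised learning: no labels, the algorithm finds its own clusters
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)                      # cluster assignment for each sample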
Machine learning models
■ There are two types of problems that dominate
machine learning:
– classification and prediction.
■ Occasionally, the same algorithm can be used to
create either classification or regression models,
depending on how it is trained.
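■ As an illustration of that last point (a sketch with made-up data, not from the
slides), scikit-learn's decision tree algorithm can be trained either as a
classifier or as a regressor – the difference lies purely in the targets it is
trained on:

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

X = [[1], [2], [3], [4], [5], [6]]

# Trained with class labels -> a classification model
clf = DecisionTreeClassifier().fit(X, ['low', 'low', 'low', 'high', 'high', 'high'])
print(clf.predict([[2], [5]]))   # expected: ['low' 'high']

# Trained with numeric targets -> a regression model
reg = DecisionTreeRegressor().fit(X, [10.0, 12.0, 14.0, 40.0, 42.0, 44.0])
print(reg.predict([[2], [5]]))   # expected: [12. 42.]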
Classification & Prediction
■ Classification models:
– Logistic regression
– Naive Bayes
– Decision trees
– Random forest
– K-nearest neighbor (KNN)
– Support vector machine
■ Regression models:
– Linear regression
– Ridge regression
– Decision trees
– Random forest
– K-nearest neighbor (KNN)
– Neural network regression
Pearson Correlation – a simple form of prediction
import pandas as pd

df = pd.read_csv('Heights.csv')
df.plot(kind='scatter', x='Female_Height', y='Male_Height', figsize=(10,6));
print(df.corr())
Getting the significance of the correlation
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv('Heights.csv')
df.plot(kind='scatter', x='Female_Height', y='Male_Height', figsize=(10,6));
print(df.corr())

# pearsonr returns the correlation coefficient and the p-value of the test
corr, sig = pearsonr(df['Male_Height'], df['Female_Height'])
if sig < 0.05:
    print("Male Heights vs Female Heights is SIGNIFICANT at the 95% confidence level", (corr, sig))
print(round(sig, 3))
Linear/multiple regression
■ Simple linear regression is a function that allows an analyst or
statistician to make predictions about one variable based on the
information that is known about another variable.
■ Linear regression can only be used when one has two continuous
variables—an independent variable and a dependent variable.
■ The independent variable is the parameter that is used to calculate
the dependent variable or outcome….
– y = mx + c … e.g. y = 2x + 3 … y (profit) = 2 * x (no. of customers) + 3
■ A multiple regression model extends to several explanatory variables.
– y = 2x + 3k + 4z + c … the dependent variable is determined by more
than one independent variable
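■ To make y = mx + c concrete, here is a small sketch (made-up numbers, not from
the slides) that fits a simple linear regression with scipy and recovers the
slope m and intercept c from data generated by y = 2x + 3:

from scipy import stats

x = [1, 2, 3, 4, 5]     # e.g. number of customers
y = [5, 7, 9, 11, 13]   # e.g. profit, generated by y = 2x + 3

result = stats.linregress(x, y)
print("slope m:", result.slope)           # expected: 2.0
print("intercept c:", result.intercept)   # expected: 3.0

# Predict the profit for 10 customers: 2*10 + 3 = 23
print("prediction for x = 10:", result.slope * 10 + result.intercept)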
Linear/multiple regression
■ This is an example of predictive data analysis –
also creates an opportunity for machine learning
(ML)
■ Take the example of vehicles and carbon
emission – let us suppose that we need to find a
model that will predict the level of carbon
emission of a vehicle if we know the weight and
engine capacity
The data as a csv
■ Use the pandas to arrange the data into data frames
■ import pandas as pd
■ df=pd.read_csv("cars.csv")
■ df
Get the main data items
■ To build the model – we need to isolate the
independent and the dependent variables
■ The volume and weight are the independent variables
■ The CO2 is the dependent variable…use X to
represent the independent variable and y to
represent the dependent variable
■ X = df[['Weight', 'Volume']]
y = df['CO2']
Using a ML Library
from sklearn import linear_model
regr = linear_model.LinearRegression()
X = df[['Weight','Volume']]      # independent variables (df loaded from cars.csv earlier)
y = df['CO2']                    # dependent variable
regr.fit(X, y)                   # train the model on the data
weight = input('Enter the Car weight')
Engine = input('Enter the Engine Capacity in ccm')
predictedCO2 = regr.predict([[int(weight), int(Engine)]])
print(predictedCO2)
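■ Note: in recent scikit-learn versions, predicting from a plain list after fitting on a
DataFrame may emit a warning about missing feature names; passing the new values as a
one-row DataFrame with the columns 'Weight' and 'Volume' avoids it.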
ML - The main data analysis route
■ Use pandas to import data file
■ Use seaborn and matplotlib to do the serious
stats
■ E.g. – multiple regression
– Similar to linear regression but with more than
one independent value, meaning that we try to
predict a value based on two or
more variables.
Getting the data
■ Another example involving more variables – The decision is to determine the most cost-effective method for advertising
■ The independent data is the cost of advertising via:
– TV
– Radio
– Newspaper
■ The dependent variable is an amount that represents sales

import pandas as pd
df=pd.read_csv("Advertising.csv", index_col="No")
df
Checking the head & tail
■ df.head(10)….df.tail(8)
Scatter plot TV vs Sales
df.plot(kind='scatter',x='TV', y='sales',figsize=(10,6),color='Red');
Matplot and seaborn
■ Pandas is quite good, but there are other libraries that are
better for plotting – matplotlib, and seaborn, which is better still
■ import seaborn as sns
■ sns.pairplot(df,x_vars=['TV','radio','newspaper'],y_vars='sales')
The color option
■ import seaborn as sns
■ sns.pairplot(df,x_vars=['TV','radio','newspaper'],y_vars='sales', kind='reg',plot_kws={'line_kws':{'color':'red'}} )
Tutorial Exercise
■ Use the Advertising data (csv download from
Learn) and build the predictive model that will
predict the sale given the cost of TV, Radio and
Newspaper
Another Example of ML
■ The file contains information about passengers who were on board the
Titanic when the collision took place.
■ We will use this data to perform exploratory data analysis
in Python and better understand the factors that contributed to a
passenger’s survival of the incident.
■ The idea here is to use the passengers' details (independent variables)
to predict whether the passenger survived (dependent variable)
■ In this scenario – we have 2 datasets
– The 1st is a training dataset with full data including whether the
passenger has survived
– The 2nd is a sample dataset with the dependent variable missing
(i.e. the survival indicator is omitted)
The Titanic
import os
os.chdir(r"C:\SE_2025\Python\Data")   # raw string so the backslashes are not treated as escape characters
import pandas as pd
df = pd.read_csv('train.csv')
df.head()

■ There is a problem with missing Data!


df.info()
The Aggregates
■ Count: the number of rows in the dataset that are populated with non-null values. There
are 891 unique passenger IDs in this dataset. All the other variables also have 891 rows
of data populated, with the exception of ‘Age’, which only has 714 rows. This means that
there are 177 passengers in the dataset who aren’t tagged with an age value.
■ Mean: the mean value in each column. The mean age of passengers aboard the Titanic,
for example, was 30.
■ Std: how much deviation each column has from the mean.
■ Min: the minimum value of each variable. For example, the minimum value for ‘SibSp’ is 0,
meaning that there were passengers who traveled without their siblings and spouses.
■ 25%, 50%, and 75%: the 1st quartile, 2nd quartile (median), and 3rd quartile.
■ Max: the highest value for each variable in the dataset. From the data frame above, we
can see that the oldest passenger aboard the Titanic was 80 years old.
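■ These aggregates match the output of pandas' describe() function – a minimal
sketch, assuming df is the training frame loaded earlier:

# count, mean, std, min, 25%, 50%, 75% and max for each numeric column
print(df.describe())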
Cleaning the Data – especially the missing ones!
■ Note: Notice that we are creating a copy of the data
frame before removing missing values. This is done
so that the original frame isn’t tampered with and we
can go back to it anytime without losing valuable
data. It is often a best practice to create a copy
before performing data manipulation.
df2 = df.copy()
df2 = df2.dropna()
df2.info()
Notice that earlier there were 891 rows. By dropping rows with
missing values, we have dramatically reduced the size of this
data frame by more than half. This isn’t a good practice. We
lose a lot of valuable data by simply removing rows that contain
missing values.
The missing values
■ Data preprocessing is one of the most important steps
when conducting any kind of data science activity. Earlier,
we noticed that the ‘Age’ column had some missing
values in it. Let’s dive deeper to see if there are any
further inconsistencies in our dataset.
■ df.isnull().sum()
■ As a result, we see that there are 3 columns with missing
values — Age, Cabin, and Embarked:
■ We can deal with these missing values in a few different
ways. The simplest option is to simply drop all the rows
that contain missing values.
Data Imputation
■ Let’s try a second approach — imputation. In other
words, the process of replacing missing data with
substituted values.
■ First, impute missing values in the ‘Age’ column. We
will use mean imputation in this case — substituting
all the missing age values with the average age in the
dataset.
■ We can do this by running the following line of code:
■ df3 = df.copy()
■ df3['Age'] = df3['Age'].fillna(df3['Age'].mean())
Imputation ….cont’d
■ Now, let’s move on to the ‘Cabin’ column. We will
replace the missing values in this column with
the majority class:
■ We can do the same for ‘Embarked’:
df3['Cabin'] = df3['Cabin'].fillna(df3['Cabin'].value_counts().index[0])
df3['Embarked'] = df3['Embarked'].fillna(df3['Embarked'].value_counts().index[0])
Univariate Analysis
■ Univariate analysis is the process of performing a
statistical review on a single variable.
■ We will start by creating a simple visualization to
understand the distribution of the ‘Survived’ variable
in the Titanic dataset.
■ Our aim is to answer simple questions with the help
of available data, such as:
– How many passengers survived the Titanic
collision?
– Were there more fatalities than survivors?
Let’s get to know the Data set
# train_data is the data frame read from train.csv
df_num = train_data[['Age', 'SibSp', 'Parch', 'Fare']]                              # numeric columns
df_num
df_cat = train_data[['Survived', 'Pclass', 'Sex', 'Ticket', 'Cabin', 'Embarked']]   # categorical columns
df_cat
Histograms and Correlations
■ It is always a good idea to visualise
import matplotlib.pyplot as plt
import seaborn as sns

for i in df_num.columns:
    plt.hist(df_num[i])
    plt.title(i)
    plt.show()

print(df_num.corr())

sns.heatmap(df_num.corr(), annot=True)
Pivot the data
■ The Pivot tables are very good at providing a deeper insight

print(pd.pivot_table(train_data,index='Survived',columns='Pclass',values='Ticket', aggfunc='count'))
print('----------------------------------------------------------')
print(pd.pivot_table(train_data,index='Survived',columns='Sex',values='Ticket', aggfunc='count'))
print('----------------------------------------------------------')
print(pd.pivot_table(train_data,index='Survived',columns='Embarked',values='Ticket', aggfunc='count'))
Histogram of Categorical Variable
■ In the Seaborn library, we can create a count plot
to visualize the distribution of the
‘Survived’ variable.
■ Essentially, a count plot can be thought of as
a histogram across a categorical variable.
■ To do this, run the following code:

import seaborn as sns


sns.countplot(x='Survived',data=df)
Getting the exact values
■ By looking at the results, we can tell that a
majority of the passengers didn’t survive the
Titanic collision.
■ To get the exact breakdown of passengers who
survived and those who didn’t, we can use an in-built
function of the pandas library called ‘value_counts()’:
df['Survived'].value_counts()
Analyze the Relationship Between Variables
■ In this case, we will run an analysis to try and answer the
following questions about Titanic survivors:
– Did a passenger’s age have any impact on what class they
traveled in?
– Did the class that these passengers traveled in have any
correlation with their ticket fares?
– Were passengers who paid higher ticket fares located in
different cabins as compared to passengers who paid lower
fares?
– Did ticket fare have any impact on a passenger’s survival?
■ Using the questions above as a rough guideline, let’s begin the
analysis.
Age vs Class
■ First, let’s create a boxplot to visualize the
relationship between a passenger’s age and the
class they were traveling in:
sns.boxplot(data=df,x='Pclass', y='Age')

Taking a look at the boxplot above, notice that passengers
traveling first class were older than passengers in the second
and third classes. The median age of first-class passengers is
around 35, while it is around 30 for second-class passengers,
and 25 for third-class passengers.
This makes sense since older individuals are likely to have
accumulated a larger amount of wealth and can afford to travel
first class. Of course, there are exceptions, which is why you
can observe passengers above 70 in the second and third
classes – our outliers.
Price vs Life!
■ Moving on, let’s look into the relationship between a
passenger’s ticket fare and survival:

sns.barplot(data=df,x='Survived',y='Fare')

As expected, passengers with higher ticket fares had a higher
chance of survival.
This is because they could afford cabins closer to lifeboats,
which meant they could make it out on time.
By extension, this should also mean that the first-class
passengers had a higher likelihood of survival. Let’s confirm
this:

sns.barplot(data=df,x='Pclass',y='Survived')
Answering the questions via plots
■ Did a passenger’s age have any impact on what class they
traveled in? Yes, older passengers were more likely to travel
first class.
■ Were passengers who paid higher ticket fares in different
cabins as opposed to passengers who paid lower fares? Yes,
passengers who paid higher ticket fares seemed to mostly
travel in cabin B. However, the relationship between ticket fare
and cabin isn’t too clear because there were many missing
values in the ‘Cabin’ column. This might have compromised the quality
of the analysis.
■ Did ticket fare have any impact on a passenger’s survival? Yes,
first-class passengers were more likely to survive the collision.
The challenge
■ Use the Titanic passenger data (name, age, price
of ticket, etc.) to try to predict who will survive and
who will die.
■ Remember the goal: we want to find patterns
in train.csv that help us predict whether the
passengers in test.csv survived.
Exploratory Analysis - a possible pattern
(hypothesis)
train_data = pd.read_csv("train.csv")
train_data
women = train_data.loc[train_data.Sex == 'female']["Survived"]
rate_women = sum(women)/len(women)*100
print("% of women who survived:", rate_women)
men = train_data.loc[train_data.Sex == 'male']["Survived"]
rate_men = sum(men)/len(men)*100
print("% of men who survived:", rate_men)
Gender as a possible predictor of Survival
(b.t.w don’t forget to cast your vote!!)
■ From this you can see that almost 75% of the women on board
survived, whereas only 19% of the men lived to tell about it. Since
gender seems to be such a strong indicator of survival, it makes a
pretty good first predictor of survival.
■ Using this kind of logic, ML uses a model – in this case, it is
known as the random forest model.
■ This model is constructed of several "trees" that individually
consider each passenger's data and vote on whether the
individual survived. Then, the random forest model makes a
democratic decision: the outcome with the most votes wins
A 1st Encounter with a machine learning
(ML) model
■ We'll build what's known as a random forest
model.
Using 4 indicators (for now)
■ The code cell below looks for patterns in four
different columns ("Pclass", "Sex", "SibSp",
and "Parch") of the data.
■ It constructs the trees in the random forest
model based on patterns in the train.csv file,
before generating predictions for the passengers
in test.csv.
■ The code also saves these new predictions in a
CSV file submission.csv.
The ML Prediction Model
from sklearn.ensemble import RandomForestClassifier

test_data = pd.read_csv("test.csv")          # the passengers we want predictions for
y = train_data["Survived"]
features = ["Pclass", "Sex", "SibSp", "Parch"]
X = pd.get_dummies(train_data[features])     # one-hot encode the categorical features
X_test = pd.get_dummies(test_data[features])
model = RandomForestClassifier(n_estimators=100, max_depth=5, random_state=1)
model.fit(X, y)
predictions = model.predict(X_test)
output = pd.DataFrame({'Passenger Name': test_data.Name,
                       'PassengerId': test_data.PassengerId,
                       'Survived': predictions})
output.to_csv('submission.csv', index=False)
print("Your submission was successfully saved!")
df = pd.read_csv('submission.csv')
df
