
Unit 3

Data science components


Tools for data science
Explanation
1. SAS
SAS is a data science tool designed specifically for statistical operations. It is closed-source, proprietary software used by large organizations to analyze data.

2. Apache Spark
Apache Spark, or simply Spark, is a powerful analytics engine and one of the most widely used data science tools. Spark is designed to handle both batch processing and stream processing.

3. BigML
It provides a fully interactive, cloud-based GUI environment that you can use for running machine learning algorithms.

4. D3.js
JavaScript is mainly used as a client-side scripting language. D3.js, a JavaScript library, allows you to create interactive visualizations in your web browser.
5. MATLAB
MATLAB is a multi-paradigm numerical computing environment for processing mathematical information. It is closed-source software that facilitates matrix operations, algorithm implementation, and statistical modeling of data. MATLAB is widely used across scientific disciplines.

In Data Science, MATLAB is used for simulating neural networks and fuzzy logic. Using the MATLAB graphics library, you can create powerful visualizations.

6. Excel
Probably the most widely used data analysis tool. Microsoft developed Excel mainly for spreadsheet calculations; today it is widely used for data processing, visualization, and complex calculations.
7. ggplot2
ggplot2 is an advanced data visualization package for the R programming language. The developers created it to replace R's native graphics package, and it uses powerful commands to create elegant visualizations. It is the library Data Scientists most widely use for creating visualizations from analyzed data.
8. Tableau
Tableau is data visualization software packed with powerful graphics for making interactive visualizations. It is focused on industries working in the field of business intelligence. The most important aspect of Tableau is its ability to interface with databases, spreadsheets, OLAP (Online Analytical Processing) cubes, etc. Along with these features, Tableau can visualize geographical data by plotting longitudes and latitudes on maps.

9. Jupyter
Project Jupyter is an open-source tool based on IPython that helps developers build open-source software and experience interactive computing. Jupyter supports multiple languages such as Julia, Python, and R. It is a web-application tool used for writing live code, visualizations, and presentations. Jupyter is a widely popular tool designed to address the requirements of Data Science.
10. Matplotlib
Matplotlib is a plotting and visualization library developed for Python. It is the most popular tool for generating graphs from analyzed data, and it is mainly used for plotting complex graphs with simple lines of code. Using it, one can generate bar plots, histograms, scatterplots, etc. Matplotlib has several essential modules; one of the most widely used is pyplot, which offers a MATLAB-like interface. Pyplot is also an open-source alternative to MATLAB's graphics modules.

13. TensorFlow
TensorFlow has become a standard tool for Machine Learning. It is widely used for advanced machine learning techniques like Deep Learning. The developers named TensorFlow after tensors, which are multidimensional arrays. It is an open-source, ever-evolving toolkit known for its performance and high computational abilities.
Artificial intelligence (AI)
• Artificial intelligence (AI) is intelligence demonstrated by machines, unlike the natural intelligence displayed by humans and animals, which involves consciousness and emotionality.
• AI is the ability of a digital computer or computer-controlled robot to perform tasks commonly associated with intelligent beings.
• AI refers to the simulation of human intelligence in machines that are programmed to think like humans and mimic their actions.
Machine Learning
“Machine learning enables a machine to automatically learn from data, improve performance from experience, and predict things without being explicitly programmed.”

Machine Learning

Traditional Programs vs ML

Key differences between AI and ML

Approach

Applications

Types of machine learning (ML)
Supervised learning
• Supervised learning, as the name indicates, involves the presence of a supervisor acting as a teacher.
• Basically, supervised learning is learning in which we teach or train the machine using data that is well labeled, meaning some of the data is already tagged with the correct answer.
• After that, the machine is provided with a new set of examples (data) so that the supervised learning algorithm analyses the training data (set of training examples) and produces a correct outcome from the labeled data.
Unsupervised learning
• Unsupervised learning is the training of a machine using information that is neither classified nor labeled, allowing the algorithm to act on that information without guidance.
• Here the task of the machine is to group unsorted information according to similarities, patterns and differences without any prior training on the data.
• Unlike supervised learning, no teacher is provided, which means no training will be given to the machine.
• Therefore the machine is restricted to finding the hidden structure in unlabeled data by itself.
Semi-supervised learning & Reinforcement learning
• Semi-supervised learning falls between supervised and unsupervised learning.
• It uses both labelled and unlabelled data for training.
• Reinforcement learning trains an algorithm with a reward system, providing feedback when an artificial intelligence agent performs the best action in a particular situation.
• In reinforcement learning, AI agents attempt to find the optimal way to accomplish a particular goal, or improve performance on a specific task.
• As the agent takes actions that move toward the goal, it receives a reward.
Examples / Applications
Regression
• Regression analysis is a statistical method for modeling the relationship between a dependent (target) variable and one or more independent (predictor) variables.
• Regression is the process of finding the correlations between dependent and independent variables.
• It helps in predicting continuous variables, such as predicting market trends, house prices, etc.
ML Regression Algorithms
• Simple Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• Support Vector Regression
• Decision Tree Regression
• Random Forest Regression
Classification
• A classification algorithm is a supervised learning technique used to identify the category of new observations on the basis of training data.
• In classification, a program learns from the given dataset or observations and then classifies new observations into a number of classes or groups.
• Examples: Yes or No, 0 or 1, Spam or Not Spam, cat or dog, etc.
ML Classification Algorithms
• Logistic Regression
• K-Nearest Neighbours
• Support Vector Machines
• Kernel SVM
• Naïve Bayes
• Decision Tree Classification
• Random Forest Classification
Difference between Regression and Classification
Clustering
• Grouping similar data points together is called clustering.
• Clustering, or cluster analysis, is a machine learning technique which groups an unlabelled dataset.
Clustering Algorithms
• K-Means algorithm
• Agglomerative Hierarchical algorithm
• Mean-shift algorithm
• DBSCAN Algorithm (Density-Based Spatial Clustering of Applications with Noise)
• Expectation-Maximization (EM) Clustering using GMM (Gaussian Mixture Model)
Feature selection
• In machine learning and statistics, feature selection is also known as variable selection, attribute selection, or variable subset selection.
• It is the process of selecting a subset of relevant features (variables, predictors) for use in model construction.
• When the number of features is very large, you need not use every feature at your disposal for creating an algorithm.
• You can assist your algorithm by feeding in only those features that are really important.
Feature selection
• Machine learning works on a simple rule – if you put garbage in, you will only get garbage out (garbage = noise). “Sometimes, less is better!”
Top reasons to use feature selection are:
• It enables the machine learning algorithm to train faster.
• It reduces the complexity of a model and makes it easier to interpret.
• It improves the accuracy of a model if the right subset is chosen.
• It reduces overfitting.
ML Feature selection Algorithms
Filter Methods: In this method, the dataset is filtered, and a subset that contains only the relevant features is taken.
• Pearson's Correlation
• Linear Discriminant Analysis (LDA)
• ANOVA (Analysis of Variance)
• Chi-Square
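As an illustration of a filter method, Pearson's correlation between a candidate feature and the target can be computed directly. A minimal pure-Python sketch (the function name and sample data are illustrative, not from the slides):

```python
def pearson_correlation(xs, ys):
    # Pearson's r: covariance of x and y divided by the product
    # of their standard deviations; ranges from -1 to +1.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5
```

Features whose correlation with the target is near zero are candidates for removal.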
Wrapper Methods
• The wrapper method has the same goal as the filter method, but it uses a machine learning model for its evaluation. In this method, some features are fed to the ML model and the performance is evaluated. The performance decides whether to add or remove those features to increase the accuracy of the model. This method is more accurate than the filter method but more complex to work with.

• Forward Selection
ML Feature selection Algorithms
Embedded Methods
Embedded methods check the different training iterations of the machine learning model and evaluate the importance of each feature.
• Decision Tree
• ID3
• C4.5
• Classification And Regression Tree (CART)
Linear regression
• Linear regression is one of the easiest and most popular Machine Learning algorithms.
• It is a statistical method that is used for predictive analysis.
• Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.
• The linear regression algorithm shows a linear relationship between a dependent (y) variable and one or more independent (x) variables.
y = mx + c + ε
• y = dependent variable (target variable)
• x = independent variable (predictor variable)
• c = y-intercept of the line
• m = slope
• ε = error term
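The slope m and intercept c of the line above can be estimated from data by ordinary least squares. A minimal sketch (the function name and sample data are illustrative):

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = mx + c:
    # m = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2), c = y_bar - m*x_bar
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    m = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    c = mean_y - m * mean_x
    return m, c
```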
Logistic regression
• Logistic regression is one of the most popular Machine Learning algorithms; it comes under the supervised learning technique.
• It is used for predicting the categorical dependent variable using a given set of independent variables.
• The outcome must be a categorical or discrete value: Yes or No, 0 or 1, True or False, etc.
• However, instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
• Logistic regression is used for solving classification problems.
• In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).
• The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a dog is a puppy or not based on its weight, etc.
• Logistic regression is based on the concept of Maximum Likelihood Estimation.
• According to this estimation, the observed data should be most probable.
• Logistic regression can provide probabilities and classify new data using both continuous and discrete datasets.
Logistic Function (Sigmoid Function)
• The sigmoid function is a mathematical function used to map the predicted values to probabilities.
• It maps any real value into another value within a range of 0 and 1.
• The value of the logistic regression must be between 0 and 1, and cannot go beyond this limit, so it forms a curve like the "S" form.
• The S-form curve is called the sigmoid function or the logistic function.
• In logistic regression, we use the concept of a threshold value, which defines the probability of either 0 or 1.
• Values above the threshold tend to 1, and values below the threshold tend to 0.
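The sigmoid and the threshold rule described above can be sketched in a few lines of Python (a threshold of 0.5 is an assumption here, a common default):

```python
import math

def sigmoid(z):
    # Maps any real value into the range (0, 1), forming the "S" curve
    return 1.0 / (1.0 + math.exp(-z))

def classify(z, threshold=0.5):
    # Values above the threshold tend to class 1, below it to class 0
    return 1 if sigmoid(z) >= threshold else 0
```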
Linear vs Logistic Regression
Introducing the Gaussian
Carl Friedrich Gauss, ranked among history's most influential mathematicians, discovered the normal distribution. It is also called the Gaussian distribution.

It is often called the bell curve, because the graph of its probability density looks like a bell.

The normal distribution occurs naturally in many situations.

Examples: heights of people, measurement errors, blood pressure, test marks, IQ scores, salaries.
Gaussian Distribution
68% of the data falls within one standard deviation of the mean.
95% of the data falls within two standard deviations of the mean.
99.7% of the data falls within three standard deviations of the mean.
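These percentages follow from the normal CDF and can be verified with the standard-library error function (the helper name is illustrative):

```python
import math

def fraction_within(k):
    # Probability that a normal value lies within k standard
    # deviations of the mean: erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))
```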
Properties of normal distribution
- The mean, mode and median are all equal.
- The curve is symmetric at the center.
- Exactly half of the values are to the left of center and exactly half the values are to the right.
- The total area under the curve is 1.
Standard Deviation
▪ Standard deviation is a measure of the amount of variation.
▪ A low standard deviation indicates that the values tend to be close to the mean.
▪ A high standard deviation indicates that the values are spread out over a wider range.
Example 1: Mean, Variance, Standard Deviation

Example 2
Solution: Mean = 27, Variance = 24.86, SD = 4.96
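The quantities computed in the examples above can be expressed directly in Python; the sample data in the test is illustrative, not the data from the slides (which is shown only as a figure):

```python
def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # Population variance: average squared deviation from the mean
    mu = mean(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def std_dev(xs):
    # Standard deviation is the square root of the variance
    return variance(xs) ** 0.5
```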
Introduction to Standardization
Standardization is a scaling technique where the values are centered around the mean with a unit standard deviation.

It means that the mean of the attribute becomes 0 and the resulting distribution has a unit (1) standard deviation.

Standard scores are most commonly called z-scores.
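Standardization as described above (subtract the mean, divide by the standard deviation) in a minimal sketch:

```python
def standardize(xs):
    # z-score each value: the result has mean 0 and standard deviation 1
    n = len(xs)
    mu = sum(xs) / n
    sd = (sum((x - mu) ** 2 for x in xs) / n) ** 0.5
    return [(x - mu) / sd for x in xs]
```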
Standard Normal Probability Distribution in Excel

The NORMDIST function is available under Excel Statistical functions. It returns the normal distribution for a specified mean and standard deviation.

=NORMDIST(x, mean, standard_dev, cumulative)
1. X (required argument) – This is the value for which we wish to calculate the distribution.
2. Mean (required argument) – The arithmetic mean of the distribution.
3. Standard_dev (required argument) – This is the standard deviation of the distribution.
4. Cumulative (required argument) – This is a logical value. It specifies the type of distribution to be used: TRUE (Cumulative Normal Distribution Function) or FALSE (Normal Probability Density Function).
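A Python equivalent of the NORMDIST behavior described above can be written with only the standard library (the function name mirrors Excel's; this is a sketch, not Excel's implementation):

```python
import math

def normdist(x, mean, standard_dev, cumulative):
    z = (x - mean) / standard_dev
    if cumulative:
        # TRUE in Excel: cumulative normal distribution function
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))
    # FALSE in Excel: normal probability density function
    return math.exp(-0.5 * z * z) / (standard_dev * math.sqrt(2 * math.pi))
```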
Example
If we wish to calculate the probability density function for the data above, we use the NORMDIST formula with cumulative set to FALSE.
STANDARDIZE Z-Score Function

The STANDARDIZE function is available under Excel Statistical functions. It will return a normalized value (z-score) based on the mean and standard deviation.

=STANDARDIZE(x, mean, standard_dev)

The STANDARDIZE function uses the following arguments:
1. X (required argument) – This is the value that we want to normalize.
2. Mean (required argument) – The arithmetic mean of the distribution.
3. Standard_dev (required argument) – This is the standard deviation of the distribution.
Using z-Scores to find a Probability
Example:
The mean score for the population is 21, and the standard deviation is 5. How will you determine the probability that a score falls
- higher than 30
- between the range of 23 and 27
- between 15 and 20
- less than 20?
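The four probabilities in this example can be computed by converting each score to a z-score and applying the normal CDF. A sketch using only the standard library:

```python
import math

def norm_cdf(x, mean, sd):
    # P(X <= x) for a normal distribution, via the z-score
    z = (x - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean, sd = 21, 5
p_above_30 = 1 - norm_cdf(30, mean, sd)                       # z = 1.8
p_23_to_27 = norm_cdf(27, mean, sd) - norm_cdf(23, mean, sd)  # z = 0.4 to 1.2
p_15_to_20 = norm_cdf(20, mean, sd) - norm_cdf(15, mean, sd)  # z = -1.2 to -0.2
p_below_20 = norm_cdf(20, mean, sd)                           # z = -0.2
```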
Central Limit Theorem

The Central Limit Theorem states that the sampling distribution of the sample means approaches a normal distribution as the sample size gets larger.

“Given a dataset with unknown distribution (it could be uniform, binomial or completely random), the sample means will approximate the normal distribution.”

The Central Limit Theorem shows how the mean of a sample distribution approaches the normal distribution as the size of the sample gets larger.
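The theorem can be illustrated by simulation: draw repeated samples from a uniform distribution (which is far from normal) and observe that the sample means concentrate around the population mean. A sketch (the sample sizes and seed are arbitrary choices):

```python
import random
import statistics

random.seed(42)

# 2000 samples of size 50 from Uniform(0, 1)
sample_means = [
    statistics.mean(random.uniform(0, 1) for _ in range(50))
    for _ in range(2000)
]

# The sample means cluster around the population mean 0.5,
# with spread close to sigma / sqrt(n) = (1 / sqrt(12)) / sqrt(50)
center = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)
```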
Algebra with Gaussians
The Gauss elimination method is used to solve a system of linear equations.
Gaussian elimination is the name of the method to perform the three types of matrix row operations:
- Interchanging two rows
- Multiplying a row by a constant (any constant other than 0)
- Adding a multiple of one row to another row

This technique is also called row reduction, and it consists of two stages:
- Forward elimination
- Back substitution

The forward elimination step refers to the row reduction needed to simplify the matrix.
The back substitution step refers to substituting the values back to solve the equations.
Example
If we were to have a system of linear equations containing three equations for three unknowns:

Row reducing (applying the Gaussian elimination method to) the augmented matrix yields the solution.
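The two stages above (forward elimination, then back substitution) can be sketched in pure Python; the 3×3 system used in the test is an illustrative example, since the slide's own system is shown only as an image:

```python
def gauss_solve(a, b):
    # Solve A x = b by forward elimination followed by back substitution.
    n = len(a)
    # Build the augmented matrix [A | b]
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    # Forward elimination: reduce to upper-triangular form
    for col in range(n):
        # Partial pivoting: swap in the row with the largest pivot
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution: solve from the last row upward
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        x[row] = (m[row][n]
                  - sum(m[row][k] * x[k] for k in range(row + 1, n))) / m[row][row]
    return x
```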
Markowitz Portfolio Optimization
Terminologies
▪ Portfolio --- a collection of investments.
▪ Expected risk --- the total amount of money that can be lost.
▪ Expected return --- future income from invested capital.
▪ Portfolio effect --- a portfolio that will reduce the total risk of investment.
▪ Portfolio manager --- the manager of the investment portfolio.
▪ Efficient portfolio --- provides the lowest risk for a given expected return.
Markowitz Portfolio Optimization - Approach
▪ According to the theory, the effects of one security purchase over the effects of another security purchase are taken into consideration.
▪ The results are evaluated and are helpful for risk minimization.
Example
Security | Expected Return Ri % | Proportion Xi %
1 | 10 | 25
2 | 20 | 75

The return on the portfolio on combining the two securities will be:
Rp = R1X1 + R2X2
Rp = 0.10(0.25) + 0.20(0.75)
Rp = 17.5%
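The portfolio-return formula Rp = R1X1 + R2X2 generalizes to any number of securities; a minimal sketch reproducing the example:

```python
def portfolio_return(returns, weights):
    # Rp = sum of Ri * Xi over the securities in the portfolio
    return sum(r * w for r, w in zip(returns, weights))
```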
Advantages
▪ It is believed that holding multiple securities is less risky than having only one investment in a person's portfolio.
▪ When multiple stocks are taken in a portfolio and they have negative correlation, risk can be completely reduced because the gain on one can offset the loss on the other.
▪ The effect of multiple securities can also be studied when one security is more risky compared to the other security.
Standardizing x and y Coordinates for Linear Regression
• Standardize the set of coordinates that deviate from the normal range of values.
• Standardization results in the mean of all the coordinates becoming zero, with a unit standard deviation.

• Mean = 0
• SD = 1
Example 1
Coefficient of correlation formula
Regression equation
Example 2
Standardization Simplifies Linear Regression
To simplify fitting the equation of the line, the Residual Sum of Squares (SSR) should be minimized.
Residual Sum of Squares

SSR - Residual Sum of Squares

SSR = ∑(yi − ŷi)²
ŷ = mx + c
where m = slope and c = intercept
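The SSR above can be computed directly for a candidate line (m, c); names and sample data are illustrative:

```python
def ssr(xs, ys, m, c):
    # Residual Sum of Squares: squared gaps between actual y
    # and the line's prediction y_hat = m*x + c
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))
```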
Modeling Error in Linear Regression
▪ The coefficient of determination, or R², is a measure that provides information about the goodness of fit of a model.
▪ In the context of regression, it is a statistical measure of how well the regression line approximates the actual data.
▪ It is important when a statistical model is used either to predict future outcomes or in the testing of hypotheses.
R² Measure (coefficient of determination)
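R² compares the residual sum of squares to the total variation around the mean; a minimal sketch:

```python
def r_squared(ys, preds):
    # R^2 = 1 - SSR / SST; 1 means a perfect fit, 0 means the model
    # does no better than predicting the mean
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    ssr = sum((y - p) ** 2 for y, p in zip(ys, preds))
    return 1 - ssr / sst
```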
Information Gain from Linear Regression
▪ Information gain is calculated by comparing the entropy of the dataset before and after a transformation.
▪ The entropy of a random variable Y can be represented as H(Y), which tells us about the uncertainty of the random variable.
▪ Information gain provides a way to use entropy to calculate how a change to the dataset impacts the purity of the dataset.
▪ For example, we may wish to evaluate the impact on purity of splitting a dataset S by a random variable with a range of values; then
▪ IG(Y, X) = H(Y) – H(Y | X)
▪ IG(Y, X) is the information gain for the dataset Y for the variable X,
▪ H(Y) is the entropy for the dataset before any change, and
▪ H(Y | X) is the conditional entropy for the dataset given the variable X.
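Entropy and IG(Y, X) = H(Y) − H(Y | X) can be sketched for a categorical target; the split data in the test is illustrative:

```python
import math
from collections import Counter

def entropy(labels):
    # H(Y) = -sum over classes of p * log2(p)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    # IG(Y, X) = H(Y) - H(Y | X); groups is the partition of labels
    # induced by the values of X
    n = len(labels)
    h_y_given_x = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - h_y_given_x
```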
Thank You
