0% found this document useful (0 votes)
2 views

Machine Learning with python(3&4)

The document discusses linear regression as a predictive analysis tool in data science, emphasizing the importance of model evaluation and out-of-sample accuracy. It also covers multiple linear regression, classification methods, and various algorithms like decision trees and k-nearest neighbors. Additionally, it highlights the challenges of overfitting and the significance of using appropriate metrics for model performance assessment.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2 views

Machine Learning with python(3&4)

The document discusses linear regression as a predictive analysis tool in data science, emphasizing the importance of model evaluation and out-of-sample accuracy. It also covers multiple linear regression, classification methods, and various algorithms like decision trees and k-nearest neighbors. Additionally, it highlights the challenges of overfitting and the significance of using appropriate metrics for model performance assessment.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 37

 Linear regression is an algorithm that provides a linear relationship between an

independent variable and a dependent variable to predict the outcome of future


events. It is a statistical method used in data science and machine learning for
predictive analysis.
Model evaluation aims to define how well the model performs its task. The model's
performance can vary both across use cases and within a single use case, e.g., by
defining different parameters for the algorithm or data selections. Accordingly, we
need to evaluate the model's accuracy at each training run.

Types of evaluations
 Train and test on same Dataset
 Train/Test Split
Regression metrics are quantitative measures used to evaluate the nice of a
regression model. Scikit-analyze provides several metrics, each with its strengths
and boundaries, to assess how well a model suits the statistics.
 High training accuracy isn't necessarily a good thing
 Result of over-fitting
• Over-fit: the model is overly trained to the dataset, which may capture noise
and produce a non-generalized model

 Our models must have a high, out-of-sample accuracy


 How can we improve out-of-sample accuracy?
Multiple linear regression is an extension of simple linear regression, where multiple
independent variables are used to predict the dependent variable.
Example
 Independent Variables Effectiveness on Prediction
 Does revision time, test anxiety, lecture attendance, and gender have any effect on the
exam performance of students?
 Predicting impacts of changes
 How much does blood pressure go up (or down) for every unit increase (or decrease) in
the BMI of a patient?
 • How to estimate?
Ordinary Least Squares
Linear algebra operations
Takes a long time for large datasets (10K+ rows)
 • An optimization algorithm
Gradient Descent
The proper approach if you have a very large dataset
Classification is a supervised machine learning method where the model tries to
predict the correct label of a given input data.
 A supervised learning approach
 Categorizing some unknown items into a discrete set of categories or "classes"
 The target attribute is a categorical variable
 Decision Trees (ID3, C4.5, C5.0)
 Naïve Bayes Linear Discriminant Analysis
 k-Nearest Neighbor
 Logistic Regression Neural Networks
 Support Vector Machines (SVM)
 A method for classifying cases based on their similarity to other
cases
 Cases that are near each other are said to be "neighbors“
 Based on similar cases with the same class labels are near each
other
 Pick a value for K.
 Calculate the distance of unknown cases from all cases.
 Select the K-observations in the training data that are "nearest"
to the unknown data point.
 Predict the response of the unknown data point using the most
popular response value from the K-nearest neighbors.
 A decision tree is a type of
supervised machine learning used
to categorize or make predictions
based on how a previous set of
questions were answered.
 Choose an attribute from your Drug A Drug B
dataset.
 Calculate the significance of the
attribute in splitting of data
 Split data based on the value of the
best attribute
 Go to step 1.
Entropy=? Entropy=?
“A baby learns to crawl, walk, and then
run.We are in the crawling stage when it
comes to applying machine learning”
Dave Waters

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy