ML Notes

Machine learning, introduced by Arthur Samuel in 1959, is a subset of artificial intelligence that enables computers to learn from data without explicit programming. It encompasses various types such as supervised, unsupervised, and reinforcement learning, with applications in fields like healthcare, finance, and autonomous vehicles. The machine learning lifecycle includes data gathering, preparation, analysis, model training, testing, and deployment.

Uploaded by

ashisharma0507

MACHINE LEARNING

 Machine learning was introduced by Arthur Samuel in 1959.
What is Machine Learning?
 Machine learning is a type of artificial intelligence (AI) that allows computers to
learn and improve from data without being explicitly programmed.
 The sample historical data used for learning is known as training data.
 ML combines computer science and statistics.

• How it works
Machine learning uses algorithms to analyze data, identify patterns, and make
predictions. In general, the more relevant, good-quality data a machine learning
system is exposed to, the better it performs.

• Applications
Machine learning is used in many areas, including healthcare, entertainment,
online shopping, and smart homes. For example, a financial institution can use
machine learning to classify transactions as fraudulent or genuine.
FLOW CHART:
Features of Machine Learning:
• Machine learning uses data to detect various patterns in a given dataset.
• It can learn from past data and improve automatically.
• It is a data-driven technology
• Machine learning is similar to data mining, as it also deals with a huge amount of data.

Key Points:
• Rapid growth in the amount of data being produced
• Solving complex problems that are difficult for humans
• Decision-making in various sectors, including finance
• Finding hidden patterns and extracting useful information from data
Applications of Machine Learning:
1. Self-Driving Cars
2. Fraud Detection
3. Face Recognition
4. Stock Prediction

Classification of Machine Learning:


1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
SUPERVISED LEARNING:

• Supervised learning is a type of machine learning in which we provide sample
labeled data to the machine learning system in order to train it, and on that
basis, it predicts the output.

• The goal of supervised learning is to map input data to output data.

• Supervised learning is based on supervision, much as a student learns under the
supervision of a teacher. An example of supervised learning is spam filtering.

• It can be divided into two categories of algorithms:
1. Classification
2. Regression
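
The idea of learning from labeled examples can be sketched with a very small classifier. This is only an illustration, not a method taken from the slides: the feature values are made up, and the 1-nearest-neighbour rule is just one simple classification technique among many.

```python
import math

# Hypothetical labeled training data: (features, label).
# Features: [number of links, number of exclamation marks] in an email.
training_data = [
    ([0, 0], "genuine"),
    ([1, 0], "genuine"),
    ([0, 1], "genuine"),
    ([5, 4], "spam"),
    ([6, 3], "spam"),
    ([4, 5], "spam"),
]

def predict(features):
    """Return the label of the closest training example (1-nearest neighbour)."""
    _, nearest_label = min(
        training_data,
        key=lambda example: math.dist(example[0], features),
    )
    return nearest_label

print(predict([5, 5]))  # lies next to the spam examples
print(predict([1, 1]))  # lies next to the genuine examples
```

The labels in `training_data` play the role of the teacher: the system never sees a rule for spam, only examples that are already marked.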
UNSUPERVISED LEARNING:

• Unsupervised learning is a learning method in which a machine learns without
any supervision. The training data is not labeled, classified, or categorized,
and the algorithm has to act on that data without guidance.

• The goal of unsupervised learning is to restructure the input data into new
features or into groups of objects with similar patterns.

• It can be classified into two categories of algorithms:
1. Clustering
2. Association
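
Clustering can be illustrated with a minimal k-means sketch on one-dimensional data. The customer spend values and the starting centroids below are assumptions chosen for the example; no labels are given to the algorithm.

```python
def k_means_1d(values, centroids, iterations=10):
    """Group 1-D values around the given starting centroids."""
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach every value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical unlabeled data: monthly spend of six customers.
spend = [10, 12, 11, 95, 100, 105]
centroids, clusters = k_means_1d(spend, centroids=[0.0, 50.0])
print(centroids)
print(clusters)
```

The algorithm separates low spenders from high spenders purely from the structure of the data, which is the "group of objects with similar patterns" described above.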
REINFORCEMENT LEARNING:

• Reinforcement learning is a feedback-based learning method in which a learning
agent gets a reward for each right action and a penalty for each wrong action.
The agent learns automatically from this feedback and improves its performance.
In reinforcement learning, the agent interacts with the environment and
explores it.

• The goal of the agent is to collect the most reward points, and hence it
improves its performance.
Ex: Robotic dog
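
The reward-and-penalty loop can be sketched for the robotic dog as follows. Everything here is an assumption made for illustration: the two actions, the reward values, and the epsilon-greedy update rule, which is one common way to balance exploring and exploiting.

```python
import random

# Hypothetical environment: the trainer says "sit".
# Sitting earns a reward, barking earns a penalty.
REWARDS = {"bark": -1.0, "sit": 1.0}

def train_agent(steps=200, epsilon=0.2, learning_rate=0.1, seed=0):
    rng = random.Random(seed)
    q = {action: 0.0 for action in REWARDS}  # estimated value of each action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(list(q))     # explore: try a random action
        else:
            action = max(q, key=q.get)       # exploit: best action so far
        reward = REWARDS[action]             # feedback from the environment
        # Move the estimate a small step towards the observed reward.
        q[action] += learning_rate * (reward - q[action])
    return q

q_values = train_agent()
best_action = max(q_values, key=q_values.get)
print(q_values, best_action)
```

After training, the estimated value of sitting is higher than that of barking, so the agent prefers the rewarded action.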
Thank you
APPLICATIONS:
• Image Recognition
• Speech Recognition
• Traffic Prediction
• Product Recommendations
• Self-Driving Cars
• Email Spam and Malware Filtering
• Virtual Personal Assistant
• Online Fraud Detection
• Stock Market Trading
• Medical Diagnosis
• Automatic Language Translation
IMAGE RECOGNITION:
• Image recognition in machine
learning involves using
algorithms to identify and
classify objects, patterns, or
features within an image. It is a
subset of computer vision,
which aims to enable computers
to interpret and make sense of
the visual world.

• Ex: Social media, face locks, etc.
SPEECH RECOGNITION:
• Speech recognition is the process
of converting spoken language into
text. This technology enables
machines to understand and
respond to human speech, and it
has numerous applications such as
voice-activated assistants,
transcription services, and
accessibility tools.
• Ex: Siri, Alexa
TRAFFIC PREDICTION:
• Traffic prediction is a process of estimating traffic
conditions, such as congestion levels, travel times, or
traffic volumes, for a specific road network. It is
widely used in smart transportation systems,
navigation applications, and urban planning.

• Ex: Google Maps.


PRODUCT RECOMMENDATION:
• Product recommendation
systems are a cornerstone of e-
commerce, streaming platforms,
and digital marketing. These
systems analyze user behavior,
preferences, and item
characteristics to suggest
products that users are likely to
find interesting or useful.

• Ex: Amazon, Myntra.


SELF-DRIVING CARS:
• Self-driving cars, also known as
autonomous vehicles (AVs), are a
transformative transportation
innovation that aims to navigate and
operate without human intervention.
These vehicles use a combination of
sensors, algorithms, machine
learning, and advanced hardware to
perceive their environment and
make driving decisions. Their
perception models are typically
trained with supervised learning on
labeled sensor data, and
unsupervised and reinforcement
learning are also used.
• Ex: Tesla
EMAIL SPAM AND MALWARE FILTERING:
• Email spam and malware filtering is a
crucial application of machine learning
and cybersecurity that aims to protect
users from unsolicited emails, phishing
attacks, and malware. These systems
use advanced algorithms to identify
and block harmful emails before they
reach the user.

• Commonly used algorithms:
1. Multilayer Perceptron
2. Decision Tree
3. Naïve Bayes classifier
VIRTUAL PERSONAL ASSISTANT:
• Virtual Personal Assistants
(VPAs) are AI-powered
applications designed to
assist users in managing
tasks, accessing information,
and controlling smart
devices. They utilize natural
language processing (NLP),
machine learning, and
automation to provide
human-like interaction and
support.
ONLINE FRAUD DETECTION:
• Online fraud detection uses
advanced data analysis,
machine learning, and pattern
recognition to identify and
prevent fraudulent activities in
real time. Fraudulent activities
can include identity theft,
financial fraud, phishing, and
payment fraud in e-commerce,
banking, and other digital
platforms.
• Algorithms used:
Feed-forward neural network
STOCK MARKET TRADING:
• Stock market trading involves
buying and selling financial
securities, such as stocks,
bonds, and derivatives.
Machine learning is widely
used in stock market analysis
to predict prices, optimize
trading strategies, and
manage risks.
MEDICAL DIAGNOSIS:
• Machine learning (ML) plays
a transformative role in
medical diagnosis by
analyzing complex medical
data to assist in detecting,
diagnosing, and managing
diseases. These models can
process large datasets,
identify patterns, and provide
insights that support clinical
decision-making.
AUTOMATIC LANGUAGE TRANSLATION:
• Automatic Language
Translation is the process of
translating text or speech
from one language to
another using computational
models. Machine learning
(ML) has significantly
improved translation quality,
making it more accurate and
scalable.
LIFE CYCLE: Gathering Data → Data Preparation → Data Wrangling → Analyse Data →
Train the Model → Test the Model → Deployment
1. Gathering Data

This is the first and most crucial step in the machine learning lifecycle. Data is the
backbone of any machine learning model, and gathering relevant, accurate, and high-
quality data is vital to the success of the project.

Key Points:
• Sources of Data: Data can come from multiple sources such as databases, APIs, files
(CSV, Excel), or online platforms (social media, websites).
• Types of Data: Depending on the problem, the data can be structured (tables,
spreadsheets), unstructured (text, images, videos), or semi-structured (JSON, XML).
• Data Relevance: The data gathered should be relevant to the problem you're trying to
solve. It needs to represent the patterns or information you want the model to learn.
• Data Quantity: More data is generally better, but quality is more important than
quantity. The data should be sufficiently representative of the problem domain.
Example: If you are building a recommendation system, you might collect data on user
behavior, product interactions, and preferences.
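
As a rough sketch of gathering data from more than one source, the snippet below reads a structured CSV export and a semi-structured JSON export and combines them per user. The column names, values, and the in-memory strings standing in for real files or API responses are all assumptions made for the example.

```python
import csv
import io
import json

# Hypothetical raw exports; in practice these would be files or API responses.
csv_export = """user_id,product_id,rating
1,101,5
2,101,3
1,205,4
"""
json_export = """[
  {"user_id": 1, "preferred_category": "books"},
  {"user_id": 2, "preferred_category": "shoes"}
]"""

# Structured data: each CSV row becomes a dict keyed by column name.
interactions = list(csv.DictReader(io.StringIO(csv_export)))

# Semi-structured data: JSON maps directly onto lists and dicts.
preferences = json.loads(json_export)

# Combine both sources into one record per user.
users = {
    p["user_id"]: {"category": p["preferred_category"], "ratings": []}
    for p in preferences
}
for row in interactions:
    users[int(row["user_id"])]["ratings"].append(int(row["rating"]))

print(users)
```

Note that the CSV reader returns every field as text, so the ratings and user IDs are converted to integers before use.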
2. Data Preparation
Once the data is gathered, it must be prepared for use in training the machine learning
model. This involves cleaning and transforming the data into a form that the model can use
effectively.

Key Points:
• Data Cleaning: This step involves handling missing values, dealing with duplicates,
correcting errors, and managing inconsistencies in the data. For example, if some rows
have missing values, you might fill them with the mean value or remove those rows
entirely.
• Data Transformation: This involves converting raw data into a form that can be easily
processed by machine learning algorithms. For example, you might normalize or scale
features to ensure that variables have comparable ranges, or you might encode
categorical data into numeric values.
• Data Splitting: Typically, data is split into three sets:
• Training Set: Used to train the model.
• Validation Set: Used to tune the model’s hyperparameters.
• Test Set: Used to evaluate the model’s performance after training.
Example: If you have a dataset with different scales (e.g., income in dollars and age in
years), you might scale the features using standardization techniques.
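
The three preparation steps above (cleaning, transformation, splitting) can be sketched in plain Python. The income and age values are made up, and the 70/15/15 split ratio is an assumption; the slides do not prescribe one.

```python
import random
import statistics

# Hypothetical rows: (income in dollars, age in years); None marks a missing value.
rows = [(52000, 34), (None, 45), (61000, 29), (48000, None), (75000, 52),
        (39000, 41), (58000, 38), (66000, 47), (45000, 26), (70000, 55)]

def fill_missing(column):
    """Data cleaning: replace missing entries with the mean of the observed ones."""
    mean = statistics.mean(v for v in column if v is not None)
    return [mean if v is None else v for v in column]

def standardize(column):
    """Data transformation: rescale to zero mean and unit standard deviation."""
    mean, std = statistics.mean(column), statistics.pstdev(column)
    return [(v - mean) / std for v in column]

incomes, ages = zip(*rows)
features = list(zip(standardize(fill_missing(incomes)),
                    standardize(fill_missing(ages))))

# Data splitting: shuffle with a fixed seed, then cut into 70% / 15% / 15%.
random.Random(42).shuffle(features)
n_train = len(features) * 70 // 100
n_val = len(features) * 15 // 100
train = features[:n_train]
validation = features[n_train:n_train + n_val]
test = features[n_train + n_val:]

print(len(train), len(validation), len(test))
```

For brevity the sketch scales the whole dataset before splitting. In a real project the mean and standard deviation would usually be computed on the training set only and then reused for the validation and test sets, so that no information leaks from the held-out data.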
3. Data Wrangling

Data wrangling (also known as data munging) is a part of data preparation but deserves
special attention. It refers to the process of cleaning, transforming, and mapping raw data
into a more suitable format for analysis.

Key Points:
• Handling Missing Data: You might impute missing values, drop rows or columns with
excessive missing data, or use predictive models to estimate missing values.
• Outlier Detection and Removal: Outliers are extreme values that deviate significantly
from the rest of the data. Identifying and handling them is crucial, as they can skew the
results.
• Data Merging: Often, data comes from different sources, and combining these datasets
(merging or joining) is needed to create a comprehensive dataset for analysis.
• Normalizing/Scaling: Many algorithms (like distance-based algorithms) require
normalized or standardized data, so the features can be compared on the same scale.
Example: In a financial dataset, you might remove rows where transaction values are
excessively high (outliers) that could distort trends.
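
One common way to flag outliers is the interquartile range (IQR) rule, sketched below. The transaction amounts and the multiplier 1.5 are assumptions for illustration; the slides do not name a specific outlier method.

```python
import statistics

# Hypothetical transaction amounts; 9800.0 is a suspiciously large value.
transactions = [120.0, 95.5, 130.0, 110.0, 9800.0, 105.0, 99.0, 125.0]

def remove_outliers(values, k=1.5):
    """Keep values within k * IQR of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]

cleaned = remove_outliers(transactions)
print(cleaned)
```

Whether an extreme value should be dropped, capped, or kept depends on the problem: in fraud detection, for instance, the unusual transactions may be exactly what the model needs to see.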
4. Analyze Data

After the data has been prepared, it’s time to explore and analyze it. This step involves
understanding the relationships between different features and identifying patterns that
might be useful for building a model.

Key Points:
• Exploratory Data Analysis (EDA): EDA is used to summarize the main characteristics of
the dataset, often with visual methods. This step helps identify trends, patterns,
correlations, and anomalies.
• Use histograms, boxplots, and scatter plots to understand the distribution of variables
and the relationships between them.
• Correlation matrices can help identify how different features are related to each other.
• Identifying Key Features: Through EDA, you may identify which features are most
influential in predicting the target variable. For example, in a house price prediction
problem, square footage and number of bedrooms might be more influential than the color
of the house.
Example: If you predict customer churn, analyzing which demographic or behavior-related
features are most associated with churn can guide the selection process.
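
A correlation check of the kind described above can be sketched by hand. The house figures below are invented, and the numeric "paint_shade" code is only a stand-in for a feature that should carry little information about price.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical house data.
houses = {
    "square_feet": [850, 1200, 1500, 1800, 2400],
    "bedrooms": [2, 3, 3, 4, 5],
    "paint_shade": [3, 1, 4, 1, 2],  # arbitrary colour code
}
price = [100, 150, 180, 210, 290]  # in thousands

correlations = {name: pearson(values, price) for name, values in houses.items()}

# Rank features by the strength of their relationship with the target.
ranked = sorted(correlations, key=lambda name: abs(correlations[name]), reverse=True)
print(ranked)
```

Correlation only captures linear relationships, so it is a first screening step rather than a final verdict on which features matter.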
5. Train the Model

Training a model is the step where the actual machine-learning algorithm is applied to the
data. The goal is to enable the model to learn patterns and relationships in the data so it can
make accurate predictions.

Key Points:
• Model Selection: Choose an appropriate machine learning algorithm based on the nature
of the problem (e.g., classification, regression, clustering).
• Examples: Decision Trees, Logistic Regression, Random Forest, Support Vector
Machines, and Neural Networks.
• Hyperparameter Tuning: Hyperparameters are settings or configurations that control the
model's training process (e.g., learning rate, number of layers in a neural network). Fine-
tuning these hyperparameters is crucial for getting the best performance.
• Training Process: The model is trained by feeding the training data and letting it adjust
internal parameters to minimize errors or loss. The training process may take some time
depending on the size of the dataset and the complexity of the model.
Example: In a classification problem, you may train a support vector machine (SVM) to
predict whether an email is spam or not based on features like word frequency and sender
details.
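
To show what "adjusting internal parameters to minimize loss" looks like, here is a minimal training loop. It deliberately swaps the SVM from the example for a simpler technique, a one-feature linear model fitted by gradient descent on mean squared error. The data, learning rate, and epoch count are assumptions.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # internal parameters, adjusted during training
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Step against the gradient; the step size is a hyperparameter.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical training data generated from y = 2x + 1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
w, b = train_linear_model(xs, ys)
print(round(w, 3), round(b, 3))
```

Here `w` and `b` are the parameters the model learns, while `learning_rate` and `epochs` are hyperparameters chosen before training. A learning rate that is too large makes this loop diverge instead of converge.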
6. Test the Model

After training the model, it's time to evaluate its performance using the test dataset. This step
helps assess how well the model generalizes to unseen data.

Key Points:
• Model Evaluation Metrics: The choice of evaluation metrics depends on the type of
model and the problem you're solving.
• Classification: Accuracy, Precision, Recall, F1-Score, ROC-AUC.
• Regression: Mean Squared Error (MSE), Mean Absolute Error (MAE), R-squared.
• Overfitting/Underfitting: A model that performs well on the training data but poorly on the
test data is overfitting. A model that performs poorly on both the training and test data is
underfitting. Fine-tuning the model and selecting the right features can help avoid these
issues.
Example: You might test a classification model to see how accurately it classifies emails as
spam or non-spam, calculating precision and recall to ensure it's not falsely labeling non-
spam as spam.
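
The classification metrics listed above can be computed directly from the true and predicted labels. The label lists below are made up for the example.

```python
def classification_report(y_true, y_pred, positive="spam"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical test-set labels and model predictions.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]
report = classification_report(y_true, y_pred)
print(report)
```

Precision answers "of the emails flagged as spam, how many really were spam?", which is the metric to watch when falsely labeling genuine email is costly. Recall answers "of the real spam, how much was caught?".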
7. Deployment

Once the model has been trained and tested, the final step is deployment—putting the
model into production so it can be used to make predictions on new data.

Key Points:
• Integration: The model is integrated into an existing system or application. For example,
an e-commerce site might integrate a recommendation model into its website to suggest
products to users.
• Monitoring: Once deployed, the model’s performance should be monitored regularly to
ensure that it continues to perform well. This may involve setting up dashboards or logs to
track model outputs.
• Model Updates: In many cases, models degrade over time as the data they see in
production drifts away from the data they were trained on. It’s important to retrain
the model periodically with fresh data and potentially update it to improve accuracy.
Example: After training a model to predict customer churn, you deploy it as an API in a
customer relationship management (CRM) tool to predict which customers are likely to
churn, helping businesses take proactive measures.
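
A deployment step can be sketched as saving the trained parameters, loading them on the serving side, and logging each prediction for monitoring. The churn model below is hypothetical: its weights, feature names, and threshold are invented, and a real deployment would wrap `predict_churn` in a web API rather than call it directly.

```python
import json

# Hypothetical trained churn model: weights of a simple linear scoring rule.
trained_model = {
    "weights": {"months_inactive": 0.3, "support_tickets": 0.1},
    "bias": -0.5,
    "threshold": 0.5,
}

# Training side: serialize the model so it can be shipped to production.
model_blob = json.dumps(trained_model)

# Serving side: load the model once and keep a log for monitoring.
model = json.loads(model_blob)
prediction_log = []

def predict_churn(customer):
    """Score one customer and record the result for later monitoring."""
    score = model["bias"] + sum(
        weight * customer[name] for name, weight in model["weights"].items()
    )
    will_churn = score >= model["threshold"]
    prediction_log.append({"customer": customer, "score": score, "churn": will_churn})
    return will_churn

print(predict_churn({"months_inactive": 4, "support_tickets": 3}))
print(predict_churn({"months_inactive": 0, "support_tickets": 1}))
```

The `prediction_log` list stands in for the dashboards or logs mentioned under Monitoring. Comparing logged scores over time is one way to notice that a model needs retraining.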
Thank you
RETRIEVING DATASETS
What is a dataset?

A dataset is a collection of data arranged in some order. A dataset can contain
anything from a simple array to a database table.

1. Each variable is a field (column), and the acquired data is recorded under
that field.
2. Tabular data is commonly saved as a comma-separated values (CSV) file, while
nested (tree-like) data is often stored in JSON files.
Below is the example of the
dataset:
TYPES OF DATASETS:
• Numerical data: Such as house price, temperature, etc.
• Categorical data: Such as Yes/No, True/False, Blue/Green, etc.
• Ordinal data: These data are similar to categorical data but can be measured on
the basis of comparison.
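
The three data types are usually turned into numbers in different ways before training. The sketch below shows one possible encoding for each; the record fields, the colour list, and the condition levels are assumptions made for illustration.

```python
# Hypothetical house records mixing the three data types.
records = [
    {"price": 250000.0, "colour": "Blue", "condition": "good"},
    {"price": 180000.0, "colour": "Green", "condition": "poor"},
    {"price": 210000.0, "colour": "Blue", "condition": "fair"},
]

COLOURS = ["Blue", "Green"]                         # categorical: no order
CONDITION_RANK = {"poor": 0, "fair": 1, "good": 2}  # ordinal: order matters

def encode(record):
    """Turn one record into a flat list of numbers."""
    one_hot = [1 if record["colour"] == c else 0 for c in COLOURS]
    return [
        record["price"],                      # numerical: used as-is
        *one_hot,                             # categorical: one flag per category
        CONDITION_RANK[record["condition"]],  # ordinal: rank preserves the order
    ]

encoded = [encode(r) for r in records]
print(encoded)
```

One-hot flags avoid implying that Blue is "less than" Green, while the rank for the ordinal field keeps the comparison between poor, fair, and good.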
• NEED FOR DATASETS:

• To train and evaluate a model, a dataset is divided into two parts:
1. Training dataset
2. Testing dataset

FLOW CHART OF DATASET


FREE SOURCES AND SERVICES PROVIDING DATASETS:

1. Kaggle Datasets: https://www.kaggle.com/datasets
2. UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/index.php
3. Datasets via AWS: https://registry.opendata.aws/
4. Google's Dataset Search Engine: https://toolbox.google.com/datasetsearch
5. Microsoft Datasets: https://msropendata.com/
6. Awesome Public Dataset Collection: https://github.com/awesomedata/awesom....
7. Government Datasets: https://data.gov.in/
8. Computer Vision Datasets: https://www.visualdata.io/
9. Scikit-learn Datasets: https://scikit-learn.org/stable/datas....
Thank you
