Unit 3 Big Data

Big data chapter 3 bba ca

Uploaded by

vramoshi72

Introduction to Machine Learning
Assistant Professor – Pratiksha Kadam
S.N.B.P College of Arts, Commerce and Science,
Morwadi, Pimpri, Pune.
Syllabus

3.1 Basics Of Machine Learning.

3.2 Supervised Machine Learning
3.2.1 K-Nearest Neighbours
3.2.2 Naïve Bayes
3.2.3 Decision Tree
3.2.4 Support Vector Machines
Machine Learning
What is machine learning? (2M) *2 times
Explain Machine learning.(4M) *2 times
• Machine Learning is a technology used to train machines to perform actions such as
prediction, recommendation and estimation, based on historical data or past experience.
• Machine learning enables a machine to learn automatically from data, improve its
performance with experience, and make predictions, without being explicitly programmed.
• Machine Learning lets computers imitate the way human beings learn from experience
by training them on past (historical) data.
There are three key aspects of Machine Learning, which are as follows:
• Task: The main problem we are interested in solving, such as a prediction,
recommendation or estimation problem.
• Experience: Learning from historical or past data, which is then used to
estimate and resolve future tasks.
• Performance: The capacity of a machine to resolve a machine learning task or
problem and provide the best outcome for it. Performance depends on the type
of machine learning problem.
• Machine learning can be divided into three types: Supervised Machine
Learning, Unsupervised Machine Learning, and Reinforcement Learning.
• NOTE: Include applications and types of Machine Learning.
• Write any two needs of Machine Learning. (2M)
• Give advantages of Machine Learning (4M)
• Give Disadvantages of Machine Learning.
Applications of Machine Learning
Supervised Learning
In supervised learning, the machine is trained on a set of labeled data. Labeled data is
data that has been tagged with a correct answer or classification. The machine then
learns to predict the output for new input data. Supervised learning is often used for
tasks such as classification and regression.
Key Points:
• Supervised learning involves training a machine from labeled data.
• Labeled data consists of examples with the correct answer or classification.
• The machine learns the relationship between inputs and outputs.
• The trained machine can then make predictions on new, unlabeled data.
(Example: images of cats and dogs, each with a label.)
Classification
Classification is a type of supervised learning that is used to predict categorical
values, such as whether an email is spam or not, or whether it will rain or not.
The output variable of classification is a category, not a numeric value.
Some common classification algorithms are:
i) K-Nearest Neighbors
ii) Decision Tree
iii) Naïve Bayes
iv) Support Vector Machines
K-Nearest Neighbors
What is KNN? (2M)
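KNN is known as a "lazy learner" algorithm because it simply stores the training data and
does no work until a new point has to be classified. It is mostly used to classify a new
data point into one of two or more categories, based on the categories of its nearest neighbours.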

Let us consider the example for selecting the size of jersey of the new player in the team
based on the previous jersey data.

(Figure: scatter plots of the height of the player against the weight of the player, with points marked S-size and L-size.)
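The jersey example can be sketched in a few lines of Python. This is only an illustration: the height and weight values, the function name knn_predict and the choice k = 3 are assumptions made here, not data from the slides.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points closest to query.

    train is a list of ((height_cm, weight_kg), size_label) pairs.
    """
    # Sort the stored players by Euclidean distance to the new player.
    neighbours = sorted(train, key=lambda item: dist(item[0], query))[:k]
    # Majority vote among the k nearest neighbours.
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Made-up previous jersey data: (height in cm, weight in kg) -> size.
players = [
    ((160, 55), "S"), ((163, 58), "S"), ((165, 60), "S"),
    ((178, 80), "L"), ((182, 85), "L"), ((185, 90), "L"),
]

print(knn_predict(players, (180, 82)))  # new tall, heavy player
print(knn_predict(players, (162, 57)))  # new short, light player
```

Note that no training step appears: all the work happens inside knn_predict, which is why KNN is called a lazy learner.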

Decision Tree
Explain Decision tree with example. (4M) 2 times
• Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal nodes represent
the features of a dataset, branches represent the decision rules and each leaf node
represents the outcome.
• In a Decision tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make a decision and have multiple branches,
whereas Leaf nodes are the outputs of those decisions and do not contain any further
branches.
• Decision Trees mimic the way humans think while making a decision, so they are
easy to understand.
• A decision tree simply asks a question and, based on the answer (Yes/No),
further splits the tree into subtrees.
(Diagram: general structure of a decision tree.)
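The question-and-answer structure can be sketched as a small hand-built tree. The questions, the outcomes and the function name classify are hypothetical, chosen only to show decision nodes and leaf nodes; a real tree would be learned from data.

```python
# Decision nodes hold a yes/no question and two branches; leaf nodes hold an outcome.
tree = {
    "question": "is_raining",
    "yes": {"leaf": "stay home"},
    "no": {
        "question": "is_weekend",
        "yes": {"leaf": "play cricket"},
        "no": {"leaf": "go to college"},
    },
}

def classify(node, sample):
    """Walk from the root to a leaf, following the answer to each question."""
    while "leaf" not in node:
        answer = sample[node["question"]]
        node = node["yes"] if answer else node["no"]
    return node["leaf"]

print(classify(tree, {"is_raining": False, "is_weekend": True}))
```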
Naïve Bayes
How Naive Bayes algorithm works.(2M) 2 times
Explain Naive Bayes with the help of example. (4M) 2 times
Example of Bayes’ Theorem
Working of Naïve Bayes
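Bayes' theorem states that P(A|B) = P(B|A) × P(A) / P(B). Naïve Bayes applies it to classification by assuming that the features are independent of each other given the class, so the score of a class is its prior probability multiplied by the probability of each feature value within that class. A minimal sketch follows; the weather data, the function names and the use of add-one (Laplace) smoothing are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count how often each class and each (class, feature value) pair occurs."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature position
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values

def predict_nb(model, row):
    """Pick the class with the largest prior x product of feature likelihoods."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, value in enumerate(row):
            # P(value | class) with add-one smoothing to avoid zero probabilities.
            score *= (feature_counts[label][i][value] + 1) / (n + len(values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up data: (outlook, temperature) -> play outside?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"),
        ("rainy", "cool"), ("overcast", "hot"), ("overcast", "cool")]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

model = train_nb(rows, labels)
print(predict_nb(model, ("sunny", "hot")))
```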
Support Vector Machines
Explain support vector machine with example. (4M) 2 times
Define SVM? (2M)
• Support Vector Machine(SVM) is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is used for
Classification problems in Machine Learning.
• The goal of the SVM algorithm is to create the best decision boundary that can separate n-
dimensional space into classes so that we can easily put the new data point in the correct category
in the future.
• This best decision boundary is called a hyperplane.
• SVM chooses the extreme points/vectors that help in creating the hyperplane. These extreme cases
are called support vectors, and hence the algorithm is termed Support Vector Machine.
Support Vector Machine
Example: Suppose we see a strange cat that also has some features of a dog, and we want a model that can
accurately identify whether it is a cat or a dog. Such a model can be created using the SVM algorithm.
We first train the model with many images of cats and dogs so that it learns the different features
of cats and dogs, and then we test it with this strange creature. The SVM creates a decision
boundary between the two classes (cat and dog) using the extreme cases (support vectors) of each class.
On the basis of these support vectors, it classifies the new creature as a cat.
(Diagram: SVM classification of the cat/dog example.)
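The idea of learning a separating boundary can be sketched with a very small linear SVM. This is a rough illustration only: it trains by sub-gradient descent on the hinge loss, which is one simple way to fit a linear SVM, and the 2-D points, the learning rate, the regularisation value and the function names are all assumptions made here.

```python
def train_linear_svm(points, labels, lr=0.01, lam=0.01, epochs=500):
    """Fit w and b of the boundary w.x + b = 0; labels must be -1 or +1."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:
                # Point is misclassified or inside the margin: move the boundary.
                w = [wi + lr * (y * xi - lam * wi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Point is safely classified: only shrink w to keep the margin wide.
                w = [wi - lr * lam * wi for wi in w]
    return w, b

def predict(w, b, x):
    """Return +1 or -1 depending on the side of the boundary x falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Made-up 2-D feature vectors: -1 stands for dog, +1 stands for cat.
points = [(1, 1), (2, 1), (1, 2), (4, 4), (5, 4), (4, 5)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
print(predict(w, b, (6, 6)), predict(w, b, (0, 0)))
```

Only the points closest to the boundary keep triggering updates late in training; these play the role of the support vectors.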
Advantages of SVM algorithm. (4M)
State advantages and disadvantages of SVM. (4M)
Regression
Define Regression Analysis(2M) OR What is regression? Explain
with its type. (4M)
Regression analysis is a set of statistical methods used to estimate the
relationship between a dependent variable and one or more independent variables.
Regression analysis helps us to understand how the value of the dependent variable
changes when an independent variable changes.
It is mainly used for prediction, forecasting, time series modelling, and
determining the cause-and-effect relationship between variables.
1. Simple Regression analysis: It helps you estimate the relationship between a dependent variable and
one independent variable. For example, how much money someone earns based on their level of
education.
2. Multiple Regression analysis: It helps you determine the relationship between a dependent variable
and more than one independent variable. For example, how salary depends on education,
experience and proximity to a metropolitan area.
An example of linear regression is predicting salary from years of experience.
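Simple linear regression on experience and salary can be sketched with the least-squares formulas: slope = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and intercept = ȳ − slope × x̄. The figures below are made up for illustration, and fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up data: years of experience and salary (in thousands).
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]

slope, intercept = fit_line(experience, salary)
predicted = slope * 6 + intercept  # estimated salary for 6 years of experience
print(slope, intercept, predicted)
```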
Explain types of regression models.(4M)
Unsupervised learning
In unsupervised learning, the machine is trained on a set of unlabeled data, which means that the input
data is not paired with the desired output. The machine then learns to find patterns and relationships in
the data. Unsupervised learning is often used for tasks such as clustering, dimensionality reduction etc.
Key Points
• Unsupervised learning allows the model to discover patterns and relationships in unlabeled data.
• Clustering algorithms group similar data points together based on their inherent characteristics.
• Feature extraction captures essential information from the data, enabling the model to make meaningful
distinctions.
• Label association assigns categories to the clusters based on the extracted patterns and characteristics.
(Examples: images of cats and dogs without labels; automatic photo tagging on Facebook.)
Clustering
Explain cluster analysis with its types. (4M) 2 times
• A cluster is nothing but a collection of similar data which is grouped together.
• Clustering is the task of dividing the population or data points into a number of groups such that
data points in the same group are more similar to each other than to data points in
other groups.
• A clustering problem is where you want to discover the inherent groupings in the data without
labelling it.
• This process is often used for exploratory data analysis and can help identify patterns or
relationships within the data that may not be immediately obvious.
• It is used to group similar data points together.
• For example, in a grocery shop the clusters can be frequent customers, rare customers, etc.
Types of cluster analysis.
1. Hierarchical clustering
In the agglomerative method, each object starts as its own cluster. The two most similar (closest)
clusters are then merged into one, and this process is repeated until all objects are in a single
cluster.
The divisive method is the other kind of hierarchical method: clustering starts with the
complete data set as one cluster and repeatedly divides it into smaller partitions.
Hierarchical clustering is represented by a tree diagram called a dendrogram.
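The agglomerative method can be sketched as below. The sketch measures the distance between two clusters by their closest pair of points (single linkage), which is one common choice and an assumption here; the points and the function name are also made up. The list of merges it records is the information a dendrogram displays.

```python
from math import dist

def agglomerative(points, target_clusters=1):
    """Repeatedly merge the two closest clusters until target_clusters remain."""
    clusters = [[p] for p in points]  # every point starts as its own cluster
    merges = []                       # (cluster_a, cluster_b, distance) per merge
    while len(clusters) > target_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters, merges

points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
clusters, merges = agglomerative(points, target_clusters=3)
print(clusters)
```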
2. Partition clustering
• It is a type of clustering that divides the data into non-hierarchical groups. It is also
known as the centroid-based method. The most common example of partitioning
clustering is the K-Means Clustering algorithm.
• In this type, the dataset is divided into k groups, where k is the number of groups
chosen in advance. The cluster centers are placed so that each data point is closer
to the centroid of its own cluster than to the centroid of any other
cluster.
K-means
It is an iterative algorithm that divides the unlabeled dataset into k different
clusters in such a way that each data point belongs to only one group, whose
members have similar properties.
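The two repeated steps of K-means (assign each point to the nearest centroid, then move each centroid to the mean of its points) can be sketched as follows. The points are made up, and the starting centroids are passed in by hand to keep the example repeatable; in practice they are usually chosen at random.

```python
from math import dist

def kmeans(points, centroids, iterations=10):
    """Return the final centroids and the points assigned to each of them."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its old centroid).
        centroids = [
            tuple(sum(column) / len(cluster) for column in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(centroids)
```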
3. Density-based clustering (DBSCAN)
In this type of clustering, clusters are defined as areas of higher density than
the remainder of the data set. Sparse areas separate the clusters, and objects
in these sparse areas are usually treated as noise or border points. The most
popular method of this type is DBSCAN (Density-Based Spatial Clustering of
Applications with Noise).
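A compact sketch of DBSCAN is given below. A point is treated as a core point when at least min_pts points (including itself) lie within distance eps of it; clusters grow outwards from core points, and points that no cluster reaches are labelled as noise. The sample points and the parameter values are assumptions chosen for illustration.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels

points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (5, 20)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```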
Market Basket Analysis
Define market basket analysis.(2M) 2 times
• Market basket analysis is a data mining technique used by retailers to increase sales by better
understanding customer purchasing patterns. It involves analysing large data sets, such as purchase
history, to reveal product groupings, as well as products that are likely to be purchased together.
• Implementation of market basket analysis requires a background in statistics and data science, as
well as some algorithmic computer programming skills. For those without the needed technical
skills, commercial, off-the-shelf tools exist.
Association Rule Mining
Explain Association rule mining. (4M) 2 times.
• Association Rule Mining is an unsupervised, non-linear technique for uncovering
how items are associated with each other.
• Frequent pattern mining shows which items appear together in a transaction or
relation.
• It is mainly used by retailers, grocery stores and online marketplaces that have a
large transactional database.
• In the same way, social media platforms, online marketplaces and e-commerce
websites use recommendation engines to predict what you are likely to buy next.
• The item recommendations you see while checking out an order are produced by
association rule mining based on past customer data.
• Association rule learning is one of the important concepts of machine learning and is
employed in market basket analysis, web usage mining, continuous production, etc.
• For example, in a supermarket, products that are frequently purchased together are placed
together. If a customer buys bread, he or she will most likely also buy butter, eggs or milk,
so these products are stored on the same shelf or close to each other.
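The strength of a rule such as "bread → butter" is usually measured with support (the fraction of transactions containing all the items) and confidence (of the transactions containing bread, the fraction that also contain butter). These two measures are standard, although the slides do not define them; the transactions below are made up for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing antecedent, the fraction also containing consequent."""
    return support(transactions, antecedent | consequent) / support(transactions, antecedent)

# Made-up supermarket baskets.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

print(support(transactions, {"bread", "butter"}))
print(confidence(transactions, {"bread"}, {"butter"}))
```

Here bread and butter appear together in 3 of the 5 baskets, and 3 of the 4 baskets containing bread also contain butter.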

• What are the applications of Association Rule Mining (2M)

Apriori algorithm
What is Apriori algorithm? (2M)
• This algorithm uses frequent itemsets to generate association rules. It is designed to work
on databases that contain transactions. This algorithm uses a breadth-first search
and a hash tree to count candidate itemsets efficiently.
• It is mainly used for market basket analysis and helps to understand the products that can
be bought together. It can also be used in the healthcare field to find drug reactions for
patients.
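The level-by-level (breadth-first) search of Apriori can be sketched as follows: find the frequent single items, join them into candidate pairs, keep only candidates whose smaller subsets are all frequent, count them, and repeat for larger sizes. For simplicity this sketch counts candidates with a plain dictionary rather than a hash tree; the transactions and the minimum support of 0.5 are assumptions made for illustration.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset (frozenset) to its support."""
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count each candidate and keep those that reach the minimum support.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: combine frequent itemsets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

frequent = apriori(transactions, min_support=0.5)
for itemset, sup in frequent.items():
    print(sorted(itemset), sup)
```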
Explain the advantages and disadvantages of
Apriori algorithm.(4M)
THANK YOU….
