Unit 3 Big Data

Big data chapter 3 bba ca

Uploaded by

vramoshi72
Introduction to

Machine Learning
Assistant Professor – Pratiksha Kadam
S.N.B.P College of Arts, Commerce and Science,
Morwadi, Pimpri, Pune.
Syllabus

3.1 Basics of Machine Learning
3.2 Supervised Machine Learning
3.2.1 K-Nearest Neighbours
3.2.2 Naïve Bayes
3.2.3 Decision Tree
3.2.4 Support Vector Machines
Machine Learning
What is machine learning? (2M) *2 times
Explain Machine learning.(4M) *2 times
• Machine learning is a technology used to train machines to perform actions such as
prediction, recommendation, and estimation, based on historical data or past experience.
• Machine learning enables a machine to learn automatically from data, improve its
performance with experience, and make predictions without being explicitly programmed.
• Machine learning enables computers to behave more like human beings by training them
on past experience and data.
There are three key aspects of Machine Learning, which are as follows:
• Task: A task is the main problem we are interested in. It can relate to
prediction, recommendation, estimation, etc.
• Experience: Learning from historical or past data, which is then used to
estimate and resolve future tasks.
• Performance: The capacity of a machine to resolve a machine learning task
or problem and provide the best outcome for it. However, performance
depends on the type of machine learning problem.
• Machine learning can be divided into three types: Supervised Machine
Learning, Unsupervised Machine Learning, and Reinforcement Learning.
• NOTE: Include applications and types of machine learning.
• Write any two needs of Machine Learning. (2M)
• Give advantages of Machine Learning (4M)
• Give Disadvantages of Machine Learning.
Applications of Machine Learning
Supervised Learning
In supervised learning, the machine is trained on a set of labeled data. Labeled data is
data that has been tagged with a correct answer or classification. The machine then
learns to predict the output for new input data. Supervised learning is often used for
tasks such as classification and regression.
Key Points:
• Supervised learning involves training a machine from labeled data.
• Labeled data consists of examples with the correct answer or classification.
• The machine learns the relationship between inputs and outputs.
• The trained machine can then make predictions on new, unlabeled data.
(Example: images of cats and dogs with labels)
Classification
Classification is a type of supervised learning that is used to predict categorical
values, such as whether an email is spam or not, or whether it will rain or not.
The output variable of classification is a category, not a continuous numeric value.
Some common classification algorithms are:
i) K-Nearest Neighbors
ii) Decision Tree
iii) Naïve Bayes
iv) Support Vector Machines
K-Nearest Neighbors
What is KNN? (2M)
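KNN is known as a “lazy learner” algorithm because it simply stores the training data and does
no work until a new point has to be classified. It is mostly used to classify a new data point
into one of two or more categories, based on the categories of its nearest stored neighbours.

Let us consider the example of selecting the jersey size for a new player in the team
based on previous jersey data.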

[Figure: two scatter plots of height of the player against weight of the player, with points grouped into S-size and L-size]
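
The jersey example can be sketched in a few lines of Python. This is only an illustration: the heights, weights, and the choice of k = 3 are made-up assumptions, and `knn_predict` is a hypothetical helper name, not part of any library.

```python
import math
from collections import Counter

# Hypothetical past jersey data: (height in cm, weight in kg) -> jersey size
players = [
    ((160, 55), "S"),
    ((163, 58), "S"),
    ((165, 60), "S"),
    ((178, 78), "L"),
    ((182, 82), "L"),
    ((185, 88), "L"),
]

def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Sort the stored points by Euclidean distance to the new point.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    # The most common label among the k neighbours wins.
    return Counter(nearest_labels).most_common(1)[0][0]

new_player = (180, 80)
print(knn_predict(players, new_player))  # prints "L"
```

Note that nothing is "trained" here: all the work happens when the new player is classified, which is why KNN is called a lazy learner.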


Decision Tree
Explain Decision tree with example. (4M) 2 times
• Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal nodes represent
the features of a dataset, branches represent the decision rules and each leaf node
represents the outcome.
• In a decision tree, there are two types of nodes: the Decision Node and the Leaf
Node. Decision nodes are used to make a decision and have multiple branches,
whereas leaf nodes are the outputs of those decisions and do not contain any further
branches.
• Decision trees mimic the way humans think when making a decision, so they are
easy to understand.
• A decision tree simply asks a question and, based on the answer (Yes/No),
further splits the tree into subtrees.
• The diagram below explains the general structure of a decision tree:
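
The same structure can be sketched in code. The tree below is a hand-written, hypothetical example (the questions and outcomes are assumptions, not learned from data); it only shows how decision nodes lead to leaf nodes through Yes/No answers.

```python
# Hypothetical hand-built tree: each decision node asks a yes/no question,
# each leaf node holds an outcome.
tree = {
    "question": "is_raining",
    "yes": {"leaf": "stay inside"},
    "no": {
        "question": "is_windy",
        "yes": {"leaf": "stay inside"},
        "no": {"leaf": "play outside"},
    },
}

def decide(node, sample):
    """Walk from the root to a leaf, following the answer at each decision node."""
    while "leaf" not in node:
        answer = "yes" if sample[node["question"]] else "no"
        node = node[answer]
    return node["leaf"]

print(decide(tree, {"is_raining": False, "is_windy": False}))  # prints "play outside"
```

A learning algorithm would choose the questions automatically from the training data; here they are fixed by hand to keep the sketch short.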
Naïve Bayes
How Naive Bayes algorithm works. (2M) 2 times
Explain Naive Bayes with the help of example. (4M) 2 times
Example of Bayes’ Theorem
Working of Naïve Bayes
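
As a minimal illustration of Bayes' theorem, on which Naïve Bayes is built, the sketch below computes P(spam | word) from P(word | spam), P(spam), and P(word). All probabilities are invented for the example, and `posterior` is a hypothetical helper name. A full Naïve Bayes classifier applies the same rule to several features, assuming they are independent given the class.

```python
def posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence

# Hypothetical numbers for a spam filter
p_spam = 0.2              # P(spam)
p_word_given_spam = 0.5   # P(word appears | spam)
p_word_given_ham = 0.05   # P(word appears | not spam)

# Total probability that the word appears in any email
p_word = p_word_given_spam * p_spam + p_word_given_ham * (1 - p_spam)

p_spam_given_word = posterior(p_spam, p_word_given_spam, p_word)
print(round(p_spam_given_word, 3))  # prints 0.714
```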
Support Vector Machines
Explain support vector machine with example. (4M) 2 times
Define SVM? (2M)
• Support Vector Machine(SVM) is one of the most popular Supervised Learning algorithms,
which is used for Classification as well as Regression problems. However, primarily, it is used for
Classification problems in Machine Learning.
• The goal of the SVM algorithm is to create the best decision boundary that separates
n-dimensional space into classes, so that a new data point can easily be placed in the correct
category in the future.
• This best decision boundary is called a hyperplane.
• SVM chooses the extreme points (vectors) that help in creating the hyperplane. These extreme cases
are called support vectors, which is why the algorithm is called Support Vector Machine.
Support Vector Machine
Example: Suppose we see a strange cat that also has some features of a dog, and we want a model that can
accurately identify whether it is a cat or a dog. Such a model can be created using the SVM algorithm.
We first train the model with many images of cats and dogs so that it learns their different features,
and then we test it with this strange creature. The SVM creates a decision boundary between the two
classes (cat and dog) and chooses the extreme cases (support vectors) of each class. On the basis of
the support vectors, it classifies the creature as a cat. Consider the diagram below:
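
The roles of the hyperplane and the support vectors can also be sketched numerically. The sketch below does not train an SVM: the 2-D points, the weights `w`, and the offset `b` are all chosen by hand as assumptions. It only shows how a hyperplane w.x + b = 0 classifies points and how the points closest to it (the support vectors) can be identified.

```python
import math

# Hypothetical 2-D feature vectors with labels: -1 = dog, +1 = cat
points = [
    ((1.0, 1.0), -1), ((2.0, 1.0), -1), ((1.0, 2.0), -1),
    ((4.0, 4.0), 1), ((5.0, 4.0), 1), ((4.0, 5.0), 1),
]

# Assumed hyperplane w.x + b = 0 (picked by hand, not learned)
w = (1.0, 1.0)
b = -5.5

def signed_distance(x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    return (w[0] * x[0] + w[1] * x[1] + b) / math.hypot(*w)

def classify(x):
    """Points on the positive side are labelled +1, the others -1."""
    return 1 if signed_distance(x) >= 0 else -1

# The margin is the distance from the hyperplane to the closest training points;
# those closest points play the role of support vectors.
margin = min(abs(signed_distance(x)) for x, _ in points)
support_vectors = [x for x, _ in points
                   if math.isclose(abs(signed_distance(x)), margin)]

print(classify((3.0, 3.0)))  # prints 1
print(support_vectors)
```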
Advantages of SVM algorithm. (4M)
State advantages and disadvantages of SVM. (4M)
Regression
Define Regression Analysis (2M) OR What is regression? Explain with its types. (4M)
Regression analysis is a set of statistical methods used to estimate the
relationship between a dependent variable and one or more independent variables.
Regression analysis helps us understand how the value of the dependent variable
changes as an independent variable changes.
It is mainly used for prediction, forecasting, time series modelling, and
determining cause-and-effect relationships between variables.
1. Simple regression analysis: It helps you estimate the relationship between a dependent variable and
one independent variable. For example, how much money someone earns based on their level of
education.
2. Multiple regression analysis: It helps you determine the relationship between a dependent variable
and more than one independent variable. For example, how salary depends on education,
experience, and proximity to a metropolitan area.
An example of linear regression is predicting salary from years of experience.
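
The experience and salary example can be worked through with ordinary least squares. The numbers below are invented for illustration, and `fit_simple_regression` is a hypothetical helper name; the formulas are the standard ones for a straight-line fit.

```python
def fit_simple_regression(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: years of experience and salary (in thousands)
experience = [1, 2, 3, 4, 5]
salary = [30, 35, 40, 45, 50]

slope, intercept = fit_simple_regression(experience, salary)
predicted = slope * 6 + intercept  # estimated salary for 6 years of experience
print(slope, intercept, predicted)  # prints 5.0 25.0 55.0
```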
Explain types of regression models.(4M)
Unsupervised learning
In unsupervised learning, the machine is trained on a set of unlabeled data, which means that the input
data is not paired with the desired output. The machine then learns to find patterns and relationships in
the data. Unsupervised learning is often used for tasks such as clustering and dimensionality reduction.
Key Points
• Unsupervised learning allows the model to discover patterns and relationships in unlabeled data.
• Clustering algorithms group similar data points together based on their inherent characteristics.
• Feature extraction captures essential information from the data, enabling the model to make meaningful
distinctions.
• Label association assigns categories to the clusters based on the extracted patterns and characteristics.
(Examples: images of cats and dogs without labeled categories; auto-tagging on Facebook.)
Clustering
Explain cluster analysis with its types. (4M) 2 times
• A cluster is a collection of similar data points that are grouped together.
• Clustering is the task of dividing the population or data points into a number of groups such that
data points in the same group are more similar to each other than to those in
other groups.
• A clustering problem is one where you want to discover the inherent groupings in the data without
labelling it.
• This process is often used for exploratory data analysis and can help identify patterns or
relationships within the data that may not be immediately obvious.
• For example, in a grocery shop the clusters can be frequent customers, rare customers, etc.
Types of cluster analysis.
1. Hierarchical clustering
In the agglomerative method, each object starts as its own cluster. The two most similar (closest)
clusters are then merged into a single cluster, and this process is repeated until all objects are
in one cluster.
The divisive method is the other kind of hierarchical method: clustering starts with the
complete data set and then divides it into partitions.
Hierarchical clustering is represented by a “dendrogram”.
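
The agglomerative procedure can be sketched as follows. This is a minimal illustration on made-up one-dimensional data, using single linkage (the distance between two clusters is the distance between their closest members), which is an assumption since the text does not name a linkage rule. The list of merges it returns is the information a dendrogram displays.

```python
def agglomerative(points):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = [[p] for p in points]  # every object starts as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

# Hypothetical 1-D data
for left, right, distance in agglomerative([1, 2, 6, 7, 15]):
    print(left, "+", right, "at distance", distance)
```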
2. Partition clustering
• It is a type of clustering that divides the data into non-hierarchical groups. It is also
known as the centroid-based method. The most common example of partitioning
clustering is the K-Means Clustering algorithm.
• In this type, the dataset is divided into k groups, where k is the number of groups
chosen in advance. The cluster centers are placed so that each data point is closer
to the centroid of its own cluster than to the centroid of any other cluster.
K-Means
It is an iterative algorithm that divides the unlabeled dataset into k different
clusters in such a way that each data point belongs to only one group, whose
members have similar properties.
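
The iteration can be sketched as below: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. The data and the starting centroids are made-up assumptions, and `k_means` is a hypothetical helper name; real implementations usually pick the starting centroids at random.

```python
import math

def k_means(points, centroids, max_iter=100):
    """Alternate the assignment and update steps until the centroids stop moving."""
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Hypothetical 2-D data with two obvious groups, k = 2
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = k_means(points, centroids=[(1.0, 1.0), (9.0, 8.0)])
print(clusters)
```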
3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
In density-based clustering, clusters are defined as areas of higher density than
the rest of the data set. Sparse areas are what separate the clusters from one
another, and the objects in these sparse areas are usually treated as noise or
border points. The most popular method of this type is DBSCAN.
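
A compact sketch of the DBSCAN idea is given below, under stated assumptions: the points, the radius `eps`, and the threshold `min_pts` are invented, and a point counts itself among its neighbours. A point with at least `min_pts` neighbours within `eps` is a core point; clusters grow outward from core points, and points that belong to no cluster are labelled -1 (noise).

```python
import math

def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may later become a border point
            continue
        cluster_id += 1
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster_id  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:  # j is a core point, keep growing
                queue.extend(j_neighbours)
    return labels

# Hypothetical data: two dense groups and one isolated point
data = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9), (5, 15)]
print(dbscan(data, eps=1.5, min_pts=3))  # prints [0, 0, 0, 0, 1, 1, 1, 1, -1]
```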
Market Basket Analysis
Define market basket analysis.(2M) 2 times
• Market basket analysis is a data mining technique used by retailers to increase sales by better
understanding customer purchasing patterns. It involves analysing large data sets, such as purchase
history, to reveal product groupings, as well as products that are likely to be purchased together.
• Implementation of market basket analysis requires a background in statistics and data science, as
well as some algorithmic computer programming skills. For those without the needed technical
skills, commercial, off-the-shelf tools exist.
Association Rule Mining
Explain Association rule mining. (4M) 2 times.
• Association rule mining is an unsupervised, non-linear algorithm that uncovers
how items are associated with each other.
• Within it, frequent itemset mining shows which items appear together in a transaction or
relation.
• It is mainly used by retailers, grocery stores, and online marketplaces that have
large transactional databases.
• In the same way, online social media, marketplaces, and e-commerce
websites use recommendation engines to predict what you will buy next.
• The recommendations you see for an item while checking out an order are produced by
association rule mining based on past customer data.
• Association rule learning is an important concept in machine learning and is
employed in market basket analysis, web usage mining, continuous production, etc.
• A supermarket is a good example: products that are often purchased together are placed
together. For example, if a customer buys bread, he is also likely to buy butter, eggs, or milk,
so these products are stocked on the same shelf or close to one another.
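
How strongly items are associated is usually measured with support (the fraction of transactions containing an itemset) and confidence (how often the rule holds when its left-hand side is present). These two measures are not defined in the text above, so the sketch below is a supplementary illustration on made-up transactions, with hypothetical helper names.

```python
# Hypothetical supermarket transactions
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent) / support(antecedent)

# Rule: bread -> butter
print(round(support({"bread", "butter"}), 2))       # prints 0.6
print(round(confidence({"bread"}, {"butter"}), 2))  # prints 0.75
```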

• What are the applications of Association Rule Mining? (2M)


Apriori Algorithm
What is Apriori algorithm? (2M)
• This algorithm uses frequent itemsets to generate association rules. It is designed to work
on databases that contain transactions. It uses a breadth-first search
and a hash tree to count candidate itemsets efficiently.
• It is mainly used for market basket analysis and helps to identify products that are likely to
be bought together. It can also be used in the healthcare field to find drug reactions for
patients.
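
The level-by-level (breadth-first) search can be sketched as follows. This is a simplified illustration, not a full implementation: it counts candidates with a plain dictionary rather than a hash tree, `min_support` is given as a transaction count, and the baskets are made up. The key Apriori idea it shows is pruning: a k-itemset is only considered if all of its (k-1)-item subsets are frequent.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least min_support transactions."""
    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # candidate 1-itemsets
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(c <= t for t in transactions) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune candidates that have an infrequent (k-1)-item subset.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

# Hypothetical transactions
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "eggs"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]
result = apriori(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```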
Explain the advantages and disadvantages of Apriori algorithm. (4M)
THANK YOU….
