
Introduction To Machine Learning

Machine Learning (ML) is a branch of artificial intelligence that enables machines to learn from data and make predictions with minimal human intervention. It encompasses various types such as supervised, unsupervised, semi-supervised, and reinforcement learning, each with unique applications and challenges. ML is increasingly utilized across multiple sectors, including healthcare, finance, retail, travel, and social media, to enhance efficiency, improve decision-making, and personalize user experiences.

Uploaded by

cajewen566

Module-2

Chapter-1
INTRODUCTION TO MACHINE LEARNING
Machine learning (ML) is a discipline of artificial intelligence (AI) that provides
machines with the ability to automatically learn from data and
past experiences while identifying patterns to make predictions
with minimal human intervention.
Machine learning methods enable computers to operate
autonomously without explicit programming. ML applications
are fed with new data, and they can independently learn, grow,
develop, and adapt.
Machine learning derives insightful information from large
volumes of data by leveraging algorithms to identify patterns
and learn in an iterative process. ML algorithms use
computation methods to learn directly from data instead of
relying on any predetermined equation that may serve as a
model.
The performance of ML algorithms generally improves as the
number of samples available during the ‘learning’ process
increases. For example, deep learning is a sub-domain of machine
learning that trains computers to learn from examples, much as
humans do. Given enough data, it often outperforms conventional
ML algorithms, particularly on images, audio, and text.
Today, with the rise of big data, IoT, and ubiquitous computing,
machine learning has become essential for solving problems
across numerous areas, such as:
• Computational finance (credit scoring, algorithmic trading)
• Computer vision (facial recognition, motion tracking, object detection)
• Computational biology (DNA sequencing, brain tumor detection, drug discovery)
• Automotive, aerospace, and manufacturing (predictive maintenance)
• Natural language processing (voice recognition)

How does machine learning work?


Machine learning algorithms are trained on a training dataset to
create a model. When new input data is introduced, the trained
model is used to make a prediction.
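The train-then-predict cycle described above can be sketched in a few lines of Python. This is only an illustration: the function names fit_line and predict and the hours/scores data are made up for the example, and ordinary least squares stands in for whatever algorithm is actually used.

```python
# Minimal sketch: "training" produces a model, "prediction" applies it to new input.
# fit_line, predict, and the data below are illustrative assumptions.

def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept  # this pair of numbers is the trained "model"


def predict(model, x):
    """Apply the trained model to a new input value."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training dataset: hours studied -> exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

model = fit_line(hours, scores)   # training step
print(predict(model, 6))          # prediction for unseen input: 62.0
```

The point of the sketch is the separation of the two steps: the data is needed only while fitting, and afterwards the model alone is enough to make predictions.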
Types of Machine Learning
Machine learning algorithms can be trained in many ways, with
each method having its pros and cons. Based on these methods
and ways of learning, machine learning is broadly categorized
into four main types:
[Figure: Types of Machine Learning]

1. Supervised machine learning


This type of ML involves supervision: machines are trained on
labeled datasets and then predict outputs based on that
training. In a labeled dataset, each input is already mapped to
its correct output, so the machine is trained on inputs together
with their corresponding outputs. In subsequent phases, the
trained model is asked to predict outcomes on a separate test
dataset.
For example, consider an input dataset of parrot and crow
images. Initially, the machine is trained to understand the
pictures, including the parrot and crow’s color, eyes, shape, and
size. Post-training, an input picture of a parrot is provided, and
the machine is expected to identify the object and predict the
output. The trained machine checks for the various features of
the object, such as color, eyes, shape, etc., in the input picture, to
make a final prediction. This is the process of object
identification in supervised machine learning.
The primary objective of the supervised learning technique is to
learn a mapping from the input variable (a) to the output
variable (b).
Supervised machine learning is further classified into two broad
categories:
• Classification: These algorithms address problems where the
output variable is categorical; for example, yes or no, true or
false, male or female, etc. Real-world applications of this
category include spam detection and email filtering.
Some known classification algorithms include the Random Forest
Algorithm, Decision Tree Algorithm, Logistic Regression
Algorithm, and Support Vector Machine Algorithm.
• Regression: Regression algorithms address problems where the
output variable is continuous. The relationship between input
and output may be linear, as in linear regression, but it does
not have to be. Examples include weather prediction, market
trend analysis, etc.
Popular regression algorithms include the Simple Linear
Regression Algorithm, Multivariate Regression Algorithm,
Decision Tree Algorithm, and Lasso Regression.
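To make the classification case concrete, here is a small sketch in the spirit of the parrot-and-crow example. It uses a nearest-centroid classifier, chosen only because it is short and is not one of the algorithms named above. The two features (greenness and beak curvature, each scored from 0 to 1) and all values are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical labeled training data: (greenness, beak_curvature) -> label.
TRAINING = [
    ((0.90, 0.80), "parrot"),
    ((0.80, 0.90), "parrot"),
    ((0.85, 0.70), "parrot"),
    ((0.10, 0.30), "crow"),
    ((0.05, 0.20), "crow"),
    ((0.15, 0.25), "crow"),
]


def train(samples):
    """Compute one centroid (mean feature vector) per label."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Predict the label whose centroid is closest to the input features."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


centroids = train(TRAINING)
print(classify(centroids, (0.70, 0.75)))  # parrot
print(classify(centroids, (0.20, 0.30)))  # crow
```

The output is a category (parrot or crow), which is what distinguishes classification from regression, where the output would be a number.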
2. Unsupervised machine learning
Unsupervised learning refers to a learning technique that’s
devoid of supervision. Here, the machine is trained on an
unlabeled dataset and must find structure in it without any
supervision. An unsupervised learning algorithm aims to group
the unsorted dataset based on the input’s similarities,
differences, and patterns.
For example, consider an input dataset of images of a fruit-filled
container. Here, the images carry no labels, so the machine
learning model does not know what they show. When we input the
dataset into the ML model, the task of the model is to identify
patterns such as color, shape, or other differences in the input
images and to categorize them accordingly. Once the categories
are formed, the model assigns each image in a test dataset to
one of them.
Unsupervised machine learning is further classified into two
types:
• Clustering: The clustering technique refers to grouping
objects into clusters based on parameters such as
similarities or differences between objects. For
example, grouping customers by the products they
purchase.
Some known clustering algorithms include the K-Means
Clustering Algorithm, Mean-Shift Algorithm, and DBSCAN
Algorithm. Principal Component Analysis and Independent
Component Analysis are often listed alongside them, but they
are dimensionality-reduction techniques rather than clustering
algorithms.
• Association: Association learning refers to identifying
typical relations between the variables of a large
dataset. It determines the dependency of various data
items and maps associated variables. Typical
applications include web usage mining and market data
analysis.
Popular association rule mining algorithms include the
Apriori Algorithm, Eclat Algorithm, and FP-Growth Algorithm.
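As an illustration of clustering, the following is a bare-bones sketch of K-Means on unlabeled 2-D points. The points are invented, and the initial centroids are simply the first k points, which is a simplifying assumption (practical implementations choose starting points more carefully).

```python
import math


def kmeans(points, k, max_iters=100):
    """Plain K-Means: alternate between assigning points and moving centroids."""
    centroids = list(points[:k])  # naive initialization (assumption of this sketch)
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(
                    sum(column) / len(members) for column in zip(*members)
                )
    return centroids, labels


# Made-up unlabeled data forming two visible groups.
points = [(1, 1), (2, 1), (1, 2), (8, 8), (9, 8), (8, 9)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No labels are supplied at any point; the grouping emerges only from the distances between the points.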
3. Semi-supervised learning
Semi-supervised learning comprises characteristics of both
supervised and unsupervised machine learning. It uses the
combination of labeled and unlabeled datasets to train its
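algorithms. Using both types of datasets, semi-supervised
learning aims to reduce the drawbacks of the two options above:
the cost of labeling large datasets for supervised learning and
the lack of guidance in unsupervised learning.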
Consider an example of a college student. A student learning a
concept under a teacher’s supervision in college is termed
supervised learning. In unsupervised learning, a student self-
learns the same concept at home without a teacher’s guidance.
Meanwhile, a student revising the concept after learning under
the direction of a teacher in college is a semi-supervised form of
learning.
4. Reinforcement learning
Reinforcement learning is a feedback-based process. Here, the
AI component (the agent) explores its surroundings by trial and
error, takes actions, learns from experience, and improves its
performance. The agent is rewarded for each good action and
penalized for every wrong move. Thus, the agent aims to
maximize its total reward by performing good actions.
Unlike supervised learning, reinforcement learning lacks labeled
data, and the agents learn via experiences only. Consider video
games. Here, the game specifies the environment, and each
move of the reinforcement agent defines its state. The agent is
entitled to receive feedback via punishment and rewards,
thereby affecting the overall game score. The ultimate goal of
the agent is to achieve a high score.
Reinforcement learning is applied across different fields such as
game theory, information theory, and multi-agent systems.
Reinforcement learning is further divided into two types:
• Positive reinforcement learning: This refers to adding a
reinforcing stimulus after a specific behavior of the
agent, which makes it more likely that the behavior
will occur again in the future, e.g., adding a reward
after a behavior.
• Negative reinforcement learning: This refers to
strengthening a specific behavior because it removes or
avoids a negative outcome.
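The reward-and-penalty loop can be sketched with tabular Q-learning, one common reinforcement learning algorithm that the text above does not name. Everything here is an assumption made for illustration: a corridor of five cells with the goal at the right end, a reward of +1 for reaching the goal, a small penalty of 0.01 for every other move, and the particular learning parameters.

```python
import random

N_STATES = 5           # corridor cells 0..4 (hypothetical environment)
GOAL = N_STATES - 1    # reaching cell 4 ends the episode
ACTIONS = (-1, +1)     # move left or move right


def step(state, action):
    """Environment: return the next state and the reward for this move."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else -0.01  # reward vs. small penalty
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by trial and error (epsilon-greedy Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                a = rng.randrange(2)                          # explore
            else:
                a = max((0, 1), key=lambda i: q[state][i])    # exploit
            next_state, reward = step(state, ACTIONS[a])
            # Move the estimate toward reward + discounted best future value.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
            if state == GOAL:
                break
    return q


q_table = train()
policy = [ACTIONS[max((0, 1), key=lambda i: q_table[s][i])] for s in range(GOAL)]
print(policy)  # learned move for each non-goal cell
```

After training, the learned policy moves right in every cell, which is the behavior that maximizes the total reward in this toy environment. No labeled examples were given; the agent learned only from the feedback it received.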
Challenges of Machine Learning
1. Poor Quality of Data

Data plays a significant role in the machine learning process. One of the
significant issues that machine learning professionals face is the absence
of good quality data. Unclean and noisy data make the whole process slow
and error-prone, and models trained on such data tend to be unreliable.

2. Underfitting of Training Data

Underfitting occurs when the model is unable to establish an accurate
relationship between input and output variables.

It is like trying to fit into undersized jeans. It signifies that the model is
too simple to capture the relationship in the data. To overcome this issue:
• Increase the training time of the model
• Enhance the complexity of the model
• Add more features to the data
• Reduce regularization

3. Overfitting of Training Data

Overfitting refers to a machine learning model that fits its training data
too closely, including the noise in it, which negatively affects its
performance on new data.

It is like trying to fit into oversized jeans. Unfortunately, this is one of the
significant issues faced by machine learning professionals.
It typically arises when an overly complex model is trained on noisy or
biased data and memorizes that data instead of learning the general pattern.
We can tackle this issue by:
• Analyzing the data carefully before training
• Using data augmentation techniques
• Removing outliers from the training set
• Selecting a model with fewer features
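The difference between underfitting and overfitting shows up when training error is compared with error on unseen data. The sketch below uses invented numbers: the training targets follow y = 2x with alternating noise of +1 and -1, and the test targets are left noise-free to keep the arithmetic easy to check. The three models (a constant, a fitted line, and a model that memorizes the nearest training point) are illustrative choices.

```python
def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]            # y = 2x plus noise of +1 / -1
test_x = [0.4, 1.4, 2.4, 3.4, 4.4]
test_y = [0.8, 2.8, 4.8, 6.8, 8.8]      # y = 2x without noise

# Underfitting: a constant model is too simple to follow the trend.
mean_y = sum(train_y) / len(train_y)


def constant_model(x):
    return mean_y


# Reasonable fit: a straight line found by least squares.
mean_x = sum(train_x) / len(train_x)
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y)) / sum(
    (x - mean_x) ** 2 for x in train_x
)
intercept = mean_y - slope * mean_x


def line_model(x):
    return slope * x + intercept


# Overfitting: memorize the training set and copy the nearest training target.
def memorizing_model(x):
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


for name, model in [
    ("constant", constant_model),
    ("line", line_model),
    ("memorizing", memorizing_model),
]:
    print(name, round(mse(model, train_x, train_y), 3), round(mse(model, test_x, test_y), 3))
```

The constant model has a high error on both sets (underfitting). The memorizing model reaches zero training error because it has also memorized the noise, yet its test error is clearly higher than that of the line (overfitting).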

4. Machine Learning is a Complex Process

The machine learning industry is young and is continuously changing.
Rapid trial-and-error experiments are being carried out.

Because the process keeps changing, there is a high chance of error,
which makes learning complex.

It includes analyzing the data, removing data bias, training models,
applying complex mathematical calculations, and a lot more.

Hence it is a really complicated process, which is another big challenge
for machine learning professionals.

5. Lack of Training Data

The most important task in the machine learning process is to train the
model on data so that it produces accurate output. Too little training
data will produce inaccurate or overly biased predictions.
Let us understand this with the help of an example. Consider a machine
learning algorithm similar to training a child.
One day you decided to explain to a child how to distinguish between an
apple and a watermelon. You will take an apple and a watermelon and
show him the difference between both based on their color, shape, and
taste. In this way, soon, he will attain perfection in differentiating
between the two.
But on the other hand, a machine learning algorithm needs a lot of data
to learn such a distinction. For complex problems, it may even require
millions of examples. Therefore we need to ensure that machine learning
algorithms are trained with sufficient amounts of data.

6. Slow Implementation
This is one of the common issues faced by machine learning
professionals. Machine learning models can provide accurate results,
but producing them can take a tremendous amount of time. Slow programs,
data overload, and excessive resource requirements all add to the time
needed. Further, the models require constant monitoring and maintenance
to deliver the best output.

7. Imperfections in the Algorithm When Data Grows

So you have found quality data, trained a model on it well, and the
predictions are precise and accurate.
You have learned how to create a machine learning model! But
wait, there is a twist; the model may become useless in the future as data
grows.
The best model of the present may become inaccurate in the future and
require retraining.
So you need regular monitoring and maintenance to keep the model
working. This is one of the most exhausting issues faced by machine
learning professionals.

[Figure: Machine Learning Process]


Machine Learning Applications
Industry verticals handling large amounts of data have realized
the significance and value of machine learning technology. As
machine learning derives insights from data in real-time,
organizations using it can work efficiently and gain an edge over
their competitors.
Nearly every industry vertical in this fast-paced digital world
benefits from machine learning technology. Here, we look at five
major ML application sectors.
1. Healthcare industry
Machine learning is being increasingly adopted in the healthcare
industry, thanks in part to wearable devices and sensors such as
fitness trackers, smart health watches, etc. All such
devices monitor users’ health data to assess their health in
real time.
Moreover, the technology is helping medical practitioners
analyze trends and flag events that may lead to improved
patient diagnoses and treatment. ML algorithms even allow
medical experts to predict, with increasing accuracy, the
lifespan of a patient suffering from a fatal disease.
Additionally, machine learning is contributing significantly to
two areas:
• Drug discovery: Discovering and manufacturing a new
drug is an expensive, lengthy, multi-step process.
Machine learning helps speed up the steps involved.
For example, Pfizer uses IBM’s Watson to analyze
massive volumes of disparate data for drug discovery.
• Personalized treatment: Drug manufacturers face the
stiff challenge of validating the effectiveness of a
specific drug on a large population, because a drug may
work well only for a small group in clinical trials and
may cause side effects in some subjects.
To address these issues, companies like Genentech have
collaborated with GNS Healthcare to leverage machine learning
and simulation AI platforms for innovating biomedical
treatments. ML technology looks for patients’ response markers
by analyzing individual genes, which helps provide targeted
therapies to patients.
2. Finance sector
Today, several financial organizations and banks use machine
learning technology to tackle fraudulent activities and draw
essential insights from vast volumes of data.
ML-derived insights aid in identifying investment opportunities
that allow investors to decide when to trade.
Moreover, data mining methods help cyber-surveillance systems
zero in on warning signs of fraudulent activities so that they
can be neutralized. Several financial institutions have already
partnered with tech companies to leverage the benefits of
machine learning.
For example,
• Citibank has partnered with fraud detection company
Feedzai to handle online and in-person banking fraud.
• PayPal uses several machine learning tools to
differentiate between legitimate and fraudulent
transactions between buyers and sellers.
3. Retail sector
Retail websites extensively use machine learning to recommend
items based on users’ purchase history. Retailers use ML
techniques to capture data, analyze it, and deliver personalized
shopping experiences to their customers. They also implement
ML for marketing campaigns, customer insights, merchandise
planning, and price optimization.
Common day-to-day examples of recommendation systems
include:
• When you browse items on Amazon, the product
recommendations that you see on the homepage result
from machine learning algorithms. Amazon
uses artificial neural networks (ANN) to offer
intelligent, personalized recommendations relevant to
customers based on their recent purchase history,
comments, bookmarks, and other online activities.
• Netflix and YouTube rely heavily on recommendation
systems to suggest shows and videos to their users
based on their viewing history.
Moreover, retail sites are also powered with virtual assistants or
conversational chatbots that leverage ML, natural language
processing (NLP), and natural language understanding (NLU) to
automate customer shopping experiences.
4. Travel industry
Machine learning is playing a pivotal role in expanding the
scope of the travel industry. Rides offered by Uber, Ola, and
even self-driving cars have a robust machine learning backend.
Consider Uber’s machine learning algorithm that handles the
dynamic pricing of their rides. Uber uses a machine learning
model called ‘Geosurge’ to manage dynamic pricing parameters.
It uses real-time predictive modeling on traffic patterns, supply,
and demand. If you are running late for a meeting and need to
book an Uber in a crowded area, the dynamic pricing model
kicks in: you can get a ride immediately, but you may need to
pay more than the regular fare, for example twice as much.
Moreover, the travel industry uses machine learning to analyze
user reviews. User comments are classified through sentiment
analysis based on positive or negative scores. This is used for
campaign monitoring, brand monitoring, compliance
monitoring, etc., by companies in the travel industry.
5. Social media
With machine learning, billions of users can efficiently engage
on social media networks. Machine learning is pivotal in driving
social media platforms from personalizing news feeds to
delivering user-specific ads.
For example, Facebook’s auto-tagging feature employs image
recognition to identify your friend’s face and tag them
automatically. The social network uses ANN to recognize
familiar faces in users’ contact lists and facilitates automated
tagging.
Similarly, LinkedIn knows when you should apply for your next
role, whom you need to connect with, and how your skills rank
compared to peers. All these features are enabled by machine
learning.
Need for Machine Learning
(a) Why is Machine Learning technology so important? One indication is
the brisk demand for it across industries.

(b) Machine Learning can reduce costs, mitigate risks, and improve
quality of life by recommending products/services, detecting cyber
security breaches, and enabling self-driving cars. It is becoming more
common and will soon integrate into many facets of life.

(c) Machine Learning is a popular subfield of Artificial Intelligence used
in various fields, including healthcare, finance, infrastructure, marketing,
self-driving cars, recommendation systems, chatbots, social sites,
gaming, cyber security, and others.

(d) Machine Learning is critical because it allows businesses to interpret
customer behavior trends and understand business operation patterns in a
broader context. Furthermore, today’s top companies, such as Facebook,
Google, and Uber, are prioritizing Machine Learning in their operations.
Machine Learning in Relation to other Fields

Machine Learning and AI


In simplest terms, AI is computer software that mimics the ways
that humans think in order to perform complex tasks, such as
analyzing, reasoning, and learning. Machine learning,
meanwhile, is a subset of AI that uses algorithms trained on data
to produce models that can perform such complex tasks.

Most AI is performed using machine learning, so the two terms
are often used synonymously, but AI actually refers to the
general concept of creating human-like cognition using computer
software, while ML is only one method of doing so.

Machine Learning and Data Science


Data science is a concept used to tackle big data and
includes data cleansing, preparation, and analysis. A data
scientist gathers data from multiple sources and applies machine
learning, predictive analytics, and sentiment analysis to extract
critical information from the collected data sets. They
understand data from a business point of view and can provide
accurate predictions and insights that can be used to power
critical business decisions.
Machine Learning and Data Mining

S.No. | Data Mining | Machine Learning
1. | Extracting useful information from large amounts of data | Introduces algorithms from data as well as from past experience
2. | Used to understand the data flow | Teaches the computer to learn and understand from the data flow
3. | Huge databases with unstructured data | Existing data as well as algorithms
4. | Models can be developed using data mining techniques | Machine learning algorithms can be used in decision trees, neural networks, and some other areas of artificial intelligence
5. | Human interference is higher | No human effort required after design
6. | It is used in cluster analysis | It is used in web search, spam filters, fraud detection, and computer design
7. | Data mining abstracts from the data warehouse | Machine learning reads from the machine
8. | Data mining is more of a research activity using methods like machine learning | Self-learned; trains the system to do the intelligent task
9. | Applied in a limited area | Can be used in a vast area
10. | Uncovering hidden patterns and insights | Making accurate predictions or decisions based on data
11. | Exploratory and descriptive | Predictive and prescriptive
12. | Historical data | Historical and real-time data
13. | Patterns, relationships, and trends | Predictions, classifications, and recommendations
14. | Clustering, association rule mining, outlier detection | Regression, classification, clustering, deep learning
15. | Data cleaning, transformation, and integration | Data cleaning, transformation, and feature engineering
16. | Strong domain knowledge is often required | Domain knowledge is helpful, but not always necessary
17. | Can be used in a wide range of applications, including business, healthcare, and social science | Primarily used in applications where prediction or decision-making is important, such as finance, manufacturing, and cybersecurity
Machine Learning and Data Analytics

Data analytics, also known as data analysis, is the process of cleaning,
inspecting, modelling, and transforming data in order to find valuable
information, inform conclusions, and enhance the decision-making
process.

Data analytics focuses on generating valuable insights from the
available data. Companies use data analytics to make better-informed
decisions regarding various matters including marketing, production,
etc. Data analytics helps you take raw data and extract helpful
information from it.
Module-2

Chapter-2
Understanding Data

2.1 What is Data?


Data is information of different types, usually formatted in a particular
manner. All software is divided into two major categories: programs and
data. Programs are collections of instructions used to manipulate data.

We use data science to make it easier to work with data. Data science is
defined as a field that combines knowledge of
mathematics, programming skills, domain expertise, scientific methods,
algorithms, processes, and systems to extract actionable knowledge and
insights from both structured and unstructured data, then apply the
knowledge gleaned from that data to a wide range of uses and domains.

What is Information?

Information is defined as classified or organized data that has some
meaningful value for the user. Information is also the processed data
used to make decisions and take action. Processed data must meet the
following criteria for it to be of any significant use in decision-making:

• Accuracy: The information must be accurate.
• Completeness: The information must be complete.
• Timeliness: The information must be available when it’s needed.
2.1.1 Types of data
With the growth of technology, and of smartphones in particular, data
now includes text, video, and audio, as well as web and log activity
records. Most of this data is unstructured.

The term Big Data describes data in the petabyte range or higher. Big
Data is also described by the 5 Vs: variety, volume, value, veracity,
and velocity. Web-based eCommerce has spread widely, and business models
based on Big Data have evolved that treat data as an asset in itself.
Big Data also offers many benefits, such as reduced costs, enhanced
efficiency, and enhanced sales.

2.1.2 Data Storage and Representation

How is Data Stored?

Computers represent data (e.g., text, images, sound, video) as binary
values that employ two digits: 1 and 0. The smallest unit of data is
called a “bit,” and it represents a single binary value. A byte is
eight bits long. Memory and storage are measured in units such as
megabytes, gigabytes, terabytes, petabytes, and exabytes. Ever larger
units come into use as the amount of data our society generates
continues to grow.
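The bit and byte representation can be seen directly in Python. This short sketch assumes UTF-8 text encoding and decimal (power-of-ten) storage units; the helper name to_bits and the UNIT_BYTES table are made up for the example.

```python
def to_bits(text):
    """Return the UTF-8 bytes of `text` as space-separated groups of 8 bits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


# Decimal storage units expressed in bytes (assumption: SI prefixes, not 1024-based).
UNIT_BYTES = {
    "megabyte": 10**6,
    "gigabyte": 10**9,
    "terabyte": 10**12,
    "petabyte": 10**15,
    "exabyte": 10**18,
}

print(to_bits("Hi"))                 # 01001000 01101001
print(UNIT_BYTES["petabyte"] * 8)    # number of bits in one petabyte
```

Each character of "Hi" occupies one byte here, and each byte is shown as its eight bits.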

Data can be stored in files, for example using mainframe file
organization methods such as ISAM and VSAM, and there are other file
formats for data conversion, processing, and storage, like
comma-separated values (CSV). These data formats are still used across a
wide range of machine types, despite more structured-data-oriented
approaches gaining a greater foothold in today’s IT world.
The field of data storage has seen greater specialization develop as the
database, the database management system, and later relational
database technology each made their debut and provided new ways to
organize information.
2.2 BIG DATA ANALYTICS AND TYPES

2.2.1 What is Data Analytics?


In this new digital world, data is being generated in enormous
amounts, which opens new paradigms. As we have high computing
power and a large amount of data, we can use this data to help us
make data-driven decisions. The main benefit of data-driven
decisions is that they are based on past trends that have
produced beneficial results.
In short, we can say that data analytics is the process of manipulating
data to extract useful trends and hidden patterns that can help us
derive valuable insights to make business predictions.
2.2.2 Types of Data Analytics
There are four major types of data analytics:
1. Predictive (forecasting)
2. Descriptive (business intelligence and data mining)
3. Prescriptive (optimization and simulation)
4. Diagnostic analytics

[Figure: Data Analytics and its Types]


Predictive Analytics
Predictive analytics turns data into valuable, actionable
information. It uses data to determine the probable
outcome of an event or the likelihood of a situation occurring. Predictive
analytics encompasses a variety of statistical techniques from
modeling, machine learning, data mining, and game theory that
analyze current and historical facts to make predictions about future
events. Techniques that are used for predictive analytics are:
• Linear Regression
• Time Series Analysis and Forecasting
• Data Mining
Basic Cornerstones of Predictive Analytics
• Predictive modeling
• Decision Analysis and optimization
• Transaction profiling
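As a small illustration of time series forecasting, the sketch below predicts the next value of a series as the average of its most recent observations (a simple moving average). The sales figures, the window size, and the function name are assumptions for the example; real forecasting would usually account for trend and seasonality as well.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window


# Made-up monthly sales figures (historical facts).
monthly_sales = [10, 12, 14, 16, 18]

print(moving_average_forecast(monthly_sales))  # 16.0
```

Note that on a steadily rising series like this one, a moving average lags behind the trend, which is one reason regression-based techniques are also used for prediction.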
Descriptive Analytics
Descriptive analytics looks at data and analyzes past events for insight
into how to approach future events. It examines past performance by
mining historical data to understand the causes of success or failure in
the past. Almost all management reporting, such as sales, marketing,
operations, and finance, uses this type of analysis.
The descriptive model quantifies relationships in data in a way that is
often used to classify customers or prospects into groups. Unlike a
predictive model that focuses on predicting the behavior of a single
customer, descriptive analytics identifies many different relationships
between customer and product.

Common examples of descriptive analytics are company
reports that provide historic reviews like:
• Data Queries
• Reports
• Descriptive Statistics
• Data dashboard
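Descriptive statistics of the kind found in such reports can be computed with Python's standard statistics module. The sales figures and the summarize helper are hypothetical.

```python
import statistics


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Made-up historical data: units sold in each of five months.
units_sold = [120, 135, 150, 110, 160]

summary = summarize(units_sold)
for name, value in summary.items():
    print(name, value)
```

The summary describes what has already happened; it makes no prediction about future months.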
Prescriptive Analytics
Prescriptive analytics automatically synthesizes big data,
mathematical science, business rules, and machine learning to make a
prediction and then suggests a decision option to take advantage of
the prediction.
Prescriptive analytics goes beyond predicting future outcomes by also
suggesting actions that benefit from the predictions and showing the
decision maker the implication of each decision option. Prescriptive
analytics anticipates not only what will happen and when it will happen
but also why it will happen. Further, prescriptive analytics can
suggest decision options on how to take advantage of a future
opportunity or mitigate a future risk and illustrate the implication of
each decision option.
For example, Prescriptive Analytics can benefit healthcare strategic
planning by using analytics to leverage operational and usage data
combined with data of external factors such as economic data,
population demography, etc.
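The step from prediction to recommended decision can be sketched as follows. The scenario is entirely hypothetical: a forecast gives probabilities for three demand levels, and the program evaluates each stocking option by its expected profit and suggests the best one. The price, unit cost, and probabilities are invented for illustration.

```python
PRICE = 10       # selling price per unit (assumption)
UNIT_COST = 6    # purchase cost per unit (assumption)

# Hypothetical output of a predictive model: demand level -> probability.
demand_forecast = {80: 0.3, 100: 0.5, 120: 0.2}

# Decision options under consideration: how many units to stock.
stock_options = [80, 100, 120]


def expected_profit(stock):
    """Average profit over the forecast scenarios for one stocking decision."""
    return sum(
        probability * (PRICE * min(stock, demand) - UNIT_COST * stock)
        for demand, probability in demand_forecast.items()
    )


# Show the implication of each option, then recommend the best one.
for stock in stock_options:
    print(stock, round(expected_profit(stock), 2))

best_stock = max(stock_options, key=expected_profit)
print("recommended stock:", best_stock)  # recommended stock: 100
```

Listing the expected profit of every option, not just the winner, mirrors the idea of showing the decision maker the implication of each decision option.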
Diagnostic Analytics
In this analysis, we generally rely on historical data to
answer a question or solve a problem. We try to find
dependencies and patterns in the historical data of the particular
problem.

Companies use this analysis because it gives great insight into a
problem, provided they keep detailed information at their disposal.
Otherwise, data has to be collected separately for every problem,
which is very time-consuming. Common
techniques used for Diagnostic Analytics are:
• Data discovery
• Data mining
• Correlations
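Correlation, the last technique listed, can be computed with the Pearson coefficient. The sketch below uses invented historical data (delivery delay in days and number of complaints) and a hand-written pearson function.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up historical records for five weeks.
delivery_delay_days = [1, 2, 3, 4, 5]
complaints = [2, 4, 5, 4, 5]

r = pearson(delivery_delay_days, complaints)
print(round(r, 4))  # 0.7746
```

A coefficient close to 1 points to a strong positive relationship worth investigating, although correlation alone does not show that one factor causes the other.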
