Chapter 1 - Introduction
INTRODUCTION:
With its growing popularity and ready adoption by business organizations, machine learning
has become a dominant technology trend. Before starting the machine learning journey,
let us establish these terms: data, information, knowledge, intelligence, and wisdom. A
knowledge pyramid is shown in Figure 1.1.
MODULE 1
All facts are data. Data can be numbers or text that can be processed by a computer. Today,
organizations are accumulating vast and growing amounts of data with data sources such as flat
files, databases, or data warehouses in different storage formats.
Here comes the need for machine learning. The objective of machine learning is to process
this archival data so that organizations can make better decisions, design new products,
improve business processes, and develop effective decision support systems.
The key to this definition is that the system should learn by itself without explicit
programming. It is widely known that to perform a computation, one needs to write
programs that teach the computer how to do that computation.
after converting the expert knowledge of many doctors into a system. However, this
approach did not progress much as programs lacked real intelligence. The word MYCIN is
derived from the fact that most of the antibiotics' names end with 'mycin'.
Often, the quality of data determines the quality of experience and, therefore, the quality of
the learning system. In statistical learning, the relationship between the input x and output
y is modelled as a function in the form y=f(x). Here, f is the learning function that maps the
input x to output y. Learning of function f is the crucial aspect of forming a model in
statistical learning. In machine learning, this is simply called mapping of input to output.
The learning program summarizes the raw data in a model. Formally stated, a model is an
explicit description of patterns within the data in the form of:
1. Mathematical equation
2. Relational diagrams like trees/graphs
3. Logical if/else rules, or
4. Groupings called clusters
In summary, a model can be a formula, procedure or representation that can generate data
decisions. The difference between pattern and model is that the former is local and
applicable only to certain attributes but the latter is global and fits the entire dataset. For
example, a model can be helpful to examine whether a given email is spam or not. The
point is that the model is generated automatically from the given data.
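As a minimal sketch of a model being generated automatically from data, the following Python snippet learns a single logical if/else rule for the spam example. The feature (a count of suspicious words per email), the tiny dataset, and the function names are hypothetical and chosen only for illustration; real spam filters use far richer features.

```python
# Hypothetical training data: (number of suspicious words, label),
# where the label is True for spam and False for a normal email.
emails = [(0, False), (1, False), (2, False), (3, True),
          (5, True), (7, True), (1, False), (4, True)]


def count_errors(threshold, data):
    """Count emails misclassified by the rule 'spam if count >= threshold'."""
    return sum((count >= threshold) != label for count, label in data)


def learn_threshold(data):
    """Pick the threshold with the fewest training errors (first one on ties)."""
    candidates = sorted({count for count, _ in data})
    return min(candidates, key=lambda t: count_errors(t, data))


# The "model" is just the learned threshold; nobody typed it in by hand.
threshold = learn_threshold(emails)


def is_spam(count):
    """The learned model expressed as a single if/else style rule."""
    return count >= threshold
```

With this data the rule that makes no training errors is "spam if the count is at least 3", so `is_spam(6)` is true and `is_spam(1)` is false.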
Machine learning primarily uses concepts from Artificial Intelligence, Data Science, and
Statistics. It is the result of combining ideas from diverse fields.
Machine learning is an important branch of AI, which is a much broader subject. The aim of AI is
to develop intelligent agents. An agent can be a robot, a human, or any autonomous system.
Initially, the idea of AI was ambitious, that is, to develop intelligent systems like human beings.
The focus was on logic and logical inferences. The field has seen many ups and downs. The down
periods were called AI winters.
The resurgence in AI happened due to the development of data-driven systems. The aim is to find
relations and regularities present in the data. Machine learning is the subbranch of AI whose
aim is to extract patterns for prediction. It is a broad field that includes learning from
examples and other areas like reinforcement learning. The relationship of AI and machine
learning is shown in Figure 1.3. The model can take an unknown instance and generate results.
Deep learning is a subbranch of machine learning. In deep learning, the models are
constructed using neural network technology. Neural networks are based on the human neuron
models. Many neurons form a network connected with the activation functions that trigger
further neurons to perform tasks.
1.3.2 Machine Learning, Data Science, Data Mining, and Data Analytics
Data science is an ‘Umbrella’ term that encompasses many fields. Machine learning starts with
data. Therefore, data science and machine learning are interlinked. Machine learning is a
branch of data science. Data science deals with gathering of data for analysis. It is a broad field
that includes:
Big Data: Data science is concerned with the collection of data. Big data is a field of data
science that deals with data having the following characteristics:
1. Volume: A huge amount of data is generated by big companies like Facebook, Twitter, and
YouTube.
2. Variety: Data is available in a variety of forms like images and videos, and in different formats.
3. Velocity: It refers to the speed at which the data is generated and processed.
Big data is used by many machine learning algorithms for applications such as language
translation and image recognition. Big data influences the growth of subjects like Deep learning.
Deep learning is a branch of machine learning that deals with constructing models using neural
networks.
Data Mining: Data mining has its origins in business. Just as mining the earth yields
precious resources, it is often believed that unearthing the data produces hidden
information that otherwise would have eluded the attention of the management. Nowadays,
many consider data mining and machine learning to be the same. The main difference between
these fields is that data mining aims to extract the hidden patterns that are present in the
data, whereas machine learning aims to use those patterns for prediction.
Data Analytics: Another branch of data science is data analytics. It aims to extract useful
knowledge from crude data. There are different types of analytics. Predictive data analytics is
used for making predictions. Machine learning is closely related to this branch of analytics and
shares almost all algorithms.
Pattern Recognition: It is an engineering field. It uses machine learning algorithms to extract the
features for pattern analysis and pattern classification. One can view pattern recognition as a
specific application of machine learning.
Statistics requires knowledge of the statistical procedures and the guidance of a good
statistician. It is mathematics intensive, and its models are often complicated equations that
involve many assumptions. Statistical methods are developed in relation to the data
being analysed. In addition, statistical methods are coherent and rigorous. They have strong
theoretical foundations and interpretations that require strong statistical knowledge.
Machine learning, comparatively, makes fewer assumptions and requires less statistical
knowledge. However, it often requires interaction with various tools to automate the process
of learning.
Nevertheless, there is a school of thought that machine learning is just the latest version
of 'old Statistics' and hence this relationship should be recognized.
Learning, like adaptation, occurs as the result of interaction of the program with its
environment. It can be compared with the interaction between a teacher and a student.
There are four types of machine learning as shown in Figure 1.5.
Labelled and Unlabelled Data: Data is a raw fact. Normally, data is represented in the form
of a table. Each row of the table represents a data point, which is also referred to as a
sample or an example. Features are attributes or characteristics of an object.
Normally, the columns of the table are attributes. Out of all attributes, one attribute is
important and is called a label. The label is the feature that we aim to predict. Thus, there are
two types of data: labelled and unlabelled.
Labelled Data: To illustrate labelled data, let us take one example dataset called the Iris flower
dataset or Fisher's Iris dataset. The dataset has 150 samples of Iris (50 per class) with four
attributes: the length and width of sepals and petals. The target variable is called class. There
are three classes: Iris setosa, Iris virginica, and Iris versicolor.
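The difference between labelled and unlabelled data can be sketched in a few lines of Python. The three rows below are one sample from each class of the Iris dataset; the variable names and the tuple layout are illustrative choices, not a standard format.

```python
# One Iris sample per class.
# Columns: sepal length, sepal width, petal length, petal width (cm), class.
labelled = [
    (5.1, 3.5, 1.4, 0.2, "Iris setosa"),
    (7.0, 3.2, 4.7, 1.4, "Iris versicolor"),
    (6.3, 3.3, 6.0, 2.5, "Iris virginica"),
]

# Dropping the label column leaves unlabelled data: features only.
unlabelled = [row[:-1] for row in labelled]

# The label is the single attribute that a learning algorithm aims to predict.
labels = [row[-1] for row in labelled]
```

Each row is one data point, the first four columns are its features, and the last column is its label.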
Supervised algorithms use a labelled dataset. As the name suggests, there is a supervisor
or teacher component in supervised learning. A supervisor provides labelled data so
that the model can be constructed and then evaluated on test data.
Classification
In classification, learning takes place in two stages. During the first stage, called the training
stage, the learning algorithm takes a labelled dataset and starts learning. After the training
samples are processed, the model is generated. In the second stage, the constructed model
is tested with a test or unknown sample, which is assigned a label. This is the classification process.
This is illustrated in Figure 1.7. Initially, the classification learning algorithm learns
with the collection of labelled data and constructs the model. Then, a test case is selected, and
the model assigns a label.
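The two stages can be sketched with a nearest-centroid classifier, one simple distance-based method. This is an illustrative choice rather than the method used in Figure 1.7, and the petal measurements, function names, and test sample below are assumptions made for the example.

```python
import math
from collections import defaultdict


def train(samples):
    """Training stage: compute one centroid (mean point) per class label."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(model, features):
    """Testing stage: assign the label of the nearest centroid."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Illustrative labelled data: (petal length, petal width) in cm and class.
training_set = [
    ((1.4, 0.2), "Iris setosa"), ((1.3, 0.2), "Iris setosa"),
    ((4.7, 1.4), "Iris versicolor"), ((4.5, 1.5), "Iris versicolor"),
    ((6.0, 2.5), "Iris virginica"), ((5.1, 1.9), "Iris virginica"),
]

model = train(training_set)                 # stage 1: learn from labelled data
predicted = classify(model, (4.4, 1.3))     # stage 2: label an unknown sample
```

Here the model is simply a dictionary of class centroids; the unknown sample (4.4, 1.3) lies closest to the versicolor centroid and is labelled accordingly.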
The classification models can be categorized based on the implementation technology like
decision trees, probabilistic methods, distance measures, and soft computing methods.
Classification models can also be classified as generative models and discriminative models.
Generative models deal with the process of data generation and its distribution. Probabilistic
models are examples of generative models. Discriminative models do not care about the
generation of data. Instead, they simply concentrate on classifying the given data.
Some commonly used classification algorithms are:
1. Decision Tree
2. Random Forest
3. Support Vector Machines
4. Naïve Bayes
5. Artificial Neural Network and Deep Learning networks like CNN
Regression Models
Regression models, unlike classification algorithms, predict continuous variables like price. In
other words, the output is a number. A fitted regression model is shown in Figure 1.8 for a dataset
with week as the input x and product sales as the output y.
The regression model takes input x and generates a model in the form of a fitted line of the
form y=f(x). Here, x is the independent variable that may be one or more attributes and y is the
dependent variable. In Figure 1.8, linear regression takes the training set and tries to fit it with a
line: product sales = 0.66 * Week + 0.54. Here, 0.66 and 0.54 are the regression coefficients that
are learnt from data. The advantage of this model is that prediction for product sales (y) can be
made for unknown week data (x). For example, the prediction for the unseen eighth week can be
made by substituting x = 8 in the regression formula, giving y = 0.66 * 8 + 0.54 = 5.82. Both
regression and classification models are supervised algorithms. Both have a supervisor and the
concepts of training and testing are applicable to both.
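How such coefficients are learnt from data can be sketched with ordinary least squares in plain Python. The five (week, sales) pairs below are hypothetical: the data behind Figure 1.8 is not given here, so these values were chosen so that the fitted line comes out as the one quoted in the text. The function name `fit_line` is also an assumption.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training set: week number (x) and product sales (y).
weeks = [1, 2, 3, 4, 5]
sales = [1.2, 1.8, 2.6, 3.2, 3.8]

slope, intercept = fit_line(weeks, sales)

# Prediction for an unseen week: substitute x = 8 into the fitted line.
predicted_week_8 = slope * 8 + intercept
```

For this data the fit gives a slope of about 0.66 and an intercept of about 0.54, so the prediction for week 8 is about 5.82 (up to floating-point rounding).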
Cluster Analysis
Dimensionality Reduction
The differences between supervised and unsupervised learning are listed in the following Table
1.2.
There are circumstances where the dataset has a huge collection of unlabelled data and
some labelled data. Labelling is a costly process and difficult for humans to perform.
Semi-supervised algorithms use unlabelled data by assigning a pseudo-label. Then, the
labelled and pseudo-labelled datasets can be combined.
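Pseudo-labelling can be sketched as follows: a model trained on the few labelled samples labels the unlabelled ones, and the combined set is used to retrain. The one-dimensional data, the class names "A" and "B", and the nearest-centroid model are all assumptions chosen to keep the example short.

```python
def centroids(samples):
    """Mean feature value per label (a deliberately simple 1-D model)."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(model, value):
    """Return the label whose centroid is closest to the value."""
    return min(model, key=lambda label: abs(model[label] - value))


# Hypothetical data: a few labelled samples and some unlabelled ones.
labelled = [(1.0, "A"), (1.2, "A"), (5.0, "B"), (5.4, "B")]
unlabelled = [0.8, 1.5, 4.7, 6.0]

# Step 1: train on the labelled data only.
model = centroids(labelled)

# Step 2: assign a pseudo-label to every unlabelled sample.
pseudo_labelled = [(value, predict(model, value)) for value in unlabelled]

# Step 3: combine both sets and retrain on the larger dataset.
combined = labelled + pseudo_labelled
model = centroids(combined)
```

In practice, pseudo-labels are often kept only when the model is confident about them; this sketch accepts all of them for simplicity.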
Reinforcement learning mimics human beings. Just as human beings use ears and eyes to
perceive the world and take actions, reinforcement learning allows the agent to interact
with the environment to get rewards. The agent can be a human, animal, robot, or any
independent program. The rewards enable the agent to gain experience. The agent aims
to maximize the reward.
The reward can be positive or negative (punishment). When the rewards are higher,
the behaviour gets reinforced and learning becomes possible.
Consider the following example of a Grid game as shown in Figure 1.10.
In this grid game, the gray tile indicates the danger, black is a block, and the tile with
diagonal lines is the goal. The aim is to start, say from bottom-left grid, using the actions
left, right, top and bottom to reach the goal state.
To solve this sort of problem, there is no data. The agent interacts with the
environment to get experience. In the above case, the agent tries to create a model by
simulating many paths and finding rewarding paths. This experience helps in
constructing a model.
In summary, unlike supervised learning, reinforcement learning has no supervisor
or labelled dataset. Many sequential decisions need to be taken to reach the final
decision. Therefore, reinforcement algorithms are reward-based, goal-oriented
algorithms.
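The idea of learning from rewards by trying many paths can be sketched with tabular Q-learning, one common reinforcement learning algorithm. The text does not name a specific algorithm, and the grid of Figure 1.10 is not reproduced here, so the 3 x 4 layout, the tile positions, the reward values, and all hyperparameters below are assumptions.

```python
import random

ROWS, COLS = 3, 4
# Assumed layout: start at bottom-left, goal at top-right, danger below the goal.
START, GOAL, DANGER, BLOCK = (2, 0), (0, 3), (1, 3), (1, 1)
ACTIONS = {"left": (0, -1), "right": (0, 1), "top": (-1, 0), "bottom": (1, 0)}


def step(state, action):
    """Move one tile; hitting a wall or the block leaves the agent in place."""
    d_row, d_col = ACTIONS[action]
    nxt = (state[0] + d_row, state[1] + d_col)
    if not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS) or nxt == BLOCK:
        nxt = state
    if nxt == GOAL:
        return nxt, 1.0, True       # positive reward, episode ends
    if nxt == DANGER:
        return nxt, -1.0, True      # negative reward (punishment), episode ends
    return nxt, 0.0, False


def best_action(q, state):
    """Greedy action for a state, with ties broken at random."""
    return max(ACTIONS, key=lambda a: (q[(state, a)], random.random()))


def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Learn action values purely from interaction with the environment."""
    random.seed(seed)
    q = {((r, c), a): 0.0
         for r in range(ROWS) for c in range(COLS) for a in ACTIONS}
    for _ in range(episodes):
        state = START
        for _ in range(100):        # cap on episode length
            if random.random() < epsilon:
                action = random.choice(list(ACTIONS))   # explore
            else:
                action = best_action(q, state)          # exploit
            nxt, reward, done = step(state, action)
            future = 0.0 if done else gamma * max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + future - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the learned action values from the start tile."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, best_action(q, state))
        path.append(state)
        if done:
            break
    return path


q_table = train()
path = greedy_path(q_table)
```

No labelled dataset is involved: the table of action values is built only from the rewards observed while exploring, and `path` is the sequence of tiles the trained agent visits when it acts greedily.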
However, humans are better than computers in many aspects like recognition. But, deep
learning systems challenge human beings in this aspect as well. Machines can recognize
human faces in a second. Still, there are tasks where humans are better as machine learning
systems still require quality data for model construction. The quality of learning system
depends on the quality of data. This is a challenge. Some of the challenges are listed below:
1. Problems- Machine learning can deal with the ‘well-posed’ problems where
specifications are complete and available. Computers cannot solve ‘ill-posed’ problems.
2. Huge data- This is a primary requirement of machine learning. Availability of quality
data is a challenge. Quality data means that it should be large and should not have data
problems such as missing data or incorrect data.
3. High computation power- With the availability of Big Data, the computational resource
requirement has also increased. Systems with Graphics Processing Unit (GPU) or even
Tensor Processing Unit (TPU) are required to execute machine learning algorithms. Also,
machine learning tasks have become complex and hence time complexity has increased,
and that can be solved only with high computing power.
4. Complexity of the algorithms- The selection of algorithms, describing the algorithms,
application of algorithms to solve machine learning tasks, and comparison of algorithms
have become necessary for machine learning engineers and data scientists now. Algorithms have
become a big topic of discussion and it is a challenge for machine learning professionals
to design, select, and evaluate optimal algorithms.
5. Bias/Variance- Bias and variance are two sources of model error: bias is the error due to
overly simple assumptions, and variance is the error due to sensitivity to the particular
training data. Balancing them leads to a problem called the bias/variance tradeoff. A model
that fits the training data well but fails on test data, that is, one that lacks generalization,
is said to overfit. The reverse problem is called underfitting, where the model is too simple
to fit even the training data and therefore also performs poorly on test data.
Overfitting and underfitting are great challenges for machine learning algorithms.
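Overfitting and underfitting can be illustrated by comparing three models on the same data. The dataset below is hypothetical (y = 2x, with alternating noise added only to the training set), and the three model functions are assumptions chosen to represent a too-simple, a reasonable, and a too-flexible model.

```python
def mse(model, xs, ys):
    """Mean squared error of a model (a function of x) on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: the underlying relation is y = 2x.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [2.5, 3.5, 6.5, 7.5, 10.5, 11.5]      # noisy observations
test_x = [1.5, 2.5, 3.5, 4.5, 5.5]
test_y = [3.0, 5.0, 7.0, 9.0, 11.0]             # noise-free targets

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))


def constant(x):
    """Too simple: ignores x and always predicts the training mean."""
    return mean_y


def linear(x):
    """Least-squares line through the training data."""
    return mean_y + slope * (x - mean_x)


def memorizer(x):
    """Too flexible: returns the stored y of the nearest training x."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# (training error, test error) for each model.
errors = {
    model.__name__: (mse(model, train_x, train_y), mse(model, test_x, test_y))
    for model in (constant, linear, memorizer)
}
```

The memorizer reproduces the training data exactly (zero training error) but does worse than the line on the test data, which is the signature of overfitting. The constant model has a large error on both sets, which is the signature of underfitting.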
1. Understanding the business - This step involves understanding the objectives and
requirements of the business organization. Generally, a single data mining algorithm is
enough for giving the solution. This step also involves the formulation of the problem
statement for the data mining process.
2. Understanding the data - It involves the steps like data collection, study of the
characteristics of the data, formulation of hypothesis, and matching of patterns to the selected
hypothesis.
3. Preparation of data - This step involves producing the final dataset by cleaning the
raw data and preparation of data for the data mining process. The missing values may
cause problems during both training and testing phases. Missing data forces classifiers
to produce inaccurate results. This is a perennial problem for the classification models.
Hence, suitable strategies should be adopted to handle the missing data.
4. Modelling - This step involves applying a data mining algorithm to the
data to obtain a model or pattern.
5. Evaluate - This step involves the evaluation of the data mining results using statistical
analysis and visualization methods. The performance of the classifier is determined by
evaluating the accuracy of the classifier. The process of classification is a fuzzy issue. For
example, classification of emails requires extensive domain knowledge and requires
domain experts. Hence, performance of the classifier is very crucial.
6. Deployment - This step involves the deployment of results of the data mining algorithm
to improve the existing process or for a new situation.
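Step 3 notes that suitable strategies are needed to handle missing data. One simple strategy, mean imputation, is sketched below. The use of `None` to mark a missing value, the function name, and the sample rows are assumptions; other strategies, such as dropping incomplete rows, may suit a given dataset better.

```python
def impute_mean(rows):
    """Replace None in each column with the mean of that column's known values.

    Assumes every column has at least one known (non-None) value.
    """
    means = []
    for column in zip(*rows):
        known = [value for value in column if value is not None]
        means.append(sum(known) / len(known))
    return [
        tuple(mean if value is None else value for value, mean in zip(row, means))
        for row in rows
    ]


# Hypothetical raw data with two missing measurements.
raw = [(5.1, 3.5), (None, 3.0), (4.7, None), (5.0, 3.4)]
clean = impute_mean(raw)
```

After imputation every row is complete, so the dataset can be passed to the modelling step without special handling of gaps.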
Machine learning technologies are now widely used in different domains. Machine learning
applications are everywhere! One encounters many machine learning applications in
day-to-day life. Some applications are listed below:
1. Sentiment analysis- This is an application of natural language processing (NLP) where
the words of documents are converted to sentiments like happy, sad, and angry,
which are captured effectively by emoticons. For movie reviews or product reviews,
ratings from one star to five stars are automatically attached using sentiment analysis programs.
2. Recommendation systems- These are systems that make personalized purchases
possible. For example, Amazon recommends related books or books
bought by people who have the same taste as you, and Netflix suggests shows or
related movies that match your taste. The recommendation systems are based on machine
learning.
3. Voice assistants- Products like Amazon Alexa, Microsoft Cortana, Apple Siri, and
Google Assistant are all examples of voice assistants. They take speech commands
and perform tasks. These chatbots are the result of machine learning technologies.
4. Navigation systems- Technologies like Google Maps and those used by Uber are all examples of
machine learning which help to locate and navigate shortest paths to reduce time.
The range of machine learning applications is enormous. The following Table 1.4 summarizes
some of the machine learning applications.