
Material For Student CAIEC™ (V062021A) EN

The document outlines the learning objectives and structure of the Artificial Intelligence Expert Certificate (CAIEC™) program by CertiProf®, focusing on deep learning fundamentals, neural networks, and machine learning methodologies. It describes the collaborative network of CertiProf®, including roles such as Knowledge Ambassadors and Lifelong Learners, and emphasizes the importance of continuous learning in the digital age. The agenda includes various topics related to deep learning, machine learning projects, and practical implementations using tools like TensorFlow and Keras.


Learning Objectives

• Understand the fundamental keys of the deep learning approach
• Master the theoretical and practical bases of the architecture and convergence of neural networks
• Describe the different existing fundamental architectures and master their fundamental implementations
• Master the methodologies for setting up neural networks, and the strengths and limitations of existing tools and libraries (pandas, NumPy, scikit-learn)

Who is CertiProf®?
CertiProf® is an Examination Institute founded in the United States in 2015 and located in Sunrise, Florida.

Our philosophy is based on the creation of knowledge in community, and for this purpose our collaborative network is made up of:

• CKAs (CertiProf Knowledge Ambassadors): influential people in their fields of expertise or mastery (coaches, trainers, consultants, bloggers, community builders, organizers and evangelists) who are willing to contribute to the improvement of content
• CLLs (CertiProf Lifelong Learners): certification candidates are identified as Continuing Learners, having proven their unwavering commitment to lifelong learning, which is vitally important in today's ever-changing and expanding digitalized world, regardless of whether they pass or fail the exam
• ATPs (Accredited Trainer Partners): universities, training centers and facilitators around the world that make up the partner network
• Authors (co-creators): industry experts or practitioners who, with their knowledge, develop content for the creation of new certifications that respond to the needs of the industry
• Internal Staff: our distributed team, with operations in India, Brazil, Colombia and the United States, that supports the day-to-day execution of the purpose of CertiProf®
ARTIFICIAL INTELLIGENCE EXPERT CERTIFICATE CAIEC™

Who should attend this certification?

• Engineers, Analysts, Marketing Managers
• Data Analysts, Data Scientists, Data Stewards
• Anyone interested in Data Mining and Machine Learning techniques

Presentation
Welcome!
Report in the following format:
• Name
• Company
• Job title and experience
• Expectations of this course

Badge

https://www.credly.com/org/certiprof/badge/artificial-intelligence-expert-certificate-caiec

Lifelong Learning

The holders of this badge have demonstrated their unwavering commitment to lifelong learning, which is vitally important in today's ever-changing and expanding digitized world. It also identifies the qualities of an open, disciplined and constantly evolving mind, capable of using and contributing its knowledge to the development of a more equal and better world.

Earning criteria:
• Be a candidate for CertiProf® certification
• Be a continuous and focused learner
• Identify with the concept of lifelong learning
• Believe and genuinely identify with the
concept that knowledge and education can
and should change the world
• Want to enhance your professional growth
Agenda
I. Deep Learning Fundamentals
I.1 Representing Neural Networks
I.2 Nonlinear Activation Functions
I.3 Hidden Layers
I.4 Guided Project: Building A Handwritten Digits Classifier
II. Machine Learning Project
II.1 Machine Learning Project Walkthrough: Data Cleaning
II.2 Machine Learning Project Walkthrough: Preparing the Features
II.3 Machine Learning Project Walkthrough: Making Predictions
Key Points
III. Kaggle Fundamentals
III.1 Getting Started with Kaggle
III.2 Feature Preparation, Selection and Engineering
III.3 Model Selection and Tuning
III.4 Guided Project: Creating a Kaggle Workflow
IV. TensorFlow Concepts
IV.1 Presentation of TensorFlow
IV.2 TensorFlow Basics
IV.3 Classification of Neural Network in TensorFlow
IV.4 Linear Regression in TensorFlow
V. Keras Basis
V.1 Keras Layers
V.2 Deep Learning with Keras Implementation and Example
V.3 Keras Vs Tensorflow – Difference Between Keras and Tensorflow
VI. References
I. Deep Learning Fundamentals

I.1 Representing Neural Networks
I.1.1 Nonlinear models

The inspiration for artificial neural networks (or "neural networks" for short) comes partially from biological
neural networks. The cells in most brains (including ours) connect and work together. We call each of these
cells in a neural network a neuron. Neurons in human brains communicate by exchanging electrical signals.

Neural network models draw inspiration from the structure of neurons in our brains — and the way they pass
messages. However, the similarities between biological neural networks and artificial neural networks end
here.

A deep neural network is a specific type of neural network that excels at capturing nonlinear relationships in
data. Deep neural networks have surpassed many benchmarks in audio and image classification. Previously,
linear models were often used with nonlinear transformations discovered meticulously by hand.

In this lesson, we'll explore deep neural networks. Here are a few takeaways you can expect by the end of
this lesson:
• How to represent neural networks visually
• How to implement linear and logistic regression as neural networks
• The differences between the nonlinear activation functions

To get the most out of this lesson, you'll need to know the NumPy, sklearn, and pandas libraries. You'll also need to be comfortable programming in Python. We'll rely on statistics, calculus, and linear algebra. It also helps if you understand the traditional machine learning workflow, as well as linear and logistic regression models.
I.1.2 Graphs

We usually represent neural networks as graphs. A graph is a data structure that consists of nodes (represented as circles) connected by edges (represented as lines).

We commonly use graphs to represent relations or links between components of a system. For example,
the Facebook Social Graph describes the interconnection of all the users on Facebook (and this graph
changes constantly as users add and remove friends). Google Maps uses graphs to represent locations
in the physical world as nodes and roads as edges.

Graphs are a highly flexible data structure; you can even represent a list of values as a graph. We often
categorize graphs by their properties, which act as constraints. You can read about the many different
ways to categorize graphs on Wikipedia.



One way to categorize graphs is by the presence of edge direction (directed versus undirected). Within directed graphs, graphs are either cyclic or acyclic.
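
As a minimal sketch of these ideas, a directed graph can be stored in Python as a dictionary that maps each node to the nodes its edges point to. The node names and the `has_cycle` helper below are hypothetical and chosen only for illustration; they are not part of any library.

```python
def has_cycle(graph):
    """Return True if a directed graph (dict: node -> list of successors) has a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, being explored, fully explored
    color = {node: WHITE for node in graph}

    def visit(node):
        color[node] = GRAY
        for neighbor in graph.get(node, []):
            state = color.get(neighbor, WHITE)
            if state == GRAY:
                return True  # we came back to a node still being explored
            if state == WHITE and visit(neighbor):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in graph)


# Two input nodes feeding one output node: directed and acyclic
acyclic_graph = {"x1": ["y"], "x2": ["y"], "y": []}
# Three nodes pointing at each other in a loop: directed and cyclic
cyclic_graph = {"a": ["b"], "b": ["c"], "c": ["a"]}

print(has_cycle(acyclic_graph))  # False
print(has_cycle(cyclic_graph))   # True
```

The first graph has the same shape as the small networks discussed later in this lesson: edges only point from inputs toward the output.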

I.1.3 Computational Graphs

Graphs provide a mental model for thinking about a specific class of models — those that consist of
a series of functions executed in a specific order. In the context of neural networks, graphs help us
express the execution of a pipeline of functions in succession.

This pipeline has 2 stages of functions that happen in sequence:


• In the first stage, L1 is computed:

• In the second stage, L2 is computed:

The second stage can't happen without the first stage because L1 is an input to the second stage. The
successive computation of functions is at the heart of neural network models. This is a computational
graph. A computational graph uses nodes to describe variables and edges to describe the combination
of variables. Here's a simple example:

The computational graph is a powerful representation because it allows us to represent models with
many layers of nesting. In fact, a decision tree is really a specific type of computational graph. There's
no compact way to express a decision tree model using only equations and standard algebraic notation.
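
A two-stage pipeline of this kind can be sketched in Python. The specific functions used for each stage, and all the numeric values, are assumptions made for illustration; the point is only that the second stage cannot run until the first has produced L1.

```python
import numpy as np


def stage_one(x, w1):
    """First stage: combine the inputs into the intermediate value L1."""
    return np.dot(x, w1)


def stage_two(l1, w2):
    """Second stage: takes L1 as its input and produces L2."""
    return l1 * w2


x = np.array([1.0, 2.0, 3.0])     # hypothetical input values
w1 = np.array([0.5, -1.0, 2.0])   # hypothetical first-stage weights
w2 = 3.0                          # hypothetical second-stage weight

L1 = stage_one(x, w1)    # 0.5 - 2.0 + 6.0 = 4.5
L2 = stage_two(L1, w2)   # 4.5 * 3.0 = 13.5
print(L1, L2)
```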

To better understand this representation, we'll represent a linear regression model using neural network
notation. This will help you learn this unique representation, and it will allow us to explore some of the
neural network terminology.
I.1.4 Neural network vs linear regression

Here is a representation of a linear regression model:

$\hat{y} = a_0 + a_1 x_1 + a_2 x_2 + \dots + a_n x_n$

• $a_0$ represents the intercept (also known as the bias)
• $a_1$ to $a_n$ represent the trained model weights
• $x_1$ to $x_n$ represent the features
• $\hat{y}$ represents the predicted value

The first step is to rewrite this model using linear algebra notation, as a product of two vectors.

Here's an example of this model:
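
One way to sketch the vector form with NumPy is shown below. The weights and feature values are made up for illustration; a constant 1 is placed in the feature vector so that the bias is handled by the same dot product.

```python
import numpy as np

# Hypothetical weights [a0, a1, a2, a3]; a0 is the bias
a = np.array([2.0, 0.5, -1.0, 3.0])
# Hypothetical observation [1, x1, x2, x3]; the leading 1 multiplies the bias
x = np.array([1.0, 4.0, 2.0, 1.0])

# The prediction is the product of the two vectors
y_hat = np.dot(a, x)  # 2.0 + 2.0 - 2.0 + 3.0 = 5.0
print(y_hat)
```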

Neural Network Representation

In the neural network representation of this model, we see the following:


• An input neuron represents each feature column in a dataset
• Each weight value is represented as an arrow from the feature column it multiplies to the output neuron

The neurons and arrows represent the weighted sum, which is the combination of the feature columns
and weights.

Inspired by biological neural networks, an activation function determines if the neuron fires or not. In a neural network model, the activation function transforms the weighted sum of the input values. For this network, the activation function is the identity function. The identity function returns the same value that was passed in: f(x) = x

While the activation function isn't interesting for a network that performs linear regression, it's useful
for logistic regression and more complex networks. Here's a comparison of both representations of
the same linear regression model:

I.1.5 Manipulation of regression Data

We'll begin by working with data that we generate ourselves instead of an external dataset. Generating data ourselves gives us more control over the properties of the dataset (e.g., the number of features, the number of observations, and the noise in the features). The generated datasets on which neural networks excel contain the same kind of nonlinearity as real-world datasets, so we can apply what we learn here.

Scikit-learn contains the following convenience functions for generating data:

• sklearn.datasets.make_regression()
• sklearn.datasets.make_classification()
• sklearn.datasets.make_moons()

The following code generates a regression data set with 3 features, 1000 observations, and a random
seed of 1:

The function make_regression() returns a tuple of two NumPy objects.

The features are in the first NumPy array, and the labels are in the second NumPy array:

We can then use the pandas.DataFrame() constructor to create DataFrames:
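
The three steps described above (generating the data, unpacking the returned tuple, and building pandas objects) can be sketched as follows. The variable names are assumptions chosen for this example.

```python
import pandas as pd
from sklearn.datasets import make_regression

# 3 features, 1000 observations, and a random seed of 1
data = make_regression(n_samples=1000, n_features=3, random_state=1)

print(type(data))          # the return value is a tuple
features_array = data[0]   # first element: the features
labels_array = data[1]     # second element: the labels
print(features_array.shape, labels_array.shape)

# Wrap the NumPy arrays in pandas objects
features_df = pd.DataFrame(features_array)
labels_series = pd.Series(labels_array)
print(features_df.head())
```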

Let's generate some data for the network we're building.

Instructions

1. Generate a dataset for regression that includes the following:
• Exactly 100 observations
• Exactly 3 features
• The random seed 1
2. Convert the NumPy array of generated features into a pandas DataFrame, and assign to features
3. Convert the NumPy array of generated labels into a pandas series and assign to labels

Solutions

I.1.6 Fitting a linear regression Neural Network

Because the outputs of one layer of neurons feed forward into the next layer (here, the single output neuron), we call this network a feedforward network. In the language of graphs, a feedforward network is a directed, acyclic graph.

Fitting A Network

Here are two different approaches to training a linear regression model:


• Gradient descent
• Ordinary least squares

Gradient descent is the most common technique for fitting neural network models. We'll rely on the
scikit-learn implementation of gradient descent in this lesson.

This implementation is in the SGDRegressor class. We use it the same way we do with
the LinearRegression class:
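
A minimal sketch of that usage pattern is shown below. The generated data and the `random_state` value are assumptions added so the example is self-contained and repeatable.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

# Hypothetical data: 100 observations, 3 features
X, y = make_regression(n_samples=100, n_features=3, random_state=1)

# Same fit / predict workflow as LinearRegression
model = SGDRegressor(random_state=1)
model.fit(X, y)
predictions = model.predict(X)

print(model.coef_.shape)   # one weight per feature
print(predictions.shape)   # one prediction per observation
```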

We now have everything we need to implement this network. Because we're focusing on building
intuition, we'll be training and testing on the same data set. In real-life scenarios, you always want to
use a cross validation technique of some kind.

Instructions

1. Add a column named bias containing the value 1 for each row to the features DataFrame
2. Import SGDRegressor from sklearn.linear_model
3. Define two functions:
• train(features, labels): takes in the features DataFrame and labels series and performs model
fitting
• Use the SGDRegressor class from scikit-learn to handle model fitting
• This function should return only a NumPy 1D array of weights for the linear regression model
• feedforward(features, weights): takes in the features DataFrame and the weights NumPy array
• Perform matrix multiplication between features (100 rows by 4 columns) and weights (4 rows by
1 column) and assign the result to predictions
• Return predictions. We'll skip implementing the identity function since it simply returns the
same value that was passed in
4. Uncomment the code we have added for you and run the train() and feedforward() functions. The
final predictions will be in linear_predictions

Solutions
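
One possible sketch of the steps above follows. Two choices here are assumptions rather than part of the instructions: the feature columns are given string names (so that all column names, including bias, have the same type), and `fit_intercept=False` is passed so that the weight learned for the bias column plays the role of the intercept.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import SGDRegressor

# Data from the earlier exercise: 100 observations, 3 features, seed 1
data = make_regression(n_samples=100, n_features=3, random_state=1)
features = pd.DataFrame(data[0], columns=["x1", "x2", "x3"])
labels = pd.Series(data[1])

# Step 1: bias column containing 1 for every row
features["bias"] = 1


def train(features, labels):
    # Assumption: no separate intercept, the bias column covers it
    lr = SGDRegressor(fit_intercept=False, random_state=1)
    lr.fit(features, labels)
    return lr.coef_  # 1D array with 4 weights


def feedforward(features, weights):
    # (100 x 4) features times 4 weights gives 100 predictions
    predictions = np.dot(features.to_numpy(), weights)
    return predictions


train_weights = train(features, labels)
linear_predictions = feedforward(features, train_weights)
print(train_weights)
print(linear_predictions[:5])
```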

I.1.7 Generating Classification Data

To generate a dataset friendly for classification, we can use the make_classification() function from
scikit-learn.

The following code generates a classification data set with 4 features, 1000 observations, and a
random seed of 1:

The function make_classification() returns a tuple of two NumPy objects.



As with the data generated from make_regression(), the features are in the first NumPy array, and the
labels are in the second NumPy array:

We can then use the pandas.DataFrame() constructor to create DataFrames:
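
The same three steps can be sketched for classification data. The variable names are assumptions chosen for this example.

```python
import pandas as pd
from sklearn.datasets import make_classification

# 4 features, 1000 observations, and a random seed of 1
class_data = make_classification(n_samples=1000, n_features=4, random_state=1)

print(type(class_data))  # the return value is a tuple

# First element holds the features, second element holds the labels
class_features_df = pd.DataFrame(class_data[0])
class_labels_series = pd.Series(class_data[1])

print(class_features_df.shape)
print(class_labels_series.unique())  # the two class labels
```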

Let's generate some classification data for the network we're building.

Instructions

1. Generate a dataset for classification that includes the following:


• Exactly 100 observations
• Exactly 4 features
• The random seed 1
2. Convert the NumPy array of generated features into a pandas DataFrame and assign to class_features
3. Convert the NumPy array of generated labels into a pandas series, and assign to class_labels

Solutions

I.1.8 Implementing Neural Network for classification

On the previous few screens, we replicated linear regression as a feedforward neural network model
and learned about nonlinear activation functions. We now have a better idea of what defines a neural
network. So far, we know that neural networks need the following:

• A network structure (How do the nodes connect? In which direction do the data and computation flow?)
• A feedforward function (How are the node weights and observation values combined?)
• An activation function (What transformations are performed on the data?)
• A model fitting function (How does the model fit?)

Now, we'll explore how to build a neural network that replicates a logistic regression model. We'll start
with a quick recap.

Binary Classification and Logistic Regression

In binary classification, we want to find a model that can differentiate between two categorical values
(usually 0 and 1). The values 0 and 1 don't have any numerical weight and instead act as numerical
placeholders for the two categories. We can try to learn the probability that a given observation
belongs in either category.

In the language of conditional probability, we're interested in the probability that a given
observation x belongs to each category:
P(y=0|x)=0.3
P(y=1|x)=0.7

Because the universe of possibilities only consists of these two categories, the probabilities for both
must add up to 1. This lets us simplify what we want a binary classification model to learn:

P(y=1|x)=?
If P(y=1|x)>0.5, we want the model to assign it to category 1.
If P(y=1|x)<0.5, we want the model to assign it to category 0.

Implementing A Logistic Regression Model

A logistic regression model consists of two main components:

• Computing the weighted linear combination of weights and features (as with a linear regression model): $z = a_0 + a_1 x_1 + \dots + a_n x_n$
• Applying a transformation function to squash the result so it varies between 0 and 1: $\frac{1}{1 + e^{-z}}$

Combining these two steps yields the following definition of a logistic regression model:

$\hat{y} = \frac{1}{1 + e^{-(a_0 + a_1 x_1 + \dots + a_n x_n)}}$

Neural networks literature usually refers to the squashing function as the sigmoid function:

$\sigma(x) = \frac{1}{1 + e^{-x}}$

Here's a plot of the sigmoid function:


You'll notice that the sigmoid function has
horizontal asymptotes at 0 and 1, which means
any input value will always output a value
between 0 and 1.
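
A short NumPy sketch makes this squashing behavior concrete. The input values are arbitrary and chosen only to show both tails and the midpoint.

```python
import numpy as np


def sigmoid(x):
    """Sigmoid function: 1 / (1 + e^(-x)), applied element-wise."""
    return 1 / (1 + np.exp(-x))


values = np.array([-10.0, 0.0, 10.0])
outputs = sigmoid(values)

# Large negative inputs approach 0, large positive inputs approach 1,
# and an input of 0 maps to exactly 0.5
print(outputs)
```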

To implement a network that performs classification, the only thing we need to change from the linear regression network we implemented is the activation function. Instead of using the identity function, we need to use the sigmoid function.

Here's a diagram of this network:

Instructions

1. Add a column named bias containing the value 1 for each row to the class_features DataFrame.
2. Define three functions:
• log_train(class_features, class_labels): takes in the class_features DataFrame and class_labels series
and performs model fitting
• Use the SGDClassifier class from scikit-learn to handle model fitting
• This function should return a NumPy 2D array of weights for the logistic regression model
• sigmoid(linear_combination): takes in a NumPy 2D array and applies the sigmoid function for every value: $\frac{1}{1 + e^{-x}}$
• log_feedforward(class_features, log_train_weights): takes in the class_features DataFrame and the log_train_weights NumPy array
• Perform matrix multiplication between class_features (100 rows by 5 columns) and log_train_weights (1 row by 5 columns) transposed, and assign to linear_combination
• Use the sigmoid() function to transform linear_combination and assign the result to log_predictions
• Convert each value in log_predictions to a class label:
• If the value is greater than or equal to 0.5, overwrite the value to 1
• If the value is less than 0.5, overwrite the value to 0
• Return log_predictions
3. Uncomment the code we have added for you and run the log_train() and log_feedforward() functions.
The final predictions will be in log_predictions

Solutions
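
One possible sketch of these steps follows. Several choices are assumptions rather than part of the instructions: the feature columns get string names, `fit_intercept=False` lets the bias column carry the intercept, and `SGDClassifier` is used with its default loss. That default is the hinge loss; recent scikit-learn versions accept `loss="log_loss"` if a logistic loss is wanted.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier

# Data from the earlier exercise: 100 observations, 4 features, seed 1
class_data = make_classification(n_samples=100, n_features=4, random_state=1)
class_features = pd.DataFrame(class_data[0], columns=["x1", "x2", "x3", "x4"])
class_labels = pd.Series(class_data[1])

# Step 1: bias column containing 1 for every row
class_features["bias"] = 1


def log_train(class_features, class_labels):
    # Assumption: default loss, no separate intercept
    sg = SGDClassifier(fit_intercept=False, random_state=1)
    sg.fit(class_features, class_labels)
    return sg.coef_  # 2D array, 1 row by 5 columns


def sigmoid(linear_combination):
    return 1 / (1 + np.exp(-linear_combination))


def log_feedforward(class_features, log_train_weights):
    # (100 x 5) times (5 x 1) gives a (100 x 1) column of values
    linear_combination = np.dot(class_features.to_numpy(), log_train_weights.T)
    log_predictions = sigmoid(linear_combination)
    # Convert each probability-like value to a class label
    log_predictions[log_predictions >= 0.5] = 1
    log_predictions[log_predictions < 0.5] = 0
    return log_predictions


log_train_weights = log_train(class_features, class_labels)
log_predictions = log_feedforward(class_features, log_train_weights)
print(log_predictions[:5])
```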
I.2 Nonlinear Activation Functions
In the last mission, we became familiar with computational graphs and how neural network models are
represented. We also became familiar with neural network terminology like:

• forward pass
• input neurons
• output neurons

In this mission, we'll dive deeper into the role nonlinear activation functions play. To help motivate our
exploration, let's start by reflecting on the purpose of a machine learning model.

The purpose of a machine learning model is to transform training data inputs to the model (which are
features) to approximate the training output values. We accomplish this by:

• Selecting a specific model to use


• Finding the right parameters for this model that work the best
• Testing the model to understand how well it generalizes to new data

Linear Regression

We use linear regression when we think that the output values can be best approximated by a linear
combination of the features and the learned weights. This model is a linear system, because any change
in the output value is proportional to the changes in the input values.

When the target values y can be approximated by a linear combination of the features x1 to xn, linear regression is the ideal choice. Here's a GIF that visualizes the potential expressivity of a linear regression model (by conceptually mimicking what gradient descent does):


Let's now look at a situation where the output values can't be approximated effectively using a linear
combination of the input values.

Logistic Regression

In a binary classification problem, the target values are 0 and 1 and the relationship between the
features and the target values is nonlinear. This means we need a function that can perform a nonlinear
transformation of the input features.

The sigmoid function is a good choice since all of its input values are squashed to range between 0 and 1.

Adding the sigmoid transformation helps the model approximate this nonlinear relationship underlying
common binary classification tasks. The following GIF shows how the shape of the logistic regression
model changes as we increase the single weight (by conceptually mimicking what gradient descent
does):

Neural Networks

Logistic regression models learn a set of weights that impact the linear combination phase and then
are fed through a single nonlinear function (the sigmoid function). In this mission, we'll dive into the
most commonly used activation functions. The three most commonly used activation functions in
neural networks are:

• the sigmoid function


• the ReLU function
• the tanh function

Since we've covered the sigmoid function already, we'll focus on the latter two functions.

I.2.1 ReLU Activation Function

We'll start by introducing the ReLU activation function, which is a commonly used activation function
in neural networks for solving regression problems. ReLU stands for rectified linear unit and is defined
as follows:
ReLU(x)=max(0,x)

The max(0,x) function call returns the maximum value between 0 and x. This means that:
• When x is less than or equal to 0, the value 0 is returned
• When x is greater than 0, the value x is returned

Here's a plot of the function:

The ReLU function returns the positive component of the input value. Let's visualize the expressivity
of a model that performs a linear combination of the features and weights followed by the ReLU
transformation:

There are a few different ways we can implement the ReLU function in code. We'll leave it as an
exercise for you to implement.

Instructions

• Define the relu() function


• This function should be able to work with a single value or a list of values
• Call the relu() function, pass in x, and assign the returned value to relu_y
• Print both x and relu_y
• Generate a line chart with x on the x-axis and relu_y on the y-axis

Solutions
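
One possible sketch uses `numpy.maximum`, which handles both a single value and a list of values. The definition of x below is an assumption made so the example is self-contained.

```python
import numpy as np


def relu(values):
    """Return max(0, value) for a single value or element-wise for a list."""
    return np.maximum(0, values)


# Hypothetical input: 20 evenly spaced points between -2 and 2
x = np.linspace(-2, 2, 20)
relu_y = relu(x)

print(x)
print(relu_y)
```

A line chart of the result can then be drawn with matplotlib, for example with `plt.plot(x, relu_y)` followed by `plt.show()`.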

I.2.2 Trigonometric Activation

The last commonly used activation function in neural networks we'll discuss is the tanh function (also
known as the hyperbolic tangent function). We'll start by reviewing some trigonometry by discussing
the tan (short for tangent) function and then work our way up to the tanh function (in the next screen).

While we won't provide the depth here needed to learn trigonometry from scratch, we do recommend
the Trigonometry Series on Khan Academy if you're new to trigonometry.

What is trigonometry?

Trigonometry is short for triangle geometry and provides formulas, frameworks, and mental models for
reasoning about triangles. Triangles are used extensively in theoretical and applied mathematics, and
build on mathematical work done over many centuries. Let's start by clearly defining what a triangle is.

A triangle is a polygon that has the following properties:
• 3 edges
• 3 vertices
• angles between edges add up to 180 degrees

The two main ways to classify triangles are by their internal angles or by their edge lengths. The following diagram outlines the three different types of triangles by their edge length properties:

An important triangle that's classified by the internal angles is the right angle triangle. In a right angle
triangle, one of the angles is 90 degrees (also known as the right angle). The edge opposite of the right
angle is called the hypotenuse.

A trigonometric function is a function that inputs an angle value (usually represented as θ) and outputs some value. These functions compute ratios between the edge lengths. Here are the first 3 trigonometric functions:

$\sin\theta = \frac{\text{opposite}}{\text{hypotenuse}}, \quad \cos\theta = \frac{\text{adjacent}}{\text{hypotenuse}}, \quad \tan\theta = \frac{\text{opposite}}{\text{adjacent}}$

Let's define these terms further:
• Hypotenuse describes the line that isn't touching the right angle
• Opposite refers to the line opposite the angle
• Adjacent refers to the line touching the angle that isn't the hypotenuse

Here's an example of the tangent function.
Instructions

• Use the numpy.tan() function to compute the tangent of the values in x and assign the returned value to tan_y
• Print both x and tan_y
• Generate a line plot with x on the x-axis and tan_y on the y-axis

Solutions

I.2.3 Reflecting on the Tangent Function

The tangent function from the last screen generated the following plot:
The periodic sharp spikes that you see in the plot are known as vertical asymptotes. At those points,
the value isn't defined but the limit approaches either negative or positive infinity (depending on
which direction you're approaching the x value from).

The key takeaway from the plot is how the tangent function is a repeating, periodic function. A periodic
function is one that returns the same value at regular intervals. Let's look at a table of some values
from the tangent function:

The tangent function repeats itself every π, which is known as the period. The tangent function isn't
known to be used as an activation function in neural networks (or any machine learning model really)
because the periodic nature isn't a pattern that's found in real datasets.
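
The period of π can be checked numerically with a short sketch. The sample points are arbitrary values chosen for illustration.

```python
import numpy as np

# A few arbitrary angles (in radians)
x = np.array([0.1, 0.5, 1.0])

tan_y = np.tan(x)
tan_shifted = np.tan(x + np.pi)  # same angles, shifted by one period

print(tan_y)
print(tan_shifted)
# The two arrays agree up to floating-point rounding
print(np.allclose(tan_y, tan_shifted))  # True
```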

While there have been some experiments with periodic functions as the activation function for neural networks, the general conclusion has been that periodic functions like tangent don't offer any unique benefits for modeling.

Generally speaking, the activation functions that are used in neural networks are increasing functions.
An increasing function f is a function where f(x) always stays the same or increases as x increases.

All of the activation functions we've looked at (and will look at) in this mission meet this criterion.



I.3 Hidden Layers
In the last 2 missions, we worked with single layer neural networks. These networks had a single layer of neurons. To make a prediction, a single layer of neurons in these networks directly fed their results into the output neuron(s).

In this mission, we'll explore how multi-layer networks (also known as deep neural networks) are able to better capture nonlinearity in the data. In a deep neural network, the first layer of input neurons feeds into a second, intermediate layer of neurons. Here's a diagram representing this architecture:

We included both of the functions that are used to
compute each hidden neuron and output neuron
to help clear up any confusion. You'll notice that
the number of neurons in the second layer was
more than those in the input layer. Choosing the
number of neurons in this layer is a bit of an art
form and not quite a science yet in neural network
literature. We can actually add more intermediate
layers, and this often leads to improved model
accuracy (because of an increased capability in
learning nonlinearity).

The intermediate layers are known as hidden layers, because they aren't directly represented in the input data or the output predictions. Instead, we can think of each hidden layer as intermediate features that are learned during the training process.

Comparison With Decision Tree Models

This is actually very similar to how decision trees are structured. The branches and splits represent
some intermediate features that are useful for making predictions and are analogous to the hidden
layers in a neural network:
Each of these hidden layers has its own set of weights and biases, which are discovered during the training process. In decision tree models, the intermediate features in the model represent something more concrete that we can understand (feature ranges).

Decision tree models are referred to as white box models because they can be observed and understood
but not easily altered. After we train a decision tree model, we can visualize the tree, interpret it, and
have new ideas for tweaking the model. Neural networks, on the other hand, are much closer to being
a black box. In a black box model, we can understand the inputs and the outputs but the intermediate
features are actually difficult to interpret and understand. Even harder and perhaps more importantly,
it's difficult to understand how to tweak a neural network based on these intermediate features.

In this mission, we'll learn how adding more layers to a network and adding more neurons in the
hidden layers can improve the model's ability to learn more complex relationships.

I.3.1 Generating Data with nonlinearity function

To generate data with nonlinearity in the features (both between the features and between the features
and the target column), we can use the make_moons() function from scikit-learn:

By default, make_moons() will generate 100 rows of data with 2 features. Here's a plot that visualizes
one feature against the other:

To make things interesting, let's add some Gaussian noise to the data. Gaussian noise is a kind of statistical noise that follows the Gaussian distribution, and it's a common way to try to recreate the noise that's often found in real-world data.

We can use the noise parameter to specify the standard deviation of the Gaussian noise we want added to the data. Let's also set the random_state to 3 so the generated data can be recreated:

Just like in a previous mission, we can separate the resulting NumPy object into 2 pandas dataframes:
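
A minimal sketch of these steps follows. The noise level of 0.04 matches the exercise below; the column names and variable names are assumptions chosen for this example.

```python
import pandas as pd
from sklearn.datasets import make_moons

# 100 rows, Gaussian noise with standard deviation 0.04, random seed 3
data = make_moons(n_samples=100, noise=0.04, random_state=3)

# First element holds the 2 features, second element holds the labels
features = pd.DataFrame(data[0], columns=["x1", "x2"])
labels = pd.Series(data[1])

print(features.head())
print(labels.value_counts())
```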

Instructions

• Use the make_moons() function to generate data with nonlinearity:
• Generate 100 values
• Set the random seed to 3
• Set the noise parameter to 0.04
• Convert the NumPy array of generated features into a pandas dataframe and assign to features
• Convert the NumPy array of generated labels into a pandas series and assign to labels
• Generate a 3d scatter plot of the data:
• Create a matplotlib figure object and set figsize to (8,8)
• Create and attach a single axes object to this figure using the 3d projection: ax = fig.add_subplot(111, projection='3d')
• Generate a 3d scatter plot with the first column from features on the x-axis, the second column from features on the y-axis and labels on the z-axis
• Set the labels 'x1', 'x2' and 'y', respectively

Solutions
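One possible solution is sketched below. The non-interactive Agg backend and the output filename moons_3d.png are assumptions made so the script also runs without a display; in a notebook, drop the backend line and the savefig call.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3d projection on older matplotlib)
from sklearn.datasets import make_moons

# Generate 100 noisy samples with a fixed seed.
data = make_moons(n_samples=100, random_state=3, noise=0.04)
features = pd.DataFrame(data[0])
labels = pd.Series(data[1])

# 3d scatter plot: both features on the x and y axes, the label on the z axis.
fig = plt.figure(figsize=(8, 8))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(features[0], features[1], labels)
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('y')
fig.savefig("moons_3d.png")  # hypothetical output file
```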

I.3.2 Hidden Layer with single neuron

In the last mission, we learned how adding a nonlinear activation function expanded the range of
patterns that a model could try to learn. The following GIF demonstrates how adding the sigmoid
function enables a logistic regression model to capture nonlinearity more effectively:

We can think of a logistic regression model as a neural network with an activation function but no
hidden layers. To make predictions, a linear combination of the features and weights is performed
followed by a single sigmoid transformation.

To improve the expressive power, we can add a hidden layer of neurons in between the input layer and
the output layer. Here's an example where we've added a single hidden layer with a single neuron in
between the input layer and the output layer:

This network contains two sets of weights that are learned during the training phase:

• 4 weights between the input layer and the hidden layer


• 1 weight between the hidden layer and the output layer

In the next screen, we'll learn how to train a neural network with a hidden layer using scikit-learn. We'll
compare this model with a logistic regression model.

I.3.3 Training A Neural Network Using Scikit-learn



Scikit-learn contains two classes for working with neural networks:

• MLPClassifier
• MLPRegressor

Let's focus on the MLPClassifier class. As with all of the model classes in scikit-learn, MLPClassifier follows
the standard model.fit() and model.predict() pattern:
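A minimal sketch of that pattern follows. Fitting and predicting on the same generated data is only for illustration, and max_iter and random_state are set here as assumptions to keep the run repeatable.

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=100, random_state=3, noise=0.04)

# Instantiate, fit, then predict: the same three steps as other scikit-learn models.
mlp = MLPClassifier(max_iter=1000, random_state=0)
mlp.fit(X, y)
predictions = mlp.predict(X)

print(predictions[:10])
```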

We can specify the number of hidden neurons we want to use in each layer using the hidden_layer_sizes
parameter. This parameter accepts a tuple where the value at each index is the number of neurons in
the corresponding hidden layer. The parameter is set to the tuple (100,) by default, which corresponds
to a hundred neurons in a single hidden layer. The following code specifies a hidden layer of six neurons:
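A minimal sketch of such an instantiation (the variable name mlp is illustrative):

```python
from sklearn.neural_network import MLPClassifier

# One entry in the tuple means one hidden layer; its value is the neuron count.
mlp = MLPClassifier(hidden_layer_sizes=(6,))

print(mlp.hidden_layer_sizes)  # (6,)
```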

We can specify the activation function we want used in all hidden layers using the activation
parameter. This parameter accepts only the following string values:

• 'identity': the identity function


• 'logistic': the sigmoid function
• 'tanh': the hyperbolic tangent (tanh) function
• 'relu': the ReLU function

Here's a model instantiated with the sigmoid activation function:
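A minimal sketch, reusing the six-neuron hidden layer as an illustrative size:

```python
from sklearn.neural_network import MLPClassifier

# 'logistic' selects the sigmoid function for the hidden layer.
mlp = MLPClassifier(hidden_layer_sizes=(6,), activation='logistic')

print(mlp.activation)  # logistic
```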

While scikit-learn is friendly to use when learning new concepts, it has a few limitations when it comes
to working with neural networks in production.
• At the time of writing, scikit-learn only supports using the same activation function for all hidden layers
• Scikit-learn also struggles to scale to larger datasets
• Libraries like Theano and TensorFlow support offloading some computation to the GPU to
overcome bottlenecks

Instructions



• Train two different models using scikit-learn on the training set:
• A standard logistic regression model
• A neural network with:
• A single hidden layer
• A single neuron in the hidden layer
• The sigmoid activation function
• Make and assign predictions (for answer checking purposes the order should be respected):
• Make predictions on the test set using the neural network model and assign to nn_predictions
• Make predictions on the test set using the logistic regression model and assign to log_predictions
• Compute the accuracy score for log_predictions and assign to log_accuracy
• Compute the accuracy score for nn_predictions and assign to nn_accuracy
• Print both log_accuracy and nn_accuracy

Solutions
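One possible solution is sketched below. The train/test split is not shown in this section, so the 70/30 split, its seed, and the names train_x, test_x, train_y and test_y are assumptions. The accuracy values will vary with the split and with the random weight initialization.

```python
import pandas as pd
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

data = make_moons(n_samples=100, random_state=3, noise=0.04)
features = pd.DataFrame(data[0])
labels = pd.Series(data[1])

# Assumption: hold out 30% of the rows as the test set.
train_x, test_x, train_y, test_y = train_test_split(
    features, labels, test_size=0.3, random_state=1
)

# Standard logistic regression model.
lr = LogisticRegression()
lr.fit(train_x, train_y)

# Neural network: one hidden layer, one neuron, sigmoid activation.
mlp = MLPClassifier(hidden_layer_sizes=(1,), activation='logistic')
mlp.fit(train_x, train_y)

nn_predictions = mlp.predict(test_x)
log_predictions = lr.predict(test_x)

log_accuracy = accuracy_score(test_y, log_predictions)
nn_accuracy = accuracy_score(test_y, nn_predictions)
print(log_accuracy)
print(nn_accuracy)
```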

I.3.4 Hidden Layer with Multiple Neurons



In the last screen, we trained a logistic regression model and a neural network model with a hidden
layer containing a single neuron. While we don't recommend using the accuracy scores to benchmark
classification models in a production setting, they can be helpful when we're learning and experimenting
because they are easy to understand.

The logistic regression model performed much better (accuracy of 88%) than the neural network
model with one hidden layer and one neuron (48%). Unfortunately, a single hidden neuron gives
the model very little ability to capture nonlinearity in the data, which is why logistic
regression performed much better.

Let's take a look at a network with a single hidden layer of multiple neurons:

This network has 3 input neurons, 6 neurons in the single hidden layer, and 1 output neuron. You'll
notice that there's an arrow between every input neuron and every hidden neuron (3 x 6 = 18
connections), representing a weight that needs to be learned during the training process. You'll notice
that there's also a weight that needs to be learned between every hidden neuron and the final output
neuron (6 x 1 = 6 connections).

Because every neuron has a connection between itself and all of the neurons in the next layer, this
is known as a fully connected network. Lastly, because the computation flows from left (input layer)
to right (hidden layer then to output layer), we can call this network a fully connected, feedforward
network.

There are two weight matrices (a1 and a2) that need to be learned during the training process, one for
each stage of the computation. Let's look at the linear algebra representation of this network.
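As a rough sketch of that linear algebra view, the forward pass can be written with NumPy. The random input rows and weight values are placeholders, the sigmoid activation is an assumption, and bias terms are left out to match the weight counts given above.

```python
import numpy as np

def sigmoid(z):
    """Elementwise sigmoid activation."""
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

X = rng.normal(size=(5, 3))    # 5 example rows, 3 input features
a1 = rng.normal(size=(3, 6))   # input -> hidden weights (3 x 6 = 18)
a2 = rng.normal(size=(6, 1))   # hidden -> output weights (6 x 1 = 6)

# Each stage is a matrix multiplication followed by a nonlinear transformation.
hidden = sigmoid(X @ a1)       # shape (5, 6)
output = sigmoid(hidden @ a2)  # shape (5, 1)

print(a1.size, a2.size)
print(output.shape)
```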

While we've discussed different architectures in this course, a deep neural network boils down to a
series of matrix multiplications paired with nonlinear transformations! These are the key ideas that
underlie all neural network architectures. Take a look at this conceptual diagram from the Asimov
Institute that demonstrates a variety of neural network architectures:
Instructions

• Create the following list of neuron counts and assign to neurons: [1, 5, 10, 15, 20, 25]
• Create an empty list named accuracies
• For each value in neurons:
• Train a neural network:
• With the number of neurons in the hidden layer set to the current value
• Using the sigmoid activation function on the training set
• Make predictions on the test set and compute the accuracy value
• Append the accuracy value to accuracies
• Print accuracies

Solutions
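One possible solution is sketched below. As before, the 70/30 train/test split and its seed are assumptions, and the accuracies will vary from run to run because the network weights are initialized randomly.

```python
import pandas as pd
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

data = make_moons(n_samples=100, random_state=3, noise=0.04)
features = pd.DataFrame(data[0])
labels = pd.Series(data[1])

# Assumption: hold out 30% of the rows as the test set.
train_x, test_x, train_y, test_y = train_test_split(
    features, labels, test_size=0.3, random_state=1
)

neurons = [1, 5, 10, 15, 20, 25]
accuracies = []

for n in neurons:
    # Single hidden layer with n neurons and the sigmoid activation function.
    mlp = MLPClassifier(hidden_layer_sizes=(n,), activation='logistic')
    mlp.fit(train_x, train_y)
    nn_predictions = mlp.predict(test_x)
    accuracies.append(accuracy_score(test_y, nn_predictions))

print(accuracies)
```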

I.3.5 Multiple Hidden Layer



It seems like the test set prediction accuracy improved to 0.86 when using ten or fifteen neurons
in the hidden layer. As we increased the number of neurons in the hidden layer, the accuracy vastly
improved between the models:

Next, we can observe the effect of increasing the number of hidden layers on the overall accuracy of
the network. Here's a diagram representing a neural network with six neurons in the first hidden layer
and four neurons in the second hidden layer:

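To determine the number of weights between two layers, multiply the number of neurons in the first
layer by the number of neurons in the second. For example, there are 6 x 4 = 24 weights between the
two hidden layers of this network. Remember that these weights will be represented as weight matrices.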

To specify the number of hidden layers and the number of neurons in each hidden layer, we change
the tuple we pass in to the hidden_layer_sizes parameter:
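A minimal sketch for the network described above, with six neurons in the first hidden layer and four in the second:

```python
from sklearn.neural_network import MLPClassifier

# Two entries in the tuple mean two hidden layers: 6 neurons, then 4 neurons.
mlp = MLPClassifier(hidden_layer_sizes=(6, 4))

print(len(mlp.hidden_layer_sizes))  # 2 hidden layers
```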

The number of hidden layers and number of neurons in each hidden layer are hyperparameters that act
as knobs for the model behavior. Hyperparameter optimization for neural networks is unfortunately
outside the scope of this course, as it requires a stronger mathematical foundation which we plan to
provide in future courses.

Let's train the following neural network models:

• Model with two hidden layers, each with one neuron


• Model with two hidden layers, each with five neurons
• Model with two hidden layers, each with ten neurons
• Model with two hidden layers, each with fifteen neurons
• Model with two hidden layers, each with twenty neurons
• Model with two hidden layers, each with twenty-five neurons

Let's also switch the activation function used in the hidden layers to the ReLU function.

Neural networks often tend to take a long time to converge during the training process and many
libraries have default values for the number of iterations of gradient descent to run. We can increase
the number of iterations of gradient descent that's performed during the training process by modifying
the max_iter parameter, which is set to 200 by default.

Instructions

• Create the following list of neuron counts and assign to neurons: [1, 5, 10, 15, 20, 25]
• Create an empty list named nn_accuracies
• For each value in neurons:
• Train a neural network:
• With two hidden layers, each containing the same number of neurons (the current value
in neurons)
• Using the relu activation function
• Using 1000 iterations of gradient descent
• On the training set
• Make predictions on the test set and compute the accuracy value
• Append the accuracy value to nn_accuracies
• Print nn_accuracies

Solutions
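One possible solution is sketched below, under the same assumed 70/30 train/test split as before. Results will vary between runs because of the random weight initialization.

```python
import pandas as pd
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

data = make_moons(n_samples=100, random_state=3, noise=0.04)
features = pd.DataFrame(data[0])
labels = pd.Series(data[1])

# Assumption: hold out 30% of the rows as the test set.
train_x, test_x, train_y, test_y = train_test_split(
    features, labels, test_size=0.3, random_state=1
)

neurons = [1, 5, 10, 15, 20, 25]
nn_accuracies = []

for n in neurons:
    # Two hidden layers of n neurons each, ReLU activation, up to 1000 iterations.
    mlp = MLPClassifier(hidden_layer_sizes=(n, n), activation='relu', max_iter=1000)
    mlp.fit(train_x, train_y)
    nn_predictions = mlp.predict(test_x)
    nn_accuracies.append(accuracy_score(test_y, nn_predictions))

print(nn_accuracies)
```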



I.4 Guided Project: Building A Handwritten Digits Classifier
In the last mission, we learned how adding hidden layers of neurons to a neural network can improve
its ability to capture nonlinearity in the data. We tested different neural network models on a dataset
that we generated with deliberate nonlinearity.

In this Guided Project, we’ll:


• Explore why image classification is a hard task
• Observe the limitations of traditional machine learning models for image classification
• Train, test, and improve a few different deep neural networks for image classification

As we mentioned in the first mission in this course, deep neural networks have been used to reach state-
of-the-art performance on image classification tasks in the last decade. For some image classification
tasks, deep neural networks actually perform as well as or slightly better than the human benchmark.
You can read about the history of deep neural networks here.

To end this course, we'll build models that can classify handwritten digits. Before the year 2000,
institutions like the United States Post Office used handwriting recognition software to read addresses,
zip codes, and more. One of their approaches, which consists of pre-processing handwritten images
and then feeding them to a neural network model, is detailed in this paper.

Why is image classification a hard task?

Within the field of machine learning and pattern recognition, image classification (especially for
handwritten text) is towards the difficult end of the spectrum. There are a few reasons for this.

First, each image in a training set is high dimensional. Each pixel in an image is a feature and a separate
column. This means that a 128 x 128 image has 16384 features.

Second, images are often downsampled to lower resolutions and transformed to grayscale (no color).
Unfortunately, this is a limitation of compute power. An 8 megapixel photo has a resolution of 3264 by
2448 pixels, for a total of 7,990,272 features (or about 8 million). Images of this resolution are usually
scaled down to between 128 and 512 pixels in either direction for significantly faster processing. This
often results in a loss of detail that's available for training and pattern matching.

Third, the features in an image don't have an obvious linear or nonlinear relationship that can be
learned with a model like linear or logistic regression. In grayscale, each pixel is just represented as a
brightness value ranging from 0 to 255.
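To make the feature counts concrete, here is a small sketch that flattens a grayscale image into one row of features. The randomly generated image is a hypothetical stand-in for a real photo; 8-bit grayscale pixels hold brightness values from 0 to 255.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 128 x 128 grayscale image stored as 8-bit brightness values.
image = rng.integers(0, 256, size=(128, 128), dtype=np.uint8)

# Flattening turns every pixel into its own feature (column).
row = image.reshape(-1)
print(row.shape)    # (16384,)

# The same arithmetic for an 8 megapixel photo.
print(3264 * 2448)  # 7990272
```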

Here's an example of how an image is represented across the different abstractions we care about:
Why is deep learning effective in image classification?

Deep learning is effective in image classification because of the models' ability to learn hierarchical
representations. At a high level, an effective deep learning model learns intermediate representations
at each layer in the model and uses them in the prediction process. Here's a diagram that visualizes
what the weights represent at each layer of a convolutional neural network, a type of network that's
often used in image classification and unfortunately out of scope for this course, which was trained
to identify faces.

You'll notice in the first hidden layer the network learned to represent edges and specific features of
faces. In the second hidden layer, the weights seemed to represent higher level facial features like eyes
and noses. Finally, the weights in the last hidden layer resemble faces that could be matched against.
Each successive layer uses weights from previous layers to try to learn more complex representations.

In this Guided Project, we'll explore the effectiveness of deep, feedforward neural networks at
classifying images.

Scikit-learn contains a number of datasets pre-loaded with the library, within the namespace of
sklearn.datasets. The load_digits() function returns a copy of the hand-written digits dataset from UCI.

Because dataframes are a tabular representation of data, each image is represented as a row of pixel
values. To visualize an image from the dataframe, we need to reshape these pixel values back to the
image's original dimensions (8 x 8 pixels for the load_digits() dataset) and plot them on a coordinate grid.

To reshape the image, we need to convert a training example to a NumPy array (excluding
the label column) and pass the result into the numpy.reshape() function:

Now that the data is in the right shape, we can visualize it using pyplot.imshow() function:
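A minimal sketch of both steps follows. Each image returned by load_digits() has 8 x 8 = 64 pixel values. The Agg backend, the gray_r color map, and the output filename are assumptions; in a notebook, drop the backend line and the savefig call.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.datasets import load_digits

digits = load_digits()
data = pd.DataFrame(digits['data'])   # one row of 64 pixel values per image
labels = pd.Series(digits['target'])  # the digit each image shows

# Take the first row and reshape its 64 values back into an 8 x 8 grid.
first_image = data.iloc[0]
np_image = np.reshape(first_image.values, (8, 8))

plt.imshow(np_image, cmap='gray_r')
plt.savefig("first_digit.png")  # hypothetical output file
```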

To display multiple images in one matplotlib figure, we can use the equivalent axes.imshow() function.
Let's use what we've learned to display several images from the dataset.

Instructions

• Import load_digits() from the sklearn.datasets package


• Transform the NumPy 2D array into a pandas dataframe
• Use matplotlib to visualize some of the images in the dataset
• Generate a subplot grid, with 2 rows and 4 columns
• In the first row:
• Display the images corresponding to rows 0, 100, 200, and 300
• In the second row:
• Display the images corresponding to rows 1000, 1100, 1200, and 1300
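One way to follow these steps is sketched below. The Agg backend, the gray_r color map, and the output filename digit_grid.png are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove in a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_digits

digits = load_digits()
data = pd.DataFrame(digits['data'])

# 2 x 4 grid of axes: one row of the grid per list of dataframe rows.
fig, axarr = plt.subplots(2, 4)
rows_to_show = [[0, 100, 200, 300], [1000, 1100, 1200, 1300]]

for grid_row, row_indexes in enumerate(rows_to_show):
    for grid_col, row_index in enumerate(row_indexes):
        # Reshape the 64 pixel values of this row back into an 8 x 8 image.
        image = data.iloc[row_index].values.reshape(8, 8)
        axarr[grid_row, grid_col].imshow(image, cmap='gray_r')

fig.savefig("digit_grid.png")  # hypothetical output file
```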

II. Machine Learning Project

II.1 Machine Learning Project Walkthrough: Data Cleaning
In this course, we will go through the full data science life cycle, from data cleaning and feature
selection to machine learning. We will focus on credit modelling, a well-known data science problem
that focuses on modeling a borrower's credit risk. Credit has played a key role in the economy for
centuries and some form of credit has existed since the beginning of commerce. We'll be working
with financial lending data from Lending Club. Lending Club is a marketplace for personal loans that
matches borrowers who are seeking a loan with investors looking to lend money and make a return.
You can read more about their marketplace here.

Each borrower completes a comprehensive application, providing their past financial history, the
reason for the loan, and more. Lending Club evaluates each borrower's credit score using past historical
data and their own data science process to assign an interest rate to the borrower. The interest rate
is the percent in addition to the requested loan amount the borrower has to pay back. You can read
more about the interest rate that Lending Club assigns here. Lending Club also tries to verify all the
information the borrower provides but it can't verify all of the information (usually for regulation
reasons).

A higher interest rate means that the borrower is riskier and less likely to pay back the loan, while
a lower interest rate means that the borrower has a good credit history and is more likely to pay
back the loan. The interest rates range from 5.32% all the way to 30.99% and each borrower is given
a grade according to the interest rate they were assigned. If the borrower accepts the interest rate,
then the loan is listed on the Lending Club marketplace.

Investors are primarily interested in receiving a return on their investments. Approved loans are
listed on the Lending Club website, where qualified investors can browse recently approved loans,
the borrower's credit score, the purpose for the loan, and other information from the application.
Once they’re ready to back a loan, they select the amount of money they want to fund. Once a
loan's requested amount is fully funded, the borrower receives the money they requested minus
the origination fee that Lending Club charges.

The borrower will make monthly payments back to Lending Club either over 36 months or over
60 months. Lending Club redistributes these payments to the investors. This means that investors
don't have to wait until the full amount is paid off before they see a return in money. If a loan is
fully paid off on time, the investors make a return which corresponds to the interest rate the
borrower had to pay in addition to the requested amount. Many loans aren't completely paid off on
time and some borrowers default on the loan.

Here's a diagram from Bible Money Matters that sums up the process:

While Lending Club has to be extremely savvy and rigorous with their credit modelling, investors
on Lending Club need to be equally savvy about determining which loans are more likely to be
paid off. At first, you may wonder why investors put money into anything but low interest loans. The
incentive investors have to back higher interest loans is, well, the higher interest! If investors believe
the borrower can pay back the loan, even if he or she has a weak financial history, then investors can
make more money through the larger additional amount the borrower has to pay.

Most investors use a portfolio strategy to invest small amounts in many loans, with healthy mixes of
low, medium, and high interest loans. In this course, we'll focus on the mindset of a conservative investor
who only wants to invest in the loans that have a good chance of being paid off on time. To do that,
we'll need to first understand the features in the dataset and then experiment with building machine
learning models that reliably predict if a loan will be paid off or not.

II.2.1 Data description


Lending Club releases data for all of the approved and declined loan applications periodically on
their website. You can select different year ranges to download the datasets (in CSV format) for both
approved and declined loans.

You'll also find a data dictionary (in XLS format) which contains information on the different column
names towards the bottom of the page. We recommend downloading the data dictionary so you
can refer to it whenever you want to learn more about what a column represents in the datasets.
Here's a link to the data dictionary file hosted on Google Drive.

Before diving into the datasets, let's get familiar with the data dictionary. The LoanStats sheet describes
the approved loans datasets and the RejectStats describes the rejected loans datasets. Since rejected
applications don't appear on the Lending Club marketplace and aren't available for investment, we'll
be focusing on approved loans.



The approved loans datasets contain information on current loans, completed loans, and defaulted
loans. Let's now define the problem statement for this machine learning project:
• Can we build a machine learning model that can accurately predict if a borrower will pay off their
loan on time or not?

Before we can start doing machine learning, we need to define what features we want to use and
which column represents the target column we want to predict. Let's start by reading and exploring
the dataset.

II.2.2 Reading into Pandas

In this lesson, we'll focus on approved loans data from 2007 to 2011, since a good number of the loans
have already finished. In the datasets for later years, many of the loans are current and still being paid
off.

If we complete the following, we can reduce the size of the dataset for the ease of use:
• Remove the desc column:
• Which contains a long text explanation for each loan
• Remove the url column:
• Which contains a link to each loan on Lending Club which can only be accessed with an investor
account
• Remove all columns containing more than 50% missing values:
• Which allows us to move faster since we can spend less time trying to fill these values
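The three reductions above can be sketched with pandas as follows. The small dataframe is a hypothetical stand-in for the raw loans data, and the mostly_missing column is invented purely to show the 50% rule.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the raw loans data (hypothetical values).
loans = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400, 10000],
    "desc": ["Need a car", "Bike loan", None, "Consolidation"],
    "url": ["u1", "u2", "u3", "u4"],
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],  # 75% missing
})

# Remove the long text and link columns.
loans = loans.drop(columns=["desc", "url"])

# Keep only the columns where at most 50% of the values are missing.
missing_share = loans.isnull().mean()
loans = loans.loc[:, missing_share <= 0.5]

print(loans.columns.tolist())  # ['loan_amnt']
```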

First, let's read the dataset into a Dataframe so we can start to explore the data and remaining features.

Instructions

• Read loans_2007.csv into a DataFrame named loans_2007
• Use the print function to display:
• The first row of loans_2007
• The number of columns in loans_2007
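A minimal sketch of these steps is shown below. So that it runs without the real file, it reads a hypothetical two-row excerpt from memory. With loans_2007.csv on disk, pass its path to pandas.read_csv() instead.

```python
import io
import pandas as pd

# Hypothetical excerpt standing in for loans_2007.csv; with the real file use
# loans_2007 = pd.read_csv("loans_2007.csv")
csv_text = io.StringIO(
    "id,loan_amnt,term,int_rate\n"
    "1077501,5000,36 months,10.65%\n"
    "1077430,2500,60 months,15.27%\n"
)
loans_2007 = pd.read_csv(csv_text)

print(loans_2007.iloc[0])   # first row
print(loans_2007.shape[1])  # number of columns
```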

II.2.3 First group of columns

The Dataframe contains many columns and can be cumbersome to try to explore all at once. Let's
separate the columns into 3 groups and use the data dictionary to become familiar with
what each column represents. As you understand each feature, look for any features that:

• Disclose information from the future (after the loan has already been funded)
• Don't affect a borrower's ability to pay back a loan (e.g. a randomly generated ID value by Lending
Club)
• Need to be cleaned up and are formatted poorly
• Require more data or a lot of processing to turn into a useful feature
• Contain redundant information

We need to especially pay attention to data leakage, since it can cause our model to overfit. This is
because the model uses data about the target column that wouldn't be available when we're using the
model on future loans. We encourage you to take your time to understand each column, because a
poor understanding could cause you to make mistakes in the data analysis and modeling process. As
you go through the dictionary, keep in mind that we need to select one of the columns as the target
column we want to use for the machine learning phase.

In this screen and the next few screens, let's focus on just columns that we need to remove from
consideration. Then, we can circle back and further dissect the columns we decided to keep.

To make this process easier, we created a table that contains the name, data type, first row's value, and
description from the data dictionary for the first group of columns.

Name | dtype | First Value | Description
id | object | 1077501 | A unique LC assigned ID for the loan listing.
member_id | float64 | 1.2966e+06 | A unique LC assigned Id for the borrower member.
loan_amnt | float64 | 5000 | The listed amount of the loan applied for by the borrower
funded_amnt | float64 | 5000 | The total amount committed to that loan at that point in time
funded_amnt_inv | float64 | 49750 | The total amount committed by investors for that loan at that point in time
term | object | 36 months | The number of payments on the loan. Values are in months and can be either 36 or 60
int_rate | object | 10.65% | Interest Rate on the loan
installment | float64 | 162.87 | The monthly payment owed by the borrower if the loan originates.
grade | object | B | LC assigned loan grade
sub_grade | object | B2 | LC assigned loan subgrade
emp_title | object | NaN | The job title supplied by the Borrower when applying for the loan
emp_length | object | 10+ years | Employment length in years. Possible values are between 0 and 10 where 0 means less than one year and 10 means ten or more years
home_ownership | object | RENT | The home ownership status provided by the borrower during registration. Our values are: RENT, OWN, MORTGAGE, OTHER
annual_inc | float64 | 24000 | The self-reported annual income provided by the borrower during registration
verification_status | object | Verified | Indicates if income was verified by LC, not verified, or if the income source was verified
issue_d | object | Dec-2011 | The month which the loan was funded
loan_status | object | Charged Off | Current status of the loan
pymnt_plan | object | n | Indicates if a payment plan has been put in place for the loan
purpose | object | car | A category provided by the borrower for the loan request

After analyzing each column, we can conclude that the following features need to be removed:

• id: randomly generated field by Lending Club for unique identification purposes only
• member_id: also a randomly generated field by Lending Club for unique identification purposes only
• funded_amnt: leaks data from the future (after the loan has already started to be funded)
• funded_amnt_inv: also leaks data from the future (after the loan has already started to be funded)
• grade: contains information that is redundant with the interest rate column (int_rate)
• sub_grade: also contains information that is redundant with the interest rate column (int_rate)
• emp_title: requires other data and a lot of processing to potentially be useful
• issue_d: leaks data from the future (after the loan has already been completely funded)



Recall that Lending Club assigns a grade and a sub-grade based on the borrower's interest rate. While
the grade and sub_grade values are categorical, the int_rate column contains continuous values, which
are better suited for machine learning.

Let's now drop these columns from the Dataframe before moving onto the next group of columns.

Use the Dataframe method drop to remove the following columns from the loans_2007 Dataframe:

• id
• member_id
• funded_amnt
• funded_amnt_inv
• grade
• sub_grade
• emp_title
• issue_d
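The drop step can be sketched as follows. The one-row dataframe is a stand-in with illustrative values and only a few of the columns; on the real data, the same drop call applies to the full loans_2007 Dataframe.

```python
import pandas as pd

# One-row stand-in for loans_2007 (illustrative values, subset of columns).
loans_2007 = pd.DataFrame({
    "id": ["1077501"],
    "member_id": [1296600.0],
    "loan_amnt": [5000.0],
    "funded_amnt": [5000.0],
    "funded_amnt_inv": [4975.0],
    "term": ["36 months"],
    "int_rate": ["10.65%"],
    "grade": ["B"],
    "sub_grade": ["B2"],
    "emp_title": [None],
    "issue_d": ["Dec-2011"],
    "loan_status": ["Charged Off"],
})

drop_cols = ["id", "member_id", "funded_amnt", "funded_amnt_inv",
             "grade", "sub_grade", "emp_title", "issue_d"]

# axis=1 tells drop() that the labels refer to columns, not rows.
loans_2007 = loans_2007.drop(drop_cols, axis=1)

print(loans_2007.columns.tolist())
```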

II.2.4 Second group of columns
Let's now look at the next 18 columns:
Name | dtype | First Value | Description
title | object | Computer | The loan title provided by the borrower
zip_code | object | 860xx | The first 3 numbers of the zip code provided by the borrower in the loan application
addr_state | object | AZ | The state provided by the borrower in the loan application
dti | float64 | 27.65 | A ratio calculated using the borrower's total monthly debt payments on the total debt obligations, excluding mortgage and the requested LC loan, divided by the borrower's self-reported monthly income
delinq_2yrs | float64 | 0 | The number of 30+ days past-due incidences of delinquency in the borrower's credit file for the past 2 years
earliest_cr_line | object | Jan-85 | The month the borrower's earliest reported credit line was opened
inq_last_6mths | float64 | 1 | The number of inquiries in past 6 months (excluding auto and mortgage inquiries)
open_acc | float64 | 3 | The number of open credit lines in the borrower's credit file
pub_rec | float64 | 0 | Number of derogatory public records
revol_bal | float64 | 13648 | Total credit revolving balance
revol_util | object | 83.7% | Revolving line utilization rate, or the amount of credit the borrower is using relative to all available revolving credit
total_acc | float64 | 9 | The total number of credit lines currently in the borrower's credit file
initial_list_status | object | f | The initial listing status of the loan. Possible values are – W, F
out_prncp | float64 | 0 | Remaining outstanding principal for total amount funded
out_prncp_inv | float64 | 0 | Remaining outstanding principal for portion of total amount funded by investors
total_pymnt | float64 | 5863.16 | Payments received to date for total amount funded
total_pymnt_inv | float64 | 5833.84 | Payments received to date for portion of total amount funded by investors
total_rec_prncp | float64 | 5000 | Principal received to date

Within this group of columns, we need to drop the following columns:

• zip_code: redundant with the addr_state column since only the first 3 digits of the 5-digit zip code
are visible (which can only be used to identify the state the borrower lives in)
• out_prncp: leaks data from the future (after the loan has already started to be paid off)
• out_prncp_inv: also leaks data from the future (after the loan has already started to be paid off)
• total_pymnt: also leaks data from the future (after the loan has already started to be paid off)
• total_pymnt_inv: also leaks data from the future (after the loan has already started to be paid off)
• total_rec_prncp: also leaks data from the future (after the loan has already started to be paid off)

The out_prncp and out_prncp_inv both describe the outstanding principal amount for a loan, which
is the remaining amount the borrower still owes. These 2 columns as well as the total_pymnt column
describe properties of the loan after it's fully funded and started to be paid off. This information isn't
available to an investor before the loan is fully funded and we don't want to include it in our model.

Let's go ahead and remove these columns from the Dataframe.

Use the Dataframe method drop to remove the following columns from the loans_2007 Dataframe:

• zip_code
• out_prncp
• out_prncp_inv
• total_pymnt
• total_pymnt_inv
• total_rec_prncp

II.2.5 Third group of columns
Let's now move on to the last group of features:
Name | dtype | First Value | Description
total_rec_int | float64 | 863.16 | Interest received to date
total_rec_late_fee | float64 | 0 | Late fees received to date
recoveries | float64 | 0 | post charge off gross recovery
collection_recovery_fee | float64 | 0 | post charge off collection fee
last_pymnt_d | object | Jan-15 | Last month payment was received
last_pymnt_amnt | float64 | 171.62 | Last total payment amount received
last_credit_pull_d | object | Jun-16 | The most recent month LC pulled credit for this loan
collections_12_mths_ex_med | float64 | 0 | Number of collections in 12 months excluding medical collections
policy_code | float64 | 1 | publicly available policy_code=1; new products not publicly available policy_code=2
application_type | object | INDIVIDUAL | Indicates whether the loan is an individual application or a joint application with two co-borrowers
acc_now_delinq | float64 | 0 | The number of accounts on which the borrower is now delinquent.
chargeoff_within_12_mths | float64 | 0 | Number of charge-offs within 12 months
delinq_amnt | float64 | 0 | The past-due amount owed for the accounts on which the borrower is now delinquent.
pub_rec_bankruptcies | float64 | 0 | Number of public record bankruptcies
tax_liens | float64 | 0 | Number of tax liens

In the last group of columns, we need to drop the following columns:

• total_rec_int: leaks data from the future (after the loan has started to be paid off)
• total_rec_late_fee: leaks data from the future (after the loan has started to be paid off)
• recoveries: leaks data from the future (after the loan has started to be paid off)
• collection_recovery_fee: leaks data from the future (after the loan has started to be paid off)
• last_pymnt_d: leaks data from the future (after the loan has started to be paid off)
• last_pymnt_amnt: leaks data from the future (after the loan has started to be paid off)



All of these columns leak data from the future, meaning that they're describing aspects of the loan
after it's already been fully funded and started to be paid off by the borrower.

Instructions

Use the Dataframe method drop to remove the following columns from the loans_2007 Dataframe:

• total_rec_int
• total_rec_late_fee
• recoveries
• collection_recovery_fee
• last_pymnt_d
• last_pymnt_amnt

Use the print function to display the first row of loans_2007 and the number of columns in loans_2007.
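A sketch of this last drop and the follow-up check is shown below. The one-row dataframe is again a stand-in holding only a few columns with illustrative values, so the printed column count differs from the real dataset.

```python
import pandas as pd

# One-row stand-in for loans_2007 (illustrative values, subset of columns).
loans_2007 = pd.DataFrame({
    "loan_amnt": [5000.0],
    "loan_status": ["Charged Off"],
    "total_rec_int": [863.16],
    "total_rec_late_fee": [0.0],
    "recoveries": [0.0],
    "collection_recovery_fee": [0.0],
    "last_pymnt_d": ["Jan-15"],
    "last_pymnt_amnt": [171.62],
})

drop_cols = ["total_rec_int", "total_rec_late_fee", "recoveries",
             "collection_recovery_fee", "last_pymnt_d", "last_pymnt_amnt"]
loans_2007 = loans_2007.drop(drop_cols, axis=1)

print(loans_2007.iloc[0])   # first row
print(loans_2007.shape[1])  # number of remaining columns
```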

By becoming familiar with the columns in the dataset, we were able to reduce the number of columns
from 52 to 32 columns. We now need to decide on a target column that we want to use for modeling.

We should use the loan_status column, since it's the only column that directly describes if a loan was
paid off on time, had delayed payments, or was defaulted on by the borrower. Currently, this column
contains text values and we need to convert it to a numerical value for training a model. Let's explore
the different values in this column and come up with a strategy for converting them.

Instructions

• Use the Series method value_counts to return the frequency of the unique values in the loan_status column
• Display the frequency of each unique value using the print function
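As a small illustration of these two steps, the sketch below runs value_counts on a handful of made-up loan_status values (the sample data is an assumption; the real column is much larger).

```python
import pandas as pd

# Hypothetical toy sample of loan_status values.
loans_2007 = pd.DataFrame({
    "loan_status": ["Fully Paid", "Fully Paid", "Charged Off", "Current", "Fully Paid"],
})

# value_counts returns a Series indexed by unique value, sorted by frequency.
status_counts = loans_2007["loan_status"].value_counts()
print(status_counts)
```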

II.2.6 Binary Classification

There are 9 different possible values for the loan_status column. You can read about most of the
different loan statuses on the Lending Club website. Unfortunately, the two values that start with
"Does not meet the credit policy" aren't explained there; a quick Google search turns up explanations
from the lending community.

We've compiled the explanation for each status, as well as its count in the Dataframe, in the following
table:

Loan Status | Count | Meaning

Fully Paid | 33136 | Loan has been fully paid off
Charged Off | 5634 | Loan for which there is no longer a reasonable expectation of further payments
Does not meet the credit policy. Status: Fully Paid | 1988 | While the loan was paid off, the loan application today would no longer meet the credit policy and wouldn't be approved on to the marketplace
Does not meet the credit policy. Status: Charged Off | 761 | While the loan was charged off, the loan application today would no longer meet the credit policy and wouldn't be approved on to the marketplace
In Grace Period | 20 | The loan is past due but still in the grace period of 15 days
Late (16-30 days) | 8 | Loan hasn't been paid in 16 to 30 days (late on the current payment)
Late (31-120 days) | 24 | Loan hasn't been paid in 31 to 120 days (late on the current payment)
Current | 961 | Loan is up to date on current payments
Default | 3 | Loan is defaulted on and no payment has been made for more than 121 days

From the investor's perspective, we're interested in trying to predict whether loans will be paid off on
time. Only the Fully Paid and Charged Off values describe the final outcome of the loan. The other
values describe loans that are still ongoing, where the jury is still out on whether the borrower will pay
back the loan on time. While the Default status resembles the Charged Off status, in Lending
Club's eyes, loans that are charged off have essentially no chance of being repaid while defaulted ones
have a small chance.

Since we're interested in being able to predict which of these 2 values a loan will fall under, we can treat
the problem as a binary classification one. Let's remove all the loans that don't contain either Fully
Paid or Charged Off as the loan's status. After removing those rows, we'll transform
the Fully Paid values to 1 for the positive case and the Charged Off values to 0 for the negative case.
While there are a few different ways to transform all of the values in a column, we'll use the Dataframe
method replace. According to the documentation, we can pass the replace method a nested mapping
dictionary in the following format:
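A nested mapping dictionary has column names as outer keys and, for each column, an inner dictionary from old value to new value. The sketch below shows the shape on a made-up two-column Dataframe; the names col1, col2 and their values are placeholders, not columns from the loans data.

```python
import pandas as pd

# Placeholder Dataframe, used only to show the shape of the mapping.
df = pd.DataFrame({"col1": ["a", "b", "a"], "col2": ["x", "y", "x"]})

# Outer keys: column names. Inner dictionaries: {old value: new value}.
mapping_dict = {
    "col1": {"a": 1, "b": 0},
    "col2": {"x": 10, "y": 20},
}

df = df.replace(mapping_dict)
print(df)
```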

Lastly, one thing we need to keep in mind is the class imbalance between the positive and negative
cases. While there are 33,136 loans that have been fully paid off, there are only 5,634 that were
charged off. This class imbalance is a common problem in binary classification. During training,
the model ends up having a strong bias towards predicting the class with more observations in the
training set and will rarely predict the class with fewer observations. The stronger the imbalance, the
more biased the model becomes. There are a few different ways to tackle this class imbalance, which
we'll explore later.



Instructions

• Remove all rows from loans_2007 that contain values other than Fully Paid or Charged Off for
the loan_status column
• Use the Dataframe method replace to replace:
• Fully Paid with 1
• Charged Off with 0
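One possible way to follow these instructions is sketched below on a four-row toy Dataframe (the rows are made up). It filters with the Series method isin, which is one of several options and an assumption of this sketch, and then applies replace with a nested mapping.

```python
import pandas as pd

# Toy stand-in for loans_2007.
loans_2007 = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400, 10000],
    "loan_status": ["Fully Paid", "Charged Off", "Current", "Fully Paid"],
})

# Keep only the loans whose final outcome is known.
keep = loans_2007["loan_status"].isin(["Fully Paid", "Charged Off"])
loans_2007 = loans_2007[keep]

# Map the two remaining statuses to 1 (positive) and 0 (negative).
status_replace = {"loan_status": {"Fully Paid": 1, "Charged Off": 0}}
loans_2007 = loans_2007.replace(status_replace)
print(loans_2007)
```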

II.2.7 Removing single value columns

To wrap up this lesson, let's look for any columns that contain only one unique value and remove
them. These columns won't be useful for the model since they don't add any information to each loan
application. In addition, removing these columns will reduce the number of columns we'll need to
explore in the future.

We'll need to compute the number of unique values in each column and drop the columns that contain
only one unique value. While the Series method unique returns the unique values in a column, it also
counts the Pandas missing value object nan as a value:

Since we're trying to find columns that contain one true unique value, we should first drop the null
values then compute the number of unique values:
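The difference between the two approaches can be seen on a short made-up Series that holds one real value and one missing value:

```python
import numpy as np
import pandas as pd

# Toy column: a single real value plus a missing value.
col = pd.Series(["INDIVIDUAL", np.nan, "INDIVIDUAL"])

# unique() keeps nan, so the column looks like it has two values.
n_with_nan = len(col.unique())

# Dropping nulls first leaves only the true unique values.
n_without_nan = len(col.dropna().unique())

print(n_with_nan, n_without_nan)
```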

Instructions

• Remove any columns from loans_2007 that contain only one unique value:
• Create an empty list, drop_columns to keep track of which columns you want to drop
• For each column:
• Use the Series method dropna to remove any null values and then use the Series
method unique to return the set of non-null unique values
• Use the len() function to return the number of values in that set
• Append the column to drop_columns if it contains only 1 unique value
• Use the Dataframe method drop to remove the columns in drop_columns from loans_2007
• Use the print function to display drop_columns so we know which ones were removed
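A minimal sketch of this loop, run on a three-column toy Dataframe (the data is an assumption made for illustration), might look like this:

```python
import numpy as np
import pandas as pd

# Toy stand-in: policy_code and tax_liens each hold one true unique value.
loans_2007 = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400],
    "policy_code": [1, 1, 1],
    "tax_liens": [0.0, np.nan, 0.0],
})

drop_columns = []
for col in loans_2007.columns:
    # Ignore nulls, then count the remaining unique values.
    non_null_unique = loans_2007[col].dropna().unique()
    if len(non_null_unique) == 1:
        drop_columns.append(col)

loans_2007 = loans_2007.drop(drop_columns, axis=1)
print(drop_columns)
```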

II.2 Machine Learning Project Walkthrough: Preparing the Features


II.2.1 Recap

In the previous lesson, we removed all of the columns that contained redundant information, weren't
useful for modeling, required too much processing to make useful, or leaked information from the
future. We then exported the Dataframe to a CSV file named filtered_loans_2007.csv to differentiate it
from loans_2007.csv. In this lesson, we'll prepare the data for machine learning by focusing on
handling missing values, converting categorical columns to numeric columns, and removing any other
extraneous columns we encounter throughout this process.

The mathematics underlying most machine learning models assumes that the data is numerical and
contains no missing values. To enforce this requirement, scikit-learn will raise an error if you try to
train a model using data that contains missing values or non-numeric values when working with models
like linear regression and logistic regression.

Let's start by computing the number of missing values and come up with a strategy for handling them.
Then, we'll focus on the categorical columns.

We can return the number of missing values across the Dataframe by:
• First using the Pandas Dataframe method isnull to return a Dataframe containing Boolean values:
• True if the original value is null
• False if the original value isn't null
• Then using the Pandas Dataframe method sum to calculate the number of null values in each column

Instructions

• Read in filtered_loans_2007.csv as a Dataframe and assign it to loans


• Use the isnull and sum methods to return the number of null values in each column. Assign the
resulting Series object to null_counts
• Use the print function to display the rows of null_counts that are greater than zero
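The isnull and sum steps can be sketched as follows. The toy Dataframe stands in for the result of reading filtered_loans_2007.csv with pd.read_csv; its contents are made up for illustration.

```python
import numpy as np
import pandas as pd

# In the lesson, loans would come from pd.read_csv("filtered_loans_2007.csv").
loans = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400],
    "emp_length": ["10+ years", np.nan, "< 1 year"],
    "title": ["Computer", "bike", np.nan],
})

# isnull gives a Boolean Dataframe; sum counts the True values per column.
null_counts = loans.isnull().sum()
print(null_counts[null_counts > 0])
```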

Solutions

II.2.2 Handling missing values

In the previous screen we got a series displaying how many missing values each column with missing
values has:



While most of the columns have no missing values, two columns have fifty or fewer rows with missing
values, and two columns, emp_length and pub_rec_bankruptcies, contain a relatively high number of
missing values.

Domain knowledge tells us that employment length is frequently used in assessing how risky a potential
borrower is, so we'll keep this column despite its relatively large number of missing values.

Let's inspect the values of the column pub_rec_bankruptcies.
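One way to inspect the column is value_counts with normalize=True, which reports shares instead of raw counts, and dropna=False, which keeps missing values in the tally. The five values below are made up, so the shares will not match the real dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the pub_rec_bankruptcies column.
loans = pd.DataFrame({"pub_rec_bankruptcies": [0.0, 0.0, 0.0, 1.0, np.nan]})

# Share of rows in each category, including missing values.
shares = loans["pub_rec_bankruptcies"].value_counts(normalize=True, dropna=False)
print(shares)
```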

We see that this column offers very little variability: nearly 94% of values are in the same category. It
probably won't have much predictive value, so let's drop it. In addition, we'll remove the remaining rows
containing null values.

This means that we'll keep the following columns and just remove rows containing missing values for
them:
• emp_length
• title
• revol_util
• last_credit_pull_d

We'll drop the pub_rec_bankruptcies column entirely.

Let's use the strategy of removing the pub_rec_bankruptcies column first, then removing all rows
containing any missing values, to cover both of these cases. This way, we only remove the rows
containing missing values for the emp_length, title, revol_util and last_credit_pull_d columns, and
not the rows whose only missing value is in pub_rec_bankruptcies.

Instructions

• Use the drop method to remove the pub_rec_bankruptcies column from loans
• Use the dropna method to remove all rows from loans containing any missing values
• Use the dtypes attribute followed by the value_counts() method to return the counts for each
column data type. Use the print function to display these counts
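These three steps could look roughly like this on a small made-up Dataframe (the values are assumptions; only the column names come from the lesson):

```python
import numpy as np
import pandas as pd

# Toy stand-in for loans.
loans = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400],
    "int_rate": ["10.65%", "15.27%", "15.96%"],
    "emp_length": ["10+ years", np.nan, "< 1 year"],
    "pub_rec_bankruptcies": [0.0, np.nan, 0.0],
})

# Drop the low-variability column first, then any row with a missing value.
loans = loans.drop("pub_rec_bankruptcies", axis=1)
loans = loans.dropna(axis=0)

# Count how many columns there are of each data type.
print(loans.dtypes.value_counts())
```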

Solutions

II.2.3 Text columns

While the numerical columns can be used natively with scikit-learn, the object columns that contain
text need to be converted to numerical data types. Let's return a new dataframe containing just the
object columns so we can explore them in more depth. You can use the dataframe method
select_dtypes to select only the columns of a certain data type:
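For example, select_dtypes with include=["object"] keeps only the text columns. In this sketch the toy columns are created with an explicit object dtype (an assumption made so the example behaves the same across pandas versions).

```python
import pandas as pd

# Toy stand-in for loans; text columns are given an explicit object dtype.
loans = pd.DataFrame({
    "loan_amnt": [5000.0, 2500.0],
    "int_rate": pd.Series(["10.65%", "15.27%"], dtype="object"),
    "term": pd.Series([" 36 months", " 60 months"], dtype="object"),
})

# Keep only the object (text) columns.
object_columns_df = loans.select_dtypes(include=["object"])
print(object_columns_df.iloc[0])
```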

Let's select just the object columns then display a sample row for a better sense of how the values in
each column are formatted.

Instructions

• Use the dataframe method select_dtypes to select only the columns of object type from loans and
assign the resulting Dataframe to object_columns_df
• Display the first row in object_columns_df using the print function

Solutions

II.2.4 Converting text columns

Some of the columns seem like they represent categorical values, but we should confirm by checking
the number of unique values in those columns:
• home_ownership: home ownership status, can only be 1 of 4 categorical values according to the
data dictionary
• verification_status: indicates if income was verified by Lending Club



• emp_length: number of years the borrower was employed upon time of application
• term: number of payments on the loan, either 36 or 60
• addr_state: borrower's state of residence
• purpose: a category provided by the borrower for the loan request
• title: loan title provided by the borrower

There are also two columns that represent numeric values and need to be converted:
• int_rate: interest rate of the loan in %
• revol_util: revolving line utilization rate or the amount of credit the borrower is using relative to all
available credit

Based on the first row's values for purpose and title, it seems like these columns could reflect the
same information. Let's explore the unique value counts separately to confirm if this is true.

Lastly, some of the columns contain date values that would require a good amount of feature
engineering for them to be potentially useful:
• earliest_cr_line: The month the borrower's earliest reported credit line was opened
• last_credit_pull_d: The most recent month Lending Club pulled credit for this loan

Since these date features require some feature engineering for modeling purposes, let's remove these
date columns from the dataframe.

II.2.5 First 5 categorical columns

Let's explore the unique value counts of the columns that seem like they contain categorical values.

Instructions

• Display the unique value counts for the following columns: home_ownership, verification_status,
emp_length, term, addr_state:
• Store these column names in a list named cols
• Use a for loop to iterate over cols:
• Use the print function combined with the Series method value_counts to display each column's
unique value counts
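A short sketch of this loop on a three-row toy Dataframe (the rows are made up for illustration):

```python
import pandas as pd

# Toy stand-in for loans with the five columns of interest.
loans = pd.DataFrame({
    "home_ownership": ["RENT", "OWN", "RENT"],
    "verification_status": ["Verified", "Not Verified", "Verified"],
    "emp_length": ["10+ years", "< 1 year", "10+ years"],
    "term": [" 36 months", " 60 months", " 36 months"],
    "addr_state": ["AZ", "GA", "IL"],
})

cols = ["home_ownership", "verification_status", "emp_length", "term", "addr_state"]
counts = {}
for c in cols:
    counts[c] = loans[c].value_counts()
    print(counts[c])
```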

Solutions

II.2.6 The reason for the loan



The home_ownership, verification_status, emp_length, term, and addr_state columns all contain
multiple discrete values. We should clean the emp_length column and treat it as a numerical one since
the values have ordering (2 years of employment is less than 8 years).

First, let's look at the unique value counts for the purpose and title columns to understand which
column we want to keep.

Instructions

Use the value_counts method and the print function to display the unique values in the following
columns:

• title
• purpose

Solutions

II.2.7 Categorical columns

The home_ownership, verification_status, emp_length, and term columns each contain a few discrete
categorical values. We should encode these columns as dummy variables and keep them.

It seems like the purpose and title columns do contain overlapping information, but we'll keep
the purpose column since it contains a few discrete values. In addition, the title column has data quality
issues since many of the values are repeated with slight modifications (e.g. Debt Consolidation, Debt
Consolidation Loan, and debt consolidation).

We can use the following mapping to clean the emp_length column:

• "10+ years": 10
• "9 years": 9
• "8 years": 8
• "7 years": 7
• "6 years": 6
• "5 years": 5
• "4 years": 4
• "3 years": 3
• "2 years": 2
• "1 year": 1
• "< 1 year": 0
• "n/a": 0



We erred on the side of caution with the 10+ years, < 1 year and n/a mappings. We assume that
people who may have been working more than 10 years have only really worked for 10 years. We also
assume that people who've worked less than a year, or whose information is not available, have
worked for 0 years. This is a general heuristic but it's not perfect.

Lastly, the addr_state column contains many discrete values, and we'd need to add 49 dummy variable
columns to use it for classification. This would make our dataframe much larger and could slow down
how quickly the code runs. Let's remove this column from consideration.

Instructions

• Remove the last_credit_pull_d, addr_state, title, and earliest_cr_line columns from loans
• Convert the int_rate and revol_util columns to float columns by:
• Using the str accessor followed by the rstrip string method to strip the right trailing percent sign
(%):
• loans['int_rate'].str.rstrip('%') returns a new Series with % stripped from the right side of each
value
• On the resulting Series object, use the astype method to convert to the float type
• Assign the new Series of float values back to the respective columns in the Dataframe
• Use the replace method to clean the emp_length column
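The percent-stripping and emp_length steps could be sketched as below. The two toy rows are made up; the mapping is the one listed earlier in this section. Dropping the four unused columns is left out to keep the example short.

```python
import pandas as pd

# Toy stand-in for loans.
loans = pd.DataFrame({
    "int_rate": ["10.65%", "15.27%"],
    "revol_util": ["83.7%", "9.4%"],
    "emp_length": ["10+ years", "< 1 year"],
})

# Strip the trailing percent sign and convert to float.
for col in ["int_rate", "revol_util"]:
    loans[col] = loans[col].str.rstrip("%").astype("float")

# Nested mapping: column name, then old value to new value.
mapping_dict = {
    "emp_length": {
        "10+ years": 10, "9 years": 9, "8 years": 8, "7 years": 7,
        "6 years": 6, "5 years": 5, "4 years": 4, "3 years": 3,
        "2 years": 2, "1 year": 1, "< 1 year": 0, "n/a": 0,
    }
}
loans = loans.replace(mapping_dict)
print(loans)
```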

Solutions

II.2.8 Dummy variables


Let's now encode the home_ownership, verification_status, purpose, and term columns as dummy
variables so we can use them in our model. We first need to use the Pandas get_dummies function to
return a new Dataframe containing a new column for each dummy variable. We can then use the concat
function to add these dummy columns back to the original Dataframe, and then drop the original
columns entirely using the drop method.
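The three steps (get_dummies, concat, drop) can be sketched on a toy Dataframe with two of the four categorical columns; the rows are made up for illustration.

```python
import pandas as pd

# Toy stand-in for loans.
loans = pd.DataFrame({
    "loan_amnt": [5000, 2500, 2400],
    "home_ownership": ["RENT", "OWN", "RENT"],
    "term": [" 36 months", " 60 months", " 36 months"],
})

cat_columns = ["home_ownership", "term"]

# One new column per category value, named <column>_<value>.
dummy_df = pd.get_dummies(loans[cat_columns])

# Add the dummy columns, then drop the original text columns.
loans = pd.concat([loans, dummy_df], axis=1)
loans = loans.drop(cat_columns, axis=1)
print(loans.columns.tolist())
```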

Instructions

• Encode the home_ownership, verification_status, purpose, and term columns as integer values:
• Use the get_dummies function to return a Dataframe containing the dummy columns
• Use the concat method to add these dummy columns back to loans
• Remove the original, non-dummy columns (home_ownership, verification_status, purpose,
and term) from loans

Solutions

II.3 Machine Learning Project Walkthrough: Making Predictions


II.3.1 Recap



Our goal is to generate features from data, which we can feed into a machine learning algorithm. The
algorithm will make predictions about whether or not a loan will be paid off on time, which is contained
in the loan_status column of the clean dataset.

As we prepared the data, we removed columns that had data leakage issues, contained redundant
information, or required additional processing to turn into useful features. We cleaned features that
had formatting issues and converted categorical columns to dummy variables.

In the last lesson, we noticed that there's a class imbalance in our target column, loan_status. There
are about 6 times as many loans that were paid off on time (positive case, label of 1) as those that
weren't (negative case, label of 0). Imbalances can cause issues with many machine learning algorithms,
where they appear to have high accuracy, but actually aren't learning from the training data. Due to its
potential to cause issues, we need to keep the class imbalance in mind as we build machine learning
models.

After all of our data cleaning, we ended up with the csv file called clean_loans_2007.csv. Let's read this
file into a dataframe and view a summary of the work we did.

Instructions

• Read clean_loans_2007.csv into a Dataframe named loans


• Use the info() method and the print function to display a summary of the dataset

Solutions

II.3.2 Picking an Error Metric

Before we dive into predicting loan_status with machine learning, let's go back to our first steps when
we started cleaning the Lending Club dataset. You may recall the original question we wanted to
answer:

• Can we build a machine learning model that can accurately predict if a borrower will pay off their
loan on time or not?

We established that this is a binary classification problem and we converted the loan_status column
to 0s and 1s as a result. Before diving in and selecting an algorithm to apply to the data, we should
select an error metric.

An error metric will help us figure out when our model is performing well, and when it's performing
poorly. To tie error metrics all the way back to the original question we wanted to answer, let's say
we're using a machine learning model to predict whether or not we should fund a loan on the Lending
Club platform. Our objective in this is to make money -- we want to fund enough loans that are paid
off on time to offset our losses from loans that aren't paid off. An error metric will help us determine if
our algorithm will make us money or lose us money.

In this case, we're primarily concerned with false positives and false negatives. Both of these are
different types of misclassifications. With a false positive, we predict that a loan will be paid off on
time, but it actually isn't. This costs us money, since we fund loans that lose us money. With a false
negative, we predict that a loan won't be paid off on time, but it actually would be paid off on time.
This loses us potential money, since we didn't fund a loan that actually would have been paid off.

Here's a diagram to simplify the concepts:

In the loan_status and prediction columns, a 0 means that the loan wouldn't be paid off on time, and a 1
means that it would.

Since we're viewing this problem from the standpoint of a conservative investor, we need to treat false
positives differently than false negatives. A conservative investor would want to minimize risk and
avoid false positives as much as possible. They'd be more secure with missing out on opportunities
(false negatives) than they would be with funding a risky loan (false positives).

Let's calculate false positives and true positives in Python. We can use multiple conditionals, separated
by an &, to select items in a NumPy array that meet certain conditions. For instance, if we had an array
called predictions, we could select items in predictions that equal 1 and where items in
loans["loan_status"] in the same position also equal 1 using this:
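A minimal sketch of such a filter is shown below. The five predictions and labels are made up; tp_filter is just an illustrative variable name.

```python
import numpy as np
import pandas as pd

# Toy labels and predictions, aligned by position.
loans = pd.DataFrame({"loan_status": [1, 0, 1, 1, 0]})
predictions = np.array([1, 1, 0, 1, 0])

# True where we predicted 1 and the loan really was paid off.
tp_filter = (predictions == 1) & (loans["loan_status"] == 1)

# Selecting with the filter and counting gives the true positives.
tp = len(predictions[tp_filter])
print(tp)
```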

The above code will give us all the items in predictions that are true positives -- where we predicted
that the loan would be paid off on time, and it was actually paid off on time. By using the len function
to find the number of items, we can find the number of true positives.



Using the diagram above as a reference, it's possible to compute the other 3 quantities we mentioned
-- false positives, true negatives, and false negatives.

We've generated some predictions automatically and they are stored in a NumPy array called predictions.

Instructions

• Find the number of true negatives
• Find the number of items where predictions is 0, and the corresponding entry in loans["loan_status"] is also 0
• Assign the result to tn
• Find the number of true positives
• Find the number of items where predictions is 1, and the corresponding entry in loans["loan_status"] is also 1
• Assign the result to tp
• Find the number of false negatives
• Find the number of items where predictions is 0, and the corresponding entry in loans["loan_status"] is 1
• Assign the result to fn
• Find the number of false positives
• Find the number of items where predictions is 1, and the corresponding entry in loans["loan_status"] is 0
• Assign the result to fp
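All four counts follow the same pattern. The sketch below uses six made-up rows so that each quantity is non-zero; the helper name actual is an assumption of this example.

```python
import numpy as np
import pandas as pd

# Toy labels and predictions, aligned by position.
loans = pd.DataFrame({"loan_status": [1, 0, 1, 1, 0, 1]})
predictions = np.array([1, 1, 0, 1, 0, 1])
actual = loans["loan_status"]

tn = len(predictions[(predictions == 0) & (actual == 0)])  # predicted 0, was 0
tp = len(predictions[(predictions == 1) & (actual == 1)])  # predicted 1, was 1
fn = len(predictions[(predictions == 0) & (actual == 1)])  # predicted 0, was 1
fp = len(predictions[(predictions == 1) & (actual == 0)])  # predicted 1, was 0

print(tn, tp, fn, fp)
```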

Solutions

II.3.3 Class Imbalance

We mentioned earlier that there is a significant class imbalance in the loan_status column. There
are 6 times as many loans that were paid off on time (1) as loans that weren't paid off on time (0).
This causes a major issue when we use accuracy as a metric. Due to the class imbalance, a classifier
can predict 1 for every row, and still have high accuracy. Here's a diagram that illustrates the concept:

In the above diagram, our predictions are 85.7% accurate -- we've correctly identified loan_status
in 85.7% of cases. However, we've done this by predicting 1 for every row. What this means is
that we'll actually lose money. Let's say we loan out 1000 dollars on average to each borrower.
Each borrower pays us 10% interest back. We will make a projected profit of 100 dollars on
each loan. In the above diagram, we'd actually lose money:
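The arithmetic can be written out as a short calculation. The split of 6 repaid loans and 1 unpaid loan is inferred from the 85.7% accuracy figure and the dollar amounts quoted in this section, so treat it as an assumption about the diagram.

```python
# Assumed from the figures in the text: 7 loans, 6 repaid, 1 not repaid.
loan_amount = 1000   # dollars lent per borrower
interest_pct = 10    # percent interest paid back
paid_off = 6
not_paid = 1

# Predicting 1 for every row is right whenever the loan was repaid.
accuracy = paid_off / (paid_off + not_paid)

interest_per_loan = loan_amount * interest_pct // 100
interest_earned = paid_off * interest_per_loan
principal_lost = not_paid * loan_amount
net = interest_earned - principal_lost

print(round(accuracy * 100, 1), interest_earned, principal_lost, net)
```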

As you can see, we made 600 dollars in interest from the borrowers that paid us back, but we
lost 1000 dollars on the one borrower who never paid us back, so we actually ended up losing 400 dollars
overall, even though our model is technically accurate.

This is why it's important to always be aware of imbalanced classes in machine learning models, and
to adjust your error metric accordingly. In this case, we don't want to use accuracy and should instead
use metrics that tell us the number of false positives and false negatives.

This means that we should optimize for:


• high recall (true positive rate)
• low fall-out (false positive rate)

We can calculate false positive rate and true positive rate, using the numbers of true positives, true
negatives, false negatives, and false positives.



The false positive rate is the number of false positives divided by the number of false positives plus the
number of true negatives. This divides the cases where we thought a loan would be paid off but it wasn't
by all the loans that weren't paid off.

The true positive rate is the number of true positives divided by the number of true positives plus the
number of false negatives. This divides the cases where we thought a loan would be paid off and it was
by all the loans that were paid off.
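Written as formulas, with TP, FP, TN and FN standing for the counts of true positives, false positives, true negatives and false negatives, the two rates described above are:

```latex
\mathrm{FPR} = \frac{FP}{FP + TN}
\qquad
\mathrm{TPR} = \frac{TP}{TP + FN}
```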

Simple English ways to think of each term are:

• False Positive Rate: "the percentage of the loans that shouldn't be funded that I would fund".
• True Positive Rate: "the percentage of loans that should be funded that I would fund".

Generally, if we reduce false positive rate, true positive rate will also go down. This is because if we
want to reduce the risk of false positives, we wouldn't think about funding riskier loans in the first
place.

Instructions

• Compute the false positive rate for predictions


• Compute the number of false positives, then divide by the number of false positives plus the number
of true negatives
• Assign to fpr
• Compute the true positive rate for predictions
• Compute the number of true positives, then divide by the number of true positives plus the number
of false negatives
• Assign to tpr
• Print out fpr and tpr to verify
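A sketch of the calculation, assuming predictions holds a 1 for every row and using seven made-up labels:

```python
import numpy as np
import pandas as pd

# Toy labels: six loans paid off, one not. Predictions are all 1.
loans = pd.DataFrame({"loan_status": [1, 1, 1, 1, 1, 1, 0]})
predictions = pd.Series(np.ones(len(loans), dtype=int))
actual = loans["loan_status"]

tp = ((predictions == 1) & (actual == 1)).sum()
fp = ((predictions == 1) & (actual == 0)).sum()
tn = ((predictions == 0) & (actual == 0)).sum()
fn = ((predictions == 0) & (actual == 1)).sum()

fpr = fp / (fp + tn)  # share of bad loans we would fund
tpr = tp / (tp + fn)  # share of good loans we would fund
print(fpr, tpr)
```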
II.3.4 Logistic Regression

In the last screen, you may have noticed that both fpr and tpr were 1. This is because we predicted 1 for
each row. This means that we correctly identified all of the good loans (true positive rate), but we also
incorrectly identified all of the bad loans as good (false positive rate). Now that we've set up error
metrics, we can move on to making predictions using a machine learning algorithm.

As we saw in the first screen of the mission, our cleaned dataset contains 41 columns, all of which are
either the int64 or the float64 data type. There aren't any null values in any of the columns. This means
that we can now apply any machine learning algorithm to our dataset. Most algorithms can't deal with
non-numeric or missing values, which is why we had to do so much data cleaning.

In order to fit the machine learning models, we'll use the Scikit-learn library. Although we've built our
own implementations of algorithms in earlier missions, it's easier and faster to use algorithms that
someone else has already written and tuned for high performance.

A good first algorithm to apply to binary classification problems is logistic regression, for the following
reasons:

• It's quick to train and we can iterate more quickly


• It's less prone to overfitting than more complex models like decision trees
• It's easy to interpret

Instructions

• Create a dataframe named features that contains just the feature columns
• Remove the loan_status column
• Create a Series named target that contains just the target column (loan_status)
• Use the fit method of lr to fit a logistic regression to features and target



• Use the predict method of lr to make predictions on features. Assign the predictions to predictions
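One way these steps might look is sketched below. Since clean_loans_2007.csv isn't available here, the sketch builds a small synthetic loans Dataframe (the columns int_rate and dti and the rule generating loan_status are assumptions for illustration), and it assumes lr is a scikit-learn LogisticRegression instance.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the cleaned loans Dataframe.
rng = np.random.default_rng(0)
n = 300
loans = pd.DataFrame({
    "int_rate": rng.uniform(5, 25, n),
    "dti": rng.uniform(0, 30, n),
})
loans["loan_status"] = (loans["int_rate"] + rng.normal(0, 4, n) < 20).astype(int)

# Every column except the target is a feature.
features = loans.drop("loan_status", axis=1)
target = loans["loan_status"]

lr = LogisticRegression()
lr.fit(features, target)

# Predicting on the training data itself, as in this screen.
predictions = lr.predict(features)
print(predictions[:5])
```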

Solutions

II.3.6 Cross Validation

While we generated predictions in the last screen, those predictions were overfit. They were overfit
because we generated predictions using the same data that we trained our model on. When we use
these predictions to evaluate error, we get an unrealistically optimistic picture of how accurate the
algorithm is, because it already "knows" the correct answers. This is like asking someone to memorize a
bunch of physics equations, then asking them to plug numbers into the equations. They can tell you the
right answer, but they can't explain a concept that they haven't already memorized an equation for.

In order to get a realistic depiction of the accuracy of the model, let's perform k-fold cross validation.
We can use the cross_val_predict() function from the sklearn.model_selection package. Here's what
the workflow looks like:

Once we have cross validated predictions, we can compute true positive rate and false positive rate.

Instructions

• Generate cross validated predictions for features


• Call cross_val_predict using lr, features, and target
• Set the cv parameter to 3, so that 3-fold cross validation is performed
• Assign the predictions to predictions
• Use the Series class to convert predictions to a pandas Series object

• Compute true positive rate and false positive rate


• Assign true positive rate to tpr
• Assign false positive rate to fpr
• Display fpr and tpr to evaluate them
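The cross validation steps can be sketched as follows, again on synthetic features and target (an assumption, since the real cleaned dataset isn't available here). cross_val_predict returns one out-of-fold prediction per row.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for features and target.
rng = np.random.default_rng(0)
n = 300
features = pd.DataFrame({
    "int_rate": rng.uniform(5, 25, n),
    "dti": rng.uniform(0, 30, n),
})
target = (features["int_rate"] + rng.normal(0, 4, n) < 20).astype(int)

lr = LogisticRegression()

# Each row is predicted by a model that never saw it during training.
predictions = cross_val_predict(lr, features, target, cv=3)
predictions = pd.Series(predictions)

tp = ((predictions == 1) & (target == 1)).sum()
fp = ((predictions == 1) & (target == 0)).sum()
tn = ((predictions == 0) & (target == 0)).sum()
fn = ((predictions == 0) & (target == 1)).sum()

tpr = tp / (tp + fn)
fpr = fp / (fp + tn)
print(fpr, tpr)
```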

Solutions

II.3.7 Penalizing the classifier

As you can see from the last screen, our fpr and tpr are around what we'd expect if the model was
predicting all ones. We can look at the first few rows of predictions to confirm:


Unfortunately, even though we're not using accuracy as an error metric, the classifier is, and it isn't
accounting for the imbalance in the classes. There are a few ways to get a classifier to correct for
imbalanced classes. The two main ways are:

• Use oversampling and undersampling to ensure that the classifier gets input that has a balanced
number of each class
• Tell the classifier to penalize misclassifications of the less prevalent class more than the other class

We'll look into oversampling and undersampling first. They involve taking a sample that contains equal
numbers of rows where loan_status is 0, and where loan_status is 1. This way, the classifier is forced
to make actual predictions, since predicting all 1s or all 0s will only result in 50% accuracy at most.

The downside of this technique is that since it has to preserve an equal ratio, you have to either:
• Throw out many rows of data. If we wanted equal numbers of rows where loan_status is 0 and
where loan_status is 1, one way we could do that is to delete rows where loan_status is 1
• Copy rows multiple times. One way to equalize the 0s and 1s is to copy rows where loan_status is 0
• Generate fake data. One way to equalize the 0s and 1s is to generate new rows where loan_status is 0

Unfortunately, none of these techniques are easy. The second method we mentioned earlier, telling
the classifier to penalize certain rows more, is much easier to implement using scikit-learn.

We can do this by setting the class_weight parameter to balanced when creating the LogisticRegression
instance. This tells scikit-learn to penalize the misclassification of the minority class during the training
process. The penalty means that the logistic regression classifier pays more attention to correctly
classifying rows where loan_status is 0. This lowers accuracy when loan_status is 1, but increases
accuracy when loan_status is 0.

By setting the class_weight parameter to balanced, the penalty is set to be inversely proportional
to the class frequencies. This would mean that for the
classifier, correctly classifying a row where loan_status is 0 is 6 times more important than correctly
classifying a row where loan_status is 1.
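The scikit-learn documentation gives the balanced weights as n_samples / (n_classes * np.bincount(y)). The sketch below applies that formula with NumPy to a made-up target with six 1s for every 0, to show where the factor of 6 comes from.

```python
import numpy as np

# Toy target: six loans paid off (1) for every loan that wasn't (0).
y = np.array([1, 1, 1, 1, 1, 1, 0])
n_classes = 2

# bincount gives the number of rows per class: index 0 is class 0.
class_counts = np.bincount(y)
weights = len(y) / (n_classes * class_counts)

# How much more a class-0 row weighs than a class-1 row.
ratio = weights[0] / weights[1]
print(weights, ratio)
```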

We can repeat the cross validation procedure we performed in the last screen, but with the
class_weight parameter set to balanced.

Instructions

• Create a LogisticRegression instance


• Remember to set class_weight to balanced
• Assign the instance to lr
• Generate cross validated predictions for features
• Call cross_val_predict() using lr, features, and target
• Assign the predictions to predictions
• Use the Series class to convert predictions to a Pandas Series, as we did in the last screen
• Converting to Series objects lets us take advantage of boolean filtering and arithmetic operations
from pandas
• Compute true positive rate and false positive rate
• Assign true positive rate to tpr
• Assign false positive rate to fpr
• Print out fpr and tpr to evaluate them
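A sketch of these steps with class_weight set to balanced, run on synthetic data (an assumption; the rates it prints will not match the figures quoted for the real dataset):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for features and target.
rng = np.random.default_rng(0)
n = 300
features = pd.DataFrame({
    "int_rate": rng.uniform(5, 25, n),
    "dti": rng.uniform(0, 30, n),
})
target = (features["int_rate"] + rng.normal(0, 4, n) < 20).astype(int)

# Penalize mistakes on the rarer class more heavily.
lr = LogisticRegression(class_weight="balanced")
predictions = pd.Series(cross_val_predict(lr, features, target, cv=3))

tp = ((predictions == 1) & (target == 1)).sum()
fp = ((predictions == 1) & (target == 0)).sum()
tn = ((predictions == 0) & (target == 0)).sum()
fn = ((predictions == 0) & (target == 1)).sum()

tpr = tp / (tp + fn)
fpr = fp / (fp + tn)
print(fpr, tpr)
```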

Solutions

II.3.8 Manual Penalties

We significantly improved the false positive rate in the last screen by balancing the classes, which
also reduced the true positive rate. Our true positive rate is now around 66%, and our false positive
rate is around 39%. From a conservative investor's standpoint, it's reassuring that the false positive
rate is lower, because it means that we'll be able to do a better job at avoiding bad loans than if we
funded everything. However, we'd only decide to fund 66% of the good loans (true positive rate), so we'd
immediately reject a good amount of loans.

We can try to lower the false positive rate further by assigning a harsher penalty for misclassifying the
negative class. While setting class_weight to balanced will automatically set a penalty based on the
number of 1s and 0s in the column, we can also set a manual penalty. In the last screen, the penalty
scikit-learn imposed for misclassifying a 0 would have been around 5.89 (since there are 5.89 times as
many 1s as 0s).

We can also specify a penalty manually if we want to adjust the rates more. To do this, we need to pass
in a dictionary of penalty values to the class_weight parameter:

The above dictionary will impose a penalty of 10 for misclassifying a 0 and a penalty of 1 for
misclassifying a 1.
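
The dictionary described above can be sketched as follows. The variable name penalty is an assumption chosen for illustration. The keys are the class labels and the values are the weights scikit-learn applies to errors on each class.

```python
from sklearn.linear_model import LogisticRegression

# Class label -> weight applied to errors on rows of that class.
# Misclassifying a 0 costs 10 times as much as misclassifying a 1.
penalty = {
    0: 10,
    1: 1,
}

# The dictionary is passed in place of the string "balanced".
lr = LogisticRegression(class_weight=penalty)
print(lr.get_params()["class_weight"])
```

The rest of the workflow (cross validated predictions, then tpr and fpr) stays the same as with class_weight set to balanced.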

Instructions

Modify the code from the last screen to change the class_weight parameter from the string "balanced" to
the dictionary:

Remember to print out the fpr and tpr values at the end!

II.3.9 Random Forests

It looks like assigning manual penalties lowered the false positive rate to 9%, and thus lowered our
risk. Note that this comes at the expense of true positive rate. While we have fewer false positives,
we're also missing opportunities to fund more loans and potentially make more money. Given that
we're approaching this as a conservative investor, this strategy makes sense, but it's worth keeping in
mind the tradeoffs.

While we could tweak the penalties further, it's best to try a different model now, since that may
yield larger improvements in the false positive rate. We can always loop back and iterate on the
penalties later.

Let's try a more complex algorithm, random forest. We learned about random forests in a previous
mission and constructed our own model. Random forests are able to work with nonlinear data and learn
complex conditionals. Logistic regression can only learn a linear decision boundary. Training a random
forest may yield better accuracy if some columns relate nonlinearly to loan_status.

We can use the RandomForestClassifier class from scikit-learn to do this.

Instructions

• Modify the code from the last screen, and swap out the LogisticRegression for
a RandomForestClassifier model.
• Set the value of the keyword argument random_state to 1, so the predictions don't vary due to
random chance.
• Set the value of the keyword argument class_weight to balanced, so we avoid issues with imbalanced
classes.
• Remember to print out the fpr and tpr values at the end!
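
A rough sketch of these steps is shown below. The data set is synthetic and the helper function rates() and the cv=3 setting are assumptions made for illustration. Only RandomForestClassifier, random_state, class_weight, fpr, and tpr come from the instructions.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def rates(predictions, target):
    """Return (tpr, fpr) for two aligned Series of 0/1 labels (hypothetical helper)."""
    tp = ((predictions == 1) & (target == 1)).sum()
    fn = ((predictions == 0) & (target == 1)).sum()
    fp = ((predictions == 1) & (target == 0)).sum()
    tn = ((predictions == 0) & (target == 0)).sum()
    return tp / (tp + fn), fp / (fp + tn)


# Synthetic, imbalanced stand-in for the loans data (assumption).
rng = np.random.default_rng(1)
n = 300
target = pd.Series((rng.random(n) < 0.85).astype(int), name="loan_status")
features = pd.DataFrame({
    "f1": rng.normal(size=n) + target.to_numpy(),
    "f2": rng.normal(size=n),
})

# random_state fixes the randomness of the forest so results are repeatable.
rf = RandomForestClassifier(random_state=1, class_weight="balanced")
predictions = pd.Series(cross_val_predict(rf, features, target, cv=3))

tpr, fpr = rates(predictions, target)
print(fpr)
print(tpr)
```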

Solutions

Key Points

Unfortunately, using a random forest classifier didn't improve our false positive rate. The model is
likely still weighted too heavily toward the 1 class, and is mostly predicting 1s. We could fix this by
applying a harsher penalty for misclassifications of 0s.

Ultimately, our best model had a false positive rate of nearly 9%, and a true positive rate of nearly 24%.
For a conservative investor, this means that they make money as long as the interest rate is high enough
to offset the losses from 9% of borrowers defaulting. In addition, the pool of 24% of borrowers must
be large enough to make enough interest money to offset the losses.

If we had randomly picked loans to fund, borrowers would have defaulted on 14.5% of them, and our
model is better than that, although we're excluding more loans than a random strategy would. Given
this, there's still quite a bit of room to improve:

• We can tweak the penalties further
• We can try models other than a random forest and logistic regression
• We can use some of the columns we discarded to generate better features
• We can ensemble multiple models to get more accurate predictions
• We can tune the parameters of the algorithm to achieve higher performance

III. Kaggle Fundamentals

Learn how to get started and participate in Kaggle competitions with our Kaggle Fundamentals course.
Kaggle is a data science competition site where you can sign up to compete with other data scientists
and data science teams to produce the most accurate analysis of a particular data set. Competition
in Kaggle is strong, and placing among the top finishers in a competition will give you bragging rights.

In this course, you will compete in Kaggle's 'Titanic' competition to build a simple machine learning
model and make your first Kaggle submission. You will also learn how to select the best algorithm and
tune your model for the best performance. You'll be working with multiple algorithms such as logistic
regression, k-nearest neighbors, and random forests in attempts to find the model that scores the best
and awards you the best rank.

Throughout this course, you'll learn several tips and tricks for competing in Kaggle competitions that
will help you place highly. You’ll also learn more about effective machine learning workflows, and
about how to use a Jupyter Notebook for Kaggle competitions.

At the end of the course, you’ll have a completed machine learning project and the knowledge you
need to dive into other Kaggle competitions and prove your skills to the world.

By the end of this course, you'll be able to:

• Build a simple machine learning model and make your first Kaggle submission
• Create new features and select the best-performing features to improve your score
• Work with multiple algorithms including logistic regression, k-nearest neighbors, and random forest
• Select the best algorithm and tune your model for the best performance

III.1 Getting Started with Kaggle
III.1.1 Introduction to Kaggle

Kaggle is a site where people create algorithms and compete against machine learning practitioners
around the world. Your algorithm wins the competition if it's the most accurate on a particular data set.
Kaggle is a fun way to practice your machine learning skills.

In this mission and the ones that follow, we're going to learn how to compete in Kaggle competitions.
In this introductory mission we'll learn how to:

• Approach a Kaggle competition


• Explore the competition data and learn about the competition topic
• Prepare data for machine learning
• Train a model
• Measure the accuracy of your model
• Prepare and make your first Kaggle submission

This course presumes you have an understanding of Python and the pandas library. If you need to
learn about these, we recommend our Python and pandas courses.

Kaggle has created a number of competitions designed for beginners. The most popular of these
competitions, and the one we'll be looking at, is about predicting which passengers survived the sinking
of the Titanic.

In this competition, we have a data set of different information about passengers onboard the Titanic,
and we want to see if we can use that information to predict whether those people survived or not.
Before we start looking at this specific competition, let's take a moment to understand how Kaggle
competitions work.

Each Kaggle competition has two key data files that you will work with: a training set and a testing
set.

The training set contains data we can use to train our model. It has a number of feature columns
which contain various descriptive data, as well as a column of the target values we are trying to
predict: in this case, Survived.

The testing set contains all of the same feature columns, but is missing the target value column.
Additionally, the testing set usually has fewer observations (rows) than the training set. This is
useful because we want as much data as we can to train our model on. Once we have trained our model
on the training set, we will use that model to make predictions on the data from the testing set, and
submit those predictions to Kaggle.

In this competition, the two files are named test.csv and train.csv. We'll start by using the pandas
library to read both files and inspect their size.

Instructions

• Use pandas.read_csv() to import train.csv and assign it to the variable train


• Use DataFrame.shape to calculate the number of rows and columns in train, and assign the result
to train_shape
• Click Run to run your code, and use the variable inspector to view the four variables you just created
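
As a minimal sketch of reading a file and inspecting its size, the example below uses a tiny in-memory CSV as a stand-in for train.csv (an assumption, so that the example runs without the competition files). With the real files, the path "train.csv" would be passed to pandas.read_csv() instead.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for train.csv (assumption: the real file is much larger).
csv_text = """PassengerId,Survived,Pclass,Sex,Age
1,0,3,male,22
2,1,1,female,38
3,1,3,female,26
"""

train = pd.read_csv(io.StringIO(csv_text))

# DataFrame.shape is a (rows, columns) tuple.
train_shape = train.shape
print(train_shape)
```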

Solutions

III.1.2 Exploring the data

The files we read in the previous screen are available on the data page for the Titanic competition on
Kaggle. That page also has a data dictionary, which explains the various columns that make up the data
set. Below are the descriptions contained in that data dictionary:

• PassengerId - A column added by Kaggle to identify each row and make submissions easier
• Survived - Whether the passenger survived or not and the value we are predicting (0=No, 1=Yes)
• Pclass - The class of the ticket the passenger purchased (1=1st, 2=2nd, 3=3rd)
• Sex - The passenger's sex
• Age - The passenger's age in years
• SibSp - The number of siblings or spouses the passenger had aboard the Titanic
• Parch - The number of parents or children the passenger had aboard the Titanic
• Ticket - The passenger's ticket number
• Fare - The fare the passenger paid
• Cabin - The passenger's cabin number
• Embarked - The port where the passenger embarked (C=Cherbourg, Q=Queenstown,
S=Southampton)

The data page on Kaggle has some additional notes about some of the columns. It's always worth
exploring this in detail to get a full understanding of the data.

The first 2 rows of the data are below:

The type of machine learning we will be doing is called classification, because when we make predictions
we are classifying each passenger as a survivor or not. More specifically, we are performing binary
classification, which means that there are only two different states we are classifying.

In any machine learning exercise, thinking about the topic you are predicting is very important. We call
this step acquiring domain knowledge, and it's one of the most important determinants for success in
machine learning.

In this case, understanding the Titanic disaster and specifically what variables might affect the outcome
of survival is important. Anyone who has watched the movie Titanic would remember that women and
children were given preference to lifeboats (as they were in real life). You would also remember the
vast class disparity of the passengers.
This indicates that Age, Sex, and Pclass may be good predictors of survival. We'll start by exploring
Sex and Pclass by visualizing the data.

Because the Survived column contains 0 if the passenger did not survive and 1 if they did, we can
segment our data by sex and calculate the mean of this column. We can use DataFrame.pivot_table() to
easily do this:

The resultant plot will look like this:

We can immediately see that females survived in much higher proportions than males did. Let's do the
same with the Pclass column.
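
The pivot-table approach described above can be sketched as follows. The six-row dataframe is made up for illustration. With the real data, train would be the dataframe read from train.csv.

```python
import pandas as pd

# Made-up rows standing in for the Titanic training data (assumption).
train = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 1, 0, 0],
})

# pivot_table() aggregates with the mean by default, so each value is the
# share of passengers in that group who survived.
sex_pivot = train.pivot_table(index="Sex", values="Survived")
print(sex_pivot)

# sex_pivot.plot.bar() would draw the bar chart (this needs matplotlib installed).
```

Because Survived only holds 0 and 1, the group mean is the survival rate for that group, which is why no explicit aggregation function is needed.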

Instructions

• Use DataFrame.pivot_table() to pivot the train dataframe:


• Use "Pclass" for the index parameter
• Use "Survived" for the values parameter
• Use DataFrame.plot.bar() to plot the pivot table

III.1.3 Exploring and converting the age column

The Sex and Pclass columns are what we call categorical features. That means that the values
represent a few separate options (for instance, whether the passenger was male or female).

Let's take a look at the Age column using Series.describe().
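
A small sketch of what Series.describe() reports is shown below. The six ages are made up (an assumption). On the real data the call would be train["Age"].describe().

```python
import numpy as np
import pandas as pd

# Made-up ages with two missing values (assumption).
age = pd.Series([22.0, 38.0, np.nan, 0.42, 80.0, np.nan], name="Age")

summary = age.describe()
print(summary)

# describe() counts only non-missing values, so comparing the count with the
# number of rows reveals how many values are missing.
missing = len(age) - int(summary["count"])
print(missing)
```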

The Age column contains numbers ranging from 0.42 to 80.0 (Kaggle's data page notes that Age is
fractional if the passenger is less than one year old). The other thing to note here is that there are
714 values in this column, fewer than the 891 rows we found in the train data set earlier in this
mission, which indicates we have some missing values.

All of this means that the Age column needs to be treated slightly differently, as this is a continuous
numerical column. One way to look at distribution of values in a continuous numerical set is to use
histograms. We can create two histograms to compare visually those that survived vs those who died
across different age ranges:

The resultant plot will look like this:

The relationship here is not simple, but we can see that in some age ranges more passengers survived -
where the red bars are higher than the blue bars.

In order for this to be useful to our machine learning model, we can separate this continuous feature
into a categorical feature by dividing it into ranges. We can use the pandas.cut() function to help us
out.

The pandas.cut() function has two required parameters - the column we wish to cut, and a list of numbers
which define the boundaries of our cuts. We are also going to use the optional parameter labels, which
takes a list of labels for the resultant bins. This will make it easier for us to understand our results.

Before we modify this column, we have to be aware of two things. Firstly, any change we make to
the train data, we also need to apply to the test data, otherwise we will be unable to use our model to
make predictions for our submissions. Secondly, we need to remember to handle the missing values
we observed above.

In the example below, we create a function that:

• Uses the Series.fillna() method to fill all of the missing values with -0.5
• Cuts the Age column into three segments: Missing, Child, and Adult using pandas.cut()

We then use that function on both the train and test dataframes.

The diagram below shows how the function converts the data:

Note that the cut_points list has one more element than the label_names list, since it needs to define
the upper boundary for the last segment.
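
The function described above might look like the sketch below. The name process_age() and the list names cut_points and label_names appear in the text, but the function body, the boundary values [-1, 0, 18, 100], and the sample ages are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def process_age(df, cut_points, label_names):
    """Fill missing ages with -0.5, then bin Age into labelled ranges."""
    df = df.copy()  # avoid modifying the caller's dataframe
    df["Age"] = df["Age"].fillna(-0.5)
    df["Age_categories"] = pd.cut(df["Age"], cut_points, labels=label_names)
    return df


# Four boundaries define three bins: (-1, 0], (0, 18], (18, 100].
cut_points = [-1, 0, 18, 100]  # assumed boundaries
label_names = ["Missing", "Child", "Adult"]

train = pd.DataFrame({"Age": [22.0, np.nan, 4.0, 18.0, 35.5]})
train = process_age(train, cut_points, label_names)
print(train)
```

By default pandas.cut() includes the right edge of each bin, so an age of exactly 18 falls in the Child segment here. Filling missing values with -0.5 places them inside the (-1, 0] bin, which is what makes the Missing label work.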

Instructions

Create the cut_points and label_names lists to split the Age column into seven categories:

• Missing, from -1 to 0
• Infant, from 0 to 5
• Child, from 5 to 12
• Teenager, from 12 to 18
• Young Adult, from 18 to 35
• Adult, from 35 to 60
• Senior, from 60 to 100
• Apply the process_age() function on the train dataframe, assigning the result to train
• Apply the process_age() function on the test dataframe, assigning the result to test
• Use DataFrame.pivot_table() to pivot the train dataframe by the Age_categories column
• Use DataFrame.plot.bar() to plot the pivot table

Solutions

III.1.4 Preparing the data for Machine Learning

So far we have identified three columns that may be useful for predicting survival:

• Sex
• Pclass
• Age, or more specifically our newly created Age_categories

Before we build our model, we need to prepare these columns for machine learning. Most machine
learning algorithms can't understand text labels, so we have to convert our values into numbers.

Additionally, we need to be careful that we don't imply any numeric relationship where there isn't one.
If we think of the values in the Pclass column, we know they are 1, 2, and 3. You can confirm this by
running the following code:

While the class of each passenger certainly has some sort of ordered relationship, the relationship
between each class is not the same as the relationship between the numbers 1, 2, and 3. For instance,
class 2 isn't "worth" double what class 1 is, and class 3 isn't "worth" triple what class 1 is.

In order to remove this relationship, we can create dummy columns for each unique value in Pclass:

Rather than doing this manually, we can use the pandas.get_dummies() function, which will generate
the columns shown in the diagram above.

The following code creates a function to create the dummy columns for the Pclass column and add them
back to the original dataframe. It then applies that function to the train and test dataframes.
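
A minimal sketch of such a function is given below. The name create_dummies() is used later in the text, but the body shown here and the four sample rows are assumptions, not the course's exact code.

```python
import pandas as pd


def create_dummies(df, column_name):
    """Append one indicator column per unique value of df[column_name]."""
    dummies = pd.get_dummies(df[column_name], prefix=column_name)
    return pd.concat([df, dummies], axis=1)


# Made-up rows (assumption); the real dataframes come from train.csv and test.csv.
train = pd.DataFrame({"Pclass": [3, 1, 2, 3]})
train = create_dummies(train, "Pclass")
print(train.columns.tolist())
```

The prefix argument produces column names such as Pclass_1, Pclass_2, and Pclass_3, which match the names used when fitting the model later on.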

Let's use that function to create dummy columns for both the Sex and Age_categories columns.

Instructions

• Use the create_dummies() function to create dummy variables for the Sex column:
• In the train dataframe
• In the test dataframe
• Use the create_dummies() function to create dummy variables for the Age_categories column:
• In the train dataframe
• In the test dataframe

Solutions

III.1.5 Your First Machine Learning Model

Now that our data has been prepared, we are ready to train our first model. The first model we will use is
called Logistic Regression, which is often the first model you will train when performing classification.

We will be using the scikit-learn library as it has many tools that make performing machine learning
easier. The scikit-learn workflow consists of four main steps:

• Instantiate (or create) the specific machine learning model you want to use
• Fit the model to the training data
• Use the model to make predictions
• Evaluate the accuracy of the predictions

Each model in scikit-learn is implemented as a separate class and the first step is to identify the class
we want to create an instance of. In our case, we want to use the LogisticRegression class.

We'll start by looking at the first two steps. First, we need to import the class:

Next, we create a LogisticRegression object:

Lastly, we use the LogisticRegression.fit() method to train our model. The .fit() method accepts two
arguments: X and Y. X must be a two dimensional array (like a dataframe) of the features that we wish
to train our model on, and Y must be a one-dimensional array (like a series) of our target, or the column
we wish to predict.

The code above fits (or trains) our LogisticRegression model using three columns: Pclass_2, Pclass_3,
and Sex_male.
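
The import, instantiate, and fit steps described above can be sketched as follows. The eight-row dataframe is made up for illustration (an assumption). The column names Pclass_2, Pclass_3, and Sex_male are the ones named in the text.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Made-up, already-encoded rows standing in for the prepared training data.
train = pd.DataFrame({
    "Pclass_2": [0, 0, 1, 0, 1, 0, 0, 1],
    "Pclass_3": [1, 0, 0, 1, 0, 1, 0, 0],
    "Sex_male": [1, 0, 0, 1, 1, 1, 0, 0],
    "Survived": [0, 1, 1, 0, 0, 0, 1, 1],
})

columns = ["Pclass_2", "Pclass_3", "Sex_male"]

lr = LogisticRegression()
# X is two-dimensional (a dataframe of features); y is one-dimensional (a series).
lr.fit(train[columns], train["Survived"])
print(lr.coef_)
```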

Let's train our model using all of the columns we created in the previous screen.

Instructions

• Instantiate a LogisticRegression object called lr


• Use LogisticRegression.fit() to fit the model on the train dataset using:

• The columns contained in columns as the first (X) parameter
• The Survived column as the second (Y) parameter

Solutions

III.1.6 Split data

Congratulations, you've trained your first machine learning model! Our next step is to find out how
accurate our model is, and to do that, we'll have to make some predictions.

If you recall from earlier, we do have a test dataframe that we could use to make predictions. We could
make predictions on that data set, but because it doesn't have the Survived column we would have to
submit it to Kaggle to find out our accuracy. This would quickly become a pain if we had to submit to
find out the accuracy every time we optimized our model.

We could also fit and predict on our train dataframe, however if we do this there is a high likelihood
that our model will overfit, which means it will perform well because we're testing on the same data
we've trained on, but then perform much worse on new, unseen data.

Instead we can split our train dataframe into two:

• One part to train our model on (often 80% of the observations)


• One part to make predictions with and test our model (often 20% of the observations)

The convention in machine learning is to call these two parts train and test. This can become confusing,
since we already have our test dataframe that we will eventually use to make predictions to submit to
Kaggle. To avoid confusion, from here on, we're going to call this Kaggle 'test' data holdout data, which
is the technical name given to this type of data used for final predictions.

The scikit-learn library has a handy model_selection.train_test_split() function that we can use to split
our data. train_test_split() accepts two parameters, X and Y, which contain all the data we want to
train and test on, and returns four objects: train_X, test_X, train_y, and test_y:

Here's what the syntax for creating these four objects looks like:

You'll notice that there are two other parameters we used: test_size, which lets us control what
proportions our data are split into, and random_state. The train_test_split() function randomizes
observations before dividing them, and setting a random seed means that our results will be
reproducible, which is important if you are collaborating, or need to produce consistent results each
time (which our answer checker requires).
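
As a sketch of the call described above, the example below splits ten made-up rows (an assumption) into an 80% training part and a 20% test part. The names all_X and all_y follow the naming used later in this mission.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Ten made-up observations (assumption).
all_X = pd.DataFrame({
    "Pclass_3": [1, 0, 0, 1, 0, 1, 0, 0, 1, 1],
    "Sex_male": [1, 0, 0, 1, 1, 1, 0, 0, 1, 0],
})
all_y = pd.Series([0, 1, 1, 0, 0, 0, 1, 1, 0, 1], name="Survived")

# Note the order of the returned objects: both X parts first, then both y parts.
train_X, test_X, train_y, test_y = train_test_split(
    all_X, all_y, test_size=0.2, random_state=0
)

print(train_X.shape, test_X.shape)
```

With test_size=0.2 and ten rows, eight rows go to the training part and two to the test part. Re-running with the same random_state yields the same split.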

Instructions

• Use the model_selection.train_test_split() function to split the train dataframe using the following
parameters:
• test_size of 0.2
• random_state of 0
• Assign the four returned objects to train_X, test_X, train_y, and test_y

Solutions

III.1.7 Making Predictions and Measuring their Accuracy

Now that we have our data split into train and test sets, we can fit our model again on our training set,
and then use that model to make predictions on our test set.

Once we have fit our model, we can use the LogisticRegression.predict() method to make predictions.

The predict() method takes a single parameter X, a two-dimensional array of features for the observations
we wish to predict. X must have the exact same features as the array we used to fit our model. The
method returns a one-dimensional array of predictions.

There are a number of ways to measure the accuracy of machine learning models, but when competing in
Kaggle competitions you want to make sure you use the same method that Kaggle uses to calculate
accuracy for that specific competition.

In this case, the evaluation section for the Titanic competition on Kaggle tells us that our score is
calculated as "the percentage of passengers correctly predicted". This is by far the most common form
of accuracy for binary classification.

As an example, imagine we were predicting a small data set of five observations.

In this case, our model correctly predicted three out of five values, so the accuracy based on this
prediction set would be 60%.

Again, scikit-learn has a handy function we can use to calculate accuracy: metrics.accuracy_score(). The
function accepts two parameters, y_true and y_pred, which are the actual values and our predicted
values respectively, and returns our accuracy score.
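
The five-observation example can be reproduced with a short sketch. The specific label values below are assumptions chosen so that three of the five predictions are correct.

```python
from sklearn.metrics import accuracy_score

# Five made-up observations (assumption): positions 0, 1 and 4 are predicted correctly.
y_true = [0, 1, 0, 1, 1]
y_pred = [0, 1, 1, 0, 1]

accuracy = accuracy_score(y_true, y_pred)

# The same figure computed by hand: correct predictions divided by total predictions.
manual = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

print(accuracy)  # 0.6
```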

Instructions

• Instantiate a new LogisticRegression() object, lr


• Fit the model using train_X and train_y
• Make predictions using test_X and assign the results to predictions
• Use accuracy_score() to compare test_y and predictions, assigning the result to accuracy
• Print the accuracy variable

Solutions

III.1.8 Using Cross Validation for More Accurate Error Measurement

Our model has an accuracy score of 81.0% when tested against our 20% test set. Given that this data
set is quite small, there is a good chance that our model is overfitting, and will not perform as well
on totally unseen data.

To give us a better understanding of the real performance of our model, we can use a technique called
cross validation to train and test our model on different splits of our data, and then average the
accuracy scores.

The most common form of cross validation, and the one we will be using, is called k-fold cross validation.
'Fold' refers to each different iteration that we train our model on, and 'k' just refers to the number of
folds. In the diagram above, we have illustrated k-fold validation where k is 5.

We will use scikit-learn's model_selection.cross_val_score() function to automate the process. The
basic syntax for cross_val_score() is:

• estimator is a scikit-learn estimator object, like the LogisticRegression() objects we have been
creating
• X is all features from our data set
• y is the target variable
• cv specifies the number of folds

The function returns a numpy ndarray of the accuracy scores of each fold.

It's worth noting, the cross_val_score() function can use a variety of cross validation techniques and
scoring types, but it defaults to k-fold validation and accuracy scores for our input types.
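
A minimal sketch of the call is shown below, using 100 synthetic rows (an assumption) and 5 folds. When cv is an integer and the estimator is a classifier, scikit-learn uses a stratified variant of k-fold, which keeps the class proportions similar in every fold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic features and a binary target loosely tied to the first feature (assumption).
rng = np.random.default_rng(0)
all_X = rng.normal(size=(100, 3))
all_y = (all_X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

lr = LogisticRegression()

# One accuracy score per fold: the model is trained on 4 folds and scored on the 5th.
scores = cross_val_score(lr, all_X, all_y, cv=5)
accuracy = np.mean(scores)

print(scores)
print(accuracy)
```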

Instructions

• Instantiate a new LogisticRegression() object, lr


• Use model_selection.cross_val_score() to perform cross-validation on our data and assign the
results to scores:
• Use the newly created lr as the estimator
• Use all_X and all_y as the input data
• Specify 10 folds to be used
• Use the numpy.mean() function to calculate the mean of scores and assign the result to accuracy.
• Print the variables scores and accuracy

Solutions

III.1.9 Making Predictions on Unseen Data

From the results of our k-fold validation, you can see that the accuracy number varies with each fold -
ranging between 76.4% and 87.6%. This demonstrates why cross validation is important.

As it happens, our average accuracy score was 80.2%, which is not far from the 81.0% we got from
our simple train/test split, however this will not always be the case, and you should always use cross-
validation to make sure the error metrics you are getting from your model are accurate.

We are now ready to train our final model on all of the training data and then make predictions on
our unseen holdout data, or what Kaggle calls the 'test' data set.

Instructions

• Instantiate a new LogisticRegression() object, lr


• Use the fit() method to train the model lr using all of the Kaggle training data: all_X and all_y
• Make predictions using the holdout data and assign the result to holdout_predictions

Solutions

III.1.10 Creating a Submission File

The last thing we need to do is create a submission file. Each Kaggle competition can have slightly
different requirements for the submission file. Here's what is specified on the Titanic competition
evaluation page:

You should submit a csv file with exactly 418 entries plus a header row. Your submission will show an error if
you have extra columns (beyond PassengerId and Survived) or rows.

The file should have exactly 2 columns:

• PassengerId (sorted in any order)


• Survived (contains your binary predictions: 1 for survived, 0 for deceased)

The table below shows this in a slightly easier to understand format, so we can visualize what we are
aiming for.

We will need to create a new dataframe that contains the holdout_predictions we created in the
previous screen and the PassengerId column from the holdout dataframe. We don't need to worry about
matching the data up, as both of these remain in their original order.

To do this, we can pass a dictionary to the pandas.DataFrame() function:

Finally, we'll use the DataFrame.to_csv() method to save the dataframe to a CSV file. We need to make
sure the index parameter is set to False, otherwise we will add an extra column to our CSV.
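
The two steps above can be sketched as follows. The three passenger IDs and predictions are made up (an assumption), and the CSV is written to an in-memory buffer so the example leaves no file behind. Passing a filename such as "submission.csv" to to_csv() would write to disk instead.

```python
import io

import pandas as pd

# Made-up holdout rows and predictions (assumption); the real holdout set has 418 rows.
holdout = pd.DataFrame({"PassengerId": [892, 893, 894]})
holdout_predictions = [0, 1, 0]

# Dictionary keys become the column names of the new dataframe.
submission = pd.DataFrame({
    "PassengerId": holdout["PassengerId"],
    "Survived": holdout_predictions,
})

# index=False keeps the dataframe's row index out of the CSV.
buffer = io.StringIO()
submission.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```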

Instructions

• Create a dataframe submission that matches Kaggle's specification


• Use the to_csv() method to save the submission dataframe using the filename submission.csv,
using the documentation to look up the correct syntax

Solutions

III.1.11 Making Our First Submission to Kaggle

You can download the submission file you just created (when working locally, it will be in the same
directory as your notebook).

Now that we have our submission file, we can start our submission to Kaggle by clicking the blue
'Submit Predictions' button on the competition page.

You will then be prompted to upload your CSV file, and add a brief description of your submission.
When you make your submission, Kaggle will process your predictions and give you your accuracy for
the holdout data and your ranking.

When it is finished processing you will see our first submission gets an accuracy score of 0.75598, or
75.6%.

The fact that our accuracy on the holdout data is 75.6% compared with the 80.2% accuracy we got
with cross-validation indicates that our model is overfitting slightly to our training data.

At the time of writing, accuracy of 75.6% gives a rank of 6,663 out of 7,954. It's easy to look at
Kaggle leaderboards after your first submission and get discouraged, but keep in mind that this is just
a starting point.

It's also very common to see a small number of scores of 100% at the top of the Titanic leaderboard
and think that you have a long way to go. In reality, anyone scoring above 90% on this competition is
likely cheating (it's easy to look up the names of the passengers in the holdout set online and see if
they survived).

There is a great analysis on Kaggle, How am I doing with my score, which uses a few different strategies
and suggests a minimum score for this competition is 62.7% (achieved by presuming that every
passenger died) and a maximum of around 82%. We are a little over halfway between the minimum
and maximum, which is a great starting point.

There are many things we can do to improve the accuracy of our model. Here are some that we will
cover in the next two missions of this course:

• Improving the features:


• Feature Engineering: Create new features from the existing data
• Feature Selection: Select the most relevant features to reduce noise and overfitting
• Improving the model:
• Model Selection: Try a variety of models to improve performance
• Hyperparameter Optimization: Optimize the settings within each particular machine learning
model

III.2 Feature Preparation, Selection and Engineering

In the last mission, we made our first submission to Kaggle, getting an accuracy score of 75.6%. While
this is a good start, there is definitely room for improvement. There are two main areas we can focus
on to boost the accuracy of our predictions:

• Improving the features we train our model on


• Improving the model itself

In this mission, we're going to focus on working with the features used in our model.

We'll start by looking at feature selection. Feature selection is important because it helps to exclude
features which are not good predictors, or features that are closely related to each other. Both of these
will cause our model to be less accurate, particularly on previously unseen data.

The diagram below illustrates this. The red dots represent the data we are trying to predict, and each
of the blue lines represents a different model.

The model on the left is overfitting, which means the model represents the training data too closely,
and is unlikely to predict well on unseen data, like the holdout data for our Kaggle competition.

The model on the right is well-fit. It captures the underlying pattern in the data without the detailed
noise found just in the training set. A well fit model is likely to make accurate predictions on previously
unseen data. The key to creating a well-fit model is to select the right balance of features, and to create
new features to train your model.

In the previous mission, we trained our model using data about the age, sex and class of the passengers
on the Titanic. Let's start by using the functions we created in that mission to add the columns we had
at the end of the first mission.

Remember that any modifications we make to our training data (train.csv) we also have to make to our
holdout data (test.csv).

Instructions

• Use the process_age() function:


• To convert the Age column in train, assigning the result to train
• To convert the Age column in holdout, assigning the result to holdout
• Create a for loop which iterates over the column names "Age_categories", "Pclass", and "Sex". In
each iteration:
• Use the create_dummies() function to process the train dataframe for the given column, assigning
the result to train
• Use the create_dummies() function to process the holdout dataframe for the given column,
assigning the result to holdout
• Use the print() function to display the columns in train using train.columns
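
The loop described above might be sketched like this. The body of create_dummies() and the three-row dataframes are assumptions for illustration, and the Age_categories column is filled in by hand here rather than produced by process_age().

```python
import pandas as pd


def create_dummies(df, column_name):
    """Append one indicator column per unique value of df[column_name] (assumed body)."""
    dummies = pd.get_dummies(df[column_name], prefix=column_name)
    return pd.concat([df, dummies], axis=1)


# Made-up rows (assumption); Age_categories is written out directly for brevity.
train = pd.DataFrame({
    "Age_categories": ["Child", "Adult", "Missing"],
    "Pclass": [3, 1, 2],
    "Sex": ["male", "female", "female"],
})
holdout = pd.DataFrame({
    "Age_categories": ["Adult", "Missing", "Child"],
    "Pclass": [1, 2, 3],
    "Sex": ["female", "male", "male"],
})

# Apply the same transformation to both dataframes so their columns stay in step.
for column in ["Age_categories", "Pclass", "Sex"]:
    train = create_dummies(train, column)
    holdout = create_dummies(holdout, column)

print(train.columns)
```

One thing to watch for with this approach: if a category appears in only one of the two dataframes, the resulting dummy columns will differ between them.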

Solutions

III.2.1 Preparing more Features

Our model in the previous mission was based on three columns from the original data: Age, Sex, and
Pclass. As you saw when you printed the column names in the previous screen, there are a number of
other columns that we haven't yet used. To make it easier to reference, the output from the previous
screen is copied below:

The last nine rows of the output are dummy columns we created, but in the first three rows we can
see there are a number of features we haven't yet utilized. We can ignore PassengerId, since this
is just a column Kaggle have added to identify each passenger and calculate scores. We can also
ignore Survived, as this is what we're predicting, as well as the three columns we've already used.

Here is a list of the remaining columns (with a brief description), followed by 10 randomly selected
passengers and their data from those columns, so we can refamiliarize ourselves with the data.

• SibSp - The number of siblings or spouses the passenger had aboard the Titanic
• Parch - The number of parents or children the passenger had aboard the Titanic
• Ticket - The passenger's ticket number
• Fare - The fare the passenger paid
• Cabin - The passenger's cabin number
• Embarked - The port where the passenger embarked (C=Cherbourg, Q=Queenstown,
S=Southampton)

     Name                                       SibSp  Parch  Ticket              Fare      Cabin    Embarked
680  Peters, Miss. Katie                        0      0      330935              8.1375    NaN      Q
333  Vander Planke, Mr. Leo Edmondus            2      0      345764              18.0000   NaN      S
816  Heininen, Miss. Wendla Maria               0      0      STON/O2. 3101290    7.9250    NaN      S
310  Hays, Miss. Margaret Bechstein             0      0      11767               83.1583   C54      C
291  Bishop, Mrs. Dickinson H (Helen Walton)    1      0      11967               91.0792   B49      C
33   Wheadon, Mr. Edward H                      0      0      C.A. 24579          10.5000   NaN      S
761  Nirva, Mr. Iisakki Antino Aijo             0      0      SOTON/O2 3101272    7.1250    NaN      S
305  Allison, Master. Hudson Trevor             1      2      113781              151.5500  C22 C26  S
210  Ali, Mr. Ahmed                             0      0      SOTON/O.Q. 3101311  7.0500    NaN      S
272  Mellinger, Mrs. (Elizabeth Anne Maidment)  0      1      250644              19.5000   NaN      S

At first glance, both the Name and Ticket columns look to be unique to each passenger. We will come
back to these columns later, but for now we'll focus on the other columns.

We can use the DataFrame.describe() method to give us some more information on the values within
each remaining column.
Of these, SibSp, Parch and Fare look to be standard numeric columns with no missing values. Cabin has
values for only 204 of the 891 rows, and even then most of the values are unique, so for now we will
leave this column alone as well. Embarked looks to be a standard categorical column with 3 unique
values, much like Pclass was, except that there are two missing values. We can easily fill these two
missing values with the most common value, "S", which occurs 644 times.

Looking at our numeric columns, we can see a big difference between the range of each. SibSp has
values between 0-8, Parch between 0-6, and Fare is on a dramatically different scale, with values
ranging from 0-512. In order to make sure these values are equally weighted within our model, we'll
need to rescale the data.

Rescaling simply stretches or shrinks the data as needed to be on the same scale, in our case between
0 and 1.
In the diagram above, the three columns have different minimum and maximum values before
rescaling.

After rescaling, the values in each feature have been compressed or stretched so that they are all
on the same scale: they have the same minimum and maximum, and the relationship between
each point is still the same relative to other points in that feature. You can now easily see that the
data represented in each column is identical.

Within scikit-learn, the preprocessing.minmax_scale() function allows us to quickly and easily
rescale our data:
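
A minimal sketch of that call is shown here. The dataframe and its values are hypothetical, chosen only to mimic columns on very different scales; the "_scaled" column names follow the convention used in the exercise below.

```python
import pandas as pd
from sklearn.preprocessing import minmax_scale

# Hypothetical values with very different ranges per column.
df = pd.DataFrame({"SibSp": [0, 1, 8], "Fare": [0.0, 256.0, 512.0]})

for col in ["SibSp", "Fare"]:
    # Each column is mapped linearly so its minimum becomes 0 and its maximum 1.
    df[col + "_scaled"] = minmax_scale(df[col].astype(float))

print(df)
```

The values are cast to float first because minmax_scale() works on floating point data and would otherwise emit a conversion warning for integer columns in some scikit-learn versions.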



Let's process the Embarked, SibSp, Parch and Fare columns in both our train and holdout dataframes.

Instructions

• For both the train and holdout dataframes:
  • Use the Series.fillna() method to replace any missing values in the Embarked column with “S”
  • Use our create_dummies() function to create dummy columns for the Embarked column
  • Use minmax_scale() to rescale the SibSp, Parch, and Fare columns, assigning the results back to
  new columns SibSp_scaled, Parch_scaled and Fare_scaled respectively

Solutions

III.2.2 Determining the Most Relevant Features

In order to select the best-performing features, we need a way to measure which of our features are
relevant to our outcome - in this case, the survival of each passenger. One effective way is by training a
logistic regression model using all of our features, and then looking at the coefficients of each feature.

The scikit-learn LogisticRegression class has an attribute in which coefficients are stored after the
model is fit, LogisticRegression.coef_. We first need to train our model, after which we can access this
attribute.

The coef_ attribute holds a NumPy array of coefficients, in the same order as the features that were
used to fit the model. To make these easier to interpret, we can convert the coefficients to a pandas
series, adding the column names as the index:
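
The conversion might be sketched as follows. The data is synthetic (generated with scikit-learn's make_classification) and the column names are hypothetical stand-ins for the Titanic features; only the coef_ handling is the point.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the Titanic features (hypothetical column names).
X, y = make_classification(n_samples=200, n_features=4, n_informative=2,
                           n_redundant=0, random_state=0)
columns = ["feat_a", "feat_b", "feat_c", "feat_d"]
train = pd.DataFrame(X, columns=columns)

lr = LogisticRegression()
lr.fit(train[columns], y)

# coef_ has shape (1, n_features) for a binary target, so take row 0.
coefficients = lr.coef_
feature_importance = pd.Series(coefficients[0], index=train[columns].columns)
print(feature_importance)
```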

We'll now fit a model and plot the coefficients for each feature.

Instructions

• Instantiate a LogisticRegression() object


• Fit the LogisticRegression object using the columns from the list columns from the train dataframe
and the target column Survived
• Use the coef_ attribute to retrieve the coefficients of the features, and assign the results
to coefficients
• Create a series object using coefficients, with the feature column names as the index and assign it
to feature_importance
• Use the Series.plot.barh() method to plot the feature_importance series

Solutions

III.2.3 Training a model using relevant features



The plot we generated in the last screen showed a range of both positive and negative values. Whether
the value is positive or negative isn't as important in this case, relative to the magnitude of the value.
If you think about it, this makes sense. A feature that indicates strongly whether a passenger died is
just as useful as a feature that indicates strongly that a passenger survived, given they are mutually
exclusive outcomes.

To make things easier to interpret, we've altered the plot to show all positive values, and sorted the
bars in order of size:
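
A minimal sketch of that transformation, using made-up coefficient values rather than the ones from the real model:

```python
import pandas as pd

# Hypothetical coefficients, for illustration only.
feature_importance = pd.Series(
    {"Sex_male": -1.3, "Pclass_1": 0.6, "SibSp_scaled": -0.9})

# Take magnitudes, then sort ascending so the largest bar is drawn at the top
# of a horizontal bar chart.
ordered_feature_importance = feature_importance.abs().sort_values()
print(ordered_feature_importance)

# ordered_feature_importance.plot.barh() would draw the chart (needs matplotlib).
```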

We'll train a new model using the 8 features with the highest scores and check our accuracy using
cross validation.

Instructions

• Instantiate a LogisticRegression() object


• Use the model_selection.cross_val_score() function and assign the returned object to scores, using:
• The columns specified in columns and all rows from the train dataframe
• A cv parameter of 10
• Calculate the mean of the cross validation scores and assign the results to accuracy
• Use the print() function to display the variable accuracy

Solutions
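
A possible shape for this step, sketched on synthetic data. The feature names are hypothetical; in the exercise, columns would hold the eight selected Titanic features and train would be the prepared dataframe.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the prepared training data (hypothetical columns).
X, y = make_classification(n_samples=300, n_features=8, n_informative=4,
                           random_state=1)
columns = [f"feature_{i}" for i in range(8)]
train = pd.DataFrame(X, columns=columns)
train["Survived"] = y

all_X = train[columns]
all_y = train["Survived"]

lr = LogisticRegression()
# cv=10 splits the data into 10 folds and returns one accuracy per fold.
scores = cross_val_score(lr, all_X, all_y, cv=10)
accuracy = scores.mean()
print(accuracy)
```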

III.2.4 Submitting our Improved Model to Kaggle

The cross validation score of 81.48% is marginally higher than the cross validation score for the model
we created in the previous mission, which had a score of 80.2%.

Hopefully, this improvement will translate to previously unseen data. Let's train a model using the
columns from the previous step, make some predictions on the holdout data and submit it to Kaggle
for scoring.

Instructions

• Instantiate a LogisticRegression() object and fit it using all_X and all_y


• Use the predict() method to make predictions using the same columns in the holdout dataframe,
and assign the result to holdout_predictions
• Create a dataframe submission with two columns:
• PassengerId, with the values from the PassengerId column of the holdout dataframe
• Survived, with the values from holdout_predictions
• Use the DataFrame.to_csv method to save the submission dataframe to the filename submission_1.csv

Solutions
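
The submission step could look roughly like this. The data is synthetic and the PassengerId values are placeholders; passing index=False to to_csv() is an assumption added here so that the file contains only the two columns described above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-ins for the prepared train and holdout dataframes.
columns = ["feature_0", "feature_1", "feature_2", "feature_3"]
X, y = make_classification(n_samples=120, n_features=4, random_state=0)
train = pd.DataFrame(X[:100], columns=columns)
train["Survived"] = y[:100]
holdout = pd.DataFrame(X[100:], columns=columns)
holdout["PassengerId"] = range(892, 912)  # placeholder ids

all_X = train[columns]
all_y = train["Survived"]

# Fit on the training data, then predict on the same columns of the holdout data.
lr = LogisticRegression()
lr.fit(all_X, all_y)
holdout_predictions = lr.predict(holdout[columns])

submission = pd.DataFrame({"PassengerId": holdout["PassengerId"],
                           "Survived": holdout_predictions})
# index=False keeps the row index out of the file (assumed requirement).
submission.to_csv("submission_1.csv", index=False)
```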



III.2.5 Engineering a New Feature Using Binning

When you submit the CSV from the previous step to Kaggle, you'll see that the score is 77.0%, which
at the time of writing equates to jumping about 1,500 places up the leaderboard (this will vary as the
leaderboard is always changing). It's only a small improvement, but we're moving in the right direction.

A lot of the gains in accuracy in machine learning come from Feature Engineering. Feature engineering
is the practice of creating new features from your existing data.

One common way to engineer a feature is using a technique called binning. Binning is when you take
a continuous feature, like the fare a passenger paid for their ticket, and separate it out into several
ranges (or 'bins'), turning it into a categorical variable.
This can be useful when there are patterns in the data that are non-linear and you're using a linear
model (like logistic regression). We actually used binning in the previous mission when we dealt
with the Age column, although we didn't use the term.

Let's look at histograms of the Fare column for passengers who died and survived, and see if
there are patterns that we can use when creating our bins.

Looking at the values, it looks like we can separate the feature into four bins to capture some patterns
from the data:

• 0-12
• 12-50
• 50-100
• 100+

Like in the previous mission, we can use the pandas.cut() function to create our bins.

Instructions

• Using the process_age() function as a model, create a function process_fare() that uses the
pandas.cut() function to create bins for the Fare column and assign the results to a new column
called Fare_categories
  • We have already dealt with missing values in the Fare column, so you won't need the line that
  uses fillna()
• Use the process_fare() function on both the train and holdout dataframes, creating the four 'bins':
  • 0-12, for values between 0 and 12
  • 12-50, for values between 12 and 50
  • 50-100, for values between 50 and 100
  • 100+, for values between 100 and 1000
• Use the create_dummies() function we created earlier in the mission on both
the train and holdout dataframes to create dummy columns based on our new fare bins

Solutions
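
A minimal sketch of process_fare(), assuming a signature that takes the cut points and labels as arguments. The include_lowest=True argument is a choice made here, not something stated in the exercise: with the pandas.cut() defaults, a fare of exactly 0 would fall outside the first bin.

```python
import pandas as pd

def process_fare(df, cut_points, label_names):
    # Assumed signature, modeled loosely on process_age().
    df = df.copy()
    df["Fare_categories"] = pd.cut(df["Fare"], cut_points,
                                   labels=label_names, include_lowest=True)
    return df

cut_points = [0, 12, 50, 100, 1000]
label_names = ["0-12", "12-50", "50-100", "100+"]

# Toy stand-in for the train dataframe; the holdout dataframe would be
# processed with the same call.
train = pd.DataFrame({"Fare": [7.25, 26.0, 83.1583, 151.55]})
train = process_fare(train, cut_points, label_names)
print(train)
```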

III.2.6 Engineering Features From Text Columns

Another way to engineer features is by extracting data from text columns. Earlier, we decided that
the Name and Cabin columns weren't useful by themselves, but what if there is some data there
we could extract? Let's take a look at a random sample of rows from those two columns:

While in isolation the cabin number of each passenger will be reasonably unique to each, we can see
that the format of the cabin numbers is one letter followed by two numbers. It seems like the letter is
representative of the type of cabin, which could be useful data for us. We can use the pandas
Series.str accessor and then subset the first character using brackets:
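
A short sketch of that indexing on a hand-made series. The cabin values are taken from the sample table earlier in this section; the fillna("Unknown") step anticipates the exercise below.

```python
import pandas as pd

cabin = pd.Series(["C54", "B49", None, "C22 C26"])

# .str[0] takes the first character of each string; missing values stay missing
# until they are filled explicitly.
cabin_type = cabin.str[0].fillna("Unknown")
print(cabin_type.tolist())
```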

Looking at the Name column, there is a title like 'Mr' or 'Mrs' within each, as well as some less
common titles, like the 'Countess' from the final row of our table above. By spending some time
researching the different titles, we can categorize these into six types:

• Mr
• Mrs
• Master
• Miss
• Officer
• Royalty

We can use the Series.str.extract method and a regular expression to extract the title from each
name and then use the Series.map() method and a predefined dictionary to simplify the titles.

Instructions

• Use extract(), map() and the dictionary titles to categorize the titles for the holdout dataframe and
assign the results to a new column Title
• For both the train and holdout dataframes:
• Use the str accessor to extract the first letter from the Cabin column and assign the result to a
new column Cabin_type
• Use the fillna() method to fill any missing values in Cabin_type with "Unknown"
• For the newly created columns Title and Cabin_type, use create_dummies() to create dummy
columns for both the train and holdout dataframes

Solutions
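
The title extraction might be sketched like this. The titles dictionary below is a shortened, assumed version (the course's full dictionary is not reproduced in this section), and the regular expression is one plausible pattern for "a word followed by a period".

```python
import pandas as pd

# Assumed, abbreviated mapping from raw titles to the six categories.
titles = {"Mr": "Mr", "Mrs": "Mrs", "Miss": "Miss", "Master": "Master",
          "Col": "Officer", "Countess": "Royalty"}

# A few names for illustration.
holdout = pd.DataFrame({"Name": [
    "Peters, Miss. Katie",
    "Allison, Master. Hudson Trevor",
    "Bishop, Mrs. Dickinson H (Helen Walton)",
    "Rothes, the Countess. of (Lucy Noel Martha Dyer-Edwards)",
]})

# Capture the word that sits between a space and a period, e.g. " Miss.".
extracted_titles = holdout["Name"].str.extract(r" ([A-Za-z]+)\.", expand=False)
holdout["Title"] = extracted_titles.map(titles)
print(holdout["Title"].tolist())
```

Any raw title missing from the dictionary becomes NaN after map(), which is a quick way to spot titles that still need a category.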
III.2.7 Finding Correlated Features

We now have 34 possible feature columns we can use to train our model. One thing to be aware of
as you start to add more features is a concept called collinearity. Collinearity occurs where more than
one feature contains data that are similar.

The effect of collinearity is that your model will overfit - you may get great results on your test data set,
but then the model performs worse on unseen data (like the holdout set).

One easy way to understand collinearity is with a simple binary variable like the Sex column in our
dataset. Every passenger in our data is categorized as either male or female, so 'not male' is exactly the
same as 'female'.

As a result, when we created our two dummy columns from the categorical Sex column, we
actually created two columns that carry the same information: each one is the exact inverse of the
other. This will happen whenever we create dummy columns, and is called the dummy variable
trap. The easy solution is to choose one column to drop any time you make dummy columns.

Collinearity can happen in other places, too. A common way to spot collinearity is to plot
correlations between each pair of variables in a heatmap. An example of this style of plot is below:

The darker squares, whether the darker red or darker blue, indicate pairs of columns that have higher
correlation and may lead to collinearity. The easiest way to produce this plot is using the
DataFrame.corr() method to produce a correlation matrix, and then use the Seaborn library's
seaborn.heatmap() function to plot the values:
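
A minimal sketch of the two steps, on a hand-made dataframe with hypothetical values. The plotting call is wrapped in a function and not executed here, since it assumes seaborn and matplotlib are installed.

```python
import pandas as pd

# Toy data (hypothetical values) with the two Sex dummy columns.
train = pd.DataFrame({"Sex": ["male", "female", "female", "male", "female"],
                      "Fare_scaled": [0.1, 0.9, 0.4, 0.2, 0.7]})
dummies = pd.get_dummies(train["Sex"], prefix="Sex").astype(int)
train = pd.concat([train, dummies], axis=1)

columns = ["Sex_female", "Sex_male", "Fare_scaled"]
correlations = train[columns].corr()
print(correlations)

def plot_heatmap(corr):
    # Assumes seaborn (and matplotlib) are available; imported lazily on purpose.
    import seaborn as sns
    return sns.heatmap(corr)

# plot_heatmap(correlations) would draw the matrix as a heatmap.
```

In the printed matrix, Sex_female and Sex_male have a correlation of -1, which is the dummy variable trap described above showing up numerically.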

The example plot above was produced using a code example from seaborn's documentation which
produces a correlation heatmap that is easier to interpret than the default output of heatmap(). We've
created a function containing that code to make it easier for you to plot the correlations between the
features in our data.

Instructions

Use the plot_correlation_heatmap() function to produce a heatmap for the train dataframe, using only
the features in the list columns.

Solutions

III.2.8 Final Feature Selection using RFECV

The plot we created in the previous screen is reproduced below:


We can see that there is a high correlation between Sex_female/Sex_male and Title_Mr/Title_Mrs.
We will remove the columns Sex_female and Sex_male since the title data may be more nuanced.

Apart from that, we should remove one of each of our dummy variables to reduce the collinearity in
each. We'll remove:

• Pclass_2
• Age_categories_Teenager
• Fare_categories_12-50
• Title_Master
• Cabin_type_A

In an earlier step, we manually used the logit coefficients to select the most relevant features. An
alternate method is to use one of scikit-learn's inbuilt feature selection classes. We will be using
the feature_selection.RFECV class which performs recursive feature elimination with cross-validation.

The RFECV class starts by training a model using all of your features and scores it using cross validation.
It then uses the logit coefficients to eliminate the least important feature, and trains and scores a new
model. At the end, the class looks at all the scores, and selects the set of features which scored highest.

Like the LogisticRegression class, RFECV must first be instantiated and then fit. The first parameter
when creating the RFECV object must be an estimator, and we need to use the cv parameter to
specify the number of folds for cross-validation.

Once the RFECV object has been fit, we can use the RFECV.support_ attribute to access a boolean
mask of True and False values which we can use to generate a list of optimized columns:
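
A sketch of the instantiate, fit and subset sequence on synthetic data. The feature names are hypothetical; with the real data, all_X would hold the candidate Titanic features.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the candidate features (hypothetical names).
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           n_redundant=0, random_state=0)
all_X = pd.DataFrame(X, columns=[f"feature_{i}" for i in range(6)])
all_y = pd.Series(y)

lr = LogisticRegression()
selector = RFECV(lr, cv=10)
selector.fit(all_X, all_y)

# support_ is a boolean mask with one entry per input column.
optimized_columns = all_X.columns[selector.support_]
print(list(optimized_columns))
```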

Instructions

• Instantiate a LogisticRegression() object, lr


• Instantiate a RFECV() object selector using the newly created lr object and cv=10 as parameters
• Use the fit() method to fit selector using all_X and all_y
• Use the support_ attribute of selector to subset all_X.columns, and assign the result to
optimized_columns

Because of the computation involved in this exercise, code running may take longer than other screens.

Solutions

III.2.9 Training A Model Using our Optimized Columns

The RFECV() selector returned only four columns:

Let's train a model using cross validation using these columns and check the score.

Instructions

• Instantiate LogisticRegression() object


• Use the model_selection.cross_val_score() function and assign the results to scores, using:
• all_X and all_y
• A cv parameter of 10
• Calculate the mean of the cross validation scores and assign the results to accuracy

Solutions

III.2.10 Submitting our Model to Kaggle


This four-feature model scores 82.3%, a modest improvement compared to the 81.5% from our earlier
model. Let's train a model on these columns, make predictions on the holdout set, save a submission
file and see what score we get from Kaggle.

Instructions

• Instantiate a LogisticRegression() object and fit it using all_X and all_y


• Use the predict() method to make predictions using the same columns in the holdout dataframe,
and assign the result to holdout_predictions
• Create a dataframe submission with two columns:
• PassengerId, with the values from the PassengerId column of the holdout dataframe
• Survived, with the values from holdout_predictions
• Use the DataFrame.to_csv method to save the submission dataframe to the filename submission_2.csv.

Solutions

You can download the submission file we just created and submit it to Kaggle. The score this submission
gets is 78.0%, which is equivalent to a jump of roughly 1,000 spots (again, this will vary as submissions
are constantly being made to the leaderboard).

By preparing, engineering and selecting features, we have increased our accuracy by 2.4%. When
working in Kaggle competitions, you should spend a lot of time experimenting with features, particularly
feature engineering. Here are some ideas that you can use to work with features for this competition:



• Use SibSp and Parch to explore total relatives onboard
• Create combinations of multiple columns, for instance Pclass + Sex
• See if you can extract useful data out of the Ticket column
• Try different combinations of features to see if you can identify features that overfit less than others

In the next mission in this course, we'll look at selecting and optimizing different models to improve
our score.

III.3 Model Selection and Tuning
III.3.1 Model selection

In the previous mission, we worked to optimize our predictions by creating and selecting the features
used to train our model. The other half of the optimization puzzle is to optimize the model itself— or
more specifically, the algorithm used to train our model.

So far, we've been using the logistic regression algorithm to train our models, however there are
hundreds of different machine learning algorithms from which we can choose. Each algorithm has
different strengths and weaknesses, and so we need to select the algorithm that works best with our
specific data— in this case our Kaggle competition.

The process of selecting the algorithm which gives the best predictions for your data is called model
selection.

In this mission, we're going to work with two new algorithms: k-nearest neighbors and random forests.

Before we begin, we'll need to import in the data. To save time, we have saved the features we created
in the previous mission as CSV files, train_modified.csv and holdout_modified.csv.

Instructions

• Import train_modified.csv into a pandas dataframe and assign the result to train
• Import holdout_modified.csv into a pandas dataframe and assign the result to holdout

Solutions

III.3.2 Training a Baseline Model

We're going to train our models using all the columns in the train dataframe. This will cause a small
amount of overfitting due to collinearity (as we discussed in the previous mission), but having more
features will allow us to more thoroughly compare algorithms.

So we have something to compare to, we're going to train a logistic regression model like in the
previous two missions. We'll use cross validation to get a baseline score.

Instructions

• Instantiate a linear_model.LogisticRegression class


• Use the model_selection.cross_val_score() function to train and test a model assigning the result
to scores, using:
• The LogisticRegression object you just created
• all_X and all_y as the X and y parameters
• 10 folds
• Calculate the mean of scores and assign the result to accuracy_lr

Solutions

III.3.3 Training a Model using K-Nearest Neighbors

The logistic regression baseline model from the previous screen scored 82.5%.


The logistic regression algorithm works by calculating linear relationships between the features and the
target variable and using those to make predictions. Let's look at an algorithm that makes predictions
using a different method.

The k-nearest neighbors algorithm finds the observations in our training set most similar to the
observation in our test set, and uses the average outcome of those 'neighbor' observations to make a
prediction. The 'k' is the number of neighbor observations used to make the prediction.

The plots below show three simple k-nearest neighbors models where there are two features shown
on each axis, and two outcomes, red and green.

• In the first plot, the value of k is 1. The green dot is therefore the closest neighbor to the gray dot,
making the prediction green
• In the second plot, the value of k is 3. The closest 3 neighbors to our gray dot are used (2 red vs 1
green), making the prediction red
• In the third plot, the value of k is 5. The closest 5 neighbors to our gray dot are used (3 red vs 2
green), making the prediction red

If you'd like to learn more about the k-nearest neighbors algorithm, you might like to check out our
free Introduction to K-Nearest Neighbors mission.

Just like it does for logistic regression, scikit-learn has a class that makes it easy to use k-nearest
neighbors to make predictions, neighbors.KNeighborsClassifier.

Scikit-learn's use of object-oriented design makes it easy to substitute one model for another. The
syntax to instantiate a KNeighborsClassifier is very similar to the syntax we use for logistic regression.

The optional n_neighbors argument sets the value of k when predictions are made. The default value
of n_neighbors is 5, but we're going to start by building a simple model that uses the closest neighbor
to make our predictions.

Instructions

• Instantiate a neighbors.KNeighborsClassifier object, setting the n_neighbors argument to 1.


• Use the model_selection.cross_val_score() function to train and test a model assigning the result
to scores, using:
• The KNeighborsClassifier object you just created.
• all_X and all_y as the X and y parameters.
• 10 folds
• Calculate the mean of scores and assign the result to accuracy_knn

Solutions
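
One way this could be written, sketched on synthetic data in place of the modified Titanic dataframes:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for all_X and all_y.
all_X, all_y = make_classification(n_samples=300, n_features=6, random_state=0)

# n_neighbors=1: each prediction copies the label of the single closest point.
knn = KNeighborsClassifier(n_neighbors=1)
scores = cross_val_score(knn, all_X, all_y, cv=10)
accuracy_knn = scores.mean()
print(accuracy_knn)
```

Only the estimator changes compared with the logistic regression baseline; the cross_val_score() call is identical.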

III.3.4 Exploring Different K Values

The k-nearest neighbors model we trained in the previous screen had an accuracy score of 78.3%,
worse than our baseline score of 82.5%.

Besides pure model selection, we can vary the settings of each model— for instance the value of k in
our k-nearest neighbors model. This is called hyperparameter optimization.

We can use a loop and Python's inbuilt range() class to iterate through different values for k and
calculate the accuracy score for each different value. We will only want to test odd values for k to
avoid ties, where both 'survived' and 'died' outcomes would have the same number of neighbors.

This is the syntax we would use to get odd values between 1-7 from range():
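
A short sketch of that call:

```python
# start=1, stop=8 (exclusive), step=2
odd_values = list(range(1, 8, 2))

for k in odd_values:
    print(k)
```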


Note that we use the arguments (1,8,2) to get values between 1 and 7, since the created range() object
contains numbers up to but not including the 8.

Let's use this technique to calculate the accuracy of our model for values of k from 1-49, storing the
results in a dictionary.

To make the results easier to understand, we'll finish by plotting the scores. We have provided a helper
function, plot_dict() which you can use to easily plot the dictionary.

Instructions

• Use a for loop and the range class to iterate over odd values of k from 1-49, and in each iteration:
• Instantiate a KNeighborsClassifier object with the value of k for the n_neighbors argument
• Use cross_val_score to create a list of scores using the newly created KNeighborsClassifier object,
using all_X, all_y, and cv=10 as the arguments
• Calculate the mean of the list of scores
• Add the mean of the scores to the dictionary knn_scores, using k for the key
• Use the plot_dict() helper function to plot the knn_scores dictionary
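
The loop described above could be sketched as follows, again on synthetic data. The plot_dict() helper is not shown in this section, so it is not called; the last lines simply pick out the best k from the dictionary instead.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for all_X and all_y.
all_X, all_y = make_classification(n_samples=300, n_features=6, random_state=0)

knn_scores = dict()
for k in range(1, 50, 2):  # odd values 1, 3, ..., 49
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, all_X, all_y, cv=10)
    knn_scores[k] = scores.mean()

# The key with the highest mean cross-validation accuracy.
best_k = max(knn_scores, key=knn_scores.get)
print(best_k, knn_scores[best_k])
```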

III.3.5 Automating Hyperparameter Optimization with Grid Search

Looking at our plot from the previous screen we can see that a k value of 19 gave us our best score,
and checking the knn_scores dictionary we can see that the score was 82.4%, identical to our baseline
(if we didn't round the numbers you would see that it's actually 0.01% less accurate).
The technique we just used is called grid search - we trained a number of models across a 'grid' of
values and then searched for the model that gave us the highest accuracy.

Scikit-learn has a class to perform grid search, model_selection.GridSearchCV(). The 'CV' in the name
indicates that we're performing both grid search and cross validation at the same time.

By creating a dictionary of parameters and possible values and passing it to the GridSearchCV object
you can automate the process. Here's what the code from the previous screen would look like, when
implemented using the GridSearchCV class.
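
A sketch of what that might look like, using synthetic data in place of the Titanic features. The variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for all_X and all_y.
all_X, all_y = make_classification(n_samples=300, n_features=6, random_state=0)

# One key per hyperparameter, each with the list of values to try.
hyperparameters = {"n_neighbors": list(range(1, 50, 2))}

knn = KNeighborsClassifier()
grid = GridSearchCV(knn, param_grid=hyperparameters, cv=10)
grid.fit(all_X, all_y)

print(grid.best_params_)
print(grid.best_score_)
```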

Running this code will produce the following output:



Our final step is to print the GridSearchCV.best_params_ and GridSearchCV.best_score_ attributes
to retrieve the parameters of the best-performing model, and the score it achieved.

We can also use GridSearchCV to try combinations of different hyperparameters. Say we wanted to
test values of "ball_tree", "kd_tree", and "brute" for the algorithm parameter and values of 1, 3,
and 5 for the n_neighbors parameter. GridSearchCV would train and test 9 models (3 for the first
hyperparameter times 3 for the second hyperparameter), shown in the diagram below.

Let's use GridSearchCV to turbo-charge our search for the best performing parameters for our model,
by testing 40 combinations of three different hyperparameters.

We have chosen the specific hyperparameters by consulting the documentation for
the KNeighborsClassifier class.

Instructions

• Instantiate a KNeighborsClassifier object


• Instantiate a GridSearchCV object, using:
• The KNeighborsClassifier object you just created as the first (unnamed) argument
• The hyperparameters dictionary for the param_grid
• A cv of 10
• Fit the GridSearchCV object using all_X and all_y
• Assign the parameters of the best performing model to best_params
• Assign the score of the best performing model to best_score

Solutions
III.3.6 Submitting K-Nearest Neighbors Predictions to Kaggle

The cross-validation score for the best performing model was 82.9%, better than our baseline model.

We can use the GridSearchCV.best_estimator_ attribute to retrieve a trained model with the best-
performing hyperparameters. This code:

Is equivalent to this code where we manually specify the hyperparameters and train the model:
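
The two approaches can be compared in a short sketch. The data is synthetic and the small hyperparameter grid is an assumption for illustration; both models end up with the same settings and are fit on the same data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for all_X and all_y.
all_X, all_y = make_classification(n_samples=300, n_features=6, random_state=0)

hyperparameters = {"n_neighbors": [1, 3, 5], "weights": ["uniform", "distance"]}
grid = GridSearchCV(KNeighborsClassifier(), param_grid=hyperparameters, cv=10)
grid.fit(all_X, all_y)

# Option 1: take the refitted best model straight from the grid search.
best_knn = grid.best_estimator_

# Option 2: rebuild it by hand from the winning hyperparameters and refit.
manual_knn = KNeighborsClassifier(**grid.best_params_)
manual_knn.fit(all_X, all_y)
```

Because GridSearchCV refits the best configuration on all of the data by default, best_estimator_ is ready to call predict() on without further training.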



Let's use that model to make predictions on the holdout set and submit those predictions to Kaggle to
see if we have improved overall.

Instructions

• Make predictions on the data from holdout_no_id using the best_knn model, and assign the result
to holdout_predictions
• Create a dataframe submission with two columns:
• PassengerId, with the values from the PassengerId column of the holdout dataframe
• Survived, with the values from holdout_predictions
• Use the DataFrame.to_csv method to save the submission dataframe to the filename submission_1.csv

Solutions

III.3.7 Introducing Random Forests

You can download the submission file from the previous screen here.

When you submit this to Kaggle, you'll see it scores 75.6%, less than our best submission of 78.0%.
While our model could be overfitting due to including all columns, it also seems like k-nearest neighbors
may not be the best algorithm choice.
Let's try another algorithm called random forests. Random forests is a specific type of decision
tree algorithm. You have likely seen decision trees before as part of flow charts or infographics.
Say we wanted to build a decision tree to help us categorize an object as either being 'hotdog' or
'not hotdog'; we could construct a decision tree like the one below:

Decision tree algorithms attempt to build the most efficient decision tree based on the training data,
and then use that tree to make future predictions. If you'd like to learn about decision trees and
random forests in detail, you should check out our decision trees course.

Scikit-learn contains a class for classification using the random forest algorithm,
ensemble.RandomForestClassifier. Here's how to fit a model and make predictions using
the RandomForestClassifier class:
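
A minimal sketch of that pattern on synthetic data. The train/test split and the variable names are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data, split into a training part and a test part.
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
train_X, test_X, train_y, test_y = train_test_split(
    X, y, test_size=0.2, random_state=0)

# random_state fixes the randomization inside the forest.
clf = RandomForestClassifier(random_state=1)
clf.fit(train_X, train_y)
predictions = clf.predict(test_X)
print(predictions[:10])
```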



Because the algorithm includes randomization, we have to set the random_state parameter to make
sure our results are reproducible.

Let's use a RandomForestClassifier object with cross_val_score() as we did earlier to see how the
algorithm performs with the default hyperparameters.

Instructions

• Instantiate a RandomForestClassifier object, setting the random_state parameter to 1


• Use the cross_val_score() function to generate a set of scores and assign the result to scores, using:
• The RandomForestClassifier object you just created as the estimator
• all_X and all_y for the train and test data
• A cv value of 10
• Calculate the mean of scores and assign the result to accuracy_rf

Solutions

III.3.8 Tuning our Random Forests Model with GridSearch

Using the default settings, our random forests model obtained a cross validation score of 82.0%.

Just like we did with the k-nearest neighbors model, we can use GridSearchCV to test a variety of
hyperparameters to find the best performing model.

The best way to see a list of available hyperparameters is by checking the documentation for the
classifier— in this case, the documentation for RandomForestClassifier. Let's use grid search to test out
combinations of the following hyperparameters:

• criterion: "entropy" or "gini"
• max_depth: 5 or 10
• max_features: "log2" or "sqrt"
• min_samples_leaf: 1 or 5
• min_samples_split: 3 or 5
• n_estimators: 6 or 9

Instructions

• Instantiate a RandomForestClassifier object, setting the random_state parameter to 1


• Instantiate a GridSearchCV object, using:
• The RandomForestClassifier object you just created as the first (unnamed) argument
• A dictionary of hyperparameters that matches the list above for the param_grid argument
• A cv of 10
• Fit the GridSearchCV object using all_X and all_y
• Assign the parameters of the best performing model to best_params
• Assign the score of the best performing model to best_score

Solutions
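
A minimal sketch of the grid search described above. As before, synthetic data stands in for the Titanic all_X and all_y (an assumption so the snippet is self-contained), so the best parameters and score it prints are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in data (assumption): replace with the real all_X and all_y.
all_X, all_y = make_classification(n_samples=200, n_features=8, random_state=1)

# Every combination of these values is tried: 2**6 = 64 candidate models.
hyperparameters = {
    "criterion": ["entropy", "gini"],
    "max_depth": [5, 10],
    "max_features": ["log2", "sqrt"],
    "min_samples_leaf": [1, 5],
    "min_samples_split": [3, 5],
    "n_estimators": [6, 9],
}

clf = RandomForestClassifier(random_state=1)
grid = GridSearchCV(clf, param_grid=hyperparameters, cv=10)
grid.fit(all_X, all_y)

best_params = grid.best_params_   # dict of the winning hyperparameter values
best_score = grid.best_score_     # mean cross-validated accuracy of that model
print(best_params, best_score)
```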

III.3.9 Submitting Random Forest Predictions to Kaggle

The cross-validation score for the best performing model was 83.8%, making it the best cross-validation
score we've obtained in this mission.

Let's use it to make predictions on the holdout data and create a submission file to see how it
performs on the Kaggle leaderboard!

Instructions

• Assign the best performing model from the GridSearchCV object grid to best_rf
• Make predictions on the data from holdout_no_id using the best_rf model, and assign the result
to holdout_predictions
• Create a dataframe submission with two columns:
• PassengerId, with the values from the PassengerId column of the holdout dataframe
• Survived, with the values from holdout_predictions
• Use the DataFrame.to_csv method to save the submission dataframe to the filename submission_2.csv

Solutions
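
The steps above might look like the following sketch. The data and the feature column names are hypothetical stand-ins, and best_rf is fitted directly here; in the course it would be taken from the fitted grid object with grid.best_estimator_. The file is written to a temporary directory only to keep the example tidy.

```python
import os
import tempfile

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in data (assumption): 100 training rows and 20 holdout rows.
X, y = make_classification(n_samples=120, n_features=5, random_state=1)
columns = [f"feature_{i}" for i in range(5)]  # hypothetical column names
train_X = pd.DataFrame(X[:100], columns=columns)
train_y = y[:100]
holdout = pd.DataFrame(X[100:], columns=columns)
holdout.insert(0, "PassengerId", list(range(892, 912)))

# In the course: best_rf = grid.best_estimator_
best_rf = RandomForestClassifier(random_state=1).fit(train_X, train_y)

# The model must not see the ID column, so drop it before predicting.
holdout_no_id = holdout.drop(columns=["PassengerId"])
holdout_predictions = best_rf.predict(holdout_no_id)

submission = pd.DataFrame({
    "PassengerId": holdout["PassengerId"],
    "Survived": holdout_predictions,
})

# index=False keeps the row index out of the CSV.
out_path = os.path.join(tempfile.mkdtemp(), "submission_2.csv")
submission.to_csv(out_path, index=False)
```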

III.3.10 Introducing Random Forests

The submission file we created in the previous step is available.



If you submit this to Kaggle, it achieves a score of 77.1%, considerably better than our k-nearest
neighbors score of 75.6% and very close (2 incorrect predictions) to our best score from the previous
mission of 78.0%.

By combining our strategies for feature selection, feature engineering, model selection and model
tuning, we'll be able to continue to improve our score.

The next and final mission in this course is a guided project, where we'll teach you how to combine
everything you've learned into a real-life Kaggle workflow, and continue to improve your score.

III.4 Guided Project: Creating a Kaggle Workflow


III.4.1 Introducing Data Science Workflows

So far in this course, you've been learning about Kaggle competitions using Dataquest missions.
Missions are highly structured, and your work is answer-checked every step of the way.

Guided projects, on the other hand, are less structured and focus more on exploration. Guided projects
help you synthesize concepts learned during missions and practice what you have learned.

Guided projects bridge the gap between learning with the Dataquest missions and applying the
knowledge on your own computer. Your answers are not checked as they are in regular missions;
however, you can access a solution notebook from the top of the interface.

Working with guided projects is a great opportunity to practice some of the extra skills you'll need
to do data science by yourself. These include debugging with all the tools at your disposal, such as
googling for answers, visiting Stack Overflow, and consulting the documentation for the modules you
are using.

This guided project uses Jupyter notebook, a web application which lets you combine text and code
within a single file, and is one of the most popular ways to explore and iterate when working with data.
The Jupyter notebook easily allows you to share your work, and makes exploring data much easier.

If you're not familiar with Jupyter notebook, we recommend completing our guided project on Using
Jupyter notebook to familiarize yourself.

In this guided project, we're going to put together all that we've learned in this course and create
a data science workflow.

Data science, and particularly machine learning, contains many more dimensions of complexity than
standard software development. In standard software development, code not working
as you expect can be caused by a number of factors along two dimensions:

• Bugs in implementation
• Algorithm design

Machine learning problems have many more dimensions:

• Bugs in implementation
• Algorithm design
• Model issues
• Data quality

The result of this is that there are exponentially more places that machine learning can go wrong.

This concept is shown in the diagram above (taken from the excellent post Why is machine learning
'hard'?). The green dot is a 'correct' solution, where the red dots are incorrect solutions. In this
illustration there are only a small number of incorrect combinations for software engineering, but in
machine learning this becomes exponentially greater!

By defining a workflow for yourself, you can give yourself a framework with which to make iterating
on ideas quicker and easier, allowing yourself to work more efficiently.

In this mission, we're going to explore a workflow to make competing in the Kaggle Titanic competition
easier, using a pipeline of functions to reduce the number of dimensions you need to focus on.

To get started, we'll read in the original train.csv and test.csv files from Kaggle.

Instructions

• Import the pandas library


• Use pandas to import the file train.csv as train
• Use pandas to import the file test.csv as holdout
• Display the first few lines of the holdout dataframe
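
As a minimal sketch of these steps: the two small in-memory CSVs below are hypothetical stand-ins for Kaggle's train.csv and test.csv, used only so the snippet runs without the real files. With the files downloaded, you would pass the filenames to pd.read_csv() instead.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the Kaggle files (a few made-up rows each).
train_csv = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age\n"
    "1,0,3,male,22\n"
    "2,1,1,female,38\n"
)
test_csv = io.StringIO(
    "PassengerId,Pclass,Sex,Age\n"
    "892,3,male,34.5\n"
    "893,3,female,47\n"
)

train = pd.read_csv(train_csv)    # with the real file: pd.read_csv("train.csv")
holdout = pd.read_csv(test_csv)   # with the real file: pd.read_csv("test.csv")

# The holdout data has no Survived column: that is what we have to predict.
print(holdout.head())
```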

III.4.2 Preprocessing the Data

One of the many benefits of using Jupyter is that (by default) it uses the IPython kernel to run code.
This gives you all the benefits of IPython, including code completion and 'magic' commands. (If you'd
like to read more about the internals of Jupyter and how it can help you work more efficiently, you
might like to check out our blog post Jupyter Notebook Tips, Tricks and Shortcuts.)

We can use one of those magic commands, the %load command, to load an external file.
The %load command will copy the contents of the file into the current notebook cell. The syntax is
simple:

To illustrate, say we had a file called test.py with the following line of code:

To use load, we simply type the following into a Jupyter cell:

If we ran the cell one more time, the code would run, giving us the output This is test.py.
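
The sketch below walks through the same idea in plain Python. The one-line content of test.py is an assumption reconstructed from the output quoted above. The %load magic itself only works inside IPython or Jupyter, so it appears in comments, and runpy is used to show what executing the loaded code produces.

```python
import contextlib
import io
import pathlib
import runpy
import tempfile

# Create the one-line test.py in a temporary directory (assumed content).
script = pathlib.Path(tempfile.mkdtemp()) / "test.py"
script.write_text('print("This is test.py")\n')

# In a Jupyter cell you would type:
#     %load test.py
# After running the cell once, its contents are replaced by:
#     # %load test.py
#     print("This is test.py")
# Running the cell a second time executes that code. Outside a notebook,
# runpy has the comparable effect of executing the file's contents:
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    runpy.run_path(str(script))
output = buffer.getvalue()
print(output, end="")
```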

We have created a file, functions.py, which contains versions of the functions we created in the earlier
missions from this course, which will save you building those functions again from scratch.

Let's import that file and preprocess our Kaggle data.

Instructions

• Use the %load magic command to load the contents of functions.py into a notebook cell and read
through the functions you have imported
• Create a new function, which:
• Accepts a dataframe parameter
• Applies the process_missing(), process_age(), process_fare(), process_titles(), and
process_cabin() functions to the dataframe
• Applies the create_dummies() function to the "Age_categories", "Fare_categories", "Title", and
"Sex" columns
• Returns the processed dataframe
• Apply the newly created function on the train and holdout dataframes
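
The shape of such a function can be sketched as follows. Because functions.py is not reproduced here, the snippet defines simplified, hypothetical stand-ins for only two of the processing functions and for create_dummies(); the real versions do more, and the name pre_process() is an illustrative choice.

```python
import pandas as pd

# Simplified, hypothetical stand-ins for functions from functions.py.
def process_missing(df):
    df = df.copy()
    df["Fare"] = df["Fare"].fillna(df["Fare"].mean())
    return df

def process_age(df):
    df = df.copy()
    df["Age"] = df["Age"].fillna(-0.5)
    df["Age_categories"] = pd.cut(
        df["Age"], [-1, 0, 18, 100], labels=["Missing", "Child", "Adult"]
    )
    return df

def create_dummies(df, column_name):
    dummies = pd.get_dummies(df[column_name], prefix=column_name)
    return pd.concat([df, dummies], axis=1)

def pre_process(df):
    # Apply each processing step in turn, then one-hot encode the categories.
    for func in (process_missing, process_age):
        df = func(df)
    for column in ("Age_categories", "Sex"):
        df = create_dummies(df, column)
    return df

# Made-up rows, for illustration only.
sample = pd.DataFrame({
    "Age": [22, None, 4],
    "Fare": [7.25, None, 10.75],
    "Sex": ["male", "female", "male"],
})
processed = pre_process(sample)
print(processed.columns.tolist())
```

With the real functions.py loaded, the two loops would simply list all five processing functions and all four columns named in the instructions.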

III.4.3 Exploring the Data

In the first three missions of this course, we have done a variety of activities, mostly in isolation:
exploring the data, creating features, selecting features, selecting and tuning different models.

The Kaggle workflow we are going to build will combine all of these into a process.

• Data exploration, to find patterns in the data


• Feature engineering, to create new features from those patterns or through pure experimentation
• Feature selection, to select the best subset of our current set of features
• Model selection/tuning, training a number of models with different hyperparameters to find the
best performer

We can continue to repeat this cycle as we work to optimize our predictions. At the end of any cycle
we wish, we can also use our model to make predictions on the holdout set and then submit to
Kaggle to get a leaderboard score.

While the first two steps of our workflow are relatively freeform, later in this project we'll create some
functions that will help automate the complexity of the latter two steps so we can move faster.

For now, let's practice the first stage, exploring the data. We're going to examine the two columns that
contain information about the family members each passenger had onboard: SibSp and Parch.

If you need some help with techniques for exploring and visualizing data, you might like to check out
our Data Analysis with Pandas and Exploratory Data Visualization courses.

Instructions

• Review the data dictionary and variable notes for the Titanic competition on Kaggle's website to
familiarize yourself with the SibSp and Parch columns
• Use pandas and matplotlib to explore those two columns. You might like to try:
• Inspecting the type of the columns
• Using histograms to view the distribution of values in the columns
• Using pivot tables to look at the survival rate for different values of the columns
• Finding a way to combine the columns and looking at the resulting distribution of values and
survival rate
• Write a markdown cell explaining your findings
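
A few of the suggested checks are sketched below on a tiny, made-up dataframe (the values are hypothetical and chosen only so the snippet runs on its own). On the real train dataframe the same calls apply unchanged.

```python
import pandas as pd

# Tiny stand-in for the Titanic training data (hypothetical values).
train = pd.DataFrame({
    "SibSp":    [0, 1, 0, 0, 3, 0, 1, 0],
    "Parch":    [0, 0, 0, 2, 1, 0, 1, 0],
    "Survived": [0, 1, 0, 1, 0, 1, 1, 0],
})

# Column types.
print(train[["SibSp", "Parch"]].dtypes)

for column in ["SibSp", "Parch"]:
    # Distribution of values (a text alternative to a histogram).
    print(train[column].value_counts().sort_index())
    # Survived is 0/1, so its mean per group is the survival rate.
    print(train.pivot_table(index=column, values="Survived"))

sibsp_rates = train.pivot_table(index="SibSp", values="Survived")
```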

III.4.4 Engineering New Features


You should have discovered in the previous step that, after combining the values of SibSp and Parch
into a single column, only 30% of the passengers who had no family members onboard survived.

If you didn't get this conclusion, you can use the code segment below to verify this for yourself:
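
A minimal sketch of such a check, assuming a combined column named familysize (a hypothetical name). The dataframe here is a tiny made-up stand-in, so its rates differ from the 30% figure, which comes from the real training data.

```python
import pandas as pd

# Tiny stand-in for the Titanic training data (hypothetical values).
train = pd.DataFrame({
    "SibSp":    [0, 1, 0, 0, 3, 0, 1, 0],
    "Parch":    [0, 0, 0, 2, 1, 0, 1, 0],
    "Survived": [0, 1, 0, 1, 0, 1, 1, 0],
})

explore = train[["SibSp", "Parch", "Survived"]].copy()

# Total number of family members onboard for each passenger.
explore["familysize"] = explore["SibSp"] + explore["Parch"]

# Mean of the 0/1 Survived column per family size = survival rate.
survival_by_size = explore.pivot_table(index="familysize", values="Survived")
print(survival_by_size)
```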

Based on this, we can come up with an idea for a new feature: was the passenger alone? This will be a
binary column containing the value:

• 1 if the passenger has zero family members onboard
• 0 if the passenger has one or more family members onboard

Let's go ahead and create this feature.

Instructions

• Create a function, that:


• Accepts a dataframe as input
• Adds a new column, isalone that has the value 0 if the passenger has one or more family members
onboard, and 1 if the passenger has zero family members onboard.
• Returns the new dataframe
• Apply the newly created function to the train and holdout dataframes
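
One possible implementation is sketched below. The function name process_isalone() is a hypothetical choice, and the three-row dataframe is made-up data for illustration.

```python
import pandas as pd

def process_isalone(df):
    """Return a copy of df with a binary isalone column added."""
    df = df.copy()
    familysize = df["SibSp"] + df["Parch"]
    # True where the passenger has zero family members onboard, cast to 1/0.
    df["isalone"] = (familysize == 0).astype(int)
    return df

# Made-up rows: alone, one sibling/spouse, two parents/children.
sample = pd.DataFrame({"SibSp": [0, 1, 0], "Parch": [0, 0, 2]})
result = process_isalone(sample)
print(result)
```

In the project, the same function would be applied to both dataframes, for example train = process_isalone(train) and holdout = process_isalone(holdout).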

III.4.5 Selecting the Best-Performing Features

The next step in our workflow is feature selection. In the Feature Preparation, Selection and
Engineering mission, we used scikit-learn's feature_selection.RFECV class to automate selecting the
best-performing features using recursive feature elimination.

To speed up our Kaggle workflow, we can create a function that performs this step for us, which will
mean we can perform feature selection by calling a self-contained function and focus our efforts on
the more creative part - exploring the data and engineering new features.

You may remember that the first parameter when you instantiate a RFECV() object is an estimator. At
the time we used a Logistic Regression estimator, but we've since discovered in the Model Selection
and Tuning mission that Random Forests seems to be a better algorithm for this Kaggle competition.

Let's write a function that:

• Accepts a dataframe as input


• Performs data preparation for machine learning
• Uses recursive feature elimination and the random forests algorithm to find the best-performing
set of features

Instructions

• Import feature_selection.RFECV and ensemble.RandomForestClassifier


• Create a function, select_features(), that:
• Accepts a dataframe as input
• Removes any non-numeric columns or columns containing null values
• Creates all_X and all_y variables, making sure that all_X contains neither
the PassengerId nor Survived columns
• Uses feature_selection.RFECV and ensemble.RandomForestClassifier to perform recursive
feature elimination using:
• all_X and all_y
• A random state of 1
• 10-fold cross validation
• Prints a list of the best columns from recursive feature elimination
• Returns a list of the best columns from recursive feature elimination
• Run the newly created function using the train dataframe as input and assign the result to a variable
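
A sketch of select_features() along these lines is shown below. The train dataframe is a synthetic stand-in with hypothetical feature names (plus one text column and one column with a null value, to show them being dropped), so the selected columns are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

def select_features(df):
    # Keep only numeric columns that contain no null values.
    df = df.select_dtypes(include=[np.number]).dropna(axis=1)
    all_X = df.drop(columns=["PassengerId", "Survived"])
    all_y = df["Survived"]

    clf = RandomForestClassifier(random_state=1)
    selector = RFECV(clf, cv=10)
    selector.fit(all_X, all_y)

    # support_ is a boolean mask over the columns of all_X.
    best_columns = list(all_X.columns[selector.support_])
    print(best_columns)
    return best_columns

# Stand-in data (assumption): replace with the processed Titanic train dataframe.
X, y = make_classification(n_samples=150, n_features=6, n_informative=3, random_state=1)
feature_names = [f"feature_{i}" for i in range(6)]  # hypothetical names
train = pd.DataFrame(X, columns=feature_names)
train["Survived"] = y
train["PassengerId"] = np.arange(1, 151)
train["Name"] = "n/a"                    # non-numeric: removed
train["Age"] = [np.nan] + [30.0] * 149   # contains a null: removed

best_cols = select_features(train)
```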

III.4.6 Selecting and Tuning Different Algorithms

Just like we did with feature selection, we can write a function to do the heavy lifting of model selection
and tuning. The function we'll create will use three different algorithms and use grid search to train
using different combinations of hyperparameters to find the best performing model.

We can achieve this by creating a list of dictionaries, that is, a list where each element is a
dictionary. Each dictionary should contain:

• The name of the particular model


• An estimator object for the model
• A dictionary of hyperparameters that we'll use for grid search

Here's an example of what one of these dictionaries will look like:
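
A minimal sketch of one such dictionary, using the k-nearest neighbors hyperparameters listed in the instructions below. The key names "name", "estimator" and "hyperparameters" are illustrative assumptions; any consistent set of keys would work.

```python
from sklearn.neighbors import KNeighborsClassifier

# One entry of the list: a model name, an (unfitted) estimator object,
# and the grid of hyperparameter values to search over.
model_entry = {
    "name": "KNeighborsClassifier",
    "estimator": KNeighborsClassifier(),
    "hyperparameters": {
        "n_neighbors": range(1, 20, 2),
        "weights": ["distance", "uniform"],
        "algorithm": ["ball_tree", "kd_tree", "brute"],
        "p": [1, 2],
    },
}
print(model_entry["name"])
```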

We can then use a for loop to iterate over the list of dictionaries, and for each one we can use
scikit-learn's model_selection.GridSearchCV class to find the best-performing set of parameters, and
add values for both the parameter set and the score to the dictionary.

Finally, we can return the list of dictionaries, which will have our trained GridSearchCV objects as well
as the results so we can see which was the most accurate.

Instructions

• Import model_selection.GridSearchCV, neighbors.KNeighborsClassifier, and
linear_model.LogisticRegression
• Create a function, select_model(), that:
• Accepts a dataframe and a list of features as input
• Splits the dataframe into all_X (containing only the features in the input parameter) and all_y
• Contains a list of dictionaries, each containing a model name, its estimator and a dictionary of
hyperparameters:
• LogisticRegression, using the following hyperparameters:
• "solver": ["newton-cg", "lbfgs", "liblinear"]
• KNeighborsClassifier, using the following hyperparameters:
• "n_neighbors": range(1,20,2)
• "weights": ["distance", "uniform"]
• "algorithm": ["ball_tree", "kd_tree", "brute"]
• "p": [1,2]
• RandomForestClassifier, using the following hyperparameters:
• "n_estimators": [4, 6, 9]

• "criterion": ["entropy", "gini"]
• "max_depth": [2, 5, 10]
• "max_features": ["log2", "sqrt"]
• "min_samples_leaf": [1, 5, 8]
• "min_samples_split": [2, 3, 5]
• Iterate over that list of dictionaries, and for each dictionary:
• Print the name of the model.
• Instantiate a GridSearchCV() object using the model, the dictionary of hyperparameters and
specify 10-fold cross validation
• Fit the GridSearchCV() object using all_X and all_y
• Assign the parameters and score for the best model to the dictionary
• Assign the best estimator for the best model to the dictionary
• Print the parameters and score for the best model
• Return the list of dictionaries
• Run the newly created function using the train dataframe and the output of select_features() as
inputs and assign the result to a variable
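
The loop at the heart of select_model() can be sketched as follows. To keep the example short and quick to run, this version departs from the instructions in two ways, both assumptions of the sketch: the list of model dictionaries is passed in as an argument rather than defined inside the function, and only two models with reduced grids are searched. The dictionary keys and the synthetic data are likewise illustrative.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

def select_model(df, features, models, cv=10):
    all_X = df[features]
    all_y = df["Survived"]
    for model in models:
        print(model["name"])
        grid = GridSearchCV(model["estimator"],
                            param_grid=model["hyperparameters"], cv=cv)
        grid.fit(all_X, all_y)
        # Store the results next to the model's definition.
        model["best_params"] = grid.best_params_
        model["best_score"] = grid.best_score_
        model["best_model"] = grid.best_estimator_
        print(model["best_params"], model["best_score"])
    return models

# Reduced list of models (assumption): the full version also includes
# RandomForestClassifier and the larger grids given in the instructions.
models = [
    {"name": "LogisticRegression",
     "estimator": LogisticRegression(max_iter=1000),
     "hyperparameters": {"solver": ["newton-cg", "lbfgs", "liblinear"]}},
    {"name": "KNeighborsClassifier",
     "estimator": KNeighborsClassifier(),
     "hyperparameters": {"n_neighbors": list(range(1, 20, 2)),
                         "weights": ["distance", "uniform"]}},
]

# Stand-in data (assumption): replace with train and the select_features() output.
X, y = make_classification(n_samples=150, n_features=4, random_state=1)
features = [f"feature_{i}" for i in range(4)]  # hypothetical names
train = pd.DataFrame(X, columns=features)
train["Survived"] = y

result = select_model(train, features, models)
```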

III.4.7 Making a Submission to Kaggle

After running your function, you will have three scores from three different models. At this point in the
workflow you have a decision to make: do you want to use your best model to make predictions on the
holdout set and make a Kaggle submission, or do you want to go back to engineering features?

You may find that adding a feature to your model doesn't improve your accuracy. In that case you
should go back to data exploration and repeat the cycle again.

If you're going to be continually submitting to Kaggle, a function will help make this easier. Let's create
a function to automate this.

Note that in our Jupyter Notebook environment, the DataFrame.to_csv() method will save the CSV in
the same directory as your notebook, just as it would if you are running Jupyter locally. To download
the CSV from our environment, you can either click the 'download' button to download all of your
project files as a tar file, or click the Jupyter logo at the top of the interface, and navigate to the CSV
itself to download just that file.

Instructions

• Create a function, save_submission_file(), that:


• Accepts a trained model and a list of columns as required arguments, and an optional filename
argument
• Uses the model to make predictions on the holdout dataframe using the columns specified
• Transforms the predictions into a submission dataframe with PassengerId and Survived columns
as specified by Kaggle
• Saves that dataframe to a CSV file with either a default filename, or the filename specified by the
optional argument
• Retrieve the best performing model from the variable returned by select_model()
• Use save_submission_file() to save out a CSV of predictions
• Download that file and submit it to Kaggle
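
A sketch of save_submission_file() under a few stated assumptions: the default filename "submission.csv" is an arbitrary choice, the function reads a module-level holdout dataframe, and the tiny train and holdout dataframes are made-up data. In the project, the model would be the best estimator stored in the list returned by select_model().

```python
import os
import tempfile

import pandas as pd
from sklearn.linear_model import LogisticRegression

def save_submission_file(model, cols, filename="submission.csv"):
    # Uses the module-level holdout dataframe, as in the instructions above.
    holdout_predictions = model.predict(holdout[cols])
    submission = pd.DataFrame({
        "PassengerId": holdout["PassengerId"],
        "Survived": holdout_predictions,
    })
    submission.to_csv(filename, index=False)

# Made-up data, for illustration only.
train = pd.DataFrame({
    "Pclass":   [1, 3, 2, 3, 1, 3],
    "isalone":  [0, 1, 0, 1, 0, 1],
    "Survived": [1, 0, 1, 0, 1, 0],
})
holdout = pd.DataFrame({
    "PassengerId": [892, 893, 894],
    "Pclass":      [3, 1, 2],
    "isalone":     [1, 0, 1],
})
cols = ["Pclass", "isalone"]
model = LogisticRegression().fit(train[cols], train["Survived"])

# Written to a temporary directory here; in a notebook a bare filename suffices.
path = os.path.join(tempfile.mkdtemp(), "submission.csv")
save_submission_file(model, cols, filename=path)
saved = pd.read_csv(path)
print(saved)
```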

III.4.8 Next Mission

In this guided project, we created a reproducible workflow to help us iterate over ideas and continue
to improve the accuracy of our predictions. We also created helper functions which will make
feature selection, model selection/tuning and creating submissions much easier as we continue
to explore the data and create new features.

We encourage you to continue working on this Kaggle competition. Here are some suggestions of
next steps:

• Continue to explore the data and create new features, following the workflow and using the
functions we created
• Read more about the Titanic and this Kaggle competition to get ideas for new features
• Use some different algorithms in the select_model() function, like support vector machines, stochastic
gradient descent or perceptron linear models
• Experiment with RandomizedSearchCV instead of GridSearchCV to speed up your
select_model() function

You can continue to work on this Kaggle competition within this guided project environment and
save out files for submission if you like, although we would encourage you to set up your own Python
environment so that you can work on your own computer. We have a Python Installation Guide that
walks you through how to do this.

Lastly, while the Titanic competition is great for learning about how to approach your first Kaggle
competition, we recommend against spending many hours focused on trying to get to the top of the
leaderboard. With such a small data set, there is a limit to how good your predictions can be, and your
time would be better spent moving onto more complex competitions.

Once you feel like you have a good understanding of the Kaggle workflow, you should look at some
other competitions - a great next competition is the House Prices Competition. We have a great tutorial
for getting started with this competition on our blog.

Curious to see what other students have done on this project? Head over to our Community to check
them out. While you are there, please remember to show some love and give your own feedback!

And of course, we welcome you to share your own project and show off your hard work. Head over to
our Community to share your finished Guided Project!

IV. TensorFlow Concepts

IV.1 Presentation of TensorFlow

TensorFlow is one of the best-known deep learning frameworks, developed by the Google Brain team.
It is a free and open-source software library with a Python programming interface (its core is
implemented largely in C++). This tutorial is designed so that we can implement deep learning
projects on TensorFlow easily and efficiently.

The word TensorFlow is made up of two words, Tensor and Flow:

1. Tensor is a multidimensional array
2. Flow is used to define the flow of data in operations

TensorFlow is thus used to define the flow of data in operations on a multidimensional array, or
tensor.

IV.1.1 History of TensorFlow

Some years ago, deep learning started to outperform other machine learning algorithms when given
large amounts of data. Google saw that it could use deep neural networks to improve its services:

• Google search engine
• Gmail
• Google Photos

Google built a framework called TensorFlow to let researchers and developers work together on an
AI model. Once a model has been developed and scaled, it can be made available to many people.

TensorFlow was first released in 2015, and the first stable version (1.0) followed in 2017. It is an
open-source platform under the Apache 2.0 License. We can use it, modify it, and redistribute the
modified version free of charge, without paying anything to Google.

IV.1.2 Components of TensorFlow

The name TensorFlow is derived from its core data structure, the tensor. A tensor is an n-dimensional
vector or matrix that can represent all types of data. All values in a tensor have the same data
type, and the tensor has a known shape. The shape of the data is the dimensionality of the matrix
or array.
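
To make shape and data type concrete, here is a small sketch that uses NumPy arrays as stand-ins for tensors. This is an assumption of the example and not TensorFlow's own API, although TensorFlow tensors expose a shape and a dtype in a similar way.

```python
import numpy as np

scalar = np.array(5.0)                         # 0 dimensions, shape ()
vector = np.array([1.0, 2.0, 3.0])             # 1 dimension, shape (3,)
matrix = np.array([[1, 2, 3], [4, 5, 6]])      # 2 dimensions, shape (2, 3)
cube = np.zeros((2, 3, 4), dtype=np.float32)   # 3 dimensions, shape (2, 3, 4)

# Every element of one array shares a single data type.
for tensor in (scalar, vector, matrix, cube):
    print(tensor.ndim, tensor.shape, tensor.dtype)
```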

A tensor can be generated from the input data or from the result of a computation. In TensorFlow,
all operations are conducted inside a graph. The graph is a set of computations that take place
successively. Each operation is called an op node, and the op nodes are connected to each other.
TensorFlow makes use of this graph framework: the graph gathers and describes all the computations
done during training.

IV.1.3 Advantages

• It was built to run on multiple CPUs or GPUs and on mobile operating systems
• The portability of the graph allows the computations to be preserved for immediate or later use.
The graph can be saved and executed in the future
• All the computation in the graph is done by connecting tensors together

Consider the following expression: a = (b+c)*(c+2). We can break the expression into the components
given below:

d=b+c
e=c+2
a=d*e

Now, we can represent these operations graphically below:

A session can execute the operation from the
graph. To feed the graph with the value of a tensor,
we need to open a session. Inside a session, we
must run an operator to create an output.
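
The idea of first describing a graph and only later running it with concrete values can be illustrated without TensorFlow. The sketch below is a plain-Python stand-in (Node and run are hypothetical names, not TensorFlow functions) that builds the graph for a = (b+c)*(c+2) and then evaluates it, much as a session evaluates an operation when fed tensor values.

```python
class Node:
    """A graph node: a placeholder (no op) or an operation on other nodes."""

    def __init__(self, name, op=None, inputs=()):
        self.name = name
        self.op = op
        self.inputs = inputs

def run(node, feed):
    """Evaluate node, reading placeholder values from the feed dictionary."""
    if node.op is None:
        return feed[node.name]
    return node.op(*(run(parent, feed) for parent in node.inputs))

# Building the graph computes nothing yet; it only records the operations.
b = Node("b")
c = Node("c")
two = Node("two", op=lambda: 2)                       # a constant
d = Node("d", op=lambda x, y: x + y, inputs=(b, c))   # d = b + c
e = Node("e", op=lambda x, y: x + y, inputs=(c, two)) # e = c + 2
a = Node("a", op=lambda x, y: x * y, inputs=(d, e))   # a = d * e

# Running the graph: with b=1 and c=3, d=4 and e=5, so a=20.
result = run(a, {"b": 1, "c": 3})
print(result)
```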

IV.1.4 Why is TensorFlow popular?

TensorFlow is popular because it is accessible to everyone. The TensorFlow library integrates
different APIs to build deep learning architectures at scale, such as CNNs (Convolutional Neural
Networks) or RNNs (Recurrent Neural Networks).

TensorFlow is based on graph computation, which allows the developer to visualize the construction
of the neural network with TensorBoard. This tool helps debug our program. TensorFlow runs on CPUs
(Central Processing Units) and GPUs (Graphics Processing Units).

IV.1.5 Use Cases/Applications of TensorFlow

TensorFlow provides a rich set of functionalities and services compared with other popular deep
learning frameworks. It is used to build large-scale neural networks with many layers.

It is mainly used for deep learning or machine learning problems such as Classification, Perception,
Understanding, Discovering, Prediction and Creation.

Voice/Sound Recognition

Voice and sound recognition applications are among the best-known use cases of deep learning. Given
a proper input data feed, neural networks are capable of understanding audio signals.
For example:

• Voice recognition is used in the Internet of Things, automotive, security, and UX/UI
• Sentiment analysis is mostly used in customer relationship management (CRM)
• Flaw detection (engine noise) is mostly used in automotive and aviation
• Voice search is mostly used in customer relationship management (CRM)

Image Recognition

Image recognition is the first application that made deep learning and machine learning popular.
Telecom, Social Media, and handset manufacturers mostly use image recognition. It is also used for
face recognition, image search, motion detection, machine vision, and photo clustering.

For example, image recognition is used to recognize and identify people and objects in images.
Image recognition is also used to understand the context and content of an image.

For object recognition, TensorFlow helps to classify and identify arbitrary objects within larger images.
This is also used in engineering applications to identify shapes for modeling purposes (3D
reconstruction from 2D images) and by Facebook for photo tagging.

For example, TensorFlow can be used to analyze thousands of photos of cats. A deep learning
algorithm can then learn to identify a cat, because it is used to find general features of
objects, animals, or people.

Time Series

Deep learning uses time series algorithms to examine time series data and extract meaningful
statistics. For example, time series have been used to predict the stock market.

Recommendation is the most common use case for time series. Amazon, Google, Facebook,
and Netflix use deep learning to make suggestions: the deep learning algorithm analyzes
customer activity and compares it to that of millions of other users to determine what the customer
may like to purchase or watch.

For example, it can be used to recommend TV shows or movies to us based on the TV shows
or movies we have already watched.

Video Detection

Deep learning algorithms are also used for video detection, for example for motion detection and
real-time threat detection in gaming, security, airports, and the UI/UX field.

For example, NASA is developing a deep learning network for object clustering of asteroids and orbit
classification. So, it can classify and predict NEOs (Near Earth Objects).

Text-Based Applications

Text-based applications are also a popular use of deep learning. Sentiment analysis, social media
analysis, threat detection, and fraud detection are examples of text-based applications.

For example, Google Translate supports over 100 languages.

Some companies that are currently using TensorFlow are Google, Airbnb, eBay, Intel, Dropbox,
DeepMind, Airbus, CEVA, Snapchat, SAP, Uber, Twitter, Coca-Cola, and IBM.

IV.1.6 Features of TensorFlow

TensorFlow has an interactive multiplatform programming interface which is scalable and reliable
compared to other available deep learning libraries.

The following features help explain the popularity of TensorFlow.

Responsive Construct

We can visualize each part of the graph, which is not an option when using NumPy or scikit-learn.
Developing a deep learning application requires two or three components in addition to a
programming language.

Flexible

Flexibility is one of the essential TensorFlow features in terms of operability. TensorFlow is
modular, so the parts of it that we want to use standalone can be used on their own.

Easily Trainable

It is easily trainable on CPUs as well as on GPUs for distributed computing.

Parallel Neural Network Training

TensorFlow offers pipelining in the sense that we can train multiple neural networks on multiple
GPUs, which makes the models very efficient on large-scale systems.

Large Community

Google has developed it, and there already is a large team of software engineers who work on stability
improvements continuously.

Open Source

The best thing about this machine learning library is that it is open source, so anyone can use it as
long as they have internet connectivity. People can modify the library and come up with a wide
variety of useful products. It has also become another DIY community, with a massive forum for
people getting started with it and for those who find it hard to use.

Feature Columns

TensorFlow has feature columns, which can be thought of as intermediaries between raw data and
estimators, thereby bridging input data with our model.

The figure below describes how the feature column is implemented.

Availability of Statistical Distributions

This library provides distribution functions including Bernoulli, Beta, Chi2, Uniform, and Gamma,
which are essential, especially when considering probabilistic approaches such as Bayesian models.

Layered Components

TensorFlow provides layered operations on weights and biases through modules such as
tf.contrib.layers, which also provides batch normalization, convolution layers, and dropout layers.
In addition, tf.contrib.layers.optimizers offers optimizers such as Adagrad, SGD, and Momentum,
which are often used to solve optimization problems in numerical analysis. (These tf.contrib modules
belong to TensorFlow 1.x; tf.contrib was removed in TensorFlow 2.0.)

Visualizer (With TensorBoard)

We can inspect different representations of a model and make the necessary changes while debugging
it with the help of TensorBoard.

Event Logger (With TensorBoard)

It is just like UNIX, where we use tail -f to monitor the output of tasks at the command line.
It logs events and summaries from the graph and its output, which can then be inspected with
TensorBoard.

IV.2 TensorFlow Basics
IV.2.1 Single Layer Perceptron in TensorFlow

The perceptron is a single processing unit of a neural network. First proposed by Frank Rosenblatt
in 1958, it is a simple neuron that is used to classify its input into one of two categories. The
perceptron is a linear classifier and is used in supervised learning. It helps to organize the given
input data.

A perceptron is a neural network unit that performs a precise computation to detect features in the
input data. It is mainly used to classify the data into two parts; therefore, it is also known
as a linear binary classifier.

The perceptron uses a step function that returns +1 if the weighted sum of its inputs is greater
than or equal to 0, and -1 otherwise.

The activation function is used to map the input into the required range, such as (0, 1) or (-1, 1).

A regular neural network looks like this:


The perceptron consists of 4 parts.

1. Input values or one input layer: The input layer of the perceptron is made of artificial input
neurons and takes the initial data into the system for further processing
2. Weights and bias
• Weight: It represents the magnitude or strength of the connection between units. If the weight
from node 1 to node 2 is larger, then neuron 1 has a greater influence on neuron 2
• Bias: It is the same as the intercept added in a linear equation. It is an additional parameter
whose task is to adjust the output along with the weighted sum of the inputs to the neuron
3. Net sum: It calculates the total sum
4. Activation function: Whether a neuron is activated or not is determined by an activation function.
The activation function takes the weighted sum with the bias added to it and gives the result

A standard neural network looks like the below diagram.

How does it work?
The perceptron works in the simple steps given below:

A. In the first step, all the inputs x are multiplied by their weights w
B. In this step, add all the multiplied values and call the result the weighted sum
C. In the last step, apply the weighted sum to a suitable activation function, for example a
unit step activation function
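
Steps A to C can be sketched in a few lines of NumPy. The input, weight and bias values below are arbitrary illustrative numbers, and the function names are hypothetical; the bias is added to the weighted sum before the step function, as described in the list of parts above.

```python
import numpy as np

def unit_step(z):
    # Returns +1 if z is greater than or equal to 0, and -1 otherwise.
    return 1 if z >= 0 else -1

def perceptron_output(x, w, b):
    products = x * w                    # Step A: multiply inputs by weights
    weighted_sum = products.sum() + b   # Step B: add them up (plus the bias)
    return unit_step(weighted_sum)      # Step C: apply the activation function

x = np.array([1.0, 0.5])
w = np.array([0.4, -0.6])

# 1.0*0.4 + 0.5*(-0.6) = 0.1; the bias then decides the sign of the sum.
low = perceptron_output(x, w, b=-0.2)   # 0.1 - 0.2 < 0
high = perceptron_output(x, w, b=0.2)   # 0.1 + 0.2 >= 0
print(low, high)
```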



There are two types of architecture. These types focus on the functionality of artificial neural networks
as follows:

• Single Layer Perceptron


• Multi-Layer Perceptron

The single-layer perceptron was the first neural network model, proposed in 1958 by Frank Rosenblatt.
It is one of the earliest models for learning. Our goal is to find a linear decision function
parameterized by the weight vector w and the bias parameter b.

To understand the perceptron layer, it is necessary to comprehend artificial neural networks (ANNs).

The artificial neural network (ANN) is an information processing system, whose mechanism is inspired
by the functionality of biological neural circuits. An artificial neural network consists of several
processing units that are interconnected.

The single-layer perceptron was the first neural model to be proposed. The neuron's local memory
contains a vector of weights.

The output of a single-layer perceptron is computed by multiplying each element of the input vector by
the corresponding element of the weight vector and summing the products. The resulting value is the
input of an activation function.

Let us focus on the implementation of a single-layer perceptron for an image classification problem
using TensorFlow. A good way to illustrate a single-layer perceptron is through "logistic regression."


The necessary steps for training logistic regression are as follows:

• The weights are initialized with random values at the start of training
• For each element of the training set, the error is calculated as the difference between the desired
output and the actual output. The calculated error is used to adjust the weights
• The process is repeated until the error made on the entire training set is less than the specified limit
or until the maximum number of iterations has been reached
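The training steps listed above can be sketched without TensorFlow. The block below is an illustration in plain Python, not the course's own code: the function names, learning rate, error limit, and the small OR-gate data set are assumptions made for the example.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(samples, targets, learning_rate=0.5,
                              error_limit=0.05, max_iterations=5000, seed=0):
    rng = random.Random(seed)
    # Step 1: initialize the weights (and bias) with small random values.
    weights = [rng.uniform(-0.5, 0.5) for _ in range(len(samples[0]))]
    bias = rng.uniform(-0.5, 0.5)
    mean_error = float("inf")
    for _ in range(max_iterations):
        total_error = 0.0
        # Step 2: for each training element, compute the error and adjust the weights.
        for x, desired in zip(samples, targets):
            actual = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = desired - actual
            weights = [w + learning_rate * error * xi for w, xi in zip(weights, x)]
            bias += learning_rate * error
            total_error += abs(error)
        mean_error = total_error / len(samples)
        # Step 3: stop once the error over the whole training set is below the limit.
        if mean_error < error_limit:
            break
    return weights, bias, mean_error


# Hypothetical training set: the logical OR of two binary inputs.
or_samples = [[0, 0], [0, 1], [1, 0], [1, 1]]
or_targets = [0, 1, 1, 1]
weights, bias, mean_error = train_logistic_regression(or_samples, or_targets)
predictions = [round(sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias))
               for x in or_samples]
```

The loop ends either when the mean absolute error falls below `error_limit` or when `max_iterations` is reached, matching the two stopping conditions in the list.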

Complete code of Single layer perceptron
The output of the Code:

Logistic regression is a form of predictive analysis. It is mainly used to describe data and to explain
the relationship between one binary dependent variable and one or more nominal or other independent
variables.

IV.2.2 Hidden Layer Perceptron in TensorFlow

A hidden layer in an artificial neural network is a layer between the input layer and the output layer,
where the artificial neurons take in a set of weighted inputs and produce an output through an
activation function. It is a part of nearly every neural network in which engineers simulate the types of
activity that go on in the human brain.

Hidden layers can be set up in several ways. In many cases, the weights are initially assigned at
random and are then fine-tuned and calibrated through a process called backpropagation.

The artificial neuron in the hidden layer of a perceptron works like a biological neuron in the brain: it
takes in its probabilistic input signals, works on them, and converts them into an output corresponding
to the biological neuron's axon.

Layers after the input layer are called hidden because they are not directly exposed to the input. The
simplest network structure has a single neuron in the hidden layer that directly outputs the value.

Deep learning can refer to having many hidden layers in a neural network. Such networks would have
been unimaginably slow to train historically, but may take seconds or minutes to train using modern
techniques and hardware.

A single hidden layer will build a simple network.

The code for the hidden layers of the perceptron is shown below:
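As a minimal sketch of what a single hidden layer computes, here is a forward pass in plain Python rather than TensorFlow. The layer sizes, the sigmoid activation, and the hand-picked parameters are assumptions for illustration only; this is not the course's listing.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense_layer(inputs, weights, biases):
    """Each neuron takes the weighted inputs, adds its bias, and applies the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
            for neuron_weights, b in zip(weights, biases)]


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_biases):
    hidden = dense_layer(inputs, hidden_weights, hidden_biases)   # hidden layer
    return dense_layer(hidden, output_weights, output_biases)     # output layer


# Hypothetical parameters: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_weights = [[0.5, -0.5], [0.25, 0.75]]
hidden_biases = [0.0, 0.0]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]
prediction = forward([0.0, 0.0], hidden_weights, hidden_biases,
                     output_weights, output_biases)
```

In a real network these weights would start out random and be adjusted by backpropagation instead of being fixed by hand.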

IV.2.3 Artificial Neural Network in TensorFlow

Neural networks, or artificial neural networks (ANNs), are modeled on the human brain. The
human brain is able to think and to analyze any task in a particular situation.

But how can a machine think like that? For this purpose an artificial brain, known as a neural
network, was designed. A neural network is made up of many perceptrons.

Perceptron is a single layer neural network. It is a binary classifier and part of supervised learning. A
simple model of the biological neuron in an artificial neural network is known as the perceptron.

The artificial neuron has input and output.

The human brain has neurons for passing information; similarly, a neural network has nodes to
perform the same task. Nodes are mathematical functions. A neural network is based on the structure
and function of biological neural networks.

A neural network changes, or learns, based on its inputs and outputs. The information that flows
through the system affects the structure of the artificial neural network because of its ability to learn
and improve.

Figure: Representation of the perceptron model mathematically.

A Neural Network is also defined as:


A computing system made of several simple, highly interconnected processing elements, which process
information by their dynamic state response to external inputs.

A neural network can be built from multiple perceptrons arranged in three kinds of layers:

• Input layer: It holds the real values from the data
• Hidden layer: Hidden layers sit between the input and output layers; a network with three or more
layers is a deep network
• Output layer: It gives the final estimate of the output

IV.2.4 Types of Artificial Neural Network

A neural network works in a way similar to the human nervous system. There are several types of
neural network. Their implementations are based on the set of parameters and mathematical
operations required for determining the output.


IV.2.5 Feedforward Neural Network (Artificial Neuron)

The FNN is the simplest form of ANN, in which the input data travels in only one direction. Data flows
only forward; that is why it is known as the Feedforward Neural Network. The data passes through the
input nodes and exits from the output nodes. The nodes are not connected cyclically. An FNN does not
need a hidden layer or multiple layers; it may also have a single layer.

It has a front-propagated wave that is achieved by using a classifying activation function. Information
moves only forward when the network computes its output; a multilayer FNN is nevertheless usually
trained with backpropagation, which passes error gradients backward during training only. In an FNN,
the sum of the products of the inputs and weights is calculated and then fed to the output. Technologies
such as face recognition and computer vision use FNNs.

IV.2.6 Radial Basis Function Neural Network

An RBFNN works with the distance of a point from a centre. There are two layers in the RBF Neural
Network. In the inner layer, the features are combined with the radial basis function. The output of
these features is then taken into account when the final output is computed. Measures other than the
Euclidean distance can also be used.

Radial Basis Function

• We define a receptor t
• Confronted maps are drawn around the receptor
• Gaussian functions are generally used for the RBF, so we can define the radial distance $r = \lVert X - t \rVert$

Radial function: $\Phi(r) = \exp\left(-\dfrac{r^2}{2\sigma^2}\right)$, where $\sigma > 0$
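The Gaussian radial function can be evaluated directly. The sketch below assumes the Euclidean distance for r and uses illustrative names (`gaussian_rbf`, `point`, `receptor`) that are not part of the original material.

```python
import math


def gaussian_rbf(point, receptor, sigma=1.0):
    """Phi(r) = exp(-r^2 / (2 * sigma^2)), with r the Euclidean distance to the receptor t."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    r = math.dist(point, receptor)  # r = ||X - t||
    return math.exp(-(r ** 2) / (2 * sigma ** 2))


at_receptor = gaussian_rbf([1.0, 2.0], [1.0, 2.0])           # distance 0
far_away = gaussian_rbf([3.0, 4.0], [0.0, 0.0], sigma=5.0)   # distance 5
```

The response is largest when the point coincides with the receptor and decays as the point moves away, with σ controlling how quickly it falls off.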

This neural network is used in power restoration systems. Power systems have increased in size and
complexity, and both factors increase the risk of major power outages. Power needs to be restored as
quickly and reliably as possible after a blackout.

IV.2.7 Multilayer Perceptron

A Multilayer Perceptron has three or more layers. It is used to classify data that cannot be separated
linearly. It is a fully connected network, which means every node in one layer is connected to all the
nodes in the next layer. A nonlinear activation function is used in the Multilayer Perceptron. Its input
and output layer nodes are connected as a directed graph. It is a deep learning method that uses
backpropagation to train the network. It is extensively applied in speech recognition and machine
translation technologies.

IV.2.8 Convolutional Neural Network

A Convolutional Neural Network plays a vital role in image classification and image recognition; it is
the main category of network for those tasks. Face recognition, object detection, etc., are some areas
where CNNs are widely used. Like an FNN, a CNN has neurons with learnable weights and biases.

A CNN takes an image as input, processes it, and classifies it under a certain category such as dog, cat,
lion, tiger, etc. The computer sees an image as an array of pixels whose size depends on the resolution
of the picture. Based on the image resolution, it will see h * w * d, where h = height, w = width, and
d = dimension (the number of channels). For example, an RGB image is a 6 * 6 * 3 array, and a
grayscale image is a 4 * 4 * 1 array.

In a CNN, each input image passes through a sequence of convolution layers with filters (also known
as kernels), pooling layers, and fully connected layers. A softmax function is then applied to classify
the object with probabilistic values between 0 and 1.
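To illustrate the final softmax step, the sketch below turns raw class scores into probabilities. The scores are hypothetical values standing in for the output of the last fully connected layer, and subtracting the maximum is a common numerical-stability choice rather than something stated in the text.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that are positive and sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift by the max for stability
    total = sum(exps)
    return [e / total for e in exps]


class_names = ["dog", "cat", "lion", "tiger"]
scores = [2.0, 1.0, 0.1, -1.0]   # hypothetical raw outputs of the last layer
probabilities = softmax(scores)
predicted = class_names[probabilities.index(max(probabilities))]
```

The class with the highest score also receives the highest probability, so `predicted` is the label the network would report.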

IV.2.9 Recurrent Neural Network

A Recurrent Neural Network is based on prediction. In this neural network, the output of a particular
layer is saved and fed back to the input, which helps to predict the outcome of the layer. The first layer
is formed in the same way as an FNN's layer, and the recurrent process begins in the subsequent layers.

In a traditional network, inputs and outputs are independent of each other, but in some cases, such as
predicting the next word of a sentence, the output depends on the previous words of the sentence. The
RNN is known for its primary and most important feature, the hidden state, which remembers
information about a sequence.

An RNN has a memory that stores the result of each calculation. It uses the same parameters on each
input, performing the same task on all the inputs or hidden layers to produce the output. As a result,
its parameter complexity is lower than that of other neural networks.
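The role of the hidden state and of the shared parameters can be seen in a deliberately tiny sketch. It assumes a single scalar input and a single scalar hidden state with a tanh activation; the weights are hypothetical and real RNNs use weight matrices instead of scalars.

```python
import math


def rnn_step(x, hidden, w_input, w_hidden, bias):
    """One recurrent step: the new hidden state depends on the input and the old state."""
    return math.tanh(w_input * x + w_hidden * hidden + bias)


def run_rnn(sequence, w_input=0.5, w_hidden=0.8, bias=0.0):
    hidden = 0.0                      # the hidden state starts at zero
    states = []
    for x in sequence:                # the same parameters are reused at every step
        hidden = rnn_step(x, hidden, w_input, w_hidden, bias)
        states.append(hidden)
    return states


states_with_signal = run_rnn([1.0, 0.0, 0.0])
states_without_signal = run_rnn([0.0, 0.0, 0.0])
```

Although the second and third inputs are identical in both sequences, the hidden states differ, because the first sequence still "remembers" its initial input.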

IV.2.10 Modular Neural Network

A Modular Neural Network consists of several different networks that are functionally independent.
The task is divided into sub-tasks that are performed by the separate networks. During the
computation, the networks do not communicate directly with each other; they all work independently
towards the output. Combined networks are more powerful than flat, unrestricted ones. An
intermediary takes the output of each network and processes it to produce the final output.

IV.2.11 Sequence to Sequence Network

It consists of two recurrent neural networks: an encoder that processes the input and a decoder that
produces the output. The encoder and decoder can use either the same or different parameters.

Sequence-to-sequence models are applied in chatbots, machine translation, and question answering
systems.

IV.2.12 Components of an Artificial Neural Network

Neurons

Artificial neurons are similar to biological neurons. In this context, a neuron is essentially its activation
function. An artificial neuron, or activation function, has a "switch on" characteristic when it performs
a classification task: when the input is higher than a specific value, the output changes state, e.g., from
0 to 1 or from -1 to 1. The sigmoid function is a commonly used activation function in artificial neural
networks.

$$F(z) = \frac{1}{1 + \exp(-z)}$$


Nodes

Biological neurons are connected in hierarchical networks, with the output of some neurons being the
input to others. These networks are represented as connected layers of nodes. Each node takes
multiple weighted inputs, applies the activation function to the sum of these inputs, and generates an
output.
Bias

In a neural network, we predict the output (y) from the given input (x). We create a model, e.g.
y = mx + c, which helps us to predict the output. When we train the model, it finds appropriate values
of the constants m and c by itself.

The constant c is the bias. The bias helps the model fit the given data as well as possible; we can say
the bias gives the model the freedom to perform at its best.

Algorithm

Algorithms are required in the neural network. Biological neurons have self-understanding and
working capability, but how an artificial neuron will work in the same way? For this, it is necessary to
train our artificial neuron network. For this purpose, there are lots of algorithms used. Each algorithm
has a different way of working.

IV.3 Classification of Neural Network in TensorFlow
Artificial neural networks are computational models inspired by biological neural networks. They are
composed of a large number of highly interconnected processing elements called neurons.

An ANN (Artificial Neural network) is configured for a specific application, such as pattern recognition
or data classification.

It can derive meaning from complicated or imprecise data.

It extracts patterns and detects trends that are too complex to be noticed by either humans or other
computer techniques.


Transfer Function

The behavior of ANN (Artificial Neural Network) depends on both the weights and the input-output
function, which is specified for the unit. This function falls into one of these three categories:

• Linear (or ramp)


• Threshold
• Sigmoid

Linear units: The output activity is proportional to the total weighted input.

Threshold: The output is set at one of two levels, depending on whether the total input is greater than
or less than some threshold value.

Sigmoid units: The output varies continuously but not linearly as the input changes. Sigmoid units bear
a more considerable resemblance to real neurons than do linear or threshold units, but all three must
be considered rough approximations.
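The three categories of transfer function can be compared side by side. The sketch below is illustrative only: the function names, the default slope of 1, and the default threshold of 0 are assumptions, not values from the text.

```python
import math


def linear_unit(total_input, slope=1.0):
    """Output proportional to the total weighted input."""
    return slope * total_input


def threshold_unit(total_input, threshold=0.0):
    """Output at one of two levels, depending on which side of the threshold the input lies."""
    return 1.0 if total_input > threshold else 0.0


def sigmoid_unit(total_input):
    """Output varies continuously, but not linearly, with the input."""
    return 1.0 / (1.0 + math.exp(-total_input))


inputs = [-2.0, 0.5, 2.0]
responses = [(linear_unit(v), threshold_unit(v), sigmoid_unit(v)) for v in inputs]
```

For the same inputs, the linear unit keeps growing, the threshold unit jumps between its two levels, and the sigmoid unit moves smoothly between 0 and 1.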

Below is the code by which we classify the neural network.

First, we define an activation function so that we can plot it: the sigmoid function, a simple activation
function that takes in z and returns the sigmoid of z.
Then, we create the sigmoid operation. Next, let us look at a classification example. scikit-learn has
helpful functions for creating a data set for us, so we set our data equal to the result of make blobs,
which creates a couple of blobs that we can classify. We create 50 samples and set the number of
features so that we get two blobs, which makes this a binary classification problem.

Now, we create a scatter plot of the features, starting with all the rows in column 0. The scatter plot
shows two distinct blobs, so we should be able to classify these two highly separable classes.
Here, we build a one-by-two matrix of ones and pass the result into our sigmoid function as z. The
sigmoid outputs a value between 0 and 1, which lets us classify a point based on whether z is positive
or negative.

The more positive the input, the more sure our model is that the point belongs to class one.

So now we were able to use our graph objects, placeholders, variables, activation functions, and
session to perform a very simple classification. Now that we know how to do this manually, learning
TensorFlow and performing all of these essential functions with it should be a lot easier.

IV.4 Linear Regression in TensorFlow


Linear regression is a machine learning algorithm based on supervised learning. It performs a
regression task: it models a target prediction value based on the independent variables. It is mostly
used to find the relationship between variables and for forecasting.

Linear regression is a linear model, that is, a model that assumes a linear relationship between
an input variable (x) and a single output variable (y). In particular, y can be calculated from a linear
combination of the input variables (x).

Linear regression is a widely used statistical method that allows us to learn a function or relation from
a set of continuous data. For example, we are given some data points x and the corresponding values y,
and we need to learn the relationship between them, which is called the hypothesis.

In the case of linear regression, the hypothesis is a straight line, that is,

h (x) = wx + b

where w is a vector called the weight and b is a scalar called the bias. The weight and bias are called
the parameters of the model.

We need to estimate the values of w and b from the data set such that the resulting hypothesis
produces the least cost J, which is defined by the cost function below.

where m is the number of data points in the dataset.

This cost function is called the Mean Squared Error.
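For reference, one common way to write the mean squared error for the hypothesis h(x) = wx + b is given below. The 1/(2m) scaling is an assumption; some texts use 1/m instead, which changes only the scale of the cost and not the location of its minimum.

```latex
J(w, b) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h(x_i) - y_i \bigr)^2
```

Here $x_i$ and $y_i$ denote the i-th input and its corresponding target value.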

To find the parameters for which the value of J is minimal, we will use a commonly used
optimization algorithm called gradient descent. The following is pseudocode for gradient descent:
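A minimal version of gradient descent for the straight-line hypothesis can be written in plain Python. This sketch assumes a mean squared error cost with 1/(2m) scaling, starts both parameters at zero, and uses a made-up data set that lies exactly on y = 2x + 1; none of these choices come from the original material.

```python
def mse_cost(w, b, xs, ys):
    """Mean squared error of h(x) = w*x + b, assuming the 1/(2m) scaling."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, learning_rate=0.05, epochs=2000):
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(epochs):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dw
        grad_b = sum(errors) / m                               # dJ/db
        w -= learning_rate * grad_w    # step against the gradient
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = gradient_descent(xs, ys)
final_cost = mse_cost(w, b, xs, ys)
```

Each epoch moves w and b a small step in the direction that reduces the cost, so on this data they approach 2 and 1 respectively.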

Implementation of Linear Regression

We start by importing the necessary libraries. We will use NumPy with TensorFlow for
computation and Matplotlib for plotting.

First, we have to import the packages:


To make the random numbers predictable, we define fixed seeds for both TensorFlow and NumPy.

Now, we have to generate some random data for training the Linear Regression Model.

Let us visualize the training data. Output


Now, we will start building our model by defining placeholders x and y, so that we can feed the training
examples x and y into the optimizer during the training process.

Now, we can declare two trainable TensorFlow variables for the weight and the bias, initializing them
randomly using the method:

Now we define the hyperparameters of the model: the learning rate and the number of epochs.

Now, we will build the hypothesis, the cost function, and the optimizer. We will not implement the
gradient descent optimizer manually because it is built into TensorFlow. After that, we will initialize
the variables.

Now we start the training process inside the TensorFlow Session.

Output is given below:
Epoch: 50 cost = 5.8868037 W = 0.9951241 b = 1.2381057
Epoch: 100 cost = 5.7912708 W = 0.9981236 b = 1.0914398
Epoch: 150 cost = 5.7119676 W = 1.0008028 b = 0.96044315
Epoch: 200 cost = 5.6459414 W = 1.0031956 b = 0.8434396
Epoch: 250 cost = 5.590798 W = 1.0053328 b = 0.7389358
Epoch: 300 cost = 5.544609 W = 1.007242 b = 0.6455922
Epoch: 350 cost = 5.5057884 W = 1.008947 b = 0.56223
Epoch: 400 cost = 5.473068 W = 1.01047 b = 0.46775345
Epoch: 450 cost = 5.453845 W = 1.0118302 b = 0.42124168
Epoch: 500 cost = 5.421907 W = 1.0130452 b = 0.36183489
Epoch: 550 cost = 5.4019218 W = 1.0141305 b = 0.30877414
Epoch: 600 cost = 5.3848578 W = 1.0150996 b = 0.26138115
Epoch: 650 cost = 5.370247 W = 1.0159653 b = 0.21905092
Epoch: 700 cost = 5.3576995 W = 1.0167387 b = 0.18124212
Epoch: 750 cost = 5.3468934 W = 1.0174294 b = 0.14747245
Epoch: 800 cost = 5.3375574 W = 1.0180461 b = 0.11730932
Epoch: 850 cost = 5.3294765 W = 1.0185971 b = 0.090368526
Epoch: 900 cost = 5.322459 W = 1.0190894 b = 0.0663058
Epoch: 950 cost = 5.3163588 W = 1.0195289 b = 0.044813324
Epoch: 1000 cost = 5.3110332 W = 1.0199218 b = 0.02561669

Now, see the result.

Output
Training cost= 5.3110332 Weight= 1.0199214 bias=0.02561663

Note that in this case both the weight and the bias are scalars. This is because we have considered only
one independent variable in our training data. If there are m independent variables in our training
dataset, the weight will be an m-dimensional vector while the bias will still be a scalar.

Finally, we plot our result: Output

V. Keras Basis
Keras is an open-source, high-level neural network library written in Python that can run on top of
Theano, TensorFlow, or CNTK. It was developed by François Chollet, a Google engineer. It is designed
to be user-friendly, extensible, and modular in order to facilitate faster experimentation with deep
neural networks. It supports Convolutional Networks and Recurrent Networks individually as well as
in combination.

Keras does not handle low-level computations itself; it relies on a backend library for them. Keras
acts as a high-level API wrapper around the backend's low-level API, which lets it run on TensorFlow,
CNTK, or Theano.
V.1 Keras Layers
Focus on user experience has always been a major part of Keras.

• Large adoption in the industry


• It supports multiple backends and multiple platforms, which helps coders come together
• The Keras research community works well with the production community
• Its concepts are easy to grasp
• It supports fast prototyping
• It runs seamlessly on CPU as well as GPU
• It provides the freedom to design any architecture, which can later be used as an API for the
project
• It is very simple to get started with
• Easy production of models is what makes Keras special

Keras, being a model-level library, helps in developing deep learning models by offering high-level
building blocks. Low-level computations such as tensor products and convolutions are not handled
by Keras itself; they are delegated to a specialized, well-optimized tensor manipulation library that
serves as a backend engine. Rather than being tied to a single tensor library and its operations, Keras
allows different backend engines to be plugged in.

Keras supports three backend engines, which are as follows:

TensorFlow

TensorFlow is a Google product and one of the most famous deep learning tools, widely used in
machine learning and deep neural network research. It was released on 9 November 2015 under the
Apache License 2.0. It is built so that it can easily run on multiple CPUs and GPUs as well as on mobile
operating systems. It provides wrappers in several languages, such as Java, C++, and Python.

Theano

Theano was developed at the University of Montreal, Quebec, Canada, by the MILA group. It is an
open-source Python library that is widely used for performing mathematical operations on
multi-dimensional arrays, and it integrates with SciPy and NumPy. It uses GPUs for faster computation
and computes gradients efficiently by building symbolic graphs automatically. It is well suited to
numerically unstable expressions, as it first detects them and then computes them with more stable
algorithms.

CNTK

The Microsoft Cognitive Toolkit (CNTK) is an open-source deep learning framework. It provides all
the basic building blocks required to form a neural network. Models are trained using C++ or Python,
and C# or Java can be used to load a model for making predictions.

V.1.2 Keras Convolution Neural Network Layers and Working

We widely use Convolution Neural Networks for computer vision and image classification tasks.
The Convolution Neural Network architecture generally consists of two parts. The first part is
the feature extractor, which we form from a series of convolution and pooling layers. The second
part includes fully connected layers, which act as classifiers.

In this section, we will study how to use Convolution Neural Networks for image classification tasks.
We will walk through a few examples to show the code for the implementation of Convolution
Neural Networks in Keras.

The convolution neural network algorithm is the result of continuous advancements in computer
vision with deep learning. A CNN is a deep learning algorithm that is able to assign importance to
various objects in an image and to differentiate between them.

A CNN can learn these characteristics and perform classification. An input image has many
spatial and temporal dependencies, and a CNN captures them using relevant filters (kernels). A
kernel, or filter, is an element of the CNN that performs convolution over the image in the first part of
the network. The kernel moves to the right and then shifts down according to the stride value. At every
position, the kernel is multiplied element-wise with the part of the image it covers, and the products
are summed.

After convolution, we obtain another image with a different height, width, and depth. We obtain more
channels than just RGB, but less width and height. We slide each filter over the image step by
step; the size of this step in the forward pass is called the stride.

V.1.3 Keras Convolution layer

It is the first layer to extract features from the input image. Here we define the kernel as the layer
parameter. We slide the kernel over the input image and, at each position, multiply it element-wise
with the covered pixels and sum the products.
Example:
Suppose a 3*3 image and a 2*2 filter as shown:
pixel : [[1,0,1],
         [0,1,0],
         [1,0,1]]
filter : [[1,0],
          [0,1]]
The resultant matrix after convolution with the filter would be:
[[2,0],
 [0,2]]
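The worked example can be reproduced with a short sketch in plain Python. The function name `convolve2d_valid` is hypothetical; the code assumes no padding and a stride of 1, and, like most deep learning libraries, it slides the kernel without flipping it.

```python
def convolve2d_valid(image, kernel, stride=1):
    """Slide the kernel over the image; at each position multiply element-wise and sum."""
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = (len(image) - k_rows) // stride + 1
    out_cols = (len(image[0]) - k_cols) // stride + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            total = 0
            for a in range(k_rows):
                for b in range(k_cols):
                    total += image[i * stride + a][j * stride + b] * kernel[a][b]
            row.append(total)
        output.append(row)
    return output


pixel = [[1, 0, 1],
         [0, 1, 0],
         [1, 0, 1]]
kernel = [[1, 0],
          [0, 1]]
feature_map = convolve2d_valid(pixel, kernel)   # [[2, 0], [0, 2]]
```

A 3*3 input and a 2*2 kernel with stride 1 give a 2*2 output, which is why the result is smaller than the input.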

V.1.4 Keras Pooling Layer

After convolution, we perform pooling to reduce the number of parameters and computations. There
are different types of pooling operations; the most common ones are max pooling and average pooling.
Example:

Take a sample case of max pooling with a 2*2 filter and stride 2.
Image pixels:
[[1,2,3,4],
 [5,6,7,8],
 [3,4,5,6],
 [6,7,8,9]]
The resultant matrix after max-pooling would be:
[[6,8],
 [7,9]]
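The max-pooling example can be checked with a few lines of plain Python. The function name `max_pool2d` is an illustrative choice, and the sketch assumes a square window with no padding.

```python
def max_pool2d(image, size=2, stride=2):
    """Keep the largest value in each size*size window, moving by `stride` each time."""
    out_rows = (len(image) - size) // stride + 1
    out_cols = (len(image[0]) - size) // stride + 1
    return [[max(image[i * stride + a][j * stride + b]
                 for a in range(size) for b in range(size))
             for j in range(out_cols)]
            for i in range(out_rows)]


pixels = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [3, 4, 5, 6],
          [6, 7, 8, 9]]
pooled = max_pool2d(pixels)   # [[6, 8], [7, 9]]
```

Each 2*2 block of the input is reduced to a single number, so the 4*4 input becomes a 2*2 output.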

V.1.5 Keras Dropout Layer

It is used to prevent the network from overfitting. In this layer, a random fraction of the units in the
network is dropped at each training step, so that over the course of training all the units are trained
without the model relying too heavily on any one of them.
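The idea can be sketched as follows. This is a simplified illustration of so-called inverted dropout, where the kept activations are scaled up during training so that nothing needs to change at inference time; the function name and the `rng` parameter are assumptions for the example, not the Keras API.

```python
import random


def dropout(activations, rate, training=True, rng=random):
    """Zero a random fraction `rate` of the activations during training only."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(activations)      # at inference time every unit is kept
    keep_probability = 1.0 - rate
    # Kept units are scaled by 1 / keep_probability so the expected sum is unchanged.
    return [a / keep_probability if rng.random() < keep_probability else 0.0
            for a in activations]


dropped = dropout([1.0] * 1000, rate=0.5, rng=random.Random(0))
kept_at_inference = dropout([1.0, 2.0, 3.0], rate=0.5, training=False)
```

With a rate of 0.5, roughly half of the units are set to zero on each call, and a different random half is chosen every time.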

A series of convolution and pooling layers are used for feature extraction. After that, we
construct densely connected layers to perform classification based on these features.

V.1.6 Keras Flatten Layer

It is used to convert the data into 1D arrays to create a single feature vector. After flattening we
forward the data to a fully connected layer for final classification.
V.1.7 Keras Dense Layer

It is a fully connected layer. Each node in this layer is connected to every node in the previous layer,
i.e. it is densely connected. This layer is used at the final stage of a CNN to perform classification.

V.1.8 Implementing CNN on CIFAR 10 Dataset

CIFAR 10 dataset consists of 10 image classes. The available image classes are:

• Car
• Airplane
• Bird
• Cat
• Deer
• Dog
• Frog
• Horse
• Ship
• Truck

This is one of the most popular datasets, allowing researchers to practice different algorithms for
object recognition. Convolution Neural Networks have shown the best results in solving the CIFAR-10
problem.

Let’s build our Convolution model to recognize CIFAR-10 classes.

1. Load the dataset from keras datasets module

2. Visualize the dataset

3. The dataset looks like

4. Normalizing inputs

5. One hot encoding

6. Build the model

7. Model Compiling

8. Analyzing Model Summary

9. Train the model and check its accuracy on test data

10. Evaluate the model
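Two of the preprocessing steps above, normalizing inputs and one hot encoding, can be illustrated in plain Python without loading the dataset. This is only a sketch of the idea: the function names are hypothetical, and it assumes 8-bit pixel values and integer class labels from 0 to 9.

```python
def normalize_pixels(image):
    """Scale 8-bit pixel values from the range 0..255 to the range 0..1."""
    return [[value / 255.0 for value in row] for row in image]


def one_hot(label, num_classes=10):
    """Turn an integer class label into a vector with a single 1."""
    if not 0 <= label < num_classes:
        raise ValueError("label out of range")
    vector = [0] * num_classes
    vector[label] = 1
    return vector


tiny_image = [[0, 255], [51, 102]]     # hypothetical 2*2 single-channel image
normalized = normalize_pixels(tiny_image)
encoded = one_hot(3)                   # label 3 of the 10 classes
```

Normalization keeps all inputs on the same small scale, and one hot encoding gives the network one output position per class to compare against.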

V.1.9 Implementing CNN on Fashion MNIST Dataset

The Fashion MNIST dataset consists of a training set of 60000 images and a testing set of 10000
images. There are 10 image classes in this dataset and each class has a mapping corresponding to the
following labels:

• T-shirt/top
• Trouser
• Pullover
• Dress
• Coat
• Sandals
• Shirt
• Sneaker
• Bag
• Ankle boot

Let’s build our CNN model on this dataset.

1. Import required modules

2. Load the dataset

3. Reshaping and one hot encoding

4. Visualize the dataset using matplotlib

5. Normalizing data

6. Build the model

7. Training our model

8. Evaluate our Model Performance

V.2 Deep Learning with Keras Implementation and Example
In this Keras section, we will walk through deep learning with Keras and an important deep learning
algorithm used with Keras. We will study the applications of this algorithm and also its implementation
in Keras.

Deep learning is a subset of machine learning concerned with algorithms inspired by the
architecture of the brain. In the last decade, there have been many major developments to support
deep learning research. Keras is the result of one of these developments; it allows us to
define and create neural network models in a few lines of code.

There has been a boom in research on deep learning algorithms. Keras makes it easy for users
to create these algorithms.

Before we begin with deep learning in TensorFlow Keras, Keras should be installed.

Some of the popular algorithms in deep learning are listed below:

• Auto-Encoders
• Convolution Neural Nets
• Recurrent Neural Nets
• Long Short Term Memory Nets
• Deep Boltzmann Machine(DBM)
• Deep Belief Nets(DBN)

Implementations exist for convolution neural nets, recurrent neural nets, and LSTM.

Here we will take a tour of the Auto-Encoder algorithm of deep learning.


Auto-Encoders

These neural networks are able to compress the input data and reconstruct it again. They are among
the oldest deep learning algorithms. An autoencoder encodes the input up to a bottleneck layer and
then decodes it to get the input back. At the bottleneck layer, we get a compressed form of the input.

Anomaly detection and image denoising are two of the major applications of Auto-Encoders.

Types of Auto-Encoders

There are seven types of deep learning auto encoders as mentioned below:

• Denoising autoencoders
• Deep autoencoders
• Sparse autoencoders
• Contractive autoencoders
• Convolutional autoencoders
• Variational autoencoders
• Undercomplete autoencoders

For our study, we will create a Denoising autoencoder.

Implementation of Denoising Auto-encoder in Keras

For the purpose of its implementation in Keras, we will work on MNIST handwritten digit dataset.

First, we will introduce some noise into the MNIST images. Then we will create an Auto-Encoder for
removing the noise from the images and reconstructing the original images.

1. Import required modules

2. Load MNIST images from datasets module of keras

3. Convert dataset in range of 0 to 1

4. Introducing noise in MNIST images using Gaussian distribution

5. Visualize the noise introduced

6. Specify input layer and create model

7. Encoded is the bottleneck layer and consists of a compressed form of images

8. Train the autoencoder

9. Get prediction on noisy data

10. Again visualize the reconstructed images
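The noise step in the list above can be illustrated in plain Python. The sketch assumes pixel values already scaled to the range 0 to 1, a hypothetical `noise_factor` that controls the noise strength, and clipping so that noisy pixels stay in that range; it is not the course's own listing.

```python
import random


def add_gaussian_noise(image, noise_factor=0.5, rng=random):
    """Add zero-mean Gaussian noise to every pixel, then clip back into the range 0..1."""
    noisy = []
    for row in image:
        noisy_row = []
        for value in row:
            value = value + noise_factor * rng.gauss(0.0, 1.0)
            noisy_row.append(min(1.0, max(0.0, value)))
        noisy.append(noisy_row)
    return noisy


clean_image = [[0.0, 0.5], [1.0, 0.25]]     # hypothetical 2*2 image
noisy_image = add_gaussian_noise(clean_image, rng=random.Random(1))
unchanged = add_gaussian_noise([[0.2, 0.8]], noise_factor=0.0)
```

The autoencoder is then trained with the noisy images as input and the clean images as the target, so that it learns to undo the noise.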

You can see that our Auto-Encoder is able to reconstruct the images and remove their noise. We will
get better quality if we increase the number of training epochs.

To conclude, we have seen deep learning with Keras through an implementation and an example. This
section covered the Keras library and its support for deploying major deep learning algorithms. It also
introduced Auto-Encoders, their different types, their applications, and their implementation, and
explained how to build a neural network for removing noise from our data.

V.3 Keras Vs Tensorflow – Difference Between Keras and Tensorflow
Keras and TensorFlow are two very popular deep learning frameworks and are the ones most widely
used by deep learning practitioners. Both have large community support and account for a major
fraction of deep learning in production.

There are some differences between Keras and TensorFlow that will help you choose between the
two. The points below compare the two frameworks to help you find which one is more suitable for
you.

Complexity

Keras allows you to develop models without worrying about backend details, whereas in TensorFlow you have to deal with computation details in the form of tensors and graphs.

This makes Keras more comfortable to work with and less complex than TensorFlow.

Easy to Use API

Keras is a high-level API. Multi-backend Keras (up to version 2.3) can use TensorFlow, Theano, or CNTK as its backend engine.

TensorFlow provides both high-level and low-level APIs. It is a math library that uses dataflow programming for a wide variety of tasks.

If you are looking for a neural network tool that is easy to use and has a simple syntax, you will find Keras more favorable.

Fast Development

If you want to quickly build and test your deep learning models, choose Keras. Using Keras, you can create your models with very few lines of code and within a few minutes. Keras provides two APIs to write your neural network. These are:

• Model (functional API)
• Sequential

Sequential covers plain stacks of layers, while the functional API also handles more complex topologies such as models with multiple inputs or outputs.
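
To make the two APIs concrete, the sketch below builds the same small classifier twice, once with Sequential and once with the functional API. It assumes TensorFlow 2.x; the layer sizes and variable names are illustrative choices only.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sequential API: a plain stack of layers.
sequential_model = keras.Sequential([
    keras.Input(shape=(784,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Functional API: layers are called on tensors, then wrapped in a Model.
inputs = keras.Input(shape=(784,))
hidden = layers.Dense(64, activation="relu")(inputs)
outputs = layers.Dense(10, activation="softmax")(hidden)
functional_model = keras.Model(inputs=inputs, outputs=outputs)

# Both definitions describe the same architecture.
same_size = sequential_model.count_params() == functional_model.count_params()
```

The functional form takes a few more lines here, but it is the one that extends to branches, shared layers, and multiple inputs or outputs.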

Performance

Keras is not directly responsible for the backend computation. It provides an abstraction over its backend engine and calls that backend to perform the underlying computations and training. This extra layer can make Keras somewhat slower.

TensorFlow, on the other hand, is a symbolic math library with a more complex architecture that is built around efficient computation. Hence, TensorFlow is generally fast and provides high performance.

Functionality and Flexibility

TensorFlow gives you more flexibility, more control over your network, and advanced features for the creation of complex topologies. Therefore, if you want to define your own cost function, metric, or layer, or if you want to perform operations on weights or gradients, choose TensorFlow.
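
As an illustration of this lower-level control, the sketch below defines a custom cost function and manipulates the gradients directly with tf.GradientTape. It assumes TensorFlow 2.x with eager execution. The loss, the toy data, and the clipping range are hypothetical choices made for the example.

```python
import tensorflow as tf

def custom_loss(y_true, y_pred):
    # Hypothetical cost: mean squared error plus a small absolute-error term.
    err = y_true - y_pred
    return tf.reduce_mean(tf.square(err)) + 0.01 * tf.reduce_mean(tf.abs(err))

# Toy linear model y = w * x + b with trainable variables.
w = tf.Variable(0.0)
b = tf.Variable(0.0)
x = tf.constant([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

learning_rate = 0.05
losses = []
for _ in range(200):
    with tf.GradientTape() as tape:
        loss = custom_loss(y, w * x + b)
    grad_w, grad_b = tape.gradient(loss, [w, b])
    # Direct access to the gradients: clip them before the update.
    grad_w = tf.clip_by_value(grad_w, -1.0, 1.0)
    grad_b = tf.clip_by_value(grad_b, -1.0, 1.0)
    w.assign_sub(learning_rate * grad_w)
    b.assign_sub(learning_rate * grad_b)
    losses.append(float(loss))
```

Keras can also accept a custom loss through model.compile(); the manual loop is shown only because it exposes the gradients themselves.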

Dataset

We prefer Keras if the dataset is relatively small or medium-sized. If the dataset is large, we prefer TensorFlow because of its lower overhead. TensorFlow also provides a finer level of control, so we have more options for handling large datasets.

TensorFlow provides more built-in datasets than Keras. It covers the datasets that are available in Keras, and the TensorFlow Datasets package (tensorflow_datasets) offers a wide range of datasets classified under the following headings:

Audio, image, image classification, object detection, question answering, structured, summarization, text, translate, and video.

The datasets in Keras are found in the keras.datasets module.

Debug

Debugging TensorFlow code is very difficult. In general, debugging is done with the TensorFlow debugger from the command line. We start by wrapping the TensorFlow session with tf_debug.LocalCLIDebugWrapperSession(session) and then execute the file with the necessary debug flags (this session-based workflow belongs to TensorFlow 1.x).

Keras is high level and does not deal with backend computation, so debugging is easier. We can also check the output of each layer in Keras using keras.backend.function().
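
One way to inspect a layer's output is sketched below. Note that it does not use keras.backend.function(), which may not be available in newer Keras releases. Instead it builds a second model that ends at the layer of interest. It assumes TensorFlow 2.x, and the tiny model and layer names are hypothetical.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# A small functional model with named layers (illustrative only).
inputs = keras.Input(shape=(4,))
hidden = layers.Dense(3, activation="relu", name="hidden")(inputs)
outputs = layers.Dense(1, name="out")(hidden)
model = keras.Model(inputs, outputs)

# A probe model that shares the same layers but stops at "hidden".
probe = keras.Model(
    inputs=model.input,
    outputs=model.get_layer("hidden").output,
)

x = np.ones((2, 4), dtype="float32")
hidden_values = probe.predict(x, verbose=0)  # activations of the hidden layer
```

Because the probe shares weights with the original model, it reflects the current state of training whenever it is called.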

Popularity

Keras has 48.7k stars and 18.4k forks on GitHub, whereas TensorFlow has 146k stars and 81.7k forks.

Since both Keras and TensorFlow were released in 2015, these figures suggest that TensorFlow has the larger developer community.

Apart from the factors above, you should be aware that TensorFlow also provides support for Keras. TensorFlow provides the tf.keras submodule, which allows you to drop TensorFlow code directly into Keras models. Using tf.keras, you get the features of both Keras and TensorFlow, that is, the best of both worlds.

The code below shows how to use tf.keras to create your models:
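
A minimal sketch of this, assuming TensorFlow 2.x: a tf.keras model that contains a custom layer written with plain TensorFlow operations. The ScaledTanh layer, the layer sizes, and the random data are hypothetical examples, not part of the library or of this course.

```python
import numpy as np
import tensorflow as tf

class ScaledTanh(tf.keras.layers.Layer):
    """Hypothetical custom layer that applies plain TensorFlow ops."""

    def __init__(self, scale=2.0, **kwargs):
        super().__init__(**kwargs)
        self.scale = scale

    def call(self, inputs):
        # TensorFlow code dropped directly into a Keras model.
        return self.scale * tf.math.tanh(inputs)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(4),
    ScaledTanh(scale=2.0),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# Random stand-in data (assumption): learn the sum of the 8 input features.
x = np.random.default_rng(0).random((16, 8)).astype("float32")
y = x.sum(axis=1, keepdims=True)

model.fit(x, y, epochs=1, verbose=0)
pred = model.predict(x, verbose=0)
```

The model is still compiled, trained, and queried through the usual Keras methods, while the custom layer has full access to TensorFlow operations.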

VI. References
Papers

[1] L. Xu, F. Cai, Y. Hu, Z. Lin, and Q. Liu, “Using deep learning algorithms to perform accurate
spectral classification,” Optik, vol. 231, p. 166423, Apr. 2021, doi: 10.1016/j.ijleo.2021.166423.

[2] J. Gordon and J. M. Hernández-Lobato, “Combining deep generative and discriminative


models for Bayesian semi-supervised learning,” Pattern Recognit., vol. 100, p. 107156, Apr. 2020, doi:
10.1016/j.patcog.2019.107156.

[3] L. Zeng et al., “Deep learning trained algorithm maintains the quality of half-dose contrast-
enhanced liver computed tomography images: Comparison with hybrid iterative reconstruction: Study
for the application of deep learning noise reduction technology in low dose,” Eur. J. Radiol., vol. 135, p.
109487, Feb. 2021, doi: 10.1016/j.ejrad.2020.109487.

[4] S. Khan, N. Islam, Z. Jan, I. Ud Din, and J. J. P. C. Rodrigues, “A novel deep learning based
framework for the detection and classification of breast cancer using transfer learning,” Pattern
Recognit. Lett., vol. 125, pp. 1–6, Jul. 2019, doi: 10.1016/j.patrec.2019.03.022.

[5] Y. He, P. Wu, Y. Li, Y. Wang, F. Tao, and Y. Wang, “A generic energy prediction model of machine
tools using deep learning algorithms,” Appl. Energy, vol. 275, p. 115402, Oct. 2020, doi: 10.1016/j.
apenergy.2020.115402.

[6] M. Jiang, J. Liu, L. Zhang, and C. Liu, “An improved Stacking framework for stock index prediction
by leveraging tree-based ensemble models and deep learning algorithms,” Phys. Stat. Mech. Its Appl.,
vol. 541, p. 122272, Mar. 2020, doi: 10.1016/j.physa.2019.122272.

[7] D. Kißkalt, A. Mayr, B. Lutz, A. Rögele, and J. Franke, “Streamlining the development of data-
ARTIFICIAL INTELLIGENCE EXPERT CERTIFICATE CAIEC™

driven industrial applications by automated machine learning,” Procedia CIRP, vol. 93, pp. 401–406,
Jan. 2020, doi: 10.1016/j.procir.2020.04.009.

[8] Y. Chen, X. Zou, K. Li, K. Li, X. Yang, and C. Chen, “Multiple local 3D CNNs for region-based
prediction in smart cities,” Inf. Sci., vol. 542, pp. 476–491, Jan. 2021, doi: 10.1016/j.ins.2020.06.026.

[9] T. D. Akinosho et al., “Deep learning in the construction industry: A review of present status and
future innovations,” J. Build. Eng., vol. 32, p. 101827, Nov. 2020, doi: 10.1016/j.jobe.2020.101827.

[10] M.-A. Zamora-Hernández, J. A. Castro-Vargas, J. Azorin-Lopez, and J. Garcia-Rodriguez, “Deep


learning-based visual control assistant for assembly in Industry 4.0,” Comput. Ind., vol. 131, p. 103485,
Oct. 2021, doi: 10.1016/j.compind.2021.103485.

186
[11] R. Espinosa, H. Ponce, and S. Gutiérrez, “Click-event sound detection in automotive industry
using machine/deep learning,” Appl. Soft Comput., vol. 108, p. 107465, Sep. 2021, doi: 10.1016/j.
asoc.2021.107465.

[12] J. Leng et al., “A loosely-coupled deep reinforcement learning approach for order acceptance
decision of mass-individualized printed circuit board manufacturing in industry 4.0,” J. Clean. Prod.,
vol. 280, p. 124405, Jan. 2021, doi: 10.1016/j.jclepro.2020.124405.

[13] M. Mishra, J. Nayak, B. Naik, and A. Abraham, “Deep learning in electrical utility industry: A
comprehensive review of a decade of research,” Eng. Appl. Artif. Intell., vol. 96, p. 104000, Nov. 2020,
doi: 10.1016/j.engappai.2020.104000.

[14] P. Tripicchio and S. D’Avella, “Is Deep Learning ready to satisfy Industry needs?,” Procedia
Manuf., vol. 51, pp. 1192–1199, Jan. 2020, doi: 10.1016/j.promfg.2020.10.167.

[15] R. Oberleitner and J. Schwartz, “5.29 Integrating Deep Learning With Behavior Imaging to
Accelerate Industry Learning of Autism Core Deficits,” J. Am. Acad. Child Adolesc. Psychiatry, vol. 56,
no. 10, Supplement, p. S263, Oct. 2017, doi: 10.1016/j.jaac.2017.09.312.

[16] T. Kotsiopoulos, P. Sarigiannidis, D. Ioannidis, and D. Tzovaras, “Machine Learning and Deep
Learning in smart manufacturing: The Smart Grid paradigm,” Comput. Sci. Rev., vol. 40, p. 100341, May
2021, doi: 10.1016/j.cosrev.2020.100341.

[17] C. Yang, H. Lan, F. Gao, and F. Gao, “Review of deep learning for photoacoustic imaging,”
Photoacoustics, vol. 21, p. 100215, Mar. 2021, doi: 10.1016/j.pacs.2020.100215.

[18] L. Zhu, P. Spachos, E. Pensini, and K. N. Plataniotis, “Deep learning and machine vision for
food processing: A survey,” Curr. Res. Food Sci., vol. 4, pp. 233–249, Jan. 2021, doi: 10.1016/j.
crfs.2021.03.009.

ARTIFICIAL INTELLIGENCE EXPERT CERTIFICATE CAIEC™


[19] X. Xu, J. Wang, B. Zhong, W. Ming, and M. Chen, “Deep learning-based tool wear prediction and
its application for machining process using multi-scale feature fusion and channel attention mechanism,”
Measurement, vol. 177, p. 109254, Jun. 2021, doi: 10.1016/j.measurement.2021.109254.

[20] S. Shajun Nisha, M. Mohamed Sathik, and M. Nagoor Meeral, “3 - Application, algorithm, tools
directly related to deep learning,” in Handbook of Deep Learning in Biomedical Engineering, V. E. Balas,
B. K. Mishra, and R. Kumar, Eds. Academic Press, 2021, pp. 61–84. doi: 10.1016/B978-0-12-823014-
5.00007-7.

187
Books

• https://www.pdfdrive.com/introduction-to-deep-learning-using-r-a-step-by-step-guide-to-learning-and-implementing-deep-learning-models-using-r-e158252417.html
• https://www.pdfdrive.com/learn-keras-for-deep-neural-networks-a-fast-track-approach-to-modern-deep-learning-with-python-e185770502.html
• https://www.pdfdrive.com/applied-deep-learning-a-case-based-approach-to-understanding-deep-neural-networks-e176380114.html
• https://www.pdfdrive.com/deep-learning-adaptive-computation-and-machine-learning-e176370174.html
• https://www.pdfdrive.com/deep-learning-in-python-master-data-science-and-machine-learning-with-modern-neural-networks-written-in-python-theano-and-tensorflow-e196480537.html
• https://www.pdfdrive.com/deep-learning-with-python-e54511249.html
• https://www.pdfdrive.com/learning-tensorflow-a-guide-to-building-deep-learning-systems-e158557113.html
• https://www.pdfdrive.com/deep-learning-with-applications-using-python-chatbots-and-face-object-and-speech-recognition-with-tensorflow-and-keras-e184016771.html
• https://www.pdfdrive.com/mastering-machine-learning-with-python-in-six-steps-a-practical-implementation-guide-to-predictive-data-analytics-using-python-e168776616.html
• https://hackr.io/blog/artificial-intelligence-books

Tutorials

• https://www.fast.ai
• https://www.coursera.org/learn/machine-learning
• https://www.coursera.org/specializations/deep-learning
• https://www.udemy.com/course/machinelearning/
• https://www.edx.org/professional-certificate/harvardx-data-science
• https://www.udacity.com/course/intro-to-machine-learning-nanodegree--nd229
• https://online.stanford.edu/courses/cs229-machine-learning
• https://www.edx.org/learn/machine-learning
• https://learn.datacamp.com/courses/introduction-to-machine-learning-with-r

Talks and Webinars

• https://www.brighttalk.com/topic/deep-learning/
• https://www.dataiku.com/webinars/

www.certiprof.com
