Introduction to Deep Learning
Neural Networks
Even with advances in technology and algorithms, nothing yet matches the capabilities
of the human brain, which itself is not fully understood. The core components of our
nervous system are the brain cells or nerve cells, also referred to as “neurons”. These
cells are connected to one another to form a complex network structure known as a
“neural network”.
The figure above represents the structure of a neuron, which has three major
components:
1. Cell body
2. Dendrite
3. Axon
Dendrites receive signals from other neurons and carry them to the cell body, where the
signals are processed. The axon transmits the processed signal from the cell body to the
dendrites of the neighbouring cells to which it is connected.
The study of artificial neural networks (ANNs) is inspired by attempts to simulate the
biological neural system. An ANN consists of interconnected artificial neurons, or
nodes, analogous to the biological neural network. Just as a biological neuron receives
an input signal, processes it, and transmits the output to other neurons, a node in an
ANN receives inputs, processes them using a function known as an activation function,
and transmits the output to other nodes.
The figure below shows the analogy between a neuron of the human brain and a node
of an ANN.
The next figure shows part of a biological neural network alongside an artificial neural
network (ANN).
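To make this analogy concrete, a single artificial neuron can be written in a few lines of code. The sketch below (Python with NumPy; the weights, bias, and sigmoid activation are illustrative choices, not prescribed by the text) computes a weighted sum of its inputs plus a bias and passes the result through an activation function:

```python
import numpy as np

def artificial_neuron(inputs, weights, bias):
    """Weighted sum of inputs plus bias, passed through a sigmoid activation."""
    z = np.dot(weights, inputs) + bias   # signals arriving at the "cell body"
    return 1.0 / (1.0 + np.exp(-z))      # processed signal sent on to other nodes

# Example values are made up for illustration
output = artificial_neuron(np.array([0.5, 0.8]), np.array([0.4, -0.2]), 0.1)
print(output)
```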
Components of ANN
Artificial Neurons/Nodes: These are the elementary units in a neural network. An
artificial neuron is a computational unit.
Layers: These are groups of neurons at different levels. An ANN has one input layer, one
or more hidden layers, and one output layer. The input layer contains the nodes to which
we feed the input feature values, and the output layer nodes give us the output or target
values. The layers between the input and output layers are called hidden layers.
Weights and Biases: Weights are numerical parameters that determine how strongly
each neuron affects the others. A bias is a special neuron with the value 1, added to
each pre-output layer.
Activation function: An activation function is a non-linear mathematical function that
converts input values to an output. Without activation functions, a neural network
would behave like a linear model.
Let's now consider an example to demonstrate how an ANN is used.
Assume we have a lot of past medical data about diabetic patients. We want to build a
model to predict whether a person is “Diabetic” or “Non-Diabetic” based on the inputs
“Sugar Level” and “Age”.
We will see how an ANN is trained on this past data for this purpose.
Black Box Example - Diabetic Prediction:
The example we considered can be compared to a black box that predicts whether a
person is “Diabetic” or “Non-Diabetic” based on the following inputs.
INPUT:
Sugar level
Age
OUTPUT:
Diabetic
Non-Diabetic
Layers in ANN
The problem can be modelled as a neural network as shown below. Let us learn about
each of the layers in detail to understand how the model works. Note that the inputs to
the model are the age and the sugar level, and the output is a binary decision: whether
the patient is diabetic.
Input Layer
• It is the first layer in any neural network.
• It brings the initial data into the system for further processing by subsequent
layers of artificial neurons. (It is the only layer where no computation happens,
i.e. no activation function is applied.)
• Here we will pass “Sugar Level” and “Age” as input parameters to the neurons in
the input layer.
Hidden Layer
• Layers between the input layer and the output layer in an ANN are called hidden layers.
• A hidden layer takes a set of weighted inputs and applies a non-linear activation function to them.
• Each hidden layer can contain the same or a different number of neurons.
Output Layer
• It is the last layer in any neural network, from which we get the final target values.
• Usually, output layer nodes also have activation functions.
• The neurons of this layer will give us the output: whether the person is “Diabetic”
or “Non-Diabetic”. (A model sketch follows below.)
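As a concrete illustration of these three layers, here is a minimal sketch in Keras (an assumed library choice; the layer sizes, optimizer, and the made-up training data are illustrative, not part of the original example):

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

# Input layer: 2 features (sugar level, age); hidden layer: 4 neurons (arbitrary);
# output layer: 1 sigmoid neuron giving the probability of "Diabetic".
model = Sequential([
    Dense(4, activation="relu", input_shape=(2,)),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Toy data, invented for illustration: [sugar_level, age] -> 1 (Diabetic) / 0 (Non-Diabetic)
X = np.array([[180.0, 55.0], [95.0, 30.0], [200.0, 62.0], [85.0, 25.0]])
y = np.array([1, 0, 1, 0])
model.fit(X, y, epochs=20, verbose=0)

print(model.predict(np.array([[150.0, 45.0]])))  # probability for a new patient
```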
Learning Weights and Biases
Weights and biases are parameters that are learned (or adjusted) to produce the
desired outputs from the given inputs. Mathematical techniques like stochastic
gradient descent (SGD), Adam, etc. are used to search for the set of weights that
makes the model's predictions accurate.
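The idea behind gradient descent can be shown with a deliberately tiny example. The sketch below (a hypothetical one-weight model, y = w * x, with made-up numbers) repeatedly nudges the weight against the gradient of the squared error:

```python
# Minimize the squared error of y_pred = w * x for a single training pair
x, y = 2.0, 8.0      # one (input, target) pair, invented for illustration
w, lr = 0.0, 0.05    # initial weight and learning rate

for step in range(100):
    y_pred = w * x
    grad = 2.0 * (y_pred - y) * x   # d/dw of (y_pred - y)^2
    w -= lr * grad                  # gradient descent update

print(w)  # converges towards 4.0, since 4.0 * 2.0 = 8.0
```

Optimizers like Adam refine this basic update with per-parameter learning rates and momentum, but the underlying loop is the same.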
Activation Function
An activation function (also known as a transfer function) is a non-linear mathematical
function which converts input values to an output. It helps neural networks find
non-linear relations in data.
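Three widely used activation functions are sigmoid, tanh, and ReLU. A minimal sketch in Python/NumPy (the choice of these three is common convention, not something the text prescribes):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))  # squashes any input into (0, 1)

def tanh(z):
    return np.tanh(z)                # squashes any input into (-1, 1)

def relu(z):
    return np.maximum(0.0, z)        # zero for negative inputs, identity otherwise

print(sigmoid(0.0), tanh(0.0), relu(-3.0))  # 0.5 0.0 0.0
```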
Padding
We observed that every time we use the filter to scan the image for similarity, the
resulting image becomes smaller and smaller. We do not want that, because we want to
preserve the original size of the image while extracting low-level features. The second
issue is that the pixels on the periphery are covered only once, whereas the ones in the
middle are covered multiple times.
Thus, there are two issues with convolution -
• Shrinking feature map and
• Losing information on the periphery of the image.
These two issues are solved by padding. In padding, extra pixels are added at the
boundary of the image before the convolution. Zero padding means the value of these
added pixels is zero. Sometimes the value of the edge pixel is copied into the padded cells.
If zero padding = 1, one pixel with value 0 will be added around the original image.
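In general, for an n x n image, an f x f filter, padding p, and stride s, the output size is ((n + 2p - f) / s) + 1; with f = 3, p = 1, and s = 1, the output size equals the input size. A minimal sketch with NumPy (the 3x3 image is made up for illustration):

```python
import numpy as np

image = np.arange(9, dtype=float).reshape(3, 3)   # toy 3x3 "image"
padded = np.pad(image, pad_width=1, mode="constant", constant_values=0)

print(padded.shape)  # (5, 5): a 3x3 filter over the padded image yields a
                     # 3x3 feature map, preserving the original size
```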
Dropout
Neural networks can very easily overfit: they learn the training set very well but perform
poorly on the validation set.
To get around this problem, the network is trained by randomly dropping a few weights
(or filters) during each pass. This ensures that the filters do not “over-specialize” in
recognizing features and can still work when a few features are masked off.
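One common way to implement this is "inverted dropout", sketched below in NumPy (the drop rate and activation values are illustrative; frameworks such as Keras provide this as a ready-made Dropout layer):

```python
import numpy as np

def dropout(activations, rate=0.5):
    """Training-time inverted dropout: randomly zero a fraction of activations."""
    mask = (np.random.rand(*activations.shape) >= rate).astype(float)
    return activations * mask / (1.0 - rate)  # rescale so the expected value is unchanged

h = np.array([0.2, 0.9, 0.4, 0.7])   # made-up activations
print(dropout(h, rate=0.5))          # roughly half are zeroed on each pass
```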
Putting Them Together: The Convolutional Network
The following image shows all the stages in a Convolutional neural network.
Training
The fully connected network at the end is trained using back-propagation and gradient
descent, discussed earlier. It is during this process that the filters learn the
distinguishing features of the images.
Training an RNN
Recurrent neural networks differ from the feed-forward networks discussed earlier in
that they allow feedback loops, including self-feedback loops.
Training the network: A recurrent neural network can be trained using the gradient
descent error optimization method. The error is back-propagated to all the previous time
steps to optimize the parameters (weights); hence this process is also called Back
Propagation Through Time (BPTT). Once the network is trained, it can predict the next
sequence given the previous sequence data as inputs.
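The feedback loop at the heart of an RNN can be sketched in a few lines. Below is a hypothetical vanilla RNN step in NumPy (the dimensions and random weights are purely illustrative): the hidden state produced at one time step is fed back in at the next.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One time step of a vanilla RNN: new state from current input and previous state."""
    return np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)

rng = np.random.default_rng(0)
W_xh = rng.normal(size=(4, 3))   # input-to-hidden weights (3-dim input, 4-dim state)
W_hh = rng.normal(size=(4, 4))   # hidden-to-hidden weights: the feedback loop
b_h = np.zeros(4)

h = np.zeros(4)                        # initial hidden state
for x_t in rng.normal(size=(5, 3)):    # a sequence of 5 inputs
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
print(h)
```

During training, BPTT unrolls this loop over all time steps and back-propagates the error through each step.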
Recurrent Neural Networks are a state-of-the-art algorithm for sequential data. They are
used by, among others, Apple's Siri and Google's Voice Search.
Applications of RNN
• Speech recognition means taking speech wave data, converting it into a
machine-learnable format, and identifying the words and phrases used in it. A
simple way to imagine speech recognition is as taking an audio clip as input and
producing the transcript as output. Speech is broken down into chunks and
passed through the RNN. Systems like Alexa and Siri use RNNs for speech recognition.
• Machine translation means translating text/speech from one language to
another. It happens in two steps:
1. Speech recognition: converting the input to words and phrases and
translating each word to the other language. However, this word sequence
may not be correct in the other language. Hence:
2. Sequencing: ordering the words as required by the target language. The
RNN is used for sequencing the words in the other language.
• Robot control: RNN-driven robots are being used in surgeries for precision and
control over the equipment, where humans have a higher rate of error compared
to the steady robots.
Autoencoders
Autoencoders are an unsupervised way to learn about inputs; the input and the output
are the same. There are two phases, encoding and decoding, which mirror each other:
the layer sizes decrease through the encoder phase and increase through the decoding
phase.
In the above image, the input data is encoded to a smaller representation and then
decoded back to its original form.
Salient features get captured due to this data compression. This is why autoencoders
are used to remove noise from images (e.g. removing patches from an image). They can
also be used as a dimensionality reduction technique for the same reason.
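A minimal dense autoencoder sketch in Keras (an assumed library choice; the 64 -> 32 -> 8 -> 32 -> 64 layer sizes are illustrative) shows the mirrored encoder/decoder structure described above:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

autoencoder = Sequential([
    Dense(32, activation="relu", input_shape=(64,)),  # encoder: compress 64 -> 32
    Dense(8, activation="relu"),                      # bottleneck: salient features
    Dense(32, activation="relu"),                     # decoder mirrors the encoder
    Dense(64, activation="sigmoid"),                  # reconstruct the 64 input values
])
autoencoder.compile(optimizer="adam", loss="mse")

# Training pairs each input with itself, e.g.: autoencoder.fit(X, X, epochs=10)
```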
Applications of Autoencoders
• Data De-noising
As we compress data in successive hidden layers, only the salient features are captured
in the process of autoencoding. This helps us eliminate noise in the data.
• Dimensionality Reduction
Google uses autoencoders for image search: a large number of images are stored, and
when an unseen image is presented, the system searches through all these images and
suggests similar ones from the store. Autoencoders are used to compress the image
data and store it as an index; any new image can be compressed and quickly searched
against this index.
Another application could be a face recognition system to identify criminals at public
locations (railway stations, airports, metro stations, etc.).
Applications
Finance
Deep learning is used in the finance domain for various purposes, such as detecting
fraudulent transactions, determining the risk in offering a loan, checking whether an
insurance claim is genuine, and many more.
• Fraud Detection
• Loan and Insurance underwriting
• Portfolio Management
• Algorithmic Trading
Fraud Detection
Fraud detection is a topic applicable to many industries, including banking and
financial services, insurance, government agencies, and law enforcement. Fraud
attempts have increased drastically in recent years, making fraud detection more
important than ever. Despite the efforts of the affected institutions, hundreds of
millions of dollars are lost to fraud every year. Since relatively few cases in a large
population are fraudulent, finding them can be tricky.
Benefits:
• Reduces fraud involving credit cards, forged checks, misleading accounting
practices, etc.
• Helps in detecting fraudulent claims in insurance and e-commerce companies
• Helps in better loan amount prediction
• Can detect malware and phishing websites much more easily
• Enhanced automation that leads to fewer manual reviews
• Greater speed in making a risk assessment by quickly identifying patterns in data
• Increased accuracy in identifying good orders versus fraudulent orders
• Allows data to be better classified and trends to be identified quickly
• Efficient utilization of resources, as models are constantly updated with new
data and feature extraction
Companies:
• Danske Bank
• Lloyds Bank
Portfolio Management
Portfolio management is all about:
• The art and science of making decisions about investment mix and policy,
• Matching investments to objectives,
• Asset allocation for individuals and institutions,
• Balancing risk against performance,
• Determining strengths, weaknesses, opportunities, and threats in the choice of
debt vs. equity, domestic vs. international, growth vs. safety, and
• Many other trade-offs encountered in the attempt to maximize return at a given
appetite for risk.
Benefits:
• End investors will benefit.
• Expenses will decrease as firms achieve economies of scale by
using technology.
• Returns will increase as new investment management processes are put in
place.
• Better decision-making systems
• Better risk management
Companies:
• Wealthfront
• Betterment
Algorithmic Trading
Algorithmic trading (also called automated trading, black-box trading, or simply
algo-trading) is the process of using computers programmed to follow a defined set of
instructions to place trades, generating profits at a speed and frequency that is
impossible for a human trader.
The defined sets of rules are based on:
• Timing,
• Price,
• Quantity or
• Any mathematical model.
Apart from profit opportunities for the trader, algo-trading makes markets more liquid
and makes trading more systematic by ruling out the impact of human emotion on
trading activities. Some of the organizations that leverage these techniques are listed below.
Benefits:
• Better decision-making systems
• Better risk management
• Better prediction of stocks to invest in
• Better prediction of funds for long-term investment
Companies:
• Societe Generale
• Goldman Sachs
• Morgan Stanley
• Credit Suisse
Energy
Reducing energy consumption
Different companies are now using deep learning to conserve energy.
Saving energy is a complex scenario, as it is not just about switching devices off. It
means intelligently adjusting energy consumption based on the anticipated system need.
Benefits:
• Better efforts to improve the efficiency of building appliances and materials
• Better efforts towards increasing the use of renewable energy sources
• New policies, incentives, and regulations to reduce energy consumption
• Automate building control in a way that improves building operation
Companies:
• Colas
• DeepMind
• Energisme
Better production
Saving energy is not enough; we also need to increase the efficiency of energy
production and distribution in order to achieve better results. DL can help with this too.
Benefits:
• More power produced from the same amount of resources as before (say,
improved mileage for oil consumption)
• Saving energy to reduce energy wastage
• Using better architectures to increase efficiency
• Better energy conservation plans
Companies:
• nVidia
• Maruti Suzuki
Healthcare
Deep Learning in Health Care
Deep learning models are now being successfully used in the medical domain to detect
rare diseases and to find ways to treat and prevent them.
Benefits:
• Automated and faster image diagnosis
• Detecting rare diseases
• Help in monitoring risk factors
• Personalised health monitoring
• Reduction in waiting time for reports
• Better administrative workflow
Companies:
• Carrefour
• Manipal Hospital
• Sensely
PyTorch
• Python version of Torch, primarily developed by the Facebook AI Research group
• Hybrid front-end
DL4J
• Open Source Deep Learning library
• Java and Scala based
• Used across industries: Finance, E-Commerce, Banking, Supply Chain,
Manufacturing
DL4J Features:
• Excels at identifying patterns in unstructured data
• Can be used for image processing
• Works well with text analysis
• Compatible with distributed computing software
Keras
• User-friendly interface to TensorFlow, Microsoft Cognitive Toolkit, and Theano
• Python based
• Simplifies developing deep learning networks
• Gaining acceptance in the user community
• Developed at Google
Keras Features:
• Can be used for prediction
• Widely used for feature extraction
• Weights can be downloaded
• A large number of pre-trained models are available
• Fine-tuning can be done easily (a sketch follows this list)
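As a brief illustration of the pre-trained-model workflow mentioned above, the sketch below loads VGG16 with downloadable ImageNet weights and freezes it for use as a feature extractor (VGG16 is one of many available models; the input shape shown is the standard one for this architecture):

```python
from tensorflow.keras.applications import VGG16

# weights="imagenet" downloads the pre-trained weights on first use;
# include_top=False drops the classifier head so the convolutional base
# can be reused for feature extraction or fine-tuning.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False   # freeze the base; add and train new layers on top
```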
Caffe
• Convolutional Architecture for Fast Feature Embedding
• C++ based with Python interface
• Usable with the Apache Spark distributed computing environment
• Primarily used for image classification
Google Cloud Platform, offered by Google, is a suite of cloud computing services that
runs on the same infrastructure that Google uses internally for its end-user products,
such as Google Search and YouTube.
It provides a series of modular cloud services including computing, data storage, data
analytics and machine learning.
GCP Features:
• Serverless, fully managed computing
• Secure platform
• Better data centers
• Powerful data analytics
• Infrastructure developed with the future in mind
GCP Products:
• Cloud computing – provides compute, storage, databases
• Analytic and machine learning – provides BigQuery, artificial intelligence APIs
• Google Maps platform – provides Maps, Routes, Places
• Browser, hardware and OS – provides Chrome, Android, Jamboard
• Professional services – provides consulting, training, certification
and many more.
H2O:
IBM Watson:
Watson is a question-answering computer system capable of answering questions
posed in natural language. IBM built it to apply advanced natural language processing,
information retrieval, knowledge representation, automated reasoning, and machine
learning technologies to the field of open-domain question answering.
Watson Features:
• Accelerate research and discovery
• Detect liabilities and mitigate risk
• Scale expertise and learning
• Learn more with less data
• Reimagine your workflow
Watson Products:
• AI Assistance
• Data
• Knowledge
• Vision
• Speech