NNML Full
Biological neurons form the basis of the human brain and are connected through synapses.
They consist of dendrites (input), a cell body (processing), and an axon (output). Artificial
Neural Networks (ANNs) mimic this structure.
In ANNs, inputs represent dendrites, the node represents the cell body, weights act as
synapses, and output resembles the axon.
The main goal of ANNs is to simulate the way the human brain learns and makes decisions,
using interconnected artificial neurons that transmit data and adjust through learning.
McCulloch-Pitts Perceptron
Perceptron is a supervised learning algorithm used for binary classification. It takes input
values, multiplies them by weights, adds a bias, and passes the result through an activation
function to determine output.
Single-layer Perceptron: This is the simplest form with one input layer and one output
node. It can only solve linearly separable problems. If the weighted sum exceeds a
threshold, the output is 1; otherwise, it is 0.
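To make the weighted-sum-and-threshold rule concrete, here is a minimal NumPy sketch of a single-layer perceptron; the weights and bias (chosen to realize a logical AND) are illustrative assumptions, not values from these notes.

```python
import numpy as np

def perceptron(x, w, b):
    """Single-layer perceptron: weighted sum + step activation."""
    weighted_sum = np.dot(w, x) + b
    return 1 if weighted_sum > 0 else 0  # threshold at 0

# Illustrative weights/bias realizing a logical AND (linearly separable)
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(np.array(x), w, b))
```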
Multi-layer Perceptron (MLP): An MLP consists of an input layer, one or more hidden
layers, and an output layer. It uses activation functions like Sigmoid, Tanh, or ReLU. The
MLP learns using the backpropagation algorithm, which involves forward propagation
to compute output and backward propagation to adjust weights by minimizing error. It can
model complex, non-linear problems and perform classification and regression tasks.
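As a hedged illustration of forward and backward propagation, this NumPy sketch trains a tiny MLP (one hidden layer, sigmoid activations) on XOR, a classic problem a single-layer perceptron cannot solve; the layer size, learning rate, and iteration count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)  # XOR targets

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One hidden layer with 4 units (an arbitrary choice)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 1.0

for _ in range(5000):
    # Forward propagation: compute the output
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)
    # Backward propagation: error gradients, layer by layer
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    # Adjust weights to minimize the error
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```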
Activation Functions
Activation functions are mathematical equations that determine the output of a neural network model.
They decide whether a neuron should be activated or not by introducing non-linearity into the
model. Without activation functions, neural networks would behave like a linear regression model,
regardless of the number of layers.
They play a critical role in helping neural networks learn and make complex decisions by enabling
them to approximate non-linear functions.
Sigmoid
Definition:
The sigmoid function maps any input value into the range of 0 to 1 using the formula: σ(x) = 1 / (1 + e^(-x))
Properties:
Range: (0, 1)
Shape: S-shaped (sigmoid curve)
Differentiable: Yes, which is necessary for backpropagation.
Advantages:
Outputs values between 0 and 1, making it suitable for binary classification and
probability predictions.
Provides a smooth gradient, preventing abrupt changes in output values.
Disadvantages:
Saturates for large input values, leading to vanishing gradients and slow learning.
Outputs are not zero-centered, which can make optimization more challenging.
Use Case:
Tanh
Definition:
Tanh is similar to the sigmoid function but outputs values in a different range: tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))
Properties:
Range: (-1, 1)
Shape: S-shaped, like sigmoid, but zero-centered.
Advantages:
Disadvantages:
Use Case:
ReLU
Definition:
ReLU is the most widely used activation function in modern neural networks: f(x) = max(0, x)
Properties:
Range: [0, ∞)
Advantages:
Simple computation and fast convergence.
Disadvantages:
Dying ReLU Problem: If neurons only output 0, they may stop learning.
Not zero-centered.
Use Case:
Definition:
Properties:
Advantages:
Disadvantages:
Use Case:
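A small NumPy sketch of the activation functions discussed above, together with the derivatives that backpropagation needs; purely illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # range (0, 1)

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)                # saturates for large |x| (vanishing gradient)

def tanh(x):
    return np.tanh(x)                 # range (-1, 1), zero-centered

def relu(x):
    return np.maximum(0.0, x)         # range [0, inf)

def relu_grad(x):
    return (x > 0).astype(float)      # 0 for negative inputs ("dying ReLU")

x = np.array([-2.0, 0.0, 2.0])
print(sigmoid(x), tanh(x), relu(x))
```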
1. Feedforward Neural Networks (FNNs)
Structure:
Input Layer: Accepts input features. Each neuron corresponds to one feature.
Hidden Layers: Perform computations using weighted inputs and biases. Non-linear
activation functions (e.g., ReLU, sigmoid, tanh) are applied here.
Output Layer: Produces the final output of the network, such as a classification label or a
numeric value.
Working Mechanism:
Features:
Applications:
2. Convolutional Neural Networks (CNNs)
A Convolutional Neural Network (CNN) is a type of neural network designed to process grid-like data such as images.
Architecture:
1. Convolutional Layer: Uses filters (kernels) that slide over the image to extract local
features like edges, textures, or colors. It captures spatial hierarchies.
2. Activation Layer: Applies a non-linear activation function (commonly ReLU) to
introduce non-linearity.
3. Pooling Layer: Reduces spatial dimensions (e.g., MaxPooling) to make computation
efficient and reduce overfitting.
4. Fully Connected Layer (Dense Layer): Flattens the feature map into a vector for final
classification.
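The following PyTorch sketch stacks the four layer types in the order just listed; the input size (one 28×28 grayscale channel), filter count, and 10 output classes are assumptions for illustration.

```python
import torch
import torch.nn as nn

# Conv -> ReLU -> Pool -> Flatten -> Fully connected (assumed 28x28x1 input, 10 classes)
model = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=8, kernel_size=3, padding=1),  # convolutional layer
    nn.ReLU(),                                                           # activation layer
    nn.MaxPool2d(kernel_size=2),                                         # pooling: 28x28 -> 14x14
    nn.Flatten(),                                                        # feature map -> vector
    nn.Linear(8 * 14 * 14, 10),                                          # fully connected layer
)

x = torch.randn(1, 1, 28, 28)   # one dummy grayscale image
print(model(x).shape)           # torch.Size([1, 10]) class scores
```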
Applications:
Image Classification: Classify an image into categories (e.g., dog, cat, etc.).
Object Detection: Identify and locate objects in an image (e.g., YOLO, SSD).
Facial Recognition: Used in security systems and photo tagging.
Medical Imaging: Detect anomalies in X-rays, MRIs, and CT scans.
Self-driving Cars: Lane detection, obstacle recognition, and traffic sign identification.
3. Recurrent Neural Networks (RNNs)
A Recurrent Neural Network (RNN) is a type of neural network designed to handle sequential
data. It has loops in its architecture that allow it to store past information in a hidden state and
use it for future computations.
Architecture:
Each neuron not only receives input from the current time step but also receives input
from the hidden state of the previous time step.
The network shares weights across time steps.
Uses Backpropagation Through Time (BPTT) for training.
Hidden State:
Challenges:
Vanishing Gradient Problem: When gradients become too small, it’s hard to learn
long-term dependencies.
Exploding Gradients: When gradients grow too large.
These are mitigated using improved architectures like:
o LSTM (Long Short-Term Memory): Uses gates to regulate memory flow.
o GRU (Gated Recurrent Unit): Simplified LSTM with similar performance.
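A minimal NumPy sketch of the recurrence itself: at each time step the hidden state is computed from the current input and the previous hidden state, with the same weights reused across steps. The sizes and random sequence are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 3, 4, 5

# Weights shared across all time steps
W_x = rng.normal(scale=0.1, size=(hidden_size, input_size))
W_h = rng.normal(scale=0.1, size=(hidden_size, hidden_size))
b = np.zeros(hidden_size)

h = np.zeros(hidden_size)                  # initial hidden state
for t in range(seq_len):
    x_t = rng.normal(size=input_size)      # input at the current time step
    h = np.tanh(W_x @ x_t + W_h @ h + b)   # h_t depends on x_t and h_{t-1}
    print(f"t={t}, h={h.round(3)}")
```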
Applications:
Machine Learning (ML) is a subset of Artificial Intelligence (AI) that enables computer systems
to learn patterns and make decisions or predictions from data without being explicitly
programmed. Instead of following strictly coded instructions, ML models learn from past
experiences (data) and improve their performance over time.
How it Works:
ML involves training algorithms on datasets so that the model can learn underlying patterns. Once
trained, the model can be used to make predictions or decisions on new, unseen data.
Applications of Machine Learning:
1. Healthcare:
o Disease prediction and diagnosis (e.g., cancer detection)
o Drug discovery and personalized treatment
2. Finance:
o Credit scoring and risk assessment
o Fraud detection and stock market prediction
3. Retail and Marketing:
o Recommendation systems (e.g., Amazon, Netflix)
o Customer segmentation and demand forecasting
4. Transportation:
o Self-driving cars and traffic prediction
o Route optimization in logistics
5. Natural Language Processing:
o Chatbots and virtual assistants
o Language translation and speech recognition
A. Supervised Learning
In supervised learning, the algorithm learns from labeled training data, mapping inputs to
known outputs.
The goal is to predict the output for new inputs based on what it learned.
Examples:
Algorithms:
Linear Regression
Logistic Regression
Decision Trees
Support Vector Machines (SVM)
k-Nearest Neighbors (KNN)
B. Unsupervised Learning
In unsupervised learning, the algorithm finds hidden patterns, groupings, or structure in unlabeled data, without known outputs to learn from.
Examples:
Algorithms:
K-Means Clustering
Hierarchical Clustering
Principal Component Analysis (PCA)
Association Rule Mining
C. Reinforcement Learning
In reinforcement learning, an agent learns by interacting with an environment, receiving rewards or penalties for its actions and adjusting its behavior to maximize cumulative reward.
Key Components:
Agent: The learner or decision maker.
Environment: The world the agent interacts with.
State, Action, Reward: The situation the agent observes, the move it makes, and the feedback it receives.
1. AutoML:
o Automates model selection, tuning, and deployment.
o Makes ML accessible to non-experts.
2. Explainable AI (XAI):
o Focuses on making ML models more transparent and understandable.
3. Federated Learning:
o Allows models to be trained across decentralized devices while preserving user
privacy.
4. Edge ML:
o Enables running ML algorithms on devices like smartphones and IoT sensors.
5. Introduction to Data Preprocessing
Before feeding data into a machine learning model, it must be preprocessed to ensure quality,
consistency, and relevance. Data preprocessing significantly affects model accuracy and
performance.
A. Data Cleaning: Handle missing values (imputation or removal), remove duplicates, and correct inconsistent or erroneous entries.
B. Data Transformation: Scale or normalize numerical features (e.g., min-max scaling, standardization) and encode categorical variables (e.g., one-hot encoding).
C. Feature Engineering:
Feature Creation: Create new features from raw data (e.g., extract "hour" from
timestamp).
Feature Selection: Remove redundant or irrelevant features using correlation, mutual
information, or wrapper methods.
Dimensionality Reduction: PCA, LDA to reduce features while retaining variance.
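A short pandas/scikit-learn sketch of the three ideas above; the timestamp values and random features are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Feature creation: extract "hour" from a timestamp column (hypothetical data)
df = pd.DataFrame({"timestamp": pd.to_datetime(["2024-01-01 08:30", "2024-01-01 17:45"])})
df["hour"] = df["timestamp"].dt.hour

# Feature selection via correlation: spot a nearly redundant column
X = pd.DataFrame(np.random.default_rng(0).normal(size=(100, 3)), columns=["a", "b", "c"])
X["b"] = X["a"] * 0.95 + 0.05 * X["b"]   # make "b" almost a copy of "a"
print(X.corr().abs())                    # high |corr| flags candidates to drop

# Dimensionality reduction: PCA keeps the directions of maximum variance
X_reduced = PCA(n_components=2).fit_transform(X)
print(X_reduced.shape)  # (100, 2)
```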
I. Regression:
Regression algorithms predict a continuous numerical value. Two common types are:
🔷 1. Linear Regression
➤ Goal:
Predict a continuous output y as a linear function of the input x.
➤ Equation: y = mx + c
➤ Working:
➤ Use Cases:
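A minimal NumPy sketch that fits y = mx + c by least squares on synthetic data; the true slope, intercept, and noise level are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=50)  # true m=2, c=1 plus noise

# Closed-form least-squares fit of y = mx + c
m, c = np.polyfit(x, y, deg=1)
print(f"m ≈ {m:.2f}, c ≈ {c:.2f}")  # should be close to 2 and 1

print(m * 12.0 + c)                 # predict for a new input x = 12
```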
🔷 2. Logistic Regression
➤ Goal:
Used for classification tasks (binary or multi-class), despite the name "regression".
➤ Equation: Sigmoid function: σ(z) = 1 / (1 + e^(-z)), where z = mx + c
➤ Working:
➤ Use Cases:
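A short scikit-learn sketch of binary classification with logistic regression; the synthetic dataset is an assumption for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression().fit(X_train, y_train)
print(clf.score(X_test, y_test))      # accuracy on held-out data
print(clf.predict_proba(X_test[:1]))  # sigmoid outputs: class probabilities
```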
II. Classification
Classification algorithms predict discrete categories or class labels.
🔷 3. K-Nearest Neighbors (KNN)
➤ Goal:
Classify a data point based on the majority label of its k-nearest neighbors.
➤ Working:
Calculate Euclidean (or other) distances between the new point and all training points.
Select the k closest points.
Assign the class with the most votes among neighbors.
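A compact NumPy sketch of the three steps just listed (distances, k closest, majority vote); the toy points and k = 3 are assumptions.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    # 1. Euclidean distance from x_new to every training point
    dists = np.linalg.norm(X_train - x_new, axis=1)
    # 2. Indices of the k closest points
    nearest = np.argsort(dists)[:k]
    # 3. Majority vote among their labels
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))  # -> 0
```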
➤ Use Cases:
Recommender systems
Image classification
➤ Advantages:
🔷 4. Decision Tree
A Decision Tree is a supervised learning algorithm used for classification and regression. It
splits the data based on feature values by asking questions at each node. Each branch shows an
outcome, and each leaf gives a prediction. The goal is to keep dividing the data until each group
is similar or belongs to one class. It's simple, easy to understand, and widely used.
➤ Goal:
➤ Working:
➤ Use Cases:
➤ Advantages:
Easy to interpret
Handles both numerical and categorical data
➤Disadvantages:
Prone to overfitting
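A minimal scikit-learn sketch on the built-in Iris dataset; limiting max_depth is one common guard against the overfitting noted above, and the value 3 is an arbitrary choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Shallow tree: each split asks a question about one feature
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)
print(tree.score(X_test, y_test))
```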
🔷 5. Random Forest
➤ Goal:
An ensemble learning method that builds multiple decision trees and merges them for better
accuracy.
➤ Working:
Builds many decision trees on bootstrap samples of the data, each split considering a random subset of features, then aggregates their predictions by majority vote (classification) or averaging (regression).
➤ Use Cases:
Fraud detection
Stock price prediction
➤ Advantages:
High accuracy
Reduces overfitting
➤ Disadvantages:
Less interpretable than a single decision tree
Slower to train and predict as the number of trees grows
Support Vector Machine (SVM) is a supervised machine learning algorithm used for
classification and regression tasks. While it can handle regression problems, SVM is
particularly well-suited for classification tasks.
➤ Working:
Selects the best separating hyperplane with the maximum margin between classes.
Can use kernel tricks to handle non-linearly separable data (e.g., RBF, Polynomial).
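A short scikit-learn sketch using the RBF kernel trick mentioned above on data that is not linearly separable; the two-moons dataset and the C and gamma settings are illustrative assumptions.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-circles: not linearly separable in the input space
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps the data to a space where a
# maximum-margin separating hyperplane can be found
clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)
print(clf.score(X_test, y_test))
```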
➤ Use Cases:
Face detection
Text classification
Bioinformatics
➤ Advantages:
Effective in high-dimensional feature spaces
Works well when classes are separated by a clear margin
➤ Disadvantages:
Memory-intensive
Difficult to tune parameters (kernel, C, gamma).
K-Means
K-Means is an unsupervised, iterative clustering technique that partitions a dataset into k distinct
clusters by assigning each point to the nearest cluster “mean,” then recomputing means until
convergence. It emphasizes intra-cluster similarity and inter-cluster dissimilarity, making it a fast,
scalable method for grouping data.
Definition
Unsupervised iterative technique: No labels are used; clusters form based solely on data
distribution.
Cluster: A set of points exhibiting mutual similarity, with each point belonging to the
cluster whose mean (centroid) is nearest.
Algorithm Steps
1. Choose k: Decide the number of clusters you want.
2. Initialize centroids: Randomly pick k data points as initial cluster centers, ensuring they
are as far apart as possible.
3. Compute distances: For each data point, calculate its distance to every centroid (e.g.,
Euclidean or custom distance).
4. Assign clusters: Assign each point to the cluster of its nearest centroid.
5. Update centroids: Recalculate each centroid as the mean of all points assigned to that
cluster.
6. Repeat: Go back to step 3 and iterate until one of the following stopping criteria is met:
o Centroids no longer move
o Point assignments remain unchanged
o A preset maximum number of iterations is reached.
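A NumPy sketch of the loop just described (assign each point to the nearest centroid, recompute the means, stop when the centroids stabilize); k = 2, the toy points, and the iteration cap are assumptions, and empty clusters are not handled for brevity.

```python
import numpy as np

def kmeans(X, k=2, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # step 2: init
    for _ in range(max_iter):                                 # step 6: repeat
        # Steps 3-4: distance from every point to every centroid, assign nearest
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 5: recompute each centroid as the mean of its assigned points
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):             # stopping criterion
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5]])
print(kmeans(X, k=2))
```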
Advantages
Time complexity: Efficient O(n k t) where n = instances, k = clusters, t = iterations.
Local optimum: Often converges quickly to a local optimum; methods like simulated annealing or genetic algorithms can be combined with it to search for the global optimum.
Disadvantages
Need to specify k: You must know the number of clusters in advance.
Sensitivity to outliers: Cannot handle noise or outliers well.
Gradient Descent (GD)
Definition: An optimization algorithm used to train machine learning models (including neural networks) by iteratively adjusting parameters in the direction of the negative gradient of the loss function, θ ← θ − η∇J(θ) with learning rate η, thereby minimizing prediction error.
Usage:
Trains models by minimizing the difference between actual and expected outputs
(e.g., Mean Squared Error in regression).
Fundamental to backpropagation in neural networks, where weights and biases
are updated via GD at each layer.
1. Batch Gradient Descent (BGD)
How it works: Computes the gradient of the cost function using the entire training dataset before each parameter update.
Advantages:
Produces stable, smooth convergence since each update is based on the full dataset.
Precise gradient estimates lead to consistent progress toward the global minimum.
Disadvantages:
Can be very slow on large datasets due to full-dataset passes each iteration.
High memory usage, as it must load all samples to compute each update.
2. Stochastic Gradient Descent (SGD)
How it works: Updates model parameters using the gradient computed from one randomly selected training example per iteration.
Advantages:
Faster convergence in practice for large datasets, since updates are made more
frequently.
Requires minimal memory and can begin learning before seeing the entire dataset.
Randomness helps escape shallow local minima in non-convex loss landscapes.
Disadvantages:
Updates have high variance, causing the loss to fluctuate rather than decrease
smoothly.
May require careful tuning of learning rate and often benefits from decay schedules.
3. Mini-Batch Gradient Descent (MBGD)
How it works: Splits the training set into small batches (e.g., 32–256 samples) and
performs an update for each mini-batch.
Advantages:
Balances the stability of batch GD with the speed of SGD.
Enables efficient, vectorized computation on modern hardware.
Disadvantages:
Still requires batch-size tuning; too small batches behave like SGD, too large like
BGD.
May get stuck in local minima if batch size is poorly chosen.
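To tie the three variants together, here is a NumPy sketch where the batch size selects the variant: batch_size = n gives batch GD, batch_size = 1 gives SGD, and anything in between is mini-batch GD. The linear-regression objective, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(scale=0.1, size=n)  # true w=3, b=2

def gradient_descent(X, y, batch_size, lr=0.1, epochs=100):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        idx = rng.permutation(len(X))              # shuffle each epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            xb, yb = X[batch, 0], y[batch]
            err = (w * xb + b) - yb                # prediction error on the batch
            w -= lr * 2 * np.mean(err * xb)        # gradient of MSE w.r.t. w
            b -= lr * 2 * np.mean(err)             # gradient of MSE w.r.t. b
    return w, b

print(gradient_descent(X, y, batch_size=len(X)))   # batch GD: stable, smooth
print(gradient_descent(X, y, batch_size=1))        # SGD: noisy, frequent updates
print(gradient_descent(X, y, batch_size=32))       # mini-batch: a middle ground
```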