NNML Full

The document provides an overview of neural networks, including the comparison between biological and artificial neurons, and details various types of neural network architectures such as Feedforward Neural Networks, Convolutional Neural Networks, and Recurrent Neural Networks. It explains key concepts like activation functions, supervised and unsupervised learning, and the importance of data preprocessing in machine learning. Additionally, it discusses challenges and trends in machine learning, highlighting its applications across various domains.


Fundamentals of Neural Networks

Biological Neurons vs Artificial Neurons:

Biological neurons form the basis of the human brain and are connected through synapses.

 They consist of dendrites (input), a cell body (processing), and an axon (output). Artificial
Neural Networks (ANNs) mimic this structure.
 In ANNs, inputs represent dendrites, the node represents the cell body, weights act as
synapses, and output resembles the axon.
 The main goal of ANNs is to simulate the way the human brain learns and makes decisions,
using interconnected artificial neurons that transmit data and adjust through learning.

McCulloch-Pitts Perceptron

The perceptron is a supervised learning algorithm used for binary classification. It takes input values, multiplies them by weights, adds a bias, and passes the result through an activation function to determine the output.

 Single-layer Perceptron: This is the simplest form with one input layer and one output
node. It can only solve linearly separable problems. If the weighted sum exceeds a
threshold, the output is 1; otherwise, it is 0.
 Multi-layer Perceptron (MLP): MLP consists of an input layer, one or more hidden
layers, and an output layer. It uses activation functions like Sigmoid, Tanh, or ReLU. The
MLP learns using the backpropagation algorithm, which involves forward propagation
to compute output and backward propagation to adjust weights by minimizing error. It can
model complex, non-linear problems and perform classification and regression tasks.
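
A minimal sketch of the single-layer perceptron decision rule in NumPy may make this concrete (the weights, bias, and inputs below are illustrative assumptions, not values from the text):

```python
import numpy as np

def perceptron_predict(x, w, b):
    # Weighted sum of inputs plus bias, passed through a step activation:
    # output 1 if the sum exceeds the threshold (here 0), otherwise 0.
    return 1 if np.dot(w, x) + b > 0 else 0

# Illustrative weights and bias realizing an AND-like gate on two inputs.
w = np.array([1.0, 1.0])
b = -1.5
print(perceptron_predict(np.array([1, 1]), w, b))  # 1
print(perceptron_predict(np.array([0, 1]), w, b))  # 0
```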

Activation Functions in Neural Networks

Activation functions are mathematical equations that determine the output of a neural network
model.

They decide whether a neuron should be activated or not by introducing non-linearity into the
model. Without activation functions, neural networks would behave like a linear regression model,
regardless of the number of layers.

They play a critical role in helping neural networks learn and make complex decisions by enabling
them to approximate non-linear functions.

1. Sigmoid Activation Function

Definition:

The sigmoid function maps any input value into the range of 0 to 1 using the formula:

σ(x) = 1 / (1 + e^(-x))

Properties:

 Range: (0, 1)
 Shape: S-shaped (sigmoid curve)
 Differentiable: Yes, which is necessary for backpropagation.

Advantages:

 Outputs values between 0 and 1, making it suitable for binary classification and
probability predictions.
 Provides a smooth gradient, preventing abrupt changes in output values.

Disadvantages:

 Saturates for large input values, leading to vanishing gradients and slow learning.
 Outputs are not zero-centered, which can make optimization more challenging.

Use Case:

 Typically used in the output layer of binary classification models.

2. Tanh (Hyperbolic Tangent) Activation Function

Definition:

Tanh is similar to the sigmoid function but outputs values in a different range:

tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))

Properties:

 Range: (-1, 1)
 Shape: S-shaped, like sigmoid, but zero-centered.

Advantages:

 Centered around zero, which makes learning faster.
 Outputs both positive and negative values.

Disadvantages:

 Also suffers from the vanishing gradient problem.
 For very high or low input values, the gradient becomes very small.

Use Case:

 Often used in hidden layers of neural networks.

3. ReLU (Rectified Linear Unit) Activation Function

Definition:

ReLU is the most widely used activation function in modern neural networks:

f(x) = max(0, x)

Properties:

 Range: [0, ∞)
 Simple computation and fast convergence.

Advantages:

 Introduces non-linearity while being easy to compute.
 Reduces the likelihood of vanishing gradients compared to sigmoid/tanh.
 Sparse activation (only some neurons activate at once), which improves efficiency.

Disadvantages:

 Dying ReLU Problem: If a neuron gets stuck outputting 0, it stops learning, since its gradient is also 0.
 Not zero-centered.

Use Case:

 Mostly used in the hidden layers of deep neural networks.

4. Softmax Activation Function

Definition:

Softmax is an extension of the sigmoid function for multi-class classification problems. It converts raw scores into probabilities:

softmax(z_i) = e^(z_i) / Σ_j e^(z_j)

Properties:

 Range: (0, 1), and the outputs sum to 1.
 Gives a probabilistic interpretation of outputs.

Advantages:

 Clearly highlights the most likely class.
 Helps in interpreting the model’s confidence in predictions.

Disadvantages:

 Computationally more intensive than ReLU.
 Can be sensitive to outliers (large input values can dominate the output).

Use Case:

 Commonly used in the output layer of multi-class classification models.
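
All four activation functions above can be sketched in a few lines of NumPy; this is an illustrative implementation for intuition, not a library API:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))   # squashes any input into (0, 1)

def tanh(x):
    return np.tanh(x)             # squashes into (-1, 1), zero-centered

def relu(x):
    return np.maximum(0, x)       # passes positives through, zeros out negatives

def softmax(z):
    e = np.exp(z - np.max(z))     # subtract the max for numerical stability
    return e / e.sum()            # probabilities that sum to 1

scores = np.array([2.0, 1.0, 0.1])
print(softmax(scores))            # the highest raw score gets the highest probability
```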


Neural Network Architectures

1. Feedforward Neural Networks (FNNs)
A Feedforward Neural Network (FNN) is the simplest type of artificial neural network wherein
the connections between the nodes do not form a cycle. It is also known as a multilayer perceptron
when it contains one or more hidden layers.

Structure:

An FNN consists of the following layers:

 Input Layer: Accepts input features. Each neuron corresponds to one feature.
 Hidden Layers: Perform computations using weighted inputs and biases. Non-linear
activation functions (e.g., ReLU, sigmoid, tanh) are applied here.
 Output Layer: Produces the final output of the network, such as a classification label or a
numeric value.

Working Mechanism:

1. Data is passed through the input layer.
2. Each neuron calculates a weighted sum of inputs, adds a bias, and passes the result through
an activation function.
3. The result propagates through the network layer by layer until it reaches the output.
4. The weights are updated using backpropagation, a technique where the error is
propagated backward to minimize loss using gradient descent.
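
Putting steps 1-3 together, a minimal forward pass for a two-layer FNN might look as follows (the dimensions and random weights are toy assumptions; training via backpropagation is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 3 input features, 4 hidden units, 2 outputs.
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)

def relu(x):
    return np.maximum(0, x)

def forward(x):
    h = relu(W1 @ x + b1)   # hidden layer: weighted sum + bias + activation
    return W2 @ h + b2      # output layer: raw scores

print(forward(np.array([0.5, -1.0, 2.0])))
```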

Features:

 The data moves strictly in one direction (input → hidden → output).
 No feedback loops or memory is involved.
 Efficient for simple pattern recognition and classification tasks.

Applications:

 Email spam detection
 Customer churn prediction
 Credit scoring
 Digit recognition

2. Convolutional Neural Networks (CNNs): Applications in Image Processing
A Convolutional Neural Network (CNN) is a deep learning model specialized for processing
data with a grid-like topology, such as images. CNNs are biologically inspired by the visual cortex
and are highly effective in image-related tasks.

Architecture:

1. Convolutional Layer: Uses filters (kernels) that slide over the image to extract local
features like edges, textures, or colors. It captures spatial hierarchies.
2. Activation Layer: Applies a non-linear activation function (commonly ReLU) to
introduce non-linearity.
3. Pooling Layer: Reduces spatial dimensions (e.g., MaxPooling) to make computation
efficient and reduce overfitting.
4. Fully Connected Layer (Dense Layer): Flattens the feature map into a vector for final
classification.
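
The four-layer pattern above maps onto a few lines of Keras; this is a hedged sketch assuming TensorFlow is installed, with illustrative layer sizes for 28x28 grayscale images:

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(28, 28, 1)),   # convolution + ReLU activation
    layers.MaxPooling2D((2, 2)),              # pooling: downsample feature maps
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                          # flatten maps into a vector
    layers.Dense(10, activation="softmax"),    # fully connected classifier
])
model.summary()
```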

Why CNNs are Effective for Image Processing:

 Shared weights reduce the number of parameters.
 Preserve spatial information.
 Detect features hierarchically: edges in early layers, objects in deeper layers.

Applications in Image Processing:

 Image Classification: Classify an image into categories (e.g., dog, cat, etc.).
 Object Detection: Identify and locate objects in an image (e.g., YOLO, SSD).
 Facial Recognition: Used in security systems and photo tagging.
 Medical Imaging: Detect anomalies in X-rays, MRIs, and CT scans.
 Self-driving Cars: Lane detection, obstacle recognition, and traffic sign identification.

3. Recurrent Neural Networks (RNNs)
A Recurrent Neural Network (RNN) is a type of neural network designed to handle sequential
data. It has loops in its architecture that allow it to store past information in a hidden state and
use it for future computations.

Architecture:

 Each neuron not only receives input from the current time step but also receives input
from the hidden state of the previous time step.
 The network shares weights across time steps.
 Uses Backpropagation Through Time (BPTT) for training.
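
A single recurrent step can be sketched in NumPy to show how the hidden state threads through time (the dimensions and random weights are toy assumptions):

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b):
    # The new hidden state mixes the current input with the previous
    # hidden state through weights that are shared across time steps.
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

rng = np.random.default_rng(0)
W_x, W_h, b = rng.normal(size=(4, 3)), rng.normal(size=(4, 4)), np.zeros(4)

h = np.zeros(4)                        # initial hidden state ("memory")
for x_t in rng.normal(size=(5, 3)):    # a sequence of 5 time steps
    h = rnn_step(x_t, h, W_x, W_h, b)  # memory carried forward each step
print(h)
```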

Hidden State:

 Acts as memory to retain important features from previous inputs.
 This makes RNNs suitable for time series and natural language tasks.

Challenges:

 Vanishing Gradient Problem: When gradients become too small, it’s hard to learn
long-term dependencies.
 Exploding Gradients: When gradients grow too large.
 These are mitigated using improved architectures like:
o LSTM (Long Short-Term Memory): Uses gates to regulate memory flow.
o GRU (Gated Recurrent Unit): Simplified LSTM with similar performance.

Applications:

 Text Generation and Language Modeling
 Speech Recognition
 Machine Translation
 Stock Price Prediction
 Chatbots and Virtual Assistants

1. Introduction to Machine Learning
Definition:

Machine Learning (ML) is a subset of Artificial Intelligence (AI) that enables computer systems
to learn patterns and make decisions or predictions from data without being explicitly
programmed. Instead of following strictly coded instructions, ML models learn from past
experiences (data) and improve their performance over time.

How it Works:

ML involves training algorithms on datasets so that the model can learn underlying patterns. Once
trained, the model can be used to make predictions or decisions on new, unseen data.

2. Basics of Machine Learning: Definitions, Applications, and Scope
Applications of Machine Learning:

1. Healthcare:
o Disease prediction and diagnosis (e.g., cancer detection)
o Drug discovery and personalized treatment
2. Finance:
o Credit scoring and risk assessment
o Fraud detection and stock market prediction
3. Retail and Marketing:
o Recommendation systems (e.g., Amazon, Netflix)
o Customer segmentation and demand forecasting
4. Transportation:
o Self-driving cars and traffic prediction
o Route optimization in logistics
5. Natural Language Processing:
o Chatbots and virtual assistants
o Language translation and speech recognition

Scope of Machine Learning:

 ML is at the core of many emerging technologies like autonomous vehicles, smart assistants, and predictive analytics.
 It plays a key role in big data analysis and decision-making systems.
 With the availability of large datasets and computational resources (e.g., cloud computing,
GPUs), ML continues to grow rapidly across all domains.

3. Types of Machine Learning
Machine Learning is typically categorized into three major types:

A. Supervised Learning

 In supervised learning, the algorithm learns from labeled training data, mapping inputs to
known outputs.
 The goal is to predict the output for new inputs based on what it learned.

Examples:

 Predicting house prices based on area, location, etc.
 Classifying emails into spam and non-spam.

Algorithms:

 Linear Regression
 Logistic Regression
 Decision Trees
 Support Vector Machines (SVM)
 k-Nearest Neighbors (KNN)

B. Unsupervised Learning

 In this type, the data provided to the algorithm is unlabeled.
 The goal is to discover hidden patterns, structures, or relationships within the data.

Examples:

 Customer segmentation in marketing
 Organizing documents or images into groups

Algorithms:

 K-Means Clustering
 Hierarchical Clustering
 Principal Component Analysis (PCA)
 Association Rule Mining

C. Reinforcement Learning

 In reinforcement learning, an agent learns to interact with an environment and takes actions to maximize cumulative reward.
 Learning is based on trial-and-error and feedback in the form of rewards or penalties.

Examples:

 Game playing agents (e.g., AlphaGo)
 Robot navigation
 Real-time bidding in advertising

Key Components:

 Agent: Learner or decision-maker
 Environment: Everything the agent interacts with
 Policy: Strategy the agent follows
 Reward Signal: Feedback to evaluate actions
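
These components can be seen working together in a tiny Q-learning sketch (the chain environment, rewards, and hyperparameters are all illustrative assumptions, not content from the text):

```python
import numpy as np

# Toy environment: states 0..4 in a line; action 0 = left, 1 = right;
# reward 1 for reaching state 4. The Q-table underlies the learned policy.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration
rng = np.random.default_rng(0)

for _ in range(200):                    # episodes of trial and error
    s = 0
    for _ in range(100):                # step cap per episode
        # epsilon-greedy policy: mostly exploit, sometimes explore.
        best = np.flatnonzero(Q[s] == Q[s].max())
        a = rng.integers(n_actions) if rng.random() < epsilon else int(rng.choice(best))
        s_next = max(s - 1, 0) if a == 0 else s + 1
        r = 1.0 if s_next == n_states - 1 else 0.0   # reward signal
        # Update Q toward reward + discounted best future value.
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if s == n_states - 1:
            break
print(Q)   # right-moving actions end up with the higher values
```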

4. Key Challenges and Trends in Machine Learning

Challenges:

1. Data Availability and Quality:
o ML algorithms require large, high-quality datasets to perform effectively.
o Noisy, incomplete, or biased data can affect model performance.
2. Overfitting and Underfitting:
o Overfitting occurs when a model learns noise instead of the underlying pattern.
o Underfitting occurs when the model is too simple to capture complexity.
3. Interpretability:
o Complex models like deep neural networks are often "black boxes," making it
difficult to understand or trust their decisions.
4. Computational Resources:
o Training large models, especially on big data, requires powerful hardware (e.g.,
GPUs).
5. Ethical and Social Concerns:
o ML models can inherit biases from training data.
o There's a risk of misuse, privacy violations, and discrimination.

Trends in Machine Learning:

1. AutoML:
o Automates model selection, tuning, and deployment.
o Makes ML accessible to non-experts.
2. Explainable AI (XAI):
o Focuses on making ML models more transparent and understandable.
3. Federated Learning:
o Allows models to be trained across decentralized devices while preserving user
privacy.
4. Edge ML:
o Enables running ML algorithms on devices like smartphones and IoT sensors.

5. Introduction to Data Preprocessing
Before feeding data into a machine learning model, it must be preprocessed to ensure quality,
consistency, and relevance. Data preprocessing significantly affects model accuracy and
performance.

A. Data Cleaning:

 Handling Missing Values: Using mean or median imputation, or deletion.
 Removing Duplicates: Prevents misleading model training.
 Handling Outliers: Use techniques like z-score, IQR to detect and remove or transform.
 Noise Removal: Using filters or statistical methods.
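
A hedged pandas sketch of these cleaning steps (the DataFrame and column names are invented for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"age": [25, np.nan, 40, 40, 200],
                   "income": [50_000, 60_000, np.nan, np.nan, 75_000]})

df = df.drop_duplicates()                                 # remove duplicate rows
df["age"] = df["age"].fillna(df["age"].median())          # median imputation
df["income"] = df["income"].fillna(df["income"].mean())   # mean imputation

# Outlier handling via z-scores: keep rows within 3 standard deviations.
z = (df["age"] - df["age"].mean()) / df["age"].std()
df = df[z.abs() <= 3]
print(df)
```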

B. Data Transformation:

 Scaling and Normalization:
o Scaling ensures that features are on the same scale (e.g., 0–1).
o Standardization (z-score) centers data around the mean.
 Encoding Categorical Variables:
o One-hot encoding, label encoding for converting text into numerical format.
 Log Transformation:
o Helps with skewed distributions.
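
With scikit-learn (assumed available; OneHotEncoder's sparse_output flag needs version 1.2+), these transformations are one-liners:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler, OneHotEncoder

X = np.array([[1.0], [5.0], [10.0]])
print(MinMaxScaler().fit_transform(X))     # scaling: squeeze into the 0-1 range
print(StandardScaler().fit_transform(X))   # standardization: zero mean, unit variance

colors = np.array([["red"], ["green"], ["red"]])
print(OneHotEncoder(sparse_output=False).fit_transform(colors))  # one-hot encoding
```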

C. Feature Engineering:

 Feature Creation: Create new features from raw data (e.g., extract "hour" from
timestamp).
 Feature Selection: Remove redundant or irrelevant features using correlation, mutual
information, or wrapper methods.
 Dimensionality Reduction: PCA, LDA to reduce features while retaining variance.

Objective: Improve model performance by selecting meaningful and informative input variables.
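
As a dimensionality-reduction example, PCA with scikit-learn (assumed installed) projects the 4-feature iris dataset onto 2 components while retaining most of the variance:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                   # 150 samples, 4 features
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)            # keep the top 2 principal components
print(X_2d.shape)                      # (150, 2)
print(pca.explained_variance_ratio_)   # share of variance each component keeps
```
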
Supervised Learning Techniques
Supervised Learning involves training models using labeled data (i.e., input-output pairs). The
model learns to map inputs to outputs and is later used to predict outcomes for unseen data.

It is mainly divided into two types:

I. Regression:

Regression algorithms predict a continuous numerical value. Two common types are:

🔷 1. Linear Regression

➤ Goal:

Predict a continuous output based on linear relationships between input variables.

➤ Equation: y = mx + c

➤ Working:

 Finds the best-fitting straight line through the data points.
 Minimizes the Mean Squared Error (MSE) between predicted and actual values.

➤ Use Cases:

 Predicting house prices
 Estimating sales revenue
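
A hedged scikit-learn sketch (the data points are invented so that y roughly follows 2x + 1):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

model = LinearRegression().fit(X, y)   # fits the line that minimizes MSE
print(model.coef_, model.intercept_)   # slope close to 2, intercept close to 1
print(model.predict([[6]]))            # prediction for an unseen input
```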

🔷 2. Logistic Regression

➤ Goal:

Used for classification tasks (binary or multi-class), despite the name "regression".

➤ Equation: Sigmoid function σ(z) = 1 / (1 + e^(-z))

➤ Working:

 Outputs a probability between 0 and 1 using a sigmoid function.
 Based on a threshold (e.g., 0.5), it predicts a class label.

➤ Use Cases:

 Spam detection (Spam or Not)
 Disease diagnosis (Yes/No)
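
A corresponding scikit-learn sketch with invented toy data (inputs below about 3.5 labeled 0, above labeled 1):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
print(clf.predict_proba([[3.5]]))   # sigmoid output: probability per class
print(clf.predict([[3.5]]))         # class label after the 0.5 threshold
```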

II. Classification:

Classification algorithms predict discrete categories or class labels.

🔷 3. k-Nearest Neighbors (k-NN)

➤ Goal:

Classify a data point based on the majority label of its k-nearest neighbors.

➤ Working:

 Calculate Euclidean (or other) distances between the new point and all training points.
 Select the k closest points.
 Assign the class with the most votes among neighbors.
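
These three steps translate directly into a short NumPy implementation (the toy data and k = 3 are illustrative):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    dists = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distances
    nearest = np.argsort(dists)[:k]                  # indices of the k closest points
    votes = Counter(y_train[nearest])                # tally neighbor labels
    return votes.most_common(1)[0][0]                # majority vote wins

X_train = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
y_train = np.array(["A", "A", "B", "B"])
print(knn_predict(X_train, y_train, np.array([1.5, 1.5])))  # "A"
```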

➤ Use Cases:

 Recommender systems
 Image classification

➤ Advantages:

 Simple and intuitive
 No training phase

➤ Disadvantages:

 Slow prediction for large datasets
 Sensitive to irrelevant features

🔷 4. Decision Tree

A Decision Tree is a supervised learning algorithm used for classification and regression. It
splits the data based on feature values by asking questions at each node. Each branch shows an
outcome, and each leaf gives a prediction. The goal is to keep dividing the data until each group
is similar or belongs to one class. It's simple, easy to understand, and widely used.

➤ Goal:

Predict outcomes by learning decision rules from data features.

➤ Working:

 Splits data based on feature values.
 Uses Gini Index or Information Gain to choose the best splits.
 Tree is formed where internal nodes are decision rules, and leaves are outcomes.
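
A hedged scikit-learn sketch of a Gini-based tree (the feature values and labels are invented):

```python
from sklearn.tree import DecisionTreeClassifier

# Columns: [age, income]; labels are loan decisions (all values illustrative).
X = [[25, 40_000], [35, 60_000], [45, 80_000], [20, 20_000]]
y = ["deny", "approve", "approve", "deny"]

tree = DecisionTreeClassifier(criterion="gini").fit(X, y)
print(tree.predict([[30, 55_000]]))   # prediction for a new applicant
```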

➤ Use Cases:

 Credit risk analysis
 Customer churn prediction

➤ Advantages:

 Easy to interpret
 Handles both numerical and categorical data

➤ Disadvantages:

 Prone to overfitting

🔷 5. Random Forest

➤ Goal:

An ensemble learning method that builds multiple decision trees and merges them for better
accuracy.

➤ Working:

 Creates many decision trees on random subsets of data and features.
 Final prediction is based on majority voting (classification) or average (regression).
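
A scikit-learn sketch of the ensemble idea (the synthetic dataset is generated purely for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# 100 trees, each trained on random subsets of rows and features;
# the final class is decided by majority vote across trees.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:5]), y[:5])   # predictions vs. true labels on a few samples
```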

➤ Use Cases:

 Fraud detection
 Stock price prediction

➤ Advantages:

 High accuracy
 Reduces overfitting

➤ Disadvantages:

 Less interpretable than a single tree
 Slower prediction than individual models

🔷 6. Support Vector Machines (SVM)

Support Vector Machine (SVM) is a supervised machine learning algorithm used for
classification and regression tasks. While it can handle regression problems, SVM is
particularly well-suited for classification tasks.

➤ Working:

 Selects the best separating hyperplane with the maximum margin between classes.
 Can use kernel tricks to handle non-linearly separable data (e.g., RBF, Polynomial).
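
A hedged scikit-learn sketch using the RBF kernel trick on XOR-like data that no straight line can separate:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])   # XOR labels: not linearly separable

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(clf.predict(X))        # should closely match y on this tiny set
```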

➤ Use Cases:

 Face detection
 Text classification
 Bioinformatics

➤ Advantages:

 Works well for high-dimensional data
 Effective in complex but small- to medium-sized datasets

➤ Disadvantages:

 Memory-intensive
 Difficult to tune parameters (kernel, C, gamma).

K-Means
K-Means is an unsupervised, iterative clustering technique that partitions a dataset into k distinct
clusters by assigning each point to the nearest cluster “mean,” then recomputing means until
convergence. It emphasizes intra-cluster similarity and inter-cluster dissimilarity, making it a fast,
scalable method for grouping data.

Definition
 Unsupervised iterative technique: No labels are used; clusters form based solely on data
distribution.
 Cluster: A set of points exhibiting mutual similarity, with each point belonging to the
cluster whose mean (centroid) is nearest.

Algorithm Steps
1. Choose k: Decide the number of clusters you want.
2. Initialize centroids: Randomly pick k data points as initial cluster centers, ensuring they
are as far apart as possible.
3. Compute distances: For each data point, calculate its distance to every centroid (e.g.,
Euclidean or custom distance).
4. Assign clusters: Assign each point to the cluster of its nearest centroid.
5. Update centroids: Recalculate each centroid as the mean of all points assigned to that
cluster.
6. Repeat: Go back to step 3 and iterate until one of the following stopping criteria is met:
o Centroids no longer move
o Point assignments remain unchanged
o A preset maximum number of iterations is reached.
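
The six steps above map almost line-for-line onto this NumPy sketch (the data and k = 2 are illustrative):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]   # steps 1-2: init
    for _ in range(n_iters):
        # Steps 3-4: distance to each centroid, assign to the nearest.
        dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 5: recompute centroids as the mean of assigned points.
        new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_centroids, centroids):         # step 6: converged
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1, 1], [1.5, 2], [8, 8], [9, 9], [1, 0.5], [8.5, 9.5]])
labels, centroids = kmeans(X, k=2)
print(labels)
print(centroids)
```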

Advantages
 Time complexity: Efficient O(n k t) where n = instances, k = clusters, t = iterations.
 Local optimum: Often converges quickly, and can be enhanced to find global optima
using methods like simulated annealing or genetic algorithms.

Disadvantages
 Need to specify k: You must know the number of clusters in advance.
 Sensitivity to outliers: Cannot handle noise or outliers well.

Gradient Descent (GD)
 Definition: An optimization algorithm used to train machine learning models (including
neural networks) by iteratively adjusting parameters in the direction of the negative
gradient of the loss function, thereby minimizing prediction error.
 Usage:

 Trains models by minimizing the difference between actual and expected outputs
(e.g., Mean Squared Error in regression).
 Fundamental to backpropagation in neural networks, where weights and biases
are updated via GD at each layer.
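
A minimal batch gradient-descent sketch for 1-D linear regression with MSE loss (the data are invented so the true relationship is y = 2x + 1):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])    # exactly y = 2x + 1

w, b, lr = 0.0, 0.0, 0.01             # initial parameters and learning rate
for _ in range(5000):
    error = w * X + b - y             # prediction error on the full dataset
    grad_w = 2 * np.mean(error * X)   # d(MSE)/dw
    grad_b = 2 * np.mean(error)       # d(MSE)/db
    w -= lr * grad_w                  # step in the negative gradient direction
    b -= lr * grad_b
print(w, b)                           # converges toward w ≈ 2, b ≈ 1
```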

Types of Gradient Descent

1. Batch Gradient Descent (BGD)

 How it works: Computes the gradient of the cost function using the entire training dataset
before each parameter update.
 Advantages:

 Produces stable, smooth convergence since each update is based on the full dataset.
 Precise gradient estimates lead to consistent progress toward the global minimum.

 Disadvantages:

 Can be very slow on large datasets due to full-dataset passes each iteration.
 High memory usage, as it must load all samples to compute each update.

2. Stochastic Gradient Descent (SGD)

 How it works: Updates model parameters using the gradient computed from one randomly
selected training example per iteration.
 Advantages:

 Faster convergence in practice for large datasets, since updates are made more
frequently.
 Requires minimal memory and can begin learning before seeing the entire dataset.
 Randomness helps escape shallow local minima in non-convex loss landscapes.

 Disadvantages:

 Updates have high variance, causing the loss to fluctuate rather than decrease
smoothly.
 May require careful tuning of the learning rate and often benefits from decay schedules.

3. Mini-Batch Gradient Descent (MBGD)

 How it works: Splits the training set into small batches (e.g., 32–256 samples) and
performs an update for each mini-batch.
 Advantages:

 Balances the stability of BGD with the efficiency of SGD.
 Exploits vectorized operations on modern hardware (GPUs/TPUs) for faster
computation.
 Reduces the variance of parameter updates compared to SGD, leading to smoother
convergence.

 Disadvantages:

 Still requires batch-size tuning: batches that are too small behave like SGD, too large like BGD.
 May get stuck in local minima if batch size is poorly chosen.
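
For contrast with the full-batch sketch earlier, here is the same toy problem trained with mini-batches of 2 (the batch size and epoch count are illustrative):

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])
w, b, lr, batch_size = 0.0, 0.0, 0.01, 2
rng = np.random.default_rng(0)

for _ in range(2000):                              # epochs
    idx = rng.permutation(len(X))                  # shuffle each epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        error = w * X[batch] + b - y[batch]        # error on this mini-batch only
        w -= lr * 2 * np.mean(error * X[batch])
        b -= lr * 2 * np.mean(error)
print(w, b)                                        # again approaches w ≈ 2, b ≈ 1
```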
