Module 5 Notes
1. Introduction to Edge AI
Edge AI, or Artificial Intelligence at the Edge, is the deployment of AI models directly on edge devices such as IoT devices, smartphones, and embedded systems, enabling real-time data processing without relying on cloud servers. Because data is processed locally rather than transmitted to remote servers, this approach offers reduced latency, improved security and privacy, and efficient bandwidth usage. Edge AI relies on optimized AI models, specialized hardware such as TPUs and NPUs, and lightweight AI frameworks such as TensorFlow Lite and OpenVINO to run efficiently on low-power devices. It finds applications across many industries:
• Healthcare: wearable devices for real-time health monitoring
• Autonomous vehicles: AI-driven navigation
• Manufacturing: predictive maintenance and quality control
• Smart cities: AI-powered surveillance and traffic management
• Retail: automated checkout systems and personalized recommendations
By bringing intelligence closer to the source of data generation, Edge AI speeds up decision-making, improves efficiency, and strengthens data privacy, making it a crucial innovation in AI-driven technology.
Edge AI and VLSI
The integration of Edge AI with Very Large-Scale Integration (VLSI) is revolutionizing modern
computing by enabling real-time, low-power, and high-performance AI processing at the edge.
However, this fusion introduces several challenges and opportunities in VLSI design and
implementation.
The rapid growth of Machine Learning (ML) and Deep Learning (DL) applications has increased
the demand for energy-efficient algorithms that can be implemented on hardware platforms
like FPGAs, ASICs, TPUs, and Edge AI chips. The focus is on reducing power consumption while
maintaining high computational performance, making ML/DL feasible for low-power devices,
IoT, and edge computing.
Challenges in Hardware Implementation of ML/DL
1. High Computational Complexity
o Deep networks require billions of operations per second, increasing power consumption.
o Running models on edge devices demands efficient processing to save energy.
2. Memory Bottlenecks
o Storing and moving large ML/DL models consumes significant power.
o Limited on-chip memory and high DRAM access costs reduce efficiency.
3. Parallelism and Dataflow Optimization
o Maximizing hardware utilization without excessive power draw is challenging.
o Mapping computations onto specialized hardware requires optimized scheduling.
4. Precision vs. Accuracy Trade-off
o Lower precision reduces energy but may degrade model accuracy.
o Finding the optimal balance is key for real-world applications.
Energy-Efficient Techniques for Hardware Implementation of ML/DL
1. Model Quantization
Key Idea: Reduce the number of bits used for weights and activations to minimize memory usage and power consumption.
Hardware Implementation:
• Tensor Processing Units (TPUs) support 8-bit quantization for energy savings.
• FPGAs use fixed-point arithmetic to optimize efficiency.
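As a software illustration of the key idea, here is a minimal Python/NumPy sketch of 8-bit affine quantization; the weight matrix is random stand-in data, and the quantization scheme is the standard uniform one rather than any specific accelerator's.

    import numpy as np

    # Illustrative float32 weights (stand-in for a trained layer)
    rng = np.random.default_rng(0)
    w = rng.normal(0.0, 0.5, size=(64, 64)).astype(np.float32)

    # Affine (asymmetric) 8-bit quantization: map [w_min, w_max] onto [0, 255]
    w_min, w_max = w.min(), w.max()
    scale = (w_max - w_min) / 255.0
    zero_point = np.round(-w_min / scale).astype(np.int32)

    # Quantize to uint8, then dequantize to see the accuracy cost
    w_q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    w_dq = (w_q.astype(np.float32) - zero_point) * scale

    print("max abs error:", np.abs(w - w_dq).max())        # small vs. weight range
    print("memory: %d B -> %d B" % (w.nbytes, w_q.nbytes)) # 4x reduction

The 4x memory reduction also cuts DRAM traffic, which is where much of the energy goes on edge hardware.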
2. Pruning
Key Idea: Remove redundant neurons, weights, or layers from deep networks.
Hardware Implementation:
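The hardware details are not elaborated above; as a software sketch of the key idea, the following NumPy snippet performs magnitude-based weight pruning (the 80% sparsity target is an illustrative choice):

    import numpy as np

    rng = np.random.default_rng(1)
    w = rng.normal(0.0, 0.5, size=(128, 128))

    # Magnitude pruning: zero out the 80% of weights with the smallest |w|
    threshold = np.percentile(np.abs(w), 80)
    mask = np.abs(w) >= threshold
    w_pruned = w * mask

    print("sparsity: %.1f%%" % (100.0 * (w_pruned == 0).mean()))
    # Sparse weights can be stored compactly and skipped by zero-skipping
    # hardware, saving both memory traffic and MAC operations.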
3. Knowledge Distillation
Key Idea: Train a smaller model (student) to learn from a larger model (teacher) while
maintaining high accuracy.
Hardware Implementation:
• Used in Edge AI devices like Google's Edge TPU for efficient inference.
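A minimal NumPy sketch of the distillation loss, assuming teacher and student logits for a batch are already available; the temperature value is illustrative:

    import numpy as np

    def softmax(z, T=1.0):
        # Temperature-scaled softmax; higher T produces softer targets
        z = z / T
        z = z - z.max(axis=1, keepdims=True)  # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    def distillation_loss(student_logits, teacher_logits, T=4.0):
        # KL divergence between softened teacher and student distributions,
        # scaled by T^2 as in Hinton et al.'s formulation
        p_teacher = softmax(teacher_logits, T)
        p_student = softmax(student_logits, T)
        kl = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)), axis=1)
        return (T * T) * kl.mean()

    rng = np.random.default_rng(2)
    teacher = rng.normal(size=(8, 10))   # stand-in logits, 10 classes
    student = teacher + rng.normal(scale=0.5, size=(8, 10))
    print("distillation loss:", distillation_loss(student, teacher))

Minimizing this loss during student training transfers the teacher's "dark knowledge" (relative class similarities) into the smaller model.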
4. Low-Rank Approximation
Key Idea: Decompose weight matrices into smaller, lower-rank components to reduce computation.
• Singular Value Decomposition (SVD) and Tensor Decomposition reduce energy usage.
• Approximate large layers (e.g., Fully Connected layers) with efficient alternatives.
Hardware Implementation:
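A minimal NumPy sketch of low-rank approximation of a fully connected layer's weights using truncated SVD; the matrix size and rank are illustrative choices:

    import numpy as np

    rng = np.random.default_rng(3)
    W = rng.normal(size=(512, 256))   # stand-in FC weight matrix

    # Truncated SVD: keep only the top-k singular values/vectors
    k = 32
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * S[:k]              # 512 x k
    B = Vt[:k, :]                     # k x 256
    W_approx = A @ B

    # One 512x256 matmul becomes two thin matmuls (512xk and kx256)
    print("params: %d -> %d" % (W.size, A.size + B.size))
    print("relative error: %.3f"
          % (np.linalg.norm(W - W_approx) / np.linalg.norm(W)))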
5. Approximate Computing
Key Idea: Use approximate arithmetic units to trade slight accuracy loss for significant energy
savings.
Hardware Implementation:
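The hardware details are not elaborated above, but the trade-off can be emulated in software. The following sketch mimics a reduced-precision (float16) multiply-accumulate datapath and measures the resulting error against a full-precision dot product; the vector size is illustrative:

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(size=4096)
    w = rng.normal(size=4096)

    # Exact dot product (reference)
    exact = np.dot(x, w)

    # Emulate an approximate MAC unit by forcing every product and
    # accumulation into float16, a much cheaper datapath in hardware
    acc = np.float16(0.0)
    for xi, wi in zip(x.astype(np.float16), w.astype(np.float16)):
        acc = np.float16(acc + np.float16(xi * wi))

    print("exact: %.4f  approx: %.4f  abs error: %.4f"
          % (exact, float(acc), abs(exact - float(acc))))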
6. Early Exit
Key Idea: Allow models to exit early if confidence is high, reducing unnecessary computation.
Hardware Implementation:
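As a software illustration, here is a minimal NumPy sketch of confidence-based early exit; the two-stage "network", its dimensions, and the confidence threshold are all illustrative stand-ins:

    import numpy as np

    def softmax(z):
        z = z - z.max()
        e = np.exp(z)
        return e / e.sum()

    rng = np.random.default_rng(5)
    # Stand-in parameters: a cheap early head and an expensive full branch
    W_early = rng.normal(size=(16, 10)) * 0.5
    W_deep1 = rng.normal(size=(16, 64)) * 0.5
    W_deep2 = rng.normal(size=(64, 10)) * 0.5

    def predict(x, threshold=0.9):
        # Cheap early classifier: exit now if it is already confident
        p_early = softmax(x @ W_early)
        if p_early.max() >= threshold:
            return p_early.argmax(), "early exit"
        # Otherwise run the remaining (more expensive) layers
        h = np.tanh(x @ W_deep1)
        p_full = softmax(h @ W_deep2)
        return p_full.argmax(), "full network"

    x = rng.normal(size=16)
    print(predict(x))

Easy inputs exit at the cheap head, so average energy per inference drops without touching worst-case accuracy.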
7. Neural Architecture Search (NAS)
Key Idea: Use AI to design energy-optimized deep networks tailored for specific hardware.
• Finds architectures with low power, high accuracy, and minimal latency.
• Google's EfficientNet and MobileNetV3 are NAS-designed models for low-power AI.
Hardware Implementation:
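The hardware mapping is not detailed above; as a toy illustration of the search itself, the following sketch performs random search over a small hypothetical architecture space. The scoring function is a stand-in for measured accuracy and energy, not a real evaluator:

    import itertools
    import random

    # Hypothetical search space: depth, width, and bit-width per candidate
    DEPTHS = [2, 4, 8]
    WIDTHS = [32, 64, 128]
    BITS = [4, 8, 16]

    def score(depth, width, bits):
        # Stand-in objective: reward capacity, penalize energy/latency.
        # A real NAS would train/evaluate each candidate on hardware.
        accuracy_proxy = 1.0 - 1.0 / (depth * width * 0.01 + 1.0)
        energy_proxy = depth * width * bits * 1e-4
        return accuracy_proxy - energy_proxy

    random.seed(0)
    candidates = list(itertools.product(DEPTHS, WIDTHS, BITS))
    sampled = random.sample(candidates, 10)   # random search over the space
    best = max(sampled, key=lambda c: score(*c))
    print("best (depth, width, bits):", best, "score: %.3f" % score(*best))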
Neuromorphic Computing on FPGAs and ASICs
Neuromorphic computing is an emerging paradigm that mimics the brain's structure and function to achieve energy-efficient, parallel, and adaptive computing. This approach is particularly beneficial for edge AI, robotics, IoT, and real-time learning systems, where power efficiency and latency are critical. Implementing neuromorphic computing on Field-Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) provides significant advantages in terms of performance, flexibility, and power optimization.
FPGAs provide a flexible and reconfigurable platform for neuromorphic computing, allowing researchers to experiment with Spiking Neural Network (SNN) architectures, event-driven processing, and hardware optimizations.
Example FPGA-based neuromorphic platforms:
• ROLLS: robust, low-power, mixed-signal FPGA implementation for ultra-low-power SNN neuromorphic computing
• Reconfigurable Spiking Processor (RSP): FPGA-based SNN accelerator for real-time event-driven AI
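To illustrate the event-driven computation these platforms accelerate, here is a minimal NumPy simulation of a leaky integrate-and-fire (LIF) neuron, the basic building block of SNNs; all constants are illustrative:

    import numpy as np

    # LIF neuron: the membrane potential integrates input current,
    # decays ("leaks") over time, and emits a spike when it crosses
    # a threshold, after which it resets.
    dt = 1.0          # time step (ms)
    tau = 20.0        # membrane time constant (ms)
    v_thresh = 1.0    # spike threshold
    v_reset = 0.0     # reset potential

    rng = np.random.default_rng(6)
    current = rng.uniform(0.0, 0.12, size=200)   # random input current

    v = 0.0
    spikes = []
    for t, i_in in enumerate(current):
        v += dt * (-v / tau + i_in)   # leak + integrate
        if v >= v_thresh:
            spikes.append(t)          # event: emit a spike
            v = v_reset               # reset after spiking

    print("spike times:", spikes)

Because computation happens only when spikes occur, event-driven hardware can stay idle most of the time, which is the source of the power savings.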
1.5 VLSI Application: Implementing DL algorithms on FPGA for real-time data processing.
Deep Learning (DL) is revolutionizing various applications, from computer vision and natural
language processing (NLP) to autonomous systems and real-time analytics. However,
deploying DL models for real-time data processing on conventional hardware such as GPUs
and CPUs often leads to high power consumption and latency issues. Field-Programmable
Gate Arrays (FPGAs) provide an efficient alternative by offering:
• Low latency
• High parallelism
• Energy efficiency
• Customizable architectures
This makes FPGAs ideal for real-time AI applications, including edge AI, IoT, medical imaging,
autonomous vehicles, and industrial automation.
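FPGA datapaths typically rely on fixed-point rather than floating-point arithmetic. As a software illustration, the following Python sketch converts values to a Q4.12 fixed-point format and multiplies them the way an FPGA multiplier would; the format choice is illustrative:

    FRAC_BITS = 12                      # Q4.12: 4 integer bits, 12 fraction bits
    SCALE = 1 << FRAC_BITS

    def to_fixed(x):
        # Convert a float to its fixed-point integer representation
        return int(round(x * SCALE))

    def fixed_mul(a, b):
        # Fixed-point multiply: full-precision product, then rescale
        return (a * b) >> FRAC_BITS

    a, b = 1.375, -2.25
    product = fixed_mul(to_fixed(a), to_fixed(b))
    print("fixed-point: %.6f  float: %.6f" % (product / SCALE, a * b))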
Comparison of hardware platforms for real-time DL:

Metric              FPGA                                                        GPU     CPU
Power Consumption   Low (optimized for energy-efficient AI)                     High    Very High
Cost Efficiency     Medium (expensive initially but cost-effective long-term)   High    Medium