SWAY020A
embedded systems
Mark Nadeski
Embedded Processing
Texas Instruments
It is hard to overstate the promise of machine learning. Its latest evolution, deep learning, has been called a foundational technology that will impact the world to the same degree as the internet, or the transistor before that.
Brought on by great advancements in computing power and the availability of enormous
labeled data sets, deep learning has already brought major improvements to image
classification, virtual assistants and game playing, and will likely do the same for
countless industries. Compared to traditional machine learning, deep learning can
provide improved accuracy, greater versatility and better utilization of big data – all with
less required domain expertise.
In order for machine learning to fulfill its promise in many industries, it is necessary to be able to deploy the inference (the part that executes the trained machine learning algorithm) into an embedded system. This deployment has its own unique set of challenges and requirements. This white paper will address the challenges of deploying machine learning in embedded systems and the primary considerations when choosing an embedded processor for machine learning.

Training and inference

In the subset of machine learning that is deep learning, there are two main pieces: training and inference, which can be executed on completely different processing platforms, as shown in Figure 1, below. The training side of deep learning usually occurs offline on desktops or in the cloud and entails feeding large labeled data sets into a deep neural network (DNN). Real-time performance or power is not an issue during this phase. The result of the training phase is a trained neural network.
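The split between offline training and deployed inference can be sketched in a few lines. This is a generic toy illustration, not any specific framework or TI tool: a single-neuron model is fit with plain gradient descent (the desktop/cloud phase), and the deployable inference routine needs only the frozen weights, none of the training code.

```python
# Toy illustration of the training/inference split: train() is the
# offline phase; infer() is all that would ship on the embedded target.

def train(samples, labels, epochs=200, lr=0.1):
    """Offline phase: fit a single-neuron model with gradient descent."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x0, x1), y in zip(samples, labels):
            pred = w[0] * x0 + w[1] * x1 + b
            err = pred - y
            w[0] -= lr * err * x0
            w[1] -= lr * err * x1
            b -= lr * err
    return w, b  # the "trained network" that gets deployed

def infer(w, b, x):
    """Embedded phase: needs only the frozen weights."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0.5 else 0

# Train on a trivially separable problem (logical AND):
data = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
w, b = train(data, labels)
print([infer(w, b, x) for x in data])
```

Real deployments follow the same shape at scale: the heavy training loop runs once offline, and only the resulting weights (often in an exchange format such as ONNX) move to the embedded processor.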
• Reliability. Relying on an internet connection is often not a viable option.

• Low latency. Many applications need an immediate response. An application may not be able to tolerate the time delay in sending data somewhere else for processing.

• Privacy. The data may be private and therefore should not be transmitted or stored externally.

• Bandwidth. Network bandwidth efficiency is often a key concern. Connecting to a server for every use case is not sustainable.

…learning when needed. For example, an entire input image at high frames per second (fps) can run classical computer vision algorithms to perform object tracking, with deep learning used on identified sub-regions of the image at a lower fps for object classification. In this example, the classification of objects across multiple sub-regions may require multiple instances of inference, or possibly even different inferences running on each sub-region. In the latter case, you must choose a processing solution that can run both traditional computer vision and deep learning, as well as
multiple instances of different deep learning inferences. Figure 2 shows an example usage of tracking multiple objects through sub-regions of an image and performing classification on each object being tracked.

• Choose the right performance point. Once you have a sense of the scope of the entire application, it becomes important to understand how much processing performance is necessary to satisfy the application needs. This can be difficult to understand when it comes to machine learning because so much of the performance is application-specific. For example, the performance of a convolutional neural net (CNN) that classifies objects on a video stream depends on what layers are used in the network, how deep the network is, the video's resolution, the fps requirement and how many bits are used for the network weights – to name just a few. In an embedded system, however, it is important to try to get a measure of the performance needed, because throwing too powerful a processor at the problem generally comes at a trade-off against increased power, size and/or cost. Although a processor may be capable of running ResNet-10 (a popular neural net model used in high-power, centralized deep learning applications) at 30 fps at 1080p, it's likely overkill for an application that will run a network designed for the embedded space. Similarly, look for processors capable of efficiently leveraging the tools that bring these networks into the embedded space. For example, neural networks can tolerate lots of errors; using quantization is a good way to reduce performance requirements with minimal decreases in accuracy. Processors that can support dynamic quantization and efficiently leverage other tricks like sparsity (limiting the number of non-zero weights) are good choices in the embedded space.

• Ensure ease of use. Ease of use refers to both ease of development and ease of evaluation. As mentioned earlier, right-sizing the processor performance is an important design consideration. The best way to do this correctly is to run the chosen network on an existing processor. Some offerings provide tools that, given a network topology, will show achievable performance and accuracy on a given processor, thus enabling a performance evaluation without the need for actual hardware or finalization of a network. For development, being able to easily import a trained network model from popular frameworks like Caffe or TensorFlow is a must. Additionally, support for open ecosystems like ONNX (Open Neural Network eXchange) will support an even larger base of frameworks to be used for development.
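The quantization idea mentioned above, mapping 32-bit float weights to 8-bit integers plus a shared scale factor so inference can use cheaper integer math and less memory, can be sketched as follows. This is a generic post-training scheme for illustration only; real toolchains typically choose scales per layer or per channel, and may use zero points for asymmetric ranges.

```python
# Sketch of post-training weight quantization: floats become small
# signed integers plus one shared float scale. Illustrative only.

def quantize(weights, bits=8):
    """Map float weights to signed integers with a shared scale."""
    qmax = 2 ** (bits - 1) - 1          # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.03, 0.51, -0.64]
q, scale = quantize(weights)
restored = dequantize(q, scale)

# Worst-case rounding error is half a quantization step:
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)                     # small integers instead of float32 values
print(max_err <= scale / 2)
```

Storing int8 values cuts weight storage by roughly 4x versus float32, and the rounding error stays within half a quantization step, which is why well-chosen quantization usually costs little accuracy.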
Copyright © 2019, Texas Instruments Incorporated