
Bringing machine learning to embedded systems

Mark Nadeski
Embedded Processing
Texas Instruments
It is hard to overstate the promise of machine learning, the latest evolution of which, deep learning, has been called a foundational technology that will impact the world to the same degree as the internet, or the transistor before it.
Brought on by great advancements in computing power and the availability of enormous
labeled data sets, deep learning has already brought major improvements to image
classification, virtual assistants and game playing, and will likely do the same for
countless industries. Compared to traditional machine learning, deep learning can
provide improved accuracy, greater versatility and better utilization of big data – all with
less required domain expertise.

In order for machine learning to fulfill its promise in many industries, it is necessary to be able to deploy the inference (the part that executes the trained machine learning algorithm) into an embedded system. This deployment has its own unique set of challenges and requirements. This white paper will address the challenges of deploying machine learning in embedded systems and the primary considerations when choosing an embedded processor for machine learning.

Training and inference

In the subset of machine learning that is deep learning, there are two main pieces – training and inference – which can be executed on completely different processing platforms, as shown in Figure 1 below. The training side of deep learning usually occurs offline on desktops or in the cloud and entails feeding large labeled data sets into a deep neural network (DNN). Real-time performance or power is not an issue during this phase.

Figure 1. Traditional deep learning development flow: training on large labeled data sets (PC/GPU), format conversion (if necessary), then inference deployment on an embedded processor in the end product.
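To make this flow concrete, here is a minimal sketch of the offline training and format-conversion steps, assuming PyTorch for training and ONNX as the exchange format. The tiny network, random data and file name are illustrative placeholders, not part of any vendor's toolchain.

```python
# Minimal sketch of the Figure 1 flow: train offline on labeled data,
# then convert the trained network into a deployable format.
import torch
import torch.nn as nn

# Placeholder network and data; a real flow uses a large labeled data set.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
images = torch.randn(256, 1, 28, 28)
labels = torch.randint(0, 10, (256,))

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
for _ in range(10):  # training loop: runs offline on a PC/GPU or in the cloud
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()

# Format conversion: export the trained network for the embedded inference step.
torch.onnx.export(model, torch.randn(1, 1, 28, 28), "trained_network.onnx")
```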



The result of the training phase is a trained neural network that, when deployed, can perform a specific task, such as inspecting a bottle on an assembly line, counting and tracking people within a room, or determining whether a bill is counterfeit. The deployment of the trained neural network on a device that executes the algorithm is known as the inference. Given the constraints imposed by an embedded system, the neural network will often be trained on a different processing platform than the one running the inference. This paper focuses on processor selection for the inference part of deep learning; the terms “deep learning” and “machine learning” in the rest of this paper refer to the inference.

Machine learning at the edge

The concept of pushing computing closer to where sensors gather data – i.e. the edge of the network – is a central point of modern embedded systems. With deep learning, this concept becomes even more important to enabling intelligence and autonomy at the edge. For many applications – from automated machinery and industrial robots on a factory floor, to self-guided vacuums in the home, to an agricultural tractor in the field – the processing must happen locally.

The reasons for local processing can be quite varied depending on the application. Here are just a few of the concerns driving the need for local processing:

• Reliability. Relying on an internet connection is often not a viable option.

• Low latency. Many applications need an immediate response. An application may not be able to tolerate the time delay of sending data somewhere else for processing.

• Privacy. The data may be private and therefore should not be transmitted or stored externally.

• Bandwidth. Network bandwidth efficiency is often a key concern. Connecting to a server for every use case is not sustainable.

• Power. Power is always a priority for embedded systems. Moving data consumes power, and the further the data needs to travel, the more energy is needed.

Choosing an embedded processor for machine learning

Many of the concerns driving local processing overlap with those inherent in embedded systems, particularly power and reliability. Embedded systems also have several other factors to consider that are related to or caused by the system’s physical limitations: there are frequently inflexible requirements regarding size, memory, power, temperature, longevity and, of course, cost.

In the midst of balancing all of the requirements and concerns for a given embedded application, there are a few important factors to consider when choosing a processor to execute machine learning inference at the edge:

• Consider the entire application. One of the first things to understand before selecting a processing solution is the scope of the entire application. Will running the inference be the only processing required, or will traditional machine vision be combined with a deep learning inference? It can often be more efficient for a system to run a traditional computer vision algorithm at a high level and then run deep learning only when needed. For example, classical computer vision algorithms can run on the entire input image at a high frame rate (fps) to perform object tracking, while deep learning runs on identified sub-regions of the image at a lower fps for object classification. In this example, classifying objects across multiple sub-regions may require multiple instances of inference, or possibly even different inferences running on each sub-region. In the latter case, you must choose a processing solution that can run both traditional computer vision and deep learning, as well as multiple instances of different deep learning inferences. Figure 2 shows an example of tracking multiple objects through sub-regions of an image and performing classification on each object being tracked; a code sketch of this split follows below.



Figure 2. Example of object classification using embedded deep learning.

• Choose the right performance point. Once you have a sense of the scope of the entire application, it becomes important to understand how much processing performance is necessary to satisfy the application’s needs. This can be difficult to gauge when it comes to machine learning because so much of the performance is application-specific. For example, the performance of a convolutional neural network (CNN) that classifies objects in a video stream depends on which layers are used in the network, how deep the network is, the video’s resolution, the fps requirement and how many bits are used for the network weights – to name just a few factors. In an embedded system, however, it is important to get a measure of the performance actually needed, because throwing too powerful a processor at the problem generally comes with a trade-off of increased power, size and/or cost. Although a processor may be capable of running ResNet-10 – a popular neural network model used in high-power, centralized deep learning applications – at 30 fps on 1080p video, that is likely overkill for an application that will run a more embedded-friendly network on a 224 x 224 region of interest. A back-of-envelope sizing sketch follows below.
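One way to ground the performance question is simple arithmetic: multiply a candidate network's operations per frame by the required frame rate and compare the result against a candidate processor's usable throughput. The numbers below are invented for illustration, not benchmarks of any real device.

```python
# Back-of-envelope check that a processor meets the application's needs.
ops_per_frame = 1.8e9   # assumed multiply-accumulates per inference for the chosen network
target_fps = 15         # the application's frame-rate requirement
required_ops = ops_per_frame * target_fps  # ~27 GMAC/s in this example
processor_budget = 40e9                    # assumed usable MAC/s of a candidate device
print(f"Utilization: {required_ops / processor_budget:.0%}")  # leave headroom for other tasks
```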
• Think embedded. Selecting the right network is just as important as selecting the right processor, and not every neural network architecture will fit on an embedded processor. Limiting models to those with fewer operations will help achieve real-time performance. You should prioritize benchmarks of an embedded-friendly network – one that trades off some accuracy for significant computational savings – over more well-known networks like AlexNet and GoogLeNet, which were not designed for the embedded space. Similarly, look for processors capable of efficiently leveraging the tools that bring these networks into the embedded space. For example, neural networks can tolerate lots of errors; quantization is a good way to reduce performance requirements with minimal decreases in accuracy (see the sketch below). Processors that can support dynamic quantization and efficiently leverage other tricks like sparsity (limiting the number of non-zero weights) are good choices in the embedded space.
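As a generic illustration of the quantization idea, the snippet below applies PyTorch's post-training dynamic quantization to a small model. It stands in for vendor-specific tooling (it is not TI's TIDL flow), and the output-drift check is only a schematic stand-in for a real accuracy evaluation.

```python
# Post-training dynamic quantization: store weights as 8-bit integers to cut
# compute and memory requirements with typically small accuracy loss.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8  # quantize the Linear layers' weights
)

# Compare outputs on sample data to gauge the accuracy impact.
x = torch.randn(8, 1, 28, 28)
drift = (model(x) - quantized(x)).abs().max()
print(f"Max output drift after quantization: {drift:.4f}")
```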
• Ensure ease of use. Ease of use refers to both ease of development and ease of evaluation. As mentioned earlier, right-sizing the processor performance is an important design consideration, and the best way to do this correctly is to run the chosen network on an existing processor. Some offerings provide tools that, given a network topology, will show achievable performance and accuracy on a given processor, enabling a performance evaluation without the need for actual hardware or a finalized network. For development, being able to easily import a trained network model from popular frameworks like Caffe or TensorFlow is a must. Additionally, support for open ecosystems like ONNX (Open Neural Network Exchange) opens up an even larger base of frameworks for development, as in the sketch below.
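For the ONNX point, here is a brief sketch of the consumer side: loading a network that some other framework exported (for instance, the trained_network.onnx produced in the earlier training sketch) into a different runtime. ONNX Runtime is assumed purely for illustration.

```python
# Loading a model exchanged via ONNX into a different runtime than the one
# that trained it -- the interoperability an open exchange format buys.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("trained_network.onnx")  # exported in the earlier sketch
name = session.get_inputs()[0].name
scores = session.run(None, {name: np.random.rand(1, 1, 28, 28).astype(np.float32)})[0]
print(scores.argmax(axis=1))  # predicted class for the random test input
```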



There are many different types of processors to consider when choosing one for deep learning, and they all have their strengths and weaknesses. Graphics processing units (GPUs) are usually the first consideration because they are widely used during network training. Although extremely capable, GPUs have had trouble gaining traction in the embedded space given the power, size and cost constraints often found in embedded applications.

Power- and size-optimized “inference engines” are increasingly available as deep learning grows in popularity. These engines are specialized hardware offerings aimed specifically at performing the deep learning inference. Some engines are optimized to the point of using 1-bit weights and can perform simple functions like key-phrase detection, but optimizing this much to save power and compute comes with a tradeoff of limited system functionality and precision. The smaller inference engines may not be powerful enough if the application needs to classify objects or perform fine-grained work, so when evaluating these engines, make sure that they are right-sized for the application. A further limitation of these inference engines arises when the application needs additional processing aside from the deep learning inference: more often than not, the engine will need to be used alongside another processor in the system, functioning as a deep learning co-processor, as illustrated in the sketch below.
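The co-processor pattern might look like the following, sketched here with TensorFlow Lite's public delegate mechanism. The delegate library name and model file are hypothetical stand-ins; a real system would substitute the inference engine vendor's own offload path.

```python
# Co-processor pattern: the host CPU prepares data while a delegate pushes
# the deep learning inference onto a dedicated accelerator when available.
import numpy as np
import tensorflow as tf

try:
    # Hypothetical accelerator delegate; fall back to the CPU if unavailable.
    delegate = tf.lite.experimental.load_delegate("libaccel_delegate.so")
    interpreter = tf.lite.Interpreter(model_path="classifier.tflite",
                                      experimental_delegates=[delegate])
except (OSError, ValueError):
    interpreter = tf.lite.Interpreter(model_path="classifier.tflite")

interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

frame = np.zeros(inp["shape"], dtype=inp["dtype"])  # stand-in for a camera frame
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()
scores = interpreter.get_tensor(out["index"])  # inference result from the co-processor
```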

An integrated system on chip (SoC) is often a good choice in the embedded space because, in addition to housing various processing elements capable of running the deep learning inference, an SoC also integrates many components necessary to cover the entire embedded application. Some integrated SoCs include display, graphics, video acceleration and industrial networking capabilities, enabling a single-chip solution that does more than just run deep learning.

An example of a highly integrated SoC for deep learning is the AM5749 device from Texas Instruments, shown in Figure 3. The AM5749 has two Arm® Cortex®-A15 cores for system processing, two C66x digital signal processor (DSP) cores for running traditional machine vision algorithms and two Embedded Vision Engines (EVEs) for running the inference. TI’s deep learning (TIDL) software offering includes the TIDL library, which runs on either the C66x DSP cores or the EVEs, enabling multiple inferences to run simultaneously on the device. Additionally, the AM5749 provides a rich peripheral set; an industrial communications subsystem (ICSS) for implementing factory-floor protocols such as EtherCAT; and acceleration for video encode/decode and 3D and 2D graphics, facilitating the use of this SoC in an embedded application that also performs deep learning.

Figure 3. Block diagram of the Sitara™ AM5749 SoC.

Choosing a processor for an embedded application is often the most critical component selection for a product, and this is true for the many industry-changing products that will bring machine learning to the edge. Hopefully, this paper has provided some insight into what to consider when selecting a processor: consider the entire application, choose the right performance point, think embedded, and ensure ease of use.



Related websites:

• Learn more about Sitara AM57x processors.

• Download the Processor software development kit (SDK) for Sitara AM57x processors with deep learning for embedded applications.

• Download the Deep Learning Inference for Embedded Applications Reference Design.

Important Notice: The products and services of Texas Instruments Incorporated and its subsidiaries described herein are sold subject to TI’s standard terms and
conditions of sale. Customers are advised to obtain the most current and complete information about TI products and services before placing orders. TI assumes
no liability for applications assistance, customer’s applications or product designs, software performance, or infringement of patents. The publication of information
regarding any other company’s products or services does not constitute TI’s approval, warranty or endorsement thereof.
The platform bar is a trademark of Texas Instruments. All other trademarks are the property of their respective owners.

© 2019 Texas Instruments Incorporated SWAY020A


IMPORTANT NOTICE AND DISCLAIMER

TI PROVIDES TECHNICAL AND RELIABILITY DATA (INCLUDING DATASHEETS), DESIGN RESOURCES (INCLUDING REFERENCE
DESIGNS), APPLICATION OR OTHER DESIGN ADVICE, WEB TOOLS, SAFETY INFORMATION, AND OTHER RESOURCES “AS IS”
AND WITH ALL FAULTS, AND DISCLAIMS ALL WARRANTIES, EXPRESS AND IMPLIED, INCLUDING WITHOUT LIMITATION ANY
IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT OF THIRD
PARTY INTELLECTUAL PROPERTY RIGHTS.
These resources are intended for skilled developers designing with TI products. You are solely responsible for (1) selecting the appropriate
TI products for your application, (2) designing, validating and testing your application, and (3) ensuring your application meets applicable
standards, and any other safety, security, or other requirements. These resources are subject to change without notice. TI grants you
permission to use these resources only for development of an application that uses the TI products described in the resource. Other
reproduction and display of these resources is prohibited. No license is granted to any other TI intellectual property right or to any third
party intellectual property right. TI disclaims responsibility for, and you will fully indemnify TI and its representatives against, any claims,
damages, costs, losses, and liabilities arising out of your use of these resources.
TI’s products are provided subject to TI’s Terms of Sale (www.ti.com/legal/termsofsale.html) or other applicable terms available either on
ti.com or provided in conjunction with such TI products. TI’s provision of these resources does not expand or otherwise alter TI’s applicable
warranties or warranty disclaimers for TI products.

Mailing Address: Texas Instruments, Post Office Box 655303, Dallas, Texas 75265
Copyright © 2019, Texas Instruments Incorporated
