SWAY020A
embedded systems
Mark Nadeski
Embedded Processing
Texas Instruments
It is hard to overstate the promise of machine learning. Its latest evolution, deep learning, has been called a foundational technology that will impact the world to the same degree as the internet, or the transistor before that.
Brought on by great advancements in computing power and the availability of enormous
labeled data sets, deep learning has already brought major improvements to image
classification, virtual assistants and game playing, and will likely do the same for
countless industries. Compared to traditional machine learning, deep learning can
provide improved accuracy, greater versatility and better utilization of big data – all with
less required domain expertise.
In order for machine learning to fulfill its promise in many industries, it is necessary to be able to deploy the inference (the part that executes the trained machine learning algorithm) into an embedded system. This deployment has its own unique set of challenges and requirements. This white paper will address the challenges of deploying machine learning in embedded systems and the primary considerations when choosing an embedded processor for machine learning.

Training and inference

In the subset of machine learning that is deep learning, there are two main pieces: training and inference, which can be executed on completely different processing platforms, as shown in Figure 1, below. The training side of deep learning usually occurs offline on desktops or in the cloud and entails feeding large labeled data sets into a deep neural network (DNN). Real-time performance or power is not an issue during this phase. The result of the training phase is a trained neural network.
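The split between offline training and deployed inference can be sketched in a few lines. This is a generic toy illustration, not any specific framework or TI tool: a single-neuron model is fit with plain gradient descent (the desktop/cloud phase), and the deployable inference routine needs only the frozen weights, none of the training code.

```python
# Toy illustration of the training/inference split: train() is the
# offline phase; infer() is all that would ship on the embedded target.

def train(samples, labels, epochs=200, lr=0.1):
    """Offline phase: fit a single-neuron model with gradient descent."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for (x0, x1), y in zip(samples, labels):
            pred = w[0] * x0 + w[1] * x1 + b
            err = pred - y
            w[0] -= lr * err * x0
            w[1] -= lr * err * x1
            b -= lr * err
    return w, b  # the "trained network" that gets deployed

def infer(w, b, x):
    """Embedded phase: needs only the frozen weights."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0.5 else 0

# Train on a trivially separable problem (logical AND):
data = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
w, b = train(data, labels)
print([infer(w, b, x) for x in data])
```

Real deployments follow the same shape at scale: the heavy training loop runs once offline, and only the resulting weights (often in an exchange format such as ONNX) move to the embedded processor.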
• Reliability. Relying on an internet connection is often not a viable option.

• Low latency. Many applications need an immediate response. An application may not be able to tolerate the time delay in sending data somewhere else for processing.

• Privacy. The data may be private and therefore should not be transmitted or stored externally.

• Bandwidth. Network bandwidth efficiency is often a key concern. Connecting to a server for every use case is not sustainable.

…learning when needed. For example, an entire input image at high frames per second (fps) can run classical computer vision algorithms to perform object tracking, with deep learning used on identified sub-regions of the image at a lower fps for object classification. In this example, the classification of objects across multiple sub-regions may require multiple instances of inference, or possibly even different inferences running on each sub-region. In the latter case, you must choose a processing solution that can run both traditional computer vision and deep learning, as well as
multiple instances of different deep learning inferences. Figure 2 shows an example usage of tracking multiple objects through sub-regions of an image and performing classification on each object being tracked.

• Choose the right performance point. Once you have a sense of the scope of the entire application, it becomes important to understand how much processing performance is necessary to satisfy the application needs. This can be difficult to understand when it comes to machine learning because so much of the performance is application-specific. For example, the performance of a convolutional neural net (CNN) that classifies objects on a video stream depends on what layers are used in the network, how deep the network is, the video's resolution, the fps requirement and how many bits are used for the network weights – to name just a few. In an embedded system, however, it is important to try to get a measure of the performance needed, because throwing too powerful a processor at the problem generally comes at a trade-off against increased power, size and/or cost. Although a processor may be capable of running ResNet-10 (a popular neural net model used in high-power, centralized deep learning applications) at 30 fps at 1080p, it's likely overkill for an application that will run a network designed for the embedded space. Similarly, look for processors capable of efficiently leveraging the tools that bring these networks into the embedded space. For example, neural networks can tolerate lots of errors; using quantization is a good way to reduce performance requirements with minimal decreases in accuracy. Processors that can support dynamic quantization and efficiently leverage other tricks like sparsity (limiting the number of non-zero weights) are good choices in the embedded space.

• Ensure ease of use. Ease of use refers to both ease of development and ease of evaluation. As mentioned earlier, right-sizing the processor performance is an important design consideration. The best way to do this correctly is to run the chosen network on an existing processor. Some offerings provide tools that, given a network topology, will show achievable performance and accuracy on a given processor, thus enabling a performance evaluation without the need for actual hardware or finalization of a network. For development, being able to easily import a trained network model from popular frameworks like Caffe or TensorFlow is a must. Additionally, support for open ecosystems like ONNX (Open Neural Network eXchange) will support an even larger base of frameworks to be used for development.
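The quantization idea mentioned above, mapping 32-bit float weights to 8-bit integers plus a shared scale factor so inference can use cheaper integer math and less memory, can be sketched as follows. This is a generic post-training scheme for illustration only; real toolchains typically choose scales per layer or per channel, and may use zero points for asymmetric ranges.

```python
# Sketch of post-training weight quantization: floats become small
# signed integers plus one shared float scale. Illustrative only.

def quantize(weights, bits=8):
    """Map float weights to signed integers with a shared scale."""
    qmax = 2 ** (bits - 1) - 1          # 127 for int8
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.82, -1.27, 0.03, 0.51, -0.64]
q, scale = quantize(weights)
restored = dequantize(q, scale)

# Worst-case rounding error is half a quantization step:
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)                     # small integers instead of float32 values
print(max_err <= scale / 2)
```

Storing int8 values cuts weight storage by roughly 4x versus float32, and the rounding error stays within half a quantization step, which is why well-chosen quantization usually costs little accuracy.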
Copyright © 2019, Texas Instruments Incorporated