Implementation of Video Processing Techniques On A Field Programmable Gate Array Development Platform
A Thesis
Drexel University
June 2015
© Copyright 2015
I dedicate this thesis to my father, who taught me what it takes to persevere and succeed
in the world we live in today. Without the support and teachings from my father I would
not be the person I am today, and I cannot thank him enough for taking the effort and
doing the job he did to assure that I can succeed throughout life. The research for this
thesis took roughly a year and a half to complete, and my motivation throughout the
entire process was based on the person I’ve looked up to since I was a little boy, and that
is what helped me persevere and complete the thesis that I have presented. Thank you,
Dad; you are the best role model and father a man can ask for, and this thesis could not
have been completed without you.
I would like to acknowledge and express my appreciation to my advisor, Dr. Prawat
Nagvajara, for his support and assistance throughout this entire thesis. His ideas and
approaches were necessary for the completion of this thesis. I would also like to thank
Dr. Andrew Cohen and Dr. Nagarajan Kandasamy for reviewing my research and attending my thesis
defense.
Table of Contents
ABSTRACT
1. INTRODUCTION
2. APPLICATIONS
5. CONCLUSION
List of Figures
11. Horizontal and Vertical kernels used for Edge Detection
12. The convolution process used when processing frames with a kernel
14. Code used for processing an image for Edge Detection
17. Gaussian Operator utilized in the Improved Edge Detection Filter
Abstract
This thesis covers a detailed description of a development platform for a video processing
system targeted for on-camera applications. Platforms such as the Zynq from Xilinx Inc.
integrate a processor, IP core peripherals, and input-output video signal interfaces into a
single FPGA chip. The advantages of this approach include reduced size, weight, and power
(SWAP) requirements in applications such as pilot helmet vision and binocular video
processors. The contents include an overview of the processor system and IP cores on the
FPGA architecture, video processing IP cores, Integrated Design Environment (IDE) tools,
and case studies on grayscale conversion and Canny edge detection. The results from the
case studies show that hardware processing with IP core peripherals enables real-time
processing, which is difficult to meet under SWAP constraints using software alone. The
thesis presents studies on the design and implementation of the video processing system
on this platform.
Chapter 1: Introduction
The need for precision and speed in processing has never been more prevalent. Computers are
becoming smaller and faster than ever before. The idea of using drones and other
technological advances for surveillance and other services leads to the idea that real-time
processing must be computationally precise and swift. This thesis will provide the
details of implementing video processing techniques on an FPGA development
platform.
The FPGA was invented by Xilinx in 1984 and is successfully replacing
application-specific integrated circuits (ASICs) and processors for signal
processing and control applications [1]. A convenient aspect of using an FPGA is that
custom hardware can be developed without the tedious process of wiring
a breadboard or using a soldering iron. When FPGAs were first invented, one would need
detailed knowledge of digital hardware design in order to configure an
FPGA. Over the years there have been many advances to the high-level design tools, and
as a result, developing hardware on FPGAs is more convenient than ever before [1].
Independent processing operations do not have to compete for the same resources because FPGAs are designed
to dedicate each independent processing task to a different section of the chip. This
notion allows for a greater amount of processing to be done without affecting the overall
performance of the system.
In the industrial market, companies want fast and reliable
designs in order to maximize their profits. FPGA designs allow for a faster time to market
than ASIC designs. Using FPGAs, developers can design a custom solution without the
fabrication and assembly time delays that are typically associated with ASIC designs [2].
FPGAs also expedite the process of fixing bugs and modifying hardware to suit customer
needs. Lastly, high-level software tools come with prebuilt functions (Intellectual
Property (IP) cores) that can be used to quickly develop advanced control and signal
processing designs.
Compared to the costs of an ASIC design, FPGA costs are substantially lower.
There are no fabrication costs, and modifications do not require the large expense of
starting over as in ASIC designs. Many customers need custom hardware
functionality for only tens to hundreds of systems in development, which gives FPGAs a
clear cost advantage. FPGAs also provide a section of the
chip that is dedicated to each individual processing task. Unlike other processing systems,
FPGAs do not utilize an operating system to manage memory and processor bandwidth
and therefore rely solely on true parallel execution and the deterministic hardware on
the chip.
Reconfigurability is another advantage of
FPGA design. Technology is constantly changing and advancing, and therefore custom
designs will always need modifications in order to keep up with the ever-changing
technology [1]. FPGAs are quickly reconfigurable and can keep up with future
technological demands.
1.3 Vivado
Vivado is a design suite that is used for next-generation development with FPGA devices.
The IDE for Vivado provides a graphical user interface (GUI) for the user to
create innovative designs. Opening a design loads the design net list at that particular
stage of the design flow, assigns the constraints to the design, and then applies the design
to the target device [3]. The interface for Vivado is constructed in a way to allow the user
to interact and visualize each of the stages in the user’s design concept. Vivado contains
an Intellectual Property (IP) catalog that contains cores used to create the Processing
System (PS) and Programmable Logic (PL). Once the desired cores are wired, a bitstream
is generated that contains the logic of the system. This is generated after the Synthesis,
Implementation, and Design Rule Check (DRC) tests are completed. These tests are
important to assure that the logic, design, and timing constraints are logically accurate
before exporting the design to the Software Development Kit (SDK) for the application creation
process.
1.3.2 Synthesis
Synthesis is the process of transforming a register transfer level (RTL)
design into a gate-level representation [4]. During synthesis, the design is optimized
for performance and memory usage within the timing constraints [4]. After synthesis is complete, a
utilization report is created that will give the user information about the utilization of
memory, I/O, clocking, and other features dependent on the individual design.
1.3.3 Implementation
After the synthesis of the design is complete, the implementation process begins.
During implementation, all the necessary steps are executed to place and route the net list
onto the FPGA device resources, while meeting the design’s logical, physical, and timing
constraints [5]. This is important to assure that each IP core is assigned to the proper
resource area on the FPGA based on the constraints provided by either Vivado or the
user.
1.3.4 Design Rule Check
DRC takes a physical layout and assures that a series of rules is satisfied, based
on recommended parameters known as the design rules. It is important to run this test
after synthesis and implementation are completed in order to find any lingering critical
warnings or errors that might be found in the physical layer after mapping and routing.
After the Synthesis, Implementation, and Design Rule Check tests are satisfied,
Vivado generates a bitstream of the hardware that can be used to export to the SDK for
application design and testing. The bitstream can also be programmed directly into the
FPGA development platform.
The Software Development Kit (SDK) provides an environment
for creating software applications targeted for Xilinx embedded processors [6]. The SDK
comes with a GNU-based compiler tool chain (GCC compiler, GDB debugger, utilities
and libraries), a JTAG debugger, a flash programmer, drivers for Xilinx IP and bare-metal
board support packages, and an
IDE for C/C++ bare-metal and Linux application development and debugging [6].
When the hardware is exported to the SDK, the libraries and Application Program
Interfaces (APIs) for the IP cores located in the hardware are exported as well. From here,
users can write software in C/C++ to interface with the hardware itself. After the
applications are written, they can be compiled and built in the SDK. If a user wants to
have access to the various peripherals and features of the FPGA board, a Board Support
Package (BSP) must also be generated for the target hardware.
After the developed code is compiled, the user has several options for uploading
the application to the development platform. One method is to upload the application
through the JTAG port, which requires a JTAG
connection to the computer on which the application was developed. The other method is
to create and store a first stage boot loader on an SD card, and then set the FPGA
development platform to load the application from the SD card. The latter method is used
in this design.
The design flow displayed in Figure 1 is the general design flow for creating an
embedded system with Vivado tools. Each design requires a processing system (PS)
based on the FPGA development platform used in the design. From there, the processing
system can be configured for clocking, I/O management, and memory management. After
the PS is configured IP cores can be added and wired to the design. The IP cores can be
taken from the IP core library provided by Vivado or they can be custom made by the
user. After all the logical IP cores are properly configured, placed, and wired, a bitstream
can be generated if the user wishes to test the design on the FPGA development platform,
or the user can export the design for application development. In order for bitstream
generation to take place, the synthesis, implementation, and design rule checks must be
completed successfully. The bitstream can then be
programmed directly to the FPGA development platform. If the bitstream is exported to the
SDK, applications can be created for the hardware design. After the applications are
built, they can be loaded onto the development platform and tested.
Chapter 2: Applications
There are several applications that can use video processing techniques using an
FPGA development platform. Embedded systems used in surveillance, drones, and other
mobile embedded devices will take advantage of video processing techniques in a mobile
setting. For instance, unmanned aerial reconnaissance systems (UARS), also known as
drones, use embedded video processing solutions for purposes of multimodal ground
imagery, mosaicking for live updating of terrain maps, and feature-based target tracking
[20]. Handheld imaging devices can also take advantage of these video processing techniques.
Military systems can also take advantage of the video processing techniques made
available using an FPGA platform. The Military needs a very high performance processor
with minimal size, weight, and power (SWAP) requirements to enable the design and
deployment of advanced vision systems. Military applications also
require embedded systems to be small and mobile in order to fit into constricted spaces.
FPGA boards are relatively small in size for the processing that can be utilized on them.
A system like this can be used to provide unprecedented awareness and target
coordination, while also satisfying the challenges of high-end floating point processing,
light weight, and low power [21]. The military can also use this type of system for its
weapons. Some weapons require tracking devices, and video processing techniques can be
used to clearly detect objects and people in order to assure that the weapons are being
aimed accurately.
The medical field is another crucial area that can take advantage of video
processing using FPGA development platforms. FPGA boards can be used to improve
surgical and imaging systems. For example, this type of
design can be used to innovate 3-D vision and precise control for robotic-assisted,
minimally invasive surgery, which means less trauma and faster recovery for the patients
[22]. Another application that can use this design in the medical field is electrosurgery,
which uses high-frequency electrical current
as a means to cut the tissue. The main benefit of this type of operation is to make precise
cuts while limiting blood loss [22].
Vivado is used to design the hardware and software platforms for digital signal
processing. The hardware platform and programmable
logic are designed using the Integrated Design Environment (IDE) and High-Level
Synthesis (HLS) tools, while the software platform is designed using the Software
Development Kit (SDK). After the design is completed and compiled in Vivado, it is then
programmed onto a Zynq-7000 series FPGA board. The board used for the following
design is the ZC702 evaluation board, which is part of the Zynq-7000 series of FPGA
boards. The hardware and software processing are tested using a GUI application that
allows the user to easily switch between hardware and software processing.
The ZC702 evaluation board provides a hardware environment for developing and
evaluating targeted designs. The important features that are used from this board include
the 1 GB DDR3 component memory, a tri-mode Ethernet PHY, general purpose I/O, two
UART interfaces, and the FMC component that is used to attach the VITA-57 FPGA
mezzanine cards [10]. The board also includes an HDMI codec, which is the port used to
display the results from performing digital signal processing with images and video.
Lastly, the SD Card Interface is used to load files from an 8 GB Flash Card. A block
The hardware design contains four major components: the processing system
(PS), the HDMI input/output, the video direct memory access (VDMA), and the hardware
filter designed in High-Level Synthesis (HLS). These components are connected with
AXI Interconnects. Figure 3
represents the general block diagram for this design. Input is received from the HDMI
Input Source and streamed to the FMC HDMI Input IP core. There is a video timing
controller in this core that will retrieve necessary timing information. Video data from
this core is stored in the video direct memory access (VDMA). After data is stored into
the VDMA it is sent to the Filter core for processing and then back to the VDMA after
processing is complete. The processed frames are then sent to the output display. This is a
general overview of the data path through the design.
The AXI Interconnect core connects one or more AXI memory-mapped master
devices to one or more memory-mapped slave devices [7]. There are four AXI
Interconnects used throughout the block diagram shown in Figure 3. Three of them are
major AXI Interconnects that connect the processing system to the HDMI input and
output, and the VDMA. The fourth AXI Interconnect has the formal name of Video In to
AXI4-Stream and is used to interface from a video source to the AXI4-Stream Video
Protocol Interface [9]. This core is essential to provide an interface between the video
input signal and the video processing core. This core is used in parallel with the
functionality of the Video Timing Controller (VTC) in order to detect the line standard of
the incoming video. The VTC is also responsible for detecting timing values, such as the
number of active pixels per line and the number of active lines available per video
frame.
The Processing System (PS) for this design utilizes a dual Cortex-A9 core
processor located on the ZC702 FPGA board. This processor implements the ARMv7
architecture and runs 32-bit ARM instructions [8]. The IP core displayed in Figure 4
represents the actual IP core used for the processing system in this design. The ports
S_AXI_HP0 and S_AXI_HP2 are used in order to connect the processing system to the
HDMI Input and Output, the VDMA (video direct memory access) and the hardware
filter for hardware processing. In between the high performance ports and the
components listed are AXI Interconnects, which allow for interaction between the IP
cores listed.
3.2.2.1 S_AXI_HP
The S_AXI_HP is a high performance slave AXI interface that is used to connect
the programming logic (PL) to the processing system (PS). The HP port enables a high
throughput data path between AXI masters in the PL and the PS DDR3 memory [8]. This
is needed in order to smoothly stream data continuously from the DDR to the PL, and the
reverse process as well (PL -> DDR). The programmable logic in this design runs on a
150 MHz clock, while the DDR side runs at 355 MHz, which is roughly 66% of the
maximum DDR clock rate.
3.2.2.2 Interrupts
There are seven interrupt signals that the PS must be aware of. Three of the interrupts come from the VDMAs in
order to buffer incoming frames from the HDMI input signal, transfer frames from the
VDMA to the processing filter, and lastly, send processed frames to the VDMA for
storage. Another interrupt is used to retrieve the frames from the incoming HDMI input
signal. One interrupt is used for the filter when the filter is available to process new
frames, and the remaining two interrupts are used for the AXI performance monitor and
the logic video controller. In this processing system, the general interrupt controller
collects the interrupts from the various sources and then distributes the interrupts to each
of the ARM cores [8]. The highest priority interrupt will be handled first in the case when
more than one interrupt is pending. Equal priority interrupts are handled based on the
lowest ID. The order of interrupt priority for this design is as follows: The HDMI input
signal is the highest priority to assure that all the input frames are streamed into the
HDMI port. The second highest priority interrupt belongs to the filter interrupt since the
filter should always be processing new frames when available. The next three priority
interrupts are given to the VDMAs in this order: streaming from VDMA to the filter,
streaming from the filter to the VDMA, and lastly, streaming frames from the HDMI
input to the VDMA. The lowest priority interrupts belong to the logic video controller
and the AXI performance monitor.
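As a rough illustration of how such a priority ordering could be expressed, the sketch below assigns descending priorities to three of the interrupt sources using the Xilinx bare-metal XScuGic driver. This is only a minimal sketch: the design itself services these interrupts through Linux kernel drivers, and the interrupt IDs and priority values shown are hypothetical placeholders.

/* Minimal sketch: assigning relative priorities to some of the video pipeline
 * interrupts with the Xilinx XScuGic driver (bare-metal illustration only;
 * the actual design uses Linux drivers). Interrupt IDs and priority values
 * below are hypothetical placeholders. */
#include "xparameters.h"
#include "xscugic.h"
#include "xstatus.h"

#define HDMI_IN_IRQ_ID      61u   /* hypothetical PL-to-PS interrupt IDs */
#define FILTER_IRQ_ID       62u
#define VDMA_TO_FILTER_IRQ  63u

static XScuGic Gic;

int configure_video_interrupt_priorities(void)
{
    XScuGic_Config *cfg = XScuGic_LookupConfig(XPAR_SCUGIC_SINGLE_DEVICE_ID);
    if (cfg == NULL)
        return -1;
    if (XScuGic_CfgInitialize(&Gic, cfg, cfg->CpuBaseAddress) != XST_SUCCESS)
        return -1;

    /* Lower numeric value = higher priority; 0x3 = rising-edge triggered. */
    XScuGic_SetPriorityTriggerType(&Gic, HDMI_IN_IRQ_ID,     0x00, 0x3);
    XScuGic_SetPriorityTriggerType(&Gic, FILTER_IRQ_ID,      0x08, 0x3);
    XScuGic_SetPriorityTriggerType(&Gic, VDMA_TO_FILTER_IRQ, 0x10, 0x3);

    XScuGic_Enable(&Gic, HDMI_IN_IRQ_ID);
    XScuGic_Enable(&Gic, FILTER_IRQ_ID);
    XScuGic_Enable(&Gic, VDMA_TO_FILTER_IRQ);
    return 0;
}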
The IIC_1 port shown in Figure 4 is made external in order to represent the
HDMI Input/Output FMC Module made from Avnet Electronics Marketing displayed in
Figure 5.
The features of this module include two HDMI interfaces (input and output), an
interface for the ON Vita Semiconductor image sensor, a video clock synthesizer, and the
I2C Configuration [11]. For the purpose of this design, the HDMI output port and the
interface for the ON Vita Semiconductor are not used. The HDMI input utilizes YCbCr
4:2:2 video format in order to represent pixels in 16 bits rather than 24 bits [11]. The
clock synthesizer is used to generate the video clock that is used to drive the display
output. The video input interface consists of the 16-bit video data bus, a data enable, and
horizontal and vertical sync signals [8]. As discussed before, there is an interrupt for the
HDMI port in order to detect if a video signal is incoming or not. The Extended Display
Identification Data (EDID) provides information on the video resolutions supported by the video sink (the display
monitor) to the display controller [8]. The display controller will use this information in
order to generate timing signals that drive the display coming from the HDMI transmitter.
The IP core shown in Figure 6 represents the
HDMI Input core. The IO_HDMII port is connected externally to the input HDMI
interface of the FMC Module displayed in Figure 5. The sel[0:0] is used to switch
between the HDMI input from the FMC Module and the Test Generated Pattern which
will be discussed later. The M_AXI_S2MM is the stream coming from the output of this
core associated with the HDMI input and is connected to the AXI Interconnect which is
in turn connected to the processing system as discussed before. This port essentially
streams the HDMI input data to the VDMA for buffering. Lastly, the s2mm_fsync_out
port is connected to the f_sync port of the other VDMA IP in the design.
The HDMI input subsystem consists of the HDMI
Input, Video In to AXI4-Stream, Video Timing Controller, Test Pattern Generator, and
Clock Multiplexor cores.
The Clock Multiplexor is used to switch between the clock source generated
from the HDMI Interface and the onboard clock generator. If there is a valid HDMI input
signal, then the clock selected is the one generated from the HDMI Interface. If this
signal is not present, then the onboard clock is selected and the Test Pattern Generator is
used as the video source.
The HDMI Input core is provided by FMC-Imageon and is used to receive the video signal
from the FMC module displayed in Figure 5. The video format is received in YCbCr 4:2:2.
The Video In to AXI4-Stream core is used to interface
from a video source (clocked parallel video data with synchronization signals – active
video with either syncs, blanks, or both) to the AXI4-Stream Video Protocol Interface.
This core works with the VTC and provides a bridge between a video input and video
processing cores with AXI4-Stream Video Protocol Interface [13]. In this design this core
is used to handle video data clock boundary crossing between the video clock domain and
the AXI4-Stream clock domain. The Video Timing Controller (VTC) is used as a
video timing generator and detector. The core comes with a comprehensive set of
interrupt bits, which provides an easy integration into a processor system for in-system
control of the block in real-time [12]. In short, this core is used to synchronize the process
of streaming video to the VDMA from the HDMI Input Interface. The input side of this
core detects the incoming synchronization pulses, video
timing, and active video pixels [8]. In this design, the application software utilizes the
information from the VTC in order to decide whether to switch to the external video
source or not based on measurements of resolution. Another use of the VTC in this
design is to generate horizontal and vertical blanking and synchronization pulses. These
pulses are then used by the Test Pattern Generator to generate a video test pattern.
The Xilinx LogiCORE IP Test Pattern Generator generates test patterns for video
system bring-up, evaluation, and debugging [14]. The core provides a wide variety of
test patterns that can be used for evaluation of performance. In this design a test pattern
generator is used when an HDMI input video source is not available. The TPG will
generate color bars and a moving box. The resolution of this pattern is 1920x1080.
The Video Direct Memory Access (VDMA) core provides high-bandwidth direct memory
access between system memory and AXI4-Stream video-type target peripherals
that support the AXI4-Stream Video Protocol [15]. Applications that require
frame buffers to handle frame rate changes or changes to the image dimensions use the AXI
VDMA, which provides both an AXI4-Stream video interface and an AXI4 memory-mapped interface [15]. There are two interfaces involved with the
VDMA: AXI Streaming and AXI memory-mapped. The AXI streaming interface is used
to receive the video stream, and the AXI memory-mapped interface is used to map the
video interface into memory. Associated with these two interfaces are two channels:
memory-mapped to stream (MM2S) and stream to memory-mapped (S2MM). In
this design, the MM2S channel reads the number of data bits programmed through the
MM2S max burst length parameter and sends it to the slave device connected through
the streaming interface [8]. The S2MM channel receives data from the master device
connected through the streaming interface; the channel is configured in
order to determine the width of the streaming interface. Data received on the streaming
interface is then written into the system memory through the memory-mapped device [8].
The streaming interface data width is set to 32 bits and the memory-mapped interface is
configured to 64 bits. The maximum burst length is set to 16 in order to achieve the
required throughput. The throughput is further
enhanced by enabling the store and forward feature of the VDMA [8].
The IP core displayed in Figure 7 represents the core used to stream data to the
display monitor via the HDMI output port. The HDMI output makes use of the
Multilayer Video Controller (MVC) in order to control the output display. The MVC
receives the video data through the MM2S channel of the VDMA. The data is then sent to
the logiCVC-ML core, a display controller for LCD
and CRT displays provided by Xylon [16]. The main function of this controller is to
provide flexible display control. The logiCVC-ML controller refreshes the display image
by reading the video memory and converting the read data into a data stream acceptable
to the display interface.
Vivado High-Level Synthesis (HLS) design tools are used to bridge software and
hardware design by transforming C-based specifications into a register transfer level (RTL)
implementation that can then be synthesized into a Xilinx FPGA. The top level of
the hardware filter IP core that is shown in Figure 8 contains the hardware processing
filter that is created using Vivado HLS. There are several benefits to using the HLS
design methodology for both hardware and software designers: First, development time is
shortened because the design is captured at a higher level of abstraction. Second, the ability to quickly evaluate alternative implementations
will improve the likelihood of finding the most optimal implementation [17]. This will
ultimately lead to a better design in less time.
The design flow of creating an IP core using Vivado HLS is shown in Figure 9.
First the C-specifications are written in C, C++, or System-C. Then the code is simulated
with a test bench created in C to assure that the logic is accurate and the correct results
are produced with no errors. After the simulation is completed, synthesis is performed on
the C specifications. Reports are generated from the synthesis in order to understand the
resource utilization and timing of the resulting design. The design is then exported and packaged as an IP core.
After the IP core is packaged, it can be used in the Vivado IDE block diagram.
Underneath the top-level design of the processing core displayed in Figure 8 is the filter
created in Vivado HLS. Video data is taken from the VDMA through the M_AXI_MM2S
port and is processed using the filter. After the processing is complete, the data is sent
back to the VDMA through the M_AXI_S2MM port. The mm2s_fsync is used to sync
the frames that are streaming through the HDMI input core as shown in Figure 6.
The Software Design component consists of the software architecture used for the
design of this thesis. The top-level view of this design can be viewed in Figure 10. The
software design makes use of a Qt-based GUI in order to display the resulting video
stream from the processed data. There are several device drivers that are used in the
kernel in order to capture data, process the data, and then display the data. Only the
hardware processing is
achieved through the filter created in the HLS. The GUI has a feature that allows the user
to easily switch between the HW and SW filtering. Several binaries are also created using
Linux, which are stored on an 8 GB SD (Secure Digital) card. This flash card is used for
the purpose of implementing the hardware design into the FPGA development board, as
described in the boot process below.
The Boot loader is responsible for the power-on boot-up process; the non-
changeable boot code resides in the boot ROM. At power-on, the boot ROM reads the
boot mode register to determine the boot mode, which is user-configurable [8]. For the
purpose of this design the boot mode is configured for SD Card booting. In the SD Card
booting mode, the boot ROM reads the boot configuration header from the SD Card,
which is located in the binary named, “BOOT.bin” [8]. The other file elements located in
this binary are the boot header, the first stage boot loader (FSBL), the bit stream, and the
u-boot. The boot header contains the information about the other contents located in the
“BOOT.bin” binary as well as their offset, sizes, and security information [8]. The first
stage boot loader is responsible for initializing the minimum required hardware to
program the PL bitstream, and load and execute the u-boot, which is the second-stage
boot loader [8]. The u-boot is a universal boot loader that is used across various
embedded platforms. In this architecture, the u-boot loads the kernel image in the DDR
memory and is also responsible for completing the hardware initialization [8]. Lastly, the
u-boot transfers control to the Linux kernel.
There are several drivers used in the Linux kernel. The Xylon DRM driver is used
to drive the display controller to display the application UI and control data path [8]. The
Xilinx VDMA engine driver is used to configure, start, and stop the VDMA. The Xilinx
video pipeline driver will communicate with the Xilinx VDMA engine driver by using a
slave-DMA API [8]. The Xilinx IIC driver will configure the IIC controller and provide
IIC read and write functions. The ADV7511 driver is used as an HDMI encoder for the
FPGA. The Xilinx video pipeline driver calls the ADV7611 driver as part of the HDMI
signal path [8]. Lastly, the User space I/O driver (UIO) allows the ability
to write the majority of a driver in user space with only the shell of the driver in the
kernel. This driver uses char device and sysfs to interact with a user space process to
process interrupts and control memory accesses. For the purpose of this design the UIO
driver maps the device address space, and then controls the device using read/write as
defined by the register map [8]. This is used to monitor the performance of the system
at run time.
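As a rough illustration of how a UIO device is typically accessed from user space, the sketch below opens a UIO node, maps its register window, and blocks on the interrupt count; the device path, map size, and register offset are hypothetical placeholders rather than values from this design.

/* Minimal user-space UIO sketch (illustrative only): map a device's register
 * window and wait for its interrupt. The device node, map size, and register
 * offset are hypothetical placeholders. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define UIO_DEV   "/dev/uio0"   /* hypothetical UIO node for the monitor core */
#define MAP_SIZE  0x10000       /* hypothetical size of the register window  */

int main(void)
{
    int fd = open(UIO_DEV, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    /* Map the device registers described by the register map. */
    volatile uint32_t *regs = mmap(NULL, MAP_SIZE, PROT_READ | PROT_WRITE,
                                   MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    uint32_t irq_count;
    /* A blocking read returns the number of interrupts seen so far. */
    if (read(fd, &irq_count, sizeof(irq_count)) == (ssize_t)sizeof(irq_count))
        printf("interrupt count: %u, first register: 0x%08x\n",
               irq_count, (unsigned)regs[0]);   /* regs[0]: placeholder offset */

    munmap((void *)regs, MAP_SIZE);
    close(fd);
    return 0;
}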
There are four major parts to the design flow: Input, Memory, Processing, and
Output. The input signal is streamed from an HDMI source. In the previous section, there
was a discussion about the two VDMAs used in this design. The first VDMA is used to
store the initial video signal from the HDMI input source. The second VDMA is used to
store the processed signal before the stream is sent to the HDMI output source. The
processing can be done in either hardware or software. This is the general design flow,
the hardware is designed in the Vivado IDE and the software drives the IP cores. The
next major section will describe how the software drives the design for successful digital
signal processing. Code examples from the design itself will be documented in the
sections that follow.
Xilinx provides a GUI for the ease of displaying the processed video frames as
shown in Figure 10 [8]. The GUI is designed using the Qt framework and has several
features involving the input and the output of the HDMI sources. The GUI is designed
with a mouse as the input device for the purpose of changing certain settings. For the
purpose of this design, the test pattern generator is used if there is no available input
HDMI source. The user is able to change between no filtering, software filtering, or
hardware filtering. The GUI also displays two graphs corresponding to the CPU usage
and high performance port usage when video frames are processed. The output from the
selected filter is displayed in the GUI window.
The improved edge detection filter uses
both a Sobel filter and a Gaussian filter in order to detect significant edges in the frames
of the incoming video.
The Sobel operator is used particularly for detecting edges. The operator
calculates the gradient of the image intensity at each point in order to give the direction of
the largest possible increase from light to dark within the frame [18]. This abrupt change
is a pattern that indicates how likely it is that an edge is present at this particular pixel in the frame
and how that edge is likely to be oriented [18].
In order to successfully detect the edges in the frame, two 3x3 kernels are used to
calculate approximations of the derivatives in both the horizontal and vertical directions
of the frame. Figure 11 displays the values used for the two kernels, the kernel on the left
represents the values used in the horizontal direction, and the kernel on the right
represents the values used in the vertical direction. Figure 12 displays the convolution
process that takes place in order to mathematically detect the edges in the frame. Figures
13 and 14 display the code used for the Sobel part of the improved edge detection filter.
Figure 11: Horizontal (left) and Vertical (right) kernels used for edge detection in the
improved edge detection filter
Figure 12: The convolution process used when processing frames with a kernel
In Figure 12, the original image frame is represented by the pixels “a##”, and the
processed image frame is represented by “b##”. The kernel is represented by “k##”. The
convolution process begins at the 2nd row and column of pixels because the edges of the
frame will cause the kernel to go outside of the boundaries of the frame, therefore the
edges are not processed with the kernel. After “b22” is calculated, the kernel will shift one
pixel to the right and begin the calculation of “b23”. This Sobel operator uses intensity
values in a 3x3 region around each image point to approximate the corresponding image
gradient [18]. In order to calculate the gradient magnitude, the resulting values from the
vertical and horizontal kernel convolutions are squared individually and then added
together. The gradient magnitude is the square root of this result. If Bx is the resulting
pixel from the horizontal convolution, and By is the corresponding pixel from the vertical
convolution, then the gradient magnitude is the square root of (Bx² + By²).
// X- and Y-direction gradients, accumulated over the 3x3 window
for (i = 0; i < 3; i++) {
    for (j = 0; j < 3; j++) {
        x_weight += (window->getval(i, j) * x_op[i][j]);
        y_weight += (window->getval(i, j) * y_op[i][j]);
    }
}
// Combine the weights
edge_weight = ABS(x_weight) + ABS(y_weight);
Figure 14: Code used for processing an image for edge detection [8]
Figure 13 shows the values that correspond to the Sobel filter kernels as displayed in
Figure 11. The actual processing takes place in Figure 14. The x_weight and y_weight
are calculated using two for loops in order to cover the 3x3 area around the new pixel in
the horizontal and vertical gradient directions. The arrays “x_op” and “y_op” contain the
kernel values for the Sobel filtering in their respective directions. After the convolution
process is complete as shown in Figure 12, the two weights are added together to form
the edge weight of the processed pixel.
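To illustrate the difference between the exact gradient magnitude and the absolute-value approximation used in Figure 14, the short sketch below computes both for one example pair of convolution results; the sample values are arbitrary and the code is illustrative, not taken from the design.

/* Sketch: exact Sobel gradient magnitude versus the |Gx| + |Gy|
 * approximation used in the filter code (illustrative values only). */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int x_weight = -90;   /* example horizontal convolution result (Bx) */
    int y_weight = 120;   /* example vertical convolution result (By)   */

    double exact  = sqrt((double)(x_weight * x_weight + y_weight * y_weight));
    int    approx = abs(x_weight) + abs(y_weight);

    printf("exact magnitude  = %.1f\n", exact);   /* 150.0 */
    printf("approx magnitude = %d\n", approx);    /* 210   */
    return 0;
}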
The Gaussian filter is used to blur the image before the Sobel filter is applied. The
difference between the original image and the blurred image will be the amount of detail
in the image. The blurred image will have less detail than the original image. In essence,
the Sobel filter applied to the blurred image will detect only the more significant edges of the original image.
In order to successfully blur the image, one 3x3 kernel is used in a similar fashion
as the Sobel filter. Rather than deriving the pixels, the Gaussian filter will take a rough
average of the pixels around the center pixel of the kernel window, using the kernel displayed in
Figure 15. The convolution process shown in Figure 12 is used with the kernel shown in
Figure 15.
In the case of the Gaussian filter: b22 = ((a11 * k11) + … + (a33 * k33)) / 16
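As a small worked example of this averaging step, the sketch below computes one output pixel from a sample 3x3 neighborhood, assuming the common 3x3 Gaussian approximation whose coefficients sum to 16 (the kernel values here are an assumption, not reproduced from Figure 15).

/* Sketch: one Gaussian-blurred output pixel (b22) from a 3x3 neighborhood,
 * assuming a 3x3 kernel whose coefficients sum to 16. */
#include <stdio.h>

int main(void)
{
    /* a11..a33: example source pixels around the center pixel a22 */
    int a[3][3] = { { 10, 20, 30 },
                    { 40, 50, 60 },
                    { 70, 80, 90 } };
    /* k11..k33: assumed Gaussian kernel (coefficients sum to 16) */
    int k[3][3] = { { 1, 2, 1 },
                    { 2, 4, 2 },
                    { 1, 2, 1 } };

    int sum = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            sum += a[i][j] * k[i][j];

    printf("b22 = %d\n", sum / 16);   /* weighted average of the window (50) */
    return 0;
}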
The code shown in Figure 16 shows the flow of the canny filter. The
“AXIvideo2Mat” function will take in the stream frame by frame. Each frame is then
processed using the Gaussian filter first, and then the Sobel filter second. The values in
the Sobel filter that begin with a "C" are realized from the function in Figure 13 when the
“image_filter” function is called. Lastly, each frame is sent to the output stream after
processing is complete. From the parameters, “video_in” represents the stream coming
into the filter from the VDMA. The frames are stored into a matrix, “img_0”. From there
the gauss filter is applied and the resulting image is stored in “img_gauss_tmp”. Next, the
sobel filter is applied and the resulting frame is stored into "img_1". Lastly, the processed
frame in "img_1" is written to the output stream.
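To make the per-frame flow of Figure 16 easier to follow, the following sketch applies the same two stages in the same order (Gaussian blur, then Sobel with the |Gx| + |Gy| edge weight) to a plain frame buffer. It is a behavioral illustration in ordinary C rather than the HLS streaming code of the design; the frame size, function names, and Gaussian kernel values are assumptions.

/* Behavioral sketch (illustrative only) of the improved edge detection flow:
 * blur each frame with an assumed 3x3 Gaussian kernel, then apply the
 * standard 3x3 Sobel kernels and combine the gradients as |Gx| + |Gy|.
 * Frame size and function names are placeholders, not the design's code. */
#include <string.h>

#define ROWS 64
#define COLS 64
#define ABS(v) ((v) < 0 ? -(v) : (v))

static const int gauss_k[3][3] = { {1, 2, 1}, {2, 4, 2}, {1, 2, 1} }; /* sums to 16 */
static const int x_op[3][3]    = { {-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1} };
static const int y_op[3][3]    = { { 1, 2, 1}, { 0, 0, 0}, {-1, -2, -1} };

/* Gaussian stage: weighted 3x3 average; border pixels are copied unchanged. */
static void gauss_3x3(unsigned char in[ROWS][COLS], unsigned char out[ROWS][COLS])
{
    memcpy(out, in, (size_t)ROWS * COLS);
    for (int r = 1; r < ROWS - 1; r++)
        for (int c = 1; c < COLS - 1; c++) {
            int sum = 0;
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++)
                    sum += in[r + i - 1][c + j - 1] * gauss_k[i][j];
            out[r][c] = (unsigned char)(sum / 16);
        }
}

/* Sobel stage: horizontal and vertical gradients combined as |Gx| + |Gy|. */
static void sobel_3x3(unsigned char in[ROWS][COLS], unsigned char out[ROWS][COLS])
{
    memcpy(out, in, (size_t)ROWS * COLS);
    for (int r = 1; r < ROWS - 1; r++)
        for (int c = 1; c < COLS - 1; c++) {
            int gx = 0, gy = 0;
            for (int i = 0; i < 3; i++)
                for (int j = 0; j < 3; j++) {
                    gx += in[r + i - 1][c + j - 1] * x_op[i][j];
                    gy += in[r + i - 1][c + j - 1] * y_op[i][j];
                }
            int w = ABS(gx) + ABS(gy);
            out[r][c] = (unsigned char)(w > 255 ? 255 : w);
        }
}

/* Per-frame flow mirroring Figure 16: input frame -> Gaussian -> Sobel -> output. */
void improved_edge_detect(unsigned char in[ROWS][COLS], unsigned char out[ROWS][COLS])
{
    static unsigned char blurred[ROWS][COLS];
    gauss_3x3(in, blurred);
    sobel_3x3(blurred, out);
}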
The code shown in Figure 17 represents the processing needed to blur the image.
The values that begin with “X” are the kernel values that are shown in Figure 15. Once
again, two for loops are used in order to cover the 3x3 area that the kernel is convoluting.
The pixel values are multiplied by their respective kernel values and then the total is
divided by 16 in order to get an average, which in turn represents a blurred pixel with
less detail.
YUV_PIXEL pixel;
char i, j;
/* ... 3x3 convolution with the kernel values of Figure 15 and division
   by 16 (loop omitted in this excerpt) ... */
return pixel;
}
Figure 17: Gaussian Operator utilized in the Improved Edge Detection Filter
The main idea behind a grayscale filter is to convert the value of the original pixel
to a single pixel that contains solely the intensity information [19]. The grayscale filter
will produce a frame that varies from white to black. Shades of black represent the pixels
with the weakest intensity, while shades of white represent pixels with the strongest
intensity.
There are several different algorithms that can be utilized in order to convert a
frame consisting of RGB pixels to a frame that consists of pixels in grayscale. The
method used in this design first converts the RGB pixels to YUV format. After the frame
is in YUV format, the chroma value of each pixel is set to 128. The result from this
is a frame that retains only the luma (intensity) information.
The code shown in Figure 18 represents the code used to describe the grayscale
process. Two for loops are used in order to cover the entire frame. In each iteration of the
for-loop, the chroma value of each YUV_PIXEL is set to 128 in order to give the
grayscale effect.
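A minimal sketch of this chroma-neutralizing approach is shown below. It assumes a packed YUV 4:2:2 buffer in which luma and chroma bytes alternate (YUYV ordering); the buffer layout and function name are assumptions and are not the code of Figure 18.

/* Sketch: grayscale conversion by forcing the chroma components of a packed
 * YUV 4:2:2 (YUYV-ordered) frame to the neutral value 128. The assumed byte
 * order is Y0 U Y1 V; luma bytes are left untouched. */
#include <stddef.h>
#include <stdint.h>

void grayscale_yuv422(uint8_t *frame, size_t width, size_t height)
{
    size_t bytes = width * height * 2;   /* 16 bits per pixel in 4:2:2 */

    for (size_t i = 0; i < bytes; i += 4) {
        frame[i + 1] = 128;   /* U -> neutral chroma */
        frame[i + 3] = 128;   /* V -> neutral chroma */
    }
}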
The image displayed in Figure 19 represents the original image used before
any processing is applied, and Figure 20 displays the resulting processed
images. One can clearly see the Gaussian filter has blurred the image from the original
image displayed in Figure 19. The grayscale result gives a black and white representation
of the original image. The improved edge detection filter shows the edges that were
detected from the image processed with the Gaussian Filter. Lastly, the result from the
Sobel filter is displayed in order to show that there are more edges detected when the
image is not blurred initially. More significant edges can be detected using the Canny
approach, in which the Gaussian blur removes insignificant detail before the Sobel operator is applied.
Figure 20: Image Results from Gaussian (top-left), Grayscale (top-right), Improved edge
detection (bottom-left), and Sobel (bottom-right) filtering
Chapter 5: Conclusion
In the near future, embedded vision devices will be all around the world: in our homes, in the workplace, in
transportation, in the medical field, and in the military field. The subject of video
processing can automate certain tasks with the ability to detect characteristics in frames
and perform different tasks based on the recognition algorithms that can be implemented.
This thesis provides in full detail the base design for implementing video processing on
an FPGA development platform. From this design, other filters can be easily
implemented and tested.
Avnet predicts that there will be over 14 million embedded vision processing
units by 2018 [24]. FPGA development platforms are improving every day, and the
processing power will only become greater in the future. Therefore, FPGA development
platforms will be utilized to perform tasks that a Graphics Processing Unit (GPU) would
typically handle, while meeting SWAP constraints in a mobile setting.
List of References
2. Mike Thompson, (2004). FPGAs accelerate time to market for industrial designs,
http://www.design-reuse.com/articles/8190/fpgas-accelerate-time-to-market-for-
industrial-designs.html, 5 May 2014.
3. Xilinx Inc., (2014). Vivado Design Suite User Guide: Using the Vivado IDE.
Ledgewood, NJ.
4. Xilinx Inc., (2014). Vivado Design Suite User Guide: Synthesis. Ledgewood, NJ.
5. Xilinx Inc., (2014). Vivado Design Suite User Guide Implementation. Ledgewood, NJ.
6. Xilinx Inc., (2014). Vivado Design Suite User Guide Embedded Processor Hardware
Design. Ledgewood, NJ.
7. Xilinx Inc., (2014). LogiCORE IP AXI Interconnect v2.1 Product Guide Vivado
Design Suite. Ledgewood, NJ.
8. Xilinx Inc., (2014). Zynq-7000 All Programmable SoC ZC702 Base Targeted
Reference Design User Guide. Ledgewood, NJ.
9. Xilinx Inc., (2014). LogiCORE IP Video In to AXI4-Stream v3.0 Product Guide Vivado
Design Suite. Ledgewood, NJ.
10. Xilinx Inc., (2014). ZC702 Evaluation Board for the Zynq-7000 XC7Z020 All
Programmable SoC User Guide. Ledgewood, NJ.
11. Avnet Electronics Marketing, (2012). FMC ON Hardware Guide. Parsippany, NJ.
12. Xilinx Inc., (2014). Video Timing Controller v6.1 LogiCORE IP Product Guide
PG016. Ledgewood, NJ.
13. Xilinx Inc., (2014). LogiCORE IP Video In to AXI4-Stream v3.0 Product Guide
Vivado Design Suite PG043. Ledgewood, NJ.
14. Xilinx Inc., (2014). Test Pattern Generator v6.0 LogiCORE IP Product Guide Vivado
Design Suite PG103. Ledgewood, NJ.
15. Xilinx Inc., (2014). LogiCORE IP AXI Video Direct Memory Access v6.2 Product
Guide Vivado Design Suite PG020. Ledgewood, NJ.
16. Xylon Technologies, (2014). Xylon Linux Framebuffer Driver For Use with Xylon’s
Display Controller IP Core – logiCVC-ML Compact Multilayer Video Controller User’s
Manual Version: 3.00.a. Kaloor, India.
17. Xilinx Inc., (2014). Vivado Design Suite User Guide: High-Level Synthesis, UG902
(v2014.1). Ledgewood, NJ.
18. H.M., (1996). Algorithm Designs: Graph Theory. Pedia Press, Mainz, Germany.
19. Stephen Johnson (2006). Stephen Johnson on Digital Photography. O'Reilly Media,
Sebastopol, CA.
21. BittWare FPGA Platforms, (2015). Light-weight Radar System for UAVs and
Manned Systems, http://www.bittware.com/fpga-dsp-applications/applications-
stories/radar-system, 14 May 2015.
22. Kamran Khan, (2012). FPGAs Help Drive Innovation in Complex Medical Systems,
http://www.medicalelectronicsdesign.com/article/fpgas-help-drive-innovation-complex-
medical-systems, 14 May 2015.
23. 5-D Systems, (2014). Vision Processor for Helmet Systems (VPHS),
http://www.5dsystems.com/vision-processor-for-helmet-systems-vphs.php, 15 May 2015.