
Implementation of Video Processing Techniques on a Field Programmable Gate

Array Development Platform

A Thesis

Submitted to the Faculty

of

Drexel University

by

Michael Thomas Amoruso Jr.

in partial fulfillment of the

requirements for the degree

of

Master of Science in Computer Engineering

June 2015
© Copyright 2015

Michael T. Amoruso Jr. All Rights Reserved.


Dedications

I dedicate this thesis to my father, who taught me what it takes to persevere and succeed

in the world we live in today. Without the support and teachings from my father I would

not be the person I am today, and I cannot thank him enough for taking the effort and

doing the job he did to assure that I can succeed throughout life. The research for this

thesis took roughly a year and a half to complete, and my motivation throughout the

entire process was based on the person I’ve looked up to since I was a little boy, and that

is what helped me persevere and complete the thesis that I have presented. Thank you

dad, you are the best role model and father a man can ask for and this thesis could not

have been completed without your support.


Acknowledgements

I would like to acknowledge and express my appreciation for my advisor, Dr. Prawat

Nagvajara for his support and assistance throughout this entire thesis. His ideas and

approaches were necessary for the completion of this thesis. I would also like to thank

Dr. Andrew Cohen and Dr. Nagarajan Kandasamy for reviewing my research and thesis

defense.
Table of Contents

LIST OF FIGURES
ABSTRACT
1. INTRODUCTION
1.1 Field Programmable Gate Array
1.2 Benefits of FPGA
1.3 Vivado
1.3.1 Integrated Design Environment
1.3.2 Synthesis
1.3.3 Implementation
1.3.4 Design Rule Check
1.4 Software Design Kit
1.5 Summary of Design Flow
2. APPLICATIONS
3. THE DEVELOPMENT PLATFORM
3.1 ZC702 Evaluation FPGA Board
3.2 Hardware Design Component
3.2.1 AXI Interconnect
3.2.2 Processing System
3.2.3 HDMI Input
3.2.4 HDMI Output
3.2.5 Filter
3.3 Software Design Component
3.3.1 Boot Loader
3.3.2 Xilinx Linux Kernel
3.3.3 Software Design Flow
3.3.4 Xilinx GUI
4. CASE STUDIES
4.1 Improved Edge Detection Filter
4.2 Grayscale Filter
4.3 Image Results
5. CONCLUSION
LIST OF REFERENCES

List of Figures

1. Embedded System design flow using Vivado Tools
2. ZC702 Block Diagram
3. Block Diagram of Hardware Design Component
4. The Processing System IP Core used in Vivado
5. Avnet HDMI Input/Output FMC Module
6. The top-level FMC HDMI Input IP Core used in Vivado
7. The HDMI Output IP Core used in Vivado
8. The top-level hardware-processing core used in Vivado
9. Design Flow in Vivado High-Level Synthesis
10. GUI view with Processed Frame in Background
11. Horizontal and Vertical kernels used for Edge Detection
12. The convolution process used when processing frames with a kernel
13. Code used for the Sobel Kernel
14. Code used for processing an image for Edge Detection
15. Gaussian Filter Kernel
16. Improved Edge Detection Data Flow
17. Gaussian Operator utilized in the Improved Edge Detection Filter
18. Code Utilized to convert image to Grayscale
19. Original Image
20. Image Results from Filtering Methods

Abstract

Implementation of Video Processing Techniques on a Field Programmable Gate Array


Development Platform
Michael T. Amoruso Jr.
Prawat Nagvajara, Ph.D.

The thesis covers a detailed description of a development platform for video processing systems targeted at on-camera applications. Platforms such as the Xilinx Inc. Zynq integrate high-performance processing, programmable digital intellectual property (IP) core peripherals, and input-output video signal interfaces into a single FPGA chip. The advantages of this approach include meeting the size, weight, and power (SWAP) requirements of applications such as pilot helmet vision and binocular video processors. The contents include an overview of the processor system and IP cores on the FPGA architecture, video processing IP cores, Integrated Design Environment (IDE) tools, and case studies on grayscale conversion and Canny edge detection. The results from the case studies display the effectiveness of the design and implementation methodology. Programmable IP core peripherals enable real-time processing, which is difficult to achieve under SWAP constraints using software alone. The thesis presents studies on the design and implementation methodology and the FPGA video processor platform.



Chapter 1: Introduction

In a world where technology is advancing at an exponential rate, the need for precision and speed in processing has never been greater. Computers are becoming smaller and faster than ever before. The use of drones and other technological advances for surveillance and other services demands real-time processing that is computationally precise and swift. This thesis provides an analysis of performing real-time processing using a Field Programmable Gate Array platform.

1.1 Field Programmable Gate Array

The Field Programmable Gate Array (FPGA) is a reprogrammable silicon chip. The FPGA was invented by Xilinx in 1984 and is successfully replacing application-specific integrated circuits (ASICs) and processors for signal processing and control applications [1]. The convenience of using an FPGA lies in the simple fact that custom hardware can be developed without the tedious process of wiring a breadboard or using a soldering iron. When FPGAs were first invented, one needed a deep understanding of digital hardware design to successfully design hardware on an FPGA. Over the years there have been many advances in high-level design tools, and as a result, developing hardware on FPGAs is more convenient than ever before [1].

1.2 Benefits of using an FPGA

The five primary benefits of using an FPGA development platform are:

Performance, Time to Market, Cost, Reliability, and Long-Term Maintenance [1].

FPGA development platforms use true hardware parallelism. Multiple processing

operations will not have to compete for the same resource because FPGAs are designed

to dedicate each independent processing task to a different section on the chip. This

notion allows for a greater amount of processing to be done without affecting the overall

performance of the hardware [1].

In the business world of the industrial market, companies want fast and reliable

designs in order to maximize their profits. FPGA designs allow for a faster time to market

than ASIC designs. Using FPGAs, developers can design a custom solution without the

fabrication and assembly time delays that are typically associated with ASIC designs [2].

FPGAs also expedite the process of fixing bugs and modifying hardware to suit customer

needs. Lastly, high-level software tools come with prebuilt functions (Intellectual

Property (IP) cores) that can be used to quickly develop advanced control and signal

processing FPGA designs.

Compared to the costs of an ASIC design, the FPGA costs are substantially lower.

There are no fabrication costs and modifications do not require the large expense of

starting over as in ASIC designs. Many customers will need custom hardware

functionality for the tens to hundreds of systems in development, which gives FPGAs a

competitive cost advantage [1].

The reliability of FPGAs is unmatched due to the deterministic hardware on the

chip that is dedicated to each individual processing task. Unlike other processing systems,

FPGAs do not utilize an operating system to manage memory and processor bandwidth

and therefore rely solely on the true parallel execution and the deterministic hardware on

the chip.

The long-term maintenance of FPGAs is reliable due to the expedited process of

FPGA design. Technology is constantly changing and advancing, and therefore custom

designs will always need modifications in order to keep up with the ever-changing

technology [1]. FPGAs are quickly reconfigurable and can keep up with future

modifications with a low monetary cost.

1.3 Vivado

Vivado is a design suite used for next-generation development with FPGA boards. Vivado addresses the productivity bottlenecks in system-level integration and implementation. An integrated design environment (IDE) is provided for engineers to create designs for FPGA development platforms.

1.3.1 Integrated Design Environment (IDE)

The IDE for Vivado provides a graphical user interface (GUI) for the user to

create innovative designs. Opening a design loads the design netlist at that particular

stage of the design flow, assigns the constraints to the design, and then applies the design

to the target device [3]. The interface for Vivado is constructed in a way to allow the user

to interact and visualize each of the stages in the user’s design concept. Vivado contains

an Intellectual Property (IP) catalog that contains cores used to create the Processing

System (PS) and Programmable Logic (PL). Once the desired cores are wired, a bitstream

is generated that contains the logic of the system. This is generated after the Synthesis,

Implementation, and Design Rule Check (DRC) tests are completed. These tests are

important to assure that the logic, design, and timing constraints are logically accurate

before exporting the design to the software design kit (SDK) for the application creation

process.

1.3.2 Synthesis

Synthesis is the process of transforming a Register-transfer level (RTL) specified

design into a gate-level representation [4]. During synthesis, the design is optimized for performance and memory usage by timing-driven algorithms [4]. After synthesis is complete, a

utilization report is created that will give the user information about the utilization of

memory, I/O, clocking, and other features dependent on the individual design.

1.3.3 Implementation

After the synthesis of the design is complete, the implementation process begins.

During implementation, all the necessary steps are executed to place and route the net list

onto the FPGA device resources, while meeting the design’s logical, physical, and timing

constraints [5]. This is important to assure that each IP core is assigned to the proper

resource area on the FPGA based on the constraints provided by either Vivado or the

user.

1.3.4 Design Rule Check (DRC)

DRC takes a physical layout and assures that a series of rules are satisfied based

on recommended parameters known as the design rules. It is important to run this test

after synthesis and implementation are completed in order to find any lingering critical

warnings or errors that might be found in the physical layer after mapping and

optimization have taken place.

After the Synthesis, Implementation, and Design Rule Check tests are satisfied,

Vivado generates a bitstream of the hardware that can be exported to the SDK for application design and testing. The bitstream can also be programmed directly into the

FPGA device if no application software is required.



1.4 Software Design Kit (SDK)

The Xilinx Software Development Kit (SDK) provides a complete environment

for creating software applications targeted for Xilinx embedded processors [6]. The SDK

comes with a GNU-based compiler tool chain (GCC compiler, GDB debugger, utilities

and libraries), JTAG debugger, flash programmer, drivers for Xilinx IP and bare-metal

board support packages, middleware libraries for application-specific functions, and an

IDE for C/C++ bare-metal and Linux application development and debugging [6].

When the hardware is exported to the SDK, the libraries and Application Program

Interfaces (API) for the IP cores located in the hardware are exported as well. From here,

users can write software in C/C++ to interface with the hardware itself. After the

applications are written they can be compiled and built in the SDK. If a user wants to

have access to the various peripherals and features in their FPGA board, a Board Support

Package can be created in the SDK to perform these actions.

After the developed code is compiled, the user has several options for uploading the application to the development platform. One method is to upload the application through the JTAG port, which requires a JTAG connection to the computer on which the application was developed. The other method is to create and store a first-stage boot loader on an SD card, and then set the FPGA development platform to load the application from the SD card. The latter method is used for this thesis.



1.5 Summary of Design Flow

Figure 1: Embedded System design flow using Vivado Tools [6]

The design flow displayed in Figure 1 is the general design flow for creating an

embedded system with Vivado tools. Each design requires a processing system (PS)

based on the FPGA development platform used in the design. From there, the processing

system can be configured for clocking, I/O management, and memory management. After

the PS is configured, IP cores can be added and wired to the design. The IP cores can be

taken from the IP core library provided by Vivado or they can be custom made by the

user. After all the logical IP cores are properly configured, placed, and wired, a bitstream

can be generated if the user wishes to test the design on the FPGA development platform,

or the user can export the design for application development. In order for a bitstream

generation to take place, synthesis, implementation, and design rule checks must be

satisfied. After the bitstream is generated, it can be exported to software tools, or

programmed right to the FPGA development platform. If the bitstream is exported to the

SDK, applications can be created for the hardware design. After the applications are

compiled and built, they can be executed on the FPGA board.



Chapter 2: Applications

There are several applications that can use video processing techniques using an

FPGA development platform. Embedded systems used in surveillance, drones, and other

mobile embedded devices will take advantage of video processing techniques in a mobile

setting. For instance, unmanned aerial reconnaissance systems (UARS), also known as

drones, use embedded video processing solutions for purposes of multimodal ground

imagery, mosaicking for live updating of terrain maps, and feature-based target tracking

[20]. Handheld Imaging Devices can use video processing techniques for purposes of

electronic stabilization, local contrast enhancement, rolling shutter correction, object

tracking, and other filtering techniques [20].

Military systems can also take advantage of the video processing techniques made

available using an FPGA platform. The military needs a very high-performance processor

with minimal size, weight, and power (SWAP) requirements to enable the design and

fabrication of advanced binocular helmet-mounted displays [23]. Military applications

require embedded systems to be small and mobile in order to fit into constricted spaces.

FPGA boards are relatively small in size for the processing that can be utilized on them.

A system like this can be used to provide unprecedented awareness and target

coordination, while also satisfying the challenges of high-end floating-point processing power and low power consumption [21]. The military can also use this type of system for its weapons. Some weapons require tracking devices, and video processing techniques can be used to clearly detect objects and people in order to assure that the weapons are being used for their intended purpose.



The medical field is another crucial area that can take advantage of video

processing using FPGA development platforms. FPGA boards can be used to improve

image-processing quality and performance and equipment responsiveness [22]. FPGA

design can be used to innovate 3-D vision and precise control for robotic-assisted,

minimally invasive surgery, which means less trauma and faster recovery for patients

[22]. Another application that can use this design in the medical field is electrosurgery.

Electrosurgery is the application of a high-frequency electric current to biological tissue

as a means to cut the tissue. The main benefit of this type of operation is to make precise

cuts with limited blood loss.

Chapter 3: The Development Platform

Vivado is used to design the hardware and software platforms for digital signal

processing on FPGA development platforms. The processing system and programming

logic are designed using the Integrated Design Environment (IDE) and High-Level

Synthesis (HLS) tools; meanwhile the software platform is designed using the Software

Design Kit (SDK). After the design is completed and compiled in Vivado, it is then

programmed onto a Zynq-7000 series FPGA board. The board used for the following

design is the ZC702 evaluation board, which is part of the Zynq-7000 series of FPGA

boards. The hardware and software processing are tested using a GUI application that

allows the user to easily switch between hardware and software processing.

3.1 ZC702 Evaluation FPGA Board

The ZC702 evaluation board provides a hardware environment for developing and

evaluating targeted designs. The important features that are used from this board include

the 1 GB DDR3 component memory, a tri-mode Ethernet PHY, general purpose I/O, two

UART interfaces, and the FMC component that is used to attach the VITA-57 FPGA

mezzanine cards [10]. The board also includes an HDMI codec, which is the port used to

display the results from performing digital signal processing with images and video.

Lastly, the SD Card Interface is used to load files from an 8 GB Flash Card. A block

diagram of the ZC702 board can be viewed in Figure 2.

Figure 2: ZC702 Block Diagram [10]



3.2 Hardware Design Component

The hardware design contains four major components: the processing system (PS), HDMI input/output, video direct memory access (VDMA), and the hardware filter designed in High-Level Synthesis (HLS). These components are connected with

Advanced eXtensible Interface (AXI) Interconnects. The diagram displayed in Figure 3

represents the general block diagram for this design. Input is received from the HDMI

Input Source and streamed to the FMC HDMI Input IP core. There is a video timing

controller in this core that will retrieve necessary timing information. Video data from

this core is stored in the video direct memory access (VDMA). After data is stored into

the VDMA it is sent to the Filter core for processing and then back to the VDMA after

processing is complete. The processed frames are then sent to the output display. This is a

general flow of the hardware design as shown in Figure 3.

Figure 3: Block Diagram of the Hardware Design Component



3.2.1 AXI Interconnect

The AXI Interconnect core connects one or more AXI memory-mapped master

devices to one or more memory-mapped slave devices [7]. There are four AXI

Interconnects used throughout the block diagram shown in Figure 3. Three of them are major AXI Interconnects that connect the processing system to the HDMI Input and Output and the VDMA. The fourth AXI Interconnect has the formal name of Video In to

AXI4-Stream and is used to interface from a video source to the AXI4-Stream Video

Protocol Interface [9]. This core is essential to provide an interface between the video

input signal and the video processing core. This core is used in parallel with the

functionality of the Video Timing Controller (VTC) in order to detect the line standard of

the incoming video. The VTC is also responsible for detecting timing values, such as the

number of active pixels per line and the number of active lines available to video

processing cores downstream of the Video In to AXI4-Stream interface [9].

3.2.2 Processing System (PS)

Figure 4: The Processing System IP Core used in Vivado



The Processing System (PS) for this design utilizes a dual Cortex-A9 core

processor located on the ZC702 FPGA board. This processor implements the ARMv7

architecture and runs 32-bit ARM instructions [8]. The IP core displayed in Figure 4

represents the actual IP core used for the processing system in this design. The ports

S_AXI_HP0 and S_AXI_HP2 are used in order to connect the processing system to the

HDMI Input and Output, the VDMA (video direct memory access) and the hardware

filter for hardware processing. In between the high performance ports and the

components listed are AXI Interconnects, which allow for interaction between the IP

cores listed.

3.2.2.1 S_AXI_HP

The S_AXI_HP is a high performance slave AXI interface that is used to connect

the programmable logic (PL) to the processing system (PS). The HP port enables a high

throughput data path between AXI masters in the PL and the PS DDR3 memory [8]. This

is needed in order to smoothly stream data continuously from the DDR to the PL, and the

reverse process as well (PL -> DDR). The programmable logic in this design runs on a 150 MHz clock, while the DDR side runs at 355 MHz, which is two-thirds (66%) of the DDR clock (533 MHz).

3.2.2.2 Interrupts

The IRQ_F2P[6:0] port displayed on the PS in Figure 4 controls the interrupt

signals that the PS must be aware of. Three of the interrupts come from the VDMAs in

order to buffer incoming frames from the HDMI input signal, transfer frames from the

VDMA to the processing filter, and lastly, send processed frames to the VDMA for

storage. Another interrupt is used to retrieve the frames from the incoming HDMI input

signal. One interrupt is used for the filter when the filter is available to process new

frames, and the remaining two interrupts are used for the AXI performance monitor and

the logic video controller. In this processing system, the generic interrupt controller

collects the interrupts from the various sources and then distributes the interrupts to each

of the ARM cores [8]. The highest priority interrupt will be handled first in the case when

more than one interrupt is pending. Equal priority interrupts are handled based on the

lowest ID. The order of interrupt priority for this design is as follows: The HDMI input

signal is the highest priority to assure that all the input frames are streamed into the

HDMI port. The second highest priority interrupt belongs to the filter interrupt since the

filter should always be processing new frames when available. The next three priority

interrupts are given to the VDMAs in this order: streaming from VDMA to the filter,

streaming from the filter to the VDMA, and lastly, streaming frames from the HDMI

input to the VDMA. The lowest priority interrupts belong to the logic video controller

(HDMI output) controller, and the performance monitor.
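
As an illustration of this priority scheme, the sketch below registers the PL-to-PS interrupts with the generic interrupt controller using Xilinx's bare-metal XScuGic driver. This is a minimal sketch for illustration only: the actual design configures interrupts through the Linux kernel drivers described in Section 3.3.2, and the interrupt IDs and priority values here are assumptions, not taken from the thesis design.

    #include "xscugic.h"
    #include "xparameters.h"

    /* PL-to-PS (IRQ_F2P) interrupt IDs on Zynq-7000 start at 61; the
     * specific assignments below are illustrative. */
    #define IRQ_HDMI_IN    61  /* highest priority: never drop an input frame */
    #define IRQ_FILTER     62  /* filter ready to process a new frame         */
    #define IRQ_VDMA_MM2S  63  /* VDMA streaming to the filter                */

    static XScuGic gic;

    int setup_interrupt_priorities(void)
    {
        XScuGic_Config *cfg = XScuGic_LookupConfig(XPAR_SCUGIC_SINGLE_DEVICE_ID);
        if (!cfg || XScuGic_CfgInitialize(&gic, cfg, cfg->CpuBaseAddress) != XST_SUCCESS)
            return XST_FAILURE;

        /* Lower numeric value means higher priority; GIC priorities step in
         * multiples of 8. 0x3 selects rising-edge triggering for PL interrupts. */
        XScuGic_SetPriorityTriggerType(&gic, IRQ_HDMI_IN,   0x00, 0x3);
        XScuGic_SetPriorityTriggerType(&gic, IRQ_FILTER,    0x08, 0x3);
        XScuGic_SetPriorityTriggerType(&gic, IRQ_VDMA_MM2S, 0x10, 0x3);

        XScuGic_Enable(&gic, IRQ_HDMI_IN);
        XScuGic_Enable(&gic, IRQ_FILTER);
        XScuGic_Enable(&gic, IRQ_VDMA_MM2S);
        return XST_SUCCESS;
    }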

3.2.2.3 I2C Sub-System

The IIC_1 port shown in Figure 4 is made external in order to represent the

HDMI Input/Output FMC Module made from Avnet Electronics Marketing displayed in

Figure 5.

Figure 5: Avnet HDMI Input/Output FMC Module [8]

The features of this module include two HDMI interfaces (input and output), an

interface for the ON Vita Semiconductor image sensor, a video clock synthesizer, and the

I2C Configuration [11]. For the purpose of this design, the HDMI output port and the

interface for the ON Vita Semiconductor are not used. The HDMI input utilizes YCbCr

4:2:2 video format in order to represent pixels in 16 bits rather than 24 bits [11]. The

clock synthesizer is used to generate the video clock that is used to drive the display

output. The video input interface consists of the 16-bit video data bus, a data enable, and

horizontal and vertical sync signals [8]. As discussed before, there is an interrupt for the

HDMI port in order to detect if a video signal is incoming or not. The input HDMI

transmitter also utilizes an Extended Display Identification Data (EDID) in order to

provide information on the video resolutions supported by the video sink (the display

monitor) to the display controller [8]. The display controller will use this information in

order to generate timing signals that drive the display coming from the HDMI transmitter.

3.2.3 HDMI Input

Figure 6: The top-level FMC HDMI Input IP Core used in Vivado

The IP core displayed in Figure 6 is the top-level representation of the FMC

HDMI Input core. The IO_HDMII port is connected externally to the input HDMI

interface of the FMC Module displayed in Figure 5. The sel[0:0] is used to switch

between the HDMI input from the FMC Module and the generated test pattern, which

will be discussed later. The M_AXI_S2MM is the stream coming from the output of this

core associated with the HDMI input and is connected to the AXI Interconnect which is

in turn connected to the processing system as discussed before. This port essentially

streams the HDMI input data to the VDMA for buffering. Lastly, the s2mm_fsync_out

port is connected to the f_sync port of the other VDMA IP that is associated with the

hardware-processing filter. Underneath this top-level representation of the

fmc_hdmi_input module are six IP cores: Clock Multiplexor, FMC-Imageon HDMI



Input, Video In to AXI4-Stream, Video Timing Controller, Test Pattern Generator, and

AXI Video Direct Memory Access.

3.2.3.1 Clock Multiplexor

The Clock Multiplexor is used to switch between the clock source generated

from the HDMI Interface and the onboard clock generator. If there is a valid HDMI input

signal, then the clock selected is the one generated from the HDMI Interface. If this

signal is not present, then the onboard clock is selected and the generated test pattern is

used as the input.

3.2.3.2 FMC-Imageon HDMI Input

This core is provided by FMC-Imageon and is used to receive the video signal

from the FMC module displayed in Figure 5. The video format is received in YCbCr

4:2:2 as discussed before with embedded vblank and hblank signals.

3.2.3.3 Video In to AXI4-Stream

The Xilinx LogiCORE IP Video In to AXI4-Stream core is designed to interface

from a video source (clocked parallel video data with synchronization signals – active

video with either syncs, blanks, or both) to the AXI4-Stream Video Protocol Interface.

This core works with the VTC and provides a bridge between a video input and video

processing cores with AXI4-Stream Video Protocol Interface [13]. In this design this core

is used to handle video data clock boundary crossing between the video clock domain and

the AXI4-Stream clock domain [8].



3.2.3.4 Video Timing Controller (VTC)

The Xilinx LogiCORE IP Video Timing Controller core is a general-purpose

video timing generator and detector. The core comes with a comprehensive set of

interrupt bits, which provides an easy integration into a processor system for in-system

control of the block in real-time [12]. In short, this core is used to synchronize the process

of streaming video to the VDMA from the HDMI Input Interface. The input side of this

core automatically detects horizontal and vertical synchronization pulses, blanking

timing, and active video pixels [8]. In this design, the application software utilizes the

information from the VTC in order to decide whether to switch to the external video

source or not based on measurements of resolution. Another use of the VTC in this

design is to generate horizontal and vertical blanking and synchronization pulses. These

pulses are then used by the Test Pattern Generator to generate a video test pattern.

3.2.3.5 Test Pattern Generator (TPG)

The Xilinx LogiCORE IP Test Pattern Generator generates test patterns for video system bring-up, evaluation, and debugging [14]. The core provides a wide variety of

test patterns that can be used for evaluation of performance. In this design a test pattern

generator is used when an HDMI input video source is not available. The TPG will

generate color bars and a moving box. The resolution of this pattern is 1920x1080.

3.2.3.6 Video Direct Memory Access

The Video Direct Memory Access core provides high-bandwidth direct memory

access between memory and AXI4-Stream video-type target peripherals that support the AXI4-Stream Video Protocol [15]. Applications that require

frame buffers to handle frame rate changes or changes to the image dimensions use AXI

VDMA in order to allow for efficient high-bandwidth access between AXI4-Stream

video interface and AXI4 interface [15]. There are two interfaces involved with the

VDMA: AXI Streaming and AXI memory-mapped. The AXI streaming interface is used

to receive the video stream and the AXI memory-mapped interface is used to map the

video interface into memory. Associated with these two interfaces are two channels:

MM2S (memory-mapped to streaming) and S2MM (streaming to memory-mapped). In

this design, the MM2S channel reads the number of data bits programmed through the

MM2S’ max burst length parameter and sends it to the slave device connected through

the streaming interface [8]. The S2MM channel receives data from the master device

connected through the streaming interface. Once again a parameter is programmed in

order to determine the width of the streaming interface. Data received on the streaming

interface is then written into the system memory through the memory-mapped device [8].

The streaming interface data width is set at 32 bits and the memory-mapped interface is

configured to 64 bits. The maximum burst length is set to 16 beats in order to achieve the

best possible throughput. System throttling is reduced, and system performance is

enhanced, by enabling the store and forward feature of the VDMA [8].
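
To make the channel configuration concrete, the sketch below brings up the MM2S (read) channel with Xilinx's standalone XAxiVdma driver. It is a minimal sketch assuming a single frame store; the actual design programs the VDMA through the Linux DMA engine driver described in Section 3.3.2, and the frame geometry values are illustrative.

    #include "xaxivdma.h"
    #include "xparameters.h"

    static XAxiVdma vdma;

    /* Start streaming one frame buffer out of DDR toward the filter. */
    int start_mm2s(UINTPTR frame_addr, u32 bytes_per_line, u32 lines)
    {
        XAxiVdma_Config *cfg = XAxiVdma_LookupConfig(XPAR_AXI_VDMA_0_DEVICE_ID);
        XAxiVdma_DmaSetup read_cfg;

        if (!cfg || XAxiVdma_CfgInitialize(&vdma, cfg, cfg->BaseAddress) != XST_SUCCESS)
            return XST_FAILURE;

        read_cfg.VertSizeInput       = lines;           /* active lines per frame */
        read_cfg.HoriSizeInput       = bytes_per_line;  /* bytes per line         */
        read_cfg.Stride              = bytes_per_line;  /* line-to-line spacing   */
        read_cfg.FrameDelay          = 0;
        read_cfg.EnableCircularBuf   = 1;               /* loop over frame stores */
        read_cfg.EnableSync          = 1;               /* genlock with S2MM side */
        read_cfg.PointNum            = 0;
        read_cfg.EnableFrameCounter  = 0;
        read_cfg.FixedFrameStoreAddr = 0;

        if (XAxiVdma_DmaConfig(&vdma, XAXIVDMA_READ, &read_cfg) != XST_SUCCESS)
            return XST_FAILURE;

        read_cfg.FrameStoreStartAddr[0] = frame_addr;
        if (XAxiVdma_DmaSetBufferAddr(&vdma, XAXIVDMA_READ,
                                      read_cfg.FrameStoreStartAddr) != XST_SUCCESS)
            return XST_FAILURE;

        return XAxiVdma_DmaStart(&vdma, XAXIVDMA_READ);
    }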

3.2.4 HDMI Output

The IP core displayed in Figure 7 represents the core used to stream data to the

display monitor via the HDMI output port. The HDMI output makes use of the

Multilayer Video Controller (MVC) in order to control the output display. The MVC

receives the video data through the MM2S channel of the VDMA. The data is then sent via

the vid_io port to an external display (display monitor).



Figure 7: The HDMI Output IP core used in Vivado

3.2.4.1 Multilayer Video Controller

The logiCVC-ML IP core is an advanced display graphics controller for LCD

and CRT displays provided by Xylon [16]. The main function of this controller is to

provide flexible display control. The logiCVC-ML controller refreshes the display image

by reading the video memory and converting the read data into a data stream acceptable

for the display interface [8].

3.2.5 Filter (High-Level Synthesis)

Vivado High-Level Synthesis (HLS) design tools are used to bridge software and

hardware together by compiling C specifications into Register Transfer Level (RTL)

implementations that can then be synthesized into a Xilinx FPGA. The top-level of

the hardware filter IP core that is shown in Figure 8 contains the hardware processing

filter that is created using Vivado HLS. There are several benefits to using the HLS

design methodology for both hardware and software designers: First, development time is

decreased due to C-level specification design and verification. Second, optimization

directives allow creation of specific high-performance hardware implementations and

will improve the likelihood of finding an optimal implementation [17]. This will

allow for the filtering in hardware to be performed in real-time.

Figure 8: The top-level hardware-processing core used in Vivado

The design flow of creating an IP core using Vivado HLS is shown in Figure 9.

First the C-specifications are written in C, C++, or System-C. Then the code is simulated

with a test bench created in C to assure that the logic is accurate and the correct results

are produced with no errors. After the simulation is completed, synthesis is performed on

the C specifications. Reports are generated from the synthesis in order to understand the

performance of the implementation [17]. Lastly, the IP core is packaged as shown in

Figure 8 to then be utilized in the Vivado IDE.
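
For reference, a minimal top function of the kind this flow produces might look like the pass-through sketch below. It assumes the hls::Mat types and AXI4-Stream conversion functions from the Vivado HLS video library that the thesis code also uses (see Figure 16); the function name, pragmas, and type parameters are illustrative.

    #include <hls_video.h>

    typedef hls::stream<ap_axiu<32,1,1,1> > AXI_STREAM;
    typedef hls::Mat<1080, 1920, HLS_8UC2>  YUV_IMAGE;  // YCbCr 4:2:2, 16 bits/pixel

    // Pass-through video filter skeleton; real designs insert processing
    // stages (e.g., gauss_filter_core, sobel_filter_core) between the
    // stream-to-Mat and Mat-to-stream conversions.
    void video_passthrough(AXI_STREAM &video_in, AXI_STREAM &video_out,
                           int rows, int cols)
    {
    #pragma HLS INTERFACE axis port=video_in
    #pragma HLS INTERFACE axis port=video_out
    #pragma HLS dataflow
        YUV_IMAGE img(rows, cols);

        hls::AXIvideo2Mat(video_in, img);   // AXI4-Stream -> hls::Mat
        hls::Mat2AXIvideo(img, video_out);  // hls::Mat -> AXI4-Stream
    }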

After the IP core is packaged, it can be used in the Vivado IDE block diagram.

Underneath the top-level design of the processing core displayed in Figure 8 is the filter

created in Vivado HLS. Video data is taken from the VDMA through the M_AXI_MM2S

port and is processed using the filter. After the processing is complete, the data is sent

back to the VDMA through the M_AXI_S2MM port. The mm2s_fsync is used to sync

the frames that are streaming through the HDMI input core as shown in Figure 6.

Therefore, the mm2s_fsync port displayed in Figure 8 is connected to the

s2mm_fsync_out port displayed in Figure 6.

Figure 9: Design Flow in Vivado High-Level Synthesis

3.3 Software Design Component

The Software Design component consists of the software architecture used for the

design of this thesis. The top-level view of this design can be viewed in Figure 10. The

software design makes use of a Qt-based GUI in order to display the resulting video

stream from the processed data. There are several device drivers that are used in the

kernel in order to capture data, process the data, and then display the data. Only the

software processing is actually done in the application; the hardware processing is



achieved through the filter created in the HLS. The GUI has a feature that allows the user

to easily switch between the HW and SW filtering. Several binaries are also created using

Linux, which are stored on an 8 GB SD (Secure Digital) card. This flash card is used for

the purpose of implementing the hardware design into the FPGA development board as

well as starting up the application and the GUI.

3.3.1 Boot Loader

The Boot loader is responsible for the power-on boot-up process; the non-

changeable boot code resides in the boot ROM. At power-on, the boot ROM reads the

boot mode register to determine the boot mode, which is user-configurable [8]. For the

purpose of this design the boot mode is configured for SD Card booting. In the SD Card

booting mode, the boot ROM reads the boot configuration header from the SD Card,

which is located in the binary named, “BOOT.bin” [8]. The other file elements located in

this binary are the boot header, the first stage boot loader (FSBL), the bit stream, and the

u-boot. The boot header contains the information about the other contests located in the

“BOOT.bin” binary as well as their offset, sizes, and security information [8]. The first

stage boot loader is responsible for initializing the minimum required hardware to

program the PL bitstream, and load and execute the u-boot, which is the second-stage

boot loader [8]. The u-boot is a universal boot loader that is used across various

embedded platforms. In this architecture, the u-boot loads the kernel image in the DDR

memory and is also responsible for completing the hardware initialization [8]. Lastly, the

bitstream contains the PL hardware design that is programmed into the PL by the FSBL.



3.3.2 Xilinx Linux Kernel

There are several drivers used in the Linux kernel. The Xylon DRM driver is used

to drive the display controller to display the application UI and control data path [8]. The

Xilinx VDMA engine driver is used to configure, start, and stop the VDMA. The Xilinx

video pipeline driver will communicate with the Xilinx VDMA engine driver by using a

slave-DMA API [8]. The Xilinx IIC driver will configure the IIC controller and provide

IIC read and write functions. The ADV7511 driver is used as an HDMI encoder for the

FPGA. The Xilinx video pipeline driver calls the ADV7511 driver to transmit the HDMI signal [8]. Lastly, the User space I/O (UIO) driver provides the ability

to write the majority of a driver in user space with only the shell of the driver in the

kernel. This driver uses char device and sysfs to interact with a user space process to

process interrupts and control memory accesses. For the purpose of this design the UIO

driver maps the device address space, and then controls the device using read/write as

defined by the register map [8]. This is used to monitor the performance of the system

when the signals are being processed.
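
To make the UIO mechanism concrete, the sketch below shows the typical user-space pattern of mapping a device's registers and blocking on its interrupts. This is a generic example: the device node /dev/uio0, the 4 KB map size, and the register layout are assumptions, not details of the thesis design.

    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/dev/uio0", O_RDWR);  /* node exported by the UIO driver */
        if (fd < 0) { perror("open"); return 1; }

        /* Map the device's register space into user space. */
        volatile uint32_t *regs = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
                                       MAP_SHARED, fd, 0);
        if (regs == MAP_FAILED) { perror("mmap"); return 1; }

        uint32_t count, reenable = 1;
        /* read() blocks until the device raises an interrupt and returns the
         * running interrupt count; registers are then accessed through regs. */
        while (read(fd, &count, sizeof(count)) == sizeof(count)) {
            printf("interrupt %u, first register = 0x%08x\n",
                   count, (unsigned)regs[0]);
            write(fd, &reenable, sizeof(reenable));  /* re-arm the interrupt */
        }
        return 0;
    }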

3.3.3 Software Design Flow

There are four major parts to the design flow: Input, Memory, Processing, and

Output. The input signal is streamed from an HDMI source. In the previous section, there

was a discussion about the two VDMAs used in this design. The first VDMA is used to

store the initial video signal from the HDMI input source. The second VDMA is used to

store the processed signal before the stream is sent to the HDMI output source. The

processing can be done in either hardware or software. This is the general design flow; the hardware is designed in the Vivado IDE and the software drives the IP cores. The

next major section will describe how the software drives the design for successful digital

signal processing. Code examples from the design itself will be documented in the next

major section as well in order to completely describe the software process.

3.3.4 Xilinx GUI

Xilinx provides a GUI for the ease of displaying the processed video frames as

shown in Figure 10 [8]. The GUI is designed using the Qt framework and has several

features involving the input and the output of the HDMI sources. The GUI is designed

with a mouse as the input device for the purpose of changing certain settings. For the

purpose of this design, the test pattern generator is used if there is no available input

HDMI source. The user is able to change between no filtering, software filtering, or

hardware filtering. The GUI also displays two graphs corresponding to the CPU usage

and High performance port usage when video frames are processed. The output from the

video pipeline is displayed behind the GUI as shown in Figure 10.

Figure 10: GUI view with processed frame in the background



Chapter 4: Case Studies

4.1 Improved Edge Detection Filter

The improved edge detection filter implemented in this design is a combination of

both a Sobel Filter and a Gaussian filter in order to detect significant edges in the frames

of the input video stream.

The Sobel operator is used particularly for detecting edges. The operator

calculates the gradient of the image intensity at each point in order to give the direction of

the largest possible increase from light to dark within the frame [18]. This abrupt change

is a pattern that will tell how likely an edge is present at this particular pixel in the frame,

as well as the orientation of the edge.

In order to successfully detect the edges in the frame, two 3x3 kernels are used to

calculate approximations of the derivatives in both the horizontal and vertical directions

of the frame. Figure 11 displays the values used for the two kernels, the kernel on the left

represents the values used in the horizontal direction, and the kernel on the right

represents the values used in the vertical direction. Figure 12 displays the convolution

process that takes place in order to mathematically detect the edges in the frame. Figures

13 and 14 display the code used for the Sobel part of the improved edge detection filter.

Figure 11: Horizontal (left) and Vertical (right) kernels used for edge detection in the
improved edge detection Filter

Figure 12: The convolution process used when processing frames with a kernel

In Figure 12, the original image frame is represented by the pixels “a##”, and the

processed image frame is represented by “b##”. The kernel is represented by “k##”. The

convolution process begins at the 2nd row and column of pixels because the edges of the

frame would cause the kernel to go outside the boundaries of the frame; therefore, the

edges are not processed with the kernel. After “b22” is calculated, the kernel will shift one

pixel to the right and begin the calculation of “b23”. This Sobel operator uses intensity

values in a 3x3 region around each image point to approximate the corresponding image

gradient [18]. In order to calculate the gradient magnitude, the resulting values from the

vertical and horizontal kernel convolutions are squared individually and then added

together. The gradient magnitude is the square root of this result. If Bx is the resulting

pixel from the horizontal convolution, and By is the corresponding pixel from the vertical

convolution, then the gradient magnitude is the square root of (Bx² + By²).
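
Written out, with the kernel coefficients taken from the image_filter call in Figure 13, and noting that the code in Figure 14 substitutes the common absolute-value approximation for the square root:

$$K_x = \begin{bmatrix} 1 & 0 & -1 \\ 2 & 0 & -2 \\ 1 & 0 & -1 \end{bmatrix}, \qquad K_y = \begin{bmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{bmatrix}, \qquad G = \sqrt{B_x^2 + B_y^2} \approx |B_x| + |B_y|$$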

void hls_sobel(IplImage *_src, IplImage *_dst)
{
    Mat src(_src);
    Mat dst(_dst);
    AXI_STREAM src_axi, dst_axi;

    cvMat2AXIvideo(src, src_axi);
    image_filter(src_axi, dst_axi, src.rows, src.cols,
                 1, 0, -1, 2, 0, -2, 1, 0, -1,  // horizontal (X) kernel
                 1, 2, 1, 0, 0, 0, -1, -2, -1,  // vertical (Y) kernel
                 HLS_SOBEL_HIGH_THRESH_VAL,
                 HLS_SOBEL_LOW_THRESH_VAL,
                 HLS_SOBEL_INVERT_VAL);
    AXIvideo2cvMat(dst_axi, dst);
}

Figure 13: Code used for the Sobel kernel [8]

// Compute approximation of the gradients in the X-Y direction
for (int i = 0; i < 3; i++){
    for (int j = 0; j < 3; j++){
        // X-direction gradient
        x_weight += (window->getval(i, j) * x_op[i][j]);

        // Y-direction gradient
        y_weight += (window->getval(i, j) * y_op[i][j]);
    }
}
// Combine the weights
edge_weight = ABS(x_weight) + ABS(y_weight);
Figure 14: Code used for processing an image for edge detection [8]

Figure 13 shows the values that correspond to the Sobel filter kernels displayed in

Figure 11. The actual processing takes place in Figure 14. The x_weight and y_weight

are calculated using two for loops in order to cover the 3x3 area around the new pixel in

the horizontal and vertical gradient directions. The arrays “x_op” and “y_op” contain the

kernel values for the Sobel filtering in their respective directions. After the convolution

process is complete as shown in Figure 12, the two weights are added together to form

the new processed pixel.

The Gaussian filter is used to blur the image before the Sobel filter is applied. The

difference between the original image and the blurred image will be the amount of detail

in the image. The blurred image will have less detail than the original image. In essence,

the Sobel filter will detect more significant edges of the original image.

In order to successfully blur the image, one 3x3 kernel is used in a similar fashion

as the Sobel filter. Rather than approximating derivatives of the pixels, the Gaussian filter takes a rough average of the pixels around the kernel's center pixel, using the kernel displayed in

Figure 15. The convolution process shown in Figure 12 is used with the kernel shown in

Figure 15 to form the new processed image.

In the case of the Gaussian filter: b22 = ((a11 * k11) + … + (a33 * k33)) / 16

Figure 15: Gaussian Filter Kernel
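
Since the kernel image itself is not reproduced here, its coefficients can be read off the gauss_filter_core call in Figure 16, together with the division by 16 in Figure 17:

$$K_g = \frac{1}{16}\begin{bmatrix} 1 & 2 & 1 \\ 2 & 4 & 2 \\ 1 & 2 & 1 \end{bmatrix}$$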



The code shown in Figure 16 shows the flow of the improved edge detection filter. The "AXIvideo2Mat" function will take in the stream frame by frame. Each frame is then processed using the Gaussian filter first, and then the Sobel filter second. The Sobel parameters that begin with a "C" take their values from the image_filter call shown in Figure 13. Lastly, each frame is sent to the output stream after processing is complete. Among the parameters, "video_in" represents the stream coming into the filter from the VDMA. The frames are stored into a matrix, "img_0". From there the Gaussian filter is applied and the resulting image is stored in "img_gauss_tmp". Next, the Sobel filter is applied and the resulting frame is stored into "img_1". Lastly, the processed frame is streamed to the VDMA through "video_out".

#pragma HLS dataflow
hls::AXIvideo2Mat(video_in, img_0);
gauss_filter_core(img_0, img_gauss_tmp, rows, cols,
                  1, 2, 1, 2, 4, 2, 1, 2, 1);
sobel_filter_core(img_gauss_tmp, img_1, rows, cols,
                  C_XR0C0, C_XR0C1, C_XR0C2,
                  C_XR1C0, C_XR1C1, C_XR1C2,
                  C_XR2C0, C_XR2C1, C_XR2C2,
                  C_YR0C0, C_YR0C1, C_YR0C2,
                  C_YR1C0, C_YR1C1, C_YR1C2,
                  C_YR2C0, C_YR2C1, C_YR2C2,
                  c_high_thresh, c_low_thresh,
                  c_invert);
hls::Mat2AXIvideo(img_1, video_out);

Figure 16: Improved edge detection filter dataflow



The code shown in Figure 17 represents the processing needed to blur the image.

The values that begin with “X” are the kernel values that are shown in Figure 15. Once

again, two for loops are used in order to cover the 3x3 area that the kernel is convolving.

The pixel values are multiplied by their respective kernel values and then the total is

divided by 16 in order to get an average which in turn represents a blurred pixel with

respect to the original pixel.

YUV_PIXEL gauss_operator(Y_window *window,
                         int XR0C0, int XR0C1, int XR0C2,
                         int XR1C0, int XR1C1, int XR1C2,
                         int XR2C0, int XR2C1, int XR2C2)
{
    short x_weight = 0;
    YUV_PIXEL pixel;
    char i, j;

    const char x_op[3][3] = {{XR0C0, XR0C1, XR0C2},
                             {XR1C0, XR1C1, XR1C2},
                             {XR2C0, XR2C1, XR2C2}};

    // Compute the weighted sum of the 3x3 window around the center pixel
    for (i = 0; i < 3; i++){
        for (j = 0; j < 3; j++){
            x_weight += (window->getval(i, j) * x_op[i][j]);
        }
    }

    pixel.val[0] = x_weight / 16;  // normalize by the kernel sum (16)
    pixel.val[1] = 128;            // neutral chroma

    return pixel;
}

Figure 17: Gaussian Operator utilized in the improved edge detection filter

4.2 Grayscale Filter

The main idea behind a grayscale filter is to convert the value of the original pixel

to a single pixel that contains solely the intensity information [19]. The grayscale filter

will produce a frame that varies from white to black. Shades of black represent the pixels

with the weakest intensity; meanwhile shades of white represent pixels with the strongest

intensity.

There are several different algorithms that can be utilized in order to convert a

frame consisting of RGB pixels to a frame that consists of pixels in grayscale. The

method used in this design first converts the RGB pixels to YUV format. After the frame

is in YUV format, the Chroma value of each pixel is set to 128. The code implementing this algorithm is displayed in Figure 18.

// Grayscale filter core
void grayscale_core(YUV_IMAGE &src, YUV_IMAGE &dst, int rows, int cols)
{
    YUV_PIXEL new_pix;

    for (int row = 0; row < rows; row++){
        for (int col = 0; col < cols; col++){
#pragma HLS loop_flatten off
#pragma HLS PIPELINE II=1
            src >> new_pix;        // read the next pixel from the input stream
            new_pix.val[1] = 128;  // force the chroma channel to its neutral value
            dst << new_pix;        // write the pixel to the output stream
        }
    }
}

Figure 18: Code utilized to convert image to grayscale

The code shown in Figure 18 represents the code used to describe the grayscale

process. Two for loops are used in order to cover the entire frame. In each iteration of the

for-loop, the Chroma value of each YUV_PIXEL is set to 128 in order to give the

appearance of a black and white (grayscale) image.
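
For reference, the luma value kept by this filter is, under the standard BT.601 weighting, a weighted sum of the RGB components; in this design the RGB-to-YUV conversion step is already handled by the HDMI interface, which delivers YCbCr 4:2:2, so only the chroma reset is performed:

$$Y = 0.299R + 0.587G + 0.114B, \qquad Cb = Cr = 128$$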

4.3 Image Results

Figure 19: Original Image

The image displayed in Figure 19 represents the original image used before

processing is performed. The images displayed in Figure 20 represent the processed

images. One can clearly see the Gaussian filter has blurred the image from the original

image displayed in Figure 19. The grayscale result gives a black and white representation

of the original image. The improved edge detection filter shows the edges that were

detected from the image processed with the Gaussian Filter. Lastly, the result from the

Sobel filter is displayed in order to show that there are more edges detected when the

image is not blurred initially. More significant edges can be detected with the improved edge detection filter if the image is further blurred.


Figure 20: Image Results from Gaussian (top-left), Grayscale (top-right), Improved edge
detection (bottom-left), and Sobel (bottom-right) filtering

Chapter 5: Conclusion

In conclusion, video processing can be quick and precise in mobile applications

by utilizing next generation FPGA development platforms. In the future, mobile

embedded vision devices will be all around the world in our homes, in the workplace, in

transportation, in the medical field, and in the military field. The subject of video

processing can automate certain tasks with the ability to detect characteristics in frames

and perform different tasks based on the recognition algorithms that can be implemented.

This thesis provides in full detail the base design for implementing video processing on

an FPGA development platform. From this design, other filters can be easily

implemented into the Filter section for different processing methods.

Avnet predicts that there will be over 14 million embedded vision processing

units by 2018 [24]. FPGA development platforms are improving every day, and the

processing power will only become greater in the future. Therefore, FPGA development

platforms will be utilized to perform tasks that a Graphics Processing Unit (GPU) would normally be required for. By integrating these advanced algorithms onto an FPGA

development platform, embedded devices can perform GPU processing techniques in a

mobile setting.

List of References

1. National Instruments, (2012). Introduction to FPGA Technology: Top 5 Benefits,


http://www.ni.com/white-paper/6984/en/, 5 May 2014.

2. Mike Thompson, (2004). FPGAs accelerate time to market for industrial designs,
http://www.design-reuse.com/articles/8190/fpgas-accelerate-time-to-market-for-
industrial-designs.html, 5 May 2014.

3. Xilinx Inc., (2014). Vivado Design Suite User Guide: Using the Vivado IDE.
Ledgewood, NJ.

4. Xilinx Inc., (2014). Vivado Design Suite User Guide: Synthesis. Ledgewood, NJ.

5. Xilinx Inc., (2014). Vivado Design Suite User Guide Implementation. Ledgewood, NJ.

6. Xilinx Inc., (2014). Vivado Design Suite User Guide Embedded Processor Hardware
Design. Ledgewood, NJ.

7. Xilinx Inc., (2014). LogiCORE IP AXI Interconnect v2.1 Product Guide Vivado
Design Suite. Ledgewood, NJ.

8. Xilinx Inc., (2014). Zynq-7000 All Programmable SoC ZC702 Base Targeted
Reference Design User Guide. Ledgewood, NJ.

9. Xilinx Inc., (2014). LogiCORE IP Video In to AXI4-Stream v3.0 Product Guide Vivado
Design Suite. Ledgewood, NJ.

10. Xilinx Inc., (2014). ZC702 Evaluation Board for the Zynq-7000 XC7Z020 All
Programmable SoC User Guide. Ledgewood, NJ.

11. Avnet Electronics Marketing, (2012). FMC ON Hardware Guide. Parsippany, NJ.

12. Xilinx Inc., (2014). Video Timing Controller v6.1 LogiCORE IP Product Guide
PG016. Ledgewood, NJ.

13. Xilinx Inc., (2014). LogiCORE IP Video In to AXI4-Stream v3.0 Product Guide
Vivado Design Suite PG043. Ledgewood, NJ.

14. Xilinx Inc., (2014). Test Pattern Generator v6.0 LogiCORE IP Product Guide Vivado
Design Suite PG103. Ledgewood, NJ.

15. Xilinx Inc., (2014). LogiCORE IP AXI Video Direct Memory Access v6.2 Product
Guide Vivado Design Suite PG020. Ledgewood, NJ.

16. Xylon Technologies, (2014). Xylon Linux Framebuffer Driver For Use with Xylon’s
Display Controller IP Core – logiCVC-ML Compact Multilayer Video Controller User’s
Manual Version: 3.00.a. Kaloor, India.

17. Xilinx Inc., (2014). Vivado Design Suite User Guide: High-Level Synthesis, UG902
(v2014.1). Ledgewood, NJ.

18. H.M., (1996). Algorithm Designs: Graph Theory. Pedia Press, Mainz, Germany.

19. Stephen Johnson (2006). Stephen Johnson on Digital Photography. O'Reilly Media,
Sebastopol, CA.

20. SRI International, (2015). Computer Vision for Embedded Systems,


http://www.sri.com/engage/products-solutions/embedded-computer-vision, 13 May 2015.

21. BittWare FPGA Platforms, (2015). Light-weight Radar System for UAVs and
Manned Systems, http://www.bittware.com/fpga-dsp-applications/applications-
stories/radar-system, 14 May 2015.

22. Kamran Khan, (2012). FPGAs Help Drive Innovation in Complex Medical Systems,
http://www.medicalelectronicsdesign.com/article/fpgas-help-drive-innovation-complex-
medical-systems, 14 May 2015.

23. 5-D Systems, (2014). Vision Processor for Helmet Systems (VPHS),
http://www.5dsystems.com/vision-processor-for-helmet-systems-vphs.php, 15 May 2015.
