General and Special Purpose Hardware For DSP

Uploaded by

HABEEBU RAHIMAN C

11 General- and special-purpose hardware for DSP

11.1 Introduction
11.2 Computer architectures for signal processing
11.3 General-purpose digital signal processors
11.4 Implementation of DSP algorithms on general-purpose digital signal processors
11.5 Special-purpose DSP hardware
11.6 Summary
References
Bibliography
Appendix

The main objectives of this chapter are to provide an understanding of the key issues underlying general- and special-purpose processors for DSP, the impact of DSP algorithms on the hardware and software architectures of these processors, and how key DSP algorithms are implemented for real-time execution on general-purpose digital signal processors or realized as a piece of special-purpose hardware.

Real-time often implies 'as soon as possible' but within specified time limits. Real-time processing may be divided into two broad categories (although further subdivision is possible): stream processing, for example digital filtering, where data is processed one sample at a time, and block processing, for example FFT and correlation, where fixed blocks of data points are processed at a time.

The implementation of DSP algorithms in real time requires both hardware and software. The hardware may be an array of processors, standard microprocessors, DSP chips or microprogrammed special-purpose devices. The software is often low-level assembly code or microcode native to the hardware, although the trend now is to write software in an efficient high-level language, such as C.

11.1 Introduction

For convenience, DSP processors can be divided into two broad categories: general purpose and special purpose. General-purpose DSP processors include devices such as the Texas Instruments TMS320C25 and the DSP56000 from Motorola. There are two types of special-purpose hardware.
(1) Hardware designed for efficient execution of specific DSP algorithms such as digital filters and the fast Fourier transform. This type of special-purpose hardware is sometimes called an algorithm-specific digital signal processor.
(2) Hardware designed for specific applications, for example telecommunications, digital audio or control applications. This type of hardware is sometimes called an application-specific digital signal processor. In most cases application-specific digital signal processors execute specific algorithms, such as PCM encoding/decoding, but are also required to perform other application-specific operations.

Examples of special-purpose processors are the DSP56200, an FIR digital filter from Motorola, and the A100, an FIR digital filter from INMOS. Both general-purpose and special-purpose processors can be designed with single chips or with individual blocks of multipliers, ALUs, memories, and so on. First, we will discuss the architectural features of digital signal processors that have made real-time DSP possible in many areas.

11.2 Computer architectures for signal processing

Most processors available today are based on the von Neumann concept, where operations are performed sequentially. Figure 11.1 shows a simplified architecture for a standard von Neumann processor.

Figure 11.1 A simplified architecture for standard microprocessors.

When an instruction is processed in such a processor, units of the processor not involved at each instruction phase wait idly until control is passed on to them. Processor speed can be increased by making the individual units operate faster, but there is a limit on how fast they can be made to operate. If it is to operate in real time, a DSP processor must have its architecture optimized for executing DSP functions.
Figure 11.2 shows a generic hardware architecture suitable for real-time DSP. It is characterized by the following:
• A multiple bus structure with separate memory spaces for data and program instructions. Typically the data memories hold input data, intermediate data values and output samples, as well as fixed coefficients for, for example, digital filters or FFTs. The program instructions are stored in the program memory.
• An I/O port that provides a means of passing data to and from external devices such as the ADC and DAC, or of passing digital data to other processors. Direct memory access (DMA), if available, allows rapid transfer of blocks of data directly to or from data RAM, typically under external control.
• Arithmetic units for logical and arithmetic operations, which include an ALU and a hardware multiplier.

Figure 11.2 Hardware architecture for signal processing.

Why is such an architecture necessary? Most DSP algorithms (such as filtering, correlation and the fast Fourier transform) involve repetitive arithmetic operations such as multiply and add, frequent memory accesses, and heavy data flow through the CPU. The architecture of standard microprocessors is not suited to this type of activity. An important goal in DSP hardware design is to optimize both the hardware architecture and the instruction set for DSP operations. In digital signal processors, this is achieved by making extensive use of the concepts of parallelism. In particular, the following techniques are used:
• Harvard architecture;
• pipelining;
• fast, dedicated hardware multiplier/accumulator;
• special instructions dedicated to DSP;
• replication;
• on-chip memory/cache.
For successful DSP design, it is important to understand these key architectural features.

11.2.1 Harvard architecture

The principal feature of the Harvard architecture is that the program and data memories lie in two separate spaces, permitting a full overlap of instruction fetch and execution. Standard microprocessors, such as the MOS Technology 6502, are characterized by a single bus structure for both data and instructions, as shown in Figure 11.1. Suppose that in a standard microprocessor we wish to read a value op1 at address ADR1 in memory into the accumulator and then store it at two other addresses, ADR2 and ADR3. The instructions could be

    LDA ADR1    load the operand op1 into the accumulator from ADR1
    STA ADR2    store op1 in address ADR2
    STA ADR3    store op1 in address ADR3

Typically, each of these instructions would involve three distinct steps:
• instruction fetch;
• instruction decode;
• instruction execute.

In our case, the instruction fetch involves fetching the next instruction from memory, and instruction execute involves either reading or writing data in memory. In a standard processor without a Harvard architecture, the program instructions (that is, the program code) and the data (operands) are held in one memory space; see Figure 11.3. Thus the next instruction cannot be fetched while the current one is executing, because the fetch and execute phases both require memory access. In a Harvard architecture (Figure 11.4), since the program instructions and data lie in separate memory spaces, the fetching of the next instruction can overlap the execution of the current instruction; see Figure 11.5. Normally, the program memory holds the program code, while the data memory stores variables such as the input data samples. Strict Harvard architecture is used by some digital signal processors (for example, the Motorola DSP56000), but most use a modified Harvard architecture (for example, the TMS320 family of processors).
In the modified architecture used by the TMS320, for example, separate program and data memory spaces are still maintained, but communication between the two memory spaces is permissible, unlike in the strict Harvard architecture.

11.2.2 Pipelining

Pipelining is a technique which allows two or more operations to overlap during execution. In pipelining, a task is broken down into a number of distinct subtasks which are then overlapped during execution. It is used extensively in digital signal processors to increase speed. A pipeline is akin to a typical production line in a factory, such as a car or television assembly plant. As in the production line, the task is broken down into small, independent subtasks called pipe stages. The pipe stages are connected in series to form a pipe, and the stages are executed sequentially.

Figure 11.3 An illustration of instruction fetch, decode and execute in a non-Harvard architecture with single memory space: (a) instruction fetch from memory; (b) timing diagram.

As we have seen in the last section, an instruction can be broken down into three steps. Each step in the instruction can be regarded as a stage in a pipeline and so can be overlapped. By overlapping the instructions, a new instruction is started at the start of each clock cycle (Figure 11.6(a)). Figure 11.6(b) gives the timing diagram for a three-stage pipeline, drawn to highlight the instruction steps. Typically, each step in the pipeline takes one machine cycle. Thus during a given cycle up to three different instructions may be active at the same time, although each will be at a different stage of completion.
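The cycle counts implied by Figures 11.3 and 11.6 can be checked with a short C sketch. It models an idealized three-stage pipeline (one machine cycle per stage, no stalls, branches or memory conflicts); that is an assumption, since real DSP pipelines have hazards this ignores. The function names are illustrative only.

```c
#include <stdio.h>

/* Cycles needed when each instruction must finish all of its stages before the
   next one starts (the single-memory case of Figure 11.3). */
int sequential_cycles(int n_instr, int n_stages)
{
    return n_instr * n_stages;
}

/* Cycles needed by an ideal pipeline: the first instruction takes n_stages
   cycles and every later instruction completes one cycle after its predecessor. */
int pipelined_cycles(int n_instr, int n_stages)
{
    return n_instr > 0 ? n_instr + n_stages - 1 : 0;
}

/* Print a timing table for a 3-stage pipeline: in cycle i the processor fetches
   instruction i, decodes instruction i-1 and executes instruction i-2. */
void print_pipeline(int n_instr)
{
    static const char *stage_name[3] = { "fetch", "decode", "execute" };
    int total = pipelined_cycles(n_instr, 3);

    for (int cycle = 1; cycle <= total; ++cycle) {
        printf("cycle %d:", cycle);
        for (int s = 0; s < 3; ++s) {
            int instr = cycle - s;              /* instruction currently in stage s */
            if (instr >= 1 && instr <= n_instr)
                printf("  %s I%d", stage_name[s], instr);
        }
        printf("\n");
    }
}
```

For the three-instruction LDA/STA/STA sequence, `sequential_cycles(3, 3)` returns 9 while `pipelined_cycles(3, 3)` returns 5, and `print_pipeline(3)` shows cycle 3 fetching I3, decoding I2 and executing I1.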
The key to an instruction pipeline is that the three parts of the instruction (that is, fetch, decode and execute) are independent, and so the execution of multiple instructions can be overlapped. In Figure 11.6(b), it is seen that, at the ith cycle, the processor could be simultaneously fetching the ith instruction, decoding the (i - 1)th instruction and executing the (i - 2)th instruction.

Figure 11.4 Basic Harvard architecture with separate data and program memory spaces. Data and program instruction fetches can be overlapped as two independent memories are used.

Figure 11.13 Memory map of the TMS320C25. The on-chip block B0 can be configured either as a data memory (0200h-02FFh) or as a program memory (0FF00h-0FFFFh). When configured as a program memory, the data memory locations 0200h-02FFh cease to exist.

… graphics and image processing, which are difficult to implement on the earlier generations.

The ability of DSP chips to perform DSP operations in floating point is a welcome development. It reduces finite wordlength effects such as overflow, roundoff errors and coefficient quantization errors that are inherent in fixed-point DSP. It also facilitates algorithm development, as a designer can develop an algorithm on a large computer in a high-level language and then port it to a DSP device. In the TMS320C30, floating point multiplication requires 32-bit operands and produces a 40-bit normalized floating point product. Integer multiplication requires 24-bit inputs and yields 32-bit results. Three floating point formats are supported.
The first is a 16-bit short floating point format, with a 4-bit exponent, 1 sign bit and 11 bits for the mantissa. This format is used for immediate floating point operands. The second is a single-precision format with an 8-bit exponent, 1 sign bit and a 23-bit fraction (32 bits in total). The third is a 40-bit extended-precision format which has an 8-bit exponent, 1 sign bit and a 31-bit fraction. The floating point representation differs from the IEEE standard, but facilities are provided to allow conversion between the two formats. The TMS320C30 combines features of the Harvard architecture (separate buses for program instructions, data and I/O) and of the von Neumann processor (unified address space). Special instructions provided by the C30 include block repeat, bit-reversed addressing, and instructions that can execute in parallel.

Figure 11.14 Simplified architecture of the DSP56000/1/2.

11.3.2 Motorola DSP56000 family

The Motorola DSP56000 is a high-precision fixed point digital signal processor. Its architecture is depicted in Figure 11.14. Internally, it has two independent data memory spaces, the X-data and Y-data memory spaces, and one program memory space. Having two separate data memory spaces allows a natural partitioning of data for DSP operations and facilitates the execution of the algorithm. For example, data can be stored as X and Y data in graphics applications, as coefficients and data in FIR filtering, and as real and imaginary parts in the FFT. During program execution, pairs of data samples can be fetched from or stored in internal memory simultaneously in one cycle. Externally, the two data spaces are multiplexed onto a single data bus, which somewhat reduces the benefits of the dual internal data memory.
The data memories are expandable off chip to 128K words and the program memory is expandable to 64K words. The memory map for the DSP56000 is shown in Figure 11.15. The arithmetic units consist of two 56-bit accumulators and a parallel fixed point hardware multiplier-accumulator (MAC). The MAC accepts 24-bit inputs and forms a 48-bit product, which is accumulated with 56-bit precision. The 24-bit wordlength provides sufficient accuracy for representing most DSP variables, while the 8 extra bits of the 56-bit accumulators guard against arithmetic overflow when many products are summed. These wordlengths are adequate for most applications, including digital audio, which imposes stringent requirements. The DSP56000 has an excellent host interface port that simplifies interfacing to other systems. A useful feature of the 56000 is that the I/O port can be programmed to insert wait states when slow memory or other peripherals are used. Up to 15 wait states, from half a machine cycle to 7.5 cycles, can be programmed.

Figure 11.15 Memory map for DSP56000.

The DSP56000 provides special instructions that allow zero-overhead looping, and a bit-reversed addressing capability for scrambling input data before the FFT or unscrambling the fast Fourier transformed data. There are other derivatives of the DSP56000 family. For example, the DSP56156 is a 16-bit version with a built-in codec (coder-decoder) aimed at telecommunication applications. The DSP96002 is a 32-bit floating/fixed point digital signal processor. It is architecturally similar to the fixed point DSP56000, but with considerable enhancement and greater precision. Like the DSP56000, it has two separate on-chip data memory spaces, X-data and Y-data memory, and one program memory.
Each memory space is expandable to 4G words, divided into 0.5G-word areas. The wordlength is 32 bits. The multiplier takes 32-bit floating point inputs and produces a 44-bit product, or 32-bit fixed point inputs and produces a 64-bit product. It is equipped with 96-bit/32-bit accumulators. The instruction set supports zero-overhead looping, and circular and bit-reversed addressing. As in the DSP56000, wait states can be configured in hardware or programmed in software.

11.3.3 Analog Devices ADSP2100 family

The Analog Devices ADSP2100 is one of the few general-purpose DSP chips with no on-chip memory, but it is also unique in having two separate external memory spaces: one holds data only, and the other holds program code as well as data. The block diagram of the ADSP2100 internal architecture is depicted in Figure 11.16. The main components are the ALU, the multiplier-accumulator and the shifter. The MAC accepts 16 x 16-bit inputs and produces a 32-bit product in 1 cycle.

Figure 11.16 Simplified architecture of ADSP2100.

A useful feature of the ADSP2100 is that all the arithmetic and logic units (MAC, ALU and shifter) are connected to a common 16-bit result (R) bus. Thus the result of an arithmetic operation from one unit can be used immediately as an input for the next operation by any of the units. The ADSP2100 departs from the strict Harvard architecture, as it allows the storage of both data and program instructions in the program memory. A signal line (data access signal) is used to indicate when data, and not program instructions, are being fetched from the program memory. Storage of data in the program memory inhibits a steady data flow through the CPU, as data and instruction fetches cannot occur simultaneously.
To avoid a bottleneck, the ADSP2100 has an on-chip program memory cache which holds the last 16 instructions executed. This eliminates the need, especially when executing program loops, for repeated instruction fetches from the program memory. The ADSP2100 provides special instructions for zero-overhead looping and supports bit-reversed addressing for the FFT. It has only hardware wait states. The processor also provides facilities for context switching: on an interrupt, a fast exchange of working registers and shadow registers is performed. After the interrupt has been serviced, the registers are exchanged again, restoring the CPU to its original state. The lack of on-chip memory in the ADSP2100 is a severe restriction, especially in low-budget projects: to run at full speed, fast memories are necessary, and these may be too expensive for low-budget applications. Later derivatives of the ADSP2100 have on-chip memory, but combine the program and data buses externally, much like the DSP56000. They also have software-programmable wait states.

A new generation of the family, the ADSP21000, is a floating point device. The multiplier accepts 32-bit floating point inputs and produces a 32-bit result, or 40-bit extended-precision floating point inputs and produces 40-bit results. Also, 32-bit fixed point operands yield 64-bit fixed point results. It is equipped with two 80-bit fixed point accumulators. The processor has a 32 x 48-bit instruction cache, a data memory expandable to 4G x 40-bit words, and a program memory expandable externally to 16M x 48-bit words.

11.4 Implementation of DSP algorithms on general-purpose digital signal processors

11.4.1 FIR digital filtering

Nonrecursive N-point FIR filters, with the structure given in Figure 11.17(a), are characterized by the following difference equation (see Chapter 6 for details):

$$y(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k) \qquad (11.3)$$

A fragment of a C language implementation of the general FIR filter is given in Program 11.1.
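A direct C rendering of Equation 11.3 is sketched below. It is a minimal illustration under stated assumptions (floating-point samples, coefficients in an array h[], and a delay line x[] in which x[k] holds x(n - k)); the name `fir_step` is hypothetical and this is not a reconstruction of Program 11.1. Ageing each sample by one position as soon as it has been used mirrors the data-move style of the TMS320 LTD and MACD instructions.

```c
#include <stddef.h>

/* One output sample of an N-tap FIR filter, y(n) = sum of h(k) x(n-k) (Equation 11.3).
   h[k] holds h(k); x[k] holds x(n-k) and is updated in place.
   Assumes n_taps >= 1 and that x[] was zeroed before the first call (no data yet). */
double fir_step(const double *h, double *x, size_t n_taps, double input)
{
    double acc = 0.0;

    x[0] = input;                          /* newest sample x(n) */
    for (size_t k = n_taps - 1; k > 0; --k) {
        acc += h[k] * x[k];                /* accumulate h(k) x(n-k) */
        x[k] = x[k - 1];                   /* age the sample: it is x((n+1)-k) next time */
    }
    acc += h[0] * x[0];                    /* h(0) x(n) */
    return acc;
}
```

Feeding a unit impulse through a zeroed delay line returns h(0), h(1), ..., h(N - 1) on successive calls, which is a quick check that the indexing matches Equation 11.3.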
For real-time FIR filtering, the data and coefficients are stored in memory, conceptually, as shown in Figure 11.17(b). To appreciate how the FIR filter works, consider the simple case of N = 3, with the following difference equation:

$$y(n) = h(0)x(n) + h(1)x(n-1) + h(2)x(n-2) \qquad (11.4)$$

Here x(n) represents the latest input sample, x(n - 1) the previous sample, and x(n - 2) the sample before that. Suppose the three-coefficient digital filter is fed from an ADC. The first thing to do is to allocate two sets of contiguous memory locations (in RAM), one for storing the input data (x(n), x(n - 1), x(n - 2)) and the other for the filter coefficients (h(0), h(1), h(2)), as depicted below:

    Data RAM    Coefficient memory
    0           h(0)
    0           h(1)
    0           h(2)

At initialization, the RAM locations where the data samples are to be stored are set to zero, since we always start with no data. The following operations are then performed.

Figure 11.17 Implementation of FIR filter: (a) filter structure; (b) coefficient and data memory map; (c) alternative memory map.

Program 11.1 A C language pseudo-code for FIR filtering.

    amt=N-1; yn=0; for(k=0; k …

    …                           > 60 dB
    passband edge frequencies   1.575 and 2.175 kHz
    passband ripple             < 0.01 dB
    sampling frequency          7.5 kHz
    number of coefficients      61

A 61-point optimal FIR filter satisfies the above specifications. The design of this filter was discussed in detail in Section 6. Here, we will concentrate only on the implementation.
The coefficients of the filter are quantized to 16 bits (Q15 format) by multiplying each coefficient by 2^15 and then rounding to the nearest integer. The quantized and unquantized coefficients are listed in Table 11.2.

Table 11.2 Filter coefficients for Example 11.3 (filter length = 61).

    Coefficient       Unquantized value    Quantized (Q15)
    H(1)  = H(61)      0.12743640E-02           42
    H(2)  = H(60)      0.26730640E-05            0
    H(3)  = H(59)     -0.23681110E-02          -78
    H(4)  = H(58)     -0.17416350E-05            0
    H(5)  = H(57)      0.43428480E-02          142
    H(6)  = H(56)      0.53579250E-05            0
    H(7)  = H(55)     -0.71570240E-02         -235
    H(8)  = H(54)     -0.49028620E-05            0
    H(9)  = H(53)      0.10897540E-01          357
    H(10) = H(52)      0.89629280E-05            0
    H(11) = H(51)     -0.15605960E-01         -511
    H(12) = H(50)     -0.85508990E-05            0
    H(13) = H(49)      0.21226410E-01          695
    H(14) = H(48)      0.12250150E-04            0
    H(15) = H(47)     -0.27630130E-01         -905
    H(16) = H(46)     -0.11091200E-04            0
    H(17) = H(45)      0.34579770E-01         1133
    H(18) = H(44)      0.13800660E-04            0
    H(19) = H(43)     -0.41774130E-01        -1369
    H(20) = H(42)     -0.11560390E-04            0
    H(21) = H(41)      0.48832790E-01         1600
    H(22) = H(40)      0.12787590E-04            0
    H(23) = H(39)     -0.55359840E-01        -1814
    H(24) = H(38)     -0.90065860E-05            0
    H(25) = H(37)      0.60944450E-01         1997
    H(26) = H(36)      0.88997300E-05            0
    H(27) = H(35)     -0.65232190E-01        -2137
    H(28) = H(34)     -0.38167120E-05            0
    H(29) = H(33)      0.67925720E-01         2226
    H(30) = H(32)      0.27041150E-05            0
    H(31) = H(31)      0.93115220E+00        30512

A TMS320C10 program for the notch filter is given on the disk for the book. The flowchart for real-time implementation using the TMS320C10 is shown in Figure 11.18. The TMS320C10 implementation makes use of the indirect addressing features of the TMS320, and contains a loop controlled by the BANZ instruction, as illustrated in Figure 11.18. We have used indirect addressing to reduce program size and to make the program general purpose.
Only the filter length, N, and the coefficients need to be changed to use the program for a different FIR filter. Program 11A.1 in the appendix, which is a TMS320C10 implementation of the design example in Section 6.11, illustrates how the notch filter program (on the disk) can be used for other filters. However, because of the overhead associated with the loop control, this approach is slower in execution than straight-line code.

Figure 11.18 Flowchart for the TMS320 FIR filter.

At initialization, the coefficients are transferred from program memory to data memory using the TBLR instruction. The processor then waits for the new input sample x(n) from the ADC to become available (the BIO line goes low, which is tested with the BIOZ instruction). When the new sample becomes available, it is read into memory and the output sample is calculated. The two auxiliary registers, AR0 and AR1, are used as pointers to the data and the coefficients; one is also used as a loop counter.

The TMS320C25 provides a single-cycle multiply-accumulate instruction, MACD, which helps to cut down the time needed to execute an FIR filter. With the TMS320C10, most of the time is spent in the BANZ loop. In the TMS320C25, FIR filters with large numbers of coefficients can be efficiently implemented using the instruction pair

    RPTK NM1
    MACD HNM1,XNM1

The instruction RPTK NM1 loads the filter length minus 1 (N - 1) into the repeat instruction counter, and causes the MACD instruction following it to be repeated N times.
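The Q15 coefficient quantization used for Table 11.2 and the multiply-accumulate-with-data-move that RPTK/MACD repeats N times can be imitated in portable C. This is a sketch under assumptions: 16-bit Q15 coefficients and samples, a 64-bit accumulator (wider than the TMS320C25's 32-bit accumulator, so accumulator overflow is not modelled), and rounding plus saturation when the Q30 sum is returned to Q15. The function names `to_q15` and `fir_q15_step` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Quantize a coefficient to Q15: multiply by 2^15 and round to the nearest
   integer (halves away from zero), saturating to the 16-bit range. */
int16_t to_q15(double c)
{
    double v = c * 32768.0;
    if (v >= 32767.0)  return INT16_MAX;
    if (v <= -32768.0) return INT16_MIN;
    return (int16_t)(v >= 0.0 ? (long)(v + 0.5) : -(long)(-v + 0.5));
}

/* One output sample of a Q15 FIR filter. Like a repeated MACD, each pass
   multiplies a coefficient by a sample, adds the product to the accumulator
   and moves the sample one place along the delay line.
   x[k] holds x(n-k); assumes n_taps >= 1 and x[] zeroed initially. */
int16_t fir_q15_step(const int16_t *h, int16_t *x, size_t n_taps, int16_t input)
{
    int64_t acc = 0;                              /* Q30 sum of products */

    x[0] = input;
    for (size_t k = n_taps - 1; k > 0; --k) {
        acc += (int32_t)h[k] * x[k];              /* Q15 x Q15 -> Q30 */
        x[k] = x[k - 1];                          /* data move */
    }
    acc += (int32_t)h[0] * x[0];

    /* Round Q30 back to Q15 (assumes arithmetic right shift of negative
       values, as on common compilers), then saturate to 16 bits. */
    acc = (acc + (1 << 14)) >> 15;
    if (acc > INT16_MAX) acc = INT16_MAX;
    if (acc < INT16_MIN) acc = INT16_MIN;
    return (int16_t)acc;
}
```

With this rounding rule, `to_q15(0.93115220)` gives 30512, the centre coefficient of Table 11.2.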
The MACD combines the instruction pair LTD-MPY into a single instruction, enabling faster execution. The instruction pair RPTK and MACD is a good example of the time-saving special instructions available in DSP chips. The TMS320C25 implementation of the 61-point FIR filter is also on the disk for the book. The C25 code is more compact and faster than that of the C10. In the C25 implementation the coefficients and data sample sequence are stored in the data memory as shown in Figure 11.17(c).

11.4.2 IIR digital filtering

11.4.2.1 The basic building blocks for IIR filters

Second-order IIR filter sections form the basic building blocks for digital IIR filters. The two most widely used second-order structures are the canonic section (Figure 11.19) and the direct form (Figure 11.20).

Figure 11.20 Implementation of the direct form second-order section: (a) realization diagram; (b) data and coefficient storage.

The canonic second-order section is characterized by the following equations:

$$w(n) = sf_1\,x(n) - b_1 w(n-1) - b_2 w(n-2) \qquad (11.5a)$$
$$y(n) = a_0 w(n) + a_1 w(n-1) + a_2 w(n-2) \qquad (11.5b)$$

where x(n) represents the input data, w(n) represents the internal node, y(n) is the filter output sample and $sf_1$ is a scale factor, equal to $1/s_1$. A TMS320C10 implementation of this IIR filter section, in straight-line code, is given in Program 11.3. The memory map showing the storage of the filter coefficients and the internal data sequence, w(n), is depicted in Figure 11.19(b). (For the TMS320, the feedback coefficients, $b_1$ and $b_2$, are in fact stored with their signs reversed.) As in the case of the FIR filter implementation, the multiply-and-add operations implicit in Equations 11.5 are performed with the instruction pair LTD and MPY.
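A floating-point sketch of the canonic section of Equations 11.5 follows. It assumes the feedback coefficients are stored with their natural signs (the TMS320 listings store them sign-reversed) and keeps w(n - 1) and w(n - 2) in a small state structure; the type `canonic_section` and the function `canonic_step` are hypothetical names.

```c
/* Coefficients and delay-node state of one canonic second-order section. */
typedef struct {
    double sf;              /* input scale factor sf1 = 1/s1 */
    double a0, a1, a2;      /* feedforward (numerator) coefficients */
    double b1, b2;          /* feedback (denominator) coefficients, natural signs */
    double w1, w2;          /* w(n-1) and w(n-2); start at zero */
} canonic_section;

/* Process one input sample x(n) and return y(n). */
double canonic_step(canonic_section *s, double x)
{
    double w0 = s->sf * x - s->b1 * s->w1 - s->b2 * s->w2;    /* Equation 11.5a */
    double y  = s->a0 * w0 + s->a1 * s->w1 + s->a2 * s->w2;   /* Equation 11.5b */

    s->w2 = s->w1;          /* shift the delay nodes, as LTD does on the TMS320 */
    s->w1 = w0;
    return y;
}
```

Only two delay-node values per section are stored, which is what makes the canonic form economical in memory; the direct form of Equation 11.6 keeps x(n - 1), x(n - 2), y(n - 1) and y(n - 2) instead.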
The difference equation for the direct form second-order IIR filter section (Figure 11.20(a)) is given by

$$y(n) = a_0 x(n) + a_1 x(n-1) + a_2 x(n-2) - b_1 y(n-1) - b_2 y(n-2) \qquad (11.6)$$

where x(n - k) is the input data sequence and y(n - k) is the output data sequence. The data and coefficient storage for the direct structure is depicted in Figure 11.20(b), and the TMS320C10 code is given in Program 11.4.

Program 11.3 TMS320C10 straight-line code for a canonic section.

    NXTPT  IN    XN,PA0     ;READ NEXT DATA
           LT    XN
           MPY   SF1        ;scale input sample: sf1*x(n)
           PAC
           LT    WNM1
           MPY   B1         ;b1*w(n-1)
           LTA   WNM2       ;sf1*x(n) + b1*w(n-1)
           MPY   B2         ;b2*w(n-2)
           APAC
           SACH  WN         ;save w(n)
           ZAC
           MPY   A2         ;a2*w(n-2)
           LTD   WNM1
           MPY   A1         ;a1*w(n-1)
           LTD   WN
           MPY   A0         ;a0*w(n)
           APAC
           SACH  YN         ;save output
           OUT   YN,PA0
           B     NXTPT

Program 11.4 TMS320 straight-line code for a direct form second-order section.

    NXTPT  IN    XN,ADC
           ZAC
           LT    XNM2       ;load T register with data sample x(n-2)
           MPY   A2         ;a2*x(n-2)
           LTD   XNM1       ;SUM = a2*x(n-2)
           MPY   A1         ;a1*x(n-1)
           LTD   XN         ;SUM = a2*x(n-2) + a1*x(n-1)
           MPY   A0         ;a0*x(n)
           LTA   YNM2       ;SUM = a2*x(n-2) + a1*x(n-1) + a0*x(n)
           MPY   B2         ;b2*y(n-2)
           LTD   YNM1       ;SUM = a2*x(n-2) + a1*x(n-1) + a0*x(n) + b2*y(n-2)
           MPY   B1         ;b1*y(n-1)
           LTA   YN         ;SUM = a2*x(n-2) + a1*x(n-1) + a0*x(n) + b2*y(n-2) + b1*y(n-1)
           SACH  YN,1       ;save the upper 16 bits in data memory location YN
           OUT   YN,DAC
           B     NXTPT

The direct form filter is simpler to program and can lead to a somewhat faster implementation than the canonic section because of the simpler indexing involved: compare, for example, Equations 11.5 and 11.6.

11.4.2.2 Higher-order IIR filters

Higher-order IIR filters are realized as either a cascade or a parallel combination of second-order filter sections (see Chapter 7 for more details).

Cascade realization

The transfer function, H(z), of an Nth-order IIR filter, using second-order sections in cascade, is given by

$$H(z) = \prod_{k=1}^{N/2} \frac{a_{0k} + a_{1k}z^{-1} + a_{2k}z^{-2}}{1 + b_{1k}z^{-1} + b_{2k}z^{-2}} \qquad (11.7)$$

The cascade realization of a fourth-order (N = 4) IIR filter using second-order canonic sections is shown in Figure 11.21(a). The storage of the filter variables (data and coefficients) is shown in Figure 11.21(b). The set of difference equations for the fourth-order IIR filter, using canonic sections, is given by

$$w_1(n) = sf_1\,x(n) - b_{11} w_1(n-1) - b_{21} w_1(n-2) \qquad (11.8a)$$
$$y_1(n) = a_{01} w_1(n) + a_{11} w_1(n-1) + a_{21} w_1(n-2) \qquad (11.8b)$$
$$w_2(n) = y_1(n) - b_{12} w_2(n-1) - b_{22} w_2(n-2) \qquad (11.8c)$$
$$y_2(n) = a_{02} w_2(n) + a_{12} w_2(n-1) + a_{22} w_2(n-2) \qquad (11.8d)$$

A C language pseudo-code for an IIR filter realized as a cascade of second-order canonic sections is given in Program 11.5.

Example 11.4 Design and implement a lowpass IIR digital filter using the TMS320C10-based target board (see Chapter 12) to meet the following specifications:

    sampling frequency      15 kHz
    passband                0-3 kHz
    transition width        450 Hz
    passband ripple         0.5 dB
    stopband attenuation    45 dB

Figure 11.21 Cascade realization of an IIR filter: (a) realization diagram; (b) data and coefficient storage.

Program 11.5 C language pseudo-code for a cascade IIR filter.

    for(n=0; n<(Nsamples-1); ++n){   /* Nsamples: no. of data samples */
        xn=x[n];
        for(k=1; k …
    … /* shift and save delay node data */
    … y[n]=yk+y[n];

Table 11.4 Implementation of fourth-order IIR filter of Example 11.5: filter coefficients before and after quantization to 16 bits.

    Unquantized coefficients    Quantized coefficients
    sfl 0.18100 5931 © 0.249923 79 8190 on -0.1329225 21063 au =0.1805232 32670 bu 0.028994 16251 bas 0.044 5416 -24965 2 0.058534 =4756 ay 0.5084205 20653 by 0.048 4899 27178 ba 0.017951 10061 st; 0.403 32 13216

… realization. The transfer function becomes 58.534 + 0.508.420z7! - 0.048 4899z~! + 0.017951 1z 51 = 5.5244844, 5) = 2.4794 + 0.249 92379

Table 11.4 gives the coefficient values before and after quantization to 16 bits. The TMS320C10 and TMS320C25 codes for the filter are available on the PC disk for this book.
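Equations 11.8 extend to any number of sections: each canonic section feeds the next, and the scale factor is applied once at the input. The C sketch below assumes floating-point arithmetic, natural-sign feedback coefficients and a hypothetical `biquad` structure and `cascade_step` function; it illustrates the cascade structure of Equation 11.7 rather than reproducing Program 11.5.

```c
#include <stddef.h>

/* One second-order canonic section of the cascade. */
typedef struct {
    double a0, a1, a2;   /* a_{0k}, a_{1k}, a_{2k} */
    double b1, b2;       /* b_{1k}, b_{2k} (natural signs) */
    double w1, w2;       /* w_k(n-1), w_k(n-2); initially zero */
} biquad;

/* One sample through a cascade of canonic sections (Equations 11.8).
   sf is the input scale factor applied ahead of the first section. */
double cascade_step(biquad *sec, size_t n_sections, double sf, double x)
{
    double v = sf * x;                                        /* input to section 1 */

    for (size_t k = 0; k < n_sections; ++k) {
        biquad *s = &sec[k];
        double w0 = v - s->b1 * s->w1 - s->b2 * s->w2;        /* w_k(n) */
        v = s->a0 * w0 + s->a1 * s->w1 + s->a2 * s->w2;       /* y_k(n), input to next */
        s->w2 = s->w1;                                        /* update delay nodes */
        s->w1 = w0;
    }
    return v;                                                 /* output of last section */
}
```

Two identical sections with a0 = a1 = 1 and no feedback give the impulse response 1, 2, 1, the coefficients of (1 + z^-1)^2, which confirms that the section transfer functions multiply as in Equation 11.7.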
Extension of the implementation techniques discussed above for both the cascade and parallel structures to higher-order IIR filters is relatively easy. However, more compact code may be obtained by implementing the second-order building block as a subroutine.

11.4.3 FFT processing

The discrete Fourier transform (DFT) of a finite data sequence, x(n), is defined as

$$X(k) = \sum_{n=0}^{N-1} x(n)\,W_N^{nk}, \qquad k = 0, 1, \ldots, N-1 \qquad (11.9)$$

where $W_N^{nk} = e^{-j2\pi nk/N}$, often called the twiddle factor, is a set of complex coefficients. Direct computation of the DFT coefficients, X(k), requires N^2 complex multiplications and is therefore time consuming when N is large. FFT algorithms provide efficient ways of computing X(k) with a significant reduction in computation time; a radix-2 FFT, for example, needs only about (N/2)log2 N complex multiplications. As discussed in Chapter 2, the butterfly and twiddle factor are central to FFT algorithms.

11.4.3.1 Implementation of the butterfly

Figures 11.23(a) and 11.23(b) depict the two types of butterflies used in the radix-2 FFT. FFTs based on these butterflies lead to the same result. For decimation in time (Figure 11.23(a)), the butterfly takes a pair of input data, A and B, and produces a pair of outputs:

$$A' = A + BW^k \qquad (11.10a)$$
$$B' = A - BW^k \qquad (11.10b)$$

In general the input and output data samples, as well as the twiddle factors, are all complex and can be expressed as

$$A = A_r + jA_i \qquad (11.11a)$$
$$B = B_r + jB_i \qquad (11.11b)$$
$$W^k = e^{-j2\pi k/N} = \cos(2\pi k/N) - j\sin(2\pi k/N) \qquad (11.11c)$$

where the suffix r indicates the real part and i the imaginary part of the data. The butterfly operation in Equations 11.10 involves complex arithmetic, but in practice it is often carried out using real arithmetic. To express the operation in a form suitable for real arithmetic, we note that the product of B and W^k in Equations 11.10 has the form

$$BW^k = B_r\cos X + B_i\sin X + j\left[B_i\cos X - B_r\sin X\right] \qquad (11.12)$$

where X = 2πk/N.
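Equations 11.10 to 11.12 are already enough to write the decimation-in-time butterfly using real arithmetic only, and the same sketch includes the bit-reversed reordering that the TMS320C30, DSP56000 and ADSP2100 provide as an addressing mode. Assumptions: data held as separate real and imaginary doubles, cos X and sin X supplied precomputed (for example from a twiddle-factor table), a transform length N = 2^m, and hypothetical type and function names (`cplx`, `butterfly_dit`, `bit_reverse`, `bit_reverse_permute`).

```c
#include <stddef.h>

typedef struct {
    double re, im;
} cplx;

/* In-place radix-2 DIT butterfly: A' = A + B W^k, B' = A - B W^k,
   with W^k = cos X - j sin X and X = 2*pi*k/N (Equations 11.10 and 11.11c). */
void butterfly_dit(cplx *a, cplx *b, double cos_x, double sin_x)
{
    /* Real and imaginary parts of B W^k (Equation 11.12). */
    double tr = b->re * cos_x + b->im * sin_x;
    double ti = b->im * cos_x - b->re * sin_x;

    b->re = a->re - tr;        /* compute B' first: it needs the old A */
    b->im = a->im - ti;
    a->re = a->re + tr;
    a->im = a->im + ti;
}

/* Reverse the lowest m bits of index i (e.g. m = 3: 001 -> 100). */
unsigned bit_reverse(unsigned i, unsigned m)
{
    unsigned r = 0;
    for (unsigned b = 0; b < m; ++b) {
        r = (r << 1) | (i & 1u);   /* move the current LSB of i into r */
        i >>= 1;
    }
    return r;
}

/* Reorder data[0..n-1], n = 2^m, into bit-reversed order in place.
   Each pair is swapped exactly once, when i < j. */
void bit_reverse_permute(double *data, unsigned m)
{
    size_t n = (size_t)1 << m;
    for (size_t i = 0; i < n; ++i) {
        size_t j = bit_reverse((unsigned)i, m);
        if (i < j) {
            double t = data[i];
            data[i] = data[j];
            data[j] = t;
        }
    }
}
```

Each butterfly costs four real multiplications and six real additions or subtractions, and for N = 8 `bit_reverse_permute` produces the order 0, 4, 2, 6, 1, 5, 3, 7.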
Using Equations 11.11 and 11.12 in Equations 11.10a and 11.10b we have

$$A'_r = A_r + B_r\cos X + B_i\sin X$$
$$A'_i = A_i + B_i\cos X - B_r\sin X$$
$$B'_r = A_r - B_r\cos X - B_i\sin X$$
$$B'_i = A_i - B_i\cos X + B_r\sin X$$

[Figure 11.23 The two types of butterflies used in the radix-2 FFT algorithm: (a) butterfly for the decimation-in-time radix-2 FFT, A' = A + W_N^k B, B' = A − W_N^k B; (b) butterfly for the decimation-in-frequency radix-2 FFT, A' = A + B, B' = (A − B)W_N^k.]

Program 11.7 A C language pseudo-code for pre-calculating the twiddle factor values.

    pi = 6.283185307179586/N;          /* 2*pi/N */
    for (k = 0; k < N/2; ++k) {        /* N/2 twiddle factors suffice for a radix-2 FFT */
        c[k] = cos(pi*k);              /* cos X, X = 2*pi*k/N */
        s[k] = sin(pi*k);              /* sin X */
    }

$$y(n) = \sum_{k=0}^{N-1} w_n(k)\,x(n-k) \qquad (11.14)$$

where $w_n(k)$, k = 0, 1, ..., N − 1, are the digital filter coefficients (often called weights) and x(n − k), k = 0, 1, ..., N − 1, is a sequence of the input data. The implementation of the digital filter given in Equation 11.14 is very similar to that of a standard FIR filter discussed earlier. Thus a C language implementation of the filter would have the familiar form

    y[n] = 0;
    for (k = 0; k < N; ++k) y[n] += w[k]*x[n-k];    /* Eq. 11.14 */

Program 11A.1 A TMS320C10-based implementation of an FIR digital bandpass filter.

    ; METAi Assembler 4.00 (c)1988 Crash Barrier, targbpf.asm, Thu Nov 19 00:37:40 1992
            SEGMENT word at 0000 'ram'
    ;
    ; FIR BANDPASS FILTER
    ; Filter specification:
    ;   filter type             bandpass filter
    ;   sampling frequency      15 kHz
    ;   passband                900-1100 Hz
    ;   transition width        450 Hz
    ;   passband ripple         < 0.87 dB
    ;   stopband attenuation    > 30 dB
    ;   filter length           41
    ; Hardware: TMS320C10 target board with 8-bit ADC/DAC.
    ;
    NM1     EQU   40          ;N-1
    XN      EQU   0           ;CURRENT I/P SAMPLE
    XNM1    EQU   NM1
    H0      EQU   NM1+1
    HNM1    EQU   H0+NM1
    YN      EQU   123
    ONE     EQU   124
    PA0     EQU   0           ;address of I/O for D/A IN TARGET BOARD
    PA1     EQU   1
    PA2     EQU   2
    PA3     EQU   3
    COEFF   EQU   2           ;START ADDRESS OF COEFFS.
    R0      EQU   0
    R1      EQU   1
    ;TABLE OF COEFFS. THESE ARE INITIALLY
    ;STORED IN PROGRAM MEMORY.
            DC.W  -503,-2,-165,415,691,910,985
            DC.W  848,473,-105,-792,-1449,-1919
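A rough C model of what a fixed-point FIR routine such as Program 11A.1 computes: 16-bit samples and coefficients, products summed in a wide accumulator, and the sum scaled back to 16 bits. The names are hypothetical, the coefficients are assumed to be Q15 (the scaling used in the listing is not shown in this excerpt), and a 64-bit accumulator is used so the C code itself cannot overflow; the TMS320C10 has a 32-bit accumulator, so its overflow behaviour differs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-point FIR state. */
typedef struct {
    const int16_t *h;  /* h(0)..h(N-1), assumed Q15 */
    int16_t *x;        /* delay line: x(n), x(n-1), ..., x(n-N+1) */
    size_t N;          /* filter length */
} FirQ15;

/* Floor division by 2^15 without right-shifting a negative value. */
static int64_t floor_div_q15(int64_t v)
{
    return v >= 0 ? v / 32768 : -((-v + 32767) / 32768);
}

/* Accept one input sample and return one output sample y(n). */
static int16_t fir_q15_step(FirQ15 *f, int16_t in)
{
    /* Shift the delay line by one sample; the oldest sample drops out. */
    for (size_t k = f->N - 1; k > 0; --k)
        f->x[k] = f->x[k - 1];
    f->x[0] = in;

    /* Multiply-accumulate: each Q15 x Q15 product is a Q30 value. */
    int64_t acc = 0;
    for (size_t k = 0; k < f->N; ++k)
        acc += (int32_t)f->h[k] * f->x[k];

    /* Back to Q15, saturating instead of wrapping around. */
    int64_t y = floor_div_q15(acc);
    if (y > INT16_MAX) y = INT16_MAX;
    if (y < INT16_MIN) y = INT16_MIN;
    return (int16_t)y;
}
```

Shifting the whole delay line every sample keeps the indexing simple; a circular buffer indexed modulo N is a common alternative that avoids the copy loop.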
