
Doubt Clearing Session No Anno

Uploaded by

praedie
Copyright
© © All Rights Reserved
Doubt Clearing Session
Course on Computer Organisation and Architecture
Sanchit Jain * Lesson 19 * June 28, 202

Input / Output management
* In general, when we say "computer" we mean a CPU plus memory (cache, main memory).
* But a computer serves no purpose if it cannot receive data from the outside world or transmit data to it.
* I/O or peripheral devices are the independent devices that serve this purpose.
* So, while designing I/O for a computer, we must know the number of I/O devices and the capacity of each device.

Interface
We cannot connect an I/O device directly to the computer, for the following reasons:
* Speed: - The speeds of the CPU and the I/O device usually differ.
* Format: - The data codes and formats of the CPU and the peripherals may differ, e.g. ASCII, Unicode etc.
* Physical orientation: - Different devices have different physical organizations (optical, magnetic, electromechanical) and different controlling functions.
* Signal conversion: - Peripherals are electromagnetic and electromechanical devices whose manner of operation differs from that of the CPU and memory, which are electronic devices, so signal conversion is required.

The interface connects to three buses:
* Address bus: - Used to identify the correct I/O device among several. The CPU puts the address of a specific I/O device on the address lines; all devices keep monitoring the address bus and decode it, and the device whose address matches activates its control and data lines.
* Control bus: - After selecting a specific I/O device, the CPU sends a function code on the control lines. The selected device (interface) reads that function code and executes it, e.g. I/O command, control command, status command etc.
* Data bus: - In the final step, depending on the operation, either the CPU puts data on the data lines and the device stores it, or the device puts data on the data lines and the CPU stores it.
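The address / control / data sequence above can be sketched in a few lines of Python. This is only an illustrative model, not a description of any real bus: the class `DeviceInterface`, the function `bus_cycle` and the function codes "input" and "output" are hypothetical names chosen for this sketch.

```python
class DeviceInterface:
    """Hypothetical I/O interface that watches a shared address bus."""

    def __init__(self, address):
        self.address = address
        self.data_register = 0

    def respond(self, address, command, data=None):
        # Address bus: every interface decodes the address; only a match reacts.
        if address != self.address:
            return False, None
        # Control bus: the selected interface executes the function code.
        if command == "output":          # data bus: CPU -> device
            self.data_register = data
            return True, None
        if command == "input":           # data bus: device -> CPU
            return True, self.data_register
        raise ValueError(f"unknown function code: {command}")


def bus_cycle(devices, address, command, data=None):
    """Broadcast one bus cycle and return what the selected device puts on the data bus."""
    for device in devices:
        selected, value = device.respond(address, command, data)
        if selected:
            return value
    raise LookupError(f"no device decodes address {address:#x}")


devices = [DeviceInterface(0x10), DeviceInterface(0x20)]
bus_cycle(devices, 0x20, "output", 42)     # CPU writes 42 to the device at 0x20
print(bus_cycle(devices, 0x20, "input"))   # prints 42
```

Only the interface whose address matches reacts; the other interface sees the same bus cycle and ignores it.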
Q Consider a main memory system that consists of 8 memory modules attached to the system bus, which is one word wide. When a write request is made, the bus is occupied for 100 nanoseconds (ns) by the data, address, and control signals. During the same 100 ns, and for 500 ns thereafter, the addressed memory module executes one cycle accepting and storing the data. The (internal) operation of different memory modules may overlap in time, but only one request can be on the bus at any time. The maximum number of stores (of one word each) that can be initiated in 1 millisecond is (GATE-2014) (2 Marks)
(a) 1,000 (b) 10,000 (c) 1,00,000 (d) 100

How a computer (CPU) deals with Memory and I/O devices

Memory Mapped I/O
[Figure: memory and I/O devices sharing one address space, 8085]
Here there are no separate I/O instructions. The CPU can manipulate I/O data residing in the interface registers with the same instructions that are used to manipulate memory words, so the computer can use memory-type instructions for I/O data.
* E.g. 8085
* Advantage: - A typical computer has more memory-reference instructions than I/O instructions, and in memory-mapped I/O all instructions that refer to memory are also available for I/O.
* Disadvantage: - The total address space gets divided: some range is occupied by I/O and the rest by memory.
* The Commodore 64, also known as the C64 or the CBM 64, is an 8-bit home computer introduced in January 1982 by Commodore International (first shown at the Consumer Electronics Show in Las Vegas, January 7-10, 1982).
* It has been listed in the Guinness World Records as the highest-selling single computer model of all time, with independent estimates placing the number sold between 10 and 17 million units.

Isolated I/O
Here a common bus transfers data between memory or I/O and the CPU. The distinction between a memory transfer and an I/O transfer is made through separate read and write lines. The I/O read and I/O write control lines are enabled during an I/O transfer.
The memory read and memory write control lines are enabled during a memory transfer. E.g. 8086
* Advantage: - Memory is used efficiently, as the same address can be used twice (once for memory, once for I/O).
* Disadvantage: - Different control lines are needed, one set for memory and another for I/O devices.

I/O Processor
The computer has independent sets of data, address and control buses, one for accessing memory and the other for I/O. This is done in computers that provide a separate I/O processor (IOP) in addition to the CPU. Memory communicates with both the CPU and the IOP through a memory bus. The IOP also communicates with the input and output devices through a separate I/O bus with its own address, data and control lines. The purpose of the IOP is to provide an independent pathway for the transfer of information between external devices and internal memory.
[Figure: CPU, IOP, memory bus and I/O bus]

Synchronous vs Asynchronous data transfer
Synchronization is achieved by a device called the master clock generator, which generates a periodic train of clock pulses. The internal operations in a digital system are synchronized by means of clock pulses supplied by a common pulse generator.
When communication happens between devices that are under the same control unit or the same clock, it is called synchronous communication, e.g. communication between the CPU and its registers.
[Figure: synchronous and asynchronous transfer]

Asynchronous communication
When the timing units of two devices are independent, that is, they are under different control, the communication is called asynchronous. Asynchronous data transfer between two independent units requires that control signals be transmitted between the communicating units to indicate the time at which data is being transmitted.
* One way of achieving this is by means of a strobe pulse supplied by one of the units to indicate to the other unit when the transfer has to occur.
* The unit receiving the data item responds with another control signal to acknowledge the receipt of the data.
This type of agreement between two independent units is referred to as handshaking.
* The strobe pulse method and the handshaking method of asynchronous data transfer are not restricted to input/output transfers. In fact, they are used extensively on numerous occasions requiring the transfer of data between two independent units.

[Figure: Source-initiated I/O. (a) Block diagram: data bus. (b) Timing diagram: data, valid data]
[Figure: Destination-initiated I/O. (a) Block diagram: data bus. (b) Timing diagram: data, valid data]
[Figure: Source-initiated transfer using handshaking, source unit and destination unit. (a) Block diagram: data bus, data valid, data accepted. (b) Timing diagram. Sequence of events, ending with "disable data accepted" (initial state)]
[Figure: Transfer using handshaking with data valid and ready for data, source unit and destination unit. (a) Block diagram: data bus, data valid, ready for data. (b) Timing diagram]

Modes of Data Transfer
Here we deal with the problem of how data communication takes place between the CPU and an I/O device. There are three popular methods of data transfer: -
* Programmed I/O
* Interrupt-initiated I/O
* Direct memory access (DMA)

Programmed I/O
* In this mode the I/O device cannot access the memory directly.
To complete a data transfer, a number of instructions are executed: input instructions transfer data from the device to the CPU, and store instructions transfer it from the CPU to memory.
[Figure: CPU, interface (data register, status register) and I/O device. F = flag bit]
* Step 1: - The I/O device puts data on the data bus and enables the data valid signal.
* Step 2: - The interface continuously senses the data valid signal; when it receives the signal, it copies the data from the data bus into its data register, sets its flag bit to 1 and enables the data accepted line.
* Step 3: - In programmed mode the CPU continuously monitors the status register; as soon as it sees the flag bit as 1, it copies the data from the data register over the data bus and clears the flag bit to 0.
* Step 4: - The interface now disables the data accepted line to tell the I/O device "I am ready for a new transfer".
Conclusion: - The CPU works in programmed mode, i.e. busy-wait mode, so a number of clock cycles are wasted. It is difficult to handle multiple I/O devices at the same time, and the method is not appropriate for high-speed I/O devices.
[Figure: I/O device, data valid and data accepted lines, I/O bus]
* The best-known example of a PC device that uses programmed I/O is the ATA interface. Parallel ATA (PATA), originally AT Attachment, is an interface standard for the connection of storage devices such as hard disk drives, floppy disk drives, and optical disc drives in computers.

Interrupt-initiated I/O
In this method the I/O device interrupts the CPU when it is ready for a data transfer. The CPU keeps executing instructions; after finishing one instruction and before starting the next, it checks whether there is an interrupt, and if there is one it decides whether to service this interrupt or continue with execution.
Note: an instruction is atomic in nature; there is no such thing as partial execution of an instruction.
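The "check only between instructions" rule can be illustrated with a small Python sketch. It is a toy model under stated assumptions: the function `run`, the `arrivals` mapping (which instruction was executing when a device raised its interrupt) and the ISR names are all hypothetical, and every interrupt is assumed to be accepted.

```python
def run(program, arrivals, isr_table):
    """Return the order in which instructions and ISRs execute.

    program   -- list of instruction names
    arrivals  -- dict: instruction index -> device that interrupts during it
    isr_table -- dict: device -> name of its interrupt service routine
    """
    trace = []
    for index, instruction in enumerate(program):
        # The current instruction always runs to completion first.
        trace.append(instruction)
        # Only now, at the instruction boundary, is the interrupt line checked.
        device = arrivals.get(index)
        if device is not None:
            trace.append(isr_table[device])
    return trace


order = run(["I0", "I1", "I2"], {1: "disk"}, {"disk": "ISR_disk"})
print(order)  # ['I0', 'I1', 'ISR_disk', 'I2']
```

Although the interrupt arrives while I1 is executing, the ISR runs only after I1 has finished and before I2 starts.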
The method of handling interrupts differs from one I/O device to another, therefore every device has a program or routine called the ISR (interrupt service routine), which tells the CPU how the interrupt should be handled (and it saves CPU time).
* Interrupts can be of two types: -
* Non-vectored interrupt: - Here there is a mutual understanding between the CPU and the device about where the routine is stored in memory (high-priority device).
* Vectored interrupt: - Here the interrupting device also tells the CPU the address at which its routine is stored in memory.
* Different I/O devices may interrupt at the same time, so the CPU must have a priority scheme that decides which interrupt is serviced first and which later.
* H/w solution: - It can be serial or parallel. The serial solution is known as daisy chaining, a h/w scheme used to decide priority among different I/O devices (VAD: vector address device).
[Figure: daisy chain with interrupt request and interrupt acknowledge lines]
* One or more of the devices may send an interrupt on a common line.
* When the CPU completes an instruction, checks the interrupt line and finds an interrupt, it sets the interrupt acknowledge line to 1. The I/O device placed first in the chain always gets the acknowledgement first. If it wants to perform I/O, it sets its priority-out line to 0 and puts the address of its interrupt service routine (ISR) on the vector address lines. If the device does not want to perform I/O, it sets its priority output to 1 and gives the chance to the second device, and the process continues.
* Advantage: - Very simple, easy to use, easy to understand, relatively fast.
* Disadvantage: - The priority is fixed; even when required, we cannot change it.
[Figure: interrupt request]
* Intel 82C59A

Q The following are some events that occur after a device controller issues an interrupt while process L is under execution.
(GATE-2018) (2 Marks)
(P) The processor pushes the process status of L onto the control stack.
(Q) The processor finishes the execution of the current instruction.
(R) The processor executes the interrupt service routine.
(S) The processor pops the process status of L from the control stack.
(T) The processor loads the new PC value based on the interrupt.
(A) … (B) PTRSQ (C) TRPQS (D) QTPRS

Q A CPU handles an interrupt by executing the interrupt service subroutine (NET-DEC-2015)
(a) by checking the interrupt register after execution of each instruction
(b) by checking the interrupt register at the end of the fetch cycle
(c) whenever an interrupt is registered
(d) by checking the interrupt register at regular time intervals

Q A CPU generally handles an interrupt by executing an interrupt service routine: (GATE-2009) (1 Mark)
a) As soon as an interrupt is raised.
b) By checking the interrupt register at the end of the fetch cycle.
c) By checking the interrupt register after finishing the execution of the current instruction.
d) By checking the interrupt register at fixed time intervals.

Q For the daisy chain scheme of connecting I/O devices, which of the following statements is true? (GATE-1996) (1 Mark)
a) It gives non-uniform priority to various devices
b) It gives uniform priority to all devices
c) It is only useful for connecting slow devices to a processor device
d) It requires a separate interrupt pin on the processor for each device

Q A device with a data transfer rate of 10 KB/sec is connected to a CPU. Data is transferred byte-wise. Let the interrupt overhead be 4 microsec. The byte transfer time between the device interface register and CPU or memory is negligible. What is the minimum performance gain of operating the device under interrupt mode over operating it under program-controlled mode?
(GATE-2005)
(A) … (B) 25 (C) 35 (D) …

Direct Memory Access (DMA)
When we perform an I/O operation, the actual source or destination is either the I/O device or memory, but the CPU is placed in between just to manage and control the transfer. DMA is the idea of using a new device, called the DMA controller, which the CPU allows to take control of the system buses and perform the data transfer directly, either from device to memory or from memory to device.

Sequence of DMA transfer: -
* Step 1: - The I/O device sends a DMA request to the DMA controller to perform an I/O operation.
* Step 2: - The DMA controller sets the interrupt and bus request lines to 1.
* Step 3: - The CPU, using the address bus, selects the device and its registers and initializes the address register (memory location) and the counter (number of words).
* Step 4: - The CPU enables the bus grant line to tell the DMA controller "you are now the master of the system buses".
* Step 5: - The DMA controller puts 1 on the DMA acknowledge line and, using the read/write control lines and the address lines, performs I/O directly between memory and the device.

Mode of transfer: -
* Burst mode: - The entire I/O transfer is completed and only then does control come back to the CPU, e.g. with a high-speed device like a magnetic disk.
* Cycle stealing mode: - When the CPU executes an instruction, there can normally be the following phases:
  * IF - Instruction fetch
  * ID - Instruction decode
  * OF - Operand fetch
  * IX - Instruction execute
  * WB - Write back or store result
* Normally the CPU does not require the system buses in the ID and IX phases; if control is given to the DMA controller only during that time, it is called cycle stealing.
* Many hardware systems use DMA, including disk drive controllers, graphics cards, network cards and sound cards.
* In the original IBM PC (and the follow-up PC/XT), there was only one Intel 8237 DMA controller, capable of providing four DMA channels (numbered 0-3).

[Figure caption] Motherboard of a NeXTcube computer (1990).
The two large integrated circuits below the middle of the image are the DMA controller (l.) and, unusually, an extra dedicated DMA controller (r.) for the magneto-optical disc used instead of a hard disk drive in the first series of this computer model.

Q The size of the data count register of a DMA controller is 16 bits. The processor needs to transfer a file of … kilobytes from disk to main memory. The memory is byte addressable. The number of times the DMA controller needs to get the control of the system bus from the processor to transfer the file from the disk to main memory is ____ (GATE-2016) (2 Marks)

Q A hard disk with a transfer rate of 10 Mbytes/second is constantly transferring data to memory using DMA. The processor runs at 600 MHz, and takes 300 and 900 clock cycles to initiate and complete DMA transfer respectively. If the size of the transfer is 20 Kbytes, what is the percentage of processor time consumed for the transfer operation? (GATE-2004) (2 Marks)
(A) 5.0% (B) 1.0% (C) 0.5% (D) 0.1%

Q A DMA controller transfers 32-bit words to memory using cycle stealing. The words are assembled from a device that transmits characters at a rate of 4800 characters per second. The CPU is fetching and executing instructions at an average rate of one million instructions per second. By how much will the CPU be slowed down because of the DMA transfer? (NET-DEC-2015)
(a) 0.6% (b) 0.12% (c) 1.2% (d) 2.5%

Q Consider a 32-bit microprocessor, with a 16-bit external data bus, driven by an 8 MHz input clock. Assume that this microprocessor has a bus cycle whose minimum duration equals four input clock cycles. What is the maximum data transfer rate for this microprocessor? (NET-JUNE-2015)
a) 8 x 10^6 bytes/sec b) 4 x 10^6 bytes/sec c) 16 x 10^6 bytes/sec d) 4 x 10^9 bytes/sec

Q On a non-pipelined sequential processor, a program segment, which is a part of the interrupt service routine, is given to transfer 500 bytes from an I/O device to memory.
      Initialize the address register
      Initialize the count to 500
LOOP: Load a byte from device
      Store in memory at address given by address register
      Increment the address register
      Decrement the count
      If count != 0 goto LOOP
Assume that each statement in this program is equivalent to a machine instruction which takes one clock cycle to execute if it is a non-load/store instruction. The load-store instructions take two clock cycles to execute. The designer of the system also has an alternate approach of using the DMA controller to implement the same transfer. The DMA controller requires 20 clock cycles for initialization and other overheads. Each DMA transfer cycle takes two clock cycles to transfer one byte of data from the device to the memory. What is the approximate speedup when the DMA controller-based design is used in place of the interrupt-driven program-based input-output? (GATE-2011) (2 Marks)
(A) … (B) … (C) … (D) 6.7

Q Consider a computer system with DMA support. The DMA module is transferring one 8-bit character in one CPU cycle from a device to memory through cycle stealing at regular intervals. Consider a 2 MHz processor. If 0.5% processor cycles are used for DMA, the data transfer rate of the device is ____ bits per second. (GATE-2021)

Disk access time
Total Transfer Time = Seek Time + Rotational Latency + Transfer Time
* Seek Time: - The time taken by the read/write head to reach the correct track. (Always given in the question.)
* Rotational Latency: - The time the read/write head spends waiting for the correct sector. In general it is a random value, so for average-case analysis we take the time the disk needs to complete half a rotation.
* Transfer Time: - The time taken by the read/write head to read or write on the disk. In general, we assume that in 1 complete rotation the head can read/write an entire track.
* Transfer time = (File Size / Track Size) * (time taken to complete one revolution).
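As a quick check of the formula above, here is a minimal Python sketch. The function names `rotation_time_ms` and `disk_access_time_ms` are made up for this illustration; the sketch assumes the average rotational latency is half a rotation and that one full rotation reads one full track, exactly as stated above.

```python
def rotation_time_ms(rpm):
    """Time for one full revolution, in milliseconds."""
    return 60_000 / rpm


def disk_access_time_ms(seek_ms, rotation_ms, bytes_to_read, track_bytes):
    """Seek time + average rotational latency + transfer time, in milliseconds."""
    rotational_latency = rotation_ms / 2                      # half a rotation on average
    transfer_time = (bytes_to_read / track_bytes) * rotation_ms
    return seek_ms + rotational_latency + transfer_time


# 16384 bytes per track, 16 ms per rotation, 40 ms average seek, 1024-byte block
print(disk_access_time_ms(40, 16, 1024, 16384))   # 49.0
print(rotation_time_ms(6000))                     # 10.0
```

The first call uses the numbers of the NET-DEC-2015 disk question in this session: 40 ms seek + 8 ms latency + 1 ms transfer.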
(GATE-2007)
(A) … (B) 256 Mbyte, 28 bits (C) 512 Mbyte, 20 bits (D) 64 Gbyte, 28 bits

Q … concentric circular tracks, the seek latency is not linearly proportional to the seek distance due to (GATE-2008) (2 Marks)
(A) non-uniform distribution of requests
(B) … stopping inertia
(C) … capacity of tracks on the periphery of the platter
(D) use of unfair arm scheduling policies

Q If the disk is rotating at 360 rpm, determine the effective data transfer rate, which is defined as the number of bytes transferred per second between disk and memory. (track size = 512 bytes)

Q … bytes/sec. If the average seek time of the disk is twice the average rotational delay and the controller's transfer time is 10 times the disk transfer time, the average time (in milliseconds) to read or write a 512-byte sector of the disk is ____ (GATE-2015) (2 Marks)

Q Consider a disk pack with a seek time of 4 milliseconds and rotational speed of 10000 rotations per minute (RPM). It has 600 sectors per track and each sector can store 512 bytes of data. Consider a file stored in the disk. The file contains 2000 sectors. Assume that every sector access necessitates a seek, and the average rotational latency for accessing each sector is half of the time for one complete rotation. The total time (in milliseconds) needed to read the entire file is ____ (GATE-2015) (1 Mark)

Q A certain moving arm disk storage, with one head, has the following specifications: … (GATE-1993) (2 Marks)

Q A hard disk system has the following parameters: (GATE-2007) (2 Marks)
Number of tracks = 500
Number of sectors / track = 100
Number of bytes / sector = 500
Time taken by the head to move from one track to adjacent track = 1 ms
Rotation speed = 600 rpm
What is the average time taken for transferring 250 bytes from the disk?
(A) 300.5 ms (B) 255.5 ms (C) 255.0 ms (D) 300.0 ms

Q A hard disk has 63 sectors per track, 10 platters each with 2 recording surfaces and 1000 cylinders. The address of a sector is given as a triple (c, h, s), where c is the cylinder number, h is the surface number and s is the sector number.
Thus, the 0th sector is addressed as (0, 0, 0), the 1st sector as (0, 0, 1), and so on. The address <400, 16, 29> corresponds to sector number: (GATE-2009) (2 Marks)
(A) 505035 (B) 505036 (C) 505037 (D) 505038

Q Consider a hard disk with 16 recording surfaces (0-15) having 16384 cylinders (0-16383) and each cylinder contains 64 sectors (0-63). Data storage capacity in each sector is 512 bytes. Data are organized cylinder-wise and the addressing format is <cylinder no., surface no., sector no.>. A file of size 42797 KB is stored in the disk and the starting disk location of the file is <1200, 9, 40>. What is the cylinder number of the last sector of the file, if it is stored in a contiguous manner? (GATE-2013) (1 Mark)
(A) 1281 (B) 1282 (C) 1283 (D) 1284

Q Consider a disk with 16384 bytes per track having a rotation time of 16 msec and average seek time of 40 msec. What is the time in msec to read a block of 1024 bytes from this disk? (NET-DEC-2015)
a) 57 msec b) 49 msec c) 48 msec d) 17 msec

Q An application loads 100 libraries at start-up. Loading each library requires exactly one disk access. The seek time of the disk to a random location is given as 10 milliseconds. Rotational speed of the disk is 6000 rpm. If all 100 libraries are loaded from random locations on the disk, how long does it take to load all libraries? (The time to transfer data from the disk block once the head has been positioned at the start of the block may be neglected) (GATE-2011) (2 Marks)
(A) 0.50 s (B) 1.50 s (C) 1.25 s (D) 1.00 s

* Single interleaving: - in 2 rotations we read 1 track
* Double interleaving: - in 2.75 rotations we read 1 track
[Figure: sector layout with no interleaving, single interleaving and double interleaving]

Pipeline
If the system has only one processor, then at most one instruction can be executed at a time, and if we really want to execute multiple instructions together or concurrently, we must have multiple processors. Pipelining is a method by which we are able to run more than one instruction at the same time on a single processor.
* When we actually execute an instruction, it is executed in a number of phases, like
  * Instruction fetch
  * Instruction decode
  * Operand fetch
  * Instruction execute
  * Instruction store
* Here the next phase starts only when the previous phase is completed.
[Fig. 3-2: IBM 801 architecture. Stages: instruction fetch, decode, execute, write; clock]
* The idea is to make a special processor (a pipelined processor) in which every phase has its own circuit and buffers are placed between the stages. Then we can start executing the next instruction before all the phases of the currently executing one are complete. This idea is called pipelining.
* Note: the hardware architectures of non-pipelined and pipelined processors are different.
[Figure: non-pipelined vs pipelined execution. Stages: instruction fetch, decode, execute, write; clock]

Q Consider a system where the clock is triggering at a speed of 1 MHz (1 clock = 1 µs). In a pipelined processor there are 4 stages and each stage takes only 1 clock. If a program has 10 instructions, how much time will it take?
[Timing chart: clock cycles 1 to 16]
1. On a non-pipelined processor
2.
On a pipelined processor
[Timing chart: stages IF, ID, EX, WB against clock cycles 1 to 16]

* If all instructions are identical (the time taken for a specific phase is the same for all instructions):
  Time without pipeline (T_np) = (sum of clocks for each phase of one instruction) * (no. of instructions) * (time of one clock)
* If each phase requires the same number of clocks, usually one (we set the frequency in such a way):
  T_np = (no. of phases * no. of instructions) * (time of one clock)
[Timing chart: non-pipelined execution, clock cycles 1 to 40]
* Time with pipeline (T_p) = ((no. of phases) + (no. of instructions - 1)) * (time of one clock)
[Timing chart: pipelined execution, stages IF, ID, EX, WB]

* Speed up = (Time without pipeline (T_np)) / (Time with pipeline (T_p))
* Max speed up = no. of stages (in this case 4)
* Efficiency = (speed up / max speed up) * 100

For the 4-stage example above:
* N = 10: speed up = 40/13 ≈ 3.08, efficiency ≈ 76.9%
* N = 100: speed up = 400/103 ≈ 3.88, efficiency ≈ 97.1%
* N = 10,000: speed up = 40000/10003 ≈ 3.9988, efficiency ≈ 99.97%

Q Consider a system where we have 'm' stages and the program contains 'n' instructions such that m <= … d) None

Q Consider the following processors (ns stands for nanoseconds).
Assume that the pipeline registers have zero latency. (GATE-2014) (2 Marks)
P1: Four-stage pipeline with stage latencies 1 ns, 2 ns, 2 ns, 1 ns.
P2: Four-stage pipeline with stage latencies 1 ns, 1.5 ns, 1.5 ns, 1.5 ns.
P3: Five-stage pipeline with stage latencies 0.5 ns, 1 ns, 1 ns, 0.6 ns, 1 ns.
P4: Five-stage pipeline with stage latencies 0.5 ns, 0.5 ns, 1 ns, 1 ns, 1.1 ns.
Which processor has the highest peak clock frequency?
(A) P1 (B) P2 (C) P3 (D) P4

Q … (A) … (B) 160.5 microseconds (C) 165.5 microseconds (D) 590.0 microseconds

Q Consider an instruction pipeline with four stages (S1, S2, S3 and S4), each with combinational circuit only. The pipeline registers are required between each stage and at the end of the last stage. Delays for the stages and for the pipeline registers are as given in the figure:
[Figure: stage delays and pipeline register delays]
What is the approximate speed up of the pipeline in steady state under ideal conditions when compared to the corresponding non-pipeline implementation? (GATE-2011) (2 Marks)
(A) 4.0 (B) 2.5 (C) 1.1 (D) 3.0

Q A non-pipelined single cycle processor operating at 100 MHz is converted into a synchronous pipelined processor with five stages requiring … nsec, 1.5 nsec, 2 nsec, 1.5 nsec and 2.5 nsec, respectively. The delay of the latches is … . The speedup of the pipeline processor for a large number of instructions is (2 Marks)
(A) 4.5 (B) 4.0 (C) 3.33 (D) 3.0

Q … a synchronous pipeline processor. D1 has 5 pipeline stages … 4 nsec, 2 nsec and 3 nsec while the design D2 has 8 pipeline stages each with 2 nsec execution time. How much time can be saved using design D2 over design D1 for executing 100 instructions? (GATE-2005) (2 Marks)
(A) 214 nsec (B) 202 nsec (C) 86 nsec (D) 200 nsec

Q Instruction execution in a processor is divided into 5 stages: Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Execute (EX), and Write Back (WB). These stages take 5, 4, 20, 10 and 3 nanoseconds (ns) respectively.
A pipelined implementation of the processor requires buffering between each pair of consecutive stages with a delay of 2 ns. Two pipelined implementations of the processor are contemplated: a naive pipeline implementation (NP) with 5 stages and an efficient pipeline (EP) where the OF stage is divided into stages OF1 and OF2 with execution times of 12 ns and 8 ns respectively. The speedup (correct to two decimal places) achieved by EP over NP in executing 20 independent instructions with no hazards is ____ (GATE-2017) (2 Marks)

Q The stage delays in a 4-stage pipeline are 800, 500, 400 and 300 picoseconds. The first stage (with delay 800 picoseconds) is replaced with a functionally equivalent design involving two stages with respective delays 600 and 350 picoseconds. The throughput increase of the pipeline is ____ percent. (GATE-2016) (2 Marks)

Q Consider a non-pipelined processor with a clock rate of 2.5 gigahertz and average cycles per instruction of four. The same processor is upgraded to a pipelined processor with five stages; but due to the internal pipeline delay, the clock speed is reduced to 2 gigahertz. Assume that there are no stalls in the pipeline. The speed up achieved in this pipelined processor is ____ (GATE-2015) (2 Marks)

Q Consider a 3 GHz (gigahertz) processor with a three-stage pipeline and stage latencies t1, t2 and t3 such that t1 = … = 2 t3. If the longest pipeline stage is split into two pipeline stages of equal latency, the new frequency is ____ GHz, ignoring delays in the pipeline registers. (GATE-2016) (2 Marks)

Data dependence
Let I(S) be the set of locations read by statement S and O(S) the set of locations written by S, with S1 executing before S2 (S1 → S2).
* Flow dependence: O(S1) ∩ I(S2) is non-empty, S1 → S2, and S1 writes something that S2 later reads.
* Anti-dependence: I(S1) ∩ O(S2) is non-empty, S1 → S2, and S1 reads something before S2 overwrites it.
* Output dependence: O(S1) ∩ O(S2) is non-empty, S1 → S2, and both write the same memory location.
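The three dependence types listed above amount to three set intersections, which a short Python sketch can make concrete. The `Instr` record and the `dependences` function are hypothetical helpers written for this illustration, and the register names in the usage lines are arbitrary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    writes: frozenset   # O(S): locations written
    reads: frozenset    # I(S): locations read


def dependences(s1, s2):
    """Classify the dependences of s2 on s1, assuming s1 executes before s2."""
    found = []
    if s1.writes & s2.reads:
        found.append("flow")     # s2 reads what s1 wrote (read after write)
    if s1.reads & s2.writes:
        found.append("anti")     # s2 overwrites what s1 read (write after read)
    if s1.writes & s2.writes:
        found.append("output")   # both write the same location (write after write)
    return found


add = Instr("ADD R1, R2, R3", frozenset({"R1"}), frozenset({"R2", "R3"}))
sub = Instr("SUB R4, R1, R2", frozenset({"R4"}), frozenset({"R1", "R2"}))
mov = Instr("MOV R2, R5", frozenset({"R2"}), frozenset({"R5"}))

print(dependences(add, sub))   # ['flow']  : SUB reads R1, which ADD writes
print(dependences(add, mov))   # ['anti']  : MOV overwrites R2, which ADD reads
print(dependences(add, add))   # ['output']: the same destination written twice
```

A flow dependence is the case that forces a pipeline to stall or forward; anti and output dependences matter only when instructions can complete out of order.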
Example: Let there be two instructions I1 and I2 such that
I1: ADD R1, R2, R3
I2: SUB R4, R1, R2
* When the above instructions are executed in a pipelined processor, a data dependency condition occurs: I2 tries to read the data before I1 writes it, therefore I2 incorrectly gets the old value.
* To minimize data dependency stalls in the pipeline, operand forwarding is used.

Solution of data dependency
* We can use code movement (code relocation) and execute the dependent instruction some time later.
* We can use operand forwarding, with which the result is accessed directly after execution instead of waiting for it to be stored.

Q The instruction pipeline of a RISC processor has the following stages: Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Writeback (WB). The IF, ID, OF and WB stages take 1 clock cycle each for every instruction. Consider a sequence of 100 instructions. In the PO stage, 40 instructions take 3 clock cycles each, 35 instructions take 2 clock cycles each, and the remaining 25 instructions take 1 clock cycle each. Assume that there are no data hazards and no control hazards. The number of clock cycles required for completion of execution of the sequence of instructions is ____ (GATE-2018) (2 Marks)

Q A 5-stage pipelined processor has Instruction Fetch (IF), Instruction Decode (ID), Operand Fetch (OF), Perform Operation (PO) and Write Operand (WO) stages. The IF, ID, OF and WO stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD and SUB instructions, 3 clock cycles for MUL instruction, and 6 clock cycles for DIV instruction respectively. Operand forwarding is used in the pipeline. What is the number of clock cycles needed to execute the following sequence of instructions?
(GATE-2010) (2 Marks)
Instruction            Meaning of instruction
I0: MUL R2, R0, R1     R2 <- R0 * R1
I1: DIV R5, R3, R4     R5 <- R3 / R4
I2: ADD R2, R5, R2     R2 <- R5 + R2
I3: SUB R5, R2, R6     R5 <- R2 - R6
(A) 13 (B) 15 (C) 17 (D) 19
[Timing chart: clock cycles 1 onwards]

Q Consider the sequence of machine instructions given below: (GATE-2015) (2 Marks)
MUL R5, R0, R2
DIV R6, R2, R3
ADD R7, R5, R6
SUB R8, R7, R4
In the above sequence, R0 to R8 are general purpose registers. In the instructions shown, the first register stores the result of the operation performed on the second and the third registers. This sequence of instructions is to be executed in a pipelined instruction processor with the following 4 stages: (1) Instruction Fetch and Decode (IF), (2) Operand Fetch (OF), (3) Perform Operation (PO) and (4) Write back the Result (WB). The IF, OF and WB stages take 1 clock cycle each for any instruction. The PO stage takes 1 clock cycle for ADD or SUB instruction, 3 clock cycles for MUL instruction and 5 clock cycles for DIV instruction. The pipelined processor uses operand forwarding from the PO stage to the OF stage. The number of clock cycles taken for the execution of the above sequence of instructions is ____
[Timing chart: clock cycles 1 onwards]

Q Consider a pipelined processor with the following four stages: (GATE-2007) (2 Marks)
IF: Instruction Fetch
ID: Instruction Decode and Operand Fetch
EX: Execute
WB: Write Back
The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What is the number of clock cycles taken to complete the following sequence of instructions?
ADD R2, R1, R0     R2 <- R0 + R1
MUL R4, R3, R2     R4 <- R3 * R2
SUB R6, R5, R4     R6 <- R5 - R4
(A) 7 (B) 8 (C) 10 (D) 14

Q Consider a pipelined processor with 5 stages: Instruction Fetch (IF), Instruction Decode (ID), Execute (EX),
Memory Access (MEM), and Write Back (WB). (GATE-2021) Each stage of the pipeline, except the EX stage, takes one cycle. Assume that the ID stage merely decodes the instruction and the register read is performed in the EX stage. The EX stage takes one cycle for ADD instruction and two cycles for MUL instruction. Ignore pipeline register latencies. Consider the following sequence of 8 instructions:
ADD, MUL, ADD, MUL, ADD, MUL, ADD, MUL
Assume that every MUL instruction is data-dependent on the ADD instruction just before it and every ADD instruction (except the first ADD) is data-dependent on the MUL instruction just before it. The Speedup is defined as follows:
Speedup = (Execution time without operand forwarding) / (Execution time with operand forwarding)
The Speedup achieved in executing the given instruction sequence on the pipelined processor (rounded to 2 decimal places) is ____
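As a recap of the ideal pipeline timing formulas from the pipelining part of this session (every stage takes one clock, no stalls), the following Python sketch computes the times, the speed up and the efficiency. The function names are chosen only for this illustration.

```python
def time_without_pipeline(stages, instructions, clock):
    """T_np = (no. of phases * no. of instructions) * time of one clock."""
    return stages * instructions * clock


def time_with_pipeline(stages, instructions, clock):
    """T_p = (no. of phases + no. of instructions - 1) * time of one clock."""
    return (stages + instructions - 1) * clock


def speedup(stages, instructions):
    """T_np / T_p; the clock period cancels out."""
    return (stages * instructions) / (stages + instructions - 1)


def efficiency_percent(stages, instructions):
    """Speed up as a percentage of the maximum speed up (= no. of stages)."""
    return 100 * speedup(stages, instructions) / stages


# 4 stages, 10 instructions, 1 MHz clock (1 clock = 1 microsecond)
print(time_without_pipeline(4, 10, 1))   # 40 microseconds
print(time_with_pipeline(4, 10, 1))      # 13 microseconds

for n in (10, 100, 10_000):
    print(n, round(speedup(4, n), 4), round(efficiency_percent(4, n), 2))
```

As the number of instructions grows, the speed up approaches the number of stages and the efficiency approaches 100%; these formulas do not cover unequal stage delays, buffer delays or hazards, which the GATE questions above add on top.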
