
Computer Organization and Architecture - Unit No: 1

FUNCTIONAL UNITS OF A COMPUTER SYSTEM

For its operation, the computer system is divided into four separate units: 1) the arithmetic logical unit, 2) the control unit, 3) the central processing unit, and 4) the input/output unit.

(a) Arithmetic Logical Unit (ALU)
After you enter data through the input device, it is stored in the primary storage unit. The arithmetic logical unit performs the actual processing of data and instructions. The major operations performed by the ALU are addition, subtraction, multiplication, division, logic and comparison. Data is transferred to the ALU from the storage unit when required, and after processing the result is returned to the storage unit for further processing or for storage.

(b) Control Unit
The next component of the computer is the control unit, which acts like a supervisor, seeing that things are done in the proper fashion. The control unit determines the sequence in which computer programs and instructions are executed: processing of programs stored in main memory, interpretation of the instructions, and issuing of signals for the other units of the computer to execute them. It also acts as a switchboard operator when several users access the computer simultaneously, coordinating the activities of the computer's peripheral equipment as they perform input and output. It is therefore the manager of all the operations mentioned above.

(c) Central Processing Unit (CPU)
The ALU and the control unit of a computer system are jointly known as the central processing unit. You may call the CPU the brain of any computer system: like a human brain, it takes all major decisions, performs all sorts of calculations, and directs the different parts of the computer by activating and controlling their operations.

(d) Input/Output Unit
A computer must receive both data and program statements to function properly and be able to solve problems. Feeding data and programs to the computer is accomplished by an input device. Computer input devices read data from a source, such as magnetic disks, and translate that data into electronic impulses for transfer into the CPU. Some typical input devices are a keyboard, a mouse, or a scanner. The output unit sends processed results to the outside world. Examples: display screens, printers, plotters, modems, microfilm, synthesizers, high-tech blackboards, film recorders.

Basic Operational Concepts of a Computer
Most computer operations are executed in the ALU (arithmetic and logic unit) of a processor. Example: to add two numbers that are both located in memory, each number is brought into the processor, and the actual addition is carried out by the ALU. The sum may then be stored in memory or retained in the processor for immediate use.

Registers
When operands are brought into the processor, they are stored in high-speed storage elements called registers. A register can store one piece of data (8-bit, 16-bit, 32-bit, 64-bit registers, etc.). Access times to registers are shorter than access times to the fastest cache unit in the memory hierarchy.

Instructions
Instructions for a processor are defined by its ISA (Instruction Set Architecture). Typical instructions include:
Mov BX, LocA - fetch the instruction, fetch the contents of memory location LocA, and store the contents in general-purpose register BX.
Add AX, BX - fetch the instruction, add the contents of registers BX and AX, and place the sum in register AX.

How are instructions sent between memory and the processor? The program counter (PC), or instruction pointer (IP), contains the memory address of the next instruction to be fetched and executed. The address of the memory location to be accessed is sent to the memory unit and the appropriate control signals are issued (memory read). The instruction register (IR) holds the instruction that is currently being executed. Timing is crucial and is handled by the control unit within the processor.
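To make the fetch-decode-execute cycle above concrete, here is a minimal sketch in C of the processor loop for a toy machine. The instruction encoding, register names and memory size are invented for illustration and do not correspond to the 8086 instructions shown above or to any real ISA.

#include <stdio.h>
#include <stdint.h>

/* Toy machine: 2 registers, 256 words of memory.
   Instruction word: bits 15-12 = opcode, 11-8 = register, 7-0 = address. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

int main(void) {
    uint16_t mem[256] = {0};
    uint16_t reg[2]   = {0};
    uint16_t pc = 0, ir = 0;

    /* Tiny program: reg0 = mem[100]; reg0 += mem[101]; mem[102] = reg0; halt */
    mem[0] = (OP_LOAD  << 12) | (0 << 8) | 100;
    mem[1] = (OP_ADD   << 12) | (0 << 8) | 101;
    mem[2] = (OP_STORE << 12) | (0 << 8) | 102;
    mem[3] = (OP_HALT  << 12);
    mem[100] = 7; mem[101] = 5;

    for (;;) {
        ir = mem[pc++];                 /* fetch: IR <- mem[PC], then increment PC */
        uint16_t op = ir >> 12, r = (ir >> 8) & 0xF, addr = ir & 0xFF;
        if      (op == OP_HALT)  break;                            /* decode and execute */
        else if (op == OP_LOAD)  reg[r] = mem[addr];
        else if (op == OP_ADD)   reg[r] = reg[r] + mem[addr];      /* ALU operation */
        else if (op == OP_STORE) mem[addr] = reg[r];
    }
    printf("mem[102] = %d\n", (int)mem[102]);   /* prints 12 */
    return 0;
}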

SINGLE AND MULTIPLE BUS STRUCTURES

A bus is basically a subsystem that transfers data between the components of a computer, or between two computers, and it connects several peripheral devices at the same time. In a single bus structure, all units are connected to the same bus; this is very simple and cheap, but only one transfer can take place at a time. In a multiple bus structure, several interconnected buses are provided, so different transfers can proceed in parallel on different buses. As a result:
i) a multiple bus structure's performance is better than a single bus structure's;
ii) a single bus structure's cost is lower than a multiple bus structure's.

Performance:
Because only one data item can be transferred over a single bus in each clock cycle, the single bus structure limits performance; a multiple bus structure allows several transfers to proceed in parallel at the price of extra hardware and cost.

Instruction formats
The purpose of an instruction is to specify both an operation to be carried out by a CPU or other processor and the set of operands or data to be used in the operation. The operands include the input data or arguments of the operation and the results that are produced. Most instructions specify a register-transfer operation of the form

X1 := op(X1, X2, ..., Xn)

In the 680x0 family, simple instructions are assigned short formats. For example, the add-register instruction ADD.L D1,D2 denotes register-to-register addition of 32-bit operands, that is, D2 := D2 + D1, where D1 and D2 are two of the 680x0's data registers. With a memory operand (ADD.L ADR1,D2), the instruction specifies the memory-to-register addition operation

D2 := D2 + M(ADR1)

[Figure: instruction format of the RISC I - fields include the opcode, a set-condition-code bit, the destination register Rd, the source register Rs, an immediate bit, and the second source S2 (a register or an immediate value).]

Instruction Types

Instructions are divided into the following five types:
1) Data-transfer instructions, which copy information from one location to another, either in the processor's internal register set or in the external main memory. Operations: MOVE, LOAD, STORE, SWAP, PUSH, POP.
2) Arithmetic instructions, which perform operations on numerical data. Operations: ADD, ADD WITH CARRY, SUBTRACT, MULTIPLY.
3) Logical instructions, which include Boolean and other non-numerical operations. Operations: AND, OR, NOT, EXCLUSIVE OR, LOGICAL SHIFT.
4) Program control instructions, such as branch instructions, which change the sequence in which programs are executed. Operations: JUMP, RETURN, EXECUTE, SKIP CONDITIONAL, COMPARE, TEST, WAIT.
5) Input-output (I/O) instructions, which cause information to be transferred between the processor or its main memory and external I/O devices. Operations: INPUT, OUTPUT, START I/O, TEST I/O, HALT I/O.

Computer software
Software is a general term used to describe the role that computer programs, procedures and documentation play in a computer system. The term includes:
Application software, such as word processors, which perform productive tasks for users.
Firmware, which is software programmed into electrically programmable memory devices resident on mainboards or other integrated hardware carriers.
Middleware, which controls and coordinates distributed systems.
System software, such as operating systems, which interface with the hardware to provide the necessary services for application software.
Software testing is a domain distinct from development and programming; it consists of various methods used to test a software product and declare it fit before it is released for use by an individual or a group.
Testware, an umbrella term for all the utilities and application software that are used in combination to test a software package but do not necessarily contribute to operational purposes; as such, testware is not a standing configuration but merely a working environment for application software or subsets thereof.

Types of software

System software
System software helps run the computer hardware and computer system. It includes a combination of the following:
device drivers
operating systems
servers
utilities
windowing systems

The purpose of system software is to unburden the application programmer from the often complex details of the particular computer being used, including such accessories as communications devices, printers, device readers, displays and keyboards, and also to partition the computer's resources, such as memory and processor time, in a safe and stable manner. Examples are Windows XP, Linux and Mac OS.

Programming software
Programming software provides tools to assist a programmer in writing computer programs in different programming languages in a more convenient way. The tools include:
compilers
debuggers
interpreters
linkers
text editors

Application software
Application software allows end users to accomplish one or more specific (not directly computer-development-related) tasks. Typical applications include:
industrial automation
business software
computer games
quantum chemistry and solid state physics software
telecommunications (e.g., the internet and everything that flows on it)
databases
educational software
medical software
military software
molecular modeling software
image editing
spreadsheets
simulation software
word processing
decision-making software

Instruction Set Architecture (ISA)


The Instruction Set Architecture (ISA) is the part of the processor that is visible to the programmer or compiler writer. The ISA serves as the boundary between software and hardware. We will briefly describe the instruction sets found in many of the microprocessors used today. The ISA of a processor can be described using five categories:
Operand storage in the CPU - where are operands kept other than in memory?
Number of explicitly named operands - how many operands are named in a typical instruction?
Operand location - can any ALU instruction operand be located in memory, or must all operands be kept internally in the CPU?
Operations - what operations are provided in the ISA?
Type and size of operands - what is the type and size of each operand, and how is it specified?
Of all the above, the most distinguishing factor is the first. The three most common types of ISA are:
1. Stack - the operands are implicitly on top of the stack.
2. Accumulator - one operand is implicitly the accumulator.
3. General Purpose Register (GPR) - all operands are explicitly mentioned; they are either registers or memory locations.
Let's look at the assembly code for C = A + B in each of the three classes:
Stack:
    PUSH A
    PUSH B
    ADD
    POP C

Accumulator:
    LOAD A
    ADD B
    STORE C

GPR:
    LOAD R1,A
    ADD R1,B
    STORE R1,C

Stack
Advantages: simple model of expression evaluation (reverse Polish notation); short instructions.
Disadvantages: a stack can't be randomly accessed, which makes it hard to generate efficient code; the stack itself is accessed on every operation and becomes a bottleneck.

Accumulator
Advantages: short instructions.
Disadvantages: the accumulator is only temporary storage, so memory traffic is the highest for this approach.

GPR
Advantages: makes code generation easy; data can be stored in registers for long periods.
Disadvantages: all operands must be named, leading to longer instructions.

Earlier CPUs were of the first two types, but in the last 15 years all CPUs made have been GPR processors. The two major reasons are that registers are faster than memory, and the more data that can be kept internally in the CPU, the faster the program will run. The other reason is that registers are easier for a compiler to use.

Superscalar processor - can execute more than one instruction per cycle.
Cycle - the smallest unit of time in a processor.
Parallelism - the ability to do more than one thing at once.
Pipelining - overlapping parts of a large task to increase throughput without decreasing latency.

Addressing Modes
The addressing mode specifies a rule for interpreting or translating the address field of an instruction into the effective address from which the operand is actually referenced. The types of addressing modes are:

Immediate Addressing: This is the simplest form of addressing. Here, the operand is given in the instruction itself. This mode is used to define a constant or to set initial values of variables. The advantage of this mode is that no memory reference other than the instruction fetch is required to obtain the operand. The disadvantage is that the size of the number is limited to the size of the address field, which in most instruction sets is small compared with the word length.

Direct Addressing: In direct addressing mode, the effective address of the operand is given in the address field of the instruction. It requires one memory reference to read the operand from the given location and provides only a limited address space, since the length of the address field is usually less than the word length. Example: MOVE P, R0 and ADD Q, R0, where P and Q are the addresses of the operands.

Indirect Addressing: In indirect addressing mode, the address field of the instruction refers to the address of a word in memory, which in turn contains the full-length address of the operand. The advantage of this mode is that for a word length of N, an address space of 2^N can be addressed. The disadvantage is that instruction execution requires two memory references to fetch the operand. Multilevel or cascaded indirect addressing can also be used.

Register Addressing: Register addressing mode is similar to direct addressing. The only difference is that the address field of the instruction refers to a register rather than a memory location; 3 or 4 bits are used as the address field to reference 8 to 16 general-purpose registers. The advantage of register addressing is that only a small address field is needed in the instruction.

Register Indirect Addressing: This mode is similar to indirect addressing. The address field of the instruction refers to a register, and the register contains the effective address of the operand. This mode uses one memory reference to obtain the operand. The address space is limited to the width of the registers available to store the effective address.

Displacement Addressing: Displacement addressing is a combination of direct addressing and register indirect addressing. The value contained in one address field, A, is used directly, and the other address field refers to a register whose contents are added to A to produce the effective address. There are three types of displacement addressing: 1) relative addressing, 2) base-register addressing and 3) indexed addressing.

Stack Addressing: A stack is a linear array of locations organized as a last-in, first-out (LIFO) list. The stack is a reserved block of locations; items are appended or deleted only at the top of the stack. The stack pointer is a register that stores the address of the top-of-stack location. This mode of addressing is also known as implicit addressing.
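The effective-address computations described above can be illustrated with a small sketch in C; the memory array, register file and helper names below are invented for illustration and simply model each rule.

#include <stdio.h>
#include <stdint.h>

#define MEM_WORDS 256
static uint32_t mem[MEM_WORDS];   /* toy main memory   */
static uint32_t reg[8];           /* toy register file */

/* Each helper returns the operand a given addressing mode would fetch. */
uint32_t immediate(uint32_t field)         { return field; }             /* operand in the instruction */
uint32_t direct(uint32_t field)            { return mem[field]; }        /* one memory reference       */
uint32_t indirect(uint32_t field)          { return mem[mem[field]]; }   /* two memory references      */
uint32_t register_mode(uint32_t field)     { return reg[field]; }        /* no memory reference        */
uint32_t register_indirect(uint32_t f)     { return mem[reg[f]]; }       /* EA held in a register      */
uint32_t displacement(uint32_t a, int r)   { return mem[a + reg[r]]; }   /* EA = A + (register)        */

int main(void) {
    mem[10] = 42; mem[20] = 10; reg[3] = 10; reg[4] = 6;
    printf("immediate(7)         = %u\n", immediate(7));            /* 7  */
    printf("direct(10)           = %u\n", direct(10));              /* 42 */
    printf("indirect(20)         = %u\n", indirect(20));            /* mem[mem[20]] = mem[10] = 42 */
    printf("register(3)          = %u\n", register_mode(3));        /* 10 */
    printf("register_indirect(3) = %u\n", register_indirect(3));    /* mem[10] = 42 */
    printf("displacement(4, 4)   = %u\n", displacement(4, 4));      /* mem[4 + 6] = mem[10] = 42 */
    return 0;
}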

Reduced Instruction Set Computer (RISC)


As mentioned before, most modern CPUs are of the GPR (general purpose register) type. A few examples of such CPUs are the IBM 360, DEC VAX, Intel 80x86 and Motorola 68xxx. But while these CPUs were clearly better than earlier stack- and accumulator-based CPUs, they were still lacking in several areas:
1. Instructions were of varying length, from 1 byte to 6-8 bytes. This causes problems with the pre-fetching and pipelining of instructions.
2. ALU (arithmetic logical unit) instructions could have operands that were memory locations. Because the number of cycles it takes to access memory varies, so does the time taken by the whole instruction. This is not good for compiler writers, pipelining or multiple issue.
3. Most ALU instructions had only two operands, where one of the operands is also the destination. This means that this operand is destroyed during the operation, or it must be saved somewhere beforehand.

Thus in the early 1980s the idea of RISC was introduced: the SPARC project was started at Berkeley and the MIPS project at Stanford. RISC stands for Reduced Instruction Set Computer. The ISA is composed of instructions that all have exactly the same size, usually 32 bits, so they can be pre-fetched and pipelined successfully. All ALU instructions have three operands which are only registers; the only memory access is through explicit LOAD/STORE instructions. Thus C = A + B will be assembled as:

LOAD R1,A
LOAD R2,B
ADD R3,R1,R2
STORE C,R3

Although it takes four instructions, we can reuse the values in the registers.

Why is this architecture called RISC? To make all instructions the same length, the number of bits used for the opcode is reduced, so fewer instructions are provided. The instructions that were thrown out are the less important string and BCD (binary-coded decimal) operations. In fact, now that memory access is restricted, there are no longer several kinds of MOV or ADD instructions, as there are in the older architecture, which is called CISC (Complex Instruction Set Computer). RISC architectures are also called LOAD/STORE architectures. The number of registers in RISC is usually 32 or more: the first RISC CPU, the MIPS R2000, has 32 GPRs, as opposed to 16 in the 68xxx architecture and 8 in the 80x86 architecture. The only disadvantage of RISC is its code size: usually more instructions are needed, and short operations (POP, PUSH) waste space in full-size instruction words.

The CISC Approach


The primary goal of CISC architecture is to complete a task in as few lines of assembly as possible. This is achieved by building processor hardware that is capable of understanding and executing a series of operations. Consider, for example, multiplying two numbers stored in memory, say at locations 2:3 and 5:2. For this particular task, a CISC processor would come prepared with a specific instruction (we'll call it "MULT"). When executed, this instruction loads the two values into separate registers, multiplies the operands in the execution unit, and then stores the product in the appropriate register. Thus, the entire task of multiplying two numbers can be completed with one instruction:

MULT 2:3, 5:2

MULT is what is known as a "complex instruction." It operates directly on the computer's memory banks and does not require the programmer to explicitly call any loading or storing functions. It closely resembles a command in a higher-level language. For instance, if we let "a" represent the value at 2:3 and "b" represent the value at 5:2, then this command is identical to the C statement "a = a * b."

One of the primary advantages of this system is that the compiler has to do very little work to translate a high-level language statement into assembly. Because the length of the code is relatively short, very little RAM is required to store instructions. The emphasis is put on building complex instructions directly into the hardware.
CISC:
- Emphasis on hardware
- Includes multi-clock complex instructions
- Memory-to-memory: "LOAD" and "STORE" incorporated in instructions
- Small code sizes, high cycles per second
- Transistors used for storing complex instructions

RISC:
- Emphasis on software
- Single-clock, reduced instructions only
- Register-to-register: "LOAD" and "STORE" are independent instructions
- Low cycles per second, large code sizes
- Spends more transistors on memory registers
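For contrast with the single MULT instruction, the sketch below expresses the same multiplication task in C in the load/store style a RISC compiler would use. The array standing in for memory, the indices standing in for locations 2:3 and 5:2, and the temporaries standing in for registers are all illustrative assumptions.

#include <stdio.h>

int main(void) {
    int memory[16] = {0};
    memory[2] = 6;       /* stand-in for the value at location 2:3 */
    memory[5] = 7;       /* stand-in for the value at location 5:2 */

    /* CISC style would be the single statement:  memory[2] = memory[2] * memory[5];
       The load/store (RISC) style breaks it into explicit steps: */
    int reg_a = memory[2];          /* LOAD  reg_a, 2:3 */
    int reg_b = memory[5];          /* LOAD  reg_b, 5:2 */
    reg_a = reg_a * reg_b;          /* register-to-register multiply */
    memory[2] = reg_a;              /* STORE 2:3, reg_a */

    printf("product = %d\n", memory[2]);   /* 42 */
    return 0;
}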


Computer Organization and Architecture - Unit No: 2

The Control Unit has two major functions:
- controlling the sequencing of information-processing tasks performed by the machine;
- guiding and supervising each unit to make sure that each unit carries out every operation assigned to it at the proper time.
Control of a computer can be distributed or centralized. Early computers used distributed control and a lot of redundant hardware.

BASIC PROCESSING UNIT


PROCESSING UNIT FEATURES

Execution of a Complete Instruction


Add (R3), R1     /* R1 <- [R1] + [[R3]] */

This adds the contents of a memory location pointed to by R3 to register R1. Sequence of control steps:
1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R3out, MARin, Read
5. R1out, Yin, WMFC
6. MDRout, SelectY, Add, Zin


7. Zout, R1in, End

Multiple bus architecture


Single-bus structure: control sequences are long, because only one data item can be transferred over the bus in a clock cycle. A three-bus structure avoids this by combining all registers into a single block called the register file, which has three ports: two output ports that allow two registers to be accessed simultaneously and have their contents placed on buses A and B, and one input port that allows data on bus C to be loaded into a third register.


Buses A and B are used to transfer the source operands to the A and B inputs of the ALU, and the result is transferred to the destination over bus C.

For the ALU, R=A (or R=B) means that its A (or B) input is passed unmodified to bus C.

Add R4, R5, R6     /* R6 <- [R4] + [R5] */

This adds the contents of R4 and R5 and places the sum in R6. Sequence of control steps:
1. PCout, R=B, MARin, Read, IncPC
2. WMFC
3. MDRoutB, R=B, IRin
4. R4outA, R5outB, SelectA, Add, R6in, End


Hardwired control
The control logic is implemented with gates, flip-flops, decoders and other digital circuits. To execute instructions, a computer's processor must generate the control signals used to perform the processor's actions in the proper sequence. This sequence of control signals can be produced either by software or directly in hardware. In hardwired control, the instruction bits directly generate the signals through fixed logic; historically, hardwired control was implemented using discrete components, flip-flops, or even rotating discs or drums. The design can generally be done by two methods:

1. The classical method of sequential circuit design, which attempts to minimize the amount of hardware, in particular by using only log2(p) flip-flops to realize a p-state circuit.
2. An approach that uses one flip-flop per state. While expensive in terms of flip-flops, this method simplifies control unit design and debugging.

A hardwired control unit is a state machine:
- Combinational logic determines the outputs (control signals) at each state and determines the next state.
- Storage elements maintain the state representation.

[Figure: state machine - the inputs and the current state (held in clocked storage elements) feed a combinational logic circuit, which produces the outputs and the next state.]

Hardwired Implementation

The cycles (fetch, indirect, execute, interrupt) are constructed as a state machine, and the individual instruction executions can be constructed as state machines as well.

Common sections can be shared, since there is a lot of similarity between them; for example, one ALU is implemented and all instructions share it. A minimal sketch of such a state machine follows.
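The sketch below shows, in C, how a hardwired controller can be organized as a state machine, with a switch statement standing in for the combinational next-state and output logic. The states, control-signal names and instruction classes are simplified illustrations, not a real processor's signal set.

#include <stdio.h>

typedef enum { S_FETCH, S_DECODE, S_EXEC_ALU, S_EXEC_MEM, S_WRITEBACK } State;

/* One clock tick of the hardwired controller: given the current state and the
   decoded opcode class, assert control signals and compute the next state.   */
static State control_step(State s, int is_memory_op) {
    switch (s) {
    case S_FETCH:
        printf("assert: PCout, MARin, Read, IncPC\n");
        return S_DECODE;
    case S_DECODE:
        printf("assert: MDRout, IRin\n");
        return is_memory_op ? S_EXEC_MEM : S_EXEC_ALU;
    case S_EXEC_MEM:
        printf("assert: AddressOut, MARin, Read, WMFC\n");
        return S_WRITEBACK;
    case S_EXEC_ALU:
        printf("assert: SrcAout, SrcBout, Add, Zin\n");   /* shared ALU step */
        return S_WRITEBACK;
    case S_WRITEBACK:
        printf("assert: Zout, Rd_in, End\n");
        return S_FETCH;              /* back to fetching the next instruction */
    }
    return S_FETCH;
}

int main(void) {
    State s = S_FETCH;
    for (int cycle = 0; cycle < 5; cycle++)    /* walk one ALU instruction through */
        s = control_step(s, /*is_memory_op=*/0);
    return 0;
}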


Microprogrammed control
A microprogrammed control unit is one whose binary control variables are stored in a memory called the control memory. The control memory contains sequences of microinstructions that provide the control signals to execute instruction cycles, e.g. fetch, indirect, execute and interrupt.

Microinstruction: a control word in the control memory. A microinstruction specifies one or more microoperations.
Microprogram: a sequence of microinstructions.
Dynamic microprogramming: the control memory is RAM, so it can be written (a writable control memory can be changed); the microprogram is loaded initially from an auxiliary memory such as a magnetic disk.
Static microprogramming: the control memory is ROM; the control words in ROM are made permanent during hardware production.

Microprogrammed control organization:
1) Control memory - the memory that is part of the control unit. A computer that employs a microprogrammed control unit therefore has two memories: main memory, for storing the user program (machine instructions and data), and control memory, for storing the microprogram (microinstructions).
2) Control address register (CAR) - specifies the address of the microinstruction.
3) Sequencer (next-address generator) - determines the address sequence that is read from control memory. The address of the next microinstruction can be specified in several ways, depending on the sequencer inputs.
4) Control data register (pipeline register) - holds the microinstruction read from control memory, allowing the microoperations specified by the control word to execute simultaneously with the generation of the next microinstruction's address.
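A minimal sketch in C of the fetch-and-issue loop such a control unit performs, using the registers named above; the control-word layout, sequencing field and memory sizes are invented for illustration.

#include <stdint.h>
#include <stdio.h>

#define CM_SIZE 64

/* One microinstruction: a control word plus simple sequencing information. */
typedef struct {
    uint32_t control_signals;   /* bits driving the datapath                */
    uint8_t  branch;            /* 0 = next sequential, 1 = jump to address */
    uint8_t  address;           /* target used when branch == 1             */
} MicroInstr;

static MicroInstr control_memory[CM_SIZE];   /* ROM or writable control store */

int main(void) {
    uint8_t    car = 0;          /* control address register         */
    MicroInstr cdr;              /* control data (pipeline) register */

    /* a trivial 3-step microprogram that then loops back to address 0 */
    control_memory[0] = (MicroInstr){ 0x0001, 0, 0 };
    control_memory[1] = (MicroInstr){ 0x0006, 0, 0 };
    control_memory[2] = (MicroInstr){ 0x0010, 1, 0 };

    for (int cycle = 0; cycle < 6; cycle++) {
        cdr = control_memory[car];                   /* fetch the microinstruction */
        printf("cycle %d: issue control word 0x%04x\n",
               cycle, (unsigned)cdr.control_signals); /* drive the datapath        */
        car = cdr.branch ? cdr.address : (uint8_t)(car + 1);   /* sequencer */
    }
    return 0;
}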


Microprogrammed control
--Typical Microinstruction Formats

--Microinstruction types: each microinstruction either specifies a single (or a few) microoperations to be performed (vertical microprogramming), or specifies many different microoperations to be performed in parallel (horizontal microprogramming).

Vertical microprogramming:
- narrow word width: n control signals are encoded into log2(n) bits
- limited ability to express parallelism
- the considerable encoding of control information requires an external memory word decoder to identify the exact control line being manipulated

[Figure: vertical microinstruction format - fields: function codes, jump condition, microinstruction address.]

Horizontal microprogramming:
- wide memory word
- high degree of parallel operations possible
- little encoding of control information

[Figure: horizontal microinstruction format - fields: internal CPU control signals, system bus control signals, jump condition, microinstruction address.]

Nanoprogramming uses a two-level control storage organization. The top level is a vertical-format memory whose output drives the address register of the bottom (nano-level) memory; the nanomemory uses the horizontal format and produces the actual control signal outputs. The advantage of this approach is a significant saving in control memory size (bits); the disadvantage is more complexity and slower operation, since two memory accesses are needed for each microinstruction.

Nanoprogramming

Example: suppose that a system is being designed with 200 control points and 2048 microinstructions, and assume that only 256 different combinations of control points are ever used. A single-level control memory would require 2048 x 200 = 409,600 storage bits. A nanoprogrammed system would use a microstore of size 2048 x 8 = 16,384 bits (8 bits are enough to address one of the 256 nanowords) and a nanostore of size 256 x 200 = 51,200 bits, for a total of 67,584 storage bits. Nanoprogramming has been used in many CISC microprocessors.
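A quick check of that arithmetic in C (the formulas follow directly from the sizes given above):

#include <stdio.h>
#include <math.h>

int main(void) {
    int control_points    = 200;
    int microinstructions = 2048;
    int used_combinations = 256;

    int single_level = microinstructions * control_points;     /* 409600 */
    int index_bits   = (int)ceil(log2(used_combinations));     /* 8      */
    int microstore   = microinstructions * index_bits;         /* 16384  */
    int nanostore    = used_combinations * control_points;     /* 51200  */

    printf("single-level control memory: %d bits\n", single_level);
    printf("nanoprogrammed: %d + %d = %d bits\n",
           microstore, nanostore, microstore + nanostore);     /* 67584  */
    return 0;
}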


[Figure: organization of a nanoprogrammed machine.]


Computer Organization and Architecture - Unit No: 3

Hazard (computer architecture)


In computer architecture, a hazard is a potential problem that can arise in a pipelined processor. It refers to the possibility of erroneous computation when a CPU tries to execute multiple instructions simultaneously and those instructions exhibit data dependence. There are typically three types of hazards: data hazards, structural hazards, and branching hazards (control hazards). Instructions in a pipelined processor are performed in several stages, so that at any given time several instructions are being executed, and instructions may not be completed in the desired order.

All the data hazards discussed here involve registers within the CPU. By convention, the hazards are named by the ordering in the program that must be preserved by the pipeline:
RAW (read after write)
WAW (write after write)
WAR (write after read)

Consider two instructions i and j, with i occurring before j. The possible data hazards are:
RAW (read after write) - j tries to read a source before i writes it, so j incorrectly gets the old value. This is the most common type of hazard and the kind that forwarding is used to overcome.
WAW (write after write) - j tries to write an operand before it is written by i. The writes end up being performed in the wrong order, leaving the value written by i rather than the value written by j in the destination. This hazard is present only in pipelines that write in more than one pipe stage or allow an instruction to proceed even when a previous instruction is stalled. The DLX integer pipeline writes a register only in WB and so avoids this class of hazards. WAW hazards would be possible if we made the following two changes to the DLX pipeline:

Data Hazards
We must ensure that the results obtained when instructions are executed in a pipelined processor are identical to those obtained when the same instructions are executed sequentially.

Hazard occurs:
A <- 3 + A
B <- 4 x A

No hazard:
A <- 5 x C
B <- 20 + C

When two operations depend on each other, they must be executed sequentially in the correct order. Another example:
Mul R2, R3, R4
Add R5, R4, R6

Operand Forwarding
Instead of reading the operand from the register file, the second instruction can get the data directly from the output of the ALU after the previous instruction completes. A special arrangement is needed to forward the output of the ALU back to its input.
[Figure 8.7. Operand forwarding in a pipelined processor. (a) Datapath: source registers SRC1 and SRC2 feed the ALU, whose output goes to the result register RSLT and then to the destination. (b) Position of the source and result registers in the processor pipeline: SRC1/SRC2 belong to the Execute (ALU) stage and RSLT to the Write (register file) stage, with a forwarding path from RSLT back to the ALU inputs.]
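A sketch in C of the forwarding check a pipeline might perform each cycle; the latch names and fields loosely follow a classic pipeline organization and are illustrative, not a specific design from the text.

#include <stdio.h>

/* Simplified pipeline latches: what the execute stage needs, and what the
   instruction ahead of it is about to write back.                         */
typedef struct { int rs, rt, rs_val, rt_val; } ID_EX;
typedef struct { int rd, alu_result, reg_write; } EX_MEM;

/* Choose an ALU operand: forward from the instruction ahead if it is going
   to write the register we want to read; otherwise use the register file. */
static int pick_operand(int reg, int reg_file_val, const EX_MEM *fwd) {
    if (fwd->reg_write && fwd->rd == reg)
        return fwd->alu_result;      /* forwarding path: ALU output -> ALU input */
    return reg_file_val;             /* no hazard: register file value is fine   */
}

int main(void) {
    /* Mul R2, R3, R4 (R4 <- R2 * R3) has just produced R4 = 12;
       Add R5, R4, R6 (R6 <- R5 + R4) is right behind it and needs R4. */
    EX_MEM mul = { .rd = 4, .alu_result = 12, .reg_write = 1 };
    ID_EX  add = { .rs = 5, .rt = 4, .rs_val = 3, .rt_val = 99 /* stale R4 */ };

    int op1 = pick_operand(add.rs, add.rs_val, &mul);   /* 3, from the register file */
    int op2 = pick_operand(add.rt, add.rt_val, &mul);   /* 12, forwarded (not 99)    */
    printf("Add computes %d + %d = %d\n", op1, op2, op1 + op2);
    return 0;
}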

Handling Data Hazards in Software


Let the compiler detect and handle the hazard:
I1: Mul R2, R3, R4
    NOP
    NOP
I2: Add R5, R4, R6
The compiler can reorder the instructions to perform some useful work during the NOP slots.

Side Effects:
The previous example is explicit and easily detected. Sometimes, however, an instruction changes the contents of a register other than the one named as the destination. When a location other than one explicitly named in an instruction as a destination operand is affected, the instruction is said to have a side effect. An example is the condition code flags:
Add R1, R3
AddWithCarry R2, R4
Here AddWithCarry implicitly depends on the carry flag set by Add. Instructions designed for execution on pipelined hardware should have few side effects.

Instruction Hazards
Whenever the stream of instructions supplied by the instruction fetch unit is interrupted, the pipeline stalls. Two common causes are:
- a cache miss
- a branch

Unconditional Branches
[Figure 8.8. An idle cycle caused by a branch instruction: instruction I2 is a branch; I3 is fetched in cycle 3 but discarded (X), so the execution unit is idle for one cycle before Ik, the branch target, is fetched and executed.]

Branch Timing

-- Branch penalty and reducing the penalty
-- Instruction queue and prefetching

Conditional Branches
A conditional branch instruction introduces the added hazard caused by the dependency of the branch condition on the result of a preceding instruction. The decision to branch cannot be made until the execution of that instruction has been completed. Branch instructions represent about 20% of the dynamic instruction count of most programs.

Datapath and Control Considerations


Datapath: portion of the processor which contains hardware necessary to perform all operations required by the computer (the brawn). Control: portion of the processor (also in hardware) which tells the datapath what needs to be done (the brain).

The following operations can be performed independently in the processor:
- reading an instruction from the instruction cache
- incrementing the PC
- decoding an instruction
- reading from or writing into the data cache
- reading the contents of up to two registers from the register file
- writing into one register in the register file
- performing an ALU operation

Performance Considerations
Branch prediction performance is a function of prediction accuracy and the cost of a misprediction.
Branch History Table (BHT): the lower bits of the PC are used to index a table of 1-bit values that record whether or not the branch was taken last time. There is no address check (unlike caches).
Problem: in a loop, a 1-bit BHT causes a double misprediction:
- at the end of the loop, when it exits instead of looping, and
- on the first pass the next time through, when it predicts exit instead of looping.
(An average loop does about 9 iterations before it exits.)
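A sketch in C of the 1-bit predictor described above; the table size and the way branch outcomes are fed to it are invented for illustration.

#include <stdio.h>
#include <stdint.h>

#define BHT_ENTRIES 1024
static uint8_t bht[BHT_ENTRIES];          /* 1-bit history: 1 = taken last time */

static int  predict(uint32_t pc)            { return bht[(pc >> 2) % BHT_ENTRIES]; }
static void update(uint32_t pc, int taken)  { bht[(pc >> 2) % BHT_ENTRIES] = (uint8_t)taken; }

int main(void) {
    uint32_t loop_branch = 0x4000;   /* address of the backward branch of a loop */
    int mispredictions = 0;

    /* Run the same 9-iteration loop twice and count 1-bit BHT mispredictions. */
    for (int run = 0; run < 2; run++) {
        for (int i = 0; i < 9; i++) {
            int taken = (i < 8);                     /* taken 8 times, then falls out */
            if (predict(loop_branch) != taken) mispredictions++;
            update(loop_branch, taken);
        }
    }
    /* Prints 4: the cold start, the exit of run 1, the first pass of run 2,
       and the exit of run 2 - the double misprediction described above.    */
    printf("mispredictions: %d\n", mispredictions);
    return 0;
}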

Exception Types
- I/O device request
- Breakpoint
- Integer arithmetic overflow
- Floating-point arithmetic anomaly
- Page fault
- Misaligned memory access
- Memory-protection violation
- Undefined instruction
- Privilege violation
- Hardware and power failure

Exception Handling and Requirements
- Synchronous vs. asynchronous: I/O exceptions are asynchronous, which allows completion of the current instruction; exceptions within an instruction are synchronous and harder to deal with.
- User requested vs. coerced: requested exceptions are predictable and easier to handle.
- User maskable vs. unmaskable.
- Resume vs. terminate: exceptions that terminate program execution are easier to implement.

Stopping and Restarting Execution
Some exceptions require a restart of the instruction, e.g. a page fault in the MEM stage. When an exception occurs, the pipeline control can:
- force a trap instruction into the next IF stage;
- until the trap is taken, turn off all writes for the faulting (and later) instructions;
- have the OS exception-handling routine save the PC of the faulting instruction.

Precise exceptions
With precise exceptions, instructions before the faulting one complete and instructions after it restart, as if execution were serial. Exception handling is complex if the faulting instruction can change the state before the exception occurs. Precise exceptions simplify the OS and are required for demand paging.

Computer Organization and Architecture - Unit No: 4

Memory organization
RAM is composed of a large number (2^m) of addressable locations, each of which stores a w-bit word. RAM operates as follows: first, the address of the target location to be accessed is transferred via the address bus to the RAM's address buffer. The address is then processed by the address decoder, which selects the required location in the storage cell unit. If a read operation is requested, the contents of the addressed location are transferred from the storage cell unit to the data buffer and from there to the data bus. If a write operation is requested, the word to be stored is transferred from the data bus to the selected location in the storage unit. The storage unit is made up of many identical 1-bit memory cells and their interconnections. On each line connected to the storage cell unit we can expect to find a driver that acts as either an amplifier or a transducer of physical signals.

Serial-access (track-organized) memory assumes that each word is stored in a single track and that each access results in the transfer of a block of words. The address of the data to be accessed is applied to the address decoder, whose output determines the track to be used and the location of the desired block of information within the track. The track address determines the particular read-write head to be selected; the selected head is moved into position to transfer data to or from the target track. A track position indicator generates the address of the block that is currently passing the read-write head, and the generated address is compared with the block address produced by the address decoder.

The selected head is enabled and the data transfer between the storage track and the memory data buffer register begins. The read-write head is disabled when a complete block of information has been transferred.

Static memories (RAM)

Static memories are circuits capable of retaining their state as long as power is applied. Static RAM (SRAM) is volatile: its contents are lost when power is removed.


DRAMS:
A DRAM cell stores data as charge on a capacitor and therefore needs periodic refreshing.

Synchronous DRAMs
Synchronous DRAMs are DRAMs whose operation is synchronized with a clock signal.


Memory system considerations


The main considerations in the design of a memory system are:
- cost
- speed
- power dissipation
- size of the chip


Principle of locality:
Temporal locality (locality in time): if an item is referenced, it will tend to be referenced again soon.
Spatial locality (locality in space): if an item is referenced, items whose addresses are close by will tend to be referenced soon.
Sequentiality is a subset of spatial locality.
The principle of locality can be exploited by implementing the memory of a computer as a memory hierarchy, taking advantage of all types of memories. The level closer to the processor (the fastest) holds a subset of any level further away, and all the data is stored at the lowest level (the slowest).
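A small C illustration of spatial locality: traversing a matrix row by row touches consecutive addresses and is cache-friendly, while traversing it column by column is not. The matrix size is arbitrary, and the actual timings depend on the machine.

#include <stdio.h>
#include <time.h>

#define N 2048
static double a[N][N];          /* stored row-major: a[i][0..N-1] are adjacent */

int main(void) {
    double sum = 0.0;
    clock_t t0, t1, t2;

    t0 = clock();
    for (int i = 0; i < N; i++)          /* row-major walk: good spatial locality */
        for (int j = 0; j < N; j++)
            sum += a[i][j];

    t1 = clock();
    for (int j = 0; j < N; j++)          /* column-major walk: each access lands */
        for (int i = 0; i < N; i++)      /* in a different cache block           */
            sum += a[i][j];
    t2 = clock();

    printf("row-major:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("column-major: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    return (int)sum;    /* keep the compiler from optimizing the loops away */
}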


Cache Memories
The speed of main memory is very low in comparison with the speed of the processor. For good performance, the processor cannot spend much of its time waiting to access instructions and data in main memory, so it is important to devise a scheme that reduces the access time. An efficient solution is to use a fast cache memory. When the cache is full and a memory word that is not in the cache is referenced, the cache control hardware must decide which block should be removed to create space for the new block that contains the referenced word.

The Basics of Caches
Caches are organized in terms of blocks, the smallest amount of data that can be copied between two adjacent levels at a time. If data requested by the processor is present in some block in the upper level, the access is called a hit. If the data is not found in the upper level, the request is called a miss and the data is retrieved from the lower level in the hierarchy. The fraction of memory accesses found in the upper level is called the hit ratio. Storage that takes advantage of locality of accesses in this way is called a cache.
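A sketch in C of a direct-mapped cache lookup using the block/hit/miss vocabulary above; the cache geometry (64 lines of 16-byte blocks) and the access pattern are assumptions for illustration.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES    64
#define BLOCK_BYTES  16     /* offset bits come from the block size, index bits
                               from the number of lines, the rest is the tag  */

typedef struct { bool valid; uint32_t tag; } Line;
static Line cache[NUM_LINES];
static unsigned hits, misses;

static void cache_access(uint32_t addr) {
    uint32_t index = (addr / BLOCK_BYTES) % NUM_LINES;
    uint32_t tag   = addr / (BLOCK_BYTES * NUM_LINES);

    if (cache[index].valid && cache[index].tag == tag) {
        hits++;                          /* hit: block already in the cache       */
    } else {
        misses++;                        /* miss: fetch the block from lower level */
        cache[index].valid = true;
        cache[index].tag   = tag;        /* direct-mapped: the old block is replaced */
    }
}

int main(void) {
    for (int pass = 0; pass < 2; pass++)          /* second pass hits thanks to locality */
        for (uint32_t a = 0; a < 1024; a += 4)
            cache_access(a);
    printf("hits=%u misses=%u hit ratio=%.2f\n",
           hits, misses, (double)hits / (hits + misses));
    return 0;
}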

Performance of Caches


Accessing a Cache

Virtual memory
Virtual memory is a computer system technique which gives an application program the impression that it has contiguous working memory (an address space), while in fact its memory may be physically fragmented and may even overflow onto disk storage. Virtual memory provides two primary functions:
1. Each process has its own address space, so it does not need to be relocated and does not need to use relative addressing.
2. Each process sees one contiguous block of free memory upon launch; fragmentation is hidden.
All implementations (excluding emulators) require hardware support, typically in the form of a memory management unit (MMU) built into the CPU. Systems that use this technique make programming of large applications easier and use real physical memory (e.g. RAM) more efficiently than those without virtual memory. Virtual memory differs significantly from memory virtualization in that virtual memory allows resources to be virtualized as memory for a specific system, as opposed to a large pool of memory being virtualized as smaller pools for many different systems. Note that "virtual memory" is more than just "using disk space to extend physical memory size"; that is merely an extension of the memory hierarchy to include hard disk drives. Extending memory to disk is a normal consequence of using virtual memory techniques, but it could be done by other means, such as overlays or swapping programs and their data completely out to disk while they are inactive. The definition of virtual memory is based on redefining the address space with contiguous virtual memory addresses to "trick" programs into thinking they are using large blocks of contiguous addresses.
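A minimal sketch in C of the address translation behind paged virtual memory: a virtual address is split into a page number and an offset, and a page table maps pages to physical frames. The 4 KB page size, the tiny page table and the sample mappings are assumptions for illustration.

#include <stdio.h>
#include <stdint.h>

#define PAGE_SIZE 4096u                 /* 4 KB pages: the low 12 bits are the offset */
#define NUM_PAGES 16u

/* One page-table entry per virtual page; -1 means "not mapped" (page fault). */
static int page_table[NUM_PAGES];

static int translate(uint32_t vaddr, uint32_t *paddr) {
    uint32_t page   = vaddr / PAGE_SIZE;
    uint32_t offset = vaddr % PAGE_SIZE;
    if (page >= NUM_PAGES || page_table[page] < 0)
        return -1;                      /* page fault: the OS must bring the page in */
    *paddr = (uint32_t)page_table[page] * PAGE_SIZE + offset;
    return 0;
}

int main(void) {
    for (unsigned i = 0; i < NUM_PAGES; i++) page_table[i] = -1;   /* all unmapped */
    page_table[0] = 3; page_table[1] = 7; page_table[3] = 5;       /* sample mappings */

    uint32_t tests[] = { 0x0000, 0x1A2C, 0x2010 };   /* virtual pages 0, 1, 2 */
    for (int i = 0; i < 3; i++) {
        uint32_t p;
        if (translate(tests[i], &p) == 0)
            printf("virtual 0x%04x -> physical 0x%05x\n", tests[i], p);
        else
            printf("virtual 0x%04x -> page fault\n", tests[i]);
    }
    return 0;
}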

Paged virtual memory

Compile time: if it is known in advance that a program will reside at a specific location of main memory, then the compiler may be told to build the object code with absolute addresses right away. For example, the boot sector of a bootable disk may be compiled with the starting point of the code set to 007C:0000.
Load time: it is rare that we know, ahead of execution, the location a program will be assigned. In most cases, the compiler must generate relocatable code with logical addresses, and the address translation is performed on the code at load time. If a program is loaded at location x and the whole program resides in one monolithic block, then every memory reference can be translated to a physical address by adding x to it.

Memory Management Requirements


Partitioning Strategies - Fixed
Fixed partitioning divides memory into equal-sized pieces (except for the piece holding the OS).
- Degree of multiprogramming = number of partitions.
- Simple policy to implement: find any free partition and load the process.
- All processes must fit into the partition space.

Partitioning Strategies - Variable
Idea: remove the wasted memory that is not needed in each partition; memory is dynamically divided into partitions based on process needs.
- Hole: a block of free or available memory; holes are scattered throughout physical memory.
- External fragmentation: memory that sits in holes too small to be usable by any process.

[Figure: variable partitioning over time - the OS occupies low memory; processes 1, 2 and 3 are loaded in turn; when process 2 terminates it leaves a hole, and process 4 later starts in part of that hole, leaving external fragmentation.]

Memory Allocation Mechanism
The memory management system maintains data about free and allocated memory. Alternatives:
- Bit maps: one bit per allocation unit.
- Linked lists: a free list that is updated and coalesced when memory is not allocated to a process.
At swap-in or process creation, find free memory that is large enough to hold the process, allocate part (or all) of it to the process, and mark the remainder as free.
Compaction: moving things around so that holes can be consolidated; expensive in OS time.

Memory Management Policies
- First fit: scan the free list and allocate the first hole that is large enough; fast.
- Next fit: like first fit, but start the search from the end of the last allocation.
- Best fit: find the smallest hole that is adequate; slower, and produces lots of small fragments.
- Worst fit: find the largest hole.
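A sketch in C of first-fit allocation over a simple free list; the list representation and the sizes are illustrative only (a real allocator would also coalesce neighbouring holes when memory is freed).

#include <stdio.h>
#include <stdlib.h>

/* A hole in the variable-partition scheme: [start, start + size). */
typedef struct Hole { size_t start, size; struct Hole *next; } Hole;

/* First fit: take the first hole big enough and carve the request from its front. */
static long first_fit(Hole **free_list, size_t request) {
    for (Hole **h = free_list; *h; h = &(*h)->next) {
        if ((*h)->size >= request) {
            size_t addr  = (*h)->start;
            (*h)->start += request;
            (*h)->size  -= request;
            if ((*h)->size == 0) {            /* hole fully consumed: unlink it */
                Hole *dead = *h;
                *h = dead->next;
                free(dead);
            }
            return (long)addr;
        }
    }
    return -1;                                /* no hole large enough */
}

int main(void) {
    /* Two holes: [100, 300) and [500, 900). */
    Hole *h2 = malloc(sizeof *h2); *h2 = (Hole){ 500, 400, NULL };
    Hole *h1 = malloc(sizeof *h1); *h1 = (Hole){ 100, 200, h2 };
    Hole *free_list = h1;

    printf("alloc 150 -> %ld\n", first_fit(&free_list, 150));   /* 100: first hole      */
    printf("alloc 300 -> %ld\n", first_fit(&free_list, 300));   /* 500: second hole     */
    printf("alloc 200 -> %ld\n", first_fit(&free_list, 200));   /* -1: only 50+100 left */
    return 0;
}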

ASSOCIATIVE MEMORY

A content-addressable processor is a content-addressable memory with the added capability to write in parallel (multi-write) into all those words indicating agreement as the result of a search. A typical associative memory has the following components:
- memory cell array
- comparand register
- mask register
- match/mismatch (response) register
- multiple match resolver
- search logic
- input/output register
- word select register

[Figure: associative memory organization - the comparand register, mask register and word select register feed the memory cell array; the tag (match/mismatch) register, multiple match resolver, some/none bit and input/output register collect the results.]

--Memory Cell Array provides the storage and search medium for the data.
--Comparand Register contains the data to be compared against the contents of the memory cell array.

--Mask Register is used to mask off portions of the data words which do not participate in the operations.

-- Word Select Register is used to mask off the memory words which do not participate in the operation.
-- Match/Mismatch Register indicates the success or failure of a search operation.
-- Input/Output Buffer acts as an interface between the associative memory and the outside world.
-- Multiple Match Resolver narrows the scope of the search to a specific location in the memory cell array in cases where more than one memory word satisfies the search condition(s).
-- Some/None bit shows the overall search result.
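A sketch in C of the masked parallel search such a memory performs (simulated here word by word); the registers mirror the components described above, while the word width, array size and data values are invented.

#include <stdio.h>
#include <stdint.h>

#define WORDS 8

int main(void) {
    uint16_t cells[WORDS] = { 0x12A4, 0x33A4, 0x12FF, 0x90A4,
                              0x12A4, 0x0000, 0xFFA4, 0x1234 };
    uint16_t comparand   = 0x00A4;   /* pattern we are searching for              */
    uint16_t mask        = 0x00FF;   /* 1-bits select which bit positions compare */
    uint8_t  word_select = 0xFF;     /* all words participate in the search       */
    uint8_t  match_reg   = 0;        /* one match/mismatch bit per word           */

    /* In hardware every word is compared simultaneously; the loop simulates that. */
    for (int i = 0; i < WORDS; i++) {
        int participates = (word_select >> i) & 1;
        int match        = ((cells[i] ^ comparand) & mask) == 0;
        if (participates && match)
            match_reg |= (uint8_t)(1u << i);
    }

    int some_none = (match_reg != 0);          /* overall search result */
    printf("match register = 0x%02X, some/none = %d\n", match_reg, some_none);
    /* Matching words: indices 0, 1, 3, 4 and 6 (their low byte equals 0xA4). */
    return 0;
}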

SECONDARY STORAGE DEVICES


Optical storage: the typical optical disc stores information in deformities on the surface of a circular disc and reads this information by illuminating the surface with a laser diode and observing the reflection. Optical disc storage is non-volatile. The deformities may be permanent (read-only media), formed once (write-once media) or reversible (recordable or read/write media). The following forms are currently in common use:
- CD, CD-ROM, DVD, BD-ROM: read-only storage, used for mass distribution of digital information (music, video, computer programs).
- CD-R, DVD-R, DVD+R, BD-R: write-once storage, used for tertiary and off-line storage.
- CD-RW, DVD-RW, DVD+RW, DVD-RAM, BD-RE: slow-write, fast-read storage, used for tertiary and off-line storage.
- Ultra Density Optical (UDO) is similar in capacity to BD-R or BD-RE and is slow-write, fast-read storage used for tertiary and off-line storage.

Magneto-optical disc storage is optical disc storage in which the magnetic state of a ferromagnetic surface stores the information. The information is read optically and written by combining magnetic and optical methods. Magneto-optical disc storage is non-volatile, sequential-access, slow-write, fast-read storage used for tertiary and off-line storage.

A Compact Disc (CD) is an optical disc used to store digital data. It was originally developed to store sound recordings exclusively, but later it also allowed the preservation of other types of data. Audio CDs have been commercially available since October 1982; in 2009 they remained the standard physical storage medium for audio. Standard CDs have a diameter of 120 mm and can hold up to 80 minutes of uncompressed audio (700 MB of data). The Mini CD has various diameters ranging from 60 to 80 mm; it is sometimes used for CD singles or device drivers, storing up to 24 minutes of audio. The technology was eventually adapted and expanded to encompass data storage (CD-ROM), write-once audio and data storage (CD-R), rewritable media (CD-RW), Video Compact Disc (VCD), Super Video Compact Disc (SVCD), PhotoCD, PictureCD, CD-i, and Enhanced CD.

Magnetic disk memories


A magnetic disk consists of 1-12 platters (metal or glass disks covered with magnetic recording material on both sides), with diameters between 1 and 3.5 inches. Each platter is composed of concentric tracks (5,000-30,000 per surface), and each track is divided into sectors (100-500 per track, each of about 512 bytes). A movable arm holds the read/write heads for each disk surface and moves them all in tandem, so a cylinder of data is accessible at a time. To read or write data, the arm has to be placed on the correct track; this seek time usually takes 5 to 12 ms on average, and can take less if there is spatial locality. Rotational latency is the time taken to rotate the correct sector under the head; the average is typically 2 ms or more (2 ms corresponds to 15,000 RPM). Transfer time is the time taken to transfer a block of bits off the disk and is typically 3-65 MB/second.

Computer Organization and Architecture - Unit No: 5


An I/O module provides an interface to the CPU and memory on one side, and an interface to one or more peripheral devices on the other.

Accessing I/O Devices

Generic Model of IO Module

Interface for an IO Device:

1. The CPU checks the I/O module for the device status.
2. The I/O module returns the status.
3. If the device is ready, the CPU requests a data transfer.
4. The I/O module gets data from the device.
5. The I/O module transfers the data to the CPU.

Programmed I/O
Under programmed I/O the CPU has direct control over I/O: it senses status, issues read/write commands and transfers the data. The CPU waits for the I/O module to complete each operation, which wastes CPU time. The sequence is:
- the CPU requests an I/O operation;
- the I/O module performs the operation and sets the status bits;
- the CPU checks the status bits periodically;
- the I/O module does not inform the CPU directly and does not interrupt the CPU;
- the CPU may wait, or may come back later.
Under programmed I/O, a data transfer looks very much like a memory access from the CPU's viewpoint, and each device is given a unique identifier (address).
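A sketch in C of the busy-wait polling loop described above for a hypothetical input device; the status and data registers are simulated by variables (on real hardware they would be memory-mapped addresses), and the bit layout is invented.

#include <stdio.h>
#include <stdint.h>

/* Simulated device registers (real code would declare pointers to volatile
   memory-mapped addresses instead).                                        */
static volatile uint8_t dev_status;          /* bit 0: data ready */
static volatile uint8_t dev_data;
#define STATUS_READY 0x01u

/* Stand-in for the hardware: each call makes one new byte available. */
static void device_produce(uint8_t value) {
    dev_data    = value;
    dev_status |= STATUS_READY;
}

/* Programmed I/O: poll the status register until the device is ready,
   then read one byte. The CPU does nothing useful while it waits.     */
static uint8_t read_byte_polled(void) {
    while ((dev_status & STATUS_READY) == 0)
        device_produce(42);                  /* simulation only: real code just spins */
    dev_status &= (uint8_t)~STATUS_READY;    /* reading clears the ready bit */
    return dev_data;
}

int main(void) {
    printf("received %u\n", read_byte_polled());
    return 0;
}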

I/O interface circuits: the task of connecting an I/O device to a computer system is greatly eased by the use of standard ICs known as I/O interfacing circuits. The simplest interface circuit is a one-word, addressable register that serves as an I/O port. Most basic I/O interface circuits are programmable circuits intended to act as serial or parallel ports; serial ports accommodate many types of slow peripheral devices, ranging from secondary memory units to network connections.

I/O Mapping
Memory-mapped I/O: devices and memory share an address space, so I/O looks just like memory reads and writes; no special commands are needed for I/O, and the large selection of memory-access commands is available.
Isolated I/O: separate address spaces are used, which requires I/O or memory select lines and special commands for I/O.

Interrupt-driven and programmed I/O both require active CPU intervention (all data must pass through the CPU), so the transfer rate is limited by the processor's ability to service the device, and the CPU is tied up managing the I/O transfer. Direct memory access (DMA) adds an additional module (hardware) on the bus: the DMA controller takes over the bus from the CPU for I/O, either by waiting for a time when the processor does not need the bus or by seizing bus cycles from the CPU (cycle stealing).

Transfer of control through the use of interrupts

An interrupt is an asynchronous signal indicating the need for attention, or a synchronous event in software indicating the need for a change in execution. A hardware interrupt causes the processor to save its state of execution and begin execution of an interrupt handler. Software interrupts are usually implemented as instructions in the instruction set, which cause a context switch to an interrupt handler similar to a hardware interrupt. Interrupts are a commonly used technique for computer multitasking, especially in real-time computing; such a system is said to be interrupt-driven. An act of interrupting is referred to as an interrupt request (IRQ).
When there is an interrupt, continuing the execution of the current program is meaningless, so execution of the current program needs to be stopped. In a computer system there are many sources of interrupts, so interrupt processing routines must be prepared for each source, the source of an interrupt must be identified, and execution of the interrupt processing routine associated with the identified interrupt must be initiated.

When the interrupt has been serviced, the interrupted program needs to be continued for efficiency reasons, so execution of the interrupted program is resumed.

Vectored interrupts
In a computer, a vectored interrupt is an I/O interrupt that tells the part of the computer that handles I/O interrupts at the hardware level that a request for attention from an I/O device has been received, and that also identifies the device that sent the request. A vectored interrupt is an alternative to a polled interrupt, which requires that the interrupt handler poll or send a signal to each device in turn in order to find out which one sent the interrupt request.

PCI interrupts
Devices are required to follow a protocol so that the interrupt lines can be shared. The PCI bus includes four interrupt lines, all of which are available to each device. However, they are not wired in parallel as the other PCI bus lines are. PCI bridges (between two PCI buses) map the four interrupt traces on each of their sides in varying ways, with the result that it can be impossible to determine how a PCI device's interrupts will appear to software. PCI interrupt lines are level-triggered; this was chosen over edge-triggering in order to gain an advantage when servicing a shared interrupt line, and for robustness, since edge-triggered interrupts are easy to miss. PCI Express does not have physical interrupt lines at all; it uses message-signaled interrupts exclusively.

Pipeline interrupts
An interrupt is a hardware signal that switches the processor to a new instruction stream.

When an interrupt occurs, the state of the interrupted process is saved, including the PC, the registers, and memory.
An interrupt is precise if the following three conditions hold for the instruction u at which the pipeline is interrupted:
- all instructions preceding u have been executed and have modified the state correctly;
- all instructions following u are unexecuted and have not modified the state;
- if the interrupt was caused by an instruction, it was caused by instruction u, which is either completely executed (e.g. overflow) or completely unexecuted (e.g. a virtual-memory page fault).

Precise interrupts are desirable if software is to fix up the error that caused the interrupt and execution is to be resumed. They are easy to provide for external interrupts, but can be complex and costly for internal ones, and they are imperative for some interrupts (virtual-memory page faults, the IEEE floating-point standard).

Direct Memory Access (DMA)


Polling or interrupt-driven I/O incurs considerable overhead: multiple program instructions, saving the program state, incrementing memory addresses and keeping track of the word count. To transfer large amounts of data at high speed without continuous intervention by the processor, a special control circuit is required in the I/O device interface, called a DMA controller. The DMA controller keeps track of the memory locations and transfers data directly to memory (via the bus), independently of the processor.

Single bus, detached DMA controller: each transfer uses the bus twice (I/O to DMA, then DMA to memory), and the CPU is suspended twice.

Single bus, DMA controller integrated into the I/O module: the controller may support one or more devices; each transfer uses the bus once (DMA to memory), and the CPU is suspended once.

Operation of a DMA transfer


DMA Controller

The DMA controller is part of the I/O device interface and provides one or more DMA channels. It performs functions that would normally be carried out by the processor: it provides the memory address, generates the bus signals that control the transfer, and keeps track of the number of transfers. It operates under the control of the processor.

A bus is a subsystem that transfers data between components inside a computer, or between computers. Multiple devices communicate over a single set of wires, and only one device can talk at a time or the message is garbled. Each line or wire of a bus can at any one time carry a single binary digit; over time, however, a sequence of binary digits may be transferred, and the lines may (and often do) send information in parallel. A computer system may contain a number of different buses.

Buses:

Buses, Bus control, bus interfacing, Bus arbitration

Bus Interconnection
A bus is a communication pathway connecting two or more devices. A key characteristic of a bus is that it is a shared transmission medium. A bus consists of multiple pathways or lines; each line is capable of transmitting a signal representing a binary digit (1 or 0). A sequence of bits can be transmitted across a single line, and several lines can be used to transmit bits simultaneously (in parallel). A bus that connects major components (CPU, memory, I/O) is called a system bus. The most common computer interconnection structures are based on the use of one or more system buses.

Bus interfacing
Synchronous - the occurrence of events on the bus is determined by a clock (the clock cycle or bus cycle); the bus includes a clock line that carries the timing signal to all devices.
Asynchronous - the occurrence of one event on the bus follows and depends on the occurrence of the previous event.

Bus arbitration
Centralized - a bus controller (arbiter), a hardware device, is responsible for allocating time on the bus (e.g. daisy chaining).
Distributed - access control logic in each module, acting together, shares the bus.

INTERFACE CIRCUITS
Circuitry is required to connect an I/O device to a computer bus. The interface circuit:
- provides a storage buffer for at least one word of data;
- contains status flags that can be accessed by the processor;
- contains address-decoding circuitry;
- generates the appropriate timing signals required by the bus control scheme;
- performs any required format conversions.
Ports may be serial or parallel.

[Figure: an example of a computer system using different interface standards.]

PCI (Peripheral Component Interconnect)


PCI stands for Peripheral Component Interconnect. Introduced in 1992, it is a low-cost, processor-independent bus with plug-and-play capability.

PCI bus transactions


PCI bus traffic is made up of a series of PCI bus transactions. Each transaction is made up of an address phase followed by one or more data phases. The direction of the data phases may be from initiator to target (write transaction) or vice versa (read transaction), but all of the data phases must be in the same direction. Either party may pause or halt the data phases at any point. (One common example is a low-performance PCI device that does not support burst transactions and always halts a transaction after the first data phase.)

Any PCI device may initiate a transaction. First, it must request permission from a PCI bus arbiter on the motherboard. The arbiter grants permission to one of the requesting devices. The initiator begins the address phase by broadcasting a 32-bit address plus a 4-bit command code, then waits for a target to respond. All other devices examine this address, and one of them responds a few cycles later.

64-bit addressing is done using a two-stage address phase. The initiator broadcasts the low 32 address bits, accompanied by a special "dual address cycle" command code. Devices which do not support 64-bit addressing simply do not respond to that command code. In the next cycle, the initiator transmits the high 32 address bits plus the real command code; the transaction operates identically from that point on. To ensure compatibility with 32-bit PCI devices, it is forbidden to use a dual address cycle if it is not necessary, i.e. if the high-order address bits are all zero.

While the PCI bus transfers 32 bits per data phase, the initiator transmits a 4-bit byte mask indicating which 8-bit bytes are to be considered significant. In particular, a masked write must affect only the desired bytes in the target PCI device.

Subtopics: arbitration, address phase, address phase timing, data phases, ending transactions.

Table 4.3. Data transfer signals on the PCI bus.

SCSI Bus
SCSI stands for Small Computer System Interface and is defined by ANSI standard X3.131. SCSI connectors have 50, 68 or 80 pins, and the maximum transfer rate is 160 MB/s or 320 MB/s, depending on the version.
SCSI Bus Signals

USB - Universal Serial Bus


Key USB characteristics:
- Speed: low-speed (1.5 Mb/s), full-speed (12 Mb/s), high-speed (480 Mb/s)
- Port limitation
- Device characteristics
- Plug-and-play

[Figure: Universal Serial Bus tree structure.]

USB (Universal Serial Bus) is a specification to establish communication between devices and a host controller (usually a personal computer). USB is intended to replace many varieties of serial and parallel ports. USB can connect computer peripherals such as mice, keyboards, digital cameras, printers, personal media players, flash drives, and external hard drives; for many of those devices, USB has become the standard connection method. USB was designed for personal computers, but it has become commonplace on other devices such as smartphones, PDAs and video game consoles, and as a power cord between a device and an AC adapter plugged into a wall plug for charging. As of 2008, about 2 billion USB devices were sold per year, and approximately 6 billion had been sold in total.
