Computer Architecture and Organization
• ENIAC (Electronic Numerical Integrator And Computer)
• Eckert and Mauchly
• University of Pennsylvania
• Trajectory tables for weapons
• Started 1943
• Finished 1946 (too late for the war effort)
• Used until 1955
• Input/output (I/O) equipment operated by the control unit
• Each instruction cycle consists of two sub-cycles: an instruction fetch cycle and an instruction execute cycle.
• In the fetch cycle, the opcode of the next instruction is loaded into the IR and the address portion is loaded into the MAR.
• The instruction may be taken from the IBR, or obtained from memory by loading a word into the MBR and then passing it down to the IBR, IR, and MAR.
• Only one register (the MAR) is used to specify the address in memory for a read or write, and only one register (the MBR) is used for the source or destination; a sketch of this fetch sub-cycle follows.
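As a rough, non-authoritative sketch of the fetch sub-cycle described above: the word layout follows the IAS design (two 20-bit instructions per 40-bit word, each with an 8-bit opcode and a 12-bit address), while the Python names and the sample memory contents are invented for illustration.

```python
# Minimal sketch of an IAS-style fetch sub-cycle (illustrative only).
# A 40-bit word holds two 20-bit instructions: 8-bit opcode + 12-bit address.

memory = {0: 0b00000001_000000000101_00000010_000000000110}  # one example word

PC, IBR, IR, MAR, MBR = 0, None, None, None, None

def fetch():
    """Load the opcode into IR and the address into MAR (fetch sub-cycle)."""
    global PC, IBR, IR, MAR, MBR
    if IBR is not None:
        # Next instruction is already buffered in the IBR: no memory access.
        instr, IBR = IBR, None
    else:
        # Fetch a full word from memory into the MBR.
        MAR = PC
        MBR = memory[MAR]
        instr = (MBR >> 20) & 0xFFFFF   # left instruction goes to IR/MAR
        IBR = MBR & 0xFFFFF             # right instruction is buffered
        PC += 1                         # simplified: PC advances per word fetched
    IR = (instr >> 12) & 0xFF           # 8-bit opcode
    MAR = instr & 0xFFF                 # 12-bit address
    return IR, MAR

print(fetch())  # (1, 5): left instruction fetched from memory
print(fetch())  # (2, 6): right instruction taken from the IBR
```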
• Data transfer: Move data between memory and ALU registers or between two
ALU registers.
• The use of high-level programming languages, and system software that provided the ability to load programs (the beginning of the operating system)
• Data channels: an independent I/O module with its own processor and instruction set; relieves the CPU of a considerable processing burden.
• Multiplexor: the termination point for data channels, the CPU, and memory; schedules access to memory from the CPU and the data channels.
• NCR and RCA were the front runners in producing small transistor machines
• IBM followed with the 7000 series
• DEC (Digital Equipment Corporation) was founded in 1957
• Produced the PDP-1
• Branch prediction: The processor looks ahead in the instruction code fetched
from memory and predicts which branches, or groups of instructions, are likely
to be processed next. The more sophisticated examples of this strategy predict
not just the next branch but multiple branches ahead.
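The text does not name a specific prediction scheme; purely as an illustration, the sketch below shows a classic two-bit saturating-counter predictor, one common way to predict a branch outcome from its recent history.

```python
# Illustrative two-bit saturating-counter branch predictor (not from the text).
# States 0-1 predict "not taken", states 2-3 predict "taken".

class TwoBitPredictor:
    def __init__(self, table_size=1024):
        self.table = [1] * table_size   # start weakly "not taken"

    def predict(self, pc):
        return self.table[pc % len(self.table)] >= 2   # True = predict taken

    def update(self, pc, taken):
        i = pc % len(self.table)
        self.table[i] = min(3, self.table[i] + 1) if taken else max(0, self.table[i] - 1)

# A loop branch that is taken nine times and then falls through once:
p = TwoBitPredictor()
correct = 0
for taken in [True] * 9 + [False]:
    if p.predict(0x40) == taken:
        correct += 1
    p.update(0x40, taken)
print(f"{correct}/10 predictions correct")  # 8/10: first and last are mispredicted
```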
• Speculative execution: Using branch prediction and data flow analysis, some
processors speculatively execute instructions ahead of their actual appearance
in the program execution, holding the results in temporary locations. This
enables the processor to keep its execution engines as busy as possible by
executing instructions that are likely to be needed.
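Again only as an illustrative sketch (the text describes the idea in general terms), the fragment below executes the predicted path into a temporary copy of the register state and then commits or squashes those results once the branch outcome is known; all names are invented for the example.

```python
# Illustrative model of speculative execution past a predicted branch.
# Results are held in a temporary location and committed only if the
# prediction turns out to be correct.

regs = {"r1": 10, "r2": 0}   # architectural register state

def execute_speculatively(speculative_path):
    """Run the predicted path into a scratch copy of the register file."""
    scratch = dict(regs)      # temporary location for speculative results
    for op in speculative_path:
        op(scratch)
    return scratch

def resolve(branch_actually_taken, predicted_taken, scratch):
    """Commit speculative results on a correct prediction, else squash them."""
    if branch_actually_taken == predicted_taken:
        regs.update(scratch)  # commit to architectural state
        return "committed"
    return "squashed"         # discard scratch; architectural state untouched

path = [lambda s: s.update(r2=s["r1"] * 2)]   # work beyond the branch
scratch = execute_speculatively(path)
print(resolve(branch_actually_taken=True, predicted_taken=True, scratch=scratch))
print(regs)   # {'r1': 10, 'r2': 20}
```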
• We can refine this formulation by recognizing that during the execution of an instruction,
part of the work is done by the processor, and part of the time a word is being transferred
to or from memory. In this latter case, the time to transfer depends on the memory cycle
time, which may be greater than the processor cycle time.
• The processor time T needed to execute a given program then becomes T = Ic × [p + (m × k)] × τ, where Ic is the instruction count, p is the number of processor cycles needed to decode and execute the instruction, m is the number of memory references needed, k is the ratio between memory cycle time and processor cycle time, and τ is the processor cycle time.
• The five performance factors in the preceding equation (Ic, p, m, k, τ) are influenced by
four system attributes: the design of the instruction set (known as instruction set
architecture), compiler technology (how effective the compiler is in producing an efficient
machine language program from a high-level language program), processor
implementation, and cache and memory hierarchy.
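A short worked example of the equation, with numbers invented purely for illustration:

```python
# Hypothetical values, only to illustrate T = Ic * [p + (m * k)] * tau.
Ic  = 2_000_000   # executed instruction count
p   = 4           # processor cycles to decode and execute one instruction
m   = 1.5         # average memory references per instruction
k   = 3           # memory cycle time / processor cycle time
tau = 0.5e-9      # processor cycle time in seconds (2 GHz clock)

T = Ic * (p + m * k) * tau
print(f"T = {T * 1e3:.2f} ms")   # 2e6 * 8.5 * 0.5e-9 s = 8.50 ms
```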
• Conclusions
• If f is small, the use of parallel processors has little effect
• As N → ∞, the speedup is bounded by 1/(1 – f)
• Diminishing returns for using more processors (illustrated below)
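These conclusions follow from Amdahl's law, speedup = 1 / [(1 − f) + f/N], where f is the fraction of the program that can run in parallel and N is the number of processors; the sample values below are invented for illustration.

```python
# Amdahl's law: diminishing returns as the processor count N grows.
def speedup(f, n):
    """Speedup of a program whose parallelizable fraction is f on n processors."""
    return 1.0 / ((1.0 - f) + f / n)

for f in (0.5, 0.9, 0.95):
    print(f"f={f}:",
          [round(speedup(f, n), 2) for n in (1, 2, 8, 64, 1024)],
          "bound:", round(1 / (1 - f), 1))
# f=0.5 can never exceed 2x and f=0.9 never exceeds 10x, regardless of N.
```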