15CS72 IAT1 Solution
September 2019
Sub: Advanced Computer Architecture    Sub Code: 15CS72    Branch: CSE
Date: 23/09/2019    Duration: 90 mins    Max Marks: 50    Sem/Sec: VII A, B, C    OBE
Answer any FIVE FULL questions.    (Marks | CO | RBT indicated against each question)
1 (a) Explain Flynn’s Classification of Computer architecture along with a neat diagram. (08) CO1 L2
Michael Flynn introduced a classification of various computer architectures
based on notions of instruction and data streams.
Single Instruction Stream Single Data Stream (SISD)
It is a uniprocessor system.
The CPU executes a single instruction stream, one instruction at a time.
Instructions are executed sequentially.
Examples: workstations from DEC, HP and IBM; IBM 701, IBM 1620, IBM 7090, etc.
Single Instruction Stream Multiple Data Stream (SIMD)
A single instruction is executed by multiple processing elements or processors.
Each processing element operates on different data.
Data level parallelism is achieved.
Examples: early vector supercomputers of the 1970s such as the CDC Star-100; later machines such as the Connection Machine CM-2, MasPar MP-1, IBM 9000, Cray C90 and Fujitsu VP.
Multiple Instruction Stream Single Data Stream (MISD)
The same data stream flows through a linear array of processors executing different instruction streams. This architecture is also known as a systolic array, used for pipelined execution of specific algorithms.
Few actual examples of this class of parallel computer have ever existed; one is the experimental Carnegie Mellon C.mmp computer (1971).
It is the least popular model for commercial machines.
Multiple Instruction Stream Multiple Data Stream (MIMD)
This is the most popular computer model.
Every processor may execute a different instruction stream, and every processor operates on a different data stream.
MIMD machines are also called parallel computers.
Examples: IBM 370/168 MP, Univac 1100/80.
Parallel computers operate in MIMD mode. There are two classes of MIMD machines: shared-memory multiprocessors and distributed-memory multicomputers.
In a shared-memory multiprocessor system, all processors share a common memory and can communicate through shared variables.
In a distributed-memory multicomputer, each computer node has local memory unshared with other nodes, and inter-processor communication is done through message passing among the nodes.
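A minimal sketch of communication through a shared variable on a shared-memory multiprocessor, using POSIX threads; the variable names and setup are illustrative assumptions, not part of the solution. On a distributed-memory multicomputer the same exchange would instead use explicit message passing (e.g., MPI_Send/MPI_Recv), since no memory is shared between nodes.

#include <pthread.h>
#include <stdio.h>

static int shared_flag = 0;                 /* variable visible to all threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg) {
    (void)arg;
    pthread_mutex_lock(&lock);
    shared_flag = 42;                       /* communicate by writing shared memory */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t;
    pthread_create(&t, NULL, producer, NULL);
    pthread_join(t, NULL);
    printf("consumer read %d through shared memory\n", shared_flag);
    return 0;
}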
1 (b) Describe the 5-tuple operational model of SIMD supercomputers. (02) CO4 L2
The operational model of an SIMD machine is specified by a 5-tuple:
M = (N, C, I, M, R)
N is the number of processing elements (PEs) in the machine. For example, the
Illiac IV had 64 PEs and the Connection Machine CM-2 had 65,536 PEs.
C is the set of instructions directly executed by the control unit (CU).
I is the set of instructions broadcast by the CU to all PEs for parallel execution.
M is the set of masking schemes, where each mask partitions the set of PEs into
enabled and disabled subsets.
R specifies the data-routing schemes to be followed during inter-PE communication.
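As a concrete way to read the 5-tuple, the sketch below represents M = (N, C, I, M, R) as a C structure filled with values for the Illiac IV. Only N = 64 comes from the text above; the instruction, mask and routing names are invented for illustration.

#include <stdio.h>

struct SIMDModel {
    int num_pes;            /* N: number of processing elements          */
    const char **cu_instrs; /* C: instructions executed by the CU itself */
    const char **pe_instrs; /* I: instructions broadcast to all PEs      */
    const char **masks;     /* M: masking schemes (enable/disable PEs)   */
    const char **routing;   /* R: inter-PE data-routing schemes          */
};

int main(void) {
    const char *cu[]  = {"branch", "scalar-add", NULL};
    const char *pe[]  = {"vector-add", "vector-mul", "load", "store", NULL};
    const char *msk[] = {"enable-all", "enable-even-pes", NULL};
    const char *rt[]  = {"shift-by-1", "shift-by-8", NULL};
    struct SIMDModel illiac4 = {64, cu, pe, msk, rt}; /* Illiac IV had N = 64 */
    printf("Illiac IV model: N = %d PEs\n", illiac4.num_pes);
    return 0;
}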
2 (a) Explain the architecture of Vector Supercomputer with a neat diagram. (08) CO1 L3
The program and data are loaded into main memory from the host computer. All instructions are first decoded by the scalar control unit. If the decoded instruction is a scalar operation, it is executed directly by the scalar processor using the scalar functional pipelines.
If the instruction decodes as a vector operation, it is sent to the vector control unit, which manages the flow of vector data between the vector functional units and main memory.
There are multiple vector functional units, all pipelined. Data can be forwarded from one vector functional unit to another, a process called vector chaining.
There are two types of vector processors (see the sketch below):
Register-to-register vector processors, which use vector registers to hold operands and results for the functional pipelines.
Memory-to-memory vector processors, which stream operands directly between main memory and the pipelines.
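The sketch below shows a loop that a register-to-register vector machine would execute as a handful of vector instructions. The pseudo-assembly in the comments is an illustrative assumption (not a real instruction set), and the vector length of 64 is assumed.

/* Y = a*X + Y over one vector-register length. A register-to-register
 * machine would run this roughly as:
 *   VLOAD  V1, X        ; 64 elements into vector register V1
 *   VLOAD  V2, Y
 *   VSMUL  V3, a, V1    ; scalar-vector multiply
 *   VADD   V4, V3, V2   ; chained: V3 results feed VADD as they emerge
 *   VSTORE Y, V4
 * A memory-to-memory machine would instead stream X and Y directly
 * between main memory and the pipelines, without vector registers. */
#define N 64

void saxpy(float a, const float *x, float *y) {
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];
}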
3 (a) Explain UMA Model and COMA Model for shared memory multiprocessor systems with neat diagram. (06) CO1 L3
4 Explain the different types of dependences in a program. Analyze the dependences for the following code segment and draw the dependence graph; assume there is only one functional unit for Load and Store. Note: M(10) contains the value 64. (10) CO1 L4
S1: Load R1,1024
S2: Load R2,M(10)
S3: Add R1,R2
S4: Store M(1024),R1
S5: Store M((R2)),1024
Data Dependence: There are five types of data dependence, as described below.
Flow Dependence: A statement S2 is flow-dependent on S1 if at least one output of S1 feeds in as an input to S2. Flow dependence is denoted S1 → S2.
Anti Dependence: Statement S2 is anti-dependent on statement S1 if S2 follows S1 in program order and the output of S2 overlaps the input of S1. It is denoted by an arrow from S1 to S2 crossed with a bar.
Output Dependence: Two statements are output-dependent if they produce (write) the same output variable. It is denoted by an arrow from S1 to S2 marked with a small circle.
I/O Dependence: Read and write statements are I/O statements. I/O dependence occurs when the same file is referenced by both I/O statements.
Unknown Dependence: The dependence relation between two statements cannot be determined in the following situations:
The subscript of a variable is itself subscripted (indirect addressing), e.g., LOAD R1, @100.
The subscript does not contain the loop index variable.
A variable appears more than once with subscripts having different
coefficients of the loop variable.
The subscript is nonlinear in the loop index variable.
Control Dependence
Conditional statements are evaluated at run time, so the execution path followed cannot be determined in advance.
Different paths taken after a conditional branch may introduce or eliminate data
dependencies among instructions.
Dependence may also exist between operations performed in successive iterations of a loop. Below, one loop example has independent iterations and another has control-dependent iterations; both are shown in the sketch that follows.
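These are C renderings, standing in for the loop examples the text refers to (the exact loops are illustrative assumptions):

/* Independent iterations: the IF in iteration i reads and writes only A[i],
 * so the iterations can execute in parallel. */
void independent_iterations(float *A, const float *C, int n) {
    for (int i = 0; i < n; i++) {
        A[i] = C[i];
        if (A[i] < 0) A[i] = 1;
    }
}

/* Control-dependent iterations: whether A[i] is written depends on A[i-1],
 * a value produced by the previous iteration, so the iterations cannot be
 * executed in parallel. */
void dependent_iterations(float *A, int n) {
    for (int i = 1; i < n; i++) {
        if (A[i - 1] == 0) A[i] = 0;
    }
}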
Resource Dependence
Resource dependence occurs due to conflicts in using shared resources such as integer units, floating-point units, registers or memory areas.
When the conflict involves an ALU it is called ALU dependence, and when the conflict involves storage it is called storage dependence.
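A worked sketch of the analysis the question asks for, modeling the five statements in C (registers and memory as plain variables, addresses as array indices); the printed values are only there to make the sketch runnable.

#include <stdio.h>

int M[2048];                 /* main memory; M[10] is preset to 64 as given */

int main(void) {
    int R1, R2;
    M[10] = 64;              /* given: M(10) contains 64 */
    R1 = 1024;               /* S1 */
    R2 = M[10];              /* S2: R2 = 64 */
    R1 = R1 + R2;            /* S3: flow-dependent on S1 and S2 (reads R1, R2) */
    M[1024] = R1;            /* S4: flow-dependent on S3 (reads R1) */
    M[R2] = 1024;            /* S5: flow-dependent on S2 (R2 supplies the address) */

    /* Dependence graph (solid = flow dependence, dashed = resource dependence):
     *   S1 ──→ S3 ──→ S4
     *   S2 ──→ S3
     *   S2 ─────────→ S5
     *   S1 ┄┄ S2   (both need the single load unit)
     *   S4 ┄┄ S5   (both need the single store unit)
     * Because M(10) = 64 is known, S4 and S5 write different words
     * (M(1024) vs M(64)), so there is no output dependence between them. */
    printf("M[1024] = %d, M[64] = %d\n", M[1024], M[64]);
    return 0;
}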
5 Explain Hardware and Software Parallelism with an example. (10) CO1 L2
Hardware parallelism is the parallelism defined by the machine architecture and hardware multiplicity; for example, a k-issue processor can issue at most k instructions per machine cycle. Software parallelism is defined by the control and data dependences of a program and is revealed in the program's flow graph.
Example of software parallelism versus hardware parallelism:
Consider eight instructions (four loads and four arithmetic operations) to be executed in three consecutive machine cycles: four load operations in the first cycle, two multiply operations in the second cycle, and two add/subtract operations in the third cycle. The parallelism therefore varies from 4 to 2 over the three cycles, and the average software parallelism is 8/3 = 2.67 instructions per cycle.
Now consider execution of the same instructions by a two-issue processor that can execute one memory access (load or store) and one arithmetic operation simultaneously. With this hardware restriction, the program must execute in seven cycles, so the hardware parallelism displays an average value of 8/7 = 1.14 instructions executed per cycle. This demonstrates a mismatch between the software parallelism and the hardware parallelism; both schedules are sketched below.
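A runnable sketch of this example, with both schedules written out in the comments. The concrete program (A*B + C*D and A*B - C*D) and the sample values are assumptions, since the solution does not reproduce the original figures.

#include <stdio.h>

int main(void) {
    int A = 1, B = 2, C = 3, D = 4;      /* assumed sample inputs */

    /* Software parallelism (limited only by dependences, 3 cycles):
     *   cycle 1: L1 L2 L3 L4   (four parallel loads)
     *   cycle 2: X1 X2         (two parallel multiplies)
     *   cycle 3: +  -          (add and subtract)
     *   8 instructions / 3 cycles = 2.67 instructions per cycle */
    int R1 = A, R2 = B, R3 = C, R4 = D;  /* L1..L4 */
    int R5 = R1 * R2, R6 = R3 * R4;      /* X1, X2 */
    int R7 = R5 + R6, R8 = R5 - R6;      /* +, -   */

    /* Hardware parallelism on a two-issue processor (one memory access
     * plus one arithmetic operation per cycle) needs 7 cycles:
     *   cycle 1: L1         cycle 2: L2         cycle 3: L3 X1
     *   cycle 4: L4         cycle 5: X2         cycle 6: +
     *   cycle 7: -
     *   8 instructions / 7 cycles = 1.14 instructions per cycle */
    printf("A*B + C*D = %d, A*B - C*D = %d\n", R7, R8);
    return 0;
}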
To solve the mismatch problem between software parallelism and hardware parallelism, one approach is to develop compilation support, and the other is hardware redesign for more efficient exploitation of parallelism.
6 (a) Compare RISC and CISC with respect to their characteristics and architectural distinctions. (06) CO2 L2
Characteristic | CISC | RISC
Instruction set size and format | Large set of instructions with variable format (16-64 bits per instruction) | Small set of instructions with fixed format (32 bits per instruction)
General-purpose registers and cache design | 8-24 GPRs with a unified cache | 32-192 GPRs with split instruction and data caches
8 What is memory hierarchy? Explain Inclusion, Coherence, Locality properties with neat diagram. (10) CO2 L3
Storage devices such as registers, cache, main memory, disks and backup storage are organized in the form of a hierarchy, as shown below. The cache is at level 1, main memory at level 2, disk at level 3 and backup storage at level 4.
Memory devices at a lower level are faster to access, smaller in size and more expensive per byte; they have a higher bandwidth and use a smaller unit of transfer compared with devices at a higher level.
The access time ti is the round-trip time from the CPU to the i-th level memory. The memory size si is the number of bytes or words in level i. The cost of the i-th level memory is estimated by the product ci × si, where ci is the cost per byte. The bandwidth bi is the rate at which information is transferred between adjacent levels. The unit of transfer xi is the grain size for data transfer between levels i and i+1.
In the hierarchy, ti-1 < ti, si-1 < si, ci-1 > ci, bi-1 > bi and xi-1 < xi for i = 1, 2, 3, 4.
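A small sketch that makes these parameters concrete. Every number below is an illustrative assumption (not from the solution), chosen only so that the monotonic relations above hold; the loop prints the ci × si cost estimate for each level.

#include <stdio.h>

struct Level {
    const char *name;
    double t_ns;        /* access time ti (ns) */
    double s_bytes;     /* size si (bytes)     */
    double c_per_byte;  /* cost per byte ci    */
};

int main(void) {
    struct Level m[4] = {            /* assumed, illustrative figures */
        {"cache (M1)",       5e0,  512e3, 1e-4},
        {"main memory (M2)", 1e2,  8e9,   1e-6},
        {"disk (M3)",        1e7,  1e12,  1e-9},
        {"backup (M4)",      1e10, 1e14,  1e-11},
    };
    for (int i = 0; i < 4; i++)
        printf("%-16s t = %.0e ns, s = %.0e B, cost c*s = %.1f\n",
               m[i].name, m[i].t_ns, m[i].s_bytes,
               m[i].c_per_byte * m[i].s_bytes);
    return 0;
}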
Inclusion
The inclusion property is stated as M1 ⊂ M2 ⊂ M3 ⊂ … ⊂ Mn. It implies that all information items are originally stored in the outermost level Mn. During processing, subsets of Mn are copied into Mn-1; similarly, subsets of Mn-1 are copied into Mn-2, and so on. Hence, if a word is found in Mi, then the same word can also be found in Mi+1, Mi+2, …, Mn, but it may not be found in Mi-1.
Coherence
The coherence property requires that copies of the same information item at successive
memory levels must be consistent. If a word is modified in the cache, copies of that word
must be updated immediately or eventually at all higher levels. In general, there are two
strategies for maintaining the coherence in a memory hierarchy.
The first method is called write-through (WT), which demands immediate update in Mi+1
if a word is modified in Mi.
The second method is write-back (WB), which delays the update in Mi+1 until the word
being modified in Mi is replaced or removed from Mi.
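A minimal sketch contrasting the two strategies for a single cached word; the variable names and the one-word "cache" are assumptions made for illustration.

#include <stdbool.h>
#include <stdio.h>

static int memory_word;     /* the copy in M(i+1)     */
static int cache_word;      /* the copy in M(i)       */
static bool dirty;          /* write-back bookkeeping */

/* Write-through: M(i+1) is updated immediately on every write to M(i). */
static void write_through(int v) {
    cache_word = v;
    memory_word = v;
}

/* Write-back: the update to M(i+1) is delayed until the word is replaced. */
static void write_back(int v) {
    cache_word = v;
    dirty = true;           /* remember that M(i+1) is now stale */
}

static void replace_word(void) {
    if (dirty) {            /* flush to M(i+1) only if modified */
        memory_word = cache_word;
        dirty = false;
    }
}

int main(void) {
    write_back(42);
    printf("before replacement: memory = %d, cache = %d\n", memory_word, cache_word);
    replace_word();
    printf("after replacement:  memory = %d, cache = %d\n", memory_word, cache_word);
    write_through(7);       /* memory and cache agree immediately */
    printf("after write-through: memory = %d, cache = %d\n", memory_word, cache_word);
    return 0;
}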
Locality of References
The CPU refers to memory either to fetch instructions or to access data. Memory references tend to cluster in time, space and ordering; hence there are three dimensions of locality of reference: temporal, spatial and sequential locality.
Temporal Locality: Recently referenced items i.e. instructions or data are likely to be
referenced again in the near future. This is often caused by special program constructs
such as iterative loops, process stacks, temporary variables, or subroutines. Once a loop
is entered or a subroutine is called, a small code segment will be referenced repeatedly
many times.
Spatial Locality: This refers to the tendency for a process to access items whose
addresses are near one another. For example, operations on tables or arrays involve
accesses of a certain clustered area in the address space.
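A small sketch of spatial locality in array traversal; the array size and the two traversal orders are illustrative assumptions.

#include <stdio.h>
#define N 1024
static double a[N][N];

/* Row-major traversal touches adjacent addresses, so successive accesses
 * fall in the same clustered area (good spatial locality in C). */
static double sum_row_major(void) {
    double s = 0;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            s += a[i][j];
    return s;
}

/* Column-major traversal strides N * sizeof(double) bytes between
 * accesses, defeating spatial locality in C's row-major layout. */
static double sum_col_major(void) {
    double s = 0;
    for (int j = 0; j < N; j++)
        for (int i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

int main(void) {
    a[3][5] = 1.0;
    printf("%.1f %.1f\n", sum_row_major(), sum_col_major());
    return 0;
}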
Sequential Locality: In typical programs, the execution of instructions follows a sequential order unless branch instructions create out-of-order execution. The ratio of in-order to out-of-order execution is roughly 5 to 1 in ordinary programs. In addition, access to a large data array also tends to follow a sequential order.