Cao Unit 6
1) SISD (Single Instruction - Single Data stream)
» One control unit (CU) issues an instruction stream (IS) to one processor unit (PU), which operates on a single data stream (DS) with the memory module (MM)
» Instructions may still be overlapped inside the PU (e.g. a pipelined incrementer or floating-point divide unit)
fig: a — SISD organization: CU --IS--> PU --DS--> MM
2) SIMD (Single Instruction - Multiple Data stream)
» Vector or array operations: one vector instruction applies the same operation to many data streams
» Example systems : CRAY-1, ILLIAC-IV
fig — SIMD organization: one CU broadcasts the IS to PU 1 … PU n; each PU i handles its own data stream DS i with memory module MM i (shared memory)
3) MISD (Multiple Instruction - Single Data stream)
» The single data stream becomes a bottleneck
fig — MISD organization: CU 1 … CU n issue instruction streams IS 1 … IS n to PU 1 … PU n, all operating on one data stream (DS) from the shared memory (MM 1 … MM n)
4) MIMD (Multiple Instruction - Multiple Data stream)
» Multiprocessor system
fig — MIMD organization: CU 1 … CU n drive PU 1 … PU n with independent instruction streams IS 1 … IS n and data streams to MM 1 … MM n (shared memory)
9-2 Pipelining
Pipelining
» Decomposing a sequential process into sub-operations
» Each sub-operation is executed in a special dedicated segment, concurrently with the other segments
fig — space-time diagram: segments 1–4 each processing tasks T1–T6 in successive clock cycles
Speedup S for n tasks on a k-segment pipeline (clock period tp) versus a non-pipelined unit (task time tn):
  S = n · tn / ((k + n − 1) · tp)
For large n, and with a non-pipelined task time tn = k · tp:
  S = tn / tp = k · tp / tp = k
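The speedup relation above can be checked numerically; a minimal sketch (the segment count and clock period below are illustrative values, not from the text):

```python
# Pipeline speedup: a k-segment pipeline with clock period tp finishes
# n tasks in (k + n - 1) cycles; a non-pipelined unit needs n * tn.
def speedup(n, k, tp, tn):
    pipelined = (k + n - 1) * tp
    nonpipelined = n * tn
    return nonpipelined / pipelined

k, tp = 4, 20           # 4 segments, 20 ns clock (illustrative)
tn = k * tp             # equivalent non-pipelined task time
print(round(speedup(100, k, tp, tn), 2))    # 3.88 -- approaching k
print(round(speedup(10000, k, tp, tn), 2))  # 4.0  -- the limit S = k
```

As the example shows, the theoretical maximum speedup S = k is only reached asymptotically, because the first task still has to fill all k segments.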
Floating-point adder pipeline : 4-segment suboperations
Inputs: X = A × 10^a, Y = B × 10^b (mantissas A, B; exponents a, b)
» 1) Segment 1 : Compare exponents by subtraction (here 3 − 2 = 1)
» 2) Segment 2 : Align the mantissa of the number with the smaller exponent
» 3) Segment 3 : Add or subtract the mantissas
» 4) Segment 4 : Normalize the result and adjust the exponent
Example:
  X = 0.9504 × 10^3
  Y = 0.8200 × 10^2 → aligned: Y = 0.08200 × 10^3
  Z = X + Y = 1.0324 × 10^3 → normalized: Z = 0.10324 × 10^4
fig — pipeline diagram: registers R between the segments (Compare exponents by subtraction → Align mantissas → Add/subtract mantissas → Adjust exponent / Normalize result)
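The four segments can be traced in code; a decimal sketch of the worked example (a real unit operates on binary mantissas, and the normalization step here is simplified to the overflow case):

```python
# Four-segment floating-point addition, decimal mantissa/exponent sketch.
def fp_add(m1, e1, m2, e2):
    # Segment 1: compare exponents by subtraction
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    diff = e1 - e2
    # Segment 2: align the mantissa with the smaller exponent
    m2 = m2 / (10 ** diff)
    # Segment 3: add the mantissas
    m, e = m1 + m2, e1
    # Segment 4: normalize the result and adjust the exponent
    # (only the >= 1.0 overflow case; underflow renormalization omitted)
    while m >= 1.0:
        m /= 10
        e += 1
    return round(m, 5), e

# X = 0.9504 x 10^3, Y = 0.8200 x 10^2
print(fp_add(0.9504, 3, 0.8200, 2))  # (0.10324, 4)
```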
Instruction Cycle : four-segment instruction pipeline
» 1) FI : Fetch instruction
» 2) DA : Decode instruction and calculate effective address
» 3) FO : Fetch operand
» 4) EX : Execution
Timing of Instruction Pipeline : fig: b
» Instruction 3 is a branch: the instructions fetched after it must be held until the branch resolves, then fetching restarts at the branch target
fig: b — timing diagram, steps 1–7 through segments FI DA FO EX, shown for the no-branch and branch cases
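The no-branch timing can be sketched directly; a minimal model in which instruction i enters FI at cycle i and advances one segment per cycle:

```python
# Timing of a four-segment instruction pipeline (FI, DA, FO, EX),
# assuming no branches: instruction i occupies segment j at cycle i + j.
STAGES = ["FI", "DA", "FO", "EX"]

def timing(n):
    """Cycle at which each of n instructions occupies each segment."""
    return {i: {s: i + j for j, s in enumerate(STAGES)}
            for i in range(1, n + 1)}

t = timing(7)
print(t[1])        # {'FI': 1, 'DA': 2, 'FO': 3, 'EX': 4}
print(t[7]["EX"])  # 10: 7 instructions finish in k + n - 1 = 10 cycles
```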
Chap. 9 Pipeline and Vector Processing
Branch Prediction
» Branch prediction uses additional hardware logic to guess the outcome of a conditional branch before it executes, so fetching can continue down the predicted path
fig — delayed-branch timing: a sequence (…, add, subtract, branch to X) traced through segments I, A, E; without rearrangement a no-operation must follow the branch, with a delayed branch a useful instruction fills the slot and the instruction at X enters one step earlier
Vector processor
» A single vector instruction replaces an entire loop, e.g.
  C(1:100) = A(1:100) + B(1:100)
» Instruction format: ADD  A  B  C  100  (operation code, base address of source 1, base address of source 2, base address of destination, vector length)
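The effect of that single vector instruction can be sketched as the loop it replaces:

```python
# What ADD A B C 100 encodes: one operation applied element-wise over
# two source vectors of the given length, stored to the destination.
def vector_add(A, B, length):
    return [A[i] + B[i] for i in range(length)]

A = list(range(1, 101))        # A(1:100) = 1..100
B = list(range(100, 0, -1))    # B(1:100) = 100..1
C = vector_add(A, B, 100)      # C(1:100) = A(1:100) + B(1:100)
print(C[0], C[-1])             # 101 101 -- every element is 101
```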
Matrix Multiplication
» Multiplying two 3 × 3 matrices requires n² = 9 inner products, each an n = 3 term multiply-add
fig — pipeline computing an inner product: products A1B1 … A8B8 flow through a pipelined multiplier and adder
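The decomposition into inner products can be written out explicitly; a sketch for the 3 × 3 case:

```python
# 3 x 3 matrix multiplication as n^2 = 9 inner products,
# each inner product needing n = 3 multiply-add operations.
def inner_product(row, col):
    return sum(a * b for a, b in zip(row, col))

def matmul3(A, B):
    n = len(A)
    cols = list(zip(*B))                      # columns of B
    return [[inner_product(A[i], cols[j])     # one inner product per entry
             for j in range(n)] for i in range(n)]

M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(matmul3(M, I) == M)  # True -- 9 inner products were computed
```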
Supercomputer
» Supercomputer = Vector Instructions + Pipelined floating-point arithmetic
Performance Evaluation Index
» MIPS : Millions of Instructions Per Second
» FLOPS : Floating-point Operations Per Second
  megaflops : 10^6 FLOPS, gigaflops : 10^9 FLOPS
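A FLOPS rating is just an operation count divided by execution time; a toy calculation (the operation count below is illustrative):

```python
# FLOPS = floating-point operations executed / execution time in seconds.
def flops(operations, seconds):
    return operations / seconds

rate = flops(80_000_000, 1.0)   # 80 million FP ops in 1 s (illustrative)
print(rate / 1e6, "megaflops")  # 80.0 megaflops -- the Cray-1 rating
```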
Cray supercomputer : Cray Research
» Cray-1 : 80 megaflops, 4 million 64-bit words of memory
» Cray-2 : 12 times more powerful than the Cray-1
VP supercomputer : Fujitsu
» VP-200 : 300 megaflops, 32 million words of memory, 83 vector instructions, 195 scalar instructions
» VP-2600 : 5 gigaflops
fig — SIMD array processor: a master control unit broadcasts instructions to processing elements PE 1 … PE n, each with a local memory M1 … Mn
Multiprocessors
Time-shared common bus
fig: a — a single common shared bus connects the memory unit, CPU 1 … CPU 3, and IOP 1 … IOP 2
fig: b — system bus with local buses: each processor module has its own CPU, IOP, local memory, and a system bus controller linking its local bus to the common shared memory
Multiport memory and crossbar switch
fig: a — multiport memory: each memory module MM 1 … MM 4 has a separate port (data, address, and control lines) for each of CPU 1 … CPU 4
fig: b — crossbar switch: a grid of crosspoints connects CPU 1 … CPU 4 to the memory modules MM 1 … MM 4
fig: c — block diagram of one crosspoint: multiplexers with arbitration logic select the data, address, and control lines from one CPU; read/write and memory-enable signals go to the memory module
Multistage switching network
fig: a — the four states of a 2 × 2 interchange switch: A connected to 0, A connected to 1, B connected to 0, B connected to 1
fig: b — binary tree of 2 × 2 switches connecting processors P0 and P1 to eight destinations 000 … 111
fig: c — 8 × 8 omega switching network: three stages of 2 × 2 switches route any input 000 … 111 to any output 000 … 111; at each stage one bit of the destination address (MSB first) controls the switch, 0 selecting the upper output and 1 the lower
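The destination-bit routing rule of the omega network can be simulated; a sketch assuming the standard shuffle-exchange formulation (perfect shuffle between stages, exchange inside each switch):

```python
# 8x8 omega network routing: between stages the 3-bit line address is
# perfect-shuffled (rotate left); inside each 2x2 switch the low-order
# bit is set from the next destination bit (MSB first).
def omega_route(src, dst, bits=3):
    addr, path = src, [src]
    mask = (1 << bits) - 1
    for stage in range(bits):
        addr = ((addr << 1) | (addr >> (bits - 1))) & mask  # perfect shuffle
        d_bit = (dst >> (bits - 1 - stage)) & 1             # next dest bit
        addr = (addr & ~1) | d_bit                          # switch setting
        path.append(addr)
    return path

print(omega_route(0b101, 0b011))  # [5, 2, 5, 3] -- arrives at 011
```

After the last stage the line address always equals the destination, whatever the source, which is why one destination address suffices to set every switch along the path.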
Hypercube interconnection
fig — binary n-cube structures for n = 1, 2, 3: a line (0, 1), a square (00, 01, 11, 10), and a cube (000 … 111); nodes whose addresses differ in exactly one bit are directly connected
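Routing in the n-cube follows from that one-bit-difference property: XOR the source and destination addresses and flip the differing bits one at a time. A sketch:

```python
# Hypercube routing: the XOR of source and destination gives the axes
# to traverse; each hop flips one differing bit.
def hypercube_route(src, dst, n=3):
    path = [src]
    diff = src ^ dst
    for bit in range(n):
        if diff & (1 << bit):
            src ^= (1 << bit)   # move one hop along this axis
            path.append(src)
    return path

print([format(p, "03b") for p in hypercube_route(0b010, 0b111)])
# ['010', '011', '111'] -- two hops for two differing bits
```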
Dynamic bus arbitration algorithms
» LRU (least recently used)
» FIFO (first-in, first-out)
» Rotating daisy-chain
fig — parallel arbitration logic: a bus-busy line and a 2 × 4 decoder select the processor granted the bus
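The rotating daisy-chain policy can be sketched as a software arbiter (the function name and interface are illustrative, not from the text):

```python
# Rotating daisy-chain arbitration: after each grant, highest priority
# rotates to the CPU just after the one that was served, so no
# requester is starved.
def rotating_daisy_chain(requests, last_granted, n=4):
    for offset in range(1, n + 1):    # scan in rotated priority order
        cpu = (last_granted + offset) % n
        if cpu in requests:
            return cpu
    return None                       # no requester: bus stays idle

print(rotating_daisy_chain({0, 2}, last_granted=0))  # 2
print(rotating_daisy_chain({0, 2}, last_granted=2))  # 0
```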