
Homework 1

Computer Architecture CIS 655/CSE661

Instructor: Dr. Mo Abdallah

Author name: Milen Dimitrov


Created on: 07/14/2021

Syracuse University, College of Engineering & Computer Science

1. Research and Reading Assignment [minimum two pages total for the whole summary]:
Read and summarize the four papers located in the reading section of Unit 1.

2. Q 1.1 from the book (part a)


Let us assume a wafer yield of 100% and use N = 13.5. The defect rate should be 0.03
defects per cm² (not 0.3).

3. Q 1.5 (part a) from the book.

1. Reading Summary
1.1 Architecture of the IBM System/360
Reading this paper was very pleasant and enlightening, showing the way of thinking of the
computer pioneers. I believe it was included in our first homework to show the difference, the
evolution of the meaning of Computer Architecture as a subject. It starts with a note on its very
first word, Architecture* [1]: “The term architecture is used here to describe the attributes of
a system as seen by the programmer, i.e. the conceptual structure and functional behaviour, as
distinct from the organisation of the data flow and controls, logical design and the physical
implementation.” At that time the term “architecture” included only what the programmer
sees: CPU, instructions, registers, buses, memory organization, I/O, and so on. Today we have
expanded this scope much further, to also include cost, power, cooling, communication, chip
design, clustering, and parallelization.
The paper shows the change in the computing paradigm. Until that time computers were
specialized (scientific, business, military) and were almost completely incompatible and nearly
impossible to upgrade. System/360 was a revolutionary design: IBM made a giant advance in the
field of Computer Architecture by building the first universal and upgradable computer. We can
judge how giant this project was by its cost, estimated at $5B, half of the Manhattan Project,
equivalent to over $300B in today's dollars.
The objectives that IBM achieved were to make computers more universal and easier to use as
their usage and importance increased: memory size grew dramatically (by two orders of
magnitude), along with I/O and processing power, and new storage technologies appeared
(drums, tapes, and disks). They sacrificed backward compatibility in the name of future
compatibility. Intermodel compatibility was an important feature: software could run on machines
of different sizes. Simplification made the system easier to use, shortened the turnaround time,
and increased the total throughput. The new machine was capable of supervising itself
automatically and was used for real-time, multiprogramming, and time-sharing modes.
Floating-point precision was increased. The hardware was abstracted, so programmers could
write their programs without knowledge of it and adapt more easily to different I/O devices.
Difficult decisions were made about data formats and sizes (4, 6, or 8 bits, and 36, 48, or 64 bits
for floating point) at a time when hardware was very large and expensive (200 KB of memory
weighed about 600 lb). Interestingly, a stack organization was reviewed very thoroughly, found
inefficient, and not implemented; an address-register organization was implemented instead as
more flexible.
Interrupts for I/O were implemented. That permitted creating many I/O channels at virtually no
cost, which allowed external memory to be expanded and core memory to be saved.
The most important contributions of this design were the separation of the logical structure from
the physical implementation, the optimization of the most commonly used tasks (in my opinion,
the origin of Amdahl's law), and the overall structure of data, instructions, protection, and
interrupts, which later became industry standards.

1.2 Design of the B5000 system


The B5000 mainframe was an interesting design. Unlike the IBM System/360 and other machines,
where the instruction set (IS) was designed for the CPU and handed to the programmers to
write their code, this machine adopted the opposite approach. It was designed to suit existing
high-level languages (FORTRAN, ALGOL, and COBOL) and to permit an almost mechanical
translation to machine code; no assembler was used. The goal was to reduce the total
turnaround time and increase the throughput. Its ALGOL compiler was so fast (because it was a
one-pass compiler) that it impressed Dijkstra, who immediately purchased several machines for
his university.
In its design we can see the first ideas of virtual memory: one of the design criteria was that a
program should be able to run in any address range, without any modification, independent of
its location. This permitted program segmentation and running large programs in a smaller
core memory by loading and unloading the needed segments, an operation that we know today as
paging. Separating I/O from data processing is another example of abstraction. The
Master Control Program (MCP) was the first operating system, provided with the hardware and
offering interrupts and event handling.
The B5000 relied on a stack organization and implemented Polish notation for mathematical
operations, which reduced the needed memory capacity and programming effort while
increasing efficiency and speed. Memory independence was achieved by using the Program
Reference Table (PRT), which was the first use of named variables as we know them today,
i.e. variables accessed by reference. Keeping addresses in the PRT lets the programmer avoid
using any physical addresses in the code and leaves this to be handled by the MCP.
The B5000 was a hardware-software package that offered memory savings, multiprocessing,
and parallel processing with very high efficiency.

1.3 “The Case for the Reduced Instruction Set Computer” and the
DEC comment on it
I’d like to review these two papers together, as they are tightly related to each other. While
it is easy to judge now that we know the history 40 years later, I will try to be objective.
The original paper treats the effect of the complexity of the instruction set, using the VAX-11
as an example, and DEC answers in a mostly defensive mood. First, I want to say
that it looks like at that time the term “complex instruction” was not well defined, or
perhaps not well understood by the DEC engineers. While it was not explained explicitly by
Patterson and Ditzel, the DEC engineers' first argument is about how to distinguish between
complex and reduced. Today it is well established that an instruction set is complex when
arithmetic or logical instructions can use memory directly as an operand, and that it is not
about the number of instructions: the VAX-11 would be CISC even if it had only 11 instructions,
while some RISC processors have almost 200 instructions.
Patterson and Ditzel make very good points about the reasons for increased complexity,
especially the marketing-strategy reason (which still happens and is still very relevant). DEC's
answers to this are just silly and naive, denying the obvious fact. The argument about
complex-instruction usage is also interesting. The original paper provides some
statistics, but only about the number of times instructions are used, not the time that
they consume or save. DEC goes deeper in that direction and provides valid
arguments, facts, and data, but still does not make completely fair comparisons.
On code density, DEC has a good argument about the need for
dense and efficient code, which is even more valid today, but they give it
disproportionately more importance than it deserves: code makes up a much smaller
percentage of memory than data.
When arguing about complex microcode errors, DEC assumes that when the microcode is
simplified, errors will be shifted to the user code, but fails to recognize that user-code
errors are much easier to fix and debug, and that their scope is much more limited.
When talking about RISC and VLSI, DEC chose to simply say that there are no metrics
and that it is unconvincing. Patterson and Ditzel, even without knowledge of the current
level of technology, caught the right reasons: design time, speed, and better use of the
silicon (for caches and pipelines, which are used universally, rather than for rarely used
complex instructions).
In conclusion, I can say that near the end of the company's life DEC accepted the facts.
After their decline, they finally released their RISC architecture, Alpha, in 1992; it
was a huge success, but not enough to save the company. Currently almost all
CPUs are RISC; even the new Intel x86 CPUs, which expose a CISC ISA, are
actually RISC underneath and run microcode to emulate CISC.

2. Answering Q1.1, page 62, 5th edition, assuming the defect rate is 0.03 defects per cm².
Answers are given inline; work is shown below.
Q1.1 [10/10] <1.6> Figure 1.22 gives the relevant chip statistics that influence the cost
of several current chips. In the next few exercises, you will be exploring the effect of
different possible design decisions for the IBM Power5.
a. [10] <1.6> What is the yield for the IBM Power5? - The yield is 0.225, for a total of 33 good
dies per wafer.
b. [10] <1.6> Why does the IBM Power5 have a lower defect rate than the Niagara and
Opteron? - Because the 130 nm technology was more mature at the time than the 90 nm
process, and also less demanding on the equipment and on contemporary manufacturing
technology, the yield was higher. Every new process step starts with higher defect rates and
improves as experience and knowledge are built up and the process matures.
Calculations:
1. Assuming the wafer yield is 100%, the wafer diameter is 30 cm, N = 13.5, and the defect rate
is 0.03 defects/cm².
2. Dies_per_wafer = (Pi * (Wafer_diameter/2)^2) / Die_area - (Pi * Wafer_diameter) /
sqrt(2 * Die_area) = (3.14 * (15 cm)^2) / 3.89 cm² - (3.14 * 30 cm) / sqrt(2 * 3.89 cm²) =
181.71 total dies - 33.79 edge dies. Rounding full dies down to 181 and edge dies up
to 34 gives a total of 147 untested full dies.
3. Yield = Wafer_yield * 1 / (1 + Defect_rate * Die_area)^N = 1.0 / (1 + 0.03 def/cm²
* 3.89 cm²)^13.5 = 1 / (1.1167^13.5) = 1 / 4.4375 = 0.225 (22.5% good dies).
4. 147 full dies * 0.225 ≈ 33 good dies per wafer.
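
As a sanity check of the arithmetic above, here is a minimal Python sketch of the two formulas
used (dies per wafer and die yield), plugged with the Power5 values assumed above; the
function names are my own:

import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    # gross dies on the wafer minus an estimate of the partial dies lost at the edge
    full = math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
    edge = math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2)
    return math.floor(full) - math.ceil(edge)

def die_yield(defects_per_cm2, die_area_cm2, n, wafer_yield=1.0):
    # yield model from the calculation in step 3
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n

dies = dies_per_wafer(30, 3.89)        # 147 untested full dies
y = die_yield(0.03, 3.89, 13.5)        # ~0.225
print(dies, round(y, 3), round(dies * y))   # 147 0.225 33

Rounding the full dies down (math.floor) and the edge dies up (math.ceil) matches the rounding
choice made in step 2.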

3. Answering Q1.5, page 63


1.5 [10/10/20] <1.5> One critical factor in powering a server farm is cooling. If heat is not
removed from the computer efficiently, the fans will blow hot air back onto the computer, not
cold air. We will look at how different design decisions affect the necessary cooling, and thus
the price, of a system. Use Figure 1.23 for your power calculations.
a. [10] <1.5> A cooling door for a rack costs $4000 and dissipates 14 kW (into the room;
additional cost is required to get it out of the room). How many servers with an Intel Pentium 4
processor, 1 GB 240-pin DRAM, and a single 7200 rpm hard drive can you cool with one
cooling door? - 151 servers.

Calculations:
1. Assumptions: motherboard, NIC, and all peripheral consumption are ignored; power
supplies are assumed to be 80% efficient (from Q1.4a); the HDD is assumed to be idle
60% of the time (from Q1.4b).
2. Average power consumption of the 7200 rpm HDD = 40% * 7.9 W + 60% * 4 W =
0.4 * 7.9 + 0.6 * 4 = 5.56 W.
3. Total consumption of the system (1 Pentium 4 CPU + 1 DIMM + 1 HDD) = 66 W + 2.3 W + 5.56 W
= 73.86 W.
4. Consumption from the power grid: P(AC) = P(DC) / Power_efficiency =
73.86 W / 0.8 = 92.33 W total average consumption.
5. 14 kW (14,000 W) / 92.33 W = 151 servers.
6. This is more than enough, because one standard rack is 42U tall (extended racks go up
to 56U).
7. To make it a bit closer to reality we should add ~100 W for the motherboard, NIC, and the
other controllers, use at least 3 DIMMs (it is unlikely to run a server with 1 DIMM, and very
unusual to run it with only 2; the default HP DL360 G4 is delivered with 2), populating 1
DIMM in each of the 3 channels to maximize performance [2], and assume 73% efficiency.
A system would then more likely consume ~250 W, so 14,000 W / 250 W = 56 servers, about one
rack cabinet.
8. If maximally populated with 2 CPUs, maximum memory, and HDDs, it is 585 W by spec, i.e. 24
servers.
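
The same estimate as a short Python sketch, using the power figures assumed above (the
~250 W per-server figure is the rough estimate from point 7):

DOOR_W = 14000.0                      # one cooling door dissipates 14 kW

hdd_avg_w = 0.4 * 7.9 + 0.6 * 4.0     # 5.56 W average for the 7200 rpm disk
dc_power_w = 66 + 2.3 + hdd_avg_w     # Pentium 4 + 1 DIMM + HDD = 73.86 W (DC side)
ac_power_w = dc_power_w / 0.80        # 80% efficient supply -> ~92.33 W at the wall

print(int(DOOR_W // ac_power_w))      # 151 servers per cooling door
print(int(DOOR_W // 250))             # 56 servers with the more realistic ~250 W estimate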

References:
1. G. M. Amdahl, G. A. Blaauw, F. P. Brooks, "Architecture of the IBM System/360," IBM J. Res.
Develop., Vol. 44, No. 1/2, January/March 2000, p. 21.
2. https://support.hpe.com/hpesc/public/docDisplay?docId=c00374943, accessed
07/14/2021.
