
COURSE

GUIDE

CIT314
COMPUTER ARCHITECTURE AND ORGANIZATION II

Course Team: Dr. Godwin Udoinyang – Developer/Writer
Prof. Steve Adeshina – Content Editor
Dr. Francis B. Osang – HOD/Internal Quality Control Expert

NATIONAL OPEN UNIVERSITY OF NIGERIA


CIT 314 COURSE GUIDE

National Open University of Nigeria


University Village, Plot 91
Jabi Cadastral Zone
Nnamdi Azikiwe Expressway
Jabi, Abuja

Lagos Office
14/16 Ahmadu Bello Way
Victoria Island, Lagos
Departmental email: computersciencedepartment@noun.edu.ng
NOUN e-mail: centralinfo@noun.edu.ng
URL: www.nou.edu.ng
First Printed 2022

ISBN: 978-058-557-5

All Rights Reserved

Printed by: NOUN PRESS


January 2022

CONTENTS

Introduction
What You Will Learn in This Course
Course Aims
Course Objectives
Working through This Course
Course Materials
Study Units
Textbooks and References
Assignment File
Presentation Schedule
Assessment
Tutor-Marked Assignments (TMAs)
Final Examination and Grading
Course Marking Scheme
Course Overview
How to Get the Most from This Course
Facilitators/Tutors and Tutorials
Summary


COURSE GUIDE

INTRODUCTION

CIT314 – Computer Architecture and Organization II – is a 3-credit unit course. Keeping pace with technological change is an issue for all
computing courses and texts. Systems which seemed capable of holding
their advanced position within the market-place for several years, are
now overtaken within months of launch. Software tools are being
developed and adopted by commercial programmers long before
universities have had a chance. We all learn differently, but the ability to
use text effectively has been at the core of modern civilization for a long
time. We all benefit so much from people’s experience recorded on
paper for others to read. Ignoring this vast resource is deliberately
handicapping yourself. Life is difficult enough without conceding an
unnecessary penalty! If anything, the introduction of the World Wide
Web has placed even greater literacy demands on everyone. Most Web
pages presenting useful information still depend heavily on text. A
picture may be worth a thousand words, but it is often the accompanying
text that gives you the first glimmer of understanding.

This course material is about the structure and function of computers. Its purpose is to present, as clearly and completely as possible, the nature and characteristics of modern-day computer systems.

This task is challenging for several reasons. First, there is a tremendous variety of products that can rightly claim the name of computer, from single-chip microprocessors costing a few dollars to supercomputers costing tens of millions of dollars. Variety is exhibited not only in cost,
but in size, performance, and application. Second, the rapid pace of
change that has always characterized computer technology continues
with no letup. These changes cover all aspects of computer technology,
from the underlying integrated circuit technology used to construct
computer components, to the increasing use of parallel organization
concepts in combining those components.

In spite of the variety and pace of change in the computer field, certain fundamental concepts apply consistently throughout. The application of these concepts depends on the current state of the technology and the price/performance objectives of the designer. The intent of this course material is to provide a thorough discussion of the fundamentals of computer organization and architecture and to relate these to contemporary design issues.

It is a course for B.Sc. Computer Science major students, and is normally taken in the third year of the programme. It should therefore appeal to anyone concerned with understanding basic computer architecture and its organization.

This course is divided into four modules. The first module deals with an
overview of the memory system. The second module covers, memory
addressing, elements of memory hierarchy, and virtual memory control
systems. The third module discusses various forms of control, including hardwired, microprogrammed, and asynchronous forms.
The fourth and last module takes on fault tolerant computing and
methods for fault-tolerant computing.

This course guide gives you a brief overview of the course contents,
course duration, and course materials.

COURSE COMPETENCIES

First, students will learn about the basics of computers and what they are
made up of. Second, they will be able to judge certain functionalities in
computer systems dependent on the type of architectures they are
operating on. This in turn will give them a deeper understanding on how
to manage computer faults.

In general, this course is designed to guide them along a path toward understanding how a computer system is built to operate.

COURSE OBJECTIVES

Certain objectives have been set out for the achievement of the course aims. Apart from the course objectives, each unit of this course has its own objectives, which you need to confirm are met at the end of each unit. So, upon completion of this course, you should be able to:

 Describe how computer memories function and how they can be optimized
 Explain major functions and techniques involving architecture design and study
 Explain methods to tolerate faults in computer architectures
 Explain methods to optimize control in computer systems

WORKING THROUGH THIS COURSE

In order to have a thorough understanding of the course units, you will need to read and understand the contents, and practice the steps and
techniques involved in the task of computer architecture and
organization and its involvement in the development of various
segments of computer systems.

This course is designed to cover approximately seventeen (17) weeks, and requires your devoted attention: answering the exercises and tutor-marked assignments and submitting them to your tutors.

STUDY UNITS

There are 10 units in this course:

MODULE ONE

UNIT ONE: Memory system

1.1 Main Memories


1.2 Auxiliary Memories
1.3 Memory Access Methods
1.4 Memory Mapping and Virtual Memories
1.5 Replacement Algorithms
1.6 Data Transfer Modes
1.7 Parallel Processing
1.8 Pipelining

MODULE TWO

UNIT ONE: Memory Addressing

1.1 What is a Memory address mode?


1.2 Modes of addressing
1.3 Number of addressing modes
1.4 Advantages of addressing modes
1.5 Uses of addressing modes

UNIT TWO: Elements of Memory Hierarchy

2.1 What is Memory Hierarchy?


2.2 Memory Hierarchy Diagram
2.3 Characteristics of Memory Hierarchy
2.4 Memory Hierarchy Design
2.5 Advantages of Memory Hierarchy

UNIT THREE: Virtual Memory control systems

3.1 Memory Management Systems


3.2 Paging
3.3 Address mapping using Paging
3.4 Address Mapping using Segments
3.5 Address Mapping using Segmented Paging


3.6 Multi-Programming
3.7 Virtual Machines/Memory and Protection
3.8 Hierarchical Memory systems
3.9 Drawbacks that occur in Virtual Memories

MODULE THREE

UNIT ONE: Hardware control

3.1.1 Hardwired Control Unit


3.1.2 Design of a hardwired Control Unit

UNIT TWO: Micro-Programmed Control

3.2.1 Design of a Micro-Programmed Control Unit


3.2.2 Differences between Hardwired and Microprogrammed Control
3.2.3 Organization of Micro-Programmed Control Unit
3.2.4 Types of Micro-programmed Control Unit

UNIT THREE: Asynchronous Control

3.3.1 Clock limitations


3.3.2 Basic Concepts
3.3.3 Benefits of Asynchronous Control
3.3.4 Asynchronous Communication
3.3.5 Asynchronous Transmission
3.3.6 Synchronous vs. Asynchronous Transmission
3.3.7 Emerging application areas
3.3.8 Asynchronous Data paths and Data Transfer
3.3.9 Handshaking

MODULE FOUR

UNIT ONE: Fault Tolerant Computing

3.0.1.1 What is Fault Tolerance


3.0.1.2 Fault Tolerant Systems
3.0.1.3 Hardware and Software Fault Tolerant Issues
3.0.1.4 Fault Tolerance VS High Availability
3.0.1.5 Redundancy
3.0.1.6 Relationship Between Security and Fault Tolerance


UNIT TWO: Methods for Fault Tolerant Computing

3.0.2.0 Fault Tree Analysis


3.0.2.1 Fault Detection Methods
3.0.2.2 Fault Tolerance Architecture
3.0.2.3 Fault Models
3.0.2.4 Fault Tolerance Methods
3.0.2.5 Major Issues in Modelling and Evaluation
3.0.2.6 Fault Tolerance for Web Applications

You should make use of the course materials, and do the exercises to
enhance your learning.

REFERENCES AND FURTHER READINGS

Adamski, M., Barkalov, A.: Architectural and Sequential Synthesis of


Digital Devices. University of Zielona Góra Press, Zielona Góra
(2006). URL:
https://www.sciencedirect.com/science/article/pii/S147466701632
3667

Agerwala, T.: Microprogram optimization: A survey. IEEE Transactions


on Computers (10), 962–973 (1976). URL:
https://ieeexplore.ieee.org/document/1674537

Ailamaki AG, DeWitt DJ., Hill MD, Wood DA. DBMSs on a modern
processor: where does time go? In: Proceedings of the 25th
International Conference on Very Large Data Bases; 1999. p. 266–
77. URL: https://www.semanticscholar.org/paper/DBMSs-on-a-
Modern-Processor%3A-Where-Does-Time-Go-Ailamaki-
DeWitt/54b92179ede08158e2cf605f5e9f264ca06c01ff

Amma A. D. T., Pramod V. R and N. Radhika, (2012) “ISM for


Analyzing the Interrelationship between the Inhibitors of Cloud
Computing”, vol. 2, No. 3. URL:
https://www.academia.edu/12392163/Revisiting_Software_Securit
y_Durability_Perspective

Anderson T. and Knight J. C. (1983), “A Framework for software Fault


tolerance in Real time System”, IEEE Transaction on software
Engineering, Vol. 9, No.3. URL:
https://www.cse.cuhk.edu.hk/~lyu/book/sft/pdf/chap8.pdf

Asanovic, Krste (2017). The RISC V Instruction Set


Manual (PDF) (2.2 ed.). Berkeley: RISC-V Foundation.

Astha Singh. "Computer Organization - Control Unit and


design". GeeksforGeeks. Retrieved 25 May 2019.

Balaji E. and Krishnamurthy P. (1996). “Modeling ASIC memories in


VHDL”. In: Design Automation Conference, with EURO-VHDL
’96 and Exhibition, Proceedings EURODAC ’96, European, pp.
502–508. DOI: 10.1109/EURDAC.1996.558250.

Chattopadhyay, S.: Area conscious state assignment with flip-flop and


output polarity selection for finite state machines synthesis – a
genetic algorithm. The Computer Journal 48(4), 443–450 (2005).
URL:
https://www.researchgate.net/publication/220459930_Area_Consci
ous_State_Assignment_with_Flip-
Flop_and_Output_Polarity_Selection_for_Finite_State_Machine_S
ynthesis--A_Genetic_Algorithm_Approach

Clare, C. R.: Designing Logic Systems Using State Machines.


McGraw-Hill Book Company. 1973. URL: http://bitsavers.trailing-
edge.com/pdf/hp/tutorial/Clare_-
_Designing_Logic_Systems_Using_State_Machines_1973.pdf

Denning PJ. The working set model for program behaviour. Commun
ACM. 1968;11(5):323–33. URL:
https://denninginstitute.com/pjd/PUBS/WSModel_1968.pdf

Engineering Safety Requirements, Safety Constraints, and Safety


Critical Requirements, Available at:
http://www.jot.fm/issues/issue_2004_03/column3/ last visit
November 17, 2021.

Fred B. Schneider (1990). Implementing fault-tolerant services using the


state machine approach: A tutorial. A.C.M. Computing Surveys,
22(4):299–319. URL: https://dl.acm.org/doi/10.1145/98163.98167

Fred B. Schneider. (1997) Towards fault-tolerant and secure agentry.


Technical report, Cornell University, Department of Computer
Science. URL:
https://link.springer.com/chapter/10.1007/BFb0030670

Fundamentals of Computer Organization and Architecture, by M. Abd-


El-Barr and H. El-Rewini ISBN 0-471-46741-3 Copyright # 2005
John Wiley & Sons, Inc. URL:
https://engineering.futureuniversity.com/BOOKS%20FOR%20IT/
%5BMostafa_Abd-El-Barr__Hesham_El-
Rewini%5D_Fundamenta(BookZZ.org).pdf

IEEE Trans. Computers, journal published by IEEE Computer Society;


has occasional special issues on parallel and distributed processing
(April 1987, December 1988, August 1989, December 1991, April
1997, April 1998).
http://link.springer.com/content/pdf/bfm%3A978-0-306-46964-
0%2F1.pdf

John L. Hennessy and David A. Patterson (2012) Computer


Architecture: A Quantitative Approach. Fifth (Ed.), Library of
Congress Cataloging in Publication Data. URL:
https://www.academia.edu/22618699/Computer_Architecture_A_
Quantitative_Approach_5th_edition_

Johnson, B. W. (1996). An introduction to the design and analysis of


fault-tolerant systems. Fault-tolerant computer system design, 1, 1-
84. URL:
https://www.researchgate.net/publication/234812893_An_introduc
tion_to_the_design_and_analysis_of_fault-tolerant_systems

Keith R. Mobley (2004) Maintenance Fundamentals. 2nd (Ed.), Elsevier


Butterworth Heinemann. URL:
https://www.elsevier.com/books/maintenance-
fundamentals/mobley/978-0-7506-7798-1

Kim, E. P., & Shanbhag, N. R. (2012). Soft N-modular redundancy.


IEEE Transactions on Computers, 61(3), 323–336. URL:
https://dl.acm.org/doi/abs/10.1109/TC.2010.253

Leighton, Luke. "Libre RISC-V M-Class". Crowd Supply. Retrieved 16


January 2020.

Lyu, M. and Mendiratta V, (1999) “Software Fault Tolerance in a


Clustered Architecture: Techniques and Reliability Modeling,” In
Proceedings' of IEEE Aerospace Conference, Snowmass,
Colorado, vol.5, pp.141-150, 6-13. URL:
https://dl.acm.org/doi/10.1007/11955498_4

Manegold S. Understanding, modeling, and improving main-memory


database performance. PhD thesis, Universiteit van Amsterdam,
Amsterdam, The Netherlands; 2002. URL:
https://ir.cwi.nl/pub/14301/14301B.pdf

Mostafa Abd-El-Barr and Hesham El-Rewini (2005) Fundamentals of


Computer Organization and Architecture. A John Wiley and Sons,
Inc Publication. URL:
https://books.google.com/books/about/Fundamentals_of_Computer
_Organization_an.html?id=m6uFlL41TlIC

Neuman, P (2000) “Practical Architecture for survivable system and


networks”, Phase Two Project 1688, SRI International, Menlo
Park, California. URL:
http://www.csl.sri.com/users/neumann/survivability.html

Patton, R. J. (2015). Fault-tolerant control. Encyclopedia of systems


and control, 422–428. URL: https://encyclopedia.pub/3028

Power ISA(tm) (3.0B ed.). Austin: IBM. 2017. Retrieved 26


December 2019.

Richard D. Schlichting and Fred B. Schneider. (1983) Fail-stop


processors: An approach to designing fault-tolerant computing
systems. A.C.M. Transactions on Computer Systems, 1(3):222–
238.

Rob Williams (2006) Computer System Architecture; A Network


Approach. 2 (Ed.), Prentice Hall. URL:
https://dokumen.pub/computer-systems-architecture-a-networking-
approach-with-cd-rom-2nd-ed-9780321340795-0321340795-
9781405890588-1405890584.html

Shatdal A, Kant C, Naughton J. Cache conscious algorithms for


relational query processing. In: Proceedings of the 20th
International Conference on Very Large Data Bases; 1994. p. 510–
2. URL https://www.semanticscholar.org/paper/Cache-Conscious-
Algorithms-for-Relational-Query-Shatdal-
Kant/12c2693c5e27a301a030933822c1c6da1558c267

Stallings, W. (2015). Computer Organization and Architecture. Pearson


Education. URL:
https://docs.google.com/viewer?a=v&pid=sites&srcid=aGNtdWF
mLmVkdS52bnxuZ3V5ZW54dWFudmluaHxneDo1YzAxMWY0
N2QxMGViZTRl

Stone, H. S., High-Performance Computer Architecture, Addison–


Wesley, 1993. URL: https://www.abebooks.com/book-
search/title/high-performance-computer-architecture/author/harold-
stone/

Varma, A., and C. S. Raghavendra, Interconnection Networks for


Multiprocessors and Multicomputers: Theory and Practice, IEEE
Computer Society Press, 1994. URL:
https://books.google.com/books/about/Interconnection_Networks_
for_Multiproces.html?id=-1u7QgAACAAJ
Walton G. H., Long Taff T.A. and R. C. Linder, (1997) “Computational
Evaluation of Software Security attributes”, IEEE.

Webb, C., Liptay, J.: A high-frequency custom cmos s/390


microprocessor. IBM Journal of research and
Development 41(4/5), 463–473 (1997)

Wendt, S.: Entwurf komplexer Schaltwerke. Springer Verlag, 1974.


URL: https://www.springer.com/de/book/9783642474552

William Stallings (2003) Computer Organization Architecture;


Designing for Performance Six Ed. Prentice Hall. URL
http://williamstallings.com/ComputerOrganization/

William Stallings (2019) Computer Organization and Architecture;


Designing for Performance. 11 (Ed.), Pearson. URL:
https://www.pearson.com/us/higher-education/program/Stallings-
Pearson-e-Text-for-Computer-Organization-and-Architecture-
Access-Code-Card-11th-Edition/PGM2043621.html

William, S. (2010). Computer organization and architecture: designing


for performance. URL:
https://www.academia.edu/44827616/Computer_organization_and
_arChiteCture_Designing_for_Performance_tenth_edition

Zomaya, A. Y. (ed.), Parallel and Distributed Computing Handbook,


McGraw-Hill, 1996. URL: https://research-
repository.uwa.edu.au/en/publications/parallel-and-distributed-
computing-handbook

PRESENTATION SCHEDULE

The Presentation Schedule included in your course materials gives you the important dates for the completion of tutor-marked assignments and
attending tutorials. Remember, you are required to submit all your
assignments by the due date. You should guard against lagging behind
in your work.

ASSESSMENT

There are two aspects to the assessment of the course. First are the tutor-marked assignments; second is a written examination. In tackling the
assignments, you are expected to apply the information and knowledge
you acquired during this course. The assignments must be submitted to
your tutor for formal assessment in accordance with the deadlines stated
in the Assignment File. The work you submit to your tutor for
assessment will count for 30% of your total course mark. At the end of
the course, you will need to sit for a final three-hour examination.
This will count for 70% of your total course mark.

TUTOR-MARKED ASSIGNMENT

There are eight tutor-marked assignments in this course. You need to
submit all the assignments. The total marks for the best four (4)
assignments will be 30% of your total course mark.

Assignment questions for the units in this course are contained in the
Assignment File. You should be able to complete your assignments from the information and materials contained in your set textbooks and study
units. However, you may wish to use other references to broaden your
viewpoint and provide a deeper understanding of the subject.

When you have completed each assignment, send it together with a form
to your tutor. Make sure that each assignment reaches your tutor on or
before the deadline given. If, however you cannot complete your work
on time, contact your tutor before the assignment is due to discuss the
possibility of an extension.

FINAL EXAMINATIONS AND GRADING

The final examination for the course will carry 70% of the
total mark available for this course. The examination will cover every
aspect of the course, so you are advised to revise all your corrected
assignments before the examination.

This course endows you with the status of a teacher and that of a learner.
This means that you teach yourself and that you learn, as your learning
capabilities would allow. It also means that you are in a better position
to determine and to ascertain the what, the how, and the when of your
learning. No teacher imposes any method of learning on you.
The course units are similarly designed with the introduction following
the table of contents, then a set of objectives and then the discourse and
so on. The objectives guide you as you go through the units to ascertain
your knowledge of the required terms and expressions.


COURSE MARKING SCHEME


This table shows how the actual course marking is broken down.

Assessment          Marks
Assignments 1-4     Four assignments, best three marks of the four count at 30% of course marks
Final Examination   70% of overall course marks
Total               100% of course marks

HOW TO GET THE BEST FROM THIS COURSE

In distance learning the study units replace the university lecturer. This is
one of the great advantages of distance learning; you can read and work
through specially designed study materials at your own pace, and at a time
and place that suit you best. Think of it as reading the lecture instead of
listening to a lecturer. In the same way that a lecturer might set you some
reading to do, the study units tell you when to read your set books or other
material. Just as a lecturer might give you an in-class exercise, your study
units provide exercises for you to do at appropriate points.

Each of the study units follows a common format. The first item is an
introduction to the subject matter of the unit and how a particular unit is
integrated with the other units and the course as a whole. Next is a set of
learning objectives. These objectives enable you to know what you should be
able to do by the time you have completed the unit. You should use these
objectives to guide your study. When you have finished the units you
must go back and check whether you have achieved the objectives. If you
make a habit of doing this, you will significantly improve your chances of
passing the course.

Remember that your tutor’s job is to assist you. When you need help, don’t
hesitate to call and ask your tutor to provide it.

1. Read this Course Guide thoroughly.


2. Organize a study schedule. Refer to the Course Overview for more
details. Note the time you are expected to spend on each unit and
how the assignments relate to the units. Whatever method you choose
to use, you should decide on it and write in your own dates for
working on each unit.
3. Once you have created your own study schedule, do everything you
can to stick to it. The major reason that students fail is that they lag
behind in their course work.
4. Turn to Unit 1 and read the introduction and the objectives for the
unit.
5. Assemble the study materials. Information about what you need for
a unit is given in the overview at the beginning of each unit. You
will almost always need both the study unit you are working on and
one of your set of books on your desk at the same time.
6. Work through the unit. The content of the unit itself has been
arranged to provide a sequence for you to follow. As you work
through the unit you will be instructed to read sections from your
set books or other articles. Use the unit to guide your reading.
7. Review the objectives for each study unit to confirm that you have
achieved them. If you feel unsure about any of the objectives,
review the study material or consult your tutor.
8. When you are confident that you have achieved a unit’s
objectives, you can then start on the next unit. Proceed unit by
unit through the course and try to pace your study so that you
keep yourself on schedule.
9. When you have submitted an assignment to your tutor for marking,
do not wait for its return before starting on the next unit. Keep to
your schedule. When the assignment is returned, pay particular
attention to your tutor’s comments, both on the tutor-marked
assignment form and on the assignment. Consult your tutor as
soon as possible if you have any questions or problems.
10. After completing the last unit, review the course and prepare
yourself for the final examination. Check that you have achieved
the unit objectives (listed at the beginning of each unit) and the
course objectives (listed in this Course Guide).

FACILITATORS/TUTORS AND TUTORIALS

There are 15 hours of tutorials provided in support of this course. You will be notified of the dates, times and location of these tutorials, together
with the name and phone number of your tutor, as soon as you are
allocated a tutorial group.

Your tutor will mark and comment on your assignments, keep a close
watch on your progress and on any difficulties you might encounter and
provide assistance for you during the course. You must mail or submit
your tutor-marked assignments to your tutor well before the due date (at
least two working days are required). They will be marked by your tutor
and returned to you as soon as possible.

Do not hesitate to contact your tutor by telephone or e-mail if you need help. The following might be circumstances in which you would find help
necessary. Contact your tutor if:

• you do not understand any part of the study units or the assigned
readings,
• you have difficulty with the self-tests or exercises,

• you have a question or problem with an assignment, with your tutor's comments on an assignment or with the grading of an
assignment.

You should try your best to attend the tutorials. This is the only chance to have face-to-face contact with your tutor and to ask questions which are
answered instantly. You can raise any problem encountered in the course
of your study. To gain the maximum benefit from course tutorials, prepare
a question list before attending them. You will learn a lot from
participating in discussions actively.

SUMMARY

Computer Architecture and Organization II, as the title implies, introduces you to the fundamental concepts of how the computer system operates internally to perform the basic tasks required of it by the end-users.
Therefore, you should acquire the basic knowledge of the internal
workings of the components of the computer system in this course. The
content of the course material was planned and written to ensure that you
acquire the proper knowledge and skills in order to be able to programme
the computer to do your bidding. The essence is to get you to acquire the
necessary knowledge and competence and equip you with the necessary
tools.

We wish you success with the course and hope that you will find it
interesting and useful.

MAIN COURSE

CONTENT

MODULE 1

UNIT 1 MEMORY SYSTEM

1.1 Main Memories


1.2 Auxiliary Memories
1.3 Memory Access Methods
1.4 Memory Mapping and Virtual Memories
1.5 Replacement Algorithms
1.6 Data Transfer Modes
1.7 Parallel Processing
1.8 Pipelining

MODULE 2

UNIT 1 MEMORY ADDRESSING

1.1 What is a Memory address mode?


1.2 Modes of addressing
1.3 Number of addressing modes
1.4 Advantages of addressing modes
1.5 Uses of addressing modes

UNIT 2 ELEMENTS OF MEMORY HIERARCHY

2.1 What is Memory Hierarchy?


2.2 Memory Hierarchy Diagram
2.3 Characteristics of Memory Hierarchy
2.4 Memory Hierarchy Design
2.5 Advantages of Memory Hierarchy
UNIT 3 VIRTUAL MEMORY CONTROL SYSTEMS

3.1 Memory Management Systems


3.2 Paging
3.3 Address mapping using Paging
3.4 Address Mapping using Segments
3.5 Address Mapping using Segmented Paging
3.6 Multi-Programming
3.7 Virtual Machines/Memory and Protection
3.8 Hierarchical Memory systems
3.9 Drawbacks that occur in Virtual Memories

MODULE 3

UNIT 1 HARDWARE CONTROL

3.1.1 Hardwired Control Unit


3.1.2 Design of a hardwired Control Unit

UNIT 2 MICRO-PROGRAMMED CONTROL

3.2.1 Design of a Micro-Programmed Control Unit


3.2.2 Differences between Hardwired and Microprogrammed Control
3.2.3 Organization of Micro-Programmed Control Unit
3.2.4 Types of Micro-programmed Control Unit

UNIT 3 ASYNCHRONOUS CONTROL

3.3.1 Clock limitations


3.3.2 Basic Concepts
3.3.3 Benefits of Asynchronous Control
3.3.4 Asynchronous Communication
3.3.5 Asynchronous Transmission
3.3.6 Synchronous vs. Asynchronous Transmission
3.3.7 Emerging application areas
3.3.8 Asynchronous Data paths and Data Transfer
3.3.9 Handshaking

MODULE 4

UNIT 1 FAULT TOLERANT COMPUTING

3.0.1.1 What is Fault Tolerance


3.0.1.2 Fault Tolerant Systems
3.0.1.3 Hardware and Software Fault Tolerant Issues
3.0.1.4 Fault Tolerance VS High Availability
3.0.1.5 Redundancy
3.0.1.6 Relationship Between Security and Fault Tolerance

UNIT 2 METHODS FOR FAULT TOLERANT COMPUTING

3.0.2.0 Fault Tree Analysis


3.0.2.1 Fault Detection Methods
3.0.2.2 Fault Tolerance Architecture
3.0.2.3 Fault Models
3.0.2.4 Fault Tolerance Methods
3.0.2.5 Major Issues in Modelling and Evaluation
3.0.2.6 Fault Tolerance for Web Applications

MODULE 1 INTRODUCTION TO COMPUTER ARCHITECTURE AND ORGANIZATION

INTRODUCTION

Computer architecture is concerned with the structure and behaviour of a computer system as seen by the user; it acts as the interface between hardware and software. More precisely, computer architecture refers to those attributes of a system visible to a programmer, or, put another way, those attributes that have a direct impact on the logical execution of a program. Computer organization refers to the operational units and their interconnections that realize the architectural specification.

Examples of architectural attributes include the instruction set, the number of bits used to represent various data types (e.g., numbers and characters), I/O mechanisms, and techniques for addressing memory.

Examples of organizational attributes include those hardware details transparent to the programmer, such as control signals, interfaces between the computer and peripherals, and the memory technology used.

As an example, it is an architectural design issue whether a computer will have a multiply instruction. It is an organizational issue whether that instruction will be implemented by a special multiply unit or by a mechanism that makes repeated use of the add unit of the system. The organizational decision may be based on the anticipated frequency of use of the multiply instruction, the relative speed of the two approaches, and the cost and physical size of a special multiply unit.
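To make the architecture/organization distinction concrete, the short Python sketch below (an illustration added for this guide, not part of the original text) shows the organizational choice just described: the same architectural multiply operation can be realized either by a dedicated multiply unit or by a routine that makes repeated use of the add unit.

# Illustrative sketch: one architectural operation (multiply), two possible organizations.

def multiply_with_hardware_unit(a, b):
    # Organization 1: a dedicated multiply unit (modelled here by Python's built-in '*').
    return a * b

def multiply_with_repeated_add(a, b):
    # Organization 2: repeated use of the add unit; b additions of a (assumes b is a non-negative integer).
    result = 0
    for _ in range(b):
        result = result + a
    return result

# Both organizations implement the same architecture: the programmer simply sees "multiply".
assert multiply_with_hardware_unit(6, 7) == multiply_with_repeated_add(6, 7) == 42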

This module is an introductory module. It has one unit, which explains memory systems and the basic functionalities related to them, as shown below.


UNIT ONE: MEMORY SYSTEMS

CONTENTS

1.0 Introduction
2.0 Intended Learning Outcomes (ILOS)
3.0 Main Contents
UNIT ONE: Memory Systems
3.1 Main Memories
3.2 Auxiliary Memories
3.3 Memory Access Methods
3.4 Memory Mapping and Virtual Memories
3.5 Replacement Algorithms
3.6 Data Transfer Modes
3.7 Parallel Processing
3.8 Pipelining
4.0 Self-Assessment Exercises
5.0 Conclusion
6.0 Summary
7.0 References/Further Reading

1.0 INTRODUCTION

Although seemingly simple in concept, computer memory exhibits perhaps the widest range of type, technology, organization, performance, and cost of any feature of a computer system. No one technology is optimal in satisfying the memory requirements for a computer system. As a consequence, the typical computer system is equipped with a hierarchy of memory subsystems, some internal to the system (directly accessible by the processor) and some external (accessible by the processor via an I/O module).

The memory unit is an essential component in any digital computer since it is needed for storing programs and data. A very small computer with a limited application may be able to fulfill its intended task without the need for additional storage capacity. Most general-purpose computers would run more efficiently if they were equipped with additional storage beyond the capacity of the main memory. There is just not enough space in one memory unit to accommodate all the programs used in a typical computer. The memory unit that communicates directly with the CPU is called the main memory. Devices that provide backup storage are called auxiliary memory. The most common auxiliary memory devices used in computer systems are magnetic disks and tapes. They are used for storing system programs, large data files, and other backup information. Only programs and data currently needed by the processor reside in main memory. A special very-high-speed memory called a cache is sometimes used to increase the speed of processing by making current programs and data available to the CPU at a rapid rate. The cache memory is employed in computer systems to compensate for the speed differential between main memory access time and processor logic.
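To illustrate how a cache compensates for this speed differential, the sketch below computes an average (effective) access time for a two-level memory. The figures (a 10 ns cache, a 100 ns main memory, a 95% hit ratio) are assumptions chosen for illustration and do not come from the course text.

# Effective access time of a cache plus main memory (illustrative figures only).
cache_time_ns = 10       # assumed cache access time
memory_time_ns = 100     # assumed main-memory access time
hit_ratio = 0.95         # assumed fraction of accesses satisfied by the cache

# On a hit only the cache is accessed; on a miss the main memory must also be accessed.
effective_ns = hit_ratio * cache_time_ns + (1 - hit_ratio) * (cache_time_ns + memory_time_ns)
print(f"Effective access time: {effective_ns:.1f} ns")   # 15.0 ns with these figures

With a high hit ratio the effective access time stays close to the cache speed, which is exactly the compensation effect described above.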

The complex subject of computer memory is made more manageable if we classify memory systems according to their key characteristics. Internal memory is often equated with main memory, but there are other forms of internal memory. The processor requires its own local memory, in the form of registers. The control unit portion of the processor may also require its own internal memory. Cache is another form of internal memory. External memory consists of peripheral storage devices, such as disk and tape, that are accessible to the processor via I/O. An obvious characteristic of memory is its capacity. For internal memory, this is typically expressed in terms of bytes (1 byte = 8 bits) or words. Common word lengths are 8, 16, and 32 bits. External memory capacity is typically expressed in terms of bytes.

A related concept is the unit of transfer. For internal memory, the unit of transfer is equal to the number of data lines into and out of the memory module. This may be equal to the word length, but is often larger, such as 64, 128, or 256 bits. From a user's point of view, the two most important characteristics of memory are capacity and performance. Three performance parameters are used:

 Access time (latency): For random-access memory, this is the time it takes to perform a read or write operation; that is, the time from the instant that an address is presented to the memory to the instant that data have been stored or made available for use. For non-random-access memory, access time is the time it takes to position the read-write mechanism at the desired location.
 Memory cycle time: This concept is primarily applied to random-access memory and consists of the access time plus any additional time required before a second access can commence. This additional time may be required for transients to die out on signal lines or to regenerate data if they are read destructively. Note that memory cycle time is concerned with the system bus, not the processor.
 Transfer rate: This is the rate at which data can be transferred into or out of a memory unit.
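A small worked example may help fix these three parameters. The figures below (60 ns access time, 40 ns of recovery time before a second access, a 32-bit word per transfer) are assumptions made for illustration only.

# Illustrative calculation of memory cycle time and transfer rate.
access_time_ns = 60          # assumed time to complete one read or write
recovery_time_ns = 40        # assumed extra time before the next access can start
bits_per_transfer = 32       # assumed width of each transfer (one 32-bit word)

cycle_time_ns = access_time_ns + recovery_time_ns        # 100 ns per cycle
transfers_per_second = 1e9 / cycle_time_ns               # 10 million transfers per second
transfer_rate_bits = transfers_per_second * bits_per_transfer

print(f"Cycle time: {cycle_time_ns} ns")
print(f"Transfer rate: {transfer_rate_bits / 1e6:.0f} Mbit/s")   # 320 Mbit/s with these figures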


2.0 INTENDED LEARNING OUTCOMES (ILOs)

At the end of this module, you should be able to discuss:

 Memory types and their functionalities


 The history of memory devices
 Modes to augment processing
 Access methods
 Pipelining

3.1 MAIN MEMORIES

The main memory is the central storage unit in a computer system. It is a


relatively large and fast memory used to store programs and data during
the computer operation. The principal technology used for the main
memory is based on semiconductor integrated circuits. Integrated circuit
RAM chips are available in two possible operating modes, static and
dynamic. The static RAM consists essentially of internal flip-flops that
store the binary information. The stored information remains valid as
long as power is applied to the unit. The dynamic RAM stores the binary
information in the form of electric charges that are applied to capacitors.
The capacitors are provided inside the chip by MOS transistors. The
stored charge on the capacitors tends to discharge with time, and the
capacitors must be periodically recharged by refreshing the dynamic
memory. Refreshing is done by cycling through the words every few
milliseconds to restore the decaying charge. The dynamic RAM offers
reduced power consumption and larger storage capacity in a single
memory chip. The static RAM is easier to use and has shorter read and
write cycles. Most of the main memory in a general-purpose computer is
made up of RAM integrated circuit chips, but a portion of the memory
may be constructed with ROM chips. Originally, RAM was used to refer
to a random-access memory, but now it is used to designate a read/write
memory to distinguish it from a read-only memory, although ROM is
also random access. RAM is used for storing the bulk of the programs
and data that are subject to change. ROM is used for storing programs
that are permanently resident in the computer and for tables of constants
that do not change in value once the production of the computer is
completed. Among other things, the ROM portion of main memory is
needed for storing an initial program called a bootstrap loader. The
bootstrap loader is a program whose function is to start the computer
software operating when power is turned on. Since RAM is volatile, its
contents are destroyed when power is turned off. The contents of ROM
remain unchanged after power is turned off and on again. The startup of
a computer consists of turning the power on and starting the execution of an initial program. Thus when power is turned on, the hardware of the
computer sets the program counter to the first address of the bootstrap
loader. The bootstrap program loads a portion of the operating system
from disk to main memory and control is then transferred to the
operating system, which prepares the computer for general use.
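The contrast between static and dynamic RAM described in this section can be caricatured in code: a static cell holds its value as long as power is applied, while a dynamic cell stores charge that leaks away and must be refreshed every few milliseconds. The toy model below is purely illustrative; the decay rate and threshold are invented numbers, not device parameters from the text.

# Toy model of a dynamic RAM bit: stored charge leaks and must be refreshed periodically.

class DynamicBit:
    THRESHOLD = 0.5                      # below this charge a stored 1 can no longer be read (assumed)

    def __init__(self, value):
        self.value = value
        self.charge = 1.0 if value else 0.0

    def leak(self, milliseconds):
        # Charge decays roughly 10% per millisecond in this toy model (assumed rate).
        self.charge *= 0.9 ** milliseconds

    def read(self):
        return 1 if self.charge > self.THRESHOLD else 0

    def refresh(self):
        # Refreshing rewrites the cell, restoring full charge, as described in the text.
        self.charge = 1.0 if self.read() else 0.0

bit = DynamicBit(1)
bit.leak(milliseconds=4)   # the charge has dropped, but the 1 is still readable
bit.refresh()              # periodic refresh restores the charge before it decays too far
print(bit.read())          # prints 1, thanks to refreshing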

3.2 AUXILIARY MEMORIES

The primary types of Auxiliary Storage Devices are:

 Magnetic tape
 Magnetic Disks
 Floppy Disks
 Hard Disks and Drives

High-speed primary storage devices are very expensive, and hence the cost per bit of storage is also very high. Again, the storage capacity of the
main memory is also very limited. Often it is necessary to store
hundreds of millions of bytes of data for the CPU to process. Therefore,
additional memory is required in all the computer systems. This memory
is called auxiliary memory or secondary storage. In this type of memory,
the cost per bit of storage is low. However, the operating speed is slower
than that of the primary memory. Most widely used secondary storage
devices are magnetic tapes, magnetic disks and floppy disks.

 It is not directly accessible by the CPU.
 Computers usually use their input/output channels to access secondary storage and transfer the desired data using an intermediate area in primary storage.

3.2.1 Magnetic Tapes

Magnetic tape is a medium for magnetic recording, made of a thin, magnetisable coating on a long, narrow strip of plastic film. It was developed in Germany, based on magnetic wire recording. Devices that record and play back audio and video using magnetic tape are tape recorders and video tape recorders. Magnetic tape is an information storage medium consisting of a magnetic coating on a flexible backing in tape form. Data is recorded by magnetic encoding of tracks on the coating according to a particular tape format.


Figure 1.0: Magnetic Tape

Characteristics of Magnetic Tapes

 No direct access, but very fast sequential access.
 Resistant to different environmental conditions.
 Easy to transport and store; cheaper than disk.
 Previously widely used to store application data; nowadays mostly used for backups or archives (tertiary storage).

Magnetic tape is wound on reels (or spools). These may be used on their
own, as open-reel tape, or they may be contained in some sort of
magnetic tape cartridge for protection and ease of handling. Early
computers used open-reel tape, and this is still sometimes used on large
computer systems although it has been widely superseded by cartridge
tape. On smaller systems, if tape is used at all it is normally cartridge
tape.

Figure 1.2: Magnetic Tape

Magnetic tape is used in a tape transport (also called a tape drive, tape
deck, tape unit, or MTU), a device that moves the tape over one or more
magnetic heads. An electrical signal is applied to the write head to
record data as a magnetic pattern on the tape; as the recorded tape passes
over the read head it generates an electrical signal from which the stored
data can be reconstructed. The two heads may be combined into a single
read/write head. There may also be a separate erase head to erase the
magnetic pattern remaining from previous use of the tape. Most
magnetic-tape formats have several separate data tracks running the
length of the tape. These may be recorded simultaneously, in which
case, for example, a byte of data may be recorded with one bit in each
track (parallel recording); alternatively, tracks may be recorded one at a
time (serial recording) with the byte written serially along one track.

 Magnetic tape has been used for offline data storage, backup,
archiving, data interchange, and software distribution, and in the
early days (before disk storage was available) also as online
backing store. For many of these purposes it has been superseded
by magnetic or optical disk or by online communications. For
example, although tape is a non-volatile medium, it tends to
deteriorate in long-term storage and so needs regular attention
(typically an annual rewinding and inspection) as well as a
controlled environment. It is therefore being superseded for
archival purposes by optical disk.
 Magnetic tape is still extensively used for backup; for this
purpose, interchange standards are of minor importance, so
proprietary cartridge-tape formats are widely used.
 Magnetic tapes are used for large computers like mainframe
computers where large volume of data is stored for a longer time.
In PCs also you can use tapes in the form of cassettes.
 The cost of storing data on tapes is low. Tapes consist of magnetic material that stores data permanently. A tape can be 12.5 mm to 25 mm wide plastic film and 500 to 1200 meters long, coated with magnetic material. The tape deck is connected to the central processor, and information is fed into or read from the tape through the processor. It is similar to a cassette tape recorder.

Advantages of Magnetic Tape

 Compact: A 10-inch diameter reel of tape is 2400 feet long and is able to hold 800, 1600 or 6250 characters in each inch of its length. The maximum capacity of such a reel is 180 million characters, so data are stored much more compactly on tape (see the short calculation after this list).
 Economical: The cost of storing characters on tape is very low compared to other storage devices.
 Fast: Copying of data is easy and fast.
 Long-term Storage and Re-usability: Magnetic tapes can be used for long-term storage, and a tape can be used repeatedly without loss of data.
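The capacity figure quoted in the first advantage can be checked with a short calculation. The reel length and recording density are taken from the list above; the sketch itself is only an illustration.

# Checking the quoted reel capacity: 2400 feet at 6250 characters per inch.
reel_length_feet = 2400
inches_per_foot = 12
density_chars_per_inch = 6250   # the highest of the quoted densities (800, 1600 or 6250)

capacity_chars = reel_length_feet * inches_per_foot * density_chars_per_inch
print(f"{capacity_chars:,} characters")   # 180,000,000 characters, i.e. about 180 million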


3.2.2 Magnetic Disks

You might have seen the gramophone record, which is circular like a
disk and coated with magnetic material. Magnetic disks used in
computer are made on the same principle. It rotates with very high speed
inside the disk drive. Data are stored on both the surface of the disk.

Magnetic disks are most popular for direct access storage. Each disk
consists of a number of invisible concentric circles called tracks.
Information is recorded on tracks of a disk surface in the form of tiny magnetic spots. The presence of a magnetic spot represents one bit (1) and its absence represents zero bit (0). The information stored in a disk
can be read many times without affecting the stored data. So the reading
operation is non-destructive. But if you want to write a new data, then
the existing data is erased from the disk and new data is recorded.

A magnetic disk memory is a digital computer memory in which the data carrier is a thin aluminium or plastic disk coated with a layer of magnetic material. Magnetic disks are 180-1,200 mm in diameter and 2.5-5.0 mm thick; Ni-Co or Co-W alloys are used for the magnetic coating. A magnetic disk memory usually contains several dozen disks mounted on a common axle, which is turned by an electric motor. One or more disks (a packet) may be replaced, creating disk index files. There may be as many as 100 disks in a memory and 64-5,000 data tracks on each operating surface of a disk; the recording density is 20-130 impulses per millimeter.

The data capacity of magnetic disk memories ranges from several tens
of thousands up to several billion bits, and the average access time is 10-
100 milliseconds. The two main types are the hard disk and the floppy
disk.

Data is stored on either or both surfaces of discs in concentric rings called "tracks". Each track is divided into a whole number of "sectors". Where multiple (rigid) discs are mounted on the same axle, the set of tracks at the same radius on all their surfaces is known as a "cylinder". Data is read and written by a disk drive which rotates the discs and positions the read/write heads over the desired track(s). The latter radial movement is known as "seeking". There is usually one head for each surface that stores data. The head writes binary data by magnetising small areas or "zones" of the disk in one of two opposing orientations. It reads data by detecting current pulses induced in a coil as zones with different magnetic alignment pass underneath it.

In theory, bits could be read back as a time sequence of pulse (one) or no pulse (zero). However, a run of zeros would give a prolonged absence of signal, making it hard to accurately divide the signal into individual bits due to the variability of motor speed. High-speed disks have an access time of 28 milliseconds or less, and low-speed disks 65 milliseconds or more. The higher-speed disks also transfer their data faster than the slower-speed units. The disks are usually aluminium with a magnetic coating. The heads "float" just above the disk's surface on a current of air, sometimes at lower than atmospheric pressure in an airtight enclosure. The head has an aerodynamic shape so the current pushes it away from the disk. A small spring pushes the head towards the disk, at the same time keeping the head a constant distance from the disk (about two microns). Disk drives are commonly characterized by the kind of interface used to connect to the computer.

Figure 1.3: Magnetic Disks
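To tie the terms above together (tracks, sectors, cylinders, heads), the sketch below computes the capacity of a hypothetical disk and a rough average access time from an assumed seek time and rotational speed. Every number here is invented for illustration; none is a specification from the course text.

# Hypothetical disk geometry (all figures assumed for illustration).
cylinders = 1000                 # sets of tracks at the same radius across all surfaces
heads = 8                        # one read/write head per recording surface
sectors_per_track = 64
bytes_per_sector = 512

capacity_bytes = cylinders * heads * sectors_per_track * bytes_per_sector
print(f"Capacity: {capacity_bytes / 1024**2:.0f} MiB")        # 250 MiB for this geometry

# Rough average access time = average seek time + average rotational latency.
average_seek_ms = 10                          # assumed
rpm = 7200                                    # assumed rotational speed
rotational_latency_ms = (60_000 / rpm) / 2    # half a revolution on average, about 4.2 ms
print(f"Average access time: {average_seek_ms + rotational_latency_ms:.1f} ms")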

3.2.3 Floppy Disks

These are small removable disks that are plastic coated with magnetic
recording material. Floppy disks are typically 3.5″ in size (diameter) and
can hold 1.44 MB of data. This portable storage device is a rewritable
media and can be reused a number of times. Floppy disks are commonly
used to move files between different computers. The main disadvantage
of floppy disks is that they can be damaged easily and, therefore, are not
very reliable. The following figure shows an example of the floppy disk.
It is similar to a magnetic disk.

Figure 1.4: Floppy Disks

It is 3.5 inches in diameter, and the capacity of a 3.5-inch floppy is 1.44 megabytes. It is cheaper than other storage devices and is portable. The floppy is a low-cost device particularly suitable for personal computer systems.

 Read/Write head: A floppy disk drive normally has two read/write heads, making modern floppy disk drives double-sided drives. A head exists for each side of the disk, and both heads are used for reading and writing on the respective disk side.
 Head 0 and Head 1: Many people do not realize that the first head (head 0) is the bottom one and the top head is head 1. The top head is located either four or eight tracks inward from the bottom head, depending upon the drive type.
 Head Movement: A motor called the head actuator moves the head mechanism. The heads can move in and out over the surface of the disk in a straight line to position themselves over various tracks. The heads move in and out tangentially to the tracks that they record on the disk.
 Head: The heads are made of a soft ferrous (iron) compound with electromagnetic coils. Each head is a composite design with a R/W head centered within two tunnel-erasure heads in the same physical assembly. PC-compatible floppy disk drives spin at 300 or 360 r.p.m. The two heads are spring-loaded and physically grip the disk with a small pressure; this pressure does not produce excessive friction.

3.2.3.1 Recording Method

 Tunnel Erasure: As the track is laid down by the R/W heads, the
trailing tunnel erasure heads force the data to be present only
within a specified narrow tunnel on each track. This process
prevents the signals from reaching adjacent track and making
cross talk.
 Straddle Erasure: In this method, the R/W and the erasure heads
do recording and erasing at the same time. The erasure head is
not used to erase data stored in the diskette. It trims the top and
bottom fringes of recorded flux reversals. The erasure heads
reduce the effect of cross-talk between tracks and minimize the
errors induced by minor run out problems on the diskette or
diskette drive.
 Head alignment: Alignment is the process of placement of the heads with respect to the tracks that they must read and write. Head alignment can be checked only against some sort of reference: a standard disk recorded by a perfectly aligned machine. Such disks are available, and one can use one to check the drive alignment.

3.2.4 Hard Disks and Drives

A hard disk drive (HDD), hard disk, hard drive or fixed disk is a data
storage device that uses magnetic storage to store and retrieve digital
information using one or more rigid rapidly rotating disks (platters)
coated with magnetic material. The platters are paired with magnetic
heads, usually arranged on a moving actuator arm, which read and write
data to the platter surfaces. Data is accessed in a random-access manner,
meaning that individual blocks of data can be stored or retrieved in any
order and not only sequentially.

HDDs are a type of non-volatile storage, retaining stored data even


when powered off. A hard drive can be used to store any data, including
pictures, music, videos, text documents, and any files created or
downloaded. Also, hard drives store files for the operating system and software
programs that run on the computer. All primary computer hard drives
are found inside a computer case and are attached to the computer
motherboard using an ATA, SCSI, or SATA cable, and are powered by
a connection to the PSU (power supply unit). The hard drive is typically
capable of storing more data than any other drive, but its size can vary
depending on the type of drive and its age. Older hard drives had a
storage size of several hundred megabytes (MB) to several gigabytes
(GB). Newer hard drives have a storage size of several hundred
gigabytes to several terabytes (TB). Each year, new and improved
technology allows for increasing hard drive storage sizes.

3.2.4.1 Hard Drive Components

As can be seen in the picture below, the desktop hard drive consists of
the following components: the head actuator, read/write actuator arm,
read/write head, spindle, and platter. On the back of a hard drive is a
circuit board called the disk controller or interface board and is what
allows the hard drive to communicate with the computer.

Figure 1.5: Hard Drive Components

3.2.4.2 External and Internal Hard drives

Although most hard drives are internal, there are also stand-alone
devices called external hard drives, which can backup data on computers
and expand the available disk space. External drives are often stored in
an enclosure that helps protect the drive and allows it to interface with
the computer, usually over USB or eSATA.

Figure 1.6: Hard Drive

3.2.4.3 History of the hard drive

The first hard drive was introduced to the market by IBM on September
13, 1956. The hard drive was first used in the RAMAC 305 system, with
a storage capacity of 5 MB and a cost of about $50,000 ($10,000 per
megabyte). The hard drive was built-in to the computer and was not
removable. The first hard drive to have a storage capacity of one
gigabyte was also developed by IBM in 1980. It weighed 550 pounds
and cost $40,000. 1983 marked the introduction of the first 3.5-inch size
hard drive, developed by Rodime. It had a storage capacity of 10 MB.
Seagate was the first company to introduce a 7200 RPM hard drive in
1992. Seagate also introduced the first 10,000 RPM hard drive in 1996
and the first 15,000 RPM hard drive in 2000. The first solid-state drive
(SSD) as we know them today was developed by SanDisk Corporation
in 1991, with a storage capacity of 20 MB. However, this was not a
flash-based SSD, which were introduced later in 1995 by M-Systems.
These drives did not require a battery to keep data stored on the memory
chips, making them a non-volatile storage medium.

3.2.5 CD-ROM Compact Disk/Read Only Memory (CD-ROM)

CD-ROM disks are made of reflective metals. CD-ROM is written


during the process of manufacturing by high power laser beam. Here the
storage density is very high, storage cost is very low and access time is
relatively fast. Each disk is approximately 4 1/2 inches in diameter and
can hold over 600 MB of data. Since the CD-ROM is read-only, we cannot write to or change the data contained in it.

Figure 1.7: CD-Rom

3.2.5.1 Characteristics of the CD-ROM

 In PCs, the most commonly used optical storage technology is called Compact Disk Read-Only Memory (CD-ROM).
 A standard CD-ROM disk can store up to 650 MB of data, or about 70 minutes of audio.
 Once data is written to a standard CD-ROM disk, the data cannot be altered or overwritten.

CD-ROM speeds and uses
Storage capacity: one CD can store about 600 to 700 MB, i.e. 600,000 to 700,000 kB. For comparison, a common A4 sheet of paper can store an amount of information, in the form of printed characters, that would require about 2 kB of space on a computer. So one CD can store about the same amount of text information as 300,000 such A4 sheets (see the short calculation after this list).

Yellow Book standard
 The basic technology of CD-ROM remains the same as that for CD audio, but CD-ROM requires greater data integrity, because a corrupt bit that is not noticeable during audio playback becomes intolerable with computer data.
 So CD-ROM (Yellow Book) dedicates more bits to error detection and correction than CD audio (Red Book).
 Data is laid out in a format known as ISO 9660.

Advantages in comparison with other information carriers
 The information density is high.
 The cost of information storage per information unit is low.
 The disks are easy to store, to transport and to mail.
 Random access to information is possible.
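The storage-capacity comparison above can be verified with a one-line calculation. The 650 MB figure and the 2 kB-per-A4-sheet estimate are taken from the list; the arithmetic is only an illustration.

# How many 2 kB A4 sheets of text fit on one roughly 650 MB CD-ROM?
cd_capacity_kb = 650_000        # roughly 600,000 to 700,000 kB, as quoted above
kb_per_a4_sheet = 2

sheets = cd_capacity_kb // kb_per_a4_sheet
print(f"About {sheets:,} A4 sheets")    # about 325,000 sheets, i.e. on the order of 300,000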

Advantages

 Easier access to a range of CD-ROMs.
 Ideally, access from the user's own workstation in the office or at
home.
 Simultaneous access by several users to the same data.
 Better security avoids damage to discs and equipment.
 Less personnel time needed to provide disks to users.
 Automated, detailed registration of usage statistics to support the
management

Disadvantages

 Costs of the network software and computer hardware.


 Increased charges imposed by the information suppliers.
 Need for expensive, technical expertise to select, set up, manage,
and maintain the network system.
 Technical problems when the CD-ROM product is not designed
for use in the network.
 The network software component for the workstation side must
be installed on each microcomputer before this can be applied to
access the CD-ROM’s.

3.2.6 Other Optical Devices

An optical disk is made up of a rotating disk which is coated with a thin


reflective metal. To record data on the optical disk, a laser beam is
focused on the surface of the spinning disk. The laser beam is turned on
and off at varying rates. Due to this, tiny holes (pits) are burnt into the

metal coating along the tracks. When data stored on the optical disk is to
be read, a less powerful laser beam is focused on the disk surface. The
storage capacity of these devices is tremendous and the optical disk
access time is relatively fast. The biggest drawback of this basic form of
optical disk is that it is a permanent storage medium: data once written
cannot be erased, so it behaves as a read-only medium. A typical example
of the optical disk is the CD-ROM.

1. Read-only memory (ROM) disks, like the audio CD, are used
for the distribution of standard program and data files. These are
mass-produced by mechanical pressing from a master die. The
information is actually stored as physical indentations on the
surface of the CD. Recently low-cost equipment has been
introduced in the market to make one-off CD-ROMs, putting
them into the next category.
2. Write-once read-many (WORM) disks: Some optical disks can
be recorded once. The information stored on the disk cannot be
changed or erased. Generally the disk has a thin reflective film
deposited on the surface. A strong laser beam is focused on
selected spots on the surface and pulsed. The energy melts the
film at that point, producing a nonreflective void. In the read
mode, a low power laser is directed at the disk and the bit
information is recovered by sensing the presence or absence of a
reflected beam from the disk.
3. Re-writeable, write-many read-many (WMRM) disks, just like
the magnetic storage disks, allows information to be recorded and
erased many times. Usually, there is a separate erase cycle
although this may be transparent to the user. Some modern
devices have this accomplished with one over-write cycle. These
devices are also called direct read-after-write (DRAW) disks.
4. WORM (write once, read many) is a data storage technology
that allows information to be written to a disc a single time and
prevents the drive from erasing the data. The discs are
intentionally not rewritable, because they are especially intended
to store data that the user does not want to erase accidentally.
Because of this feature, WORM devices have long been used for
the archival purposes of organizations such as government
agencies or large enterprises. A type of optical media, WORM
devices were developed in the late 1970s and have been adapted
to a number of different media. The discs have varied in size
from 5.25 to 14 inches wide, in varying formats ranging from
140MB to more than 3 GB per side of the (usually) double-sided
medium. Data is written to a WORM disc with a low- powered
laser that makes permanent marks on the surface. WORM (Write
Once, Read Many) storage had emerged in the late 1980s and
was popular with large institutions for the archiving of high

volume, sensitive data. When data is written to a WORM drive,


physical marks are made on the media surface by a low- powered
laser and since these marks are permanent, they cannot be erased.
Rewritable, or erasable, optical disk drives followed, providing
the same high capacities as those provided by WORM or CD-
ROM devices.
5. Erasable Optical Disk: An erasable optical disk is the one which
can be erased and then loaded with new data content all over
again. These generally come with a RW label. These are based on
a technology popularly known as Magnetic Optical which
involves the application of heat on a precise point on the disk
surface and magnetizing it using a laser. Magnetizing alters the
polarity of the point indicating data value ‘1’. Erasing too is
achieved by heating it with a high energy laser to a certain critical
level where the crystal polarity is reset to all 0’s. A variety of
optical disc, or type of external storage media, that allows the
deletion and rewriting of information, unlike a CD or CD-ROM,
which are read-only optical discs. An erasable optical disc allows
high-capacity storage (600 MB or more) and their durability has
made them useful for archival storage.
6. Touchscreen Optical Device: A touchscreen is an input and
output device normally layered on the top of an electronic visual
display of an information processing system. A user can give
input or control the information processing system through
simple or multi-touch gestures by touching the screen with a
special stylus or one or more fingers. Some touchscreens use
ordinary or specially coated gloves to work while others may
only work using a special stylus or pen. The user can use the
touchscreen to react to what is displayed and, if the software
allows, to control how it is displayed; for example, zooming to
increase the text size. The touchscreen enables the user to interact
directly with what is displayed, rather than using a mouse,
touchpad, or other such devices (other than a stylus, which is
optional for most modern touchscreens). Touchscreens are
common in devices such as game consoles, personal computers,
electronic voting machines, and point-of-sale (POS) systems.
They can also be attached to computers or, as terminals, to
networks. They play a prominent role in the design of digital
appliances such as personal digital assistants (PDAs) and some e-
readers.

There are two types of overlay-based touch screens:

 Capacitive Touch Technology – Capacitive touch screens take


advantage of the conductivity of the object to detect location of
touch. While they are durable and last for a long time, they can

malfunction if they get wet. Their performance is also


compromised if a nonconductor like a gloved finger presses on
the screen. Most smart phones and tablets have capacitive touch
screens.
 Resistive Touch Technology – Resistive touch screens have
moving parts. There is an air gap between two layers of
transparent material. When the user applies pressure to the outer
layer, it touches the inner layer at specific locations. An electric
circuit is completed and the location can be determined. Though
they are cheaper to build compared to capacitive touch screens,
they are also less sensitive and can wear out quickly.

There are mainly three types of perimeter-based technologies:

 Infrared Touch Technology – This technology uses beams of


infrared lights to detect touch events.
 Surface Acoustic Wave Touch Technology – This type of touch
screen uses ultrasonic waves to detect touch events.
 Optical Touch Technology – This type of perimeter-based
technology uses optical sensors, mainly CMOS sensors to detect
touch events. All of these touch screen technologies can also be
integrated on top of a non-touch-based system like an ordinary
LCD and converted into Open Frame Touch Monitors.

3.3 MEMORY ACCESS METHODS

Data need to be accessed from the memory for various purposes. There
are several methods to access memory as listed below:

 Sequential access
 Direct access
 Random access
 Associative access

We will study each of these access methods one by one.

3.3.1 Sequential Access Method

In the sequential access method, memory is accessed in a linear,
sequential way. The time to access data in this method depends on the
location of the data.


Figure 1.8: Sequential Access Method



3.3.2 Random Access Method


In the random access method, data at any location of the memory can be
accessed directly. The access to any location is not related to its physical
position and is independent of other locations. There is a separate access
mechanism for each location.

Figure 1.9: Random Access Method

3.3.3 Direct Access Method


The direct access method can be seen as a combination of the sequential
access method and the random access method. Magnetic hard disks
contain many rotating storage tracks. Here each track has its own read or
write head and the tracks can be accessed randomly, but access within
each track is sequential.

Example of direct access: Memory devices such as magnetic hard disks.


Figure 1.10: Direct Access Method

3.3.4 Associative Access Method

Associative access method is a special type of random access method. It


enables comparison of desired bit locations within a word for a specific
match and to do this for all words simultaneously. Thus, based on
portion of word's content, word is retrieved rather than its address.
Example of associative access: Cache memory uses associative access
method.

3.4 MEMORY MAPPING AND VIRTUAL MEMORIES

Memory-mapping is a mechanism that maps a portion of a file, or an


entire file, on disk to a range of addresses within an application's address
space. The application can then access files on disk in the same way it
accesses dynamic memory. This makes file reads and writes faster in
comparison with using functions such as fread and fwrite.

3.4.1 Benefits of Memory-Mapping

The principal benefits of memory-mapping are efficiency, faster file


access, the ability to share memory between applications, and more
efficient coding.

3.4.1.1 Faster File Access

Accessing files via a memory map is faster than using I/O functions such
as fread and fwrite. Data are read and written using the virtual memory
capabilities that are built into the operating system, rather than having to
allocate, copy into, and then deallocate data buffers owned by the
process. The mapping does not access data from the disk when the map
is first constructed. It only reads or writes the file on disk when a
specified part of the memory map is accessed, and then it only reads that
specific part. This provides faster random access to the mapped data.


3.4.1.2 Efficiency

Mapping a file into memory allows access to data in the file as if that
data had been read into an array in the application's address space.
Initially, MATLAB only allocates address space for the array; it does not
actually read data from the file until you access the mapped region. As a
result, memory-mapped files provide a mechanism by which applications
can access data segments in an extremely large file without having to
read the entire file into memory first.

3.4.1.3 Efficient Coding Style

Memory-mapping in your MATLAB application enables you to access
file data using standard MATLAB indexing operations.
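
As a concrete illustration outside MATLAB, the sketch below shows the
same idea using Python's standard mmap module; the file name data.bin,
its size, and the byte ranges touched are assumptions made only for this
example.

import mmap

# Map an existing file into the process's address space and treat it
# like an ordinary byte array; only the pages actually touched are
# read from disk by the operating system.
with open("data.bin", "r+b") as f:
    with mmap.mmap(f.fileno(), 0) as mm:      # 0 = map the whole file
        first_kb = mm[:1024]                  # reads only the first page(s)
        mm[0:4] = b"ABCD"                     # writes go through the page cache
        mm.flush()                            # push dirty pages back to disk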

3.4.2 VIRTUAL MEMORIES

Processes in a system share the CPU and main memory with other
processes. However, sharing the main memory poses some special
challenges. As demand on the CPU increases, processes slowdown in
some reasonably smooth way. But if too many processes need too much
memory, then some of them will simply not be able to run. When a
program is out of space, it is out of luck. Memory is also vulnerable to
corruption. If some process inadvertently writes to the memory used by
another process, that process might fail in some bewildering fashion
totally unrelated to the program logic. In order to manage memory more
efficiently and with fewer errors, modern systems provide an abstraction
of main memory known as virtual memory (VM). Virtual memory is an
elegant interaction of hardware exceptions, hardware address translation,
main memory, disk files, and kernel software that provides each process
with a large, uniform, and private address space. With one clean
mechanism, virtual memory provides three important capabilities.

 It uses main memory efficiently by treating it as a cache for an


address space stored on disk, keeping only the active areas in
main memory, and transferring data back and forth between disk
and memory as needed.
 It simplifies memory management by providing each process
with a uniform address space.
 It protects the address space of each process from corruption by
other processes.

Virtual memory is one of the great ideas in computer systems. A major


reason for its success is that it works silently and automatically, without
any intervention from the application programmer. Since virtual
memory works so well behind the scenes, why would a programmer
need to understand it? There are several reasons.

• Virtual memory is central. Virtual memory pervades all levels of


computer systems, playing key roles in the design of hardware
exceptions, assemblers, linkers, loaders, shared objects, files, and
processes. Understanding virtual memory will help you better
understand how systems work in general.
• Virtual memory is powerful. Virtual memory gives applications
powerful capabilities to create and destroy chunks of memory, map
chunks of memory to portions of disk files, and share memory with
other processes. For example, did you know that you can read or
modify the contents of a disk file by reading and writing memory
locations? Or that you can load the contents of a file into memory
without doing any explicit copying? Understanding virtual memory
will help you harness its powerful capabilities in your applications.

3.4.2.1 VM as a Tool for Caching

Conceptually, a virtual memory is organized as an array of N contiguous


byte-sized cells stored on disk. Each byte has a unique virtual address
that serves as an index into the array. The contents of the array on disk
are cached in main memory. As with any other cache in the memory
hierarchy, the data on disk (the lower level) is partitioned into blocks
that serve as the transfer units between the disk and the main memory
(the upper level). VM systems handle this by partitioning the virtual
memory into fixed-sized blocks called virtual pages (VPs). Each virtual
page is P = 2^p bytes in size. Similarly, physical memory is partitioned
into physical pages (PPs), also P bytes in size. (Physical pages are also
referred to as page frames.)

Figure 1.12: Memory as a Cache


3.4.2.2 Page Tables

As with any cache, the VM system must have some way to determine if
a virtual page is cached somewhere in DRAM. If so, the system must
determine which physical page it is cached in. If there is a miss, the
system must determine where the virtual page is stored on disk, select a
victim page in physical memory, and copy the virtual page from disk to
DRAM, replacing the victim page. These capabilities are provided by a
combination of operating system software, address translation hardware
in the MMU (memory management unit), and a data structure stored in
physical memory known as a page table that maps virtual pages to
physical pages. The address translation hardware reads the page table
each time it converts a virtual address to a physical address. The
operating system is responsible for maintaining the contents of the page
table and transferring pages back and forth between disk and DRAM

Figure 1.13: Page Table

Virtual memory was invented in the early 1960s, long before the
widening CPU-memory gap spawned SRAM caches. As a result, virtual
memory systems use a different terminology from SRAM caches, even
though many of the ideas are similar. In virtual memory parlance, blocks
are known as pages. The activity of transferring a page between disk and
memory is known as swapping or paging. Pages are swapped in (paged
in) from disk to DRAM, and swapped out (paged out) from DRAM to
disk. The strategy of waiting until the last moment to swap in a page,
when a miss occurs, is known as demand paging. Other approaches,
such as trying to predict misses and swap pages in before they are
actually referenced, are possible. However, all modern systems use
demand paging.
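
As a hypothetical numeric illustration of the translation just described,
assume 4 KB pages (P = 2^12 bytes) and a tiny page table; the frame
numbers used here are invented purely for the example.

PAGE_SIZE = 4096                 # P = 2**12 bytes per page (assumed)
OFFSET_BITS = 12

# Toy page table: virtual page number -> physical frame number.
# None models a non-resident page, i.e. a page fault on access.
page_table = {0: 5, 1: 9, 2: None, 3: 2}

def translate(virtual_address):
    vpn = virtual_address >> OFFSET_BITS           # virtual page number
    offset = virtual_address & (PAGE_SIZE - 1)     # offset within the page
    frame = page_table.get(vpn)
    if frame is None:
        raise RuntimeError(f"page fault on virtual page {vpn}")
    return frame * PAGE_SIZE + offset              # physical address

print(hex(translate(0x1ABC)))    # VPN 1 -> frame 9, so 0x1ABC maps to 0x9ABC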


3.4.2.3 VM as a Tool for Memory Protection

Any modern computer system must provide the means for the operating
system to control access to the memory system. A user process should
not be allowed to modify its read-only text section. Nor should it be
allowed to read or modify any of the code and data structures in the
kernel. It should not be allowed to read or write the private memory of
other processes, and it should not be allowed to modify any virtual pages
that are shared with other processes, unless all parties explicitly allow it
(via calls to explicit inter-process communication system calls).

3.4.2.4 Integrating Caches and VM

In any system that uses both virtual memory and SRAM caches, there is
the issue of whether to use virtual or physical addresses to access the
SRAM cache. Although a detailed discussion of the trade-offs is beyond
our scope here, most systems opt for physical addressing. With physical
addressing, it is straightforward for multiple processes to have blocks in
the cache at the same time and to share blocks from the same virtual
pages. Further, the cache does not have to deal with protection issues
because access rights are checked as part of the address translation
process.

3.4.2.5 Speeding up Address Translation with a TLB

As we have seen, every time the CPU generates a virtual address, the
MMU must refer to a PTE in order to translate the virtual address into a
physical address. In the worst case, this requires an additional fetch from
memory, at a cost of tens to hundreds of cycles. If the PTE happens to
be cached in L1, then the cost goes down to one or two cycles. However,
many systems try to eliminate even this cost by including a small cache
of PTEs in the MMU called a translation lookaside buffer (TLB). A
TLB is a small, virtually addressed cache where each line holds a block
consisting of a single PTE. A TLB usually has a high degree of
associativity
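
The role of the TLB can be sketched as a small software cache placed in
front of the page table; the 16-entry capacity and the FIFO replacement
policy below are illustrative assumptions, not a description of any
particular MMU.

class TinyTLB:
    """Toy fully associative TLB with FIFO replacement (illustrative)."""
    def __init__(self, capacity=16):
        self.capacity = capacity
        self.entries = {}          # vpn -> frame
        self.order = []            # insertion order, for FIFO eviction

    def lookup(self, vpn):
        return self.entries.get(vpn)       # None signals a TLB miss

    def insert(self, vpn, frame):
        if len(self.entries) >= self.capacity:
            victim = self.order.pop(0)     # evict the oldest entry
            del self.entries[victim]
        self.entries[vpn] = frame
        self.order.append(vpn)

def translate_vpn(tlb, page_table, vpn):
    frame = tlb.lookup(vpn)
    if frame is None:                      # miss: fall back to the page table
        frame = page_table[vpn]
        tlb.insert(vpn, frame)             # cache the translation for next time
    return frame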

3.5 Replacement Algorithms

When a page fault occurs, the operating system has to choose a page to
remove from memory to make room for the page that has to be brought
in. If the page to be removed has been modified while in memory, it
must be rewritten to the disk to bring the disk copy up to date. If,
however, the page has not been changed (e.g., it contains program text),
the disk copy is already up to date, so no rewrite is needed. The page to
be read in just overwrites the page being evicted. While it would be

possible to pick a random page to evict at each page fault, system


performance is much better if a page that is not heavily used is chosen.
If a heavily used page is removed, it will probably have to be brought
back in quickly, resulting in extra overhead. Much work has been done
on the subject of page replacement algorithms, both theoretical and
experimental. Below we will describe some of the most important
algorithms. It is worth noting that the problem of ‘‘page replacement’’
occurs in other areas of computer design as well. For example, most
computers have one or more memory caches consisting of recently used
32-byte or 64-byte memory blocks. When the cache is full, some block
has to be chosen for removal. This problem is precisely the same as page
replacement except on a shorter time scale (it has to be done in a few
nanoseconds, not milliseconds as with page replacement). The reason
for the shorter time scale is that cache block misses are satisfied from
main memory, which has no seek time and no rotational latency. To
select the particular algorithm, the algorithm with lowest page fault rate
is considered.

 Optimal page replacement algorithm


 Not recently used page replacement
 First-In, First-Out page replacement
 Second chance page replacement
 Clock page replacement
 Least recently used page replacement

3.5.1 The Optimal Page Replacement Algorithm

The best possible page replacement algorithm is easy to describe but


impossible to implement. It goes like this. At the moment that a page
fault occurs, some set of pages is in memory. One of these pages will be
referenced on the very next instruction (the page containing that
instruction). Other pages may not be referenced until 10, 100, or perhaps
1000 instructions later. Each page can be labeled with the number of
instructions that will be executed before that page is first referenced.
The optimal page algorithm simply says that the page with the highest
label should be removed. If one page will not be used for 8 million
instructions and another page will not be used for 6 million instructions,
removing the former pushes the page fault that will fetch it back as far
into the future as possible.
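
Although the optimal policy cannot be implemented in a running system
(the future reference pattern is unknown), it is easy to simulate when the
whole reference string is available, which is how it is used as a yardstick
against which practical algorithms are measured. A small sketch:

def optimal_victim(resident_pages, reference_string, current_index):
    """Return the resident page whose next use lies farthest in the future."""
    farthest, victim = -1, None
    for page in resident_pages:
        try:
            next_use = reference_string.index(page, current_index + 1)
        except ValueError:            # never referenced again: evict at once
            return page
        if next_use > farthest:
            farthest, victim = next_use, page
    return victim

refs = [7, 0, 1, 2, 0, 3, 0, 4]
print(optimal_victim([7, 0, 1], refs, 3))   # at the fault on page 2, evict 7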

3.5.2 The Not Recently Used Page Replacement


Algorithm

In order to allow the operating system to collect useful statistics about


which pages are being used and which ones are not, most computers

with virtual memory have two status bits associated with each page. R is
set whenever the page is referenced (read or written). M is set when the
page is written to (i.e., modified). The bits are contained in each page
table entry. It is important to realize that these bits must be updated on
every memory reference, so it is essential that they be set by the
hardware. Once a bit has been set to 1, it stays 1 until the operating
system resets it to 0 in software. If the hardware does not have these
bits, they can be simulated as follows. When a process is started up, all
of its page table entries are marked as not in memory. As soon as any
page is referenced, a page fault will occur. The operating system then
sets the R bit (in its internal tables), changes the page table entry to point
to the correct page, with mode READ ONLY, and restarts the
instruction. If the page is subsequently written on, another page fault
will occur, allowing the operating system to set the M bit and change the
page’s mode to READ/WRITE. The R and M bits can be used to build a
simple paging algorithm as follows. When a process is started up, both
page bits for all its pages are set to 0 by the operating system.
Periodically (e.g., on each clock interrupt), the R bit is cleared, to
distinguish pages that have not been referenced recently from those that
have been. When a page fault occurs, the operating system inspects all
the pages and divides them into four categories based on the current
values of their R and M bits:

• Class 0: not referenced, not modified.


• Class 1: not referenced, modified.
• Class 2: referenced, not modified.
• Class 3: referenced, modified.
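
NRU then removes a page at random from the lowest-numbered non-empty
class. A short sketch of this classification and choice follows; the page
objects with r and m attributes are an assumption of the example.

import random

def nru_victim(pages):
    """pages: objects with boolean attributes r and m (assumed layout)."""
    classes = {0: [], 1: [], 2: [], 3: []}
    for page in pages:
        classes[2 * page.r + page.m].append(page)   # class number = 2R + M
    for cls in (0, 1, 2, 3):                        # lowest non-empty class wins
        if classes[cls]:
            return random.choice(classes[cls])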

3.5.3 The First-In, First-Out (FIFO) Page Replacement


Algorithm

Another low-overhead paging algorithm is the First-In, First-Out (FIFO)


algorithm. To illustrate how this works, consider a supermarket that has
enough shelves to display exactly k different products. One day, some
company introduces a new convenience food—instant, freeze-dried,
organic yogurt that can be reconstituted in a microwave oven. It is an
immediate success, so our finite supermarket has to get rid of one old
product in order to stock it. One possibility is to find the product that the
supermarket has been stocking the longest (i.e., something it began
selling 120 years ago) and get rid of it on the grounds that no one is
interested any more. In effect, the supermarket maintains a linked list of
all the products it currently sells in the order they were introduced. The
new one goes on the back of the list; the one at the front of the list is
dropped. As a page replacement algorithm, the same idea is applicable.
The operating system maintains a list of all pages currently in memory,
with the page at the head of the list the oldest one and the page at the tail

the most recent arrival. On a page fault, the page at the head is removed
and the new page added to the tail of the list. When applied to stores,
FIFO might remove mustache wax, but it might also remove flour, salt,
or butter. When applied to computers the same problem arises. For this
reason, FIFO in its pure form is rarely used.
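
A minimal simulation of FIFO replacement, useful for counting page
faults on a given reference string (the reference string below is an
arbitrary example):

from collections import deque

def fifo_page_faults(reference_string, num_frames):
    resident, queue, faults = set(), deque(), 0
    for page in reference_string:
        if page not in resident:
            faults += 1
            if len(resident) == num_frames:
                oldest = queue.popleft()      # evict the longest-resident page
                resident.remove(oldest)
            resident.add(page)
            queue.append(page)
    return faults

print(fifo_page_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))   # 9 faults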

3.5.4 The Second Chance Page Replacement Algorithm

A simple modification to FIFO that avoids the problem of throwing out


a heavily used page is to inspect the R bit of the oldest page. If it is 0,
the page is both old and unused, so it is replaced immediately. If the R
bit is 1, the bit is cleared, the page is put onto the end of the list of pages,
and its load time is updated as though it had just arrived in memory.
Then the search continues.
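
The inspection loop can be sketched as follows; the deque of (page, R bit)
pairs, oldest page first, is an assumed representation of the FIFO list.

from collections import deque

def second_chance_victim(pages):
    """pages: deque of (page, r_bit) pairs with the oldest page at the left."""
    while True:
        page, r_bit = pages.popleft()
        if r_bit == 0:
            return page               # old and unused: evict it
        pages.append((page, 0))       # clear R and move to the tail of the list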

3.5.5 The Clock Page Replacement Algorithm

Although second chance is a reasonable algorithm, it is unnecessarily


inefficient because it is constantly moving pages around on its list. A
better approach is to keep all the page frames on a circular list in the
form of a clock.

Figure 1.14: The Clock Replacement Algorithm

When a page fault occurs, the page being pointed to by the hand is
inspected. If its R bit is 0, the page is evicted, the new page is inserted
into the clock in its place, and the hand is advanced one position. If R is
1, it is cleared and the hand is advanced to the next page. This process is
repeated until a page is found with R = 0. Not surprisingly, this
algorithm is called clock. It differs from second chance only in the
implementation.
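
A sketch of one clock sweep; the parallel lists of frames and R bits and
the hand index are assumed representations used only for this example.

def clock_victim(frames, r_bits, hand):
    """Return (index of the frame to evict, new hand position)."""
    while True:
        if r_bits[hand] == 0:
            return hand, (hand + 1) % len(frames)   # evict here, advance hand
        r_bits[hand] = 0                            # second chance, in place
        hand = (hand + 1) % len(frames)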


3.5.6 The Least Recently Used (LRU) Page Replacement


Algorithm

A good approximation to the optimal algorithm is based on the


observation that pages that have been heavily used in the last few
instructions will probably be heavily used again in the next few.
Conversely, pages that have not been used for ages will probably remain
unused for a long time. This idea suggests a realizable algorithm: when a
page fault occurs, throw out the page that has been unused for the
longest time. This strategy is called LRU (Least Recently Used) paging.
Although LRU is theoretically realizable, it is not cheap. To fully
implement LRU, it is necessary to maintain a linked list of all pages in
memory, with the most recently used page at the front and the least
recently used page at the rear. The difficulty is that the list must be
updated on every memory reference. Finding a page in the list, deleting
it, and then moving it to the front is a very time-consuming operation,
even in hardware (assuming that such hardware could be built).
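
In software simulations LRU is usually modelled with an ordered
structure rather than per-reference hardware updates; the sketch below
uses Python's OrderedDict, and the reference string is an arbitrary
example.

from collections import OrderedDict

def lru_page_faults(reference_string, num_frames):
    resident, faults = OrderedDict(), 0
    for page in reference_string:
        if page in resident:
            resident.move_to_end(page)        # mark as most recently used
        else:
            faults += 1
            if len(resident) == num_frames:
                resident.popitem(last=False)  # evict the least recently used
            resident[page] = True
    return faults

print(lru_page_faults([1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5], 3))   # 10 faults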

3.6 DATA TRANSFER MODES

The DMA mode of data transfer reduces CPU’s overhead in handling


I/O operations. It also allows parallelism in CPU and I/O operations.
Such parallelism is necessary to avoid wastage of valuable CPU time
while handling I/O devices whose speeds are much slower as compared
to CPU. The concept of DMA operation can be extended to relieve the
CPU further from getting involved with the execution of I/O operations.
This gives rises to the development of special purpose processor called
Input-Output Processor (IOP) or IO channel. The Input Output
Processor (IOP) is just like a CPU that handles the details of I/O
operations. It is more equipped with facilities than those are available in
typical DMA controller.

Figure 1.15: The Block Diagram

The IOP can fetch and execute its own instructions that are specifically
designed to characterize I/O transfers. In addition to the I/O – related

tasks, it can perform other processing tasks like arithmetic, logic, and
branching, and code translation. The main memory unit takes the pivotal
role; it communicates with the processor by means of DMA.

The Input Output Processor is a specialized processor which loads and


stores data into memory along with the execution of I/O instructions. It
acts as an interface between system and devices. It involves a sequence
of events to executing I/O operations and then store the results into the
memory.

3.6.1 Advantages

• In I/O processor-based systems, the I/O devices can access main
memory directly, without intervention by the processor.
• It addresses the problems that arise with the direct memory
access (DMA) method.

3.6.2 Modes of Transfer

The binary information that is received from an external device is


usually stored in the memory unit. The information that is transferred
from the CPU to the external device is originated from the memory unit.
CPU merely processes the information but the source and target is
always the memory unit. Data transfer between CPU and the I/O devices
may be done in different modes.

Data transfer to and from the peripherals may be done in any of the
three possible ways.

 Programmed I/O.
 Interrupt- initiated I/O.
 Direct memory access (DMA).

1. Programmed I/O: This results from I/O instructions written in
the computer program. Each data item transfer is initiated by an
instruction in the program. Usually, the transfer is between a CPU
register and memory. In this case it requires constant monitoring of
the peripheral devices by the CPU.

Example of Programmed I/O: In this case, the I/O device does not
have direct access to the memory unit. A transfer from I/O device to
memory requires the execution of several instructions by the CPU,
including an input instruction to transfer the data from device to the CPU
and store instruction to transfer the data from CPU to memory. In
programmed I/O, the CPU stays in the program loop until the I/O unit

indicates that it is ready for data transfer. This is a time consuming


process since it needlessly keeps the CPU busy. This situation can be
avoided by using an interrupt facility. This is discussed below.

2. Interrupt-initiated I/O: In the above case, the CPU is kept busy
unnecessarily. This situation can be avoided by using an
interrupt-driven method for data transfer: the interrupt facility and
special commands are used to instruct the interface to issue an
interrupt request signal whenever data is available from any
device. In the meantime the CPU can proceed with any other
program execution. The interface meanwhile keeps monitoring the
device. Whenever it is determined that the device is ready for data
transfer, it initiates an interrupt request signal to the computer.
Upon detection of an external interrupt signal, the CPU
momentarily stops the task it was performing, branches to the
service program to process the I/O transfer, and then returns to the
task it was originally performing.

Note: Both programmed I/O and interrupt-driven I/O
require the active intervention of the processor to transfer
data between memory and the I/O module, and any data
transfer must traverse a path through the processor. Thus,
both these forms of I/O suffer from two inherent drawbacks.

• The I/O transfer rate is limited by the speed with which the
processor can test and service a device.

• The processor is tied up in managing an I/O transfer; a


number of instructions must be executed for each I/O
transfer.

3. Direct Memory Access: The data transfer between a fast storage


media such as a magnetic disk and the memory unit is limited by the
speed of the CPU. Thus we can allow the peripherals to communicate
directly with each other using the memory buses, removing
the intervention of the CPU. This type of data transfer technique is
known as DMA or direct memory access. During DMA the CPU is
idle and it has no control over the memory buses. The DMA
controller takes over the buses to manage the transfer directly
between the I/O devices and the memory unit.

Figure 1.16: Control lines for DMA

 Bus Request: It is used by the DMA controller to request the


CPU to relinquish the control of the buses.
 Bus Grant: It is activated by the CPU to inform the external
DMA controller that the buses are in a high-impedance state and the
requesting DMA can take control of the buses. Once the DMA
has taken the control of the buses it transfers the data. This
transfer can take place in many ways.

3.7 PARALLEL PROCESSING

The quest for higher-performance digital computers seems unending. In


the past two decades, the performance of microprocessors has enjoyed
an exponential growth. The growth of microprocessor
speed/performance by a factor of 2 every 18 months (or about 60% per
year) is known as Moore’s law.

This growth is the result of a combination of two factors:

 Increase in complexity (related both to higher device density and to


larger size) of VLSI chips, projected to rise to around 10 M
transistors per chip for microprocessors, and 1B for dynamic
random-access memories (DRAMs), by the year 2000
 Introduction of, and improvements in, architectural features such as
on-chip cache memories, large instruction buffers, multiple
instruction issue per cycle, multithreading, deep pipelines, out-of-
order instruction execution, and branch prediction.

The motivations for parallel processing can be summarized as follows:

1. Higher speed, or solving problems faster. This is important when


applications have “hard” or “soft” deadlines. For example, we
have at most a few hours of computation time to do 24-hour
weather forecasting or to produce timely tornado warnings.

2. Higher throughput, or solving more instances of given problems.


This is important when many similar tasks must be performed.
For example, banks and airlines, among others, use transaction
processing systems that handle large volumes of data.

3. Higher computational power, or solving larger problems. This


would allow us to use very detailed, and thus more accurate,
models or to carry out simulation runs for longer periods of time
(e.g., 5-day, as opposed to 24-hour, weather forecasting).

All three aspects above are captured by a figure-of-merit often used in


connection with parallel processors: the computation speed-up factor
with respect to a uniprocessor. The ultimate efficiency in parallel
systems is to achieve a computation speed-up factor of p with p
processors. Although in many cases this ideal cannot be achieved, some
speed-up is generally possible. The actual gain in speed depends on the
architecture used for the system and the algorithm run on it. Of course,
for a task that is (virtually) impossible to perform on a single processor
in view of its excessive running time, the computation speed-up factor
can rightly be taken to be larger than p or even infinite. This situation,
which is the analogue of several men moving a heavy piece of
machinery or furniture in a few minutes, whereas one of them could not
move it at all, is sometimes referred to as parallel synergy.

A major issue in devising a parallel algorithm for a given problem is the


way in which the computational load is divided between the multiple
processors. The most efficient scheme often depends both on the
problem and on the parallel machine’s architecture.

Example

Consider the problem of constructing the list of all prime numbers in the
interval [1, n] for a given integer n > 0. A simple algorithm that can be
used for this computation is the sieve of Eratosthenes. Start with the list
of numbers 1, 2, 3, 4, ... , n represented as a “mark” bit-vector initialized
to 1000 . . . 00. In each step, the next unmarked number m (associated
with a 0 in element m of the mark bit-vector) is a prime. Find this
element m and mark all multiples of m beginning with m². When m² > n,
the computation stops and all unmarked elements are prime numbers.
The computation steps for n = 30 are shown in the figure below


Figure 3.17: Computation steps of the sieve of Eratosthenes for n = 30
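
For reference, a plain sequential version of the sieve just described can
be sketched as follows; a parallel version would divide the marking work
(or the range [2, n]) among the p processors.

def sieve(n):
    marked = [False] * (n + 1)         # False = unmarked (candidate prime)
    marked[0] = marked[1] = True       # 0 and 1 are not primes
    m = 2
    while m * m <= n:
        if not marked[m]:              # next unmarked number is a prime
            for multiple in range(m * m, n + 1, m):
                marked[multiple] = True
        m += 1
    return [i for i in range(2, n + 1) if not marked[i]]

print(sieve(30))   # [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]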

3.7.1 PARALLEL PROCESSING UPS AND DOWNS

L. F. Richardson, a British meteorologist, was the first person to attempt


to forecast the weather using numerical computations. He started to
formulate his method during the First World War while serving in the
army ambulance corps. He estimated that predicting the weather for a
24-hour period would require 64,000 slow “computers” (humans +
mechanical calculators) and even then, the forecast would take 12 hours
to complete. He had the following idea or dream:

Imagine a large hall like a theater. The walls of this chamber are
painted to form a map of the globe. A myriad of computers at work upon
the weather on the part of the map where each sits, but each computer
attends to only one equation or part of an equation. The work of each
region is coordinated by an official of higher rank. Numerous little
‘night signs’ display the instantaneous values so that neighbouring
computers can read them. One of [the conductor’s] duties is to maintain
a uniform speed of progress in all parts of the globe. But instead of
waving a baton, he turns a beam of rosy light upon any region that is
running ahead of the rest, and a beam of blue light upon those that are
behindhand.

3.7.2 Types of Parallelism: A Taxonomy


Parallel computers can be divided into two main categories of control
flow and data flow. Control-flow parallel computers are essentially
based on the same principles as the sequential or von Neumann
computer, except that multiple instructions can be executed at any given
time. Data-flow parallel computers, sometimes referred to as “non-von
Neumann,” are completely different in that they have no pointer to
active instruction(s) or a locus of control. The control is totally
distributed, with the availability of operands triggering the activation of
instructions.

Figure 3.18: Pictorial Representation of Richardson's example

In 1966, M. J. Flynn proposed a four-way classification of computer


systems based on the notions of instruction streams and data streams.
Flynn’s classification has become standard and is widely used. Flynn
coined the abbreviations SISD, SIMD, MISD, and MIMD (pronounced
“sis-dee,” “sim-dee,” and so forth) for the four classes of computers
shown in the figure below, based on the number of instruction streams (single
or multiple) and data streams (single or multiple). The SISD class
represents ordinary “uniprocessor” machines. Computers in the SIMD
class, with several processors directed by instructions issued from a
central control unit, are sometimes characterized as “array processors.”
Machines in the MISD category have not found widespread application,
but one can view them as generalized pipelines in which each stage
performs a relatively complex operation (as opposed to ordinary
pipelines found in modern processors where each stage does a very
simple instruction-level operation).

The MIMD category includes a wide class of computers. For this reason,
in 1988, E. E. Johnson proposed a further classification of such
machines based on their memory structure (global or distributed) and the
mechanism used for communication/synchronization (shared variables
or message passing). Again, one of the four categories (GMMP) is not
widely used. The GMSV class is what is loosely referred to as (shared-
memory) multiprocessors.


Figure 1.19: Classes of Computer according to Flynn

At the other extreme, the DMMP class is known as (distributed-


memory) multi-computers. Finally, the DMSV class, which is becoming
popular in view of combining the implementation ease of distributed
memory with the programming ease of the shared-variable scheme, is
sometimes called distributed shared memory. When all processors in a
MIMD-type machine execute the same program, the result is sometimes
referred to as single-program multiple-data [SPMD (pronounced “spim-dee”)].
Although the figure lumps all SIMD machines together, there are in
fact variations similar to those suggested above for MIMD machines. At
least conceptually, there can be shared-memory and distributed-memory
SIMD machines in which the processors communicate by means of
shared variables or explicit message passing. Anecdote: the Flynn–
Johnson classification shown in the figure contains eight four-letter abbreviations.
There are many other such abbreviations and acronyms in parallel
processing, examples being CISC, NUMA, PRAM, RISC, and VLIW.
Even our journals (JPDC, TPDS) and conferences (ICPP, IPPS, SPDP,
SPAA) have not escaped this fascination with four-letter abbreviations.
The author has a theory that an individual cannot be considered a
successful computer architect until she or he has coined at least one, and
preferably a group of two or four, such abbreviations! Toward this end,
the author coined the acronyms SINC and FINC (Scant/Full Interaction
Network Cell) as the communication network counterparts to the
popular RISC/CISC dichotomy. Alas, the use of these acronyms is not
yet as widespread as that of RISC/CISC. In fact, they are not used at all.

3.7.3 Roadblocks to Parallel Computing

Over the years, the enthusiasm of parallel computer designers and


researchers has been counteracted by many objections and cautionary
statements. The list begins with the less serious, or obsolete, objections
and ends with Amdahl’s law, which perhaps constitutes the most
important challenge facing parallel computer designers and users.

1. Grosch’s law (economy of scale applies, or computing power is


proportional to the square of cost). If this law did in fact hold,
investing money in p processors would be foolish as a single
computer with the same total cost could offer p² times the
performance of one such processor. Grosch’s law was formulated
in the days of giant mainframes and actually did hold for those
machines. In the early days of parallel processing, it was offered
as an argument against the cost-effectiveness of parallel
machines. However, we can now safely retire this law, as we can
buy more MFLOPS computing power per dollar by spending on
micros rather than on supers. Note that even if this law did hold,
one could counter that there is only one “fastest” single-processor
computer and it has a certain price; you cannot get a more
powerful one by spending more.
2. Minsky’s conjecture (speed-up is proportional to the logarithm of
the number p of processors). This conjecture has its roots in an
analysis of data access conflicts assuming random distribution of
addresses. These conflicts will slow everything down to the point
that quadrupling the number of processors only doubles the
performance. However, data access patterns in real applications
are far from random. Most applications have a pleasant amount of
data access regularity and locality that help improve the
performance. One might say that the log p speed-up rule is one
side of the coin that has the perfect speed-up p on the flip side.
Depending on the application, real speed-up can range from log p
to p (p/log p being a reasonable middle ground).
3. The tyranny of IC technology (because hardware becomes about
10 times faster every 5 years, by the time a parallel machine with
10-fold performance is designed and implemented, uniprocessors
will be just as fast). This objection might be valid for some
special-purpose systems that must be built from scratch with
“old” technology. Recent experience in parallel machine design
has shown that off-the-shelf components can be used in
synthesizing massively parallel computers. If the design of the
parallel processor is such that faster microprocessors can simply
be plugged in as they become available, they too benefit from
advancements in IC technology. Besides, why restrict our
attention to parallel systems that are designed to be only 10 times
faster rather than 100 or 1000 times?
4. The tyranny of vector supercomputers (vector supercomputers,
built by Cray, Fujitsu, and other companies, are rapidly
improving in performance and additionally offer a familiar
programming model and excellent vectorizing compilers; why
bother with parallel processors?). Besides, not all
computationally intensive applications deal with vectors or
matrices; some are in fact quite irregular. Note, also, that vector

and parallel processing are complementary approaches. Most


current vector supercomputers do in fact come in multiprocessor
configurations for increased performance.
5. The software inertia (billions of dollars’ worth of existing
software makes it hard to switch to parallel systems; the cost of
converting the “dusty decks” to parallel programs and retraining
the programmers is prohibitive). This objection is valid in the
short term; however, not all programs needed in the future have
already been written. New applications will be developed and
many new problems will become solvable with increased
performance. Students are already being trained to think parallel.
Additionally, tools are being developed to transform sequential
code into parallel code automatically. In fact, it has been argued
that it might be prudent to develop programs in parallel languages
even if they are to be run on sequential computers. The added
information about concurrency and data dependencies would
allow the sequential computer to improve its performance by
instruction prefetching, data caching, and so forth.

3.8 PIPELINING

There exist two basic techniques to increase the instruction execution


rate of a processor. These are to increase the clock rate, thus decreasing
the instruction execution time, or alternatively to increase the number of
instructions that can be executed simultaneously. Pipelining and
instruction-level parallelism are examples of the latter technique.
Pipelining owes its origin to car assembly lines. The idea is to have
more than one instruction being processed by the processor at the same
time. Similar to the assembly line, the success of a pipeline depends
upon dividing the execution of an instruction among a number of
subunits (stages), each performing part of the required operations. A
possible division is to consider instruction fetch (F), instruction decode
(D), operand fetch (F), instruction execution (E), and store of results (S)
as the subtasks needed for the execution of an instruction. In this case, it
is possible to have up to five instructions in the pipeline at the same
time, thus reducing instruction execution latency.

Pipeline system is like the modern day assembly line setup in factories.
For example in a car manufacturing industry, huge assembly lines are
setup and at each point, there are robotic arms to perform a certain task,
and then the car moves on ahead to the next arm.


Types of Pipeline:
It is divided into 2 categories:

 Arithmetic Pipeline- Arithmetic pipelines are usually found in


most of the computers. They are used for floating point
operations, multiplication of fixed point numbers etc.
 Instruction Pipeline- In this a stream of instructions can be
executed by overlapping fetch, decode and execute phases of an
instruction cycle. This type of technique is used to increase the
throughput of the computer system. An instruction pipeline reads
instruction from the memory while previous instructions are
being executed in other segments of the pipeline. Thus we can
execute multiple instructions simultaneously. The pipeline will be
more efficient if the instruction cycle is divided into segments of
equal duration.

Pipeline Conflicts

There are some factors that cause the pipeline to deviate its normal
performance. Some of these factors are given below:

 Timing Variations: All stages cannot take same amount of time.


This problem generally occurs in instruction processing where
different instructions have different operand requirements and
thus different processing time.

 Data Hazards: When several instructions are in partial
execution and they reference the same data, a problem arises. We
must ensure that the next instruction does not attempt to access the
data before the current instruction has finished with it, because this
would lead to incorrect results.

 Branching: In order to fetch and execute the next instruction, we
must know what that instruction is. If the present instruction is a
conditional branch, and its result will determine the next
instruction, then the next instruction may not be known until the
current one is processed.

 Interrupts: Interrupts insert unwanted instructions into the
instruction stream and so affect the execution of instructions.

 Data Dependency: It arises when an instruction depends upon


the result of a previous instruction but this result is not yet
available.


Advantages of Pipelining

 The cycle time of the processor is reduced.


 It increases the throughput of the system
 It makes the system reliable.

Disadvantages of Pipelining

 The design of a pipelined processor is complex and costly to
manufacture.
 The instruction latency is higher.

Pipelining refers to the technique in which a given task is divided into a


number of subtasks that need to be performed in sequence. Each subtask
is performed by a given functional unit. The units are connected in a
serial fashion and all of them operate simultaneously. The use of
pipelining improves the performance compared to the traditional
sequential execution of tasks. Figure 3.20 shows an illustration of the
basic difference between executing four subtasks of a given instruction
(in this case fetching F, decoding D, execution E, and writing the results
W) using pipelining and sequential processing.

Figure 3.20: Pictorial Representation of a simple Pipelining Example

It is clear from the figure that the total time required to process three
instructions (I1, I2, I3) is only six time units if four-stage pipelining is
used as compared to 12 time units if sequential processing is used. A
possible saving of up to 50% in the execution time of these three
instructions is obtained. In order to formulate some performance
measures for the goodness of a pipeline in processing a series of tasks, a
space time chart (called the Gantt’s chart) is used.

As can be seen from the space–time (Gantt) chart, 13 time units are needed to finish
executing 10 instructions (I1 to I10). This is to be compared to 40 time
units if sequential processing is used (ten instructions each requiring
four time units).
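
These two comparisons follow directly from the usual pipeline timing
formula: with k stages and n instructions, and no stalls, the pipelined time
is k + (n − 1) time units versus n × k time units for purely sequential
execution. A quick check of the figures quoted above:

def pipeline_time(n_instructions, k_stages):
    return k_stages + (n_instructions - 1)    # fill the pipe, then 1 per unit

def sequential_time(n_instructions, k_stages):
    return n_instructions * k_stages

print(pipeline_time(3, 4), sequential_time(3, 4))     # 6 versus 12 time units
print(pipeline_time(10, 4), sequential_time(10, 4))   # 13 versus 40 time units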

4.0 SELF-ASSESSMENT EXERCISES

1. Differentiate between the types of pipelines available.


2. What are the types of parallel computing?
3. Consider the execution of 500 instructions on a five-stage
pipeline machine. Compute the speed-up due to the use of
pipelining given that the probability of an instruction being a
branch is p = 0.3? What must be the value of p and the expected
number of branch instructions such that a speed-up of at least 4 is
possible? What must be the value of p such that a speed-up of at
least 5 is possible? Assume that each stage takes one cycle to
perform its task.
4. What is the average instruction processing time of a five-stage
instruction pipeline for 36 instructions if conditional branch
instructions occur as follows: I5, I7, I10, I25, I27. Use both the
space–time chart and the analytical model.

TUTOR MARKED ASSIGNMENTS

1. Parallelism in everyday life. Discuss the various forms of


parallelism used to speed up the following processes:
 Student registration at a university.
 Shopping at a supermarket.
 Taking an elevator in a high-rise building
2. A computer system has a three-stage pipeline consisting of a
Fetch unit (F), a Decode unit (D), and an Execute (E) unit.
Determine (using the space–time chart) the time required to
execute 20 sequential instructions using two-way interleaved
memory if all three units require the use of the memory
simultaneously.
3. List and explain various pipeline conflicts that exist.
4. What are the roadblocks to parallel processing?
5. Show understanding by explaining the types of replacement
algorithms available.

5.0 CONCLUSION

Computer memory is central to the operation of a modern computer


system; it stores data or program instructions on a temporary or
permanent basis for use in a computer. However, there is an increasing
gap between the speed of memory and the speed of microprocessors. In
this unit, various memory management and optimization techniques
are reviewed to reduce the gap, including the hardware designs of the
memory organization such as memory hierarchical structure and cache
design; the memory management techniques varying from replacement

algorithms to optimization techniques; and virtual memory strategies


from a primitive bare-machine approach to paging and segmentation
strategies.

6.0 SUMMARY

This unit studied the memory system of a computer, starting with the
organisation of its main memory, which, in some simple systems, is the
only form of data storage, and moving on to more complex systems
and the additional components they carry. Cache systems, which aim at
speeding up access to the primary storage were also studied, and there
was a greater focus on virtual memory systems, which make possible the
transparent use of secondary storage as if it was main memory, by the
processor.

7.0 REFERENCES/FURTHER READING

Abd-El-Barr, M. and El-Rewini, H. (2005). Fundamentals of Computer
Organization and Architecture. John Wiley & Sons. ISBN 0-471-46741-3.
URL:
https://engineering.futureuniversity.com/BOOKS%20FOR%20IT/
%5BMostafa_Abd-El-Barr__Hesham_El-
Rewini%5D_Fundamenta(BookZZ.org).pdf.

Stone, H. S., High-Performance Computer Architecture, Addison–


Wesley, 1993. URL: https://www.abebooks.com/book-
search/title/high-performance-computer-architecture/author/harold-
stone/

IEEE Trans. Computers, journal published by IEEE Computer Society;


has occasional special issues on parallel and distributed processing
(April 1987, December 1988, August 1989, December 1991, April
1997, April 1998).
http://link.springer.com/content/pdf/bfm%3A978-0-306-46964-
0%2F1.pdf

Varma, A., and C. S. Raghavendra, Interconnection Networks for


Multiprocessors and Multicomputers: Theory and Practice, IEEE
Computer Society Press, 1994. URL:
https://books.google.com/books/about/Interconnection_Networks_
for_Multiproces.html?id=-1u7QgAACAAJ

Zomaya, A. Y. (ed.), Parallel and Distributed Computing Handbook,


McGraw-Hill, 1996. URL: https://research-
repository.uwa.edu.au/en/publications/parallel-and-distributed-
computing-handbook
