Computer Organization UNIT5


Topics covered in this unit:

Parallel Processing
Pipelining (Instruction and Arithmetic Pipelines)
Vector Processing
Interconnection Structures (Timeshared Bus, Crossbar Switch, Multiport Memory, Multistage Switch)
Inter-Process Communication and Synchronization
Cache Coherence
Computer Organization | RISC and CISC

 Reduced Instruction Set Computer (RISC)

The main idea behind RISC is to make the hardware simpler by using an instruction set composed of a few basic operations for loading, evaluating, and storing data: a load instruction loads data, a store instruction stores it.
 Complex Instruction Set Computer (CISC)

The main idea behind CISC is that a single instruction performs all of the loading, evaluating, and storing operations: a multiply instruction, for example, loads its data, evaluates the product, and stores the result, hence it is complex.

Both approaches try to increase CPU performance:


RISC: reduces the cycles per instruction at the cost of the number of instructions per program.

CISC: attempts to minimize the number of instructions per program, at the cost of an increase in the number of cycles per instruction.
Characteristics of RISC:

Simpler instructions, hence simple instruction decoding.

Instructions fit within one word.

An instruction takes a single clock cycle to execute.

More general-purpose registers.

Simple addressing modes.

Fewer data types.

Pipelining can be achieved.


Characteristics of CISC:

Complex instructions, hence complex instruction decoding.

Instructions are larger than one word.

An instruction may take more than a single clock cycle to execute.

Fewer general-purpose registers, as operations are performed in memory itself.

Complex addressing modes.

More data types.

Differences between RISC and CISC:

RISC                                                  | CISC
------------------------------------------------------|------------------------------------------------------
Focus on software                                     | Focus on hardware
Uses only a hardwired control unit                    | Uses both hardwired and microprogrammed control units
Transistors are used for more registers               | Transistors are used for storing complex instructions
Fixed-size instructions                               | Variable-size instructions
Can perform only register-to-register arithmetic      | Can perform REG to REG, REG to MEM, or MEM to MEM
operations                                            | operations
Requires more registers                               | Requires fewer registers
Code size is large                                    | Code size is small
An instruction executes in a single clock cycle       | An instruction takes more than one clock cycle
An instruction fits in one word                       | Instructions are larger than one word
Simple and limited addressing modes                   | Complex and more numerous addressing modes
RISC stands for Reduced Instruction Set Computer      | CISC stands for Complex Instruction Set Computer
Fewer instructions compared to CISC                   | More instructions compared to RISC
Consumes less power                                   | Consumes more power
Highly pipelined                                      | Less pipelined
Requires more RAM                                     | Requires less RAM
Fewer addressing modes                                | More addressing modes
Advantages and Disadvantages
Advantages of RISC:

Simpler instructions: RISC processors use a smaller set of simple instructions, which
makes them easier to decode and execute quickly. This results in faster processing times.
Faster execution: Because RISC processors have a simpler instruction set, they can
execute instructions faster than CISC processors.
Lower power consumption: RISC processors consume less power than CISC processors,
making them ideal for portable devices.

Disadvantages of RISC:

More instructions required: RISC processors require more instructions to perform complex
tasks than CISC processors.
Increased memory usage: RISC processors require more memory to store the additional
instructions needed to perform complex tasks.
Higher cost: Developing and manufacturing RISC processors can be more expensive than
CISC processors.
Advantages and Disadvantages
Advantages of CISC:

Reduced code size: CISC processors use complex instructions that can perform multiple
operations, reducing the amount of code needed to perform a task.
More memory efficient: Because CISC instructions are more complex, they require fewer
instructions to perform complex tasks, which can result in more memory-efficient code.
Widely used: CISC processors have been in use for a longer time than RISC processors, so
they have a larger user base and more available software.

Disadvantages of CISC:

Slower execution: CISC processors take longer to execute instructions because they have
more complex instructions and need more time to decode them.
More complex design: CISC processors have more complex instruction sets, which makes
them more difficult to design and manufacture.
Higher power consumption: CISC processors consume more power than RISC processors
because of their more complex instruction sets.
What is Parallel Processing ?

 The term ‘parallel processing’ denotes a large class of techniques that provide simultaneous data-processing
operations for the purpose of increasing the computational speed of a computer system. A parallel processing
system is capable of concurrent data processing to achieve faster execution times.
 As an example, the next instruction can be read from memory while an instruction is being executed in the
ALU. The system can have two or more ALUs and be able to execute two or more instructions at the same
time. Parallel processing increases a computer's processing capacity, and with it the cost of the system
increases. However, technological development has reduced hardware costs to the point where parallel
processing methods are economically feasible.
Parallel processing occurs at multiple levels of complexity.
At the lowest level, parallel and serial operation are distinguished by the type of registers used: shift registers operate
one bit at a time in serial fashion, while parallel registers operate on all bits of the word simultaneously.
At higher levels of complexity, parallel processing derives from having a plurality of functional units that perform
separate or similar operations simultaneously.
Parallel processing is achieved by distributing the data among several functional units.

As an example, arithmetic, shift, and logic operations can be divided among three units, with the operands routed to
each unit under the supervision of a control unit.
One possible method of dividing the execution unit into eight functional units operating in parallel is shown in the figure.
Depending on the operation specified by the instruction, the operands in the registers are transferred to the unit
associated with that operation. The operation performed in each functional unit is denoted in each block of the diagram.
The arithmetic operations on integer numbers are performed by the adder and the integer multiplier.
 Floating-point operations can be divided among three units operating in parallel. Logic, shift, and increment
operations are performed concurrently on different data. All units are independent of each other, so one
number can be shifted while another number is being incremented. Generally, a multi-functional organization is
associated with a complex control unit that coordinates all the activities among the several components.

 The main advantage of parallel processing is that it provides better utilization of system resources by increasing
resource multiplicity, which improves overall system throughput.
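The idea of distributing independent operations among separate functional units can be sketched in software. Here, hypothetical "units" (an adder, a shifter, an incrementer) run concurrently in a thread pool, which stands in for the coordinating control unit; the operand values are arbitrary illustrative choices:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical "functional units": each performs one kind of operation.
def adder(a, b):
    return a + b

def shifter(x, n):
    return x << n          # logical left shift by n bits

def incrementer(x):
    return x + 1

# Distribute independent operations among the units so they can proceed
# concurrently; the pool plays the role of the control unit here.
with ThreadPoolExecutor(max_workers=3) as pool:
    f1 = pool.submit(adder, 3, 4)
    f2 = pool.submit(shifter, 1, 4)
    f3 = pool.submit(incrementer, 41)
    results = [f1.result(), f2.result(), f3.result()]

print(results)  # [7, 16, 42]
```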
Pipelining : Type 1

 To improve the performance of a CPU we have two options: 1) improve the hardware by introducing faster circuits, or 2) arrange the
hardware so that more than one operation can be performed at the same time. Since there is a limit on the speed of hardware and
the cost of faster circuits is quite high, we have to adopt the 2nd option.
 Pipelining is an arrangement of the hardware elements of the CPU such that its overall performance is increased.
Simultaneous execution of more than one instruction takes place in a pipelined processor.

 Design of a basic pipeline


 In a pipelined processor, a pipeline has two ends, the input end and the output end. Between these ends, there are
multiple stages/segments such that the output of one stage is connected to the input of the next stage and each stage
performs a specific operation.
 Interface registers are used to hold the intermediate output between two stages. These interface registers are also
called latch or buffer.
 All the stages in the pipeline along with the interface registers are controlled by a common clock.
Execution in a pipelined processor

The execution sequence of instructions in a pipelined processor can be visualized using a space-time diagram. For
example, consider a processor having 4 stages and let there be 2 instructions to be executed. We can visualize the
execution sequence through the following space-time diagrams:

Non-overlapped execution:

Stage / Cycle   1    2    3    4    5    6    7    8
S1              I1                  I2
S2                   I1                  I2
S3                        I1                  I2
S4                             I1                  I2

Total time = 8 cycles

Overlapped execution:

Stage / Cycle   1    2    3    4    5
S1              I1   I2
S2                   I1   I2
S3                        I1   I2
S4                             I1   I2

Total time = 5 cycles

 Pipeline Stages: A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC instruction set.
Following are the 5 stages of the RISC pipeline with their respective operations:
 Stage 1 (Instruction Fetch) In this stage the CPU reads instructions from the address in the memory whose value is present in
the program counter.
 Stage 2 (Instruction Decode) In this stage, instruction is decoded and the register file is accessed to get the values from the
registers used in the instruction.
 Stage 3 (Instruction Execute) In this stage, ALU operations are performed.
 Stage 4 (Memory Access) In this stage, memory operands are read from or written to the address specified in the
instruction.
 Stage 5 (Write Back) In this stage, the computed/fetched value is written back to the register specified in the instruction.
Performance of a pipelined processor

Consider a ‘k’-segment pipeline with clock cycle time ‘Tp’. Let there be ‘n’ tasks to be completed in the pipelined
processor. The first instruction takes ‘k’ cycles to come out of the pipeline, but the other ‘n – 1’ instructions take
only 1 cycle each, i.e., a total of ‘n – 1’ cycles. So, the time taken to execute ‘n’ instructions in a pipelined processor:

ETpipeline = k + n – 1 cycles = (k + n – 1) Tp

In the same case, for a non-pipelined processor, the execution time of ‘n’ instructions will be:

ETnon-pipeline = n * k * Tp
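These two formulas can be checked with a short calculation. The stage count, instruction count, and clock period below are illustrative values, not figures from the text:

```python
def pipelined_cycles(k, n):
    # first instruction takes k cycles; each of the remaining n - 1
    # instructions completes one cycle later
    return k + n - 1

def non_pipelined_cycles(k, n):
    # without overlap, every instruction takes all k cycles
    return n * k

k, n, Tp = 4, 100, 10e-9   # assumed: 4 stages, 100 tasks, 10 ns clock
et_pipeline = pipelined_cycles(k, n) * Tp
et_non_pipeline = non_pipelined_cycles(k, n) * Tp
speedup = et_non_pipeline / et_pipeline   # tends toward k as n grows
print(pipelined_cycles(k, n), non_pipelined_cycles(k, n))  # 103 400
```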
Arithmetic Pipeline and Instruction Pipeline

1. Arithmetic Pipeline:
An arithmetic pipeline divides an arithmetic problem into sub-problems for execution in different pipeline segments. It is used
for floating-point operations, multiplication, and various other computations. The flowchart of an arithmetic pipeline for floating-
point addition is shown in the diagram.
 Floating point addition using arithmetic pipeline :

The following sub operations are performed in this case:


 Compare the exponents.
 Align the mantissas.
 Add or subtract the mantissas.
 Normalise the result
 First, the two exponents are compared and the larger of the two is chosen as the result exponent. The difference between the
exponents then determines how many times the mantissa of the operand with the smaller exponent must be shifted to the right. After
this shift, both mantissas are aligned. Finally the two numbers are added, followed by normalisation of the result in the last segment.
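The four sub-operations can be sketched as pipeline segments acting on (mantissa, exponent) pairs. Decimal base 10 is assumed here for readability, whereas real hardware works in binary; the input values are an illustrative example:

```python
def fp_add(x, y):
    """Floating-point addition on (mantissa, exponent) pairs,
    where the value represented is mantissa * 10**exponent."""
    (ma, ea), (mb, eb) = x, y
    # Segment 1: compare exponents; the larger one becomes the result exponent
    if ea < eb:
        (ma, ea), (mb, eb) = (mb, eb), (ma, ea)
    # Segment 2: align the mantissa of the smaller-exponent operand
    mb = mb / 10 ** (ea - eb)
    # Segment 3: add the mantissas
    m, e = ma + mb, ea
    # Segment 4: normalise an overflowed result (bring mantissa below 1)
    while abs(m) >= 1:
        m, e = m / 10, e + 1
    return m, e

m, e = fp_add((0.9504, 3), (0.8200, 2))
print(round(m, 5), e)  # 0.10324 4
```

Here 0.8200 × 10² is aligned to 0.0820 × 10³, the sum 1.0324 × 10³ overflows, and normalisation yields 0.10324 × 10⁴.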
2. Instruction Pipeline :


In an instruction pipeline, a stream of instructions is executed by overlapping the fetch, decode, and execute phases of the
instruction cycle. This technique is used to increase the throughput of the computer system. An instruction pipeline reads
instructions from memory while previous instructions are being executed in other segments of the pipeline. Thus we can execute
multiple instructions simultaneously. The pipeline is more efficient if the instruction cycle is divided into segments of equal duration.
 In the most general case, the computer needs to process each instruction in the following sequence of steps:
 Fetch the instruction from memory (FI)
 Decode the instruction (DA)
 Calculate the effective address
 Fetch the operands from memory (FO)
 Execute the instruction (EX)
 Store the result in the proper place
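The overlapped timing of these phases can be generated programmatically. This sketch builds the space-time table for an ideal pipeline (no stalls or branches assumed) and confirms the k + n – 1 cycle count from the performance section:

```python
def schedule(n_instr, stages):
    """Return {stage: {cycle: instruction}} for an ideal overlapped pipeline."""
    table = {s: {} for s in range(1, stages + 1)}
    for i in range(1, n_instr + 1):            # instruction Ii enters at cycle i
        for s in range(1, stages + 1):
            table[s][i + s - 1] = f"I{i}"      # reaches stage s one cycle later
    return table

t = schedule(2, 4)                              # 2 instructions, 4 stages
last_cycle = max(c for row in t.values() for c in row)
print(last_cycle)  # 5, matching k + n - 1 = 4 + 2 - 1
```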
Vector processing

Depending on where the operands are retrieved from, pipelined vector computers are classified into two architectural
configurations:
1. Memory-to-memory architecture –
In memory-to-memory architecture, source operands, intermediate results, and final results are retrieved (read) directly from main
memory. For memory-to-memory vector instructions, the base address, the offset, the increment, and the vector length must be
specified in order to enable streams of data transfers between main memory and the pipelines. Processors such as the TI-ASC,
CDC STAR-100, and Cyber-205 have vector instructions in memory-to-memory format. The main points about memory-to-memory
architecture are:
1. There is no limitation of size
2. Speed is comparatively slow in this architecture
2. Register-to-register architecture –
In register-to-register architecture, operands and results are retrieved indirectly from main memory through the use of a large
number of vector registers or scalar registers. Processors such as the Cray-1 and the Fujitsu VP-200 use vector instructions in
register-to-register format. The main points about register-to-register architecture are:
1. Register to register architecture has limited size.
2. Speed is very high as compared to the memory to memory architecture.
3. The hardware cost is high in this architecture.
Flowchart
Example:

• The instruction is fetched in the first clock cycle in segment 1.


• It is decoded in the next clock cycle, then the operands are fetched, and finally the instruction is executed. The fetch
and decode phases overlap due to pipelining: by the time the first instruction is being decoded, the next instruction is
fetched by the pipeline.
• The third instruction is a branch instruction. While it is being decoded, the 4th instruction is fetched simultaneously.
But since it is a branch, it may point to some other instruction once it is decoded.
• Thus the fourth instruction is kept on hold until the branch instruction is executed. Once the branch is resolved, the
correct next instruction is fetched and the other phases continue as usual.
A block diagram of a modern multiple pipeline vector computer
Array Processor

Array Processor performs computations on large array of data.

There are two types of Array Processors: the Attached Array Processor and the SIMD Array Processor.
1. Attached Array Processor:
An auxiliary processor attached to a host computer to improve the host's performance in numerical computational tasks.
An attached array processor has two interfaces:

1. An input/output interface to a common processor.


2. An interface with a local memory.

The local memory is connected to the main memory. The host computer is a general-purpose computer, and the
attached processor is a back-end machine driven by the host computer.
The array processor is connected through an I/O controller to the computer, and the computer treats it as an
external interface.
2. SIMD Array Processor: This is a computer with multiple processing units operating in parallel. Both types of array processors
manipulate vectors, but their internal organization is different.

The processing units are synchronized to perform the same operation under the control of a common control unit, thus providing a single instruction stream, multiple
data stream (SIMD) organization. As shown in the figure, a SIMD array processor contains a set of identical processing elements (PEs), each having a local memory M.
Processing Element

Each PE includes:
an ALU
a floating-point arithmetic unit
working registers

• The master control unit controls the operation of the PEs. Its function is to decode each instruction and determine how the instruction
is to be executed. If the instruction is a scalar or program-control instruction, it is executed directly within the master control unit.
• Main memory is used for storage of the program, while each PE uses operands stored in its local memory.
Multiprocessor:

A multiprocessor is a computer system with two or more central processing units


(CPUs) that share full access to a common RAM. The main objective of using a
multiprocessor is to boost the system's execution speed; other objectives are
fault tolerance and application matching.

There are two types of multiprocessors


1. Shared memory multiprocessor
2. Distributed memory multiprocessor.

In shared memory multiprocessors, all the CPUs share a common memory, but in a
distributed memory multiprocessor, every CPU has its own private memory.
Applications of Multiprocessor

1. As a uniprocessor: single instruction, single data stream (SISD).


2. As single instruction, multiple data stream (SIMD), which is usually used for vector
processing.
3. As multiple instruction, single data stream (MISD), which is used to describe hyper-threaded
or pipelined processors applying multiple instruction streams to a single data stream.
4. As multiple instruction, multiple data stream (MIMD), for executing multiple independent
instruction sequences on multiple data streams inside a single system.
Benefits of using a Multiprocessor

 Enhanced performance.
 Multiple applications.
 Multi-tasking inside an application.
 High throughput and responsiveness.
 Hardware sharing among CPUs.
Interconnection Structures

The processors must be able to share a set of main memory modules & I/O devices in a
multiprocessor system. This sharing capability can be provided through interconnection
structures.

The interconnection structures that are commonly used are as follows:

1. Time-shared / Common Bus


2. Cross bar Switch
3. Multiport Memory
4. Multistage Switching Network
5. Hypercube System
1.Time-shared / Common Bus

In a multiprocessor system, the time shared bus interconnection provides a common communication path connecting all the functional units like processor, I/O processor,
memory unit etc. The figure below shows the multiple processors with common communication path (single bus).

To communicate with any functional unit, a processor needs the bus to transfer the
data. To do so, the processor first checks whether the bus is available by examining
the bus status: if the bus is being used by some other functional unit, the
status is busy; otherwise it is free.

A processor can use the bus only when the bus is free. The sender processor puts the
address of the destination on the bus, and the destination unit identifies it. In order to
communicate with a functional unit, a command is issued telling that unit what
work is to be done. The other processors at that time will either be busy with internal
operations or sit idle, waiting for the bus.
We can use a bus controller to resolve conflicts, if any.

Single-Bus Multiprocessor Organization


Advantages and Disadvantages

Advantages
Inexpensive, as no extra hardware such as a switch is required.
Simple and easy to configure, as the functional units are directly connected to the bus.

Disadvantages
The major drawback of this configuration is that if a malfunction occurs in any of the bus interface circuits, the complete
system fails.

Decreased throughput
At any time, only one processor can communicate with any other functional unit.

Increased arbitration logic


As the number of processors and memory units increases, the bus contention problem grows.
2. Crossbar Switch
A crossbar switch contains a matrix of simple switch elements that can be switched on and off to create or break a connection. By turning
on a switch element in the matrix, a connection between a processor and a memory module is made. Crossbar switches are non-blocking,
that is, all communication permutations can be performed without blocking.

The crossbar switch organization consists of crosspoints placed at the intersections between processor buses and memory module
paths. The diagram shows a crossbar switch interconnection between four CPUs and four memory modules.
The small square at each crosspoint is a switch that determines the path from a processor to a memory module. Each switch point has
control logic to set up the transfer path between a processor and memory.
It examines the address placed on the bus to determine whether its particular module is being addressed, and it resolves multiple
requests for access to the same memory module on a fixed-priority basis.

The diagram shows the functional design of a crossbar switch connected to one memory module. The
circuit consists of multiplexers that select the data, address, and control lines from one CPU for
communication with the memory module.
Priority levels are established by the arbitration logic to select one CPU when multiple CPUs attempt to access the same memory
module. The multiplexers are controlled by the binary code generated by a priority encoder within the arbitration logic.
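The fixed-priority arbitration can be sketched as a simple priority encoder; the priority order below (CPU 0 highest) is an assumption for illustration:

```python
def priority_encoder(requests):
    """Grant the memory module to the lowest-numbered requesting CPU.
    requests[i] is True when CPU i is requesting this module."""
    for cpu, req in enumerate(requests):
        if req:
            return cpu      # highest-priority requester wins
    return None             # no CPU is requesting this module

print(priority_encoder([False, True, True, False]))  # 1
```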
3.Multiport Memory

A multiport memory system employs separate buses between each memory module and each CPU. A processor bus comprises the address, data,
and control lines necessary to communicate with memory. Each memory module is connected to each processor bus. The memory module must
have internal control logic to determine which port has access to memory at any given time.

Each memory module can be said to have four ports, with each port accommodating one of the buses. Memory access conflicts are resolved by
assigning fixed priorities to each memory port: the priority for memory access associated with each processor is established by the physical port
position that its bus occupies in each module. Thus CPU 1 has priority over CPU 2, CPU 2 has priority over CPU 3, and CPU 4 has the lowest
priority.

Advantage:-
High transfer rate can be achieved because of multiple paths
Disadvantage:-
It requires expensive memory control logic and a large number of cables and
connectors.
It is only good for systems with small number of processors.
4.Multistage Switching Network

The 2×2 crossbar switch is used in the multistage network. It has 2 inputs (A and B) and 2 outputs (0 and 1). The control inputs CA and CB
establish the connection between the input and output terminals.

The input is connected to output 0 if the control input is 0, and to output 1 if the control input is 1.
This switch can arbitrate between conflicting requests: if both A and B require the same output terminal, only one is
connected and the other is blocked/rejected.

2 * 2 Crossbar Switch
We can construct a multistage network using 2×2 switches in order to control the communication between a number of sources and destinations.
Arranging the crossbar switches in a binary tree connects an input to one of the 8 possible destinations.

1 to 8 way switch using 2*2 Switch

In the above diagram, PA and PB are two processors connected through switches to 8 memory modules, numbered in binary from 000 (0) to
111 (7). There are three levels from a source to a destination, and one bit of the destination number selects the output at each level: the 1st bit
determines the output of the switch in the 1st level, the 2nd bit in the 2nd level, and the 3rd bit in the 3rd level.
Example: if the source is PB and the destination is memory module 011 (as in the figure), a path is formed from PB to
output 0 in the 1st level, output 1 in the 2nd level, and output 1 in the 3rd level.
In a tightly coupled system, the processor usually acts as the source and a memory module acts as the destination.
In a loosely coupled system, processing units act as both source and destination.
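The bit-per-level routing rule can be sketched as a function mapping a 3-bit destination address to the three switch settings:

```python
def route(dest, levels=3):
    """Switch settings needed to steer an input of a binary tree of 2x2
    switches to `dest` (0-7 for three levels): bit k of the destination
    selects output 0 or 1 of the switch at level k+1."""
    bits = format(dest, f"0{levels}b")   # e.g. 3 -> "011"
    return [int(b) for b in bits]

print(route(0b011))  # [0, 1, 1] — matches the PB-to-module-011 example
```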
Many patterns can be made using 2×2 switches such as Omega networks, Butterfly Network, etc.

Conclusion:
The interconnection structure can decide the overall system's performance in a multiprocessor environment. Multistage
switching networks arose to overcome the disadvantage of the common bus system (the availability of only one path) while
reducing the complexity of other interconnection structures (a crossbar has complexity O(n²)). They use smaller switches,
i.e., 2×2 switches, to reduce complexity, and routing algorithms can be used to set the switches. Their complexity and cost
are less than those of the crossbar interconnection network.
5.Hypercube System

A hypercube (or binary n-cube) multiprocessor structure represents a loosely coupled system made up of N = 2^n processors interconnected in an
n-dimensional binary cube. Each processor forms a node of the cube; in effect each node contains not only a CPU but also local memory and an I/O
interface. Each processor has direct communication paths to n neighbour processors. These paths correspond to the edges of the cube.
There are 2^n distinct n-bit binary addresses that can be assigned to the processors. Each processor's address differs from that of each of its n neighbours in exactly one bit position.
Hypercube structures for n = 1, 2, and 3:
A one-cube structure has n = 1 and 2^n = 2.
It has two processors interconnected by a single path.
A two-cube structure has n = 2 and 2^n = 4.
It has four nodes interconnected as a square.
An n-cube structure has 2^n nodes, with a processor residing in each node.
Each node is assigned a binary address in such a manner that the addresses of two neighbours differ in exactly one bit position. For example, the
three neighbours of the node with address 100 in a three-cube structure are 000, 110, and 101; each of these binary numbers differs from 100 in one bit position.
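The neighbour rule (addresses differing in exactly one bit) can be expressed with a single XOR per dimension:

```python
def neighbors(node, n):
    """Addresses of the n neighbours of `node` in an n-cube:
    flip each of the n bits in turn."""
    return {node ^ (1 << bit) for bit in range(n)}

# Neighbours of node 100 in a three-cube structure
print(sorted(f"{p:03b}" for p in neighbors(0b100, 3)))  # ['000', '101', '110']
```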
Interprocess communication and synchronization.

Interprocess communication is the mechanism provided by the operating system that allows processes to communicate with each other. This
communication could involve a process letting another process know that some event has occurred or the transferring of data from one process
to another.



Synchronization in Interprocess Communication

Synchronization is a necessary part of Interprocess communication. It is either provided by the interprocess control mechanism or
handled by the communicating processes.

Some of the methods to provide synchronization are as follows :

Semaphore A semaphore is a variable that controls the access to a common resource by multiple processes. The two types of
semaphores are binary semaphores and counting semaphores.

Mutual Exclusion Mutual exclusion requires that only one process thread can enter the critical section at a time. This is useful for
synchronization and also prevents race conditions.

Barrier A barrier does not allow individual processes to proceed until all the processes reach it. Many parallel languages and
collective routines impose barriers.

Spinlock This is a type of lock. The processes trying to acquire this lock wait in a loop while checking if the lock is available or not.
This is known as busy waiting because the process is not doing any useful operation even though it is active.
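A binary semaphore guarding a critical section can be sketched with Python's threading module; the thread count and number of increments are arbitrary illustrative values:

```python
import threading

counter = 0
sem = threading.Semaphore(1)   # binary semaphore guarding the shared counter

def worker(increments):
    global counter
    for _ in range(increments):
        sem.acquire()          # enter the critical section
        counter += 1           # only one thread updates at a time
        sem.release()          # leave the critical section

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```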
Approaches to Interprocess Communication

The different approaches to implement interprocess communication are given as follows:


Pipe A pipe is a data channel that is unidirectional. Two pipes can be used to create a two-way data channel between two processes.
This uses standard input and output methods. Pipes are used in all POSIX systems as well as Windows operating systems.
Socket The socket is the endpoint for sending or receiving data in a network. This is true for data sent between processes on the same
computer or data sent between different computers on the same network. Most of the operating systems use sockets for interprocess
communication.
File A file is a data record that may be stored on a disk or acquired on demand by a file server. Multiple processes can access a file as
required. All operating systems use files for data storage.
Signal Signals are useful in interprocess communication in a limited way. They are system messages that are sent from one process to
another. Normally, signals are not used to transfer data but are used for remote commands between processes.
Shared Memory Shared memory is the memory that can be simultaneously accessed by multiple processes. This is done so that the
processes can communicate with each other. All POSIX systems, as well as Windows operating systems use shared memory.
Message Queue Multiple processes can read and write data to the message queue without being connected to each other. Messages are
stored in the queue until their recipient retrieves them. Message queues are quite useful for interprocess communication and are used by
most operating systems
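The pipe approach can be sketched with the OS-level `os.pipe` call; a single process plays both roles here for brevity, whereas normally the writer and reader are separate processes:

```python
import os

# A unidirectional pipe: data written to the write end is read
# from the read end, in order.
r, w = os.pipe()
os.write(w, b"event occurred")   # the "sender" notifies through the pipe
os.close(w)                      # closing the write end signals end-of-data
msg = os.read(r, 1024)           # the "receiver" picks up the message
os.close(r)
print(msg.decode())  # event occurred
```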
Cache Coherence

Cache coherence : In a multiprocessor system, data inconsistency may occur among adjacent levels or within the same level of the memory hierarchy.
In a shared memory multiprocessor with a separate cache memory for each processor, it is possible to have many copies of any one instruction operand:
one copy in the main memory and one in each cache memory. When one copy of an operand is changed, the other copies of the operand must be
changed also.

Example: the cache and main memory may hold inconsistent copies of the same object.
Suppose there are three processors, each having a cache, and consider the following scenario:
Processor 1 reads X: obtains 24 from memory and caches it.
Processor 2 reads X: obtains 24 from memory and caches it.
Processor 1 then writes X = 64: its locally cached copy is updated.

Now, if processor 3 reads X, what value should it get?


Memory and processor 2 think it is 24, while processor 1 thinks it is 64.

As multiple processors operate in parallel, and independently multiple caches may possess different copies of the same memory block,
this creates a cache coherence problem. Cache coherence is the discipline that ensures that changes in the values of shared operands are
propagated throughout the system in a timely fashion.

There are three distinct levels of cache coherence:

1. Every write operation appears to occur instantaneously.


2. All processors see exactly the same sequence of changes of values for each separate operand.
3. Different processors may see an operation and assume different sequences of values; this is known as non-coherent behavior.
There are various Cache Coherence Protocols in multiprocessor system.

These are :-

MSI protocol (Modified, Shared, Invalid)


MOSI protocol (Modified, Owned, Shared, Invalid)
MESI protocol (Modified, Exclusive, Shared, Invalid)
MOESI protocol (Modified, Owned, Exclusive, Shared, Invalid)

These terms are described as follows:

Modified – the value in the cache is dirty; that is, the value in the current cache differs from that in main memory.
Exclusive – the value present in the cache is the same as that in main memory; that is, the value is clean.
Shared – the cache holds the most recent copy of the data, which is shared among all the caches and main memory
as well.
Owned – the current cache holds the block and is the owner of that block, that is, it has full rights over that particular
block.
Invalid – the current cache block is invalid and must be fetched from another cache or from main memory.
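The worked example above (X read as 24, then written as 64) can be replayed with a toy write-invalidate model. The states and transitions below are a simplified MSI-style sketch for illustration, not a full protocol implementation:

```python
# Toy write-invalidate coherence model with states
# 'M' (Modified), 'S' (Shared), 'I' (Invalid).
class Cache:
    def __init__(self):
        self.state, self.value = "I", None

def read(caches, i, memory):
    c = caches[i]
    if c.state == "I":                      # read miss
        for other in caches:                # a Modified copy elsewhere must
            if other.state == "M":          # be written back to memory first
                memory["X"], other.state = other.value, "S"
        c.value, c.state = memory["X"], "S"
    return c.value

def write(caches, i, value, memory):
    for j, other in enumerate(caches):      # snoop: invalidate all other copies
        if j != i:
            other.state, other.value = "I", None
    caches[i].state, caches[i].value = "M", value

memory = {"X": 24}
caches = [Cache() for _ in range(3)]
read(caches, 0, memory)         # P1 caches X = 24 (Shared)
read(caches, 1, memory)         # P2 caches X = 24 (Shared)
write(caches, 0, 64, memory)    # P1 writes X = 64; other copies invalidated
v = read(caches, 2, memory)     # P3 now gets the up-to-date value
print(v)  # 64
```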
Coherency mechanisms

There are three types of coherence mechanisms:

1. Directory-based – In a directory-based system, the data being shared is placed in a common directory that maintains the coherence
between caches. The directory acts as a filter through which the processor must ask permission to load an entry from the primary memory
to its cache. When an entry is changed, the directory either updates or invalidates the other caches with that entry.

2. Snooping – First introduced in 1983, snooping is a process where the individual caches monitor address lines for accesses to memory
locations that they have cached. This is called a write-invalidate protocol: when a write operation is observed to a location that a cache has a
copy of, the cache controller invalidates its own copy of the snooped memory location.

3. Snarfing – A mechanism where a cache controller watches both address and data in order to update its own copy of a memory
location when a second master modifies that location in main memory. When a write operation is observed to a location that a cache has a
copy of, the cache controller updates its own copy of the snarfed memory location with the new data.
Thank You
