
What are Data Dependencies?

A data dependency occurs when one instruction requires the result of another instruction before it can execute. In simple terms, if Instruction 2 needs data from
Instruction 1, then Instruction 2 cannot run until Instruction 1 is done. This is because the result of Instruction 1 is needed for Instruction 2 to proceed.

Example of Data Dependency:

L.D F0, 0(R1) ; Load value into F0

ADD.D F4, F0, F2 ; Add F0 and F2, store result in F4

 First Instruction: L.D F0, 0(R1) loads data into register F0.

 Second Instruction: ADD.D F4, F0, F2 adds F0 and F2 and stores the result in F4. This instruction depends on the result of the first instruction because it
needs F0.

In this case, Instruction 2 cannot execute until Instruction 1 finishes, since it needs F0's value.

2. Types of Data Dependencies

There are three types of data dependencies that can occur between instructions:

 RAW (Read After Write): This is the most common type of dependency. It happens when an instruction tries to read a register before a previous instruction has finished writing to it.

o Example: In the code above, ADD.D tries to read F0 after L.D has written to it.

 WAW (Write After Write): This occurs when two instructions try to write to the same register. This can cause issues if one write happens before the
other, which could lead to incorrect results.

 WAR (Write After Read): This happens when an instruction writes to a register that is read by an earlier instruction. The second instruction’s write can
overwrite the value before the first instruction has used it, which leads to incorrect results.
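
The three cases above can be checked mechanically. Below is a minimal Python sketch (not from the original notes) that classifies the dependency between a pair of instructions from their read and write sets; the set-based instruction encoding and register names are illustrative assumptions only.

# Minimal sketch: classify dependencies between instructions i and j,
# where i precedes j in program order. Instructions are modeled only by
# the sets of registers they read and write (a deliberate simplification).

def classify(i_reads, i_writes, j_reads, j_writes):
    deps = []
    if j_reads & i_writes:
        deps.append("RAW")   # j reads a register that i writes
    if j_writes & i_writes:
        deps.append("WAW")   # i and j write the same register
    if j_writes & i_reads:
        deps.append("WAR")   # j overwrites a register that i still reads
    return deps or ["none"]

# L.D F0, 0(R1)    -> reads {R1}, writes {F0}
# ADD.D F4, F0, F2 -> reads {F0, F2}, writes {F4}
print(classify({"R1"}, {"F0"}, {"F0", "F2"}, {"F4"}))  # ['RAW']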

Effects of Data Dependencies in Instruction-Level Parallelism

Instruction-Level Parallelism (ILP) refers to the ability of a processor to execute multiple instructions in parallel, improving performance by making use of
available processing power. However, the presence of data dependencies between instructions can have a significant impact on the extent to which ILP can be
exploited. Understanding how these dependencies affect instruction execution is essential for optimizing performance.

What are Data Dependencies?

A data dependency occurs when one instruction relies on the result of a previous instruction. These dependencies create constraints on how instructions can be
executed in parallel. In other words, the processor must respect the order of instructions with data dependencies to ensure that the results are correct.

There are different types of data dependencies, and each one can affect instruction execution differently. The three primary types of data dependencies are:

 True Data Dependence (RAW): An instruction depends on the result of a previous instruction.

 Write-After-Write (WAW): Two instructions write to the same register or memory location.

 Write-After-Read (WAR): One instruction writes to a register or memory location that another instruction has already read from.

Types of Data Dependencies

1. True Data Dependence (RAW - Read After Write)

True Data Dependence occurs when an instruction needs data produced by a previous instruction. For example, if an instruction is adding two numbers, and one
of the numbers is the result of a previous instruction, the second instruction cannot execute until the first one has completed.

Example:

L.D F0, 0(R1) ; Load value into F0

ADD.D F4, F0, F2 ; Add F0 and F2, store result in F4

In this example:

 The second instruction (ADD.D) depends on the result of the first instruction (L.D). The value of F0 from L.D is needed by ADD.D to perform the addition.

 Effect on ILP: These instructions cannot be executed in parallel. The second instruction must wait for the first one to finish, reducing the potential for
parallel execution.

2. Write After Write (WAW)

Write After Write occurs when two instructions attempt to write to the same register or memory location. The order of writes must be preserved to avoid
incorrect results.

Example:

ADD.D F4, F2, F0 ; F4 = F2 + F0

MUL.D F4, F4, F1 ; F4 = F4 * F1

In this case:

 Both instructions are trying to write to register F4. The second instruction (MUL.D) must wait for the first instruction (ADD.D) to finish writing its result to
F4 before it can perform its own write.

 Effect on ILP: While these instructions could be executed in parallel, they must follow the correct order of writes, limiting parallel execution.

3. Write After Read (WAR)

Write After Read happens when an instruction writes to a register or memory location that another instruction has read from. The write must be scheduled
carefully to avoid overwriting the value before it has been read.
Example:

ADD.D F4, F2, F0 ; F4 = F2 + F0

MUL.D F2, F4, F1 ; F2 = F4 * F1

In this case:

 The second instruction (MUL.D) writes to F2, while the first instruction (ADD.D) has already read from F2. The write must occur after the read, ensuring
no overwriting of the value before it's used.

 Effect on ILP: The processor needs to manage the timing of these operations carefully to prevent incorrect results, which can limit parallel execution.

4. Data Hazards and Their Effects

A data hazard occurs when instructions that depend on each other are executed too close together, causing a delay in execution. These hazards can lead to
pipeline stalls, where the processor has to pause the execution of instructions to resolve dependencies.

RAW Hazard (Read After Write)

 Effect on ILP: When a later instruction needs data that is still being computed by an earlier instruction, it must wait for the result, causing a delay and
reducing parallelism.

Example of RAW Hazard:

L.D F0, 0(R1) ; Load data into F0

ADD.D F4, F0, F2 ; Add F0 and F2, store in F4

 The ADD.D instruction cannot execute until L.D finishes loading the data into F0. This causes a delay and reduces the potential for parallel execution.

WAW and WAR Hazards

 Effect on ILP: Both WAW and WAR hazards can also cause delays, but they are less common than RAW hazards. These hazards involve managing the
correct order of writes and reads to registers or memory locations. When these hazards are present, the processor must schedule instructions carefully
to avoid incorrect results or unnecessary stalls.

5. Memory Dependencies

Memory dependencies are harder to manage than register dependencies because the processor may not always know if two memory accesses refer to the same
location. For example, two memory addresses might appear different in the code, but they could point to the same physical memory location.

Example of Memory Dependency:

L.D F0, 0(R1) ; Load data into F0

S.D F0, 0(R1) ; Store data from F0 into memory

 Both instructions access the same memory location, 0(R1). If the store (S.D) were reordered ahead of the load (L.D), it would overwrite the value in memory before the load has read it.

 Effect on ILP: Memory dependencies make it difficult for the processor to determine if two instructions can be executed in parallel, leading to potential
delays and reduced parallelism.

6. Out-of-Order Execution

Modern processors use out-of-order execution to mitigate the effects of data dependencies. This allows the processor to execute instructions as soon as their
operands are available, even if they appear out of order in the program.

 Effect on ILP: Out-of-order execution can improve parallelism by allowing independent instructions to run while waiting for data from dependent
instructions. However, the processor must ensure that the final program order is maintained, particularly for store instructions.

Example of Out-of-Order Execution: If ADD.D depends on L.D to load a value into F0, and there is another independent instruction that doesn’t need F0, the
processor can execute that independent instruction while waiting for L.D to finish.
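
As a rough illustration of this idea, the Python sketch below (a toy model with assumed register names, not a real scheduler) issues any instruction whose source operands are already available and stalls the rest:

# Toy model of out-of-order issue: an instruction may issue as soon as
# all of its source registers are ready. Register names are illustrative.

ready_regs = {"F2", "R1"}                 # values already available
window = [
    ("ADD.D", "F4", ["F0", "F2"]),        # waits on F0 (still being loaded by L.D)
    ("DADDIU", "R2", ["R1"]),             # independent of F0: can issue now
]

for op, dest, srcs in window:
    missing = [s for s in srcs if s not in ready_regs]
    if not missing:
        print(f"issue {op} -> {dest}")    # runs while L.D completes
    else:
        print(f"stall {op}: waiting on {missing}")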

7. Techniques to Improve Instruction-Level Parallelism

There are several techniques that processors use to manage data dependencies and maximize instruction-level parallelism:

 Forwarding (Bypassing): This allows the result of one instruction to be sent directly to another instruction without waiting for it to be written back to a
register. This reduces pipeline stalls caused by RAW hazards.

Example of Forwarding:

ADD.D F4, F2, F0 ; F4 = F2 + F0

S.D F4, 0(R1) ; Store F4

Instead of waiting for ADD.D to write F4 back to the register file, the processor can forward the result directly to S.D.

 Instruction Reordering: The compiler can reorder instructions to avoid data hazards. By scheduling independent instructions together, it’s possible to
increase parallel execution and reduce delays.

 Dynamic Scheduling: Techniques like Tomasulo’s algorithm allow the processor to schedule instructions dynamically based on available resources,
reducing the impact of data hazards and allowing for out-of-order execution.

Conclusion
Data dependencies play a crucial role in instruction-level parallelism. True data dependencies (RAW) are the most common and can cause stalls when
instructions need to wait for the results of earlier instructions. Other dependencies like WAW and WAR can also limit parallelism by requiring careful
management of writes and reads.

To overcome these limitations, processors use techniques such as out-of-order execution, forwarding, and instruction reordering. These techniques help
maximize parallelism by reducing the impact of data hazards and improving overall performance.

By understanding and managing data dependencies, both hardware and software can be optimized to achieve better instruction-level parallelism and higher
processing efficiency.

Name Dependencies and Their Effects on Instruction-Level Parallelism

In instruction-level parallelism (ILP), name dependencies occur when two instructions use the same register or memory location but do not directly transfer data
between them. Despite this, name dependencies can affect how instructions are executed in parallel. There are two main types of name dependencies:
antidependence and output dependence.

1. Antidependence (Write After Read)

 What is it?
Antidependence happens when one instruction writes to a register or memory location that a previous instruction reads. The program must execute in the original order to ensure that the reading instruction gets the correct value before it is overwritten.

 Example:

S.D F4, 0(R1) ; Store value in F4 to memory at address 0(R1)

DADDIU R1, R1, #-8 ; Decrement R1 by 8

o In this example, S.D reads R1 (to form the address 0(R1)), and DADDIU writes to R1. The S.D instruction must read the value before DADDIU changes it.

o Effect on ILP: This limits parallel execution, as the instructions must run in order to ensure S.D uses the correct value of R1.

2. Output Dependence (Write After Write)

 What is it?
Output dependence occurs when two instructions write to the same register or memory location. The order in which the instructions execute must be
preserved to ensure that the correct value is written.

 Example:

ADD.D F4, F2, F0 ; F4 = F2 + F0

MUL.D F4, F4, F1 ; F4 = F4 * F1

o Here, both instructions write to F4. The second instruction, MUL.D, must wait for the first one to finish writing to F4 to ensure the correct value is
stored.

o Effect on ILP: This also limits parallel execution because the order of writes must be maintained.

Impact on Instruction-Level Parallelism

Name dependencies can limit ILP because they prevent instructions from executing in parallel when they use the same registers or memory locations. However, since there is no direct data transfer between the instructions (unlike true data dependencies), name dependencies are more flexible and can be resolved using techniques like register renaming (see the sketch below). By renaming registers or changing instruction order, processors can reduce the impact of name dependencies, increasing parallelism and improving performance.

Conclusion

Name dependencies (antidependence and output dependence) influence ILP by limiting parallel execution. These dependencies ensure that instructions execute in the correct order, but they do not involve direct data flow. By using techniques like register renaming, processors can overcome these dependencies and allow instructions to run in parallel, leading to better performance.
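
A minimal register-renaming sketch in Python (a simplified model, not a hardware description; the architectural-to-physical mapping scheme is an assumption) shows how giving every write a fresh physical register removes the WAW dependence in the ADD.D/MUL.D example while preserving the true RAW dependence:

# Sketch: rename each destination register to a fresh physical register.

def rename(instructions, num_arch_regs=32):
    mapping = {f"F{i}": f"P{i}" for i in range(num_arch_regs)}
    next_phys = num_arch_regs
    renamed = []
    for op, dest, srcs in instructions:
        srcs = [mapping[s] for s in srcs]   # sources use the current mapping
        mapping[dest] = f"P{next_phys}"     # every write gets a new register
        next_phys += 1
        renamed.append((op, mapping[dest], srcs))
    return renamed

prog = [("ADD.D", "F4", ["F2", "F0"]),
        ("MUL.D", "F4", ["F4", "F1"])]      # WAW on F4 in the original code
for ins in rename(prog):
    print(ins)  # the two writes now target different physical registers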

Control dependences are one type of dependence that affects how instructions are scheduled in the pipeline, especially in the presence of
branching instructions (like if or loop).
What are Control Dependences?
A control dependence arises when the execution of one instruction depends on the outcome of a branch (such as an if statement or a loop
condition). These dependences determine the flow of the program, ensuring that instructions are executed in the correct order and that branches
are taken or skipped as intended.
Control dependences ensure that instructions dependent on a branch do not execute before the branch condition is known. For example,
instructions that are part of the "then" block of an if statement should only execute if the branch is taken. If you try to execute those instructions
before the branch decision, the program's behavior will be incorrect.

Consider the following code snippet with an if statement:


if p1 {
S1; // S1 is executed if p1 is true
}

S2; // S2 is executed unconditionally


 Control dependence:
o S1 is control dependent on p1. This means S1 will only execute if p1 is true.
o S2 is not control dependent on p1 because it always executes regardless of the condition.
Impact on Instruction-Level Parallelism:
 Preserving program order: In a pipeline, instructions like S1 that depend on p1's condition cannot be executed out of order with respect to
the branch (p1). The processor needs to wait until the outcome of the branch is known before it can execute S1.
 Out-of-order execution restrictions: If we attempt to reorder instructions and execute S2 before the branch (p1), we might end up
executing S2 even when it's not logically correct (for example, performing an unnecessary computation).
Control Dependence Example 2: If-Else with Two Branches
Consider the following code with two different branch conditions:
if p1 {
S1; // S1 is executed if p1 is true
}
if p2 {
S2; // S2 is executed if p2 is true
}
 Control dependence:
o S1 is control dependent on p1.
o S2 is control dependent on p2.
o The two branches are independent, meaning S1 and S2 are not dependent on each other.
Pipelining
Pipelining is a technique used in modern computer processors to improve performance by allowing multiple instructions to be processed
simultaneously. The idea behind pipelining is to overlap the execution of different stages of different instructions, similar to how an assembly line
works in manufacturing.

How Does Pipelining Work?


A typical five-stage pipeline in a processor might consist of the following stages:
1. Instruction Fetch (IF): The processor fetches the instruction from memory.
2. Instruction Decode (ID): The instruction is decoded to determine what operation it should perform.
3. Execute (EX): The operation is carried out (e.g., arithmetic or logical operation).
4. Memory Access (MEM): If the instruction involves memory (like a load or store), memory is accessed.
5. Write Back (WB): The result of the operation is written back to the register file or memory.
Basic Performance Issues in Pipelining
Pipelining is a technique that improves performance by allowing multiple instructions to be processed at the same time, increasing instruction
throughput. However, it doesn't always make each individual instruction faster, and in some cases, it may increase the time needed for each
instruction because of the added overhead. Even with this, pipelining can speed up the total execution of a program by handling multiple
instructions simultaneously. Let’s look at the key factors that affect pipelining’s performance.

1. Pipeline Latency and Imbalance Among Stages


Pipelining divides the execution of an instruction into several stages. Ideally, all stages should take the same amount of time, but in practice, some
stages are slower than others, creating an imbalance.
 Slowest Stage Limits Speed: The overall speed of the pipeline is determined by its slowest stage. If one stage takes longer than others, the
whole pipeline has to wait for it, causing delays. For example, if one stage takes 5 nanoseconds while another takes 3 nanoseconds, the
faster stages must wait for the slower one, reducing throughput.

2. Pipeline Overhead
Pipeline overhead is the extra time needed to manage the pipeline.
 Pipeline Register Delay: Each stage in the pipeline is separated by registers that store intermediate results. These registers add some delay
when passing data between stages.
 Clock Skew: The clock signal does not always reach all parts of the pipeline at the same time. This slight delay, known as clock skew, can
cause timing problems and slow down the system.
Both factors create delays that can reduce the efficiency of pipelining, especially in processors with high speeds.
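
A small numeric sketch (all delays below are made-up values, purely for illustration) shows how the slowest stage plus register delay and clock skew set the clock period:

# Illustrative only: the pipeline clock must fit the slowest stage plus
# pipeline-register delay and clock skew. All numbers are hypothetical.

stage_delays_ns = [3, 5, 3, 4, 3]   # IF, ID, EX, MEM, WB
overhead_ns = 0.5                   # register delay + clock skew

cycle_ns = max(stage_delays_ns) + overhead_ns
print(f"clock period = {cycle_ns} ns")                    # 5.5 ns, set by the 5 ns stage
print(f"unpipelined instruction = {sum(stage_delays_ns)} ns")  # 18 ns end to end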

3. Limits of Pipelining
Pipelining’s performance benefits are limited by several factors:
 Pipeline Depth: Adding more stages could theoretically increase performance, but each new stage introduces more overhead. There are
limits to how deep the pipeline can go without causing problems.
 Diminishing Returns: After a certain point, adding more pipeline stages doesn’t improve performance much. The extra overhead from
managing the pipeline outweighs the benefits.
 Pipeline Hazards: These are problems that can prevent the pipeline from running smoothly:
o Data Hazards: When an instruction depends on the result of a previous one that hasn’t finished.
o Control Hazards: When a branch instruction changes the flow of the program and causes delays.
o Structural Hazards: When there aren’t enough resources to handle all the instructions at once.
These hazards can slow down the pipeline, especially if the processor isn’t designed to handle them well.

4. Pipelining Overheads in Real-World Processors


The Pentium 4 is a good example of how pipeline overhead can affect performance. The Pentium 4 had a very deep pipeline (over 20 stages), but
this depth caused extra overhead that reduced its effectiveness compared to the Pentium III, which had a shorter pipeline. Although the Pentium 4
had a faster clock speed, its deeper pipeline didn’t always lead to better performance.

Conclusion
Pipelining helps improve the overall performance of a processor by allowing multiple instructions to be processed at the same time. However, it
doesn’t always make each instruction run faster and can introduce delays due to overhead, imbalance between stages, and hazards. Despite these
challenges, pipelining remains crucial in modern processors because it enables faster overall program execution by processing multiple instructions
in parallel. The key to good performance is balancing pipeline depth, managing hazards, and optimizing the design to minimize overhead.
The Classic Five-Stage Pipeline for a RISC Processor
In pipelining, we start a new instruction every clock cycle, breaking down the execution of each instruction into five distinct stages. These stages
are:
 IF (Instruction Fetch): Fetch the instruction from memory.
 ID (Instruction Decode): Decode the instruction and read registers.
 EX (Execution): Perform the arithmetic or logical operation.
 MEM (Memory Access): Access memory (for load/store instructions).
 WB (Write-back): Write the result back to the register file.
By doing this, multiple instructions are processed at different stages simultaneously, improving overall throughput. For example, while instruction 1
is in the EX stage, instruction 2 is in the ID stage, and instruction 3 is in the IF stage, allowing all stages to be active during each clock cycle. In this
ideal pipeline, if we start a new instruction every clock cycle, the processor can theoretically complete five times as many instructions as an
unpipelined processor in the same amount of time.

 Instruction Fetch (IF):


 The processor fetches the instruction from memory.
 The Program Counter (PC) holds the address of the instruction.
 After fetching the instruction, the PC is updated to point to the next instruction.
 Instruction Decode (ID):
 The fetched instruction is decoded to determine the operation.
 The operands needed for the instruction are read from the register file.
 The control signals for the next stages are generated based on the decoded instruction.
 Execution (EX):
 The ALU (Arithmetic Logic Unit) performs the required operation (e.g., arithmetic, logical operations).
 If the instruction is a memory operation, the effective address is calculated.
 Memory Access (MEM):
 If the instruction is a memory access (load or store), memory is accessed here.
 For load instructions, data is read from memory; for store instructions, data is written to memory.
 Write-Back (WB):
 The result of the instruction is written back to the register file.
 For load instructions, the fetched data from memory is written to the register.
 For arithmetic instructions, the result of the computation is stored in the destination register.

What is the Five-Stage Pipeline in a RISC Processor?


The five-stage pipeline is a method of organizing the execution of instructions in a RISC processor. It splits the work of executing an instruction into
five smaller steps (or "stages"). These stages happen in parallel for different instructions, allowing the processor to execute more instructions in
less time.
The idea is similar to an assembly line in a factory: while one instruction is being executed in one stage, another instruction can start at an earlier
stage. This increases the number of instructions the processor can handle per second.

The Five Stages of the Pipeline


Each instruction passes through these five stages. Here's a detailed breakdown of each stage.

1. Instruction Fetch (IF)


This is the first step, where the processor fetches (retrieves) the instruction from the instruction memory.
 Purpose: Get the instruction from memory so the CPU knows what to do.
 Steps Involved:
1. The Program Counter (PC) holds the memory address of the instruction to be fetched.
2. The instruction is retrieved from the Instruction Memory using the PC.
3. The PC is updated to point to the next instruction. This is done by adding 4 to the PC (since each instruction is 4 bytes in a RISC
processor).
 Key Components:
o Instruction Memory: Stores the program's instructions.
o Program Counter (PC): Tracks the address of the next instruction.
o Adder: Adds 4 to the PC to point to the next instruction.
 Output:
o The fetched instruction is stored in a temporary register for the next stage.
o The PC is updated for the next fetch.

2. Instruction Decode and Register Fetch (ID)


Once the instruction is fetched, the processor decodes it to figure out what action to perform. This stage also retrieves the required data from the
registers.
 Purpose:
o Understand the instruction and fetch the required register values.
 Steps Involved:
1. The instruction is split into its parts:
 The opcode (operation code) specifies the type of operation (e.g., ADD, SUB, LOAD, etc.).
 The operands specify the data the instruction will use (registers or constants).
2. The source registers (if any) are read from the register file.
3. If the instruction uses an immediate value (a constant), it is sign-extended to match the required bit-width.
4. If the instruction is a branch, the branch target address is calculated by adding the offset to the PC.
 Key Components:
o Instruction Decoder: Splits the instruction into opcode and operands.
o Register File: Stores the values of all registers.
o Sign Extender: Converts smaller constants into larger ones.
o Adder: Calculates the branch target address if needed.
 Output:
o Decoded instruction, register values, and (if applicable) a branch target address.
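
For the sign extender mentioned above, here is a short Python sketch (16-bit immediates and full-width registers are the usual RISC convention assumed here):

# Sketch: sign-extend a 16-bit immediate, as the ID stage's sign extender does.

def sign_extend_16(value):
    value &= 0xFFFF                             # keep the low 16 bits
    return value - 0x10000 if value & 0x8000 else value

print(sign_extend_16(0x0008))   # 8
print(sign_extend_16(0xFFF8))   # -8 (e.g., the #-8 offset used by DADDIU)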

3. Execution / Effective Address Calculation (EX)


This is the core stage where the actual computation or address calculation happens, depending on the instruction type.
 Purpose:
o Perform arithmetic or logical operations (e.g., addition, subtraction, AND, OR).
o Calculate the memory address for load or store instructions.
 Steps Involved:
1. The ALU (Arithmetic Logic Unit) performs the required operation. This could be:
 An arithmetic operation like addition (e.g., ADD R1, R2, R3).
 A logical operation like AND, OR, etc.
 An address calculation for memory instructions (base address + offset).
2. For branch instructions, the condition is evaluated (e.g., check if two registers are equal for a BEQ instruction).
 Key Components:
o ALU: Executes arithmetic and logical operations.
o Multiplexers: Select the inputs to the ALU (register values or immediate values).
 Output:
o The result of the ALU operation (e.g., a computed value or an effective address).

4. Memory Access (MEM)


This stage is responsible for interacting with the data memory for instructions that involve loading or storing data.
 Purpose:
o Perform memory read or write operations for load/store instructions.
 Steps Involved:
1. If the instruction is a load, the processor reads the data from the memory address calculated in the previous stage.
2. If the instruction is a store, the processor writes the data (from a register) to the memory address.
3. For instructions that don’t involve memory (e.g., arithmetic operations), this stage is skipped.
 Key Components:
o Data Memory: Stores and retrieves data.
 Output:
o Data read from memory (for load instructions).
o Nothing for other instructions.

5. Write-Back (WB)
This is the final stage, where the result of the instruction is written back into the register file.
 Purpose: Update the destination register with the result of the instruction.
 Steps Involved:
1. For arithmetic instructions, the result from the ALU is written to the destination register.
2. For load instructions, the data read from memory is written to the destination register.
3. For branch and store instructions, this stage does nothing.
 Key Components:
o Register File: Destination for the final result.
 Output:
o The destination register is updated with the result.

How the Pipeline Works


In a pipelined processor:
 Each instruction passes through all five stages.
 Different instructions are in different stages simultaneously. For example:
o While Instruction 1 is in the MEM stage, Instruction 2 is in the EX stage, and Instruction 3 is in the ID stage.
Example of Overlapping Execution:
Cycle   Instr 1   Instr 2   Instr 3   Instr 4   Instr 5
1       IF
2       ID        IF
3       EX        ID        IF
4       MEM       EX        ID        IF
5       WB        MEM       EX        ID        IF
6                 WB        MEM       EX        ID
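
The table above can also be generated programmatically. Here is a small Python sketch (an illustration, not part of the original notes) that prints the overlap diagram for any number of instructions, extending the table to all nine cycles for five instructions:

# Sketch: overlapping-execution diagram for n instructions in an ideal
# 5-stage pipeline (one new instruction entering per cycle).

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def diagram(n):
    total_cycles = len(STAGES) + n - 1
    for i in range(n):
        cells = []
        for c in range(total_cycles):
            s = c - i                    # stage index of instruction i in cycle c
            cells.append(STAGES[s].ljust(5) if 0 <= s < len(STAGES) else "     ")
        print(f"I{i+1}: " + "".join(cells))

diagram(5)   # instruction 5 finishes its WB in cycle 9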

Advantages of the Pipeline


1. Increased Throughput: Once the pipeline is filled, the processor can complete one instruction per clock cycle.
2. Efficient Resource Utilization: Different hardware components (e.g., ALU, memory) are used simultaneously for different instructions.
3. Modular Design: Each stage is independent and can be optimized separately.

Challenges in the Pipeline


1. Data Hazards:
o When one instruction depends on the result of a previous instruction still in the pipeline.
2. Control Hazards:
o When the pipeline encounters a branch instruction and doesn’t know which instruction to fetch next.
3. Structural Hazards:
o When two instructions need the same hardware resource at the same time.
4. Stalls:
o Situations where the pipeline must stop temporarily to resolve hazards.

Conclusion
The Classic Five-Stage Pipeline is a powerful and efficient design for executing instructions in a RISC processor. By overlapping the execution of
instructions, it maximizes throughput. However, careful management of hazards and pipeline control is required to maintain its performance.
See textbook 2, page 655, for the pipeline diagrams, along with the other diagrams from the question bank.

As a brief summary, the advantages of pipelined execution over non-pipelined execution include:
1. Higher throughput.
2. Better resource utilization.
3. Faster program execution.
4. Improved scalability and modularity.
5. Support for higher clock speeds.
6. Reduced CPI.
7. Better exploitation of parallelism.
Pipelined instruction execution offers several advantages over non-pipelined instruction execution, especially in terms of performance, efficiency, and resource utilization. Here's a detailed and easy-to-understand comparison of the two approaches:

1. Increased Instruction Throughput


 Pipelined Execution:
o Instructions are processed in an overlapping manner, with multiple instructions at different stages of execution simultaneously.
o Once the pipeline is filled, a new instruction is completed every clock cycle.
 Non-Pipelined Execution:
o Each instruction is executed sequentially, meaning the processor has to wait for one instruction to finish all its stages before starting
the next.
 Advantage: In pipelining, the processor can execute more instructions per unit of time, increasing overall throughput.

2. Improved CPU Efficiency


 Pipelined Execution:
o Different hardware components (e.g., ALU, instruction memory, data memory) are utilized simultaneously by different instructions
in different stages of the pipeline.
o This ensures that no part of the CPU sits idle for long.
 Non-Pipelined Execution:
o Each instruction occupies the entire CPU until completion, leaving many hardware units idle during certain stages.
 Advantage: Pipelining minimizes idle time, making better use of available resources.

3. Faster Program Execution


 Pipelined Execution:
o The time to execute a single instruction (latency) might not decrease, but the effective time to execute a set of instructions is
significantly reduced.
 Non-Pipelined Execution:
o Each instruction takes the full processing time sequentially, resulting in a slower program execution.
 Advantage: Programs execute faster in a pipelined processor because multiple instructions are processed in parallel.

4. Scalability and Modularity


 Pipelined Execution:
o The processor design is more modular, with clearly defined stages (e.g., fetch, decode, execute). This makes it easier to improve or
scale individual stages without affecting the entire system.
 Non-Pipelined Execution:
o The processor design is monolithic and harder to optimize or scale since all stages are tightly integrated.
 Advantage: Pipelining allows easier optimization and the introduction of more advanced features.

5. Higher Clock Speeds


 Pipelined Execution:
o Each stage of the pipeline performs only a small part of the instruction execution, so the clock cycle can be shortened. This allows
the processor to operate at a higher frequency.
 Non-Pipelined Execution:
o The processor needs a longer clock cycle to accommodate the entire execution of an instruction, limiting its operating frequency.
 Advantage: Pipelining enables higher clock speeds, resulting in faster instruction processing.

6. Better Performance in Multi-Instruction Workloads


 Pipelined Execution:
o Pipelining performs exceptionally well for programs with a large number of sequential instructions because it overlaps their
execution.
 Non-Pipelined Execution:
o Performance is limited to sequential execution, with no opportunity to overlap instructions.
 Advantage: Pipelining is especially beneficial for instruction-heavy tasks like data processing and large loops.

7. Reduced CPI (Cycles Per Instruction)


 Pipelined Execution:
o After the pipeline is filled, only one clock cycle is needed to complete one instruction, resulting in a CPI close to 1.
 Non-Pipelined Execution:
o Each instruction takes multiple clock cycles to execute, leading to a CPI greater than 1.
 Advantage: Pipelining reduces the average cycles per instruction, improving performance.

8. Enhanced Instruction-Level Parallelism (ILP)


 Pipelined Execution:
o Instructions are executed in parallel, increasing the degree of instruction-level parallelism.
 Non-Pipelined Execution:
o Instructions are executed sequentially, with no parallelism.
 Advantage: Pipelining allows for better exploitation of parallelism, improving processor efficiency.

Example to Illustrate the Advantage


Let’s say it takes 5 clock cycles to execute an instruction in a non-pipelined processor.
Non-Pipelined Execution:
 If there are 5 instructions, each takes 5 cycles.
 Total time = 5×5=25 clock cycles.
Pipelined Execution:
 In a pipelined processor with 5 stages, the first instruction still takes 5 cycles, but subsequent instructions overlap.
 Total time = 5+(5−1)=9 clock cycles.
This shows how pipelining significantly reduces the execution time.
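
The arithmetic above generalizes to k stages and n instructions; the following Python sketch encodes it (an ideal pipeline with no stalls is assumed):

# Ideal-pipeline timing: k stages, n instructions, one cycle per stage.

def non_pipelined_cycles(n, k=5):
    return n * k                  # each instruction occupies all k stages alone

def pipelined_cycles(n, k=5):
    return k + (n - 1)            # k cycles to fill, then one completion per cycle

n = 5
print(non_pipelined_cycles(n))    # 25
print(pipelined_cycles(n))        # 9
print(non_pipelined_cycles(n) / pipelined_cycles(n))  # speedup of about 2.78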

Challenges to Pipelining
While pipelining has clear advantages, it comes with challenges like data hazards, control hazards, and structural hazards, which require
additional mechanisms (like stalls, forwarding, and branch prediction) to address. However, despite these challenges, pipelined execution is
far more efficient than non-pipelined execution.

1. Structural Hazard
Definition: A structural hazard occurs when there are insufficient hardware resources to execute multiple instructions simultaneously in the
pipeline. This happens when two or more instructions need the same resource at the same time (e.g., ALU, memory).
Example:
Consider a simple pipeline with only one memory unit for both instruction fetch and data memory. If an instruction is trying to fetch data
from memory while another is trying to fetch an instruction, a structural hazard occurs.

2. Data Hazard
Definition: A data hazard occurs when one instruction depends on the result of a previous instruction that has not yet completed its
execution. There are three types of data hazards:
 RAW (Read After Write): The next instruction tries to read a register before the previous instruction writes to it.
 WAR (Write After Read): The next instruction writes to a register before the previous instruction reads it.
 WAW (Write After Write): Two instructions try to write to the same register.
Example:

ADD R1, R2, R3 ; writes its result to R1

SUB R4, R1, R5 ; reads R1

In this case, the SUB instruction needs the value of R1, but R1 is only written back after the ADD instruction completes its WB stage. This creates a RAW hazard, where the SUB instruction is waiting for the result of the ADD instruction.
To resolve this, a stall can be inserted to delay the SUB instruction until the result of the ADD instruction is available, or data forwarding can be used to send the value directly from the EX stage of the ADD instruction to the EX stage of the SUB instruction (see the sketch below).
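
The forwarding decision itself is a simple comparison of pipeline-latch fields. The Python sketch below is a simplified model covering only the EX/MEM-to-EX path (a real pipeline also checks the MEM/WB latch; the field names here are assumptions):

# Sketch: forwarding check for the ADD/SUB pair above.

def forward_from_ex_mem(ex_mem_dest, id_ex_srcs):
    # Forward when the instruction finishing EX wrote a register that
    # the instruction entering EX is about to read.
    return [src for src in id_ex_srcs if src == ex_mem_dest]

# ADD R1, R2, R3 is in EX/MEM; SUB R4, R1, R5 is entering EX
print(forward_from_ex_mem("R1", ["R1", "R5"]))   # ['R1'] -> bypass, no stall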

3. Control Hazard
Definition: A control hazard occurs when the pipeline does not know which instruction to fetch next because of a branch instruction. The
pipeline may fetch incorrect instructions before the branch is resolved.

Branch Hazard
A branch hazard occurs in pipelined processors when the pipeline is unable to determine which instruction to fetch next due to a branch
instruction (e.g., BEQ, BNE, JMP). This happens because the outcome of the branch is not known immediately—only after the branch instruction is
decoded and evaluated in the pipeline. Until the branch decision is made, the processor might fetch incorrect instructions, leading to pipeline
inefficiency.
Types of Branch Hazards:
1. Branch Taken: The branch condition evaluates to true, so the PC (Program Counter) is updated to the branch target address.
2. Branch Not Taken: The branch condition evaluates to false, and the PC simply increments by 4 (the next sequential instruction).
Example of a Branch Hazard:

BEQ R1, R2, LABEL ; branch if R1 equals R2

ADD R4, R5, R6 ; sequential successor of the branch

Here, the processor fetches the ADD instruction (the successor of the branch) even though it doesn't yet know whether the BEQ branch is taken. If the branch is taken, the fetched instruction is incorrect, causing a branch misprediction and a delay due to a pipeline flush or correction.
Handling Branch Hazards:
Several techniques can be used to handle branch hazards:
1. Stalling (Pipeline Flush): The pipeline is stalled (i.e., no new instructions are fetched) until the branch outcome is known. This adds delay
but ensures that no incorrect instructions are fetched.
2. Branch Prediction: The processor predicts the outcome of the branch (whether it will be taken or not) and continues to fetch instructions based on this prediction. If the prediction is wrong, the pipeline is flushed, and the correct instructions are fetched (a simple two-bit predictor is sketched after this list).
3. Delayed Branch: The instruction immediately following the branch (the "delay slot") is always executed, whether the branch is taken or not.
This ensures that some work is done even if the branch decision hasn't been made yet.
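
As a concrete illustration of technique 2, here is a minimal two-bit saturating-counter predictor in Python (one common prediction scheme, shown in its simplest form rather than as any particular processor's design):

# Sketch: a 2-bit saturating counter. States 0-1 predict "not taken",
# states 2-3 predict "taken"; each actual outcome nudges the state.

class TwoBitPredictor:
    def __init__(self):
        self.state = 1                        # start weakly not-taken

    def predict(self):
        return self.state >= 2                # True means "predict taken"

    def update(self, taken):
        self.state = min(3, self.state + 1) if taken else max(0, self.state - 1)

p = TwoBitPredictor()
for outcome in [True, True, False, True]:     # actual branch outcomes
    print("predict taken?", p.predict(), "| actual:", outcome)
    p.update(outcome)                         # a misprediction triggers a flush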

What are Data Hazards? Consider two instructions i and j, with i preceding j in program order. What are the possible data hazards that can occur in this case?

Data hazards occur in pipelined processors when an instruction depends on the result of a previous instruction that hasn't completed yet. This
dependency can cause delays or incorrect behavior because instructions may proceed before the data they need is available.
For two instructions i and j, where i precedes j in program order, the possible data hazards are:
1. Read After Write (RAW) Hazard (True Dependency)
 Definition: This occurs when instruction j needs to read a register that instruction i writes to, but i has not yet completed the write.
 Example:
ADD R1, R2, R3 ; i: R1 = R2 + R3
SUB R4, R1, R5 ; j: R4 = R1 - R5
In this case, instruction j reads R1, which is written by i. If j executes before i writes the value to R1, this creates a RAW hazard.
2. Write After Read (WAR) Hazard (Anti-dependency)
 Definition: This occurs when instruction j writes to a register before instruction i reads from it. This can lead to incorrect results if j overwrites the register before i has used its value.
 Example:
ADD R1, R2, R3 ; i: R1 = R2 + R3
SUB R2, R4, R5 ; j: R2 = R4 - R5
In this case, i reads R2, and j writes to R2. If j writes to R2 before i reads it, this can cause incorrect behavior, resulting in a WAR hazard.
3. Write After Write (WAW) Hazard (Output Dependency)
 Definition: This occurs when both instructions i and j write to the same register, and j's write completes before i's. This can cause the program to behave incorrectly because the final written value is not the one intended by the program order.
 Example:
ADD R1, R2, R3 ; i: R1 = R2 + R3
SUB R1, R4, R5 ; j: R1 = R4 - R5
In this case, both i and j write to R1. If j writes to R1 before i does, this creates a WAW hazard.
Summary of Data Hazards:
1. RAW (Read After Write): Instruction j reads a register written by instruction i before the write has completed.
2. WAR (Write After Read): Instruction j writes to a register before instruction i can read from it.
3. WAW (Write After Write): Both instructions write to the same register, and the later one in program order must write last.
Data hazards can be managed using techniques like forwarding (bypassing), stalling (inserting no-ops), or reordering instructions in the code to
avoid dependencies.
This section covers "The Simple Implementation Without Pipelining" and describes the process of implementing a RISC instruction set with suitable clock cycles. Here's the detailed explanation:

Simple Implementation Without Pipelining:


In this implementation, every instruction is executed sequentially through five stages. Each instruction will pass through these five stages one by
one, with each stage taking one clock cycle. This is the unpipelined execution model.
Stages and Their Clock Cycles:
1. Instruction Fetch (IF):
o The instruction is fetched from memory, and the program counter (PC) is updated to point to the next instruction.
o Takes 1 clock cycle.
2. Instruction Decode/Register Fetch (ID):
o The instruction is decoded, and the necessary registers are read from the register file.
o The condition for branch instructions (if applicable) is tested, and the offset (if needed) is sign-extended.
o Takes 1 clock cycle.
3. Execution/Effective Address Calculation (EX):
o The ALU performs the operation specified by the instruction.
 For memory reference (load/store), the ALU computes the effective address by adding the base register and the offset.
 For register-register ALU instructions, the ALU performs the operation on values from the register file.
 For register-immediate ALU instructions, the ALU performs the operation with one register value and a sign-extended
immediate.
o Takes 1 clock cycle.
4. Memory Access (MEM):
o If the instruction is a load, memory reads the data from the computed effective address.
o If the instruction is a store, memory writes data to the computed effective address.
o Takes 1 clock cycle.
5. Write-back (WB):
o The result from the ALU or memory is written back to the register file.
o Takes 1 clock cycle.

Process of Implementation of a RISC Instruction Set with Suitable Clock Cycles:


In this RISC instruction set, we assume that all instructions take a maximum of 5 clock cycles. The stages listed above are used for all types of
instructions. Here's the breakdown:
 Instruction Fetch (IF): 1 cycle
 Instruction Decode/Register Fetch (ID): 1 cycle
 Execution/Effective Address Calculation (EX): 1 cycle
 Memory Access (MEM): 1 cycle
 Write-back (WB): 1 cycle
Thus, for most instructions, it takes 5 clock cycles to complete an instruction.
Special Cases:
 Branch Instructions: These take 2 cycles, since the branch is resolved early in the pipeline and the remaining stages are not needed.
 Store Instructions: These take 4 cycles, because they do not need the write-back (WB) stage.
Overall Execution Time:
 For a typical instruction (ALU, Load): 5 cycles
 For a branch: 2 cycles (but may take longer if branch misprediction occurs)
 For a store: 4 cycles
Calculation of Average CPI (Cycles Per Instruction):
Given that:
 Branch frequency is 12% (0.12),
 Store frequency is 10% (0.10),
 Load and ALU frequency is 78% (0.78), assuming these are the remaining instructions.
The overall CPI can be calculated as a weighted average:

CPI = (0.12 × 2) + (0.10 × 4) + (0.78 × 5) = 0.24 + 0.40 + 3.90 = 4.54

So, the overall CPI is approximately 4.54.
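
The same weighted average in Python, using the instruction mix given above:

# CPI as a weighted average over the instruction mix.

mix = {                     # (frequency, cycles)
    "branch":   (0.12, 2),
    "store":    (0.10, 4),
    "load_alu": (0.78, 5),
}
cpi = sum(freq * cycles for freq, cycles in mix.values())
print(round(cpi, 2))        # 4.54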

Summary:
 Total Clock Cycles per instruction: 5 (for most instructions)
 Branch Instructions: 2 cycles
 Store Instructions: 4 cycles
 Average CPI: 4.54
This simple implementation focuses on a basic 5-stage pipeline where each instruction goes through the same stages in sequence, and each stage
takes 1 clock cycle.
