Chapter 8: Parallel Processing
PARALLEL PROCESSING
Multiple Processor Organization
Figure 8.3
Figure 8.4
Symmetric Multiprocessor Organization
Figure 8.5
The bus organization has several attractive features:
Simplicity: the simplest approach to multiprocessor organization
Flexibility: generally easy to expand the system by attaching more processors to the bus
Reliability: the bus is essentially a passive medium, and the failure of any attached device should not cause failure of the whole system
Multiprocessor operating system design considerations:
Scheduling: any processor may perform scheduling, so conflicts must be avoided; the scheduler must assign ready processes to available processors
Synchronization: with multiple active processes having potential access to shared address spaces or shared I/O resources, care must be taken to provide effective synchronization; synchronization is a facility that enforces mutual exclusion and event ordering (a minimal sketch follows this list)
Memory management: in addition to dealing with all of the issues found on uniprocessor machines, the OS needs to exploit the available hardware parallelism to achieve the best performance; paging mechanisms on different processors must be coordinated to enforce consistency when several processors share a page or segment and to decide on page replacement
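To make the synchronization requirement concrete, here is a minimal sketch of mutual exclusion using POSIX threads. The shared counter, thread count, and iteration count are illustrative choices, not part of the chapter.

/* Minimal mutual-exclusion sketch with POSIX threads.
   Compile with: gcc demo.c -pthread */
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4
#define INCREMENTS 100000

static long counter = 0;                        /* shared resource */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < INCREMENTS; i++) {
        pthread_mutex_lock(&lock);              /* enforce mutual exclusion */
        counter++;                              /* critical section */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_THREADS];

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);         /* event ordering: wait for all */

    /* Without the mutex, lost updates would make this total unpredictable. */
    printf("counter = %ld (expected %d)\n", counter, NUM_THREADS * INCREMENTS);
    return 0;
}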
Modified: the line in the cache has been modified and is available only in this cache
Exclusive: the line in the cache is the same as that in main memory and is not present in any other cache
Shared: the line in the cache is the same as that in main memory and may be present in another cache
Invalid: the line in the cache does not contain valid data
Table 8.1
MESI Cache Line States
MESI State Transition Diagram
Figure 8.6
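The transitions in Figure 8.6 can be approximated in software. The sketch below models the four MESI states of a single line as seen by one cache; the event names and the shared_elsewhere flag are simplifications introduced here for illustration, not the full snoopy protocol.

/* Simplified MESI state transitions for one cache line, from the
   point of view of one cache on a snoopy bus. */
#include <stdio.h>

typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } MesiState;
typedef enum { LOCAL_READ, LOCAL_WRITE, REMOTE_READ, REMOTE_WRITE } BusEvent;

static MesiState next_state(MesiState s, BusEvent e, int shared_elsewhere)
{
    switch (e) {
    case LOCAL_READ:   /* read miss fills the line as Shared or Exclusive */
        return (s == INVALID) ? (shared_elsewhere ? SHARED : EXCLUSIVE) : s;
    case LOCAL_WRITE:  /* any local write makes the line Modified */
        return MODIFIED;
    case REMOTE_READ:  /* another cache reads: valid copies become Shared
                          (a Modified line is written back first) */
        return (s == INVALID) ? INVALID : SHARED;
    case REMOTE_WRITE: /* another cache writes: our copy becomes stale */
        return INVALID;
    }
    return s;
}

int main(void)
{
    const char *names[] = { "Modified", "Exclusive", "Shared", "Invalid" };
    MesiState s = INVALID;

    s = next_state(s, LOCAL_READ, 0);   /* miss, no other copy -> Exclusive */
    printf("after local read : %s\n", names[s]);
    s = next_state(s, LOCAL_WRITE, 0);  /* silent upgrade -> Modified */
    printf("after local write: %s\n", names[s]);
    s = next_state(s, REMOTE_READ, 1);  /* snooped read -> Shared */
    printf("after remote read: %s\n", names[s]);
    return 0;
}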
Multithreading and Chip Multiprocessors
Processor performance can be measured by the rate at which it executes instructions: MIPS rate = f × IPC, where f is the processor clock frequency and IPC is the average number of instructions executed per cycle
Multithreading:
Allows for a high degree of instruction-level parallelism without increasing circuit complexity or power consumption
The instruction stream is divided into several smaller streams, known as threads, that can be executed in parallel
Definitions of Threads and Processes
A thread in a multithreaded processor may or may not be the same as the concept of software threads in a multiprogrammed operating system.
Thread:
• Dispatchable unit of work within a process (see the sketch after these definitions)
• Includes processor context (which includes the program counter and stack pointer) and data area for stack
• Executes sequentially and is interruptible so that the processor can turn to another thread
Process:
• An instance of a program running on a computer
• Two key characteristics: resource ownership and scheduling/execution
Process switch:
• Operation that switches the processor from one process to another by saving all the process control data, registers, and other information for the first and replacing them with the process information for the second
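As a concrete illustration of threads as dispatchable units that share their process's resources, here is a minimal POSIX-threads sketch; the worker function and the shared string are illustrative, not from the slides.

/* Two threads within one process: each has its own context and stack,
   but both see the resources owned by the process. */
#include <pthread.h>
#include <stdio.h>

static const char *shared_resource = "owned by the process"; /* shared data */

static void *worker(void *arg)
{
    long id = (long)arg;   /* each thread has its own stack and context */
    printf("thread %ld sees: %s\n", id, shared_resource);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}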
Implicit and Explicit Multithreading
Implicit multithreading refers to the concurrent execution of multiple threads extracted from a single sequential program; explicit multithreading executes instructions from different explicitly defined threads
All commercial processors and most experimental ones use explicit multithreading:
Concurrently execute instructions from different explicit threads
Interleave instructions from different threads on shared pipelines, or execute them in parallel on parallel pipelines
Approaches to Executing Multiple Threads
Figure 8.7
Example Systems
Figure 8.8
Clusters
Alternative to SMP as an approach to providing high performance and high availability
Defined as: a group of interconnected whole computers working together as a unified computing resource that can create the illusion of being one machine
(The term whole computer means a system that can run on its own, apart from the cluster)
Cluster Configurations
Figure 8.9
Table 8.2
Clustering Methods: Benefits and Limitations
Operating System Design Issues
Two approaches to dealing with failures:
Highly available clusters
Fault-tolerant clusters
Failover: the function of switching applications and data resources over from a failed system to an alternative system in the cluster
Failback: restoration of applications and data resources to the original system once it has been fixed (a minimal failure-detection sketch follows the load-balancing list below)
Load balancing:
Incremental scalability
Automatically include new computers in scheduling
Middleware needs to recognize that processes may switch between machines
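A minimal sketch of how failover might be triggered by heartbeat-based failure detection follows. The node structure, timeout value, and failover hook are hypothetical simplifications of what real cluster middleware does.

/* Heartbeat-based failure detection driving failover (hypothetical). */
#include <stdio.h>
#include <time.h>

#define HEARTBEAT_TIMEOUT 5 /* seconds without a heartbeat => presumed dead */

struct node {
    const char *name;
    time_t      last_heartbeat;  /* updated whenever a heartbeat arrives */
    int         active;          /* currently hosting the application?   */
};

/* Hypothetical hook: switch applications and data to a standby node. */
static void failover(struct node *failed, struct node *standby)
{
    failed->active  = 0;
    standby->active = 1;
    printf("failover: %s -> %s\n", failed->name, standby->name);
}

static void monitor(struct node *primary, struct node *standby)
{
    if (primary->active &&
        difftime(time(NULL), primary->last_heartbeat) > HEARTBEAT_TIMEOUT)
        failover(primary, standby);
    /* Failback would reverse the move once the primary is repaired. */
}

int main(void)
{
    struct node a = { "node-a", time(NULL) - 10, 1 };  /* stale heartbeat */
    struct node b = { "node-b", time(NULL),      0 };
    monitor(&a, &b);
    return 0;
}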
Parallelizing Computation
Figure 8.10
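One common way to parallelize a computation across cluster nodes is message passing. Below is a minimal sketch using MPI that splits a summation across processes and combines the partial results with MPI_Reduce; the index range and the computation itself are illustrative. Assuming an MPI installation, it could be built and run with mpicc and mpirun.

/* Parallel sum across cluster nodes with MPI. */
#include <mpi.h>
#include <stdio.h>

#define N 1000000

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each process sums its own slice of the index range. */
    double local = 0.0;
    for (int i = rank; i < N; i += size)
        local += (double)i;

    /* Combine the partial results on process 0. */
    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum = %.0f\n", global);
    MPI_Finalize();
    return 0;
}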
Example: 100-Gbps Ethernet Configuration for a Massive Blade Server Site
Figure 8.11
Clusters Compared to SMP
Both provide a configuration with multiple processors to support high-demand applications
Both solutions are available commercially
SMP:
Easier to manage and configure
Much closer to the original single-processor model for which nearly all applications are written
Less physical space and lower power consumption
Clustering:
Far superior in terms of incremental and absolute scalability
Superior in terms of availability, since all components of the system can readily be made highly redundant
CC-NUMA Organization
Figure 8.12
NUMA Pros and Cons
Array processor:
Designed to address the need for vector computation
Configured as peripheral devices by both mainframe and minicomputer users to run the vectorized portions of programs
Vector Addition Example
Figure 8.13
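A scalar loop makes the contrast with vector hardware concrete: a conventional processor performs one addition per iteration, while a vector facility expresses the same work as a single vector instruction, conceptually C(1:N) = A(1:N) + B(1:N). This C sketch, with illustrative array contents, shows the scalar form.

/* Scalar vector addition: one add per loop iteration. */
#include <stdio.h>

#define N 8

int main(void)
{
    float a[N], b[N], c[N];

    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }

    /* A vector facility would perform this entire loop
       as one vector instruction. */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; i++)
        printf("%.1f ", c[i]);
    printf("\n");
    return 0;
}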
Matrix Multiplication (C = A * B)
Figure 8.14
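For reference, the C = A * B computation can be written as the conventional triple loop below; the 3×3 size and element values are illustrative, and a vector facility would vectorize the innermost loop.

/* Conventional matrix multiplication C = A * B. */
#include <stdio.h>

#define N 3

int main(void)
{
    double a[N][N], b[N][N], c[N][N] = { { 0 } };

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            a[i][j] = (double)(i + j);    /* illustrative values */
            b[i][j] = (i == j) ? 1.0 : 0.0; /* identity matrix   */
        }

    /* c[i][j] = sum over k of a[i][k] * b[k][j] */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                c[i][j] += a[i][k] * b[k][j];

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < N; j++)
            printf("%5.1f", c[i][j]);
        printf("\n");
    }
    return 0;
}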
Approaches to Vector Computation
Figure 8.15
Pipelined Processing of Floating-Point Operations
Figure 8.16
A Taxonomy of Computer Organizations
Figure 8.17
Figure 8.18
Alternative Programs for Vector Calculation
Figure 8.19
Figure 8.20
Table 8.3
IBM 3090 Vector Facility: Arithmetic and Logical Instructions
Summary: Chapter 8, Parallel Processing
Multiple processor organizations
  Types of parallel processor systems
  Parallel organizations
Symmetric multiprocessors
  Organization
  Multiprocessor operating system design considerations
Multithreading and chip multiprocessors
  Implicit and explicit multithreading
  Approaches to explicit multithreading
  Example systems
Clusters
  Cluster configurations
  Operating system design issues
  Cluster computer architecture
  Blade servers
  Clusters compared to SMP