Mod5 - ACA
[Figure: Processes A, B and C accessing a shared variable in common memory]
Critical Section
• A Critical Section (CS) is a code segment that accesses shared variables; it must be executed by only one process at a time and, once started, must run to completion without interruption.
Critical Section Requirements
• A CS should satisfy the following requirements:
Mutual Exclusion
At most one process executes the CS at a time.
No deadlock in waiting
No circular wait by two or more processes trying to enter the CS.
No preemption
A process inside the CS is not interrupted until it completes.
Eventual Entry
A process waiting to enter its CS will eventually succeed.
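A minimal sketch of mutual exclusion, assuming Python's threading module; the counter variable and worker function are illustrative names, not part of the original text:

import threading

counter = 0                      # shared variable kept in common memory
lock = threading.Lock()          # guards the critical section

def worker(n):
    global counter
    for _ in range(n):
        with lock:               # entry: at most one thread gets past this point
            counter += 1         # critical section: read-modify-write of the shared variable
        # exit: the lock is released automatically when the block ends

threads = [threading.Thread(target=worker, args=(100_000,)) for _ in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                   # 300000 with the lock; updates may be lost without it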
Protected Access
• The granularity of the CS affects performance.
• If the CS is too large, it may limit parallelism because of excessive waiting by processes.
• If the CS is too small, it may add unnecessary code complexity/software overhead.
Four Operational Modes
• Multiprogramming
• Multiprocessing
• Multitasking
• Multithreading
Multiprogramming
• Multiple independent programs run on a single processor or multiprocessor by time-sharing the use of system resources.
• When a program enters I/O mode, the processor switches to another program.
Multiprocessing
• When multiprogramming is implemented at the
process level on a multiprocessor, it is called
multiprocessing.
• Two modes of multiprocessing:
If interprocessor communications are handled at the instruction level, the multiprocessor operates in MIMD mode.
If interprocessor communications are handled at the program, subroutine, or procedure level, the multiprocessor operates in MPMD mode.
Multitasking
• A single program can be partitioned into multiple interrelated tasks that are executed concurrently on a multiprocessor.
• Thus multitasking provides parallel execution of two or more parts of a single program.
Multithreading
• The traditional UNIX/OS has a single-threaded kernel, in which only one process can receive OS kernel service at a time.
• In a multiprocessor, the single-threaded kernel is extended to be multithreaded.
• The purpose is to allow multiple threads of lightweight processes to share the same address space.
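A minimal sketch of lightweight threads sharing one address space, assuming Python's threading module; the shared list and task function are illustrative:

import threading

shared = []                            # a single object in the shared address space

def lightweight_task(i):
    shared.append(i)                   # every thread mutates the same list object
    print(f"thread {i} sees id(shared) = {id(shared)}")

threads = [threading.Thread(target=lightweight_task, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(shared))                  # all contributions are visible to the main thread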
Partitioning and Replication
• The goal of parallel processing is to exploit as much parallelism as possible with the lowest overhead.
• Program partitioning is a technique for
decomposing a large program and data set
into many small pieces for parallel execution
by multiple processors.
• Program partitioning involves both
programmers and compilers.
Partitioning and Replication
• Program replication refers to duplication of the same program code for parallel execution on multiple processors over different data sets.
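A sketch of partitioning plus replication in the SPMD style, assuming Python's multiprocessing module; partial_sum and the chunking scheme are illustrative, not from the original text:

from multiprocessing import Pool

def partial_sum(chunk):                # the replicated program code
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    n_workers = 4
    # program partitioning: decompose the data set into small pieces
    size = len(data) // n_workers
    chunks = [data[i * size:(i + 1) * size] for i in range(n_workers)]
    chunks[-1].extend(data[n_workers * size:])        # any leftover elements
    # program replication: the same code runs on every processor, each over its own data
    with Pool(n_workers) as pool:
        print(sum(pool.map(partial_sum, chunks)))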
Scheduling and Synchronization
• Scheduling is further classified as:
Static Scheduling
• It is conducted at post-compile time, before the program runs.
• Its advantage is low overhead, but its shortcoming is a possible mismatch with the run-time profile of each task.
Dynamic Scheduling
• Catches the run-time conditions.
• Requires fast context switching, preemption, and much more OS support.
• Its advantage is better resource utilization, at the expense of higher scheduling overhead.
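The contrast can be mimicked loosely with Python's multiprocessing Pool: a large chunksize fixes the work assignment up front, while pulling items one at a time balances load at run time. This is only an analogy; real static scheduling is decided by the compiler or loader, and the task function below is invented for the example:

from multiprocessing import Pool
import random, time

def task(x):
    time.sleep(random.random() / 100)          # tasks with uneven run-time profiles
    return x * x

if __name__ == "__main__":
    data = list(range(64))
    with Pool(4) as pool:
        # "static" assignment: four big chunks decided up front (low overhead,
        # but a slow chunk can leave other workers idle)
        static = pool.map(task, data, chunksize=16)
        # "dynamic" assignment: workers pull one item at a time as they finish
        # (better load balance, higher scheduling overhead)
        dynamic = list(pool.imap_unordered(task, data, chunksize=1))
    print(sum(static), sum(dynamic))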
Cache Coherence & Protection
• The multicache coherence problem demands an invalidation or update after each write operation.
Message Passing Model
• Two processes D and E residing at different
processor nodes may communicate with each
other by passing messages through a direct
network.
• The messages may be instructions, data, synchronization signals, or interrupt signals.
• Multicomputers are considered loosely
coupled multiprocessors.
IPC using Message Passing
[Figure: Process D and Process E exchanging messages via send/receive]
Synchronous Message Passing
• No shared memory.
• No mutual exclusion.
• Synchronization of the sender and receiver processes, just like a telephone call.
• No buffer is used.
• If one process is ready to communicate and the other is not, the one that is ready must be blocked.
Asynchronous Message Passing
• Does not require that message sending and receiving be synchronized in time or space.
• Arbitrary communication delays may be experienced, because the sender may not know whether and when the message has been received until an acknowledgement arrives from the receiver.
• This scheme is like a postal service using mailboxes, with no synchronization between senders and receivers.
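A sketch of asynchronous message passing between processes D and E, assuming Python's queue and threading modules as a stand-in for a real interconnection network; the mailbox and message contents are illustrative:

import threading, queue

mailbox = queue.Queue()                 # buffered mailbox: sender and receiver are not synchronized

def process_d():                        # sender continues immediately after each send
    for i in range(3):
        mailbox.put(f"msg {i}")
    mailbox.put(None)                   # end-of-stream marker

def process_e():                        # receiver picks messages up whenever it is ready
    while (msg := mailbox.get()) is not None:
        print("E received", msg)

d = threading.Thread(target=process_d)
e = threading.Thread(target=process_e)
d.start(); e.start()
d.join(); e.join()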
Data Parallel Model
• Used in SIMD computers.
• Parallelism is handled by hardware synchronization and flow control.
• Fortran 90 is a data-parallel language.
• Requires pre-distributed data sets.
Data Parallelism
• This technique is used in array processors (SIMD).
• Key issue: matching the problem size with the machine size.
Array Language Extensions
• Various data-parallel languages are used,
• represented by high-level data types:
• CFD for the Illiac IV, DAP Fortran for the Distributed Array Processor, C* for the Connection Machine.
• The target is to match the number of PEs to the problem size.
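A sketch of the data-parallel style, using NumPy purely as an illustrative stand-in for Fortran 90 array syntax (the arrays and the expression are invented for the example):

import numpy as np

# One statement applies the same operation to every element; how the
# elements are spread over processing elements is the runtime's concern.
b = np.arange(8.0)
c = np.full(8, 2.0)
a = b * c + 1.0            # elementwise, like A = B * C + 1.0 in array syntax
print(a)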
Object Oriented Model
• Objects dynamically created and manipulated.
• Processing is performed by sending and
receiving messages among objects.
Concurrent OOP
• OOP is needed because of its abstraction and reusability concepts.
• Objects are program entities that encapsulate data and operations in a single unit.
• COOP adds concurrent manipulation of objects.
Actor Model
• This is a framework for concurrent OOP.
• Actors are independent components.
• They communicate via asynchronous message passing.
• Three primitives: create, send-to, and become.
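A minimal actor sketch, assuming Python's threading and queue modules; the Actor class and the greeter behaviour are illustrative names showing the create, send-to, and become primitives:

import threading, queue

class Actor:
    def __init__(self, behaviour):                  # "create" primitive
        self.mailbox = queue.Queue()
        self.behaviour = behaviour
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, msg):                            # "send to" primitive (asynchronous)
        self.mailbox.put(msg)

    def become(self, new_behaviour):                # "become" primitive
        self.behaviour = new_behaviour

    def _run(self):
        while (msg := self.mailbox.get()) is not None:
            self.behaviour(self, msg)

    def stop(self):
        self.mailbox.put(None)
        self._thread.join()

def greeter(actor, msg):
    print("greeting:", msg)
    actor.become(lambda a, m: print("echo:", m))    # later messages get the new behaviour

a = Actor(greeter)
a.send("hello")
a.send("world")
a.stop()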
Parallelism in COOP
• Three common patterns for parallelism:
1) Pipeline concurrency
overlapped enumeration of successive solutions and concurrent testing of solutions.
2) Divide and conquer
concurrent elaboration of different subproblems and combining of their solutions to produce an overall solution (see the sketch after this list).
3) Cooperative problem solving
concurrent solvers exchange partial results and cooperate to produce the overall solution.
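A divide-and-conquer sketch, assuming Python's concurrent.futures; the subproblem split and the conquer function are illustrative:

from concurrent.futures import ThreadPoolExecutor

def conquer(chunk):
    return sum(x * x for x in chunk)               # solve one subproblem directly

def divide_and_conquer(data, n_parts=4):
    size = (len(data) + n_parts - 1) // n_parts
    subproblems = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=n_parts) as ex:
        # elaborate the subproblems concurrently, then combine their solutions
        return sum(ex.map(conquer, subproblems))

print(divide_and_conquer(list(range(100_000))))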
Functional and Logic Model
• Functional programming languages: Lisp, SISAL, and Strand 88.
• Logic programming languages: Concurrent Prolog and Parlog.
Functional Programming Model
• Should not produce any side effects.
• No concept of storage, assignment, or branching.
• Single-assignment and dataflow languages are functional in nature.
Logic Programming Models
• Used for knowledge processing from large databases.
• Implicitly supports a search strategy.
• AND-parallel execution and OR-parallel reduction techniques are used.
• Used in artificial intelligence.
10.2 Parallel Language and Compilers
• A programming environment is a collection of software tools and system support.
– A parallel software programming environment is needed.
• Users are still forced to focus on hardware details rather than on parallelism expressed through high-level abstraction.
10.2.1 Language Features for Parallelism
• Optimization Features
• Availability Features
• Synchronization/communication Features
• Control Of Parallelism
• Data Parallelism Features
• Process Management Features
Optimization Features
• Theme: conversion of a sequential program into a parallel program.
• The purpose is to match software parallelism with hardware parallelism.
• Software in practice:
1) Automated parallelizers
the Express C automated parallelizer and the Alliant FX Fortran compiler.
2) Semiautomated parallelizers
need compiler directives or programmer interaction.
3) Interactive restructuring support
static analyzer, run-time statistics, and code translator for restructuring.
Availability Features
• Theme: enhance user-friendliness, make the language portable across a large number of parallel computers, and expand the applicability of software libraries.
1) Scalability
The language should be scalable to the number of processors and independent of the hardware topology.
2) Compatibility
Compatible with sequential languages.
3) Portability
The language should be portable to shared-memory multiprocessors, message-passing multicomputers, or both.
Synchronization/Communication Features
• Shared variables (locks) for IPC.
• Single-assignment languages.
• Send/receive for message passing.
• Logical shared memory, such as the tuple space in Linda.
• Remote procedure call.
• Dataflow languages such as Id.
• Mailboxes, semaphores, monitors.
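As one concrete example from this list, a counting semaphore sketch assuming Python's threading module; the slot count and function names are illustrative:

import threading, time

slots = threading.Semaphore(2)                 # counting semaphore: at most 2 holders at a time

def use_resource(i):
    with slots:                                # P (wait) operation on entry
        print(f"process {i} inside")
        time.sleep(0.1)
    # V (signal) operation happens automatically on exit

threads = [threading.Thread(target=use_resource, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()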
Control Of Parallelism
• Coarse, Medium and fine grain
• Explicit vs implicit parallelism
• Global Parallelism
• Loop Parallelism
• Task Parallelism
• Divide and Conquer Parallelism
Data Parallelism Features
Theme: how data are accessed and distributed in either SIMD or MIMD computers.
1) Run-time automatic decomposition
Data are automatically distributed with no user intervention.
2) Mapping specification
The user specifies patterns and how input data are mapped to the hardware.
3) Virtual processor support
Virtual processors are created statically by the compiler and mapped to the physical processors.
4) Direct access to shared data
Shared data are directly accessed by the operating system.
Process Management Features
Theme:
Support efficient creation of parallel processes, implementation of multithreading or multitasking, program partitioning and replication, and dynamic load balancing at run time.
1) Dynamic process creation at run time
2) Creation of lightweight processes (threads)
3) Replication techniques
4) Partitioned networks
5) Automatic load balancing
10.2.3 Optimizing Compilers for Parallelism
• The role of the compiler is to remove the burden of program optimization and code generation from the programmer.
Three phases:
1) Flow analysis
2) Program optimization
3) Parallel code generation
Flow Analysis
• Reveals the program flow patterns to determine data and control dependences.
• Flow analysis is carried out at various execution levels:
1) Instruction level -> VLIW or superscalar processors.
2) Loop level -> SIMD and systolic computers.
3) Task level -> multiprocessors/multicomputers.
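A loop-level illustration of what dependence analysis looks for; the arrays and loops below are invented for the example:

n = 8
a = list(range(n))
b = list(range(n))

# No cross-iteration dependence: each iteration writes a different element
# from inputs only, so the iterations could run in parallel (vector/SIMD code).
c = [0] * n
for i in range(n):
    c[i] = a[i] + b[i]

# True (flow) dependence: iteration i reads the value written by iteration i - 1,
# so this loop must stay serial unless it is transformed.
d = [0] * n
for i in range(1, n):
    d[i] = d[i - 1] + a[i]

print(c, d)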
Program Optimization
• Transformation of the user program to exploit the hardware capabilities.
• Explores better performance.
• The goal is to maximize the speed of code execution and to minimize code length.
• Local and global optimizations.
• Machine-dependent transformations.
Parallel Code Generation
• Compiler directives can be used to generate parallel code.
• Two representative optimizing compiler families:
1) Parafrase and Parafrase 2
2) PFC and ParaScope
Parafrase and Parafrase2
• Transforms sequential Fortran 77 programs into parallel programs.
• Parafrase consists of about 100 program transformation passes that are encoded and applied through a pass list.
• The pass list identifies dependences and converts the sequential program into a concurrent program.
• Parafrase 2 handles C and Pascal in addition to Fortran.
PFC and ParaScope
• PFC translates Fortran 77 code into Fortran 90 code.
• The PFC package was extended to PFC+ for parallel code generation on shared-memory multiprocessors.
• PFC performs its analysis in the following steps:
1) Interprocedural flow analysis using a call graph
2) Transformations (do-loop normalization etc.)
3) Dependence analysis
4) Vector code generation
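A sketch of the last step in spirit: a scalar Fortran 77 style loop rewritten as one array statement, with NumPy standing in for Fortran 90 vector syntax (the loop and arrays are invented for the example):

import numpy as np

n = 8
a = np.arange(n, dtype=float)
b = np.arange(n, dtype=float)

# scalar loop: what the source program looks like
c_loop = np.empty(n)
for i in range(n):
    c_loop[i] = 2.0 * a[i] + b[i]

# generated vector code: one array statement, like C = 2.0*A + B in Fortran 90
c_vec = 2.0 * a + b
print(np.allclose(c_loop, c_vec))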