0% found this document useful (0 votes)

0 views15 pages

HPC - Unit 3

HIGH PERFORMANCE COMPUTING

Uploaded by

B.REENA HICET STAFF CSE

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOC, PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

0 views15 pages

HPC - Unit 3

HIGH PERFORMANCE COMPUTING

Uploaded by

B.REENA HICET STAFF CSE

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as DOC, PDF, TXT or read online on Scribd

You are on page 1/ 15

UNIT

Open MP

Open Multi-processing (OpenMP)

is a technique of parallelizing a section(s) of C/C++/Fortran code. OpenMP is also seen as
an extension to C/C++/Fortran languages by adding the parallelizing features to them. In
general, OpenMP uses a
portable

scalable
model that gives programmers a simple and flexible interface for developing parallel
applications for platforms that ranges from the normal desktop computer to the high-end
supercomputers.
THREAD Vs PROCESS
A process is created by the OS to execute a program with given resources(memory,
registers); generally, different processes do not share their memory with another. A thread
is a subset of a process, and it shares the resources of its parent process but has its own
stack to keep track of function calls. Multiple threads of a process will have access to the
same memory.
Parallel Memory Architectures
Before getting deep into OpenMP, let's revive the basic parallel memory architectures.
These are divided into three categories;

 Shared memory:
OpenMP comes under the shared memory concept. In this, different CPU's (processors)
will have access to the same memory location. Since all CPU's connect to the same
memory, memory access should be handled carefully.

Distributed memory
 Here, each CPU(processor) will have its own memory location to access and use. In
order to make them communicate, all independent systems will be connected together
using a network. MPI is based on distributed architecture.
 Hybrid: Hybrid is a combination of both shared and distributed architectures. A simple
scenario to showcase the power of OpenMP would be comparing the execution time of
a normal C/C++ program and the OpenMP program.
Steps for Installation of OpenMP
 STEP 1: Check the GCC version of the compiler
gcc --version
GCC provides support for OpenMP starting from its version 4.2.0. So if the system has
GCC compiler with the version higher than 4.2.0, then it must have OpenMP features
configured with it.

 If the system doesn't have the GCC compiler, we can use the following command
sudo apt install gcc
For more detailed support for installation, we can refer here
 STEP 2: Configuring OpenMP We can check whether the OpenMP features are
configured into our compiler or not, using the command
echo |cpp -fopenmp -dM |grep -i open

 If OpenMP is not featured in the compiler, we can configure it use using the command
sudo apt install libomp-dev
 STEP 3: Setting the number of threads In OpenMP, Before running the code, we can
initialise the number of threads to be executed using the following command. Here, we
set the number of threads to be getting executed to be 8 threads.
export OMP_NUM_THREADS=8
Running First Code in OpenMP

// OpenMP header

#include <omp.h>

#include <stdio.h>

#include <stdlib.h>

int main(int argc, char* argv[])

int nthreads, tid;

// Begin of parallel region

#pragma omp parallel private(nthreads, tid)

// Getting thread number

tid = omp_get_thread_num();
printf("Welcome to GFG from thread = %d\n",

tid);

if (tid == 0) {

// Only master thread does this

nthreads = omp_get_num_threads();

printf("Number of threads = %d\n",

nthreads);

} }}
Output:
This program will print a message which will be getting executed by various threads.
Compile:
gcc -o gfg -fopenmp geeksforgeeks.c
Execute : ./gfg

OpenMP topic: Synchronization

In the constructs for declaring parallel regions above, you had little control over in what order
threads executed the work they were assigned. This section will
discuss synchronization constructs: ways of telling threads to bring a certain order to the
sequence in which they do things.
 critical : a section of code can only be executed by one thread at a time; see 23.2.2 .
 atomic Atomic update of a single memory location. Only certain specified syntax
patterns are supported. This was added in order to be able to use hardware support for
atomic updates.
 barrier : section 23.1 .
 locks: section 23.3 .
 flush : section 23.4 .

Loop-related synchronization constructs were discussed earlier:

 ordered : section 19.6 .
 nowait : section 19.7 .

23.1 Barrier
crumb trail: > omp-sync > Barrier
A barrier defines a point in the code where all active threads will stop until all threads have
arrived at that point. With this, you can guarantee that certain calculations are finished. For
instance, in this code snippet, computation of y can not proceed until another thread has
computed its value of x .
#pragma omp parallel
{
int mytid = omp_get_thread_num();
x[mytid] = some_calculation();
y[mytid] = x[mytid]+x[mytid+1];
}
This can be guaranteed with a barrier pragma:
#pragma omp parallel
{
int mytid = omp_get_thread_num();
x[mytid] = some_calculation();
#pragma omp barrier
y[mytid] = x[mytid]+x[mytid+1];
}
23.1.1 Implicit barriers
crumb trail: > omp-sync > Barrier > Implicit barriers
Apart from the barrier directive, which inserts an explicit barrier, OpenMP has implicit
barriers \index[omp]{omp!barrier!implicit} after a work sharing construct; see
section 21.3 . Thus the following code is well defined:
#pragma omp parallel
{
#pragma omp for
for (int mytid=0; mytid<number_of_threads; mytid++)
x[mytid] = some_calculation();
#pragma omp for
for (int mytid=0; mytid<number_of_threads-1; mytid++)
y[mytid] = x[mytid]+x[mytid+1];
}
You can also put each parallel loop in a parallel region of its own, but there is some overhead
associated with creating and deleting the team of threads in between the regions.
At the end of a parallel region the team of threads is dissolved and only the master thread
continues. Therefore, there is an implicit barrier at the end of a parallel region \index[omp]
{parallel region!barrier at the end of}. This barrier behavior can be canceled with the \
indexclause{nowait} clause.
You will often see the idiom
#pragma omp parallel
{
#pragma omp for nowait
for (i=0; i<N; i++)
a[i] = // some expression
#pragma omp for
for (i=0; i<N; i++)
b[i] = ...... a[i] ......
Here the nowait clause implies that threads can start on the second loop while other threads
are still working on the first. Since the two loops use the same schedule here, an iteration that
uses a[i] can indeed rely on it that that value has been computed.
23.2 Mutual exclusion
crumb trail: > omp-sync > Mutual exclusion
Sometimes it is necessary to limit a piece of code so that it can be executed by only one tread
at a time. Such a piece of code is called a critical section , and OpenMP has several
mechanisms for realizing this.
23.2.1 Race conditions
crumb trail: > omp-sync > Mutual exclusion > Race conditions
OpenMP, being based on shared memory, has a potential for race conditions These happen
when two threads access the same data item, with at least one access a write. The problem
with race conditions is that programmer convenience runs counter to efficient execution.
For a simple example: \csnippetwithoutput{racecounter}{examples/omp/c}{race}
The basic rule about multiple-thread access of a single data item is:
Any memory location that is written by one thread, can not be read by another thread in the
same parallel region, if no synchronization is done.
To start with that last clause: any workshare construct ends with an implicit barrier , so data
written before that barrier can safely be read after it.
23.2.2 critical and atomic
crumb trail: > omp-sync > Mutual exclusion > critical and atomic
There are two pragmas for critical sections: critical and atomic . Both denote atomic
operations in a technical sense. The first one is general and can contain an arbitrary sequence
of instructions; the second one is more limited but has performance advantages.
Beginning programmers are often tempted to use critical for updates in a loop:
#pragma omp parallel
{
int mytid = omp_get_thread_num();
double tmp = some_function(mytid);
// This Works, but Not Best Solution:
#pragma omp critical
sum += tmp;
}
but this should really be done with a reduction clause, which will be far more efficient.
A good use of critical sections is doing file writes or database updates.
Exercise Consider a loop where each iteration updates a variable.
#pragma omp parallel for shared(result)
for ( i ) {
result += some_function_of(i);
}
Discuss qualitatively the difference between:
 turning the update statement into a critical section, versus
 letting the threads accumulate into a private variable tmp as above, and summing
these after the loop.

Do an Ahmdal-style quantitative analysis of the first case, assuming that you do nn iterations
on pp threads, and each iteration has a critical section that takes a fraction ff. Assume the
number of iterations nn is a multiple of the number of threads pp. Also assume the default
static distribution of loop iterations over the threads.
End of exercise
Critical sections are an easy way to turn an existing code into a correct parallel code.
However, there are performance disadvantages to critical sections, and sometimes a more
drastic rewrite is called for.
A critical section works by acquiring a lock, which carries a substantial overhead.
Furthermore, if your code has multiple critical sections, they are all mutually exclusive: if a
thread is in one critical section, the other ones are all blocked.
The problem with critical sections being mutually exclusive can be mitigated by naming
them:
#pragma omp critical (optional_name_in_parens)
On the other hand, the syntax for atomic sections is limited to the update of a single memory
location, but such sections are not exclusive and they can be more efficient, since they
assume that there is a hardware mechanism for making them critical. See the next section.
23.2.3 atomic construct
crumb trail: > omp-sync > Mutual exclusion > atomic construct
While the \indexclause{critical} construct can enclose arbitrary blocks of code, the \
indexompclause{atomic} clause has one of a limited number of forms, for which hardware
support is likely. Those consist of assigning to a variable:
x++;
// or:
x += y;
possibly combination with reading that variable:
v = x; x++;
There are various further refinements on the atomic specification:
1. omp atomic write is followed by a single assignment statement to a shared variable.
2. omp atomic read is followed by a single assignment statement from a shared variable.
3. omp atomic is equivalent to omp atomic update ; it accomodates statements such as
4. x++; x += 1.5;
5. omp atomic capture can accommodate a single statement similar to omp atomic
update , or a block that essentially combines a read and update form.

23.3 Locks
crumb trail: > omp-sync > Locks
OpenMP also has the traditional mechanism of a lock . A lock is somewhat similar to a
critical section: it guarantees that some instructions can only be performed by one process at
a time. However, a critical section is indeed about code; a lock is about data. With a lock you
make sure that some data elements can only be touched by one process at a time.
23.3.1 Routines
crumb trail: > omp-sync > Locks > Routines
Create/destroy:
void omp_init_lock(omp_lock_t *lock);
void omp_destroy_lock(omp_lock_t *lock);
Set and release:
void omp_set_lock(omp_lock_t *lock);
void omp_unset_lock(omp_lock_t *lock);
Since the set call is blocking, there is also
omp_test_lock();
Unsetting a lock needs to be done by the thread that set it.

Exercise In the following code, one process sets array A and then uses it to update B; the
other process sets array B and then uses it to update A. Argue that this code can deadlock.
How could you fix this?
#pragma omp parallel shared(a, b, nthreads, locka, lockb)
#pragma omp sections nowait
{
#pragma omp section
{
omp_set_lock(&locka);
for (i=0; i<N; i++)
a[i] = ..

omp_set_lock(&lockb);
for (i=0; i<N; i++)
b[i] = .. a[i] ..
omp_unset_lock(&lockb);
omp_unset_lock(&locka);
}
#pragma omp section
{
omp_set_lock(&lockb);
for (i=0; i<N; i++)
b[i] = ...

omp_set_lock(&locka);
for (i=0; i<N; i++)
a[i] = .. b[i] ..
omp_unset_lock(&locka);
omp_unset_lock(&lockb);
}
} /* end of sections */
} /* end of parallel region */

End of exercise
23.3.2 Example: object with atomic update
crumb trail: > omp-sync > Locks > Example: object with atomic update
OO languages such as C++ allow for syntactic simplification, for instance building the
locking and unlocking actions into the update operator.
C++ note
C Code: lockobject
Running this:
C Code: lockobjectuse
End of C++ note
23.3.3 Example: histogram / binning
crumb trail: > omp-sync > Locks > Example: histogram / binning
One simple example of the use of locks is generation of a histogram (also known as
the binning problem). A histogram consists of a number of bins, that get updated depending
on some data. Here is the basic structure of such a code:
int count[100];
float x = some_function();
int ix = (int)x;
if (ix>=100)
error();
else
count[ix]++;
It would be possible to guard the last line:
#pragma omp critical
count[ix]++;
but that is unnecessarily restrictive. If there are enough bins in the histogram, and if
the some_function takes enough time, there are unlikely to be conflicting writes. The solution
then is to create an array of locks, with one lock for each count location.
Another solution would be to give each thread a local copy of the result array, and perform a
reduction on these. See section 20.2.2 .
23.3.4 Nested locks
crumb trail: > omp-sync > Locks > Nested locks
A lock as explained above can not be locked if it is already locked. A nested lock can be
locked multiple times by the same thread before being unlocked.
 omp_init_nest_lock
 omp_destroy_nest_lock
 omp_set_nest_lock
 omp_unset_nest_lock
 omp_test_nest_lock

23.4 Relaxed memory model

crumb trail: > omp-sync > Relaxed memory model
flush
 There is an implicit flush of all variables at the start and end of a parallel region .
 There is a flush at each barrier, whether explicit or implicit, such as at the end of
a work sharing .
 At entry and exit of a critical section
 When a lock is set or unset.

23.5 Example: Fibonacci computation

crumb trail: > omp-sync > Example: Fibonacci computation
The Fibonacci sequence is recursively defined as
F(0)=1,F(1)=1,F(n)=F(n−1)+F(n−2) for n≥2.(1)(1)F(0)=1,F(1)=1,F(n)=F(n−1)+F(n−2) for n≥
2.
We start by sketching the basic single-threaded solution. The naive code looks like:
int main() {
value = new int[nmax+1];
value[0] = 1;
value[1] = 1;
fib(10);
}

int fib(int n) {
int i, j, result;
if (n>=2) {
i=fib(n-1); j=fib(n-2);
value[n] = i+j;
}
return value[n];
}
However, this is inefficient, since most intermediate values will be computed more than once.
We solve this by keeping track of which results are known:
...
done = new int[nmax+1];
for (i=0; i<=nmax; i++)
done[i] = 0;
done[0] = 1;
done[1] = 1;
...
int fib(int n) {
int i, j;
if (!done[n]) {
i = fib(n-1); j = fib(n-2);
value[n] = i+j; done[n] = 1;
}
return value[n];
}
The OpenMP parallel solution calls for two different ideas. First of all, we parallelize the
recursion by using tasks (section OpenMP topic: Synchronization :
int fib(int n) {
int i, j;
if (n>=2) {
#pragma omp task shared(i) firstprivate(n)
i=fib(n-1);
#pragma omp task shared(j) firstprivate(n)
j=fib(n-2);
#pragma omp taskwait
value[n] = i+j;
}
return value[n];
}
This computes the right solution, but, as in the naive single-threaded solution, it recomputes
many of the intermediate values.
A naive addition of the done array leads to data races, and probably an incorrect solution:
int fib(int n) {
int i, j, result;
if (!done[n]) {
#pragma omp task shared(i) firstprivate(n)
i=fib(n-1);
#pragma omp task shared(i) firstprivate(n)
j=fib(n-2);
#pragma omp taskwait
value[n] = i+j;
done[n] = 1;
}
return value[n];
}
For instance, there is no guarantee that the done array is updated later than the value array, so
a thread can think that done[n-1] is true, but value[n-1] does not have the right value yet.
One solution to this problem is to use a lock, and make sure that, for a given index n , the
values done[n] and value[n] are never touched by more than one thread at a time:
int fib(int n)
{
int i, j;
omp_set_lock( &(dolock[n]) );
if (!done[n]) {
#pragma omp task shared(i) firstprivate(n)
i = fib(n-1);
#pragma omp task shared(j) firstprivate(n)
j = fib(n-2);
#pragma omp taskwait
value[n] = i+j;
done[n] = 1;
}
omp_unset_lock( &(dolock[n]) );
return value[n];
}
This solution is correct, optimally efficient in the sense that it does not recompute anything,
and it uses tasks to obtain a parallel execution.
However, the efficiency of this solution is only up to a constant. A lock is still being set, even
if a value is already computed and therefore will only be read. This can be solved with a
complicated use of critical sections, but we will forego this.
Since the beginning of multiprocessors, programmers have faced the challenge of how to
take advantage of the power of process available. Sometimes parallelism is available but it
is present in a form that is too complicated for the programmer to think about. In addition,
there exists a large sequential code that has for years has incremental performance
improvements afforded by the advancement of single-core execution. For a long time,
automatic parallelization has been seen as a good solution to some of these challenges.
Parallelization removes the programmer's burden of expressing and understanding the
parallelism existing in the algorithm.
Loop-level parallelism in computer architecture helps us with taking out parallel tasks
within the loops in order to speed up the process. The utility for this parallelism arises
where data is stored in random access data structures like arrays. A program that runs in
sequence will iterate over the array and perform operations on indices at a time, a program
that has loop-level parallelism will use multi-threads/ multi-processes that operate on the
indices at the same time or at different times.
Loop Level Parallelism Types:
1. DO-ALL parallelism(Independent multithreading (IMT))
2. DO-ACROSS parallelism(Cyclic multithreading (CMT))
3. DO-PIPE parallelism(Pipelined multithreading (PMT))
1. DO-ALL parallelism(Independent multi-threading (IMT)):

In DO-ALL parallelism every iteration of the loop is executed in parallel and completely
independently with no inter-thread communication. The iterations are assigned to threads in
a round-robin fashion, for example, if we have 4 cores then core 0 will execute iterations 0,
4, 8, 12, etc. (see Figure). This type of parallelization is possible only when the loop does
not contain loop-carried dependencies or can be changed so that no conflicts occur between
simultaneous iterations that are executing. Loops which can be parallelized in this way are
likely to experience speedups since there is no overhead of inter-thread communication.
However, the lack of communication also limits the applicability of this technique as many
loops will not be amenable to this form of parallelization.

DO-ALL parallelism(Independent multithreading (IMT))

2. DO-ACROSS parallelism(Cyclic multi-threading (CMT)) :

In DO-ACROSS parallelism, like Independent multi-threading, assigns iterations to threads

in a round-robin manner. Optimization techniques described to increase parallelism in
Independent multi-threading loops are also available in Cyclic multi-threading. In this
technique, dependencies are identified by the compiler and the beginning of each loop
iteration is delayed till all dependencies from previous iterations are satisfied. In this
manner, the parallel portion of one iteration is overlapped with the sequential portion of the
subsequent iteration. As a result, it ends up in parallel execution. For example, in the figure
the statement x = x->next; causes a loop-carried dependence since it cannot be evaluated
until the statement has been completed in the previous iteration. Once all cores have started
their first iteration, this can approach linear speedup if the parallel part of the loop is very
large to allow full utilization of the cores.
DO-ACROSS parallelism(Cyclic multi-threading (CMT))

3. DO-PIPE parallelism(Pipeline multi-threading (PMT)):

DO-PIPE parallelism is the way for parallelization loops with cross-iteration dependencies.
In this approach, the loop body is divided into a number of pipeline stages with each
pipeline stage being assigned to a different core. Each iteration of the loop is then
distributed across the cores with each stage of the loop being executed by the core which
was assigned that pipeline stage. Each individual core only executes the code associated
with the stage which was allocated to it. For instance, in the figure the loop body is divided
into 4 stages: A, B, C, and D. Each iteration is distributed across all four cores but each
stage is only executed by one core.
DO-PIPE parallelism(Pipeline multi-threading (PMT))

Pega Customer Service Pricing Matrix PDF
No ratings yet
Pega Customer Service Pricing Matrix PDF
3 pages
Programming with MATLAB: Taken From the Book "MATLAB for Beginners: A Gentle Approach"
From Everand
Programming with MATLAB: Taken From the Book "MATLAB for Beginners: A Gentle Approach"
Peter Kattan
4.5/5 (3)
Parallel Programming Module 2
No ratings yet
Parallel Programming Module 2
112 pages
Lecture 06 - OpenMP
No ratings yet
Lecture 06 - OpenMP
37 pages
High Performance Computing (HPC) - Lec3
No ratings yet
High Performance Computing (HPC) - Lec3
35 pages
Presentation2 HS OpenMP
No ratings yet
Presentation2 HS OpenMP
29 pages
Openmp: Parallel Processing
No ratings yet
Openmp: Parallel Processing
40 pages
4 Openmp
No ratings yet
4 Openmp
32 pages
Omp Sync Data Runtime Environment
No ratings yet
Omp Sync Data Runtime Environment
59 pages
Unit 4 Shared-Memory Parallel Programming With Openmp
No ratings yet
Unit 4 Shared-Memory Parallel Programming With Openmp
37 pages
Open MPLecture
No ratings yet
Open MPLecture
54 pages
Updated - CS8083 MCP UNIT III Notes
No ratings yet
Updated - CS8083 MCP UNIT III Notes
26 pages
Open MP
No ratings yet
Open MP
28 pages
Openmp Overview
No ratings yet
Openmp Overview
74 pages
10 OpenMP-2
No ratings yet
10 OpenMP-2
25 pages
Openmp HPC Ass1
No ratings yet
Openmp HPC Ass1
43 pages
Xe 62011 Open MP
No ratings yet
Xe 62011 Open MP
46 pages
Omp Hands On SC08 PDF
No ratings yet
Omp Hands On SC08 PDF
153 pages
Omp Hands On SC08
No ratings yet
Omp Hands On SC08
153 pages
OpenMP 01 Introduction
No ratings yet
OpenMP 01 Introduction
70 pages
OpenMP Examples
No ratings yet
OpenMP Examples
12 pages
OMP Common Core-Voss
No ratings yet
OMP Common Core-Voss
217 pages
Unit 3
No ratings yet
Unit 3
13 pages
Introduction To OpenMP
No ratings yet
Introduction To OpenMP
46 pages
OpenMP Basics
No ratings yet
OpenMP Basics
47 pages
OpenMPSlides Tamu SC PDF
No ratings yet
OpenMPSlides Tamu SC PDF
74 pages
Lec 12 OpenMP
No ratings yet
Lec 12 OpenMP
152 pages
Openmp 6pp
No ratings yet
Openmp 6pp
5 pages
Openmp
No ratings yet
Openmp
21 pages
OpenMP P1
No ratings yet
OpenMP P1
32 pages
OpenMP Tutorial
100% (1)
OpenMP Tutorial
82 pages
OpenMP 2
No ratings yet
OpenMP 2
3 pages
Chapter 3 - Shared-Memory Programming, OpenMP
No ratings yet
Chapter 3 - Shared-Memory Programming, OpenMP
65 pages
Parallel Programming Unit 2
No ratings yet
Parallel Programming Unit 2
71 pages
NGK Openmp
No ratings yet
NGK Openmp
13 pages
A Tutorial On Parallel Computing On Shared Memory Systems
No ratings yet
A Tutorial On Parallel Computing On Shared Memory Systems
23 pages
Beginning OpenMP
No ratings yet
Beginning OpenMP
20 pages
Openmp Programming: Aiichiro Nakano
No ratings yet
Openmp Programming: Aiichiro Nakano
10 pages
Mcap-Lab Manual 1
No ratings yet
Mcap-Lab Manual 1
19 pages
Shared Memory Parallel Programming: Introduction To Openmp
No ratings yet
Shared Memory Parallel Programming: Introduction To Openmp
39 pages
Introduction To Open MP
No ratings yet
Introduction To Open MP
42 pages
Lect11 Openmp1
No ratings yet
Lect11 Openmp1
35 pages
CS-3006 8 UsingOpenMP SharedMemoryProgramming
No ratings yet
CS-3006 8 UsingOpenMP SharedMemoryProgramming
61 pages
OpenMP Workshop Day 1
No ratings yet
OpenMP Workshop Day 1
49 pages
Program Excecution ExpFinal
No ratings yet
Program Excecution ExpFinal
10 pages
Shared Memory: Openmp Environment and Synchronization
No ratings yet
Shared Memory: Openmp Environment and Synchronization
32 pages
DS1822-Parallel Computing - Unit2
No ratings yet
DS1822-Parallel Computing - Unit2
25 pages
CP4253 Map Unit Iii
No ratings yet
CP4253 Map Unit Iii
26 pages
Lecture Open MP
No ratings yet
Lecture Open MP
35 pages
Mpsoc Architectures Openmp
No ratings yet
Mpsoc Architectures Openmp
35 pages
Openmp 1
No ratings yet
Openmp 1
38 pages
Ipc - Assig 1
No ratings yet
Ipc - Assig 1
9 pages
CS-3006 5 UsingOpenMP SharedMemoryProgramming
No ratings yet
CS-3006 5 UsingOpenMP SharedMemoryProgramming
76 pages
Govindarajan - ParallelizationPrinciples NSM AstroPhysics
No ratings yet
Govindarajan - ParallelizationPrinciples NSM AstroPhysics
50 pages
OpenMP Presentation
No ratings yet
OpenMP Presentation
51 pages
About OpenMP
No ratings yet
About OpenMP
86 pages
CS8083 UNIT III Notes
No ratings yet
CS8083 UNIT III Notes
26 pages
The Complete Future Trait Guide
From Everand
The Complete Future Trait Guide
Hamze Ghalebi
No ratings yet
Node.js 63 Interview Questions and Answers
From Everand
Node.js 63 Interview Questions and Answers
John Edward Cooper Berg
No ratings yet
Node.js: Tools & Skills
From Everand
Node.js: Tools & Skills
James Hibbard
No ratings yet
The Mac Terminal Reference and Scripting Primer
From Everand
The Mac Terminal Reference and Scripting Primer
Jay Docherty
4.5/5 (3)
HPC Part B
No ratings yet
HPC Part B
5 pages
NT Test
No ratings yet
NT Test
11 pages
IDA Pro Function Analysis and Graphing Part4
No ratings yet
IDA Pro Function Analysis and Graphing Part4
10 pages
NOTES Unit - II
No ratings yet
NOTES Unit - II
36 pages
CMSC 449 - Lec04 - Packed Malware
No ratings yet
CMSC 449 - Lec04 - Packed Malware
10 pages
Secure Hash Algorithm
No ratings yet
Secure Hash Algorithm
4 pages
Cryptography Notes UNIT 3
No ratings yet
Cryptography Notes UNIT 3
21 pages
Misrate2 10.94.141.55 GG
No ratings yet
Misrate2 10.94.141.55 GG
141 pages
STQA Unit I
No ratings yet
STQA Unit I
18 pages
Addressing Modes of 8086 MR Binu Joy 2
No ratings yet
Addressing Modes of 8086 MR Binu Joy 2
17 pages
Resume Tata Consultancy Services
No ratings yet
Resume Tata Consultancy Services
3 pages
Lab Task 3-Question
No ratings yet
Lab Task 3-Question
4 pages
Instructions For Installing VisionMasterFT in The Field
No ratings yet
Instructions For Installing VisionMasterFT in The Field
38 pages
FortiNAC Demo Walkthrough
No ratings yet
FortiNAC Demo Walkthrough
13 pages
Computer Networks - The International Journal of Computer and Telecommunications Networking Home
No ratings yet
Computer Networks - The International Journal of Computer and Telecommunications Networking Home
6 pages
Lista Precios202204
No ratings yet
Lista Precios202204
5 pages
About Me Sample of PPT WD
No ratings yet
About Me Sample of PPT WD
6 pages
PL 400 Demo
No ratings yet
PL 400 Demo
25 pages
GE8151 Python Programming - Unit I Question Bank With Sample Code
100% (1)
GE8151 Python Programming - Unit I Question Bank With Sample Code
25 pages
2024 Igcse
No ratings yet
2024 Igcse
24 pages
Error Detecting and Correcting Codes: Appendix A
No ratings yet
Error Detecting and Correcting Codes: Appendix A
143 pages
Op
No ratings yet
Op
259 pages
Anjalie G@sliit LK
No ratings yet
Anjalie G@sliit LK
10 pages
Re-Dim: Handle Exceptions in QTP
No ratings yet
Re-Dim: Handle Exceptions in QTP
12 pages
Amiga Floppy Drive Compatibility List
No ratings yet
Amiga Floppy Drive Compatibility List
3 pages
Exchange Server 2010 Introduction To Supporting Administration
No ratings yet
Exchange Server 2010 Introduction To Supporting Administration
101 pages
Oacon LMI3d Lazer Profil Ve Snapshot Sensor
No ratings yet
Oacon LMI3d Lazer Profil Ve Snapshot Sensor
28 pages
Quiz Questions For Chapter 1
No ratings yet
Quiz Questions For Chapter 1
19 pages
Data Structures & Algorithms
No ratings yet
Data Structures & Algorithms
5 pages
Datasheet of DS 7732NI K4 NVR E - V4.71.400 - 20221017
No ratings yet
Datasheet of DS 7732NI K4 NVR E - V4.71.400 - 20221017
5 pages
NPTEL CC Assignment11
33% (3)
NPTEL CC Assignment11
4 pages
Chapter 7
No ratings yet
Chapter 7
101 pages
SQL Questions For Journal
No ratings yet
SQL Questions For Journal
9 pages
Untitled Document
No ratings yet
Untitled Document
7 pages
Operating Systems & Gui: in This Lesson Students Will: Get Familiar With The Following Terms
No ratings yet
Operating Systems & Gui: in This Lesson Students Will: Get Familiar With The Following Terms
9 pages
SecureDevOps Project1
No ratings yet
SecureDevOps Project1
3 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

HPC - Unit 3

Uploaded by

HPC - Unit 3

Uploaded by

UNIT

Open Multi-processing (OpenMP)

int main(int argc, char* argv[])

int nthreads, tid;

// Begin of parallel region

#pragma omp parallel private(nthreads, tid)

// Getting thread number

// Only master thread does this

printf("Number of threads = %d\n",

OpenMP topic: Synchronization

Loop-related synchronization constructs were discussed earlier:

23.4 Relaxed memory model

23.5 Example: Fibonacci computation

DO-ALL parallelism(Independent multithreading (IMT))

2. DO-ACROSS parallelism(Cyclic multi-threading (CMT)) :

In DO-ACROSS parallelism, like Independent multi-threading, assigns iterations to threads

3. DO-PIPE parallelism(Pipeline multi-threading (PMT)):

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.