Par - 2nd In-Term Exam - Course 2019/20-Q1
Problem 1 (3 points) Assume the following serial code computing a two–dimensional NxN matrix u:
void compute(int N, double *u) {
    int i, j;
    double tmp;
    for (i = 1; i < N-1; i++)
        for (j = 1; j < N-1; j++) {
            tmp = u[N*(i+1) + j] + u[N*(i-1) + j] +   // elements u[i+1][j] and u[i-1][j]
                  u[N*i + (j+1)] + u[N*i + (j-1)] -   // elements u[i][j+1] and u[i][j-1]
                  4 * u[N*i + j];                     // element u[i][j]
            u[N*i + j] = tmp/4;                       // element u[i][j]
        }
}
The code is parallelised on three processors (P0−2 ) with the assignment of iterations to processors shown on
the left part of Figure 1.
Figure 1: (Left) Assignment of iterations to processors. (Right) Mapping of array elements to memory
modules in the NUMA system.
Observe that each processor is assigned the computation of N/3 consecutive iterations of the i loop (except
iterations 0 and N-1), starting with processor P0 (the number of processors, 3, perfectly divides the number of
rows and columns N); each processor executes its assigned computation in 3 tasks, each one computing N/3
consecutive iterations of the j loop. Due to the dependences in this code, you should already know that, for
example, task11 can only be executed by P1 once processor P0 finishes the execution of task01 and the
same processor (P1) finishes the execution of task10.
The three processors compose a multiprocessor architecture with 3 NUMA nodes sharing access to main
memory, each NUMA node with a single processor Pp, a cache memory (of sufficient size to store all the
lines required to execute all tasks) and a portion of main memory Mp (p in the range 0-2). Each memory Mp
has an associated directory to maintain coherence at the NUMA level. Each entry in the directory uses
2 bits for the status (M, S and U) and 3 bits for the sharers list. The rows of matrix u are distributed among
the NUMA nodes as shown on the right part of Figure 1. In that figure, rectangles represent the memory lines
that are involved in the computation of task11, for a specific case in which N=24 and each cache line is able
to host 4 consecutive elements of matrix u.
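For instance, assuming row-major storage and the block row distribution suggested by the figure (rows 0-7
mapped to M0, rows 8-15 to M1 and rows 16-23 to M2), the first iteration of task11 is i=8, j=8: element
u[8][8] lies at offset 8*24+8 = 200, i.e. in memory line 200/4 = 50, whose home node is M1, whereas its
neighbour u[7][8] lies at offset 7*24+8 = 176, i.e. in memory line 44, whose home node is M0.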
We ask you:
1. Assuming U status for all memory lines at the beginning of the parallel execution, what will the
contents of the directories be for the lines shown on the right part of Figure 1 when task11 is ready
for execution? Use the provided answer sheet for this question, indicating for each memory line the
contents of the directory (e.g. S011, meaning that the line is in S state with copies in NUMA nodes 1
and 0).
2. Indicate the sequence of coherence actions (RdReq, WrReq, UpgrReq, Fetch, Invalidate, Dreply, Ack or
WriteBack) that will occur when processor P1 executes the first iteration in task11, i.e. the iteration
in the upper left corner of task11 on the left part of Figure 1.
3. What will the contents of the directories be for the same lines once task11 finishes its execution?
At that time you should assume that task02 and task20 have also finished their execution. Use the
provided answer sheet for this question.
Solution: the figure on the left represents the contents in the directories before the execution (question
1.1); the figure on the right after the execution (question 1.3).
Solution figures for question 1.1 (before execution) and question 1.3 (after execution): directory contents
for the memory lines of Figure 1 in modules M0, M1 and M2 (figures not reproduced here).
Regarding question 1.2, processor P1 first performs several read accesses, one of them to a memory
position in state M in memory M0; to read it, P1 issues RdReq1→0 to the home node M0, which
responds with the contents of the line in a Dreply0→1. After the computation, P1 has to write one
element for which it is the home node; since the line containing that element is in S state, with a copy
in P0's cache, M1 has to send an Invalidate1→0 command, which is acknowledged with Ack0→1.
Problem 2 (4 points) A ticket lock is a lock implemented using two shared counters, next_ticket and
now_serving, both initialised to 0. A thread wanting to acquire the lock uses an atomic operation to
fetch the current value of next_ticket as its unique sequence number and to increment it by 1, generating
the next sequence number. The thread then waits until now_serving is equal to its sequence number.
Releasing the lock consists of incrementing now_serving in order to pass the lock to the next waiting
thread.
Given the following data structure and incomplete implementation of the primitives that support the ticket
lock mechanism:
typedef struct {
    int next_ticket;
    int now_serving;
} tTicket_lock;

void ticket_lock_init (tTicket_lock *lock) {
    lock->now_serving = 0;
    lock->next_ticket = 0;
}

void ticket_lock_acquire (tTicket_lock *lock) {
    // obtain my unique sequence number from next_ticket
    // generate the next_ticket sequence number
    // wait until my sequence number is equal to now_serving
}

void ticket_lock_release (tTicket_lock *lock) {
    lock->now_serving++;
}
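For reference, a minimal usage pattern of these primitives (the lock variable name is illustrative) would be:

tTicket_lock l;
ticket_lock_init(&l);
...
ticket_lock_acquire(&l);    /* obtain a ticket and wait for my turn */
/* ... critical section ... */
ticket_lock_release(&l);    /* pass the lock to the next waiting thread */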
We ask:
1. Complete the code for the ticket_lock_acquire primitive to be executed on two different platforms
that provide the following different atomic operations:
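The descriptions of the atomic operations provided by the two platforms are not reproduced in this excerpt.
Only as an illustration, assuming one of the platforms offers an atomic fetch-and-add (written here with
GCC's __sync_fetch_and_add builtin), the acquire primitive could be completed along these lines:

void ticket_lock_acquire (tTicket_lock *lock) {
    /* obtain my sequence number and generate the next one in a single atomic step */
    int my_ticket = __sync_fetch_and_add(&lock->next_ticket, 1);
    /* wait until my sequence number is equal to now_serving */
    while (lock->now_serving != my_ticket);
}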
2. Consider the following implementation for the basic lock explained in class using test-test-and-set:
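The implementation referred to in the statement is not reproduced in this excerpt. As an assumed reference
only, a typical test-and-test-and-set spin lock (written here with GCC's atomic builtins) looks like:

void lock_acquire (volatile int *lock) {
    do {
        while (*lock != 0);                          /* test: spin on the locally cached copy */
    } while (__sync_lock_test_and_set(lock, 1));     /* test&set: atomically try to take the lock */
}

void lock_release (volatile int *lock) {
    __sync_lock_release(lock);                       /* write 0, making the lock available again */
}

The trace to be filled in on the answer sheet at the end of this document follows the load / t&s / store
accesses to variable lock that this kind of implementation generates.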
Problem 3 (3 points)
We have the parallel code shown in the following code excerpt:
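The code excerpt itself is not reproduced in this text version. From the description that follows, its
structure is an initialization stage followed by a computation stage repeated num_iters times, with loops
controlled by variable i; a minimal serial skeleton, assumed only for illustration (the loop bodies, the exact
number of loops and the helper names init_element and compute_element are hypothetical), would be:

for (i = 0; i < N; i++)
    init_element(i);                      /* initialization stage */
for (int iter = 0; iter < num_iters; iter++)
    for (i = 0; i < N; i++)
        compute_element(i);               /* computation stage, repeated num_iters times */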
Since the computation part is repeated num_iters times, we want to exploit locality in all levels of the
memory hierarchy in order to speed up the parallel execution of the code. The first time a memory location is
accessed by a thread, the operating system will assign memory in the NUMA node where the thread doing
that first access is being executed (provided there is free space). This can help us get the memory used by each
thread within the memory of the NUMA node where it executes. Therefore, accesses to memory will be
faster if we manage to have each thread execute the very same iterations in all three loops controlled by variable
i. Locality will then be exploited by implementing a block data decomposition.
1. (1.5 points) Given the indications above, we ask you to write a faster parallel code for both the
initialization and computation stages using the appropriate OpenMP pragmas and invocations
to intrinsic functions, assuming the following constraints: 1) you cannot make use of the for work-
sharing construct in OpenMP to distribute work among threads; 2) you cannot assume that the
number of threads evenly divides the problem size N; 3) physical memory has not been assigned yet by
the time we reach the initialization loop; 4) parallelization overheads should always be kept as low as
possible (due to thread/task creation, synchronization, load imbalance, ...).
Solution:
We need to make sure that a thread executes the very same iterations for all the executions of each
loop, both in the initialization and computation stages.
#define N ...
int i;
...
#pragma omp parallel private(i)
{
    int id        = omp_get_thread_num();
    int howmany   = omp_get_num_threads();
    int basesize  = N / howmany;
    int remainder = N % howmany;
    int extra     = id < remainder;           /* 1 if this thread gets one extra iteration */
    int extraprev = extra ? id : remainder;   /* extra iterations assigned to lower thread ids */
    int lb = id * basesize + extraprev;       /* Loop lower bound */
    int ub = lb + basesize + extra;           /* Loop upper bound */
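    /* Hypothetical use of the computed bounds (the actual loop bodies are not reproduced in this
       excerpt; num_iters is the repetition count mentioned in the statement): each thread first-touches,
       and then repeatedly computes, exactly the iterations lb..ub-1, so first-touch allocation places
       its data in the local NUMA node and keeps it in the local cache. */
    for (i = lb; i < ub; i++)
        init_iteration(i);                    /* assumed initialization body for iteration i */
    #pragma omp barrier                       /* assumed: computation may read data initialized by other threads */
    for (int iter = 0; iter < num_iters; iter++)
        for (i = lb; i < ub; i++)
            compute_iteration(i);             /* assumed computation body for iteration i */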
}
...
After the previous code finishes its computation the code continues with the following function call:
...
final_processing (a, b, c);
...
Contrary to the computation stage studied above, the loop that appears in the final_processing
stage is executed only once. However, we need to solve another problem, namely, that function zoo presents
a highly variable execution time depending on the actual numerical values received as inputs. Consequently,
the block distribution used in the previous stages could easily cause load imbalance in this final processing
stage. Additionally, assume cache lines are 64 bytes long and integers occupy 4 bytes.
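The body of final_processing is not reproduced in this excerpt. A plausible shape, assumed only for
illustration (the types of a and b and the signature of zoo are guesses; c is taken as a vector of int,
consistent with the solution below), is:

void final_processing (int *a, int *b, int *c) {
    for (int i = 0; i < N; i++)
        c[i] = zoo(a[i], b[i]);    /* zoo's execution time varies strongly with its inputs */
}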
2. (1.5 points) We ask you to provide an efficient parallelization of the loop above. As before, you are
not allowed to make use of the for work–sharing construct in OpenMP to distribute work among
threads.
Solution:
We are interested in avoiding load imbalance while also avoiding false sharing in the accesses to vector c.
For this, we parallelize the loop following a block-cyclic geometric data decomposition for the output
vector c. The block size should allow a processor to use all the elements in a cache line, benefiting
from spatial locality and avoiding false sharing. To implement the block-cyclic decomposition we need
two nested loops, with an outer loop jumping from block to block cyclically and an inner loop traversing
all the elements in each block, as follows:
#define CACHE_LINE_SIZE 64
...
#pragma omp parallel
{
    int id = omp_get_thread_num();
    int howmany = omp_get_num_threads();
    /* Block size: number of elements of c that fit in one cache line */
    int block_size = CACHE_LINE_SIZE / sizeof(int);
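    /* Assumed continuation (the loop body from final_processing is not reproduced in this excerpt;
       c[i] = zoo(a[i], b[i]) is used as a stand-in, consistent with the call final_processing(a, b, c)).
       The outer loop jumps from block to block cyclically among threads; the inner loop traverses the
       elements of one block, also handling the case in which the blocks do not evenly divide N. */
    for (int ii = id * block_size; ii < N; ii += howmany * block_size)
        for (int i = ii; i < ii + block_size && i < N; i++)
            c[i] = zoo(a[i], b[i]);
}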
Answer sheet for question 1.1 (before execution) and question 1.3 (after execution): grids of the memory
lines in modules M0, M1 and M2 in which to write the directory entry for each memory line (for example,
S011 to indicate a line in state S with copies in nodes 1 and 0).
Student name: ………………………………………………………………………………
Answer sheet for the test-and-set lock trace (Problem 2, question 2). For every step, fill in the bus
transaction generated and the resulting status of the cache line holding lock in each of the three caches
(P0, P1 and P2), together with the value of lock; the CPU event of each step is already given. Initially
the line is in state I in the three caches and lock = 0.

 Step  Instruction                 CPU event
 ----  --------------------------  ----------
   1   T0 tries to acquire lock    load lock
   2   T1 tries to acquire lock    load lock
   3   T2 tries to acquire lock    load lock
   4   T0 acquires lock            t&s lock
   5   T1 tries to acquire lock    t&s lock
   6   T2 tries to acquire lock    t&s lock
   7   T1 tries to acquire lock    load lock
   8   T2 tries to acquire lock    load lock
   9   T0 releases lock            store lock
  10   T1 tries to acquire lock    load lock
  11   T2 tries to acquire lock    load lock
  12   T1 acquires lock            t&s lock
  13   T2 tries to acquire lock    t&s lock
  14   T1 releases lock            store lock
  15   T2 tries to acquire lock    load lock
  16   T2 acquires lock            t&s lock
  17   T2 releases lock            store lock