
Advanced Parallel Computing for Scientific Applications
Autumn Term 2010

Prof. I. F. Sbalzarini          Prof. P. Arbenz
ETH Zentrum, CAB G34            ETH Zentrum, CAB H89
CH-8092 Zürich                  CH-8092 Zürich

Exercise 3
Release: 12 Oct. 2010
Due: 26 Oct. 2010

1 Practice in C/C++
The following two assignments illustrate the effects of caching in matrix operations. C uses a row-major memory layout for storing matrices and higher-dimensional arrays; hence, row-wise access of elements is more cache-efficient than column-wise access.
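To make the layout concrete: element (i, j) of an m × n matrix stored in a 1-D array sits at index i*n + j, so keeping j in the inner loop gives stride-1 access. A minimal illustration (the function RowWiseSum is ours, not part of the exercise files):

#include <vector>

// Sum all entries of an m-by-n matrix stored row-major in a 1-D array.
// With j in the inner loop, consecutive iterations touch consecutive
// addresses (stride 1), so every cache line loaded is used completely.
double RowWiseSum(const std::vector<double>& A, int m, int n) {
    double sum = 0.0;
    for (int i = 0; i < m; ++i)
        for (int j = 0; j < n; ++j)
            sum += A[i * n + j];  // element (i, j) in row-major layout
    return sum;
}

Swapping the two loops computes the same sum but jumps n doubles between consecutive accesses, which is exactly the pattern the following assignments ask you to avoid.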

Question 1: Matrix multiplication


The file matrixMult.cpp contains a program that measures the execution time of the matrix multiplication

C = A · B

Each matrix is stored as a 1-D array. The multiplication is performed in the method void
Multiply(...). To compute each element of C, the elements of A are accessed row-wise
and those of B column-wise, which causes many cache misses, especially for large
matrices. Better cache usage can be achieved if the matrix B is transposed first
and the multiplication is modified accordingly to give the same result as
before. Your task is to implement the methods void InPlaceTranspose(...) and void
MultiplyEfficient(...).
Compile your code using the default GNU compiler: g++ -o mult matrixMult.cpp
Do you observe better performance in the case of large matrices?
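One possible shape for the two methods, as a sketch only: the actual signatures in matrixMult.cpp may differ, and the transpose shown assumes a square n × n matrix.

#include <algorithm>  // std::swap

// Transpose a square n-by-n row-major matrix in place by swapping
// each element above the diagonal with its mirror below it.
void InPlaceTranspose(double* B, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = i + 1; j < n; ++j)
            std::swap(B[i * n + j], B[j * n + i]);
}

// C = A * B, where Bt already holds the transpose of B. The inner loop
// over k now reads consecutive addresses from A and Bt alike, instead
// of striding through a column of B.
void MultiplyEfficient(const double* A, const double* Bt, double* C, int n) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = 0.0;
            for (int k = 0; k < n; ++k)
                sum += A[i * n + k] * Bt[j * n + k];  // row i of A, row j of Bt
            C[i * n + j] = sum;
        }
}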

Question 2: Matrix norm


You have to calculate the 1-norm and the infinity norm of an m × n matrix A, given by

$$ \|A\|_1 = \max_{1 \le j \le n} \sum_{i=1}^{m} |a_{ij}| $$

$$ \|A\|_\infty = \max_{1 \le i \le m} \sum_{j=1}^{n} |a_{ij}| $$

Implement the above equations in the corresponding methods double Norm_1() and double
Norm_Inf() in the file matrixNorm.cpp. Count the floating-point operations in the calcu-
lation and compute the Mflop/s rate for different matrix sizes n.
Compile your code using: g++ -o norm matrixNorm.cpp
Which of the above norms is calculated faster, and why?
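As a hint toward that question, here is a sketch of both norms for a row-major m × n matrix, written as free functions rather than the class methods of matrixNorm.cpp. Both perform the same number of floating-point operations; only the memory access pattern differs.

#include <algorithm>  // std::max
#include <cmath>      // std::fabs

// Infinity norm: maximum absolute row sum. The inner loop over j is
// stride-1 in a row-major layout, so it is cache-friendly.
double NormInf(const double* A, int m, int n) {
    double norm = 0.0;
    for (int i = 0; i < m; ++i) {
        double rowSum = 0.0;
        for (int j = 0; j < n; ++j)
            rowSum += std::fabs(A[i * n + j]);
        norm = std::max(norm, rowSum);
    }
    return norm;
}

// 1-norm: maximum absolute column sum. Implemented naively, the inner
// loop over i jumps n doubles between accesses (stride n), which causes
// far more cache misses once a row no longer fits in cache.
double Norm1(const double* A, int m, int n) {
    double norm = 0.0;
    for (int j = 0; j < n; ++j) {
        double colSum = 0.0;
        for (int i = 0; i < m; ++i)
            colSum += std::fabs(A[i * n + j]);
        norm = std::max(norm, colSum);
    }
    return norm;
}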

In each of the above examples, time is measured using the method double walltime(...),
which is implemented in the file walltime.h.
Please submit the jobs to the batch queue as follows:

bsub -o <op_file> ./<executable>

2 Introduction to OpenMP
1. OpenMP is an application programming interface that provides a parallel programming model for shared-memory and distributed shared-memory multiprocessors.

2. OpenMP is based on the fork/join execution model: an OpenMP program starts as a single thread (the master), and additional threads are created when the master hits a parallel region.

3. There is a standard include file omp.h for C/C++ OpenMP programs.

4. The number of threads is fixed a priori by the programmer using the environment variable OMP_NUM_THREADS.

5. omp_get_num_threads() and omp_get_thread_num() can be used to get the number of threads created and the local number assigned to each thread.

6. The directive #pragma omp parallel marks the beginning of a parallel section in the program.

7. The keywords used for distributing work among threads are for, sections, critical, etc. A minimal example follows this list.
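A minimal sketch tying points 2-6 together (the thread count is taken from OMP_NUM_THREADS, and the output order is nondeterministic). It compiles as C or C++ with the gcc line given in Question 3:

#include <stdio.h>
#include <omp.h>  // standard OpenMP include file

int main(void) {
    // Fork: the master thread spawns a team at the parallel region.
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();   // local thread number
        int nth = omp_get_num_threads();  // size of the team
        printf("Thread %d of %d\n", tid, nth);  // printf keeps each line intact
    }  // Join: the team synchronizes; only the master continues.
    return 0;
}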

Question 3: First OpenMP program


Using the above information, write a simple program that creates n = 2, 4, 6 threads, where each thread displays one of the following messages along with its own thread number:

This is Advanced Parallel Computing tutorial.
This is the first OpenMP program.
This program uses n threads
Hello World

Compile the program using the GNU compiler as follows:

gcc -lgomp -fopenmp -o omp1 omp1.c

Question 4: Work sharing among OpenMP threads


The file vectorAdd.cpp contains the code for serial and parallel execution of the SAXPY
operation, along with time measurement.
a) Compile the program and execute it using, say, 4 threads. Why is there no speedup?
Modify the code in order to achieve an appreciable speedup.
b) Write code to calculate the dot product $\bar{a} \cdot \bar{b}$ in parallel (see the sketch below).
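For part b), the natural tool is OpenMP's reduction clause. A minimal sketch, where the vector length and contents are arbitrary placeholders:

#include <cstdio>
#include <vector>

int main() {
    const int n = 10000000;                    // arbitrary problem size
    std::vector<double> a(n, 1.0), b(n, 2.0);  // placeholder data

    double dot = 0.0;
    // Each thread accumulates a private partial sum over its share of
    // the iterations; the partials are combined into dot at the end.
    #pragma omp parallel for reduction(+:dot)
    for (int i = 0; i < n; ++i)
        dot += a[i] * b[i];

    std::printf("a.b = %f\n", dot);
    return 0;
}

Compile it with g++ -fopenmp and vary OMP_NUM_THREADS to check the scaling.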

You may submit the jobs to the batch queue as follows:

bsub -n N -o <op_file> ./<executable>


where N is the number of processors.

Do not forget to set the environment variable OMP_NUM_THREADS before execution.
