Model Order Reduction Via Matlab Parallel Computing Toolbox: Istanbul Technical University
E. Fatih Yetkin (Istanbul Technical Univ.) Terschelling, 2009 September 21, 2009 1 / 40
1 Parallel Computation
Why Do We Need Parallelism in MOR?
What is Parallelism?
Parallel Architectures
2 Tools of Parallelization
Programming Models
Parallel Matlab
3 Rational Krylov Methods
4 Conclusions
Why Do We Need Parallelism in MOR?
Computational Complexity
What is Parallelism?
Sequential Programming
What is Parallelism?
Parallel Programming
Parallel Architectures
Shared Memory
Shared memory machines have in common that all processors can access all memory as a global address space.
Multiple processors operate independently while sharing the same memory resources.
Shared memory machines are divided into two main classes based on memory access times: UMA and NUMA.
Parallel Architectures
UMA vs. NUMA
Parallel Architectures
Distributed Memory
Parallel Architectures
Hybrid Memory
The largest and fastest computers in the world today employ both shared and distributed memory architectures.
The shared memory component is usually a cache-coherent SMP node: processors on a given SMP address that machine's memory as global.
Network communication is required to move data from one SMP to another.
Parallel Programming Models: Threads
POSIX Threads & OpenMP
Parallel Programming Models: Message Passing Interface (MPI)
A set of tasks that use their own local memory during computation.
Multiple tasks can reside on the same physical machine as well as across an arbitrary number of machines.
Tasks exchange data by sending and receiving messages.
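Matlab's Parallel Computing Toolbox exposes the same message-passing pattern inside spmd blocks, with labSend/labReceive (or the combined labSendReceive) in the role of MPI send/receive. A minimal hedged sketch; the ring exchange shown here is illustrative, not from the talk:

```matlab
% Illustrative sketch: point-to-point message passing between workers.
% labindex and numlabs identify each worker within the spmd block.
spmd
    right = mod(labindex, numlabs) + 1;       % neighbour to the right in a ring
    left  = mod(labindex - 2, numlabs) + 1;   % neighbour to the left
    msg   = labindex^2;                       % some local data to exchange
    % Send msg to the right neighbour while receiving from the left one
    % (labSendReceive avoids the deadlock a naive send-then-receive can cause).
    received = labSendReceive(right, left, msg);
end
```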
Matlab Distributed Computing Toolbox
Distributed or Parallel
In Matlab terminology, parallel jobs run on local workers (such as the cores of a single machine), while distributed jobs run on cluster nodes.
Basics of Parallel Computing Toolbox
parfor
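The parfor example shown on this slide was not preserved in the text. As a hedged stand-in, a minimal loop with independent iterations (matlabpool was the pool-opening command in 2009-era releases; newer releases use parpool):

```matlab
matlabpool open 4            % 2009-era syntax; parpool(4) in newer releases
r = zeros(1, 100);
parfor i = 1:100
    % Each iteration is independent, so they can run on different workers;
    % r is a "sliced" output variable, indexed by the loop variable.
    r(i) = max(abs(eig(randn(50))));
end
matlabpool close
```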
Basics of Parallel Computing Toolbox
When can we use parfor?
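parfor applies when the iterations are order-independent: sliced outputs indexed by the loop variable and reduction variables (such as running sums) are both allowed. An illustrative sketch:

```matlab
s = 0;
parfor i = 1:1000
    % s is a reduction variable: the order in which the partial sums
    % are accumulated does not affect the result, so parfor accepts it.
    s = s + 1/i^2;
end
```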
Basics of Parallel Computing Toolbox
When can we not use parfor?
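parfor cannot be used when an iteration depends on the result of a previous one. A hedged illustrative counter-example:

```matlab
x = zeros(1, 100);
x(1) = 1;
for i = 2:100                  % parfor would reject this loop
    % Loop-carried dependence: x(i) needs x(i-1), so the iterations
    % must execute in order and cannot be distributed across workers.
    x(i) = 0.5*x(i-1) + 1;
end
```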
Basics of Parallel Computing Toolbox
single program, multiple data (spmd)
In Matlab, an spmd block runs the same program on multiple workers, each operating on its own data set.
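A minimal hedged sketch of an spmd block (the index-splitting scheme is illustrative):

```matlab
spmd
    % Every worker runs this same block on its own piece of the data.
    chunk    = labindex:numlabs:1000;   % this worker's share of the indices
    localSum = sum(sin(chunk));         % purely local computation
    total    = gplus(localSum);         % global reduction across all workers
end
```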
Basics of Parallel Computing Toolbox
distributed arrays
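The code on this slide was lost in extraction; a hedged sketch of the idea, assuming the 2009-era codistributed-array API:

```matlab
spmd
    % codistributed partitions the matrix across the workers' memories,
    % so no single worker needs to hold all of A.
    A = codistributed(randn(1000));
    B = A * A;          % overloaded operator: the multiply runs in parallel
    C = gather(B);      % collect the full result as a replicated array
end
```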
Matrix Transposition
MPI-Fortran vs. Matlab DCT
Rational Krylov Methods
ẋ = Ax + Bu
y = C^T x + Du
Â = W∗AV,  B̂ = W∗B,  Ĉ = CV   (1)
Rational Krylov Method
Rational Krylov Projectors
H2 norm of a system
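The definition on this slide did not survive extraction; for reference, the standard H2 norm of a stable system with transfer function G(s) = C^T(sI − A)^{-1}B (taking D = 0) is

```latex
\|G\|_{\mathcal{H}_2}^2
  = \frac{1}{2\pi}\int_{-\infty}^{\infty}
      \operatorname{trace}\bigl(G(j\omega)\,G(j\omega)^{*}\bigr)\,d\omega
  = \operatorname{trace}\bigl(C^{T} P\, C\bigr),
\qquad
AP + PA^{T} + BB^{T} = 0,
```

where P is the controllability Gramian, obtained from the Lyapunov equation on the right.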
H2 optimality
Iterative Rational Krylov Algorithm (IRKA)
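The algorithm listing on this slide did not survive extraction. A hedged SISO sketch of the IRKA fixed-point iteration, in which the interpolation points are updated to the mirrored reduced poles; all names are illustrative, and dense solves are used for clarity:

```matlab
function [Ar, Br, Cr] = irka_sketch(A, B, C, sigma, tol, maxit)
% A: n-by-n, B: n-by-1, C: n-by-1 (output y = C'*x), sigma: initial shifts.
n = size(A, 1);  k = numel(sigma);
for it = 1:maxit
    V = zeros(n, k);  W = zeros(n, k);
    for j = 1:k
        V(:, j) = (sigma(j)*eye(n) - A)  \ B;   % input Krylov directions
        W(:, j) = (sigma(j)*eye(n) - A') \ C;   % output Krylov directions
    end
    [V, ~] = qr(V, 0);  [W, ~] = qr(W, 0);      % orthonormal bases
    Ar = (W'*V) \ (W'*A*V);                     % oblique (Petrov-Galerkin) projection
    signew = sort(-eig(Ar));                    % mirror the reduced poles
    if norm(signew - sort(sigma)) < tol*norm(sigma), break; end
    sigma = signew;                             % fixed-point update of the shifts
end
Br = (W'*V) \ (W'*B);
Cr = V'*C;
end
```

At a fixed point, the shifts equal the mirror images of the reduced poles, which is the first-order H2 optimality condition the talk refers to.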
Example RLC network
Frequency plots of the reduced and original systems
N = 201; order of the reduced system k = 20
Computational Cost of Methods
Parallel Parts of Algorithms
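The k shifted linear solves that build the rational Krylov basis are mutually independent, which makes them a natural parfor candidate. A hedged sketch under illustrative data (real shifts assumed):

```matlab
n = 500;  k = 8;
A = sprandn(n, n, 0.01) - 5*speye(n);   % illustrative sparse system matrix
B = randn(n, 1);
sigma = logspace(0, 2, k);              % interpolation points (assumed real)
V = zeros(n, k);
parfor j = 1:k
    % Each shifted solve is independent of the others, so the k solves
    % can be distributed across the available workers.
    V(:, j) = (sigma(j)*speye(n) - A) \ B;
end
[V, ~] = qr(V, 0);                      % orthonormal projection basis
```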
Parallel Version of Alg. 1
CPU times for Rational Krylov
Table: CPU times of the parallel version of Alg. 1 for different system orders; reduced system order k = 200.
CPU times for IRKA
Table: CPU times of the parallel version of Alg. 2 for different system orders; reduced system order k = 200.
Speedup graph for RK
Speedup is defined as S_P = T_1 / T_P, where T_1 is the CPU time on one processor and T_P is the CPU time on P processors.
Speedup graph for IRKA
continued
As the figures show, increasing the number of processors decreases the processing time appreciably up to a point, after which it starts to increase.
This is because communication time becomes dominant over computation time. In both algorithms, however, better speedups are obtained as the system matrices grow larger.
Conclusions