Class 8
Fortran: day 8
Paul Tackley, 2017
Today’s Goals
1. Introduction to parallel computing (applicable to Fortran or C; examples are in Fortran)
2. Finite Prandtl number convection
Motivation: to model the Earth, we need a huge number of grid points / cells / elements!
• e.g., to fill the mantle volume:
– (8 km)³ cells -> 1.9 billion cells
– (2 km)³ cells -> 123 billion cells
Huge problems => huge computer
www.top500.org
Progress: an iPhone is faster than the fastest computer in 1976 (which cost $8 million)
[Photo: the Piz Dora supercomputer]
Distributed memory: each CPU has its own memory. Parallelisation usually requires message-passing, e.g. using MPI (Message-Passing Interface).
A brief history of supercomputers
• 1983-5: 4 CPUs, shared memory
• 1991: 512 CPUs, distributed memory
• 2010: 224,162 cores, distributed + shared memory (12 cores per node)
Another possibility: build your own (“Beowulf” cluster), using standard PC cases or rack-mounted cases.
MPI: Message-Passing Interface
• A standard library for communicating between different tasks (CPUs)
– Pass messages (e.g., arrays)
– Global operations (e.g., sum, maximum; see the sketch below)
– Tasks could be on different CPUs/cores of the same node, or on different nodes
• Works with Fortran and C
• Works on everything from a laptop to the largest supercomputers. Two implementations are:
– http://www.mcs.anl.gov/research/projects/mpich2/
– http://www.open-mpi.org/
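As an illustration of a global operation, the sketch below finds the maximum of a value across all tasks (the variable names vmax_local and vmax_global are assumptions, not from the slides):

   ! Global reduction: each rank passes its local maximum and
   ! every rank receives the global maximum (e.g. of a velocity).
   call MPI_Allreduce(vmax_local, vmax_global, 1, &
                      MPI_DOUBLE_PRECISION, MPI_MAX, MPI_COMM_WORLD, ierr)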
How to parallelise a code:
worked example
Example: scalar Poisson equation

∇²u = f

Finite-difference approximation:

(1/h²) ( u_{i+1,j,k} + u_{i−1,j,k} + u_{i,j+1,k} + u_{i,j−1,k} + u_{i,j,k+1} + u_{i,j,k−1} − 6u_{i,j,k} ) = f_{i,j,k}
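For example, one Jacobi sweep for this stencil takes only a few lines of Fortran (a sketch; the array names u, unew, f, the sizes nx, ny, nz and the spacing h are assumptions):

   ! One Jacobi sweep for the discrete Poisson equation: each
   ! interior point becomes the neighbour average minus h^2 f / 6.
   do k = 2, nz-1
      do j = 2, ny-1
         do i = 2, nx-1
            unew(i,j,k) = ( u(i+1,j,k) + u(i-1,j,k) &
                          + u(i,j+1,k) + u(i,j-1,k) &
                          + u(i,j,k+1) + u(i,j,k-1) &
                          - h**2 * f(i,j,k) ) / 6.0
         end do
      end do
   end do
   u = unew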
[Figure: domain decomposition. The grid is split into 8 subdomains, one per CPU (CPU 0 to CPU 7). Red = external boundaries, green = internal boundaries, yellow = points iterated/solved.]
First things the code has to do:
• Call MPI_Init(ierr)
• Find the number of CPUs using MPI_Comm_size
• Find which CPU it is, using MPI_Comm_rank (returns a number from 0 to #CPUs−1)
• Calculate which part of the global grid it is dealing with, and which other CPUs are handling neighboring subdomains.
Example: “Hello world” program
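A minimal MPI “Hello world” in Fortran might look like this (a sketch; the slide’s original listing is not reproduced):

   program hello
      use mpi
      implicit none
      integer :: ierr, nprocs, rank
      call MPI_Init(ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)   ! number of tasks
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)     ! my task: 0..nprocs-1
      print *, 'Hello from rank', rank, 'of', nprocs
      call MPI_Finalize(ierr)
   end program hello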
Moving forward
• Update values in the subdomain using ‘ghost points’ as boundary conditions, i.e.,
– Timestep (explicit), or
– Iteration (implicit)
• Update ghost points by communicating with other CPUs
• Works well for explicit or iterative approaches
Boundary communication
Step 1: x-faces (a code sketch follows below)
Computation: t = a·N³
Communication: t = n·L + b·N²/B (L = latency, B = bandwidth)
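A sketch of the x-face ghost-point exchange, assuming a subdomain array u(0:nx+1, ny, nz) with one layer of ghost points and precomputed neighbour ranks left_rank and right_rank (all names are assumptions, not the original code):

   real(8), dimension(ny,nz) :: sbuf, rbuf
   ! Send the rightmost interior plane to the right neighbour and
   ! receive the left ghost plane from the left neighbour.
   ! (At an external boundary the neighbour can be MPI_PROC_NULL.)
   sbuf = u(nx,1:ny,1:nz)
   call MPI_Sendrecv(sbuf, ny*nz, MPI_DOUBLE_PRECISION, right_rank, 0, &
                     rbuf, ny*nz, MPI_DOUBLE_PRECISION, left_rank,  0, &
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE, ierr)
   u(0,1:ny,1:nz) = rbuf
   ! Repeat in the opposite direction for the right ghost plane,
   ! then do the same for the y- and z-faces.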
[Figure: multigrid V-cycle. Smooth on 32×32×32; restrict residues (= error) to 16×16×16 and smooth; restrict to 8×8×8; prolongate corrections back up.]
Simple-minded multigrid: very inefficient coarse levels! An exact coarse solution can take a long time!
New treatment: follow minima
• Keep the number of points per core above a minimum (tuned for the system)
• This minimum is different for on-node and cross-node communication
Multigrid – now (& before): yin-yang
[Figure: a yin-yang grid calculation with 1.8 billion points.]
Summary
• For very large-scale problems, the code needs to be parallelised using MPI
• For finite-difference codes, the best method is to assign different parts of the domain to different CPUs (“domain decomposition”)
• The code looks similar to before, but with some added routines to take care of communication
• Multigrid scales well on thousands of CPUs if:
– coarse grids are treated on subsets of CPUs
– the total problem size is large enough
For more information
• https://computing.llnl.gov/tutorials/parallel_comp/
• http://en.wikipedia.org/wiki/Parallel_computing
• http://www.mcs.anl.gov/~itf/dbpp/
• http://en.wikipedia.org/wiki/Message_Passing_Interface
Programming:
Finite Prandtl number convection
(i.e., almost any fluid)
(1/Pr) (∂v/∂t + v·∇v) = −∇P + ∇²v + (1/Ek) Ω̂×v + Ra·T·ŷ

Pr = ν/κ (Prandtl number)   Ek = ν/(2ΩD²) (Ekman number)   Ra = gαΔTD³/(νκ) (Rayleigh number)
As before, use the streamfunction:

vx = ∂ψ/∂y,  vy = −∂ψ/∂x

(1/Pr) (∂ω/∂t + vx ∂ω/∂x + vy ∂ω/∂y) = ∇²ω − Ra ∂T/∂x
=> the streamfunction-vorticity formulation:

(1/Pr) (∂ω/∂t + vx ∂ω/∂x + vy ∂ω/∂y) = ∇²ω − Ra ∂T/∂x

∇²ψ = −ω,  (vx, vy) = (∂ψ/∂y, −∂ψ/∂x)

∂T/∂t + v·∇T = ∇²T + Q
Note: effect of high Pr

(1/Pr) (∂ω/∂t + vx ∂ω/∂x + vy ∂ω/∂y) = ∇²ω − Ra ∂T/∂x

If Pr -> infinity, the left-hand side -> 0, so the equation becomes a Poisson equation, like before:

∇²ω = Ra ∂T/∂x
Taking a timestep
(i) Calculate ψ from ω using: ∇²ψ = −ω
(ii) Calculate v from ψ: (vx, vy) = (∂ψ/∂y, −∂ψ/∂x)
(iii) Time-step ω and T using explicit finite differences:

∂T/∂t = −vx ∂T/∂x − vy ∂T/∂y + ∇²T

∂ω/∂t = −vx ∂ω/∂x − vy ∂ω/∂y + Pr ∇²ω − Ra·Pr ∂T/∂x
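Step (iii) as a sketch in Fortran, using central differences on a grid with spacing h and forward Euler in time (all array and variable names are assumptions):

   ! Explicit update of T and omega (here w) on interior points.
   do j = 2, ny-1
      do i = 2, nx-1
         dTdt = - vx(i,j)*(T(i+1,j)-T(i-1,j))/(2*h) &
                - vy(i,j)*(T(i,j+1)-T(i,j-1))/(2*h) &
                + (T(i+1,j)+T(i-1,j)+T(i,j+1)+T(i,j-1)-4*T(i,j))/h**2
         dwdt = - vx(i,j)*(w(i+1,j)-w(i-1,j))/(2*h) &
                - vy(i,j)*(w(i,j+1)-w(i,j-1))/(2*h) &
                + Pr*(w(i+1,j)+w(i-1,j)+w(i,j+1)+w(i,j-1)-4*w(i,j))/h**2 &
                - Ra*Pr*(T(i+1,j)-T(i-1,j))/(2*h)
         Tnew(i,j) = T(i,j) + dt*dTdt
         wnew(i,j) = w(i,j) + dt*dwdt
      end do
   end do
   T = Tnew
   w = wnew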
The T time step is the same as before; ω has diffusivity Pr, so the diffusive limit uses max(Pr, 1):

Diffusion: dt_diff = a_diff · h² / max(Pr, 1)

Advection: dt_adv = a_adv · min( h/maxval(abs(vx)), h/maxval(abs(vy)) )
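In code, the stable step is the minimum of the two limits (a sketch; a_diff and a_adv are safety factors below 1, and Pr is assumed double precision):

   ! Diffusive limit: the largest diffusivity is max(Pr, 1).
   dtdiff = a_diff * h**2 / max(Pr, 1.0d0)
   ! Advective limit: nothing may cross more than ~a cell per step.
   dtadv  = a_adv * min( h/maxval(abs(vx)), h/maxval(abs(vy)) )
   dt     = min(dtdiff, dtadv)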