
Cluster Computing

(IN 4700)

Dr. C. Amalraj
01/03/2021
The University of Moratuwa
amalraj@uom.lk
Lecture 4:
Parallel Architectures
Today’s Outline

 Classification of Parallel Architectures

 NUMA vs. MPP

 Distributed systems vs. Clusters

 Distributed Computing vs. Grid Computing

 Cluster Computing vs. Cloud Computing

 Introduction to Cloud Computing


Classification of Parallel Architectures
Parallel Computation’s Space

Parallel computing occupies a unique spot in the universe of distributed systems.
 Parallel computing is centralized: all of the processes are typically under the control of a single entity.
 Parallel computing is usually hierarchical: parallel architectures are frequently described as grids, trees, or pipelines.
 Parallel computing is co-located: for efficiency, parallel processes are typically located very close to each other, often in the same chassis or at least the same data center.
These choices are driven by the problem space and the need for high performance.

[Figure: spectrum of system organisation: distributed vs. centralized, heterarchical vs. hierarchical, decentralized vs. co-located]
Classification of parallel architectures

 Parallel architectures can be roughly broken down into many categories (some are listed below):

 MPP (Massively Parallel Processor)
 SMP (Symmetric Multi-Processor system)
 cc-NUMA (cache-coherent non-uniform memory access)
 Clusters
 Grid computing systems


MPP vs. NUMA

 NUMA has many similarities with MPP:
 both are composed of multiple nodes
 each node has its own CPU, memory, and I/O
 information exchange between nodes takes place through the node interconnection mechanism.
MPP vs. NUMA
 Node interconnection mechanism is different
 NUMA's node interconnection mechanism is
implemented inside the same physical server.
 The node interconnection mechanism of MPP is implemented by I/O connections outside the individual SMP nodes
 Memory access mechanism is different
 Inside a NUMA server, any CPU can access the entire system's memory, but the performance of remote access is much lower than that of local memory access
 In an MPP, each node accesses its local memory, and remote memory access is done through some form of message-passing interface
Distributed Systems vs. Clusters
 Distributed computing just means coordinating a number of computers to accomplish a single task. Cluster computing means the computers are specifically organized to work together to accomplish a single task.
 For example, massively parallel "grid computing" projects like seti@home and folding@home are examples of distributed computing, but they are not cluster computing. Here, the computers all work together to accomplish a task, so this is distributed computing. But they are not specifically arranged for this purpose (the arrangement does not have an obvious order or plan), so they are not a cluster and this is not cluster computing.
Distributed Computing vs. Grid Computing
 Distributed Computing normally refers to managing or pooling the hundreds or thousands of computer systems which individually are more limited in their memory and processing power. On the other hand, grid computing has some extra characteristics. It is concerned with the efficient utilization of a pool of heterogeneous systems with optimal workload management, utilizing an enterprise's entire computational resources (servers, networks, storage, and information) acting together to create one or more large pools of computing resources. There is no limitation on users, departments or organisations in grid computing.
Distributed Computing vs. Grid Computing
 Grid computing is focused on the ability to support computation across multiple administrative domains, which sets it apart from traditional distributed computing. Grids offer a way of using information technology resources optimally inside an organization, involving virtualization of computing resources. Its support for multiple administrative policies and security authentication and authorization mechanisms enables it to be distributed over a local, metropolitan, or wide-area network.
Grid vs. Cloud : What?

 Grids enable access to shared computing power and storage capacity from your desktop
 Clouds enable access to leased computing power and storage capacity from your desktop
Grid vs. Cloud : What?

Grids:
 Research institutes and universities federate
their services around the world through projects
such as EGI-InSPIRE and the European Grid
Infrastructure.
Clouds:
 Large individual companies, e.g. Amazon and Microsoft, and, at a smaller scale, institutes and organisations deploying open source software such as Open Slate, Eucalyptus and Open Nebula.
Grid vs. Cloud : Who uses the service?

Grids:
 Research collaborations, called "Virtual
Organisations", which bring together researchers
around the world working in the same field.
Clouds:
 Small to medium commercial businesses or
researchers with generic IT needs
Grid vs. Cloud : Who pays for the service?

Grids:
 Governments - providers and users are usually
publicly funded research organisations, for
example through National Grid Initiatives.
Clouds:
 The cloud provider pays for the computing
resources; the user pays to use them.
Grid vs. Cloud : Where are the computing resources?

Grids:
 In computing centres distributed across different
sites, countries and continents.

Clouds:
 The cloud provider's private data centres, which are often centralised in a few locations with excellent network connections and cheap electrical power.
Grid vs. Cloud : What are they useful for?

Grids:
 Grids were designed to handle large sets of
limited duration jobs that produce or use large
quantities of data
 (e.g. the Large Hadron Collider (LHC) and life sciences)

Clouds:
 Clouds best support long-term services and longer-running jobs (e.g. facebook.com)
Grid vs. Cloud : How do they work?

Grids:
 Grids are an open source technology. Resource
users and providers alike can understand and
contribute to the management of their grid
Clouds:
 Clouds are a proprietary technology. Only the
resource provider knows exactly how their cloud
manages data, job queues, security requirements
and so on.
Massively Parallel Processing (MPP)
 Simply put, Massively Parallel Processing is the
use of many processors.
 Traditional MPP machines are distributed memory machines that use multiple processors (versus SMPs, which employ a shared memory architecture).
 MPPs have many of the same characteristics as
clusters, but MPPs have specialized
interconnect networks (whereas clusters use
commodity hardware for networking).
Massively Parallel Processing (MPP)
 However, there seems to be a blurred line separating distributed computing, MPP, clusters and Networks of Workstations (NOWs).
 All four architectures are distributed memory
architectures.
 The term also applies to massively parallel
processor arrays (MPPAs), a type of integrated
circuit with an array of hundreds or thousands
of CPUs and RAM banks. These processors pass
work to one another through a reconfigurable
interconnect of channels.
Distributed Computers
 Distributed computing is usually the process of stealing idle cycles from ordinary computers.
 Most NOWs fall into this category (distributed computing), although sometimes people classify a dedicated group of machines with different hardware architectures as a NOW (which would actually seem to fall into the category of a cluster). While distributed computers use many processors
and distributed memory, they are usually not dedicated to
the parallel applications.
 Clusters are typically dedicated to the parallel application.
These systems are usually built from commercial hardware
and software joined by a network and employing some type
of message passing.
Massively Parallel Processing (MPP)
 True MPPs for the most part employ proprietary operating systems and at least some special hardware. They are typically
very large, very powerful machines. Most MPPs are
used for computationally intensive research in the
engineering and sciences fields.
 CC-NUMA employs a different type of memory access. While the memory is physically distributed, a CC-NUMA machine still retains some of the benefits of an SMP, because the memory is logically considered shared.
Shared Memory Architecture (SMP)
 CPUs access shared memory through a bus
 All processors share a single view of data, and communication between processors can be as fast as memory accesses to the same location
 CPU-to-memory connection becomes the bottleneck (requires high-speed interconnects!)

[Figure: shared memory (processors P on a common bus to a single memory) vs. distributed memory (each processor P with its own memory M and NIC, connected by a network)]
Shared Memory Architecture
UMA (Uniform Memory Access): individual processors share memory (and I/O) in such a way that each of them can access any memory location with the same speed
Many small shared-memory machines are symmetric
Larger shared-memory machines do not satisfy this definition (NUMA or cc-NUMA)
NUMA (Non-Uniform Memory Access) architecture was designed to overcome the scalability limits of the SMP (Shared Memory Processor / Symmetric Multiprocessor) architecture.

[Figure: Distributed Shared Memory architecture: two SMP nodes, each with several processors and a local memory on a local bus, connected together by a global bus]
Shared Memory Architecture
• What is cache?
– An extremely fast and relatively small memory unit
• L1 Cache: built into the CPU itself
• L2 Cache: resides on a separate chip next to the CPU
– The CPU does not use the motherboard system bus for data transfer
– Reduces memory access time
– Decreases the bandwidth requirement of the local memory module and the global interconnect

[Figure: memory hierarchy: Register and L1 Cache inside the CPU, then L2 Cache, Memory, and Disk]
Shared Memory Architecture

• NUMA Architecture Types:


– ccNUMA means cache-coherent NUMA architecture.
– Cache coherence is the consistency of data stored in the local caches of a shared resource.
Introduction to MPI
What is MPI?
 MPI = Message Passing Interface
 Specification of message passing libraries for
developers and users
 Not a library by itself, but specifies what such a library
should be
 Specifies application programming interface (API) for such
libraries
 Many libraries implement such APIs on different platforms –
MPI libraries
 Goal: provide a standard for writing message passing
programs
 Portable, efficient, flexible
 Language binding: C, C++, FORTRAN programs
History & Evolution
 1980s – 1990s: incompatible libraries and
software tools; need for a standard
 1994, MPI 1.0;
 1995, MPI 1.1, revision and clarification to
MPI 1.0
 Major milestone
 C, FORTRAN
 Fully implemented in all MPI libraries
 1997, MPI 1.2
 Corrections and clarifications to MPI 1.1
 1997, MPI 2
Major extension (and clarifications) to MPI 1.1
 C++, C, FORTRAN
 Partially implemented in most libraries; a
few full implementations (e.g. ANL
MPICH2)
 Sep 2012: The MPI-3.0 standard was
approved.

Why Use MPI?
 Standardization: de facto standard for parallel
computing
 Not an IEEE or ISO standard, but “industry standard”
 Practically replaced all previous message passing libraries
 Portability: supported on virtually all HPC platforms
 No need to modify source code when migrating to
different machines
 Performance: so far the best; high performance and
high scalability
 Rich functionality:
 MPI 1.1 – 125 functions
 MPI 2 – 152 functions
 (If you know 6 MPI functions, you can do almost everything in parallel.)
Programming Model
 Message passing model: data exchange through explicit
communications.
 For distributed memory, as well as shared-memory parallel
machines
 User has full control (data partition, distribution): needs to
identify parallelism and implement parallel algorithms using MPI
function calls.
 Number of CPUs in computation is static
 New tasks cannot be dynamically spawned during run time (MPI 1.1)
 MPI 2 specifies dynamic process creation and management, but not
available in most implementations.
 Not necessarily a disadvantage
 General assumption: one-to-one mapping of MPI processes to
processors (although not necessarily always true).

MPI 1.1 Overview

 Point-to-point communications
 Collective communications
 Process groups and communicators
 Process topologies
 MPI environment management
MPI 2 Overview
 Dynamic process creation and management
 Creation of new processes on the fly
 Connecting previously existing processes
 One-sided communications (see the sketch after this slide)
 reduces synchronization and so improves performance
 reduces data movement
 simplifies programming
 MPI Input / Output (Parallel I/O)
 Extended collective communications
 Useful in pipelined algorithms where data needs to be
moved from one group of processes to another
 C++ binding
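A minimal sketch of MPI-2 one-sided communication, assuming an MPI-2-capable library such as MPICH2: rank 0 puts a value directly into a window of memory exposed by rank 1, and the fences provide the synchronization. The variable names and the value being transferred are illustrative only.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int my_rank, ncpus;
    int buf = 0;              /* memory exposed through the window */
    int value = 42;           /* value rank 0 will write into rank 1's window */
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ncpus);

    /* every process exposes one int to the others */
    MPI_Win_create(&buf, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);    /* open the access epoch */
    if (my_rank == 0 && ncpus > 1) {
        /* write 'value' at displacement 0 of rank 1's window;
           rank 1 does not post any matching receive */
        MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);    /* close the epoch: the put is now visible */

    if (my_rank == 1)
        printf("Process 1 window now holds %d\n", buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

Run with, e.g., mpirun -np 2 myprog; process 1 then reports the value 42 without ever calling MPI_Recv().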
Message Passing Models

MPI Resources

 MPI Standard:
 http://www.mpi-forum.org/
 MPI web sites/tutorials etc, available online
 Public domain (free) MPI implementations
 MPICH2 and MPICH3 (from ANL)
 LAM MPI
• More recently, it has evolved to OpenMPI (not to be
confused with OpenMP)
 Microsoft MPI (for Windows OS)

General MPI Program Structure

Example (output on 4 processors):

Hello, I am process 1 among 4 processes
Hello, I am process 2 among 4 processes
Hello, I am process 0 among 4 processes
Hello, I am process 3 among 4 processes

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int my_rank, num_cpus;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &num_cpus);
    printf("Hello, I am process %d among %d processes\n",
           my_rank, num_cpus);
    MPI_Finalize();
    return 0;
}
Example (output on 4 processors):

Hello, I am process 1 among 4 processes
Hello, I am process 2 among 4 processes
Hello, I am process 0 among 4 processes
Hello, I am process 3 among 4 processes

program hello
  implicit none

  include 'mpif.h'
  integer :: ierr, my_rank, num_cpus

  call MPI_INIT(ierr)
  call MPI_COMM_RANK(MPI_COMM_WORLD, my_rank, ierr)
  call MPI_COMM_SIZE(MPI_COMM_WORLD, num_cpus, ierr)
  write(*,*) "Hello, I am process ", my_rank, " among ", &
             num_cpus, " processes"
  call MPI_FINALIZE(ierr)

end program hello
MPI Header Files

 In C/C++:
#include <mpi.h>

 In FORTRAN:
include 'mpif.h'
or (in FORTRAN90 and later)
use MPI
MPI Naming Conventions
 All names have MPI_ prefix.
 In FORTRAN:
 All subroutine names upper case, last argument is return code
call MPI_XXXX(arg1,arg2,…,ierr)
call MPI_XXXX_XXXX(arg1,arg2,…,ierr)

 A few functions are without a return code
 (If ierr == MPI_SUCCESS, everything is OK; otherwise, something is wrong; see the short check after this slide.)
 In C: mixed uppercase/lowercase
ierr = MPI_Xxxx(arg1,arg2,…);
ierr = MPI_Xxxx_xxx(arg1,arg2,…);

 MPI constants all uppercase


MPI_COMM_WORLD, MPI_SUCCESS, MPI_DOUBLE, MPI_SUM, …

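A minimal sketch, in C, of acting on the return code. MPI_Error_string() and the MPI_ERRORS_RETURN handler are standard MPI (MPI_Comm_set_errhandler() is the MPI-2 name; older codes use MPI_Errhandler_set()); the surrounding program is purely illustrative.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int ierr, my_rank;

    MPI_Init(&argc, &argv);

    /* by default MPI aborts on error; ask it to return error codes instead */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    ierr = MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    if (ierr != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len;
        MPI_Error_string(ierr, msg, &len);   /* human-readable description */
        fprintf(stderr, "MPI_Comm_rank failed: %s\n", msg);
    }

    MPI_Finalize();
    return 0;
}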
Initialization
 Initialization: MPI_Init() initializes the MPI environment (MPI_Init_thread() if using multiple threads)
 Must be called before any other MPI routine (so put it at the beginning of the code), except the MPI_Initialized() routine.
 Can be called only once; subsequent calls are erroneous.
 MPI_Initialized() checks whether MPI_Init() has been called

int MPI_Init(int *argc, char ***argv)
MPI_INIT(ierr)

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int flag;
    MPI_Initialized(&flag);
    if (flag != 0) …   // MPI_Init called
    …
    MPI_Finalize();
    return 0;
}

program test
    integer ierr
    call MPI_INIT(ierr)
    …
    call MPI_FINALIZE(ierr)
end program test
Termination
 MPI_Finalize() cleans up the MPI environment
 Must be called before the program exits.
 No other MPI routine can be called after this call, not even MPI_INIT()
 Exceptions: MPI_Initialized() (and MPI_Get_version(), MPI_Finalized()).
 Abnormal termination: MPI_Abort() (see the sketch after this slide)
 Makes a best attempt to abort all tasks
int MPI_Finalize(void)
MPI_FINALIZE(IERR)
integer IERR

int MPI_Abort(MPI_Comm comm, int errorcode)


MPI_ABORT(COMM,ERRORCODE,IERR)
integer COMM, ERRORCODE, IERR
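A minimal sketch of abnormal termination, using a hypothetical fatal condition (rank 0 failing to open an input file); the file name is illustrative only.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int my_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    if (my_rank == 0) {
        FILE *fp = fopen("input.dat", "r");   /* hypothetical input file */
        if (fp == NULL) {
            fprintf(stderr, "Cannot open input.dat, aborting all tasks\n");
            MPI_Abort(MPI_COMM_WORLD, 1);     /* error code returned to the environment */
        }
        fclose(fp);
    }

    /* ... normal computation ... */

    MPI_Finalize();
    return 0;
}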
MPI Processes
 MPI is process-oriented: program consists of multiple
processes, each corresponding to one processor.
 MIMD: Each process runs its own code. In practice,
runs its own copy of the same code (SPMD: single
program, multiple data).
 MPI process and threads: MPI process can contain a
single thread (common case) or multiple threads.
 Most MPI implementations do not support multiple threads; those that do require special handling.
 We will assume a single thread per process from now on.
 MPI processes are identified by their ranks:
 If total nprocs processes in computation, rank ranges from
0, 1, …, nprocs-1. (true in C and FORTRAN).
 nprocs does not change during computation.
Communicators and Process Groups

 A communicator is a group of processes that can communicate with one another.
 Most MPI routines require a communicator argument to specify
the collection of processes the communication is based on.
 All processes in the computation form the communicator
MPI_COMM_WORLD.
 MPI_COMM_WORLD is pre-defined by MPI, available anywhere
 Can create subgroups / sub-communicators within MPI_COMM_WORLD (see the sketch after this slide).
 A process may belong to different communicators, and have different
ranks in different communicators.

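A minimal sketch of creating sub-communicators: MPI_Comm_split() is a standard MPI call, and the even/odd colouring below is just an illustration of how one process can hold different ranks in different communicators.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, world_size, sub_rank, sub_size;
    MPI_Comm sub_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* processes with the same color end up in the same sub-communicator;
       rank within it is ordered by the key (world_rank here) */
    int color = world_rank % 2;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &sub_comm);

    MPI_Comm_rank(sub_comm, &sub_rank);
    MPI_Comm_size(sub_comm, &sub_size);
    printf("World rank %d of %d is rank %d of %d in the %s sub-communicator\n",
           world_rank, world_size, sub_rank, sub_size,
           color == 0 ? "even" : "odd");

    MPI_Comm_free(&sub_comm);
    MPI_Finalize();
    return 0;
}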
How many CPUs, Which one am I …
 How many CPUs: MPI_COMM_SIZE()
 Who am I: MPI_COMM_RANK()
 Can compute data decomposition etc. (see the sketch after this slide)
 Knowing the total number of grid points, the total number of CPUs and the current CPU id, one can calculate which portion of the data the current CPU is to work on.
 E.g. Poisson equation on a square
 Ranks also used to specify source and destination of
communications.

(Note: the my_rank value is different on different processors!)


int my_rank, ncpus;
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &ncpus);

int MPI_Comm_rank(MPI_Comm comm, int *rank)
int MPI_Comm_size(MPI_Comm comm, int *size)
MPI_COMM_RANK(comm,rank,ierr)
MPI_COMM_SIZE(comm,size,ierr)
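A minimal sketch of the kind of decomposition mentioned above, assuming a hypothetical 1-D problem with N grid points; the block-splitting formula is a common idiom, not something prescribed by MPI.

#include <mpi.h>
#include <stdio.h>

#define N 1000   /* hypothetical total number of grid points */

int main(int argc, char **argv)
{
    int my_rank, ncpus;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ncpus);

    /* distribute N points as evenly as possible:
       the first (N % ncpus) ranks get one extra point */
    int base    = N / ncpus;
    int extra   = N % ncpus;
    int local_n = base + (my_rank < extra ? 1 : 0);
    int start   = my_rank * base + (my_rank < extra ? my_rank : extra);
    int end     = start + local_n - 1;

    printf("Process %d of %d works on points %d..%d (%d points)\n",
           my_rank, ncpus, start, end, local_n);

    MPI_Finalize();
    return 0;
}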
Compiling, Running
 MPI standard does not specify how to start up the program
 Compiling and running MPI code implementation dependent
 MPI implementations provide utilities/commands for
compiling/running MPI codes
 Compile: mpicc, mpiCC, mpif77, mpif90, mpCC, mpxlf …
mpiCC -o myprog myfile.C (cluster)
mpif90 -o myprog myfile.f90 (cluster)
CC -Ipath_mpi_include -o myprog myfile.C -lmpi (SGI)
mpCC -o myprog myfile.C (IBM)
 Run: mpirun, poe, prun, ibrun …
mpirun -np 2 myprog (cluster)
mpiexec -np 2 myprog (cluster)
poe myprog -node 1 -tasks_per_node 2 … (IBM)

SGI: Silicon Graphics International


Example: mpirun -np 2 test_hello

Process 1 received: Hello, there!

6 MPI functions: MPI_Init(), MPI_Finalize(), MPI_Comm_rank(), MPI_Comm_size(), MPI_Send(), MPI_Recv()

#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
    char message[256];
    int my_rank;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    if (my_rank == 0) {
        strcpy(message, "Hello, there!");
        MPI_Send(message, strlen(message)+1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
    }
    else if (my_rank == 1) {
        MPI_Recv(message, 256, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);
        printf("Process %d received: %s\n", my_rank, message);
    }
    MPI_Finalize();
    return 0;
}
MPI Communications
 Point-to-point communications
 Involves a sender and a receiver, one processor to
another processor
 Only the two processors participate in communication
 Collective communications
 All processors within a communicator participate in the communication (by calling the same routine, possibly with different arguments)
 Barrier, reduction operations, gather, … (see the sketch below)

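A minimal sketch of a collective operation: every process in MPI_COMM_WORLD calls the same routine, here MPI_Reduce() to sum the ranks onto rank 0 and MPI_Bcast() to send the result back out; the quantity being summed is illustrative only.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int my_rank, ncpus, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ncpus);

    /* all processes participate; the sum of all ranks ends up on the root */
    MPI_Reduce(&my_rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (my_rank == 0)
        printf("Sum of ranks 0..%d is %d\n", ncpus - 1, sum);

    /* broadcast the result from rank 0 back to every process */
    MPI_Bcast(&sum, 1, MPI_INT, 0, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}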
