CS621 Final Term Current Papers
Introduction to GPU:
“A Graphics Processing Unit (GPU) is a chip or electronic circuit capable of rendering graphics; multiple processors handle separate parts of the same task.”
The world’s first GPU, the GeForce 256, was marketed by NVIDIA in 1999. These GPU chips can process a minimum of 10 million polygons per second and are used in nearly every computer on the market today.
Client-Centric Consistency Models:
i. Eventual Consistency
ii. Monotonic Writes
iii. Read Your Writes
iv. Writes Follow Reads
Types of mutex:
1. Normal mutex
2. Recursive mutex
3. Error-check mutex
Architecture of GPU

Basic MPI routines:
– MPI_INIT
– MPI_FINALIZE
– MPI_COMM_SIZE
– MPI_COMM_RANK
– MPI_SEND
– MPI_RECV
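A minimal C program exercising these routines (a sketch: it needs an MPI implementation such as MPICH or Open MPI, compiled with mpicc and launched with mpirun -np 2):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int size, rank, msg = 42;
    MPI_Init(&argc, &argv);                 /* MPI_INIT */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* MPI_COMM_SIZE: total processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* MPI_COMM_RANK: this process's id */

    if (rank == 0 && size > 1) {
        /* MPI_SEND: rank 0 sends one int to rank 1 with tag 0 */
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* MPI_RECV: rank 1 receives the matching message */
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", msg);
    }
    MPI_Finalize();                         /* MPI_FINALIZE */
    return 0;
}
```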
Advantages/Benefits of the Message Passing Interface (MPI)
• Portability: there is no need to modify your source code when you port your application to a different platform that supports (and is compliant with) the MPI standard.
• Functionality: over 115 routines are defined in MPI-1 alone.
• Scalability: MPI is designed to scale to large numbers of processors, which makes it well suited for high-performance computing.
• Availability: a variety of implementations are available, both vendor and public domain.
Parallel I/O Tools
Collections of system software and libraries have grown up to address I/O issues:
• Parallel file systems
• MPI-IO
• High-level libraries
The relationships between these are not always clear, and choosing between the tools can be difficult.
Performance of GPU:
Bill Dally of Stanford University considers power and massive parallelism to be the major benefits of GPUs over CPUs for the future.
Two aspects:
➢ Data Access Rate Capability
✓ Bandwidth
➢ Data Processing Capability
✓ How many operations per second
Un-buffered:
A routine performs a specific message-passing task, such as sending or receiving a message between two processes, without using a buffer: messages are sent and received immediately, with no intermediate copy. This mode is useful when low latency is required.
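In MPI terms, unbuffered sending corresponds to the synchronous send MPI_Ssend, which does not complete until the matching receive has started (a sketch; requires an MPI implementation and mpirun -np 2):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, msg = 7;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        /* Synchronous (unbuffered) send: blocks until the matching
           receive on rank 1 has begun -- no intermediate buffer is used. */
        MPI_Ssend(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received %d without buffering\n", msg);
    }
    MPI_Finalize();
    return 0;
}
```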
• Efficiency is given by
E = O(n / (n log n)) = O(1 / log n)
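Assuming this refers to the standard case of a parallel reduction over n values on p = n processors (the surrounding notes do not say which algorithm is meant), the efficiency follows from speedup divided by processor count:

```latex
T_1 = O(n), \qquad T_p = O(\log n) \ \text{with}\ p = n
\quad\Rightarrow\quad
S = \frac{T_1}{T_p} = O\!\left(\frac{n}{\log n}\right),
\qquad
E = \frac{S}{p} = O\!\left(\frac{n}{n \log n}\right) = O\!\left(\frac{1}{\log n}\right)
```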
Independent MPI-IO
For independent I/O, each MPI task handles its I/O independently, using non-collective calls such as MPI_File_write() and MPI_File_read().
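A sketch of independent MPI-IO in C, where each rank writes its own value at its own offset with no coordination between ranks (the filename out.dat is illustrative; requires an MPI implementation and mpirun):

```c
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank;
    MPI_File fh;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Independent (non-collective) I/O: each rank seeks to its own
       offset and writes on its own, whenever it likes. */
    int val = rank;
    MPI_File_seek(fh, (MPI_Offset)rank * sizeof(int), MPI_SEEK_SET);
    MPI_File_write(fh, &val, 1, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```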
Fine-grained
Decomposition of the computation into a large number of small tasks results in a fine-grained decomposition.
Coarse-grained
Decomposition of the computation into a small number of large tasks results in a coarse-grained decomposition.
Latency:
Total time to send a message.
Bandwidth:
Number of bits transmitted in a unit of time.
Bisection width:
The minimum number of connections that must be cut to partition the network into two halves.
I/O bottleneck:
There are three main reasons for the I/O bottleneck:
• Increasing CPU speed compared to I/O speed
• Increase in the number of CPUs
• New application domains with increasing I/O demand
MapReduce:
A simple programming model that enables parallel execution of data-processing programs.
• Executes the work on the data, near the data.
• In a nutshell: HDFS places the data on the cluster and MapReduce does the processing work.