
Parallel Models of Computation

Developing a standard parallel model of computation for analyzing algorithms has proven
difficult because different parallel computers tend to vary significantly in their organizations.
In spite of this difficulty, useful parallel models have emerged, along with a deeper
understanding of the modeling process. In this section we describe three important principles
that have emerged.

1. Work-efficiency. In designing a parallel algorithm, it is more important to make it efficient than to make it asymptotically fast. The efficiency of an algorithm is determined by the total number of operations, or work, that it performs. On a sequential machine, an
algorithm's work is the same as its time. On a parallel machine, the work is simply the
processor-time product. Hence, an algorithm that takes time t on a P-processor machine
performs work W = Pt. In either case, the work roughly captures the actual cost to
perform the computation, assuming that the cost of a parallel machine is proportional to
the number of processors in the machine. We call an algorithm work-efficient (or just
efficient) if it performs the same amount of work, to within a constant factor, as the
fastest known sequential algorithm. For example, a parallel algorithm that sorts n keys in O(√n log n) time using √n processors is efficient since the work, O(n log n), is as good as any (comparison-based) sequential algorithm. However, a sorting algorithm that runs in O(log n) time using n² processors is not efficient. The first algorithm is better than the second - even though it is slower - because its work, or cost, is smaller (a short calculation after this list makes the comparison concrete). Of course, given two parallel algorithms that perform the same amount of work, the faster one is generally better.
2. Emulation. The notion of work-efficiency leads to another important observation: a
model can be useful without mimicking any real or even realizable machine. Instead, it
suffices that any algorithm that runs efficiently in the model can be translated into an
algorithm that runs efficiently on real machines. As an example, consider the widely-used
parallel random-access machine (PRAM) model. In the PRAM model, a set of processors
share a single memory system. In a single unit of time, each processor can perform an
arithmetic, logical, or memory access operation. This model has often been criticized as
unrealistically powerful, primarily because no shared memory system can perform
memory accesses as fast as processors can execute local arithmetic and logical operations.
The important observation, however, is that for a model to be useful we only require that
algorithms that are efficient in the model can be mapped to algorithms that are efficient
on realistic machines, not that the model is realistic. In particular, any algorithm that runs efficiently in a P-processor PRAM model can be translated into an algorithm that runs efficiently on a (P/L)-processor machine with a latency-L memory system, a much more realistic machine. In the translated algorithm, each of the P/L processors emulates L PRAM processors. The latency is "hidden" because a processor has useful work to perform while waiting for a memory access to complete. Although the translated algorithm is a factor of L slower than the PRAM algorithm, it uses a factor of L fewer processors, and hence is equally efficient (a toy simulation of this schedule appears after the list).
3. Modeling Communication. To get the best performance out of a parallel machine, it is
often helpful to model the communication capabilities of the machine, such as its latency,
explicitly. The most important measure is the communication bandwidth. The bandwidth
available to a processor is the maximum rate at which it can communicate with other
processors or the memory system. Because it is more difficult to hide insufficient
bandwidth than large latency, some measure of bandwidth is often included in parallel
models. Sometimes the specific topology of the communication network is modeled as
well. Although including this level of detail in the model often complicates the design of
parallel algorithms, it's essential for designing the low-level communication primitives for
the machine. In addition to modeling basic communication primitives, other operations
supported by hardware, including synchronization and concurrent memory accesses, are
often modeled, as well as operations that mix computation and communication, such as
fetch-and-add and scans. A final consideration is whether the machine supports shared
memory, or whether all communication relies on passing messages between the
processors.
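
To make the work comparison in item 1 concrete, the few lines of Python below (our illustration; constant factors are dropped and only the asymptotic expressions are evaluated) compute W = Pt for both sorting algorithms at n = 1,000,000:

    import math

    n = 1_000_000

    # Algorithm 1: P = sqrt(n) processors, t ~ sqrt(n) * log n
    w1 = math.sqrt(n) * (math.sqrt(n) * math.log2(n))   # ~ n log n ~ 2e7

    # Algorithm 2: P = n^2 processors, t ~ log n
    w2 = n**2 * math.log2(n)                            # ~ n^2 log n ~ 2e13

    # The slower algorithm performs roughly a factor of n less work.
    print(f"work of algorithm 1: {w1:.1e}, work of algorithm 2: {w2:.1e}")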
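The emulation in item 2 can also be seen in miniature. The toy, sequential Python simulation below (our sketch, not a construction from the text) has P/L physical processors each round-robin over L virtual PRAM processors, so a memory request issued on behalf of one virtual processor has L full cycles to complete before that processor is scheduled again:

    # Toy emulation (illustrative assumptions: L divides P, every memory
    # reply arrives within L machine cycles, one unit-time op per PRAM step).
    L = 4                      # memory latency, in machine cycles
    P = 16                     # virtual PRAM processors being emulated
    PHYS = P // L              # physical processors actually available

    memory = [0] * P           # shared memory, one cell per virtual processor

    def pram_step(vp):
        """One unit-time PRAM operation issued for virtual processor vp."""
        memory[vp] += vp

    # One time step of the P-processor PRAM costs L machine cycles here, so
    # the emulation runs a factor of L slower but on a factor of L fewer
    # processors: the work, PHYS * L = P, is unchanged.
    for cycle in range(L):
        for p in range(PHYS):          # the PHYS processors act concurrently
            vp = p * L + cycle         # the virtual processor p serves this cycle
            pram_step(vp)              # by the time p returns to vp, L cycles
                                       # have elapsed and its memory reply is in

    assert memory == list(range(P))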

Algorithmic Techniques

A major advance in parallel algorithms has been the identification of fundamental algorithmic
techniques. Some of these techniques are also used by sequential algorithms, but play a more
prominent role in parallel algorithms, while others are unique to parallelism. Here we list
some of these techniques with a brief description of each.

1. Divide-and-Conquer. Divide-and-conquer is a natural paradigm for parallel algorithms. After dividing a problem into two or more subproblems, the subproblems can be solved in parallel. Typically the subproblems are solved recursively and thus the next divide step yields even more subproblems to be solved in parallel. For example, suppose we want to compute the convex hull of a set of n points in the plane (i.e., compute the smallest convex polygon that encloses all of the points). This can be implemented by splitting the points into the leftmost n/2 and rightmost n/2, recursively finding the convex hull of each set in parallel, and then merging the two resulting hulls (a sketch of this pattern appears after the list). Divide-and-conquer has proven to be one of the most powerful techniques for solving problems in parallel, with applications ranging from linear systems to computer graphics and from factoring large numbers to n-body simulations.
2. Randomization. The use of random numbers is ubiquitous in parallel algorithms.
Intuitively, randomness is helpful because it allows processors to make local decisions
which, with high probability, add up to good global decisions. For example, suppose we
want to sort a collection of integer keys. This can be accomplished by partitioning the keys into buckets and then sorting within each bucket. For this to work well, the buckets must represent non-overlapping intervals of integer values, and contain approximately the same number of keys. Randomization is used to determine the boundaries of the intervals. First, each processor selects a random sample of its keys. Next, all of the selected keys are sorted together. Finally, these sample keys are used as the boundaries (this sampling step is sketched after the list). Such random sampling is
also used in many parallel computational geometry, graph, and string matching
algorithms. Other uses of randomization include symmetry breaking, load balancing, and
routing algorithms.
3. Parallel Pointer Manipulations. Many of the traditional sequential techniques for
manipulating lists, trees, and graphs do not translate easily into parallel techniques. For
example techniques such as traversing the elements of a linked list, visiting the nodes of a
tree in postorder, or performing a depth-first traversal of a graph appear to be inherently
sequential. Fortunately, each of these techniques can be replaced by efficient parallel
techniques. These parallel techniques include pointer jumping, the Euler-tour technique,
ear decomposition, and graph contraction. For example, one way to label each node of
an n-node list (or tree) with the label of the last node (or root) is to use pointer jumping.
In each pointer-jumping step, each node in parallel replaces its pointer with that of its successor (or parent). After at most ⌈log n⌉ steps, every node points to the same node, the end of the list (or root of the tree); a sketch of this loop follows the list.
4. Others. Other useful techniques include finding small graph separators for partitioning
data among processors to reduce communication, hashing for balancing load across
processors and mapping addresses to memory, and iterative techniques as a replacement
for direct methods for solving linear systems.
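
Below is a minimal sketch of the divide-and-conquer pattern from item 1, applied to the planar convex hull. The helper names are ours, and the merge simply re-runs the hull algorithm on the union of the two sub-hulls, which is correct but simpler (and asymptotically weaker) than the tangent-finding merge an optimal algorithm would use:

    from concurrent.futures import ProcessPoolExecutor

    def cross(o, a, b):
        """Twice the signed area of triangle (o, a, b)."""
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def hull(points):
        """Andrew's monotone chain; expects points sorted by (x, y)."""
        if len(points) <= 2:
            return list(points)
        lower, upper = [], []
        for chain, seq in ((lower, points), (upper, reversed(points))):
            for p in seq:
                while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                    chain.pop()
                chain.append(p)
        return lower[:-1] + upper[:-1]

    def parallel_hull(points):
        points = sorted(points)
        mid = len(points) // 2
        with ProcessPoolExecutor() as ex:     # solve the two halves in parallel
            left, right = ex.map(hull, [points[:mid], points[mid:]])
        return hull(sorted(left + right))     # merge: hull of the two sub-hulls

    if __name__ == "__main__":                # guard required for process pools
        import random
        pts = [(random.random(), random.random()) for _ in range(100_000)]
        print(len(parallel_hull(pts)))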
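The random-sampling step of item 2 is sketched next (the oversampling factor and helper names are our choices; real sample sorts tune these carefully). Each processor's slice contributes a small random sample, the combined sample is sorted, and evenly spaced sample keys become the bucket boundaries:

    import bisect
    import random

    def sample_boundaries(key_slices, num_buckets, oversample=8):
        """Pick num_buckets - 1 splitters by random sampling."""
        sample = []
        for keys in key_slices:            # each "processor" samples locally
            sample.extend(random.sample(keys, min(len(keys), oversample)))
        sample.sort()                      # all selected keys sorted together
        step = max(1, len(sample) // num_buckets)
        return sample[step::step][:num_buckets - 1]

    # Usage: partition into buckets, which a real machine sorts in parallel.
    slices = [[random.randrange(10**6) for _ in range(1000)] for _ in range(4)]
    bounds = sample_boundaries(slices, num_buckets=4)
    buckets = [[] for _ in range(4)]
    for keys in slices:
        for k in keys:
            buckets[bisect.bisect_right(bounds, k)].append(k)
    for b in buckets:                      # independent, hence parallelizable
        b.sort()
    assert [k for b in buckets for k in b] == sorted(k for s in slices for k in s)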
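Finally, a sketch of pointer jumping from item 3. On a real parallel machine all nodes update simultaneously; this sequential Python version (ours) imitates that by building each round from a snapshot of the previous one:

    def pointer_jump(succ):
        """succ[i] is node i's successor; the last node points to itself.
        Returns the final pointers (all at the last node) and the round count."""
        rounds = 0
        while any(succ[i] != succ[succ[i]] for i in range(len(succ))):
            succ = [succ[succ[i]] for i in range(len(succ))]   # one parallel step
            rounds += 1
        return succ, rounds

    succ = [1, 2, 3, 4, 5, 5]            # a 6-node list ending at node 5
    labels, rounds = pointer_jump(succ)
    assert labels == [5] * 6             # every node now knows the last node
    assert rounds <= 3                   # at most ceil(log2 n) rounds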

These techniques have led to efficient parallel algorithms in most problem areas for which
efficient sequential algorithms are known. In fact, some of the techniques originally
developed for parallel algorithms have led to improvements in sequential algorithms.
