
Topological Sorting of Large Networks

The document describes a method for topologically sorting large networks, such as those used in PERT (Program Evaluation Review Technique). The method can topologically sort a network of 30,000 activities in less than an hour on an IBM 7090 computer. It breaks the sorting process into three parts: preparation of the input data, the core algorithm for sorting, and formatting the output. The core algorithm uses two lists - an activity list sorted by predecessor and an event list with pointers to the first activity for each predecessor - to efficiently sort the network without requiring random searching.


D. TEICHROEW, Editor

Topological Sorting of Large Networks

A. B. KAHN
Westinghouse Electric Corporation, Baltimore, Maryland

Topological Sorting is a procedure required for many problems involving analysis of networks. An example of one such problem is PERT. The present paper presents a very general method for obtaining topological order. It permits treatment of larger networks than can be handled on present procedures and achieves this with greater efficiency. Although the procedure can be adapted to any machine, it is discussed in terms of the 7090. A PERT network of 30,000 activities can be ordered in less than one hour of machine time.

The method was developed as a byproduct of procedure needed by Westinghouse, Baltimore. It has not been programmed and at present there are no plans to implement it. In regard to the techniques described, Westinghouse's present and anticipated needs are completely served by the Lockheed program, which is in current use.

Introduction

In recent years much work has been done on the computational analysis of problems which can be formulated as networks with directed elements. In particular, this class of problems includes PERT (Program Evaluation Review Technique) which is used as a management schedule tool. To a large extent the feasibility of network computations depends upon the ability to arrange the network information in topological order. A list in topological order has a special property. Simply expressed: proceeding from element to element along any path in the network, one passes through the list in one direction only. Stated another way, a list in topological order is such that no element appears in it until after all elements appearing on all paths leading to the particular element have been listed.

In the customary PERT computation the topological ordering is the most difficult aspect of the problem from the computational viewpoint. A simple solution is to number the elements so that the numbers along each path always ascend as one proceeds along the path. When this is done, topological order can be achieved with a simple sort. While this technique requires tedious effort, it has been used successfully on moderate-sized networks. However, for large networks two difficulties arise: (1) It is difficult to assign blocks of numbers to groups working independently on sections of the network, taking into account the problems of merging the sections. (2) A minor change in the network organization can require renumbering of large sections of the network. Since the PERT networks are not static but are undergoing revision on a periodic schedule, this tedious job is repetitive. In addition, the identity of elements is constantly changing, raising administrative difficulties.

Thus it becomes vital that the computer be capable of reducing a randomly labeled network to topological order. Most of the existing programs are the result of earlier attempts with capacity limits of 5,000 to 10,000 elements. Only two papers on these approaches have been published [1, 2]. More recently, there have been several attempts at larger capacities, but no discussions of them have been published. Furthermore, although the lack of information prevents final judgment, it is felt that the current approach may be more efficient in machine time.

There have been numerous groups who have felt the need for much larger capacities. One of the difficulties is that the blocking and merging techniques of conventional sorts are not readily applicable here. The purpose of this paper is to disclose a technique which will order about 30,000 elements in about one hour of IBM 7090 time. This method is discussed with respect to the 7090. However, it can be readily adapted to make maximum use of the capacity of most medium or large scale computers. It is essentially based upon the ranking approach.

The method to be presented was not developed to meet a specific need. Rather it resulted as a byproduct of a technique developed for small networks. Thus in no way can this paper be taken as a recommendation for use of large networks. Rather it is intended solely to provide an efficient technique for those who have decided that it is to their advantage to handle large networks. In addition, there are many other network problems for which these techniques may be desirable. The author has neither programmed nor at present plans to program the procedure described.

The method conveniently breaks down into three parts. The middle part uses the basic algorithm which is the key to the procedure. The purpose of the preparatory part is solely to get from a convenient input format to the most efficient format for the basic algorithm. Similarly, the concluding part merely serves to get to the desired output format which can be subsequently used for computation. Although the first and last parts contain extensive processing they consist entirely of standard data processing techniques. Therefore, the discussion will commence with the basic algorithm followed by the auxiliary portions of

558 Communications of the ACM
the procedure. Before this, however, there will be a brief discussion of PERT terminology.

1. PERT Network Terminology

In general a network consists of a set of nodes or points connected by links which may or may not be directed. In PERT networks the nodes are points in time and are called events. Directed links connect the events and are called activities. They usually represent tasks which consume time. The events are uniquely, though arbitrarily, labeled. An activity is designated by the pair of labels for its predecessor event, from which it proceeds, and its successor event, at which it terminates. The terms "predecessor" and "successor" are used rather loosely to indicate either activities or events which precede or succeed a specific activity or event. The terms may be used to designate either an immediate or a remote connection. In general the meaning is clear through context. For example, in Figure 1 events B and F are immediate predecessors of event G, and events E and C are immediate successors. Events H, B, D and F are remote predecessors to event E. Finally, G is a remote successor to event H. All events have predecessors except H and F which are initial events. Events E and C are terminal events with no successors. Events B and D are on parallel paths and are neither predecessors nor successors to each other. Other pairs of events related in this fashion are: H and F; H and D; G and A; and E and C.

FIG. 1. Sample PERT network

Similarly, we can talk of activity A-C as being a successor to event H or activity D-A, and so forth.

2. The Basic Algorithm

In general there are two classes of techniques for topological ordering: threading and ranking. The basic algorithm to be presented makes use of a ranking procedure, which proceeds in parallel on all paths and therefore requires only as many iterations as the maximum number of reversals on any one path.

The basic procedure is to establish a list of all unique events with a count of the number of immediate predecessors of each event. To begin, the initial events can be assigned sequence numbers starting from one. The sequence numbers represent the sorting key which can ultimately be used to obtain the desired order.

Considerable processing is required to arrange the input into the most efficient form for the basic algorithm. For the present we will assume the data have been prepared for the basic algorithm. The preparatory procedures are discussed below.

The key to the efficiency of the algorithm is two lists: a list of activities and a list of events. In order to reduce the need for random searching certain features are present in these lists when the basic algorithm is executed. These are described below and illustrated in Figure 2. Figure 2 presents the two tables at the start of the algorithm. Only columns a, b, c and d are actually in core. The other columns are included solely to permit the reader to relate these tables to the sample network of Figure 1.

Activity List

        Tutorial                      In Core
        Event Labels                  Predecessor    Successor Event
Item    Predecessor    Successor      Flag (a)       Location (b)
 1      A              C
 2      A              E
 3      B              A
 4      B              G
 5      D              A
 6      F              B
 7      F              A
 8      F              D
 9      F              G
10      G              C
11      G              E
12      H              B
                                      1 bit          15 bits

Event List

Tutorial            In Core
Item    Label       Count (c)    Activity Location (d)
        A                        1
        B                        3
        C                        Z
        D                        5
        E                        Z
        F                        6*
        G                        10
        H                        12*
                    5 bits       15 bits

FIG. 2. Tables at start of basic algorithm

1. The activity list is sorted by predecessor. This puts all activities with the same predecessor event together in a block. Column a in the activity list is a one-bit flag to indicate the last activity in such a block.
2. The location of the first activity of the block mentioned above is placed opposite the predecessor in the event list (column d). In the special case of a terminal



event a special flag, Z, is used to indicate that there is no cross reference.
3. It is necessary to enter the event list from the activity list; therefore, the location (in the event list) of the successor event of each activity is available in column b of the activity list.
4. Column c in the event list is the count of the number of outstanding predecessors for those events which are not as yet ordered.

The procedure is indicated in the flow chart in Figure 3. It commences by searching the count list (column c) for zeros. (The loop through locations 10 and 20 shown with heavy lines in Figure 3.) At the start this count would be zero for all initial events. These events can be immediately assigned serial numbers; and the zero in column c is set to -1, indicating that the event has been ordered (location 11). Furthermore, their successors now have one less predecessor outstanding; therefore, the count for these successors can be decremented by one (loop through locations 30 and 40). Finally, the search for zeros continues down the list (location 20).

FIG. 3. Flow chart of basic algorithm

In the above process the count for some events will be reduced to zero. This means that all the predecessors of these events have been ordered and now they in turn can be ordered by assigning to them the next available serial numbers. Now the count for the successors of these events can in turn be decreased. Wherever a counter is decremented it is tested and if a zero was created a flag, F, is set (following location 30). When the complete list is searched a check is made (location 90), and if F is set it is reset and another iteration is made (location 1), starting
the search again from the top of the count list (column c). If the flag, F, is not set at the end of an iteration (location 100), one of three conditions exists: the process is completed, there is an illegal loop in the network, or segmenting procedures are being used. The first condition can be verified simply by comparing the number of events against the current value of the serial number. The latter two conditions are discussed in a later section.

Although shown in the event list for simplicity, it is actually more convenient to keep the initial events, marked with an * in Figure 2, in a separate list. For the case illustrated, this list consists simply of: 6, 12. This list obviates the need to search for the initial zeros.

3. Space Requirements

Assuming that the program for the basic algorithm is a separate link in a chain of programs, 2K should be ample for the basic algorithm. This leaves 30K (in a 32K machine) for the lists a, b, c and d in which about 30,000 activities can be accommodated in the following fashion.

The activity list items consist of two fields: a and b. It is clear that a is one bit while 15 bits for b would allow for 32,768 events. Similarly, 15 bits for d in the event list would allow the same number of activities. If the count c is limited to 5 bits this means that up to 30 activities can end at any single event. (Two of the 32 possible values are reserved for 0; and a flag to signify that an event has been ordered.) In rare cases where more than this number are required dummy events can be introduced. Furthermore, this can be done automatically in the early portions of the program.

In short, one 36-bit word can contain all the information required for one event and one activity. Since there are generally more activities than events some space is wasted by making both lists the same length. However, this space could be used if there is sufficient need to warrant the additional logic. If this were done the maximum number of activities could be 60,000 less the number of events. This would necessitate another bit in the d field which could be obtained by reducing the count field.

4. Analysis of Timing

One of the esthetic features of this procedure is that the timing can be readily analyzed. This method is fairly efficient for a number of reasons:
1. There is never any searching for an item; the location is always known.
2. If the network is labeled in sequence, then only a single iteration is made because the first zero count is encountered at the top of the list and as it is processed further zeros are created which will be encountered in the same pass. As these in turn are processed, additional zeros are created and all are processed in a single pass.
3. When an event is not in topological order, it means that when its count is reduced to zero the scan for zero counts will be below rather than above its location in the list. This means that this zero will not be picked up immediately but will wait until the next iteration. However, progress proceeds unhindered on parallel paths. Thus the number of iterations is not determined by the total number of events out of order, but only the maximum number of adverse¹ labels on any one path. Let this quantity be called M.

Let us now estimate an extreme value for M. Most PERT networks cover a period of less than 5 years with the average duration of activities greater than two weeks. Let us assume a ten-year network with activities of one week duration. Under these assumptions the longest possible chain would have less than 600 activities. It is very unlikely that an entire chain of 600 activities would be in adverse order. However, let us assume the worst which yields a maximum value for M of 600.

In the flow diagram of Figure 3 the lower portion of the diagram (location 11) is entered only once for each event as it is ordered. If we assume 30,000 events this loop should require less than 1 minute of IBM 7090 time. This represents a minimum total time which is required if the net is in order to begin with. The upper double loop which is drawn heavily is where the effect of adverse event labeling is felt. The outer loop (starting at location 1) is executed M times and the inner loop (starting at location 10) E (the number of events) times. Although M is the number of adverse labels in the network, the number of instructions in these loops is small. The longer loops in the lower part of the flow chart are dependent only on the size of the network and not its organization. The portion from 11 to 30 is executed once for each event and the portion from 30 to 40 once for each activity. Using 2.18 microseconds for the cycle time of the 7090, and estimating the number of instructions in each loop, the time, T, is:

$$T = 2.18 \times 10^{-6} \times 10\,(M + M \times E + 4A + 6E) \ \text{seconds}$$

M = max number of events labeled out of order on any one path
A = number of activities
E = number of events

Assuming A = E = 30,000, (a) for the most favorable case M = 0, T = 6.5 seconds; (b) for the worst case M = 600, T = 402 seconds.

Thus the time ranges from under 10 seconds in the most favorable case to under 7 minutes in the worst case. The value of 600 is a safe upper bound for M since even a PERT network with 30,000 activities is likely to last less than 10 years and to have activity duration of more than a week. Furthermore, it is unlikely to be labeled in worst order. A more reasonable expected value for M would be about 100 which would result in a machine time of less than 2 minutes.

5. Termination of the Basic Algorithm

The basic algorithm is terminated when a complete iteration is made with no new zeros created and location 100 on the flow chart is reached. At this point one of the following three conditions exists.
(1) All events have been ordered.
(2) A loop exists in the network.
(3) If segmenting has been used to increase capacity it is necessary to go to other segments.

The first condition is verified by simply comparing the number of events ordered to the total number of events present (within the segment).

In the case of a loop in the PERT network it is important to not only determine its existence but also to obtain some indication of its location.² The simplest diagnostic to implement is to list all events which have not been ordered but have had their predecessor count decremented at least once. Such events are either in a loop or have a predecessor in a loop. (This requires saving the event labels and counts on tape in the preparatory portion of the program.) A second procedure is to start over again from the input data working backwards. This can be achieved by simply reversing the role of predecessor and successor of each activity as they are read in. Any event which remains unordered on both the forward and the backward passes is either in a loop or lies between two loops. A third diagnostic, using precedence matrices, has been developed which will specify a loop exactly and probably even isolate two or more independent loops. However, it is too lengthy to report here and it would be difficult to apply to more than 1,000 events.

The third condition which can arise occurs when the technique is expanded to handle larger networks which must be segmented. Here again there is an inherent advantage in the method in that it is always known precisely what segment each event is in. However, rather than switch segments each time a request is generated, the requests can be collected and effort continued with the current segment until an impasse is met. Just as in the case of adversely numbered events, here too the fact that progress continues along parallel paths increases efficiency. The number of segment exchanges required will not be equal to the number of segment interfaces but probably to the maximum number of interfaces encountered in any single path.

¹ The term "adverse label" is used here to mean an event which has a label greater than the label on an immediate successor event. The more usual case of an event label which is less than all its immediate successors is referred to as a "favorable label." In both cases the terms "greater than" and "less than" are in reference to the sorting sequence used.

² Since it is sometimes difficult to isolate loops, and several of the popular programs do not provide such diagnostics, it is worth mentioning a helpful procedure. The method is based upon the fact that any loop must contain at least one activity which is adversely labeled. Thus, it is useful to list all activities which have a predecessor event label which is higher in the sorting sequence than the successor event label. In many networks this list will contain only a small percentage of the activities, and thus the search is considerably narrowed.
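The basic algorithm of Section 2, together with the completeness test of Section 5, can be sketched in Python as follows. This is only an illustration under stated assumptions, not the 7090 program contemplated in the paper: the function names (`build_tables`, `rank_events`, `topological_order`), the use of `None` for the terminal flag Z, 0-based list positions, and the alphabetical ordering of the event list are all choices made for the example. The activity data are those of the sample network in Figure 2.

```python
# Activities of the sample network (Figure 2) as (predecessor, successor) labels.
SAMPLE = [("A", "C"), ("A", "E"), ("B", "A"), ("B", "G"), ("D", "A"), ("F", "B"),
          ("F", "A"), ("F", "D"), ("F", "G"), ("G", "C"), ("G", "E"), ("H", "B")]

Z = None  # assumed stand-in for the terminal flag Z


def build_tables(activities):
    """Return (labels, a, b, c, d) for activities given as (pred, succ) pairs."""
    acts = sorted(activities, key=lambda act: act[0])  # sort by predecessor
    labels = sorted({label for act in acts for label in act})
    loc = {label: i for i, label in enumerate(labels)}  # 0-based event locations
    last = len(acts) - 1
    # a: flag marking the last activity of each predecessor block
    a = [i == last or acts[i + 1][0] != pred for i, (pred, _) in enumerate(acts)]
    # b: location of each activity's successor in the event list
    b = [loc[succ] for _, succ in acts]
    # c: count of immediate predecessors of each event
    c = [0] * len(labels)
    for k in b:
        c[k] += 1
    # d: location of the first activity of each event's block, or Z if terminal
    d = [Z] * len(labels)
    for i in range(last, -1, -1):
        d[loc[acts[i][0]]] = i
    return labels, a, b, c, d


def rank_events(a, b, c, d):
    """Assign serial numbers by repeated scans of the count list.

    Returns (serial, complete). complete is False when some events could not
    be ordered, which for an unsegmented network indicates a loop.
    """
    c = list(c)  # work on a copy of the counts
    serial = [None] * len(c)
    next_serial = 1
    zero_created = True  # flag F; set here only to force the first iteration
    while zero_created:
        zero_created = False
        for i in range(len(c)):  # scan the count list for zeros
            if c[i] != 0:
                continue
            c[i] = -1  # mark the event as ordered
            serial[i] = next_serial
            next_serial += 1
            j = d[i]
            if j is Z:  # terminal event: no activity block to follow
                continue
            while True:  # walk the block of activities leaving event i
                k = b[j]
                c[k] -= 1
                if c[k] == 0:
                    zero_created = True
                if a[j]:
                    break
                j += 1
    return serial, next_serial - 1 == len(c)


def topological_order(activities):
    """Return (labels in serial-number order, complete flag)."""
    labels, a, b, c, d = build_tables(activities)
    serial, complete = rank_events(a, b, c, d)
    ordered = sorted((s, lab) for s, lab in zip(serial, labels) if s is not None)
    return [lab for _, lab in ordered], complete
```

For the sample network, `topological_order(SAMPLE)` gives the order F, H, B, D, G, A, C, E with the completeness flag set. The d column built here, shifted to 1-based positions, agrees with the values shown in Figure 2 (1, 3, Z, 5, Z, 6, 10, 12). A two-event loop such as `[("X", "Y"), ("Y", "X")]` leaves both events unordered and returns the flag unset.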

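The timing estimate of Section 4 is simple enough to evaluate mechanically. The short sketch below codes the expression for T directly; the function and constant names are hypothetical, and the cycle time and loop weights are the ones stated in the text.

```python
CYCLE_SECONDS = 2.18e-6  # cycle time of the 7090 as quoted in Section 4


def estimated_seconds(m, a, e):
    """Evaluate T = 2.18e-6 * 10 * (M + M*E + 4A + 6E) in seconds.

    m: maximum number of adverse labels on any one path (M)
    a: number of activities (A)
    e: number of events (E)
    """
    machine_cycles = 10 * (m + m * e + 4 * a + 6 * e)
    return CYCLE_SECONDS * machine_cycles


best = estimated_seconds(0, 30_000, 30_000)       # network already in order
worst = estimated_seconds(600, 30_000, 30_000)    # extreme value of M
typical = estimated_seconds(100, 30_000, 30_000)  # "more reasonable" M
```

With A = E = 30,000 this gives about 6.5 seconds for M = 0, about 72 seconds for M = 100, and about 399 seconds for M = 600. The last figure is slightly under the 402 seconds quoted in the text; the small difference presumably reflects rounding in the original estimate.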


6. Preparatory and Concluding Processing

There is, of course, much processing required in addition to the basic algorithm. In fact, in general, this processing will require more time than the basic algorithm. However, there are two distinguishing advantages here: (1) all the processing other than the basic algorithm has open-ended capacity; (2) for the most part standard, available routines such as conventional sorts are used. In general, the input will consist of a record for each activity with predecessor event label, successor event label, and other information. It is assumed that with the volume of data involved there will be some form of file maintenance. In this case we can assume or require the input to be sorted by successor event label.

The first step is to abstract from each record the required information: predecessor event label; successor event label; location of record in the input. This is the working activity file.

Since this is sorted by successor it is a simple matter to develop an event file from it with successor event labels and the counts of immediate predecessors (column c of Figure 2). (These two field items should be saved on tape for the loop diagnostic.) At the same time the location of the successor in the event file is appended to the working activity file (column b).

At this point the working activity file is resorted by predecessor, following which the location of the input records may be placed on tape to be merged back when needed. With this sort the predecessor block flag (column a) can be readily established. In addition, since the predecessor labels of the activity file are in the same order as the successor labels in the event file, it is possible to compare the two sets of labels with a matching operation. Three conditions arise for each event:

     Predecessor   Successor   Type of Event   Action
1.   Present       Absent      Initial         Place location of activity block in starting list
2.   Present       Present     Middle          Place location of activity in event list (column d)
3.   Absent        Present     Terminal        Place terminal flag, Z, in event list (column d)

At this point the event labels are no longer required. Columns a, b, c and d are all that remain in core for the use of the basic algorithm which may now proceed. At first the starting list is exhausted and then the iterative searching for zeros commences. As the serial number for each event is found several procedures can be used according to whether the topological order is to be by predecessor or successor and whether the serial numbers are to serve solely as a sorting key or whether they are also to be used in further computation. In the first two cases the serial numbers are overlaid on column b or d, respectively. In the last case they are written in a separate file (on tape if maximum capacity is required) with the location of the event in the event file. The activity records with the serial numbers are then assembled in a separate pass.

Following the basic algorithm the locations of the input records are merged with the serialized activity records. This is sorted by input file location to permit merging with the rest of the input. Finally, the serial numbers are used as a key to sort the full input records into topological order.

All the processing other than the basic algorithm involves 3 sorting passes plus about 7 passes which are essentially tape limited. Assuming that an efficient record size is adopted at an early stage, all the tape passes should require less than 20 minutes. Another 20 minutes should be sufficient for the sorts (using, for example, IB sort). Thus the overall time to topologically order 30,000 activities would be 40 to 50 minutes on an IBM 7090 depending on the structure of the network.

Summary

Most of the existing procedures for topological ordering are time consuming and limited in capacity. The method presented permits a considerable increase in capacity over other available methods. The use of available internal storage is economized to the maximum extent possible to achieve this capacity with minimum machine time.

The author has neither experienced nor anticipated a need to deal with large networks. This paper is not intended as an endorsement of large PERT networks, but rather as technical assistance to those who have such needs. For this reason, the author does not plan to implement this technique, although he would be happy to aid anyone who wishes to do so.

An intriguing application, which would require the disclosed technique, is analysis of the structure of a glossary. Here each definition can be represented by a group of activities with a common successor event representing the term being defined. The predecessors would be the terms used within the definition. Topological order would then be equivalent to arranging the definitions in textbook fashion such that each term is defined in terms of previously defined terms. The current technique could order a specialized glossary including about 3,000 terms with each term defined by use of about 10 other terms. The author is currently pursuing this as well as other applications.

ACKNOWLEDGMENTS

Appreciation is expressed to Homer L. Smith of Boeing Aircraft Company, Seattle for stimulating the preparation of this paper by raising the question of how far a small capacity version of the algorithm could be extended. Appreciation is also extended to J. W. Froggatt and N. Kleiner of Westinghouse Baltimore for critical review of the technique.

REFERENCES

1. Anonymous (1958), Summary Report, Phases 1 and 2, Program Evaluation Research Task. U. S. Government Printing Office, Washington, D.C.
2. D. J. LASSER (1961), Topological ordering of a list of randomly-numbered elements of a network. Comm. ACM 4, 12 (1961).
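The matching operation of Section 6, which compares the predecessor labels of the activity file with the successor labels of the event file, can be sketched as a single merge pass over two sorted label sequences. The function name `classify_events` and the string results are assumptions made for this illustration; the sketch shows only the three-way classification, not the tape handling.

```python
def classify_events(pred_labels, succ_labels):
    """Classify events by one matching pass over two sorted label lists.

    pred_labels: distinct labels that occur as a predecessor, in sort order
    succ_labels: distinct labels that occur as a successor, in sort order
    Returns a dict mapping each label to "initial", "middle" or "terminal".
    """
    kinds = {}
    i = j = 0
    while i < len(pred_labels) or j < len(succ_labels):
        p = pred_labels[i] if i < len(pred_labels) else None
        s = succ_labels[j] if j < len(succ_labels) else None
        if s is None or (p is not None and p < s):
            kinds[p] = "initial"    # predecessor present, successor absent
            i += 1
        elif p is None or s < p:
            kinds[s] = "terminal"   # predecessor absent, successor present
            j += 1
        else:
            kinds[p] = "middle"     # label present in both files
            i += 1
            j += 1
    return kinds


# Labels taken from the sample activity list of Figure 2.
SAMPLE_KINDS = classify_events(list("ABDFGH"), list("ABCDEG"))
```

For the sample network this marks F and H as initial, C and E as terminal, and A, B, D and G as middle events, in agreement with Section 1.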

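The glossary application mentioned in the Summary can be tried on a small scale with Python's standard library. The sketch below does not use the two-list method of this paper; it swaps in `graphlib.TopologicalSorter` (Python 3.9 or later), and the four glossary entries are invented purely for illustration. Each term maps to the set of terms used in its definition, that is, to its predecessors.

```python
from graphlib import TopologicalSorter

# Hypothetical glossary: term -> terms used within its definition.
GLOSSARY = {
    "event": set(),
    "activity": {"event"},
    "network": {"activity", "event"},
    "path": {"network", "activity"},
}


def textbook_order(glossary):
    """Return the terms so that each appears after every term it relies on.

    Raises graphlib.CycleError if the definitions are circular.
    """
    return list(TopologicalSorter(glossary).static_order())


ORDER = textbook_order(GLOSSARY)
```

Any order returned places "event" before "activity", and both before "network" and "path". A circular pair of definitions corresponds to the illegal loop of Section 5 and is reported by the library as a `CycleError`.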
