Resurrecting Extinct Computers - The Connection Machine
Winston Zoe-Rose Chang GuyLiu
Candidate Number: 1057025
MCompSci Computer Science - Part B
Trinity 2023
Word Count: 4989
Abstract
While the architectures of current commercial processors are well established and relatively
static [12, 16], the early days of computing saw extensive experimentation and exploration
of alternative designs. These included the Connection Machine (CM-1), consisting of 65,536
1-bit processors connected by a message-routing network.
Through the development of a cycle accurate simulator of the Connection Machine, and
several example programs, an evaluation of the machine has been conducted and its reasons for
failure analysed. An RTL hardware description of the Machine’s building block chip has also
been created, which would allow a full replica to be constructed; both of these are important
preservation steps for a piece of computing history at severe risk of being forgotten.
The machine performs remarkably well, even against hardware from almost 40 years later, on certain tasks: a breadth-first search algorithm runs at around 2 cycles per element, made even more astounding by
the 1-bit word size and approximately 700 cycle latency of message passing. However, these
factors become much more limiting in other tasks, stunting performance in some traditionally
parallelisable workloads such as vector dot product.
Acknowledgements
First and foremost, I would like to thank my supervisor, whom I credit with both sparking my
interest in this field and providing expert guidance on the project and its direction.
I would also like to thank Dr Paul Franzon of North Carolina State University, whose
excellent course on Verilog and digital ASIC design is freely available on YouTube1, and was
an extremely valuable resource as I learned the language. So too was Stuart Sutherland's
Verilog reference guide2.
I’d also like to thank my friends and family, who have been in my corner continuously as I
stressed about this project and the rest of my commitments. I am eternally grateful to them
for putting up with me harping on and on about a “weird computer from the 80s” for the
past 6 months.
Finally, to my college cat, whose company through my early morning library sessions has
been a constant comfort.
1. https://youtube.com/playlist?list=PLfGJEQLQIDBN0VsXQ68_FEYyqcym8CTDN
2. https://www.sutherland-hdl.com/pdfs/verilog_2001_ref_guide.pdf
Contents
1 Introduction 3
1.1 Motivations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2 Background 7
2.1 History . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 The Simulator 9
4 Hardware Implementation 13
5.5 Asymptotic Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
6 Conclusions 23
6.1 Reflection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
A Program Text 25
A.4 libcm.h . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
Chapter 1
Introduction
1.1 Motivations
The Connection Machine was a supercomputer designed in 1985 by W. Daniel Hillis [9]. It
presented a fundamentally different kind of computer, which can be thought of as “smart memory” [11]
that can be issued SIMD-style instructions to operate on its data massively in parallel. The
CM-1 used 65536 parallel 1-bit processors each with their own memory, able to communicate
via message passing to solve a wide range of complex problems. Without active preservation,
knowledge of this machine risks being lost.
The overarching motivation of this project is to create a suite of tools allowing anybody
to understand the structure, benefits, and drawbacks of the machine, and allow them to write
programs for it, in order to preserve this strange and innovative architecture. As few as
seven Connection Machines were built [17], which are quickly ageing and may soon become
unusable.
Computer history is worth preserving for a number of reasons. Lots of systems and standards carry historical baggage; for example,
modern Intel processors support 16-bit operations to allow compatibility with programs from
almost 50 years ago [12], and North American television broadcasts at 29.97 frames per second
as a hangover from the analogue NTSC standard [2]. Understanding historical context is
important to understanding these systems and the logic behind their design. The Connection
Machine, niche as it is, likely still had an influence on modern supercomputing through its
descendant, the CM-5, which was for a time the world’s fastest supercomputer [18].
Modern CPU architecture in particular is very well established and hasn’t experienced
a true paradigm shift since the introduction of the ARM and other RISC processors in the
1980s [16], and arguably since the Manchester Baby implemented the von Neumann architecture
in 1948 [5]. In order to spur innovation in such stagnant fields, it is invaluable to be inspired
by the radically different designs of the past.
1.2 Contributions
This project contributes several pieces of code related to the Connection Machine.
The primary contribution is libcm, a full, cycle accurate, low-level programmable sim-
ulator of the machine. libcm takes the form of a C library, allowing a developer to write
standard C code interspersed with library calls that issue instructions to the machine. This
mirrors the physical machine’s structure, whereby a sequential “host” computer would issue
instructions to the cells as required. In fact, libcm’s structure has been carefully designed to
reflect the machine’s structure at the expense of speed, so that the source code may act as a
clear reference for those trying to understand the machine’s structure. libcm is as accurate as
reasonably possible, however, due to the scarcity of sources and original hardware examples,
there are bound to be some minor differences, which are enumerated in Section 3.2.
A Verilog RTL description of the Connection Machine’s chip has also been provided. From
this, with some simple changes to ensure compatibility with specific technologies, it would be
possible to build a full replica of the machine, which is also important for preservation.
Several programs for the Connection Machine, including vector dot product and breadth
first search, have also been written. Feynman's logarithm
algorithm [1] is also included, a historically significant algorithm which was run on the machine
during its development [8]. The nature of the machine makes its programs seem alien and
unfamiliar, so these serve as worked examples for anyone studying the architecture.
A thorough evaluation of the architecture using libcm and the aforementioned programs
has been conducted, in order to expose its strengths and weaknesses, and give evidence as
to why no similar architecture exists today. The results are mixed. The machine excels in
problems such as breadth first search, which can fully utilise the routing network and make
use of the available parallelism, performing this task at a rate of just 2 cycles per element.
In other tasks, however, performance is very poor, yielding a significantly worse CPE in vector dot product than the
equivalent sequential implementation.
Finally, as Connection Machine code is very different to sequential code, a debugging tool,
CMFrames, is also contributed. It allows full dumps of the machine’s state created at runtime
to be read and explored afterwards, to truly understand how the programs operate.
The next chapter briefly discusses the Connection Machine’s history, and provides an overview
of its instruction set and communication hardware. Chapters 3 & 4 then discuss in detail
the implementations of libcm and the Verilog chip, in particular the implementation of the
inter-processor communication router, the most complicated element of the machine. Their
structures are discussed, which are designed to reflect the structure of the machine and so
act as a clear reference for it.
Chapter 5 then goes on to describe and demonstrate some Connection Machine programs,
and provide timing results for them, contextualised with results of equivalent sequential pro-
grams on a modern CPU. This is followed by a discussion of those results, the machine’s
limitations more broadly, and how these may have been responsible for its failure. The
demonstrated programs were chosen to cover a breadth of use cases for the machine, includ-
ing using its routing network in pathological ways. It also provides a short discussion of
the asymptotic run-time advantages that would be achievable on an infinite Connection
Machine - whilst such a machine is obviously impossible to build, these advantages remain
relevant to large, finite machines.
Finally, Chapter 6 provides the author's reflection on the project, and outlines directions for future work.
Chapter 2
Background
2.1 History
The Connection Machine was designed in 1985 by Daniel Hillis [9] as part of his PhD thesis,
and later built by the company he founded, Thinking Machines Corporation (TMC) [13].
Many notable individuals were involved in the company, including physicist Richard Feynman
[8] and Internet Archive founder Brewster Kahle [4]. TMC produced several generations of
the machine, namely the original CM-1, the CM-2, which found uses in scientific computing,
and later the CM-5.
The CM-1 was built from 4096 chips, each of which contained 16 processing cells and a router.
Processing cells were very simple, consisting of 4096 bits of memory, alongside 16 1-bit flags1.
Each instruction issued to the cells specifies:
• Two 12-bit memory addresses, A and B
• A read flag address R and a write flag address W
• A condition flag address C and a condition sense
• Two 8-bit truth tables, one for each of memory and flags
The cell takes the value of the bits referenced by addresses A and B, and the value of the flag
R, looks up the corresponding value in the memory truth table, and stores it in address A. A
similar process occurs for the flag truth table and the flag referenced in W. All of this occurs
only if the value in flag C, the condition flag, is equal to the condition variable [9].
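The cell update described above can be sketched in C. The names, struct layout, and exact bit ordering of the truth-table index here are illustrative assumptions, not libcm's actual interface; the authoritative format is in Hillis's thesis.

```c
#include <stdint.h>
#include <stdbool.h>

/* Minimal sketch of a single processing cell (layout assumed) */
typedef struct {
    uint8_t memory[512];   /* 4096 bits */
    bool    flags[16];
} CellSketch;

static bool get_bit(const uint8_t *mem, uint16_t addr)
{
    return (mem[addr >> 3] >> (addr & 7)) & 1;
}

static void set_bit(uint8_t *mem, uint16_t addr, bool v)
{
    if (v) mem[addr >> 3] |=  (uint8_t)(1 << (addr & 7));
    else   mem[addr >> 3] &= (uint8_t)~(1 << (addr & 7));
}

/* One instruction: index both truth tables with the A bit, B bit, and
   flag R; store the memory result at A and the flag result at W, all
   conditional on flag C matching the condition sense. */
void cell_exe(CellSketch *c, uint16_t a, uint16_t b, int r, int w,
              int cond_flag, bool cond_val,
              uint8_t mem_table, uint8_t flag_table)
{
    if (c->flags[cond_flag] != cond_val)
        return;                               /* instruction is conditional */

    int idx = (get_bit(c->memory, a) << 2)    /* 3-bit truth table index */
            | (get_bit(c->memory, b) << 1)
            |  c->flags[r];

    set_bit(c->memory, a, (mem_table >> idx) & 1); /* result overwrites A */
    c->flags[w] = (flag_table >> idx) & 1;         /* and flag W */
}
```

With the truth table 0xC0 (high only when both A and B are 1, flag R 0 or 1), this computes a 1-bit AND into A, illustrating how all "ALU" operations reduce to table lookups.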
The machine provides a simple grid system for the cells on a chip to communicate with
each other. Cells are able to send messages to their north, east, south, and west neighbours
over this grid.
The chips’ routers are connected in a 12 dimensional hypercube topology. Cells can
“inject” messages into their router via a special flag, which will then be sent by the routing
network to the addressed cell. A cell can send a message to any other cell using their relative
addresses on the hypercube, which will be delivered via a special flag in the cell [9].
A full description of the hardware can be found in Chapter 5 of Hillis’s PhD thesis.
Chapter 3
The Simulator
The primary contribution of the project is a full simulator of the Connection Machine, pro-
vided as a C library named libcm. libcm seems to be the only existing low-level simulator
of the Connection Machine1, and has been designed to be cycle accurate to the largest extent
made possible by the scarce available sources. Full source code for libcm is available on
GitHub.
The structure of libcm was deliberately designed to be similar to the structure of the Con-
nection Machine to provide a transparent description of its behaviour. The library primarily
provides a struct representing the machine, which contains an array of pointers to chips. Each
chip is a struct containing a pointer to its router, and to its 16 processing cells. Each router
contains its buffered messages and other data necessary for its function, as described in Sec-
tion 3.3, and each cell is a struct containing its 4096 bits of memory and 16 flags. Executing
instructions on the machine is achieved by calling the function cm_exe() on the top-level
struct, which in turn calls corresponding functions on the chips and finally the cells. Wires
between chips, used in inter-router communication, have been implemented using pointers,
1. A higher-level simulator is available at https://www.softwarepreservation.org/projects/LISP/starlisp/sim/, which simulates the machine's *Lisp interface.
which are assigned when the initial cm_build() function is called.
There is little available literature regarding the Connection Machine. It is therefore inevitable
that libcm will have some slight differences from the machine; the known differences are
enumerated below:
• Referral, the process by which routers deal with more messages than their buffers can
handle, is very poorly defined in Hillis's thesis. libcm implements this using a simple
referral scheme, described in Section 3.3.
• Similarly, the communication protocol between routers and cells isn’t defined; libcm
uses a simple protocol whereby agents wishing to communicate first send a 1 to signal
the start of a transmission.
• Routers contain buffers for seven messages as in Hillis's thesis [9], whereas the eventual
production hardware may have used a different size.
• The “Global Pin”, a 1-bit signal that may be asserted by any processor, is poorly defined
in Hillis’s thesis; this has been implemented by writing to a specific flag in the cell.
• libcm has a function to test if the router network is empty, which greatly increases the
ability of the routers to deal with congestion; it is unclear whether such functionality
existed on the real machine.
Whilst much of libcm’s implementation is self explanatory, the router is a fairly complicated
piece of hardware that is worth describing in more detail. Similarly to other parts of the
machine, the router is represented in libcm by a struct:
typedef struct Router
{
    Message *inports[DIMENSIONS];
    Message **outports[DIMENSIONS];
    Message *buffer[BUFSIZE];
    uint32_t listening[4];
    Message *partials[4];
    struct Router *referer;
    uint32_t id;
} Router;
The struct's fields are described below, using the names from the source code to refer to them.
The inports array contains pointers to messages. When a message is received from
another router, a pointer to it is placed in the array entry corresponding to the dimension of
the hypercube along which it was sent. Similarly, buffer contains pointers to the messages
currently stored in the router. The buffer itself is ordered, with messages at lower array
indices taking priority.
listening and partials are used in message injection from the processors. At the start
of every petit cycle2 , all processors that wish to send a message write a 1 to their router data
flag. A maximum of 4 of those are selected, and their identities placed into the listening
array. A new message struct is also allocated on the heap, and its pointer is placed in the
corresponding entry in partials. For the remainder of the injection cycle, bits are copied
from the cells' router data flags into these partial messages. Once the message is complete, it
is placed in the router's buffer. This data flag is accessible to the router via the flags array.
The outports array contains double pointers into neighbouring routers' inports entries; by
setting the dereferenced outport to a message pointer, routers are capable of sending messages
2. The name given by Hillis to the routers' communication cycle, encompassing processors injecting messages, the transferal of messages across the network, and the delivery of messages to destination processors.
to each other.
Finally, *referer points to an arbitrary different router in the machine. During cm_exe(),
these are simply assigned in sequential order to create a Hamiltonian cycle. In the event of
a buffer overflow, the incoming message can be offloaded to this other router to solve the
problem. The router’s absolute address, id, is also required to ensure that relative addresses
are maintained - by XORing this with the address in the message, we obtain the absolute
address of the message, which can then be XORed with the referrer’s ID to set the relative
address correctly.
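The address arithmetic described above is simple enough to show directly. This standalone helper (the function name is assumed for illustration) mirrors the two XORs:

```c
#include <stdint.h>

/* Relative hypercube addresses have one set bit per dimension still to be
   traversed. Re-basing a message onto a referrer is therefore two XORs:
   recover the destination's absolute address, then re-relativise it. */
uint32_t rebase_address(uint32_t relative, uint32_t id, uint32_t referer_id)
{
    uint32_t absolute = relative ^ id;   /* destination's absolute address */
    return absolute ^ referer_id;        /* relative to the referrer */
}
```

For example, a message at router 0b0101 with relative address 0b0010 is destined for router 0b0111; referred to router 0b1100, its new relative address becomes 0b1011.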
The original Connection Machine's interface used a sequential “host” computer that could
be programmed in *Lisp3 [9], which would issue instructions to the machine. libcm operates
analogously, with the user's C program playing the role of the host.
libcm provides several other functions for interfacing with the machine, including checking
the routing network’s status, speeding up routers for faster testing, and reading from the
Global Pin.
libcm also provides the option to dump the state of the machine throughout the entire
run of the program into a zip archive. Dumps can then be analysed using CMFrames to aid
debugging.
3. A Lisp variant with parallel functions and data structures, designed specifically for the Connection Machine.
Chapter 4
Hardware Implementation
A “sketch Verilog” implementation of the Connection Machine’s chip has also been provided.
With minimal changes, this could be synthesised into an actual piece of hardware; if several
of these were built, it would be possible to build a full replica of the Machine. These changes
mostly involve the mapping of pins to input lines. The chip requires at least 80 pins, includ-
ing 55 for instruction parameters, bidirectional connections to 12 other routers, and several
connections to the host. More would be required to communicate with memory if this is not
integrated on-chip.
As stated in Section 3.3, a lot of the building blocks of the Connection Machine are very
simple. Processor flags are just 1-bit registers, and the “ALU” turns out to be nothing more
than an 8-to-1 multiplexer indexed by 3 input bits. The router is the exception - it is in fact even more awkward to
implement in hardware than software due to its priority system. In most cases, calculations
involving priorities must be done in a single clock cycle, creating yet more complexity. A
block diagram of the router is shown in Figure 4.1.
The injector portion, responsible for taking input from cells, and the heart, responsible for
inter-router communication, are both fairly simple. Both simply read priority values, make
calculations as to which buffer(s) to use, and send their results to multiplexers. Priority
Figure 4.1: Block diagram of the router
calculations here are primarily done by searching for minimum/maximum values with a tree
of comparators. Note that these portions do not set priorities, only signal a dedicated unit to
do so.
The priority calculator is able to use multiple clock cycles. While a message is being sent or
received, its priority cannot change, meaning the new priority value should be calculated over
several cycles and updated following complete message transmission. The calculator therefore
can iterate over all 8 possible priority values in turn, incrementing the priority of any message
holding the value currently under consideration.
The most complicated part of the router is the ejector, responsible for sending messages
from the network to the cells, which is necessarily quite large and complex. It can be broadly
split into two components, named the identifier and the distributor. The identifier selects
the messages to be sent to cells, either by selecting all messages destined for a cell on this
chip, or using priority calculations depending on the delivery mode. This result is sent to the
distributor and to the priority calculator to mark these messages for deletion at the end of
the cycle. The distributor then takes the logical AND of each router’s selection bit and the
bit it’s trying to communicate, then uses an array of barrel shifters to place this bit on the
correct line to the cells, before taking the OR of all of these to produce the final output to
the cells.
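As a rough functional model of that datapath (not the RTL itself, and with array names assumed), the distributor's AND/shift/OR structure for a single bit-time can be written as:

```c
#include <stdint.h>

/* Functional model of the distributor for one bit-time: AND each buffered
   message's data bit with its selection bit, barrel-shift the result onto
   the destination cell's line, and OR everything into the 16-bit output
   bus to the cells. */
uint16_t distribute(const uint8_t *sel, const uint8_t *data,
                    const uint8_t *dest_cell, int n)
{
    uint16_t out = 0;
    for (int i = 0; i < n; i++)
        out |= (uint16_t)((sel[i] & data[i]) << dest_cell[i]);
    return out;
}
```

A deselected message contributes nothing to the bus, so any subset of messages can be delivered in the same bit-time as long as their destination cells differ.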
Chapter 5
Evaluation
The Connection Machine failed, but it is still possible that aspects of its architecture could
be useful today; therefore it is useful to evaluate the machine. To this end, several programs
have been developed that demonstrate various aspects of the machine's function, and their
performance has been measured.
Note that Connection Machine programs look very different to traditional sequential pro-
grams. Source code for the evaluation programs can be found in Appendix A.
Whilst working on the Manhattan Project, physicist Richard Feynman developed an algorithm
for calculating logarithms [8, 1], which relies on the fact that any numbers between 1.0 and
2.0 can be expressed as a product of terms of the form (1 + 2−k ). Multiplying by such a term
is very easy, requiring only a shift and an add - and the logarithm can be found by summing the logarithms of the component terms from
a table [1]. This worked well on the Connection Machine as this table is small and could
be shared by all processors [8]. This algorithm has been implemented for libcm during this
project.
As this demonstrates simple SIMD operation, the program source is not particularly in-
teresting. It is very similar to the sequential version, but with typical mathematical functions
replaced with sequences of calls to cm exe().
The vectors program sets up 2 vectors on cell 0 of various chips1, with the 2 vectors separated by only
a dimension. This allows messages to be sent between same-indexed entries in each vector
with a relative address containing a single high bit. The program calculates a dot product,
by means of sending each value from vector B into the corresponding cell in vector A, then
performing an integer multiplication, before adding up all these values along vector A in
O(log n) using a tree-like structure, as demonstrated in Figure 5.1.
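A host-side model of this reduction, operating on a plain array rather than cells, shows the communication pattern (the function is a sketch, not part of libcm):

```c
/* Host-side model of the O(log n) tree reduction: on round d, every
   element whose index has bit d set (and all lower bits clear) adds its
   running total into the element one dimension "down". The grand total
   accumulates at index 0. */
long tree_sum(long *vals, int n)   /* n must be a power of two */
{
    for (int d = 1; d < n; d <<= 1)
        for (int i = 0; i < n; i++)
            if ((i & d) && !(i & (d - 1)))
                vals[i & ~d] += vals[i];
    return vals[0];
}
```

On the machine the inner loop happens in parallel, so only the log2(n) rounds cost time; each round is one message send and one addition.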
The previous programs are in a sense “well behaved” - communication is either not present
or is very highly structured, and organised such that router congestion will never occur. The
breadth first search program, by contrast, communicates in an unstructured manner.
Graphs can be implemented fairly naturally on the Connection Machine, with each cell
acting as a vertex and storing the addresses of the vertices it shares an edge
with. Algorithms requiring traversing edges, such as breadth first search, can be implemented
with message passing: the computation proceeds in rounds - there are sets of discovered and undiscovered vertices, and
discovered vertices send messages to all their undiscovered neighbours. These messages include
the relative address of the sender, so when the algorithm terminates, the back pointers can
be followed to find the shortest path between a node and the initial vertex.
This algorithm is fundamentally different from the vector operation described in Section
5.2, as graphs have less structure than vectors - edges connect two arbitrary vertices. This
1. Using more cells would cause router congestion, affecting performance.
gives rise to the issue of router congestion, as there are many messages in the router network
going in many different directions. Consequently, messages may not arrive when expected,
but several petit cycles later, and injecting messages may fail if the router is already full. It
was therefore necessary to track which sends had succeeded.
The solution was to store a bitmap in each cell’s memories, indicating whether or not it
had yet sent a message along the corresponding edge. The vertices all have fixed out-degree,
allowing this to be constructed. The program iterates over the edges, sending along that
edge if its bit is high, and turning it low if the transmission succeeds. This repeats until all
messages have been sent, which can be detected by assertions on the Global Pin.
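The retry loop can be sketched as below; try_send here is a deterministic stand-in for message injection (which on the real machine fails when the router's buffer is full), and the names are assumptions for illustration:

```c
#include <stdint.h>
#include <stdbool.h>

#define DEGREE 64

static int attempts;

/* Stand-in for message injection; here every third attempt fails to
   model a busy router. */
static bool try_send(int edge)
{
    (void)edge;
    return (attempts++ % 3) != 0;
}

/* One round of the retry loop: attempt every edge still marked in the
   pending bitmap, clearing a bit only once its send succeeds. Returns
   true when this vertex has nothing left to send. */
static bool send_round(uint64_t *pending)
{
    for (int e = 0; e < DEGREE; e++)
        if (((*pending >> e) & 1) && try_send(e))
            *pending &= ~(1ull << e);
    return *pending == 0;
}
```

On the machine, all vertices run these rounds in lockstep, and the Global Pin is used to detect when every pending bitmap has reached zero.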
This unstructured communication can saturate certain routers extremely quickly. During
testing, overflow occurred even when the buffer size was increased to a hundred messages. Referral
was therefore essential to the program's operation.
For evaluation, the machine will be compared against a typical modern CPU running sequen-
tial implementations of the same algorithms on a single core. Results are given both in real
time and cycles per element, taking a 4MHz clock for the Connection Machine and a 3.6GHz
clock for the modern CPU. Note that the real time results for the sequential programs are
adjusted for the overheads of generating their inputs and running the testing loop. Each
sequential implementation operated for 10,000 cycles on a problem of size 65536, the size that
fits onto the Connection Machine, except the vectors program, which multiplies 2048-element
vectors.
Results for the Connection Machine were taken using libcm, measured in cycles, taking
an average of around 10 independent runs for BFS due to its variable run time depending on
the graph structure. Cycles per element and estimated real time are provided, as well as an
estimate of the time required to process 10,000 problems2 for comparison to the sequential
Program                            Time (s)   CPE
Feynman's Logarithm (Unoptimised)  22.0       120.6
Feynman's Logarithm (-O3)          7.4        40.5
Program Total Cycles CPE Real Time (ms) Equivalent Time (s)
Feynman’s Logarithm 7196 0.1 1.8 18.0
Vectors 15787 7.7 0.4 3946.8
BFS 138705 2.116 34.7 346.8
The evaluative testing paints a very interesting picture of the Connection Machine. Despite
a time difference of nearly 40 years, it performs some 20% faster than the modern processor
running unoptimised code in the calculation of fixed point logarithms. This is, however, to be
taken with a pinch of salt - it is unsurprising that the Connection Machine would perform better
simply because it can calculate 65536 logarithms at once, and a GPU logarithm algorithm
with some parallelism would perhaps be a fairer comparison. The -O3 optimised executable,
by contrast, comfortably beats the machine's equivalent time.
A more interesting result is that for breadth first search, which is difficult to parallelise
on modern systems. Using the time for the unoptimised search, the Connection Machine
runs only around 20 times slower - very impressive, considering its roughly 900-times slower
clock, the slow speed of message passing
and the 1-bit word length. Such tasks are those at which the Connection Machine excels,
particularly due to the ability to send many messages in parallel to offset their latency. This
was surprising - it was expected that this algorithm would perform poorly due to router
congestion.
The only really disappointing result was the vector multiplication. The slowness is explained
by the 700 cycle latency of message passing, creating a large constant factor on the theoretical
O(logn) algorithm. Dot product of vectors is also an operation that can be well parallelised on
Task                    Algorithm   Average Case   Worst Case
Vector Addition         Textbook    O∞(1)          O∞(1)
Dot Product             Textbook    O∞(log n)      O∞(log n)
Matrix Multiplication   Textbook    O∞(n log n)    O∞(n log n)
modern processors using the Streaming SIMD Extensions (SSE) and their successors for x86 [10].
SSE is used by the -O3 optimised vectors program, explaining its large speedup.
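The sequential baseline is simply the ordinary loop below; gcc and clang at -O3 auto-vectorise it with SSE or wider instructions, though exact codegen varies by compiler and target:

```c
#include <stdint.h>

/* Plain dot product: at -O3, compilers vectorise this loop so that
   several elements are multiplied and accumulated per iteration. */
int64_t dot(const int32_t *a, const int32_t *b, int n)
{
    int64_t acc = 0;
    for (int i = 0; i < n; i++)
        acc += (int64_t)a[i] * b[i];
    return acc;
}
```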
Table 5.3 shows the asymptotic complexity of some algorithms, where O∞ (f (x)) denotes the
asymptotic run time on an idealised, infinitely large Connection Machine. “Assertion Search”
is a very simple algorithm where cells containing the requested element simply assert over
the Global Pin, and “Treelike Search” has cells communicate in a similar manner to the dot
product's tree reduction.
Though obviously impossible to construct in real life, this provides useful results about
the potential speedup available to modestly sized problems. With modern manufacturing, a
Connection Machine with billions of cells is potentially feasible, providing significant speedup
on problems of practical size.
The Connection Machine has been demonstrated to be an extremely powerful computer, able
to perform remarkably well against modern machines, with the massive parallelism able to
provide great speedup for modestly sized problems. What follows are opinions as to why it
nevertheless failed.
The primary reason for the failure of the Connection Machine was likely the business
case. The machine was expensive, and its targeting of AI research resulted in low sales [17].
Though it did have some uses in scientific computing, such as in quantum chromodynamics,
these were not enough to sustain the business.
Economic factors may explain the failure of the machine, but less so the architecture,
which, as these results demonstrate, was also flawed. Hillis clearly intended to construct a
very general purpose, blank-slate machine that could be used for a wide range of tasks; in
doing so, he built a “jack of all trades, master of none.” One of the greatest limitations is
the 1-bit word length, meaning that operations that would typically be thought of as atomic,
such as addition and logical operations, take time proportional to the word length of the values -
a 32-bit addition takes at least 32 cycles.
Message passing, a major mechanism responsible for the Machine’s power, has a very high
latency. In the test programs that made use of it, most (upwards of 95% of) processor time
was wasted waiting for messages to arrive, severely stunting the parallel advantage. It excels
in simple SIMD applications, but this is neither surprising nor interesting due to the high
thread count.
Figure 5.1: Communications made in vector multiplication.
Chapter 6
Conclusions
The primary contribution of this project is libcm, which is likely the only cycle accurate
simulator of the Connection Machine. Its existence is critical for the preservation of this rare
and historically significant machine.
The evaluation of the machine also helps to explain why the innovative architecture failed,
despite its incredible power in some applications, and has been superseded by more specialised
designs. Finally, the Verilog RTL description would allow a full replica to be built, which will
become important as original specimens degrade.
6.1 Reflection
Being a great fan of historic computing, I really enjoyed researching the Connection Machine
and implementing libcm during this project. Though it is extremely difficult to find sources
on the machine, I feel I did a good job of recreating it as accurately as possible without access
to real hardware.
Having learnt about Verilog through Franzon’s course over the summer, I was eager to
make a start on the project, and was able to demonstrate a mostly functional libcm by the
Christmas Vacation. Hilary Term was spent mostly writing the RTL description, writing
the example programs, and adding finishing touches to libcm. Due to my inexperience with
Verilog, I found that part of the project extremely challenging.
The implementation of CMFrames, used to analyse states of libcm to find bugs in pro-
grams, was very awkward due to the way dump data is gathered. Data is simply placed into
a large binary file which is then placed into a zip archive on every cycle of the run of the
program. This makes extraction of data in CMFrames very difficult and necessitated messy
code.
Programs for the Connection Machine were originally written in *Lisp [9]. libcm only provides
a C interface, meaning those original programs cannot currently
be run on libcm. It would be nice to see a front-end built for libcm that allows *Lisp programs
to be run, both for historical preservation purposes and to make developing for libcm easier.
This could be achieved by means of a transpiler, or perhaps with help from the existing *Lisp
simulator1 .
Seeing as libcm is intended to act as a definitional simulator for the Connection Machine,
access to a working original machine would allow the differences listed in Section 3.2 to
be eliminated and better document the machine's function. However, owing to the limited
number of machines produced, working examples are very scarce - at least one is located at
the Computer History Museum, Mountain View, California [15], alongside an example of the
empty casing [14]. I am unaware of any examples located outside of the United States2.
Of course, more investigation could always be done into applications of the Connection
Machine to better ascertain what tasks it provides advantages in. Perhaps a derivative archi-
tecture could be designed that fixes the problems outlined in Section 5.6. However, the ma-
chine’s fundamental problems, as well as the fact that nobody has tried to build a derivative,
make me sceptical that such architectures will ever be useful in solving real world problems.
1. https://www.softwarepreservation.org/projects/LISP/starlisp/sim/
2. CM-2 examples are more numerous, including at least one in Sweden [6].
Appendix A
Program Text
Connection Machine programs look very different from traditional sequential programs. Below
is source code for the example programs used in Chapter 5, and the interface header for libcm.
#include "connection_machine.h"
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
static const uint32_t log_table[32] = { /* declaration name assumed; entry k ≈ 2^31 * log2(1 + 2^-k) */
2147483648, // 0
1256197405, // 1
691335320, // 2
364911162, // 3
187825021, // 4
95335645, // 5
48034513, // 6
24110347, // 7
12078627, // 8
6045200, // 9
3024074, //10
1512406, //11
756295, //12
378171, //13
189091, //14
94547, //15
47274, //16
23637, //17
11819, //18
5909, //19
2955, //20
1477, //21
739, //22
369, //23
184, //24
92, //25
46, //26
23, //27
12, //28
6, //29
3, //30
1 //31
};
/* Log table values based on an answer by Marcus Ritt
 * (https://cstheory.stackexchange.com/users/2150/marcus-ritt)
 * https://cstheory.stackexchange.com/q/3469
 */
int main()
{
/* First things first is to load the numbers to be processed into the cells
* of the connection machine. I’ll use 16 32 bit numbers, each one greater
* pregenerated (as well as the extreme cases). These will be placed in the
*/
cm *machine = cm_build();
srand(time(NULL));
Cell *proc;
uint32_t i;
/* For the sake of demonstration, we can transfer the 1024 bit log table
* into every cell one bit at a time, so the other processors can play
*/
/* Designate space for the other integers required. r will live at 1024-
* 1055. k will be stored here in the host. tempr will be stored at 1056-
*/
uint32_t k;
* backfill with 0s if necessary.
*/
*/
/* Now, compare tempr to s. This can be done with 2 flags and iterating
* over the number big end first. Both flags are initially high, and the
* in s bigger than the bit in tempr, then flag a goes down, s is bigger,
*/
cycles();
return 0;
}
#include "connection_machine.h"
#include <stdio.h>
#include <time.h>
#include <stdlib.h>
int main()
{
* through 32 for the right vector. Note that we can really set these out
*/
*/
cm *machine = cm_build();
shouldntOr(machine);
slowMode(machine);
shouldntDump(machine);
srand(time(NULL));
Cell *proc;
uint32_t i;
proc = machine->chips[i]->cells[0];
proc->memory[0] = 1 << 6;
proc = machine->chips[2048+i]->cells[0];
* precisely the length of the 32 bit numbers we’re storing! First, we need
* to get whether we’re going to send or not into a flag. This is gonna be
*/
petit_sync(machine);
/* Now actually signal to send the message */
data flag*/
/* The address of the message should be set so that only bit 5 of the
*/
/* Now we can start to pass in the actual data over the next 32 cycles.
*/
while (! machine->globalPin)
/* Listen for the incoming message, and copy the contents into memory */
for (i = 0; i < 32; i++)
* However, only the lower 32 bits will be sent for addition due to test
*/
/* Decide whether to actually run this cycle, based on the bit in the
* original number
*/
uint32_t j;
 * having each vec0 entry send its current running value on one dimension
 * each turn, add them up, then repeat for all 4 relevant dimensions. Each one
* Due to the numbers being sufficiently small, the incoming value can be
* stored at 3968. Then they can be added to the running 32 bit total at
* First, it will be necessary to get vec0 to have flag 15 set high. This
*/
petit_sync(machine);
uint32_t j;
/* Now, we can drop cycles until the message is delivered */
while (! machine->globalPin)
cycles();
return 0;
* Each processor will function as the vertex of a graph. The first 1024 bit
* will make up 64 16 bit pointers to other nodes in the graph (if we want to
* make a less dense graph, we can just stop generating random pointers and
* replace the later pointers with self loops). The next 64 bits will act as
 * a bitmap - initially all set to 1. The first pointer will be a self loop
 * for simplicity
* all processors. Name these undiscovered (N), active (A), and done (D).
* These can be represented by bits in flags
* pseudocode:
* if successful: bitmap(i) = 0
* inbox(0) = 0
* if proc in N:
* D = D u A
* N = N\{A}
* next round
* else:
*
*/
#include "connection_machine.h"
#include <stdio.h>
#include <stdint.h>
#include <time.h>
#include <stdlib.h>
#define DEGREE 8
int main()
cm *machine = cm_build();
shouldntOr(machine);
slowMode(machine);
shouldntDump(machine);
srand(time(NULL));
Cell *proc;
uint32_t i, j;
proc->memory[0] = i >> 8;
proc->memory[1] = i & 255;
//Let’s also put the processors into the undiscovered set here too
// printf("%u\n", i);
uint32_t d = 0;
uint32_t a = 1;
uint32_t n = 65535;
while (a)
{
petit_sync(machine);
//////////SENDING PHASE\\\\\\\\\\
//Copy the bit from the bitmap into the message - indicate we want to
//Format bit
//Commandeer bit 4095 as xor bit. Copy in the relative ROUTER address of
//the pointer plus the absolute CELL address. But first, 1 then 15 0s.
for (j = 0; j < 4; j++)
//We now need to check the handshake and update the bitmap. The bitmap
//will be set to the and of the handshake and its current value.
//////////WAITING PHASE\\\\\\\\\\
//The inbox will be the last 32 bits, 4064-4095. The longstore will be
else
timeout++;
//////////RECEIVING PHASE\\\\\\\\\\
//Very simple - just copy the message directly into the inbox!
//////////CONTROL STAGE\\\\\\\\\\
//WE NEED TO COPY TO LONGSTORE HERE or else the last one won’t send :(
//Now all messages have had a chance to send. We can evaluate to see if
//This walks over the bitmaps in As and ors them all. If any are 1, we
//We can now check the global pin to see if we’re done.
//The inbox will be the last 32 bits, 4064-4095. The longstore will be
cm_exe(machine, 4064, 0, 0, 0, 15, 1, SETZ, IDM, 0);
uint16_t timeout = 0;
else
timeout++;
//////////RECEIVING PHASE\\\\\\\\\\
//Very simple - just copy the message directly into the inbox!
continue;
//We now need to check for outstanding messages. I thought for a LONG
//time about the best way to do this, including stack and cascading
//machine just makes this VERY awkward. Instead, I’m going to cheat, and
//assume there’s a line that is set low when all routers are empty. I’ll
//machine libraries. This simply loops while there are still messages in the
while (network_empty(machine))
//The inbox will be the last 32 bits, 4064-4095. The longstore will be
uint16_t timeout = 0;
else
timeout++;
//////////RECEIVING PHASE\\\\\\\\\\
//Very simple - just copy the message directly into the inbox!
//Cool, we’re here, which means the round is actually done! That means we
//just need to redo our sets, check if we’re done, then move on to the
//Now, put correct processors from N with into A and remove from N
//else, some proc is not in D, so continue with the new round.
//abort();
d = 0;
a = 0;
n = 0;
else d++;
cycles();
A.4 libcm.h
#ifndef CM_CM_H_
#define CM_CM_H_
#include "chip.h"
typedef struct
uint32_t petitCounter;
uint8_t shouldOr;
uint8_t slowMode;
uint8_t globalPin;
uint8_t dump;
} cm;
cm *cm_build();
uint8_t shouldDump(cm *machine);
void cycles();
#define OR 0b01111111
#endif
Bibliography
[1] Marcus Ritt. Answer on CS Theory Stack Exchange. url: https://cstheory.stackexchange.
com/q/3469.
[2] ATSC Standard: Video - HEVC. Standard. Accessed 11-5-23. Washington, D.C., US:
com/wp-content/uploads/2023/04/A341-2023-03-Video-HEVC.pdf.
[3] K. E. Batcher. “Sorting Networks and Their Applications”. In: Proceedings of the
April 30–May 2, 1968, Spring Joint Computer Conference. AFIPS ’68 (Spring). At-
lantic City, New Jersey: Association for Computing Machinery, 1968, pp. 307–314. doi:
10.1145/1468075.1468121.
[4] Joshua Benton. After 25 years, Brewster Kahle and the Internet Archive are still working
to democratize knowledge. Nieman Lab, July 2021.
[5] B. Copeland. “The Manchester Computer: A Revised History Part 2: The Baby Com-
puter”. In: IEEE Annals of the History of Computing 33.1 (2011), pp. 22–37. doi:
10.1109/MAHC.2010.2.
[6] DigitalMuseum. Parallelldator. Accessed 01-05-23. url: https://digitaltmuseum.se/
021027765253/parallelldator.
[7] T. Alan Egolf. “Scientific Application of the Connection Machine at the United Technologies
Research Center”. In: Scientific Applications of the Connection Machine. NASA Ames Research
Centre, Moffett Field, California: World Scientific Publishing Co. Pte. Ltd., Sept. 1988,
pp. 38–63. isbn: 9971509695.
[8] W. Daniel Hillis. “Richard Feynman and the Connection Machine”. In: Physics Today
[9] W. Daniel Hillis. The Connection Machine. ACM Distinguished Theses. Cambridge,
Massachusetts: MIT Press, 1985.
[10] Intel. “Intel® 64 and IA-32 Architectures Software Developers Manual Volume 1: Basic
Architecture”. In: Accessed 24-3-23. Dec. 2022, pp. 5–22. url: https://cdrdv2.intel.
com/v1/dl/getContent/671200.
[11] Brewster A. Kahle and W. Daniel Hillis. “The Connection Machine Model CM-1 Ar-
chitecture”. In: IEEE Transactions on Systems, Man, and Cybernetics 19.4 (1989),
[12] Bruno Lopes et al. ISA aging: A X86 case study. Accessed 11-5-23. 2013. url: https:
//www.researchgate.net/profile/Rafael- Auler/publication/260112900_ISA_
X86-case-study.pdf.
[13] John Markoff. U.S. Awards Computer Contract: Thinking Machines Gets $12 Million
Company Nov 29, 1989; Last updated - 2010-05-22. Nov. 1989. url: https://www.
proquest.com/historical-newspapers/u-s-awards-computer-contract/docview/
110347811/se-2.
[14] Computing History Museum. Artifact Details - Connection machine 1 supercomputer
catalog/102691297.
[15] Computing History Museum. Artifact Details - Connection Machine CM-1. Accessed 01-
[16] Leonid Ryzhyk. “The ARM Architecture”. In: Chicago University, Illinois, EUA (2006).
[17] Gary A. Taubes. The Rise and Fall of Thinking Machines. https://www.inc.com/
[18] TOP500 LIST - JUNE 1993. Accessed 11-5-23. June 1993. url: https://top500.org/
lists/top500/list/1993/06/.