Advances in technology over the past few years have brought us to the brink of an exciting new discovery - a completely new type of computer with characteristics quite unlike anything which has gone before.
For many years the field of computing has been centred around general purpose processors. Great advances in technology have been made, with current general purpose CPUs (Central Processing Units) being many orders of magnitude more powerful than the first ones. The great flexibility of these processors encouraged investigation of widely varying applications and fostered great advances in software engineering, with many new leaps being made possible by the continual introduction of ever-more-powerful processors. Yet all this time both hardware and software fields were being guided by a Von Neumann-derived approach to computing - and naturally the people involed with computing have developed a way of thinking which corresponds to the technology which they are accustomed to using.
In recent years great advances have been made in the field of programmable logic. Hardware designers have found ways to make the hardware design process easier and cheaper by making it more like a software design process - by creating increasingly dense programmable logic devices they became able to construct sophisticated circuits without ever touching a soldering iron. Even better, they could develop new designs much more quickly than before - by eliminating the slow prototyping step they could test their ideas much more rapidly. This trend to ever quicker prototyping led to static random access memory (SRAM) configurable field-programmable gate arrays (FPGAs). With the ability to change the circuit on a chip practically at will, the hardware designers discovered a flexibility never before available to them.
The potential of these devices as computing engines in their own right was not lost on programmers. Researchers of custom computing machines were soon employing FPGAs as a convenient re-usable alternative to their previous hardware endeavours and quickly obtained results which demonstrated the power of this new technology. Even so, programming the devices was still a painstaking process very reminiscent of the hardware design processes which had preceded it.
There is a large gap between the comfortable programming environment which computing people have created for themselves and this new computing technology of reconfigurable logic devices - yet the immense parallel processing power of reconfigurable logic makes it a very compelling medium indeed. However so much of conventional programming wisdom is based around what we have learned from decades of Von Neumann-style sequential processors that many of our techniques simply cannot be applied to reconfigurable logic.
The basic problem in computing is to perform as much computation as quickly as possible. Sequential ``general purpose'' processors are an excellent solution to one side of the computing problem - they are extremely well suited to rapid execution of complex algorithms which in turn makes them very flexible in application. This flexibility is achieved by focussing on the ability of a processor to execute single instructions as quickly as possible in succession. For the instant that a single instruction is being executed the whole processor is devoted to completing that instruction as quickly as possible - at least conceptually anyway. Realistically though most of the processor is unnecessary for most instructions, and so at any given instant most of the processor will be essentially wasted. Although it exists, this wastage is largely masked by rapid internal execution, optimisations such as pipelining and other limitations such as memory bandwidth. Despite these inherent limitations general purpose processors have no peer in their speed of executing complex algorithms.
Reconfigurable logic arrays on the other hand tackle the
computing problem from an entirely different angle - rather than
processing a large number of relatively simple instructions in
sequential order very quickly they can be considered to be programmed
with a single very complex instruction which can only be changed
infrequently. The great advantage that logic arrays have is that their
single instruction can describe a great deal of parallelism - and it
is no surprise that they excel in applications which parallelise well.
This parallelism isn't the coarse-grained parallelism tackled by
conventional general purpose multiprocessors, rather it's an
extremely fine-grained parallelism where thousands or millions of
bit-level parallel operations can be occurring simultaneously.
Figure1.1 shows where these
processors lie in terms of instruction speed and complexity.
Expressed simply the computing power of a processor is the product of
the number of instructions it can execute per second and the useful
complexity of these instructions. Ideally a processor would be able
to execute very complicated operations in rapid succession.
Realistically it is difficult to achieve this ideal. Conventional
sequential processors concentrate on instruction execution rate but
keep the average complexity of the instructions quite low. Computers
based around FPGAs can be configured to perform very complex
individual operations but are very much slower to change between
these very complicated ``instructions''. They are also subject to
technology limitations which currently leave them unable to execute
programs of the same level of complexity as sequential processors,
hence their designation as ``special purpose'' in the diagram.
Clearly the gap between general purpose computers and a new breed
of computers based on reconfigurable logic arrays remains to be
bridged. Yet the technology of reconfigurable logic arrays is still
very immature - by analogy to general purpose processors,
reconfigurable devices are the equivalent of the earliest
microcontrollers; the basic technology is there but there's a great
deal of refinement still remaining to be done. In software tools
there is also an enormous step to take. Current programming tools and
programming techniques are poorly suited to reconfigurable logic. A
great deal of work is being done both on the hardware aspect and the
software aspect, and gradually the edges of the gap are being closed.
This thesis investigates the unexplored area between these two
technologies. It attempts to lay foundations for one possible future
of reconfigurable logic - a complete general-purpose, multitasking,
virtualised computer system built entirely around reconfigurable
logic.
The intent of this project was to see if it was possible to
create a computer system based around reconfigurable logic, which
could replace all the functions of a conventional computer system.
Existing reconfigurable logic array architectures are capable of
performing limited computing tasks at high speed. They are however
very limited in their scope - both in the types of task which they
perform well and in the size of tasks which can be usefully
attempted. For this reason most researchers to date have seen
reconfigurable logic array processors as add-ons to conventional
processors, the conventional processor dealing with the more complex
sequencing aspects of the computation and the logic array acting as a
configurable special-purpose coprocessor.
This limited application of logic arrays is due in part to fact
that they are being used for tasks they were never intended for -
designed originally as a quick and easy alternative to
mask-programmed gate arrays they are now being used in custom
computing applications well outside their original design brief. In
recent times many advances have been made in adapting these designs to
the new computing paradigm, but generally these advances have been
focused at a relatively low level.
This project and the Switchblade architecture which grew
out of it address the entire scope of computing tasks addressed by
conventional processors. In addition to the low-level computing tasks
attempted by other reconfigurable architectures, Switchblade
is capable of performing all the tasks of a full multitasking,
multiuser computer system - all in reconfigurable logic. Essentially
this is a completely new system design for an entirely new type of
computer system.
To make this step it is necessary to make some advances upon
existing architectures. Current logic arrays fall short of the
demands of this new application in their ability to efficiently
handle high-speed context switching and rapid reconfiguration. They
also offer no hardware support for important features such as
virtualisation and security. The primary aim of this thesis is to
address these issues, culminating in a new architecture which
provides all the features expected from a modern multitasking,
multiuser computer system. At the same time this architecture must
exploit the potential of the new reconfigurable medium.
What this thesis does not attempt is to present
a complete solution. Figure1.2 shows one
view of how computer architectures are developed over time. A great
deal of work still remains to be done in developing compilers and
software tools for reconfigurable architectures. Excellent work on
these problems is being done by others (as described in section 2.4),
and is beyond the scope of this thesis.
Another area which invites exploration is operating systems for
self-reconfiguring computer systems. This thesis touches on the
operating system only where it is important to the low-level system
design, leaving the higher level operating system issues for later
investigation.
This thesis investigates the potential of a new type of computing
device - the reconfigurable logic array. It then describes a computer
architecture for a complete general purpose computer based around
these devices.
The thesis provides:
Chapter 2 reviews the current state of research into programmable
logic, from the advent of programmable logic devices (PLDs) through
to recent application of logic arrays to general-purpose computing
tasks.
An investigation of the potential of reconfigurable logic arrays
as computing devices follows in chapter 3. A number of new
architectural features are explored and DRAMA, a generalised
reconfigurable logic array architecture, is described as a test-bed for
the architectural ideas which were later developed for
Switchblade.
The fourth chapter is an investigation into new programming
techniques. These techniques are made possible by the architectural
features
provided by DRAMA. A number of techniques are described and
their usefulness is discussed. A detailed case study of a prime
number algorithm which demonstrates these techniques is provided.
The remainder of the thesis takes the ideas garnered from the
previous two chapters and develops them into a detailed computer
architecture. The architectural features described in chapter 3 and
the programming techniques described in chapter 4 act as a basis for
the design. Chapter 5 describes a low-level logic array architecture
based on the experience gained from the DRAMA abstract
architecture, but augmented with new features designed to optimise
its use of logic resources and enhanced to provide features important
to a complete general-purpose computer system.
A genetic algorithm used to determine favourable design parameters
for the logic array is described. A flexible routing scheme using
multiplexed communication to optimise use of chip resources is also
described, along with other optimisations including local directed
broadcasts. Some other features specifically intended to adapt the
logic array for use in a general-purpose computer system are also
described, including security, programming interfaces and context
switching.
Chapter 6 describes a higher-level on-chip network communication
structure used to transform the logic array processor into a
virtualised array. The virtualised array described is capable of
demand-loading sections of code when required and provides
location-independent addressing. The message passing mechanism and
synchronisation primitives for parallel computation are also
detailed.
The final chapter presents the conclusions of the thesis and sums
up the work in brief. A bibliography follows for reference purposes.
Appendix A is a glossary of the abbreviations included in the thesis.
Appendix B contains three papers which were published as a result of
the thesis.
Figure 1.1 - Instruction complexity and rate
1.2 Thesis aims and scope
Figure 1.2 - Architectures gaining maturity over time
1.3 Thesis overview
1.4 Design parameters
The rather different goals of the Switchblade architecture
imply a set of design parameters quite different from most
reconfigurable logic arrays.
The Switchblade project was conceived to address the above
design parameters and in doing so produce a new logic array
architecture which was suited to general purpose computing. The
resultant design has some interesting attributes -
The ability to context switch on a logic array gives multiple levels
of multitasking, all of which are orthogonal and so available simultaneously -
Figure 1.3 - Virtualising the logic resource
Switchblade uses a network-routed communication system to perform long-distance communication. This routes long-range signals from any part of the array to any other through a unified interconnection system.
As well as dramatically reducing the die area devoted to long-distance communication, this interconnection system has an important part to play in logic array virtualisation. Sections of virtual circuit can be moved to any physical location in the array and the routing system can transparently remap their interconnections, meaning that placement and routing need no longer be entirely compile-time tasks.
The other feature provided by the network communication model is that it acts as a basis for interprocess communication and synchronisation between processes. These provide the primitives which allow a fully-fledged message passing operating system to be built.
Figure 1.4 - Four types of non-virtual interconnect
The short-distance interconnect also provides the ability to trade off array resources against interconnect. Areas may be configured with the level of interconnect appropriate to their needs, unlike existing architectures which often have problems with local starvation of either interconnect or logic resources. See figure1.4.