1. Introduction

1.1 Reconfigurable logic array processors

Advances in technology over the past few years have brought us to the brink of an exciting new discovery - a completely new type of computer with characteristics quite unlike anything which has gone before.

For many years the field of computing has been centred around general purpose processors. Great advances in technology have been made, with current general purpose CPUs (Central Processing Units) being many orders of magnitude more powerful than the first ones. The great flexibility of these processors encouraged investigation of widely varying applications and fostered great advances in software engineering, with many new leaps being made possible by the continual introduction of ever-more-powerful processors. Yet all this time both hardware and software fields were being guided by a Von Neumann-derived approach to computing - and naturally the people involed with computing have developed a way of thinking which corresponds to the technology which they are accustomed to using.

1.1.1 Programmable logic

In recent years great advances have been made in the field of programmable logic. Hardware designers have found ways to make the hardware design process easier and cheaper by making it more like a software design process - by creating increasingly dense programmable logic devices they became able to construct sophisticated circuits without ever touching a soldering iron. Even better, they could develop new designs much more quickly than before - by eliminating the slow prototyping step they could test their ideas much more rapidly. This trend to ever quicker prototyping led to static random access memory (SRAM) configurable field-programmable gate arrays (FPGAs). With the ability to change the circuit on a chip practically at will, the hardware designers discovered a flexibility never before available to them.

The potential of these devices as computing engines in their own right was not lost on programmers. Researchers of custom computing machines were soon employing FPGAs as a convenient re-usable alternative to their previous hardware endeavours and quickly obtained results which demonstrated the power of this new technology. Even so, programming the devices was still a painstaking process very reminiscent of the hardware design processes which had preceded it.

There is a large gap between the comfortable programming environment which computing people have created for themselves and this new computing technology of reconfigurable logic devices - yet the immense parallel processing power of reconfigurable logic makes it a very compelling medium indeed. However so much of conventional programming wisdom is based around what we have learned from decades of Von Neumann-style sequential processors that many of our techniques simply cannot be applied to reconfigurable logic.

1.1.2 The computing problem

The basic problem in computing is to perform as much computation as quickly as possible. Sequential ``general purpose'' processors are an excellent solution to one side of the computing problem - they are extremely well suited to rapid execution of complex algorithms which in turn makes them very flexible in application. This flexibility is achieved by focussing on the ability of a processor to execute single instructions as quickly as possible in succession. For the instant that a single instruction is being executed the whole processor is devoted to completing that instruction as quickly as possible - at least conceptually anyway. Realistically though most of the processor is unnecessary for most instructions, and so at any given instant most of the processor will be essentially wasted. Although it exists, this wastage is largely masked by rapid internal execution, optimisations such as pipelining and other limitations such as memory bandwidth. Despite these inherent limitations general purpose processors have no peer in their speed of executing complex algorithms.

Reconfigurable logic arrays on the other hand tackle the computing problem from an entirely different angle - rather than processing a large number of relatively simple instructions in sequential order very quickly they can be considered to be programmed with a single very complex instruction which can only be changed infrequently. The great advantage that logic arrays have is that their single instruction can describe a great deal of parallelism - and it is no surprise that they excel in applications which parallelise well. This parallelism isn't the coarse-grained parallelism tackled by conventional general purpose multiprocessors, rather it's an extremely fine-grained parallelism where thousands or millions of bit-level parallel operations can be occurring simultaneously.

Figure 1.1 - Instruction complexity and rate

Figure1.1 shows where these processors lie in terms of instruction speed and complexity. Expressed simply the computing power of a processor is the product of the number of instructions it can execute per second and the useful complexity of these instructions. Ideally a processor would be able to execute very complicated operations in rapid succession. Realistically it is difficult to achieve this ideal. Conventional sequential processors concentrate on instruction execution rate but keep the average complexity of the instructions quite low. Computers based around FPGAs can be configured to perform very complex individual operations but are very much slower to change between these very complicated ``instructions''. They are also subject to technology limitations which currently leave them unable to execute programs of the same level of complexity as sequential processors, hence their designation as ``special purpose'' in the diagram.

Clearly the gap between general purpose computers and a new breed of computers based on reconfigurable logic arrays remains to be bridged. Yet the technology of reconfigurable logic arrays is still very immature - by analogy to general purpose processors, reconfigurable devices are the equivalent of the earliest microcontrollers; the basic technology is there but there's a great deal of refinement still remaining to be done. In software tools there is also an enormous step to take. Current programming tools and programming techniques are poorly suited to reconfigurable logic. A great deal of work is being done both on the hardware aspect and the software aspect, and gradually the edges of the gap are being closed.

This thesis investigates the unexplored area between these two technologies. It attempts to lay foundations for one possible future of reconfigurable logic - a complete general-purpose, multitasking, virtualised computer system built entirely around reconfigurable logic.

1.2 Thesis aims and scope

The intent of this project was to see if it was possible to create a computer system based around reconfigurable logic, which could replace all the functions of a conventional computer system.

Existing reconfigurable logic array architectures are capable of performing limited computing tasks at high speed. They are however very limited in their scope - both in the types of task which they perform well and in the size of tasks which can be usefully attempted. For this reason most researchers to date have seen reconfigurable logic array processors as add-ons to conventional processors, the conventional processor dealing with the more complex sequencing aspects of the computation and the logic array acting as a configurable special-purpose coprocessor.

This limited application of logic arrays is due in part to fact that they are being used for tasks they were never intended for - designed originally as a quick and easy alternative to mask-programmed gate arrays they are now being used in custom computing applications well outside their original design brief. In recent times many advances have been made in adapting these designs to the new computing paradigm, but generally these advances have been focused at a relatively low level.

This project and the Switchblade architecture which grew out of it address the entire scope of computing tasks addressed by conventional processors. In addition to the low-level computing tasks attempted by other reconfigurable architectures, Switchblade is capable of performing all the tasks of a full multitasking, multiuser computer system - all in reconfigurable logic. Essentially this is a completely new system design for an entirely new type of computer system.

Figure 1.2 - Architectures gaining maturity over time

To make this step it is necessary to make some advances upon existing architectures. Current logic arrays fall short of the demands of this new application in their ability to efficiently handle high-speed context switching and rapid reconfiguration. They also offer no hardware support for important features such as virtualisation and security. The primary aim of this thesis is to address these issues, culminating in a new architecture which provides all the features expected from a modern multitasking, multiuser computer system. At the same time this architecture must exploit the potential of the new reconfigurable medium.

What this thesis does not attempt is to present a complete solution. Figure1.2 shows one view of how computer architectures are developed over time. A great deal of work still remains to be done in developing compilers and software tools for reconfigurable architectures. Excellent work on these problems is being done by others (as described in section 2.4), and is beyond the scope of this thesis.

Another area which invites exploration is operating systems for self-reconfiguring computer systems. This thesis touches on the operating system only where it is important to the low-level system design, leaving the higher level operating system issues for later investigation.

1.3 Thesis overview

This thesis investigates the potential of a new type of computing device - the reconfigurable logic array. It then describes a computer architecture for a complete general purpose computer based around these devices.

The thesis provides:

  1. A study and development of new architectural features made possible in reconfigurable logic arrays
  2. A study of new programming techniques which can be used with reconfigurable logic arrays
  3. An architecture which builds on the above investigations to develop a complete computer system based on reconfigurable logic

Chapter 2 reviews the current state of research into programmable logic, from the advent of programmable logic devices (PLDs) through to recent application of logic arrays to general-purpose computing tasks.

An investigation of the potential of reconfigurable logic arrays as computing devices follows in chapter 3. A number of new architectural features are explored and DRAMA, a generalised reconfigurable logic array architecture, is described as a test-bed for the architectural ideas which were later developed for Switchblade.

The fourth chapter is an investigation into new programming techniques. These techniques are made possible by the architectural features provided by DRAMA. A number of techniques are described and their usefulness is discussed. A detailed case study of a prime number algorithm which demonstrates these techniques is provided.

The remainder of the thesis takes the ideas garnered from the previous two chapters and develops them into a detailed computer architecture. The architectural features described in chapter 3 and the programming techniques described in chapter 4 act as a basis for the design. Chapter 5 describes a low-level logic array architecture based on the experience gained from the DRAMA abstract architecture, but augmented with new features designed to optimise its use of logic resources and enhanced to provide features important to a complete general-purpose computer system.

A genetic algorithm used to determine favourable design parameters for the logic array is described. A flexible routing scheme using multiplexed communication to optimise use of chip resources is also described, along with other optimisations including local directed broadcasts. Some other features specifically intended to adapt the logic array for use in a general-purpose computer system are also described, including security, programming interfaces and context switching.

Chapter 6 describes a higher-level on-chip network communication structure used to transform the logic array processor into a virtualised array. The virtualised array described is capable of demand-loading sections of code when required and provides location-independent addressing. The message passing mechanism and synchronisation primitives for parallel computation are also detailed.

The final chapter presents the conclusions of the thesis and sums up the work in brief. A bibliography follows for reference purposes. Appendix A is a glossary of the abbreviations included in the thesis. Appendix B contains three papers which were published as a result of the thesis.

1.4 Design parameters

The rather different goals of the Switchblade architecture imply a set of design parameters quite different from most reconfigurable logic arrays.
Suitable for large tasks
Many current logic array architectures are quite limited in the size of tasks which may be performed on the logic array. The arrays are limited in size and slow to reconfigure to new tasks so they are inherently better suited to small tasks. The Switchblade architecture should not have this limitation.

Suitable for highly sequential tasks
Where current logic array architectures focus on moving intensive, parallel computation into the logic array and doing the more sequential parts of the task in a conventional processor, this architecture should be able to execute even sequential code in an efficient way.

Multitasking
The architecture must be capable of running multiple independent programs simultaneously without unacceptable performance degradation.

Interactive
A computer built around this architecture should be usable by humans in much the same way as conventional computers are - through the use of interactive shells and graphical user interfaces with acceptable response times, rather than having only a batch mode as most current logic array architectures do.

Efficient use of array space
Logic array space is expensive and scarce. A resource which is at such a premium should be used efficiently, obtaining as much performance from the available logic as possible within the bounds of the other design requirements. This said, the intent was not to design an architecture around today's technical limitations but on the assumption that technology improvements will make much larger logic arrays available in the near future. The system need not necessarily be viable right now, but should plausibly be able to be implemented with silicon foundry advances which are likely to be available within the next few years.

1.5 Switchblade overview

The Switchblade project was conceived to address the above design parameters and in doing so produce a new logic array architecture which was suited to general purpose computing. The resultant design has some interesting attributes -

Figure 1.3 - Virtualising the logic resource
Virtualised logic
An important feature of most multi-tasking, multi-user computer systems is virtual memory. This allows many programs apparent simultaneous access to a limited memory pool. The same basic idea is applied here to the limited logic array resource - sections of the logic array can be allocated to tasks on a demand-loaded basis as shown in figure1.3. Location-independent addressing makes it possible to relocate circuits to any part of the array without the programs being aware of it.

The ability to context switch on a logic array gives multiple levels of multitasking, all of which are orthogonal and so available simultaneously -

  • Ultra fine-grained gate-by-gate parallelism within a task
  • Coarse-grained circuit-by-circuit parallel execution
  • Time-division multitasking by context switching

Network routed long-distance communication
Most existing logic array architectures are based on conventional methods of mapping algorithms to hardware. The nature of these circuits is to have a proportion of long interconnect lines which use large amounts of array space, yet transmit relatively little information. Some architectures [Dehon94] save space by multiplexing a number of these signals on the same physical lines. This has relatively limited application however as all the long lines to be multiplexed must originate and terminate in the same vicinity as each other.

Switchblade uses a network-routed communication system to perform long-distance communication. This routes long-range signals from any part of the array to any other through a unified interconnection system.

As well as dramatically reducing the die area devoted to long-distance communication, this interconnection system has an important part to play in logic array virtualisation. Sections of virtual circuit can be moved to any physical location in the array and the routing system can transparently remap their interconnections, meaning that placement and routing need no longer be entirely compile-time tasks.

The other feature provided by the network communication model is that it acts as a basis for interprocess communication and synchronisation between processes. These provide the primitives which allow a fully-fledged message passing operating system to be built.

Figure 1.4 - Four types of non-virtual interconnect

Time-multiplexed short-distance communication
Even on a local scale, a great deal of array resources are devoted to interconnect. Some of these resources are wasted on lengthy low-bandwidth lines while others are expended on allowing flexibility of routing. Switchblade employs a scheme which offers the space benefits of multiplexed transmission while simultaneously providing good flexibility of connection - and without the expenses normally associated with flexible routing schemes.

The short-distance interconnect also provides the ability to trade off array resources against interconnect. Areas may be configured with the level of interconnect appropriate to their needs, unlike existing architectures which often have problems with local starvation of either interconnect or logic resources. See figure1.4.

Fast context switching
Multiple configurations stores are employed to maximise context switching speed. Configuration changes can be targeted at specific subsections of the array so individual tasks can be context switched without affecting the remainder of the logic array.

Self-reprogrammability
An interface is provided to allow the logic array to directly reconfigure itself. This permits the operating system (which itself runs on the logic array) to manipulate the configuration of the logic array and also allows tasks to dynamically self-reconfigure their own circuitry if this technique can be used to improve performance.

Fast configuration loading
A high speed configuration interface is provided which allows copying of existing configurations from within the array or loading of new contexts to be carried out rapidly. Multiple sections of the array can be simultaneously and independently reconfigured.

Security
A capability-based security model is provided to restrict tasks to within their own security domain. Hardware and operating system features combine to ensure that tasks can't intrude on the operation of other tasks, either by accident or maliciously.

Genetically evolved array parameters
Rather than make arbitrary design decisions about logic array design parameters, a genetic algorithm is used to semi-automatically design the array. This approach may provide a more realistic set of parameters and deliver a more efficient logic array.

Sequential execution units
The logic array is specifically designed to be programmed with sequential execution units in situations where a task or part of a task is more conveniently executed in sequential code than in parallel circuitry. Since these sequential execution units are dynamically configured, many can be operating at once and units may be customised to their tasks in terms of word-length and the functional units available to them.

Real-time I/O system
The I/O system provides a simple interface to powerful internal features. ``Faults'' provide an interrupt-like ability to context switch in device driver circuitry and demand-load higher level device drivers. Time-critical device drivers can be ``wired'' on to the logic array to ensure rapid response.