3. Abstract architecture

This chapter describes some of the basic ideas behind the Switchblade architecture and then presents an abstract logic array architecture which is capable of testing these ideas. This abstract logic array architecture was the test bed for development of some of the architectural features around which this thesis is based. This DRAMA architecture (``Dynamically Reconfigurable Abstract Machine Architecture'') was first published in 1995 [Saleeba95].

3.1 Architectural features

At the time this initial research was carried out the field of configurable computing was young. Almost no research on run-time reconfiguration existed, the exception being a conference paper which related specifically to the Plessey ERA architecture [HastieCliff90]. This paper discussed run-time reconfiguration, partial configuration and the idea of the ``hardware subroutine''. Unfortunately the idea was before its time and the work went largely unnoticed.

In 1993 the preliminary work for this thesis resulted in a paper which discussed run-time reconfiguration, partial configuration, multiple configuration and self-reconfiguration [Saleeba93]. This was soon followed by a number of other papers by various researchers related to run-time reconfiguration.

3.1.1 Run-time reconfiguration

Run-time reconfiguration implies an ability to change configuration during execution of a program. Early logic array processors used a fixed configuration for the duration of execution of a program. Run-time reconfiguration allows the configuration to change with the needs of the program over time [Foulk93] [Algotronix94].

This scheme provides better utilisation of the reconfigurable logic than systems which can only set the configuration once per program. An algorithm can use this feature to select what functions might be most useful to it at any given point during execution. This can be a big advantage in large programs which change computational characteristics frequently.

Special architectures are required to support run-time reconfiguration. Some method must be provided to quickly modify the contents of the configuration store, and various architectures to support this have been presented by Saleeba [Saleeba93], Vandenbout and Morris [VandenboutMorris92], and Athanas [Athanas92].

3.1.2 Partial reconfiguration

Partial reconfiguration enables a logic array to change only part of its area during reconfiguration. A system can use this to discard an old section of circuit and replace it with something more useful, while retaining most of the existing circuit. In most cases it will be possible for computation to proceed while reconfiguration takes place. More subtly, this can be used by the host to direct the mode of computation, for instance by modifying ``hardwired'' constants on the fly [Foulk93]. This approach can improve utilisation of array space.

Double-buffered reconfiguration is also made possible by the partial reconfiguration feature. While a program is running on one part of the array the system could anticipate that a new configuration will be needed soon and be preloading it while the existing configuration is still running. This can eliminate the effects of configuration loading time completely as long as the need for new configurations can be anticipated accurately and they can be loaded in quickly enough.

3.1.3 Multiple configuration stores

Multiple configurations take double-buffering a step further. The system has a set of configuration stores available to it, only one of which is controlling the configuration of the logic array at any given time. Rather than having an area of the array unusable while it is being configured, the system can have one configuration in operation while another is being configured. In fact if sufficient configurations are available, it may be possible to simply load up all the configurations initially, and then flip to each one as desired, eliminating the expense of reloading configurations altogether.

Logic arrays suffer from the problem that most of their die area is devoted to interconnect rather than logic function. This imbalance has led to designs where a relatively small number of quite powerful logic blocks sit in a sea of interconnect. Since the logic blocks use roughly one order of magnitude less die area than the interconnect it makes sense to increase the complexity of the logic blocks if there is any performance advantage in doing so. Duplicating the configuration store available in each logic block - perhaps several times - provides multiple configurations which can be quickly selected between. Multiple configuration stores greatly reduce the number of data transfers necessary to keep a logic array ``fed'' with configurations in a run-time reconfigurable system and allow much faster switching between configurations, reducing ``dead time'' while the system waits on a configuration becoming available.

The idea of multiple configuration stores in FPGAs was first developed as part of the early work for this thesis and was published first in [Saleeba93]. DeHon later based his DPGA architecture around this idea [Dehon94] [Tau95] and did an excellent study of the benefits of multiple configuration stores in various applications [Dehon96]. An earlier use of multiple configuration stores in a different context was described by Snyder [Snyder82]. Attempts to map a single algorithm on to multiple configurations have met with varying degrees of success. In some cases the algorithms simply aren't able to take any great advantage from this feature. In other cases the number of configuration stores was much higher than that which was optimal - such as VEGA [JonesLewis95]. Most systems which support multiple configuration stores also suffer from a lack of flexibility in the way that their multiple configurations can be switched - often the entire array must be switched simultaneously.

3.1.4 Self-reconfiguration

A step beyond the idea of run-time reconfiguration is ``self reconfiguration''. In a system with multiple configuration stores allowing the choice of configuration store to be controlled by the circuit itself is a simple form of self-reconfiguration. In many circuits much of the preconfigured logic spends most of its time waiting for the rare circumstance where it is needed. Self-reconfiguration can dramatically reduce this. Hardware triggers can be used to determine which configuration is expressed, allowing the circuit itself to instantly select between any of those available. Designing circuits which can use this ``bank-switched'' technique requires some unconventional programming techniques, but can yield large performance gains in some situations.

Another method of self-reconfiguration is to allow outputs of the logic array random access to the configuration store as if it were a random access memory. This allows it to rewrite its configuration in more detailed ways. Having the ability to reprogram itself in this way implies a high level of computational complexity in the logic array itself, but has the potential to not only increase the utilisation of conventional designs - it could allow complete machines to be built from dynamic logic alone. These machines would be quite different from conventional machines, and would require entirely new forms of programming language and operating system to fully exploit their potential. This is precisely the type of machine which this thesis describes.

3.2 DRAMA - an abstract architecture

In developing the architecture for a logic array processor, it was desirable to not only consider existing architectures but to consider the problem anew. An abstract architecture was envisaged which could be used as a vehicle for exploring new architectural ideas and testing them - independent of implementation issues. The architecture was implemented as a simulation and as each new architectural feature was added it was possible to test it against other versions of the architecture which lacked the feature. This kind of comparative study is normally difficult to perform since the individual quirks of a realistic hardware architecture often make it difficult to form an unbiased assessment of an idea.

The abstract logic array architecture can be used as a basis for experiments without imposing the practical limitations of a specific hardware architecture on the experimental results. It can also be used to demonstrate the effect that various types of hardware constraints will impose on a logic array architecture. In this way it provides a suitable platform for experiments not only in algorithms for execution on logic array processors, but also in the design of new logic arrays.

This work came about as the result of related work in dynamic reconfiguration [Saleeba93]. It proved quite difficult to make comparative performance studies between different logic array architectures because there was simply no reliable basis for comparison between the architectures. All the existing architectures were heavily constrained by various hardware limitations, and any modified version would be unable to use the same proprietary software tools available for the device. The DRAMA architecture, simulator and associated tools now fill this role, and the experiments resulting in the final logic array processor design are based around them.

3.2.1 Logic array architectures as computing devices

Computing elements based on logic arrays are the result of gradual evolution from much simpler programmable logic devices. Simple fusible-link programmable logic gate devices such as PALs merged with the more dense mask-programmable gate arrays, giving birth to PGAs - programmable gate arrays. Later Xilinx Inc. produced the first commecial device capable of being programmed and reprogrammed in-circuit, the FPGA or ``field programmable gate array'' [Xilinx94].

While these devices were intended as a platform for rapid prototyping of circuits, it did not take long for people to realise that they had potential as computing elements in their own right [GrayKean89]. Early experiments in using FPGAs as computing elements showed that they had strong potential [BertinRoncin89] [GokhaleHolmes91]. In particular, certain classes of application which weren't well adapted to conventional computers achieved huge performance gains on logic array processors.

Currently most logic array processors operate as coprocessors to conventional host computers, or sometimes in a more tightly-coupled mode [Athanas92]. Pairing a logic array with a host allows the non-critical or overly complicated parts of an algorithm to be executed on the host, while the logic array can executed certain selected operations at high speed.

As work in the area has progressed, however, it has become increasingly obvious that devices intended as rapid prototyping circuit elements have architectural features at variance with the needs of a computing device. Slow reconfiguration times and harsh restrictions on the possible algorithmic complexity have limited their progress.

3.2.2 The abstract machine

The idea of the abstract architecture was to generalise these features to the point where implementation specific design features could be ignored. In order to emulate the architectural and performance features of practical designs the abstract machine was based around the two essential features of programmable logic arrays:

In addition to these basic features, support was added for I/O, run-time reconfiguration, and self-reconfiguration:

Reconfigurability
The configuration of either the logic elements or the interconnect must be able to be changed. This must allow for partial reconfiguration and it must be possible to reconfigure at run-time.

Multiple configurations
As a special case of reconfigurability, each element may have several predefined configurations which may be conveniently selected between.

I/O
Interconnect may be routed to external devices.

Self-reconfiguration circuit
An interface exists between the logic array and a circuit which can be used to reconfigure it simultaneous with normal operation.

3.2.3 Logic function

Figure [3.1 - "Block diagram of DRAMA"]

The general layout is shown in figure 3.1. The logic layer is composed of a number of ``logic elements'', each of which performs a simple logic function. The function is essentially a table lookup - an n-input element has 2n table entries, each of which specifies a boolean output value.

The number of inputs to each element may vary, and in fact the architecture defines no upper limit on the number of inputs that an element may have. This is consistent with the goal of making the architecture as general as possible - no arbitrary static limits are imposed on the abstract system. The same flexibility applies to the total number of logic elements in the logic layer. This permits circuits of any scale to be created. The DRAMA cell is shown in figure 3.2.

Figure [3.2 - "DRAMA cells and their interconnections"]

Of course this type of unlimited resources is not practical in real life - however for an experimental tool such as DRAMA it is useful to be able to operate independent of such variables if desired. This provides the flexibility to later isolate their individual effect on the outcome of simulations.

One of the most compelling features of a system such as DRAMA is that it allows experimentation to be more scientific. Simulations based on more practical systems are fraught with problems since the placement and routing of circuits on real gate arrays is greatly influenced by the arbitrary features and limitations of the hardware. These spurious factors often affect results to the extent that the effect of the specific modification being tested may well be obscured.

The ``island-style'' tradition of logic array design has tended to pack increasing amounts of complexity into each logic element. For example, Xilinx devices even allow the configuration store to be used as a small RAM. DRAMA doesn't attempt to emulate any of these - they are very much implementation-specific features. Instead, a number of normal DRAMA circuit elements may be combined into an aggregate circuit element for these special functions. The placement properties of aggregate circuit elements can be adjusted via a ``size'' parameter, allowing the placement and routing software to more accurately emulate other architectures.

3.2.4 Interconnect

In most gate arrays the vast majority of die space is devoted to interconnect. While much of this space is used for buses, quite a lot of circuitry is also devoted to routing other signals. It is a simple fact of life that reconfigurable interconnect is extremely expensive of die space.

For this reason most reconfigurable logic arrays have great restrictions on what interconnect resources are available. They also frequently employ clever architectural features to try to gain as much advantage as possible from the available resources. This may make it possible to obtain better results from a given device, but unfortunately it makes circuit synthesis a very complicated and compute-intensive process indeed.

The interconnect model used in DRAMA is as simple as possible. An input can come from any output - which may be either the output of an array element or an I/O stimulus of some kind. This totally bypasses the implementation-dependent issues involved in circuit synthesis. Note that this also removes the need to regard the device as having any particular physical form - it could be a standard two-dimensional array or it might be something entirely different.

Simulations which need to take into account routing issues can do so by adding a cost function to each interconnection, where the cost is a function of the source and destination connection locations. This means that a physical layout and a routing scheme must be chosen. The cost function is complicated by the need to take into account other interconnections - for instance if eight connections already go through an area with only eight connections available it is not possible to add another. These types of issues are beyond the scope of this thesis, and DRAMA itself takes no part in these issues. Such constraints can be added externally if desired.

3.2.5 Sequential logic

The logic described so far is purely combinatorial. To provide support for circuits with state, inputs may be dependent on a previous state. This is achieved by permitting inputs to take their value from an earlier value of an output than the current one. In fact only one previous value is available - the one determined directly prior to the current one. How much earlier is not explicitly defined, except that it is consistent across all the outputs and that it is the minimum time quantum of the simulation.

In the implementation of DRAMA this meant taking the current output values ot from the last simulation iteration and the earlier values ot-1 from their values at the next earlier time quantum.

This simple mechanism allows all forms of clocked logic to be created. By using a logic element's previous output as an input to its own lookup table all the basic types of flip-flip may be implemented. Table 3.1 shows an example. More complex circuits can be derived from these building blocks.

Input Output
Q(t-1) D    CLK CLK(t-1) Q
0x0x0
1x0x1
0x110
1x111
x0100
x1101
Table [3.1 - "Truth table of a D-type flip-flop"]

3.2.6 Multiple configurations

Figure [3.3 - "Block diagram of an element"]

DRAMA supports the idea of multiple configurations per logic element. This conceptually allows areas of the logic array of indeterminate size and shape to instantly acquire different operational characteristics, under the control of the logic array itself.

Each logic element has a number of independent configurations available to it. Any one of these configurations may be active at a given time. Figure 3.3 shows the layout of an element, containing both the logic element and the corresponding configuration layer circuitry.

Each configuration is essentially a small area of RAM which has a special parallel interface to the logic element, allowing it to output its entire configuration information to the logic layer if it's selected. A more conventional address multiplexed interface allows the configuration circuitry to modify the state of each of the configurations.

3.2.6.1 Selecting a configuration

Configurations are selected by trigger inputs which are derived from the same interconnect interface which generates the logic elements' inputs. The configuration trigger inputs are positive edge triggered and as with other inputs they may be connected to the output of any element or an I/O pin.

Often reconfigurations will require fairly large-scale changes - a whole section of logic may be made to take a completely different function. This is achieved by having a large number of associated logic elements with configurations all triggered by the one state change. An ALU for example could flip between being an adder, comparator or a multiplier, all without using any more circuit space than the multiplier alone.

Alternatively small-scale changes may be all that is required. For example a circuit which requires selectable endian-ness can be implemented by selecting between one of two input configurations.

Figure 3.4 shows the circuit of a cell and its reconfiguration logic.

Figure [3.4 - "An element and its reconfiguration logic"]

3.2.7 External configuration interface

The configuration layer interfaces to a host system via an external configuration interface. Accessed just like a RAM, this interface allows fast bootstrap loading of the configuration information. In fact the configuration layer acts more like a dual-ported RAM - all the current configuration information is accessible to the logic layer simultaneous with independent access via the external interface.

3.2.7.1 Saving and restoring state

Not only does the external interface provide the capability to store configurations, it provides an interface to the state information of the logic array. This information consists of the outputs of each logic element, so the interface may be considered a ``test probe'' which allows examination of the internal state of the logic array by an external host. Debugging of a physical implementation would be greatly simplified by this feature.

This state information may also be written, allowing an external host to set up an initial state for a circuit before starting it. This is useful in itself, but as a side effect provides a powerful tool for context switching. Potentially a ``job'' executing on a reconfigurable machine could be allowed to execute for a period of time and then ``context switched'' by saving its configuration and state, and loading that of another job. At a later time the configuration could be restored and the original job could continue processing as if uninterrupted.

Note that this ability implies some method of ``stopping'' the circuit from running while its state is saved and restored. DRAMA provides no special facility for this - the circuit must provide some way of stopping clock signals or otherwise freezing its state while reconfiguration occurs. As an abstract architecture this is not an issue in any case - the simulator can easily ``stop time'' to modify the configuration if it wants to.

3.2.7.2 Self-reconfiguration

The reconfiguration logic also provides a self-reconfiguration interface. This allows the circuit itself to modify its own configuration, independent of the control of a host processor. This is not to be confused with the logic array's ability to flip between predefined configurations - self-reconfiguration permits changes to the actual configurations themselves, rather than just selecting between them.

At the simplest level, self-reconfiguration permits complex reconfigurations to be performed dynamically and without host intervention. An example of such an application would be a barrel shifter which allowed flexible routing of a set of bits, allowing any bit to be directed to any input. If the bit-select signals were used to choose between configurations the self-reconfiguration logic could modify the routing arrangement to perform the appropriate barrel shift. Following the reconfiguration the circuit could run at full speed without the overheads other methods would demand.

The self-reconfiguration interface is quite similar to that of the external configuration interface - configuration bits could be addressed individually and read or modified at will just like a RAM. At the time DRAMA was designed is wasn't clear whether this was the most efficient form of interface - self-reconfiguration was still very much an infant technique. As will be seen later Switchblade opted for a more advanced self-reconfiguration interface.

Self-reconfiguration makes it possible for an entire computer to be developed using reconfigurable logic without being dependent on a conventional host processor for reconfiguration functions. Self-reconfiguring computers are able to adapt their circuitry dynamically to the task at hand and are a rather different type of machine from conventional Von Neumann architectures.

3.3 Constraints

As a research tool, one of DRAMA's most useful features is its ability to use a set of constraints to govern its characteristics. In its pure form, DRAMA is very flexible and has very few static limits. Factors such as the number of inputs to a logic block or the ways in which interconnect can be routed are not limited in the ways that a physical implementation must necessarily be. A researcher investigating the effect of various limitations on these characteristics can easily specify constraints and perform simulations based on these new parameters. This can be used to fine-tune logic array architectures to certain applications or certain ranges of application.

Every commercial logic array is designed with a set of quite limiting constraints in mind. Much of the skill in designing circuits for these architectures involves making the circuit design match the capabilities and constraints of the target device. The DRAMA simulator allows quite complicated constraints to be specified, permitting it to closely emulate the characteristics of existing logic arrays or proposed ones. More complex constraints can be added by means of add-in code modules.

Some common constraints:

  • Logic blocks are placed in two-dimensional arrays
  • Interconnect is limited to a set number of lines in a given area
  • Interconnect can only be routed in certain ways
  • Long interconnect may incur large propagation delays
  • Edge effects exist at the periphery of the array
  • Not all logic blocks are identical

3.4 Traits

DRAMA is fundamentally an abstract architecture - there is no intention that it ever be implemented in silicon or that it should provide the most efficient design for a practical reconfigurable logic device. What is important is that it provides an easily abstracted design which exhibits the same essential properties as realistic logic arrays.

Ultimately, a reconfigurable logic array can be described as a collection of logic elements which may take different logic functions and may be connected together in different ways. The aim of DRAMA was to provide a useful abstraction without creating the complexity which makes practical designs so difficult to study.

3.4.1 Simulating reality

The DRAMA architecture is intended to provide a strict superset of the architectural features available in current-generation reconfigurable logic arrays. By constraining selected features of DRAMA in various ways it can be made to closely approximate existing architectures. The traits method provides a convenient way of characterising architectures.

An example of the application of real-world constraints can be seen in the design of commercial FPGAs. Most existing FPGA devices are organised as a two dimensional array of logic blocks interconnected with quite limited interconnect resources. The cost of routing signals over long distances is quite high and incurs considerable expense in the form of propagation delays. These types of constraints can optionally be added to a DRAMA simulation to determine their effect on the final circuit.

3.4.2 Traits

More important than DRAMA's ability to simulate existing architectures is its ability to provide similar characteristics to real architectures - yet without actually limiting circuits to the capabilities provided by specific chip manufacturers. The intent was to provide a general platform which would allow the study of reconfigurable logic independent of these restrictions. A researcher should be able to test new architectural ideas without their results being skewed by the design attributes of the particular logic array the experiments are carried out on.

The success of DRAMA in modelling realistic circuits is heavily dependent on it displaying similar characteristics to real devices in the way that circuits can be mapped on to it. This in turn depends heavily on many factors such as the type of circuit being simulated and the effectiveness of placement and routing software on each platform. For this purpose a set of traits are described which quantify the resource requirements of a circuit in some way. Essentially the traits of a circuit are a set of metrics which quantify in some way the ``important'' aspects of its resource usage.

Alternatively the ``traits'' of a logic array can be used to express what resources it provides. This can be used to compare how effectively two architectures will provide the resources required by a circuit. DRAMA uses traits as a convenient characterisation of a logic array, and converts a set of traits into a set of constraints to allow it to more accurately reflect a real or proposed architecture.

A few main factors are sufficient to get an overview of logic array resources - interconnect distance, congestion, functionality and placement.

3.4.2.1 Interconnect distance

One of the greatest limiting factors in current logic array architectures is the very scarce interconnect resource. Despite being such a ``scarce'' resource, the interconnect still consumes the lion's share of the die area. Clearly then it is of vital importance to use this resource as efficiently as possible, performing as much interconnection locally as possible.

As an indication of the interconnect resource usage of a circuit, graphs can be plotted of the number of interconnections used versus the length of each interconnection. One complication with using this measure is that a circuit must already have gone through a placement and routing process on a specific architecture before an accurate measurement may be made. Rather than restricting the measure in this way, a suite of programs was written which performed an approximation for this graph in combination with the ``placement'' trait.

The graph of interconnect distance for an architecture is based directly on the interconnect resources which the logic array provides. Some arrays provide a variety of interconnect resources of varying lengths, others rely on using the same type of interconnect resources to create connections of all lengths. An example graph is given in figure 3.5.

Figure [3.5 - "A sample interconnect graph"]

3.4.2.2 Congestion

As a circuit approaches the limits of resource availability in a device, the efficiency of its implementation degrades. This is due to a tradeoff being made of one resource for another to try to fit the circuit in the available space. An example of this sort of tradeoff is a situation where local interconnect becomes so scarce that a signal has to be routed through a logic block. This may leave the logic block unusable for any other purpose, possibly causing other distortions of the layout of the circuit and further inefficiencies.

A measure of this effect is to reduce the size of the available logic array by steps, taking note of what proportion of the array is in use. At some point the circuit will no longer be able to be implemented using such a small logic array. Plotting a graph of the size of the array against the percentage of free logic should show a fairly linear trend until the system starts to degrade, at which point the amount of spare logic will begin to fall rapidly until none is available. At this point the maximum number of usable logic elements may be read off the axis.

A ``perfect'' logic array shows no degradation due to congestion, demonstrating a linear reduction in available space until it is fully utilised. In general, a later-appearing and slower-increasing degradation curve is indicative of an architecture which better meets the circuit's resource requirements. A disadvantage of this measure is that the curves are heavily dependent on the type of circuit being used to fill the array, as well as the array itself. A suite of benchmarks using different circuits is useful here to obtain an ``average'' curve.

Note that the congestion curve is characteristic only of a logic array architecture - there is no equivalent curve for a circuit. Rather, a circuit has simply a single number - the number of circuit elements which it requires. The ability of a logic array to implement a circuit can be determined by looking up this value on a logic array's congestion graph. This is a fairly simplistic view but is adequate for the abstraction we require in DRAMA.

3.4.2.3 Functionality

Logic arrays differ greatly in the types of logic element which they provide. Some provide a large number of very simple elements, others provide a smaller number of more complex elements, and some provide a range of logic elements with varying capabilities. The aim of the logic blocks used in DRAMA was not to provide a device capable of doing everything that any other conceivable logic block could do, but rather to provide a versatile building block which could provide much the same facilities as most common logic blocks without building in too many architectural tradeoffs.

The measure used here is to classify the level of functionality in an architecture's logic blocks by the number of logic blocks required to perform common functions such as demultiplexing, addition, counting and latching.

Logic array type Functionality traits Placement trait
Placement trait 16-bit Adder 1:16 Multiplexer 16-bit Counter 16-bit Latch (dimensionality)
Xilinx59842.2
DRAMA16191616infinity

Table [3.2 - "Sample traits values"]

Table 3.3 shows some traits comparing DRAMA to Xilinx 4000-series parts. The Xilinx parts generally require fewer circuit elements (``CLBs'' in Xilinx terminology) than DRAMA as they have multiple outputs. One application of this sort of table is to provide a basis of conversion between architectures - given a knowledge of how many DRAMA circuit elements are devoted to particular tasks in a circuit you can make an estimate of the equivalent resources required in a different architecture.

Aggregate circuit elements provide another mechanism for equating the capabilities of different architectures. Higher order functions can be created from a number of the native circuit elements, and assigned a ``size'' value to define the comparative complexity of the aggregate element. As far as the circuit synthesis software is concerned, this ``size'' value is effectively a scaling factor on the size of the aggregate device's component elements. Architectures which provide complex functions such as RAMs in a single circuit element may require a large number of component elements, but will be scaled so the total size is no larger than a normal element.

3.4.2.4 Placement and dimensionality

Synthesis of a circuit on to a given logic array is a difficult task - so far it appears to be an np-hard problem. Some circuits are better adapted to layout on a normal two-dimensional logic array than others. As a simple yardstick of compatibility between a circuit and an architecture, a dimensionality measure is quite effective.

The definition of dimensionality used in this thesis is the number of dimensions of interconnect that the logic array must possess in order to easily model the circuit. For instance a circuit with a single input and output and some simple logic function in between would be one-dimensional, a circuit which can be drawn on a sheet of paper with no crossovers is two dimensional or less, and circuits which can't be drawn without many crossovers occurring probably have a dimensionality greater than two. If a circuit is implemented on an array with lower dimensionality than itself, the routing resources needed to interconnect the circuit elements will escalate quite heavily.

Not all areas of the circuit will have the same dimensionality. For the tests done so far an aggregate value was obtained by a weighted average function which favoured larger sections of circuit.

Programs to automatically determine the dimensionality of a circuit are far from trivial, but in practise some relatively simple algorithms suffice to make a first approximation for both the placement and interconnect distance traits.

In this case the method used to calculate dimensionality was based on a fairly simpleminded graph-theoretical placement algorithm. This used an A* optimisation algorithm to place each of the components of the circuit in an n-dimensional space while minimising the number of crossed interconnections. No particular effort was made to make the size or spacing of components in the model reflect reality as this placement algorithm was intended as part of a measure of the dimensionality of the circuit and not as a serious realistic placement algorithm.

The dimensionality measure uses the result of a number of runs of the placement algorithm with successively increasing dimensions until finally the circuit can be entirely placed in an n-dimensional space of the given order. At worst this order will be equal to the number of circuit components minus one in a fully-connected circuit where every component is wired to every other component. More commonly nearly all of the connections can be made in the very lowest dimensions, with only a few needing higher dimensions to be directly connected.

A special weighted average function is used to convert the percentage success results of all of these runs into a single overall dimensionality value. Before the averaging function may be used a correction factor must be applied to all the values to avoid run-on effects - a certain proportion of all higher-dimensional circuits will be able to be placed in lower dimensions. For instance a three dimensional grid will place all its connections in three dimensions, two thirds of them in two dimensions and a third in one dimension. To counteract this effect an algorithm is passed over the which starting with the highest dimension takes the percentage of connections which do not fit in that dimension, assumes that they are of the next higher dimension and reduces all the lower dimensions accordingly. Given an n-dimensional set of results R(d) each adjusted result A(d) is adjusted by -

The resulting adjusted results will generally have a lesser emphasis on the lower dimensions than the unadjusted results, though the nature of circuit dimensionality is to favour the lower dimensions in any case. The dimensionality is the weighted average obtained by normalising the adjusted results and applying a standard weighted average function using the standard formula -

3.4.3 A merge sort example

MergeSort(OrigList) =
    
    UnsortedList <- OrigList
    do
        UnsortedList1 <- UnsortedList.sublist(1, UnsortedList.length/2)
        UnsortedList2 <- UnsortedList.sublist(UnsortedList.length/2 + 1,
                                                    UnsortedList.length)
        SortedList <- ()
        PerfectOrder <- true

        while (not UnsortedList1.empty) or (not UnsortedList2.empty)
            if not UnsortedList2.empty and (UnsortedList1.empty or 
                                 UnsortedList1.first > UnsortedList2.first)
                SortedList <- SortedList + UnsortedList2.first
                UnsortedList2.remove(UnsortedList2.first)
            else
                SortedList <- SortedList + UnsortedList1.first
                UnsortedList1.remove(UnsortedList1.first)
            if SortedList.length > 1 and SortedList.secondlast >
                                                    SortedList.last
                PerfectOrder <- false

        UnsortedList <- SortedList
    while not PerfectOrder

    return SortedList
Figure [3.6 - A merge sort algorithm]

Figure [3.7 - "Block diagram of the merge circuit"]

Given in figure 3.7 is an algorithm for a merge sort, written in pseudocode. The inner loop of this algorithm was converted to circuit form, and a list of resource requirements gathered.

The algorithm describes an iterative merge sort. A list of items is split in two and a standard sorting merge is performed on it. This process repeats until all the items are in sorted order. The version of the algorithm as implemented in hardware uses four RAMs external to the logic array to store intermediate results - each pass of the merge function takes values from two RAMs and stores the merged results in the two other RAMs, sending the first half of the results to one RAM and the second half to the other.

After the pass described the circuit would ideally undergo a reconfiguration to swap the roles of the source and destination RAMs. However since the Xilinx parts don't provide quick reconfiguration this step was ignored for this experiment.

Functionality traits Placement trait
Demuxes Comparators 8-bit Counters 16-bit Latches Gates (dimensionality)
125262.1
Table [3.3 - "Trait values for ``Merge''"]

The circuit is shown in figure 3.7. A list of traits is given in table 3.3. Comparison of this table against tables for Xilinx parts shows that roughly 110 CLBs were required, so a 144-CLB Xilinx 4004A was sufficient to contain the circuit. When initialised with a corresponding set of traits, DRAMA placement and simulations of the merge sort gave quite similar characteristics of array usage. With improved tools and more accurate calculation of traits, more accurate results could probably be obtained.

3.5 Conclusion

The DRAMA architecture provides a way of performing experiments on reconfigurable logic without being tied down to the quirks of a specific hardware architecture. It provides a more general form of logic array than any current architecture, and also provides some architectural features which may prove important in the design of reconfigurable systems in the future. An example fast prime number generator simulated under DRAMA which uses these architectural features is described in in [SaleebaPose96] and [SaleebaPose96a].

Traits provide a convenient method of characterising both logic arrays and circuits. These metrics can be used for a variety of purposes, including comparison of logic arrays, matching of circuits to logic arrays, and customising DRAMA to behave like other forms of logic array - either real or proposed.

While DRAMA does not attempt to provide an environment for detailed simulations with nanosecond timing measurements, it does provide a convenient test-bed for circuit design which leaves the designer free of many of the implementation issues which plague practical designs. Most importantly, the results of any experiments performed using DRAMA are designed to reflect the effects of the variables on the circuit, rather than clouding the results with side-effects of the architecture.