This chapter describes some of the basic ideas behind the Switchblade architecture and then presents an abstract logic array architecture which is capable of testing these ideas. This abstract logic array architecture was the test bed for development of some of the architectural features around which this thesis is based. This DRAMA architecture (``Dynamically Reconfigurable Abstract Machine Architecture'') was first published in 1995 [Saleeba95].
At the time this initial research was carried out the field of configurable computing was young. Almost no research on run-time reconfiguration existed, the exception being a conference paper which related specifically to the Plessey ERA architecture [HastieCliff90]. This paper discussed run-time reconfiguration, partial configuration and the idea of the ``hardware subroutine''. Unfortunately the idea was before its time and the work went largely unnoticed.
In 1993 the preliminary work for this thesis resulted in a paper which discussed run-time reconfiguration, partial configuration, multiple configuration and self-reconfiguration [Saleeba93]. This was soon followed by a number of other papers by various researchers related to run-time reconfiguration.
Run-time reconfiguration implies an ability to change configuration during execution of a program. Early logic array processors used a fixed configuration for the duration of execution of a program. Run-time reconfiguration allows the configuration to change with the needs of the program over time [Foulk93] [Algotronix94].
This scheme provides better utilisation of the reconfigurable logic than systems which can only set the configuration once per program. An algorithm can use this feature to select what functions might be most useful to it at any given point during execution. This can be a big advantage in large programs which change computational characteristics frequently.
Special architectures are required to support run-time reconfiguration. Some method must be provided to quickly modify the contents of the configuration store, and various architectures to support this have been presented by Saleeba [Saleeba93], Vandenbout and Morris [VandenboutMorris92], and Athanas [Athanas92].
Partial reconfiguration enables a logic array to change only part of its area during reconfiguration. A system can use this to discard an old section of circuit and replace it with something more useful, while retaining most of the existing circuit. In most cases it will be possible for computation to proceed while reconfiguration takes place. More subtly, this can be used by the host to direct the mode of computation, for instance by modifying ``hardwired'' constants on the fly [Foulk93]. This approach can improve utilisation of array space.
Double-buffered reconfiguration is also made possible by the partial reconfiguration feature. While a program is running on one part of the array the system could anticipate that a new configuration will be needed soon and be preloading it while the existing configuration is still running. This can eliminate the effects of configuration loading time completely as long as the need for new configurations can be anticipated accurately and they can be loaded in quickly enough.
Multiple configurations take double-buffering a step further. The system has a set of configuration stores available to it, only one of which is controlling the configuration of the logic array at any given time. Rather than having an area of the array unusable while it is being configured, the system can have one configuration in operation while another is being configured. In fact if sufficient configurations are available, it may be possible to simply load up all the configurations initially, and then flip to each one as desired, eliminating the expense of reloading configurations altogether.
Logic arrays suffer from the problem that most of their die area is devoted to interconnect rather than logic function. This imbalance has led to designs where a relatively small number of quite powerful logic blocks sit in a sea of interconnect. Since the logic blocks use roughly one order of magnitude less die area than the interconnect it makes sense to increase the complexity of the logic blocks if there is any performance advantage in doing so. Duplicating the configuration store available in each logic block - perhaps several times - provides multiple configurations which can be quickly selected between. Multiple configuration stores greatly reduce the number of data transfers necessary to keep a logic array ``fed'' with configurations in a run-time reconfigurable system and allow much faster switching between configurations, reducing ``dead time'' while the system waits on a configuration becoming available.
The idea of multiple configuration stores in FPGAs was first developed as part of the early work for this thesis and was published first in [Saleeba93]. DeHon later based his DPGA architecture around this idea [Dehon94] [Tau95] and did an excellent study of the benefits of multiple configuration stores in various applications [Dehon96]. An earlier use of multiple configuration stores in a different context was described by Snyder [Snyder82]. Attempts to map a single algorithm on to multiple configurations have met with varying degrees of success. In some cases the algorithms simply aren't able to take any great advantage from this feature. In other cases the number of configuration stores was much higher than that which was optimal - such as VEGA [JonesLewis95]. Most systems which support multiple configuration stores also suffer from a lack of flexibility in the way that their multiple configurations can be switched - often the entire array must be switched simultaneously.
A step beyond the idea of run-time reconfiguration is ``self reconfiguration''. In a system with multiple configuration stores allowing the choice of configuration store to be controlled by the circuit itself is a simple form of self-reconfiguration. In many circuits much of the preconfigured logic spends most of its time waiting for the rare circumstance where it is needed. Self-reconfiguration can dramatically reduce this. Hardware triggers can be used to determine which configuration is expressed, allowing the circuit itself to instantly select between any of those available. Designing circuits which can use this ``bank-switched'' technique requires some unconventional programming techniques, but can yield large performance gains in some situations.
Another method of self-reconfiguration is to allow outputs of the logic array random access to the configuration store as if it were a random access memory. This allows it to rewrite its configuration in more detailed ways. Having the ability to reprogram itself in this way implies a high level of computational complexity in the logic array itself, but has the potential to not only increase the utilisation of conventional designs - it could allow complete machines to be built from dynamic logic alone. These machines would be quite different from conventional machines, and would require entirely new forms of programming language and operating system to fully exploit their potential. This is precisely the type of machine which this thesis describes.
In developing the architecture for a logic array processor, it was desirable to not only consider existing architectures but to consider the problem anew. An abstract architecture was envisaged which could be used as a vehicle for exploring new architectural ideas and testing them - independent of implementation issues. The architecture was implemented as a simulation and as each new architectural feature was added it was possible to test it against other versions of the architecture which lacked the feature. This kind of comparative study is normally difficult to perform since the individual quirks of a realistic hardware architecture often make it difficult to form an unbiased assessment of an idea.
The abstract logic array architecture can be used as a basis for experiments without imposing the practical limitations of a specific hardware architecture on the experimental results. It can also be used to demonstrate the effect that various types of hardware constraints will impose on a logic array architecture. In this way it provides a suitable platform for experiments not only in algorithms for execution on logic array processors, but also in the design of new logic arrays.
This work came about as the result of related work in dynamic reconfiguration [Saleeba93]. It proved quite difficult to make comparative performance studies between different logic array architectures because there was simply no reliable basis for comparison between the architectures. All the existing architectures were heavily constrained by various hardware limitations, and any modified version would be unable to use the same proprietary software tools available for the device. The DRAMA architecture, simulator and associated tools now fill this role, and the experiments resulting in the final logic array processor design are based around them.
Computing elements based on logic arrays are the result of gradual evolution from much simpler programmable logic devices. Simple fusible-link programmable logic gate devices such as PALs merged with the more dense mask-programmable gate arrays, giving birth to PGAs - programmable gate arrays. Later Xilinx Inc. produced the first commecial device capable of being programmed and reprogrammed in-circuit, the FPGA or ``field programmable gate array'' [Xilinx94].
While these devices were intended as a platform for rapid prototyping of circuits, it did not take long for people to realise that they had potential as computing elements in their own right [GrayKean89]. Early experiments in using FPGAs as computing elements showed that they had strong potential [BertinRoncin89] [GokhaleHolmes91]. In particular, certain classes of application which weren't well adapted to conventional computers achieved huge performance gains on logic array processors.
Currently most logic array processors operate as coprocessors to conventional host computers, or sometimes in a more tightly-coupled mode [Athanas92]. Pairing a logic array with a host allows the non-critical or overly complicated parts of an algorithm to be executed on the host, while the logic array can executed certain selected operations at high speed.
As work in the area has progressed, however, it has become increasingly obvious that devices intended as rapid prototyping circuit elements have architectural features at variance with the needs of a computing device. Slow reconfiguration times and harsh restrictions on the possible algorithmic complexity have limited their progress.
The idea of the abstract architecture was to generalise these features to the point where implementation specific design features could be ignored. In order to emulate the architectural and performance features of practical designs the abstract machine was based around the two essential features of programmable logic arrays:
In addition to these basic features, support was added for I/O, run-time reconfiguration, and self-reconfiguration:
The general layout is shown in figure 3.1.
The logic layer is composed of a number of ``logic elements'', each
of which performs a simple logic function. The function is
essentially a table lookup - an n-input element has
The number of inputs to each element may vary, and in fact the
architecture defines no upper limit on the number of inputs that an
element may have. This is consistent with the goal of making the
architecture as general as possible - no arbitrary static limits are
imposed on the abstract system. The same flexibility applies to the
total number of logic elements in the logic layer. This permits
circuits of any scale to be created. The DRAMA cell is shown in
figure 3.2.
Of course this type of unlimited resources is not practical in
real life - however for an experimental tool such as DRAMA it
is useful to be able to operate independent of such variables if
desired. This provides the flexibility to later isolate their
individual effect on the outcome of simulations.
One of the most compelling features of a system such as
DRAMA is that it allows experimentation to be more scientific.
Simulations based on more practical systems are fraught with problems
since the placement and routing of circuits on real gate arrays is
greatly influenced by the arbitrary features and limitations of the
hardware. These spurious factors often affect results to the extent
that the effect of the specific modification being tested may well be
obscured.
The ``island-style'' tradition of logic array design has tended to
pack increasing amounts of complexity into each logic element. For
example, Xilinx devices even allow the configuration store to be used
as a small RAM. DRAMA doesn't attempt to emulate any of these
- they are very much implementation-specific features. Instead, a
number of normal DRAMA circuit elements may be combined into
an aggregate circuit element for these special functions. The
placement properties of aggregate circuit elements can be adjusted
via a ``size'' parameter, allowing the placement and routing software
to more accurately emulate other architectures.
In most gate arrays the vast majority of die space is devoted to
interconnect. While much of this space is used for buses, quite a lot
of circuitry is also devoted to routing other signals. It is a simple
fact of life that reconfigurable interconnect is extremely expensive
of die space.
For this reason most reconfigurable logic arrays have great
restrictions on what interconnect resources are available. They also
frequently employ clever architectural features to try to gain as
much advantage as possible from the available resources. This may
make it possible to obtain better results from a given device,
but unfortunately it makes circuit synthesis a very complicated and
compute-intensive process indeed.
The interconnect model used in DRAMA is as simple as
possible. An input can come from any output - which may be either the
output of an array element or an I/O stimulus of some kind. This
totally bypasses the implementation-dependent issues involved in
circuit synthesis. Note that this also removes the need to regard the
device as having any particular physical form - it could be a
standard two-dimensional array or it might be something entirely
different.
Simulations which need to take into account routing issues can do so
by adding a cost function to each interconnection, where the cost is
a function of the source and destination connection locations. This
means that a physical layout and a routing scheme must be chosen. The
cost function is complicated by the need to take into account other
interconnections - for instance if eight connections already go
through an area with only eight connections available it is not
possible to add another. These types of issues are beyond the scope
of this thesis, and DRAMA itself takes no part in these issues.
Such constraints can be added externally if desired.
The logic described so far is purely combinatorial. To provide support
for circuits with state, inputs may be dependent on a previous state.
This is achieved by permitting inputs to take their value from an
earlier value of an output than the current one. In fact only one
previous value is available - the one determined directly prior to the
current one. How much earlier is not explicitly defined, except that
it is consistent across all the outputs and that it is the minimum
time quantum of the simulation.
In the implementation of DRAMA this meant taking the
current output values This simple mechanism allows all forms of clocked logic to be
created. By using a logic element's previous output as an input to
its own lookup table all the basic types of flip-flip may be
implemented.
Table 3.1
shows an example. More complex circuits can be derived from these
building blocks.
DRAMA supports the idea of multiple configurations per
logic element. This conceptually allows areas of the logic array of
indeterminate size and shape to instantly acquire different
operational characteristics, under the control of the logic array
itself.
Each logic element has a number of independent configurations
available to it. Any one of these configurations may be active at a
given time.
Figure 3.3
shows the layout of an
element, containing both the logic element and the corresponding
configuration layer circuitry.
Each configuration is essentially a small area of RAM which has a
special parallel interface to the logic element, allowing it to
output its entire configuration information to the logic layer if
it's selected. A more conventional address multiplexed interface
allows the configuration circuitry to modify the state of each of the
configurations.
Configurations are selected by trigger inputs which are derived
from the same interconnect interface which generates the logic
elements' inputs. The configuration trigger inputs are positive edge
triggered and as with other inputs they may be connected to the
output of any element or an I/O pin.
Often reconfigurations will require fairly large-scale changes
- a whole section of logic may be made to take a completely different
function. This is achieved by having a large number of associated
logic elements with configurations all triggered by the one state
change. An ALU for example could flip between being an adder,
comparator or a multiplier, all without using any more circuit space
than the multiplier alone.
Alternatively small-scale changes may be all that is required.
For example a circuit which requires selectable endian-ness can be
implemented by selecting between one of two input configurations.
Figure 3.4 shows the circuit of a
cell and its reconfiguration logic.
The configuration layer interfaces to a host system via an
external configuration interface. Accessed just like a RAM, this
interface allows fast bootstrap loading of the configuration
information. In fact the configuration layer acts more like a
dual-ported RAM - all the current configuration information is
accessible to the logic layer simultaneous with independent access
via the external interface.
Not only does the external interface provide the capability to
store configurations, it provides an interface to the state
information of the logic array. This information consists of the
outputs of each logic element, so the interface may be considered a
``test probe'' which allows examination of the internal state of the
logic array by an external host. Debugging of a physical
implementation would be greatly simplified by this feature.
This state information may also be written, allowing an external
host to set up an initial state for a circuit before starting it.
This is useful in itself, but as a side effect provides a powerful
tool for context switching. Potentially a ``job'' executing on a
reconfigurable machine could be allowed to execute for a period of
time and then ``context switched'' by saving its configuration and
state, and loading that of another job. At a later time the
configuration could be restored and the original job could continue
processing as if uninterrupted.
Note that this ability implies some method of ``stopping'' the
circuit from running while its state is saved and restored.
DRAMA provides no special facility for this - the circuit must
provide some way of stopping clock signals or otherwise freezing its
state while reconfiguration occurs. As an abstract architecture this
is not an issue in any case - the simulator can easily ``stop time''
to modify the configuration if it wants to.
The reconfiguration logic also provides a self-reconfiguration
interface. This allows the circuit itself to modify its own
configuration, independent of the control of a host processor. This is
not to be confused with the logic array's ability to flip between
predefined configurations - self-reconfiguration permits changes to
the actual configurations themselves, rather than just selecting
between them.
At the simplest level, self-reconfiguration permits complex
reconfigurations to be performed dynamically and without host
intervention. An example of such an application would be a barrel
shifter which allowed flexible routing of a set of bits, allowing any
bit to be directed to any input. If the bit-select signals were used
to choose between configurations the self-reconfiguration logic
could modify the routing arrangement to perform the appropriate
barrel shift. Following the reconfiguration the circuit could run at
full speed without the overheads other methods would demand.
The self-reconfiguration interface is quite similar to that of
the external configuration interface - configuration bits could be
addressed individually and read or modified at will just like a RAM.
At the time DRAMA was designed is wasn't clear whether this
was the most efficient form of interface - self-reconfiguration was
still very much an infant technique. As will be seen later
Switchblade opted for a more advanced self-reconfiguration
interface.
Self-reconfiguration makes it possible for an entire computer
to be developed using reconfigurable logic without being dependent
on a conventional host processor for reconfiguration functions.
Self-reconfiguring computers are able to adapt their circuitry
dynamically to the task at hand and are a rather different type of
machine from conventional Von Neumann architectures.
As a research tool, one of DRAMA's most useful features is
its ability to use a set of constraints to govern its
characteristics. In its pure form, DRAMA is very flexible and
has very few static limits. Factors such as the number of inputs to
a logic block or the ways in which interconnect can be routed are not
limited in the ways that a physical implementation must necessarily
be. A researcher investigating the effect of various limitations on
these characteristics can easily specify constraints and perform
simulations based on these new parameters. This can be used to
fine-tune logic array architectures to certain applications or
certain ranges of application.
Every commercial logic array is designed with a set of quite limiting
constraints in mind. Much of the skill in designing circuits for
these architectures involves making the circuit design match the
capabilities and constraints of the target device. The DRAMA
simulator allows quite complicated constraints to be specified,
permitting it to closely emulate the characteristics of existing
logic arrays or proposed ones. More complex constraints can be added
by means of add-in code modules.
Some common constraints:
DRAMA is fundamentally an abstract architecture - there is
no intention that it ever be implemented in silicon or that it should
provide the most efficient design for a practical reconfigurable
logic device. What is important is that it provides an
easily abstracted design which exhibits the same essential
properties as realistic logic arrays.
Ultimately, a reconfigurable logic array can be described as a
collection of logic elements which may take different logic functions
and may be connected together in different ways. The aim of
DRAMA was to provide a useful abstraction without creating the
complexity which makes practical designs so difficult to study.
The DRAMA architecture is intended to provide a strict
superset of the architectural features available in
current-generation reconfigurable logic arrays. By constraining
selected features of DRAMA in various ways it can be made to
closely approximate existing architectures. The traits
method provides a convenient way of characterising architectures.
An example of the application of real-world constraints can be
seen in the design of commercial FPGAs. Most existing FPGA devices
are organised as a two dimensional array of logic blocks
interconnected with quite limited interconnect resources. The cost of
routing signals over long distances is quite high and incurs
considerable expense in the form of propagation delays. These types of
constraints can optionally be added to a DRAMA simulation to
determine their effect on the final circuit.
More important than DRAMA's ability to simulate existing
architectures is its ability to provide similar characteristics to
real architectures - yet without actually limiting circuits to the
capabilities provided by specific chip manufacturers. The intent was
to provide a general platform which would allow the study of
reconfigurable logic independent of these restrictions. A researcher
should be able to test new architectural ideas without their results
being skewed by the design attributes of the particular logic array
the experiments are carried out on.
The success of DRAMA in modelling realistic circuits is heavily
dependent on it displaying similar characteristics to real devices in
the way that circuits can be mapped on to it. This in turn depends
heavily on many factors such as the type of circuit being simulated
and the effectiveness of placement and routing software on each
platform. For this purpose a set of traits are described which
quantify the resource requirements of a circuit in some
way. Essentially the traits of a circuit are a set of metrics which
quantify in some way the ``important'' aspects of its resource usage.
Alternatively the ``traits'' of a logic array can be used to
express what resources it provides. This can be used to compare
how effectively two architectures will provide the resources required
by a circuit. DRAMA uses traits as a convenient characterisation
of a logic array, and converts a set of traits into a set of
constraints to allow it to more accurately reflect a real or proposed
architecture.
A few main factors are sufficient to get an overview of
logic array resources - interconnect distance, congestion,
functionality and placement.
One of the greatest limiting factors in current logic array
architectures is the very scarce interconnect resource. Despite being
such a ``scarce'' resource, the interconnect still consumes the lion's
share of the die area. Clearly then it is of vital importance to use
this resource as efficiently as possible, performing as much
interconnection locally as possible.
As an indication of the interconnect resource usage of a circuit,
graphs can be plotted of the number of interconnections used versus
the length of each interconnection. One complication with using this
measure is that a circuit must already have gone through a placement
and routing process on a specific architecture before an accurate
measurement may be made. Rather than restricting the measure in this
way, a suite of programs was written which performed an
approximation for this graph in combination with the ``placement''
trait.
The graph of interconnect distance for an architecture is based
directly on the interconnect resources which the logic array provides.
Some arrays provide a variety of interconnect resources of varying
lengths, others rely on using the same type of interconnect resources
to create connections of all lengths. An example graph is given in
figure 3.5.
As a circuit approaches the limits of resource availability in a
device, the efficiency of its implementation degrades. This is due to
a tradeoff being made of one resource for another to try to fit the
circuit in the available space. An example of this sort of tradeoff
is a situation where local interconnect becomes so scarce that a
signal has to be routed through a logic block. This may leave the
logic block unusable for any other purpose, possibly causing other
distortions of the layout of the circuit and further inefficiencies.
A measure of this effect is to reduce the size of the available
logic array by steps, taking note of what proportion of the array is
in use. At some point the circuit will no longer be able to be
implemented using such a small logic array. Plotting a graph of the
size of the array against the percentage of free logic should show a
fairly linear trend until the system starts to degrade, at which
point the amount of spare logic will begin to fall rapidly until none
is available. At this point the maximum number of usable logic
elements may be read off the axis.
A ``perfect'' logic array shows no degradation due to congestion,
demonstrating a linear reduction in available space until it is fully
utilised. In general, a later-appearing and slower-increasing
degradation curve is indicative of an architecture which better meets
the circuit's resource requirements. A disadvantage of this measure is
that the curves are heavily dependent on the type of circuit being
used to fill the array, as well as the array itself. A suite of
benchmarks using different circuits is useful here to obtain an
``average'' curve.
Note that the congestion curve is characteristic only of a logic array
architecture - there is no equivalent curve for a circuit. Rather, a
circuit has simply a single number - the number of circuit elements
which it requires. The ability of a logic array to implement a circuit
can be determined by looking up this value on a logic array's
congestion graph. This is a fairly simplistic view but is adequate for
the abstraction we require in DRAMA.
Logic arrays differ greatly in the types of logic element which they
provide. Some provide a large number of very simple elements, others
provide a smaller number of more complex elements, and some provide a
range of logic elements with varying capabilities. The aim of the
logic blocks used in DRAMA was not to provide a device capable
of doing everything that any other conceivable logic block could do,
but rather to provide a versatile building block which could provide
much the same facilities as most common logic blocks without building
in too many architectural tradeoffs.
The measure used here is to classify the level of functionality in an
architecture's logic blocks by the number of logic blocks required to
perform common functions such as demultiplexing, addition, counting
and latching.
Table 3.3
shows some traits comparing
DRAMA to Xilinx 4000-series parts. The Xilinx parts generally
require fewer circuit elements (``CLBs'' in Xilinx terminology) than
DRAMA as they have multiple outputs. One application of this
sort of table is to provide a basis of conversion between
architectures - given a knowledge of how many DRAMA circuit
elements are devoted to particular tasks in a circuit you can make an
estimate of the equivalent resources required in a different
architecture.
Aggregate circuit elements provide another mechanism for equating
the capabilities of different architectures. Higher order functions
can be created from a number of the native circuit elements, and
assigned a ``size'' value to define the comparative complexity of the
aggregate element. As far as the circuit synthesis software is
concerned, this ``size'' value is effectively a scaling factor on the
size of the aggregate device's component elements. Architectures
which provide complex functions such as RAMs in a single circuit
element may require a large number of component elements, but will be
scaled so the total size is no larger than a normal element.
Synthesis of a circuit on to a given logic array is a difficult
task - so far it appears to be an np-hard problem. Some
circuits are better adapted to layout on a normal two-dimensional
logic array than others. As a simple yardstick of compatibility
between a circuit and an architecture, a dimensionality measure is
quite effective.
The definition of dimensionality used in this thesis is the
number of dimensions of interconnect that the logic array must
possess in order to easily model the circuit. For instance a circuit
with a single input and output and some simple logic function in
between would be one-dimensional, a circuit which can be drawn on a
sheet of paper with no crossovers is two dimensional or less, and
circuits which can't be drawn without many crossovers occurring
probably have a dimensionality greater than two. If a circuit is
implemented on an array with lower dimensionality than itself, the
routing resources needed to interconnect the circuit elements will
escalate quite heavily.
Not all areas of the circuit will have the same dimensionality. For
the tests done so far an aggregate value was obtained by a weighted
average function which favoured larger sections of circuit.
Programs to automatically determine the dimensionality of a circuit
are far from trivial, but in practise some relatively simple
algorithms suffice to make a first approximation for both the
placement and interconnect distance traits.
In this case the method used to calculate dimensionality was
based on a fairly simpleminded graph-theoretical placement algorithm.
This used an A* optimisation algorithm to place each of the
components of the circuit in an n-dimensional space while
minimising the number of crossed interconnections. No particular
effort was made to make the size or spacing of components in the
model reflect reality as this placement algorithm was intended as
part of a measure of the dimensionality of the circuit and not as a
serious realistic placement algorithm.
The dimensionality measure uses the result of a number of runs of
the placement algorithm with successively increasing dimensions until
finally the circuit can be entirely placed in an n-dimensional
space of the given order. At worst this order will be equal to the
number of circuit components minus one in a fully-connected circuit
where every component is wired to every other component. More
commonly nearly all of the connections can be made in the very lowest
dimensions, with only a few needing higher dimensions to be directly
connected.
A special weighted average function is used to convert the percentage
success results of all of these runs into a single overall dimensionality
value. Before the averaging function may be used a correction factor
must be applied to all the values to avoid run-on effects - a certain
proportion of all higher-dimensional circuits will be able to be placed
in lower dimensions. For instance a three dimensional grid will place
all its connections in three dimensions, two thirds of them in two
dimensions and a third in one dimension. To counteract this effect an
algorithm is passed over the which starting with the highest dimension
takes the percentage of connections which do not fit in that dimension,
assumes that they are of the next higher dimension and reduces all the
lower dimensions accordingly. Given an n-dimensional set of results
The resulting adjusted results will generally have a lesser emphasis on
the lower dimensions than the unadjusted results, though the nature of
circuit dimensionality is to favour the lower dimensions in any case.
The dimensionality is the weighted average obtained by normalising the
adjusted results and applying a standard weighted average function
using the standard formula -
Given in
figure 3.7
is an algorithm for a merge
sort, written in pseudocode. The inner loop of this
algorithm was converted to circuit form, and a list of resource
requirements gathered.
The algorithm describes an iterative merge sort. A list of items is
split in two and a standard sorting merge is performed on it. This
process repeats until all the items are in sorted order. The version
of the algorithm as implemented in hardware uses four RAMs external
to the logic array to store intermediate results - each pass of the
merge function takes values from two RAMs and stores the merged
results in the two other RAMs, sending the first half of the results
to one RAM and the second half to the other.
After the pass described the circuit would ideally undergo a
reconfiguration to swap the roles of the source and destination RAMs.
However since the Xilinx parts don't provide quick reconfiguration
this step was ignored for this experiment.
The circuit is shown in
figure 3.7. A list
of traits is given in
table 3.3.
Comparison of this table against tables for Xilinx parts shows that
roughly 110 CLBs were required, so a 144-CLB Xilinx 4004A was
sufficient to contain the circuit. When initialised with a
corresponding set of traits, DRAMA placement and simulations of
the merge sort gave quite similar characteristics of array usage.
With improved tools and more accurate calculation of traits, more
accurate results could probably be obtained.
The DRAMA architecture provides a way of performing experiments
on reconfigurable logic without being tied down to the quirks of a
specific hardware architecture. It provides a more general form of
logic array than any current architecture, and also provides some
architectural features which may prove important in the design of
reconfigurable systems in the future. An example fast prime number
generator simulated under DRAMA which uses these architectural
features is described in
in [SaleebaPose96]
and
[SaleebaPose96a].
Traits provide a convenient method of characterising
both logic arrays and circuits. These metrics can be used for a
variety of purposes, including comparison of logic arrays, matching
of circuits to logic arrays, and customising DRAMA to behave
like other forms of logic array - either real or proposed.
While DRAMA does not attempt to provide an environment for
detailed simulations with nanosecond timing measurements, it does
provide a convenient test-bed for circuit design which leaves the
designer free of many of the implementation issues which plague
practical designs. Most importantly, the results of any experiments
performed using DRAMA are designed to reflect the effects of the
variables on the circuit, rather than clouding the results with
side-effects of the architecture.
3.2.4 Interconnect
3.2.5 Sequential logic
3.2.6 Multiple configurations
3.2.6.1 Selecting a configuration
3.2.7 External configuration interface
3.2.7.1 Saving and restoring state
3.2.7.2 Self-reconfiguration
3.3 Constraints
3.4 Traits
3.4.1 Simulating reality
3.4.2 Traits
3.4.2.1 Interconnect distance
3.4.2.2 Congestion
3.4.2.3 Functionality
Logic array type
Functionality traits
Placement trait
Placement trait
16-bit Adder
1:16 Multiplexer
16-bit Counter
16-bit Latch
(dimensionality)
Xilinx 5 9 8 4 2.2 DRAMA 16 19 16 16 infinity 3.4.2.4 Placement and dimensionality


3.4.3 A merge sort example
MergeSort(OrigList) =
UnsortedList <- OrigList
do
UnsortedList1 <- UnsortedList.sublist(1, UnsortedList.length/2)
UnsortedList2 <- UnsortedList.sublist(UnsortedList.length/2 + 1,
UnsortedList.length)
SortedList <- ()
PerfectOrder <- true
while (not UnsortedList1.empty) or (not UnsortedList2.empty)
if not UnsortedList2.empty and (UnsortedList1.empty or
UnsortedList1.first > UnsortedList2.first)
SortedList <- SortedList + UnsortedList2.first
UnsortedList2.remove(UnsortedList2.first)
else
SortedList <- SortedList + UnsortedList1.first
UnsortedList1.remove(UnsortedList1.first)
if SortedList.length > 1 and SortedList.secondlast >
SortedList.last
PerfectOrder <- false
UnsortedList <- SortedList
while not PerfectOrder
return SortedList
3.5 Conclusion