1980

Control allocation: the automated design of Digital Controllers

Richard J. Cloutier
Carnegie Mellon University

DRC-01-05-80

M.S. Project Report

Electrical Engineering Department
Carnegie-Mellon University
18 April 1980

This research has been supported by the United States Army Research Office, under grants DAAG29-79-C-0197 and DAAG29-78-G-0070, and the Department of Electrical Engineering, Carnegie-Mellon University.

Acknowledgements

I would like to thank my advisor, Dr. Alice Parker, for providing me with the opportunity to work on a very interesting project. Her in-depth discussions about the problems involved and her supportive nature have been invaluable. I would also like to thank Andy Nagle for his assistance, especially in the representation of the problems inherent to control allocation.

I would also like to express my thanks to Lou Hafer and Gary Leive who have supplied both detailed information and software support without which this project would have been impossible. In addition, I thank the members of the CMU-DA group who have attentively listened to presentations of my work and have commented on it or have discussed it with me. They have provided me with objective views of my methods, which have pointed out deficiencies that I would never have seen without their help.
# Table of Contents

1. Introduction
   1.1 Definition of Terms
   1.2 The Basic Requirements
   1.3 The Solution

2. Control Allocation in Relation to Other Design Tasks
   2.1 The CMU-DA project

3. The Input Requirements for the CMU-DA Control Allocator

4. How Control Information is Stored in the Module Database
   4.1 Assumptions Made About the Controller
   4.2 An Example

5. The Steps of Control Allocation
   5.1 Conversion of Data Operations Into Micro-ops
   5.2 The Conversion of Control Operations Into Micro-ops
   5.3 Micro-cycle Time Evaluation
   5.4 Control Graph Generation
      5.4.1 Application of the potential parallelism rule
      5.4.2 The Control Graph Model
   5.5 Micro-instruction Definition and Micro-word Formatting
   5.6 Control Signal Conditioning and Micro-word Representation

6. Results

7. Suggested Improvements for the Automated Control Allocator

8. Conclusions

I. Appendix 1: The PDP-11/40 example

1. Introduction

Digital system design has normally been separated into two main tasks, data path design and control design. During the data path design the architecture of the system takes shape while during the controller design the sequencing and synchronization details are fixed. Control allocation is the process of specifying a controller which will be able to drive the data path design in some specified manner. An automated control allocator will take a description of some digital hardware data path\(^1\) and a procedural description of the desired behavior (the micro-sequence) and produce a description of an engine which will evoke the data path devices in the order specified by the micro-sequence "program". The resulting controller will be a dedicated digital system with its own memory and I/O and have its own specific timing requirements.

Control allocation is a many-faceted problem. Initially the process may be viewed as a hardware design. The functions and interconnection of the digital devices which will generate the sequence of evoke signals must be specified. The control allocation problem is also a problem of autonomous control, since the controller must be able to generate all of the internal control signals which it requires. The system clock signal is an example of such a control signal, since it is used to change the state of the controller.

If the controller being designed is of the microprogrammed variety then the control allocation process must also consider the code generation problem. The micro-instructions which will be placed in the micro-rom must first be compiled from the micro-sequence program. To reduce the cost of the controller the control allocator should also consider the problems involved with bit packing of the micro-words. If the micro-word may be reduced in width by a single bit then the microprogram storage requirements will be reduced by as many bits as there are micro-words, a substantial savings in most cases.
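
The storage arithmetic behind this claim can be sketched as follows. The function name and the example sizes (256 micro-words, 48 versus 47 bits) are illustrative assumptions and are not figures taken from this report.

```python
def microstore_bits(num_words: int, word_width: int) -> int:
    """Total microprogram storage in bits: one word_width-bit entry per micro-word."""
    return num_words * word_width


# Hypothetical controller: 256 micro-words, each 48 bits wide.
before = microstore_bits(256, 48)
# Packing the micro-word one bit narrower removes one bit from every micro-word.
after = microstore_bits(256, 47)
saved = before - after
print(saved)  # 256, i.e. as many bits as there are micro-words
```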

Perhaps the most interesting problem of control allocation is the evaluation of potential parallelism. The first aspect of this problem is hardware independence. This is the determination of which basic operations (micro-ops) in the data path machine may be done in parallel because the sets of hardware devices that each requires are independent. The second aspect is that of data independence in the micro-sequence program. Two micro-sequence steps are data independent when the results of one operation do not depend upon the results of the other.
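
A minimal sketch of the two independence tests is given below. The `Op` record, its field names, and the register names are hypothetical and chosen only for illustration. Treating two writes to the same register as a data conflict is an added assumption that the text above does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical record for one micro-op (field names are illustrative)."""
    name: str
    devices: frozenset  # hardware devices the micro-op occupies
    reads: frozenset    # registers whose values it uses
    writes: frozenset   # registers whose values it changes


def hardware_independent(a: Op, b: Op) -> bool:
    """True when the two micro-ops need disjoint sets of devices."""
    return a.devices.isdisjoint(b.devices)


def data_independent(a: Op, b: Op) -> bool:
    """True when neither micro-op uses or overwrites a result of the other."""
    return (a.writes.isdisjoint(b.reads)
            and b.writes.isdisjoint(a.reads)
            and a.writes.isdisjoint(b.writes))  # assumption: no double writes


def may_run_in_parallel(a: Op, b: Op) -> bool:
    """Both aspects of potential parallelism must hold."""
    return hardware_independent(a, b) and data_independent(a, b)


add = Op("R1 <- R2 + R3", frozenset({"R1", "R2", "R3", "ADDER"}),
         frozenset({"R2", "R3"}), frozenset({"R1"}))
move = Op("R4 <- R5", frozenset({"R4", "R5"}),
          frozenset({"R5"}), frozenset({"R4"}))
use = Op("R6 <- R1", frozenset({"R6", "R1"}),
         frozenset({"R1"}), frozenset({"R6"}))

print(may_run_in_parallel(add, move))  # True: no shared device, no shared data
print(may_run_in_parallel(add, use))   # False: R1 is shared and is a result of add
```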

In addition to these major design problems there are also some intriguing implementation problems to be solved, such as how to correlate different levels of description. If the control sequence is described at a more abstract level than the devices to be controlled, then each of the control sequence steps must be expanded into a more detailed version which is compatible. There is also the problem of the controller having its own data paths (with the control signals as data) which must be controlled by itself. The question arises: which comes first, the controller data part or the controller control part? The allocator must know the details of the data part before it may define the control signals, but it must know the signal requirements before it may define the data paths fully. In fact, the optimal solution would require simultaneous solutions to both aspects of the problem.

\(^{1}\)For a definition of this term and others see section 1.1.

Control allocation is a complete digital design problem, from the basics of hardware interconnections to the details of an optimizing compiler.

1.1 Definition of Terms

Control Points or Control Lines: The inputs of any device which are not defined as data inputs. These lines are able to select a function or evoke an operation in the device. Examples are lines such as clock, load, select, r/w, clear.

Control Signals: The values which must be placed on control points to cause a particular action to be performed. Associated with each device primitive is a control signal for each control line on the device.

Data Path: A representation of a collection of devices and the interconnections between them.

Data Path Graph: The name for the data path representation in the CMU-DA system. This representation allows for the nodes: register, operator, multiplexer, constant, concatenation, and link. All of the interconnection information is stored in the links and concatenation nodes.

Device: Any collection of hardware elements for which an operation may be defined. Examples are: transistor, AND gate, flip-flop, register, microprocessor. Some device primitives are (respectively): on/off, and, set, load, run/halt/restart.

Device Primitive: The simplest or most basic operation(s) which a device may perform. For example, the simplest type of operation for a register is a LOAD, whereas the flip-flops which make up the register may be SET but may not be turned ON or OFF. The transistors inside the flip-flop may only be ON or OFF.

Evoke Control Point: A control input which causes a change in the data stored in a device. The clock, load, and clear lines of most devices are in this category.

Micro-controller: The digital subsystem which produces the control signals for driving the data path.

Micro-instruction: The name given to a single word stored in a microprogram rom. It is usually equivalent to the micro-step.

Micro-operation or Micro-op: A collection of device primitives which must be done during a single clock cycle.

The micro-op model is defined as:

\[
\text{micro-operation} = \langle \text{device list, operation time} \rangle
\]

An entry in the device list contains:

\[
\text{device list element} = \langle \text{name of device, source or destination flags, control signals for this device, pointer to the next device list element or zero} \rangle
\]

\[
\text{operation time} = \langle \text{the time required for this micro-operation to be completed} \rangle
\]

(Usually the longest propagation delay from one of the sources to the destination device(s).)

Microprogram Rom: The device (memory) of the micro-controller which contains the micro-instructions. Also called the MCROM or the micro storage memory.

Micro Sequence Table: The part of the CMU-DA design description which contains the micro-sequence program. This table is made up of register transfer instructions and control flow directives.

Micro-step: A collection of micro-ops which will be done during the same major clock cycle. Micro-ops are allowed in the same micro step only when there are no device or data conflicts between them.

Register-transfer Operation: An abstract description of the transfer of a value stored in a register to another register through an optional operator. A register transfer may also describe a transformation made upon the data in situ, such as a shift in a shift register.

Select Control Point: A control input which does not of itself change the value stored in the device. Function selects and output enable lines are of this type.
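
The micro-op model defined above can be sketched as a small linked record. The class and field names, the control-signal strings, and the delay values below are hypothetical. Nanoseconds are assumed as the time unit, and `None` stands in for the "zero" pointer that ends the device list.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DeviceListElement:
    """One entry of the device list of a micro-operation."""
    name: str                     # name of device
    is_source: bool               # source flag
    is_destination: bool          # destination flag
    control_signals: str          # control signals for this device
    next_element: Optional["DeviceListElement"] = None  # None = end of list


@dataclass
class MicroOperation:
    """micro-operation = <device list, operation time>."""
    device_list: Optional[DeviceListElement]
    operation_time: int           # assumed to be in nanoseconds


def device_names(op: MicroOperation) -> List[str]:
    """Walk the linked device list and collect the device names in order."""
    names = []
    node = op.device_list
    while node is not None:
        names.append(node.name)
        node = node.next_element
    return names


def longest_delay(path_delays: List[int]) -> int:
    """Operation time taken as the longest source-to-destination delay."""
    return max(path_delays)


# Hypothetical micro-op R1 <- R2 + (constant), routed through an adder.
src = DeviceListElement("R2", True, False, "X")
alu = DeviceListElement("ADDER", False, False, "S", next_element=src)
dst = DeviceListElement("R1", False, True, "P", next_element=alu)
op = MicroOperation(device_list=dst, operation_time=longest_delay([35, 50]))

print(device_names(op))   # ['R1', 'ADDER', 'R2']
print(op.operation_time)  # 50
```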

1.2 The Basic Requirements

In order to understand how a control allocator works one must first understand what it has to work with. There are three basic requirements and each must be available in order for the allocation process to operate as an independent process.

The first requirement is a description of each of the devices which will be used to implement the digital system. This description should contain a name for the particular device, the operations which the device may perform, and a concise description of how the particular device may be controlled for each operation. In addition to the devices themselves, the interconnections between them in the digital system are important and must be included in the basic requirements.

The second main requirement for any control allocator is an abstract description of the control sequence. A control sequence is a representation of the proper order of events which should occur in the digital system and may be represented at any level of detail, analogous to the description of the devices. This sequence may be represented in a high level programming language or it may be expressed in minute detail by specifying the actual bit patterns which should be used to drive the control points of the data path devices.

The third and final input requirement for a control allocator is a list of the significant constraints on the design. These may be parameters such as cost and speed or they may be instructions as to which types of controllers should be considered. Each of these restricts the design space and is helpful to the control allocator in its search for a suitable controller.

With these requirements fulfilled the control allocator should be able to define a useful controller.

1.3 The Solution

The micro-sequence program is the heart of the control allocation process. Each instruction will be mapped into one or more basic data path operations (device primitives). Combinations of the device primitives will be used to define micro-operations. Once this has been done for each micro-sequence step, there will exist a micro-operation program at a level of detail sufficient for the completion of the control allocation process. The problem will then be one of determining the form of the controller which should be used. The control allocator will select a particular controller configuration by using parameters measured from the microprogram. Once selected, the controller is entered into the data path graph description. The control allocator then evaluates the potential parallelism in the micro-operation program. This is determined from the data and hardware dependencies of successive micro-ops. The micro-operations are then assigned to micro-steps. This assignment is a major problem in itself, and the procedure used for this project is a heuristic method which attempts to constrain the micro-word width to a specified value while trying to reduce the execution time of the microprogram to a minimum. After the micro-steps are assigned, the size requirements for the micro-storage are known and this information is placed in the data path graph description. The final step of the control allocation process is the conversion of the micro-operations into bit patterns which will be stored in the microprogram rom. This step is fairly simple and is similar to the assembling of a machine language program.
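
To make the trade-off between micro-word width and number of micro-steps concrete, the sketch below packs micro-ops into micro-steps with a simple in-order first-fit pass. This is not the heuristic used in this project, which is not detailed in this section. The function name, the per-op bit counts, and the representation of conflicts as unordered pairs are all assumptions for illustration.

```python
def assign_micro_steps(ops, max_width, conflicts):
    """Pack micro-ops, in program order, into micro-steps.

    ops       -- list of (name, control_bits) pairs (hypothetical bit counts)
    max_width -- micro-word width limit in bits
    conflicts -- set of frozenset({a, b}) pairs that may not share a step
    """
    steps = []
    current, used = [], 0
    for name, bits in ops:
        if bits > max_width:
            raise ValueError(f"{name} needs {bits} bits, limit is {max_width}")
        clash = any(frozenset((name, other)) in conflicts for other in current)
        # Start a new micro-step on a device/data conflict or a width overflow.
        if current and (clash or used + bits > max_width):
            steps.append(current)
            current, used = [], 0
        current.append(name)
        used += bits
    if current:
        steps.append(current)
    return steps


ops = [("A", 4), ("B", 3), ("C", 5), ("D", 2)]
print(assign_micro_steps(ops, 8, set()))   # [['A', 'B'], ['C', 'D']]
print(assign_micro_steps(ops, 14, set()))  # [['A', 'B', 'C', 'D']]
print(assign_micro_steps(ops, 14, {frozenset(("B", "C"))}))
# [['A', 'B'], ['C', 'D']]
```

In this toy case the wider word needs one micro-step instead of two, unless a conflict between B and C forces the split anyway.
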
2. Control Allocation in Relation to Other Design Tasks

Control allocation is only a single step of the complete digital design process. It is not necessarily independent of all of the other steps and in most hand designed projects the controller design is considered during the time that the data paths are being designed. This overlap allows for optimizations in the controller which will reduce the cost or increase the speed of the machine.

2.1 The CMU-DA project

The CMU-DA project is a top-down type of design system. It consists of programs which map the design descriptions from an abstract level to a more detailed level during each step. The initial description of the digital system is in the form of an ISPS procedure. This is converted and modified by successive programs until it is detailed enough for construction.

The first step is the Value Trace process. In this step the ISP description is converted into a new language which represents the data flow and control flow of the design in a graphical form. This graph allows the VT program to recognize data and control dependencies which will allow for transforms on the design. Some possible transforms include ones which eliminate unnecessary computations, such as the recalculation of values which were previously evaluated and could have been saved. A more advanced transform is similar to the code motion technique used in optimizing compilers. The movement of common operations out of all the branches of a select statement and the removal of invariant computations from loops are two such transforms [McFa 78].

The second step is the Design Style Selector, which determines which type of design the ISPS description is most similar to, and should be implemented with. The effect of this step is to select which particular data path allocator should be used. The third step, Data Path Allocation, is where the data path devices and their interconnections are selected. This allocator uses generic types of devices as building blocks and assumes that the technology which will be used to implement the design can be used to construct these blocks. The following step is Module Binding, during which the devices in the data path are assigned to specific real devices which are available in the design technology. If a device is not available to perform a requested operation, then the module binder transforms the data path graph and the micro-sequence table in such a way that the assumed operation may be performed with available devices.

The Control Allocation step is next, and here additions are made to the path graph to include the controller hardware. The micro-sequence program which was generated by the Data Path allocator is compiled into a microprogram and the program is added to the description of the design. The final steps of layout and construction take the description and modify it in any way which is required for construction. For a detailed discussion of the CMU-DA system see [McFa 78].
3. The Input Requirements for the CMU-DA Control Allocator

In the CMU-DA project the main requirements for control allocation are satisfied by the combination of three sources. The first is the design description which contains the data path graph and the micro-sequence table. The second is the Module data base which contains control information about all the devices used. The third input is from the user, who selects specific parameters during the control allocation process.

The data path graph contains a list of the devices which are required by the design at its current stage of completion. The data path graph also contains information about how these devices are interconnected. Associated with each device is the name of a module which will perform the specified functions. The data path graph has been generated by the Data Path allocator and the module information has been added by the Module Binder. A description of each module is contained in a database called the Module Database. The information concerning how devices may be controlled to perform particular operations is also contained in this database. The combination of the Module Database and the data path graph is sufficient for the first main requirement for control allocation.

Included in the same file as the data path graph is the micro-sequence table. This is the abstract description of the control sequence which is required by the control allocator. The micro-sequence table is a series of register transfer instructions and control flow directives. Each of the register transfer instructions specifically notes which devices of the data path graph should be used for the operation. If the micro-sequence program had not been bound to the path graph in this way then the control allocation problem would have been much more difficult because it would be necessary to associate the program operations with actual hardware devices. This type of conversion and the associated register allocation problems have been considered by DeWitt and Mallett. DeWitt defines the process of register allocation as the procedure necessary to determine which hardware register should be assigned to contain a program variable and when this assignment should be changed to accommodate a new program variable [DeWi 78]. He also deals with the problem of processor allocation used to determine which operator should be used for a particular instruction. DeWitt shows that this problem is NP complete [DeWi 76]. Mallett has also considered the problem of micro-word compaction and states:

"A high-level language to microcode translator cannot afford the time to exhaustively improve the object code for every moderately sized program"

Mallett also presents a heuristic method which seems to compact nearly optimally for a linear segment of microcode, in a predefined digital system [Mall 78]. These heuristic methods include ones which
direct a search through a branch and bound type graph and a method of early termination of search
down branches of this graph. For the current control allocation project however the controller has
not yet been defined and Mallett's procedure may not be used.

The micro-sequence program also includes the control directives in the form of ifs, selects, and
joins. Each of these evokes only devices in the controller, and the specific action of each will be known
by the control allocator, so there is no register allocation problem here. Thus the micro-sequence
table is sufficient to fulfill the second major requirement of control allocation.

The constraints for the design, which are the third major requirement of control allocation, are either
built into the control allocator itself or specified by the user when running the allocator.
While there are multiple controller schemes, such as asynchronous operation and distributed or
residual control, only the microprogrammed variety is considered in this project. The bitwidth of the
micro-word, which is related to the cost of the controller, is specified by the user at runtime. The
execution time of the microprogram is inversely related to this bitwidth: wide words are able to
control more data path devices at once, and thus more concurrent operations are allowed.
4. How Control Information is Stored in the Module Database

The control information which is being considered here and that which has been placed in the database is only a subset of that found in an ordinary data book. It is only concerned with the operations which will be requested by the register transfer operations and in addition the information has been conditioned by the type of control structure which will be used by the controller.

4.1 Assumptions Made About the Controller

There have been several assumptions made about the operation of the controller before any of the control data was entered in the database. These assumptions were made for two reasons: to save processing time during the control allocation step and to simplify the process of defining the control signals. The controller model which is used assumes that a sequence of signals will be generated. The select signals will become valid first and then the evoke signals will occur. After the evoke signals have returned to their initial state the select signals will become invalid. Allowing for only one type of sequence simplifies the control allocator's job. If arbitrary types of sequences were allowed then the control allocator would have to map each type into the sequence which the controller could generate. Since this mapping would have to occur once for each bit stored in the microprogram it was decided that the person defining the control signals would store the standard sequence in the database and save a substantial amount of processing time. With only a standard sequence allowed the process of entering the information in the database should also be simpler for the user.

The first assumption made about the controller is that there will be a two phase clock system in the controller. The general form of these signals is shown in Figure 4-1. The pulse width of the clock signals will be narrow enough so that both the rising and falling edges may be used for evoking actions. The clock cycle time and the relative phase of the two clock signals will be determined by the control allocator during the generation of the micro-code.

Figure 4-1: The micro-clock signals. The select signals are valid from the start of a phase 1 cycle to the end of that phase. The evoke signals occur during the phase 2 clock pulse.

A second assumption made is that each control point in the database will be classified as either a select or evoke line. The signals which drive select inputs will be wired directly from the micro-controller and their values will be held at the value specified in the database for a complete clock cycle. Control points which are designated as evoke points will have their control signals conditioned by the phase 2 clock signal. This allows for four types of evoke signals: rising edge, falling edge, positive pulse, and negative pulse. To do this conditioning, the signal from the microprogram word will either be NANDed or ANDed with phase 2 of the system clock. The micro-word value will be determined by the control allocator so that the proper evoke signal will be generated. The use of either NAND or AND will also be determined by the control allocator by examining the non-active state specified in the database for the evoke control point. After the operation is evoked the value at the control point will return to its non-evoke value and then the select control lines will be set to the value required for the next operation.
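
The conditioning of an evoke signal can be sketched as below. The mapping from non-evoke state to gate type (AND for a low non-evoke state, NAND for a high one) is an assumption inferred from the requirement that the line rest at its non-evoke value. The function name and the three-sample clock waveform are likewise illustrative.

```python
def condition_evoke(bit: int, phase2: list, nonevoke: str) -> list:
    """Combine one micro-word bit with sampled phase 2 clock values.

    nonevoke 'L' -> AND gate: the line rests low and pulses high when bit is 1.
    nonevoke 'H' -> NAND gate: the line rests high and pulses low when bit is 1.
    (Gate choice per non-evoke state is an assumption of this sketch.)
    """
    if nonevoke == "L":
        return [bit & clk for clk in phase2]
    if nonevoke == "H":
        return [1 - (bit & clk) for clk in phase2]
    raise ValueError("non-evoke state of an evoke line must be 'H' or 'L'")


phase2 = [0, 1, 0]  # one phase 2 pulse: low, high, low

print(condition_evoke(1, phase2, "L"))  # [0, 1, 0]  positive pulse
print(condition_evoke(0, phase2, "L"))  # [0, 0, 0]  stays at non-evoke low
print(condition_evoke(1, phase2, "H"))  # [1, 0, 1]  negative pulse
print(condition_evoke(0, phase2, "H"))  # [1, 1, 1]  stays at non-evoke high
```
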
4.2 An Example

For a more detailed explanation of what the database contains, consider this example of one of the entries. A partial listing of the SN74161 entry is:

ID: SN74161
LINE -14- -CTLLINES- -CTLNAME-
          5
  ID: CTLNAME.1
    -PINNAME- -PINTYPE- -NONEVOKE- -SUBMODNO-
    CLOCK     1         H          1
  ID: CTLNAME.2
    -PINNAME- -PINTYPE- -NONEVOKE- -SUBMODNO-
    CLEAR     1         H          1
  ID: CTLNAME.3
    -PINNAME- -PINTYPE- -NONEVOKE- -SUBMODNO-
    LOAD      1         H          1
  ID: CTLNAME.4
    -PINNAME- -PINTYPE- -NONEVOKE- -SUBMODNO-
    ENBP      0         X          1
  ...
LINE -14- -CTLESEQ-
  ...

In this example only line 14 is shown; however, in the database there are lines 0 through 13, which contain other information about this particular device. See Leive's report on the module database for more information about lines 0 through 13 [Leiv 79].

Line 14 contains the information required to specify control for the device. The first entry, CTLLINES, is the number of control lines for each device; in this case it is five. Note that this is the number of lines per device and not the number of lines per package. In the SN7474 there are two devices per package with six control lines total but only three CTLLINES per device. If CTLLINES is zero then there are no control lines for the device and it requires no control inputs to perform its function. The SN7400 (NAND) is such a device.

The second entry, CTLNAME, is a list of characteristics of each control line. There are four traits for each line and the first, PINNAME, is a character string which names the control point on the device. The second trait, PINTYPE, is either 1 or 0. A 1 signifies that the particular control point is an evoke input, a 0 indicates it is a select input. The third trait is NONEVOKE, which is a single character indicating the non-evoke state for a particular line. The three valid values are H (high), L (low), and X (don't care). The NONEVOKE has an obvious meaning for evoke lines, but if a particular device requires some sort of setup sequence then the value of NONEVOKE for a select line could be something other than the expected X. The final trait, SUBMODNO, is the submodule number of the control point. A submodule is defined as the set of control points which are required to cause a particular operation to be performed.

The third section of line 14 is a list of the operations which may be performed with the particular module. This has the heading of CTLESEQ. In the above example the module may perform the operations INC, LOAD, READ, and CLEAR. The functions of INC, LOAD, and CLEAR should be obvious. The READ is included so that any micro-operation which requires this module as an input will be able to find out if any control lines need to be set in order to read the contents of the register. In this case there are no output enable lines, so there is no ESEQ (Evoke SEQuence) for a READ. Some other devices might have such a line which must be controlled. To further explain the CTLESEQ, consider the INC operation. The EVOKLINE is a number which indicates which of the EVALs (Evoke VALues) is the evoke control point for this particular operation. The EVOKSTEP is the number of the ESEQ during which the operation is performed. If a particular operation requires either an intricate setup or hold sequence on the control lines, then it may require more than a single ESEQ (control step), and EVOKSTEP should indicate in which step the evoke is actually performed. One such example might be the multiplication of two numbers by a special function unit which requires the following steps: load num1 in register A, load num2 in register B, start the multiplication, get the result. Such an example would have an ID of MULT and would require four ESEQ steps.

The SUBMOD is the submodule number which is used by this operation. If there is more than one submodule in this device then for the current operation (INC) there would be fewer control values listed than the number of control lines for the device. MAXTIME is the time in nanoseconds required for this module to perform the current operation.

ESEQ is a list of the steps which must be followed to perform the operation. If more than one step is required for the operation then there is more than one ESEQ. Each ESEQ.n has a list of the EVALs which the control points must be set to. There are as many entries in the EVAL list as there are control lines in the submodule being used. The EVALs are listed in the same order as the CTLNAMEs list but only the current submodule control points are included. If the line named LOAD of the above example had been in submodule 2 then for the INC operation there would have been only four EVALs listed. EVAL.1 would correspond to CTLNAME.1, and EVAL.2 with CTLNAME.2, EVAL.3 with CTLNAME.4, EVAL.4 with CTLNAME.5. There would be no EVAL.5.
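
The correspondence between EVAL positions and CTLNAME entries described above can be sketched as a simple filter. The function name and the list representation of the SUBMODNO values are assumptions for illustration.

```python
def eval_to_ctlname(submodnos, submod):
    """Return the 1-based CTLNAME indices, in order, that belong to `submod`.

    The k-th entry of the result is the CTLNAME index paired with EVAL.k.
    submodnos -- SUBMODNO of each control line, in CTLNAME order
    """
    return [i for i, s in enumerate(submodnos, start=1) if s == submod]


# Hypothetical variant from the text: the third line (LOAD) is in submodule 2.
print(eval_to_ctlname([1, 1, 2, 1, 1], 1))  # [1, 2, 4, 5]
# All five lines in submodule 1: one EVAL per control line.
print(eval_to_ctlname([1, 1, 1, 1, 1], 1))  # [1, 2, 3, 4, 5]
```

In the first case EVAL.3 pairs with CTLNAME.4 and EVAL.4 with CTLNAME.5, and there is no EVAL.5.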

The BITVALUE is a character which indicates what type of control values should be used on the particular control point. Possible values for BITVALUE are P (rising edge), N (falling edge), H (positive pulse), L (negative pulse), S (select), and X (don't care). The select is used for the select bits on a multiplexer and the actual value stored in the microprogram is determined by the control allocator from the path graph link selected. One of the advantages of this control information structure is that it is a simple matter to determine the value of the bits to be stored in the micro-word.

The assumed model of the controller allows the user to squeeze some operations, which would have required more than a single cycle in a simpler controller, into a single step (ESEQ). Note that in this example the 74161 requires that the two count enable lines (ENBP and ENBT) be changed only while the clock line is high. With the NONEVOKE value of the clock equal to H the select lines may be changed to any value before the clock goes low then high (the rising edge is the evoke signal) to perform the selected operation. If the controller could not have set the non-evoke value of the clock line then this would have required two control steps.
5. The Steps of Control Allocation

The first main step of the control allocator is to read the interconnection information from the path graph and place it in an internal form which will allow for tracing through the data paths in search of controllable devices. The micro-sequence table is also read into an internal form, but at this point it has not been optimized and there are some simple transformations which may be performed upon it.

The first optimization is a macro-type substitution of subroutine calls by the instructions of the subroutine. If the subroutine is called only once throughout the micro-sequence table, then there is no reason that a call is required, and the subroutine code may be placed in-line. This macro substitution optimizes in three ways. First, each subroutine call is expanded into either two or four micro-ops (depending upon the type of controller used), and these extra micro-ops will not be required if there is no call. Second, along with every call there must also be a return, which requires either one or three micro-ops (again depending upon the type of controller). Third, if all of the subroutine calls can be removed from the micro-sequence program in this manner then a much simpler controller may be used and further cost savings will be realized. There are also cases when it would be advantageous to do macro substitution even when a subroutine is called many times throughout the microprogram. If the subroutine is very short then the overhead of a minimum of three micro-ops for each call would be greater than the cost of the routine itself. Remember also that the extra hardware required to allow for micro-subroutines, when eliminated, will reduce the cost of the micro-machine. There is a hidden advantage to this type of transformation: with the subroutine instructions inserted in-line there will be potentially more parallel operations and the machine may operate faster. This is because a call or return delimits a section of straight line micro-code, and potential parallelism is only allowed between micro-operations of the same straight line micro-sequence.
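
The in-line expansion decision described above can be illustrated with a small sketch. It is a simplification under stated assumptions: the function name `should_inline` is hypothetical, cost is counted only in micro-ops, and the saving from removing the subroutine hardware is ignored.

```python
def should_inline(call_count, body_length, call_ops, return_ops):
    """Decide whether a micro-subroutine should be expanded in line.

    call_ops and return_ops are the micro-ops needed for one call and one
    return: 2 and 1, or 4 and 3, depending on the controller type.
    """
    if call_count == 1:
        # A routine called only once never needs the call/return overhead.
        return True
    # Every call site pays for the call and for the matching return.
    overhead_per_call = call_ops + return_ops
    return body_length < overhead_per_call


print(should_inline(1, 50, 2, 1))  # True: called once
print(should_inline(5, 2, 2, 1))   # True: body shorter than the overhead
print(should_inline(5, 10, 4, 3))  # False: body longer than the overhead
```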

A second modification which is performed on the micro-sequence table at this point is the removal of all the diverge and merge information which was included due to the way the ISP description was written. The designer determined at the ISP level that some operations were independent and that they could be done in parallel. This information would have been helpful if the allocator had been sure that the designer knew how the design would be implemented, but this is impossible. The control allocator will have to evaluate the micro-sequence program for potential parallelisms in a later step, so if the designer has guessed right then the correct information will be recovered.

The micro-sequence has a control operation called PEND which indicates the end of a particular routine. If the PEND for the main routine is executed then the machine should halt. The control allocator converts the main routine's PEND into an operation which will stop the machine. PENDs which do not delimit the main routine are converted by the control allocator into a return from a micro-subroutine. There is another control opcode called BAILOUT which is an instruction to cause the control flow to leave the named routine. Currently the control allocator only allows static bailouts, which are converted to branches to the end of the active routine. ISPS allows for dynamic RESTARTs and LEAVEs which require that the control flow leave one of the calling routines, but not necessarily at the same calling level each time it is executed. This type of control construct would require extra hardware in the controller in order to label all of the routines, and it was deemed too expensive to include in any of the controller designs.

After all of these transformations have been performed upon the micro-sequence table there are likely to be cases where a JOIN instruction indicates a branch to the next instruction. A micro-step generated for this type of instruction is useless and if it is converted into a micro-op it will increase the cost and reduce the speed of the controller. To avoid this, the last step of the micro-sequence optimization is to remove such operations and clean up other instructions which would generate no-op types of micro-ops.
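
The removal of a JOIN which branches to the next instruction can be sketched as follows. This is a minimal single-pass illustration, not the allocator's code: the tuple layout `(label, opcode, target)`, the label names and the choice to keep any JOIN which itself carries a label are all assumptions.

```python
def remove_redundant_joins(steps):
    """Drop JOIN steps whose target is the step immediately following them.

    Each step is a (label, opcode, target) tuple; label and target may be None.
    """
    kept = []
    for i, (label, opcode, target) in enumerate(steps):
        next_label = steps[i + 1][0] if i + 1 < len(steps) else None
        redundant = (
            opcode == "JOIN"
            and target is not None
            and target == next_label
            and label is None  # a labelled JOIN may be a branch target itself
        )
        if not redundant:
            kept.append((label, opcode, target))
    return kept


steps = [
    (None, "MOVE", None),
    (None, "JOIN", "L1"),   # branches to the very next step: useless
    ("L1", "ADD", None),
    (None, "JOIN", "L0"),   # a real branch, kept
]
cleaned = remove_redundant_joins(steps)
print(cleaned)
```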

At this point there is enough information in the micro-sequence table to select an efficient micro-controller. Currently there is only one class of controller available: a microprogrammed controller with a two phase clock system. The term two phase refers to the types of signals which emanate from the controller and not to a system which will allow for two sequential operations to be performed from a single micro-word fetch. There are three types of micro-controllers and the proper one is selected by determining the maximum number of subroutines which may be active at one time. A diagram of these controllers is shown in Figure 5-1. All unconditional branch addresses are stored as constants on an input to the multiplexer which is able to load the microprogram counter. The conditional branch addresses are looked up in a ROM (mcarom) when necessary. The maximum nesting level of the subroutines determines how large the micro-machine stack must be. At this point the designer must select the option of having a micro-fetch/execute overlap cycle and accept the additional costs for the extra hardware required to perform this type of operation.
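
Determining the maximum number of simultaneously active subroutines amounts to finding the deepest chain in the call structure. The sketch below is an illustration only. The routine names are hypothetical, and the mapping of nesting depth to controller type (no calls to type 1, one level to type 2, deeper nesting to type 3) is an assumption inferred from the register and stack mechanisms described in this report, not a statement of the allocator's actual rule.

```python
def max_nesting(calls, routine, active=()):
    """Largest number of subroutines active at one time below `routine`."""
    if routine in active:
        raise ValueError("recursive micro-subroutine call")
    deepest = 0
    for callee in calls.get(routine, ()):
        deepest = max(deepest, 1 + max_nesting(calls, callee, active + (routine,)))
    return deepest


def select_controller(depth):
    """Assumed mapping from nesting depth to controller type."""
    if depth == 0:
        return 1  # no subroutine support needed
    if depth == 1:
        return 2  # a single save register is enough
    return 3      # a micro-stack is required


# Hypothetical call structure: main calls fetch and exec, exec calls shift.
calls = {"main": ["fetch", "exec"], "exec": ["shift"]}
depth = max_nesting(calls, "main")
print(depth, select_controller(depth))  # 2 3
```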

5.1 Conversion of Data Operations Into Micro-ops

There are many types of data operations possible such as: binary, unary, operator, non-operator, array access, device functions and some combinations of these. These operations range in complexity from move to multiply. It is the problem of the control allocator to convert this wide range of operations into device primitives which later will be used to specify how to control the devices. This conversion is dependent upon certain aspects of the micro-sequence step itself, such as whether there has been an operator specified or if the operation has been left for one of the registers to perform. If there is an operator then the control allocator assigns the operation to it and the destination register is assigned the LOAD function. If the destination is not a register but instead a memory array then the operation WRITE is used. If there is no operator defined then the operation must be done in one of the source registers or in the destination register. This usually depends upon the particular opcode.

Figure 5-1: The micro-controller

In most cases each of the source registers is assigned the operation of READ so that if there are
output enable lines on that particular device they will be enabled by the controller. Since most
devices are able to perform only very basic functions there is a table in the control allocator which
maps the register transfer operations into basic functions. For example the add2c operation maps
into a simple ADD operation. This assumes that the module binder has found a device which will do a
two's complement add and has used it or it has determined that a simple adder will work (with
perhaps some modification of the data paths to assure two's complement operation). Table 5-1
contains the mapping of register transfer data operations into these basic functions.
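
The table lookup described above can be pictured as a simple dictionary. The excerpt below uses a handful of entries from Table 5-1; the names `DEVICE_PRIMITIVE` and `to_primitive` are hypothetical, and the real table is built into the control allocator.

```python
# A few entries of Table 5-1 (register transfer operation -> device primitive).
DEVICE_PRIMITIVE = {
    "add": "ADD", "add2c": "ADD", "addsm": "ADD",
    "sub": "SUB", "sub2c": "SUB",
    "mult2c": "MULT",
    "incr": "INC",
    "not": "NOT",
    "lrot": "ROTL",
}


def to_primitive(operation):
    """Map a register transfer operation onto its basic device function."""
    try:
        return DEVICE_PRIMITIVE[operation]
    except KeyError:
        raise ValueError(f"no device primitive known for {operation!r}") from None


print(to_primitive("add2c"))  # ADD
```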

The conversion of the data micro-sequence steps into micro-ops is the first difficult step of the
control allocation process. A fairly common one will make a good example:

\[
\text{\#243(add):\#2:\#10,\#3(dest),\#4(src1):\#11,\#5(src2):\#12;}
\]

The opcode #243 is defined as the operation ADD. In this example the device #2 is an adder
which will be used to sum the numbers stored in the devices #4 and #5. The result will be stored in
the device #3. The numbers 10, 11 and 12 indicate some of the links over which the data must pass
for this operation. There is however additional information not included in this instruction which must
be determined from the path graph. First it must be determined if there are any unnamed devices
which are used to perform this operation, such as multiplexers in the data paths. For this example let
us assume that link #11 is an input to a multiplexer whose output connects to the input of the adder.
The control allocator must trace through the path graph to find which input of the multiplexer is being
used and store it and also remember that the multiplexer was used. All of the data path devices and
their associated operations which this micro-sequence step uses are stored in a device list. Once the
set of devices which this micro-sequence step requires is known then the allocator must determine
how many micro-steps are required. For this the module database is consulted. In this example the
allocator must find out how to cause the device #2 to ADD. If it is an ALU then there will be some
control bits to set to specific values. If #2 were a simple adder such as a 7483 then there would be
no control bits and of course the micro-controller would not need to set any values for this device.
There are devices and operations which may require a setup sequence, and in such a case there will
be a series of steps which must be performed. Multiple control steps will cause the micro-sequence
instruction to generate more than a single micro-op. Once the number of steps required for the
micro-sequence step is known and the steps have been fixed relative to each other then the
Register Transfer Operation => Device primitive

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Device Primitive</th>
</tr>
</thead>
<tbody>
<tr>
<td>test</td>
<td>TEST</td>
</tr>
<tr>
<td>eql</td>
<td>EQL</td>
</tr>
<tr>
<td>neq</td>
<td>NEQ</td>
</tr>
<tr>
<td>lss</td>
<td>LSS</td>
</tr>
<tr>
<td>leq</td>
<td>LEQ</td>
</tr>
<tr>
<td>geq</td>
<td>GEQ</td>
</tr>
<tr>
<td>gtr</td>
<td>GTR</td>
</tr>
<tr>
<td>move</td>
<td>MOVE</td>
</tr>
<tr>
<td>clear</td>
<td>CLEAR</td>
</tr>
<tr>
<td>noop</td>
<td>NOP</td>
</tr>
<tr>
<td>read</td>
<td>READ</td>
</tr>
<tr>
<td>write</td>
<td>WRITE</td>
</tr>
<tr>
<td>lshftd</td>
<td>SHIFTL</td>
</tr>
<tr>
<td>rshftd</td>
<td>SHIFTR</td>
</tr>
<tr>
<td>lrot</td>
<td>ROTL</td>
</tr>
<tr>
<td>rrot</td>
<td>ROTR</td>
</tr>
<tr>
<td>not</td>
<td>NOT</td>
</tr>
<tr>
<td>incr</td>
<td>INC</td>
</tr>
<tr>
<td>decre</td>
<td>DEC</td>
</tr>
<tr>
<td>and</td>
<td>AND</td>
</tr>
<tr>
<td>or</td>
<td>OR</td>
</tr>
<tr>
<td>nand</td>
<td>NAND</td>
</tr>
<tr>
<td>nor</td>
<td>NOR</td>
</tr>
<tr>
<td>xor</td>
<td>XOR</td>
</tr>
<tr>
<td>eqv</td>
<td>EQV</td>
</tr>
<tr>
<td>add</td>
<td>ADD</td>
</tr>
<tr>
<td>sub</td>
<td>SUB</td>
</tr>
<tr>
<td>lshft1</td>
<td>SHIFTL</td>
</tr>
<tr>
<td>rshft1</td>
<td>SHIFTR</td>
</tr>
<tr>
<td>lshft0</td>
<td>SHIFTL</td>
</tr>
<tr>
<td>rshft0</td>
<td>SHIFTR</td>
</tr>
<tr>
<td>conc</td>
<td>LOAD</td>
</tr>
<tr>
<td>neg2c</td>
<td>NEG</td>
</tr>
<tr>
<td>neg1c</td>
<td>NEG</td>
</tr>
<tr>
<td>negsm</td>
<td>NEG</td>
</tr>
<tr>
<td>add2c</td>
<td>ADD</td>
</tr>
<tr>
<td>add1c</td>
<td>ADD</td>
</tr>
<tr>
<td>addsm</td>
<td>ADD</td>
</tr>
<tr>
<td>sub2c</td>
<td>SUB</td>
</tr>
<tr>
<td>sub1c</td>
<td>SUB</td>
</tr>
<tr>
<td>subsm</td>
<td>SUB</td>
</tr>
<tr>
<td>mult2c</td>
<td>MULT</td>
</tr>
<tr>
<td>mult1c</td>
<td>MULT</td>
</tr>
<tr>
<td>multsm</td>
<td>MULT</td>
</tr>
<tr>
<td>div2c</td>
<td>DIV</td>
</tr>
<tr>
<td>divsm</td>
<td>DIV</td>
</tr>
<tr>
<td>mod2c</td>
<td>MOD</td>
</tr>
<tr>
<td>mod1c</td>
<td>MOD</td>
</tr>
<tr>
<td>modsm</td>
<td>MOD</td>
</tr>
<tr>
<td>mult</td>
<td>MULT</td>
</tr>
<tr>
<td>div</td>
<td>DIV</td>
</tr>
<tr>
<td>mod</td>
<td>MOD</td>
</tr>
<tr>
<td>move</td>
<td>ALOAD</td>
</tr>
<tr>
<td>move2c</td>
<td>ALOAD</td>
</tr>
<tr>
<td>move1c</td>
<td>ALOAD</td>
</tr>
<tr>
<td>move2sm</td>
<td>ALOAD</td>
</tr>
<tr>
<td>test2c</td>
<td>TEST</td>
</tr>
<tr>
<td>eql2c</td>
<td>EQL</td>
</tr>
<tr>
<td>eq2c</td>
<td>EQ</td>
</tr>
<tr>
<td>neq2c</td>
<td>NEQ</td>
</tr>
<tr>
<td>lss2c</td>
<td>LSS</td>
</tr>
<tr>
<td>leq2c</td>
<td>LEQ</td>
</tr>
<tr>
<td>geq2c</td>
<td>GEQ</td>
</tr>
<tr>
<td>gtr2c</td>
<td>GTR</td>
</tr>
<tr>
<td>test1c</td>
<td>TEST</td>
</tr>
<tr>
<td>eql1c</td>
<td>EQL</td>
</tr>
<tr>
<td>neq1c</td>
<td>NEQ</td>
</tr>
<tr>
<td>lss1c</td>
<td>LSS</td>
</tr>
<tr>
<td>leq1c</td>
<td>LEQ</td>
</tr>
<tr>
<td>geq1c</td>
<td>GEQ</td>
</tr>
<tr>
<td>gtr1c</td>
<td>GTR</td>
</tr>
<tr>
<td>test2sm</td>
<td>TEST</td>
</tr>
<tr>
<td>eql2sm</td>
<td>EQL</td>
</tr>
<tr>
<td>neq2sm</td>
<td>NEQ</td>
</tr>
<tr>
<td>lss2sm</td>
<td>LSS</td>
</tr>
<tr>
<td>leq2sm</td>
<td>LEQ</td>
</tr>
<tr>
<td>geq2sm</td>
<td>GEQ</td>
</tr>
<tr>
<td>gtr2sm</td>
<td>GTR</td>
</tr>
</tbody>
</table>

Table 5-1: The mapping from register transfer instructions to device primitives

micro-op(s) may be generated. The steps are fixed in a relative position because if one of the devices requires a single step and another requires three, they must be evoked at the same time.

For each micro-op, a device list is generated which contains only those devices which the micro-op needs. Each element of this list contains a flag indicating whether the device is required as a source or destination, which will be important during the potential parallelism analysis step.

There is an important feature of the micro-sequence table which might not be apparent to the reader. From the above description one might think that only one operation may be performed during each micro-op. This is not true. A micro-op is only generated when a micro-sequence step indicates that there should be an evoke signal. This is designated by the presence of a destination in the micro-sequence step or the execution of a control operation. If there is nothing to evoke then the micro-sequence step is called a chained micro-sequence step and all of the control information which it generates is saved and added to the immediately following step (which may itself be a chained micro-sequence step). The number of steps in a chain is unlimited so that any combination of operations may be performed in a single micro-operation.
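
The chaining of micro-sequence steps can be sketched as an accumulation which is flushed whenever a step evokes. This is an illustrative sketch only: the dictionary layout, the control strings and the decision to reject a chain which is never evoked are assumptions.

```python
def build_micro_ops(steps):
    """Group micro-sequence steps into micro-ops.

    Each step is a dict with 'controls' (the control information it
    generates) and 'evokes' (True when the step has a destination or is a
    control operation). Chained steps pass their controls to the next step.
    """
    micro_ops, pending = [], []
    for step in steps:
        pending.extend(step["controls"])
        if step["evokes"]:
            micro_ops.append(pending)
            pending = []
    if pending:
        # Assumed policy: a chain must end in a step which evokes.
        raise ValueError("chained steps are never evoked")
    return micro_ops


steps = [
    {"controls": ["mux select 1"], "evokes": False},  # chained step
    {"controls": ["alu ADD"], "evokes": False},       # chained step
    {"controls": ["load R3"], "evokes": True},        # evoke: one micro-op
    {"controls": ["load R4"], "evokes": True},
]
micro_ops = build_micro_ops(steps)
print(micro_ops)
```

The first three steps collapse into a single micro-op, which shows how several operations can be performed during one micro-operation.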

5.2 The Conversion of Control Operations Into Micro-ops

The conversion of the control operations into micro-ops is not as simple a process as it was for data operations. The allocator must convert each control operation into a data-transfer operation which operates upon part of the micro-controller itself. The control operation JOIN is the simplest; all that is necessary is that the new micro-address be jammed into the microprogram counter. If the control operation is a conditional branch then the new address must be looked up in a table (rom) and then placed in the microprogram counter. For calls to subroutines the old microprogram counter must be stored (either in a register or on a stack) and the new one placed in the counter, while for returns the old microprogram counter must be restored and the stack pointer changed. The action of control operations is highly dependent upon the type of controller which has been selected and for this reason the device primitives are built into the control allocator program.

The CALL micro-sequence instruction is a good example of the necessity of building these device primitives into the control allocator. If a type 2 controller (Figure 5-1) is selected then the CALL micro-sequence instruction is converted into two micro-ops:

Micro-op 1 = Load MCSMDR  
Save the old microprogram address
Micro-op 2 = Select proper input of MCPCIMX  
Load MCMPC  
Load the new microprogram address

If the controller selected was of the type 3 form then the call micro-sequence step is converted into four micro-ops:
Micro-op 1 = Increment MCSSP  
Increment the micro stack pointer
Micro-op 2 = Select input # 1 of MCSMDIMX  
Load MCMDR  
Save the old microprogram address
Micro-op 3 = Write MCSSTK  
Store old value on the stack
Micro-op 4 = Select the proper input of MCPCIMX  
Load MCMPC  
Load the new microprogram address

In this example micro-op 1 for the type 2 controller and micro-op 2 for the type 3 controller perform the same function but have different device primitives because in the type 3 controller there is an extra multiplexer which must be controlled.
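
Because the expansion depends on the controller type, it can be pictured as a table indexed by that type. The sketch below copies the device primitives from the two CALL expansions given above; the names `CALL_EXPANSION` and `expand_call` are hypothetical.

```python
# Device primitives for the CALL micro-sequence instruction, one inner
# list per generated micro-op, indexed by controller type.
CALL_EXPANSION = {
    2: [
        ["Load MCSMDR"],
        ["Select proper input of MCPCIMX", "Load MCMPC"],
    ],
    3: [
        ["Increment MCSSP"],
        ["Select input #1 of MCSMDIMX", "Load MCMDR"],
        ["Write MCSSTK"],
        ["Select the proper input of MCPCIMX", "Load MCMPC"],
    ],
}


def expand_call(controller_type):
    """Return the list of micro-ops which a CALL generates."""
    if controller_type not in CALL_EXPANSION:
        raise ValueError(f"controller type {controller_type} has no CALL")
    return CALL_EXPANSION[controller_type]


print(len(expand_call(2)), len(expand_call(3)))  # 2 4
```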

5.3 Micro-cycle Time Evaluation

Once a complete device list for a micro-op is available then the control allocator may determine the time required to perform each micro-op. This timing measurement is necessary in order to determine the optimal micro-cycle time. The minimum allowable micro-cycle time is defined as the maximum of the following:

- The maximum time it takes to read a word out of the microprogram store when using the microprogram counter as an address.
- The maximum time it takes to determine the next micro-address by accessing the micro-address rom.
- The maximum time required to retrieve the next micro-address when a micro-subroutine call or return is being executed.
- The minimum time required to perform any of the data operations in the data path section of the machine.

The selection of the micro-cycle time has been left to the user, but the control allocator measures all of the operation times and reports the maximum and minimum times for the user to use in making the selection. In order to evaluate the time required for each micro-operation the control allocator must find the slowest path over which data must pass from any source device to any destination device. To do this the allocator evaluates the time required for data to propagate from each device's output to the output of the device which follows it. The control allocator then finds the slowest path from each of the source devices to a destination device and takes this as the operation time for this micro-operation. The control allocator also allows the user to specify the desired micro-cycle time. When this has been done the micro-operation program is reviewed and any micro-op which requires more time than the specified cycle time will be divided into a series of micro-ops. Each of these will fit into the specified micro-cycle time slot and each performs part of the original operation.
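
The slowest-path search can be sketched as a longest-path computation over the devices used by one micro-op. This is an illustration under stated assumptions: the device names and delays are invented, the data paths of a single micro-op are taken to be acyclic, and the number of pieces a slow micro-op is divided into is assumed to be the operation time divided by the cycle time, rounded up.

```python
import math
from functools import lru_cache


def operation_time(edge_time, sources, destinations):
    """Time of the slowest path from any source to any destination.

    edge_time maps (device, next_device) to the time for data to propagate
    from the output of `device` to the output of `next_device`.
    """
    successors = {}
    for (device, nxt) in edge_time:
        successors.setdefault(device, []).append(nxt)

    @lru_cache(maxsize=None)
    def longest_to_destination(device):
        # None means that no destination can be reached from this device.
        best = 0 if device in destinations else None
        for nxt in successors.get(device, []):
            rest = longest_to_destination(nxt)
            if rest is not None:
                total = edge_time[(device, nxt)] + rest
                best = total if best is None else max(best, total)
        return best

    times = [longest_to_destination(source) for source in sources]
    return max(t for t in times if t is not None)


def cycles_needed(op_time, cycle_time):
    """Assumed number of micro-ops a micro-op is divided into."""
    return math.ceil(op_time / cycle_time)


# Hypothetical delays (in nanoseconds) for an add through a multiplexer.
edge_time = {("R4", "MUX"): 20, ("MUX", "ADD"): 35, ("R5", "ADD"): 35, ("ADD", "R3"): 25}
slowest = operation_time(edge_time, ["R4", "R5"], {"R3"})
print(slowest, cycles_needed(slowest, 50))  # 80 2
```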
5.4 Control Graph Generation

At this point, the design description is a sequential list of micro-operations. It is highly probable that this program contains at least two sequential micro-operations which have disjoint device sets, and in most cases these two micro-ops may be executed in their original order or in a reversed order without affecting the results of the program. Since these operations are hardware-independent they may even be done at the same time with equivalent results. Two micro-operations are defined as potentially parallel if the execution of both in parallel would cause the same results as when they are executed in the original order. Potential parallelism is not limited to the combination of just pairs of operations. If three micro-ops are all device independent and their execution ordering is non-consequential then all three are defined as potentially parallel. The combination of serial micro-ops into parallel operations will reduce both the length of the microprogram and its execution time, and the control allocator will attempt to exploit such potential parallelism because of these advantages.

In the previous paragraph, two rules were used to determine if micro-ops were potentially parallel: total device independence and execution order insensitivity. These are valid rules but also excessively restrictive. Another set of rules is presented by Dasgupta [Dasg 76]. His rule indicates that two micro-operations, MO_a and MO_b, are potentially parallel if the following is true:

\[(SC_a \cap SK_b = \phi) \land (SC_b \cap SK_a = \phi) \land (SK_a \cap SK_b = \phi) \land (U_a \cap U_b = \phi)\]

Where:

- \(SC_i\) = The set of source devices for micro-operation \(i\)
- \(SK_i\) = The set of sink devices (destinations) for micro-operation \(i\)
- \(U_i\) = The set of paths (links and operators) used in micro-operation \(i\)

This rule simply states that two micro-operations may be done in parallel if one's source is not the destination of the other, and that they do not use the same links or write to the same destination. It is slightly more comprehensive than the one presented at the beginning of this section because it also considers the links and operators which the data must pass through. This rule does not, of itself, detect cases where the interchanging of the order of execution would generate an incorrect result, which may indicate that parallel operation would also be incorrect. To avoid such cases Dasgupta only applies the rule to micro-ops which are in the same Straight Line Micro-code Segment. A SLMS is defined as a section of the micro-operation program which has no branches in it. An equivalent definition is: a section of the micro-op program which only contains data micro-ops (i.e. no control ops). If two congruent micro-ops of a SLMS satisfy Dasgupta's rule then they are potentially parallel.
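
Dasgupta's condition translates directly into four disjointness tests on sets. The sketch below is an illustration only; the register and link names are hypothetical.

```python
def dasgupta_parallel(a, b):
    """Dasgupta's test for two micro-ops given as dicts of device-name sets."""
    return (
        a["src"].isdisjoint(b["dst"])           # SC_a and SK_b
        and b["src"].isdisjoint(a["dst"])       # SC_b and SK_a
        and a["dst"].isdisjoint(b["dst"])       # SK_a and SK_b
        and a["paths"].isdisjoint(b["paths"])   # U_a and U_b
    )


move_ab = {"src": {"B"}, "dst": {"A"}, "paths": {"link1"}}
move_cd = {"src": {"D"}, "dst": {"C"}, "paths": {"link2"}}
move_da = {"src": {"A"}, "dst": {"D"}, "paths": {"link3"}}
move_eb = {"src": {"B"}, "dst": {"E"}, "paths": {"link1"}}

print(dasgupta_parallel(move_ab, move_cd))  # True: nothing in common
print(dasgupta_parallel(move_ab, move_da))  # False: A is written, then read
print(dasgupta_parallel(move_ab, move_eb))  # False: both use link1
```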

Dasgupta's rule is correct but still too restrictive to be used by the control allocator. There are usually cases where two micro-ops use the same sources but different destinations and it is possible to do both in parallel. His rule might not indicate that these were potentially parallel because of the $(U_a \cap U_b = \phi)$ restriction. The optimal rule for potential parallelism detection indicates all micro-ops which may be done in parallel with the available hardware and disallows combinations which would produce incorrect results. This rule is dependent upon the controller which is used and a new rule must be defined whenever a new controller style is implemented.

The rule used for this project is defined as follows:

For two micro-operations of the same SLMS where $MO_i$ precedes $MO_j$, the micro-operations are potentially parallel if the following is true:

$$(SC_j \cap SK_i = \phi) \land (SK_i \cap SK_j = \phi)$$

and

If $(SC_i \cap SK_j \neq \phi)$ then $SAME_j \leftarrow MO_i$ ($MO_i$ is added to $MO_j$'s SAME list)

and

If $(SU_i \cap SU_j = C \neq \phi)$ then the following must also be true, for all $k$: $C_{k|i} = C_{k|j}$

Where:

$SU_n = SC_n \cup U_n$. This is the set of all devices used in the micro-op except for the destination devices.

$SAME_n$ = A pointer in $MO_n$ to the micro-operation(s) which it must never occur before.

$C$ = The intersection of the source sets of the two micro-ops.

$C_{k|n}$ = The function (device primitive) specified for the device $C_k$ by $MO_n$.

In simple terms this rule states that two micro-ops are potentially parallel if the preceding micro-op's destination device(s) are different from the other's sources, and their destination devices are different, and if they use any devices in common $(SU_n)$ then these devices must be controlled in the same manner for both operations.

The availability of the SAME pointer allows for cases where interchanging the micro-ops would affect the results of the program but their execution in parallel will yield correct results. The SAME pointer does not necessarily point to a single micro-op; it may also point to a list of micro-ops which the micro-op must never occur before.

The use of this potential parallelism rule may be demonstrated with a couple of examples:

Example 1

MO-1 A <- B
MO-2 C <- D
MO-3 E <- F
MO-4 D <- A

In this example MO-1 and MO-2 are device independent so they are potentially parallel. The same is true for the combinations of MO-2 and MO-3, MO-1 and MO-3, and MO-3 and MO-4. Note that it is not true that when MO-1 and MO-3, and MO-3 and MO-4 are potentially parallel that MO-1 and MO-4 are also potentially parallel. In this case the result of MO-1 is required by MO-4 and so they must be sequential. In this example it is also true that MO-2 and MO-4 are potentially parallel but for this case SAME4 is set to point to MO-2 indicating that MO-4 must not occur before MO-2.

Example 2

MO-1 D <- 25
MO-2 B <- L
MO-3 A <- B+C
MO-4 L <- 3
MO-5 C <- D

In this example the following combinations are potentially parallel: MO-1 and MO-2, MO-1 and MO-3, MO-1 and MO-4, MO-3 and MO-4, MO-4 and MO-5, MO-2 and MO-5. There are also the potentially parallel pairs MO-2 and MO-4, in which SAME4 points to MO-2, and MO-3 and MO-5, where SAME5 points to MO-3. The combination of MO-2 and MO-5 is of little interest since it will never be allowed to be parallel, because SAME5 points to MO-3 which must follow MO-2.
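
The rule used for this project can be checked against both examples with a short sketch. It is an illustration under stated assumptions: the helper names `mo`, `potentially_parallel` and `analyse` are hypothetical, every source register is assumed to be given the READ function, constants are treated as having no source device, and links and operators are left out because the examples do not name them.

```python
def mo(dst, *src):
    """Build a micro-op record; every source device is simply READ (assumed)."""
    return {"dst": {dst}, "src": set(src), "use": {s: "READ" for s in src}}


def potentially_parallel(first, second):
    """Apply the rule to two micro-ops of one SLMS, `first` preceding `second`.

    Returns (parallel, same); `same` is True when `second` must never
    occur before `first`, so `first` belongs on the SAME list of `second`.
    """
    if first["dst"] & second["src"]:   # SC_j and SK_i must be disjoint
        return False, False
    if first["dst"] & second["dst"]:   # SK_i and SK_j must be disjoint
        return False, False
    shared = first["use"].keys() & second["use"].keys()
    if any(first["use"][d] != second["use"][d] for d in shared):
        return False, False            # a shared device is controlled differently
    return True, bool(first["src"] & second["dst"])


def analyse(program):
    """Return the set of potentially parallel pairs and the set of SAME pairs."""
    parallel, same = set(), set()
    numbers = sorted(program)
    for i in numbers:
        for j in numbers:
            if i < j:
                ok, must_follow = potentially_parallel(program[i], program[j])
                if ok:
                    parallel.add((i, j))
                if must_follow:
                    same.add((i, j))
    return parallel, same


example_1 = {1: mo("A", "B"), 2: mo("C", "D"), 3: mo("E", "F"), 4: mo("D", "A")}
example_2 = {1: mo("D"), 2: mo("B", "L"), 3: mo("A", "B", "C"),
             4: mo("L"), 5: mo("C", "D")}
print(analyse(example_1))
print(analyse(example_2))
```

Under these assumptions the sketch reports the same potentially parallel pairs and the same SAME entries as the two examples.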

5.4.1 Application of the potential parallelism rule

The procedure which is used by the control allocator is the fastest possible method which will test for all potentially parallel operations. An approach which just compares each micro-operation with all the others in the same SLMS is inefficient and rather expensive, since the number of comparisons grows quadratically with the number of micro-ops in the SLMS.

The method used by the control allocator takes advantage of the fact that the micro-operation program is ordered and that some comparisons may be ignored when certain conditions are found to be true. These conditions may be seen in the following example.
Consider MO-4; comparing it with MO-3 they are found to be potentially parallel. Next, comparing it with MO-2 it is found to not be potentially parallel. There is then no need to compare MO-4 with MO-1 since MO-4 must follow MO-2 and the relationship between MO-2 and MO-1 will determine how MO-4 and MO-1 will be related. The control allocator avoids the unnecessary comparisons by remembering cases where this data dependency precludes any potential parallelism.

The procedure used by the control allocator is one of finding the range of potential parallelism for each micro-operation. To do this, a base micro-operation is first selected and compared with its immediate predecessor. If these two are potentially parallel then the base micro-op and its predecessor's predecessor are compared. This process continues to backtrack through the micro-op program until a limit is found. A limit is defined as any one of the following: the beginning of the micro-operation program, a micro-operation which is not potentially parallel with the base micro-op, or the beginning of the SLMS. Once a limit is found an entry is made in a table which stores lists of micro-ops which may be started immediately after each upper limit micro-op is completed. During the search for the upper limit the SAME pointer may have had entries added to it. This does not constitute an upper limit but the information about these listed micro-operations will affect some of the potential parallelism in a later step.

The next step is to find the lower limit of each micro-operation. To do this a base micro-op is again selected and successive following micro-ops are compared with the base until a limit is found. The lower limit is defined as one of the following: a micro-op which is not potentially parallel with the base micro-op, the end of the SLMS (the micro-op which follows the first control operation\(^2\)), or the end of the micro-operation program. The base micro-op is entered in a second table which contains lists of micro-ops which must be completed before the lower limit micro-operation may be started.

This procedure is repeated for each micro-op in the program and when completed there are two tables, one called the UPPER and a second called the LOWER, which contain all of the potential parallelism information. For the control allocation project a graph structure was selected to represent this information and the next step of the control allocation process is the generation of the Control graph from these tables.
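
The search for the upper and lower limits can be sketched as two scans per base micro-op. This is a simplified illustration, not the allocator's procedure: it assumes the whole program is a single SLMS, uses 0 for the beginning and n+1 for the end of the program, ignores the SAME pointers, and takes the potentially parallel pairs of Example 1 as given.

```python
def build_tables(count, parallel):
    """Build the UPPER and LOWER tables for micro-ops numbered 1..count.

    parallel(i, j) tells whether micro-ops i and j (i preceding j) are
    potentially parallel.
    """
    upper, lower = {}, {}
    for base in range(1, count + 1):
        # Backtrack until a micro-op is not potentially parallel with the base.
        limit = base - 1
        while limit >= 1 and parallel(limit, base):
            limit -= 1
        upper.setdefault(limit, []).append(base)   # 0 = start of program

        # Scan forward until a micro-op is not potentially parallel with the base.
        limit = base + 1
        while limit <= count and parallel(base, limit):
            limit += 1
        lower.setdefault(limit, []).append(base)   # count + 1 = end of program
    return upper, lower


# Potentially parallel pairs of Example 1.
EXAMPLE_1_PAIRS = {(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)}
upper, lower = build_tables(4, lambda i, j: (i, j) in EXAMPLE_1_PAIRS)
print(upper)  # {0: [1, 2, 3], 1: [4]}
print(lower)  # {4: [1], 5: [2, 3, 4]}
```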

\(^2\)A slight modification of the definition of SLMS has been made for use by the control allocator. The change is that the last micro-operation of a SLMS may be a control operation. This allows for a control micro-operation to be performed in parallel with a regular data operation. This type of overlap is almost always taken advantage of in microprogrammed machines because of the extra time which is required to generate the new microprogram address.
5.4.2 The Control Graph Model

The control graph is a representation of the potential parallelism information which was determined in the preceding step. It only represents the potential parallelism information and does not include any control flow information (such as control ops) which one might normally associate with a control graph. The control operations such as the IF statement were converted into data transfer operations during the generation of the micro-operation program and their effect has not been lost but they do not exist as control ops in the control graph.

There are three types of nodes in the control graph: Forks, Joins and Operations. The Operation node is the most common and is defined as:

Operation Node = <Forward, Backward, Sequence number, Devicelist, Cgsame>

The fields Backward and Forward are pointers to other control graph nodes which must precede and follow (respectively) this node in order of execution. With these fields the nodes of the control graph are interconnected. The Sequence number is both a sequence number and a cross reference to the micro-operation from which this control graph node was generated. Since there is a one to one relationship between each control graph node and each micro-operation this number is unique for each control graph node. The Devicelist is a list of the devices which this control graph node must control. It is a subset of the device list which is contained in the micro-op and does not contain those devices which have no control lines. Cgsame performs a function similar to that of SAME in the micro-ops: it points to the control graph node(s) which the current control graph node must never occur before.

The fork nodes of the control graph indicate the start of two or more potentially parallel paths in the control graph. A fork node is defined as:

Fork Node = <Backward, List of followers>

The field Backward points back to the preceding control graph node. The list of followers is a list of the control graph nodes which must follow this one.

The join nodes indicate when two or more potentially parallel paths of the control graph must be completed before another node may be started. The form of the join node is:

Join Node = <Forward, List of Predecessors>

The Forward field points to the control graph node which follows this node and the List of Predecessors contains a list of the control graph nodes which must be done before this one is started.
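
The three node types can be written down as record definitions. The sketch below is one possible Python rendering, not the data structure used in the control allocator; the class and field names simply follow the definitions above, and `eq=False` is chosen so that nodes which point at each other are compared by identity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass(eq=False)
class OperationNode:
    sequence_number: int                                  # cross reference to the micro-op
    devicelist: List[str] = field(default_factory=list)   # devices with control lines
    forward: Optional["Node"] = None                      # node which must follow
    backward: Optional["Node"] = None                     # node which must precede
    cgsame: List["OperationNode"] = field(default_factory=list)


@dataclass(eq=False)
class ForkNode:
    backward: Optional["Node"] = None
    followers: List["Node"] = field(default_factory=list)


@dataclass(eq=False)
class JoinNode:
    forward: Optional["Node"] = None
    predecessors: List["Node"] = field(default_factory=list)


Node = Union[OperationNode, ForkNode, JoinNode]

# A fork which starts two potentially parallel legs.
first = OperationNode(1, ["A"])
second = OperationNode(2, ["C"])
fork = ForkNode(followers=[first, second])
first.backward = second.backward = fork
print(len(fork.followers))  # 2
```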

There is nothing in the control graph structure which requires that the fork and join nodes be paired. A control graph may contain a single fork which has ten potentially parallel legs, and four joins which "collect" these legs when they are no longer potentially parallel. The forks only indicate when potential parallelism starts, and the joins when it ends.

The generation of the control graph is a simple process once the UPPER and LOWER tables are complete. The first step is to provide a control graph node for each micro-op in the program. The forward and backward pointers are left empty at this time. The UPPER table is then processed. For each entry in this table which has more than a single follower a fork is inserted in the control graph which will branch to each of the followers. The pointers in the fork node point to all the correct places so that the upper limit control graph node (and micro-op) are completed before the following nodes are started. In cases where there is only a single follower then only the forward and backward pointers of the affected control graph nodes are set. After all of the UPPER table has been processed the LOWER table is used to place the joins in the control graph. For each entry in the table which has more than one predecessor a join from the listed nodes is inserted in the control graph. When this table has been finished all of the Forward and Backward fields will have been filled and the control graph will be complete. Figure 5-2 shows the process of generating a control graph for the microprogram used in Example 1 above.
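
The generation procedure can be sketched with plain dictionaries in place of node records. This is a simplified illustration under stated assumptions: forks and joins are given hypothetical names such as `F0` and `J5`, the numbers 0 and n+1 stand for the start and end of the program, a pointer is only filled in if it is still empty, and the tables are those of Example 1.

```python
def build_control_graph(upper, lower):
    """Derive forward/backward pointers, forks and joins from the two tables."""
    forward, backward = {}, {}        # node -> following / preceding node
    followers, predecessors = {}, {}  # fork -> followers, join -> predecessors

    def link(src, dst):
        # Fill a pointer only if it is still empty (assumption of this sketch).
        forward.setdefault(src, dst)
        backward.setdefault(dst, src)

    for limit, ops in upper.items():
        if len(ops) > 1:
            fork = f"F{limit}"
            followers[fork] = list(ops)
            link(limit, fork)
            for op in ops:
                backward.setdefault(op, fork)
        else:
            link(limit, ops[0])

    for limit, ops in lower.items():
        if len(ops) > 1:
            join = f"J{limit}"
            predecessors[join] = list(ops)
            for op in ops:
                forward.setdefault(op, join)
            link(join, limit)
        else:
            link(ops[0], limit)

    return forward, backward, followers, predecessors


upper = {0: [1, 2, 3], 1: [4]}
lower = {4: [1], 5: [2, 3, 4]}
forward, backward, followers, predecessors = build_control_graph(upper, lower)
print(followers)     # {'F0': [1, 2, 3]}
print(predecessors)  # {'J5': [2, 3, 4]}
```

In this example node 1 is followed directly by node 4, while nodes 2, 3 and 4 are collected by a single join.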

The information which is contained in a control graph can be very confusing when it is not represented in the graphical form. To demonstrate this and to further understand the power of the control graph structure consider the control graph shown in Figure 5-3. If one does not consider the constraints caused by the Cgsame pointers then there are 38 potential parallel combinations of two micro-ops each in this example. A few of these are the combinations of 1 and 2, 2 and 10, 3 and 9, 5 and 6, and 5 and 7. Combinations which are not allowed are ones such as 1 and 10, 2 and 9, or 4 and 7. When one considers the limits set by the Cgsame pointers then there are only 30 potential parallel combinations of two nodes. This is caused by the Cgsame of node 11 which indicates that it must never occur before node 9 and thus it is not really potentially parallel with nodes such as 5 and 7. Note that node 11 is still potentially parallel with node 3 since node 9 and node 3 are potentially parallel and the nodes 11 and 9 are also potentially parallel.

### 5.5 Micro-instruction Definition and Micro-word Formatting

This step of the control allocation process was not developed by the author of this report but since it is an integral part of the process it will be outlined here. For a more detailed description of this step and the problems associated with it see [Nagl 78].

The control graph contains a representation of all of the potentially parallel combinations of micro-ops which may occur. Since it only represents the potentially parallel combinations, a decision must be made about which of the micro-ops will actually be performed in parallel. This decision is not trivial because in most cases a micro-op may be potentially parallel with more than one other micro-op, and one combination may be better than another. One method which may be used is to arbitrarily combine micro-ops into micro-instructions. Operating in this manner it is possible to take advantage of all of the potential parallelism of the micro-op program and thus generate the shortest possible microprogram, but each micro-instruction may be wider than it would have been had the micro-ops been combined carefully.

Original micro-op program

MO-1  A <- B
MO-2  C <- D
MO-3  E <- F
MO-4  D <- A

Step 1: Generation of the CG nodes without the forward and backward fields filled in.
Step 2: After processing the Upper table.
Step 3: After processing the Lower table.

Upper table
Micro-op node numbers: MO-0, MO-1, MO-2, MO-3
Micro-op(s) which must follow: 1, 2, 3; 4

Lower table
Micro-op node numbers: MO-1, MO-2, MO-3, MO-4, MO-5
Micro-op(s) which must precede: 1; 2, 3, 4

Figure 5-2: Steps in Control Graph generation

The solid lines indicate the same information which is contained in the forward and backward fields. The dashed lines indicate the nodes which Cgsame points to. The numbered nodes are the ordinary control graph nodes. The forks and joins are represented by F and J respectively.

Figure 5-3: An example control graph

In microprogrammed machines the width of the micro-instruction is a function of how the instruction is formatted. In machines which have a horizontal word format each bit of the micro-instruction contains the signal for a single control point of the digital system, and the micro-instruction has enough bits so that every control point may be controlled in a single word. Obviously such words tend to be very wide, and in such cases the micro-storage memory for a system is a substantial part of the total controller cost. In machines which have a vertical word format the micro-instruction contains fields which are decoded by the controller to generate the proper control signals. A totally vertical machine has the control signals encoded so that only a single micro-op may be performed by each micro-instruction. The advantage of encoding the micro-instruction is that the width of the instruction is reduced, which tends to reduce the cost of the microprogram memory. A disadvantage is that only a single micro-operation may be performed during a micro-instruction, and thus none of the potential parallelism can be exploited.
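
The width trade-off between the formats can be made concrete with a small sketch. It assumes plain binary encoding of each field with one code reserved for "no operation"; the function names and the example numbers are hypothetical and are not taken from any design in this report.

```python
def horizontal_width(control_points):
    """Horizontal format: one bit per control point."""
    return control_points

def vertical_width(distinct_micro_ops):
    """Totally vertical format: one encoded field selecting a single micro-op.

    Codes 0..n are needed (0 is assumed to mean no operation), which takes
    n.bit_length() bits.
    """
    return distinct_micro_ops.bit_length()

def hybrid_width(field_sizes):
    """Hybrid format: several encoded fields, each holding mutually exclusive micro-ops."""
    return sum(n.bit_length() for n in field_sizes)

print(horizontal_width(40))       # 40 control points, 40 bits
print(vertical_width(25))         # 25 micro-ops plus no-op fit in 5 bits
print(hybrid_width([7, 3, 15]))   # three fields of 3, 2 and 4 bits
```

In this model a hybrid word is narrower than the horizontal word while still allowing one micro-op from each field to be evoked in the same micro-instruction.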

In most designs it is highly unlikely that there will ever be a need for a micro-instruction which is able to evoke every device at once (for which the horizontal format is required). It is also unlikely that there will ever be a micro-operation program with no potential parallelism, for which a totally vertical format would be the correct choice. Since neither extreme is expected, the obvious solution is to use a hybrid format which combines the advantages of both. This new format, which the control allocator uses, allows for all the parallel micro-operations necessary but also overlays or encodes some bits to reduce the width of the control word. To use this format, care must be taken when defining the micro-instructions, so the instruction definition and formatting processes must work together to reduce the cost of the controller while trying to generate the fastest possible microprogram. The control allocator may exploit this freedom because the controller has not yet been defined, so the allocator may decide what formats the micro-instructions will have. In systems which compile micro-instructions for predefined controllers, the formats of the micro-instructions are already specified and the combination of micro-operations is limited to the ones which will fit into the available formats.

The process of determining which micro-ops will be done in parallel can be thought of as a kind of mental exercise with an imaginary control graph. Consider a control graph such as the one in Figure 5-3. The control graph nodes in this imaginary control graph are connected to each other by elastic links. The nodes of the graph are allowed to move only up or down on the page, and when they are moved the elasticity of the links maintains their connections. The dotted arrows of the control graph (the Cgsame pointers) are special elastic links which must never have a negative slope. A negative slope is defined as the case in which the arrowhead is lower on the page than the tail. The time axis of this system starts at the top of the page and the positive direction is towards the bottom of the page. Nodes which are above others are performed before them when the control graph is converted into actual instructions. The objective of this exercise is to move the control graph nodes relative to each other to see how they interact and eventually to determine the best set of micro-instructions. A micro-instruction is defined as any number of Cgnodes (not including the fork and join nodes) which lie on a horizontal line. For the control graph of Figure 5-3 it is possible to configure the graph so that up to six micro-operations are on the same horizontal line and thus in the same micro-instruction. Figures 5-4 and 5-5 show two other possible configurations of this same graph.
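
One extreme of this exercise, pulling every node as far up the page as its links allow, can be sketched as a level assignment. The sketch assumes a graph given as a hypothetical dictionary of predecessors, leaves out the fork and join nodes, and ignores the Cgsame links. The example graph is invented for illustration and is not Figure 5-3.

```python
def asap_levels(preds):
    """Place each node one row below its lowest predecessor (row 1 is the top)."""
    level = {}

    def depth(node):
        if node not in level:
            level[node] = 1 + max((depth(p) for p in preds[node]), default=0)
        return level[node]

    for node in preds:
        depth(node)
    return level

def rows(level):
    """Group nodes by row; each row corresponds to one micro-instruction."""
    grouped = {}
    for node, lvl in level.items():
        grouped.setdefault(lvl, []).append(node)
    return [sorted(grouped[lvl]) for lvl in sorted(grouped)]

# Hypothetical graph: node -> nodes that must precede it.
preds = {1: [], 2: [], 3: [1], 4: [1, 2], 5: [3, 4]}
schedule = rows(asap_levels(preds))
print(schedule)
```

Nodes that share a row can have no ordering path between them, so each row is a legal candidate micro-instruction in this simplified model. The opposite extreme is one node per row, which gives the longest possible program.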

The method which the control allocator uses to select micro-ops for the micro-instructions is based upon attraction weights. These weights are similar to the probability that two potentially parallel micro-ops will be combined. The attraction weights are calculated for every combination of micro-ops throughout the micro-op program, and the pair with the highest weight is placed in the same micro-instruction first. It is then necessary to recalculate the potential parallelisms and attraction weights, since the action of combining two micro-ops may restrict the range of other micro-ops in the program. When the new attraction weights are known, the pair with the highest value is combined, and the process is repeated until there are no more potentially parallel pairs or the micro-word becomes too wide. If the width is the limiting factor, then the routine creates a new micro-instruction format and continues the process. With this procedure the micro-instructions will contain the collections of micro-ops which occur in parallel most often throughout the complete micro-op program.
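
The greedy character of this procedure can be sketched as follows. This is not the algorithm of [Nagl 78]: the way weights are computed is not given here, so `weight`, `parallel` and `width` are hypothetical caller-supplied functions, and `pack` and `max_width` are likewise invented names. The sketch also simplifies two points: it does not recompute potential parallelism after a merge, and it skips a merge that would be too wide instead of creating a new micro-instruction format.

```python
from itertools import combinations

def pack(ops, parallel, weight, width, max_width):
    """Repeatedly merge the pair of groups with the highest total attraction weight."""
    groups = [frozenset([op]) for op in ops]
    while True:
        best = None
        for a, b in combinations(groups, 2):
            # Every micro-op in one group must be parallel with every one in the other.
            if not all(parallel(x, y) for x in a for y in b):
                continue
            if width(a | b) > max_width:
                continue
            w = sum(weight(x, y) for x in a for y in b)
            if best is None or w > best[0]:
                best = (w, a, b)
        if best is None:
            return groups
        _, a, b = best
        groups = [g for g in groups if g not in (a, b)] + [a | b]

# Hypothetical attraction weights for the potentially parallel pairs.
weights = {frozenset("AB"): 0.9, frozenset("AC"): 0.5,
           frozenset("BC"): 0.4, frozenset("CD"): 0.7}
packed = pack(
    "ABCD",
    parallel=lambda x, y: frozenset((x, y)) in weights,
    weight=lambda x, y: weights[frozenset((x, y))],
    width=lambda group: 4 * len(group),   # assume 4 bits per micro-op
    max_width=8,
)
print(packed)
```

With these example weights the pair A, B is combined first, then C, D, after which no further legal merge remains.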

### 5.6 Control Signal Conditioning and Micro-word Representation

When all of the micro-instructions have been defined, only a few minor details remain which the control allocator must address before it is done. The first is that the controller must be defined in such a way that the two-phase control signals which were assumed to be available in an earlier step will be generated. To do this the control allocator must specify whether an AND or a NAND gate should be used to condition the phase 2 clock pulse for each of the evoke control points in the design. The value of the NONEVOKE\(^3\) field for the particular control point, stored in the Module Database, is the determining factor. For each control point which has a NONEVOKE value of H (high) a NAND gate is specified, and for each with a value of L (low) an AND gate is specified.

If during the micro-word packing step more than one micro-instruction format was specified then the control signal conditioning will also include the bit steering decoder which is placed between the micro-instruction register and the control points of the design. This device just directs the evoke control signals stored in the micro-instruction to the proper devices of the data path and is itself controlled by a field of the micro-instruction.

\(^{3}\)Refer to section 4.2 for a description of what this field represents.

Figure 5-4: A modified control graph
The longest possible program, one micro-op per micro-instruction

Figure 5-5: Another modified control graph

One additional detail which must be considered is how to represent the micro-instructions in a form which may be used by the builder of the digital system. The micro-instructions are internally represented as collections of micro-ops, which are just lists of device primitives. These primitives are converted by the control allocator into the actual ones and zeros which should be placed in the microprogram ROM. To convert the device primitives into binary values, the control allocator compares the BITVAL for a particular control line of the CTLESEQ for the current device primitive to the NONEVOKE value for the device. If these values are not the same, then the value to be stored in the microprogram ROM is a one; if they are the same, then the value is zero. This comparison is performed for each control line of each micro-instruction, and a table of these ones and zeros is printed in the same file as the data path graph and the micro-sequence table. A similar table is also printed which contains the programming for the micro-address ROM. For this memory the values are just the addresses in the microprogram which will be branched to during select instructions.
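
The bit conversion rule and the gate selection rule of this section fit together, as the following sketch shows. It assumes that levels are written as the strings "H" and "L" and that the gate combines the stored bit with the phase 2 clock. The function names are hypothetical and do not come from the control allocator or the Module Database.

```python
def rom_bit(bitval, nonevoke):
    """Stored bit is 1 when the primitive's level differs from the NONEVOKE level."""
    return 1 if bitval != nonevoke else 0

def gate_for(nonevoke):
    """NAND for control points that rest high, AND for those that rest low."""
    return "NAND" if nonevoke == "H" else "AND"

def control_line(bit, clock, nonevoke):
    """Level seen at the control point after conditioning with the phase 2 clock."""
    both = bool(bit and clock)
    if gate_for(nonevoke) == "NAND":
        return "L" if both else "H"
    return "H" if both else "L"

for bitval, nonevoke in [("H", "L"), ("L", "L"), ("L", "H"), ("H", "H")]:
    bit = rom_bit(bitval, nonevoke)
    print(bitval, nonevoke, bit, gate_for(nonevoke), control_line(bit, 1, nonevoke))
```

In this model a stored zero always leaves the control line at its NONEVOKE level, and a stored one drives it to the opposite level only while the phase 2 clock is high.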

In the binary form the microprogram is difficult for a human designer to understand, so the control allocator also provides an optional file which contains an English version of each micro-instruction. Since each instruction was generated from a micro-operation which was itself generated from a micro-sequence step, they are all related, and the control allocator prints the micro-sequence step for each micro-instruction so that the user may relate it back to the original program.
6. Results

During the process of building the control allocator a few features of the control allocation process became apparent. The first was that it is absolutely necessary that the data path graph be completely bound before the micro-op program generation step can begin. The path graph must also contain the hardware which makes up the controller. Unfortunately, details such as the size of the microprogram are unknown at this early stage of the design so the control allocator must estimate the requirements in order to bind the devices to physical hardware.

The second result was that the structure of the control graph allows for every potentially parallel combination of micro-operations to be described. The control allocator itself always finds any potential parallelism in a SLMS and so the control allocator using the control graph does not introduce any restriction on the speed or cost of the digital design. If it had restricted the design then one could not expect optimal designs from it.

In an effort to compare the controllers designed by the control allocator with those designed by humans, a path graph which nearly matches the data paths of the PDP-11/40\(^4\) was generated by hand\(^5\). This path graph was then processed by the control allocator and the size of the generated microprogram was compared with the one in the human-designed PDP-11/40. A problem in the representation of digital systems was found which indicated a need for extensions in the data path graph "language". It is presented here because the lack of specific features constrained the possible designs which the control allocator was able to develop.

Thus, a third result is that there is a genuine need to be able to specify control lines in a data path graph. This is not necessary when the path graph is being generated automatically, but when one wants to hand-code a path graph for an existing system there may be some aspects of it which cannot be described. In the PDP-11/40, for example, the ALU is in some cases controlled from the micro-machine and in other cases its function is specified by decoding the instruction register. In the machine-generated data path graph the control signals are the responsibility of the controller, so one method to implement such an ALU control problem would be to leave the problem of connecting the instruction register to the control lines of the ALU up to the control allocator. This would seem to be a reasonable solution, but there is no way to specify such a connection in the micro-sequence program. Even if there were such an instruction, the data path allocator (which writes the micro-sequence program) would be unable to completely specify it, since it does not even know that an ALU would be used to perform the desired function and that the ALU would require control signals.

\(^4\)PDP is a registered trademark of Digital Equipment Corporation.

\(^5\)A diagram of this Data Path and a listing of the Micro Sequence program are included in Appendix 1.

The comparison of the automatically generated controller and the PDP-11/40 controller is incomplete but the preliminary results indicate that the control allocator is performing well. In this comparison an effort has been made to closely match the data paths of the human-designed PDP-11/40 but to allow the control allocator to generate its own controller design. The optimization routines which were written by Nagle were used to generate the fastest possible microprogram for the micro-sequence specified [Nagl 78].

In one experiment 36 PDP-11/40 micro-words were implemented in a micro-sequence program. These words consisted of the macro-instruction fetch and the source and destination processing sections of the microprogram. The program was added to the data path graph for the 11/40 and run through the control allocator. The resulting automatically generated microprogram was 38 words long and 62 bits wide. To relate these numbers to the human design one must note that it has a micro-word width of 56 bits but that not all of these should be included in the comparison. The PDP-11/40 micro-word fields called CLK, CB, CBA, and CD (which require a total of six bits) should not be included because they are control lines for clock signals which were not implemented in the data path description. With these bits removed the PDP-11/40 micro-word is 50 bits wide which is still a fairly close match to the 62 which the control allocator designed. Thus the size of the automatically generated controller in total bits required was about 31% larger than the human design. This comparison looks even better if one realizes that the PDP-11/40 is able to perform more than one micro-op in a single micro-instruction and so the number of operations which were packed into 38 words of the control allocator's microprogram was potentially 72 and in actuality 44 micro-operations [Nagl 80].
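
The size comparison can be checked with a few lines of arithmetic. The sketch uses only the figures quoted above; the variable names are illustrative.

```python
auto_words, auto_width = 38, 62    # automatically generated microprogram
human_words = 36                   # PDP-11/40 micro-words implemented
human_width = 56 - 6               # CLK, CB, CBA and CD fields excluded

auto_bits = auto_words * auto_width
human_bits = human_words * human_width
overhead = auto_bits / human_bits - 1

print(auto_bits, human_bits, f"{overhead:.1%}")
```

The total storage is 2356 bits against 1800 bits, which is where the figure of about 31% comes from.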

A user of the control allocator should be primarily interested in the resulting controller designs, but consideration should also be given to the speed with which each design is generated. The control allocator was run for a few different digital designs and the execution times of specific sections of the design process were measured, namely micro-operation program generation and control graph generation. Figure 6-1 is a graph of the results of these runs. The four designs which were used in this test were the AM2909, MARK1, AM2910 and the PDP-11/40. The first three designs were automatically generated and the path graph for the PDP-11/40 example was hand generated.

Figure 6-1: Micro-op and control graph generation times as a function of micro-ops
7. Suggested Improvements for the Automated Control Allocator

During the evolution of the control allocator some aspects of the problem became apparent which were not investigated, either because of their difficulty or for lack of time to address them properly. One of the primary parts of the control allocation process which could be improved is the method by which the micro-cycle clock speed is selected. Currently the times required for each micro-op are calculated and the minimum and maximum are reported. The user then specifies the time which he feels would be best. A better solution would be a system which provides a histogram of the micro-op execution times, from which the user would select the cycle time which causes the least amount of wasted time. Even this method may be improved upon, however. One cannot assume that each micro-op will be executed with the same frequency in a digital system, since some operations such as instruction fetches will occur more often than instructions such as halts. A system which measures the relative frequency of each micro-operation and combines these values with the operation times should be able to determine a micro-cycle clock speed which will result in an optimal controller design.
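
The suggested frequency-weighted selection might look like the following sketch. It rests on assumptions that are not stated in this report: a micro-op longer than the cycle is taken to occupy a whole number of cycles, fetch overhead is ignored, and the candidate cycle times are simply the micro-op times themselves. All names and numbers are hypothetical.

```python
def wasted_time(cycle, times, freqs):
    """Frequency-weighted idle time when each micro-op is rounded up to whole cycles."""
    total = 0
    for t, f in zip(times, freqs):
        cycles = -(-t // cycle)            # integer ceiling of t / cycle
        total += f * (cycles * cycle - t)
    return total

def best_cycle(times, freqs, candidates):
    """Pick the candidate cycle time with the least weighted wasted time."""
    return min(candidates, key=lambda c: wasted_time(c, times, freqs))

times = [100, 150, 200, 400]               # micro-op execution times in ns
fetch_heavy = [50, 30, 15, 5]              # relative execution frequencies
mid_heavy = [5, 80, 10, 5]

print(best_cycle(times, fetch_heavy, times))
print(best_cycle(times, mid_heavy, times))
```

The two frequency profiles select different cycle times for the same set of micro-op times, which illustrates why the execution frequencies matter and not only the histogram of times.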

Since the control allocator is only a tool and repeated processing of slightly modified digital designs is probable it would be helpful if the control allocator could find sections of the micro-sequence program which cause bottlenecks in the control graph. This information could be used by the data path allocator to make changes in the data path structure so that the slower sections of the micro-sequence program may be improved. Along the same lines the control allocator could identify the execution speed of macro instructions of the digital system and report these so that the slow ones could be speeded up by either modifications to the data path or instructions to some of the other control allocator routines.

A third problem is that the current control allocator is only able to generate a controller for a single process executing on a data path. In ISPS the designer is able to describe parallel independent processes and the data path allocator is able to define the hardware for such ISP descriptions. The control allocator is not able to handle this type of system and will not resolve or even recognize any hardware conflicts which may arise. To solve this problem the control allocator would have to define the additional control hardware which will do the arbitrating for it.

The current control allocator is limited by the restriction that multi-phase micro-instructions are not allowed. This type of instruction format would allow for a higher density packing of micro-operations in the micro-instructions but is also much more difficult to process. An improved version of the control allocator would design multi-phase controllers which would be able to generate multiple sequential operations from a single micro-instruction fetch. Such a system has the advantage that the microprogram memory can be slower (and usually less expensive) without slowing down the execution speed of the program.
8. Conclusions

The ideas and results presented in this report should lead the reader to the following conclusions.

• The automated generation of microprogrammed controllers from the CMU-DA data path graph is a reality.

• The automated generation of controllers from any hardware description language is a very strong possibility since the techniques presented here may be adapted to other design automation programs.

• The control graph structure which is used to represent the potential parallelism of the data paths does not restrict the design in any way.

• The control allocator is able to generate microprograms which are similar in size to those which have been extensively hand optimized.
References

[Dasg 76] Dasgupta, S.
Parallelism in Microprogramming systems.

[DeWi 76] DeWitt, D. J.
A Machine-Independent Approach to the Production of Horizontal Microcode.

[DeWi 78] DeWitt, D. J.
The Complexity of Microprogram Optimization.
1978.

[Leiv 79] Leive, G.W. and Thomas, D.E.
The CMU Design System, Module Database-Users Guide
1979.

[Mall 78] Mallett, P. W.
Methods of Compacting Microprograms.

[McFa 78] McFarland, M.C.
The Value Trace: A Data Base for Automated Digital Design.

[Nagl 78] Nagle, A.
Automatic Design of Micro-controllers.
In Proceedings of the 11th Annual Workshop on Microprogramming. IEEE

[Nagl 80] Nagle, A.
Future Article on Control Allocation.
I. Appendix 1: The PDP-11/40 example
This is the hand generated micro-sequence table for the PDP11/40. It represents 185 of the micro-words of the human designed machine.

<table>
<thead>
<tr>
<th>MICRO</th>
<th>OPERATION SEQUENCE</th>
</tr>
</thead>
<tbody>
<tr>
<td>1#1:</td>
<td></td>
</tr>
<tr>
<td>#352(PIHGIN).#77(P)1140). #00000000; #220(RIAD):#10#3,.#47(07):#247.;</td>
<td></td>
</tr>
<tr>
<td>1#133:</td>
<td></td>
</tr>
<tr>
<td>#720(RI AD).#31(TRG70),#2(UNIHUS),#25(BAREG);</td>
<td></td>
</tr>
<tr>
<td>#210(MOVF).#26(INSTR),#31(TRGX0):#115,;</td>
<td></td>
</tr>
<tr>
<td>#210(MOV).#21(1RHI).#31(TRGXR):#115.;</td>
<td></td>
</tr>
<tr>
<td>#221 (Win II ),#10#3(RGI0RGR).#53(13).#31(TREGX0);</td>
<td></td>
</tr>
<tr>
<td>#220(RI AI):#10#3,.#47(07):#247.;</td>
<td></td>
</tr>
<tr>
<td>#220(IISI):#36:#1520#153.,#422(1):#222.;</td>
<td></td>
</tr>
<tr>
<td>#234(ADD):#16:#10#3(RG10UGR).#10#3(REGL0RGR):#1000#101.#36(CONCAT):#107;</td>
<td></td>
</tr>
<tr>
<td>#220(RI AD):#10#3,.#47(07):#247.;</td>
<td></td>
</tr>
<tr>
<td>#220(IISI):#36:#1520#153.,#422(1):#222.;</td>
<td></td>
</tr>
<tr>
<td>#234(ADD):#16:#10#3(RG10UGR).#10#3(REGL0RGR):#1000#101.#36(CONCAT):#107;</td>
<td></td>
</tr>
<tr>
<td>I#4:</td>
<td></td>
</tr>
<tr>
<td>#360(STUSC).#15(DECODF):#141. #775.50.1. #5.0.0.1. #6.1.0.1. #10.2.0.1. #11.3.0.1.</td>
<td></td>
</tr>
<tr>
<td>1.12.4.0.1. #13.5.0.1. #14.6.0.1. #16.7.0.1. #20.8.0.1. #23.9.0.1. #21.10.0.1. #24.11.0.</td>
<td></td>
</tr>
<tr>
<td>#1.12.13.0.1. #14.14.0.1. #15.15.0.1. #20.20.0.1. #23.23.0.1. #31.31.0.1. #32.32.0.1.</td>
<td></td>
</tr>
<tr>
<td>I#6:</td>
<td></td>
</tr>
<tr>
<td>#220(RIAD):#10#3,.#26(INSTR):#127.;</td>
<td></td>
</tr>
<tr>
<td>#210(MOVIF).#25(HARIG),#10#3(RG10UGR).#1000#101.; #200(IISI):#36:#1520#153.,#422(1);</td>
<td></td>
</tr>
<tr>
<td>#234(SUN):#16:#10#3(RG10UGR).#1000#101.#36(CONCAT):#107;</td>
<td></td>
</tr>
<tr>
<td>#220(RI AD):#10#3,.#26(INSTR):#127.;</td>
<td></td>
</tr>
<tr>
<td>#200(IISI):#36:#1520#153.,#422(1);</td>
<td></td>
</tr>
<tr>
<td>I#10:</td>
<td></td>
</tr>
<tr>
<td>#220(RIAD):#10#3,.#36(INSTR):#127.;</td>
<td></td>
</tr>
<tr>
<td>#200(IISI):#36:#1520#153.,#422(1);</td>
<td></td>
</tr>
<tr>
<td>#234(SUN):#16:#10#3(RG10UGR).#1000#101.#36(CONCAT):#107;</td>
<td></td>
</tr>
<tr>
<td>#220(RI AD):#10#3,.#36(INSTR):#127.;</td>
<td></td>
</tr>
<tr>
<td>#200(IISI):#36:#1520#153.,#422(1);</td>
<td></td>
</tr>
<tr>
<td>I#12:</td>
<td></td>
</tr>
<tr>
<td>#220(RIAD):#10#3,.#36(INSTR):#127.;</td>
<td></td>
</tr>
<tr>
<td>#210(MOVIF).#25(HARIG),#10#3(UIG0RGR).#1000#101.;</td>
<td></td>
</tr>
<tr>
<td>#305(JOIN):#15;</td>
<td></td>
</tr>
<tr>
<td>250</td>
<td></td>
</tr>
</tbody>
</table>

L1#16: #220 (RFAD) : #103 , #47 (07) : #247 , ;
#200 (11 SI) : #36 : #1520#153. , #422 (2) : #222. ,
#243 (add) : #16 : #110. , #23 (DREG) ; #243 (REGI0REGR) : #1009#101 , #36 (CONCA1) : #107;
#221 (WRIU) . #1093 (REGI0REGR) . #47 (07) . #23 (DREG) ;
#220 (RIAD) . #31 (REGOGR) . #2 (UNIUS) , #25 (BAREG) ;
#210 (MOVJ) . #21 (HRIG) , #31 (I WIG) ; #115 ;
#220 (RIAI) : #1083. , #26 (INSIN) : #127. ;
#243 (AILI)) : #16 : #110. , #2G (BARIG) . #103 (REGLORIGR) : #1009#101 , #21 (BREG) : #1448#145;
#36 (JOIN) : #15 ; 1247

L1#14: #220 (I IAD) . #31 (I IIGO) , #2 (UNIUS) , #25 (BAREG) ;
#210 (MOVJ) , #21 (BRIG) , #31 (MMGO) ; #115 ;
#221 (write) , #1093 (RegIOregr) . #51 (11 source) , #31 (TREGO) ;
#220 (RTAI) : #1083. , #51 (11source) : #251. ;
#210 (MOVJ) . #25 (BARIG) . #183 (RegLORFLR) : #1009#101 ;
#36 (JOIN) ; #27; 1260

I #15: #220 (I IAD) . #31 (I IIGO) , #2 (UNIUS) , #25 (BAREG) ;
#210 (MOVJ) , #21 (BRIG) , #31 (IRILGO) ; #115 ;
#220 (WRITE) . #25 (BARIG) . #183 (RegLORFLR) : #1009#101 ;
#210 (MOVJ) , #21 (BRIG) , #31 (IRILGO) ; #115 ;
#220 (WRITE) . #25 (BARIG) . #183 (RegLORFLR) : #1009#101 ;
#36 (JOIN) ; #27; 1260

L1#16: #360 (selecl) ; #15 (decode) : #187 ; #265.16.1. #62.0.0.1. #61.1.0.1. #65.2.0.1. #31.3, 0, 1. #20.4.0.1. #23.5.0.1. #24.7.0.1. #2 5.8.0.1. #70.9.0.1. #101,10.0.1. #102.11.0.1.
#104.12.0.1. #105.13.0.1. #106.14.0.1. #17.15.0.1.
#17: #210 (MOVJ) , #23 (I IUG) , #21 (BRIG) : #1448#151; ,
#221 (write) , #1093 (RegIOregr) . #51 (11 source) , #23 (DREG) : #116; 
#200 (USR) : #15 , #26 (INSTR) : #130 , #416 (BUI36) : #216;,
#360 (SI ICH) , #15 (11COD1) : #157 ; #755. 15. 1 . #62.0.0.1 . #61.1.0.1 . #65.2.0.1 . #31.3, 0, 1.
#20.4.0.1. #23.5.0.1. #24.7.0.1. #2 5.8.0.1. #70.9.0.1. #101,10.0.1. #102.11.0.1.
#104.12.0.1. #105.13.0.1. #106.14.0.1.

I #31: #220 (I IAD) : #183. , #26 (I ISTR) : #260. ;
#210 (MOVJ) , #25 (BARIG) , #183 (RegG1REL) : #1009#101 ;
#365 (JOIN) ; #30 ; 1260

L1#21: #220 (RFAD) : #183. , #26 (I ISTR) : #260. ;
#200 (I IAD) : #1083. , #263 (RLIGORLLR) : #1009#101 ;
#220 (RIAD) : #1083. , #76 (INSIN) : #260. ;
#200 (11 SI) : #36 : #1I2I8103. , #423 (10R2) : #223. ;
#243 (AILI)) : #16 : #110. , #23 (BRIG) , #183 (RegG1REGR) : #1009#101 , #36 (CONCA1) : #107;
#365 (JOIN) ; #22 ; 1260

I #31: #220 (I IAD) : #183. , #26 (I ISTR) : #260. ;
#210 (MOVJ) , #25 (BARIG) , #183 (RegG1REL) : #1009#101 ;
#365 (JOIN) ; #30 ; 1260

L1#21: #220 (RFAD) : #183. , #26 (I ISTR) : #260. ;
#200 (I IAD) : #1083. , #263 (RLIGORLLR) : #1009#101 ;
#220 (RIAD) : #1083. , #76 (INSIN) : #260. ;
#200 (11 SI) : #36 : #1I2I8103. , #423 (10R2) : #223. ;
#243 (AILI)) : #16 : #110. , #23 (BRIG) , #183 (RegG1REGR) : #1009#101 , #36 (CONCA1) : #107;
#365 (JOIN) ; #27 ; 1264

I #24: #220 (RIAD) : #103. , #2G (INSIN) : #260. ;
#200 (II SI) : #35 : #1520#153. , #422 (2) : #222. ;
#244 (SUB) : #16 : #104#29 (BARIG) . #103 (RIGI0UIR) : #1009#101 , #36 (CONCA1) : #107;
#220 (II AD) : #1003. , #2G (INSIN) : #260. ;
#200 (11 SI) : #36 : #1520#153. , #422 (2) : #222. ;
#244 (SUB) : #16 : #104#29 (BARIG) . #103 (RIGI0UIR) : #1009#101 , #36 (CONCA1) : #107;
#221 (WRIII) . #103 (RIGI0UIR) , #26 (INSIN) : #260. , #23 (DRI G) : #116;
#365 (JOIN) ; #27 ; 1264

I #25: #220 (RIAD) : #103 . #47 (07) : #247. ;
L#26: #210 (MOVC). #25 (BAREG), #10 #3 (REGLOREGR): #1000 #101;
#220 (RFAD): #10 #3, #47 (O7): #247.
#200 (HS1): #36: #1520 #153.. #422 (2): #222.;
#243 (ADD): #16: #110, #23 (DRIG), #10 #3 (REGLOUIGR): #1000 #101; #36 (concat): #107;
#221 (WR1U): #10 #3 (REGLOREGR). #47 (O7PC). #23 (DRG): #116;
#220 (RrAD): #31 (IRLGO). #25 (BAREG);
#210 (MOVI), #21 (BREG), #31 (IRLGO): #115.;
^220 (IUAD): ^1»/Sf3 . .
#26 (INSTR): #260 , ;
#360 (select).,#15 (decode): #157.; #745.15.1. #33.0.0.1. #35.1.0.1. #40.2.0.1. #41.3.0.
1. #43. 4. 0. t. #44. 5. 0. 1. #45. 6. 0. 1. #46. 7. 0. 1. #50. 8. 0. 1. #51. 9. 0. 1. #53.10.0.1. #55,11,0,
1. #56.12.0.3, #34, 13, 0.1. #32,14, 0;
L#32: #210 (MOVE): #16. #23 (DREG). #21 (BREG): #1440 #151.;
#221 (WR1I): #10 #3 (IRIG0 I RIG). #52 (12dest): #252; #23 (DREG): #116;
#200 (UST): #15. #26 (INSTR): #130. #413 (BUT33): #213;
#360 (select). #15 (decode): #157.; #745.15.1. #33.0.0.1. #35.1.0.1. #40.2.0.1. #41.3.0,
1.143.4. 0. t. #44. 5. 0. 1. #45. 6. 0. 1. #46. 7. 0. 1. #50. 8. 0. 1. #51. 9. 0. 1. #53.10.0.1. #55,11,0,
1. #56.12.0.3, #34, 13, 0.1. #32,14, 0;
L*52: #210(MOVE),#21(BREG),#23(DREG):116.; !SPS=1
#210(MOV),#21(BREG),#23(DREG):116.; !SPS=3
#210(move):16.23(dreg),#21(breg):1508#147.; #365(join):#37; 1375
L*53: #210(move):16.23(breg):1448#145.;
#710(ino),#70?&3(lGdreg),#16(alu):134.16(a1u):110;
#210(move),#30(pscary),#208.23(lbdreg):1208#117.
#210(MOVI),#21(BUG),#208.23(lBIRG):1208#117.;
#210(move):16.23(dreg),#21(breg):1508#151.;
L*54: 212(noop); !set the condition codes sps=2
#365(join):#36; 1367
#210(move),#30(pscary),#208.23(lGdreg):1208#117;
#210(MOVI),#21(MU G),#208.23(lQI):1208#117;
#210(move):16.23(dreg),#21(breg):1508#147.;
#365(join):#37; 1277
L*56: #210(10),#21(dreg); 1; !this should be a sign extend
L*57: #210(movo),#30(pscary),#208.23(dreg):116.; Jailer the condition codes sps=3
#211(10WR),#16(#3IMGI ORLGN),#26(INSTR):260,#23(DREG):116;
L*60: #365(join):*1; 1000 this should be a but 27
L*61: #220(IUAI));#16*3. S !1(11 source):251.;
#210(move),#21(breg),#18*3(reg18breg):121.;
#220(10RAD):#10*3. ,#26(JNSIF):260.;
#24(f SUH):#16.110.23(DREG),#18*3(REG18RReg):1008#101.21(BREG):1448#145;
#365(join):#37; 1360
L*62: #220(II AD):#18*3. ,#26(INSTR):260.;
#210(move),#21(breg),#18*3(reg18breg):121.;
#220(HI IF):#18*3. ,#26(11 source):251.;
#235(AND):#16.110.23(DREG),#18*3(REG18RReg):1008#101.21(BREG):1448#145;
L*151: 200(lSI):15. ,#26(INSTR):130,#411(BUI 31):#211;
#360(select),#15(decode):262. .753.3.1.,#57.0.0.1.,631.0.1.,642.0;
L*63: 221(write),#16(1reg),#26(1nstr):260,#23(dreg):116;
#365(join):#37; 1000 this should be a but 27
L*64: 212(noop); !alter the condition codes sps=3
I*65: #365(join):#37; 1000 this should be a but 27
I*66: #220(IIAD):#18*3. ,#26(INSTR):260.;
#210(move),#21(breg),#18*3(dreg):116.;
#221(WRIIR),#16*3(RIGI RReg);#52(12DES1),#23(dreg):116;
#200(INS1);#15. ,#26(INSTR):130,#400(BUT20):#200;
#360(select),#15(decode):262. .753.4.1.,#1000.,1313.1,0.1.,134.2,0,1.,673,0;
L*67: #210(move),#21(breg),#23(dreg):116.;
#220(IIAD):#18*3. ,#26(INSTR):260.;
#210(move),#21(breg),#23(dreg):116.;
#221(WRIIR),#16*3(RIGI RReg);#52(12DES1),#23(dreg):116;
#200(INS1);#15. ,#26(INSTR):130,#400(BUT20):#200;
#360(select),#15(decode):262. .753.4.1.,#1000.,1313.1,0.1.,134.2,0,1.;
L*70: #220(RIAD):#18*3. ,#26(INSTR):260.;
#210(MOVI),#25(BAUIG),#18*3(REGI RReg);#1008#101.;
L*71: 221(WR),#18*3(R:1G1UGR),#26(INSTR):260,#23(DREG):116;
L*72: #200(II ST):#15. ,#26(INST IN):#130,#402(BUI 22):#202;
#360(select),#15(decode):262. .753.4.1.,#72.0.0.1.,753.1.0.1.,762.0.1.1.,1003.0;
L#72: #220(RIAD);#18*3. ,#51(l1 source):251.;
#210(move),#21(breg),#16*3(dreg);#11*3(reg18breg):103.;
#212(noop); !alter the condition codes sps=3
#365(join):#37;
L*75: #220(111 AI));#193.; #26(INSI IN):#127.;
#200(MSI);#36.1528*ir3. ,#436(00R2):#236.;
#243(Al));#16.110.23(DREG),#18*3(REG18UGR);#1008#101.;#36(CONCA1);#107;
#212(noop); !alter the condition codes sps=3
#365(join):#37;
L*76: #270(RIAD);#18*3. ,#51(l1 source):251.;
#210(move),#21(breg),#16*3(dreg);#11*3(reg18breg):121.;
L*77: #210(move),#16*3(dreg);#21(breg):1508#147.;
#212(noop); !alter the condition codes sps=3
#365(join):#37;
*21O(move),*21(breg),*10*3(regl0regr):#121,;
#365(join),"77;

L101: *22O(READ):*10*3,"26(INSTR):*260,;
#210(MOVI),*25(BARIG),*10*3(REGL0REL):*10001101,;
#243(AOD):*16:10*3(DRLG),*10*3(RG10RIGR):*10001101,36(CONCAT):*107^;
#243(add):*16:26(INSTR):*260,;
#243(Test):*36:1520*153.,*422(2):222,;
\#243(ADD):*16:110*3(DRLG),*10*3(RG10RIGR):*10001101,36(CONCAT):*107;
#221(WRM):*10*3(III G1 ORG1):*26:INS1R:*260,29(BREG):*116,;
L103: *210(MOVI),*31(I FI G3),*2(UNIBUS),25(BARIG);
#210(MOVI),*16:26(INSTR):*260,;
#220(MOVI),*16:26(INSTR):*260,;
#221(WRITI),*10*3(RG10RIGR),*52(12DRES),*31(REG10RIGR);
#220(READ):*10*3,;"26(INSTR):*260,;
#221(WRITI),*10*3(RG10RIGR),*52(12DRES),*31(REG10RIGR);
L105: *220(RlAd):*10*3,*26(InSIR):*260,;
#200(HSF):*36:1520*153.,*423(102) :223,;*
\#244(SUH):*16:10*3(DRLG),*10*3(REGL0RIGR):*10001101,36(CONCAT):*107;
#220(RFST):*10*3,;26(INSTR):*260,;
#221(WRITI),*10*3(III G1 ORG1):*26:INS1R:*260,23(DRrG):*116,;
#35(join),"103,;"74,;

L104: *220(RIAO):*10*3,*26(InSIR):*260,;
#200(HSF):*36:1520*153.,*423(102) :223,;*
\#244(SUH):*16:10*3(DRLG),*10*3(REGL0RIGR):*10001101,36(CONCAT):*107;
#220(RFST):*10*3,;26(INSTR):*260,;
#221(WRITI),*10*3(III G1 ORG1):*26:INS1R:*260,23(DRrG):*116,;
#35(join),"103,;"71,;"221(WIIL I ),*10*3(III G1 ORG1),*31(REG10RIGR);*10001101,;
#365(join),"103,;"74,;

L105: *220(RIAd):*10*3,*26(InSIR):*260,;
#200(HSF):*36:1520*153.,*423(102) :223,;*
\#244(SUH):*16:10*3(DRLG),*10*3(REGL0RIGR):*10001101,36(CONCAT):*107;
#220(RFST):*10*3,;26(INSTR):*260,;
#221(WRITI),*10*3(III G1 ORG1):*26:INS1R:*260,23(DRrG):*116,;
#35(join),"103,;"71,;"221(WIIL I ),*10*3(III G1 ORG1),*31(REG10RIGR);*10001101,;
#365(join),"103,;"74,;

L106: *220(RIAd):*10*3,*26(InSIR):*260,;
#200(HSF):*36:1520*153.,*423(102) :223,;*
\#244(SUH):*16:10*3(DRLG),*10*3(REGL0RIGR):*10001101,36(CONCAT):*107;
#220(RFST):*10*3,;26(INSTR):*260,;
#221(WRITI),*10*3(III G1 ORG1):*26:INS1R:*260,23(DRrG):*116,;
#35(join),"103,;"71,;"221(WIIL I ),*10*3(III G1 ORG1),*31(REG10RIGR);*10001101,;
#365(join),"103,;"74,;

L110: *220(IUAI)):*10*3,*26(INSTR):*260,;
#243(ADD):*16:10*3(III G1 ORG1),*25(BA111G):*10*3(RFGE0IUGR):*10001101,21(BREG):*1440145,;
#200(TEST):*15,.*26(INSTR):*130,401(BUT21):*201,;"263,.;#360(select),"15(decode):"2630,.;#375.2.1.;#110,0,0,1.;#1111,1,0;
L111: *220(IUAI)):*10*3,*26(INSTR):*260,;
#243(ADD):*16:10*3(III G1 ORG1),*25(BA111G):*10*3(RFGE0IUGR):*10001101,21(BREG):*1440145,;
#200(TEST):*15,.*26(INSTR):*130,401(BUT21):*201,;"263,.;#360(select),"15(decode):"2630,.;#375.2.1.;#110,0,0,1.;#1111,1,0;