# VLBA Correlator Memo No. 87

VLBA Correlator ASIC Chip Specification

by Ray Escoffier

July 1, 1987

The attached specification is the first attempt at a formal specification for the FFT butterfly/cross multiplier ASIC (Application Specific Integrated Circuit) that will be the heart of the VLBA FX correlator. This specification will grow in detail and specificity as we progress through the project but for now it is only a start. Many things that should be in an integrated circuit specification are missing. For instance, the chip pinouts are not given in this document since these will not be known until the actual design is complete. The purpose for starting the specification now is to disseminate the information early to the VLBA community and to give us a document to send to ASIC manufacturers in searching for potential vendors.

This ASIC specification was strongly influenced by a product offered by LSI Logic Corp. This product is the LSA-1502, a 1.5 micron technology structured array with an onboard 1K by 36-bit RAM memory and with 30,000 uncommitted gates available (estimated 10,000 usable). This chip will easily meet our 32-MHz clock requirement and has a large enough RAM to support an FFT signal resolution of 7,7,4 and a 15,15,6 short term accumulator. We have some hopes of finding other candidates for this application to introduce some competition in the bid cycle but as of now this is the only chip we know of that will meet our requirements. LSI Logic has given NRAO a budgetary quote for the LSA-1502 of \$95,000 non-recurring engineering charge and \$67.00 per chip in the quantities needed for the VLBA correlator (about 3000 chips are required for the VLBA correlator). If NRAO can find other persons interested in this chip, the resulting higher quantity may reduce the cost to NRAO.

# NATIONAL RADIO ASTRONOMY OBSERVATORY CHARLOTTESVILLE, VIRGINIA

VERY LONG BASELINE ARRAY PROJECT

Specification No. A13400N1 Date: July 1, 1987

Name: BUTTERFLY ASIC

Prepared by: \_

Approved by:

#### I. INTRODUCTION

This specification defines the operational characteristics of the FFT butterfly ASIC to be used in the NRAO VLBA correlator. This chip is to have several digital signal processing functions, the principal applications being in performing radix 4 and radix 2 FFT butterfly operations and in performing floating point complex multiply-accumulations. In both of these applications a fast onboard RAM will be required with a minimum depth of 1K and a minimum width of 32-bits. In addition, the RAM must come in two independently addressable 1K X 16 (minimum) halves. The intended clock rate for this ASIC will be 32 MHz.

#### II. DIGITAL SIGNAL REPRESENTATION

The signal representations described in this specification are generally expressed in complex floating point form. The exact number of bits of precision used in the digital signal processing to be done using this ASIC is not stated in this specification since cost/performance tradeoffs will not be known until more study of available products is done. Instead, tables of acceptable precision are given with preferred precision levels indicated where applicable.

The complex floating point signal representation to be used is a non-standard form in which the real and imaginary components of a number have a common exponent bit field. Such complex numbers are expressed in a shorthand fashion, an example of which is 6,6,4, indicating 6-bit real and 6-bit imaginary mantissa fields and a common 4-bit exponent field. Four basic precisions are required in the various applications of this ASIC:

1) Data points being Fourier transformed;

| acceptable precision | comment                 |
|----------------------|-------------------------|
| 5,5,4                | minimum acceptable      |
| 6,6,4<br>7,7,4       | preferred               |
| 8,8,4                | better but not required |

2) Sin, Cos twiddle factors used in FFT butterfly;

| acceptable precision | comment                 |
|----------------------|-------------------------|
| 5,5,4                | acceptable              |
| 6,6,4                | better but not required |

3) Points being multiply/accumulated; (one of the two complex data points to be multiplied together in the multiply/accumulate function will use the FFT twiddle factor input port into the chip and hence this precision is tied to 2), above, except that fewer bits than the twiddle factor port may be used at NRAO's discretion.)

| acceptable precision | comment |
|----------------------|---------|
| 4,4,4                |         |
| 5,5,4                |         |
| 6,6,4                |         |

4) Accumulator precision;

| acceptable precision | comment                        |
|----------------------|--------------------------------|
| 13,13,6              | requires 1K X 32 RAM           |
| 14,14,6              | requires 1K X 34 RAM           |
| 15,15,6              | preferred, requires 1K X 36 RAM |
| 16,16,6              | requires 1K X 38 RAM           |
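The RAM widths quoted for each accumulator precision follow from summing the three fields of the shorthand. A minimal sketch of that arithmetic is given here; the function name `word_bits` is an illustrative choice and does not appear in the specification.

```python
def word_bits(precision: str) -> int:
    """Total bits in a shared-exponent complex word written as 'real,imag,exp'."""
    real_bits, imag_bits, exp_bits = (int(field) for field in precision.split(","))
    return real_bits + imag_bits + exp_bits


# Accumulator precisions listed in item 4) and the RAM width each one implies.
for precision in ("13,13,6", "14,14,6", "15,15,6", "16,16,6"):
    print(precision, "requires 1K X", word_bits(precision), "RAM")
```

The same sum applied to 7,7,4 gives an 18-bit word for an FFT data point.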

#### III. FUNCTIONAL DESCRIPTION

The ASIC chip to be used in the VLBA correlator is a multifunction chip that can be used in the following applications:

Radix 4 FFT butterfly;

The ASIC chip must perform a standard radix 4 DIT FFT butterfly on four digitally sampled complex data points clocked serially into the chip in four consecutive clock cycles. Two input ports exist into the ASIC, one for the points being transformed and one for FFT twiddle factors. One point and its corresponding twiddle factor enter the chip simultaneously, i.e. on the same clock edge. The ASIC must be fully pipelined so that one complex data point enters and one complex output data point exits the chip every 32 MHz clock cycle. In order to avoid offset biases that would result from using two's complement arithmetic, the points into or out of the chip are expressed in sign-magnitude floating point and the internal fixed point adders will work in one's complement arithmetic. RAM storage for the data points (not the twiddle factors), inserted between the chip input port and the butterfly input circuitry, will be required in some butterfly stages. The RAM addresses for some FFT butterfly stages will be generated in external address generators, while for other butterfly stages address generation will be provided on chip. A RAM configuration of two independently addressable 1K X 16 banks (minimum) is required to allow double buffering of the points to be transformed.

Radix 2 FFT butterfly;

The ASIC chip must also perform a standard radix 2 DIT FFT butterfly on two digitally sampled complex data points clocked serially into the chip in two consecutive clock cycles. Most of the details defined above for the radix 4 function are true for this application except that the internal RAM must be logically placed between the butterfly circuitry output and the chip output pins so that the FFT results may be read out in any spectral order.

Radix 2 FFT butterfly (for a 2048 point FFT);

The ASIC chip must do a radix 2 butterfly as above except that two input ports must exist on the chip for the data points so that a point shuffle between two FFT chains made up of these ASIC chips can occur at a final radix 2 butterfly stage. If a 2K deep RAM is used, this requirement is unnecessary.

Straight through function;

The ASIC chip will allow all points to flow straight through the chip unaltered except for a possible rearrangement in time sequence.

Complex multiply/accumulator;

The ASIC chip will input two floating point complex numbers, one on the butterfly data point input port and one through the twiddle factor input port, perform a complex multiplication between the two and add the complex result into a complex floating point accumulation obtained from the RAM. The accumulation result is stored in the RAM across the entire width of the RAM available (32-bits for a two bank 1K X 16 RAM). Two operating modes are required in the multiply/accumulator application. In one mode, points to be multiplied will enter the chip in pairs, one pair every second 32-MHz clock. On average, the RAM must be read, a multiplication done, the result added to the accumulator and the sum stored back in RAM in the two available clock ticks. The other mode requires that point pairs be multiply/accumulated one pair every clock cycle. In the latter mode, at least two consecutive (or two closely spaced in time) point pairs entering the chip will be added into the same accumulation result between reading the accumulator partial sum and writing that sum back into the RAM. Thus, on average, the RAM must be read, two multiplications made, both results added via the accumulator to the same accumulation sum, and the new accumulation stored back in RAM each two clock ticks in this mode. In both of these modes, the RAM access requirement is one RAM read or one RAM write per clock cycle.
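The statement that both modes need one RAM access per clock can be checked with a little arithmetic. The sketch below assumes only what the paragraph above states: each accumulation costs one RAM read and one RAM write, and in the faster mode two point pairs share that read/write. The function name is hypothetical.

```python
from fractions import Fraction


def ram_accesses_per_clock(pairs_per_clock, pairs_per_read_write):
    """Average RAM accesses per clock for a read-modify-write accumulation.

    Each accumulation costs one RAM read and one RAM write, shared by
    `pairs_per_read_write` point pairs.
    """
    accesses_per_pair = Fraction(2, pairs_per_read_write)
    return Fraction(pairs_per_clock) * accesses_per_pair


# Mode 1: one point pair every second clock, one pair per read/write.
slow_mode = ram_accesses_per_clock(Fraction(1, 2), 1)
# Mode 2: one point pair every clock, two pairs share one read/write.
fast_mode = ram_accesses_per_clock(1, 2)
print(slow_mode, fast_mode)
```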

Number controlled oscillator;

The ASIC chip will have an 8-bit slice of a number controlled oscillator (NCO) on board for external applications. Two secondary storage registers for loading an initial oscillator phase and oscillator rate will be required. The secondary storage registers will have serial I/O. The NCO adder carry in and carry out lines will be pipelined so that any number of the bit slices may be tied together to make larger NCOs.
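The cascading of 8-bit slices into a wider NCO can be sketched as a phase accumulator whose carry ripples from one slice to the next. This is a behavioral sketch only: it ignores the one-clock pipelining of the carry lines described above and models the arithmetic result, and all names are illustrative.

```python
SLICE_BITS = 8
SLICE_MASK = (1 << SLICE_BITS) - 1


def split_slices(value, n_slices):
    """Break a wide value into 8-bit slices, least significant slice first."""
    return [(value >> (SLICE_BITS * i)) & SLICE_MASK for i in range(n_slices)]


def join_slices(slices):
    """Reassemble the wide value from its 8-bit slices."""
    return sum(s << (SLICE_BITS * i) for i, s in enumerate(slices))


def step(phase_slices, rate_slices):
    """Advance the cascaded NCO one clock: add rate to phase slice by slice."""
    carry = 0
    next_phase = []
    for phase, rate in zip(phase_slices, rate_slices):
        total = phase + rate + carry
        next_phase.append(total & SLICE_MASK)
        carry = total >> SLICE_BITS      # carry out feeds the next slice
    return next_phase                    # final carry dropped: phase wraps


n_slices = 3                             # three chips give a 24-bit NCO
initial_phase, rate = 0x00FFF0, 0x000123
phase = split_slices(initial_phase, n_slices)
for _ in range(5):
    phase = step(phase, split_slices(rate, n_slices))
print(hex(join_slices(phase)))
```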

#### IV. Block Diagram

A block diagram of the ASIC chip is seen in NRAO drawing D13400K001. The digital precisions used in this design are;

- 7,7,4 for the FFT points
- 5,5,4 for the twiddle factors
- 4,4,4 for the cross multiplier input points
- 15,15,6 for the accumulator

which resulted in RAM requirements of two 1K X 18 memory banks.

A careful gate count was made for this design again using generic gate array standards. The result of this gate estimate is given below. This count represents active gates and not the required gate array size since some gate inefficiency is expected.

| FUNCTION                              | NUMBER   | GATES/FUNCT | TOTAL |
|---------------------------------------|----------|-------------|-------|
| Flip flops                            | 564      | 5           | 2820  |
| Flip flops (with reset)               | 72       | 6           | 432   |
| Full adders (w look ahead carry)      | 112-bits | 16          | 1792  |
| Sign/mag multipliers (5 by 7-bit)     | 4        | 315*        | 1260  |
| Shifters                              | 6        | 188*        | 1128  |
| Two into one multiplexers             | 192-bits | 2.2         | 423   |
| Four into one multiplexers            | 72-bits  | 7           | 504   |
| Three state drivers                   | 36-bits  | 8           | 288   |
| RAM address generators                | 2        | 150         | 300   |
| Control logic                         | -        | -           | 500   |
| RAM                                   | 1K X 36  | -           |       |
|                                       |          | TOTAL       | 9447  |

\* see appendix II
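The gate total can be reproduced from the per-function figures in the table. The sketch below assumes that the fractional multiplexer subtotal (192 X 2.2) was rounded up, which matches the 423 shown; the variable names are illustrative.

```python
import math

# (function, number, gates per function) as listed in the gate estimate
gate_table = [
    ("Flip flops", 564, 5),
    ("Flip flops (with reset)", 72, 6),
    ("Full adders (w look ahead carry)", 112, 16),
    ("Sign/mag multipliers (5 by 7-bit)", 4, 315),
    ("Shifters", 6, 188),
    ("Two into one multiplexers", 192, 2.2),
    ("Four into one multiplexers", 72, 7),
    ("Three state drivers", 36, 8),
    ("RAM address generators", 2, 150),
]
CONTROL_LOGIC_GATES = 500  # given directly as a total in the table

# Assumption: fractional subtotals are rounded up to whole gates.
subtotals = [math.ceil(number * gates) for _, number, gates in gate_table]
total = sum(subtotals) + CONTROL_LOGIC_GATES

for (name, _, _), subtotal in zip(gate_table, subtotals):
    print(f"{name:36s}{subtotal:6d}")
print(f"{'TOTAL':36s}{total:6d}")
```

The result sits below the estimated 10,000 usable gates quoted for the LSA-1502 in the cover memo.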

An estimate was also made of the pins required for this chip and is given below. No specific pin out assignments are required.

| FUNCTION                  | NUMBER           |
|---------------------------|------------------|
| Input point pins          | 18               |
| Second input point pins   | 18               |
| Twiddle factor input pins | 14               |
| Output pins               | 18               |
| RAM address pins          | 10               |
| NCO generator pins        | 14               |
| Control pins              | 8                |
| Power pins                | 10               |
| TOTAL                     | 110 package pins |
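The pin total, and the port widths that appear to follow from the precisions chosen in this section, can be checked as follows. The link between port width and precision (7,7,4 data, 5,5,4 twiddle, 1K address depth) is an inference from the numbers rather than a statement in the table.

```python
pin_table = {
    "Input point pins": 18,
    "Second input point pins": 18,
    "Twiddle factor input pins": 14,
    "Output pins": 18,
    "RAM address pins": 10,
    "NCO generator pins": 14,
    "Control pins": 8,
    "Power pins": 10,
}
total_pins = sum(pin_table.values())

# Widths implied by the section IV precisions (an inference, see lead-in).
data_port_bits = 7 + 7 + 4                  # 7,7,4 FFT points
twiddle_port_bits = 5 + 5 + 4               # 5,5,4 twiddle factors
ram_address_bits = (1024 - 1).bit_length()  # 1K deep RAM

print("total package pins:", total_pins)
print("data / twiddle / address bits:",
      data_port_bits, twiddle_port_bits, ram_address_bits)
```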

#### V. GENERAL SPECIFICATIONS

#### Package

The ASIC chip will be packaged in an industry standard IC package. Either pin grid array or chip carrier packages are acceptable; a plastic package may be used if economical.

Environmental conditions

The ASIC chip will meet all operational specifications under the following conditions:

1. Temperature: 0 to 70 C

2. Operating voltage: 4.5 to 5.5 VDC

The ASIC chip will not be permanently damaged under the following conditions:

1. Storage for 24 hours at -55 to +125 C

2. Thermal shock of 5 cycles: 60 seconds at 0 C, 60 seconds at 100 C, with a 10 second transfer time.

3. Voltage spikes to 7 volts (30 second transient).

4. Vibration, 20 Hz to 2 kHz at 20 G for 60 seconds. Shock, 1000 G shock in any axis.

Logic level specifications

All chip input and output functions will operate at standard HCMOS logic levels.

Active characteristics

The chip must work properly with a clock rate from 1 to 32 MHz.

#### Appendix I

# A1.0 BUTTERFLY ASIC BLOCK DIAGRAM

This appendix describes the NRAO block diagram of drawing D13400K001. The block diagram breaks down into six general sections. The two fundamental operating modes of the device are referred to as FFT mode and Multiplier/Accumulator (MAC) mode. FFT mode includes Radix 4, Radix 2, bypass and Fractional Sample time Error (FSE) modes.

| SECTION | FUNCTION                                                                                         |
|---------|--------------------------------------------------------------------------------------------------|
| A1.1    | INPUT / OUTPUT                                                                                   |
| A1.2    | COMPLEX MULTIPLY                                                                                 |
| A1.3    | a) FLOATING POINT TO FIXED POINT for FFT mode or<br>b) FLOATING POINT TO COMMON EXPONENT for MAC mode |
| A1.4    | a) FFT BUTTERFLY for FFT mode or<br>b) ACCUMULATOR for MAC mode                                  |
| A1.5    | FIXED POINT TO FLOATING POINT CONVERSION                                                         |
| A1.6    | NUMBER CONTROLLED OSCILLATOR                                                                     |

Sections 1, 2, 3 and 4 are involved in both FFT mode and MAC mode. Section 5 is involved only in FFT mode. Section 6 is totally independent of the rest of the device (except for the 32 MHz clock).

GENERAL COMMENTS:

Multiplexers have all been assumed to be inverting types for maximum speed and minimum gate count. The AND gates that would normally be found in the multipliers have been replaced with NOR gates for the same reasons. The inverting outputs from flip flops have been used where required to correct the polarity of the data.

In order to limit the outputs of the first and second adders in section A1.4 to 16-bit widths, the values going into the adders are "divided by two" and sign extended to 16-bits. (This is accomplished by wiring manipulations, not logic.)

#### A1.1 INPUT / OUTPUT:

This section provides the data interface for the device, along with RAM storage. Multiplexers are provided so that the two blocks of RAM may be placed in either the input data stream for Radix 4 operations, in the output data stream for Radix 2 and FSE operations, or in the middle of the accumulator logic for the MAC application. The address generators allow the double buffered points to be written in one sequence to one RAM block while previous data is read from the second RAM block using a separate sequence. In MAC mode, an external address generator controls the read/write sequences, and the memory is addressed as a 1K X 36 bit RAM.

# A1.2 COMPLEX MULTIPLY:

This section provides a complex multiply of two floating point numbers:

$(R1 + jI1)(2^{exp1}) \times (R2 + jI2)(2^{exp2}) = (RE + jIM)(2^{EX})$

where

$RE = (R1 \times R2) - (I1 \times I2)$

$IM = (I1 \times R2) + (R1 \times I2)$

and

$EX = exp1 + exp2$

R1 + jI1 has 7,7,4 resolution and R2 + jI2 has 5,5,4 resolution. The mantissas are in sign/magnitude form. After the multiplications, the sign of the I1\*I2 product is inverted (to allow the upper adder to perform a subtraction) and then the RE and IM values are converted from sign/magnitude form to one's complement just prior to the addition function. The addition is one's complement, where both sets of eleven input bits are sign extended to 12 bits, producing a 12-bit output.
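The multiply described above can be sketched with integer mantissas and an exponent field that counts powers of two downward. The sketch assumes integer-valued mantissas (the position of the binary point is not stated in this specification) and models only the arithmetic result, not the sign/magnitude or one's complement encodings; the function names are illustrative.

```python
def complex_multiply(r1, i1, exp1, r2, i2, exp2):
    """Shared-exponent complex product: returns (RE, IM, EX)."""
    re = (r1 * r2) - (i1 * i2)
    im = (i1 * r2) + (r1 * i2)
    ex = exp1 + exp2                 # unsigned add of the two exponent fields
    return re, im, ex


def to_complex(re, im, ex):
    """Numeric value, treating the exponent field as a negative power of two."""
    return complex(re, im) * 2.0 ** (-ex)


# Largest magnitudes: 7-bit sign/magnitude (63) times 5-bit sign/magnitude (15).
re, im, ex = complex_multiply(63, -63, 3, 15, 15, 2)
print(re, im, ex)

# Each product fits in 10 magnitude bits and the sum in 11, so a sign bit
# brings the adder output to the 12 bits quoted in the text.
assert abs(63 * 15) < 2 ** 10 and abs(re) < 2 ** 11
```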

The two exponents both represent negative numbers where the sign bit is implied. The range of possible values is from zero to minus 12. After the unsigned addition of the two exponents, the resulting 5-bit number can be thought of as being in sign/magnitude form with a range of zero to minus 24 with the implied sign bit = 1, but for purposes of the next section, it can also be thought of as a one's complement number that has been negated which has an implied sign bit = 0. For example:

| S | 16 8 4 2 1 | (implied sign bit plus 5-bit result from adder) |
|---|------------|--------------------------------------------------|
| 1 | 00101      | represents the value -5 (sign/magnitude)         |
| 1 | 11010      | converted to one's complement                    |
| 0 | 00101      | negated for use in subtraction                   |
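The three bit patterns in the example can be reproduced with a few lines of integer arithmetic. This is an illustrative sketch over a 6-bit word (implied sign bit plus the 5-bit adder result); the helper names are not from the specification.

```python
WIDTH = 6                      # implied sign bit plus the 5-bit adder result
MASK = (1 << WIDTH) - 1


def ones_complement(value):
    """Encode a signed integer as a one's complement bit pattern."""
    return value & MASK if value >= 0 else ~(-value) & MASK


def show(pattern):
    """Format a pattern as 'S bbbbb' to match the table layout."""
    bits = format(pattern, f"0{WIDTH}b")
    return bits[0] + " " + bits[1:]


magnitude = 0b00101                              # adder output for exponent -5
sign_magnitude = (1 << (WIDTH - 1)) | magnitude  # implied sign bit = 1
as_ones_complement = ones_complement(-magnitude)
negated = ~as_ones_complement & MASK             # one's complement negation

for pattern in (sign_magnitude, as_ones_complement, negated):
    print(show(pattern))
```

The negated one's complement pattern has the same five low bits as the raw adder output, which is why the adder result can be fed to the subtraction without further conversion.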

A1.3 FLOATING POINT TO FIXED POINT (OR COMMON EXPONENT)

This section consists of four shifters. Each has 12 to 15 input and 15 output bits. The possible shifts are between 0 and 15 bits. The direction of shift is towards the LS bit, with sign extension. The two inner shifters are involved in FFT and MAC modes. The two outer shifters are involved only in MAC mode.

In FFT mode, the inner shifters need to shift an amount equal to the exponent. In the example above, the other input to the adder in this stage would be forced to zero, and the result would be a five place shift. The rest of the exponent logic in this section is not active.

In MAC mode, the current result from the complex multiplier needs to be added to the accumulated total, which may be accessed from RAM, or may be in an accumulator loop where two or more accumulations may be done before the result is written back to RAM. In either case, the exponent of the multiplier must be compared to the exponent of the accumulated value to determine which set of shifters (inner or outer) must be shifted.

The adder is used to subtract as follows:

(exp of accum result) - (exp of cmplx mult) = result

The magnitude of the result determines the number of places to shift. The result of the one's complement subtraction is gated to produce a 4-bit magnitude plus a sign bit, where any difference of greater than 15 is limited to 15 and the sign bit determines which set of shifters will shift. (A shift of more than 15 places would shift below the LSB, so a shift of 15 is sufficient for any larger shift that might be required.)

The final effect is that the mantissas associated with the smaller (more negative) exponent will be shifted until the exponent equals the larger one, which is selected as the common exponent.
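The alignment step can be sketched as below. Exponents are modeled as signed integers no greater than zero, mantissas as signed integers, and the shift is written so that it truncates toward zero, which is what sign extension of a one's complement value produces. These modeling choices and all names are assumptions for illustration.

```python
MAX_SHIFT = 15


def shift_toward_lsb(value, places):
    """Shift toward the LS bit as a one's complement shifter would."""
    magnitude = abs(value) >> places
    return -magnitude if value < 0 else magnitude


def align(accum, product):
    """Bring (re, im, exponent) operands to a common exponent.

    Returns the accumulator mantissas, the product mantissas and the
    common exponent; only the operand with the smaller exponent is shifted.
    """
    a_re, a_im, a_exp = accum
    p_re, p_im, p_exp = product
    difference = a_exp - p_exp            # (exp of accum) - (exp of mult)
    places = min(abs(difference), MAX_SHIFT)
    if difference >= 0:                   # product exponent is the smaller
        p_re = shift_toward_lsb(p_re, places)
        p_im = shift_toward_lsb(p_im, places)
        common = a_exp
    else:                                 # accumulator exponent is the smaller
        a_re = shift_toward_lsb(a_re, places)
        a_im = shift_toward_lsb(a_im, places)
        common = p_exp
    return (a_re, a_im), (p_re, p_im), common


print(align((100, -40, -2), (1890, 0, -5)))
print(align((100, -41, -6), (7, 7, -4)))
```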

The remaining logic relating to the exponent calculation is used to add one to the exponent if a carry is generated in the next stage, where the actual accumulation occurs.

# A1.4 FFT BUTTERFLY (OR ACCUMULATOR)

In FFT mode, this section can calculate either a Radix 4 or a Radix 2 butterfly. The calculations required for these operations are indicated on the block diagram. The input points such as (A + jW) represent the result of a complex multiply of a data point and a twiddle factor.

The sequence of events for calculating the real and imaginary values will be described for a Radix 4 operation.

As the points become available at the output of the shifters, they are clocked into two parallel paths in such a manner as to allow the first adder to produce the required sums and differences and clock the results into the four following parallel paths. This makes the results available for the second adder where they are multiplexed and summed to form the final results.

The butterfly timing from the output of the shifters to the output register following the second adders is shown in the following chart:

| CLOCK CYCLE | 1     | 2     | 3               | 4               | 5               | 6               | 7               | 8               | 9               | 10              |
|-------------|-------|-------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| 1           | A1    | C1    | B1              | D1              | A2              | C2              | B2              | D2              | A3              | C3              |
| 2           | B0    | A1    | A1              | B1              | B1              | A2              | A2              | B2              | B2              | A3              |
| 3           | B0    | B0    | A1              | A1              | B1              | B1              | A2              | A2              | B2              | B2              |
| 4           | D0    | D0    | C1              | C1              | D1              | D1              | C2              | C2              | D2              | D2              |
| 5           | A0+C0 | A0+C0 | A0+C0           | A1+C1           | A1+C1           | A1+C1           | A1+C1           | A2+C2           | A2+C2           | A2+C2           |
| 6           | A0-C0 | A0-C0 | A0-C0           | A0-C0           | A1-C1           | A1-C1           | A1-C1           | A1-C1           | A2-C2           | A2-C2           |
| 7           |       | B0+D0 | B0+D0           | B0+D0           | B0+D0           | B1+D1           | B1+D1           | B1+D1           | B1+D1           | B2+D2           |
| 8           |       |       | B0-D0           | B0-D0           | B0-D0           | B0-D0           | B1-D1           | B1-D1           | B1-D1           | B1-D1           |
| 9           | A0+C0 | A0+C0 | A0-C0           | A0-C0           | A1+C1           | A1+C1           | A1-C1           | A1-C1           | A2+C2           | A2+C2           |
| 10          |       | A0+C0 | A0+C0           | A0-C0           | A0-C0           | A1+C1           | A1+C1           | A1-C1           | A1-C1           | A2+C2           |
| 11          |       | B0+D0 | B0+D0           | X0-Z0           | X0-Z0           | B1+D1           | B1+D1           | X1-Z1           | X1-Z1           | B2+D2           |
| 12          |       |       | (A0+C0)+(B0+D0) | (A0+C0)-(B0+D0) | (A0-C0)+(X0-Z0) | (A0-C0)-(X0-Z0) | (A1+C1)+(B1+D1) | (A1+C1)-(B1+D1) | (A1-C1)+(X1-Z1) | (A1-C1)-(X1-Z1) |
| CLOCK CYCLE | 1     | 2     | 3               | 4               | 5               | 6               | 7               | 8               | 9               | 10              |
|-------------|-------|-------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|-----------------|
| 1a          | W1    | Y1    | X1              | Z1              | W2              | Y2              | X2              | Z2              | W3              | Y3              |
| 2a          | X0    | W1    | W1              | X1              | X1              | W2              | W2              | X2              | X2              | W3              |
| 3a          | X0    | X0    | W1              | W1              | X1              | X1              | W2              | W2              | X2              | X2              |
| 4a          | Z0    | Z0    | Y1              | Y1              | Z1              | Z1              | Y2              | Y2              | Z2              | Z2              |
| 5a          | W0+Y0 | W0+Y0 | W0+Y0           | W1+Y1           | W1+Y1           | W1+Y1           | W1+Y1           | W2+Y2           | W2+Y2           | W2+Y2           |
| 6a          | W0-Y0 | W0-Y0 | W0-Y0           | W0-Y0           | W1-Y1           | W1-Y1           | W1-Y1           | W1-Y1           | W2-Y2           | W2-Y2           |
| 7a          |       | X0+Z0 | X0+Z0           | X0+Z0           | X0+Z0           | X1+Z1           | X1+Z1           | X1+Z1           | X1+Z1           | X2+Z2           |
| 8a          |       |       | X0-Z0           | X0-Z0           | X0-Z0           | X0-Z0           | X1-Z1           | X1-Z1           | X1-Z1           | X1-Z1           |
| 9a          | W0+Y0 | W0+Y0 | W0-Y0           | W0-Y0           | W1+Y1           | W1+Y1           | W1-Y1           | W1-Y1           | W2+Y2           | W2+Y2           |
| 10a         |       | W0+Y0 | W0+Y0           | W0-Y0           | W0-Y0           | W1+Y1           | W1+Y1           | W1-Y1           | W1-Y1           | W2+Y2           |
| 11a         |       | X0+Z0 | X0+Z0           | B0-D0           | B0-D0           | X1+Z1           | X1+Z1           | B1-D1           | B1-D1           | X2+Z2           |
| 12a         |       |       | (W0+Y0)+(X0+Z0) | (W0+Y0)-(X0+Z0) | (W0-Y0)-(B0-D0) | (W0-Y0)+(B0-D0) | (W1+Y1)+(X1+Z1) | (W1+Y1)-(X1+Z1) | (W1-Y1)-(B1-D1) | (W1-Y1)+(B1-D1) |

Each column represents one clock cycle.

Row 1 shows the real part of the products at the output of the shifter. A, C, B and D are the four real numbers required for a single butterfly operation. The number associated with each entry (0, 1, 2, 3 etc.) refers to separate butterfly operations. (Every four clock cycles a new butterfly operation is completed.) Rows 1a, 2a, etc. represent the corresponding points in the imaginary path.

The following chart relates the rows to the indicated points in the block diagram and identifies the clock rates at these points for Radix 4 and Radix 2 operations:

| REAL | PATH    | IMAG | PATH    | RAD 4 CLK    | RAD 2 CLK |
|------|---------|------|---------|--------------|-----------|
| 1    | SHIFTER | 1a   | SHIFTER | C32          | C32       |
| 2    | REG B   | 2a   | REG X   | C16          | C16       |
| 3    | REG A   | 3a   | REG W   | C16BAR       | C16BAR    |
| 4    | REG C   | 4a   | REG Y   | C16BAR       | C16BAR    |
| 5    | REG A+C | 5a   | REG W+Y | STB1 (8-MHz) | C32       |
| 6    | REG A-C | 6a   | REG W-Y | STB2 (8-MHz) |           |
| 7    | REG B+D | 7a   | REG X+Z | STB3 (8-MHz) |           |
| 8    | REG B-D | 8a   | REG X-Z | STB4 (8-MHz) |           |
| 9    | TOP MUX | 9a   | TOP MUX |              |           |
| 10   | REG E   | 10a  | REG E   | C32          | C32       |
| 11   | BOT MUX | 11a  | BOT MUX |              |           |
| 12   | REG F   | 12a  | REG F   | C32          | C32       |

Register E is required to make sure the A1-C1 result is still available when required.

Row 11 represents the other input to the second adder. Row 12 represents the output of the register that follows the second adder.
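The two-adder structure traced in the timing chart can be sketched in a few lines. The function below takes the four twiddled inputs A+jW, B+jX, C+jY, D+jZ as (real, imaginary) pairs, forms the first-adder sums and differences, and combines them in the second adder. It returns results in natural spectral order, whereas the chart shows the time order in which row 12 produces them. Names are illustrative, and bit widths, divide-by-two scaling and one's complement effects are not modeled.

```python
def radix4_butterfly(a, b, c, d):
    """Radix 4 butterfly on (real, imag) pairs using only real adds."""
    (A, W), (B, X), (C, Y), (D, Z) = a, b, c, d

    # First adder: sums and differences (rows 5-8 and 5a-8a of the chart).
    ac_sum, ac_diff = (A + C, W + Y), (A - C, W - Y)
    bd_sum, bd_diff = (B + D, X + Z), (B - D, X - Z)

    # Second adder (rows 12 and 12a). Multiplying (b - d) by -j swaps its
    # parts, so the real path takes X-Z and the imaginary path takes B-D.
    out0 = (ac_sum[0] + bd_sum[0], ac_sum[1] + bd_sum[1])
    out1 = (ac_diff[0] + bd_diff[1], ac_diff[1] - bd_diff[0])
    out2 = (ac_sum[0] - bd_sum[0], ac_sum[1] - bd_sum[1])
    out3 = (ac_diff[0] - bd_diff[1], ac_diff[1] + bd_diff[0])
    return [out0, out1, out2, out3]


points = [(1, 2), (3, -4), (5, 6), (-7, 8)]
outputs = radix4_butterfly(*points)

# Time order of row 12 in the chart: sum terms first, then difference terms.
for index in (0, 2, 1, 3):
    print(index, outputs[index])
```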

For FFT lengths that are not divisible by four, the final FFT butterfly chip in the pipeline must do a Radix 2 butterfly. To accomplish this, the clock enable to the A+C and W+Y registers is changed from STB1 to C32, and the multiplexers that drive the second adder are controlled so that the adder is "bypassed". Only the first output point of each butterfly operation is calculated.

In MAC mode, the clock enables to registers A,C,W,Y,A+C and W+Y are changed to provide a 32 MHz clock to the registers. The first adder does the addition for the accumulation function. There are two fundamental modes of operation for the accumulator, non-polarization and polarization.

In non-polarization mode, an accumulation cycle consists of reading a number from RAM, adding a new single value to that number and writing the result back to RAM, as indicated in the following chart:

#### ACCUMULATOR OPERATION IN NON POLARIZATION MODE

| CLOCK | RAM      | Dr    | REG A | REG C  | Dre                 |
|-------|----------|-------|-------|--------|---------------------|
| 1     | RD AC P0 |       |       |        |                     |
| 2     | WR ?     | AC P0 |       |        |                     |
| 3     | RD AC P1 |       | AC P0 | NEW P0 |                     |
| 4     | WR R0    | AC P1 |       |        | R0 = AC P0 + NEW P0 |
| 5     | RD AC P2 |       | AC P1 | NEW P1 |                     |
| 6     | WR R1    | AC P2 |       |        | R1 = AC P1 + NEW P1 |
| 7     | RD AC P3 |       | AC P2 | NEW P2 |                     |
| 8     | WR R2    | AC P3 |       |        | R2 = AC P2 + NEW P2 |
| 9     | RD AC P4 |       | AC P3 | NEW P3 |                     |
| 10    | WR R3    | AC P4 |       |        | R3 = AC P3 + NEW P3 |
| 11    | RD AC P5 |       | AC P4 | NEW P4 |                     |
| 12    | WR R4    |       |       |        | R4 = AC P4 + NEW P4 |

(AC = ACCUM; P0, P1 etc. = POINT NUMBERS; R0, R1 etc. = RESULTS)

In polarization mode, it is necessary to read two numbers from RAM (RxR and RxL for example), add in two new adjacent points of data, and write the results back to RAM. The following table represents the timing for a 512 point FFT in polarization mode, where each FFT produces 256 points to be accumulated (thus the last stage of the FFT pipeline has a 1K output buffer containing four adjacent FFTs). For smaller FFTs, there will be more adjacent FFTs in the 1K buffer, but they will be read in pairs and each set handled in the same manner.

#### ACCUMULATOR OPERATION IN POLARIZATION MODE

| CLOCK | RAM         | Dr     | REG A  | REG C  | Dre                    |
|-------|-------------|--------|--------|--------|------------------------|
| 1     | RD AC P0 RR |        |        |        |                        |
| 2     | RD AC P0 RL | ACP0RR |        |        |                        |
| 3     | WR ?        | ACP0RL | ACP0RR | F0P0RR |                        |
| 4     | WR ?        |        | ACP0RL | F0P0RL | T0RR = ACP0RR + F0P0RR |
| 5     | RD AC P1 RR |        | T0RR   | F1P0RR | T0RL = ACP0RL + F0P0RL |
| 6     | RD AC P1 RL | ACP1RR | T0RL   | F1P0RL | R0RR = T0RR + F1P0RR   |
| 7     | WR R0RR     | ACP1RL | ACP1RR | F0P1RR | R0RL = T0RL + F1P0RL   |
| 8     | WR R0RL     |        | ACP1RL | F0P1RL | T1RR = ACP1RR + F0P1RR |
| 9     | RD AC P2 RR |        | T1RR   | F1P1RR | T1RL = ACP1RL + F0P1RL |
| 10    | RD AC P2 RL | ACP2RR | T1RL   | F1P1RL | R1RR = T1RR + F1P1RR   |
| 11    | WR R1RR     | ACP2RL | ACP2RR | F0P2RR | R1RL = T1RL + F1P1RL   |
| 12    | WR R1RL     |        | ACP2RL | F0P2RL | T2RR = ACP2RR + F0P2RR |
| 13    | RD AC P3 RR |        | T2RR   | F1P2RR | T2RL = ACP2RL + F0P2RL |
| 14    | RD AC P3 RL | ACP3RR | T2RL   | F1P2RL | R2RR = T2RR + F1P2RR   |
| 15    | WR R2RR     | ACP3RL | ACP3RR | F0P3RR | R2RL = T2RL + F1P2RL   |
| 16    | WR R2RL     |        | ACP3RL | F0P3RL | T3RR = ACP3RR + F0P3RR |

(AC = ACCUM, T = TEMP, R = RESULT, F0 = FFT#0, F1 = FFT#1)

For each point number x:

TxRR = (AC Px RR) + (FFT0 Px RR)

RxRR = TxRR + (FFT1 Px RR)

(That is, each result is the sum of the accumulation plus points from two adjacent FFTs.)
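The two-step accumulation for one point can be sketched as below, with a dictionary standing in for the RAM and ordinary complex numbers standing in for the floating point words. The data values and names are invented for illustration, and the pipeline timing shown in the chart is not modeled.

```python
def accumulate_two_ffts(accum, fft0_point, fft1_point):
    """Add the same point from two adjacent FFTs into one accumulation."""
    temp = accum + fft0_point        # TxRR = (AC Px RR) + (FFT0 Px RR)
    result = temp + fft1_point       # RxRR = TxRR + (FFT1 Px RR)
    return temp, result


# Hypothetical contents: accumulations for point 0, products RR and RL.
ram = {("P0", "RR"): 10 + 2j, ("P0", "RL"): -3 + 1j}
fft0 = {("P0", "RR"): 1 + 1j, ("P0", "RL"): 4 + 0j}
fft1 = {("P0", "RR"): 2 - 1j, ("P0", "RL"): -1 + 3j}

for key in list(ram):
    # One RAM read and one RAM write per accumulation, covering two points.
    _, ram[key] = accumulate_two_ffts(ram[key], fft0[key], fft1[key])

print(ram)
```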

The data written to RAM is delayed one clock from Dre (shown as Dree on the block diagram).

A1.5 FIXED POINT TO FLOATING POINT CONVERSION

For FFT mode, the fixed point results of butterfly operations are converted back to 7,7,4 floating point format in this stage.

The calculation of the exponent provides a capability of adding a programmable value to the exponent to counteract the tendency of the exponent to grow more negative at each stage. (The amount of exponent adjustment at each stage of the FFT pipeline will be determined in such a manner as to ensure that overflow does not occur to any significant extent.)

# A1.6 NUMBER CONTROLLED OSCILLATOR

This section is functionally independent of the rest of the device. An eight bit slice of a wider counter is implemented on each device. Cascading some number of these circuits provides a wider number controlled oscillator that increments at a controlled rate. The applications for the NCO are to generate the fringe rotation and fractional sample error corrections.

# Appendix II

This appendix shows two block diagrams used in estimating the gate count presented in section IV. Figure A2.1 shows a block diagram for a 5-bit by 7-bit sign/magnitude multiplier. This diagram is straightforward and little discussion is required. The gate estimate is given on the figure and comes from generic gate array estimates of primitive logic elements.

Figure A2.2 shows a block diagram for one of the data shifters seen in the block diagram. A 4-bit binary shift code programs the shifter for the desired shift. Again the operation is straightforward.



FIGURE A2.1



FIGURE A2.2





[Figure: RADIX 4 AND RADIX 2 BUTTERFLY OPERATIONS. Radix 4 inputs A + jW, B + jX, C + jY, D + jZ; legible real output terms (A+C) + (B+D), (A-C) + (X-Z), (A+C) - (B+D), (A-C) - (X-Z). Radix 2 inputs A + jW, C + jY; outputs (A+C) + j(W+Y) and (A-C) + j(W-Y).]