VLBA Correlator Memo No. <u>56</u>

### CSIRO Radiophysics Australia Telescope Project

#### Bandwidth Multiplication and Correlator Arrays: A Tutorial

5 February 1986

M S Ewing

#### 1 INTRODUCTION

Digital correlators for fringe detection and spectroscopy in radio astronomy operate on one or two input streams of data samples. The correlators may be arranged in two-dimensional arrays, e.g., in order to use slow logic elements to process fast data streams. This paper develops some general ideas of how to make use of arrays of correlators.

#### 2 BASIC CONCEPTS

<u>Elementary Correlator</u> - An elementary correlator accepts two data streams, multiplies them sample by sample, and accumulates the result for a specified integration time. For the purposes of this memo, the data sample resolution (number of bits/sample) is arbitrary, as are the details of the multiplier and accumulator. Symbolically, the elementary correlator computes the function

$$r(X,Y) = \sum_{i} (X[i] * Y[i])$$

over the integration period. In schematic form the elementary correlator may be drawn as in Fig. 1.
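As a minimal sketch of the function r(X,Y), the multiply-and-accumulate step can be written in a few lines of Python. The name `elementary_correlator` and the +1/-1 sample values are illustrative assumptions; the memo leaves the sample resolution and the multiplier details open.

```python
from typing import Sequence


def elementary_correlator(x: Sequence[float], y: Sequence[float]) -> float:
    """Multiply two sample streams sample by sample and accumulate the sum."""
    if len(x) != len(y):
        raise ValueError("both streams must cover the same integration period")
    acc = 0
    for xi, yi in zip(x, y):
        acc += xi * yi  # one multiply-accumulate per sample clock
    return acc


# Illustrative two-level samples (+1 / -1); any sample resolution would do.
x = [1, -1, 1, 1, -1, 1, -1, -1]
y = [1, -1, -1, 1, -1, 1, 1, -1]
print(elementary_correlator(x, y))  # 6 agreeing samples, 2 disagreeing -> 4
```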



<u>Correlator Array</u> - Elementary correlators may be connected in a 2-dimensional scheme called a correlator array. The location of an elementary correlator specifies its input data streams. An N x N array may be written

$$R(X,Y) = \begin{pmatrix} r(X1,Y1) & r(X2,Y1) & \dots & r(XN,Y1) \\ r(X1,Y2) & r(X2,Y2) & \dots & r(XN,Y2) \\ \vdots & \vdots & & \vdots \\ r(X1,YN) & r(X2,YN) & \dots & r(XN,YN) \end{pmatrix}$$

Equivalently, an array may be drawn as in Fig. 2.



Fig. 2. Correlator Array

Arrays of correlators are convenient, particularly in VLSI implementations, since the number of inputs (N) rises only with the square root of the number of elementary correlators. (In this memo we always deal with square correlator arrays.)
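The array R(X,Y) can be sketched as a nested list in which the row selects the Y stream and the column selects the X stream, following the matrix layout above. The function name `correlator_array` and the small integer test streams are assumptions made for illustration only.

```python
def correlator_array(x_streams, y_streams):
    """Return R with R[j][i] = r(X(i+1), Y(j+1)).

    Rows follow the Y streams and columns follow the X streams, so the
    position of each elementary correlator fixes its two inputs.
    """
    return [
        [sum(a * b for a, b in zip(xs, ys)) for xs in x_streams]
        for ys in y_streams
    ]


# N = 2: two X streams and two Y streams feed 2 x 2 = 4 elementary correlators.
x_streams = [[1, 2, 3], [0, 1, 0]]
y_streams = [[1, 1, 1], [2, 0, 1]]
for row in correlator_array(x_streams, y_streams):
    print(row)
```

For N streams on each side the sketch evaluates N*N products, which is the square-root relationship between inputs and elementary correlators noted above.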

<u>Serial-to-Parallel</u> <u>Converter</u> - A fast stream of data samples may be divided into N streams clocked at 1/N times the input clock rate. The serial-to-parallel converter is represented by the function

P(i,j) = X(N\*i + j); j = 0,1,...,N-1

where <u>i</u> is the "word number" of the parallel output, and <u>j</u> specifies the sample within word <u>i</u>. If N=4, the output words consist of

[X(0), X(1), X(2), X(3)] Word 1

[X(4), X(5), X(6), X(7)] Word 2

etc.

The input data stream is broken into four streams, each containing every fourth sample. The output streams are clocked at 1/4 of the input clock rate. A typical schematic of a serial-to-parallel converter is given in Fig. 3. Each signal path is <u>n</u> bits wide, where n is the number of bits per sample.

[Figure: serial data in (n bits wide), serial clock in with a ÷4 divider, register, parallel data out P0 to P3]

Fig. 3. Serial-to-Parallel Converter
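The mapping P(i,j) = X(N\*i + j) can be sketched directly in Python. The helper names `serial_to_parallel` and `parallel_streams` are hypothetical, and dropping a trailing partial word is an assumption of this sketch rather than something the memo specifies.

```python
def serial_to_parallel(x, n):
    """Group a serial stream into words so that word i holds X(n*i + j), j = 0..n-1."""
    usable = len(x) - len(x) % n  # assumption: a trailing partial word is dropped
    return [list(x[i:i + n]) for i in range(0, usable, n)]


def parallel_streams(x, n):
    """Return the n slow streams; stream j carries every n-th sample starting at j."""
    words = serial_to_parallel(x, n)
    return [[word[j] for word in words] for j in range(n)]


samples = list(range(8))  # stand-ins for X(0) .. X(7)
print(serial_to_parallel(samples, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(parallel_streams(samples, 4))    # [[0, 4], [1, 5], [2, 6], [3, 7]]
```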

<u>Bandwidth Multiplication</u> - A correlator system running at S multiplies/sec can be realized using elementary correlators running at S/F multiplies/sec through the technique of Bandwidth Multiplication (BWM). The ratio of clock rates, F, is the <u>Bandwidth Multiplication Factor</u>. In convenient implementations, F will be a power of 2. Thus some number of 10 MHz elementary correlators may be connected to realize a 40 MHz correlator system with a bandwidth multiplication factor of 4.

#### 3 CONSTRUCTING A BW MULTIPLYING CORRELATOR

A bandwidth multiplication (BWM) correlator can be made from two serial-to-parallel converters and a correlator array. A schematic for a simple BWM correlator operating at 40 MHz with a BWM factor F=4 might resemble Fig. 4.



Fig. 4. Simple BW Multiplying Correlator

The X and Y input streams are divided into four 10 MHz streams each, and all crossproducts are taken in the correlator array. The products that are realized are

$$R(X,Y) = \begin{pmatrix} r(X1,Y1) & r(X2,Y1) & r(X3,Y1) & r(X4,Y1) \\ r(X1,Y2) & r(X2,Y2) & r(X3,Y2) & r(X4,Y2) \\ r(X1,Y3) & r(X2,Y3) & r(X3,Y3) & r(X4,Y3) \\ r(X1,Y4) & r(X2,Y4) & r(X3,Y4) & r(X4,Y4) \end{pmatrix}$$

Every 100 ns, a new set of products is taken with independent data.
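A software sketch of this simple BWM correlator follows, assuming both inputs are cut into F-sample words and every cross-product within a word is accumulated. The function name `bwm_correlate` and the +1/-1 test data are illustrative; the 40 MHz and F=4 figures are the ones used in the text.

```python
def bwm_correlate(x, y, f):
    """Accumulate all f*f cross-products, one word (f samples) at a time.

    acc[j][i] corresponds to r(X(i+1), Y(j+1)) in the array above.
    """
    n_words = min(len(x), len(y)) // f
    acc = [[0] * f for _ in range(f)]
    for w in range(n_words):
        xw = x[w * f:(w + 1) * f]  # current X word
        yw = y[w * f:(w + 1) * f]  # current Y word
        for j in range(f):
            for i in range(f):
                acc[j][i] += xw[i] * yw[j]
    return acc


SAMPLE_RATE_MHZ = 40
F = 4
element_rate_mhz = SAMPLE_RATE_MHZ / F    # each elementary correlator runs at 10 MHz
word_period_ns = 1000 / element_rate_mhz  # a new word arrives every 100 ns
print(element_rate_mhz, word_period_ns)

data = [1, -1, 1, 1, -1, -1, 1, -1]       # two words of illustrative samples
for row in bwm_correlate(data, data, F):
    print(row)
```

Correlating the stream with itself puts the number of words on every diagonal element, which is a quick sanity check on the indexing.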

Except for the diagonal elements, the correlations are taken between "lagged" (delayed) samples. If we organize R(X,Y) by lag, we see that lags are not equally represented in the array.

| Lag | -3 | -2 | -1 | 0 | +1 | +2 | +3 |
|-----|----|----|----|---|----|----|----|
| Products | r(X1,Y4) | r(X1,Y3)<br>r(X2,Y4) | r(X1,Y2)<br>r(X2,Y3)<br>r(X3,Y4) | r(X1,Y1)<br>r(X2,Y2)<br>r(X3,Y3)<br>r(X4,Y4) | r(X2,Y1)<br>r(X3,Y2)<br>r(X4,Y3) | r(X3,Y1)<br>r(X4,Y2) | r(X4,Y1) |
| Number of products | 1 | 2 | 3 | 4 | 3 | 2 | 1 |

This <u>triangular weighting</u> of products by lag is a characteristic of BWM using square arrays. Since the next data word (group of 4 samples) will be independent of the current one, the "missing" products cannot be filled in. A trapezoidal array would be required to provide equal lag weighting, but such an array is relatively awkward to implement. In many cases, triangular weighting can be tolerated; it corresponds to a windowing function when a Fourier transform is calculated, leading to a relatively low-sidelobe "beam", [sin(x)/x]\*\*2 "sinc squared", in the frequency domain.
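The lag bookkeeping above can be checked with a short sketch that counts products per lag in an F x F array and then transforms the resulting weights. Lag is taken as the X index minus the Y index, as in the table. The function names are hypothetical. For discretely sampled lags the transform of the triangle is the periodic form [sin(F\*w/2)/sin(w/2)]\*\*2, which approaches the sinc-squared shape quoted in the text when w is small.

```python
import cmath
import math


def lag_weights(f):
    """Count the products available at each lag in an f x f correlator array."""
    counts = {}
    for i in range(1, f + 1):      # X sub-stream index
        for j in range(1, f + 1):  # Y sub-stream index
            counts[i - j] = counts.get(i - j, 0) + 1
    return dict(sorted(counts.items()))


def weight_spectrum(weights, omega):
    """Fourier transform of the lag weights at angular frequency omega (rad/sample)."""
    return sum(w * cmath.exp(-1j * omega * lag) for lag, w in weights.items())


def periodic_sinc_squared(f, omega):
    """Closed form for the transform of a triangle of half-width f (omega != 0)."""
    return (math.sin(f * omega / 2) / math.sin(omega / 2)) ** 2


weights = lag_weights(4)
print(weights)  # {-3: 1, -2: 2, -1: 3, 0: 4, 1: 3, 2: 2, 3: 1}

omega = 0.3
print(weight_spectrum(weights, omega).real, periodic_sinc_squared(4, omega))
```

The two printed numbers agree to rounding error, which is the sense in which the triangular weighting acts as a window with a sinc-squared-like response.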

A "<u>Compound BWM Correlator Array</u>" may be constructed from square correlator arrays. Array inputs are delayed by a number of samples equal to F, the BWM factor, or one "word". For example, three 4 x 4 correlator arrays may be connected according to Fig. 5.



In this case, the <u>span</u>, the maximum lag range, is extended to +/-7 samples. Moreover, the <u>cover</u>, the range of uniformly weighted lags, is +/-4, since the overlapping triangular weights of adjacent arrays sum to the full weight of 4 products per word. A symmetric "L" shaped array gives symmetric lag coverage, as indicated in Fig. 6.



Fig. 6. Weight vs. Lag. Compound Correlator

In a more general scheme, Lx+Ly-1 correlator arrays, each N x N, may be connected as shown in Fig. 7. With N = F, this compound array covers F\*(Lx+Ly-2)+1 lags uniformly and spans F\*(Lx+Ly)-1 lags in all.



Fig. 7. Scheme for Compound BWM Correlator

Each point in the X-Y plane corresponds to a particular pair of data samples. Inside the "Ls", the pairs are correlated in one or another elementary correlator. Successive words make successive "Ls", giving a central filled-in area, where all lags are present, but "staircases" on the edges, where the lack of some products produces the triangular weighting.

Total lag coverage depends only on the total number of arrays (Lx+Ly-1). If Lx=Ly, the coverage is symmetrical about zero, while if Lx=1 and Ly>1 (or vice versa), the uniform part of lag coverage is "single-ended", with zero lag occurring near one end.
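The weight-versus-lag profile of a compound array can be explored with a small sketch that sums one triangular profile per F x F array. It assumes that the arrays on one arm of the "L" shift the lag by 0, -F, -2F, ... and those on the other arm by +F, +2F, ..., sharing the corner array. Which arm carries which sign is an assumption of this sketch, as are the function names.

```python
def compound_lag_weights(f, lx, ly):
    """Sum the triangular lag weights of lx + ly - 1 arrays, each f x f."""
    # Assumed lag offsets: one arm steps by -f per word of delay, the other by +f.
    offsets = [-k * f for k in range(lx)] + [k * f for k in range(1, ly)]
    weights = {}
    for off in offsets:
        for k in range(-(f - 1), f):
            weights[off + k] = weights.get(off + k, 0) + (f - abs(k))
    return dict(sorted(weights.items()))


def cover_and_span(weights, f):
    """Return (first, last) lag at full weight f, and (first, last) lag present."""
    full = [lag for lag, w in weights.items() if w == f]
    return (min(full), max(full)), (min(weights), max(weights))


# Three 4 x 4 arrays in a symmetric "L" (Lx = Ly = 2).
print(cover_and_span(compound_lag_weights(4, 2, 2), 4))

# Single-ended case: Lx = 1, Ly = 3.
print(cover_and_span(compound_lag_weights(4, 1, 3), 4))
```

Because adjacent triangles are offset by exactly F, their overlapping slopes add up to the full weight F, which is what fills in the uniformly weighted region.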

#### 4 EXAMPLE: THE AT COMPACT ARRAY CORRELATOR

The Australia Telescope's Compact Array Correlator is built up of the custom-designed XCELL chips, which are 8 x 8 correlator arrays. A single board, or "product", comprises 16 XCELLs with inputs coming from corresponding IF channels of one pair of telescopes. In the widest bandwidth mode a product is configured as a single 32 x 32 correlator array, with the input sample rate equal to 32 times the correlator clock rate, a BWM factor F=32. This is 512 Ms/s (256 MHz bandwidth) or 256 Ms/s (128 MHz) with 1- or 2-bit sampling, respectively. In Fig. 7, this configuration corresponds to Lx=Ly=1, producing a +/- 31 lag coverage with triangular weighting.

For the AT correlator, the <u>reconfiguration</u> <u>number</u>, <u>R</u>, is defined inversely to the multiplication factor F:

$$R = \log_2 \frac{32}{F}$$

The full 32 x 32 array is "unreconfigured", i.e., R=0. Other sample rates are achieved, in descending factors of two, by reconfiguring the product and reducing the bandwidth multiplication.

With F=16, or reconfiguration number R=1, the product has four 16 x 16 arrays, corresponding, say, to Lx=2 and Ly=3 in Fig. 7. Uniform lag coverage is available over a 49-lag range, with triangular "wings" extending coverage to 79 lags. The AT module provides enough optional delay switching to place the effective zero lag at an arbitrary point from the center to one end of the lag cover.

The reconfigurations may be summarised in the following table.

| R | F  | Blocks | Lags covered | Lags spanned |
|---|----|--------|--------------|--------------|
| 0 | 32 | 1      | 1            | 63           |
| 1 | 16 | 4      | 49           | 79           |
| 2 | 8  | 16     | 121          | 135          |
| 3 | 4  | 64     | 253          | 259          |
| 4 | 2  | 256    | 511          | 513          |
| 5 | 1  | 1024   | 1024         | 1024         |

R and F are the reconfiguration number and bandwidth multiplication factor, respectively. The "blocks" column is the number of subarrays made from one basic 32 x 32 module. "Covered" indicates the number of lags completely calculated, while "spanned" shows the number that are at least partially calculated.
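The rows of the table can be regenerated with a few lines of arithmetic. The sketch assumes F = 32 / 2\*\*R, a block count of (32/F)\*\*2, and the closed forms covered = (blocks-1)\*F + 1 and spanned = (blocks+1)\*F - 1. These closed forms are inferred here from summing adjacent triangular weights and are not quoted from the memo. The function name `reconfiguration_row` is hypothetical.

```python
MODULE_SIZE = 32  # one product acts as a 32 x 32 array of elementary correlators


def reconfiguration_row(r):
    """Return (R, F, blocks, lags covered, lags spanned) for reconfiguration number r."""
    f = MODULE_SIZE >> r               # F = 32 / 2**R
    blocks = (MODULE_SIZE // f) ** 2   # number of F x F subarrays in the module
    covered = (blocks - 1) * f + 1     # assumed: lags at full weight
    spanned = (blocks + 1) * f - 1     # assumed: lags with at least one product
    return r, f, blocks, covered, spanned


for r in range(6):
    print(reconfiguration_row(r))
```

Running the loop gives the six rows listed in the table, from (0, 32, 1, 1, 63) to (5, 1, 1024, 1024, 1024).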