# NATIONAL RADIO ASTRONOMY OBSERVATORY Charlottesville, Virginia

#### MEMORANDUM

14

| To: | D. Bagri     | Romney      |
|-----|--------------|-------------|
|     | B. Clark     | S. Padin    |
|     | J. Carlstrom | R. Sramek   |
|     | L. D'Addario | D. Thornton |
|     | D. Emerson   | J. Welch    |

From: R. Escoffier

Subject: A Possible MMA Correlator Design

This memo describes the outline of a design for an MMA correlator. The lag design approach used here is not meant to be selected as the MMA standard but only to present a practical design against which future designs can be compared. The final decision on an MMA correlator architecture should be made later, during the initial phases of an actual design project.

The correlator design is based on the MMA specifications listed below:

- 40 antennas
- 4 4-GHz samplers per antenna
- 3 KM maximum baseline
- 1024 lags per baseline at a 2 GHz bandwidth

The design below does not distinguish between continuum and spectral line observations. The samplers and correlators can be configured in factors of two, from all samplers working at maximum bandwidth to a single sampler per antenna working at the narrowest bandwidth.

A conservative design approach using 125 MHz interconnect technology is contemplated. Higher speed correlator chips and interconnect could be considered for cost effectiveness, but for now a well understood (essentially VLA correlator) technology is assumed. The design is based on a hypothetical correlator chip which is based, in turn, on the 1024-lag correlator chip used in the GBT spectrometer. It represents a conservative projection of what should be possible and affordable when the MMA is funded.

## I. Block Diagram

A block diagram for the MMA correlator is seen in Figure 1. Four 4-GHz samplers are available for each antenna. This configuration is derived from the minimum system given in Dick Thompson's memo of April 18, 1995. A full 16-GHz bandwidth system, as described in this memo, would require eight 4-GHz samplers per antenna. Digital delay lines with up to 32 $\mu$sec of delay adjustment are provided for each sampler output.

A switching matrix is used to provide all of the mode versatility required. This includes reducing the number of active samplers from four to two or one per antenna and/or discarding samples to reduce the effective sample rate of a given sampler. As the aggregate data rate of the samplers goes down, the switching system will re-route the samples from active samplers to the correlator chips to optimize the number of lags for the observation being conducted.

The output of an active sampler is used to fill large RAMs in the memory system shown in Figure 1 so as to use the correlator chips more efficiently. The conventional technique used in correlators, where the correlator chips run at a lower clock rate than the samplers, uses a two-dimensional array of correlator chips to ensure that every sample in the parallel output of one sampler is correlated with every sample in the parallel output of a second sampler. RAM memory is used here, however, to reduce the correlator requirements to a one-dimensional array of correlator chips.

The correlators seen in Figure 1 are formed into 40 by 40 arrays to correlate the outputs of the 40-station array. The 32-wide parallel outputs of the samplers (at 125 MHz) require an additional dimension to the correlator array, as do the 4 parallel samplers. Thus, a total of 40 X 40 X 32 X 4 individual correlators (of some lag length) are required by this correlator.

Not shown in the block diagram of Figure 1 is the requirement for a long-term accumulator (LTA). The correlator chips themselves will provide short-term accumulation (from 1 to 16 msec). This relatively long integration time provided by the correlator chips should allow the LTA to be made with high density (and inexpensive) dynamic RAMs. It is assumed that several integration bins will be built into the LTA structure (for signal/reference/calibration, etc.).

## II. Samplers

Four (or eight for a full 16-GHz system) 4-GHz samplers are available for each MMA antenna. By the time the MMA correlator is designed, it is assumed that there will be several approaches available for the design of this part of the correlator.

A phase lock loop can be used to phase shift the sample clock and adjust the exact sample time to a fraction of the sample period.

Either 3-level or 4-level samplers could be contemplated with the correlator chip and the sampler itself being the only part of the design significantly affected by this decision.

Integral to a 4-GHz sampler would be a 1-to-32 serial-to-parallel conversion stage allowing the sampler to use a 125 MHz output clock (actually, two such stages, one for each sampler bit, are required). The output of the sampler system for one antenna would hence be 4 X 32 X 2 signals with a 125 MHz clock rate. A given signal line from a sampler would carry a bit from every $32^{nd}$ sample.
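The fan-out arithmetic in this section can be checked with a minimal sketch. The constants come from the text; the helper `line_for_sample` and its modulo ordering of samples onto lines are illustrative assumptions, not part of the design.

```python
# Sampler output fan-out, using the figures quoted in the memo.
SAMPLE_RATE_HZ = 4e9        # one 4-GHz sampler
DEMUX_FACTOR = 32           # 1-to-32 serial-to-parallel conversion
BITS_PER_SAMPLE = 2         # one conversion stage per sampler bit
SAMPLERS_PER_ANTENNA = 4

output_clock_hz = SAMPLE_RATE_HZ / DEMUX_FACTOR
signals_per_antenna = SAMPLERS_PER_ANTENNA * DEMUX_FACTOR * BITS_PER_SAMPLE


def line_for_sample(n):
    """Parallel output line carrying sample n (assumed simple modulo ordering)."""
    return n % DEMUX_FACTOR


print(output_clock_hz / 1e6, "MHz output clock")    # 125.0 MHz output clock
print(signals_per_antenna, "signals per antenna")   # 256 signals per antenna
```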

## III. Delay Lines

There will be 131,072 bits of RAM associated with the output of each 4-GHz sampler, yielding a delay range of 32 $\mu$sec (this is more than required, but is consistent with the size of fast RAMs). Since RAM addressing can only adjust the delay in steps of 32 samples, some additional logic will be required to obtain the final delay resolution of one sample.

The 131,072 bits would be provided in 16 1K X 8 RAMs (for each sampler bit). The entire delay requirement for one antenna will take 128 1K X 8 RAMs and associated logic and will probably fit on two identical PC cards of moderate size. The full 16-GHz system would have twice this number.
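As a rough check of the delay-line sizing, the following sketch recomputes the RAM depth, delay range and RAM count from the figures above. The comparison with the geometric delay of a 3 KM baseline (baseline length divided by the speed of light) is added here as an assumption about what "more than required" refers to.

```python
# Delay-line sizing check; all sizes are taken from the memo text.
SAMPLE_RATE_HZ = 4e9
DEMUX_FACTOR = 32
RAM_WORDS = 1024              # 1K X 8 RAM
RAM_WIDTH_BITS = 8
RAMS_PER_SAMPLER_BIT = 16
BITS_PER_SAMPLE = 2
SAMPLERS_PER_ANTENNA = 4

bits_per_sampler_bit = RAMS_PER_SAMPLER_BIT * RAM_WORDS * RAM_WIDTH_BITS
delay_range_us = bits_per_sampler_bit / SAMPLE_RATE_HZ * 1e6
rams_per_antenna = RAMS_PER_SAMPLER_BIT * BITS_PER_SAMPLE * SAMPLERS_PER_ANTENNA
coarse_step_ns = DEMUX_FACTOR / SAMPLE_RATE_HZ * 1e9   # RAM addressing step

# Assumption: the required range is set by the light travel time
# across the 3 KM maximum baseline.
SPEED_OF_LIGHT_M_S = 299_792_458.0
geometric_delay_us = 3000.0 / SPEED_OF_LIGHT_M_S * 1e6

print(bits_per_sampler_bit)            # 131072
print(f"{delay_range_us:.3f} usec")    # 32.768 usec
print(rams_per_antenna)                # 128
print(f"{coarse_step_ns:.1f} nsec")    # 8.0 nsec
print(f"{geometric_delay_us:.1f} usec")
```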

## IV. Memory and Switching Matrix

The memory cards illustrated in the block diagram seen in Figure 1 will convert the 32-wide parallel sampler output (with each output carrying every  $32^{nd}$  sample) into 32 parallel outputs of a different format. The samples from the 32 sampler outputs will be written into a large memory in time order and read from the RAM as 32 parallel outputs, each carrying a short burst of contiguous samples.
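The reformatting performed by a memory card can be illustrated with a scaled-down sketch (4 lines and 16 samples instead of 32 lines and a 4,194,304-sample buffer). The function name `corner_turn` and the list-based "RAM" are hypothetical and only model the data ordering, not the hardware.

```python
def corner_turn(samples, n_lines):
    """Reorder one buffer from interleaved sampler lines into contiguous bursts.

    Input view:  line k carries samples k, k + n_lines, k + 2*n_lines, ...
    Output view: output k carries the k-th contiguous 1/n_lines slice.
    """
    assert len(samples) % n_lines == 0
    # Sampler side: what each parallel line delivers over time.
    lines = [samples[k::n_lines] for k in range(n_lines)]
    # Write side: re-interleave the lines so the RAM holds samples in time order.
    ram = [None] * len(samples)
    for k, line in enumerate(lines):
        ram[k::n_lines] = line
    # Read side: each output scans its own contiguous segment of the RAM.
    burst = len(samples) // n_lines
    return [ram[k * burst:(k + 1) * burst] for k in range(n_lines)]


bursts = corner_turn(list(range(16)), n_lines=4)
print(bursts)
# [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
```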

If the RAM is thought of as a circular buffer $1024 \times 32 \times 128$ samples in circumference, each of the 32 (2-bit) outputs from a memory card would be assigned 1/32 of the total RAM. The 128-bit broadside input to the RAM buffer (obtained by splitting each of the 32 parallel sampler outputs into four parallel lines) is wired to store 128 consecutive samples into a given RAM address after a write pulse. Thus, the RAM buffer can be thought of as a linear time buffer containing 4,194,304 consecutive samples (originally taken at 4 GS/S) at any given time.

As stated above, each of the 32 outputs of the buffer is assigned 1/32 of the total buffer or 131,072 samples. At a clock rate of 125 MHz, the RAM can support a burst of contiguous samples from one output requiring about 1 msec to scan. In this 1 msec scan time, the entire RAM can be re-written at the 4 GHz sample rate. Hence, at the completion of each 131,072 sample scan, new samples are available for a subsequent scan. Thus, the correlator system will see short bursts of 131,072 contiguous samples originally taken at 4 GS/S but now slowed down to 125 MS/S from each of the 32 memory card outputs.
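The buffer and timing figures above are mutually consistent, as the short check below shows. All constants are taken from the text; nothing here is specific to a particular RAM part.

```python
# Memory-card buffer size and scan/rewrite timing.
SAMPLE_RATE_HZ = 4e9       # write side: samples arrive at 4 GS/S
READ_CLOCK_HZ = 125e6      # read side: each output runs at 125 MS/S
N_OUTPUTS = 32

buffer_samples = 1024 * 32 * 128
samples_per_output = buffer_samples // N_OUTPUTS

scan_time_ms = samples_per_output / READ_CLOCK_HZ * 1e3   # one burst read out
rewrite_time_ms = buffer_samples / SAMPLE_RATE_HZ * 1e3   # whole RAM refilled

print(buffer_samples)                 # 4194304
print(samples_per_output)             # 131072
print(f"{scan_time_ms:.3f} msec")     # 1.049 msec
print(f"{rewrite_time_ms:.3f} msec")  # 1.049 msec
```

The two times are equal because the 32 outputs together read at 32 X 125 MS/S, which matches the 4 GS/S write rate.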

In this arrangement, all samples are used with the exception of 256 samples at the start of each burst which are required to fill the 256-bit lag generating shift register in the correlator before integration can begin. An additional small loss of sensitivity occurs since samples at the burst boundaries will not be correlated with samples in adjacent 1 msec segments.
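The size of the shift-register fill loss can be estimated directly from the burst length. This sketch covers only the 256-sample fill; the additional burst-boundary loss mentioned above is not modeled.

```python
# Fraction of each burst spent filling the lag shift register.
BURST_SAMPLES = 131072
SHIFT_REGISTER_LENGTH = 256

usable_samples = BURST_SAMPLES - SHIFT_REGISTER_LENGTH
fill_loss = SHIFT_REGISTER_LENGTH / BURST_SAMPLES

print(usable_samples)                 # 130816
print(f"{fill_loss * 100:.3f} %")     # 0.195 %
```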

For full versatility, two memories are required for each 4-GHz sampler. Each 40 antenna X 40 antenna correlator array has two dimensions and each axis requires one memory card per antenna. The two dimensions are driven by the prompt and delayed memory card outputs in Figure 1: the prompt signal is the correlator input that drives all correlators in the 256-lag block, and the delayed signal is the signal that goes down the 256-bit shift register of the correlator.

When fewer than four samplers are being used, the switching matrix of Figure 1 can connect the remaining active sampler outputs to more than one memory card. (It would probably be possible to put multiplexor stages in the custom correlator chip to reduce the memory card requirement to only one per 4-GHz sampler.)

When a sample rate of less than 4 GHz is required, fewer than 32 inputs to the RAMs are required and the 32 outputs of the RAMs can be used to generate additional lags. Addressing in the "delayed memory" of Figure 1 can be offset to generate large lags, allowing full digital versatility of the correlator with a minimal number of switching stages.

For example, suppose only two samplers per antenna were active in a given observation. The correlator chips normally used by the two inactive samplers will be used to obtain twice the number of lags by having each active sampler drive its own memory cards plus an inactive sampler's memory cards. This possibility means that the correlator chips need not be cascaded together to produce more lags. As stated above, the delayed memory can, by offset RAM addressing, instantaneously generate the lag offset required by the higher lag correlator chips.
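The idea that offset RAM addressing replaces cascading can be illustrated with a small software model. The function `lag_block`, the 4-lag block length, the sign convention of the lag and the random 3-level test data are all assumptions made for the sketch; the proposed chip uses 256-lag blocks.

```python
import random


def lag_block(prompt, delayed, n_lags, start, length, address_offset=0):
    """Accumulate n_lags products over one burst.

    Lag j multiplies prompt[n] by delayed[n - address_offset - j], so the
    block covers true lags address_offset .. address_offset + n_lags - 1.
    The caller must keep start >= address_offset + n_lags - 1.
    """
    sums = [0] * n_lags
    for n in range(start, start + length):
        for j in range(n_lags):
            sums[j] += prompt[n] * delayed[n - address_offset - j]
    return sums


random.seed(0)
x = [random.choice((-1, 0, 1)) for _ in range(64)]   # 3-level samples
y = [random.choice((-1, 0, 1)) for _ in range(64)]
N_LAGS = 4   # stands in for one 256-lag block

# Two independent blocks: the second reads the delayed memory at an offset.
low = lag_block(x, y, N_LAGS, start=8, length=48)
high = lag_block(x, y, N_LAGS, start=8, length=48, address_offset=N_LAGS)

# Reference: a single block twice as long, as cascading would give.
direct = lag_block(x, y, 2 * N_LAGS, start=8, length=48)

print(low + high == direct)   # True
```

The two short blocks never exchange data, which is the point of the memory-card approach: the extra lags come from where the delayed memory is read, not from wiring chips together.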

## V. Correlators

The correlator is designed around a proposed correlator chip. A block diagram of this chip is seen in Figure 2. The chip is proposed to be a 4 X 4 array of 256-lag correlators that operate at a clock rate of 125 MHz. A little bit of multiplexing on the chip would probably make the switching matrix easier to design, but this aspect of the design has not been pursued much at this point.

The ability to break the 256-lag correlators into two 128-lag correlators to support polarization observations will probably be necessary. Also, if the full 16-GHz system is selected, the correlator chip could be built as thirty-two 128-lag correlators. With this modification, the full bandwidth system could be built with the same number of correlators (but with twice the delay lines, memory cards and twice the 125-MHz cabling).

The total number of correlators required by this design is 40 X 40 X 32 X 4 or 204,800 256-lag correlators. By placing 16 such correlators on the chip in a small array, the number of chips required is reduced to a more practical number of 12,800 chips. Even with this number of chips, 200 to 400 correlator cards will be required for the MMA correlator.
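The counts in this paragraph follow from simple products, reproduced in the sketch below. The chips-per-card figures are derived here from the quoted range of 200 to 400 cards and are not stated in the memo.

```python
# Correlator and chip counts for the 8-GHz system.
N_ANTENNAS = 40
DEMUX_FACTOR = 32              # 32-wide parallel sampler output
SAMPLERS_PER_ANTENNA = 4
CORRELATORS_PER_CHIP = 4 * 4   # 4 X 4 array of 256-lag correlators
LAGS_PER_CORRELATOR = 256

correlators = N_ANTENNAS * N_ANTENNAS * DEMUX_FACTOR * SAMPLERS_PER_ANTENNA
chips = correlators // CORRELATORS_PER_CHIP
lags_per_chip = CORRELATORS_PER_CHIP * LAGS_PER_CORRELATOR

print(correlators)      # 204800
print(chips)            # 12800
print(lags_per_chip)    # 4096
for cards in (200, 400):
    print(cards, "cards ->", chips // cards, "chips per card")
# 200 cards -> 64 chips per card
# 400 cards -> 32 chips per card
```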

The correlator chip seen in Figure 2 represents a factor of two increase in integration level from the 1024-lag correlator chip being used in the GBT spectrometer (assuming a 3-level by 3-level correlator). The GBT chips have 1024 3-level correlators, 1024 32-bit integrators and 1024 32-bit secondary storage registers for results readout. Cutting the short-term integrator to 12 or 16 bits, while increasing the total number of lags to 4096, results in an increase of the integration level of the chip by a factor of about two.

A higher speed correlator chip might be cost effective but would require a more expensive signal interconnect technology. One compromise might be to double the speed of the correlator chip but keep the data input rate at 125 MHz by putting 2-into-1 mux stages on the chips. This would halve the number of correlator chips required and would still allow use of a relatively easy interconnect technology.

## VI. Long-Term Accumulation

A long-term accumulator design should be fairly straightforward. The one to several millisecond integration capacity of the correlator chips should allow high density and low cost DRAMs to be used here.

The LTA and the correlator switching networks can be designed for very rapid switching between modes. The fundamental memory cycle of 1 msec can be carried through to other parts of the system such that the system should have the ability to switch from full bandwidth continuum to spectral line, for example, many times a second. Additional integration/storage space can be put into the LTA to do essentially simultaneous wide band and narrow band observations.

## VII. Performance

Straight factor-of-two trade-offs between bandwidth and frequency resolution are made easy by the use of the memory cards. Because these cards can generate large lags by RAM addressing, the correlator arrays need not be interconnected. It might be advantageous to cascade the 256-lag correlator segments on the correlator chips with switching stages, but the correlator chips or matrices themselves need not be cascaded to increase the frequency resolution.

As the bandwidth is halved, the number of lags available for a given sampler doubles, and the frequency resolution improves by factors of four until the bandwidth (per sampler) goes below 62.5 MHz. After this point, factors of two improvement will occur unless recirculation is built into the correlator. The table below gives some of the performance parameters to be expected from this correlator design:

A) FOUR ACTIVE SAMPLERS PER ANTENNA (NO POLARIZATION CROSS PRODUCTS):

| Total Bandwidth         | Lags/IF | Frequency Resolution/IF |
|-------------------------|---------|-------------------------|
| 8 GHz                   | 256     | 7.8125 MHz              |
| 4 GHz                   | 512     | 1.953 MHz               |
| 2 GHz                   | 1024    | 0.488 MHz               |
| 1 GHz                   | 2048    | 122.070 KHz             |
| 500 MHz                 | 4096    | 30.517 KHz              |
| 250 MHz                 | 8192    | 7.629 KHz               |
| 125 MHz (oversampling)  | 8192    | 3.814 KHz               |
| 62.5 MHz (oversampling) | 8192    | 1.907 KHz               |

B) FOUR ACTIVE SAMPLERS PER ANTENNA (WITH POLARIZATION CROSS PRODUCTS):

| Total Bandwidth         | Lags/Product | Frequency Resolution/IF |
|-------------------------|--------------|-------------------------|
| 4 GHz                   | 128          | 15.625 MHz              |
| 2 GHz                   | 256          | 3.906 MHz               |
| 1 GHz                   | 512          | 0.976 MHz               |
| 500 MHz                 | 1024         | 244.140 KHz             |
| 250 MHz                 | 2048         | 61.035 KHz              |
| 125 MHz                 | 4096         | 15.258 KHz              |
| 62.5 MHz (oversampling) | 4096         | 7.629 KHz               |
| 31.2 MHz (oversampling) | 4096         | 3.814 KHz               |

C) TWO ACTIVE SAMPLERS PER ANTENNA (NO POLARIZATION CROSS PRODUCTS):

| Total Bandwidth         | Lags/IF | Frequency Resolution/IF |
|-------------------------|---------|-------------------------|
| 4 GHz                   | 512     | 3.906 MHz               |
| 2 GHz                   | 1024    | 0.976 MHz               |
| 1 GHz                   | 2048    | 244.140 KHz             |
| 500 MHz                 | 4096    | 61.035 KHz              |
| 250 MHz                 | 8192    | 15.258 KHz              |
| 125 MHz                 | 16384   | 3.814 KHz               |
| 62.5 MHz (oversampling) | 16384   | 1.907 KHz               |
| 31.2 MHz (oversampling) | 16384   | 0.953 KHz               |

D) TWO ACTIVE SAMPLERS PER ANTENNA (WITH POLARIZATION CROSS PRODUCTS):

| Total Bandwidth         | Lags/Product | Frequency Resolution/IF |
|-------------------------|--------------|-------------------------|
| 2 GHz                   | 256          | 7.8125 MHz              |
| 1 GHz                   | 512          | 1.953 MHz               |
| 500 MHz                 | 1024         | 0.488 MHz               |
| 250 MHz                 | 2048         | 122.070 KHz             |
| 125 MHz                 | 4096         | 30.517 KHz              |
| 62.5 MHz                | 8192         | 7.629 KHz               |
| 31.2 MHz (oversampling) | 8192         | 3.814 KHz               |
| 15.6 MHz (oversampling) | 8192         | 1.907 KHz               |

E) ONE ACTIVE SAMPLER PER ANTENNA:

| Total Bandwidth         | Lags/IF | Frequency Resolution/IF |
|-------------------------|---------|-------------------------|
| 2 GHz                   | 1024    | 1.953 MHz               |
| 1 GHz                   | 2048    | 0.488 MHz               |
| 500 MHz                 | 4096    | 122.070 KHz             |
| 250 MHz                 | 8192    | 30.517 KHz              |
| 125 MHz                 | 16384   | 7.629 KHz               |
| 62.5 MHz                | 32768   | 1.907 KHz               |
| 31.2 MHz (oversampling) | 32768   | 0.953 KHz               |
| 15.6 MHz (oversampling) | 32768   | 0.476 KHz               |

In addition to the modes shown above, mixed modes (where one sampler samples a wide bandwidth and another sampler on the same antenna samples a narrow bandwidth) and subarrays will be easily accommodated by this design.
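The frequency resolutions tabulated in this section can be regenerated from a few rules stated in the text: 256 lags per sampler at full bandwidth with four active samplers, a doubling of lags for each halving of bandwidth down to 62.5 MHz per sampler, and a doubling of lags when half the samplers are inactive. The sketch below encodes those rules. The function `mode_table`, the halving of lags for polarization cross products, and the definition of total bandwidth are inferred from the tables and should be read as assumptions.

```python
def mode_table(active_samplers, polarization, rows=8):
    """Return (total_bw_mhz, lags, resolution_mhz) for successive bandwidth halvings."""
    BASE_LAGS = 256            # lags per sampler, four samplers at full bandwidth
    MAX_SAMPLERS = 4
    MAX_LAG_MULTIPLIER = 32    # 32 memory outputs; reached at 62.5 MHz per sampler

    lags_full_bw = BASE_LAGS * (MAX_SAMPLERS // active_samplers)
    if polarization:
        lags_full_bw //= 2     # lags are shared between parallel and cross products

    table = []
    bw_mhz = 2000.0            # bandwidth per sampler at the 4 GS/S rate
    for i in range(rows):
        lags = lags_full_bw * min(2 ** i, MAX_LAG_MULTIPLIER)
        total_bw = bw_mhz * active_samplers / (2 if polarization else 1)
        table.append((total_bw, lags, bw_mhz / lags))
        bw_mhz /= 2
    return table


for total_bw, lags, res in mode_table(active_samplers=4, polarization=False):
    print(f"{total_bw:8.1f} MHz  {lags:6d} lags  {res * 1e3:10.3f} KHz")
```

With `active_samplers=1` the same function gives 1024 lags and 1.953 MHz at 2 GHz, and 32768 lags at 62.5 MHz and below.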

## VIII. Estimated Size and Power Requirement

The (8-GHz bandwidth) system described above would require 160 4-GHz samplers and approximately 600 PC cards in the 6-U to 9-U EURO card size. This would require approximately four racks for the samplers and eight racks for the correlators. Power dissipation in the 100 to 200 KW range should be expected.

By far the most difficult design problem this correlator will present is in the signal cabling. One matrix of 40 X 40 correlators for a 4-GHz sampler will require 5120 125-MHz cables driving 51,200 loads. To this total a factor of 4 or 8 must be applied for the full 8- or 16-GHz system.
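One way to arrive at the cable and load totals is sketched below. The memo gives only the totals; the breakdown into two axes (prompt and delayed), 32 outputs per memory card, two bits per sample, and one load per 4 X 4 chip along a row or column is an assumed decomposition that happens to reproduce them.

```python
# Cabling estimate for one 40 X 40 correlator matrix (one 4-GHz sampler).
N_ANTENNAS = 40
AXES = 2                  # prompt and delayed memory card outputs (assumed)
OUTPUTS_PER_CARD = 32
BITS_PER_SAMPLE = 2
CHIP_ARRAY_SIDE = 4       # each chip spans 4 antennas along an axis (assumed)

cables = N_ANTENNAS * AXES * OUTPUTS_PER_CARD * BITS_PER_SAMPLE
loads_per_cable = N_ANTENNAS // CHIP_ARRAY_SIDE
loads = cables * loads_per_cable

print(cables)   # 5120
print(loads)    # 51200

# Scale by the number of samplers per antenna (4 for 8 GHz, 8 for 16 GHz).
for n_samplers in (4, 8):
    print(n_samplers, "samplers:", cables * n_samplers, "cables,",
          loads * n_samplers, "loads")
# 4 samplers: 20480 cables, 204800 loads
# 8 samplers: 40960 cables, 409600 loads
```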