# The ALMA Correlator LTA (Long Term Accumulator)

\*\*\* PRELIMINARY \*\*\*

### 1 Introduction

This document describes the function and specification of the Long Term Accumulator (LTA) being designed for the ALMA correlator. For purposes of this description, a 64-antenna ALMA system is assumed.

The input to the LTA is from short term integrations performed in the correlator chips of the correlator. The correlator chip is capable of short term integrations of 1.0 msec for antennas in auto correlation mode (where only auto products are available) or 16.0 msec for antennas in cross correlation mode (where all cross and auto products are available).

The LTA output drives the ALMA real time computer system which is assumed here to consist of VME computers.

The basic specifications of the LTA are:

- Minimum accumulation time
  - 16 msec for antennas in cross correlation mode
  - 1 msec for antennas in auto correlation mode
- Maximum accumulation time
  - 65 seconds for antennas in cross correlation mode (integer multiples of 16 msec)
  - 8 msec for antennas in auto correlation mode (1, 2, 4 or 8 msec)
- Output interface
  - Front Panel Data Port (FPDP) 32 bit parallel bus, 12.5 MHz clock, 50 MByte/sec maximum capacity from each quadrant of the correlator (raw burst rate, no overhead taken into account)
- Sub-array support
  - Independent mode (cross or auto), and accumulation time, for each of 64 antennas
- Output bins (auto correlation mode only)
  - 32 bins (e.g. for an antenna in auto correlation mode, with 1 msec accumulation time, 32 successive sets of 1 msec results are stored in 32 separate memory blocks)
- Timing
  - Sub-arrays become active or inactive on 16 msec boundaries

The 1 msec cycle is the fundamental blanking cycle in the correlator system. This blanking cycle is 125,000 system clock cycles (8 nsec per cycle), where correlation is blanked for a minimum of 64 clock cycles out of every 125,000 clock cycles.

The proposed LTA design is based on a correlator chip that produces 8K lags (even though the actual chip may only produce 4K lags).

### 2 Correlator Cards, Planes and Arrays

In the discussion below, a correlator PLANE is defined as a 64-antenna times 64-antenna correlator matrix working at a clock rate of 125 MHz. A correlator plane processes 1/32 of the samples of a baseband pair (at a sample rate of 4 GS/s). With the present correlator design, a correlator plane consists of 4 printed circuit cards, where each correlator card is a 32-antenna times 32-antenna correlator matrix.

Thirty-two correlator planes are required to process the full output of a baseband pair. A set of thirty-two planes is referred to as an ARRAY in the following text. The final correlator will contain four of these arrays, one of which is shown in Figure 1, ALMA Correlator Array, below.



#### Figure 1, ALMA Correlator Array

Results from each correlator card are read out to an LTA over a single 16 bit wide bus. Additionally, the correlator card will be capable of high speed readout, with every result read out every 1 msec, over multiple buses, to multiple LTAs. This is intended to be an option for future expansion.

Each correlator card has 1,024 correlator blocks (32 x 32), and each block produces 512 sixteen bit wide results every integration period, for a total of 512K results per card. These correlator blocks will be referred to as baselines. The total number of results in one card, one plane, one array and in the complete system of four arrays are:

| PRODUCT | ONE NON-DIAG CARD | ONE DIAG CARD | ONE PLANE (2 diag + 2 non-diag cards) | ONE ARRAY (32 planes) | FOUR ARRAYS |
|---------|-------------------|---------------|---------------------------------------|-----------------------|-------------|
| AUTO    | 0                 | 16,384        | 32,768                                | 1,048,576             | 4,194,304   |
| CROSS   | 524,288           | 507,904       | 2,064,384                             | 66,060,288            | 264,241,152 |
| TOTAL   | 524,288           | 524,288       | 2,097,152                             | 67,108,864            | 268,435,456 |

(Total RAM storage for 268,435,456 results at 4 bytes per result = 1,073,741,824 bytes; when double buffered, this gives 2,147,483,648 bytes = 2 GByte in total, distributed over the LTA cards in the entire system.)
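The counts in the table and the storage figure can be reproduced from the per-card geometry given above. The following is a minimal sketch; the constant and function names are illustrative and are not part of the design.

```python
# Illustrative constants taken from the text; the names themselves are assumptions.
CARD_AXIS = 32              # each card is a 32 x 32 matrix of baselines
RESULTS_PER_BASELINE = 512  # sixteen-bit results per baseline
PLANES_PER_ARRAY = 32
ARRAYS = 4
BYTES_PER_LTA_RESULT = 4


def card_counts(diagonal):
    """Return AUTO, CROSS and TOTAL result counts for one correlator card."""
    total = CARD_AXIS * CARD_AXIS * RESULTS_PER_BASELINE
    # Only a diagonal card holds AUTO baselines, one per antenna on its axis.
    auto = CARD_AXIS * RESULTS_PER_BASELINE if diagonal else 0
    return {"auto": auto, "cross": total - auto, "total": total}


def scaled(counts, factor):
    """Multiply every count by the same factor."""
    return {name: value * factor for name, value in counts.items()}


diag = card_counts(diagonal=True)
non_diag = card_counts(diagonal=False)
plane = {name: 2 * diag[name] + 2 * non_diag[name] for name in diag}
array = scaled(plane, PLANES_PER_ARRAY)
system = scaled(array, ARRAYS)
double_buffered_bytes = 2 * BYTES_PER_LTA_RESULT * system["total"]

for label, counts in [("plane", plane), ("array", array), ("system", system)]:
    print(label, counts)
print("double buffered bytes:", double_buffered_bytes)
```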

#### 2.1 Correlator Plane

Figure 2, One Correlator Plane, shows a set of four correlator cards, forming a single plane (1 of 32) of a correlator array, with the diagonal correlator chips highlighted. Each chip contains 16 baselines (a baseline produces 512 product results), and in each diagonal chip four of the baselines are AUTO products (the baselines on the chip diagonal). Each card contains 64 chips, labeled 0-63 in the figure.



Figure 2, One Correlator Plane

### 3 LTA Modes

Two modes are planned for the LTA:

Cross-Correlation Mode:

For sub-arrays in this mode, Long Term Accumulation is provided for both CROSS and AUTO products.

Auto-Correlation Mode:

For sub-arrays in this mode, Long Term Accumulation is provided for AUTO products only; CROSS products are not available.

Correlator chip results are 16 bit positive integers. LTA results are 32 bit positive integers.

A total of 64 sub-arrays are supported where the characteristics of a sub-array in the LTA are defined by two parameters:

| Correlation mode:  | Cross or Auto                                                                                                               |
|--------------------|-----------------------------------------------------------------------------------------------------------------------------|
| Accumulation time: | Integer multiples of 16 msec for Cross correlation mode<br>1, 2, 4 or 8 times 1 msec for Auto correlation mode (in 32 bins) |
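The two sub-array parameters can be pictured as a small validated record. This is only a sketch: the class name, the field names and the string encoding of the mode are hypothetical, and the upper bound on cross mode accumulation is assumed from the 16 bit, 1 msec bank switch counter described in section 8.

```python
from dataclasses import dataclass

CROSS_STEP_MS = 16            # cross mode accumulates in multiples of 16 msec
AUTO_TIMES_MS = (1, 2, 4, 8)  # auto mode accumulation times, in msec
MAX_COUNTER_MS = 2**16 - 1    # assumed limit of a 16 bit counter of 1 msec intervals


@dataclass(frozen=True)
class SubArrayConfig:
    """Hypothetical container for the two LTA sub-array parameters."""
    mode: str             # "cross" or "auto"
    accumulation_ms: int  # accumulation time in msec

    def __post_init__(self):
        if self.mode == "cross":
            valid = (
                CROSS_STEP_MS <= self.accumulation_ms <= MAX_COUNTER_MS
                and self.accumulation_ms % CROSS_STEP_MS == 0
            )
        elif self.mode == "auto":
            valid = self.accumulation_ms in AUTO_TIMES_MS
        else:
            raise ValueError(f"unknown correlation mode: {self.mode!r}")
        if not valid:
            raise ValueError(
                f"{self.accumulation_ms} msec is not allowed in {self.mode} mode"
            )


print(SubArrayConfig("cross", 32))
print(SubArrayConfig("auto", 4))
```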

### 4 Correlator Card Output Data Paths

There are 64 correlator chips on a correlator card. The chips are grouped in 8 local groups of eight. Each group of 8 chips connects to a local FPGA. This FPGA provides control for sequencing results out of the correlator chips. Results from four local groups are multiplexed together, producing two output streams that are further multiplexed into one final output data link at the card output. Figure 3, Data Paths on Correlator Card, shows the 8 local groups.



Figure 3, Data Paths on Correlator Card

There are four tri-state buses (labeled 0-3 in the figure) feeding each local group controller. Each bus is 16 bits wide and connects to two correlator chips. The groups and chip buses are numbered as shown in order to allow each local controller to use the same sequence to address the diagonal results on the card.

The output data bus to the LTA is 16 bits wide. The future expansion mode would require a total of four or eight output buses (depending on whether the correlator chip produces 4K or 8K lags).

### 5 Correlator Chip to LTA Data Readout

In order to transfer results from correlator chips to LTA, results will dump to secondary storage in the correlator chips once every 16 msec for sub-arrays in cross correlation mode and once every 1 msec for sub-arrays in auto correlation mode.

The LTA must read all AUTO products for sub-arrays in auto correlation mode every 1 msec and all CROSS and AUTO products for sub-arrays in cross correlation mode every 16 msec. In order to support this requirement, the LTA will be capable of reading all the AUTO products plus 1/16 of all products (CROSS and AUTO) every 1 msec. Readout of results from the correlator card will be at an average rate of 20 nsec per result.

The readout of all AUTO products from correlator chips on one correlator card to the LTA requires 0.32768 msec (20 nsec times 16,384). This time interval will be referred to as TIME SLOT 0. Readout of 1/16 of all products (CROSS and AUTO) requires 0.65536 msec (20 nsec times 32,768). This time interval will be referred to as TIME SLOT 1. The remaining 16.96 usec of the 1.0 msec blanking cycle provides time for housekeeping tasks. The figure below is a sketch of the 1 msec blanking cycle, where the actual number of clock cycles that are blanked must be at least equal to the length of the longest lag generator in use (between 64 and 512 for a 4K lag chip, between 128 and 1024 for an 8K lag chip).
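The time slot budget above is simple arithmetic on the 20 nsec average readout time. A short check, working in integer nanoseconds (variable names are illustrative):

```python
READOUT_NS = 20                  # average readout time per result
AUTO_RESULTS_PER_CARD = 16_384   # all AUTO products on a diagonal card
RESULTS_PER_CARD = 524_288       # all products (CROSS and AUTO) on one card
CYCLE_NS = 1_000_000             # the 1 msec blanking cycle

slot0_ns = READOUT_NS * AUTO_RESULTS_PER_CARD      # all AUTO products
slot1_ns = READOUT_NS * (RESULTS_PER_CARD // 16)   # 1/16 of all products
housekeeping_ns = CYCLE_NS - slot0_ns - slot1_ns   # what is left of the cycle

print(f"time slot 0:  {slot0_ns / 1e6:.5f} msec")
print(f"time slot 1:  {slot1_ns / 1e6:.5f} msec")
print(f"housekeeping: {housekeeping_ns / 1e3:.2f} usec")
```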



#### Figure 4, Sketch of 1 msec Blanking Cycle and Time Slots

In timeslots 0 and 1 of a single 1 msec cycle, the local controller chips go through the motions of accessing all results (all AUTO results in timeslot 0 and 1/16 of all results (CROSS plus AUTO) in timeslot 1). The enabling of the transfer from correlator chips to LTA in timeslot 0 or 1 is determined by the correlation mode (cross or auto) of each antenna. During timeslot 0 a correlator chip only responds to the access if the chip has been configured for auto correlation mode. Likewise, in timeslot 1, a chip only responds if in cross correlation mode. If no antennas are in auto correlation mode, then no transfers will be enabled in timeslot 0. If no antennas are in cross correlation mode, then no transfers will be enabled in timeslot 1. See Figure 6, Correlator to LTA Transfers and LTA Memory Addressing, for a list of which results are accessed in a full cycle of sixteen 1 msec cycles.

For correlator chip readout accesses during timeslot 0, if the column antenna (the one on the prompt axis, as seen in Figure 5) is in auto correlation mode, the AUTO results are accumulated into multiple bins in the LTA memory space, otherwise writes to the LTA ram are inhibited. Similarly during timeslot 1, if the column antenna is in cross correlation mode, both AUTO and CROSS results are accumulated into the LTA memory, otherwise writes to the LTA ram are inhibited.
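The write-inhibit rule described above reduces to a comparison between the current time slot and the mode of the column antenna. A minimal sketch, with hypothetical names for the enumeration and the function:

```python
from enum import Enum


class Mode(Enum):
    AUTO = "auto"
    CROSS = "cross"


def lta_write_enabled(timeslot, column_mode):
    """Return True if a result read in this time slot is accumulated in LTA RAM.

    Time slot 0 carries AUTO results for antennas in auto correlation mode;
    time slot 1 carries CROSS and AUTO results for antennas in cross
    correlation mode. In every other combination the write is inhibited.
    """
    if timeslot == 0:
        return column_mode is Mode.AUTO
    if timeslot == 1:
        return column_mode is Mode.CROSS
    raise ValueError("only time slots 0 and 1 transfer results")


for slot in (0, 1):
    for mode in Mode:
        print(slot, mode.value, lta_write_enabled(slot, mode))
```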

### 6 Data Rates from the Correlator Cards and the LTA

#### 6.1 LTA Input Data Rate In Cross-Correlator Mode

The maximum data rate from a single plane of one correlator array, when all antennas are in cross correlator mode, is 2,097,152 results every 16 msec, or 128M results per second. At two bytes per result, this is 256 MByte/sec data rate from a single plane, 8 GByte /sec total from one array of 32 planes, and 32 GByte/sec total from all four arrays.

On a per card basis, there are 524,288 results per 16 msec. The data rate is thus 32M results per second, or 64 MByte/sec from each card.

#### 6.2 LTA Input Data Rate In Auto-Correlator Mode

The maximum data rate from a single plane of one correlator array, when all antennas are in auto correlator mode, is 32,768 results every 1 msec, or 32M results per second. At two bytes per result, this is 64 MByte/sec data rate from a single plane, 2 GByte/sec total from one array of 32 planes, and 8 GByte/sec total from all four arrays.

On a per card basis there are 16,384 results per 1 msec. The data rate is thus 16M results per second or 32 MByte/sec from each card. AUTO results are available on two of the four correlator cards in each plane (cards 0 and 3).
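The input rates quoted in sections 6.1 and 6.2 can be checked with the sketch below. One observation, offered as an assumption rather than a stated convention: the "M" figures in the text are reproduced exactly when "M" is taken as 1,024 x 1,000 (for example 131,072,000 results per second is quoted as 128M).

```python
DOC_M = 1024 * 1000         # assumption: reproduces the "M" figures in the text
BYTES_PER_CHIP_RESULT = 2   # correlator chip results are 16 bits wide
PLANES_PER_ARRAY = 32
ARRAYS = 4


def results_per_second(results, period_ms):
    """Exact results per second for a block of results read every period_ms."""
    return results * 1000 // period_ms


rates = {
    "cross, one plane": results_per_second(2_097_152, 16),
    "cross, one card": results_per_second(524_288, 16),
    "auto, one plane": results_per_second(32_768, 1),
    "auto, one card": results_per_second(16_384, 1),
}

for label, rate in rates.items():
    byte_rate = rate * BYTES_PER_CHIP_RESULT
    print(f"{label}: {rate:,} results/sec ({rate // DOC_M}M), "
          f"{byte_rate // DOC_M} MByte/sec")

cross_array_bytes = rates["cross, one plane"] * BYTES_PER_CHIP_RESULT * PLANES_PER_ARRAY
cross_system_bytes = cross_array_bytes * ARRAYS
print(f"cross, one array: {cross_array_bytes // DOC_M} MByte/sec")
print(f"cross, four arrays: {cross_system_bytes // DOC_M} MByte/sec")
```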

#### 6.3 LTA Output Data Rate and VME Interface

The LTA will provide a window for back end access of a single result (for transferring data to the VME system) every 80 nsec for a maximum transfer rate of 12.5 MHz from each of the four correlator arrays. A standard 32 bit parallel interface, the Front Panel Data Port (FPDP) will be used to transfer data from the LTA to the VME system, one FPDP per array. At four bytes per result, the maximum data rate capacity from one array is 50 MByte/sec. This is the raw transfer rate, with no overhead taken into account.

For perspective, in a single 16 msec interval, a total of approximately 200,000 results (one result per 80 nsec window; about 209,000 if 50 MByte is read as 50 x 2^20 bytes) could be transferred to the VME system from one array. This is close to ten percent of the total data available from a single correlator plane (or multiple planes added together).
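The per-interval figure follows from the 80 nsec access window. The sketch below assumes the FPDP sustains exactly one result per window with no overhead. It also shows the slightly larger count obtained if 50 MByte/sec is read as 50 x 2^20 bytes per second.

```python
WINDOW_NS = 80                  # one back end access per 80 nsec
INTERVAL_NS = 16_000_000        # one 16 msec interval
BYTES_PER_LTA_RESULT = 4
RESULTS_PER_PLANE = 2_097_152   # all results in one correlator plane

results_per_interval = INTERVAL_NS // WINDOW_NS
bytes_per_second = BYTES_PER_LTA_RESULT * 1_000_000_000 // WINDOW_NS
fraction_of_plane = results_per_interval / RESULTS_PER_PLANE

# Alternative reading: 50 MByte/sec taken as 50 * 2**20 bytes per second.
binary_reading = 50 * 2**20 * 16 // 1000 // BYTES_PER_LTA_RESULT

print(f"results per 16 msec: {results_per_interval:,}")
print(f"raw rate: {bytes_per_second:,} bytes/sec")
print(f"fraction of one plane: {fraction_of_plane:.1%}")
print(f"binary MByte reading: {binary_reading:,} results per 16 msec")
```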

### 7 Adder Tree

The final stages of the LTA are the Adder Tree, for adding together results from individual correlator planes. After every bank switch for a sub-array, results are available in the inactive LTA memory bank in each of the 32 planes in each of the four arrays. These results are available for transfer to the VME system. Depending on the baseband bandwidth, results from 2 or more planes may be added together in the adder tree, or results from all 32 planes may be transferred separately to the VME system, passing through the adder tree in a transparent manner.

At full bandwidth (2 GHz), all 32 planes must be summed together, producing 512 spectral channel results per baseline in each of the four arrays. At minimum bandwidth (62.5 MHz), all 32 planes contain distinct lags, producing 16,384 spectral channel results per baseline in each of the four arrays. The spectral channels from each array could be for a single baseband, or split between the two basebands in a pair, or split into full polarization mode (RR, LR, RL, LL). The following table defines 6 adder tree modes as a function of baseband bandwidth and oversampling.

| ADDER TREE MODE NUMBER | SPEC CHANS PRODUCED IN EACH ARRAY | BASEBAND BANDWIDTH (NON-OVERSAMPLED) | BASEBAND BANDWIDTH (OVERSAMPLED) |
|------------------------|-----------------------------------|--------------------------------------|----------------------------------|
| 0                      | 512                               | 2 GHz                                | 1 GHz                            |
| 1                      | 1024                              | 1 GHz                                | 500 MHz                          |
| 2                      | 2048                              | 500 MHz                              | 250 MHz                          |
| 3                      | 4096                              | 250 MHz                              | 125 MHz                          |
| 4                      | 8192                              | 125 MHz                              | 62.5 MHz                         |
| 5                      | 16384                             | 62.5 MHz                             | na                               |

#### Table 1, Adder Tree Modes
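Each row of Table 1 follows from the mode number by powers of two: the number of planes summed halves, and the channel count doubles, with each step. A minimal sketch that regenerates the table (the function name and dictionary keys are illustrative):

```python
def adder_tree_mode(mode):
    """Return the Table 1 entries for one adder tree mode (0-5)."""
    if mode not in range(6):
        raise ValueError("adder tree mode must be 0-5")
    bandwidth_mhz = 2000 / 2**mode
    return {
        "mode": mode,
        "planes_summed": 32 >> mode,        # 32 planes in mode 0, 1 in mode 5
        "spectral_channels": 512 << mode,   # 512 channels per plane group
        "bandwidth_mhz": bandwidth_mhz,
        # Oversampling halves the bandwidth; Table 1 lists no entry for mode 5.
        "oversampled_bandwidth_mhz": bandwidth_mhz / 2 if mode < 5 else None,
    }


for mode in range(6):
    print(adder_tree_mode(mode))
```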

Each of the modes in the table above requires a different grouping of planes to be added together in the adder tree. Table 2, Groupings of Correlator Planes for each Adder Tree Mode, found at the end of this document, defines which planes must be added together to produce which spectral channels for each of the adder tree modes. Control of which planes are added together is from the VME system. A mask is provided with each request from the VME to specify which planes are to be added together in the adder tree.
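The groupings in Table 2 follow a regular stride pattern: in mode m there are 2^m groups, and group g contains planes g, g + 2^m, g + 2 x 2^m, and so on. The sketch below generates the plane mask for each group under the assumption that mask bit p selects plane p. The bit ordering is not fixed by this document.

```python
def plane_groups(mode):
    """Return (mask, first_channel, last_channel) for each group in a mode."""
    if mode not in range(6):
        raise ValueError("adder tree mode must be 0-5")
    stride = 1 << mode   # number of groups, and spacing between summed planes
    groups = []
    for group in range(stride):
        planes = range(group, 32, stride)
        mask = sum(1 << plane for plane in planes)   # assumed: bit p = plane p
        groups.append((mask, 512 * group, 512 * group + 511))
    return groups


for mask, first, last in plane_groups(2):
    print(f"mask 0x{mask:08X} -> spectral channels {first}-{last}")
```

For mode 4, the first group evaluates to mask `0x00010001` (planes 0 + 16) for spectral channels 0-511, in agreement with Table 2.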

### 8 LTA and Adder Tree Control

Each of the 64 sub-arrays will have a correlation mode, accumulation time, and adder tree mode specified. As defined previously, supported accumulation times are integer multiples of 16 msec in cross-correlation mode, and 1, 2, 4 or 8 times 1 msec in auto-correlation mode. Each sub-array will be capable of becoming active or inactive on 16 msec boundaries. Adder tree modes are those defined in Table 1.

Each LTA chip (an FPGA handling results from a single correlator card) will maintain 32 bank switch counters, one per antenna on the prompt axis as seen in Figure 1, ALMA Correlator Array. The two LTAs for correlator cards 0 and 2 keep track of the bank switching for antennas 0-31, while the two LTAs for correlator cards 1 and 3 keep track of bank switching for antennas 32-63. Each LTA will be given the number of accumulations, N, required for each of the 32 antennas, and each LTA will be commanded when to become active or inactive for each of these 32 antennas. To become active means to set the bank switch counter to zero at the next 16 msec boundary and then to count modulo N, switching banks at each return to zero.

The LTA will also be given the AUTO/CROSS mode for each antenna. When an antenna is in AUTO correlation mode, the value N above will specify the number of 1 msec intervals. When an antenna is in CROSS correlation mode, the VME system must convert N (the number of 16 msec intervals to accumulate) to the corresponding number of 1 msec intervals before sending the value to the LTA since the bank switch counters are 16 bits long, and count 1 msec intervals.
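The counter behaviour described above can be sketched as follows. The class and function names are hypothetical, and the range check assumes the counter must hold N itself, so N may not exceed 65,535. Under that assumption the longest cross mode accumulation is 4,095 intervals of 16 msec, or 65.52 seconds.

```python
MAX_COUNT = 2**16 - 1   # 16 bit counter of 1 msec intervals


def counter_value(mode, n):
    """Convert an accumulation count to the 1 msec count sent to the LTA.

    In auto mode n already counts 1 msec intervals; in cross mode n counts
    16 msec intervals and is converted before being sent.
    """
    if mode not in ("auto", "cross"):
        raise ValueError(f"unknown correlation mode: {mode!r}")
    n_ms = n * 16 if mode == "cross" else n
    if not 1 <= n_ms <= MAX_COUNT:
        raise ValueError(f"{n_ms} does not fit the 16 bit bank switch counter")
    return n_ms


class BankSwitchCounter:
    """Counts 1 msec intervals modulo N and toggles the active bank at zero."""

    def __init__(self, n_ms):
        self.n_ms = n_ms
        self.count = 0          # set to zero when the antenna becomes active
        self.active_bank = 0

    def tick(self):
        """Advance by one 1 msec interval; return True on a bank switch."""
        self.count = (self.count + 1) % self.n_ms
        if self.count == 0:
            self.active_bank ^= 1
            return True
        return False


counter = BankSwitchCounter(counter_value("cross", 2))   # 32 msec accumulation
switches = sum(counter.tick() for _ in range(64))
print("bank switches in 64 msec:", switches)
```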

The Adder Tree Mode is provided with each request from the VME system for transfer of data, in the form of a mask that specifies which of the 32 planes to add together for the requested baseline.

The blanking signal to the correlator chips will also be used to provide a global blanking feature in each of the four arrays. In order to support global blanking, a counter will be required to count the number of samples that go into each accumulation. Although there is only one blanking signal per correlator array, a total of 64 such counters will be required in each correlator array, to handle 64 sub-arrays (since each sub-array potentially has a different accumulation time).

The internal communications bus for the correlator has not been defined.

### 9 LTA Memory Addressing

Each LTA provides double buffered storage for every result on a correlator card. There are 512K results per card. Figure 5, Correlator Readout Paths, defines the row and column numbers on a single correlator card (Card 0 in this case). The eight FPGA local controller groups are shown, each with four chip buses (0-3) and two correlator chips per chip bus (C0 and C1).

Figure 6, Correlator to LTA Transfers and LTA Memory Addressing, defines the sequence in which results are transferred from correlator card to LTA. A lookup table is used to translate the GROUP, BUS, CHIP and CHIP\_BASELINE fields into card based ROW and COLUMN fields. Thus the LTA memory stores results using address fields that relate directly to the ROW and COLUMN fields on a correlator card as defined in Figure 5, Correlator Readout Paths. For antennas in auto correlation mode, rows 0-31 in the LTA memory (at the appropriate column) are used for the thirty-two bins in which to store the N \* 1 msec results. Figure 7, LTA Memory Address Generation, shows details of the address generation in the LTA.

For every result transferred to the LTA, the COLUMN field is used to lookup the correlation mode (cross or auto mode) and the current active bank. Adder Tree mode information is provided when results are read from the LTA inactive bank. This is in the form of a mask, provided by the VME system, that specifies which planes are summed together in the adder tree.

### 10 LTA and Adder Tree Cards

Figure 8, LTA and Adder Tree For One Array, presents a possible implementation of the LTA and Adder Tree on a total of 9 cards. A single LTA function handles one Correlator Card. The eight LTA/ADDER TREE cards shown in the figure each contain 16 LTA providing a total of 128 LTA functions plus the initial stages of the adder tree. One additional card contains the final adder stage and the FPDP interface to the VME system. This card receives requests for data transfer from the VME system and routes the requests to the individual LTA/ADDER TREE cards.

### 11 FPDP Data Transfers

The FPDP interface is bi-directional. The VME system will make requests over the FPDP interface for data to be transferred back over the interface from the LTA. At a minimum, the request must specify the ROW and COLUMN of a baseline to be transferred, along with a 32 bit mask that determines which of the 32 planes (if any) will be added together. For example, if row 5, column 37 (see figure below) is requested and the adder tree mode is mode 4, spectral channels 0-511 are produced by adding together planes 0 + 16, spectral channels 512-1023 by adding together planes 1 + 17, and so on (see Table 2). The correlator end of the interface will determine which of the four cards must be accessed based on the most significant bits of the row and column fields.

The exact format of the request buffer from the VME system to the LTA remains to be defined. In defining it, we will determine whether the transfer size is always exactly 512 results (or 256 in the case of a 4K lag correlator chip) and whether more than one baseline can be requested at a time. For example, in auto correlator mode, a request might specify column 37, row 0 and a total of 32 rows, so that a single request would cause all 32 bins to be transferred for "antenna" 37.



The following table is a suggested starting point for the definition of the request buffer:

TRANSFER REQUEST BUFFER

| BIT               | 31  | 30  | ... | 6  | 5  | 4  | 3  | 2  | 1  | 0  |
|-------------------|-----|-----|-----|----|----|----|----|----|----|----|
| FIRST ROW NUMBER  |     |     |     |    | R5 | R4 | R3 | R2 | R1 | R0 |
| LAST ROW NUMBER   |     |     |     |    | L5 | L4 | L3 | L2 | L1 | L0 |
| COLUMN NUMBER     |     |     |     |    | C5 | C4 | C3 | C2 | C1 | C0 |
| 32 BIT PLANE MASK | M31 | M30 | ... | M6 | M5 | M4 | M3 | M2 | M1 | M0 |

The first three six bit items could be combined into a single 32 bit word.
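As an illustration of how the three six bit fields might share one 32 bit word, the sketch below packs a request into two words. The bit positions chosen for the packed word are hypothetical, since the request format is not yet defined. The card numbering in `card_for` is inferred from statements elsewhere in this document (AUTO results on cards 0 and 3, cards 0 and 2 serving prompt axis antennas 0-31) and should be checked against Figure 1.

```python
def pack_request(first_row, last_row, column, plane_mask):
    """Pack a transfer request into (field_word, mask_word).

    Hypothetical layout: first row in bits 0-5, last row in bits 6-11,
    column in bits 12-17; the plane mask occupies a second 32 bit word.
    """
    for name, value in (("first_row", first_row), ("last_row", last_row),
                        ("column", column)):
        if not 0 <= value <= 63:
            raise ValueError(f"{name} must be in 0-63")
    if not 0 <= plane_mask <= 0xFFFFFFFF:
        raise ValueError("plane_mask must fit in 32 bits")
    field_word = first_row | (last_row << 6) | (column << 12)
    return field_word, plane_mask


def card_for(row, column):
    """Select one of the four cards from the MS bits of row and column.

    Assumed numbering: card = 2 * (row MS bit) + (column MS bit).
    """
    return ((row >> 5) << 1) | (column >> 5)


# Row 5, column 37 in adder tree mode 4, first channel group (planes 0 + 16).
fields, mask = pack_request(5, 5, 37, 0x00010001)
print(f"field word 0x{fields:08X}, mask 0x{mask:08X}, card {card_for(5, 37)}")

# All 32 auto correlation bins for column 37 in a single request.
fields, mask = pack_request(0, 31, 37, 0xFFFFFFFF)
print(f"field word 0x{fields:08X}, mask 0x{mask:08X}")
```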





**Figure 5, Correlator Readout Paths** 




Figure 6, Correlator to LTA Transfers and LTA Memory Addressing




Page 11 08/09/99







Figure 8, LTA and Adder Tree For One Array


### Table 2, Groupings of Correlator Planes for each Adder Tree Mode

| MODE 0                                | SPEC CHANS |
|---------------------------------------|------------|
| 0+1+2+3+4+5+ ... +25+26+27+28+29+30+31 | 0-511      |

| MODE 1                                     | SPEC CHANS |
|--------------------------------------------|------------|
| 0+2+4+6+8+10+12+14+16+18+20+22+24+26+28+30 | 0-511      |
| 1+3+5+7+9+11+13+15+17+19+21+23+25+27+29+31 | 512-1023   |

| MODE 2                | SPEC CHANS |
|-----------------------|------------|
| 0+4+8+12+16+20+24+28  | 0-511      |
| 1+5+9+13+17+21+25+29  | 512-1023   |
| 2+6+10+14+18+22+26+30 | 1024-1535  |
| 3+7+11+15+19+23+27+31 | 1536-2047  |

| MODE 3           | SPEC CHANS |
|------------------|------------|
| 0 + 8 + 16 + 24  | 0-511      |
| 1 + 9 + 17 + 25  | 512-1023   |
| 2 + 10 + 18 + 26 | 1024-1535  |
| 3 + 11 + 19 + 27 | 1536-2047  |
| 4 + 12 + 20 + 28 | 2048-2559  |
| 5 + 13 + 21 + 29 | 2560-3071  |
| 6 + 14 + 22 + 30 | 3072-3583  |
| 7 + 15 + 23 + 31 | 3584-4095  |

| MODE 4  | SPEC CHANS |
|---------|------------|
| 0 + 16  | 0- 511     |
| 1 + 17  | 512-1023   |
| 2 + 18  | 1024-1535  |
| 3 + 19  | 1536-2047  |
| 4 + 20  | 2048-2559  |
| 5 + 21  | 2560-3071  |
| 6 + 22  | 3072-3583  |
| 7 + 23  | 3584-4095  |
| 8 + 24  | 4096-4607  |
| 9 + 25  | 4608-5119  |
| 10 + 26 | 5120-5631  |
| 11 + 27 | 5632-6143  |
| 12 + 28 | 6144-6655  |
| 13 + 29 | 6656-7167  |
| 14 + 30 | 7168-7679  |
| 15 + 31 | 7680-8191  |


| MODE 5 | SPEC CHANS  |
|--------|-------------|
| 0      | 0- 511      |
| 1      | 512-1023    |
| 2      | 1024-1535   |
| 3      | 1536-2047   |
| 4      | 2048-2559   |
| 5      | 2560-3071   |
| 6      | 3072-3583   |
| 7      | 3584-4095   |
| 8      | 4096-4607   |
| 9      | 4608-5119   |
| 10     | 5120-5631   |
| 11     | 5632-6143   |
| 12     | 6144-6655   |
| 13     | 6656-7167   |
| 14     | 7168-7679   |
| 15     | 7680-8191   |
| 16     | 8192-8703   |
| 17     | 8704-9215   |
| 18     | 9216-9727   |
| 19     | 9728-10239  |
| 20     | 10240-10751 |
| 21     | 10752-11263 |
| 22     | 11264-11775 |
| 23     | 11776-12287 |
| 24     | 12288-12799 |
| 25     | 12800-13311 |
| 26     | 13312-13823 |
| 27     | 13824-14335 |
| 28     | 14336-14847 |
| 29     | 14848-15359 |
| 30     | 15360-15871 |
| 31     | 15872-16383 |