Variable resolution Associative Memory for the Fast Tracker ATLAS upgrade



#### Alberto Annovi

Istituto Nazionale di Fisica Nucleare Laboratori Nazionali di Frascati



#### Fast tracking in Pixel and SCT detectors

Detector and trigger coverage up to  $|\eta| < 2.5$ 

- FTK processes all level-1 accepted events (100 kHz) and provides tracks to the level-2 algorithms
- Output: all tracks down to $p_T > 1$ GeV
- Typical latency: ~100 $\mu$s

Advantages: high-bandwidth connection with the detector and hardware optimized for the specific tasks



## FTK part 1: Associative Memory

- AM pattern recognition: find track candidates with enough Si hits



- O(10<sup>9</sup>) prestored patterns simultaneously see the silicon hits leaving the detector at full speed.
- This pattern recognition step is essential to reduce the combinations for the following fit procedure.
- The AM outputs the patterns that match 7 out of 8 layers; these are called roads.
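The road-finding step above can be sketched in software as a toy model. It is not the AM chip logic: the hardware compares all patterns in parallel, while this loop visits them one at a time. The bank contents, the bin IDs and the function name `find_roads` are hypothetical; only the 7-of-8 majority threshold is taken from the text.

```python
# Toy model of AM road finding with majority logic.
# Assumptions: 8 layers, one hypothetical bin ID per layer in each pattern,
# and a road requires at least 7 matched layers (the 7-of-8 rule above).
from typing import Dict, List, Sequence, Set, Tuple

Pattern = Tuple[int, ...]  # one bin ID per detector layer


def find_roads(bank: Sequence[Pattern],
               hits: Dict[int, Set[int]],
               min_layers: int = 7) -> List[int]:
    """Return indices of patterns matching at least `min_layers` layers.

    `hits` maps layer index -> set of bin IDs fired in this event.
    """
    roads = []
    for idx, pattern in enumerate(bank):
        matched = sum(1 for layer, bin_id in enumerate(pattern)
                      if bin_id in hits.get(layer, set()))
        if matched >= min_layers:
            roads.append(idx)
    return roads


bank = [
    (1, 1, 2, 2, 3, 3, 4, 4),  # hypothetical pattern 0
    (1, 2, 2, 3, 3, 4, 4, 5),  # hypothetical pattern 1
    (7, 7, 7, 7, 7, 7, 7, 7),  # hypothetical pattern 2
]
# Event compatible with pattern 0, but layer 5 has no hit (inefficiency).
event = {0: {1}, 1: {1}, 2: {2}, 3: {2}, 4: {3}, 6: {4}, 7: {4}}
print(find_roads(bank, event))  # -> [0]
```

Allowing one missing layer keeps a track found even when a single silicon hit is lost, at the price of accepting more random coincidences.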

## FTK part 2: Linearized Track Fit

Over a narrow region in the detector, equations linear in the local silicon hit coordinates give resolution nearly as good as a time-consuming helical fit.



$$p_i = \sum_{j=1}^{16} a_{ij} x_j + b_i$$



Figure: 5D track surface embedded in the 16D coordinate space of hit combinations.

- $p_i$'s are the helix parameters and $\chi^2$ components.
- $x_j$'s are the hit coordinates in the silicon layers.
- $a_{ij}$ and $b_i$ are prestored constants determined from full simulation or real data tracks.
  - The range of the linear fit is a "sector", which consists of a single silicon module in each detector layer.
- This is very fast in FPGA DSPs: approx. 1 Gfit/s per FPGA
- Based on principal component analysis

j.nima.2003.11.078, and H. Wind, CERN-EP-INT-81-12-REV, 1982
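A scaled-down sketch of the linearized fit $p_i = \sum_j a_{ij} x_j + b_i$ follows. It assumes a straight track $x(R) = x_0 + mR$ measured in 4 layers at hypothetical radii, with constants obtained from a closed-form least-squares solution rather than from simulation or data; the real FTK fit uses 16 coordinates and 5 helix parameters. The point is that, once the constants are prestored, each fit is only multiply-adds, and the residuals entering $\chi^2$ are also linear in the hit coordinates.

```python
# Toy linearized track fit: p_i = sum_j a_ij * x_j + b_i.
# Assumptions (hypothetical, for illustration only): straight track
# x(R) = x0 + m*R in 4 layers at the radii below; constants from least squares.
from typing import List, Sequence

LAYER_R = [1.0, 2.0, 3.0, 4.0]  # hypothetical layer radii


def fit_constants(radii: Sequence[float]) -> List[List[float]]:
    """Rows of (H^T H)^-1 H^T with H_j = [1, R_j]: the prestored a_ij."""
    s0 = float(len(radii))
    s1 = sum(radii)
    s2 = sum(r * r for r in radii)
    det = s0 * s2 - s1 * s1
    a_x0 = [(s2 - s1 * r) / det for r in radii]
    a_m = [(s0 * r - s1) / det for r in radii]
    return [a_x0, a_m]


def linear_fit(a: Sequence[Sequence[float]], b: Sequence[float],
               x: Sequence[float]) -> List[float]:
    """Evaluate p_i = sum_j a_ij x_j + b_i using only multiply-adds."""
    return [sum(aij * xj for aij, xj in zip(row, x)) + bi
            for row, bi in zip(a, b)]


def chi2(params: Sequence[float], x: Sequence[float],
         radii: Sequence[float], sigma: float) -> float:
    """Sum of squared residuals x_j - (x0 + m R_j), each linear in x."""
    x0, m = params
    return sum(((xj - (x0 + m * r)) / sigma) ** 2 for xj, r in zip(x, radii))


A = fit_constants(LAYER_R)      # computed once, then "prestored"
B = [0.0, 0.0]                  # offsets b_i vanish for this toy geometry
hits = [0.75, 1.0, 1.25, 1.5]   # hits exactly on x0 = 0.5, m = 0.25
p = linear_fit(A, B, hits)
print(p, chi2(p, hits, LAYER_R, sigma=0.01))
```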

#### **Associative Memories**

#### First AM idea for HEP

- Search its entire memory at each clock cycle: fast pattern recognition
- Inspired by Content Addressable Memories (CAM)
- M. Dell'Orso, L. Ristori
- NIM A 278, 436 (1989)
- First application: SVT @ CDF
  - Seeded by drift chamber tracks
  - Look for associated Silicon hits at radii 2.5-10.5cm
  - Started with 384k patterns
  - Upgraded to 6M patterns

#### Extrapolate to inner silicon layers



We discuss the architecture of a device based on the concept of associative memory designed to solve the track finding problem, typical of high energy physics experiments, in a time span of a few microseconds even for very high multiplicity events. This "machine" is implemented as a large array of custom VLSI chips. All the chips are equal and each of them stores a number of "patterns". All the patterns in all the chips are compared in parallel to the data coming from the detector while the detector is being read out.

A. Annovi - September 24th, 2013

#### AM for ATLAS

- Silicon only trackers
- High luminosity → high detector occupancy
- About a thousand tracks per bunch crossing at <µ>=20

- For the AM to reduce the information:
  - Needs very high resolution
  - Needs billions of patterns
  - Needs a faster clock of 100 MHz
  - Can profit from today's electronics
  - Requires O(8k) AM chips
  - Also needs a new kind of associative memory!

Figure: event with 25 reconstructed vertices and a $Z \rightarrow \mu\mu$ decay

Up to <µ>=80 by 2019 and <µ>=200 by 2023 at the HL-LHC [CERN-LHCC-2012-022]




#### AM working principle



Pattern matching is completed as soon as all hits are loaded. Data arriving at different times are compared in parallel with all patterns. Unique to the AM chip: it looks for correlations among data received at different times.
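The working principle above can be emulated as follows, as a sketch rather than the chip design: each pattern keeps one latched "matched" bit per layer, every incoming hit is compared against all patterns (a loop here, parallel in hardware), and roads are read out once all hits are loaded. The class `ToyAM`, the 8-layer bank and the 7-of-8 threshold are assumptions for illustration. Because the bits are latched, the result does not depend on the order in which hits arrive.

```python
# Sketch of the AM working principle: hits arrive one at a time, in any
# order; each pattern latches one bit per layer when a hit matches it.
# Assumptions: 8 layers and a 7-of-8 road threshold; names are hypothetical.
from typing import Iterable, List, Sequence, Tuple

N_LAYERS = 8


class ToyAM:
    def __init__(self, bank: Sequence[Tuple[int, ...]]):
        self.bank = list(bank)
        self.match_bits = [0] * len(self.bank)  # one bitmask per pattern

    def load_hit(self, layer: int, bin_id: int) -> None:
        """Compare one incoming hit with all patterns; latch matching layers."""
        for idx, pattern in enumerate(self.bank):
            if pattern[layer] == bin_id:
                self.match_bits[idx] |= 1 << layer

    def roads(self, min_layers: int = 7) -> List[int]:
        """After all hits are loaded, read out patterns with enough layers."""
        return [idx for idx, bits in enumerate(self.match_bits)
                if bin(bits).count("1") >= min_layers]


def run(bank: Sequence[Tuple[int, ...]],
        hits: Iterable[Tuple[int, int]]) -> List[int]:
    am = ToyAM(bank)
    for layer, bin_id in hits:
        am.load_hit(layer, bin_id)
    return am.roads()


bank = [tuple(range(N_LAYERS)), (0,) * N_LAYERS]  # two hypothetical patterns
hits = [(layer, layer) for layer in range(N_LAYERS)] + [(0, 0)]
print(run(bank, hits), run(bank, reversed(hits)))  # -> [0] [0]
```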


## AM technological evolution



- 1990s: full-custom VLSI chip, 0.7 μm (INFN Pisa)
  - 128 patterns, 6×12-bit words each, 30 MHz
  - F. Morsani et al., IEEE Trans. on Nucl. Sci., vol. 39 (1992)



Alternative FPGA implementation of the SVT AM chip

P. Giannetti et al., Nucl. Instr. and Meth., vol. A413/2-3, (1998)

G. Magazzù, first standard-cell project presented at LHCC (1999)



**Standard cell 0.18 $\mu m$**: 5000 patterns/AM chip; SVT upgrade total: 6M patterns, 40 MHz (A. Annovi et al., **IEEE TNS**, Vol. 53, Issue 4, Part 2, **2006**)





AMchip04: 65 nm technology, standard cell and full custom, 100 MHz. Power/pattern/MHz ~30 times lower; pattern density ×12. First variable-resolution implementation!

F. Alberti *et al 2013 JINST 8 C01040, doi:10.1088/1748-0221/8/01/C01040* 


## Generating the pattern bank



- Fewer patterns (less hardware) give high efficiency, BUT more fakes
- More patterns (more hardware) give the same efficiency with fewer fakes
- Fakes are workload for the track fitter
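The trade-off above can be illustrated with a toy bank generator. It assumes, as an illustration not stated in this talk, that the bank is built by simulating many straight tracks, converting their hits into bin IDs, and keeping every distinct pattern (no frequency-based truncation). Layer radii, bin widths and track ranges are hypothetical. With the same training tracks, coarser bins need fewer patterns and reach at least the same efficiency on independent test tracks; the price, not modeled here, is more fake matches.

```python
# Toy pattern-bank generation (assumptions: straight tracks, hypothetical
# geometry and bin widths; every distinct training pattern is kept).
import random
from typing import List, Sequence, Set, Tuple

RADII = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # 8 hypothetical layers
Pattern = Tuple[int, ...]


def simulate_track(rng: random.Random) -> List[float]:
    """Hit positions of a straight track x(R) = x0 + m*R (hypothetical model)."""
    x0 = rng.uniform(0.0, 10.0)
    slope = rng.uniform(-0.5, 0.5)
    return [x0 + slope * r for r in RADII]


def to_pattern(hits: Sequence[float], bin_width: float) -> Pattern:
    """Convert hit positions to one bin ID per layer."""
    return tuple(int(x // bin_width) for x in hits)


def build_bank(n_tracks: int, bin_width: float, seed: int = 1) -> Set[Pattern]:
    """Keep every distinct pattern produced by the training tracks."""
    rng = random.Random(seed)
    return {to_pattern(simulate_track(rng), bin_width) for _ in range(n_tracks)}


def efficiency(bank: Set[Pattern], n_test: int, bin_width: float,
               seed: int = 2) -> float:
    """Fraction of independent test tracks whose pattern is in the bank."""
    rng = random.Random(seed)
    found = sum(to_pattern(simulate_track(rng), bin_width) in bank
                for _ in range(n_test))
    return found / n_test


for width in (0.25, 0.5, 1.0):
    bank = build_bank(20000, width)
    print(f"bin width {width}: {len(bank)} patterns, "
          f"efficiency {efficiency(bank, 5000, width):.3f}")
```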

#### Pattern bank size and efficiency



Figure: pattern bank size and efficiency; x axis: number of patterns in AM chips (barrel only, 45 degrees in $\phi$). Annotated values of <# matched patterns/event @ 3E34>: 342k and 40k.

The number of roads (with a large fake fraction) represents the workload for the track fitter.

14th ICATPP Conference




#### Variable resolution with "don't care" (DC) bits




- For each layer: a "bin" is identified by a number with DC bits (X)
- Least significant bits of "bin" number can use 3 states (0, 1, X)
- The "bin" number is stored in the Associative Memory
- The DC bits can be used to OR neighboring high-resolution bins that differ by a few bits, without increasing the number of patterns

Pixels:

| 0  | 1  | 2  | 3  | 4  | 5  | 6  | 7  |
|----|----|----|----|----|----|----|----|
| 8  | 9  | 10 | 11 | 12 | 13 | 14 | 15 |
| 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 |
| 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |

Using the binary format:

- "01010" selects bin 10
- "0001x" selects bins 2 or 3
- "1x000" selects bins 16 or 24
- "0x11x" selects bins 6, 7, 14, or 15
- "111xx" selects bins 28 to 31
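The selection rule in the examples above can be written as a ternary comparison: a stored bin number matches a high-resolution bin if every non-"x" bit agrees. This is a software sketch of the rule, not the AMchip04 circuit; the function names are hypothetical and bits are written most-significant first, as in the 5-bit pixel example.

```python
# Ternary ("don't care") bin matching for the 5-bit pixel example above.
# Each character of a stored bin number is '0', '1' or 'x' (don't care).
from typing import List


def ternary_match(stored: str, bin_number: int) -> bool:
    """True if bin_number agrees with `stored` on every non-'x' bit (MSB first)."""
    bits = format(bin_number, f"0{len(stored)}b")
    return all(s == "x" or s == b for s, b in zip(stored, bits))


def selected_bins(stored: str) -> List[int]:
    """All high-resolution bins selected by a stored ternary bin number."""
    return [n for n in range(2 ** len(stored)) if ternary_match(stored, n)]


for stored in ("01010", "0001x", "1x000", "0x11x", "111xx"):
    print(stored, selected_bins(stored))
```

Each "x" doubles the number of selected bins, so one stored pattern layer with $k$ DC bits covers $2^k$ high-resolution bins.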

#### AMCHIP04: VARIABLE RESOLUTION



Implemented with the "don't care" feature, inspired by ternary CAMs

- Increases the width of a pattern only when needed
- Fully programmable
- Wider patterns can be used in high-occupancy regions and smaller patterns in low-coverage regions (where the number of trajectories is low, thus reducing the fakes)
- The choice of wider or narrower width patterns is made layer by layer with simulation

## The patterns: a different point of view



#### Many-bits variable resolution

We can use multiple DC bits to increase the compression factor (up to 6 per layer in AMchip04)

Figure panels: 1-bit variable resolution; 3-bit variable resolution
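One way to set multiple DC bits for a layer, shown here as an assumption for illustration rather than the FTK procedure, is to keep fixed every bit on which all observed high-resolution bins agree and turn the others into "x". The helper names are hypothetical; the 6-DC-bit limit is the AMchip04 figure quoted above. The example with bins 6 and 15 shows the cost of many DC bits: the stored value "0x11x" also selects bins 7 and 14, which no training track used.

```python
# Hypothetical helper: choose DC bits for one layer so that one stored
# ternary bin covers every high-resolution bin seen in training tracks.
# Bits that agree across all bins stay fixed; bits that differ become 'x'.
from typing import Iterable


def cover(bins: Iterable[int], width: int, max_dc: int = 6) -> str:
    codes = [format(b, f"0{width}b") for b in bins]
    stored = "".join(col[0] if len(set(col)) == 1 else "x"
                     for col in zip(*codes))
    if stored.count("x") > max_dc:  # AMchip04 allows up to 6 DC bits/layer
        raise ValueError("too many DC bits for one layer")
    return stored


def n_covered(stored: str) -> int:
    """Number of high-resolution bins selected: 2 ** (number of DC bits)."""
    return 2 ** stored.count("x")


for bins in ([10], [6, 7], [6, 15], [28, 29, 30, 31]):
    stored = cover(bins, 5)
    print(bins, stored, n_covered(stored))
```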


## Performance (max 1 DC/layer)

AM thin channel grouping: pixels, 12 channels along $\phi$ and 36 along $\eta$; strips, 10 strips



| Pileup<br>events | config            | Max # DC<br>bits / layer | # roads<br>/ 45° | #<br>patterns |
|------------------|-------------------|--------------------------|------------------|---------------|
| 75               | AM large patterns | 0                        | 53500            | 138M          |
| 75               | AM w/ DC          | 1                        | 8250             | 138M          |
| 75               | AM thin patterns  | 0                        | 5950             | 384M          |

- Pattern bank reduction factor: ~ 3
- AM with DC capability reduces the fakes by a large factor: ~ 7
- Good performance with almost the same hardware
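The factors quoted above follow from simple ratios of the table entries (rounded):

$$\frac{384\,\text{M}}{138\,\text{M}} \approx 2.8 \;(\text{bank reduction}), \qquad \frac{53\,500}{8\,250} \approx 6.5 \;(\text{road reduction from DC bits}), \qquad \frac{8\,250}{5\,950} \approx 1.4$$

The last ratio shows that the DC bank, with about 2.8 times fewer patterns, returns only about 1.4 times as many roads as the thin-pattern bank.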

#### Pattern shape optimization



Figure 3: Effect of different choices of pattern shape: at a fixed number of patterns, the shape obtained with the formula described in the text yields the smallest volume (continuous line)

#### A working configuration for the Fast Tracker

- High resolution patterns: (15x36)<sub>pix</sub>x16<sub>sct</sub>
  - Pixels: 15 channels along  $\phi$ , 36 ch. along  $\eta$
  - Strips: 16 strips

DC bits group detector channels together and increase the pattern resolution

- Background events with 69 superimposed pp collisions
  - Instantaneous luminosity 3\*10<sup>34</sup> Hz/cm<sup>2</sup>
- Hardware constraints (for each of 64 $\eta$-$\phi$ towers):
  - Number of AM patterns < 16.8 \* 10<sup>6</sup>
  - Number of roads/event < 16 \* 10<sup>3</sup>
  - Number of fits/event < 80 \* 10<sup>3</sup>

| Region | Coarse resolution | Max # DC bits / layer | # AM patterns (\*10<sup>6</sup>) | Efficiency (%) | Track-fitter workload (\*10<sup>3</sup> / evt) |
|--------|-------------------|-----------------------|-----------------------------------|----------------|-------------------------------------------------|
| Barrel | $(30x72)_{pix}x32_{sct}$ | $2_{pix}x1_{sct}$ | 16.8 | 93.3 | 3.2 |
| Endcap | $(30x72)_{pix}x32_{sct}$ | $2_{pix}x1_{sct}$ | 16.8 | 91.2 | 6.9 |

#### Summary

- An innovative algorithm has been introduced in the new FTK AM chips
  - The pattern resolution can be configured layer-by-layer and pattern-by-pattern
  - The use of DC bits increases the resolution only where needed
  - High rejection of fake coincidences → the number of roads out of the AM is greatly reduced
- Limited "cost": AM chip area ~+15%; power ~+5%
- Equivalent to a factor of 3-5 (or more) extra patterns
  - Not fully exploited yet
- Any coincidence-based trigger can profit from this feature