10.1: A 4 GSample/s 8b ADC in 0.35-um CMOS

Ken Poulton, Robert Neff, Art Muto,
Wei Liu*, Andy Burstein**, Mehrdad Heshami***

Agilent Technologies, Palo Alto, CA

*Agilent Technologies, Colorado Springs, CO

** now with Volterra Semiconductor, Fremont, CA

***now with Virata, Cupertino, CA

Contact: Ken Poulton 650-485-8461 FAX: 650-485-3637 poulton@labs.agilent.com
Presenter: Robert Neff 650-485-6220 FAX: 650-485-3637 neff@labs.agilent.com

Figures are included here for reference; final figures are submitted as TIFF files.

Abstract

A 4-GSample/s, 8-bit ADC dissipates 4.6 W in 0.35-um CMOS. It creates 32 interleaved clocks with 1.1-ps rms accuracy to drive 32 current-mode pipeline sub-ADCs. The ADC runs at up to 5.9 GSample/s. With calibration at 4 GSample/s, it achieves an accuracy of 7.0 effective bits at DC and 6.1 effective bits on a 1-GHz input.

Previous ADCs for realtime high-bandwidth waveform capture have been implemented in bipolar or III-V technologies [1][2]. This work brings CMOS into this arena with a 4 GSample/s (4 GSa/s) 8-bit ADC implemented in standard digital 0.35-um CMOS.
The strength of CMOS is in high integration levels, while its weakness is in device speed and accuracy. Therefore, this design makes extensive use of parallelism by interleaving 32 pipeline ADCs and uses comprehensive calibration of both signal voltages and timing.

The architecture of the ADC is shown in Figure 1.

The ADC achieves a 4 GSa/s sample rate by using 32 parallel ADC “slices” running at 125 MSa/s each, with each slice delayed 250 ps from the previous one. We use a current-mode pipeline ADC architecture because of its low power consumption and small area and because it needs no linear resistors or capacitors.
To allow the use of small (and therefore poorly matched) CMOS devices, the pipeline uses a reduced-radix architecture. A radix conversion (RC) block following each pipeline converts the 12 reduced-radix bits to 8 binary bits. Output multiplexors combine the data into 4 output streams, each at 1 GSa/s.
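As a quick check of the interleaving arithmetic above, the following sketch derives the aggregate sample rate and the slice-to-slice stagger from the per-slice rate. The round-robin assignment of samples to slices and the helper names are assumptions made for illustration; they are not taken from the design.

```python
N_SLICES = 32
SLICE_RATE_HZ = 125_000_000  # 125 MSa/s per slice, from the text

aggregate_rate_hz = N_SLICES * SLICE_RATE_HZ   # 4 GSa/s
slice_period_ps = 1e12 / SLICE_RATE_HZ         # 8000 ps between samples of one slice
stagger_ps = slice_period_ps / N_SLICES        # 250 ps between adjacent slices


def slice_for_sample(n: int) -> int:
    """Slice that captures sample n, assuming simple round-robin order."""
    return n % N_SLICES


def sample_time_ps(n: int) -> float:
    """Nominal sampling instant of sample n on the interleaved time grid."""
    return n * stagger_ps
```

Under these assumptions, sample 32 lands on slice 0 again, exactly one slice period (8 ns) after sample 0.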

The analog input signal goes directly into 32 track-and-hold (T/H) circuits (one for each pipeline); these are differential NMOS pass-FET track-and-hold circuits (Figure 2). Small input signal swings of ±125 mV on each input and a low common-mode voltage of 0.5 V maximize the bandwidth of the NMOS sampling FETs. Charge compensation FETs are used to cancel the charge injected by the $C_{GD}$ of the sampling FETs, and shorting devices are used to reset the hold nodes at the end of each cycle to minimize signal-dependent kickback onto the input pads.

![FIGURE 2. Input T/H](image)

A voltage-to-current conversion buffer (V/I) buffers the hold node and drives the current-mode pipeline. The V/I is a differential pair with a replica bias circuit set up to control the transconductance.
At 4 GSa/s, the 32 T/Hs need to operate sequentially at 250-ps intervals. To achieve timing-limited accuracy of 6 effective bits on a 1 GHz input signal, their clocks need an accuracy of 1.2 ps rms, including both jitter and systematic timing errors.
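The link between sampling-time error and effective bits can be sketched with the textbook jitter-limited SNR expression for a full-scale sine wave, $\mathrm{SNR} = -20\log_{10}(2\pi f_{in}\sigma_t)$, combined with the usual $\mathrm{ENOB} = (\mathrm{SNR} - 1.76)/6.02$ conversion. This is a generic model, not necessarily the budget calculation used for this design, and the function names are hypothetical.

```python
import math


def jitter_limited_enob(f_in_hz: float, jitter_rms_s: float) -> float:
    """Effective bits if sampling-time error were the only error source."""
    snr_db = -20.0 * math.log10(2.0 * math.pi * f_in_hz * jitter_rms_s)
    return (snr_db - 1.76) / 6.02


def jitter_for_enob(f_in_hz: float, enob: float) -> float:
    """RMS timing error that would by itself limit accuracy to `enob` bits."""
    snr_db = 6.02 * enob + 1.76
    return 10.0 ** (-snr_db / 20.0) / (2.0 * math.pi * f_in_hz)


enob_at_budget = jitter_limited_enob(1e9, 1.2e-12)
```

The source does not state how its 1.2-ps figure was derived. Under this textbook model, 1.2 ps rms at 1 GHz corresponds to roughly 6.8 effective bits from timing error alone, so the stated budget would sit somewhat above the 6-bit target, presumably leaving room for other error sources.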

The 250-ps edge spacing is created by a delay-locked loop (DLL) locked to the input clock. In principle, we could use a 16-stage DLL with a 125 MHz clock, but the power required to achieve a given level of jitter varies as the square of the total delay. Instead, we use a 500 MHz input clock (Figure 3) to shorten the DLL delay line from 4 ns to 1 ns, which reduces the power required for the DLL by 16x. Each of the 8 DLL outputs is followed by a divide-by-four to produce the 32 required phases.
To compensate for mismatches, the delay of each clock path can be adjusted digitally with 8 main adjustments for the 8 DLL outputs and 32 minor adjustments after the divide-by-four blocks.
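The phase-generation scheme described above can be checked with a small timing model. It assumes that the 8 DLL outputs are spaced evenly across the 2-ns period of the 500 MHz clock and that the divide-by-four blocks start in staggered states so that each selects a different one of every four edges; both points, and all names below, are illustrative assumptions rather than details from the design.

```python
CLOCK_PERIOD_PS = 2000      # 500 MHz input clock
N_DLL_OUTPUTS = 8
DIVIDE = 4
SPACING_PS = CLOCK_PERIOD_PS // N_DLL_OUTPUTS   # 250 ps


def phase_offsets_ps() -> list[int]:
    """Edge offsets of the 32 divided clocks within one 8-ns slice period."""
    offsets = []
    for divider_state in range(DIVIDE):          # which of every 4 edges is kept
        for dll_tap in range(N_DLL_OUTPUTS):     # which DLL output feeds the divider
            offsets.append(dll_tap * SPACING_PS + divider_state * CLOCK_PERIOD_PS)
    return offsets


def dll_power_ratio(long_delay_ns: float, short_delay_ns: float) -> float:
    """Power ratio at equal jitter, using the square-of-total-delay rule from the text."""
    return (long_delay_ns / short_delay_ns) ** 2
```

In this model the 32 offsets fall on a uniform 250-ps grid from 0 to 7750 ps, and shortening the delay line from 4 ns to 1 ns gives the 16x power reduction quoted above.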

For high speed and low power in the pipeline ADC, we use open-loop stages and small devices. To accommodate the poor accuracy of such circuits, we use a reduced-radix approach with 12 one-bit stages, each with gain of about 1.6x. This introduces redundancy to allow gain and offset errors as large as 12% of the input range to be corrected by succeeding stages.
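The redundancy argument can be illustrated with a behavioral toy model. It assumes a signal range normalized to [-1, 1], a one-bit DAC step equal to the gain minus one, and static comparator offsets of up to ±0.2 (10% of the full range); these modeling choices and all names are assumptions, and the model ignores everything specific to the current-mode circuit.

```python
import random

GAIN = 1.6            # nominal interstage gain from the text
DAC = GAIN - 1.0      # assumed DAC step; keeps the residue within [-1, 1]
N_STAGES = 12


def convert(x, offsets):
    """Run x through the pipeline; return the +/-1 decisions and the final residue."""
    decisions = []
    for off in offsets:
        s = 1.0 if x >= off else -1.0   # one-bit decision with comparator offset
        decisions.append(s)
        x = GAIN * x - s * DAC          # amplify, then subtract the DAC level
    return decisions, x


def reconstruct(decisions):
    """Radix conversion: weight each decision by DAC / GAIN**stage."""
    est, weight = 0.0, 1.0
    for s in decisions:
        weight /= GAIN
        est += s * DAC * weight
    return est


rng = random.Random(0)
offsets = [rng.uniform(-0.2, 0.2) for _ in range(N_STAGES)]
worst_error = 0.0
for i in range(2001):
    x = -1.0 + i / 1000.0
    decisions, _ = convert(x, offsets)
    worst_error = max(worst_error, abs(reconstruct(decisions) - x))
```

In this model a wrong decision near a threshold simply leaves a larger residue that later stages absorb, so the worst-case error stays below 1/256 of the ±1 range (half of an 8-bit LSB) despite the offsets. Twelve stages at radix 1.6 give $1.6^{12} \approx 281$ distinguishable levels, slightly more than the 256 needed for 8 bits.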

The pipeline stage (Figure 4) is based on a current-mode T/H. When the switches are closed, the NMOS devices form a differential current mirror which sends current $I_{\text{out}} = 1.6 \times I_{\text{in}}$ to the output. When the switches are opened, the $C_{GS}$ of the output-side FETs acts as the hold capacitance and the same $I_{\text{out}}$ continues to flow. The track-and-hold output current is summed with the one-bit DAC output to form the input current for the next stage.

![FIGURE 4. Simplified schematic of the current-mode pipeline stage](image)

The advantages of this circuit are:
- It does not require any linear R’s or C’s.
- It fits in a small area.
- It has a very low power/sample_rate ratio for an 8-bit ADC.

Including sampler, V/I, pipeline and radix converter, the per-slice area is 0.30 mm$^2$ and the power is 75 mW.
The logic circuits use complementary Source Coupled Logic (SCL) to minimize digital noise. The data outputs have differential ECL-like levels.

The layout of the 7.14 x 4.04 mm ADC chip is seen in Figure 5. The analog input is on the bottom and signals flow from the center to the data outputs on the left and right edges.
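The per-slice figures can be put in context with some simple bookkeeping against the chip totals quoted in this paper. The text does not itemize how the remaining power and area are split among clock generation, output multiplexing and other circuits, so the sketch only computes the slice share; variable names are illustrative.

```python
N_SLICES = 32
SLICE_POWER_W = 0.075        # 75 mW per slice
SLICE_AREA_MM2 = 0.30
TOTAL_POWER_W = 4.6
CHIP_W_MM, CHIP_H_MM = 7.14, 4.04
SAMPLE_RATE_HZ = 4e9

slices_power_w = N_SLICES * SLICE_POWER_W        # 2.4 W
slices_area_mm2 = N_SLICES * SLICE_AREA_MM2      # 9.6 mm^2
chip_area_mm2 = CHIP_W_MM * CHIP_H_MM            # about 28.8 mm^2

power_fraction = slices_power_w / TOTAL_POWER_W  # about 0.52
area_fraction = slices_area_mm2 / chip_area_mm2  # about 0.33
energy_per_sample_nj = TOTAL_POWER_W / SAMPLE_RATE_HZ * 1e9   # about 1.15 nJ
```

By this arithmetic the 32 slices account for roughly half of the total power and about a third of the die area.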

Calibration is controlled by software. For voltage calibration, a ramp waveform is applied to the input. A best-fit algorithm uses the raw radix-1.6 output bits to determine the bit weights for each of the ADC pipeline stages. These weights are then loaded into the radix conversion circuit; the ADC then will produce 8-bit binary data. Bit weights are computed separately for each slice, correcting per-stage gain variations, as well as slice-to-slice gain and offset mismatches.
The system environment includes a pass-through lookup table to reduce static nonlinearities. This is used to remove 3rd harmonic distortion which is introduced in the V/I stage.
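The voltage calibration step can be sketched as an ordinary least-squares fit. The text says only that a best-fit algorithm is used, so least squares is an assumption here, as are the toy slice model (unknown per-stage gains within ±3% of 1.6, a fixed DAC step of 0.6, a noiseless ramp over ±0.9 of a ±1 range) and all function names.

```python
import random


def slice_bits(x, gains, dac=0.6):
    """Toy open-loop pipeline: one +/-1 decision per stage, then amplify and subtract."""
    signs = []
    for g in gains:
        s = 1.0 if x >= 0.0 else -1.0
        signs.append(s)
        x = g * x - s * dac
    return signs


def solve(a, b):
    """Solve the square system a w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))
        w[r] = acc / m[r][r]
    return w


def fit_weights(ramp, rows):
    """Least-squares bit weights plus an offset term, via the normal equations."""
    feats = [row + [1.0] for row in rows]
    n = len(feats[0])
    ata = [[sum(f[i] * f[j] for f in feats) for j in range(n)] for i in range(n)]
    atb = [sum(f[i] * x for f, x in zip(feats, ramp)) for i in range(n)]
    return solve(ata, atb)


rng = random.Random(1)
gains = [1.6 * (1.0 + rng.uniform(-0.03, 0.03)) for _ in range(12)]  # unknown to the fit
ramp = [-0.9 + 1.8 * i / 1999 for i in range(2000)]
rows = [slice_bits(x, gains) for x in ramp]
weights = fit_weights(ramp, rows)   # 12 bit weights followed by one offset

est = [sum(w * f for w, f in zip(weights, row + [1.0])) for row in rows]
rms_error = (sum((e - x) ** 2 for e, x in zip(est, ramp)) / len(ramp)) ** 0.5
```

In this sketch the fitted weights stand in for the values that would be loaded into the radix conversion circuit, and repeating the fit per slice absorbs slice-to-slice gain and offset differences into each slice's own weights and offset term.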

For timing calibration, a pulse waveform is applied to the input. The relative timing of the slices is extracted by a separate Fourier analysis for each of the 32 ADC slices and the clock generator’s digital timing adjustments are set to achieve a 250-ps delay from slice to slice.
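The timing extraction can be illustrated with a single-bin Fourier analysis per slice. As a simplification, the sketch replaces the pulse waveform with one noiseless sine tone chosen to be coherent with the slice record, and it injects known timing errors so that the recovered values can be compared against them; the tone frequency, record length and all names are assumptions.

```python
import cmath
import math
import random

FS_SLICE_HZ = 125e6                      # per-slice sample rate
N = 256                                  # samples per slice record (assumed)
CYCLES = 5                               # whole tone cycles per record (assumed)
F_IN_HZ = FS_SLICE_HZ * CYCLES / N       # coherent test tone, about 2.44 MHz
NOMINAL_STEP_S = 250e-12


def slice_record(delay_s):
    """Samples seen by one slice whose sampling instant is late by delay_s."""
    return [math.sin(2 * math.pi * F_IN_HZ * (n / FS_SLICE_HZ + delay_s))
            for n in range(N)]


def measured_delay_s(record):
    """Delay inferred from the phase of the DFT bin at the tone frequency."""
    bin_value = sum(x * cmath.exp(-2j * math.pi * CYCLES * n / N)
                    for n, x in enumerate(record))
    phase = cmath.phase(bin_value) + math.pi / 2   # a sine lags a cosine by pi/2
    return phase / (2 * math.pi * F_IN_HZ)


rng = random.Random(0)
injected_ps = [rng.uniform(-5.0, 5.0) for _ in range(32)]
measured = [measured_delay_s(slice_record(k * NOMINAL_STEP_S + injected_ps[k] * 1e-12))
            for k in range(32)]
recovered_ps = [(measured[k] - k * NOMINAL_STEP_S) * 1e12 for k in range(32)]
```

Each entry of `recovered_ps` is the deviation of a slice from its nominal 250-ps grid position, which is the quantity the digital timing adjustments would be set to cancel. A real measurement would also have to contend with noise and with the harmonic content of the pulse, which this sketch ignores.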

The residual systematic slice-to-slice timing errors are 0.5 ps rms, well below the thermal jitter level of 0.9 ps rms of a single slice. The total ADC jitter is 1.1 ps rms. DNL is ±0.20 LSBs. Intrinsic INL is ±0.75 LSBs; INL with the lookup table is ±0.25 LSBs. The amplitude response shows a 3-dB bandwidth of 1.4 GHz when driven from a doubly-terminated 50-ohm line.

Accuracy is shown in Figure 6. The rolloff in effective bits at high frequencies is due to the 1.1 ps rms total jitter.
Table 1 shows the major results.

![Graph showing effective bits vs. input frequency](image)

**FIGURE 6. Effective bits vs. input frequency**

**TABLE 1. ADC Results**

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Sample rate (nominal)</td>
<td>4 GSa/s</td>
</tr>
<tr>
<td>Sample rate (maximum)</td>
<td>5.9 GSa/s</td>
</tr>
<tr>
<td>Resolution</td>
<td>8 bits</td>
</tr>
<tr>
<td>INL (raw)</td>
<td>±0.75 LSBs</td>
</tr>
<tr>
<td>INL (with lookup table)</td>
<td>±0.25 LSBs</td>
</tr>
<tr>
<td>DNL</td>
<td>±0.20 LSBs</td>
</tr>
<tr>
<td>1-dB BW</td>
<td>1.1 GHz</td>
</tr>
<tr>
<td>3-dB BW</td>
<td>1.4 GHz</td>
</tr>
<tr>
<td>Accuracy @ 200 MHz input</td>
<td>7.0 effective bits</td>
</tr>
<tr>
<td>Accuracy @ 1 GHz input</td>
<td>6.1 effective bits</td>
</tr>
<tr>
<td>Jitter</td>
<td>1.1 ps rms</td>
</tr>
<tr>
<td>Input Range</td>
<td>0.25 Vpk differential</td>
</tr>
<tr>
<td>Input Capacitance</td>
<td>2.0 pF</td>
</tr>
<tr>
<td>Power (3.3 V)</td>
<td>4.6 W</td>
</tr>
<tr>
<td>Chip Size</td>
<td>7.14 x 4.04 mm</td>
</tr>
<tr>
<td>Technology</td>
<td>0.35 μm CMOS</td>
</tr>
<tr>
<td>Transistors</td>
<td>300,000</td>
</tr>
<tr>
<td>Package</td>
<td>256-ball TBGA + heatsink</td>
</tr>
</tbody>
</table>

The sample rate is three times faster than any CMOS ADC of 6 or more bits [3,4] and the accuracy with a 1 GHz input is better than any reported Nyquist ADC in any technology [2].

References


