Automatic Delay Calibration Method for Multi-channel CMOS Formatter
Ahmed Rashid Syed, Ph.D.
Staff Design Engineer
Credence Systems Corporation
150 Baytech Dr. San Jose, CA 95134

Abstract
This paper describes the technique used for automatically calibrating vernier delay steps in the Credence CMOS formatter, RIC/DICMOS. Embedded within the timing generation IC, RIC/DICMOS provides formatted levels and internal strobe markers for eight independent pin-electronics channels at up to 800 Mbps with +/- 81ps accuracy. Using the on-chip, run-time auto-calibration circuit, all eight RIC/DICMOS vernier channels can be calibrated in parallel nearly 500 times faster than the prior generation formatters. Furthermore, the same calibration circuit can also provide 16-bit time-period or frequency counts for up to eight independent off-chip signals.

1 Introduction
To address the issue of ever-rising cost-per-pin of testing complex devices with hundreds of pins, such as microprocessors and complex SOCs, a highly integrated architecture for "formatters" was introduced at ITC 2003 [1]. Formatters [2], in the context of ATE hardware, are a set of complementary Drive and Response circuits (figure 1). Briefly, the Drive side generates the accurate, timing-critical formatted levels required by pin-electronics to drive a DUT pin to a predefined logic state: high, low or tri-state. The Response side, on the other hand, strobes the status of a DUT pin output via the comparator block of the pin-electronics [3,4]. DUTs with high pin counts that need to be tested at speed demand a highly integrated ATE architecture to keep the overall cost of test low. Towards achieving this goal, the new architecture includes eight independent embedded formatter channels inside the timing generator IC. In addition to avoiding the use of discrete formatter ICs (which incur substantial penalties in design and development time and cost), the resulting savings in PCB real estate from deploying a combined timing generator-formatter IC have allowed more functionality and higher fanout to be incorporated inside the testhead at a lower overall cost. Such an architecture is particularly suitable for functional and structural testing of high pin count SOC devices with I/O data rates of 800 Mbps (megabits per second) or below, and with less than +/-100ps EPA (edge placement accuracy) testing requirement.

Figure 1: ATE Timing Critical Edges Generation (block diagram: Timing Generation IC containing CPU, Local Mem, Timing Generation Circuit and RIC/DICMOS (Formatter); Pin Electronics; DUT; CLK)
ITC INTERNATIONAL TEST CONFERENCE 0-7803-8580-2/04 $20.00 Copyright 2004 IEEE
Paper 20.2 577

The new formatter, named RIC/DICMOS [1] (Response and Drive Circuit in CMOS) and implemented in a 0.18 micron CMOS process, simultaneously provides drive or strobe stimuli for up to 8 independent DUT pins. Each complementary drive and response circuit pair is contained in what is called a "slice" within the timing generation IC. Furthermore, the eight slices can be independently software programmed to deliver formatted levels at up to 800 MTPS (Mega Transitions Per Second). Conversely, each slice can strobe the status of a DUT pin at up to 800 MTPS in both edge and window strobe modes. R4X/D4X [5], the bipolar predecessors of RIC/DICMOS, do provide 800 MTPS drive and strobe channels per discrete IC pair. RIC/DICMOS, on the other hand, provides eight such channels integrated inside one timing generator IC, while occupying one-twentieth of the PCB area of an equivalent eight R4X/D4X pairs.

Given the highly integrated, multichannel construction of RIC/DICMOS, the issue of calibrating sixty-four vernier delays (per IC) within acceptable time limits required special consideration. Additionally, since a large number of RIC/DICMOS may be deployed inside a typical test system, a calibration strategy was devised that would take advantage of the highly parallel Sapphire NP(TM) platform. Whereas R4X/D4X [5] verniers were calibrated using the 200 kHz register read/write bus, the RIC/DICMOS calibration circuit uses triggers driven off the 400 MHz system clock to achieve nearly 500 times faster system-wide (parallel) formatter calibration.

In this paper, a description of the RIC/DICMOS delay generation engines is followed by a discussion of the architecture and operation of the new calibration technique. An overview of how the calibration circuit can be deployed to measure the time-period or frequency of arbitrary off-chip signals is included at the end.
2 Physical Description and Operation
2.1 RIC/DICMOS
RIC/DICMOS (figure 2) consists of eight identical channels [1], with the following main components:
• Run-time interface blocks called Event Logic Interface (ELICIF)
• Sixty-four independent Tapped Delay Line Elements (TDLEs) or "barrels"
• Auto-cal circuit block
• Local and Global Registers
• Drive/Strobe Logic Blocks
• Four Timing Measurement Unit Multiplexers (TMUMUXs)
The calibration technique involves the ELICIF, TDLE, Auto-cal, and Register blocks.
2.1.1 ELICIF
During run-time operation at 400MHz, two 8-bit half-words are transmitted over successive clock cycles from the timing generation IC core to the ELICIF blocks. The sixty-four 8-bit (parallel) run-time data access ports to RIC/DICMOS are labelled DA0..3, DB0..3, DC0..3, DD0..3, ..., CD0..3 in figure 2. Buses DA..CD carry 8 bits of delay data, 2 bits of event type[1:0] and 2 status bits called "tag." The "tag" bits signal the end of a digital word. The four LSB delay data bits received in the second half of the timing word are ignored, as they represent timing resolution finer than the minimum delay step size that can be achieved by DICMOS. Essentially, ELICIF decodes the timing delay information (8 bits) as well as the event type (2 bits) needed to generate the desired formatted level (DHI/DINH) or an internal strobe marker: Strobe Hi, Strobe Lo, Strobe Z or Strobe Off (see table 1). In calibration mode (see section 3), ELICIF also generates the necessary trigger pulses and/or levels (see table 2) to control the built-in delay line auto-calibration circuit.

Type[1:0]   Drive/Strobe
01          Tri-state (DINH = 1)/Strobe Z
10          Drive Low (DHI = 0)/Strobe Lo
11          Drive High (DHI = 1)/Strobe Hi
00          Not used/Strobe Off
Table 1: Run-time Event Types

TDLE   TYPE[1:0]   Calibration Trigger
DA     11          STARTOSC
DB     10          GCOPOLL
DC     11          EVCNTSLCSEL
DD     00          READUPWORD
CA     11          READUPORLOWORD
CB     10          READLOWORD
CC     01          NEXTSLC
CD     00          STOPOSC
Table 2: Calibration Event Types
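As an illustrative sketch only, the two event-type tables can be written down as look-up maps. The function name `decode_event` and its arguments are hypothetical and not part of RIC/DICMOS; the bit patterns and trigger names are taken from tables 1 and 2, with the last trigger spelled STOPOSC as in section 3.

```python
# Table 1: run-time event types keyed by Type[1:0].
# Each value is (drive-side meaning, strobe-side meaning).
RUN_TIME_EVENTS = {
    0b01: ("Tri-state (DINH = 1)", "Strobe Z"),
    0b10: ("Drive Low (DHI = 0)", "Strobe Lo"),
    0b11: ("Drive High (DHI = 1)", "Strobe Hi"),
    0b00: ("Not used", "Strobe Off"),
}

# Table 2: calibration triggers keyed by (TDLE name, Type[1:0]).
CAL_TRIGGERS = {
    ("DA", 0b11): "STARTOSC",
    ("DB", 0b10): "GCOPOLL",
    ("DC", 0b11): "EVCNTSLCSEL",
    ("DD", 0b00): "READUPWORD",
    ("CA", 0b11): "READUPORLOWORD",
    ("CB", 0b10): "READLOWORD",
    ("CC", 0b01): "NEXTSLC",
    ("CD", 0b00): "STOPOSC",
}


def decode_event(tdle, type_bits, calibration_mode=False, strobe=False):
    """Hypothetical helper: map a (TDLE, Type[1:0]) pair to its meaning."""
    if calibration_mode:
        # Combinations not listed in table 2 have no trigger in this sketch.
        return CAL_TRIGGERS.get((tdle, type_bits))
    drive_meaning, strobe_meaning = RUN_TIME_EVENTS[type_bits]
    return strobe_meaning if strobe else drive_meaning
```

For example, `decode_event("DA", 0b11, calibration_mode=True)` yields the STARTOSC trigger, while the same type bits at run time select Drive High on the drive side or Strobe Hi on the response side.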
Figure 2: RIC/DICMOS Architecture (block diagram: local registers and auto-calibration block; event logic interfaces DA, DB, DC, ..., CD, each feeding a TDLE; drive/response logic, with the channel repeated 8 times; TMU MUX; global registers; outputs include FAIL[0..31], DHI[0..7] and DINH[0..7])
2.1.2 Tapped Delay Line Elements (TDLEs) or "Barrels"
Each TDLE can move edges in approximately 20 ps steps between the edges of a 385 MHz--420 MHz system clock. It generates timing markers in response to the delay and event type information relayed to it by the ELICIF. Each TDLE consists of a string of PVT (process, voltage and temperature) compensated buffers that provide coarse (~200ps) "taps" for delays, and a timing interpolator circuit that generates ~20.833ps fine time intervals. The delay value is extracted by ELICIF from the 12-bit run-time word. Initially, the run-time delay values generated in the timing generator core are used to "look up" delay codes in relinearization tables (RT), which in turn allow a TDLE to generate the correct timing step. The retrigger rate for each TDLE is approximately 4 ns.
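The table look-up described above can be pictured as a nearest-match search. The following is a minimal sketch under stated assumptions: the function names `build_table` and `lookup_code` are hypothetical, the sample delays are made up for illustration, and the actual relinearization hardware may organize its tables differently.

```python
import bisect


def build_table(measured_ps):
    """Sort hypothetical (delay code -> measured delay in ps) data by delay."""
    return sorted((delay, code) for code, delay in measured_ps.items())


def lookup_code(table, target_ps):
    """Return the delay code whose measured delay is closest to target_ps."""
    delays = [delay for delay, _ in table]
    i = bisect.bisect_left(delays, target_ps)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = table[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda entry: abs(entry[0] - target_ps))[1]


# Made-up measurements for four delay codes (not characterization data).
sample = {0: 0.0, 1: 19.0, 2: 43.0, 3: 61.5}
table = build_table(sample)
code_for_one_step = lookup_code(table, 20.833)
```

Sorting by measured delay rather than by code means the search still works if the raw delay-versus-code curve is not monotonic.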
3 Auto-calibration
Programmable delays are established by connecting eight TDLEs in a loop, for a total of eight loops per RIC/DICMOS (64 TDLEs). In order to initiate the calibration process, the user loads a run-time test program which includes the event type codes (section 2.1.1) necessary for the generation of the appropriate ELICIF calibration triggers. The loops are triggered (see figure 3) when the ring oscillator bit (RINGO) inside a local register is asserted and an event to start oscillation (STARTOSC) is received from the ELICIF (table 2). The ring oscillator output is directly connected to an on-chip gate/event counter circuit for highly accurate time measurements (see architecture below). Actual TDLE delay is estimated to range from about 0.9ns to about 1.4ns under all program conditions. So the loop delay is estimated to be from about 7.2ns to about 9.7ns (assuming all TDLEs except the one being calibrated are set to their fastest position). The resolution of the time measurement is determined by the programmable width of the gate counter. Registers GATECOUNT and EVENTCOUNT are used to set the counters to a predefined count value prior to commencing calibration. Upon reaching the desired gate count, the event counter shuts off and the time count value is available to be read off-chip into the capture memory. The user can poll the overflow status of the GATECOUNT register at any time during the calibration process by issuing the trigger GCOPOLL. For diagnostic purposes, the current gate and event counts can always be obtained by reading back the contents of registers GATECOUNT and EVENTCOUNT, respectively. Once the gate counter has overflowed, the contents of the EVENTCOUNT register belonging to each ring oscillator loop are read in turn, one loop at a time. The loop-specific ELICIF trigger EVCNTSLCSEL latches the contents of the corresponding EVENTCOUNT register that need to be transferred onto the read-back bus. Since the read-back bus is only 8 bits wide, the EVENTCOUNT register is read off chip in two 8-bit words. The ELICIF-generated control signals READUPWORD, READUPORLOWORD and READLOWORD help manage this transaction. Before an EVENTCOUNT register belonging to another loop can be read back, all previously issued triggers need to be reset. A reset pulse, NEXTSLC, accomplishes this task. Finally, the trigger STOPOSC can be used as a "kill switch" whenever the calibration routine needs to be interrupted. STOPOSC also resets the GATECOUNT and EVENTCOUNT registers.

In calibration mode, bit settings for the delay line elements are auto-generated by a state machine which adds the contents of the register TDLEOFFSET to that of TDLEINIT (figure 4). Both of these registers are loaded during chip setup, prior to calibration. For each of the TDLEs, the state machine advances the bit settings by the offset amount until the last bit combination is reached. This process is repeated sequentially for all of the delay lines inside the loop. In the case of RIC/DICMOS, once the "raw" event-count time values are linearized by software, the delay codes corresponding to 256 out of the 1024 linearized time values (spanning 0--2.5ns) are transferred back into the relinearization tables via the CPU bus using block writes.

To estimate how long it would take to acquire data for calibration, assume that each TDLE has a total of 10 selection bits and produces binary delays for each of the bit settings. If the minimum and maximum propagation delays are 0.9ns and 1.4ns, respectively (obtained via SPICE simulation over all relevant PVT conditions), then the resolution of the delay line is approximately r = 0.4883 ps/bit. Now, each TDLE except the one being calibrated is set to its minimum delay setting (900ps). Then the time required to acquire the delays for all 1024 (2^10) values using a 14-bit gate counter is given by

$$T_{acquire} = \sum_{k=0}^{2^{10}-1} 2^{14} \left( 7 \cdot (900) + (900 + 0.4883\,k) \right)\ \mathrm{ps} = 0.097\ \mathrm{s}$$
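The structure of this estimate can be checked numerically. The sketch below is an assumption-laden illustration, not the on-chip state machine: `code_sweep` is a hypothetical stand-in for the auto-address generator (it assumes the sweep simply stops when the 10-bit code would overflow), and `acquire_time_s` evaluates the sum exactly as written, using the 0.9ns and 1.4ns delay limits from the text.

```python
T_MIN_PS = 900.0      # minimum TDLE propagation delay (from the text)
T_MAX_PS = 1400.0     # maximum TDLE propagation delay (from the text)
SELECT_BITS = 10      # selection bits per TDLE
GATE_BITS = 14        # gate counter width
TDLES_PER_LOOP = 8

# Delay-line resolution r in ps per bit: (1400 - 900) / 2^10.
resolution_ps = (T_MAX_PS - T_MIN_PS) / 2 ** SELECT_BITS


def code_sweep(tdle_init, tdle_offset, bits=SELECT_BITS):
    """Hypothetical model: step from TDLEINIT by TDLEOFFSET until overflow."""
    code = tdle_init
    while code < 2 ** bits:
        yield code
        code += tdle_offset


def loop_delay_ps(k):
    """Loop delay with seven TDLEs at minimum and one at bit setting k."""
    return (TDLES_PER_LOOP - 1) * T_MIN_PS + (T_MIN_PS + resolution_ps * k)


def acquire_time_s(tdle_init=0, tdle_offset=1):
    """Sum 2^14 loop periods over every bit setting visited by the sweep."""
    gate = 2 ** GATE_BITS
    total_ps = sum(gate * loop_delay_ps(k)
                   for k in code_sweep(tdle_init, tdle_offset))
    return total_ps * 1e-12
```

Evaluated literally with these inputs, the sketch gives roughly 0.125 s per TDLE, the same order of magnitude as the 0.097 s quoted in the text but not identical to it, so it is best read as a rough cross-check of the formula's structure. A larger `tdle_offset` visits fewer bit settings and shortens the acquisition proportionally.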
(Block diagram; labels include External Input, Capture Memory (Off-chip), RINGO, STARTOSC, 400MHz CCLK, Glitch Generator, 16-bit Event Counter, Event Count Register, Gate Counter, Gate Count Register, GCO, GCOStatus, ReadOk, UpWord/LowWord, and TDLE0..TDLE7 with 10-bit bit settings BS0..BS7.)

Counter Configuration
CC1   CC0   Action
0     0     Time count for external input
1     1     None
1     0     Time count for ring oscillator
0     1     Frequency count for external input

Figure 3: Time/Frequency Counter Circuit (x8 per RIC/DICMOS)
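The two-word read-back of the 16-bit EVENTCOUNT register over the 8-bit bus, described earlier in this section, can be illustrated with a short sketch. The function names are hypothetical, and the assumption that the "upper" word carries the eight most significant bits is an inference from the signal names READUPWORD and READLOWORD rather than something stated in the text.

```python
def split_words(event_count):
    """Split a 16-bit count into (upper, lower) 8-bit words."""
    if not 0 <= event_count <= 0xFFFF:
        raise ValueError("event count must fit in 16 bits")
    return (event_count >> 8) & 0xFF, event_count & 0xFF


def combine_words(up_word, low_word):
    """Reassemble the 16-bit count from two 8-bit bus reads."""
    return ((up_word & 0xFF) << 8) | (low_word & 0xFF)
```

Software on the capture-memory side would apply something like `combine_words` once per ring oscillator loop, after both words of that loop have been read.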
Since there are 8 TDLEs in a loop, the total time required for acquiring data is simply 8 x 0.097s = 0.776s. Furthermore, since there are eight such independent loops per RIC/DICMOS, it will still take 0.776s to acquire data for all of the TDLEs when they are calibrated in parallel. Indeed, it would take the same 0.776s to acquire TDLE data for the entire system when all timing generation ICs in the system are calibrated in parallel. This scheme improves the overall calibration data acquisition time by a factor of 500. It is important to note that this estimate for acquisition time doesn't take into account data transfer off chip, or transfer of relinearized data from the CPU to the relinearization tables. Also, with a 14-bit gate counter, a resolution of 0.61ps/count (or, equivalently, 1.64 counts/ps) can be achieved.

The delay time counts stored in the capture memory are sorted and linearized between 0 and 2.5ns by the software. Prior to run-time operation, for each of the TDLEs, the linearized delay values along with their corresponding bit settings are uploaded onto relinearization tables inside the timing generation IC. During run-time operation, user-generated delay values (with a resolution of 20ps) are compared with the entries in the look-up tables to pick out the delay line bit settings which would most closely yield the programmed delay value.

Figure 4: Auto-Address Generation for RIC/DICMOS during Calibration (One per Ring Oscillator Loop) (block diagram: 12-bit auto-address generator with address bits A0..A11, loaded from TDLEINIT and TDLEOFFSET and reset via RINGO/GCO; 10-bit overflow detect; 4-bit TDLE counter; Int_Ext select between Max Code Int Fmtr and Max Code Ext Fmtr; decoder with outputs L0..L8; bit-setting outputs BSn, 1 per delay line, with the 2 MSBs ignored for RIC/DICMOS; XCDn clocked by RINGO || GCO, with n = 0, 1, 2,...,8 (External) and n = 0, 1, 2,...,7 (Internal))

3.1 Generalized Use of Gate and Frequency Counters
Apart from measuring the loop delay during the calibration of TDLEs, the gate/frequency counter arrangement can also be used to provide either a frequency or a time count for an arbitrary input signal. In diagnostic/calibration mode, register CNTRCONFIG is used to select either the external signal or the ring oscillator output for measurement purposes. The table in figure 3 shows the usage of the gate/event counter for various bit settings of the register CNTRCONFIG. For a RIC/DICMOS with 8 independent ACH/BCL input pairs (which can alternatively be used as pads for accepting signals from external sources during calibration), up to 8 independent external signals can be routed to each of the eight gate/frequency counter circuits (4 for drive side and 4 for compare side) inside RIC/DICMOS. Depending upon the CNTRCONFIG bit settings, either a frequency or a time count value can be transferred to the off-chip capture memory or stored in register EVENTCOUNT for read back.
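For reference, the counter configuration table in figure 3 can be expressed as a small look-up. This is a sketch only: the function names are hypothetical, and the exact position of the CC1 and CC0 bits within CNTRCONFIG is not given in the text, so they are passed here as separate arguments.

```python
# Counter configuration from the table in figure 3, keyed by (CC1, CC0).
COUNTER_ACTIONS = {
    (0, 0): "Time count for external input",
    (1, 1): "None",
    (1, 0): "Time count for ring oscillator",
    (0, 1): "Frequency count for external input",
}


def counter_action(cc1, cc0):
    """Return the counter action selected by the two configuration bits."""
    return COUNTER_ACTIONS[(cc1, cc0)]


def measures_external_input(cc1, cc0):
    """True when the selected action measures the external (off-chip) signal."""
    return "external input" in counter_action(cc1, cc0)
```

With this mapping, TDLE calibration corresponds to `(1, 0)`, while the generalized time and frequency measurements of off-chip signals correspond to `(0, 0)` and `(0, 1)`.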
4 Future improvements
The upcoming version of RIC/DICMOS would include local temperature and voltage drift sensors, embedded in close proximity to the TDLEs. The goal is to correct for timing errors in the critical path due to slow temperature and voltage variations as the chip activity varies over time. The corresponding digitized correction terms would be added to (or subtracted from) the run-time delay values generated by the relinearization tables.

5 Characterization Results
RIC/DICMOS characterization results (figures 5 and 6) show a DHI EPA (Edge Placement Accuracy) of +/-45ps with respect to changes in voltage and temperature (+/-35ps for a +/-2% voltage variation, and +/-10ps for a +/-2C change in temperature). A minimum DHI pulse width of approximately 700ps was also achieved (figure 7). The typical jitter (figure 8) about a drive edge was measured at +/-10ps. Calibration data shown in figure 9 indicates a delay step linearity error of +/-25ps.
Figure 5: Change in DHI propagation delay w.r.t. +/-2% Vdd change
Figure 6: Change in DHI propagation delay w.r.t. 30C change in temperature
Figure 7: DHI Minimum pulse width
Figure 8: DHI jitter
Figure 9: Delay element linearity (plot of Error (ps), from -25ps to 25ps, versus Delay Steps, from 0 to 120)

6 Applications and benefits of high fanout formatter
Many chipsets, such as Advanced Micro Devices' AMD760 [6,7], require that different pins be tested at different frequencies (see figure 10). The North bridge interfaces, which include the system bus and the PCI bus, require tests to be run at 266 MHz, and the South bridge interface has drive and strobe requirements at 33 MHz. Thus, for testing 168 high-speed (266 MHz) and 56 low-speed (33 MHz) pins, just 28 RIC/DICMOS pairs are required. Another typical example is NVIDIA's NV2A graphics processor used in the Microsoft Xbox Video Game System. This is a logic device with 418 signal pins. Functional test vectors need to be run at 245 MHz, with some very large scan patterns at 5 MHz as well. To accommodate the testing requirements of a device with a moderate pin count toggling at moderate frequencies, RIC/DICMOS provides unparalleled integration, with an associated reduction in test hardware cost, at an edge placement accuracy of below +/- 81ps.
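The pin-count arithmetic behind the AMD760 example can be reproduced with a short sketch. The interface list follows figure 10; the helper names are hypothetical, and grouping the USB, EIDE and ISA buses with the low-speed pins is an assumption made here so that the totals match the 168 and 56 quoted above.

```python
import math

CHANNELS_PER_IC = 8  # eight independent channels per RIC/DICMOS

# (interface, speed class, pin count) as listed in figure 10.
AMD760_INTERFACES = [
    ("North bridge: System Bus", "high", 64),
    ("North bridge: DDR Bus", "high", 72),
    ("North bridge: AGP Bus", "high", 32),
    ("North bridge: PCI Bus", "low", 16),
    ("South bridge: PCI Bus", "low", 16),
    ("South bridge: USB Bus", "low", 8),
    ("South bridge: EIDE Bus", "low", 8),
    ("South bridge: ISA Bus", "low", 8),
]


def pins_by_speed(interfaces):
    """Total the pin counts for each speed class."""
    totals = {}
    for _name, speed, pins in interfaces:
        totals[speed] = totals.get(speed, 0) + pins
    return totals


def formatters_needed(total_pins, channels_per_ic=CHANNELS_PER_IC):
    """Number of eight-channel formatters needed to cover total_pins."""
    return math.ceil(total_pins / channels_per_ic)


totals = pins_by_speed(AMD760_INTERFACES)
ics = formatters_needed(sum(totals.values()))
```

Here 168 + 56 = 224 pins divide evenly into 28 groups of eight channels; a pin count that is not a multiple of eight would round up to the next whole formatter.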
Figure 10: AMD 760 Chipset (block diagram: CPU, System Bus, Host Bridge, AGP Graphics Card, Memory SDRAM, PCI Bus, Peripheral Controller, EIDE, I/O Ports, ISA Bus)
North bridge interfaces
-- 266 MHz System Bus [64 pins]
-- 266 MHz DDR Bus [72 pins]
-- 266 MHz AGP Bus [32 pins]
-- 33 MHz PCI Bus [16 pins]
South bridge interfaces
-- 33 MHz PCI Bus [16 pins]
-- USB Bus [8 pins]
-- EIDE Bus [8 pins]
-- ISA Bus [8 pins]

7 Conclusion
RIC/DICMOS is an 800 Mbps drive and response circuit implemented in 0.18 micron CMOS, contained within Credence's new timing generator IC. Its eight independent drive and strobe channels provide high fanout on the testhead PCB, yielding dense hardware integration and overall cost savings, without compromising the +/-150 ps system EPA required of low to moderate cost test systems. The RIC/DICMOS auto-calibration scheme is an innovation which allows the linearity of the new formatter's delay elements to be determined at least 500 times faster than its predecessors'.

8 References
[1] Ahmed R. Syed, "RIC/DICMOS-- Multichannel CMOS Formatter," ITC 2003, pages 175-184.
[2] James A. Gasbarro and Mark A. Horowitz, "A Single-Chip, Functional Tester for VLSI Circuits," ISSCC February 1990, pages 84-85.
[3] M. Barber, "Fundamental timing problems in testing MOS VLSI on modern ATE," IEEE Design & Test, August 1984, pages 482-489.
[4] James A. Gasbarro and Mark A. Horowitz, "Integrated Pin Electronics for VLSI Functional Testers," IEEE J. Solid State Circuits, vol. 24, no. 2, pages 331-337.
[5] Ahmed R. Syed, "R4X/D4X-- Formatters for Flexible Test System Architecture," ITC 2002, pages 885-893.
[6] Scott Wasson and Andrew Brown, "AMD's 760 chipset with DDR SDRAM Rambuster," The Tech Report, 2000Q4.
[7] A.T. Sivaram, "Split Timing Mode (STM)- Answer to Dual Frequency Domain Testing," ITC 2001, pages 140-147.