Automatic Delay Calibration Method for Multi-channel CMOS Formatter

Ahmed Rashid Syed, Ph.D.
Staff Design Engineer
Credence Systems Corporation
150 Baytech Dr.
San Jose, CA 95134

Abstract

This paper describes the technique used for automatically calibrating vernier delay steps in the Credence CMOS formatter, RIC/DICMOS. Embedded within the timing generation IC, RIC/DICMOS provides formatted levels and internal strobe markers for eight independent pin-electronics channels at up to 800 Mbps with +/-81ps accuracy. Using the on-chip, run-time auto-calibration circuit, all eight RIC/DICMOS vernier channels can be calibrated in parallel, nearly 500 times faster than the prior-generation formatters. Furthermore, the same calibration circuit can also provide 16-bit time-period or frequency counts for up to eight independent off-chip signals.

1 Introduction

To address the ever-rising cost-per-pin of testing complex devices with hundreds of pins, such as microprocessors and complex SOCs, a highly integrated architecture for "formatters" was introduced at ITC 2003 [1]. Formatters [2], in the context of ATE hardware, are a set of complementary Drive and Response circuits (figure 1). Briefly, the Drive side generates the accurate, timing-critical formatted levels required by the pin-electronics to drive a DUT pin to a predefined logic state: high, low or tri-state. The Response side, on the other hand, strobes the status of a DUT pin output via the comparator block of the pin-electronics [3,4].

DUTs with high pin counts that need to be tested at speed demand a highly integrated ATE architecture to keep the overall cost of test low. To achieve this goal, the new architecture includes eight independent embedded formatter channels inside the timing generator IC.
In addition to avoiding the use of discrete formatter ICs (which incur substantial penalties in design and development time and cost), the savings in PCB real estate from deploying a combined timing generator-formatter IC have allowed more functionality and higher fanout to be incorporated inside the testhead at a lower overall cost. Such an architecture is particularly suitable for functional and structural testing of high pin count SOC devices with I/O data rates of 800 Mbps (megabits per second) or below, and with an EPA (edge placement accuracy) testing requirement looser than +/-100ps.

Figure 1: ATE Timing Critical Edges Generation (blocks shown: Timing Generation IC, CPU, Local Mem, Timing Generation Circuit, RIC/DICMOS (Formatter), Pin Electronics, DUT, CLK)

ITC INTERNATIONAL TEST CONFERENCE 0-7803-8580-2/04 $20.00 Copyright 2004 IEEE, Paper 20.2, page 577

The new formatter, named RIC/DICMOS [1] (Response and Drive Circuit in CMOS) and implemented in a 0.18 micron CMOS process, provides drive or strobe stimuli for up to 8 independent DUT pins simultaneously. Each complementary drive and response circuit pair is contained in what is called a "slice" within the timing generation IC. The eight slices can be independently software programmed to deliver formatted levels at up to 800 MTPS (Mega Transitions Per Second). Conversely, each slice can strobe the status of a DUT pin at up to 800 MTPS in both edge and window strobe modes.

R4X/D4X [5], the bipolar predecessors of RIC/DICMOS, do provide 800 MTPS drive and strobe channels per discrete IC pair. RIC/DICMOS, on the other hand, provides eight such channels integrated inside one timing generator IC, while occupying one-twentieth of the PCB area of an equivalent eight R4X/D4X pairs. Given the highly integrated, multichannel construction of RIC/DICMOS, calibrating sixty-four vernier delays (per IC) within acceptable time limits required special consideration.
Additionally, since a large number of RIC/DICMOS may be deployed inside a typical test system, a calibration strategy was devised that takes advantage of the highly parallel Sapphire NP™ platform. Whereas R4X/D4X [5] verniers were calibrated using the 200 kHz register read/write bus, the RIC/DICMOS calibration circuit uses triggers driven off the 400 MHz system clock to achieve nearly 500 times faster system-wide (parallel) formatter calibration.

In this paper, a description of the RIC/DICMOS delay generation engines is followed by a discussion of the architecture and operation of the new calibration technique. An overview of how the calibration circuit can be used to measure the time-period or frequency of arbitrary off-chip signals is included at the end.

2 Physical Description and Operation

2.1 RIC/DICMOS

RIC/DICMOS (figure 2) consists of eight identical channels [1], with the following main components:

• Run-time interface blocks called Event Logic Interface (ELICIF)
• Sixty-four independent Tapped Delay Elements (TDLEs) or "barrels"
• Auto-cal circuit block
• Local and Global Registers
• Drive/Strobe Logic Blocks
• Four Timing Measurement Unit Multiplexers (TMUMUXs)

The calibration technique involves the ELICIF, TDLE, Auto-cal and Register blocks.

2.1.1 ELICIF

During run-time operation at 400MHz, two 8-bit half-words are transmitted over successive clock cycles from the timing generation IC core to the ELICIF blocks. The sixty-four 8-bit (parallel) run-time data access ports to RIC/DICMOS are labelled DA0..3, DB0..3, DC0..3, DD0..3, ..., CD0..3 in figure 2. Buses DA..CD carry 8 bits of delay data, 2 bits of event type[1:0] and 2 status bits called "tag." The "tag" bits signal the end of a digital word. The four LSB delay data bits received in the second half of the timing word are ignored, as they represent timing resolution finer than the minimum delay step size that can be achieved by DICMOS.
Essentially, ELICIF decodes the timing delay information (8 bits) as well as the event type (2 bits) needed to generate the desired formatted level (DHI/DINH) or an internal strobe marker: Strobe Hi, Strobe Lo, Strobe Z or Strobe Off (see table 1). In calibration mode (see section 3), ELICIF also generates the trigger pulses and/or levels (see table 2) needed to control the built-in delay line auto-calibration circuit.

Type[1:0]   Drive / Strobe
01          Tri-state (DINH = 1) / Strobe Z
10          Drive Low (DHI = 0) / Strobe Lo
11          Drive High (DHI = 1) / Strobe Hi
00          Not used / Strobe Off

Table 1: Run-time Event Types

TDLE   TYPE[1:0]   Calibration Trigger
DA     11          STARTOSC
DB     10          GCOPOLL
DC     11          EVCNTSLCSEL
DD     00          READUPWORD
CA     11          READUPORLOWORD
CB     10          READLOWORD
CC     01          NEXTSLC
CD     00          STOPOSC

Table 2: Calibration Event Types

Figure 2: RIC/DICMOS Architecture (blocks shown: Event Logic IF DA..CD, TDLEs, Drive/Response Logic, Local Registers + Auto-calibration, Global Registers, TMU MUX; channel repeated 8 times)

2.1.2 Tapped Delay Line Elements (TDLEs) or "Barrels"

Each TDLE can move edges in approximately 20 ps steps between the edges of a 385 MHz--420 MHz system clock. It generates timing markers in response to the delay and event type information relayed to it by the ELICIF.
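As an illustration of how the encodings in tables 1 and 2 might be represented in test software, the following minimal sketch stores them as lookup tables. The dictionary and function names are hypothetical and are not part of RIC/DICMOS or its tooling; only the bit patterns and trigger names are taken from the tables.

```python
# Hypothetical lookup tables mirroring tables 1 and 2 (names are illustrative).

# Table 1: run-time event type[1:0] -> (drive action, strobe action)
RUN_TIME_EVENTS = {
    0b01: ("Tri-state (DINH = 1)", "Strobe Z"),
    0b10: ("Drive Low (DHI = 0)", "Strobe Lo"),
    0b11: ("Drive High (DHI = 1)", "Strobe Hi"),
    0b00: ("Not used", "Strobe Off"),
}

# Table 2: (TDLE bus, type[1:0]) -> calibration trigger
CALIBRATION_TRIGGERS = {
    ("DA", 0b11): "STARTOSC",
    ("DB", 0b10): "GCOPOLL",
    ("DC", 0b11): "EVCNTSLCSEL",
    ("DD", 0b00): "READUPWORD",
    ("CA", 0b11): "READUPORLOWORD",
    ("CB", 0b10): "READLOWORD",
    ("CC", 0b01): "NEXTSLC",
    ("CD", 0b00): "STOPOSC",  # spelled STOPOSC in the body text
}


def decode_run_time_event(type_bits, strobe=False):
    """Return the drive action (or the strobe action) for a 2-bit event type."""
    drive_action, strobe_action = RUN_TIME_EVENTS[type_bits & 0b11]
    return strobe_action if strobe else drive_action


def decode_calibration_trigger(bus, type_bits):
    """Return the calibration trigger name, or None if the pair is not listed."""
    return CALIBRATION_TRIGGERS.get((bus, type_bits & 0b11))


print(decode_run_time_event(0b10, strobe=True))
print(decode_calibration_trigger("CC", 0b01))
```

Keying table 2 on the (bus, type) pair reflects the fact that the same two type bits select different triggers depending on which TDLE bus carries them.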
Each TDLE consists of a string of PVT-compensated (process, voltage and temperature) buffers that provide coarse (~200ps) "taps" for delays, and a timing interpolator circuit that generates ~20.833ps fine time intervals. The delay value is extracted by ELICIF from the 12-bit run-time word. Initially, the run-time delay values generated in the timing generator core are used to "look up" delay codes in relinearization tables (RT), which in turn allow a TDLE to generate the correct timing step. The retrigger rate for each TDLE is approximately 4 ns.

3 Auto-calibration

Programmable delays are established by connecting eight TDLEs in a loop, for a total of eight loops per RIC/DICMOS (64 TDLEs). To initiate the calibration process, the user loads a run-time test program that includes the event type codes (section 2.1.1) needed to generate the appropriate ELICIF calibration triggers. The loops are triggered (see figure 3) when the ring oscillator bit (RINGO) inside a local register is asserted and an event to start oscillation (STARTOSC) is received from the ELICIF (table 2). The ring oscillator output is directly connected to an on-chip gate/event counter circuit for highly accurate time measurements (see architecture below).

The actual TDLE delay is estimated to range from about 0.9ns to about 1.4ns under all program conditions. So the loop delay is estimated to be from about 7.2ns to about 9.7ns (assuming all TDLEs except the one being calibrated are set to their fastest position). The resolution of the time measurement is determined by the programmable width of the gate counter. Registers GATECOUNT and EVENTCOUNT are used to set the counters to a predefined count value before calibration begins. Upon reaching the desired gate count, the event counter shuts off and the time count value is available to be read off-chip into the capture memory.
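To make the gate/event counter measurement concrete, the sketch below converts a captured event count into an average loop period. It assumes, as one plausible reading of the description, that the gate counter counts ring-oscillator cycles while the 16-bit event counter counts reference-clock ticks over the same window. The function name, the 2.5 ns reference period (400 MHz) and the sample count are illustrative assumptions, not values taken from the hardware.

```python
EVENT_COUNTER_BITS = 16  # the event counter is described as 16 bits wide


def loop_period_ps(event_count, gate_cycles, ref_period_ps=2500.0):
    """Average ring-oscillator period in picoseconds.

    Assumption: `gate_cycles` oscillator cycles elapsed while the event
    counter accumulated `event_count` ticks of a reference clock whose
    period is `ref_period_ps`.
    """
    if gate_cycles <= 0:
        raise ValueError("gate_cycles must be positive")
    if not 0 <= event_count < 2 ** EVENT_COUNTER_BITS:
        raise ValueError("event_count does not fit in the event counter")
    return event_count * ref_period_ps / gate_cycles


# Hypothetical capture: 49152 reference ticks over a 2**14-cycle gate.
print(loop_period_ps(event_count=49152, gate_cycles=2 ** 14))  # 7500.0
```

Averaging over many oscillator cycles is what lets a counter clocked at a few hundred megahertz resolve delay differences far smaller than one reference clock period.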
The user can poll the overflow status of the GATECOUNT register at any time during the calibration process by issuing the trigger GCOPOLL. For diagnostic purposes, the current gate and event counts can always be obtained by reading back the contents of registers GATECOUNT and EVENTCOUNT, respectively. Once the gate counter has overflowed, the contents of the EVENTCOUNT register belonging to each ring oscillator loop are read in turn, one loop at a time. The loop-specific ELICIF trigger EVCNTSLCSEL latches the contents of the corresponding EVENTCOUNT register that need to be transferred onto the read-back bus. Since the read-back bus is only 8 bits wide, the EVENTCOUNT register is read off chip in two 8-bit words. The ELICIF-generated control signals READUPWORD, READUPORLOWORD and READLOWORD help manage this transaction. Before an EVENTCOUNT register belonging to another loop can be read back, all previously issued triggers need to be reset. A reset pulse, NEXTSLC, accomplishes this task. Finally, the trigger STOPOSC can be used as a "kill switch" whenever the calibration routine needs to be interrupted. STOPOSC also resets the GATECOUNT and EVENTCOUNT registers.

Figure 3: Time/Frequency Counter Circuit (x8 per RIC/DICMOS) (blocks shown: ring oscillator loop of TDLE0..TDLE7 with bit settings BS0..BS7, external input, glitch generator, gate counter and Gate Count Register, 16-bit event counter and Event Count Register, upper/lower word select, off-chip capture memory, 400MHz CCLK)

Counter configuration (from figure 3):

CC1   CC0   Action
0     0     Time count for external input
0     1     Frequency count for external input
1     0     Time count for ring oscillator
1     1     None

In calibration mode, bit settings for the delay line elements are generated automatically by a state machine that adds the contents of the register TDLEOFFSET to that of TDLEINIT (figure 4). Both of these registers are loaded during chip setup, prior to calibration. For each of the TDLEs, the state machine advances the bit settings by the offset amount until the last bit combination is reached. This process is repeated sequentially for all of the delay lines inside the loop. In the case of RIC/DICMOS, once the "raw" event-count time values are linearized by software, the delay codes corresponding to 256 of the 1024 linearized time values (spanning 0--2.5ns) are transferred back into the relinearization tables via the CPU bus using block writes.

To estimate how long it would take to acquire data for calibration, assume that each TDLE has a total of 10 selection bits and produces binary delays for each of the bit settings. If the minimum and maximum propagation delays are 0.9ns and 1.4ns, respectively (obtained via SPICE simulation over all relevant PVT conditions), then the resolution of the delay line is approximately r = (1400ps - 900ps)/1024 = 0.4883 ps/bit. Now, each TDLE except the one being calibrated is set to its minimum delay setting (900ps). Then, the time required to acquire the delays for all 1024 (2^10) values using a 14-bit gate counter is given by

$$T_{acquire} = \sum_{k=0}^{2^{10}-1} 2^{14} \left( 7 \cdot (900) + (900 + 0.4883\,k) \right) \text{ps} = 0.097\,\text{s}$$

Since there are 8 TDLEs in a loop, the total time required for acquiring data is simply 8 x 0.097s = 0.776s. Furthermore, since the eight independent loops per RIC/DICMOS are calibrated in parallel, it still takes 0.776s to acquire data for all of the TDLEs on the chip. Indeed, it would take the same 0.776s to acquire TDLE data for the entire system, when all timing generation ICs in the system are calibrated in parallel. This scheme improves the overall calibration data acquisition time by a factor of 500.
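The arithmetic behind these estimates can be checked with a short sketch. It takes the 0.9ns and 1.4ns delay bounds, the 10 selection bits and the 0.097s per-TDLE figure from the text as given; the constant and function names are illustrative only.

```python
T_MIN_PS = 900.0        # minimum TDLE propagation delay (from the text)
T_MAX_PS = 1400.0       # maximum TDLE propagation delay (from the text)
SELECT_BITS = 10        # selection bits per TDLE
TDLES_PER_LOOP = 8      # TDLEs connected in one ring oscillator loop
T_PER_TDLE_S = 0.097    # per-TDLE acquisition time quoted in the text


def resolution_ps_per_bit():
    """Delay change per selection code, assuming evenly spaced codes."""
    return (T_MAX_PS - T_MIN_PS) / 2 ** SELECT_BITS


def loop_delay_ps(code):
    """Loop delay with seven TDLEs at minimum and one TDLE set to `code`."""
    under_test = T_MIN_PS + resolution_ps_per_bit() * code
    return (TDLES_PER_LOOP - 1) * T_MIN_PS + under_test


def acquisition_time_s():
    """TDLEs within a loop are calibrated one after another, so their
    times add; loops and chips run in parallel and add no extra time."""
    return TDLES_PER_LOOP * T_PER_TDLE_S


print(resolution_ps_per_bit())         # 0.48828125
print(loop_delay_ps(0))                # 7200.0
print(round(acquisition_time_s(), 3))  # 0.776
```

The key point is that only the eight TDLEs within one loop are serialized, which is why the system-wide estimate equals the per-loop estimate.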
It is important to note that this estimate of the acquisition time does not take into account the transfer of data off chip, or the transfer of relinearized data from the CPU to the relinearization tables. Also, with a 14-bit gate counter, a resolution of 0.61ps/count (or, equivalently, 1.64 counts/ps) can be achieved.

The delay time counts stored in the capture memory are sorted and linearized between 0 and 2.5ns by the software. Prior to run-time operation, for each of the TDLEs, the linearized delay values along with their corresponding bit settings are uploaded into the relinearization tables inside the timing generation IC. During run-time operation, user-generated delay values (with a resolution of 20ps) are compared with the entries in the look-up tables to pick out the delay line bit settings which would most closely yield the programmed delay value.

Figure 4: Auto-Address Generation for RIC/DICMOS during Calibration (One per Ring Oscillator Loop) (12-bit auto-address generator loaded from TDLEINIT and TDLEOFFSET, with a 4-bit TDLE counter and 10-bit overflow detect; the 2 MSBs are ignored for RIC/DICMOS)

3.1 Generalized Use of Gate and Frequency Counters

Apart from measuring the loop delay during the calibration of TDLEs, the gate/frequency counter arrangement can also be used to provide either a frequency or a time count for an arbitrary input signal. In diagnostic/calibration mode, register CNTRCONFIG is used to select either the external signal or the ring oscillator output for measurement. The table in figure 3 shows the usage of the gate/event counter for the various bit settings of the register CNTRCONFIG.
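The counter configuration table in figure 3 maps the two configuration bits CC1 and CC0 to a measurement mode. A minimal sketch of that mapping follows. The enum and function names are hypothetical, and since the text does not specify how CC1 and CC0 are packed inside CNTRCONFIG, they are passed here as separate arguments.

```python
from enum import Enum


class CounterMode(Enum):
    """Measurement modes listed in the figure 3 configuration table."""
    TIME_EXTERNAL = "Time count for external input"
    FREQ_EXTERNAL = "Frequency count for external input"
    TIME_RING_OSC = "Time count for ring oscillator"
    NONE = "None"


# (CC1, CC0) -> mode, as listed in figure 3
_MODES = {
    (0, 0): CounterMode.TIME_EXTERNAL,
    (0, 1): CounterMode.FREQ_EXTERNAL,
    (1, 0): CounterMode.TIME_RING_OSC,
    (1, 1): CounterMode.NONE,
}


def counter_mode(cc1, cc0):
    """Look up the counter mode for one pair of configuration bits."""
    return _MODES[(cc1 & 1, cc0 & 1)]


print(counter_mode(1, 0).value)  # Time count for ring oscillator
```

During TDLE calibration the loop would be measured with CC1 = 1 and CC0 = 0; the two external-input modes correspond to the generalized use described in this section.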
For a RIC/DICMOS with 8 independent ACH/BCL input pairs (which can alternatively be used as pads for accepting signals from external sources during calibration), up to 8 independent external signals can be routed to the eight gate/frequency counter circuits (4 for the drive side and 4 for the compare side) inside RIC/DICMOS. Depending upon the CNTRCONFIG bit settings, either a frequency or a time count value can be transferred to the off-chip capture memory or stored in register EVENTCOUNT for read back.

4 Future improvements

The upcoming version of RIC/DICMOS would include local temperature and voltage drift sensors, embedded in close proximity to the TDLEs. The goal is to correct for timing errors in the critical path caused by slow temperature and voltage variations as the chip activity varies over time. The corresponding digitized correction terms would be added to (or subtracted from) the run-time delay values generated by the relinearization tables.

5 Characterization Results

RIC/DICMOS characterization results (figures 5 and 6) show a DHI EPA (Edge Placement Accuracy) of +/-45ps with respect to changes in voltage and temperature (+/-35ps for a +/-2% voltage variation, and +/-10ps for a +/-2C change in temperature). A minimum DHI pulse width of approximately 700ps was also achieved (figure 7). The typical jitter about a drive edge (figure 8) was measured at +/-10ps. The calibration data shown in figure 9 indicates a delay step linearity error of +/-25ps.

Figure 5: Change in DHI propagation delay w.r.t. +/-2% Vdd change

Figure 6: Change in DHI propagation delay w.r.t. 30C change in temperature

Figure 7: DHI Minimum pulse width

Figure 8: DHI jitter

Figure 9: Delay element linearity (y-axis: Error (ps), -25ps to 25ps; x-axis: Delay Steps, 0 to 120)

6 Applications and benefits of high fanout formatter

Many chipsets, such as Advanced Micro Devices' AMD 760 [6,7], require that different pins be tested at different frequencies (see figure 10). The North bridge interfaces, which include the system bus, the DDR bus and the AGP bus, require tests to be run at 266 MHz, while the PCI bus and the South bridge interfaces have drive and strobe requirements at 33 MHz. Thus, for testing 168 high-speed (266 MHz) and 56 low-speed (33 MHz) pins, just 28 RIC/DICMOS pairs are required.

Another typical example is nVidia's NV2A graphics processor, used in the Microsoft Xbox Video Game System. This is a logic device with 418 signal pins. Functional test vectors need to be run at 245 MHz, along with some very large scan patterns at 5 MHz. For devices such as these, with moderate pin counts toggling at moderate frequencies, RIC/DICMOS provides unparalleled integration, with an associated reduction in test hardware cost, at an edge placement accuracy of better than +/-81ps.

Figure 10: AMD 760 Chipset (blocks shown: CPU, System Bus, Host Bridge, Graphics Card, AGP, Memory SDRAM, PCI Bus, Peripheral Controller, EIDE, ISA Bus, I/O Ports)

North bridge interfaces
-- 266 MHz System Bus [64 pins]
-- 266 MHz DDR Bus [72 pins]
-- 266 MHz AGP Bus [32 pins]
-- 33 MHz PCI Bus [16 pins]

South bridge interfaces
-- 33 MHz PCI Bus [16 pins]
-- USB Bus [8 pins]
-- EIDE Bus [8 pins]
-- ISA Bus [8 pins]

7 Conclusion

RIC/DICMOS is an 800 Mbps drive and response circuit implemented in 0.18 micron CMOS, contained within Credence's new timing generator IC.
Its eight independent drive and strobe channels provide high fanout on the testhead PCB, yielding dense hardware integration and overall cost savings, without compromising the +/-150 ps system EPA required of low to moderate cost test systems. The RIC/DICMOS auto-calibration scheme is an innovation that allows the linearity of the new formatter's delay elements to be determined at least 500 times faster than that of its predecessors.

8 References

[1] Ahmed R. Syed, "RIC/DICMOS-- Multichannel CMOS Formatter," ITC 2003, pages 175-184.
[2] James A. Gasbarro and Mark A. Horowitz, "A Single-Chip, Functional Tester for VLSI Circuits," ISSCC, February 1990, pages 84-85.
[3] M. Barber, "Fundamental timing problems in testing MOS VLSI on modern ATE," IEEE Design & Test, August 1984, pages 482-489.
[4] James A. Gasbarro and Mark A. Horowitz, "Integrated Pin Electronics for VLSI Functional Testers," IEEE J. Solid State Circuits, vol. 24, no. 2, pages 331-337.
[5] Ahmed R. Syed, "R4X/D4X-- Formatters for Flexible Test System Architecture," ITC 2002, pages 885-893.
[6] Scott Wasson and Andrew Brown, "AMD's 760 chipset with DDR SDRAM Rambuster," The Tech Report, 2000 Q4.
[7] A.T. Sivaram, "Split Timing Mode (STM)- Answer to Dual Frequency Domain Testing," ITC 2001, pages 140-147.