doi:10.3772/j.issn.1006-6748.2014.02.005

# Verilog HDL modeling and design of 10Gb/s SerDes full rate CDR in 65nm CMOS<sup>①</sup>

Chen Yingmei (陈莹梅)<sup>②\*</sup>, Chen Xuehui<sup>\*</sup>, Yi Lvfan<sup>\*\*</sup>, Wen Guanguo<sup>\*\*</sup>

(\*Institute of RF- & OE-ICs, Southeast University, Nanjing 210096, P. R. China)

(\*\*Zhongxing Telecom Equipment Corporation, Shenzhen 518055, P. R. China)

#### Abstract

A phase-locked loop (PLL) is a typical analog-digital mixed-signal circuit, so a method for conducting top-level system verification that includes the PLL with a standard digital simulator is especially significant. A behavioral level model (BLM) of the PLL in Verilog-HDL for a pure digital simulator is developed in this paper, and the design of a PLL-based clock and data recovery (CDR) circuit aided by a jitter attenuation PLL for SerDes applications is also presented. The CDR employs a dual-loop architecture in which a frequency-locked loop acts as an acquisition aid to the phase-locked loop. To simultaneously meet the jitter tolerance and jitter transfer specifications defined in G.8251 of the optical transport network (ITU-T OTN), an additional jitter attenuation PLL is used. Simulation results show that the peak-to-peak jitter of the recovered clock and data is 5.17 ps and 2.3 ps, respectively. The core of the whole chip consumes 72 mA from a 1.0 V supply.

Key words: Verilog-HDL, behavioral level model (BLM), phase locked loops (PLL), clock and data recovery (CDR)

## 0 Introduction

Clock and data recovery (CDR) is a critical function in high-speed transceivers. Such transceivers serve in many applications, including optical communications and chip-to-chip interconnects. Data received in these systems are both asynchronous and noisy, requiring that a clock be extracted to allow synchronous operations. Furthermore, the data must be "retimed" such that the jitter accumulated during transmission is removed.
CDR circuits operating at tens of gigabits per second pose difficult challenges with respect to device speed, jitter, signal distribution, system architecture, and power consumption<sup>[1]</sup>. This paper presents the design of a $9.95 \sim 11.5 \, \mathrm{Gb/s}$ full-rate CDR with a jitter attenuation PLL in $65 \, \mathrm{nm}$ CMOS technology that fulfills all of the jitter specifications recommended by G.8251. The jitter characteristics critical to the G.8251 recommendations are jitter transfer (bandwidth and jitter peaking), jitter tolerance, and jitter generation.

As the functionality and scale of ICs grow, most integrated circuit realizations include both digital and analog functions. However, a method for conducting top-level system verification that includes the PLL with a standard digital simulator has not yet been developed. This paper presents a behavioral level modeling method for the PLL modules in Verilog-HDL in Section 1. In Section 2, the jitter specifications of G.8251 are analyzed and the CDR architecture is introduced. Section 3 describes the building blocks and circuit design. Then, the post-simulation results of the CDR and jitter attenuation PLL circuits are given in Section 4. Section 5 summarizes the post-simulation results and implementation details.

① Supported by the National High Technology Research and Development Programme of China (No. 2011AA010301), the Research Foundation of Zhongxing Telecom Equipment Corporation and the National Natural Science Foundation of China (No. 60976029).

② To whom correspondence should be addressed. E-mail: njcym@seu.edu.cn

Received on June 26, 2012

## 1 Behavioral level model of PLL

The behavioral level model aims to provide a behavioral model of all the blocks within the top-level module, enabling designers to verify increasingly complex designs using a standard digital simulator, such as Verilog-XL or Modelsim. Several challenges arise in creating an accurate BLM in Verilog-HDL, most of which stem from Verilog's inherent inability to model and simulate currents and voltages. To overcome these obstacles, a virtual high-frequency clock is created to perform discrete-time simulation. The simulator treats all voltage and current values as real Verilog variables and samples or updates their values on the edges of the virtual clock. The clock provides the time increment needed to update the critical currents and voltages within the analog circuits, which allows the BLM to model the real circuit accurately.

For example, the charge and discharge current from the charge pump can be sampled by the virtual high-frequency clock as shown in Fig. 1, so the time-domain integration performed by the LPF can be transformed into a discrete-time summation owing to the discretization of the integrand. The output voltage of the LPF is also discrete because it is the sum of the voltages across the resistor $R_{\rm p}$ and the capacitor C1. It can be described in Verilog as follows:

```
// Charge pump: sample the UP/DN pulses on every virtual clock edge.
always @(posedge Sample_clk) begin
    Iout = `CHARGEPUMP_I * ((UP < DN) ? (-1.0) : (UP - DN));
end

// Loop filter: discrete-time integration on C1 plus the drop across Rp.
always @(posedge Sample_clk) begin
    Vc_pre = Vc;
    Vc     = Vc_pre + Iout * (1 / (2 * `PI * `SAMPLE_FREQ * `C1));
    Vctrl  = Vc + Iout * `Rp;
end
```

where `CHARGEPUMP_I` is the charge and discharge current of the charge pump, `Vc` is the capacitor voltage, and `SAMPLE_FREQ` is the frequency of the sampling clock.

Fig. 1 The discretization of the time-domain integration

Fig. 2 shows the evolution procedure of the output frequency of the VCO. Once the output voltage of the LPF is generated, the corresponding oscillation frequency of the VCO can be deduced from the VCO gain $K_{\rm VCO}$. The delay time and output frequency are then calculated in a `forever` loop. Fig. 3 shows the simulation results of the top-level verification using a Modelsim simulator.
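The discrete-time update described above can also be tried outside a Verilog simulator. The following Python sketch is only an illustration under stated assumptions: the class and function names, the 100 GHz virtual clock, the 500 Ω and 50 nF filter values and the 10 GHz centre frequency are hypothetical, and the capacitor update uses the textbook forward-Euler step $I \cdot \Delta t / C$ rather than the exact scale factor of the listing.

```python
from dataclasses import dataclass


@dataclass
class LoopFilterModel:
    """Discrete-time charge pump plus series R-C loop filter (hypothetical names)."""

    i_cp: float        # charge-pump current, A
    r_p: float         # series resistor, ohm
    c_1: float         # integrating capacitor, F
    f_sample: float    # virtual sampling-clock frequency, Hz
    v_c: float = 0.0   # capacitor voltage state, V

    def step(self, up: int, dn: int) -> float:
        """Advance one virtual-clock edge and return the control voltage."""
        # Same sign convention as the Verilog listing: -1 when only DN is
        # active, otherwise UP - DN (0 or +1).
        direction = -1.0 if up < dn else float(up - dn)
        i_out = self.i_cp * direction
        # Forward-Euler integration on the capacitor: dV = I * dt / C
        # (assumption: plain I*dt/C, not the scale factor of the listing).
        self.v_c += i_out / (self.f_sample * self.c_1)
        # The resistor contributes an instantaneous I * R term.
        return self.v_c + i_out * self.r_p


def vco_frequency(v_ctrl: float, f_center: float, k_vco: float) -> float:
    """Linear VCO model: frequency shifts by k_vco (Hz/V) times v_ctrl."""
    return f_center + k_vco * v_ctrl


if __name__ == "__main__":
    lf = LoopFilterModel(i_cp=80e-6, r_p=500.0, c_1=50e-9, f_sample=100e9)
    v_ctrl = 0.0
    for _ in range(1000):          # 1000 consecutive UP samples
        v_ctrl = lf.step(up=1, dn=0)
    print("control voltage:", v_ctrl)
    print("VCO frequency:", vco_frequency(v_ctrl, 10e9, 600e6))
```

Each call to `step` plays the role of one edge of the virtual clock, and `vco_frequency` corresponds to deducing the oscillation frequency from the filter output through the VCO gain.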
As can be seen in Fig. 3, the control line of the VCO exhibits the tracking and acquisition process during normal operation.

Fig. 2 The evolution procedure of VCO output frequency

Fig. 3 Simulation results of the top level verification in Modelsim

## 2 Jitter analyses and CDR architecture

The jitter performance of a CDR is commonly characterized by jitter generation, jitter transfer and jitter tolerance. Normally, jitter tolerance and jitter generation are the performance requirements for high-speed link CDRs used in chip-to-chip communication, which do not need to meet a jitter transfer specification. However, for applications where the CDR is used in a repeater, as is the case in SONET (synchronous optical network) systems, all three jitter specifications must be met<sup>[2]</sup>.

The jitter transfer function of a CDR circuit describes how much jitter passes through the system from its input to its output. There are two difficult specifications in jitter transfer. One is bandwidth, and the other is jitter peaking. The amount of jitter peaking must be less than $0.1 \, \mathrm{dB}$. In order to meet this requirement, a damping factor above $4 \sim 6$ must be adopted.

G.8251<sup>[3]</sup> defines the masks for OTU2 jitter tolerance and jitter transfer. It is important to note that the 0.15 UI corner of the jitter tolerance mask is at 4 MHz, while the -3 dB corner frequency of the jitter transfer curve is 1 MHz. In order to meet the jitter tolerance mask, the CDR PLL must be designed so that its closed-loop bandwidth is higher than the 0.15 UI corner frequency of the jitter tolerance mask. However, in this case, the CDR loop cannot meet the jitter transfer mask. These requirements therefore suggest that a single CDR loop cannot be designed to meet the jitter tolerance and jitter transfer masks<sup>[4]</sup> simultaneously.
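The link between damping factor and jitter peaking mentioned above can be checked numerically. The sketch below assumes the textbook second-order closed-loop response $H(s) = (2\zeta\omega_n s + \omega_n^2)/(s^2 + 2\zeta\omega_n s + \omega_n^2)$, which the paper does not state explicitly, and the function name is hypothetical. It evaluates the peak of $|H(j\omega)|$ in closed form.

```python
import math


def jitter_peaking_db(zeta: float) -> float:
    """Peak of |H(jw)| in dB for the assumed second-order PLL response."""
    a = 4.0 * zeta * zeta
    # With x = (w / wn)^2: |H|^2 = (1 + a*x) / ((1 - x)^2 + a*x).
    # Setting d|H|^2/dx = 0 gives a*x^2 + 2*x - 2 = 0; take the positive root.
    x_peak = (-1.0 + math.sqrt(1.0 + 2.0 * a)) / a
    mag_sq = (1.0 + a * x_peak) / ((1.0 - x_peak) ** 2 + a * x_peak)
    return 10.0 * math.log10(mag_sq)


if __name__ == "__main__":
    for zeta in (1.0, 2.0, 4.0, 5.0, 6.0):
        print(f"zeta = {zeta:.1f}: peaking = {jitter_peaking_db(zeta):.3f} dB")
```

Under this assumed response the peaking falls below 0.1 dB somewhere between $\zeta = 4$ and $\zeta = 5$, which is in line with the damping-factor range quoted in the text.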
In this design, a standard CDR PLL that focuses on meeting jitter tolerance and an additional low-pass filter, called the jitter attenuation PLL, for jitter transfer are introduced. Fig. 4 depicts the functional block diagram of the proposed clock and data recovery architecture. The data input is expected to be a non-return-to-zero (NRZ) data pattern at a rate between 9.95 and 11.5 Gb/s. The CDR PLL employs a dual-loop architecture in which the frequency-locked loop acts as an acquisition aid to the phase-locked loop and is disengaged during normal operation, when the CDR PLL is locked to the input serial data. Thus the structure realizes a large frequency acquisition range while maintaining the precise control of phase alignment needed to achieve the ITU-T OTN G.8251 jitter tolerance requirements. A conventional charge-pump based design is used, and the charge pump is shared by the two loops.

Fig. 4 The architecture of the proposed CDR

The proposed CDR implements a full-rate architecture because of its simplicity and robustness with regard to various data patterns. In a full-rate CDR, the design of the voltage-controlled oscillator is simplified and the physical layout of the CDR can be very dense to minimize parasitics, which in turn helps to improve speed and reduce noise coupling. In addition, the full-rate CDR generates a low-jitter full-rate clock, which is used to retime the input data.

Based on the behavioral level simulation of the Verilog-HDL model and the above CDR architecture, the design specifications can be distributed among the modules of the CDR. According to the tuning range of the VCO and the small control voltage range imposed by the low supply voltage of 65 nm CMOS technology, the gain $K_{\rm VCO}$ is set to 600 MHz/V. The -3 dB bandwidth $K$ of the CDR is 4 MHz, and the -3 dB bandwidth of the jitter attenuation PLL is 1 MHz. The charge pump current $I_{\rm p}$ is 80 μA. Therefore, the $R_{\rm p}$ and $C_{\rm p}$ parameters of the LPF can be calculated from Eq. (1) and Eq. (2).
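As a rough numerical check, the two sizing expressions $R_{\rm p} = 2\pi K/(I_{\rm p} K_{\rm VCO})$ and $C_{\rm p} = 4\zeta^2/(K R_{\rm p})$ can be evaluated with the values quoted in the text. In this sketch $K$ is taken in Hz and $K_{\rm VCO}$ in Hz/V, which is an assumption about the unit convention. The damping factor $\zeta = 5$ is also an assumed value chosen from the $4 \sim 6$ range, and the function name is hypothetical.

```python
import math


def loop_filter_values(k_bw_hz: float, i_p: float,
                       k_vco_hz_per_v: float, zeta: float) -> tuple[float, float]:
    """Return (R_p in ohm, C_p in farad) from the two sizing expressions."""
    # R_p = 2*pi*K / (I_p * K_VCO)
    r_p = 2.0 * math.pi * k_bw_hz / (i_p * k_vco_hz_per_v)
    # C_p = 4*zeta^2 / (K * R_p)
    c_p = 4.0 * zeta ** 2 / (k_bw_hz * r_p)
    return r_p, c_p


if __name__ == "__main__":
    # K = 4 MHz, I_p = 80 uA, K_VCO = 600 MHz/V from the text; zeta = 5 assumed.
    r_p, c_p = loop_filter_values(4e6, 80e-6, 600e6, 5.0)
    print(f"R_p = {r_p:.1f} ohm, C_p = {c_p * 1e9:.2f} nF")
```

With these inputs the capacitor comes out in the tens of nanofarads, which fits the later remark that the passive RC filter is placed off-chip.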
$$R_{\rm p} = \frac{2\pi K}{I_{\rm p} \cdot K_{\rm VCO}} \tag{1}$$

$$C_{\rm p} = \frac{4\zeta^2}{K \cdot R_{\rm p}} = \frac{2I_{\rm p}K_{\rm VCO}\zeta^2}{\pi K^2} \tag{2}$$

## 3 Building blocks and circuit design

### 3.1 Phase detector

Operating the Hogge detector at 10 Gb/s is challenging. The key issues are achieving a low phase offset between the input clock and data signals (in order to achieve high jitter tolerance) and achieving a linear phase error characteristic over a wide phase error range. To address these issues, a buffer is inserted between the input data signal and the leftmost XOR gate input, as shown in Fig. 5. The buffer is designed to have a delay that compensates for the clock-to-Q delay of the first register. Appropriate layout techniques are applied to achieve good matching, since mismatch would contribute to phase offset in the overall phase detection operation.

Fig. 5 Hogge phase detector

In this full-rate PD, the sampling D flip-flop is the most critical building block, since it has to track and sample the incoming 10 Gb/s signal. A current-mode logic (CML) master-slave flip-flop is therefore employed to meet the speed requirement. CML circuits have a higher immunity to supply noise and generate less switching noise on the power supply. The attainable data rate of a CML flip-flop circuit is limited by the RC time constants of the circuit. Thus, optimizing the performance of the high-speed CML flip-flop becomes an exercise in minimizing these RC time constants.

### 3.2 Charge pump

As can be seen in Fig. 6, the charge pump is a fully differential active design with a common-mode feedback (CMFB) circuit. Due to jitter-related considerations (low noise and jitter peaking < 0.1 dB) and the practical size limitations of on-chip capacitors, the passive RC filter employed is a differential off-chip filter. Low-noise design makes it necessary to run a fairly high current in the charge pump.
However, considering the relatively low supply voltage of 65 nm CMOS technology, careful optimization of the charge-pump control signal voltages to maximize the drive voltage range of the charge pump becomes an important requirement. That is, running high charge-pump currents from a low supply voltage presents headroom challenges, particularly when robustness against process and temperature variations is considered<sup>[5,6]</sup>.

Fig. 6 The proposed differential charge pump

Proper differential operation is realized by the CMFB circuit, which also increases the output dynamic range and the symmetry of the charging and discharging currents. In this work, the common-mode voltage of the charge pump outputs is sensed and compared with a common-mode reference by the CMFB circuit.

### 3.3 LC-VCO

The simplified schematic diagram of the proposed complementary cross-coupled LC-VCO is shown in Fig. 7. The complementary structure produces more symmetric oscillation waveforms, which improves the low-frequency flicker noise performance. Moreover, the complementary structure produces a higher negative resistance and, as a result, lower power dissipation. To suppress the common-mode noise introduced by the power supply, substrate and control line, a differential tuning technique is adopted. The tuning band of the proposed LC-VCO is extended to about 1.55 GHz, covering the 9.95 to 11.5 Gb/s data-rate range, by employing the switched-capacitor array shown in Fig. 8, which is selected by an external three-bit control word. To keep duty-cycle distortion low, a clock buffer with AC coupling is implemented. This buffer has a band-pass type transfer function, and so cuts off both the low-frequency noise caused by interference from surrounding circuits and the high-frequency harmonics generated in the LC-VCO<sup>[7]</sup>. The common-mode voltage of the output can be fixed by adjusting the bias.

Fig. 7 Structure of LC-VCO and buffer

Fig.
8 The switch capacitor array of the VCO

The simulation results show that the VCO has a phase noise of -110 dBc/Hz at 1 MHz offset while consuming 5 mA from a 1.0 V supply. This phase noise performance will reduce the deterministic jitter of the recovered clock. However, it still provides a good random jitter characteristic that is acceptable for 10 Gb/s CDR operation<sup>[8]</sup>.

### 3.4 Jitter attenuation PLL

An additional jitter attenuation PLL is used to meet the jitter transfer specification. It is a traditional second-order PLL whose bandwidth is set below 1 MHz. In addition, considering the jitter peaking of < 0.1 dB required by G.8251, the loop should be designed to be over-damped. The PFD and CP of the jitter attenuation PLL are carefully designed to avoid a dead zone and to reduce mismatch in the system. The structure, tuning range and control word logic of the VCO are the same as those of the LC-VCO in the CDR PLL, except for the single-ended "Vtune" control shown in Fig. 7.

## 4 Circuit simulation results

The transistor-level circuits are simulated with the SpectreRF simulator in TSMC 65 nm CMOS technology. The layout occupies an area of 0.975 mm $\times$ 0.875 mm, as shown in Fig. 9 (including pads for on-chip testing). The simulation results show that the VCO provides a tuning range of 2 GHz ($\approx 19\%$) with a best phase noise of -110.7 dBc/Hz at 1 MHz offset, and the maximum gain of the VCO is quite high, about 1.0 GHz/V.

Fig. 9 Layout of the SerDes CDR

Fig. 10 shows an eye diagram of the recovered clock, exhibiting a peak-to-peak jitter of 5.17 ps. G.8251 specifies 0.1 UI as the maximum peak-to-peak jitter on the recovered clock, and the simulated jitter meets this requirement.

Fig. 10 Eye diagram of the recovery clock

Fig. 11 depicts the eye diagram of the recovered full-rate retimed data (11.2 Gb/s). The retimed data exhibits a 280 mVpp output swing and a peak-to-peak jitter of 2.3 ps. G.8251 specifies 0.15 UI as the maximum peak-to-peak jitter on the recovered data, and the simulated results meet this requirement.

Fig. 11 Eye diagram of the recovery data (11.2 Gb/s)

The simulated transient response of the differential control voltage of the CDR loop is shown in Fig. 12. With the design parameters given above, the locking time of the CDR is less than 1.8 µs for an initial frequency step.

Fig. 12 The locking time of the CDR loop

## 5 Conclusion

This paper demonstrates a 9.95 ~ 11.5 Gb/s CDR for SerDes applications in 65 nm CMOS technology. To enable top-level verification of the whole CDR with a standard digital simulator, a BLM method in Verilog-HDL is proposed. A dual-loop CDR PLL aided by a jitter attenuation PLL is designed to simultaneously meet the jitter tolerance and jitter transfer specifications defined in G.8251 of the optical transport network (ITU-T OTN). The proposed oscillator, full-rate PD, differential charge pump and flip-flop topologies resolve many of the circuit and architecture issues. The jitter performance of the recovered clock and data meets the jitter recommendations of G.8251.

#### References

- [1] Razavi B. Challenges in the design of high-speed clock and data recovery circuits. IEEE Communications Magazine, 2002, 40(8): 94-101
- [2] Cao J, Green M, Momtaz A, et al. OC-192 transmitter and receiver in standard 0.18 μm CMOS. IEEE J Solid-State Circuits, 2002, 37(12): 1768-1780
- [3] The Control of Jitter and Wander within the Optical Transport Network, ITU-T G.8251, Oct. 2001
- [4] Wang H, Chen Y M. Jitter analysis and modeling of a 10Gb/s SerDes CDR and jitter attenuation PLL. The Journal of China Universities of Posts and Telecommunications, 2011, 18(6): 122-126 (In Chinese)
- [5] Yan S C, Chen Y M, Wang T, et al. A 40Gb/s quarter rate CDR with 1:4 demultiplexer in 90nm CMOS technology.
In: Proceedings of the 12th IEEE International Conference on Communication and Technology, Nanjing, China, 2010. 11-14
- [6] Henrickson L, Shen D, Nellore U, et al. Low-power fully integrated 10-Gb/s SONET/SDH transceiver in 0.13 µm CMOS. IEEE J Solid-State Circuits, 2003, 38(10): 1595-1601
- [7] Chen Y M, Wang H, Yan S C, et al. A 10GHz multi-phase LC VCO with a ring capacitive coupling structure. Science China Information Sciences, 2012, 55(11): 2656-2662
- [8] Byun S G, Lee J C, Shim J H, et al. A 10-Gb/s CMOS CDR and DEMUX IC with a quarter-rate linear phase detector. IEEE J Solid-State Circuits, 2006, 41(11): 2566-2576

Chen Yingmei, born in 1970. She received her Ph.D. degree from the School of Information Science and Engineering of Southeast University in 2007. She also received her B.S. and M.S. degrees from Nanjing University of Science and Technology and Southeast University in 1991 and 2003, respectively. Her research interests include the design of ultra-high-speed integrated circuits for optical fiber communication and the design of radio frequency and wireless communication integrated circuits.