Static timing analysis (STA) with interview questions for autumn recruitment

What is STA?

        Static timing analysis (STA) is a method for analyzing, debugging, and confirming the timing performance of a gate-level design. It checks the maximum delay of the gate-level circuit to ensure that the setup time requirement can be met at the specified frequency, and it checks the minimum delay of the gate-level circuit to ensure that the hold time requirement can be met.

        What STA checks: whether the setup/hold requirements of sequential cells are met; whether the recovery/removal requirements of asynchronous reset/set signals are met; whether a short pulse is wide enough to be detected by the downstream circuit; and clock gate setup/hold.

        STA also covers design rule checks (DRC): the capacitance on a net must not exceed the configured maximum capacitance, the transition time must not exceed the configured maximum transition, and the fanout must not exceed the configured maximum fanout.

STA therefore covers many checks, including DFF setup/hold checks, ARST checks, pulse checks, and clock gate checks.

DFF setup/hold 

 

        To make sure the data is sampled correctly at the rising clock edge, the data must be stable for a setup time before the edge and for a hold time after it. If the setup or hold requirement is not met, the flip-flop may go metastable. In a direct flip-flop-to-flip-flop path (ignoring clock skew), a CK-to-Q delay longer than the hold time keeps the hold requirement satisfied, so metastability does not appear.

ARST check 

        This check covers reset release. When the reset signal goes from 0 to 1 too close to the clock edge, it is uncertain whether ARST is 0 or 1 at the rising edge of the clock, and metastability appears. In the figure above, the ARST signal violates the recovery check, so the timing requirement is not met.

        For an active-low reset, only the rising (deassertion) edge is checked. If metastability appears on the rising edge, registers may leave reset at different times: some start working while others are still held by the low level of ARST and have not entered the working state. That case must therefore be timing checked. The falling (assertion) edge needs no check. Even if it violates timing, the worst outcome is that some registers enter reset slightly later than others; all of them end up in the reset state, so the functional result is not affected.

Signal pulse check 

        Pulse width also needs a timing check, that is, whether a signal stays at its level long enough. Take the reset signal mentioned above: if reset is not held low long enough, the reset may fail.

        A violation can also occur at a clock gate. For example, if clk_en is pulled high only around the falling edge of clk, the output clk_gate of the combinational OR gate goes low and then high again, producing a very short glitch that cannot clock the downstream logic correctly. clk_en is therefore required to change a little after the rising edge and a little before the falling edge of clk, so that no clock gate violation occurs.

Before analyzing timing in detail, let's look at where timing problems come from: delay.

Synthesis: based on the RTL design and the cell library, synthesis maps the RTL code onto basic gate-level cells. Guided by the timing constraints, it tries to make the mapped gate-level design meet timing, and it tries to minimize area and power. The synthesis tool has many cells to choose from; for example, an adder can be implemented as a ripple-carry adder or a carry-lookahead adder, and DC (Design Compiler) picks among such choices according to the design and its constraints. Synthesis also tries to fix DRC violations (maximum capacitance, maximum fanout, and so on) and to minimize gate-level congestion (higher routing density, smaller chip area).

Synthesis relies on a process library, and the area and delay of the synthesized gates differ from one process to another.

Introduction to cell library

        Each process provides at least three libraries: slow, typical, and fast. The slow corner is mainly used for setup analysis, because signals are slower and setup violations are more likely; the fast corner is mainly used for hold analysis. Advanced processes also provide cell libraries with different threshold voltages (Vt): LVT (low Vt), RVT (regular Vt), SVT (standard Vt), and HVT (high Vt), with at least three operating corners for each Vt. Cell delay differs between Vt types. LVT cells have very small delay: the lower the threshold voltage, the larger the saturation current and the higher the speed, but the leakage current is also larger, so power consumption is higher. From fast to slow, the order is SLVT, LVT, RVT, HVT; power consumption runs in the opposite order. (Do not assume the process part can be ignored. Setup/hold violations can be fixed by swapping cells to a different Vt type, so you need to know the differences between LVT, RVT, and HVT.)

        Suppose a chip built with RVT cells runs at a maximum frequency of 380 MHz and we want 400 MHz. We can replace some cells on the critical path with faster LVT cells, which raises the overall frequency at the cost of higher power consumption. When is HVT used? On non-critical paths with very small delay, some RVT cells can be replaced with HVT cells, which reduces power consumption without affecting the speed of the circuit.

        The delay of each cell is specified in the cell library in the form of a lookup table.

        When the max capacitance and max load limits are set for DRC in DC, they cannot exceed the maximum values specified by the cell library in a table like the one above.
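As a rough illustration of how a delay lookup table might be read, here is a minimal sketch. It assumes the table is indexed by input transition time and output load, which is the usual arrangement, and that values between table points are obtained by bilinear interpolation. The axis values, the delay numbers, and the names `TRANSITIONS`, `LOADS`, `DELAYS`, and `cell_delay` are all hypothetical and do not come from any real library.

```python
from bisect import bisect_right

# Hypothetical table: rows are input transition (ns), columns are output load (pF).
TRANSITIONS = [0.1, 0.5, 1.0]
LOADS = [0.01, 0.05, 0.10]
DELAYS = [
    [0.10, 0.20, 0.30],
    [0.15, 0.25, 0.35],
    [0.20, 0.30, 0.40],
]


def _segment(axis, x):
    """Index i of the segment axis[i]..axis[i+1] used for x (nearest segment if x is outside)."""
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))


def cell_delay(transition, load):
    """Bilinear interpolation of the hypothetical delay table, result in ns."""
    i = _segment(TRANSITIONS, transition)
    j = _segment(LOADS, load)
    t0, t1 = TRANSITIONS[i], TRANSITIONS[i + 1]
    c0, c1 = LOADS[j], LOADS[j + 1]
    u = (transition - t0) / (t1 - t0)  # position between the two transition points
    v = (load - c0) / (c1 - c0)        # position between the two load points
    return ((1 - u) * (1 - v) * DELAYS[i][j]
            + (1 - u) * v * DELAYS[i][j + 1]
            + u * (1 - v) * DELAYS[i + 1][j]
            + u * v * DELAYS[i + 1][j + 1])
```

A point halfway between the first two entries on both axes, `cell_delay(0.3, 0.03)`, returns the average of the four surrounding table values.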

clock characteristics

Synchronous clocks are clocks among which the following relationships can be clearly defined:

        ① Clock frequency

        ② The duration of clock high and low levels (duty cycle)

        ③ The phase of each clock (waveform)

        ④ The input latency of the clock

Asynchronous clocks: clock sources among which the above relationships cannot be clearly defined.

The clock has many parameters, such as frequency, duty cycle, etc. Here are some parameters related to the clock.

Clock period: the on-chip clock is generally generated by a crystal oscillator for accuracy.

Clock jitter: the difference between two clock cycles. This error is generated inside the clock generator and depends on the crystal oscillator or the internal circuit of the PLL; routing has no effect on it. For example, one cycle of the crystal oscillator may be 100 ns and the next 101 ns, giving a jitter of 1 ns. This generally does not have a large impact on timing, because the crystal oscillator is the source of the clock tree: register A and register B both see the same jitter.

Duty cycle: the ratio of the high-level time to the entire clock period.

Transition time: a clock edge is not perfectly vertical; it takes time to switch. The rise transition time is the time taken to go from 10% to 90% of the swing, and the fall transition time is the time taken to go from 90% to 10%.

Phase: the positions of the first rising edge and the first falling edge relative to time zero.

Input latency: the delay from the clock source to the clock input (delay caused by routing).

Clock skew: the routing distance from the clock tree to each register differs, so the same clock edge reaches different registers at different times. That difference in arrival time is called clock skew.

Clock uncertainty: clock jitter plus a pessimism margin.
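The clock parameters above reduce to simple arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the jitter is computed as the largest difference between consecutive periods, which is one possible reading of the definition given here.

```python
def cycle_to_cycle_jitter(periods):
    """Largest difference between two consecutive clock periods."""
    return max(abs(b - a) for a, b in zip(periods, periods[1:]))


def duty_cycle(high_time, period):
    """Ratio of the high-level time to the whole clock period."""
    return high_time / period


def clock_skew(arrival_at_a, arrival_at_b):
    """Difference in arrival time of the same clock edge at registers A and B."""
    return arrival_at_b - arrival_at_a


def clock_uncertainty(jitter, margin):
    """Clock uncertainty as jitter plus a pessimism margin."""
    return jitter + margin
```

With the numbers from the jitter description, `cycle_to_cycle_jitter([100, 101, 100])` gives 1 (ns).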

Create a clock in SDC (written in Tcl script) 

port: the port of the top-level design

pin: an input/output of a cell (not the top-level design)


STA Timing Path : ① input to register ② register to output ③ register to register ④ input to output

STA start/end points

Start points: input ports, clock pins of sequential cells

End points: output ports, data pins of sequential cells

Setup check

        For two connected registers, setup is met when the data launched by the first register at the first rising clock edge can be captured by the second register at the second rising edge. If it cannot be captured at the second rising edge, the path delay is too large.

Hold check

        For two connected registers, hold is met when the data launched by the first register at the first rising clock edge is not captured by the second register at that same rising edge. If it is captured at the first rising edge, the path delay is too small and the data reaches the second register too early.

Setup check calculation

        As shown in the figure, the time the data on the purple path needs to travel from the clk terminal to the D pin of register C is:

data_arrive_time = clk_latency + clk_path1_delay + ck_to_q + logic_delay;

        The data arrival time is the clock latency, plus the clock path delay from the clock input to the clk pin of the launching register, plus the delay from the clk pin to the data output (ck_to_q), plus the combinational logic delay.

        Register C also has a data required time. If the data arrives before the required time, there is no problem; if it arrives later (the data is late), a setup violation occurs.

data_require_time = clk_period + clk_latency + clk_path2_delay - dff_set_up - clk_uncertainty;

Data required time: one clock period is added because register C captures, at the next clock edge, the data launched in the previous cycle.

        The synthesis tool performs the setup check:

            setup_slack = data_require_time - data_arrive_time

            if (setup_slack >= 0)
                setup met;
            else
                setup violated;

        The data arrival time must be less than the data required time: the data must reach the next register before the next clock edge captures it, early enough to meet that register's setup requirement.
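The setup formulas above can be put together in a few lines. This is a minimal sketch that reuses the names from the formulas; the numbers in the usage note are hypothetical and only show the arithmetic.

```python
def setup_slack(clk_period, clk_latency, clk_path1_delay, clk_path2_delay,
                ck_to_q, logic_delay, dff_set_up, clk_uncertainty):
    """Setup slack = data_require_time - data_arrive_time (all values in ns)."""
    # Launch side: clock reaches the launching register, then data propagates.
    data_arrive_time = clk_latency + clk_path1_delay + ck_to_q + logic_delay
    # Capture side: next clock edge at the capturing register, minus setup and uncertainty.
    data_require_time = (clk_period + clk_latency + clk_path2_delay
                         - dff_set_up - clk_uncertainty)
    return data_require_time - data_arrive_time


def setup_met(slack):
    """Setup is met when the slack is not negative."""
    return slack >= 0
```

For example, with a 10 ns period, 1 ns latency, 0.5 ns on both clock paths, 0.5 ns ck_to_q, 6 ns of logic, 0.5 ns setup time, and 0.5 ns uncertainty, the arrival time is 8 ns, the required time is 10.5 ns, and the slack is 2.5 ns.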

Hold check calculation

    data_arrive_time = clk_latency + clk_path1_delay + ck_to_q + logic_delay;

    The data arrival time is the clock latency, plus the clock path delay from the clock input to the clk pin of the launching register, plus the delay from the clk pin to the data output (ck_to_q), plus the combinational logic delay.

    The hold check means that register C must not capture, on the current rising clock edge, the data that register B launches on that same edge.

    data_require_time = clk_latency + clk_path2_delay + dff_hold + clk_uncertainty;

    The synthesis tool performs the hold check:

        hold_slack = data_arrive_time - data_require_time

        if (hold_slack >= 0)
            hold met;
        else
            hold violated;

        The data arrival time must be greater than the data required time. Both times refer to the same clock edge, so the data must not arrive too fast: it has to arrive after the hold window so that the next register cannot capture it on the same clock edge.

        The setup check uses the path with the longest combinational delay, which is the most likely to violate setup; the hold check uses the path with the shortest combinational delay, which is the most likely to violate hold. In other words, both checks consider the worst case.
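The hold formulas can be sketched the same way. The names follow the formulas above, and the numbers in the usage note are hypothetical.

```python
def hold_slack(clk_latency, clk_path1_delay, clk_path2_delay,
               ck_to_q, logic_delay, dff_hold, clk_uncertainty):
    """Hold slack = data_arrive_time - data_require_time (all values in ns)."""
    data_arrive_time = clk_latency + clk_path1_delay + ck_to_q + logic_delay
    # Same clock edge at the capturing register, plus hold time and uncertainty.
    data_require_time = clk_latency + clk_path2_delay + dff_hold + clk_uncertainty
    return data_arrive_time - data_require_time


def hold_met(slack):
    """Hold is met when the slack is not negative."""
    return slack >= 0
```

With 1 ns latency, 0.5 ns on both clock paths, 0.5 ns ck_to_q, 0.25 ns hold time, and 0.5 ns uncertainty, a logic delay of 0.25 ns gives a slack of exactly 0 (hold just met), while a logic delay of 0 gives -0.25 ns (hold violated). Note that clk_period does not appear anywhere in the calculation.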

        The setup and hold checks above assume both registers use the same clock. What if the two registers are clocked from two different clock sources?

        For the setup check the meaning is unchanged: data launched by register A on one clock edge is captured by register B on a clock edge. The two clocks are different, however, and their phase relationship is not fixed, so the synthesis tool looks for the place where the edges of the two waveforms are closest to each other and checks there.

        For hold analysis between such clocks, the rising edge before the setup check point is taken as the hold check point, and the hold check is done where the hold interval is tightest, as shown in the figure below:

Both setup and hold are checked at the worst case.
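To make the "closest edges" idea concrete, here is a minimal sketch. It assumes a simplified situation that the text does not spell out: both clocks rise at time 0, their periods are integers in the same time unit, and the edge pattern therefore repeats after the least common multiple of the two periods. The function name `tightest_setup_relation` is made up for this example.

```python
from math import gcd


def tightest_setup_relation(launch_period, capture_period):
    """Smallest separation from a launch edge to the next capture edge.

    Assumes both clocks rise at time 0 and have integer periods, so the
    pattern repeats every lcm(launch_period, capture_period).
    """
    base = launch_period * capture_period // gcd(launch_period, capture_period)
    best = None
    for launch_edge in range(0, base, launch_period):
        # First capture edge strictly after this launch edge.
        capture_edge = (launch_edge // capture_period + 1) * capture_period
        separation = capture_edge - launch_edge
        if best is None or separation < best:
            best = separation
    return best
```

For a 10 ns launch clock and a 15 ns capture clock, the launch edge at 10 ns is followed by a capture edge at 15 ns, so the tightest setup relation is 5 ns rather than a full period.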

        For two clocks from different sources we often do not want STA to analyze the paths between them, so we set a false path. Many violations on such paths are false violations, and we do not want to spend compute effort optimizing places that will not go wrong.

Creating a clock with Tcl constraints:

Setting input_delay:

set_input_delay describes the delay of the external logic that drives the port, so the margin left for the internal logic is clock_period - input_delay. It is generally applied at the top level of the design. For example:

set_input_delay -clock CLK $dly [get_ports D]

The same applies to output_delay.

For both input_delay and output_delay, maximum and minimum delay values can be set. The maximum value is used for setup analysis, and the minimum value is used for hold analysis.

set_input_delay

For example, give the input a maximum delay of 6 ns and a minimum delay of 2 ns.

Look at the second rising clock edge: the data may start changing 2 ns after that edge, because the minimum input delay is 2 ns, and it is stable again 6 ns after the edge, because the maximum input delay is 6 ns.

If the max and min values are not set separately, the synthesis tool assumes the two values are the same.
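The 2 ns / 6 ns example can be written down as a small calculation. This is a sketch only: the function name and the dictionary keys are made up, and the 10 ns clock period in the usage note is an assumption, since the example does not state one.

```python
def input_data_window(clock_period, min_delay, max_delay):
    """Describe the input data relative to the launching clock edge (ns)."""
    return {
        # Earliest moment the data may start changing (used for hold analysis).
        "changes_from": min_delay,
        # Latest moment the data becomes stable (used for setup analysis).
        "stable_from": max_delay,
        # Time left for the internal logic before the next clock edge.
        "setup_budget": clock_period - max_delay,
    }
```

With an assumed 10 ns period, `input_data_window(10, 2, 6)` reports that the data may change from 2 ns, is stable from 6 ns, and leaves 4 ns for the internal logic.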

set_output_delay sets the delay external to the port, so the internal logic is left with only T - $delay. Note that the output_delay in the constraint can also be negative. A negative minimum value means that the delay from inside the module to the output must be greater than the absolute value of that number. For example:

set_output_delay -clock CLK -min -3 [get_ports OUT]

With the minimum output_delay set to -3, the delay from the internal logic to the output must be greater than 3 ns, so that internal delay + external delay > 0. One way to picture it: an external register with a hold time of 3 ns is connected to the port. Data launched by the internal register on a clock edge must not be captured by the external register on that same edge, so the delay to the external register must exceed its hold time; otherwise there is a hold violation. Setting the external delay to a negative number constrains the internal delay to be greater than its absolute value.
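The budget implied by an output delay can be sketched as follows. The function name is made up, clock latency and uncertainty are ignored, and the 10 ns period and 4 ns maximum output delay in the usage note are hypothetical; only the -3 ns minimum comes from the example above.

```python
def output_port_budget(clock_period, max_output_delay, min_output_delay):
    """Return (longest, shortest) allowed internal delay to the output port (ns).

    Simplified: clock latency and uncertainty are ignored.
    """
    longest_internal = clock_period - max_output_delay  # setup side
    shortest_internal = -min_output_delay                # hold side
    return longest_internal, shortest_internal
```

`output_port_budget(10, 4, -3)` returns `(6, 3)`: the internal path may take at most 6 ns and, because of the negative minimum output delay, must take at least 3 ns.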

set_false_path

Setting a false path on a path tells the tool not to perform STA on it. For example:

set_false_path -from A -to B

※ The path A→B is not timing checked, but the path B→A is still timing checked.

set_multicycle_path

Suppose the data is held for two cycles, so that in setup analysis the data launched by the first register at the first rising clock edge may be captured by the second register at the third rising edge. If the setup analysis should not follow the default single-cycle behavior, use set_multicycle_path. For example:

set_multicycle_path 2 -setup -from A -to B;

set_multicycle_path 1 -hold -from A -to B; 

        In general, when set_multicycle_path is used, the setup and hold constraints come in pairs. Here the setup check is moved one cycle later (set_multicycle_path 2 -setup) and the hold check is moved one cycle earlier (set_multicycle_path 1 -hold). The data launched at the first rising edge can then be captured at the third rising edge of the next register (setup is satisfied) and must not be captured at the same first rising edge (hold is satisfied). If the -hold constraint is left out, the hold check stays one cycle before the setup capture edge, so the data launched at the first rising edge must not be captured at the second rising edge, which is stricter than the design needs.
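The relationship between the two multipliers and the edges being checked can be sketched as below. It assumes launch and capture use the same clock, counts edges in clock cycles after the launch edge (0 is the launch edge itself, i.e. the "first rising edge"), and uses the usual convention that the default hold check sits one cycle before the setup capture edge. The function name is made up for this example.

```python
def multicycle_check_edges(setup_multiplier=1, hold_multiplier=0):
    """Return (setup_edge, hold_edge) in clock cycles after the launch edge.

    Assumes launch and capture registers share one clock.
    """
    setup_edge = setup_multiplier
    # The hold check defaults to one cycle before the setup edge and moves
    # earlier by hold_multiplier cycles.
    hold_edge = setup_multiplier - 1 - hold_multiplier
    return setup_edge, hold_edge
```

The default single-cycle path gives `(1, 0)`. With `-setup 2` and `-hold 1` the result is `(2, 0)`: capture at the third rising edge, hold checked at the launch edge. With `-setup 2` alone the result is `(2, 1)`, so hold would be checked against the second rising edge.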

Advanced STA Concepts

On-chip variation (OCV)

On the same chip, the PVT conditions of cells at different locations are not identical. When STA analyzes setup, it deliberately uses the larger (slower) delays when calculating data_arrive_time, making the data arrive later, and the smaller (faster) delays on the clock path of the capturing register. In other words, it makes setup easier to violate.

Applying this in full is too pessimistic, however: a circuit that could run at a high frequency would end up rated at a low one. In practice a pessimism of about 5% is generally added to the delays; the exact amount is set by the designer.

Common path pessimism removal (CPPR)

On the common clock path, a cell has only one delay. For example, in a setup calculation the launch clock path passes through a buffer, and the capture clock path used for the data_require_time of the next register passes through the same buffer. Under OCV the buffer would be given the slow delay and the fast delay at the same time, which cannot happen. In this case the cells on the common path can be given the same delay on both paths.
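A small numeric sketch of OCV derating and the CPPR correction for a setup check follows. The 5% derate matches the figure mentioned above; the function name, the split into a common segment plus two branches, and the delay values in the usage note are assumptions made for illustration.

```python
def derated_clock_paths(common_delay, launch_branch, capture_branch, derate=0.05):
    """Setup-style OCV: launch clock made late, capture clock made early.

    Returns (launch_clock, capture_clock, cppr_credit) in ns.
    """
    late = 1 + derate
    early = 1 - derate
    launch_clock = (common_delay + launch_branch) * late
    capture_clock = (common_delay + capture_branch) * early
    # The common segment was counted once as late and once as early;
    # that difference is the pessimism CPPR gives back to the slack.
    cppr_credit = common_delay * (late - early)
    return launch_clock, capture_clock, cppr_credit
```

With a 1.0 ns common segment and 0.5 ns on each branch, the launch clock arrives at about 1.575 ns and the capture clock at about 1.425 ns, an apparent 0.15 ns penalty. About 0.10 ns of that comes from the shared segment and is credited back, leaving only the 0.05 ns caused by the two separate branches.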

STA calculation example 1: 

        The setup path calculation takes the longest delay. The path with the longest delay must arrive before the second clock edge and meet the setup requirement (the setup time is subtracted in data_require_time, because the data must arrive before the setup window).

slack = 8.75 ns, which means the setup check still has that much margin.

In the hold path calculation, the data arrives after the hold window, which ensures the data cannot be captured by the same clock edge.

STA calculation example 2:

Calculate data_require_time, then data_arrive_time, then check whether the margin is sufficient.

The first question is to find the maximum operating frequency (setup analysis):

Only setup analysis is affected by the clock frequency. The data_require_time of the setup check contains clock_period, so setup depends on the clock frequency, and the higher the clock frequency, the more likely a setup violation is.

The setup margin has to be analyzed for each register.

For register 1:

        The D input of register 1 has two data paths: Path1 from din directly to the D pin, and Path2 from the Q pin of F2 to the D pin.

Path1: 0.5 + 2 = 2.5 ns (din + Tandgate)

Path2: 1 + 1 + 2 + 2 = 6 ns (two buffers + Tcq + Tandgate)

data_require = Tcycle + Tbuffer - Tsu = Tcycle + 1 - 3 = Tcycle - 2

The longest delay is Path2, and it must not exceed data_require, which gives the inequality:

6 <= Tcycle - 2, so Tcycle >= 8 ns

For register 2:

There is only one data path.

data_arrive_time = 1 + 2 + 2 + 2 + 2 = 9 ns

data_require_time = Tcycle + 1 + 1 - 3 = Tcycle - 1

data_require_time >= data_arrive_time, so Tcycle >= 10 ns

In summary, Tcycle >= 10 ns, so the maximum operating frequency of the clock is 100 MHz.
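The setup part of this example can be re-checked with a short script. The delay numbers are taken from the worked example above; the helper name `min_period` and the variable names are made up for this sketch.

```python
def min_period(data_arrive_time, capture_clock_delay, t_setup):
    """Smallest Tcycle satisfying data_arrive <= Tcycle + clock_delay - Tsu (ns)."""
    return data_arrive_time - capture_clock_delay + t_setup


# Register 1: worst of Path1 (din + AND gate) and Path2 (two buffers + Tcq + AND gate).
reg1_arrive = max(0.5 + 2, 1 + 1 + 2 + 2)
reg1_min_period = min_period(reg1_arrive, capture_clock_delay=1, t_setup=3)

# Register 2: single data path, capture clock goes through two buffers.
reg2_arrive = 1 + 2 + 2 + 2 + 2
reg2_min_period = min_period(reg2_arrive, capture_clock_delay=1 + 1, t_setup=3)

# The slowest register limits the whole circuit.
t_cycle = max(reg1_min_period, reg2_min_period)
f_max_mhz = 1000 / t_cycle  # period in ns gives frequency in MHz
```

Register 1 needs at least 8 ns and register 2 at least 10 ns, so the script arrives at the same 10 ns period and 100 MHz limit.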

The second question is to judge whether there is a timing violation (hold analysis) when the clock is 10MHz:

        First, the maximum operating frequency of the circuit is 100 MHz, so at 10 MHz there is certainly no setup violation. Hold is still unknown, so we need to check the hold timing of both registers.

For register 1:

Path1: 0.5 + 2 = 2.5 ns

Path2: 1 + 1 + 2 + 2 = 6 ns

data_require_time = 1 + 2 = 3 ns

Hold analysis requires the data to arrive after the hold window that follows the rising clock edge, so data_arrive_time >= data_require_time.

For Path1, 2.5 ns < 3 ns, so a hold violation occurs on Path1. For Path2, 6 ns >= 3 ns, so hold is met.

For register 2:

data_arrive_time = 1 + 2 + 2 + 2 + 2 = 9 ns

data_require_time = 1 + 1 + 2 = 4ns

9 > 4 satisfies the requirement, so register 2 has no hold violation.
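The hold part of the example can be re-checked the same way. The numbers come from the worked example; the helper name `hold_margin` and the variable names are made up for this sketch.

```python
def hold_margin(data_arrive_time, capture_clock_delay, t_hold):
    """data_arrive_time - data_require_time for a hold check (ns)."""
    data_require_time = capture_clock_delay + t_hold
    return data_arrive_time - data_require_time


# Register 1: both data paths are checked against 1 ns of clock buffer + 2 ns hold.
reg1_path1 = hold_margin(0.5 + 2, capture_clock_delay=1, t_hold=2)
reg1_path2 = hold_margin(1 + 1 + 2 + 2, capture_clock_delay=1, t_hold=2)

# Register 2: single path, capture clock goes through two buffers.
reg2 = hold_margin(1 + 2 + 2 + 2 + 2, capture_clock_delay=1 + 1, t_hold=2)

# A negative margin means a hold violation.
violations = [name for name, margin in
              [("reg1_path1", reg1_path1), ("reg1_path2", reg1_path2), ("reg2", reg2)]
              if margin < 0]
```

Only `reg1_path1` ends up in `violations`, with a margin of -0.5 ns, which matches the conclusion above. The clock period never enters the calculation.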

    In practice the EDA tool performs STA for us and reports whether setup/hold is met on every path. In a real project we only need to read the timing report, see whether any violation occurs, and if so find the cause and fix it.

STA self-test

1. After tape-out, testing suggests there is a setup violation. How can this be confirmed? (Reduce the clock frequency: clock_period becomes larger and the setup inequality is easier to satisfy. If the circuit goes from failing to stable when the frequency is reduced, a setup violation is likely. If the PLL setting is fixed and the clock frequency cannot be adjusted, you can also adjust ck_to_q and logic_delay, for example through different process libraries or PVT conditions.)

2. After tape-out, testing suggests there is a hold violation. How can this be confirmed? (Same idea as the setup test: a hold violation means the hold inequality is not satisfied, so find a way to make it hold, for example by changing PVT or increasing the logic delay.)

3. What is setup time, and what is hold time?

4. Write the pass/fail formulas for the setup check and the hold check.

5. What to do if setup violation occurs?

6. What to do if hold violation occurs? (Adjust parameters according to inequality, such as changing PVT, modifying logic_delay)

7. What is the difference between clock jitter and clock skew?

8. What is the difference between a synchronous clock and an asynchronous clock?

9. What should I do if I do not want STA between two asynchronous clocks? How to impose constraints?

10. If there are multiple combinational logic paths from register A to register B, how to analyze setup and hold?

11. When setting output_delay, can delay be negative? Why?

12. If you are given a circuit diagram and asked to analyze whether the setup and hold timing requirements are met, can you do it? If you are asked for the highest clock frequency the circuit can run at, can you calculate it?

Origin blog.csdn.net/qq_57502075/article/details/127551455