Reducing Power with Advanced Clock Tree Synthesis and Optimization
Designers need a CTS solution that concurrently addresses both power and timing by reducing power, skew, area, and buffer count.
Clock trees pose a growing challenge in advanced-node IC design, particularly with regard to chip power consumption. Clocks are the single largest source of dynamic power usage, which makes clock tree synthesis (CTS) and optimization a good place to achieve significant power savings.
The traditional tools and techniques for CTS too often work against the goals of lower power by over-buffering clock trees in order to achieve aggressive clock skew targets. In today’s leading-edge designs, CTS is further complicated by two relatively recent developments: the explosion in the number of modes, corners, and power domains across which the clock must operate, and the increasing resistance and variation in resistance between design corners.
These factors have further complicated clock network implementation and demand new CTS technologies. Designers need a CTS solution that concurrently addresses both power and timing. It has become essential to have power-aware, multi-corner multi-mode (MCMM) CTS with smart clock gate handling, slew shaping, register clumping, and other advanced techniques for reducing power, skew, area, and buffer count.
Multiple Modes, Corners, Power Domains Impact Clock Power
Variability associated with multiple design modes, process corners, and power states makes balancing clocks more challenging than ever. Using a CTS engine that cannot efficiently and accurately represent more than a couple of mode/corner scenarios leads to errors due to multiple manual CTS runs, longer design times, and lost performance and power because of over-buffering and over-margining required when mode/corner/voltage scenarios are processed serially. The business cost is seen in lost market windows and higher design costs as teams try to meet power and timing budgets. However, significant clock power can be reclaimed using advanced MCMM clock tree synthesis (CTS) and optimization.
Process Scaling Effects on Clock Power
At smaller geometries, resistance per unit length of interconnect is increasing rapidly relative to capacitance and, if not addressed, could degrade circuit performance and clock trees. In addition to the increasing resistance, the variation of these values is also increasing, as illustrated in Figure 1.
Figure 1: Resistance, shown on the left, and variations in resistance, shown on the right, increase dramatically with shrinking process nodes.
A major effect of the growing variability of resistance is that it leads to large differences in clock skew across different corners. Designing a clock tree for one corner and manually iterating for the other corners produces sub-optimal results. Figure 2 shows that skew variation is much higher when only one corner is considered during CTS, and is markedly reduced as more corners are analyzed concurrently. This illustrates the need for new CTS technology that can optimize skews across all corners concurrently.
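To make the corner-to-corner skew effect concrete, the following minimal sketch computes global skew per corner from sink insertion delays. The corner names and delay values are hypothetical and chosen only for illustration; they are not taken from any figure in this article.

```python
from typing import Dict, List


def skew(latencies_ps: List[float]) -> float:
    """Global skew: spread between the latest and earliest sink arrival."""
    return max(latencies_ps) - min(latencies_ps)


def skew_by_corner(latency_ps: Dict[str, List[float]]) -> Dict[str, float]:
    """Compute the skew of the same clock tree in every corner."""
    return {corner: skew(values) for corner, values in latency_ps.items()}


# Hypothetical insertion delays (ps) for the same four sinks in three corners.
# The tree is well balanced in the "typ" corner it was built for.
latency_ps = {
    "typ":      [500.0, 505.0, 498.0, 502.0],
    "rc_worst": [640.0, 690.0, 655.0, 610.0],
    "rc_best":  [410.0, 395.0, 420.0, 402.0],
}

per_corner = skew_by_corner(latency_ps)
worst_corner = max(per_corner, key=per_corner.get)
print(per_corner, worst_corner)
```

In this made-up data the tree looks balanced in the corner it was designed for, while the other corners show much larger skew. A concurrent multi-corner flow would optimize against the worst of these values rather than against a single corner.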
Low-Power CTS Techniques
Figure 2: Resistance variation causes large differences in clock skew across different corners.
Clock power consumption is a function of capacitance, switching activity, and wire length. Low-power CTS strategies include lowering overall capacitance and minimizing switching activity. Some of the advanced techniques that help address power, and also timing, are listed below.
- Reducing functional skew and skew across corners by using MCMM CTS
- Lowering leaf cluster capacitance with register clumping and clock gate cloning and de-cloning
- Improving clock gating coverage with netlist-level gating, hierarchical gating, and activity-based gating
- Minimizing switching activity with smart clock gate placement
Additional features, such as slew shaping and the ability to define skew groups, are also beneficial in reining in clock power. Slew shaping techniques push the majority of cases closer to the target slew, eliminate transitions that are overly pessimistic, and meet timing requirements while minimizing dynamic power.
However, getting the best power results from CTS depends on the ability to synthesize the clocks for multiple corners and modes concurrently in the presence of design and manufacturing variability. Multi-corner CTS can measure early and late clock network delays over all process corners concurrently with both global and local variation accounted for. MCMM CTS can make dynamic tradeoffs between either buffering the wire, or assigning it to less resistive layers in order to achieve the best delay, area, and power.
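The dependence of clock power on capacitance and switching activity described in this section follows the standard first-order dynamic power relation, P = activity x C x Vdd^2 x f. The sketch below applies it to a clock network. The function name, the numeric values, and the interpretation of activity as the fraction of cycles in which the clock is not gated off are assumptions made for illustration.

```python
def clock_dynamic_power(cap_farads: float, vdd_volts: float,
                        freq_hz: float, activity: float = 1.0) -> float:
    """First-order dynamic power in watts: activity * C * Vdd^2 * f.

    activity is taken here as the fraction of cycles in which the clock
    actually toggles (1.0 for an ungated clock); this is an assumption
    of the sketch, not a tool-specific definition.
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be between 0 and 1")
    return activity * cap_farads * vdd_volts ** 2 * freq_hz


# Hypothetical clock network: 100 pF total, 0.9 V supply, 1 GHz.
ungated_w = clock_dynamic_power(100e-12, 0.9, 1e9)
# The same network when gating leaves the clock active 40% of the time.
gated_w = clock_dynamic_power(100e-12, 0.9, 1e9, activity=0.4)
print(f"ungated: {ungated_w * 1e3:.1f} mW, gated: {gated_w * 1e3:.1f} mW")
```

Both levers named above appear directly in the expression: removing buffers and wire lowers the capacitance term, and clock gating lowers the activity term.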
MCMM CTS for Minimal Clock Skew
Figure 3: CTS with Olympus-SoC, MCMM vs. single-corner. MCMM CTS produces better skew control, fewer buffers, lower power, and the best clock tree for all mode/corner scenarios.
The effect of MCMM CTS is to minimize functional skew and skew variation across corners. The CTS engine should allow easy setup and accurate representation of all mode/corner/power scenarios, then analyze, synthesize and optimize them concurrently using a single, unified timing graph.
The experiences of designers using MCMM CTS show significant reduction in area, number of buffers, skew, total negative slack (TNS) and worst negative slack (WNS), in addition to lower dynamic power. The example in Figure 3 shows how, for a given mode, a single-corner CTS implementation compares with a 9-corner CTS implementation for a 9-corner design.
Fast Extraction and Register Clumping to Lower Leaf Cluster Capacitance
The leaf clusters (wires and pins) carry most of the capacitance in the clock tree. Being able to incrementally update RC extraction during CTS allows for leaf clustering that minimizes capacitance and therefore reduces power. The CTS tool should also work in concert with on-the-fly global routing during clock buffer insertion so that the CTS engine sees a more accurate topology. Clumping registers during placement also helps minimize capacitance on the clock tree network.
Clock Gating Minimizes Clock Switching
Figure 4: Tradeoffs in clock gate placement for lower power
Clock gating is a common technique for reducing clock power by shutting off the clock to unused sinks. Designers can perform netlist-level clock restructuring to improve clock gating coverage. An advanced CTS tool can identify missed clock gating opportunities by analyzing enable functions, finding common sub-terms and intelligently determining the optimal number of gates that still meet setup constraints. Placement of the clock gates is also critical to ensure that both the timing and power targets are being met. Along with power-aware clock gate placement, CTS should also automatically perform clock gate cloning and de-cloning to optimize and balance the load on the clock tree network.
As illustrated in Figure 4, if the probability of switching is equal on both sides of a clock gate, CTS should balance the tree for the best buffer count and lowest wire length. If the toggle rates on the two sides of a clock gate differ, CTS should minimize the length of the high-activity wires to lower power, even at the expense of a higher buffer count or greater total wire length.
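The placement tradeoff can be sketched with a deliberately simple one-dimensional model: a clock source at one end of a wire, the gated sinks at the other, and a clock gate somewhere in between. Wire upstream of the gate toggles every cycle, while wire downstream toggles only when the enable is active. The model, the function name, and the per-micron capacitance value are all assumptions for illustration, not the cost function of any particular CTS tool.

```python
def switched_wire_cap(gate_pos_um: float, total_len_um: float,
                      enable_prob: float,
                      cap_per_um: float = 0.2e-15) -> float:
    """Activity-weighted wire capacitance (farads) for one gated branch.

    gate_pos_um is the distance from the clock source to the clock gate.
    Upstream wire toggles every cycle; downstream wire toggles only with
    probability enable_prob. cap_per_um is a hypothetical value.
    """
    upstream_um = gate_pos_um
    downstream_um = total_len_um - gate_pos_um
    return cap_per_um * (1.0 * upstream_um + enable_prob * downstream_um)


# Hypothetical 1000 um branch whose enable is active 10% of the time.
near_source = switched_wire_cap(100.0, 1000.0, enable_prob=0.1)
near_sinks = switched_wire_cap(900.0, 1000.0, enable_prob=0.1)
print(near_source < near_sinks)  # gate near the source switches less wire

# With equal activity on both sides, gate position no longer matters here.
equal_a = switched_wire_cap(100.0, 1000.0, enable_prob=1.0)
equal_b = switched_wire_cap(900.0, 1000.0, enable_prob=1.0)
```

When activity differs across the gate, shortening the always-toggling segment lowers the switched capacitance. When activity is equal on both sides, this model is indifferent to gate position, which is when balancing for buffer count and wire length becomes the deciding factor.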
Skew Groups Improve Skew Balancing
Traditionally, CTS engines attempt to achieve zero skew by balancing the signal arrival time across all the flops, regardless of which level of the clock tree they inhabit. However, not all clock end points need to be balanced with each other. To balance different sets of clock end points, designers have to manually craft multiple CTS specs and perform multiple CTS runs. This method is time-consuming and error-prone.
A better way is for the CTS engine to analyze flop interactions to derive the exact skew balancing requirements at the different clock tree levels, and also across different voltage islands. From a given clock, it could then balance only the end points of the defined skew groups in a single call to the CTS engine. The tool should be able to discover skew groups by analyzing connected components in the timing data structure. Using skew groups saves processing time because the tool isn’t trying to balance clock endpoints that need not be balanced. It also reduces the number of buffers inserted, and eliminates manual CTS specifications and multiple CTS runs.
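Discovering skew groups from connected components can be sketched with a union-find structure: flops that launch or capture a common timing path end up in the same group, and flops with no path between them do not need to be balanced against each other. The flop names and the representation of timing paths as launch/capture pairs are hypothetical simplifications; a real timing graph carries far more information.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_skew_groups(flops: Iterable[str],
                     timing_paths: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group flops into connected components of the launch/capture graph.

    Every flop named in timing_paths must also appear in flops.
    """
    parent: Dict[str, str] = {flop: flop for flop in flops}

    def find(x: str) -> str:
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Flops connected by a timing path must share a skew group.
    for launch, capture in timing_paths:
        parent[find(launch)] = find(capture)

    groups: Dict[str, Set[str]] = defaultdict(set)
    for flop in parent:
        groups[find(flop)].add(flop)
    return list(groups.values())


# Hypothetical design: a-b-c interact, d-e interact, no path between the sets.
groups = find_skew_groups(
    ["a", "b", "c", "d", "e"],
    [("a", "b"), ("b", "c"), ("d", "e")],
)
print(sorted(sorted(g) for g in groups))
```

Each returned set is a candidate skew group. Balancing only within each set is what allows the engine to avoid inserting buffers to equalize end points that never exchange data.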
Slew Shaping for Optimal Timing-Power Tradeoff
Slew is the clock transition time, or how long it takes for the clock to switch. Slower slew leads to more timing violations and signal integrity (SI) issues, but uses less power. Faster slew draws more power but can improve timing and SI. For any given clock tree, CTS must find the slew that gives the best tradeoff between timing and power. Traditional flows tend to be pessimistic, which is safe for timing and SI but not effective for power reduction. Intelligent slew shaping techniques help reach a target at which both the timing requirements and the power budget for a given design are met.
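One way to picture the timing-power tradeoff in slew shaping is as a driver selection problem: choose the weakest buffer that still meets the target slew, rather than the strongest one available. The sketch below assumes a single-pole RC model, in which the 10%-90% transition time is ln(9) x R x C, and uses a hypothetical four-cell buffer library with input capacitance as a rough proxy for power cost. None of the cell names or values come from a real library.

```python
import math
from typing import List, NamedTuple, Optional


class Buffer(NamedTuple):
    name: str
    drive_res_ohm: float   # hypothetical effective drive resistance
    input_cap_ff: float    # used here as a rough proxy for power cost


def output_slew_ps(buf: Buffer, load_ff: float) -> float:
    """10%-90% transition of a single-pole RC: ln(9) * R * C.

    ohm * fF = 1e-15 s, i.e. 1e-3 ps.
    """
    return math.log(9) * buf.drive_res_ohm * load_ff * 1e-3


def pick_buffer(library: List[Buffer], load_ff: float,
                target_slew_ps: float) -> Optional[Buffer]:
    """Return the cheapest buffer that meets the target slew, if any."""
    feasible = [b for b in library
                if output_slew_ps(b, load_ff) <= target_slew_ps]
    return min(feasible, key=lambda b: b.input_cap_ff) if feasible else None


LIBRARY = [
    Buffer("X1", 4000.0, 1.0),
    Buffer("X2", 2000.0, 2.0),
    Buffer("X4", 1000.0, 4.0),
    Buffer("X8", 500.0, 8.0),
]

relaxed = pick_buffer(LIBRARY, load_ff=20.0, target_slew_ps=100.0)
pessimistic = pick_buffer(LIBRARY, load_ff=20.0, target_slew_ps=40.0)
print(relaxed.name, pessimistic.name)
```

In this toy library, an overly tight slew target forces a driver four times larger than a realistic target does. That gap is the kind of margin slew shaping aims to recover.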
In advanced-node designs, clock trees have become extremely complex circuits, with different clock tracing for each circuit mode of operation. The growth of mode/corner/power states and the large variations in resistance seen across process corners pose new challenges in minimizing the power used by clock trees. Designers need clock synthesis and optimization tools that are built to handle MCMM scenarios and that use advanced CTS techniques like slew shaping, skew groups, and intelligent clock gating. With MCMM and low-power CTS optimizations, designers can reclaim a significant amount of power from their clock trees without sacrificing area, timing, performance, or time to closure.
Arvind Narayanan is a product marketing manager at Mentor Graphics. He holds a Masters in Electrical Engineering from Mississippi State University and a Masters in Business Administration from Duke University. He has held various design, application engineering, and marketing positions and has been very active in the UPF initiative.
Mentor Graphics Corporation