## Short Papers\_ # Estimation of Average Switching Activity in Combinational Logic Circuits Using Symbolic Simulation José Monteiro, Srinivas Devadas, Abhijit Ghosh, Kurt Keutzer, and Jacob White Abstract—We address the problem of estimating the average switching activity of combinational circuits under random input sequences. Switching activity is strongly affected by gate delays, and for this reason we use a variable delay model in estimating switching activity. Unlike most probabilistic methods that estimate switching activity, our method takes into account correlation caused at internal gates in the circuit due to reconvergence of input signals. This method assumes a particular delay model and further assumes that the primary inputs to the combinational circuit are uncorrelated. Both these assumptions can be relaxed at the cost of increased complexity. We describe extensions to handle transmission gates and inertial delays in this paper. Index Terms—Power dissipation, probabilistic analysis, symbolic simulation. #### I. Introduction For many consumer electronic applications, low average power dissipation is desirable, and for certain special applications low power dissipation is of critical importance. For personal communication applications such as hand-held mobile telephones, low-power dissipation may be the tightest constraint in the design. More generally, with the increasing scale of integration, we believe that power dissipation will assume greater importance, especially in multichip modules where heat dissipation is one of the biggest problems. We address the problem of estimating the **average switching activity** in combinational circuits given random input sequences. This measure is related to the average power dissipation of a circuit and can be used to make architectural or design-style decisions during the very large scale integrated (VLSI) circuit synthesis process. These measures can also be used to drive logic optimization methods that target low-power dissipation. Probabilistic methods for power or current estimation are attractive because statistical estimates can be obtained without recourse to time-consuming exhaustive simulation. In the past, probabilistic peak current estimation methods (e.g., [5]) that compute probabilistic current waveforms for combinational circuits have been developed. Estimation of worst case power dissipation (e.g., [7] and [8]) is a difficult problem requiring a branch-and-bound search, and these methods have been applied to small to moderate sized circuits. Manuscript received February 6, 1995; revised February 17, 1996. This work was supported in part by the Defense Advanced Research Projects Agency under contract N00014-91-J-1698 and in part by the National Science Foundation under the NYI Program with matching funds from Mitsubishi and IBM Corp. This paper was recommended by Associate Editor, F. Somenzi. J. Monteiro, S. Devadas, and J. White are with the Massachusetts Institute of Technology, Cambridge, MA 02139 USA. A. Ghosh was with Mitsubishi Electric Research Laboratories, Sunnyvale, CA 94086 USA. He is now with Synopsys Inc., Mountain View, CA 94043. K. Keutzer is with Synopsys Inc., Mountain View, CA 94043 USA. Publisher Item Identifier S 0278-0070(97)02058-7. Our main contribution in this paper is to show that based on some realistic assumptions about transistor-level behavior, the problem of estimating switching activity in combinational circuits can be reduced to one of computing signal probabilities<sup>1</sup> of a multilevel circuit derived from the original circuit by a process of **symbolic simulation**. The work closest to ours is the transition density calculation method by Najm [12]. Transition densities correspond to average switching rates for gates in the circuit. In [12], an interconnection of combinational logic modules, each with a certain delay, makes up a circuit. Transition densities are propagated through combinational logic modules without regard to their structure. Correlations between internal lines due to reconvergence are ignored during propagation. It is possible to take into account correlations by lumping all the modules into one large module, but in this case the information regarding the delay of the individual modules is lost. For an excellent review of power estimation methods, the reader is referred to [13]. Our work on switching activity estimation, originally presented in [9], has the following important characteristics. We use a variable delay model for combinational logic in our symbolic simulation method, which correctly computes the Boolean conditions that cause glitching (multiple transitions at a gate) in the circuit. In some cases, glitching may account for a significant percentage of the switching activity. Symbolic simulation produces a set of Boolean functions that represent the conditions for switching at different time points for each gate in the circuit. Given input switching rates, we can use various exact or approximate methods to compute the probability of each gate switching at any particular time point. We then sum these probabilities over all the gates to obtain the expected switching activity in the entire circuit over all the time points corresponding to a clock cycle. Our method takes into account correlation caused at internal gates in the circuit due to reconvergence of input signals (reconvergent fan-out). The rest of this paper is organized as follows. In Section II, we first review the physical model for power estimation, introduce basic terminology, and state all the key assumptions of our work. In Section III, we describe a symbolic simulation method that computes the Boolean conditions required for a gate to switch at each time point during a clock cycle under unit and variable delay models. We describe some extensions to this method in Section IV. In Section V, we present results on several circuits, including those from the ISCAS-89 benchmark set. We conclude in Section VI. #### II. PRELIMINARIES AND ASSUMPTIONS #### A. A Power Dissipation Model In this section, we observe that under a simplified model of the energy dissipation in CMOS circuits, the energy dissipation of a CMOS circuit is directly related to the switching activity. In particular the three simplifying assumptions are the following. 1) The only capacitance in a CMOS logic-gate is at the output node of the gate. <sup>1</sup>The signal probability of a wire in a circuit is defined as the probability that a randomly generated input vector will produce a one value at the wire [1]. - 2) Either current is flowing through some path from $V_{DD}$ to the output capacitor, or current is flowing from the output capacitor to ground. - 3) Any change in a logic-gate output voltage is a change from $V_{DD}$ to ground or vice-versa. All of these are reasonably accurate assumptions for well-designed CMOS gates [10] and when combined imply that the energy dissipated by a CMOS logic gate each time its output changes is roughly equal to the change in energy stored in the gate's output capacitance. If the gate is part of a synchronous digital system controlled by a global clock, it follows that the average power dissipated by the gate is given by $$P_{\text{avg}} = 0.5 \times C_{\text{load}} \times (V_{dd}^2/T_{\text{cvc}}) \times E(\text{transitions})$$ (1) where $P_{\rm avg}$ denotes the average power, $C_{\rm load}$ is the load capacitance, $V_{dd}$ is the supply voltage, $T_{\rm cyc}$ is the global clock period, and $E({\rm transitions})$ is the *expected value* of the number of gate output transitions per global clock cycle [12] or equivalently the average number of gate output transitions per clock cycle. All of the parameters in (1) can be determined from technology or circuit layout information except $E({\rm transitions})$ , which depends on both the logic function being performed and the statistical properties of the primary inputs. #### B. Static Probabilities Consider the case of dynamic CMOS logic (e.g., Domino). At the beginning of each clock cycle, all the gates are precharged, and gates make transitions only if their associated Boolean functions are satisfied. For example, a three-input AND-OR gate's Boolean function might be $$(i_1 \cdot i_2) \lor (i_2 \cdot i_3) \tag{2}$$ where $i_1$ , $i_2$ , and $i_3$ are primary inputs. In this case, the expected value of the number of transitions at this gate's output is $$E(\text{transitions}) = 2 \times P((i_1 \cdot i_2) \vee (i_2 \cdot i_3) = 1)$$ (3) where P(x) is defined as probability that x is true, and the factor of two in the equation accounts for the reset transition during precharge. To evaluate (3), it is necessary to determine the primary input probabilities. We assume that *primary inputs are uncorrelated* and that each is a waveform in time whose value is either zero or one, changing instantaneously at global clock edges. Assuming ergodicity without further comment, the probability of a particular input $i_j$ being one at a given point in time, denoted $p_j^{\rm one}$ , is given by $$p_j^{\text{one}} = \lim_{N \to \infty} \frac{\sum_{k=1}^{N} i_j(k)}{N}$$ (4) where N is the total number of global clock cycles, and $i_j(k)$ is the value of input $i_j$ during clock cycle k. Clearly, the probability that $i_j$ is zero at a given point in time, denoted $p_j^{\rm zero}$ is $$p_j^{\text{zero}} = 1 - p_j^{\text{one}}.$$ We refer to $p_i^{\text{zero}}$ and $p_i^{\text{one}}$ as **static probabilities**. Note that $$P((i_1 \cdot i_2) \lor (i_2 \cdot i_3) = 1) \neq p_1^{\text{one}} p_2^{\text{one}} + p_2^{\text{one}} p_3^{\text{one}}$$ because the first and second product terms are not independent. Rather $$P((i_1 \cdot i_2) \lor (i_2 \cdot i_3) = 1) = P((i_1 \cdot i_2) \lor (\overline{i_1} \cdot i_2 \cdot i_3) = 1)$$ $$= p_1^{\text{one}} p_2^{\text{one}} + p_1^{\text{zero}} p_2^{\text{one}} p_3^{\text{one}}$$ where the second equality holds because $i_1 \cdot i_2$ is disjoint from $\overline{i_1} \cdot i_2 \cdot i_3$ . In this example, $(i_1 \cdot i_2) \vee (\overline{i_1} \cdot i_2 \cdot i_3)$ represents a disjoint cover for the logic function, and the terms $i_1 \cdot i_2$ and $\overline{i_1} \cdot i_2 \cdot i_3$ are referred to as *cubes* in the cover [2]. The equivalent logical expression, $(i_1 \cdot i_2) \vee (i_2 \cdot i_3)$ , does not represent a disjoint cover because $i_1 \cdot i_2 \cdot i_3$ is contained in both cubes $i_1 \cdot i_2$ and $i_2 \cdot i_3$ . In general, given a disjoint cover for a Boolean function of uncorrelated inputs described by static probabilities, it is easy to determine the probability of the function evaluating to a one. The procedure is given in the following two theorems whose proof follows directly from elementary probability [14]. Theorem 2.1: Given any disjoint cover for a Boolean function, the probability of the function evaluating to a one is equal to the sum of the probabilities of each of the cubes in the cover evaluating to a one. Theorem 2.2: Given a logical function of uncorrelated inputs in the form $$f = i_{\alpha_1} \cdot i_{\alpha_2} \cdots i_{\alpha_M} \cdot \overline{i_{\beta_1}} \cdot \overline{i_{\beta_2}} \cdots \overline{i_{\beta_N}}$$ where the $i_{\alpha_j}$ 's are nonnegated inputs, and the $\overline{i_{\beta_k}}$ 's are negated inputs, then $$P(f=1) = p_{\alpha_1}^{\text{one}} \cdot p_{\alpha_2}^{\text{one}} \cdots p_{\alpha_M}^{\text{one}} \cdot p_{\beta_1}^{\text{zero}} \cdot p_{\beta_2}^{\text{zero}} \cdots p_{\beta_N}^{\text{zero}}.$$ #### C. Transition Probabilities For static CMOS combinational logic, a gate output can only change when its inputs change, and then only if the Boolean function describing the gate evaluates differently. For example, a two-input AND gate's output will change between clock cycle t and t+1 if $$(i_1(t) \cdot i_2(t)) \oplus (i_1(t+1) \cdot i_2(t+1))$$ (5) evaluates to one, where $i_1(t)$ , $i_2(t)$ , and $i_1(t+1)$ , $i_2(t+1)$ are the inputs to the gate at clock cycle t and t+1, respectively. The disjoint cover for (5) is $$(i_{1}(t) \cdot i_{2}(t)) \cdot \overline{(i_{1}(t+1))}$$ $$\vee (i_{1}(t) \cdot i_{2}(t)) \cdot (i_{1}(t+1) \cdot \overline{i_{2}(t+1)})$$ $$\vee \overline{i_{1}(t)} \cdot (i_{1}(t+1) \cdot i_{2}(t+1))$$ $$\vee (i_{1}(t) \cdot \overline{i_{2}(t)}) \cdot (i_{1}(t+1) \cdot i_{2}(t+1)).$$ (6) It is not possible to use Theorem 2.2 to evaluate the probability of (6), because an input at time t+1 is correlated to its behavior at time t (as in a sequential circuit). Instead, **transition probabilities** for the transitions $0 \to 0$ , $0 \to 1$ , $1 \to 0$ , and $1 \to 1$ must be used. We denote these transition probabilities by $p_j^{00}$ , $p_j^{01}$ , $p_j^{10}$ , and $p_j^{11}$ , respectively, where for example $p_j^{10}$ is defined by $$p_j^{10} = \lim_{N \to \infty} \frac{\sum_{k=1}^{N} i_j(k) \overline{i_j(k+1)}}{N}.$$ (7) The other transition probabilities follow similarly. Static probabilities can be computed from transition probabilities, but *not* vice-versa, because of correlation between one time frame and the next. So in general it is necessary to specify transition probabilities. The relations between static probabilities and transition probabilities follow directly from the definitions in (4) and (7), specifically $$p_j^{\text{zero}} = p_j^{00} + p_j^{01}$$ $$p_j^{\text{one}} = p_j^{11} + p_j^{10}.$$ (8) Both static and transition probabilities are used to compute $E({\rm transitions})$ for static logic circuits, as can be seen from the expression for the probability that (6) evaluates to a one $$p_1^{10} \cdot p_2^{\text{one}} + p_1^{11} \cdot p_2^{10} + p_1^{01} \cdot p_2^{\text{one}} + p_1^{11} \cdot p_2^{01}.$$ (9) For all primary inputs, we assume that successive input vectors are uncorrelated.<sup>2</sup> A one or a zero may be equally likely, in which case all transition probabilities may be assumed to be 0.25, and all static probabilities to be 0.5. It is also possible that a one or a zero at a particular primary input are not equally likely resulting in nonuniform transition and static probabilities. We assume that these probabilities are provided by the user. #### D. General Combinational Networks The algorithm for computing the average power dissipated in a dynamic CMOS combinational network follows directly from the approach in Section II-B. That is, for each gate $g_i$ in a logical network, we first determine the Boolean function of the gate, $f_i$ , in terms of the network's primary inputs. Then we find a disjoint cover for $f_i$ and use Theorems 2.1 and 2.2 to evaluate $P(f_i = 1)$ . The average power dissipation for the network is then $$\begin{split} P_{\rm avg} &= 0.5 \times \left(V_{dd}^2/T_{\rm cyc}\right) \times \sum C_i \times E \, (\text{transitions of } g_i) \\ &= \left(V_{dd}^2/T_{\rm cyc}\right) \times \sum C_i \times P(f_i = 1) \end{split} \tag{10}$$ where $C_i$ is the load capacitance of the *i*th gate, and the summation is over all gates in the circuit. Note that (10) follows from the fact that the average value of a sum is equal to the sum of the average values, regardless of correlation [14]. A similar overall approach can be used to compute the average power dissipated in a static CMOS combinational network. However, as described in Section II-C, a two-vector input sequence is required to stimulate activity in static gates. Therefore, the Boolean function for static gate output activity is different than that for a dynamic gate, and the probability of the function being satisfied requires transition probabilities for evaluation. In particular, assuming negligible gate delays, a static CMOS logic gate's output will switch with a change in the primary input vector from V0 to Vt if $$f_i = ((h_i(V0) = 0) \land (h_i(Vt) = 1))$$ $$\lor ((h_i(V0) = 1) \land (h_i(Vt) = 0))$$ (11) is satisfied, where $h_i$ is the logic function corresponding to gate $g_i$ 's output. The average power dissipated in the static network is then computed by using (10), with $f_i$ being given by (11), and $P(f_i = 1)$ is evaluated using both transition and static probabilities, as described in Section II-C. #### III. SYMBOLIC SIMULATION TO ESTIMATE SWITCHING ACTIVITY As mentioned above, for static CMOS circuits, switching activity must be analyzed based on considering an input vector pair, denoted $\langle V0,Vt\rangle$ . If the gates have appreciable delays, there may be output glitches that can contribute significantly to dissipated power. For instance, consider the circuit of Fig. 1. Assuming that the delays of the INVERTER and the AND gate are both one time unit, if we apply the vector $i_1 = 0$ , $i_2 = 0$ , followed by $i_1 = 1$ , $i_2 = 1$ , we will obtain a glitch at the output *out*, which could cause power dissipation. Below, we present a *symbolic simulation* method that can be used to generate a multiple-output function that represents the total switching activity over any possible input vector pair, assuming a variable delay model for the gates in the circuit. Fig. 1. Glitching in a static CMOS circuit. Fig. 2. A multilevel combinational circuit. #### A. Symbolic Simulation 1) An Example: Consider the multilevel combinational circuit shown in Fig. 2. It has four primary inputs and one output and consists of four CMOS gates. Fig. 3 shows the signal transitions at the primary input nodes as well as the possible transitions at the internal nodes of the network. The time points are in normalized units. The first three inputs, $i_1$ , $i_2$ , and $i_3$ , switch simultaneously between time periods zero and one. The fourth input, $i_4$ , is a late arriving signal that switches between the time points five and six. In this example, the delays of both Gate 1 and Gate 3 are one time unit. Gate 2 has a delay of two units while Gate 4 has a delay of four units. The $e_i$ waveforms in Fig. 3 represent the signals at the output of the ith logic gate. Each of the possible transitions, $e_{i,j}$ , represents either a low-to-high or high-to-low signal transition between $[j]^{\rm th}$ and $[j+1]^{\rm th}$ time points. The number of all possible transitions at a gate output may equal the sum of all possible transitions at the gate inputs. These transitions are delayed by the gate's propagation delay. 2) Unit-Delay Model: Even under the idealization of a unit-delay model, the gate output nodes of a multilevel network can have multiple transitions in response to a two-vector input sequence. In fact, it is possible for a gate output to have as many transitions as levels in the network. We construct the Boolean functions describing the gate outputs at the discrete points in time implied by the unit-delay model. That is, we consider only discrete times $t, t+1, \cdots, t+l$ , where t is the time when the inputs change from V0 to Vt, and l is the number of levels in the network. For each gate output i, we use symbolic simulation to construct the l+1 Boolean functions $f_i(t+j)$ , $j \in \{0, \cdots, l\}$ which evaluate to one if the gate's output is one at time t+j. Note that as we assume no gate has zero delay and that the network has settled before the inputs are changed from V0 to Vt, $f_i(t)$ is the logic function performed for V0 at the ith gate output. Finally, we can determine whether a transition occurs at a boundary between discrete time intervals t+j and t+j+1 by exclusive-OR'ing $f_i(t+j)$ with $f_i(t+j+1)$ . Consider the network in Fig. 1. For this network $$f_1(t) = \overline{V0_1}$$ $$f_2(t) = \overline{V0_1} \wedge V0_2.$$ Assuming both gates have unit delay $$f_1(t+1) = \overline{i_1(t)} = \overline{Vt_1}$$ $f_2(t+1) = f_1(t) \land i_2(t) = \overline{V0_1} \land Vt_2.$ <sup>&</sup>lt;sup>2</sup>This assumption is relaxed in the work of [15]. Fig. 3. Signal waveforms for the primary inputs and the outputs of the logic gates. Finally $$f_2(t+2) = f_1(t+1) \wedge i_2(t+1) = \overline{Vt_1} \wedge Vt_2.$$ For this circuit there are three possible transitions: that the IN-VERTER changes state from t to t+1, that the AND gate changes state from t to t+1, and that the AND gate changes state from t+1 to t+2. The Boolean equations for these transitions are, respectively $$e_{1,1} = f_1(t) \oplus f_1(t+1)$$ $$e_{2,1} = f_2(t) \oplus f_2(t+1)$$ $$e_{2,2} = f_2(t+1) \oplus f_2(t+2).$$ As an aside, note that for this example, the two-vector sequence $V0_1=0,\ V0_2=0,\ Vt_1=1,\ Vt_2=1$ simultaneously satisfies $e_{1,1},\ e_{2,1}$ , and $e_{2,2}$ . - 3) General Delay Model: A gate with a large fan-in may have several times the delay of an inverter. If one uses normalized time units, one can always introduce unit-delay buffers at the output of gates in a circuit, which have a delay greater than unity, in order to model differing delays among logic gates. Our symbolic simulator is able to simulate circuits with arbitrary gate transport delays without introducing any unit delay buffers in the circuit. - 4) Pseudocode for Symbolic Simulation: The pseudocode for the symbolic simulation algorithm is presented in Fig. 4. The simulator processes one gate at a time, moving from the primary inputs to the primary outputs of the circuit. For each gate $g_i$ , an ordered list of the possible transition times of its inputs is first obtained. Then, possible transitions at the output of the gate are derived, taking into account transport delays from each input to the gate output. The processing done is similar to the "time-wheel" in a timing simulator. #### B. Signal Probability Evaluation Given the $f_i(t+j) \oplus f_i(t+j+1)$ functions, we can construct a disjoint cover for each of these single-output functions and compute the exact probability of a transition occurring between the boundary of t+j and t+j+1. Instead of enumerating disjoint covers, ordered binary decision diagrams (BDD's) [4] can be used for the calculation of signal probability. It has been shown in [6] and [12] that exact signal probability calculation for a given function can be performed by a linear traversal of a BDD representation of a logic function. We have implemented methods for signal probability calculation using BDD's. In some cases, the BDD's for the generated functions may be too large. The signal probability calculation can be done by a process of random logic simulation. For each Boolean function $e_{i,j}$ a large number of randomly generated vectors are simulated on the network corresponding to the Boolean function till the signal probability value converges to within 0.1%. Levelized/event-driven simulation methods that simulate 32 vectors at a time can be used in an efficient probability evaluation scheme. The probabilities thus obtained are statistical approximations. A final alternative is to use the cutting algorithm described in [1, pp. 193–195], which computes signal probability bounds for outputs of logic gates in arbitrary combinational circuits, given input signal probabilities. The exact probability is guaranteed to lie within the computed bounds. The basic strategy requires a single traversal of the circuit from inputs to outputs, and the computational complexity grows linearly with the size of the circuit. If one does not wish to underestimate the average-case power dissipation/switching activity, the upper bound on the signal probabilities of the generated Boolean functions evaluating to a one can be used. If one does not wish to overestimate the average-case switching activity, the lower bound on the signal probabilities of the generated Boolean functions evaluating to a one can be used. ``` 1. Gates = Topological_Sort( Network ); 2. for each g_i in Gates { 3. \Delta = \text{delay of } g_i ; 4. TimePoints = NIL(LIST); 5. for each input g_j of g_i ( g_{i_1}, \dots, g_{i_m} ) { 6. if g_i is a primary input then TimePoints = InsertInOrder ( TimePoints, { (0, V0_j), (t, Vt_j) } ); 7. 8. else for each time point (k, f_j(k)) of g_j { 9. TimePoints = InsertInOrder ( TimePoints, (k, f_i(k)) ); 10. 11. 12. /* \tilde{g}_i is the Boolean function of gate g_i with respect to its immediate inputs */ 13. f_i(0) = \tilde{g}_i(f_{i_1}(0), \cdots, f_{i_m}(0)); 14. 15. for each new time point k in TimePoints { 16. f_i(k+\Delta) = \tilde{g}_i(f_{i_1}(k),\cdots,f_{i_m}(k)) ; 17. e_{i,k+\Delta} = f_l \oplus f_i(k+\Delta); 18. l=k+\Delta; 19. } 20. } ``` Fig. 4. Pseudocode for the symbolic simulation algorithm. #### IV. EXTENSIONS We describe extensions to the basic symbolic simulation strategy of the previous section to handle transmission gates and inertial delays. #### A. Transmission Gates In this section we describe our power estimation scheme for the case of combinational circuits with embedded transmission gates. Symbolic simulation can be extended to estimate average switching activity in such circuits as described below. Transmission gates have an input, an output, and a control line. When the control line is high, the outputs are identical to inputs. When the control line is low, however, the output is given by value stored in the previous time instant. It is this feature of having memory that makes transmission gate different from normal combinational gates like an AND gate. In mathematical terms, if a is the input, b the control, and x the output, then at any time instant t, the output of a transmission gate is given as $$x(t) = b(t) \cdot a(t) + \overline{b(t)} \cdot x(t-1) \tag{12}$$ where t-1 refers to the previous time instance. From the switching activity estimation viewpoint, the symbolic simulation approach handles transmission gates (or transparent latches) in a straightforward manner. Since x(t-1) is computed before x(t) in the simulation, we create functions corresponding to the different x(t)'s and use them in simulating the fan-out gates. We use the symbolic input b(t) during symbolic simulation of x(t). As the symbolic simulation proceeds, the known equations for the time points for each input are used, and the logic equations corresponding to the various transitions at the output of the latch are computed. As a result, in a single pass from inputs to outputs, switching activity estimation can be carried out for an acyclic circuit. If the initial value x(0) is known, it is replaced by the appropriate zero or one value during symbolic simulation. If the initial value x(0) is not known, it can be replaced by a Boolean variable with a signal probability of 0.5 during symbolic simulation. #### B. Inertial Delays Logic gates require energy to switch state. The energy in an input signal to a gate is a function of its amplitude and duration. If its duration is too small, the signal will not force the gate to switch. The minimum duration for which an input change must persist in order for the gate to switch states is called the **inertial delay** of an element and is denoted by $\delta$ [3, p. 187]. Inertial delay is usually modeled at the inputs to gates, however, for our purposes it is more convenient to model it at the gate output. We will assign an integer<sup>3</sup> $\Delta_i \geq 0$ to each gate i. $\Delta_i$ is obtained from process and device parameters like propagation delay. We then require that any pair of output transitions at i be separated by at least a duration $\Delta_i$ . The symbolic simulation proceeds as described in the previous sections to compute $f_i(t), \dots, f_i(t+l)$ . If we have $\Delta_i > 0$ , then if there is a transition between time t and t+1 we cannot have a transition between t+1 and $t+\Delta_i$ . Therefore, if we have three different time points $f_i(t_1)$ , $f_i(t_2)$ , and $f_i(t_3)$ within $\Delta_i$ from $t_1$ we make sure there are no transitions by making $f_i(t_2) = f_i(t_1)$ when $f_i(t_1) = f_i(t_3)$ . We create $$f_i'(t_2) = f_i(t_2) \wedge (f_i(t_1) \vee f_i(t_3)) \vee (f_i(t_1) \wedge f_i(t_3))$$ for every three time points within $\Delta_i$ . We compute $f_i'(t_3)$ using $f_i'(t_2)$ and $f_i(t_4)$ and so on. The $f_i'(t)$ functions are used as the inputs to the exclusive-or gates to compute the switching activities. Also, we use the $f_i'(t)$ functions for the next logic level, thus any transitions eliminated at the output of a gate are not propagated to its transitive fan-out. ### V. Power Estimation Results Throughout this section, we will be measuring the average power dissipation of the circuit by using (1) summed over all the gates in the circuit. The $E({\rm transitions})$ values are computed for the gates in the circuit under different delay models. Since the circuits are technology-mapped circuits, the load capacitance values of the gates are known. The statistics of the examples used are shown in Table I. All of the examples except the last two belong to the ISCAS-89 sequential benchmark set. Example **add16** is a 16-bit adder and **max16** is a 16-bit maximum function. <sup>3</sup>This number is obtained from process and device parameters like propogation delay. TABLE I STATISTCS OF EXAMPLES | CKT | Inputs | Outputs | Latches | Gates | |-------|--------|---------|---------|-------| | s27 | 4 | 1 | 3 | 10 | | s298 | 3 | 6 | 14 | 119 | | s349 | 9 | 11 | 15 | 150 | | s386 | 7 | 7 | 6 | 159 | | s420 | 19 | 2 | 16 | 196 | | s510 | 19 | 7 | 6 | 211 | | s641 | 35 | 24 | 19 | 379 | | s713 | 35 | 23 | 19 | 393 | | s838 | 35 | 2 | 32 | 390 | | s1238 | 14 | 14 | 18 | 508 | | s1494 | 8 | 19 | 6 | 647 | | add16 | 33 | 17 | 16 | 288 | | max16 | 33 | 16 | 16 | 154 | TABLE II POWER ESTIMATION FOR COMBINATIONAL LOGIC | CKT | Zero Delay | Unit Delay | Variable Delay | Time | | |-------|------------|------------|----------------|-------|-------| | - | Power | Power | Power | BDD | LOGIC | | s27 | 82 | 93 | 93 | 0.1 | 0.2 | | s298 | 922 | 1033 | 1069 | 5.2 | 2.3 | | s349 | 777 | 1094 | 1110 | 9.7 | 6.1 | | s386 | 1070 | 1183 | 1250 | 9.2 | 4.9 | | s420 | 877 | 940 | 958 | 12.0 | 5.2 | | s510 | 993 | 1236 | 1331 | 11.2 | 5.5 | | s641 | 1228 | 1594 | 1665 | 62.6 | 36.3 | | s713 | 1338 | 1847 | 1932 | 151.6 | 92.6 | | s838 | 1727 | 1822 | 1847 | 52.6 | 16.9 | | s1238 | 2394 | 3013 | 3158 | 115.1 | 43.7 | | s1494 | 3808 | 4762 | 5045 | 68.9 | 32.2 | | add16 | 1258 | 1725 | 1741 | 10.4 | 6.1 | | max16 | 599 | 713 | 713 | 4.2 | 1.6 | All the circuits considered are technology-mapped static CMOS circuits. For all the circuits, we assumed uniform static (0.5) and transition (0.25) probabilities for the primary inputs. Note, however, that user-provided nonuniform probabilities could just as easily been used. We focus on estimating switching activity and power dissipation in the *combinational logic* of the given circuits. In Table II, the effects of various delay models on the power estimate are illustrated. In the zero delay model, all gates have zero delay and, therefore, they switch instantaneously. In the unit delay model, all gates have one unit delay. Using the zero delay model ignores glitches in the circuit, and, therefore, power dissipation due to glitches is not taken into account. The unit delay model takes into account glitches, but a constant delay value is assumed for all gates. The variable delay model uses different delays for different gates, thus is the most realistic model. Only the times required to obtain the power estimate for the variable delay model are shown in the last column. The variable delay computations are the most complex and, therefore, power estimation under this model takes the most time. The central processing unit (CPU) times correspond to a DEC 3000/900 with 256 megabytes of memory and are in seconds. The signal probability calculation was done using two different methods. The column BDD corresponds to exact signal probability evaluation of the $e_{i,j}$ functions of Section III using ordered binary decision diagrams. The orderings were obtained automatically using a heuristic strategy. It is possible to speed up the power estimation by 2X to 5X by using hand orderings. Using random logic simulation to evaluate signal probabilities required substantially less CPU time for the large examples as shown in the column LOGIC. Random logic simulation was carried out until the signal probability of each $e_{i,j}$ function converged to within 0.1%. This required the simulation of between 1000–50 000 vectors for the different examples. The power measures obtained using the two methods BDD and LOGIC are identical. These estimates, in microwatts, have been obtained assuming a clock frequency of 20 MHz and a supply voltage of $5~\rm V$ For each circuit, we compared the power estimates obtained by our method to that obtained by timing simulation. A power estimate was obtained by timing simulation using randomly generated input vectors. The same power dissipation model and the same delay model was used in the timing simulation and in symbolic simulation. In order to get within 2% of the BDD-based symbolic simulation estimate, timing simulation sometimes required substantially larger CPU times. Note that under our delay model assumption, it is the timing simulation estimate that is in error, since the symbolic simulation takes into account correlation in internal gates and the BDD-based method computes exact signal probabilities. If random timing simulation is run for a very long time (e.g., 5 h for the large circuits in Table I), the results obtained become identical to the symbolic simulation results. Timing simulation can easily take into account effects such as rise times and fall times. This is more difficult to do in symbolic simulation and increases the complexity of the approach. #### VI. CONCLUSION, LIMITATIONS, AND ONGOING WORK We have presented algorithms for probabilistically estimating the switching activity in combinational logic circuits, implemented either as dynamic or as static circuits. Results indicate that our algorithms are applicable to circuits of moderate size. The most desirable feature of our algorithm is that correlation in internal signals is taken into account under a variable delay model. In order to perform exact signal probability evaluation, we use ordered BDD's. Ordered BDD's cannot be built for large multipliers (≥ 16 bits) and for very large circuits. Approximate techniques (Section III-B) have to be used in these cases. Our experience with random logic simulation for signal probability evaluation has been favorable Given the delay model chosen, our BDD-based method of estimating switching activity is exact. The assumptions stated in Section II are required in order for the switching activity of a CMOS circuit to reflect the power dissipation of the circuit. These assumptions and the delay model assumptions can be relaxed, but this will result in more complex symbolic simulation methods. For example, rise and fall times can be modeled by using a multiple-valued algebra. Under the variable delay model, BDD-based symbolic simulation is usually significantly more efficient as well as being more accurate than random timing simulation. However, this is not true for more sophisticated delay models. Correlation between primary inputs exists when a given combinational circuit is embedded in a larger sequential circuit. The techniques described have to be augmented to handle sequential circuits and primary input correlation, and efforts in this direction are presented in [11] and [15]. #### ACKNOWLEDGMENT The authors thank M. Bershetyn, J. Rabaey, and S. Murai for helpful comments and suggestions and many interesting discussions on power estimation. #### REFERENCES - [1] P. H. Bardell, W. H. McAnney, and J. Savir, *Built-In Test for VLSI: Pseudorandom Techniques*. New York: Wiley, 1987. - [2] R. Brayton, G. Hachtel, C. McMullen, and A. Sangiovanni-Vincentelli, Logic Minimization Algorithms for VLSI Synthesis. Norwell, MA: Kluwer, 1984. - [3] M. A. Breuer and A. D. Friedman, Diagnosis and Reliable Design of Digital Systems. Rockville, MD: Computer Science, 1976. - [4] R. Bryant, "Graph-based algorithms for Boolean function manipulation," IEEE Trans. Comput., vol. C-35, no. 8, pp. 677–691, Aug. 1986. - [5] R. Burch, F. Najm, P. Yang, and D. Hocevar, "Pattern-independent current estimation for reliability analysis of CMOS circuits," in *Proc.* 25th Design Automation Conf., June 1988, pp. 294–299. - [6] S. Chakravarty, "On the complexity of using BDD's for the synthesis and analysis of Boolean circuits," in *Proc. 27th Annual Allerton Conf. Communication, Control, and Computing*, Sept. 1989, pp. 730–739. - [7] S. Chowdhury and J. S. Barkatullah, "Estimation of maximum currents in MOS IC logic circuits," *IEEE Trans. Computer-Aided Design*, vol. 9, pp. 642–654, June 1990. - [8] S. Devadas, K. Keutzer, and J. White, "Estimation of power dissipation in CMOS combinational circuits," in *Proc. Custom Integrated Circuits Conf.*, May 1990, pp. 19.7.1–19.7.6. - [9] A. Ghosh, S. Devadas, K. Keutzer, and J. White, "Estimation of average switching activity in combinational and sequential circuits," in *Proc.* 29th Design Automation Conf., June 1992, pp. 253–259. - [10] L. Glasser and D. Dobberpuhl, The Design and Analysis of VLSI Circuits. Reading, MA: Addison-Wesley, 1985. - [11] J. Monteiro and S. Devadas, "Techniques for the power estimation of sequential logic circuits under user-specified input sequences and programs," in *Proc. Int. Symp. Low Power Design*, Apr. 1995, pp. 33–38. - [12] F. Najm, "Transition density: A new measure of activity in digital circuits," *IEEE Trans. Computer-Aided Design*, vol. 12, no. 2, pp. 310–323, Feb. 1993. - [13] F. Najm, "A survey of power estimation techniques in VLSI circuits (Invited Paper)," *IEEE Trans VLSI Syst.*, vol. 2, no. 4, pp. 446–455, Dec. 1994. - [14] A. Papoulis, *Probability, Random Variables and Stochastic Processes*, 3rd ed. New York: McGraw-Hill, 1991. - [15] C.-Y. Tsui, J. Monteiro, M. Pedram, S. Devadas, A. Despain, and B. Lin. "Power estimation for sequential logic circuits," *IEEE Trans. VLSI Syst.*, vol. 3, no. 3, pp. 404–416, Sept. 1995.