

Efficient Ternary Approximate Multiplier with Balanced Encoding Scheme and Optimized 4-2 Compressor

Wanbo Hu, Wanting Wen, Ziye Li, Guangchao Zhao and Mingqiang Huang

EasyChair preprints are intended for rapid dissemination of research results and are integrated with the rest of EasyChair.

February 27, 2024

# Efficient Ternary Approximate Multiplier with Balanced Encoding Scheme and Optimized 4-2 Compressor

Wanbo Hu<sup>1</sup>, Wanting Wen<sup>1</sup>, Ziye Li<sup>1</sup>, Guangchao Zhao<sup>2</sup>, Mingqiang Huang<sup>1,\*</sup>

<sup>1</sup>Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China <sup>2</sup>CNRS-NTU-THALES Research Alliances/UMI 3288, Singapore \*Email: mq.huang2@siat.ac.cn

Abstract- Approximate computing is a technique that adjusts computational precision to reduce circuit area and power consumption. This work proposes a high-efficiency ternary approximate multiplier based on the balanced ternary encoding scheme. Compared with the accurate 2-trit multiplier, the proposed multiplier has a much simpler circuit, with a 43.5% reduction in circuit area and a 74.4% reduction in power-delay product at a computation error of only 2.25%. Furthermore, an optimized 4-2 compressor is proposed for the high-bitwidth (6-trit) Wallace-tree-based multiplier. The whole design has been validated through HSPICE simulations using carbon nanotube field-effect transistors as the basic device cells. Compared with previous work, the proposed design achieves a 50.0% reduction in power-delay product and a 42.3% reduction in circuit area, showing great potential for future applications.

Keywords- Multi-valued logic, ternary logic circuits, CNTFET, ternary multiplier, approximate computing, ternary adder.

## I. INTRODUCTION

The conventional trajectory of improving chip performance by scaling down transistor features faces escalating challenges [1-3], necessitating an exploration of alternative strategies beyond Moore's Law. One promising avenue is the 'more than Moore' paradigm, in which multi-valued logic (MVL) systems go beyond the two values (0, 1) of binary logic and thereby inherently offer higher data density and computational potential [4]. Among the MVL systems, ternary logic systems (radix r = 3) have emerged as efficient solutions [5]. These systems employ either balanced ternary logic (-1, 0, +1) or unbalanced ternary logic (0, 1, 2) to represent ternary numbers, with the former being perceived as more efficient because its symmetric digit set represents positive and negative values uniformly.

However, historical implementations of ternary arithmetic logic circuits, particularly those based on complementary metal-oxide-semiconductor (CMOS) transistors, encountered notable challenges: reduced gate control, parameter fluctuations, and, most notably, high power consumption attributed to static currents in the intermediate state [6]. This necessitates the exploration of more efficient electronic devices. Carbon nanotube field-effect transistors (CNTFETs) have emerged as promising alternatives due to their controllable threshold voltage and superior carrier mobility [7]. Current research primarily focuses on optimizing circuit-level efficiency using CNTFETs [8,9], and combinational and sequential logic circuits employing ternary logic have also emerged [10]. However, while circuit-level designs have been diverse, relatively few studies have addressed system-level design, despite the pressing need for efficient ternary systems.

Approximate computing is a technique that adjusts computational precision to alleviate power constraints, and it is notably applied in image processing, video processing, and data mining. Approximate circuits simplify computations such as multiplication, improving performance while consuming less power than precise circuits. Past designs of approximate adders and multipliers aimed at enhancing the energy efficiency of ternary arithmetic logic circuits [11,12]. The successful application of approximate multiplication in unbalanced ternary logic has demonstrated the strong impact of approximate computing on circuit energy efficiency [13]. Building upon this, we propose balanced ternary approximate adders and multipliers.

This paper introduces a design methodology suitable for balanced ternary multiplication circuits. To simplify the computation process of balanced ternary multiplication, a balanced approximate adder is first proposed. Then a $2\times 2$ balanced approximate multiplier is designed based on this balanced approximate adder. Finally, these components are used to design the $6\times 6$ balanced approximate multiplication circuit, which is further optimized through the 4-2 compression method. All circuits are verified via HSPICE simulations and compared against previous designs in terms of energy efficiency.

The rest of this article is organized as follows. Section II gives the algorithms and error comparison of the balanced and unbalanced approximations. Section III presents the design method of the $6\times6$ balanced approximate ternary multiplier and its design scheme based on the optimized 4-2 compressor. Section IV presents conclusions and future work.

This work was supported by the Shenzhen Science and Technology Innovation Committee (JCYJ20200109115210307), the National Natural Science Foundation of China (62106254), and the STI 2030 Major Projects (2022ZD0210600). Corresponding author: Mingqiang Huang (mq.huang2@siat.ac.cn)



(a) Schematic of the traditional 2-trit unbalanced approximate ternary multiplier. Mean Absolute Percentage Error = 7%, Mean Percentage Error = -7%.

(b)

| Ternary input A | Ternary input B | Accurate carry | Accurate product | Approximate carry | Approximate product | Note |
|-----------------|-----------------|----------------|------------------|-------------------|---------------------|------|
| 0               | 0               | 0              | 0                | 0                 | 0                   |      |
| 0               | 1               | 0              | 0                | 0                 | 0                   |      |
| 0               | 2               | 0              | 0                | 0                 | 0                   |      |
| 1               | 0               | 0              | 0                | 0                 | 0                   |      |
| 1               | 1               | 0              | 1                | 0                 | 1                   |      |
| 1               | 2               | 0              | 2                | 0                 | 2                   |      |
| 2               | 0               | 0              | 0                | 0                 | 0                   |      |
| 2               | 1               | 0              | 2                | 0                 | 2                   |      |
| 2               | 2               | 1              | 1                | 0                 | 2                   | appx |

(c) Error of the 2x2 APPX multiplier (rows: B1B0, columns: A1A0)

| B1B0 \ A1A0 | 00 | 01 | 02 | 10 | 11 | 12 | 20 | 21 | 22  |
|-------------|----|----|----|----|----|----|----|----|-----|
| 00          | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0   |
| 01          | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0   |
| 02          | 0  | 0  | -2 | 0  | 0  | -2 | -6 | -6 | -8  |
| 10          | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0   |
| 11          | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0   |
| 12          | 0  | 0  | -2 | 0  | 0  | -2 | -6 | -6 | -8  |
| 20          | 0  | 0  | -6 | 0  | 0  | -6 | 0  | 0  | -6  |
| 21          | 0  | 0  | -6 | 0  | 0  | -6 | 0  | 0  | -6  |
| 22          | 0  | 0  | -8 | 0  | 0  | -8 | -6 | -6 | -14 |

(d) (this work) Schematic of the 2-trit × 2-trit approximate multiplier: inputs $B_1A_1$, $B_1A_0$, $B_0A_1$, $B_0A_0$; four TMul blocks; one APPXA; outputs $P_2$, $P_1$, $P_0$. Mean Absolute Percentage Error = 2.25%, Mean Percentage Error = 0%.

(e)

| Ternary input A | Ternary input B | Accurate carry | Accurate sum | Approximate carry | Approximate sum | Note |
|-----------------|-----------------|----------------|--------------|-------------------|-----------------|------|
| -1              | -1              | -1             | 1            | 0                 | -1              | appx |
| -1              | 0               | 0              | -1           | 0                 | -1              |      |
| -1              | 1               | 0              | 0            | 0                 | 0               |      |
| 0               | -1              | 0              | -1           | 0                 | -1              |      |
| 0               | 0               | 0              | 0            | 0                 | 0               |      |
| 0               | 1               | 0              | 1            | 0                 | 1               |      |
| 1               | -1              | 0              | 0            | 0                 | 0               |      |
| 1               | 0               | 0              | 1            | 0                 | 1               |      |
| 1               | 1               | 1              | -1           | 0                 | 1               | appx |

(f) Error of the 2x2 APPX multiplier (rows: B1B0, columns: A1A0; T represents -1)

| B1B0 \ A1A0 | 11 | 10 | 1T | 01 | 00 | 0T | T1 | T0 | TT |
|-------------|----|----|----|----|----|----|----|----|----|
| 11          | -3 | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 3  |
| 10          | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| 1T          | 0  | 0  | 3  | 0  | 0  | 0  | -3 | 0  | 0  |
| 01          | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| 00          | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| 0T          | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| T1          | 0  | 0  | -3 | 0  | 0  | 0  | 3  | 0  | 0  |
| T0          | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | 0  |
| TT          | 3  | 0  | 0  | 0  | 0  | 0  | 0  | 0  | -3 |

Fig. 1. (a) Schematic of the traditional 2-trit unbalanced approximate ternary multiplier. (b) Error source: approximate computing in the ternary multiplier. (c) Error distribution of the traditional 2-trit approximate multiplier. (d) Schematic of the proposed 2-trit approximate ternary multiplier in this work. (e) Error source: approximate computing in the ternary adder. (f) Error distribution of the proposed 2-trit approximate multiplier. Note: TMul is the ternary multiplier, ATMul is the approximate ternary multiplier, THA is the ternary half adder, TSum is the ternary summation, and APPXA is the ternary approximate adder in our design.

## **II. SMALL BITWIDTH APPROXIMATE TERNARY MULTIPLIER**

Approximate multipliers are widely applicable in signal processing. Previous ternary designs focused on the unbalanced encoding scheme (0/1/2). Tabrizchi et al. proposed an approximate ternary multiplier built from a $1 \times 1$ approximate ternary multiplier [12]. Their design, based on approximate multipliers rather than approximate adders, ensures higher precision by employing fewer adders. Nevertheless, because the logic design of the 6×6 ternary multiplier relies on gates and a complex adder structure, it did not demonstrate a satisfactory precision-power tradeoff. S. Kim et al. introduced an approximation technique employing a carry-truncated ternary multiplier and error compensation circuits for the energy-efficient design of a 6×6 approximate ternary multiplier [13]. However, 1-trit multiplication in the unbalanced scheme can generate a carry, whereas 1-trit multiplication in the balanced (-1/0/1) scheme never does, so a balanced multiplier requires approximation only in the addition process. In this section, we propose a carry-free balanced approximate adder, which is used to design a 2×2 balanced approximate multiplier and, in turn, a 6×6 balanced approximate multiplier.

### A. Overview of the Approximate Ternary Multiplier

Fig. 1(a) depicts the gate-level circuit diagram of a $2 \times 2$ unbalanced ternary approximate multiplier, where addition is kept accurate and the approximation is applied in multiplication. That is, the THA and TSum adders are accurate. Fig. 1(b) illustrates the truth-table approximation used by the unbalanced approximate multiplier ATMul. Only one input combination produces a carry, namely "2×2=4 (carry=1 and product=1)". To simplify the circuit, this case is approximated as " $2 \times 2=2$ (carry=0 and product=2)", so that the 1-trit multiplication no longer involves a carry. However, this strategy biases the multiplier toward negative computation errors, and the overall error is relatively large.

Fig. 1(d) shows the gate-level circuit of the proposed balanced ternary approximate multiplier, in which multiplication is kept accurate and the approximation is applied in the addition stage. The truth table of the balanced approximate adder is shown in Fig. 1(e). The approximation occurs in the cases "1+1" and "-1-1", which are the only input combinations that generate a carry. Eliminating the carry in these cases removes the portion of the circuit responsible for carry generation.
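The behavior described above can be checked with a short behavioral sketch. It assumes, based on Fig. 1(d), that the product is assembled as $9P_2 + 3P_1 + P_0$ with $P_1$ produced by the approximate adder; the function names (`appxa`, `mul2x2_appx`, `mul2x2_exact`) are illustrative and not taken from the paper.

```python
from itertools import product

TRITS = (-1, 0, 1)

def appxa(a, b):
    """Carry-free approximate adder of Fig. 1(e): 1+1 -> 1 and -1-1 -> -1."""
    return max(-1, min(1, a + b))

def mul2x2_exact(a1, a0, b1, b0):
    """Reference product of two 2-trit balanced ternary numbers."""
    return (3 * a1 + a0) * (3 * b1 + b0)

def mul2x2_appx(a1, a0, b1, b0):
    """Assumed structure of Fig. 1(d): four exact 1-trit products, one APPXA."""
    p0 = a0 * b0                   # 1-trit products never carry in balanced ternary
    p1 = appxa(a1 * b0, a0 * b1)   # the only approximated step
    p2 = a1 * b1
    return 9 * p2 + 3 * p1 + p0

# Collect every input combination whose approximate result differs from the exact one.
errors = {}
for a1, a0, b1, b0 in product(TRITS, repeat=4):
    err = mul2x2_appx(a1, a0, b1, b0) - mul2x2_exact(a1, a0, b1, b0)
    if err != 0:
        errors[(a1, a0, b1, b0)] = err

print(len(errors), sorted(set(errors.values())))
```

Under these assumptions the sketch finds nonzero errors only when both cross products are equal and nonzero, which is consistent with the ±3 entries of Fig. 1(f).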

Fig. 1(c) and (f) depict the error values of the small bit-width ternary approximate multipliers in the unbalanced and the proposed balanced ternary schemes, respectively. For the balanced ternary approximate multiplier, only 8 out of 81 input combinations exhibit errors. To analyze the significance of the errors, two commonly used error metrics are employed, namely the Mean Absolute Percentage Error (MAPE) and the Mean Percentage Error (MPE), defined as follows:

$$MAPE = \frac{1}{n} \sum \left| \frac{acc_i - appx_i}{acc_i} \right|$$
$$MPE = \frac{1}{n} \sum \frac{acc_i - appx_i}{acc_i}$$

Here, acc<sub>i</sub> represents the accurate result, appx<sub>i</sub> denotes the approximate result, and n is the number of different input combinations. MAPE is the expected value of the errors encountered during the approximation process, and thus precision is highly correlated with MAPE. MPE indicates whether the errors are biased. If the absolute value of the MPE equals the MAPE, the approximation results are completely biased toward one side. The MPE of the $2\times 2$ balanced approximate multiplier scheme is 0, indicating that the positive and negative errors in Fig. 1(f) cancel out. This is significantly better than the MPE of -7% of the unbalanced ternary approximate scheme (APPX). The calculated MAPE for the $2\times 2$ balanced ternary approximate scheme is 2.25%, while the MAPE for the unbalanced ternary approximate scheme (APPX) is 7%.
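As a worked illustration of the metric, the sketch below evaluates the MAPE on the error table of the unbalanced scheme in Fig. 1(c). Two assumptions are made that the text does not state explicitly: inputs whose accurate product is 0 contribute zero to the sums while still counting toward n = 81, and the signed mean uses the sign of the table entries as printed (approximate minus accurate). The names `ERR_C` and `mape_and_signed_mean` are hypothetical.

```python
# Error table of Fig. 1(c): rows are B1B0, columns are A1A0, both ordered 00..22,
# so the row/column index equals the operand value 0..8 in unbalanced ternary.
ERR_C = [
    [0, 0,  0, 0, 0,  0,  0,  0,   0],  # B = 00
    [0, 0,  0, 0, 0,  0,  0,  0,   0],  # B = 01
    [0, 0, -2, 0, 0, -2, -6, -6,  -8],  # B = 02
    [0, 0,  0, 0, 0,  0,  0,  0,   0],  # B = 10
    [0, 0,  0, 0, 0,  0,  0,  0,   0],  # B = 11
    [0, 0, -2, 0, 0, -2, -6, -6,  -8],  # B = 12
    [0, 0, -6, 0, 0, -6,  0,  0,  -6],  # B = 20
    [0, 0, -6, 0, 0, -6,  0,  0,  -6],  # B = 21
    [0, 0, -8, 0, 0, -8, -6, -6, -14],  # B = 22
]

def mape_and_signed_mean(err_table):
    """Return (mean |err/acc|, mean err/acc) over all input combinations."""
    n = 0
    abs_sum = 0.0
    signed_sum = 0.0
    for b, row in enumerate(err_table):
        for a, err in enumerate(row):
            n += 1
            acc = a * b
            if acc == 0:
                continue  # assumption: zero products add nothing to either sum
            abs_sum += abs(err) / acc
            signed_sum += err / acc
    return abs_sum / n, signed_sum / n

mape, signed_mean = mape_and_signed_mean(ERR_C)
print(f"MAPE = {100 * mape:.2f}%, signed mean = {100 * signed_mean:.2f}%")
```

Because every nonzero entry of Fig. 1(c) has the same sign, the magnitude of the signed mean equals the MAPE, which is the fully biased case described above; both come out at about 7% under the stated assumptions.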

### B. Circuit Design of the Approximate Ternary Multiplier

The circuits are designed following our prior work on cycling-gate adder circuits [14]. VDD, VDD/2, and GND represent "1," "0," and "-1," respectively, and both p-type and n-type CNTFETs with three different threshold voltages are used. Using an improved Quine-McCluskey algorithm, we devised logic gates with the minimum number of transistors, resulting in higher energy efficiency.

The truth table and transistor-level structure of the balanced ternary approximate adder (APPXA) are depicted in Fig. 2(a) and (b), respectively. The primary functionalities of APPXA are as follows: when inputs [A, B] = [1, -1] / [0, 0] / [-1, 1], the output Y = 0; when inputs [A, B] = [1, 0] / [1, 1] / [0, 1], the output Y = 1; when inputs [A, B] = [-1, 0] / [-1, -1] / [0, -1], the output Y = -1. The computation steps are:

Step 1: Signals A and B undergo preliminary transformations through NTI and PTI (using A as an example): when input A = -1, outputs $A_N = 1$ and $A_P = 1$; when input A = 0, outputs $A_N = -1$ and $A_P = 1$; when input A = 1, outputs $A_N = -1$ and $A_P = -1$.

Step 2: Process the signals $A_N$, $A_P$, $B_N$, and $B_P$ in two stages to obtain the final output Y.

First stage: In the upper network, when $A_N = 1$, or $B_N = 1$, or $A_P = B_P = 1$, the output is $V_{tpu} = -1$; when $A_N = B_P = -1$ or $A_P = B_N = -1$, the output is $V_{tpu} = 1$. In the lower network, when $A_P = -1$, or $B_P = -1$, or $A_N = B_N = -1$, the output is $V_{tpd} = 1$; when $A_N = B_P = 1$ or $A_P = B_N = 1$, the output is $V_{tpd} = -1$.

Second stage: $V_{tpu}$ and $V_{tpd}$ enter the transistor P1 and N1 modules. When [A, B] = [1, 0] / [1, 1] / [0, 1], $[V_{tpu}, V_{tpd}] = [1, 1]$: $V_{tpu} = VDD$, the gate of transistor P1 is GND, and P1 conducts; $V_{tpd} = VDD$, the gate of transistor N1 is VDD, and N1 conducts. The output is Y = logic 1. When [A, B] = [-1, 0] / [-1, -1] / [0, -1], $[V_{tpu}, V_{tpd}] = [-1, -1]$: $V_{tpu} = GND$, the gate of transistor P1 is GND, and P1 conducts; $V_{tpd} = GND$, the gate of transistor N1 is VDD, and N1 conducts. The output is Y = logic -1. (Note that in these two output situations, the source-drain voltage drop across N1 and P1 is 0. Although both N1 and P1 are conducting, there is no static power consumption.) When [A, B] = [1, -1] / [0, 0] / [-1, 1], $[V_{tpu}, V_{tpd}] = [-1, 1]$: $V_{tpu} = GND$ (0 V) and $V_{tpd} = VDD$, the gate of transistor P1 is GND, and P1 conducts; the gate of transistor N1 is VDD, and N1 conducts. In this case the output is set by the series voltage drop of P1 and N1, which is VDD/2 across transistor P1, so the output is Y = logic 0. Note that in this state both P1 and N1 are slightly open, resulting in a small amount of static power consumption.

**Fig. 2.** (a) Truth table of balanced approximate ternary adder. (b) Schematic and operation mechanism of the approximate ternary adder
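The two-stage procedure can be mirrored in a purely logical model to confirm that it reproduces the APPXA truth table. The sketch below is behavioral only: it treats each "/" in the conditions as a logical OR, ignores voltages and transistor sizing, and uses hypothetical function names (`nti`, `pti`, `upper_network`, `lower_network`, `appxa_switch`).

```python
from itertools import product

def nti(x):
    """Step 1: output 1 only for input -1, otherwise -1."""
    return 1 if x == -1 else -1

def pti(x):
    """Step 1: output -1 only for input 1, otherwise 1."""
    return -1 if x == 1 else 1

def upper_network(a_n, a_p, b_n, b_p):
    """First stage, upper network: returns V_tpu as a logic value."""
    if a_n == 1 or b_n == 1 or (a_p == 1 and b_p == 1):
        return -1
    if (a_n == -1 and b_p == -1) or (a_p == -1 and b_n == -1):
        return 1
    raise ValueError("unreachable for valid ternary inputs")

def lower_network(a_n, a_p, b_n, b_p):
    """First stage, lower network: returns V_tpd as a logic value."""
    if a_p == -1 or b_p == -1 or (a_n == -1 and b_n == -1):
        return 1
    if (a_n == 1 and b_p == 1) or (a_p == 1 and b_n == 1):
        return -1
    raise ValueError("unreachable for valid ternary inputs")

def appxa_switch(a, b):
    """Second stage: combine V_tpu and V_tpd into the output Y."""
    a_n, a_p, b_n, b_p = nti(a), pti(a), nti(b), pti(b)
    v_tpu = upper_network(a_n, a_p, b_n, b_p)
    v_tpd = lower_network(a_n, a_p, b_n, b_p)
    if (v_tpu, v_tpd) == (1, 1):
        return 1
    if (v_tpu, v_tpd) == (-1, -1):
        return -1
    return 0  # (v_tpu, v_tpd) == (-1, 1): the divider settles at VDD/2

for a, b in product((-1, 0, 1), repeat=2):
    print(a, b, appxa_switch(a, b))
```

In this model the output equals the sum of the two inputs saturated to the range [-1, 1], which matches the truth table in Fig. 1(e).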

### C. Performance Evaluation and Comparison

The circuit diagrams of the balanced accurate multiplier and the balanced approximate multiplier are shown in Fig. 3. As depicted in Fig. 3(a), the $2\times 2$ accurate balanced ternary multiplier (ACC) comprises four $1\times 1$ balanced ternary multipliers (TMul) and two ternary half adders (THAs), in which the partial products generated by the multipliers are summed by the ternary adders. In Fig. 3(b), by contrast, the $2\times 2$ balanced ternary approximate multiplier (APPX) needs only one balanced ternary approximate adder (APPXA) in place of the two THAs.

To validate the energy efficiency of the proposed design, transient simulations were conducted using HSPICE with a CNTFET library comparable to a 32 nm MOSFET technology. Each test circuit was simulated with 100 sets of random numbers at a working frequency of 0.5 GHz, a power supply voltage of 0.9 V, and a temperature of 27°C. Fig. 3 outlines the characteristics of the proposed design in terms of transistor count, average power consumption, worst-case propagation delay, and power-delay product (PDP). The proposed $2\times2$ APPX multiplier reduces the PDP by 74.5% relative to the $2\times2$ ACC multiplier.



| Modules   | Transistor Count | Average Power (µW) | Worst Delay (ns) | Power-Delay Product (fJ) |
|-----------|------------------|--------------------|------------------|--------------------------|
| 1x1 TMul  | 26               | 0.015              | 0.038            | 0.001                    |
| 1+1 APPXA | 18               | 0.053              | 0.036            | 0.002                    |
| 1+1 THA   | 56               | 0.032              | 0.087            | 0.003                    |
| 2x2 ACC   | 216              | 0.277              | 0.171            | 0.047                    |
| 2x2 APPX  | 122              | 0.191              | 0.074            | 0.012                    |

Fig. 3. Circuit diagram of the (a) balanced accurate multiplier, (b) balanced approximate multiplier.

TABLE I. COMPARISON WITH PREVIOUS WORK

| Design of 2x2 APPX       | This work | Ref. [13] |
|--------------------------|-----------|-----------|
| Transistor Count         | 122       | 224       |
| Average Power (µW)       | 0.191     | 0.365     |
| Worst Delay (ns)         | 0.074     | 0.326     |
| Power-Delay Product (fJ) | 0.012     | 0.119     |
| MAPE                     | 2.25%     | 7%        |
| MPE                      | 0%        | -7%       |

Table I compares the proposed $2\times 2$ balanced APPX with the unbalanced APPX of previous work [13]. At the same operating frequency of 0.5 GHz, the proposed balanced solution exhibits a 45.5% reduction in transistor count, 47.7% lower power consumption, and 77.3% lower delay, resulting in a PDP that is about 90% lower than that of the unbalanced design. Our design also shows a much smaller computation error: its MAPE is 2.25%, compared with 7% in ref. [13], and its MPE is 0%, compared with -7% in ref. [13].
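The percentages quoted above follow directly from the entries of Table I. A minimal sketch of the arithmetic is given below; the helper name `reduction` and the dictionary keys are illustrative.

```python
def reduction(reference, proposed):
    """Relative reduction of `proposed` with respect to `reference`, in percent."""
    return 100.0 * (reference - proposed) / reference

# (ref. [13], this work) pairs copied from Table I.
table_i = {
    "transistor count": (224, 122),
    "average power (uW)": (0.365, 0.191),
    "worst delay (ns)": (0.326, 0.074),
    "power-delay product (fJ)": (0.119, 0.012),
}

for name, (ref13, ours) in table_i.items():
    print(f"{name}: {reduction(ref13, ours):.1f}% lower")
```

The same helper can be applied to the rows of the module-level table in Fig. 3, for example to compare the 2x2 ACC and 2x2 APPX entries.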

## **III. LARGE SCALE APPROXIMATE TERNARY MULTIPLIER**

In this section, we introduce the design method for large-scale balanced approximate ternary multipliers, using a $6\times 6$ approximate multiplier as a case study. The partial products can be generated using either $2\times 2$ precise multipliers or $2\times 2$ approximate multipliers to trade off precision against power consumption. In addition, a Wallace-tree-based partial product summation and an optimized 4-2 compressor are used to further increase the efficiency.



Fig. 4. Circuit diagram of the 4-2 compressors in (a) the original and naive design, and (b) the proposed design.

### A. Design of the Optimized 4-2 Compressor

The 4-2 compression algorithm in ternary logic serves as an optimization technique for Multivalued Logic (MVL) circuits, aiming to reduce circuit complexity and hardware resources required for computations. This algorithm is primarily utilized in ternary logic circuits, enabling the processing of three distinct logic states (commonly represented as -1, 0, and 1) and minimizing the number of elements in the circuit by compressing combinations of these logic states.

The fundamental principle of this algorithm is to transform four inputs into two outputs. Through this compression, the number of gates and hardware resources needed in the circuit is reduced, thereby enhancing the efficiency and performance of the circuit. In balanced ternary logic, the output of a three-input adder only ranges from -3 (-1 0) to 3 (1 0), whereas a 2-trit ternary number can represent values from -4 (-1 -1) to 4 (1 1). Hence, unlike in binary logic, a 2-trit output can hold the sum of four 1-trit ternary numbers. Therefore, to enhance data density and reduce the number of transistors in ternary adders, the four-input adder (4-2 compressor) is the best choice [15].
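The range argument above can be made concrete with a behavioral sketch of a four-input balanced ternary adder. It models only the arithmetic (four 1-trit inputs in, one carry trit and one sum trit out), not the gate-level structure of Fig. 4, and the name `compress_4_2` is hypothetical.

```python
from itertools import product

TRITS = (-1, 0, 1)

def compress_4_2(a, b, c, d):
    """Add four balanced trits; return (carry, sum) with total == 3*carry + sum."""
    total = a + b + c + d        # always within [-4, 4]
    carry = (total + 1) // 3     # floor division maps [-4, 4] onto {-1, 0, 1}
    return carry, total - 3 * carry

# Every one of the 81 input combinations fits into two balanced trits.
for inputs in product(TRITS, repeat=4):
    carry, s = compress_4_2(*inputs)
    assert carry in TRITS and s in TRITS
    assert 3 * carry + s == sum(inputs)

print(compress_4_2(1, 1, 1, 1), compress_4_2(-1, -1, -1, -1))
```

The two extreme cases printed at the end correspond to the values 4 (1 1) and -4 (-1 -1) mentioned in the text, which a three-input adder can never produce.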

Fig. 4 illustrates the optimization of the 4-2 compressor. Without altering the circuit truth table, the circuit structure was adjusted to reduce the number of transistors used and to decrease the latency. In Fig. 4, SUM, NCONS, and NANY are all components of the ternary half adder (THA). SUM is a circuit that sums two ternary inputs. NCONS is a not-consensus gate: it outputs the inverse of the inputs when they are equal and 0 (VDD/2) when they differ, and it is used in the carry circuitry of the two-input adder. NANY is a not-accept-anything gate: it outputs 0 when the sum of its inputs is 0, -1 when the sum is greater than 0, and 1 when the sum is less than 0. CONS and ANY are the inverses of NCONS and NANY, respectively. Comparing the circuit structures shows that the delay of the optimized 4-2 compressor differs from that of the balanced ternary half adder THA by just one ANY gate. This improvement can significantly enhance the design of the addition tree for the multiplication partial products.
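The gate definitions above can be written down as small truth functions. The following sketch is a logical model only; the function names are illustrative, and the half adder shown is the standard balanced ternary construction (carry from CONS, sum from SUM) rather than the specific transistor-level circuit of Fig. 4.

```python
from itertools import product

TRITS = (-1, 0, 1)

def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def cons(a, b):
    """Consensus: the common value when the inputs agree, otherwise 0."""
    return a if a == b else 0

def ncons(a, b):
    """Not-consensus: inverse of the common value, 0 when the inputs differ."""
    return -cons(a, b)

def any_gate(a, b):
    """Accept-anything: the sign of the input sum."""
    return sign(a + b)

def nany(a, b):
    """Not-accept-anything: 0 for a zero sum, -1 for a positive sum, 1 for a negative sum."""
    return -any_gate(a, b)

def sum_gate(a, b):
    """Sum trit of a two-input addition (wraps 2 -> -1 and -2 -> 1)."""
    s = a + b
    return s - 3 * sign(s) if abs(s) == 2 else s

def half_adder(a, b):
    """Balanced ternary half adder: returns (carry, sum)."""
    return cons(a, b), sum_gate(a, b)

for a, b in product(TRITS, repeat=2):
    carry, s = half_adder(a, b)
    print(a, b, "carry", carry, "sum", s, "NCONS", ncons(a, b), "NANY", nany(a, b))
```

A carry is produced only for the inputs (1, 1) and (-1, -1), which is why the carry path reduces to a consensus-type gate.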



Fig. 5. Structure of the 6×6 balanced approximate ternary multiplier with optimized Wallace tree and 4-2 compressors.

### B. Design of Large Scale Approximate Multiplier

Fig. 5 represents the optimization scheme of the 4-2 compressor for the  $6\times6$  approximate multiplier. In the figure, the dashed box denotes the  $2\times2$  balanced approximate multiplier, the green box indicates the two-input balanced ternary adder (THA), the blue box represents the three-input balanced ternary adder (TFA), and the red box signifies the four-input adder, i.e., the 4-2 compressor.

Step 0: Obtain 9 sets of partial products for the  $6 \times 6$  multiplication using  $2 \times 2$  approximate multipliers.

Step 1: Add the partial sums using two-input and three-input ternary adders (TAs) and four-input ternary compressors (TCs) until only three partial sums remain.

Step 2: Add the last three partial sums to obtain the final result using a carry-chain adder composed of two-input TAs, three-input TAs, and four-input TCs.

It's evident that the involvement of the 4-2 compressor not only reduces the compression stages in the Wallace tree but also allows for the introduction of a 4-2 adder into the final addition carry chain, reducing the count of adders from 23 to 15 and leading to a decrease in the number of transistors.
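The decomposition in Step 0 can be illustrated with a behavioral sketch: each 6-trit operand is split into three 2-trit groups, the nine 2×2 partial products are formed, and each is weighted by the position of its groups. The summation of Steps 1 and 2 is modeled here by plain integer addition, so the sketch does not represent the Wallace tree or the 4-2 compressor circuits themselves; all function names are hypothetical.

```python
def appxa(a, b):
    """Carry-free approximate adder: saturates the sum to [-1, 1]."""
    return max(-1, min(1, a + b))

def to_trits(x, n):
    """Balanced ternary digits of x, least significant first, using n trits."""
    trits = []
    for _ in range(n):
        r = (x + 1) % 3 - 1      # remainder in {-1, 0, 1}
        trits.append(r)
        x = (x - r) // 3
    if x != 0:
        raise ValueError("value does not fit in the requested number of trits")
    return trits

def mul2x2(a1, a0, b1, b0, approximate):
    """2-trit x 2-trit product, exact or with the approximate middle adder."""
    if not approximate:
        return (3 * a1 + a0) * (3 * b1 + b0)
    return 9 * (a1 * b1) + 3 * appxa(a1 * b0, a0 * b1) + a0 * b0

def mul6x6(a, b, approximate=True):
    """6-trit x 6-trit product built from nine 2x2 partial products."""
    ta, tb = to_trits(a, 6), to_trits(b, 6)
    total = 0
    for i in range(3):           # 2-trit group index of operand A
        for j in range(3):       # 2-trit group index of operand B
            pp = mul2x2(ta[2 * i + 1], ta[2 * i],
                        tb[2 * j + 1], tb[2 * j], approximate)
            total += pp * 9 ** (i + j)   # each 2-trit group has weight 9**index
    return total

print(mul6x6(100, -57, approximate=False), mul6x6(100, -57, approximate=True))
```

With `approximate=False` the decomposition reproduces the exact product, which provides a simple consistency check before the approximate blocks are substituted.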

### C. Performance Evaluation and Comparison

Based on the established model of the $2\times2$ multiplier, the partial products are first obtained using this multiplier, then compressed using a Wallace-tree approach with half adders and full adders, and finally summed using a carry look-ahead adder to derive the final result. For accuracy assessment, exhaustive traversal and random-number tests were conducted in Python to analyze the errors. Transient simulations were performed using HSPICE with the CNTFET library. Each test circuit was simulated with 20 sets of random numbers at a frequency of 0.5 GHz, a power supply voltage of 0.9 V, and a temperature of 27°C. Table II presents the characteristics of the proposed design, including MAPE, transistor count, average power consumption, worst-case propagation delay, and power-delay product (PDP). Design 1 represents the accurate calculation method, while in Design 2 the partial products are generated entirely using the $2\times2$ APPX method. Compared to the $6\times6$ balanced ternary accurate multiplier, the PDP of the $6\times6$ balanced ternary approximate multiplier is reduced by 50.6%. Compared to the $6\times6$ unbalanced ternary multipliers of [13], the PDP of the balanced ternary approximate multiplier is 57.6%, 46.8%, and 44.0% lower than that of the unbalanced exact, hybrid approximate, and approximate designs, respectively. In the random-number test, the balanced fully approximate scheme exhibits a MAPE of only 3.54%, compared with 9.176% for the unbalanced fully approximate scheme.

Fig. 6. Performance comparison on (a) transistor counts and (b) power-delay product at different designs.

| Source    | Design                          | MAPE (%) | Transistor Count | Average Power (µW) | Worst Delay (ns) | Power-Delay Product (fJ) |
|-----------|---------------------------------|----------|------------------|--------------------|------------------|--------------------------|
| Ref. [13] | Accurate                        | 0.000    | 6532             | 8.117              | 2.518            | 20.439                   |
| Ref. [13] | Approximate                     | 9.176    | 4876             | 6.605              | 2.346            | 15.495                   |
| Ref. [13] | Approximate with bias           | 1.649    | 5198             | 6.948              | 2.346            | 16.300                   |
| This work | Accurate                        | 0.000    | 3946             | 14.070             | 1.250            | 17.582                   |
| This work | Approximate                     | 3.540    | 3284             | 8.222              | 1.055            | 8.673                    |
| This work | Approximate with 4-2 compressor | 3.540    | 2996             | 7.524              | 1.084            | 8.155                    |

TABLE II. CHARACTERISTICS OF VARIOUS 6×6 TERNARY MULTIPLIERS

## **IV. CONCLUSIONS**

We propose a high-performance balanced approximate multiplier with several optimization strategies. First, the balanced ternary encoding scheme is applied to the ternary computation. Then a ternary approximate adder is designed and integrated into a small bit-width multiplier. Compared with the accurate 2-trit multiplier, the proposed multiplier has a much simpler circuit, with a 43.5% reduction in circuit area and a 74.4% reduction in power-delay product at a computation error of only 2.25%. Furthermore, we propose an optimized 4-2 compressor structure together with a Wallace-tree-based large-scale multiplier. The whole design has been validated through HSPICE simulations using carbon nanotube field-effect transistors as the basic device cells. Compared with previous work, the proposed design achieves a 50.0% reduction in power-delay product and a 42.3% reduction in circuit area, showing great potential for practical applications.

## REFERENCES

- [1] J. Tang, T. Ma, and Q. Luo, "Trends prediction of big data: a case study based on fusion data," Procedia Computer Science, vol. 174, pp. 181-190, 2020.
- [2] J. Wang, C. Xu, J. Zhang, and R. Zhong, "Big data analytics for intelligent manufacturing systems: A review," Journal of Manufacturing Systems, 2021.
- [3] G. Yeap, "Smart mobile SoCs driving the semiconductor industry: Technology trend, challenges and opportunities," 2013 IEEE International Electron Devices Meeting, Washington, DC, USA, 2013, pp. 1.3.1-1.3.8, doi: 10.1109/IEDM.2013.6724540.
- [4] V. Gaudet, "A survey and tutorial on contemporary aspects of multiple-valued logic and its application to microelectronic circuits," in IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 6, no. 1, pp. 5-12, March 2016, doi: 10.1109/JETCAS.2016.2528041.
- [5] S. Lin, Y.-B. Kim and F. Lombardi, "CNTFET-Based Design of Ternary Logic Gates and Arithmetic Circuits," in IEEE Transactions on Nanotechnology, vol. 10, no. 2, pp. 217-225, March 2011, doi: 10.1109/TNANO.2009.2036845.
- [6] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," Proceedings of the IEEE, vol. 91, no. 2, pp. 305-327, 2003.
- [7] F. Zahoor, T. Z. A. Zulkifli, F. A. Khanday and S. A. Zainol Murad, "Carbon Nanotube and Resistive Random Access Memory Based Unbalanced Ternary Logic Gates and Basic Arithmetic Circuits," in IEEE Access, vol. 8, pp. 104701-104717, 2020, doi: 10.1109/ACCESS.2020.2997809.
- [8] B. Srinivasu and K. Sridharan, "A Synthesis Methodology for Ternary Logic Circuits in Emerging Device Technologies," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 64, no. 8, pp. 2146-2159, Aug. 2017, doi: 10.1109/TCSI.2017.2686446.
- [9] C. Vudadha, A. Surya, S. Agrawal and M. B. Srinivas, "Synthesis of Ternary Logic Circuits Using 2:1 Multiplexers," in IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 65, no. 12, pp. 4313-4325, Dec. 2018, doi: 10.1109/TCSI.2018.2838258.
- [10] R. A. Jaber, A. Kassem, A. M. El-Hajj, L. A. El-Nimri, and A. M. Haidar, "High-performance and energy-efficient CNFET-based designs for ternary logic circuits," IEEE Access, vol. 7, pp. 93871–93886, 2019.
- [11] N. H. Bastani, M. H. Moaiyeri, and K. Navi, "An energy- and area-efficient approximate ternary adder based on CNTFET switching logic," Circuits Syst. Signal Process., vol. 37, pp. 1863–1883, May 2018.
- [12] S. Tabrizchi, A. Panahi, F. Sharifi, H. Mahmoodi, and A.-H. A. Badawy, "Energy-efficient ternary multipliers using CNT transistors," Electronics, vol. 9, no. 4, p. 643, 2020.
- [13] S. Kim, Y. Kang, S. Baek, Y. Choi and S. Kang, "Low-Power Ternary Multiplication Using Approximate Computing," in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 68, no. 8, pp. 2947-2951, Aug. 2021, doi: 10.1109/TCSII.2021.3068971.
- [14] G. Zhao et al., "Efficient Ternary Logic Circuits Optimized by Ternary Arithmetic Algorithms," in IEEE Transactions on Emerging Topics in Computing, doi: 10.1109/TETC.2023.3321050.
- [15] J. Yoon, S. Baek, S. Kim and S. Kang, "Optimizing Ternary Multiplier Design With Fast Ternary Adder," in IEEE Transactions on Circuits and Systems II: Express Briefs, vol. 70, no. 2, pp. 766-770, Feb. 2023, doi: 10.1109/TCSII.2022.3210282.