Design and Assessment of Hybrid MTJ/CMOS Circuits for In-Memory-Computation

Prashanth Barla *, Hemalatha Shivarama *, Ganesan Deepa and Ujjwal Ujjwal

Department of Computer Science and Engineering, Manipal Institute of Technology, Manipal Academy of Higher Education, Manipal 576104, Karnataka, India
* Correspondence: prashanth.b@manipal.edu (P.B.); hema.shama@manipal.edu (H.S.)

Abstract: Hybrid magnetic tunnel junction/complementary metal oxide semiconductor (MTJ/CMOS) circuits based on the in-memory-computation (IMC) architecture are considered next-generation candidates for digital integrated circuits. However, the energy consumed during the MTJ write process is a matter of concern in these hybrid circuits. In this regard, we have developed a novel write circuit for the contemporary three-terminal perpendicular MTJs that works on the voltage-gated spin orbit torque (VG+SOT) switching mechanism to store the information in hybrid circuits for the IMC architecture. Investigation of the novel write circuit reveals a remarkable reduction in the total energy consumption (and energy delay product) of 92.59% (95.81%) and 92.28% (42.03%) compared with the conventional spin transfer torque (STT) and spin-Hall effect assisted STT (SHE+STT) write circuits, respectively. Further, we have developed all the hybrid logic gates, followed by nonvolatile full adders (NV-FAs), using VG+SOT, STT, and SHE+STT MTJs. Simulation results show that with the VG+SOT NOR-OR, NAND-AND, XNOR-XOR, and NV-FA circuits, the reduction in the total power dissipation is 5.35% (4.27%), 5.62% (3.2%), 3.51% (2.02%), and 4.46% (2.93%) compared to STT (SHE+STT) MTJs, respectively.

Keywords: magnetic tunnel junction; spin-Hall effect; spin transfer torque; non-volatile; voltage-gated spin orbit torque; in-memory-computation

1. Introduction

In recent years, deep learning and artificial intelligence applications such as image classification, object detection, speech recognition, and natural language processing have been remarkably successful due to their efficient software algorithms [1–3]. However, the hardware supporting these applications still relies on the conventional von Neumann architecture, in which the memory and computational units are separate and buses/interconnects carry the communication between the two. As a result, the power wasted in frequently moving signals between the memory and computational units is remarkably high [4,5]. In addition, the delay of these signals is considerable in large systems. Meanwhile, at the device level, miniaturization of CMOS poses challenges such as increased power dissipation because of secondary effects [6]. Although the device level is advancing constantly, it is clear that one cannot depend solely on CMOS technology to fulfill all present-day requirements [7]. There is therefore a strong demand for high-speed, power-efficient hardware. In-memory-computation (IMC) is well positioned to address this demand [5,8–10]. The idea in IMC is to merge memory and computation so that frequent data movement is avoided, enabling low-power, high-speed computation. Some literature suggests SRAM as the memory for IMC [11–14]. However, its shortcomings, such as inherent leakage current, the large number of transistors needed to store a single bit, and the large silicon area occupied, outweigh its benefits [1]. This has triggered great interest in using spintronic devices for IMC [15,16]. Spintronic devices are non-volatile memory (NVM) devices in which the spin of electrons, along with their charge, carries information. Among all spintronic devices, MRAM is the most promising due to advantages such as high endurance, fast read/write speeds, long data retention time, scalability, 3D fabrication, and ease of integration with existing CMOS technology [1,17–19]. To store an information bit, MRAM relies on the relative spin orientation of electrons that tunnel across a barrier layer (BL) sandwiched between two ferromagnetic layers, one of which is the free layer (FL) and the other the pinned layer (PL), thus forming a magnetic tunnel junction (MTJ). The MTJ is in a low-resistance ($R_P$) parallel (P) state when the PL and FL point in the same direction; conversely, it is in a high-resistance ($R_{AP}$) anti-parallel (AP) state when the PL and FL point in opposite directions. The material used for the PL and FL is CoFe-based, and that for the BL is MgO, because a high tunneling magnetoresistance (TMR) is achieved with CoFeB/MgO/CoFeB MTJs [20]. Altering the state of the MTJ between $R_{AP}$ and $R_P$ is called MTJ switching/writing. There are several p-MTJ switching mechanisms, such as spin transfer torque (STT) [21,22], spin-Hall effect assisted spin transfer torque (SHE+STT) [23,24], and voltage-assisted MTJ switching [25–27]. Among these, the recently presented voltage-gated spin orbit torque (VG+SOT) switching mechanism [28–30] has attracted the attention of many researchers due to its numerous advantages and its suitability for many applications [31,32]. In VG+SOT, MTJ switching is achieved by combining a voltage with the SHE. Encouraged by the developments in VG+SOT MTJs, we have developed a novel VG+SOT write circuit for the IMC architecture. This write circuit is compared with the conventional STT and SHE+STT write circuits in terms of write energy, worst-case write delay, and energy-delay product (EDP).
Then we developed all the hybrid logic gates, such as NOR-OR, NAND-AND, and XNOR-XOR, followed by three non-volatile full adder (NV-FA) circuits using different MTJs such as STT, SHE+STT, and VG+SOT based on the IMC architecture. The performance of all these circuits has been studied in terms of power dissipation, worst-case read delay, and power delay product (PDP). The rest of the paper is organized as follows: The background of the IMC architecture and the conventional STT and SHE+STT write circuits, with a brief account of their working, are presented in Section 2. In Section 3, we explain the switching mechanism of the VG+SOT MTJ and present the novel VG+SOT write circuit, followed by its working. The hybrid logic gates, i.e., NOR-OR, NAND-AND, and XNOR-XOR, are developed in Section 4, followed by the NV-FA circuits in Section 5. Simulation results are discussed in Section 6.
OUT−. The OUT+ and OUT− values depend upon the branch resistances $R_L$ and $R_R$, which set the branch currents $I_L$ and $I_R$: each current is inversely related to its branch resistance. For instance, $R_L > R_R$ results in $I_L < I_R$ and produces OUT+ = 0 and OUT− = 1. Conversely, $R_L < R_R$ results in $I_L > I_R$ and produces OUT+ = 1 and OUT− = 0.

Based on magnetic anisotropy, there are two types of MTJs, i.e., the in-plane magnetic anisotropy MTJ (i-MTJ) and the perpendicular-plane magnetic anisotropy MTJ (p-MTJ). Depending upon the type of MTJ (i-MTJ/p-MTJ), several switching techniques have been developed. The p-MTJ is superior to the i-MTJ due to its smaller size, larger thermal stability, smaller switching current, and ease of scaling [33]. Hence, we have focused on p-MTJ devices in our work.

2.2. Spin Transfer Torque Write Circuit

The conventional STT p-MTJ switching circuit is presented in Figure 2 [6]. The MTJ pair MTJ0-MTJ1 is used to store single-bit information. A bi-directional current ($I_{SW}$) is used to change the state of the MTJs between AP and P. When EN = 0, the $I_{SW}$ flow is disabled; $I_{SW}$ flows in the writing core only when EN = 1. Assume bit 1 needs to be written into the MTJ pair, which previously stored bit 0, i.e., its configuration must be changed from AP-P to P-AP. Switching is initiated by asserting EN = 1 and IN = 1. Now the transistors MN1 and MP0 are OFF, whereas MN0 and MP1 are ON. This forces $I_{SW}$ through the path Vdda-MP1-MTJ1-MTJ0-MN0-GND. Hence, the MTJ pair is switched to P-AP. Similarly, the MTJ pair can be switched from P-AP to AP-P to store bit 0 by making EN = 1 and IN = 0.
A simple structure, ease of working, and the potential to achieve higher-density circuits make STT-MTJs one of the most suitable candidates for commercial products [21,34]. However, the STT-MTJ suffers from several disadvantages. The frequent flow of $I_{SW}$ to switch the state of the MTJ can break down the BL over time. Apart from that, as the read and write paths are the same in the STT-MTJ, there is a chance of erroneous writing into the MTJ while the read current passes through it, causing a read disturbance error. Most significantly, the STT-MTJ suffers from an incubation delay, which is stochastic in nature. As a result, the switching delay of the STT-MTJ is large, making it less favorable for high-speed applications.
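
The EN/IN control of the STT write circuit described above can be summarized as a small behavioral sketch. This is not a circuit-level model: the function name `stt_write` is hypothetical, the transistor states for IN = 1 follow the text, and the states for IN = 0 are assumed to be the complementary ones by symmetry.

```python
from typing import FrozenSet, Tuple

def stt_write(en: int, in_bit: int, stored: str) -> Tuple[str, FrozenSet[str]]:
    """Return (MTJ-pair state after the write, set of conducting transistors)."""
    if en == 0:
        # Write disabled: no switching current flows, the stored state is kept.
        return stored, frozenset()
    if in_bit == 1:
        # From the text: MN0 and MP1 are ON, MN1 and MP0 are OFF,
        # and I_SW flows via Vdda-MP1-MTJ1-MTJ0-MN0-GND.
        return "P-AP", frozenset({"MN0", "MP1"})
    # Assumption (not spelled out in the text): the complementary pair conducts
    # for IN = 0, reversing I_SW and storing bit 0.
    return "AP-P", frozenset({"MN1", "MP0"})

# Worked case from the text: bit 0 (AP-P) is overwritten with bit 1 (P-AP).
state, on = stt_write(en=1, in_bit=1, stored="AP-P")
print(state, sorted(on))
```
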

2.3. Spin–Hall Effect Assisted Spin Transfer Torque Write Circuit

Some of the disadvantages of the STT writing mechanism are overcome with the three-terminal SHE+STT p-MTJ [23,24]. For example, the separate read and write paths of the SHE+STT MTJ overcome the read disturbance error. Meanwhile, the SHE considerably reduces the STT current, thereby eliminating the chances of BL breakdown. Furthermore, the incubation delay observed during the writing of the STT-MTJ disappears in the SHE+STT p-MTJ, making it suitable for high-speed applications. Figure 3 shows the conventional SHE+STT p-MTJ switching circuit [35]. Compared with the STT p-MTJ switching circuit, an ENSHE input signal is present here in addition to the EN and IN signals. ENSHE controls the flow of the SHE current. To write the information bit 1 into the MTJ pair, the signals IN, EN, and ENSHE are asserted to logic 1. The transistors MP0 and MN2 are then OFF, whereas the rest are ON. This simultaneously initiates the flow of the SHE current ($I_{SHE}$) and the STT current ($I_{STT}$). The paths for $I_{SHE}$ are Vdda-MP1-MTJ0-MN1-GND and Vdda-MP3-MTJ1-MN3-GND, whereas the paths for $I_{STT}$ are Vdda-MP1-MTJ0-MN0-GND and Vdda-MP2-MTJ1-MN3-GND. As a result, the MTJ pair enters a meta-stable state. After a short period, the ENSHE signal is set to logic 0. Hence, the transistors MN1 and MP3 are turned OFF to stop the $I_{SHE}$ flow, whereas $I_{STT}$ continues to flow and completes the switching of the MTJ pair into P-AP, so that the information bit 1 is written. Similarly, the information bit 0 is written into the MTJ pair by making IN = 0.

![Figure 3](image-url)

Figure 3. Schematic of SHE+STT p-MTJ write circuit consisting of (a) control circuit and (b) writing core.

Though the SHE+STT p-MTJ is comparatively better than the STT-MTJ, it suffers from joule heating and ohmic losses due to the flow of current. Moreover, handling these currents requires large transistors, causing area overhead [36].

3. Voltage Gated Spin Orbit Torque: Mechanism and Novel Write Circuit

One significant way to overcome the ohmic loss and joule heating is to use a voltage to switch the p-MTJ state. Recent literature reveals that applying a voltage across a p-MTJ can alter its magnetic anisotropy [37–40]. This effect is called voltage-controlled magnetic anisotropy (VCMA). The VCMA effect can switch the MTJ state. However, achieving deterministic MTJ switching with the VCMA effect requires either a precise voltage pulse or external assistance from a magnetic field or a small STT/SHE current. Since generating a precise voltage pulse or producing an external magnetic field is challenging in practice [41,42], some circuits were developed using two-terminal p-MTJs with a VCMA+STT switching regime [42–44]. However, applying an STT current brings the disadvantages of dielectric breakdown and read disturbance errors. Given this, VCMA is employed in conjunction with SHE to switch the MTJ state deterministically. Hence, a new p-MTJ switching regime called VG+SOT has been developed that combines the advantages of VCMA and SHE [45–48]. The VG+SOT p-MTJ is formed by mounting the PL/BL/FL stack on top of an antiferromagnetic material (AFM) such as IrMn3 [49,50], as shown in Figure 4a. The AFM in the VG+SOT p-MTJ generates the SOT and provides an in-plane exchange bias ($\vec{H}_{EX}$). However, the $\vec{H}_{EX}$ developed here is weak. Hence, the SOT alone cannot switch the MTJ deterministically.

Nevertheless, with assistance from the VCMA effect, the SOT generated can accurately switch the FL’s magnetic orientation in the VG+SOT p-MTJ [30,51]. The VCMA effect is generated by applying a small voltage ($V_b$), greater than or equal to the critical voltage ($V_c$), across the p-MTJ, which lowers the energy barrier ($E_b$) (Figure 4b). This initiates a free precession of the FL’s magnetic orientation. At the same time, a SHE current is forced through the AFM to fix the FL’s magnetic orientation in a specific direction, i.e., either P or AP. It is noteworthy that applying a negative $V_b$ increases $E_b$. $E_b$ is modeled as a function of $V_b$ in Equation (1) as,

\[
E_b(V_b) \approx \left[ K_i(V_b) - 2\pi M_s^2 (N_z - N_{x,y})\, t_f \right] \cdot A \tag{1}
\]

where $K_i(V_b)$ is the voltage-dependent interfacial perpendicular magnetic anisotropy (PMA), $M_s$ is the saturation magnetization, $t_f$ is the thickness of the FL, $V_b$ is the bias voltage, $A$ is the sectional area of the MTJ, and $N_z$ and $N_{x,y}$ are the demagnetization factors of the MTJ in the perpendicular and in-plane directions, respectively. The $V_c$, which is the minimum voltage required to completely eliminate $E_b$, is defined by Equation (2),

\[
V_c = \frac{\Delta(0)\, k_B T\, t_{ox}}{\xi A} \tag{2}
\]

where $\Delta(0)$ is the thermal stability under zero voltage, $k_B$ is the Boltzmann constant, $T$ is the temperature, $t_{ox}$ is the thickness of the oxide layer, and $\xi$ is the VCMA coefficient.
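
As a rough numerical illustration of Equation (2), the sketch below evaluates $V_c$ for a circular MTJ. The oxide thickness (1.4 nm) and radius (16 nm) are taken from Table 1; the thermal stability `delta0 = 40` and the VCMA coefficient `xi = 60 fJ/(V·m)` are hypothetical placeholder values chosen only to make the example run, and the function name `critical_voltage` is likewise an assumption.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K

def critical_voltage(delta0: float, temperature_k: float, t_ox_m: float,
                     xi_j_per_v_m: float, area_m2: float) -> float:
    """Equation (2): V_c = Delta(0) * k_B * T * t_ox / (xi * A), in volts."""
    return delta0 * K_B * temperature_k * t_ox_m / (xi_j_per_v_m * area_m2)

# Geometry from Table 1: circular MTJ with r = 16 nm, t_ox = 1.4 nm.
area = math.pi * (16e-9) ** 2

# delta0 and xi are hypothetical values, not taken from the paper.
v_c = critical_voltage(delta0=40.0, temperature_k=300.0, t_ox_m=1.4e-9,
                       xi_j_per_v_m=60e-15, area_m2=area)
print(f"V_c = {v_c:.2f} V (illustrative parameters)")
```

The linear dependence on $t_{ox}$ and the inverse dependence on $\xi$ and $A$ are visible directly: a thinner barrier, a larger VCMA coefficient, or a larger junction area each lowers the voltage needed to remove the barrier.
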

Figure 4. (a) Application of a positive $V_b$ causes the accumulation of charges across the barrier layer, resulting in (b) lowering of $E_b$; the FL’s magnetic orientation is thereby in a meta-stable state. This can assist in switching of the MTJ status.

Figure 5 presents the VG+SOT p-MTJ switching mechanism. Consider switching the VG+SOT p-MTJ from the P to the AP state. Here, the unpolarized in-plane electrons ($\vec{J}_e$) are forced through the AFM in the $-Y$ direction. Due to strong spin orbit coupling (SOC), SHE is observed in the AFM, which deflects electrons of opposite spin (oriented in the $\pm X$ direction) in the transverse $\pm Z$ direction, creating a spin current $\vec{J}_s$ in the Z-direction. Hence, electrons oriented in the $-X$ direction accumulate at the AFM/FL interface. This exerts a SOT that tilts the FL's magnetic orientation from the Z-direction toward the XY-plane. The in-plane $\vec{H}_{EX}$ causes the FL's magnetic orientation to precess around the field and move toward the direction of $\vec{H}_{EX}$ (here, the Y-direction) by damping action. Meanwhile, the application of VCMA lowers $E_b$, so that the FL's magnetic orientation switches towards the $-Z$-direction. Hence, the MTJ switches from the P to the AP state (Figure 5a). Similarly, switching of the MTJ from AP to P is achieved by reversing the direction of $\vec{J}_e$ (Figure 5b).

Figure 5. Schematic of the switching action in VG+SOT. $V_b$ is applied to lower $E_b$. Meanwhile, (a) $\vec{J}_e$ is forced through the AFM in the $-Y$ direction. Accumulation of spin-polarized electrons (in the $-X$ direction) exerts a SOT on the FL and alters its magnetic orientation from the Z-direction towards the XY-plane. Under the influence of $H_{EX}$, the magnetic orientation in the XY-plane precesses and switches to the $-Z$-direction, thereby completing the switching of the MTJ from P to AP. To switch the MTJ from (b) AP to P, the direction of $\vec{J}_e$ is reversed.

The proposed VG+SOT p-MTJ write circuit is presented in Figure 6. The write circuit is disabled when $EN = 0$, as all the transistors in the writing core are OFF. When $EN = 1$, the MTJ pair is written with either bit 0 or bit 1, depending upon the IN signal. Assume that bit 0 was previously stored in the MTJ pair, i.e., it is in the AP-P state, and we want to write bit 1, i.e., the MTJ pair needs to be switched to P-AP. This is initiated by asserting $EN = 1$, $IN = 1$, and $VC = 1$. The intermediate voltages $V_1$, $V_2$, $V_3$ and $V_4$ are then at 0101, which turns ON transistors $MP1$, $MN0$, $MP2$, $MN3$, $MP4$ and $MP5$, whereas $MP0$, $MN1$, $MP3$ and $MN2$ are OFF. As a result, a VCMA voltage ($VC$) is applied to the MTJ pair; simultaneously, a SOT current ($I_{SOT}$) flowing through the AFM applies a SOT on the FL's magnetic orientation. The paths of $I_{SOT}$ are $Vdd-MP1-MTJ0(T3-T2)-MN0-GND$ and $Vdd-MP2-MTJ1(T2-T3)-MN3-GND$. After 200 ps, $VC$ is disabled while $I_{SOT}$ continues to flow, completing the switching of the MTJ pair from the AP-P to the P-AP state. Hence, bit 1 is written into the MTJ pair. Similarly, bit 0 is written into the MTJ pair by making $IN = 0$.
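
The two-phase write sequence described above (VCMA voltage together with $I_{SOT}$, then $I_{SOT}$ alone) can be sketched as a simple schedule. This is only a behavioral illustration: the names `Phase` and `vgsot_write_schedule` are hypothetical, the 200 ps VC pulse is taken from the text, and the duration of the second phase is left unspecified because the text does not state it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Phase:
    name: str
    vc_on: bool                 # VCMA voltage applied across the MTJ pair
    isot_on: bool               # SOT current flowing through the AFM
    duration_ps: Optional[int]  # None where the text gives no value

def vgsot_write_schedule(en: int, in_bit: int, stored: str) -> Tuple[str, List[Phase]]:
    """Return (MTJ-pair state after the write, ordered list of write phases)."""
    if en == 0:
        # All writing-core transistors are OFF: nothing happens.
        return stored, []
    target = "P-AP" if in_bit == 1 else "AP-P"  # bit 1 -> P-AP, bit 0 -> AP-P
    phases = [
        Phase("VCMA + SOT", vc_on=True, isot_on=True, duration_ps=200),
        Phase("SOT only", vc_on=False, isot_on=True, duration_ps=None),
    ]
    return target, phases

# Worked case from the text: AP-P (bit 0) is overwritten with bit 1.
state, phases = vgsot_write_schedule(en=1, in_bit=1, stored="AP-P")
for p in phases:
    print(p)
```
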

4. Logic Gates Based on IMC

We have developed various logic gates based on the IMC architecture; they are presented in Figure 7. We have chosen the contemporary SA [52], which produces a quicker output response and dissipates lower power than its conventional counterpart [53]. For the purpose of explanation, we consider the operation of the NOR-OR logic gate with the inputs $A = 0$ and $B = 1$. During the pre-charge phase, $CP = 0$, the SA transistor N3 is OFF, and P3 is ON. This lets the NOR and OR output nodes share charge and reach logic 1 ($Vdd-Vth$). Meanwhile, inputs A and B are applied to the PDN. Here, the information (input B) is written, i.e., MTJ0-MTJ1 are in the P-AP state. The circuit enters the evaluation phase when CP = 1. Now the SA transistor N3 is ON, whereas P3 is OFF. In the PDN, N5 is OFF, while both N4 and N6 are ON. This creates two current paths: $I_L$ flows via NOR-N1-N4-MTJ0-N3-GND and $I_R$ flows via OR-N2-N6-MTJ1-N3-GND. As the MTJ0-MTJ1 pair is in the P-AP state, the branch resistance $R_L < R_R$, so $I_L > I_R$ and the NOR node discharges faster than the OR node. The NOR node is therefore discharged to GND, making it logic 0, whereas the OR node is pulled up to logic 1. The working of the circuit for other inputs can be understood similarly. By replacing the MOS tree in Figure 7a with that of Figure 7d or Figure 7e, one can realize the NAND-AND and XNOR-XOR logic gates, respectively. Figure 8 shows the timing diagram of the logic gates for a clock frequency of 100 MHz.
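
The pre-charge/evaluation behavior of the sense amplifier described above can be captured in a few lines. The sketch below is a purely behavioral abstraction, not a transistor-level model: the function name `sense` is hypothetical, and the normalized resistances assume $R_{AP} = 3 R_P$, which corresponds to the zero-bias TMR of 200% listed in Table 1.

```python
from typing import Tuple

def sense(cp: int, r_left: float, r_right: float) -> Tuple[int, int]:
    """Return the logic levels of the (left, right) output nodes of the SA."""
    if cp == 0:
        # Pre-charge phase: both output nodes sit at logic 1.
        return 1, 1
    if r_left == r_right:
        # Equal branch resistances are not covered by the description.
        raise ValueError("branch resistances must differ during evaluation")
    # Evaluation phase: the lower-resistance branch carries the larger current,
    # so its node discharges to 0 and the other node is pulled up to 1.
    return (0, 1) if r_left < r_right else (1, 0)

R_P, R_AP = 1.0, 3.0  # normalized; R_AP = 3 * R_P assumes TMR = 200%

# Worked case from the text (A = 0, B = 1): MTJ0-MTJ1 is in P-AP, the left
# (NOR) branch runs through MTJ0 and the right (OR) branch through MTJ1.
nor_out, or_out = sense(cp=1, r_left=R_P, r_right=R_AP)
print(nor_out, or_out)
```
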

Figure 7. Circuit for NOR−OR gate based on IMC showing (a) SA, (b) PDN for NOR-OR logic gates and (c) MTJ switching block. Use MOS tree from (d,e) to obtain NAND−AND and XNOR−XOR respectively. Here two terminal p−MTJs are shown for STT logic gates. Replace them by three terminal MTJs to get SHE+STT/VG+SOT logic gates.

Figure 8. Simulated waveform for various hybrid logic gates showing inputs (a) CP, (b) A, (c) B and corresponding outputs (d) AND, (e) NAND, (f) OR, (g) NOR, (h) XOR, (i) XNOR.
5. IMC Based Full Adder

Having developed all the logic gates, we developed the NV-FA based on the IMC architecture with different types of MTJs, as shown in Figure 9. The working of this FA is in line with the IMC based logic gates, with two operating phases: pre-charge and evaluation. In the pre-charge phase (CP = 0), all the outputs, i.e., sum (SUM+) and its complement (SUM−), and carry (CARRY+) and its complement (CARRY−), are at logic 1 (Vdd-Vth). Meanwhile, the inputs A, B, and $C_{in}$ are applied to the PDN. During the evaluation phase (CP = 1), these inputs are processed to produce the outputs (SUM+/SUM− and CARRY+/CARRY−). Suppose the inputs applied in the pre-charge phase are A = 1, B = 0, and $C_{in}$ = 0. During the evaluation phase, in both the SUM and CARRY circuits there are two routes, Path-R1 and Path-R2, from their respective output nodes to GND. In the PDN of the SUM circuit, transistors M1, M3, M6, and M8 are ON, whereas the other transistors are OFF. Path-R1 is via SUM−-N1-M1-M3-MTJ1-N3-GND and Path-R2 is via SUM+-N2-M6-M8-MTJ0-N3-GND. As $C_{in}$ = 0, the MTJ pair is in the AP-P state, so the total resistance of Path-R1 is lower than that of Path-R2. Hence, the SUM− node is pulled down to GND and the SUM+ node is pulled up to Vdd. Therefore, SUM+ = 1 and SUM− = 0. In the CARRY circuit, Path-R1 is via CARRY−-N1-M2-M4-MTJ0-N3-GND and Path-R2 is via CARRY+-N2-M6-M8-MTJ1-N3-GND. Here the path through MTJ1 (P state) has the lower total resistance, so the CARRY+ node is pulled down to GND, giving CARRY+ = 0 and CARRY− = 1. The working of the NV-FA for other input combinations can be understood likewise. Figure 10 shows the timing diagram of the IMC based NV-FA for a clock frequency of 100 MHz.
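
The expected outputs of the NV-FA in both phases can be checked against a short behavioral reference. This sketch models only the logic function (complementary outputs, all at logic 1 during pre-charge), not the PDN or the MTJ resistances; the function name `nv_full_adder` is hypothetical.

```python
from itertools import product
from typing import Dict

def nv_full_adder(a: int, b: int, c_in: int, cp: int) -> Dict[str, int]:
    """Logic-level reference for the NV-FA outputs in either phase."""
    if cp == 0:
        # Pre-charge phase: every output node is at logic 1.
        return {"SUM+": 1, "SUM-": 1, "CARRY+": 1, "CARRY-": 1}
    s = a ^ b ^ c_in                    # sum bit
    c = (a & b) | (c_in & (a ^ b))      # carry bit
    return {"SUM+": s, "SUM-": 1 - s, "CARRY+": c, "CARRY-": 1 - c}

# Worked case from the text: A = 1, B = 0, C_in = 0.
print(nv_full_adder(1, 0, 0, cp=1))

# Cross-check every input combination against integer addition.
for a, b, c_in in product((0, 1), repeat=3):
    out = nv_full_adder(a, b, c_in, cp=1)
    assert 2 * out["CARRY+"] + out["SUM+"] == a + b + c_in
```
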

Figure 9. NV-FA based on IMC consisting of (a) SA, (b) PDN for the SUM and CARRY circuits, and (c) MTJ switching block. Here two-terminal STT p-MTJs are shown for the STT NV-FA. By replacing them with three-terminal SHE+STT/VG+SOT p-MTJs, we can obtain the SHE+STT/VG+SOT NV-FA circuit.

Figure 10. Waveforms for the NV-FA based on IMC: inputs (a) CP, (b) A, (c) B, (d) Cin and corresponding outputs (e) SUM, (f) CARRY.

6. Simulation Results and Discussion

To assess the performance of the write circuits, hybrid logic gates, and NV-FA circuits, we performed simulations using the Cadence Virtuoso tool with a 45 nm CMOS generic process design kit. The model for the VG+SOT p-MTJ is adopted from Ref. [30], and the parameters used during the simulation are shown in Table 1.

Table 1. VG+SOT p-MTJ parameters set during the simulation. Rest of the parameters are retained as mentioned in ref. [30].

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Description</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>TMR(0)</td>
<td>TMR ratio with zero bias</td>
<td>200%</td>
</tr>
<tr>
<td>Shape</td>
<td>MTJ Surface shape</td>
<td>circle</td>
</tr>
<tr>
<td>a</td>
<td>MTJ Surface length</td>
<td>32 nm</td>
</tr>
<tr>
<td>b</td>
<td>MTJ Surface width</td>
<td>32 nm</td>
</tr>
<tr>
<td>r</td>
<td>MTJ Surface radius</td>
<td>16 nm</td>
</tr>
<tr>
<td>w</td>
<td>AFM width</td>
<td>50 nm</td>
</tr>
<tr>
<td>d</td>
<td>AFM thickness</td>
<td>3 nm</td>
</tr>
<tr>
<td>l</td>
<td>AFM length</td>
<td>60 nm</td>
</tr>
<tr>
<td>t_{sl}</td>
<td>Free layer thickness</td>
<td>1.1 nm</td>
</tr>
<tr>
<td>t_{ox}</td>
<td>MgO barrier thickness</td>
<td>1.4 nm</td>
</tr>
<tr>
<td>RA</td>
<td>Resistance-area product</td>
<td>650 Ω·µm²</td>
</tr>
</tbody>
</table>
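
To give a feel for the resistance levels implied by Table 1, the sketch below estimates the zero-bias P and AP resistances of the circular MTJ. It assumes that the resistance-area product refers to the parallel state ($R_P = RA / A$) and uses the standard definition $TMR = (R_{AP} - R_P)/R_P$; bias dependence of the TMR is ignored, and the variable names are chosen for illustration only.

```python
import math

RA_OHM_UM2 = 650.0   # resistance-area product from Table 1
RADIUS_UM = 0.016    # MTJ surface radius: 16 nm expressed in micrometres
TMR0 = 2.0           # zero-bias TMR of 200% as a ratio

area_um2 = math.pi * RADIUS_UM ** 2   # circular junction area
r_p = RA_OHM_UM2 / area_um2           # assumption: RA is quoted for the P state
r_ap = r_p * (1.0 + TMR0)             # from TMR = (R_AP - R_P) / R_P

print(f"area = {area_um2:.3e} um^2")
print(f"R_P  = {r_p / 1e3:.0f} kOhm")
print(f"R_AP = {r_ap / 1e3:.0f} kOhm")
```

With these parameters $R_P$ comes out at roughly 0.8 MΩ and $R_{AP}$ at three times that value, which is the resistance contrast the sense amplifier has to resolve.
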

6.1. Write Circuits

Table 2 shows the performance comparison of the various write circuits. The control circuit’s energy dissipation is considerably lower than that of the writing core. This is because the control circuitry consists only of logic gates, whereas the writing core contains the MTJ pair, and the energy required to switch the MTJ pair is always higher than that of the rest of the circuitry. Among all the write circuits, the least total energy consumption is observed with the VG+SOT p-MTJ circuit, followed by SHE+STT and STT. In the writing core, the VG+SOT write circuit consumes 92.0% and 92.68% less energy than SHE+STT and STT, respectively. The control circuit of the VG+SOT write circuit consumes 70.93% and 34.6% less energy than SHE+STT and STT, respectively. The total energy dissipation of the VG+SOT write circuit is 92.28% and 92.59% lower than that of SHE+STT and STT, respectively. In VG+SOT switching, thanks to the VCMA effect, the p-MTJ’s energy barrier is lowered, facilitating its switching with the application of $I_{SOT}$ through the AFM. However, the worst-case write delay of the VG+SOT write circuit (2.88 ns) is higher than that of the SHE+STT write circuit (0.386 ns); that is, the SHE+STT delay is 86.59% lower. Meanwhile, the delay of the VG+SOT write circuit is 43.52% lower than that of STT. The higher delay of the VG+SOT switching mechanism arises because, when $I_{SOT}$ is applied, the SOT tilts the FL’s magnetic orientation from its original position towards the XY-plane. The p-MTJ is now in a meta-stable state. Switching is assisted by $H_{EX}$: the FL’s magnetic orientation initially precesses and then settles down. The time taken to precess and settle causes the delay. Meanwhile, with SHE+STT switching, once the magnetic orientation is in the XY-plane, its precession is almost zero, and the STT quickly forces the p-MTJ to switch. Hence, its delay is the least. With STT switching, however, due to the stochastic nature of the STT p-MTJ, there is an initial delay in misaligning the FL’s magnetic orientation from its initial position, which results in a higher delay. The EDP, the product of the total energy dissipation and the worst-case write delay, is lowest for the VG+SOT write circuit: it is 42.03% and 95.81% lower than that of the SHE+STT and STT write circuits, respectively. Regarding the number of transistors, the STT write circuit uses only 22 MOS transistors, whereas the SHE+STT write circuit uses about double that number, i.e., 46. The VG+SOT write circuit uses only 6 more MOS transistors (i.e., 28) than STT. Figure 11 shows a bar chart of the total energy consumption, worst-case write delay, and EDP of the various write circuits. Figure 12 presents the simulated waveforms of all the write circuits together. To begin with, bit 1 is stored in the MTJ pair. At T1, writing of bit 0 is initiated, followed by writing of bit 1 at T2. We have also shown a redundant write condition at time T3. Writing of the information bit 0/1 is completed after a brief delay, as listed in Table 2.

Table 2. Performance comparison between STT, SHE+STT, and VG+SOT writing.

<table>
<thead>
<tr>
<th>MTJ Switching Technique</th>
<th>STT</th>
<th>SHE+STT</th>
<th>VG+SOT</th>
</tr>
</thead>
<tbody>
<tr>
<td>Writing core Energy/bit (fJ)</td>
<td>861.9</td>
<td>819.4</td>
<td>63.07</td>
</tr>
<tr>
<td>Control circuit Energy/bit (fJ)</td>
<td>1.28</td>
<td>2.88</td>
<td>0.837</td>
</tr>
<tr>
<td>Total Energy/bit (fJ)</td>
<td>863.18</td>
<td>822.28</td>
<td>63.907</td>
</tr>
<tr>
<td>Worst case delay (ns)</td>
<td>5.1</td>
<td>0.386</td>
<td>2.88</td>
</tr>
<tr>
<td>Energy delay product (×10<sup>−24</sup> J·s)</td>
<td>4402.21</td>
<td>317.4</td>
<td>184.05</td>
</tr>
<tr>
<td>No. of MOS</td>
<td>22</td>
<td>46</td>
<td>28</td>
</tr>
</tbody>
</table>

a: Total energy/bit = Writing core energy/bit + Control circuit energy/bit. b: Energy delay product = Total energy/bit × Worst case delay.
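
The derived rows of Table 2 and the quoted percentage reductions can be recomputed from the tabulated core energy, control energy, and delay. The sketch below does this; the dictionary layout and helper names are illustrative choices, and since 1 fJ × 1 ns = 10⁻²⁴ J·s the product is already in the units of the EDP row.

```python
TABLE2 = {
    "STT":     {"core_fj": 861.9, "ctrl_fj": 1.28,  "delay_ns": 5.1},
    "SHE+STT": {"core_fj": 819.4, "ctrl_fj": 2.88,  "delay_ns": 0.386},
    "VG+SOT":  {"core_fj": 63.07, "ctrl_fj": 0.837, "delay_ns": 2.88},
}

def total_energy_fj(row: dict) -> float:
    """Total energy/bit = writing core energy + control circuit energy."""
    return row["core_fj"] + row["ctrl_fj"]

def edp(row: dict) -> float:
    """Energy delay product in units of 1e-24 J*s (fJ times ns)."""
    return total_energy_fj(row) * row["delay_ns"]

def reduction_pct(new: float, ref: float) -> float:
    """Percentage by which `new` is lower than `ref`."""
    return 100.0 * (ref - new) / ref

vg = TABLE2["VG+SOT"]
for name in ("STT", "SHE+STT"):
    ref = TABLE2[name]
    e_red = reduction_pct(total_energy_fj(vg), total_energy_fj(ref))
    edp_red = reduction_pct(edp(vg), edp(ref))
    print(f"VG+SOT vs {name}: energy -{e_red:.2f}%, EDP -{edp_red:.2f}%")
```

Differences in the last digit relative to the percentages quoted in the text can arise because the table entries are themselves rounded.
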

Figure 11. Comparison of (a) total energy dissipation, (b) delay, and (c) EDP among different write circuits.

Overall, the VG+SOT write circuit is superior to its counterparts except for the write delay, which is higher than that of SHE+STT. Recent research has identified ferrimagnetic alloys, such as cobalt-gadolinium (CoGd), as efficient replacements for the AFM [54], primarily because they can reduce the delay significantly. Ferrimagnetic materials can reduce the switching delay to the sub-nanosecond range. However, more effort is required to integrate ferrimagnets with conventional MTJs for practical applications [55]. Meanwhile, efforts have been made to develop an efficient all-antiferromagnetic tunnel junction (AATJ) [56]. The AATJ structure is made of a Mn3Pt/MgO/Mn3Pt/MnPt combination. Although a switching speed in the range of picoseconds can be achieved with the AATJ, the maximum TMR achieved with it is approximately 100% at room temperature. Hence, its practical application is limited.

![Simulated waveform](image)

**Figure 12.** Simulated waveform for (b–d) STT, (e–l) SHE+STT and (m–p) VG+SOT write circuits. The input data (a) IN is common whereas the other enable signals and switching of the MTJ pairs for the different write circuits are shown separately.

We have investigated the effect of TMR variation on the VG+SOT write circuit parameters, as shown in Figure 13. Here, we have increased the TMR from 200% to 350% in steps of 50%. Notice that as the TMR increases, the total energy/bit, worst-case delay, and EDP decrease slightly. Hence, achieving a higher TMR is important for the design of efficient hybrid circuits.

![Performance variation](image)

**Figure 13.** Performance variation of the VG+SOT write circuit with respect to variation in TMR.

It is crucial to study the reliability of the VG+SOT write circuit under the process and mismatch variations that can occur during fabrication. Hence, we have performed Monte-Carlo simulations on the VG+SOT write circuit by incorporating 3% variations in TMR, $t_{sl}$, and $t_{ox}$ along with CMOS variations, and the resulting energy/bit values are tabulated in Table 3. Here, we can notice that the total energy/bit always stays below $\approx 68$ fJ.

### Table 3. Energy consumption of VG+SOT write circuit during Monte-Carlo simulation of 200 runs.

<table>
<thead>
<tr>
<th>Parameter</th>
<th>Min</th>
<th>Max</th>
<th>Mean</th>
<th>Std Deviation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Writing core energy/bit (fJ)</td>
<td>56.52</td>
<td>67.08</td>
<td>62.33</td>
<td>2.782</td>
</tr>
<tr>
<td>Control circuit energy/bit (fJ)</td>
<td>0.749</td>
<td>0.883</td>
<td>0.833</td>
<td>0.031</td>
</tr>
<tr>
<td>Total energy/bit (fJ)</td>
<td>57.269</td>
<td>67.963</td>
<td>63.163</td>
<td>2.813</td>
</tr>
</tbody>
</table>
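
The rows of Table 3 can be cross-checked against the definition of total energy/bit (writing core plus control circuit). The sketch below only re-adds the tabulated numbers and derives the relative spread of the total. The variable names are illustrative, and no Monte-Carlo simulation is reproduced here.

```python
# Values in fJ copied from Table 3, ordered as (min, max, mean, std deviation).
core = (56.52, 67.08, 62.33, 2.782)       # writing core energy/bit
control = (0.749, 0.883, 0.833, 0.031)    # control circuit energy/bit
total = (57.269, 67.963, 63.163, 2.813)   # total energy/bit as tabulated

labels = ("min", "max", "mean", "std")
recomputed = tuple(c + k for c, k in zip(core, control))

for label, r, t in zip(labels, recomputed, total):
    print(f"{label}: core + control = {r:.3f} fJ, tabulated total = {t:.3f} fJ")

# Relative spread (std deviation / mean) of the total energy/bit over the runs.
spread_percent = 100.0 * total[3] / total[2]
print(f"std/mean of total energy/bit: {spread_percent:.2f}%")
```

In each column the tabulated total equals the row-wise sum, and the spread of the total energy/bit is a few percent of its mean.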

6.2. Performance of Logic Gates

Table 4 compares the performance of the logic gates built with different p-MTJ circuits. The static power dissipation of all the logic gates is considered to be zero: because a nonvolatile MTJ pair stores input B, the supply can be turned off in standby mode without losing the stored bit. During active mode, the stored bit is readily available for computation without a write/restoration process. However, a steady-state power dissipation is observed with the gates, as shown in Table 4. In the NOR-OR gates, the dynamic power dissipation of the VG+SOT gate is the lowest, followed by the STT and SHE+STT based gates; VG+SOT consumes 65.22% and 67.02% lower dynamic power than the STT and SHE+STT NOR-OR gates, respectively. The total power dissipation of VG+SOT is the lowest, followed by SHE+STT and STT; the VG+SOT gate dissipates 4.27% and 5.35% lower total power than the SHE+STT and STT gates, respectively. The worst-case read delay of the SHE+STT based NOR-OR gate is the lowest, followed by STT and VG+SOT; it is 12.95% and 90.16% lower than that of STT and VG+SOT, respectively. The PDP is calculated by multiplying the total power by the worst-case read delay. The PDP of the SHE+STT based NOR-OR gate is the lowest, followed by STT and VG+SOT; it is 13.98% and 89.73% lower than that of the STT and VG+SOT gates, respectively.

### Table 4. Comparison between various logic gates operating at 100 MHz.

<table>
<thead>
<tr>
<th>Gate</th>
<th colspan="3">NOR-OR</th>
</tr>
</thead>
<tbody>
<tr>
<td>p-MTJ</td>
<td>STT</td>
<td>SHE+STT</td>
<td>VG+SOT</td>
</tr>
<tr>
<td>Static power (nW)</td>
<td>0 $^a$</td>
<td>0 $^a$</td>
<td>0 $^a$</td>
</tr>
<tr>
<td>Dynamic power (nW)</td>
<td>25.59</td>
<td>26.99</td>
<td>8.9</td>
</tr>
<tr>
<td>Total power (nW)</td>
<td>81.42 $^c$</td>
<td>80.5 $^c$</td>
<td>77.06 $^c$</td>
</tr>
<tr>
<td>Delay (ps)</td>
<td>112.43</td>
<td>97.86</td>
<td>995.02</td>
</tr>
<tr>
<td>PDP (aJ)</td>
<td>9.15</td>
<td>7.87</td>
<td>76.67</td>
</tr>
</tbody>
</table>

Note: Write circuit has been excluded in the tabulated values. Static, dynamic and total power dissipation are the average values. $^a$ In standby mode power is switched off, resulting in zero static power. $^b$ Static power dissipation in steady state condition. $^c$ Total power dissipation = static power + dynamic power in active mode.
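
The PDP definition used in the text (total power multiplied by worst-case read delay) can be applied directly to the tabulated values. The following is a minimal sketch: the function name `pdp_attojoule` is hypothetical, and the three data columns are taken as the STT, SHE+STT, and VG+SOT entries named in the p-MTJ row.

```python
def pdp_attojoule(total_power_nw: float, delay_ps: float) -> float:
    """Power-delay product in aJ. 1 nW * 1 ps = 1e-21 J = 1e-3 aJ."""
    return total_power_nw * delay_ps * 1e-3


# (total power in nW, worst-case read delay in ps), copied from Table 4.
gates = {
    "STT": (81.42, 112.43),
    "SHE+STT": (80.5, 97.86),
    "VG+SOT": (77.06, 995.02),
}

pdp = {name: pdp_attojoule(power, delay) for name, (power, delay) in gates.items()}

for name, value in pdp.items():
    print(f"{name}: PDP = {value:.2f} aJ")

# Design with the smallest PDP among the three.
lowest = min(pdp, key=pdp.get)
print(f"Lowest PDP: {lowest}")
```

The recomputed products agree with the PDP row to within rounding of the tabulated power and delay values.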

In the NAND-AND gates, the dynamic power dissipation of the VG+SOT gate is the lowest, followed by the SHE+STT and STT based gates; VG+SOT consumes 40.98% and 42.24% lower dynamic power than the SHE+STT and STT gates, respectively. The total power dissipation of VG+SOT is the lowest, followed by SHE+STT and STT; the VG+SOT gate dissipates 3.2% and 5.62% lower total power than the SHE+STT and STT gates, respectively. The worst-case read delay of the SHE+STT based NAND-AND gate is the lowest, followed by STT and VG+SOT; it is 1.03% and 88.94% lower than that of STT and VG+SOT, respectively. The PDP of the SHE+STT based NAND-AND gate is the lowest, followed by STT and VG+SOT; it is 3.44% and 88.57% lower than that of the STT and VG+SOT gates, respectively.

In the XNOR-XOR gates, the dynamic power dissipation of the VG+SOT gate is the lowest, followed by the SHE+STT and STT based gates; VG+SOT consumes 36.38% and 40.96% lower dynamic power than the SHE+STT and STT gates, respectively. The total power dissipation of VG+SOT is the lowest, followed by SHE+STT and STT; the VG+SOT gate dissipates 2.02% and 3.51% lower total power than the SHE+STT and STT gates, respectively. The worst-case read delay of the SHE+STT based XNOR-XOR gate is the lowest, followed by STT and VG+SOT; it is 38.88% and 92.26% lower than that of STT and VG+SOT, respectively. The PDP of the SHE+STT based XNOR-XOR gate is the lowest, followed by STT and VG+SOT; it is 39.85% and 92.11% lower than that of the STT and VG+SOT based gates, respectively.

We can observe that, for every gate type, the total power dissipation is the lowest with the VG+SOT logic gates, followed by SHE+STT and STT. However, the read delay of the SHE+STT based logic gates is lower. This is because IrMn$_3$, the AFM material in the VG+SOT p-MTJ, has a higher resistivity of 278 µΩ·cm [30], whereas Ta, the HM material in the SHE+STT p-MTJ, has a lower resistivity of 190 µΩ·cm [57]. The higher resistivity of IrMn$_3$ places a higher resistance in the discharge current path of the VG+SOT based logic gates, resulting in a higher read delay. During the evaluation phase, depending on the inputs, one of the outputs (either OUT+ or OUT-) is discharged to GND, whereas the other is raised to Vdd. These output levels are fixed, whereas the discharge path resistance of VG+SOT is higher due to the higher resistivity of the AFM. Hence, we observe a lower power dissipation with the VG+SOT circuits. Nevertheless, the higher delay causes a larger PDP with the VG+SOT based logic gates. Figure 14 shows bar charts of the dynamic and total power dissipation, worst-case read delay, and PDP for the various logic gate circuits.

![Comparison among all logic gates](image)

**Figure 14.** Comparison of (a) total power dissipation, (b) delay, and (c) PDP among all logic gates.

6.3. Performance of NV-FA Circuits

Table 5 shows the performance comparison of NV-FA circuits with various p-MTJs.

<table>
<thead>
<tr>
<th>NV-FA</th>
<th>STT</th>
<th>SHE+STT</th>
<th>VG+SOT</th>
</tr>
</thead>
<tbody>
<tr>
<td>Static power (nW)</td>
<td>0 $^a$ (119.4 $^b$)</td>
<td>0 $^a$ (119.75 $^b$)</td>
<td>0 $^a$ (145.7 $^b$)</td>
</tr>
<tr>
<td>Dynamic power (nW)</td>
<td>46.36</td>
<td>43.73</td>
<td>12.99</td>
</tr>
<tr>
<td>Total power (nW)</td>
<td>165.76 $^c$</td>
<td>163.48 $^c$</td>
<td>158.69 $^c$</td>
</tr>
<tr>
<td>Delay (ps)</td>
<td>114.95</td>
<td>115.67</td>
<td>876.14</td>
</tr>
<tr>
<td>PDP (aJ)</td>
<td>19.05</td>
<td>18.9</td>
<td>139.05</td>
</tr>
</tbody>
</table>

Note: Write circuit has been excluded in the tabulated values. Static, dynamic and total power dissipation are the average values. $^a$ In standby mode power is switched off, resulting in zero static power. $^b$ Static power dissipation in steady state condition. $^c$ Total power dissipation = static power + dynamic power in active mode.

The dynamic power dissipation of the VG+SOT based NV-FA is the lowest, followed by the SHE+STT and STT based NV-FAs; the VG+SOT based NV-FA consumes 70.29% and 71.98% lower dynamic power than the SHE+STT and STT NV-FAs, respectively. The total power dissipation of the VG+SOT based NV-FA is the lowest, followed by SHE+STT and STT; the VG+SOT NV-FA dissipates 2.93% and 4.26% lower total power than the SHE+STT and STT NV-FAs, respectively. The worst-case read delay of the STT based NV-FA is the lowest, followed by the SHE+STT and VG+SOT based NV-FAs; it is 0.62% and 86.87% lower than that of the SHE+STT and VG+SOT based NV-FAs, respectively. The PDP of the SHE+STT NV-FA is the lowest, followed by STT and VG+SOT; it is 0.78% and 86.4% lower than that of the STT and VG+SOT based NV-FAs, respectively. The reason for the increased read delay and PDP is the same as that discussed for the VG+SOT logic gates in the previous section.
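
The relation in footnote c (total power = steady-state static power + dynamic power) can be checked against the Table 5 entries. The short sketch below does so and also reports how much of the total is steady-state static power. The dictionary layout and variable names are illustrative assumptions.

```python
# Table 5 values in nW:
# (steady-state static power, dynamic power, tabulated total power)
nvfa = {
    "STT": (119.4, 46.36, 165.76),
    "SHE+STT": (119.75, 43.73, 163.48),
    "VG+SOT": (145.7, 12.99, 158.69),
}

static_share = {}
for name, (static_nw, dynamic_nw, total_nw) in nvfa.items():
    recomputed = static_nw + dynamic_nw  # footnote c: total = static + dynamic
    static_share[name] = 100.0 * static_nw / total_nw
    print(
        f"{name}: static + dynamic = {recomputed:.2f} nW "
        f"(tabulated {total_nw:.2f} nW), "
        f"static share = {static_share[name]:.1f}%"
    )
```

In these numbers the VG+SOT NV-FA has the largest steady-state static component of the three, so its lower total power comes from its much smaller dynamic power.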

Figure 15 shows bar charts of the dynamic and total power dissipation, worst-case read delay, and PDP for the various NV-FA circuits.

7. Conclusions

In this paper, we have developed a novel write circuit for VG+SOT p-MTJs based on IMC architecture. Simulation results show that the novel write circuit has significantly lower energy consumption and EDP than the conventional STT and SHE+STT write circuits. However, its worst-case write delay is longer than that of the SHE+STT write circuit. We then developed all the logic gates (NOR-OR, NAND-AND, and XNOR-XOR), followed by three different types of NV-FAs, with VG+SOT, STT, and SHE+STT p-MTJs based on IMC architecture. The simulation results reveal that the VG+SOT-based logic gates and NV-FA dissipate lower power than their counterparts. Based on the results of this paper, we foresee a higher usage of VG+SOT-based hybrid circuits in low-power applications in the future.

Author Contributions: P.B.: Conceptualization, Investigation, Methodology, Validation, Writing—original draft, Writing—review & editing. H.S.: Writing—review & editing. G.D.: Writing—review. U.U.: Writing—review. All authors have read and agreed to the published version of the manuscript.

Funding: Manipal Academy of Higher Education, Manipal has provided the APC for this research article.

Data Availability Statement: Data available upon reasonable request.

Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

P Parallel
AP Antiparallel
PL Pinned layer
FL Free layer
BL Barrier layer
IMC In-memory-computation

References


33. Dieny, B.; Chshiev, M. Perpendicular magnetic anisotropy at transition metal/oxide interfaces and applications. *Rev. Mod. Phys.* 2017, 89, 025008. [CrossRef]


**Disclaimer/Publisher’s Note:** The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.