Published on August 17, 2024 2:36 AM GMT
Can SRAM be stacked like NAND flash?
background
types of memory
SRAM stores each bit with a set of cross-coupled transistors: each one's output drives the others' gates, latching the bit between the power rails. Typically 6 transistors are used, but many variations have been proposed.
DRAM uses one transistor per bit to connect a capacitor to an input/output line when it's written or read. The capacitors lose charge, so they must be refreshed periodically, and a read is destructive, so the value must be rewritten afterward.
Flash memory uses high voltage to push charge across an insulating layer, where it stays in place indefinitely. The stored charge's electrostatic field is combined with fields from wires to control transistors, 1 transistor per bit. Flash cells have a limited cycle life because the insulating layers get damaged by charge transfer thru them.
SRAM cell sizes
Smaller transistors can carry less current than larger ones. When wires shrink, their capacitance decreases less than their cross-section, so if length and switching speed are held constant, the required current stays similar. Smaller wires also have higher resistance: at current CPU wire sizes, surface and grain-boundary scattering make conductance fall faster than linearly with cross-section area. With constant current, the voltage drop then increases, which for the same length requires more "repeaters" along the path.
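The wire-shrinking penalty above can be sketched with a toy calculation. The numbers here are illustrative, not real process data; I use the bulk resistivity of copper, and real nanoscale wires are several times worse due to the scattering effects mentioned above.

```python
# Toy model of why shrinking wires is painful: R = rho * L / A, so for a
# fixed-length wire, shrinking both cross-section dimensions multiplies
# resistance by the area ratio (even before scattering effects kick in).

def wire_resistance(rho, length, width, height):
    """Resistance of a rectangular wire: R = rho * L / (W * H)."""
    return rho * length / (width * height)

RHO_CU = 1.7e-8   # ohm*m, bulk copper (nanoscale effective rho is higher)
LENGTH = 100e-6   # a 100 um on-chip wire (assumed)

r_big = wire_resistance(RHO_CU, LENGTH, 100e-9, 200e-9)   # 100x200 nm wire
r_small = wire_resistance(RHO_CU, LENGTH, 25e-9, 50e-9)   # 25x50 nm wire

# Quartering both width and height multiplies resistance by 16 in this
# bulk model; boundary scattering makes the real penalty even larger.
print(r_small / r_big)  # 16.0
```

Since the current needed to charge a same-length wire at the same speed stays roughly constant, this resistance growth translates directly into larger voltage drops.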
The area required to store 1 bit in SRAM is called the "cell size". Basically for the above reasons, SRAM cell sizes haven't decreased much for a few years.
layer counts
The "nm" number of process nodes no longer corresponds to any feature sizes. The current meaning of "X nm node" seems to be something like "the transistor density is similar to what a planar transistor process would have at X nm".
Yet, transistor counts have continued increasing. The only explanation, then, is more layers. Of course, that increases power usage proportionately without increasing area for heat dissipation, so a smaller fraction of transistors can be active at once.
That means performance per cost doesn't increase. Note that cost per transistor stopped going down after 28nm. Also, a few layers of transistors and wires isn't even close to the number of layers in modern flash memory.
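The layer-count argument above can be made concrete with a back-of-envelope sketch. The power numbers below are assumptions of mine, not from any source; the point is just the 1/N shape of the curve.

```python
# If stacking N transistor layers multiplies potential power draw by ~N
# while the die area available for heat removal stays fixed, the fraction
# of transistors that can be active at once falls roughly as 1/N.

def max_active_fraction(layers, power_budget_w, watts_per_active_layer):
    """Fraction of transistors that can switch at once under a fixed
    cooling budget (toy model)."""
    return min(1.0, power_budget_w / (layers * watts_per_active_layer))

BUDGET = 100.0     # W the package can dissipate (assumed)
PER_LAYER = 100.0  # W if one layer were fully active (assumed)

for n in (1, 2, 4):
    print(n, max_active_fraction(n, BUDGET, PER_LAYER))
# 1 layer -> 1.0, 2 layers -> 0.5, 4 layers -> 0.25
```

So doubling layers roughly doubles transistors but halves the usable fraction, which is why it doesn't improve performance per cost.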
silicon interposers
Historically, CPUs have been a single layer, with transistors on the CPU face side connected to contacts on the motherboard, and cooling on the CPU back side. The current trend is towards chiplets put on a silicon "interposer" layer.
Adding an extra semiconductor layer adds cost, so it must have some justification over the alternatives. Vs one larger CPU die, defective chiplets can be discarded individually, which keeps yields practical as layer counts rise. Chiplets also have more modularity and thus design flexibility. Vs separate chips on PCBs, interposers can have much smaller wires, and can route signals around with transistors.
Apart from the extra silicon layer needed, interposers also need small holes (through-silicon vias = TSVs) to connect the chiplets on their face side to the motherboard contacts on their back side. Making small holes thru silicon without causing other damage is hard, and narrower holes are harder to make.
high-bandwidth memory (HBM)
DRAM chiplets put directly on a silicon interposer are called "HBM". If you can make TSVs, then you can stack multiple DRAM chiplets on top of each other and run signals vertically thru them, decreasing signal travel distance. Chipmakers are now starting production of 12-layer HBM stacks.
Why stack DRAM on top of other DRAM? Why not stack DRAM on top of logic to reduce distances more? Logic chiplets use more power and reach higher temperatures that are bad for DRAM.
Why not stack SRAM caches like DRAM, since SRAM also has lower power usage than logic? Compared to SRAM caches, the bandwidth of HBM is low and the latency is high; it's only "high-bandwidth" compared to sticks of DRAM hanging off a conventional DDR memory bus. That makes the advantages of SRAM over DRAM mostly irrelevant here.
3d flash memory
Most flash memory made today is vertical NAND flash. In that, current flows thru a whole stack of transistors. 64 layers used to be typical; now, people are making 128-layer memory commercially.
my question
Given that stacking 64+ layers of flash memory is practical, why isn't DRAM or SRAM stacked like that? Increasing density would reduce signal travel distance on CPUs.
2 TB of flash memory now costs about as much as 64 GB of DRAM. If SRAM could be made like flash memory with 10x the size per bit, it would still be cheaper per bit than DRAM. Why not do that?
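The cost comparison works out as follows, using the post's rough numbers (the 10x cell-size penalty is the post's assumption, not a measured figure):

```python
# If 2 TB of flash costs about the same as 64 GB of DRAM, flash is ~32x
# cheaper per bit. A flash-style SRAM paying 10x the area per bit would
# then still be ~3.2x cheaper per bit than DRAM.

flash_bytes = 2 * 1024**4    # 2 TB of flash
dram_bytes = 64 * 1024**3    # 64 GB of DRAM, at about the same price

flash_advantage = flash_bytes / dram_bytes   # flash cost advantage per bit
sram_cell_penalty = 10                       # assumed 10x size per bit

print(flash_advantage)                       # 32.0
print(flash_advantage / sram_cell_penalty)   # 3.2
```

Even a 3x per-bit cost advantage over DRAM would leave a lot of margin, which is what makes the question worth asking.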
stacked DRAM
Current DRAM uses capacitors that are cylindrical with height > width. There are some long-term plans to put the capacitors sideways and stack a lot of DRAM layers, but they're long-term plans because it's not considered economically practical now.
Also, DRAM typically uses smaller feature sizes than 3d flash, which makes stacking somewhat harder. The power usage of flash memory is lower, so cost per transistor matters more, and at this point newer nodes are more expensive per transistor. Also, larger flash memory cells make storing multiple voltage levels per cell easier; 3-bit (8-level) flash memory is now standard, but that doesn't work very well for DRAM.
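The multi-level point is worth making concrete: bits per cell grow only logarithmically with the number of distinguishable charge levels, so it pays off for big, stable flash cells but not for small, leaky DRAM capacitors.

```python
# Bits stored per cell as a function of distinguishable charge levels.
import math

def bits_per_cell(levels):
    """Each doubling of levels adds one bit: bits = log2(levels)."""
    return math.log2(levels)

print(bits_per_cell(2))   # 1.0 -> SLC
print(bits_per_cell(8))   # 3.0 -> TLC, now standard per the post
print(bits_per_cell(16))  # 4.0 -> QLC
```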
stacked SRAM
Suppose we want to replace DRAM chiplets with stacked SRAM that's made like flash memory. What prevents that from being done? Obviously SRAM has a more complex structure, but with photolithography, the complexity of patterns (of the same elements, at the same scale) is irrelevant.
Looking at their structures, the only relevant thing SRAM has that flash doesn't seems to be wires crossing over each other, which requires connections between small horizontal and vertical wires. Well, here's a video that goes into more detail about the fabrication process of 3d flash memory. Basically, many thin layers are stacked, deep trenches/holes are etched in that, and stuff is deposited in them.
So, can SRAM cells be redesigned so they can be fabricated by the methods used for vertical flash memory? Sort of. Here's an example paper of people trying to do that; they estimate that density matches current SRAM at ~10 layers...which I think makes it too expensive to compete with DRAM while being impractical to integrate in logic chiplets. Well, I was thinking a bit about how vertical SRAM could be implemented, and found something that seems fairly practical. You just need to have a separate chiplet that can be processed differently and think outside the conjoined triangles a bit.
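A rough reading of that ~10-layer parity estimate (the footprint number below is implied by the paper's estimate as I read it, not stated by me with confidence):

```python
# If one vertical-SRAM cell's footprint is ~10x a planar SRAM cell's area,
# density only reaches parity with planar SRAM at ~10 layers, and flash-like
# layer counts would be needed for a meaningful density win.

PLANAR_CELL_AREA = 1.0       # normalized planar SRAM cell area
VERTICAL_FOOTPRINT = 10.0    # assumed footprint of one vertical cell stack

def relative_density(layers):
    """Density of the vertical design relative to planar SRAM."""
    return layers * PLANAR_CELL_AREA / VERTICAL_FOOTPRINT

print(relative_density(10))  # 1.0 -> parity with planar SRAM
print(relative_density(64))  # 6.4 -> flash-like layer counts start to win
```

That gap between "parity at 10 layers" and "cheap enough to displace DRAM" is exactly the economic problem described above.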