LessWrong, August 17, 2024
SRAM stacking

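This article asks whether SRAM can be stacked like NAND flash to improve CPU density and performance. It first describes the structure and characteristics of SRAM, DRAM, and flash memory, then looks at what limits SRAM cell sizes and layer counts. It goes on to cover silicon interposers, high-bandwidth memory (HBM), and 3D flash stacking. Finally, it discusses the obstacles to stacking DRAM and SRAM and says the author found an SRAM stacking approach that seems fairly practical.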

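🤔 **Structural differences between SRAM and NAND flash:** SRAM uses multiple transistors per bit (typically 6) that connect power supplies to each other's gates, while NAND flash uses 1 transistor per bit and high voltage to push charge across an insulating layer. SRAM cell size is limited by how much current small transistors can carry and by wire resistance, while stacking lets NAND flash reach higher density.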

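🤔 **Obstacles to stacking DRAM and SRAM:** Stacked DRAM is constrained by capacitor shape and small feature sizes, as well as cost and power usage. Stacked SRAM would need its cells redesigned so they can be fabricated by the methods used for 3D flash.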

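🤔 **A possible SRAM stacking approach:** The author cites a paper whose vertical SRAM design is estimated to match current SRAM density at about 10 layers, which the author considers too expensive to compete with DRAM. The author also reports finding an approach that seems fairly practical, which requires a separate chiplet that can be processed differently and some unconventional design thinking.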

🤔 **The future of stacking:** Stacked DRAM and SRAM may develop as technology advances, but for now both face technical and economic obstacles.

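🤔 **Conclusion:** Stacked SRAM faces real obstacles, but higher density would reduce signal travel distance on CPUs. As the technology develops, more advanced stacking may be applied to other kinds of memory as well.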

Published on August 17, 2024 2:36 AM GMT

Can SRAM be stacked like NAND flash?

background

types of memory

SRAM has, for each bit, multiple transistors that connect power supplies to the gates of each other. Typically 6 transistors are used, but many variations have been proposed.
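To make "transistors that connect power supplies to the gates of each other" a bit more concrete, here's a minimal logical sketch of a 6-transistor cell as two cross-coupled inverters. The `SramCell` class and its method names are made up for illustration, and the model ignores all analog behavior (voltages, noise margins, read disturb).

```python
def inverter(x: int) -> int:
    """Ideal inverter: the output is pulled to the opposite supply rail."""
    return 1 - x


class SramCell:
    """Toy logical model of a 6T cell: two cross-coupled inverters plus access."""

    def __init__(self, bit: int = 0) -> None:
        self.q = bit
        self.q_bar = inverter(bit)

    def settle(self) -> None:
        # Each inverter's output drives the other's input, so a valid
        # state (q != q_bar) keeps reproducing itself while power is on.
        self.q, self.q_bar = inverter(self.q_bar), inverter(self.q)

    def write(self, bit: int) -> None:
        # In a real cell, access transistors let the bit lines overpower
        # the latch; here we just set both nodes directly.
        self.q, self.q_bar = bit, inverter(bit)

    def read(self) -> int:
        # Idealized: reading does not disturb the stored value.
        return self.q


cell = SramCell()
cell.write(1)
for _ in range(1000):
    cell.settle()
print(cell.read())  # the stored bit is still held, with no refresh step
```

The point of the sketch is only that the feedback loop holds the value by itself, which is why SRAM needs no refresh but costs several transistors per bit.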

DRAM uses transistors to connect capacitors to an input/output line when they're written or read. The capacitors lose charge and must be refreshed periodically and when read.

Flash memory uses high voltage to push charge across an insulating layer, where it stays in place indefinitely. The stored-charge electrostatic field is combined with fields from wires to control transistors, 1 transistor per bit. They have a limited cycle life because the insulator layers can get damaged from charge transfer thru them.

SRAM cell sizes

Smaller transistors can carry less current than larger ones. When wires are made smaller, their capacitance per unit length decreases more slowly than their diameter, so if length and switching speed are constant, the required current is similar. Smaller wires also have higher resistance; at current CPU wire sizes, resistivity rises as wires shrink, so conductance falls faster than cross-sectional area. With constant current, voltage drop then increases, which for the same length requires more "repeaters" along the path.
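Here's a rough sketch of the resistance part of that argument, using the textbook formula R = ρL/A. The wire dimensions and current are hypothetical numbers picked for illustration, and the resistivity is held at the bulk copper value, which understates the problem: as noted above, effective resistivity gets worse at very small wire sizes.

```python
RHO_CU_BULK = 1.68e-8  # ohm*m, bulk copper near room temperature


def wire_resistance(length_m: float, width_m: float, height_m: float,
                    resistivity: float = RHO_CU_BULK) -> float:
    """Resistance of a rectangular wire: R = rho * L / A."""
    return resistivity * length_m / (width_m * height_m)


def voltage_drop(current_a: float, resistance_ohm: float) -> float:
    """Ohm's law: V = I * R."""
    return current_a * resistance_ohm


length = 100e-6   # hypothetical 100 micrometer wire
current = 10e-6   # hypothetical 10 microamp signal current

for width in (40e-9, 20e-9):          # hypothetical wire widths
    height = 2 * width                # assumed 2:1 aspect ratio
    r = wire_resistance(length, width, height)
    v = voltage_drop(current, r)
    print(f"width {width * 1e9:.0f} nm: R = {r:.0f} ohm, drop = {v * 1e3:.2f} mV")
```

Halving both width and height cuts the cross-section by 4, so at the same length and current the voltage drop goes up by at least 4x even before any increase in resistivity.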

The area required to store 1 bit in SRAM is called the "cell size". Basically for the above reasons, SRAM cell sizes haven't decreased much for a few years.

layer counts

The "nm" number of process nodes no longer corresponds to any feature sizes. The current meaning of "X nm node" seems to be something like "the transistor density is similar to what a planar transistor process would have at X nm".

Yet, transistor counts have continued increasing. The only explanation, then, is more layers. Of course, that increases power usage proportionately without increasing area for heat dissipation, so a smaller fraction of transistors can be active at once.
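As a back-of-the-envelope sketch of that last point: assume a single layer with everything switching sits exactly at the thermal limit for its footprint, and each added layer would add the same power if fully active. Both are simplifying assumptions, and `max_active_fraction` is just a name made up for this sketch.

```python
def max_active_fraction(layers: int) -> float:
    """Largest fraction of transistors that can switch at once.

    Assumes heat dissipation is fixed by chip area, one fully active
    layer exactly uses that budget, and every layer draws the same
    power when fully active.
    """
    if layers < 1:
        raise ValueError("layers must be at least 1")
    return 1.0 / layers


for n in (1, 2, 4, 8):
    print(f"{n} layers: at most {max_active_fraction(n):.1%} active at once")
```

Under those assumptions the total number of transistors that can be active at once stays constant no matter how many layers are added.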

That means performance per cost doesn't increase. Note that cost per transistor stopped going down after 28nm. Also, a few layers of transistors and wires isn't even close to the number of layers in modern flash memory.

silicon interposers

Historically, CPUs have been a single layer, with transistors on the CPU face side connected to contacts on the motherboard, and cooling on the CPU back side. The current trend is towards chiplets put on a silicon "interposer" layer.

Adding an extra semiconductor layer adds costs, so it must have some justification over alternatives. Vs a larger CPU, chiplets with problems can be discarded individually, which makes higher layer counts practical. They also have more modularity and thus design flexibility. Vs separate chips on PCBs, interposers can have much smaller wires, and can route signals around with transistors.

Apart from the extra silicon layer needed, interposers also need small holes (through-silicon vias = TSVs) to connect the chiplets on their face side to the motherboard on their back side. Making small holes thru silicon without causing other damage is hard, and narrower holes are harder to make.

high-bandwidth memory (HBM)

DRAM chiplets put directly on a silicon interposer are called "HBM". If you can make TSVs, then you can stack multiple DRAM chiplets on top of each other, and run signals vertically thru them, decreasing signal travel distance. Chipmakers are now starting production of 12-layer HBM stacks.

Why stack DRAM on top of other DRAM? Why not stack DRAM on top of logic to reduce distances more? Logic chiplets use more power and reach higher temperatures that are bad for DRAM.

Why not stack SRAM caches like DRAM, since that also has lower power usage than logic? Compared to SRAM caches, the bandwidth of HBM is low and the latency is high; it's only "high-bandwidth" compared to DRAM DIMMs connected over a standard DDR memory bus. In a stack like that, the connection is the bottleneck, which makes the speed advantages of SRAM over DRAM mostly irrelevant.

3d flash memory

Most flash memory made today is vertical NAND flash. In that, current flows thru a whole stack of transistors. 64 layers used to be typical; now, people are making 128-layer memory commercially.

my question

Given that stacking 64+ layers of flash memory is practical, why isn't DRAM or SRAM stacked like that? Increasing density would reduce signal travel distance on CPUs.

2 TB of flash memory now costs about as much as 64 GB of DRAM. If SRAM could be made like flash memory with 10x the size per bit, it would still be cheaper than DRAM. Why not do that?
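The arithmetic behind that claim, as a quick sketch. It assumes cost scales linearly with area per bit and uses decimal units (1 TB = 1000 GB); the variable names are just for illustration.

```python
FLASH_BYTES_PER_BUDGET = 2e12   # 2 TB of flash for some fixed price
DRAM_BYTES_PER_BUDGET = 64e9    # 64 GB of DRAM for about the same price
SRAM_SIZE_PENALTY = 10          # assumed: flash-style SRAM is 10x larger per bit

# How many times cheaper flash is per bit than DRAM.
flash_advantage = FLASH_BYTES_PER_BUDGET / DRAM_BYTES_PER_BUDGET

# Cost per bit of flash-style SRAM relative to DRAM, assuming cost per
# bit grows in proportion to the area each bit takes up.
sram_vs_dram_cost = SRAM_SIZE_PENALTY / flash_advantage

print(f"flash is about {flash_advantage:.1f}x cheaper per bit than DRAM")
print(f"flash-style SRAM would cost about {sram_vs_dram_cost:.2f}x DRAM per bit")
```

So under these assumptions, a 10x size penalty still leaves roughly a 3x cost advantage over DRAM per bit.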

stacked DRAM

Current DRAM uses capacitors that are cylindrical with height > width. There are some long-term plans to put the capacitors sideways and stack a lot of DRAM layers, but they're long-term plans because it's not considered economically practical now.

Also, DRAM typically uses smaller feature sizes than 3d flash, which makes stacking somewhat harder. The power usage of flash memory is lower, so cost per transistor is more important, and at this point newer nodes are more expensive per transistor. Also, larger flash memory cells make storing multiple levels easier; 3-bit (8-level) flash memory is now standard, but that doesn't work very well for DRAM.
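A small sketch of why more levels per cell is harder: n bits per cell needs 2^n distinguishable charge levels, and if those levels are spread evenly over a fixed window, the spacing between neighbors shrinks quickly. The evenly spaced levels and the normalized window of 1.0 are simplifying assumptions for illustration, not how any particular flash part is specified.

```python
def levels_for_bits(bits_per_cell: int) -> int:
    """Number of distinct charge levels needed to store this many bits."""
    if bits_per_cell < 1:
        raise ValueError("bits_per_cell must be at least 1")
    return 2 ** bits_per_cell


def level_spacing(bits_per_cell: int, window: float = 1.0) -> float:
    """Gap between adjacent levels, assuming they are evenly spaced
    across a fixed window (normalized to 1.0 by default)."""
    return window / (levels_for_bits(bits_per_cell) - 1)


for bits in (1, 2, 3, 4):
    print(f"{bits} bit(s)/cell: {levels_for_bits(bits)} levels, "
          f"spacing {level_spacing(bits):.3f} of the window")
```

A larger cell holds more charge, which gives more room between levels; a DRAM capacitor that leaks between refreshes has much less margin to spare.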

stacked SRAM

Suppose we want to replace DRAM chiplets with stacked SRAM that's made like flash memory. What prevents that from being done? Obviously SRAM has a more complex structure, but with photolithography, the complexity of patterns (of the same elements, at the same scale) is irrelevant.

Looking at their structures, the only relevant thing SRAM has that flash doesn't seems to be wires crossing over each other, which requires connections between small horizontal and vertical wires. Well, here's a video that goes into more detail about the fabrication process of 3d flash memory. Basically, many thin layers are stacked, deep trenches/holes are etched in that, and stuff is deposited in them.

So, can SRAM cells be redesigned so they can be fabricated by the methods used for vertical flash memory? Sort of. Here's an example paper of people trying to do that; they estimate that density matches current SRAM at ~10 layers...which I think makes it too expensive to compete with DRAM while being impractical to integrate in logic chiplets. Well, I was thinking a bit about how vertical SRAM could be implemented, and found something that seems fairly practical. You just need to have a separate chiplet that can be processed differently and think outside the conjoined triangles a bit.
