Why are AMD AND INTEL using CHIPLETS (Tiles)??

Intel recently announced their new Meteor Lake architecture, and what they called the “biggest architectural shift in 40 years” came with a new way of making CPUs. Gone are the single, monolithic chips with everything from your cores, cache, IO and iGPU on one bit of complicated silicon; now everything is “tiles” and “chiplets”. But… why? Why have both AMD AND Intel swapped to this modular design? Well, that’s what I’m here to explain.

I want to start with a bit of clarification – AMD’s and Intel’s chiplet-style designs are really quite different. The base notion of breaking subsystems like the cores or the IO out into their own chiplets is the same, but the actual implementation differs. AMD mounts at least two dies onto the PCB substrate itself. The substrate carries the copper traces that connect the dies together, and the modularity comes from which IO die they use – say, one with or without integrated graphics – and whether they use one or two core dies, giving either up to an 8-core chip or up to a 16-core one. This has the benefit of lower cost and complexity: the fabs are already mounting silicon dies onto PCB substrates, so mounting multiple isn’t all that much harder – at least in the grand scheme of things. The downside is that the dies are physically quite far apart and have to send their signals through plain old copper traces, which adds latency and is technically more vulnerable to data integrity issues.
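
If it helps to picture that modularity, here’s a toy sketch in Python of the AMD-style recipe: one IO die plus one or two core dies on a substrate. All the names and types here are made up for illustration – this obviously isn’t AMD’s actual parts catalogue.

```python
# Toy model of AMD-style chiplet assembly: one IO die plus one or two
# core dies (CCDs) on a shared substrate. Names and sizes are invented
# for illustration only.

from dataclasses import dataclass

@dataclass(frozen=True)
class IODie:
    name: str
    has_igpu: bool

@dataclass(frozen=True)
class CoreDie:
    name: str
    cores: int

def build_cpu(io: IODie, ccds: list[CoreDie]) -> str:
    """Describe the CPU you get from a given die combination."""
    assert 1 <= len(ccds) <= 2, "these packages use one or two core dies"
    total_cores = sum(ccd.cores for ccd in ccds)
    gfx = "with iGPU" if io.has_igpu else "no iGPU"
    dies = " + ".join(ccd.name for ccd in ccds)
    return f"{total_cores}-core CPU ({gfx}), dies: {io.name} + {dies}"

client_io = IODie("client-IO", has_igpu=True)
zen4_ccd = CoreDie("Zen4-CCD", cores=8)

print(build_cpu(client_io, [zen4_ccd]))            # up to 8 cores
print(build_cpu(client_io, [zen4_ccd, zen4_ccd]))  # up to 16 cores
```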

By contrast, Intel’s solution looks overcomplicated and expensive. Intel is technically still mounting one monolithic-looking chip to its PCB substrate, but that chip has a set of “tiles” sitting on top of a base layer of silicon – those are the modular components. That means that unlike AMD, who just design a new substrate to change which connections are used, Intel has to design a different piece of silicon, have it manufactured, cut, tested and binned, and only then can they mount their tiles on top. This is their “Foveros” packaging method, and while it is considerably more expensive and complicated, it means things like the core tile sit right next to the memory controller, so there should be considerably less latency when the cores need to access memory. The fact that the base layer is an active piece of silicon is also a pretty big deal for performance and reliable data transfer.

I hope that explains the differences and similarities between the two designs. Still, the base concept is the same: break each block out into its own bit of silicon. But why? Well, to understand that you’ll need to know how CPUs are made. As a gross oversimplification, the design of the CPU is made into a mask layer – much in the same way designs get printed onto T-shirts, it’s a negative of the pattern you want. The silicon is loaded in as a thin, 12-inch disc called a “wafer”, and the design gets etched into it multiple times – as many copies as they can fit on a single wafer. The larger the design, the fewer copies you get out of a wafer. The problem is that the process of etching the design into the wafer isn’t perfect, and you get defects. Some of those defects end up being harmless, or only cause a partial failure – say one core doesn’t work properly but the rest are fine, so you can sell that die as a 6-core instead of an 8-core – but sometimes it’s such a big failure, or collection of failures, that you can’t do anything but throw that die out.
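
To make the “partial failure” idea concrete, here’s a hypothetical set of binning rules for an 8-core die. Real vendors test far more than core count – clocks, leakage, cache defects and so on – so treat this purely as a sketch of the salvage logic.

```python
# Hypothetical die-harvesting ("binning") rules for an 8-core die.
# Real binning criteria are far more involved; this just shows the
# core-count salvage idea described above.

def bin_die(working_cores: int) -> str:
    if working_cores == 8:
        return "sell as 8-core SKU"
    elif working_cores >= 6:
        return "disable the bad cores, sell as 6-core SKU"
    else:
        return "scrap"  # too many defects to salvage

for cores in (8, 7, 6, 3):
    print(f"{cores} working cores -> {bin_die(cores)}")
```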

So, if you make your die as small as possible, you not only get more chips out at the end, but statistically you get better yields – you make more use of the precious silicon wafer, because you don’t have to throw out as big a chunk of it when a couple of transistors don’t form correctly. Here’s an example. Intel’s i9-12900K has a large monolithic die measuring 20.5mm by 10.5mm. Plugging that into a wafer yield calculator with a somewhat optimistic defect density, Intel would get 212 good dies out of a 300mm wafer, with 20 partial, possibly usable dies, and 44 fully defective ones. That’s a yield of 82.65%. Not too bad. Now look at what AMD gets with their core dies. They are around 10mm by 7mm, and with the same wafer size and defect density AMD would get 785 good dies, 60 partial dies, and 51 defective dies, for a yield of 93.93%. AMD gets nearly four good dies for every ONE Intel produces. That is a very, very significant advantage, and one that Intel is clearly keen to exploit now too.
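
If you want to play with the numbers yourself, here’s a rough sketch of what such a calculator does under the hood: the standard dies-per-wafer approximation plus a simple Poisson yield model. The defect density of 0.09 defects/cm² is my assumption, chosen to land near the figures above; real calculators also model edge exclusion, scribe lanes and fancier yield curves, so the counts won’t match exactly.

```python
# Rough re-creation of a wafer yield calculator: the standard
# dies-per-wafer approximation plus a Poisson yield model.
# D0 (defects per cm^2) is an assumed value, not an official figure.

import math

WAFER_D_MM = 300   # wafer diameter in mm
D0 = 0.09          # assumed defect density, defects per cm^2

def dies_per_wafer(die_w_mm: float, die_h_mm: float) -> int:
    area = die_w_mm * die_h_mm
    wafer_area = math.pi * (WAFER_D_MM / 2) ** 2
    # Second term approximates the dies lost at the curved wafer edge.
    return int(wafer_area / area - math.pi * WAFER_D_MM / math.sqrt(2 * area))

def poisson_yield(die_w_mm: float, die_h_mm: float) -> float:
    area_cm2 = (die_w_mm * die_h_mm) / 100  # mm^2 -> cm^2
    return math.exp(-area_cm2 * D0)

for name, w, h in [("large monolithic die (20.5x10.5mm)", 20.5, 10.5),
                   ("small core die (10x7mm)", 10.0, 7.0)]:
    n = dies_per_wafer(w, h)
    y = poisson_yield(w, h)
    print(f"{name}: ~{n} candidates/wafer, ~{y:.1%} yield, "
          f"~{int(n * y)} good dies")
```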

One of the other major benefits is that if you are making your CPU out of multiple dies, the dies don’t all have to be made on the same process node. They don’t have to have the same size transistors, or even be built by the same company or fab! Intel’s new Meteor Lake chips are made from four different process nodes: two from Intel and two from TSMC. The core die is the only one that needs the most cutting-edge, power-efficient and dense node, that being Intel 4. The iGPU is made by TSMC on their N5 process node – the same one AMD uses for their core dies – while the SoC and IO dies are made on TSMC’s N6 node, and the base layer is made on the now-ancient Intel 16 process node, because a glorified backplane doesn’t need anything remotely high-tech or power dense.
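
For reference, here’s that node mix as a simple lookup table – just restating the facts above in code form:

```python
# Meteor Lake's tile-to-node assignments, as described above.
TILE_NODES = {
    "compute tile (CPU cores)": "Intel 4",
    "graphics tile (iGPU)":     "TSMC N5",
    "SoC tile":                 "TSMC N6",
    "IO tile":                  "TSMC N6",
    "base tile":                "Intel 16",
}

for tile, node in TILE_NODES.items():
    print(f"{tile:26s} -> {node}")
```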

The reason this is so important is that the newest, most cutting-edge process nodes have the least capacity. Intel can only manufacture so many Intel 4 wafers at a time, so if all you need to make on that limited capacity is the tiny core dies, you can turn out two, three or four times as many chips at once. You then make the less performance-sensitive stuff on a different, older, cheaper, higher-volume process node, and hey presto, you’ve got yourself way more chips for less money, in a shorter time frame. For someone like AMD, who is buying space on TSMC’s production lines, it’s very important to make the most of the limited number of wafers they can get access to.
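
Here’s the back-of-envelope version of that capacity argument. The die areas are my round, illustrative numbers, not official figures, and it ignores edge losses and yield for simplicity:

```python
# Back-of-envelope: how many CPUs can one leading-edge wafer feed?
# Die areas are illustrative assumptions, not official figures.

import math

WAFER_AREA_MM2 = math.pi * (300 / 2) ** 2  # 300mm wafer

monolithic_mm2 = 215    # everything on the cutting-edge node
compute_tile_mm2 = 60   # only the cores on the cutting-edge node

mono_per_wafer = WAFER_AREA_MM2 / monolithic_mm2
tile_per_wafer = WAFER_AREA_MM2 / compute_tile_mm2

print(f"monolithic: ~{mono_per_wafer:.0f} CPUs per leading-edge wafer")
print(f"chiplet:    ~{tile_per_wafer:.0f} CPUs per leading-edge wafer")
print(f"-> roughly {tile_per_wafer / mono_per_wafer:.1f}x more CPUs "
      f"from the same scarce wafer capacity")
```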

Of course, this level of modularity also makes the chips you produce easier to customise. AMD shows this off very well, the best example being their new Ryzen 7000X3D chips. If you buy a 7800X3D, you get one 8-core die with 3D V-Cache stacked on top. If you buy a 7950X3D, though, you get that same 8-core 3D V-Cache die AND a regular 8-core Zen 4 die. AMD is literally mixing and matching their 3D V-Cache and regular dies for these chips – but they didn’t have to completely redesign the chip to do that. They just dropped a different core die on. That’s another gross oversimplification, of course, but I hope you get the idea.
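
As one last toy example, here’s that mix-and-match expressed as data – the die labels are mine, not AMD’s official naming or packaging details:

```python
# The X3D mix-and-match as data. Die labels are illustrative,
# not AMD's official naming.

DIES = {
    "Zen4 CCD": 8,             # cores per plain core die
    "Zen4 CCD + V-Cache": 8,   # same cores, extra cache stacked on top
}

SKUS = {
    "Ryzen 7 7800X3D": ["Zen4 CCD + V-Cache"],
    "Ryzen 9 7950X3D": ["Zen4 CCD + V-Cache", "Zen4 CCD"],
}

for sku, ccds in SKUS.items():
    cores = sum(DIES[ccd] for ccd in ccds)
    print(f"{sku}: {cores} cores, built from: {', '.join(ccds)}")
```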

Even Intel is getting in on the modularity party, showcasing different IO die sizes in their presentation to better suit different packages. No doubt there will be multiple core die configurations too – some with maybe 2 P-cores and 4 or 8 E-cores, all the way up to the full-fat 8 P-cores and 16 E-cores. That’s something they previously would have had to design the entire chip around, but now they can swap between options with far less work – and of course they benefit from using multiple process nodes here too, reserving their new Intel 4 node for only the most performance- and power-sensitive part of the chip.

While I’m sure we could talk for hours more about this, I think that’s a good place to leave off as an introduction to chiplets and why they’re now the standard. I find this really interesting, and if you do too I’d really appreciate it if you hit the subscribe button so we can explore more stuff like this. I’d love to hear your thoughts in the comments down below – and if I got anything too heinously wrong, or you have some extra information to share, please let me know there too. Of course, if you haven’t seen my video all about the Meteor Lake announcement, do check that out – I’ll leave it on the end cards for you!