Intel’s Meteor Lake is INSANE!!
Intel’s new Meteor Lake based chips are going to be absolutely insane! While it looks like we’ll see the mobile chips first, this new tech is expected to carry over to the 15th gen desktop chips, and oh boy, there’s a lot to talk about. This is a really, really major change to how Intel makes CPUs, and it’s what Intel calls their biggest architectural shift in 40 years. Yeah, even bigger than hybrid chips like Alder Lake and Raptor Lake. There is so much to cover here, so let’s get into it.
Intel’s Meteor Lake architecture brings a pretty big shift in straight up how the chips are made. Intel is employing “Disaggregation” of the chip design, which is the fanciest possible way I can imagine they could describe the fact that, despite Intel actively mocking AMD for using “Glued Together” dies in 2017, they are now copying AMD with chiplet – sorry, “Tiled” – CPU designs. Unlike AMD’s designs, Intel’s mobile package is split into four tiles: the “Compute Tile”, AKA the core die; the “IO Tile”; the “SoC Tile”; and the “Graphics Tile”. Let me run you through what those actually include and do.
The Compute Tile is probably the most familiar. It contains both the P-Cores and the E-Cores; this is basically the top half of an Alder Lake or Raptor Lake chip, with the main cores and cache. That’s connected to the SoC tile, which is by far the most complicated die in the package, at least in terms of features to explain! The SoC tile contains the memory controller, the media engine, the new NPU and two Low Power Island E-Cores, along with WiFi 6E and WiFi 7 support, Bluetooth, display connections, Ethernet, USB 2 and 3, SATA and audio. Yeah, it’s a lot. Now you might be wondering: if the SoC tile runs all that IO, including both the memory controller and things like USB and Ethernet, what does the IO Tile do? In short, PCIe and Thunderbolt 4. Intel actually showed two size variants of this tile for differing SKUs, so we’ll need to keep an eye on things like how many PCIe lanes the chips support.
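To keep the four tiles straight, here’s a minimal Python sketch that models the mobile package as a simple lookup table, using only the tile contents listed above. The names `TILES` and `tile_for` are hypothetical and not any Intel interface; the graphics tile entry is just a placeholder for the Xe LPG GPU covered later.

```python
# Hypothetical model of the Meteor Lake mobile package, built only from the
# tile contents described above. Names are illustrative, not an Intel API.
TILES = {
    "compute": ["P-Cores", "E-Cores", "cache"],
    "soc": [
        "memory controller", "media engine", "NPU", "LP E-Cores",
        "WiFi 6E/7", "Bluetooth", "display", "Ethernet",
        "USB 2/3", "SATA", "audio",
    ],
    "io": ["PCIe", "Thunderbolt 4"],
    "graphics": ["Xe LPG GPU"],
}


def tile_for(feature: str) -> str:
    """Return the name of the tile that hosts a given feature."""
    for tile, features in TILES.items():
        if feature in features:
            return tile
    raise KeyError(f"unknown feature: {feature}")


if __name__ == "__main__":
    for feature in ("memory controller", "PCIe", "NPU", "P-Cores"):
        print(f"{feature:>17} -> {tile_for(feature)} tile")
```

The lookup makes the slightly counterintuitive split obvious: the memory controller, USB and Ethernet live on the SoC tile, while the IO tile is left with just PCIe and Thunderbolt 4.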
I want to jump back into the SoC tile here, because this is by far the most interesting part. I’ll get into the NPU (the Neural Processing Unit) in a second, as well as those “Island E-Cores”, but what’s most interesting to me beyond those two is the inclusion of the media engine in the SoC tile. That’s the encoder and decoder normally found as part of the graphics core, but it’s now part of the SoC tile, which means every single Meteor Lake chip, F SKU or not (meaning regardless of whether the chip has that Graphics Tile), will have an H264/H265/AV1 encoder and decoder built in. That’s frankly huge. I suspect the reason this has been moved here is that each of these blocks can be powered on and off independently, meaning you now don’t need to wake the GPU just to watch a video or record your screen, assuming you aren’t already using it to drive your display. A lot of the changes here are efficiency focused.
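The power-gating idea above can be illustrated with a small Python sketch that works out which tiles a workload has to wake. This is a simplification under a stated assumption: each tile is treated as a single power domain, whereas the real gating may well be finer-grained per block. The `BLOCK_TO_TILE` mapping and the `tiles_to_wake` helper are hypothetical.

```python
# Hypothetical sketch: treat each tile as one independently powered domain,
# and only wake the tiles that host the blocks a workload actually needs.
BLOCK_TO_TILE = {
    "media engine": "soc",
    "NPU": "soc",
    "LP E-Cores": "soc",
    "memory controller": "soc",
    "E-Cores": "compute",
    "P-Cores": "compute",
    "GPU": "graphics",
    "PCIe": "io",
}


def tiles_to_wake(blocks: set[str]) -> set[str]:
    """Return the set of tiles that must be powered on for the given blocks."""
    return {BLOCK_TO_TILE[block] for block in blocks}


# Video playback: decode on the media engine, light work on the LP E-Cores.
print(sorted(tiles_to_wake({"media engine", "memory controller", "LP E-Cores"})))
# A ray-traced Blender render needs the GPU and the big cores.
print(sorted(tiles_to_wake({"GPU", "P-Cores"})))
```

In this model, video playback never touches the compute or graphics tiles at all, which is the efficiency win the media engine move is aiming for.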
Before we move onto the core madness, I want to interject a bit of speculation. Based on the materials we’ve seen, it looks like these mobile packages will be limited to 6 P-Cores, although perhaps the HX line might stretch that to 8 instead. Either way, I think it’s pretty conceivable that Intel will continue to follow AMD’s design with multiple compute tiles. The SoC tile doesn’t look like it has any spare interfaces for a second compute tile, so perhaps we’ll have to wait a generation or two to see that, but it seems pretty obvious Intel wants to copy AMD’s chiplet design, and adding more cores makes sense when it’s just an extra tile to drop in. Time will tell on that one though.
Ok, so let me get into the really big stuff here. These Meteor Lake chips will now have THREE DIFFERENT TYPES OF CORES ONBOARD. Yes, you heard that right, these are triple-hybrid chips. We have the P-Cores, now using the new Redwood Cove architecture with better efficiency than the current Golden Cove cores, and the E-Cores, which also use a new architecture, Crestmont, which boasts IPC gains over Gracemont. The thing that’s new here, though, is those Low Power Island E-Cores built into the SoC tile. These are even less powerful (and of course more efficient) than the regular E-Cores, although interestingly they are still built on the Crestmont architecture, just with a much lower clockspeed and “some other modifications”. That means the whole Intel Thread Director thing gets even more complicated, because the new process for deciding which cores to use is this: try to contain the task on the LPE-Cores alone. It sounds like every single task will be loaded onto the LPE-Cores first, then moved if necessary. If the work does need moving, the scheduler wakes up the compute tile and moves the work to the E-Cores. If it still doesn’t fit, as in it’s a high demand task like gaming or rendering, it then moves the task again onto a free P-Core to get full performance.
Intel envisages this as a revolution in efficiency, as the LPE-Cores will do the majority of the work, only spinning up the compute tile E-Cores when needed, and only bothering the high power P-Cores if it really needs to. To me this seems like it’s going to add latency: starting the task on the LPE-Cores, then moving it to the compute tile E-Cores, AND THEN moving it again to a P-Core if it needs more performance. I’m going to be very interested to see if that introduces more microstuttering while gaming, as the game engine has to wait for the task to be moved twice before it runs at full speed.
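Here’s a minimal Python sketch of the escalation order described above: LP E-Core first, then a compute tile E-Core, then a P-Core. The numeric `demand` and `capacity` values are made-up placeholders, and the real Thread Director works from hardware telemetry rather than a single number, so treat this as an illustration of the policy’s shape, not Intel’s implementation.

```python
from dataclasses import dataclass

# Hypothetical core tiers, ordered from most to least efficient.
# The capacity numbers are made-up placeholders, not Intel figures.
TIERS = [("LP E-Core", 1.0), ("E-Core", 4.0), ("P-Core", 10.0)]


@dataclass
class Placement:
    core_type: str
    migrations: int  # how many times the task was moved before settling


def place_task(demand: float) -> Placement:
    """Start every task on the LP E-Cores and escalate until it fits."""
    for migrations, (core_type, capacity) in enumerate(TIERS):
        if demand <= capacity:
            return Placement(core_type, migrations)
    # Nothing is bigger than a P-Core, so the heaviest tasks stay there.
    return Placement(TIERS[-1][0], len(TIERS) - 1)


print(place_task(0.5))  # light background task stays on an LP E-Core
print(place_task(8.0))  # game thread: moved twice before reaching a P-Core
```

The `migrations` field is the crux of the latency worry: anything demanding enough to need a P-Core gets moved twice under this policy, whereas a light task never leaves the SoC tile.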
This is the sort of complexity I was somewhat hesitant to embrace with Alder Lake, and as you can see from the slide, Alder Lake’s scheduling was a lot simpler by comparison. In fact, this slide explains why you can’t disable all P-Cores on 12th and 13th gen chips: tasks always get loaded onto P-Cores first, then get downgraded to E-Cores if they aren’t high priority. Interestingly, the scheduler would also periodically move E-Core threads back to P-Cores to reclassify them, then potentially move them back to the E-Cores again. The new scheduling seems MORE complicated, not less.
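For contrast, here’s a similarly hypothetical Python sketch of the 12th/13th gen behaviour described above: every thread starts on a P-Core, low-priority work is demoted to an E-Core, and E-Core threads are periodically promoted back to be reclassified. The `alder_lake_step` function and the fixed two-tick reclassification interval are assumptions made for illustration, not Intel’s actual timings.

```python
# Hypothetical sketch of the 12th/13th gen policy described above:
# start on a P-Core, demote low-priority work to an E-Core, and
# periodically bring E-Core threads back to a P-Core to re-check them.


def alder_lake_step(core: str, high_priority: bool, reclassify: bool) -> str:
    """Return the core type a thread runs on after one scheduling step."""
    if core == "E-Core" and reclassify:
        return "P-Core"  # temporarily promoted so it can be re-evaluated
    if core == "P-Core" and not high_priority:
        return "E-Core"  # demoted because it does not need a P-Core
    return core


core = "P-Core"  # every thread starts on a P-Core in this model
history = [core]
for tick in range(4):
    # A background thread: never high priority, reclassified every 2nd tick.
    core = alder_lake_step(core, high_priority=False, reclassify=(tick % 2 == 1))
    history.append(core)
print(history)
```

Because every thread begins life on a P-Core in this model, there’s nowhere to start new work if all P-Cores are disabled, which lines up with the slide’s explanation.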
Ok, moving to a different tile, this time the graphics tile, and this is pretty exciting too. The new graphics core, Xe LPG, brings another 2x performance per watt improvement over the 12th gen Xe LP core, and is seemingly quite a different design. It now features 8 Xe-cores, each made up of 16 vector engines (basically cores), for a total of 128 ‘cores’. It’s designed to run at a lower voltage, but a considerably higher clock. The current gen graphics peak at about 1.6GHz, whereas Xe LPG looks to be more like 2.4GHz, all at a lower voltage too. The biggest change by far, though, is the introduction of 8 dedicated Ray Tracing Units, one for each Xe-core. While Intel do technically list gaming as one of the “Ray Tracing use cases”, the only data they provide is ray tracing in Blender, so it’s safe to say you won’t be playing Cyberpunk on Ultra-Psycho Ray Tracing on these chips.
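The graphics numbers quoted above multiply out as follows. The clock figures are the approximate peaks mentioned in the text, so the resulting uplift is a rough estimate rather than a spec, and it says nothing about real-world frame rates.

```python
# Figures quoted above for Xe LPG versus the current-gen integrated graphics.
XE_CORES = 8
VECTOR_ENGINES_PER_XE_CORE = 16
RT_UNITS_PER_XE_CORE = 1
OLD_PEAK_GHZ = 1.6  # approximate current-gen peak
NEW_PEAK_GHZ = 2.4  # approximate Xe LPG peak, based on the materials shown

vector_engines = XE_CORES * VECTOR_ENGINES_PER_XE_CORE  # 128
rt_units = XE_CORES * RT_UNITS_PER_XE_CORE              # 8
clock_uplift = NEW_PEAK_GHZ / OLD_PEAK_GHZ              # roughly 1.5x

print(f"{vector_engines} vector engines, {rt_units} RT units, "
      f"{clock_uplift:.2f}x peak clock")
```

So the clock bump alone is around 1.5x, before counting any per-clock architectural changes.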
Switching gears a little, I want to head back to the SoC tile and talk about that new Neural Processing Unit, the NPU. This is all about AI, specifically running AI inference. When it comes to AI tech, there are two main operations: training and inference. Training is the really computationally intensive bit. That’s the thing you need compute farms out the wazzoo to do well. Really high end hardware can train models, like I showed with Dreambooth and Stable Diffusion, but that’s still pretty slow and with a really small dataset. Inference, on the other hand, is a lot, lot easier. That’s just giving the model an input and running it through the neural network to get an output. Well, I say it’s a lot easier, and compared to training it really is, but it’s still a pretty intensive task, especially for non-specialised hardware.
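To make the training/inference split concrete, here’s a minimal pure-Python sketch of inference as a single forward pass through a tiny two-layer network. The weights are arbitrary placeholders standing in for an already-trained model; nothing is learned here, which is exactly why inference is so much cheaper than training. Each `acc += x * w` step is one multiply-accumulate, the operation NPUs are built to do in bulk.

```python
# Minimal sketch of inference: a fixed, already-"trained" two-layer network.
# The weights are arbitrary placeholders; no training happens here.


def dense(inputs, weights, biases, relu=True):
    """One fully connected layer: each output is a running sum of MACs."""
    outputs = []
    for row, bias in zip(weights, biases):
        acc = bias
        for x, w in zip(inputs, row):
            acc += x * w  # one multiply-accumulate (MAC)
        outputs.append(max(acc, 0.0) if relu else acc)
    return outputs


W1 = [[0.5, -0.25], [1.0, 0.75], [-0.5, 0.5]]  # 2 inputs -> 3 hidden units
B1 = [0.0, 0.25, 0.5]
W2 = [[1.0, -1.0, 0.5]]                        # 3 hidden units -> 1 output
B2 = [0.0]


def infer(x):
    """Inference is just a forward pass: input in, output out."""
    hidden = dense(x, W1, B1)
    return dense(hidden, W2, B2, relu=False)


print(infer([1.0, 2.0]))
```

Calling `infer([1.0, 2.0])` returns `[-2.25]` after 9 multiply-accumulates. A real model like Stable Diffusion does the same kind of arithmetic, just billions of times over.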
That’s where the NPU comes in. Intel showed some results from Stable Diffusion with 20 iterations: the CPU alone drew 40W for 43 seconds. The GPU took just 14.5 seconds at 37W, but the NPU, while taking longer than the GPU at 20.7 seconds, only drew 10W to do that. That works out to roughly 207 joules for the NPU versus about 1,720 joules for the CPU, meaning it was more than 8 times as energy efficient as the CPU alone, while leaving the CPU and GPU free to do other work at the same time. Their examples are a little corporate focused, including using Teams to do audio and visual effects via the OpenVINO inference engine exclusively on the very efficient NPU, although I’m sure more use cases will pop up as hardware support becomes available via these chips. To be clear, this NPU is going to be on every SKU, so no matter if it’s an i3 or an i9, it’ll have an NPU. That’s a pretty big deal.
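The efficiency comparison above comes down to energy = power × time. Here’s a short Python sketch of that arithmetic using Intel’s quoted figures, assuming the wattages are average draw over the whole run (the slide numbers are Intel’s; the calculation is ours).

```python
# Stable Diffusion (20 iterations) figures as quoted from Intel's demo above.
# Assumes the wattage is the average draw over the whole run.
RUNS = {
    "CPU": {"watts": 40.0, "seconds": 43.0},
    "GPU": {"watts": 37.0, "seconds": 14.5},
    "NPU": {"watts": 10.0, "seconds": 20.7},
}

# Energy (joules) = average power (watts) x time (seconds).
energy = {name: r["watts"] * r["seconds"] for name, r in RUNS.items()}

for name, joules in energy.items():
    ratio = energy["CPU"] / joules  # how many times less energy than the CPU
    print(f"{name}: {joules:7.1f} J  ({ratio:.1f}x the CPU's efficiency)")
```

The NPU comes out a little over 8x more energy efficient than the CPU, and it also uses roughly 2.6x less energy than the GPU despite finishing later.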
The NPU itself is made up of two compute units, with each compute unit housing 2048 MAC units. MAC stands for multiply-accumulate, the basic operation behind the matrix multiplication and convolution work that neural networks rely on. They also share some scratchpad RAM for faster data access. As for support, Intel is supporting a number of APIs, including WinML, DirectML, ONNX Runtime and OpenVINO. It seems like these are focusing quite heavily on the Windows integration side of things, as DirectML is Microsoft’s machine learning API for Windows. Again, we’ll have to wait and see what programs end up making use of this, but it’s pretty exciting to see.
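As a back-of-the-envelope Python sketch, here’s how the MAC count above relates to real work. It assumes an idealised NPU where every MAC unit is busy every cycle; clock speed, data types, memory bandwidth and utilisation are not given in the text, so this only yields a best-case lower bound on cycles. The 64×64 matrix size is a hypothetical example.

```python
import math

# Idealised MAC arithmetic for the NPU figures above: two compute units,
# each with 2048 multiply-accumulate (MAC) units. This ignores clocks,
# memory bandwidth and utilisation, so it gives a lower bound on cycles only.
COMPUTE_UNITS = 2
MACS_PER_UNIT = 2048
MACS_PER_CYCLE = COMPUTE_UNITS * MACS_PER_UNIT  # 4096


def matmul_macs(m: int, k: int, n: int) -> int:
    """An (m x k) @ (k x n) matrix multiply needs m * n * k MACs."""
    return m * n * k


def min_cycles(macs: int) -> int:
    """Best-case cycles if every MAC unit is busy every cycle."""
    return math.ceil(macs / MACS_PER_CYCLE)


work = matmul_macs(64, 64, 64)  # a hypothetical 64x64 by 64x64 multiply
print(work, min_cycles(work))
```

A 64×64 by 64×64 multiply is 262,144 MACs, which in this idealised model keeps all 4,096 units busy for 64 cycles.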
The final thing I want to cover here is Intel 4, the process node these Meteor Lake chips are built on. Intel quotes the new process node as being over 20% more efficient than the current Intel 7 process node that the 12th, 13th and presumably 14th gen chips are based on. It’s denser, by a decent margin in some cases, which is what aids that efficiency difference. What surprised me is that this is the first time Intel will be using EUV (extreme ultraviolet) machines to etch their wafers. For context, TSMC’s N7+ node was already using ASML’s EUV machines in late 2019. What’s even funnier to me is that Intel was one of ASML’s major early investors, but chose to hold off on implementing EUV until it was more stable, which allowed TSMC, and their customers (namely AMD), to take advantage of Intel’s delay and roll out some impressively efficient and performant chips before Intel.
To be clear though, only the compute tile uses Intel 4 as its process node. Much like AMD do with their Ryzen chiplets, Intel is mixing and matching different process nodes in these Meteor Lake chips. Both the IO and SoC tiles use TSMC’s N6 process node, the GPU is built on TSMC N5, and the base silicon tile they all bond to as part of Intel’s Foveros packaging is made on Intel 16. I’m planning on making a video explaining more about why both AMD and now Intel are moving to these chiplet style dies, so make sure you are subscribed so you don’t miss that.
Something else that’s interesting is this roadmap. Intel 4 is ramping up production now, but Intel plans on moving to Intel 3 pretty soon, then 20A half a year later, and 18A half a year after that. Considering Intel’s history of sticking with a process node until they absolutely have to move on, this seems like a major shift in how they operate. That’s likely thanks to them taking delivery of reliable EUV machines, as there are a lot of process node improvements they can make with that new tech.
So to recap, Meteor Lake is a “tiled” chiplet style design, featuring an SoC Tile, Compute Tile, IO Tile and Graphics Tile. The SoC tile now hosts two Low Power Island E-Cores, which take in all tasks first and then delegate out to the compute tile if needed. It also contains a Neural Processing Unit for AI inference, plus Intel has moved the media engine out of the graphics tile and into the SoC tile, so every chip will have media encode and decode built in. The P and E cores are new designs, as is the graphics core, and Intel has finally moved to Intel 4, its first EUV-based process node. Quite a lot that’s new, huh!