Here’s why Intel’s new CPUs are REVOLUTIONARY… Alder Lake Explained
Intel’s upcoming Alder Lake CPUs – their 12th gen lineup – are set to take a pretty big leap forward, one that could very well rock the industry and may push its main competitor AMD to follow suit. No, I’m not talking about Intel finally moving off of its now very tired 14nm node and its endless string of pluses – I’m talking about Intel Hybrid Technology.
This is a pretty big deal. These new chips are going to have two different types of cores onboard, from two very different microarchitectures. You’ll get up to 8 “performance” cores and up to 8 “efficiency” cores, with a total of up to 24 threads and up to 30MB of cache. Now, I can already hear you smashing away at your keyboard to tell me that this isn’t new – Intel themselves have done this before, not to mention Apple and, uh, this little company called… ARM! I know, and I’m getting to that. Chill.
But yeah, the angry people are right, this isn’t exactly new. Intel only recently killed off their last attempt at a hybrid CPU, codenamed Lakefield – although I wouldn’t say being a hybrid CPU was the sole reason for its shelving – and ARM (and therefore Apple) have been designing what they call big.LITTLE CPUs for literally a decade now (they have since launched a successor called DynamIQ, where multiple core types can be mixed within a single cluster).
Since ARM is the OG here, let me explain why this design can be so beneficial. As the name suggests, you have some combination of “big”, high-power, high-performance cores, and some number of “little” low-energy, lower-performance cores. This is a really clever combination, as it means all the slow, menial background tasks a device like your phone needs to do – updating the time, checking for incoming notifications, or even more active tasks like streaming music from Spotify – can all be done on the low-power, high-efficiency cores. Sure, it’s technically slower, but you never notice, and they draw a lot less power doing those tasks than the ‘big’ cores would.
Then you have the big cores: say you want to start streaming a video, take high-res HDR pictures or 4K video, or even play a mobile game – those big cores can kick in to run those tasks. Sure, they’ll draw more power while doing so, but you get much better performance, and they’ll only be switched on when needed since all the background tasks are handled by the little cores.
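If it helps to see that routing written down, here’s a toy sketch of the idea in Python – the task names and the big/little split are made up for illustration, and this is nowhere near how ARM, Apple or any real OS scheduler actually implements it:

```python
# Toy illustration of the big.LITTLE idea: light background work goes to the
# "little" cores, demanding work wakes up the "big" ones.
# Hypothetical policy only - not any real scheduler.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    demanding: bool  # True for things like games, 4K capture, rendering

LITTLE_CORES = ["little0", "little1", "little2", "little3"]
BIG_CORES = ["big0", "big1", "big2", "big3"]

def pick_core_pool(task: Task) -> list[str]:
    """Background chores go to the efficient cores; heavy work gets the big ones."""
    return BIG_CORES if task.demanding else LITTLE_CORES

tasks = [
    Task("check notifications", demanding=False),
    Task("stream music", demanding=False),
    Task("record 4K video", demanding=True),
    Task("mobile game", demanding=True),
]

for t in tasks:
    pool = pick_core_pool(t)
    kind = "big" if pool is BIG_CORES else "little"
    print(f"{t.name!r} -> {kind} cores")
```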
This is what Intel is doing with their hybrid technology. Lakefield was a weird design as it used 4 low-power cores and just one high-power core, seemingly trading off most of the performance and not gaining all that much in efficiency, but it did allow them to learn a whole lot and test out other ideas, like using two different process nodes (22nm and 10nm) in the same package, and stacking the cores, iGPU and memory controller literally on top of a separate I/O die.
With Alder Lake it’s all one process node – what was once called Intel 10nm Enhanced SuperFin but is now the more catchy “Intel 7” – and it’s still meant to be all on one die, one physical lump of silicon, but this time you get a much more even split of performance and efficiency cores. It looks like the “E” cores are neatly arranged in groups of four, so I’d expect most chips to offer at least four E-cores, and likely two or four “P” (performance) cores at a minimum.
Those E-cores aren’t exactly slouches either: as Intel showed at their Architecture Day, the new Gracemont efficiency cores can offer as much as 40% more performance on a single thread compared to a Skylake (6th gen) core, while using 40% less power. That’s a significant improvement, and when a 4 core, 4 thread cluster of E-cores is compared in multi-threaded work against a dual core, quad thread Skylake chip, the E-cores can offer up to 80% more performance with 80% less power. What they are saying there is that in specific workloads, and at a given frequency, four of these new E-cores are only about 10% slower than a quad core, hyperthreaded Skylake CPU (like the 6700K) while drawing nearly 60% less power – at least based on their charts. That’s absolutely incredible.
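Just to put that into perspective, here’s the quick back-of-the-envelope maths – these figures are my rough reading of Intel’s charts, not measured numbers:

```python
# Back-of-the-envelope from the chart reading above: four E-cores at roughly
# 90% of a 4C/8T Skylake's performance, drawing roughly 40% of its power.
relative_performance = 0.90   # ~10% slower
relative_power = 0.40         # ~60% less power
perf_per_watt_gain = relative_performance / relative_power
print(f"~{perf_per_watt_gain:.2f}x the performance per watt")  # ~2.25x
```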
As for the P-cores, while they aren’t quite as wholly revolutionary in their design as the E-cores, they’ve got some pretty important architectural differences compared to the backported Cypress Cove cores we saw in the current 11th gen chips. I won’t go into too much detail here – if you want to dive deeper, Andrei and Dr Cutress have an excellent writeup on AnandTech that I’ll link in the description – but to summarise, there are changes across the core: more instruction decoders, a larger micro-op cache and double the fetch bandwidth in the front end, plus an additional ALU (arithmetic logic unit), new FADD (floating point add) units and improved L2 cache usage in the mid and back end to cut down on unnecessary reads from system memory. The end result is a claimed 19% IPC improvement, with specific tasks being up to 60% faster than the 11900K, although some do fall short at 5 or 10% slower – but on the whole it looks to be a healthy improvement.
So the chips themselves should be pretty fast, and efficient, but there’s a key piece of the puzzle that, if it’s missing, renders the benefit of having multiple types of cores somewhat irrelevant. It’s called the scheduler, and at a base level it takes the threads of work from your programs and splits them into queues for each of the cores. There’s a whole lot of complexity I’m skimming over, but that’s its job.
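To picture what that looks like, here’s a deliberately simplified sketch of a ‘dumb’ scheduler in Python – it just deals threads out to per-core run queues round-robin, with no clue what kind of core each queue belongs to. It’s a toy model, not anything like the real Windows scheduler:

```python
# Naive scheduler sketch: round-robin threads onto per-core run queues,
# completely unaware of core type, topology or capability.

from collections import deque
from itertools import cycle

cores = ["core0", "core1", "core2", "core3"]
run_queues = {core: deque() for core in cores}
next_core = cycle(cores)

def schedule(thread: str) -> str:
    """Assign a ready thread to whichever core is next in the rotation."""
    core = next(next_core)
    run_queues[core].append(thread)
    return core

for thread in ["game-render", "game-audio", "antivirus-scan", "spotify"]:
    print(thread, "->", schedule(thread))
```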
The trouble is, generally speaking the scheduler is ‘dumb’, as in it’s not aware of what’s going on inside the CPU, or of the CPU’s capabilities or design. We saw this become an issue with AMD’s Ryzen CPUs: thanks to their more ‘modular’ design, cores are grouped into what AMD calls a “CPU Complex” or “CCX”, and Zen 2 then introduced chiplet designs using multiple pieces of silicon connected via Infinity Fabric – chiplets they call “CCD”s. What would happen is the Windows scheduler would assign one thread to a core in, say, the first core complex, then another thread from the same program to a core in a completely different complex or even a different physical block of silicon. When those threads needed to share data, the second core would have to go and fetch the result from the other CCX or CCD, which takes way, way longer than collecting it from a physically closer core with shared L3 cache. Since then, Microsoft has worked with AMD to create a “CCX aware” scheduler, so now the Windows scheduler will do its best not to split related tasks across the various CCXs and CCDs on the CPU.
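If you ever want to see that locality idea first-hand, you can pin a process to one CCX yourself using CPU affinity. The sketch below uses Linux’s sched_setaffinity via Python, and assumes cores 0 to 3 share one L3 slice – that numbering is an assumption for illustration, so check your real topology with lscpu before trying anything like this:

```python
# Manually confine this process's threads to one (assumed) CCX so related
# work shares an L3 cache instead of bouncing across CCXs/CCDs.
# Linux-only API; the core numbering is an assumption for illustration.
import os

FIRST_CCX = {0, 1, 2, 3}   # assumed: cores sharing one L3 slice

# 0 means "the calling process"; all of its threads are now kept on the
# first CCX, so shared data stays in the local L3.
os.sched_setaffinity(0, FIRST_CCX)
print("Now restricted to cores:", os.sched_getaffinity(0))
```

Of course, the whole point of the CCX-aware scheduler is that you shouldn’t need to do this by hand any more.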
Unfortunately for Intel, knowing where cores are is a lot easier to deal with than knowing what type of cores they are, and more importantly what type of instruction should be run on each. While I’m sure with enough work the various OS schedulers could handle this just fine, Intel (I’d argue rather intelligently) decided to take a more proactive approach to help make the switch as seamless as possible. The OS scheduler will still need to add support for this paradigm – hence the Windows 11 launch – but Intel’s new Thread Director is aiming to make the OS scheduler’s job a fair bit easier.
Of course, the basic idea is that priority tasks – running a game or a 3D render, for example – should be sent to a P-core, whereas the background running of the OS, data fetching, and I’d imagine some amount of web browsing and word processing too, should all be sent to E-cores instead. But on top of that, it’s going to be able to move tasks from P to E cores and vice versa, and intelligently manage where work gets sent, not just at an application level but at an instruction level. Does your game have a thread sleep call? Cool, that can sit on an E-core while it’s literally doing nothing, then when it’s done it’ll kick back to the P-cores to compute whatever’s next.
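To make that behaviour a bit more tangible, here’s a heavily simplified, entirely hypothetical model of the kind of decision Thread Director-style feedback could let the OS make – the thread classes and thresholds are invented for illustration, and this is not Intel’s actual hardware interface:

```python
# Hypothetical model: periodically look at how each thread behaved recently
# and decide which core type it should run on next. The classification rules
# here are made up for illustration.

from dataclasses import dataclass

@dataclass
class ThreadSample:
    name: str
    busy_fraction: float   # share of the last interval spent actually computing
    waiting: bool          # e.g. parked in a sleep or blocking wait

def assign_core_type(sample: ThreadSample) -> str:
    if sample.waiting or sample.busy_fraction < 0.2:
        return "E-core"    # idle-ish work can sit on an efficiency core
    return "P-core"        # heavy compute gets a performance core

samples = [
    ThreadSample("game main loop", busy_fraction=0.95, waiting=False),
    ThreadSample("game loading thread", busy_fraction=0.05, waiting=True),
    ThreadSample("OS telemetry", busy_fraction=0.02, waiting=False),
]

for s in samples:
    print(f"{s.name} -> {assign_core_type(s)}")
```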
Thread Director will also be monitoring the balance and providing feedback to the OS scheduler to help optimise the allocation mix to each core type, and is meant to adapt based on thermals, power settings and other operating conditions as well. That should mean the OS will have an easier time picking the right cores at the right times, and should make for a good user experience too. So, tie that in with the impressive cores and you get a rather revolutionary (for x86) CPU design.
Now, you might be thinking that you don’t care about efficiency – I mean, for a desktop CPU does it really matter that Ryzen’s full-size “big” cores are the ones handling background tasks? Well, from a power draw perspective no, not really, but from a package power perspective, maybe. Take Rocket Lake’s i9-11900K, which had to drop two cores compared to the last gen 10900K because its “big” cores drew so, so much power that the package couldn’t handle having two more of them onboard. So, assuming Alder Lake’s P-cores are a similar level of towering inferno, it actually makes sense to include those lower-power E-cores to handle the background tasks and output less heat while doing so, meaning more thermal and power budget is available to the P-cores when you fire up a game or edit a video.
Obviously, just reducing power consumption and therefore thermal output is the direct solution to the problem, but as an intermediate step I’m pretty happy with that. In theory – and I should make it clear this is a hypothetical, as I’ve not seen them talk about this at all – you could reserve some number of E-cores, perhaps one or the whole block of four, to not be allocated any intensive tasks. That would mean you could be rendering a video or 3D scene and still have some cores effectively idling, so you can use your PC like normal. Sure, you’d trade some performance away, but four Skylake-performance cores would be plenty to play lighter games like CSGO or Rocket League while still rendering in the background at near full speed. I think that’d be a great option for an enthusiast to have access to.
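As a rough illustration of that hypothetical, it’s the same affinity mechanism from the CCX example, just pointed the other way: confine the heavy job to the P-cores and leave the E-cores alone. The core layout and PID below are assumptions for illustration, and again this is the Linux-only API:

```python
# Keep a heavy render job off the (assumed) E-cores so they stay free for
# light desktop use. Layout and PID are assumptions for illustration.
import os

# Assumed layout: hardware threads 0-15 belong to the P-cores (including
# their hyperthreaded siblings), 16-23 are the E-cores, which we leave free.
P_CORE_THREADS = set(range(0, 16))

# Hypothetical PID of the render job; in practice you'd grab the real one
# from your render tool or a task manager.
render_pid = 12345

os.sched_setaffinity(render_pid, P_CORE_THREADS)
print("Render job confined to:", os.sched_getaffinity(render_pid))
```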
Either way, I’m really excited to see what Alder Lake has to offer. It’s a big step change for Intel and a genuine innovation in the x86 market, and based on that IPC improvement I wholly expect the 12th gen chips to trounce Ryzen 5000, potentially in both single-threaded and all-core workloads. Of course, AMD does somewhat have the upper hand, as Ryzen 5000 will be turning a year old right around Alder Lake’s launch, and the seemingly ‘magical’ Zen 4 looms ever closer – but that’s what’s so exciting! Actual competition! Each company leapfrogging the other, pushing for more performance and lower prices (ideally, anyway). It’s great, and I’m excited – but what about you?