How GPUs Work

GPUs are one hell of a complex beast. It helps to know how CPUs work first, but hopefully I can explain it all in a relatively simple way and you’ll come away from this video a little smarter! This is going to be simplified to keep it accessible, but if you want more detail I’ve left links to all my sources in the description below – and of course if I get anything wrong, or you have more information on this, please do leave it in the comments below!

GPUs – graphics processing units – are mostly produced by two big companies: AMD and Nvidia. There are plenty of other manufacturers, including Intel, who now include a GPU in almost their entire CPU lineup – but since those two are the most popular ‘discrete’ vendors, they’re the ones I’ll focus on.

GPUs first came about in the 1970s as a way to accelerate 2D visuals in games like Space Invaders – yup, gaming has been pushing technology advancements for nearly 50 years – but they really came into their own in the 1990s. ATI, Nvidia, Imagination Technologies’ PowerVR and more all created consumer add-in cards, mostly built from what are known as ‘fixed function’ units. These days, GPUs use programmable shaders – compute units that can perform lots of different functions but are still pretty specialised.

GPUs process data in parallel – lots and lots of computations all at the same time. CPUs mostly work serially – one thing at a time. Serial computation makes sense for most tasks. Say you are doing a sum – let’s say 2 × 3 − 5 – you can’t do the −5 until you’ve done the 2 × 3, right? So a CPU is a good tool for that problem.
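To make the difference concrete, here’s a tiny sketch – illustrative only, since GPUs obviously don’t run Python:

```python
# Serial: each step depends on the previous result, so it must
# be computed in order, just like a CPU would.
step1 = 2 * 3        # must finish first
result = step1 - 5   # depends on step1
print(result)        # 1

# Parallel-friendly: each element is independent of the others,
# so in principle every one could be computed simultaneously.
data = [1, 2, 3, 4]
doubled = [x * 2 for x in data]
print(doubled)       # [2, 4, 6, 8]
```

The second loop is the kind of work a GPU eats for breakfast: the same small operation repeated over lots of independent pieces of data.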

Parallelisation is great for graphics, though. The game – through a library like OpenGL, DirectX or Vulkan – passes what are called vertices to the card to compute. Vertices are points in 3D space. They can be shown as a table of X, Y & Z coordinates, and when you have 3 vertices you get a triangle – also known as a polygon or a primitive. The GPU performs any transformations needed on the vertices, such as moving, rotating or scaling them, then begins the rasterization stage.
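As a rough sketch of what one of those transformations looks like – the triangle and the angle here are made up purely for illustration:

```python
import math

# A hypothetical triangle: three vertices as (x, y, z) coordinates.
triangle = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 0.0)]

def rotate_z(vertex, angle):
    """Rotate a single vertex around the Z axis by `angle` radians."""
    x, y, z = vertex
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c, z)

# Rotate the whole triangle 90 degrees. A GPU applies this kind of
# maths to every vertex in the scene, many of them at once.
rotated = [rotate_z(v, math.pi / 2) for v in triangle]
```

Notice that each vertex is rotated completely independently of the others – which is exactly why this work parallelises so well.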

The reason parallelisation is great is that every frame can involve millions of vertices, a lot of which need transforming, so being able to calculate a whole load of vertices at the same time means faster overall computation – you don’t need to do each point one at a time. To give you an idea, the Titan Xp has 3584 CUDA cores that can all compute simultaneously. These are built into just 28 “cores” (Nvidia calls them streaming multiprocessors, or SMs), meaning there are a full 128 CUDA cores per SM. An RX 480 has 2304 stream processors – something a little different – split into 36 “compute units” (CUs for short), meaning there are 64 stream processors per CU. Crazy, huh?
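Those per-unit counts are just multiplication, which you can sanity-check yourself:

```python
# Total parallel cores = number of units × cores per unit.
titan_xp_sms, cuda_per_sm = 28, 128
print(titan_xp_sms * cuda_per_sm)   # 3584 CUDA cores

rx480_cus, sp_per_cu = 36, 64
print(rx480_cus * sp_per_cu)        # 2304 stream processors
```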

Back to the stages – rasterization is the process of turning those primitives – the triangles – into pixels. If you’ve ever worked with vector graphics in something like Adobe Illustrator, you’ll know what I mean here. In essence, you take a grid and lay it over your triangles. If a triangle covers one of the squares, you mark that square – but everything is currently black and white, so you need to add colour.
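That grid-and-mark process can be sketched with a classic ‘edge function’ test. This is a minimal version, assuming a 2D, counter-clockwise triangle in screen space – real rasterizers are far more sophisticated:

```python
def edge(ax, ay, bx, by, px, py):
    # Signed-area test: which side of the edge A->B does point P lie on?
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def covers(tri, px, py):
    # A pixel centre is inside the triangle if it sits on the same
    # side of all three edges (counter-clockwise winding assumed).
    (ax, ay), (bx, by), (cx, cy) = tri
    return (edge(ax, ay, bx, by, px, py) >= 0 and
            edge(bx, by, cx, cy, px, py) >= 0 and
            edge(cx, cy, ax, ay, px, py) >= 0)

# Lay a 5x5 grid over a triangle and mark every covered square.
tri = [(0, 0), (4, 0), (0, 4)]
grid = [["#" if covers(tri, x + 0.5, y + 0.5) else "." for x in range(5)]
        for y in range(5)]
for row in grid:
    print("".join(row))
```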

Colour is added when the GPU computes the lighting in the scene. Some engines make the GPU do this all on the fly; most have at least some pre-computed (baked) lighting data, which makes it way faster. You also go through a ‘texture’ stage, where any texture maps – basically images or colours laid on top of the model to add realism – are taken into account, giving a final colour for each pixel fragment. These fragments are also processed in parallel.
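A minimal sketch of the simplest lighting model – diffuse (Lambertian) shading, where the final colour is the texture colour scaled by how directly the surface faces the light. The colours and vectors here are made-up examples:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def shade(texture_rgb, normal, light_dir):
    # normal and light_dir are assumed to be unit vectors.
    # Clamp at zero: surfaces facing away get no light, not negative light.
    intensity = max(0.0, dot(normal, light_dir))
    return tuple(c * intensity for c in texture_rgb)

# Surface facing straight up, light shining straight down onto it:
lit = shade((1.0, 0.5, 0.2), (0, 1, 0), (0, 1, 0))    # full brightness
# Same surface, light coming from the side (perpendicular):
unlit = shade((1.0, 0.5, 0.2), (0, 1, 0), (1, 0, 0))  # completely dark
```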
These fragments are then blended together, and by using their depth (Z) data the GPU can tell what is meant to be in front of or behind what, and composites accordingly. It’s then finalised into a frame and sent out to the display through the graphics card’s display hardware – which could involve on-the-fly compression, or, if you are using VGA, conversion to an analogue signal before being sent out.
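The front-or-behind decision is the classic depth test. Here’s a toy sketch using a handful of made-up fragments, where a smaller depth value means closer to the camera:

```python
# Hypothetical fragments: (x, y, depth, colour).
fragments = [
    (1, 1, 0.8, "blue"),
    (1, 1, 0.3, "red"),    # same pixel, but closer - should win
    (2, 2, 0.5, "green"),
]

depth_buffer = {}   # (x, y) -> nearest depth seen so far
frame_buffer = {}   # (x, y) -> colour of that nearest fragment

for x, y, depth, colour in fragments:
    # Keep a fragment only if it is closer than anything drawn
    # at this pixel so far.
    if depth < depth_buffer.get((x, y), float("inf")):
        depth_buffer[(x, y)] = depth
        frame_buffer[(x, y)] = colour
```

After the loop, pixel (1, 1) is red: the blue fragment was drawn first but the red one was closer, so it overwrote it.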

This is a rather simplified look at the graphics pipeline – the set of steps a GPU goes through to draw a frame. Here’s a look at the DirectX 11 & 12 graphics pipeline stages.

As you can see, we have the Input Assembler, which gathers all the vertex data needed. We then go through the Vertex Shader, which performs any transformation, skinning and per-vertex lighting the vertices need. In DX11 and 12, we also have the Hull Shader, Tessellator and Domain Shader stages, which handle tessellation and the ordering of the geometry.

Speaking of geometry – that’s just the name for the collection of vertices that make up the triangles in the scene. The Geometry Shader processes whole primitives at a time, and can modify them or even remove them entirely, for example if they fall off screen.
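A toy sketch of one such off-screen check – this only tests the X axis against a hypothetical visible range of [-1, 1], whereas real GPUs clip against the full view frustum:

```python
def cull_offscreen(triangle):
    """A primitive can safely be thrown away only if ALL of its
    vertices sit past the same edge of the visible range."""
    xs = [x for x, y, z in triangle]
    return all(x < -1 for x in xs) or all(x > 1 for x in xs)

# Entirely to the right of the screen - safe to discard:
off_screen = [(2.0, 0.0, 0.0), (3.0, 1.0, 0.0), (2.5, -1.0, 0.0)]
# One vertex on screen - must be kept (and clipped later):
partly_visible = [(0.5, 0.0, 0.0), (3.0, 1.0, 0.0), (2.5, -1.0, 0.0)]
```

The “all vertices past the same edge” condition matters: a triangle with vertices on both sides of the screen still covers visible pixels, so it can’t be dropped.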

The Stream Output stage is next, and is where the data can be intercepted – either copied to memory to be reused later, or passed back to the CPU for further processing. The data then goes to the Rasterizer stage, which is ‘fixed function’, meaning there’s a bit of the chip specifically designed to do exactly this job and nothing else. As mentioned earlier, this is where the triangles are converted into pixel fragments, ready for the pixel shaders.

The Pixel Shader stage gives each fragment its colour and depth values. It’s then on to the Output Merger stage, where the GPU combines all the fragments into a single image, which is saved to the frame buffer ready to be displayed on screen.

So. That is how a GPU works! If you’ve made it this far, congrats! To be clear again, this is a skimming overview of how GPUs work – and especially thanks to AMD’s Vega GPUs, which use High Bandwidth Memory (HBM2) as a sort of ‘high-speed cache’ rather than as traditional VRAM, things may change a bit as that approach becomes more common.

If you enjoyed the video and feel you’ve learned something new, please consider subscribing and sharing the video – on reddit, on tech forums, or even just with someone you know. If you want to help me out even more, and you’re buying from Overclockers UK or Amazon, you can use the links in the description to directly support me, the channel and these videos! Otherwise, leave a comment below on what you learned from the video, what you found most interesting, or especially anything else on this topic you’d like to share! Thank you for watching and I’ll see you on Friday for a great video on how to build a Z270-based gaming PC – and next Wednesday for my video on the current state of the GPU market! See you then.