This article covers the Metal Rendering Pipeline.
Hardware Basics
First, we need to understand the difference between GPU and CPU.
- GPU (Graphics Processing Unit): Hardware designed to process large amounts of data. Due to its highly parallel structure, it can quickly process large workloads such as images and video.
- CPU (Central Processing Unit): Hardware designed to quickly process sequential data, where processing happens one item after another.
The CPU passes instructions to the GPU. Metal's strategy is to batch multiple CPU instructions into command buffers; to avoid blocking, the CPU continuously encodes instructions for the next frame rather than waiting for the GPU to finish the current task.
Rendering Pipeline
From a high-level perspective, the rendering pipelines of the various graphics APIs are not very different. In Apple's Metal documentation, the Metal rendering pipeline is summarized as: Application stage — Vertex stage — Rasterization stage — Fragment stage — Pixel stage. From a low-level perspective, implementing each of these steps requires the program to map these abstract stages onto concrete API objects.
We can start from scratch with a project and go through the entire flow to understand how the rendering pipeline is implemented in Metal. Create a new Multiplatform App.
Initialization
MetalView
In SwiftUI, you can obtain MTKView by importing MetalKit. We need to wrap MTKView in a UIViewRepresentable (iOS) or NSViewRepresentable (macOS) to obtain and use MTKView. For example,
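A minimal wrapper might look like the following sketch (the names MetalView and MetalViewRepresentable, and the binding-based design, are assumptions for illustration):

```swift
import SwiftUI
import MetalKit

struct MetalView: View {
  @State private var metalView = MTKView()

  var body: some View {
    MetalViewRepresentable(metalView: $metalView)
  }
}

#if os(macOS)
typealias ViewRepresentable = NSViewRepresentable
#elseif os(iOS)
typealias ViewRepresentable = UIViewRepresentable
#endif

struct MetalViewRepresentable: ViewRepresentable {
  @Binding var metalView: MTKView

  #if os(macOS)
  func makeNSView(context: Context) -> MTKView { metalView }
  func updateNSView(_ nsView: MTKView, context: Context) {}
  #elseif os(iOS)
  func makeUIView(context: Context) -> MTKView { metalView }
  func updateUIView(_ uiView: MTKView, context: Context) {}
  #endif
}
```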
MetalViewRepresentable must conform to the appropriate representable protocol: on macOS it implements makeNSView(context:) and updateNSView(_:context:) via NSViewRepresentable, while on iOS it implements makeUIView(context:) and updateUIView(_:context:) via UIViewRepresentable. The specific steps are routine and outside the scope of this article.
Then create a minimal window in ContentView.swift.
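Assuming the MetalView wrapper described above, ContentView can stay minimal, for example:

```swift
import SwiftUI

struct ContentView: View {
  var body: some View {
    VStack {
      MetalView()
        .border(Color.black, width: 2)
    }
    .padding()
  }
}
```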
Renderer Class
Other APIs require manually implementing the lifecycle within a frame, or the game loop, in some way. In Metal, Apple provides MetalKit, which has underlying structures that simplify game loop implementation. Therefore, in Metal we use MetalKit together with our own Renderer class (which conforms to the MTKViewDelegate protocol) to implement rendering calls.
MTKViewDelegate defines callback methods related to MTKView. Through this protocol, we can listen for and respond to MTKView events. There are two main methods:
- mtkView(_:drawableSizeWillChange:): This method is called when the MTKView's drawable size changes. In plain terms, the window size has changed.
- draw(in:): This method is called every frame. Typically, Metal API calls for rendering are made inside this method.
That is to say, we need a Renderer like this.
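A sketch of such a Renderer (the initializer will be filled in during the setup steps below):

```swift
import MetalKit

class Renderer: NSObject {
  override init() {
    super.init()
  }
}

extension Renderer: MTKViewDelegate {
  func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
    // Respond to window / drawable size changes here.
  }

  func draw(in view: MTKView) {
    // Per-frame Metal API calls go here.
  }
}
```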
Having Renderer inherit from NSObject is mainly a consequence of Apple's legacy design: many core features of the UIKit/Cocoa frameworks are still implemented in Objective-C. We therefore have Renderer inherit from NSObject, then use an extension to make it conform to MTKViewDelegate.
Next, add a @State variable to MetalView so it knows its Renderer. When the window is initialized, set the Renderer's metalView to the current metalView.
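For example (this assumes Renderer gains an init(metalView:) that stores the view and assigns itself as the view's delegate):

```swift
struct MetalView: View {
  @State private var metalView = MTKView()
  @State private var renderer: Renderer?

  var body: some View {
    MetalViewRepresentable(metalView: $metalView)
      .onAppear {
        // Create the renderer once the view exists and hand it the MTKView.
        renderer = Renderer(metalView: metalView)
      }
  }
}
```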
One-time Setup Variables
The purpose of the initialization step is to obtain references to devices, state, commands, buffers, etc. A great feature of Metal compared to other APIs is that we can pre-configure many variables during initialization, rather than doing these things every frame.
Some variables we only need to set once (and should be treated as singletons):
- MTLDevice: Reference to the GPU device.
- MTLCommandQueue: The queue through which the CPU submits command buffers to the GPU.
- MTLLibrary: Library containing shader code functions.
Some variables can be set multiple times depending on our needs:
- MTLBuffer: A block of memory the GPU can read; in our case it will hold vertex data. We pass vertex data to the GPU through this carrier.
- MTLRenderPipelineState: Specific settings for render state, such as which shaders to use, depth settings, color settings, vertex data reading rules, etc.
These should all be managed by the Renderer class.
We first declare three variables that only need to be set once. We make them implicitly unwrapped optionals, denoted by !. This allows a variable to be nil, but it is automatically unwrapped on use, without needing to manually unwrap each time. If we instead defined them as regular optionals (?), we would have to unwrap explicitly on every use, for example with optional chaining:

```swift
Renderer.device?.someMethod()
```

This is more cumbersome. Defined with !, we can use them directly:

```swift
Renderer.device.someMethod()
```
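Concretely, the three one-time variables might be declared as static properties on Renderer (a sketch):

```swift
class Renderer: NSObject {
  // Set once and shared: effectively singletons.
  static var device: MTLDevice!
  static var commandQueue: MTLCommandQueue!
  static var library: MTLLibrary!
  // ...
}
```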
Similarly, we can also declare variables for objects that may change during rendering.
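For example, per-instance objects that may be rebuilt as the scene changes (the names are assumptions):

```swift
// Owned by each Renderer instance.
var vertexBuffer: MTLBuffer!
var pipelineState: MTLRenderPipelineState!
```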
In the init function, set the values of these variables before calling super.init().
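A sketch of such an initializer, assuming the static properties and the init(metalView:) signature used earlier:

```swift
init(metalView: MTKView) {
  guard
    let device = MTLCreateSystemDefaultDevice(),
    let commandQueue = device.makeCommandQueue()
  else {
    fatalError("GPU not available")
  }
  Renderer.device = device
  Renderer.commandQueue = commandQueue
  Renderer.library = device.makeDefaultLibrary()
  metalView.device = device

  // ... create buffers and the pipeline state here ...

  super.init()
  metalView.delegate = self
}
```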
Finally, we can set a clear color to clear the screen.
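For example, a light blue clear color, set on the view after super.init() (the color values are arbitrary):

```swift
metalView.clearColor = MTLClearColor(red: 0.93, green: 0.97,
                                     blue: 1.0, alpha: 1.0)
```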
A Simple Mesh
In real projects we almost never create meshes by manually declaring them; we usually read from a file. However, for the convenience of this article's discussion, we'll put a simple box here. Note that after declaring the mesh, it must be passed to the GPU.
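As a stand-in for a real mesh, here is a simple quad built from two triangles, copied into an MTLBuffer so the GPU can read it (positions only; a sketch that assumes the vertexBuffer property from earlier):

```swift
// Six vertices forming two triangles (a quad), x/y/z per vertex.
let vertices: [Float] = [
  -0.5, -0.5, 0,
   0.5, -0.5, 0,
   0.5,  0.5, 0,
  -0.5, -0.5, 0,
   0.5,  0.5, 0,
  -0.5,  0.5, 0
]

// Copy the vertex data into GPU-accessible memory.
vertexBuffer = Renderer.device.makeBuffer(
  bytes: vertices,
  length: MemoryLayout<Float>.stride * vertices.count)
```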
Setting Render Pipeline State
Render pipeline state is typically described by the pipeline state object (PSO). State includes the currently active vertex and fragment shaders, the vertex descriptor, pixel format, etc.
Here we assume we already have vert and frag shader functions. We can create a PSO with the following setup.
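A sketch of the setup, assuming the shader functions are named vertex_main and fragment_main and that a vertexDescriptor has been built (both names are assumptions):

```swift
let vertexFunction = Renderer.library?.makeFunction(name: "vertex_main")
let fragmentFunction = Renderer.library?.makeFunction(name: "fragment_main")

// The descriptor collects all the state the PSO will bake in.
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction = vertexFunction
pipelineDescriptor.fragmentFunction = fragmentFunction
pipelineDescriptor.colorAttachments[0].pixelFormat = metalView.colorPixelFormat
pipelineDescriptor.vertexDescriptor = vertexDescriptor

do {
  pipelineState = try Renderer.device.makeRenderPipelineState(
    descriptor: pipelineDescriptor)
} catch {
  fatalError(error.localizedDescription)
}
```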
After creation, assign the state described by this PSO to pipelineState. This code runs before super.init(); it establishes the initial state of the render pipeline.
Including the states set above, common render pipeline states include:
Specify graphics functions and related data
- Vertex function (vertexFunction)
- Fragment function (fragmentFunction)
- Maximum function call depth for the top-level vertex shader (maxVertexCallStackDepth)
- Maximum function call depth for the top-level fragment shader (maxFragmentCallStackDepth)
Specify render pipeline state
- Color attachments array (colorAttachments)
- Depth attachment pixel format (depthAttachmentPixelFormat)
- Stencil attachment pixel format (stencilAttachmentPixelFormat)
- Reset default state (reset)
Specify buffer layout and fetch behavior
- Vertex descriptor (vertexDescriptor)
There are many more settings and data that can be configured and retrieved; see the Apple Metal documentation for details.
Vertex Descriptor
We know that vertex data is passed to the GPU in the form of a Buffer, so ultimately it's just a long string of bytes. The GPU needs to know how to interpret this stream of bytes, otherwise the data would be meaningless. Metal uses the Vertex Descriptor to accomplish this task. The vertex descriptor is used to let the GPU understand the data structure of what you put in the MTLBuffer.
First, let's understand a few vertex-related terms:
- Attributes: For example, position, normal, UV coordinates, etc. A vertex may have multiple attributes. Therefore, the data we send to the GPU might look like:
```
v1 = [position_v1, normal_v1, uv_v1]
v2 = [position_v2, normal_v2, uv_v2]
...
buffer = [v1, v2, ...]
```
- Layouts: Describe how the vertex data is laid out in a buffer, such as the stride between consecutive vertices.
Metal's vertex descriptor uses the following syntax:
```swift
let vertexDescriptor = MTLVertexDescriptor()
vertexDescriptor.attributes[0].format = .float3
vertexDescriptor.attributes[0].offset = 0
vertexDescriptor.attributes[0].bufferIndex = 0
vertexDescriptor.layouts[0].stride = MemoryLayout<SIMD3<Float>>.stride
```
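One subtlety worth noting: because of alignment, MemoryLayout<SIMD3<Float>>.stride is 16 bytes, not 12, so using it as the layout stride automatically accounts for padding. A quick check (SIMD types are part of the Swift standard library):

```swift
// SIMD3<Float> stores three floats but is padded to 16-byte alignment.
let simd3Stride = MemoryLayout<SIMD3<Float>>.stride  // 16
let packedSize = MemoryLayout<Float>.stride * 3      // 12
print(simd3Stride, packedSize)
```

This matters if you pack vertex data as plain floats on the CPU side: the layout stride must match the layout the buffer actually uses.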
Rendering
draw is executed every frame. In this function we encode the instructions to send to the GPU. To recap, sending instructions requires a command buffer (the carrier that stores encoded instructions), obtained from the command queue (which should already have been created during initialization), plus a render pass descriptor and a render command encoder.
Rendering begins with a draw command. This command needs to specify the vertex count and the type of primitives to draw. For example, a render command that draws three vertices as triangles starting from vertex 0:
```swift
renderEncoder.drawPrimitives(type: .triangle,
                             vertexStart: 0,
                             vertexCount: 3)
```
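Putting the pieces together, a draw(in:) implementation might look like this sketch (vertexCount: 6 assumes the six-vertex quad from earlier; pipelineState and vertexBuffer are the properties set up during initialization):

```swift
func draw(in view: MTKView) {
  guard
    let commandBuffer = Renderer.commandQueue.makeCommandBuffer(),
    let descriptor = view.currentRenderPassDescriptor,
    let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor)
  else { return }

  // Bind the pre-built pipeline state and vertex data.
  renderEncoder.setRenderPipelineState(pipelineState)
  renderEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)

  // Issue the draw command.
  renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 6)
  renderEncoder.endEncoding()

  // Present the result and hand the command buffer to the GPU.
  guard let drawable = view.currentDrawable else { return }
  commandBuffer.present(drawable)
  commandBuffer.commit()
}
```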
The vertex stage provides data for each vertex. After enough vertices are processed, the render pipeline begins rasterizing primitives to determine which pixels on the render target are "inside" the primitive. Then, in the fragment stage, the render pipeline determines the specific color values to write to these pixels.
How the Render Pipeline Processes Data
Overview
We know that the Vertex Function (vertex shader) generates vertex data for each vertex, while the Fragment Function (fragment shader) provides fragment data for each fragment. However, the content of this data is customizable—that's the purpose of these two shaders.
Metal documentation mentions that we typically have three places where we can define what data to pass:
- Input to the render pipeline. This input is provided by the application and passed to the vertex shader (the process from application stage to vertex stage).
- Output of the vertex stage. This output is provided by the vertex shader and handed to the fragment shader (strictly speaking, passed to the rasterization stage, since interpolation happens there).
- Input to the fragment shader. Although the vertex-stage output and the fragment-stage input share the same type, they are not the same data: due to interpolation, the rasterizer generates far more fragment-function inputs than there are vertices.
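In MSL, these three roles are typically expressed as structs; a sketch (all names here are assumptions for illustration):

```metal
#include <metal_stdlib>
using namespace metal;

// Input to the render pipeline, matching the CPU-side layout.
struct VertexIn {
  float4 position [[attribute(0)]];
};

// Output of the vertex stage / input to the fragment stage.
// [[position]] marks the value the rasterizer treats as clip-space position.
struct RasterizerData {
  float4 position [[position]];
  float4 color;
};

vertex RasterizerData vertex_main(VertexIn in [[stage_in]]) {
  RasterizerData out;
  out.position = in.position;
  out.color = float4(1, 0, 0, 1);
  return out;
}

fragment float4 fragment_main(RasterizerData in [[stage_in]]) {
  // Receives interpolated values, one invocation per fragment.
  return in.color;
}
```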
For example, input to the render pipeline (from the application on the CPU side) might include vertex position data and color.
Preparing Data for the Vertex Shader
For example, vertex position data and color can be wrapped in a struct using SIMD vector types:
```c
typedef struct
{
    vector_float2 position;
    vector_float4 color;
} AAPLVertex;
```
In MSL, SIMD types are commonly used. SIMD stands for Single Instruction, Multiple Data: a single instruction processes multiple data elements at the same time. Compared with ordinary vector types, the main difference is performance. Modern GPUs and CPUs typically have hardware support for SIMD instruction sets, so these vector types can significantly improve performance in large-scale data processing scenarios.
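As a small illustration on the Swift side (SIMD types are part of the Swift standard library, mirroring the MSL vector types):

```swift
// One operation applies to all four lanes at once.
let a = SIMD4<Float>(1, 2, 3, 4)
let b = SIMD4<Float>(10, 20, 30, 40)

let sum = a + b          // elementwise: (11, 22, 33, 44)
let scaled = a * 2       // (2, 4, 6, 8)
let dot = (a * b).sum()  // 1*10 + 2*20 + 3*30 + 4*40 = 300

print(sum, scaled, dot)
```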
References:
- https://developer.apple.com/documentation/metal/mtlrenderpipelinedescriptor
- https://developer.apple.com/documentation/metal/using_a_render_pipeline_to_render_primitives
