This article covers the Metal Rendering Pipeline.
Hardware Basics
First, we need to understand the difference between GPU and CPU.
- GPU (Graphics Processing Unit): Hardware designed to process large amounts of data. Due to its highly parallel structure, it can quickly process large workloads such as images and video.
- CPU (Central Processing Unit): Hardware designed to quickly process sequential data, where processing happens one item after another.
The CPU passes instructions to the GPU. Metal's strategy is to batch multiple CPU instructions into command buffers; to avoid blocking, the CPU continuously encodes instructions for the next frame rather than waiting for the GPU to finish the current task.
Rendering Pipeline
From a high-level perspective, the rendering pipelines of the various graphics APIs are not very different. In Apple's Metal documentation, the Metal rendering pipeline is summarized as: Application stage — Vertex stage — Rasterization stage — Fragment stage — Pixel stage. From a low-level perspective, implementing each of these steps requires the program to map these abstract stages onto concrete API objects.
We can start from scratch with a project and go through the entire flow to understand how the rendering pipeline is implemented in Metal. Create a new Multiplatform App.
Initialization
MetalView
In SwiftUI, you can obtain MTKView by importing MetalKit. We need to wrap MTKView in a UIViewRepresentable (iOS) or NSViewRepresentable (macOS) to obtain and use MTKView. For example,
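A minimal wrapper might look like the following sketch (the names MetalView and MetalViewRepresentable, and the binding-based design, are assumptions for illustration):

```swift
import SwiftUI
import MetalKit

struct MetalView: View {
  @State private var metalView = MTKView()

  var body: some View {
    MetalViewRepresentable(metalView: $metalView)
  }
}

#if os(macOS)
typealias ViewRepresentable = NSViewRepresentable
#elseif os(iOS)
typealias ViewRepresentable = UIViewRepresentable
#endif

struct MetalViewRepresentable: ViewRepresentable {
  @Binding var metalView: MTKView

  #if os(macOS)
  func makeNSView(context: Context) -> MTKView { metalView }
  func updateNSView(_ nsView: MTKView, context: Context) {}
  #elseif os(iOS)
  func makeUIView(context: Context) -> MTKView { metalView }
  func updateUIView(_ uiView: MTKView, context: Context) {}
  #endif
}
```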
MetalViewRepresentable must conform to the appropriate representable protocol: on macOS it implements makeNSView(context:) and updateNSView(_:context:) via NSViewRepresentable, while on iOS it implements makeUIView(context:) and updateUIView(_:context:) via UIViewRepresentable. The specific steps are routine and outside the scope of this article.
Then create a minimal window in ContentView.swift.
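Assuming the MetalView wrapper described above, ContentView can stay minimal, for example:

```swift
import SwiftUI

struct ContentView: View {
  var body: some View {
    VStack {
      MetalView()
        .border(Color.black, width: 2)
    }
    .padding()
  }
}
```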
Renderer Class
Other APIs require manually implementing the lifecycle within a frame, or the game loop, in some way. In Metal, Apple provides MetalKit, which has underlying structures that simplify game loop implementation. Therefore, in Metal we use MetalKit together with our own Renderer class (which conforms to the MTKViewDelegate protocol) to implement rendering calls.
MTKViewDelegate defines callback methods related to MTKView. Through this protocol, we can listen for and respond to MTKView events. There are two main methods:
- mtkView(_:drawableSizeWillChange:): This method is called when the MTKView's drawable size changes. In plain terms, the window size has changed.
- draw(in:): This method is called every frame. Typically, Metal API calls for rendering are made inside this method.
That is to say, we need a Renderer like this.
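A sketch of such a Renderer (the initializer will be filled in during the setup steps below):

```swift
import MetalKit

class Renderer: NSObject {
  override init() {
    super.init()
  }
}

extension Renderer: MTKViewDelegate {
  func mtkView(_ view: MTKView, drawableSizeWillChange size: CGSize) {
    // Respond to window / drawable size changes here.
  }

  func draw(in view: MTKView) {
    // Per-frame Metal API calls go here.
  }
}
```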
Having Renderer inherit from NSObject is mainly a consequence of Apple's legacy design: many core features of the UIKit/Cocoa frameworks are still implemented in Objective-C. We therefore have Renderer inherit from NSObject, then use an extension to make it conform to MTKViewDelegate.
Next, add a @State variable to MetalView so it knows its Renderer. When the window is initialized, set the Renderer's metalView to the current metalView.
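For example (this assumes Renderer gains an init(metalView:) that stores the view and assigns itself as the view's delegate):

```swift
struct MetalView: View {
  @State private var metalView = MTKView()
  @State private var renderer: Renderer?

  var body: some View {
    MetalViewRepresentable(metalView: $metalView)
      .onAppear {
        // Create the renderer once the view exists and hand it the MTKView.
        renderer = Renderer(metalView: metalView)
      }
  }
}
```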
One-time Setup Variables
The purpose of the initialization step is to obtain references to devices, state, commands, buffers, etc. A great feature of Metal compared to other APIs is that we can pre-configure many variables during initialization, rather than doing these things every frame.
Some variables we only need to set once (and should be treated as singletons):
- MTLDevice: Reference to the GPU device.
- MTLCommandQueue: The queue through which the CPU submits command buffers to the GPU.
- MTLLibrary: Library containing shader code functions.
Some variables can be set multiple times depending on our needs:
- MTLBuffer: A block of memory the GPU can read; in our case it will hold vertex data. We pass vertex data to the GPU through this carrier.
- MTLRenderPipelineState: Specific settings for render state, such as which shaders to use, depth settings, color settings, vertex data reading rules, etc.
These should all be managed by the Renderer class.
We first declare three variables that only need to be set once. We make them implicitly unwrapped optionals, denoted by !. This allows a variable to be nil, but it is automatically unwrapped on use, without needing to manually unwrap each time. If we instead defined them as regular optionals (?), we would have to unwrap explicitly on every use, for example with optional chaining:

```swift
Renderer.device?.someMethod()
```

This is more cumbersome. Defined with !, we can use them directly:

```swift
Renderer.device.someMethod()
```
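Concretely, the three one-time variables might be declared as static properties on Renderer (a sketch):

```swift
class Renderer: NSObject {
  // Set once and shared: effectively singletons.
  static var device: MTLDevice!
  static var commandQueue: MTLCommandQueue!
  static var library: MTLLibrary!
  // ...
}
```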
Similarly, we can also declare variables for objects that may change during rendering.
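For example, per-instance objects that may be rebuilt as the scene changes (the names are assumptions):

```swift
// Owned by each Renderer instance.
var vertexBuffer: MTLBuffer!
var pipelineState: MTLRenderPipelineState!
```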
In the init function, set the values of these variables before calling super.init().
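A sketch of such an initializer, assuming the static properties and the init(metalView:) signature used earlier:

```swift
init(metalView: MTKView) {
  guard
    let device = MTLCreateSystemDefaultDevice(),
    let commandQueue = device.makeCommandQueue()
  else {
    fatalError("GPU not available")
  }
  Renderer.device = device
  Renderer.commandQueue = commandQueue
  Renderer.library = device.makeDefaultLibrary()
  metalView.device = device

  // ... create buffers and the pipeline state here ...

  super.init()
  metalView.delegate = self
}
```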
Finally, we can set a clear color to clear the screen.
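For example, a light blue clear color, set on the view after super.init() (the color values are arbitrary):

```swift
metalView.clearColor = MTLClearColor(red: 0.93, green: 0.97,
                                     blue: 1.0, alpha: 1.0)
```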
A Simple Mesh
In real projects we almost never create meshes by manually declaring them; we usually read from a file. However, for the convenience of this article's discussion, we'll put a simple box here. Note that after declaring the mesh, it must be passed to the GPU.
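As a stand-in for a real mesh, here is a simple quad built from two triangles, copied into an MTLBuffer so the GPU can read it (positions only; a sketch that assumes the vertexBuffer property from earlier):

```swift
// Six vertices forming two triangles (a quad), x/y/z per vertex.
let vertices: [Float] = [
  -0.5, -0.5, 0,
   0.5, -0.5, 0,
   0.5,  0.5, 0,
  -0.5, -0.5, 0,
   0.5,  0.5, 0,
  -0.5,  0.5, 0
]

// Copy the vertex data into GPU-accessible memory.
vertexBuffer = Renderer.device.makeBuffer(
  bytes: vertices,
  length: MemoryLayout<Float>.stride * vertices.count)
```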
Setting Render Pipeline State
Render pipeline state is typically described by the pipeline state object (PSO). State includes the currently active vertex and fragment shaders, the vertex descriptor, pixel format, etc.
Here we assume we already have vert and frag shader functions. We can create a PSO with the following setup.
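A sketch of the setup, assuming the shader functions are named vertex_main and fragment_main and that a vertexDescriptor has been built (both names are assumptions):

```swift
let vertexFunction = Renderer.library?.makeFunction(name: "vertex_main")
let fragmentFunction = Renderer.library?.makeFunction(name: "fragment_main")

// The descriptor collects all the state the PSO will bake in.
let pipelineDescriptor = MTLRenderPipelineDescriptor()
pipelineDescriptor.vertexFunction = vertexFunction
pipelineDescriptor.fragmentFunction = fragmentFunction
pipelineDescriptor.colorAttachments[0].pixelFormat = metalView.colorPixelFormat
pipelineDescriptor.vertexDescriptor = vertexDescriptor

do {
  pipelineState = try Renderer.device.makeRenderPipelineState(
    descriptor: pipelineDescriptor)
} catch {
  fatalError(error.localizedDescription)
}
```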
After creation, assign the state described by this PSO to pipelineState. This code runs before super.init(); it establishes the initial state of the render pipeline.
Including the states set above, common render pipeline states include:
Specify graphics functions and related data
- Vertex function (vertexFunction)
- Fragment function (fragmentFunction)
- Maximum function call depth for the top-level vertex shader (maxVertexCallStackDepth)
- Maximum function call depth for the top-level fragment shader (maxFragmentCallStackDepth)
Specify render pipeline state
- Color attachments array (colorAttachments)
- Depth attachment pixel format (depthAttachmentPixelFormat)
- Stencil attachment pixel format (stencilAttachmentPixelFormat)
- Reset default state (reset)
Specify buffer layout and fetch behavior
- Vertex descriptor (vertexDescriptor)
There are many more settings and data that can be configured and retrieved; see the Apple Metal documentation for details.
Vertex Descriptor
We know that vertex data is passed to the GPU in the form of a Buffer, so ultimately it's just a long string of bytes. The GPU needs to know how to interpret this stream of bytes, otherwise the data would be meaningless. Metal uses the Vertex Descriptor to accomplish this task. The vertex descriptor is used to let the GPU understand the data structure of what you put in the MTLBuffer.
First, let's understand a few vertex-related terms:
- Attributes: For example, position, normal, UV coordinates, etc. A vertex may have multiple attributes. Therefore, the data we send to the GPU might look like:
```
v1 = [position_v1, normal_v1, uv_v1]
v2 = [position_v2, normal_v2, uv_v2]
...
buffer = [v1, v2, ...]
```
- Layouts: Describe how the vertex data is laid out in a buffer, such as the stride between consecutive vertices.
Metal's vertex descriptor uses the following syntax:
```swift
let vertexDescriptor = MTLVertexDescriptor()
vertexDescriptor.attributes[0].format = .float3
vertexDescriptor.attributes[0].offset = 0
vertexDescriptor.attributes[0].bufferIndex = 0
vertexDescriptor.layouts[0].stride = MemoryLayout<SIMD3<Float>>.stride
```
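One subtlety worth noting: because of alignment, MemoryLayout<SIMD3<Float>>.stride is 16 bytes, not 12, so using it as the layout stride automatically accounts for padding. A quick check (SIMD types are part of the Swift standard library):

```swift
// SIMD3<Float> stores three floats but is padded to 16-byte alignment.
let simd3Stride = MemoryLayout<SIMD3<Float>>.stride  // 16
let packedSize = MemoryLayout<Float>.stride * 3      // 12
print(simd3Stride, packedSize)
```

This matters if you pack vertex data as plain floats on the CPU side: the layout stride must match the layout the buffer actually uses.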
Rendering
draw is executed every frame. In this function we encode the instructions to send to the GPU. To recap, sending instructions requires a command buffer (the carrier that stores encoded instructions), obtained from the command queue (which should already have been created during initialization), plus a render pass descriptor and a render command encoder.
Rendering begins with a draw command. This command needs to specify the vertex count and the type of primitives to draw. For example, a render command that draws three vertices as triangles starting from vertex 0:
```swift
renderEncoder.drawPrimitives(type: .triangle,
                             vertexStart: 0,
                             vertexCount: 3)
```
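Putting the pieces together, a draw(in:) implementation might look like this sketch (vertexCount: 6 assumes the six-vertex quad from earlier; pipelineState and vertexBuffer are the properties set up during initialization):

```swift
func draw(in view: MTKView) {
  guard
    let commandBuffer = Renderer.commandQueue.makeCommandBuffer(),
    let descriptor = view.currentRenderPassDescriptor,
    let renderEncoder = commandBuffer.makeRenderCommandEncoder(descriptor: descriptor)
  else { return }

  // Bind the pre-built pipeline state and vertex data.
  renderEncoder.setRenderPipelineState(pipelineState)
  renderEncoder.setVertexBuffer(vertexBuffer, offset: 0, index: 0)

  // Issue the draw command.
  renderEncoder.drawPrimitives(type: .triangle, vertexStart: 0, vertexCount: 6)
  renderEncoder.endEncoding()

  // Present the result and hand the command buffer to the GPU.
  guard let drawable = view.currentDrawable else { return }
  commandBuffer.present(drawable)
  commandBuffer.commit()
}
```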
The vertex stage provides data for each vertex. After enough vertices are processed, the render pipeline begins rasterizing primitives to determine which pixels on the render target are "inside" the primitive. Then, in the fragment stage, the render pipeline determines the specific color values to write to these pixels.
How the Render Pipeline Processes Data
Overview
We know that the Vertex Function (vertex shader) generates vertex data for each vertex, while the Fragment Function (fragment shader) provides fragment data for each fragment. However, the content of this data is customizable—that's the purpose of these two shaders.
Metal documentation mentions that we typically have three places where we can define what data to pass:
- Input to the render pipeline. This input is provided by the application and passed to the vertex shader (the process from application stage to vertex stage).
- Output of the vertex stage. This output is provided by the vertex shader and handed to the fragment shader (strictly speaking, passed to the rasterization stage, since interpolation happens there).
- Input to the fragment shader. Although the vertex-stage output and the fragment-stage input share the same type, they are not the same data: due to interpolation, the rasterizer generates far more fragment-function inputs than there are vertices.
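In MSL, these three roles are typically expressed as structs; a sketch (all names here are assumptions for illustration):

```metal
#include <metal_stdlib>
using namespace metal;

// Input to the render pipeline, matching the CPU-side layout.
struct VertexIn {
  float4 position [[attribute(0)]];
};

// Output of the vertex stage / input to the fragment stage.
// [[position]] marks the value the rasterizer treats as clip-space position.
struct RasterizerData {
  float4 position [[position]];
  float4 color;
};

vertex RasterizerData vertex_main(VertexIn in [[stage_in]]) {
  RasterizerData out;
  out.position = in.position;
  out.color = float4(1, 0, 0, 1);
  return out;
}

fragment float4 fragment_main(RasterizerData in [[stage_in]]) {
  // Receives interpolated values, one invocation per fragment.
  return in.color;
}
```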
For example, input to the render pipeline (from the application on the CPU side) might include vertex position data and color.
Preparing Data for the Vertex Shader
For example, vertex position data and color can be wrapped in a struct using SIMD vector types:
```c
typedef struct
{
    vector_float2 position;
    vector_float4 color;
} AAPLVertex;
```
In MSL, SIMD types are commonly used. SIMD stands for Single Instruction, Multiple Data: a single instruction processes multiple data elements at the same time. Compared with ordinary vector types, the main difference is performance. Modern GPUs and CPUs typically have hardware support for SIMD instruction sets, so these vector types can significantly improve performance in large-scale data processing scenarios.
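As a small illustration on the Swift side (SIMD types are part of the Swift standard library, mirroring the MSL vector types):

```swift
// One operation applies to all four lanes at once.
let a = SIMD4<Float>(1, 2, 3, 4)
let b = SIMD4<Float>(10, 20, 30, 40)

let sum = a + b          // elementwise: (11, 22, 33, 44)
let scaled = a * 2       // (2, 4, 6, 8)
let dot = (a * b).sum()  // 1*10 + 2*20 + 3*30 + 4*40 = 300

print(sum, scaled, dot)
```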
References:
- https://developer.apple.com/documentation/metal/mtlrenderpipelinedescriptor
- https://developer.apple.com/documentation/metal/using_a_render_pipeline_to_render_primitives
