Rendering

The rendering process in DirectX 12 is not that different from other APIs, although there are several things worth noting. For this simple application, we use a single command list to render the scene and re-record that command list every frame.

Before discussing the rendering process itself, we need to cover synchronization. In DirectX 11 and most older APIs, synchronization between commands submitted on the CPU and their execution on the GPU was implicit and did not need much consideration. In DirectX 12, we control submission and synchronization manually. What we would like to achieve is as follows:

Figure 1: Synchronization in the CPU-bound case

Figure 2: Synchronization in the GPU-bound case

This leads to the process shown in the figures above. First, we wait on the fence to make sure the GPU is no longer using the resources for this frame, so that we can record the command list. We then record the command list, submit it for execution, and ask the queue to signal the fence once it has finished.

This approach is not optimal in either case and leaves a lot of potential scalability unused: it covers neither multithreading nor multiple command queues. For the basics, however, it is a decent implementation. More advanced methods of draw submission and synchronization will be covered later in this guide.

Implementation details follow.

if ( renderer->frameFence->lpVtbl->GetCompletedValue ( renderer->frameFence ) < curFrame )
{
	result = renderer->frameFence->lpVtbl->SetEventOnCompletion ( renderer->frameFence, curFrame, renderer->eofEvent );
	if ( !SUCCEEDED(result) )
		RETURN_ERROR(-1, "SetEventOnCompletion failed (0x%08X)", result );

	WaitForSingleObject ( renderer->eofEvent, INFINITE );
}

Before we continue in the render call, we have to make sure that we are ready for the current frame. For this we use the fence: in our case, the fence value signifies which frame we are ready for. curFrame starts at 0, and during initialization we set the fence value to (FRAME_BUFFER_COUNT-1). This means that, once initialization is done, the completed-value check passes without waiting for the first FRAME_BUFFER_COUNT frames, one for each frame buffer.

The check against GetCompletedValue is not strictly necessary. SetEventOnCompletion also sets the event immediately if the fence has already reached the specified value, but it is reasonable to assume that GetCompletedValue is less expensive than setting an event and waiting on it. The check is therefore worthwhile in the CPU-bound case, where the fence has usually been reached already.

cmdAllocator->lpVtbl->Reset ( cmdAllocator );
cmdList->lpVtbl->Reset      ( cmdList, cmdAllocator, renderer->pso );

Here we reset the command allocator and the command list. Note that the command allocator has to be reset first: resetting the command list puts it into the “recording” state, in which it allocates memory from the allocator. If these two calls are swapped, the command list is already recording when the allocator is reset, and it is invalid to reset a command allocator that is in use by a recording command list. Resetting the command allocator is only valid while its command list is in the “closed” state and the GPU has finished executing the commands recorded into it, which the fence wait above is meant to guarantee.

In addition, note that we pass the pipeline state object as the last parameter of the command list reset. This makes it the initial pipeline state, rather than a default state provided by DirectX 12. If you know which pipeline state object you will use first, it is best to pass it here to avoid a potentially expensive pipeline switch.

cmdList->lpVtbl->ResourceBarrier ( cmdList, 1, &(D3D12_RESOURCE_BARRIER){
    .Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION,
    .Flags = 0,
    .Transition = {
        .pResource = rtRes,
        .Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES,
        .StateBefore = D3D12_RESOURCE_STATE_PRESENT,
        .StateAfter  = D3D12_RESOURCE_STATE_RENDER_TARGET,
    },
});

Here we add a barrier that transitions the current swap chain buffer from the “present” state to the “render target” state. In the present state, the buffer may be in a layout optimized for presentation; since we intend to render to it, it needs to be in a state in which rendering is valid and efficient.

Strictly speaking, D3D12_RESOURCE_STATE_RENDER_TARGET is the only legal state for a buffer passed to ClearRenderTargetView, although at the time of writing this rule does not appear to be enforced.

cmdList->lpVtbl->OMSetRenderTargets ( cmdList, 1, &rtDesc, FALSE, &renderer->dsDesc );
cmdList->lpVtbl->RSSetViewports    (
    cmdList,
    1,
    (D3D12_VIEWPORT[1]) {
        {
            .TopLeftX = 0.0f,
            .TopLeftY = 0.0f,
            .Width    = (float)renderer->windowWidth,
            .Height   = (float)renderer->windowHeight,
            .MinDepth = 0.0f,
            .MaxDepth = 1.0f
        }
    }
);
cmdList->lpVtbl->RSSetScissorRects (
    cmdList,
    1,
    (D3D12_RECT[1]){
        {
            .left   = 0,
            .top    = 0,
            .right  = renderer->windowWidth,
            .bottom = renderer->windowHeight,
        }
    }
);

cmdList->lpVtbl->IASetPrimitiveTopology   ( cmdList, D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
cmdList->lpVtbl->SetGraphicsRootSignature ( cmdList, renderer->rs );
cmdList->lpVtbl->SetDescriptorHeaps       ( cmdList, 2, (ID3D12DescriptorHeap*[2]){
        renderer->frame[frameBufferIdx].gpuDescriptors.cbvSrvUav,
        renderer->frame[frameBufferIdx].gpuDescriptors.sampler
    }
);

cmdList->lpVtbl->ClearRenderTargetView ( cmdList, rtDesc, (float[4]){ 0.0f, 1.0f, 0.0f, 1.0f }, 0, NULL );
cmdList->lpVtbl->ClearDepthStencilView ( cmdList, renderer->dsDesc, D3D12_CLEAR_FLAG_DEPTH, 1.0f, 0, 0, NULL );

We now set the state for the render commands. Two calls are worth noting: IASetPrimitiveTopology and SetGraphicsRootSignature.

Although the pipeline state object already specifies that we use triangles, IASetPrimitiveTopology is still required. The pipeline state object only specifies the primitive topology type, D3D12_PRIMITIVE_TOPOLOGY_TYPE_TRIANGLE; it does not specify how the indices in the index buffer are assembled into triangles. Apart from the adjacency variants, the two topologies for this type are D3D_PRIMITIVE_TOPOLOGY_TRIANGLELIST and D3D_PRIMITIVE_TOPOLOGY_TRIANGLESTRIP. A triangle list creates a new triangle for every 3 indices, whereas a triangle strip creates a new triangle for every index after the first two, reusing the 2 previous indices as the edge against which the new triangle is constructed. In this case, we simply submit a triangle list.

The root signature needs to be set as well. It has to have the same layout as the root signature the pipeline state object was created with, but it does not have to be the same root signature object. This means you can bind any root signature as long as its layout matches; conversely, setting the pipeline state object does not automatically bind its root signature.

The descriptor heaps must be set before shaders can use descriptors from shader-visible descriptor heaps. Only one descriptor heap of each heap type can be bound at any one time, which is why we pass exactly one CBV/SRV/UAV heap and one sampler heap. In this case, the CBV/SRV/UAV heap lets the shader dynamically index texture descriptors to select the texture to use.

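D3D12_GPU_DESCRIPTOR_HANDLE texDesc;
// The C declaration of GetGPUDescriptorHandleForHeapStart in some d3d12.h
// versions returns the handle by value, which does not match how the method
// actually returns it (through a hidden output pointer). Calling it through
// C_D3D12DescriptorHeap_GetGPUDescriptorHandleForHeapStart, a function
// pointer type that takes the output handle as a parameter, avoids this.
((C_D3D12DescriptorHeap_GetGPUDescriptorHandleForHeapStart)(renderer->frame[frameBufferIdx].gpuDescriptors.cbvSrvUav->lpVtbl->GetGPUDescriptorHandleForHeapStart)) (
    renderer->frame[frameBufferIdx].gpuDescriptors.cbvSrvUav,
    &texDesc
);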

rvm_aos_mat4 vpMat;
rvm_aos_mat4_mul_aos_mat4 ( &vpMat, projMat, viewMat );

cmdList->lpVtbl->SetGraphicsRootDescriptorTable ( cmdList, 1, texDesc );
cmdList->lpVtbl->SetGraphicsRoot32BitConstants  ( cmdList, 0, 16, vpMat.cells, 0 );

Here we set the graphics root descriptor table to the first entry in the CBV/SRV/UAV heap of the current frame buffer, and put the view-projection matrix in the root constants of the root signature, which the shader sees as an implicit constant buffer.

The descriptor table covers a range of descriptors starting at the specified handle; the number of descriptors in the range is defined by the root signature. The shader can index the table dynamically to select any texture in it.

Similarly, the “32-bit constants” are defined in the root signature and exposed to the shader as a constant buffer. Their advantage is that frequently updated data does not require multiple constant buffers that are double-buffered across frames. Care should be taken not to use too many of them, though: a root signature is limited to 64 DWORDs in total, and each 32-bit constant costs one DWORD. The call here sets root parameter 0, the 16 constants holding the view-projection matrix for the vertex shader. We also have a 32-bit constant for the pixel shader that holds the texture index to use, but it is set per submesh in the draw loop that follows.

for ( uint32_t i = 0; i < renderer->meshCount; i++ )
{
    cmdList->lpVtbl->IASetVertexBuffers (
        cmdList,
        0,
        2,
        (D3D12_VERTEX_BUFFER_VIEW[2]) { renderer->meshes[i].positionVbv, renderer->meshes[i].texcoordVbv }
    );

    for ( uint32_t j = 0; j < renderer->meshes[i].submeshCount; j++ )
    {
        cmdList->lpVtbl->SetGraphicsRoot32BitConstant (
            cmdList,
            2,
            renderer->meshes[i].submeshes[j].diffuseTexture,
            0
        );

        cmdList->lpVtbl->IASetIndexBuffer (
            cmdList,
            &renderer->meshes[i].submeshes[j].ibv
        );

        cmdList->lpVtbl->DrawIndexedInstanced (
            cmdList,
            (uint32_t)renderer->meshes[i].submeshes[j].indexCount,
            1,
            0,
            0,
            0
        );
    }
}

Here we submit the draw calls. For each mesh, we set the vertex buffer views; for each submesh, we set the texture index and the index buffer view before drawing. The shader uses the texture index to select a texture from the descriptor table we bound earlier, while the input assembler uses the vertex and index buffer views to fetch the vertices and assemble the primitives.

After these loops, we are done rendering for the purposes of this application.

cmdList->lpVtbl->ResourceBarrier ( cmdList, 1, &(D3D12_RESOURCE_BARRIER){
    .Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION,
    .Flags = 0,
    .Transition = {
        .pResource = rtRes,
        .Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES,
        .StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET,
        .StateAfter  = D3D12_RESOURCE_STATE_PRESENT,
    },
});

Transition the render target back to the present state, which the buffer must be in when it is presented. After this barrier, no further render target operations should be recorded for the buffer.

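cmdList->lpVtbl->Close ( cmdList );
// Taking the address of a cast is not valid standard C, so the list pointer
// is passed through a one-element compound literal array instead.
renderer->commandQueue->lpVtbl->ExecuteCommandLists ( renderer->commandQueue, 1, (ID3D12CommandList*[1]){ (ID3D12CommandList*)cmdList } );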

renderer->lastSubmittedFrame = curFrame + (FRAME_BUFFER_COUNT);
result = renderer->commandQueue->lpVtbl->Signal ( renderer->commandQueue, renderer->frameFence, renderer->lastSubmittedFrame );
assert ( SUCCEEDED ( result ) );

renderer->curFrame++;

Finally, now that we are done recording, we want to execute the command list. Before we can, the command list must be in the closed state, so we call Close and immediately follow it with ExecuteCommandLists. This pushes the command list onto the command queue, where it executes after all previously submitted work in the queue has finished. The current thread continues without waiting for the command list to finish, or even to begin: it executes in the background, as shown in the synchronization figures at the beginning of this chapter.

After ExecuteCommandLists, we call Signal with curFrame + FRAME_BUFFER_COUNT. Signal is itself a command added to the command queue, and because the queue processes its commands in order, it takes effect only after our command list has finished executing. At that point, the fence's value is updated to the value we specified. This ensures that the next time we use this frame buffer, FRAME_BUFFER_COUNT frames later, the wait at the start of this section succeeds once the GPU is done with it.

Last, but not least, we would like to queue the present operation:

result = renderer->dxgiSwapChain->lpVtbl->Present1 (
    renderer->dxgiSwapChain,
    0,
    DXGI_PRESENT_RESTART,
    &(DXGI_PRESENT_PARAMETERS){
        .DirtyRectsCount = 0,
        .pScrollRect = NULL,
    }
);
if ( !SUCCEEDED(result) )
    RETURN_ERROR(-1, "Present1 failed (0x%08X)", result );

This call queues the present operation on the command queue the swap chain was created with. The sync interval of 0 (the second argument, after the swap chain pointer) means we want to present as soon as possible rather than waiting for a vertical blank. DXGI_PRESENT_RESTART is documented as telling the runtime to discard outstanding queued presents; in this guide, it is used so that back buffers do not have to be received in strict order. Normally, the back buffer index would advance as follows (assuming 3 buffers):

0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, ...

With DXGI_PRESENT_RESTART, this strict ordering no longer applies, and the sequence can look like this, for example:

0, 1, 2, 1, 2, 0, 2, 0, 2, 0, 1, 0, ...

This can be useful, as it allows for unlocked frame rates. At any one time, one buffer is shown on screen, while both other buffers are available to the application. When one of them is submitted for presentation, the other can be requested immediately, even if its index is lower than that of the previous back buffer. The application therefore does not need to wait for the next buffer in sequence to become available before rendering a new frame.