Rendering

This section describes the render method in the demo.

result = vkWaitForFences (
    renderer->device,
    1,
    (VkFence[1]){ commandbufferCompletedFence },
    VK_TRUE,
    UINT64_MAX
);
if ( result != VK_SUCCESS )
    RETURN_ERROR(-1, "vkWaitForFences failed (0x%08X)", (uint32_t)result);

result = vkResetFences (
    renderer->device,
    1,
    (VkFence[1]){ commandbufferCompletedFence }
);
if ( result != VK_SUCCESS )
    RETURN_ERROR(-1, "vkResetFences failed (0x%08X)", (uint32_t)result);

As mentioned before, a command buffer that is still pending execution on the GPU cannot be re-recorded on the CPU. So before we record the command buffer again, we need to ensure the GPU has finished executing it. For this purpose, we created a fence for every command buffer that indicates whether it is ready to be recorded.

At the start of the render call, we wait for this fence to be signalled and then immediately reset it to the unsignalled state, so that the same fence can be signalled again. The fence is signalled when the command buffer submitted later in this function finishes executing on the GPU. On the very first frame nothing has been submitted yet, so we created the fence in the signalled state using VK_FENCE_CREATE_SIGNALED_BIT to make sure this wait passes the first time around.
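The wait-then-reset protocol, and why the fence has to start out signalled, can be modelled in a few lines of plain C. This is a CPU-only sketch with hypothetical names (FenceModel, render_frame and the fence_* helpers), not Vulkan code: each function only stands in for the Vulkan call named in its comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* CPU-only stand-in for a VkFence: just the signalled/unsignalled state. */
typedef struct {
    bool signalled;
} FenceModel;

/* Stands in for vkCreateFence, with or without VK_FENCE_CREATE_SIGNALED_BIT. */
static FenceModel fence_create(bool createSignalled)
{
    FenceModel f = { .signalled = createSignalled };
    return f;
}

/* Stands in for vkResetFences: back to the unsignalled state. */
static void fence_reset(FenceModel *f)
{
    f->signalled = false;
}

/* Stands in for the GPU signalling the fence passed to vkQueueSubmit
 * once the submitted command buffer has finished executing. */
static void fence_signal(FenceModel *f)
{
    f->signalled = true;
}

/* One iteration of the render loop. `pending` tracks whether a submission
 * that will signal the fence is in flight. Returns false if the wait at the
 * start of the frame could never return (nothing will ever signal the fence). */
static bool render_frame(FenceModel *f, bool *pending)
{
    assert(f != NULL && pending != NULL);

    /* vkWaitForFences(..., UINT64_MAX) */
    if (!f->signalled) {
        if (!*pending)
            return false;        /* deadlock: the wait would block forever */
        fence_signal(f);         /* the in-flight command buffer completes */
        *pending = false;
    }

    fence_reset(f);              /* vkResetFences */
    /* ... record the command buffer ... */
    *pending = true;             /* vkQueueSubmit(..., fence) */
    return true;
}
```

A fence created signalled lets every frame proceed, while a fence created unsignalled makes the very first wait block forever, which is exactly what VK_FENCE_CREATE_SIGNALED_BIT avoids.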

result = vkAcquireNextImageKHR (
    renderer->device,
    renderer->swapChain,
    UINT64_MAX,
    framebufferWritable,
    VK_NULL_HANDLE,
    &renderer->backbufferIndex
);
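// VK_SUBOPTIMAL_KHR is a success code: an image was still acquired and can be rendered to
if ( result != VK_SUCCESS && result != VK_SUBOPTIMAL_KHR )
    RETURN_ERROR(-1, "vkAcquireNextImageKHR failed (0x%08X)", (uint32_t)result);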

In this call, we ask the swapchain for a new backbuffer to draw into. The index of the acquired image is written through the last argument, so renderer->backbufferIndex now holds the index of the swapchain image we are allowed to use.

vkAcquireNextImageKHR offers flexible blocking behaviour through its timeout parameter. Here we specify UINT64_MAX, meaning we wait indefinitely until a backbuffer becomes available.

Alternatively, a timeout of 0 asks whether an image is available without waiting: if one is, the call succeeds; otherwise it returns VK_NOT_READY. Note that VK_NOT_READY is a non-error status code, not a failure.

Lastly, any other value specifies how many nanoseconds to wait. If no image becomes available within that period, the call returns VK_TIMEOUT, which is likewise a status code rather than an error.
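With a finite timeout, or when the window is resized, a caller has to tell several outcomes apart rather than just success and failure. Below is a minimal sketch of that decision. To keep it compilable without vulkan.h, the relevant VkResult values are copied into local constants (VK_SUCCESS = 0, VK_NOT_READY = 1, VK_TIMEOUT = 2, VK_SUBOPTIMAL_KHR = 1000001003, VK_ERROR_OUT_OF_DATE_KHR = -1000001004); the AcquireAction enum and classify_acquire are hypothetical names, and real code would switch on VkResult directly. How the demo itself reacts to a suboptimal or out-of-date swapchain is not covered in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the relevant VkResult values so this compiles without vulkan.h. */
enum {
    SKETCH_VK_SUCCESS               =  0,
    SKETCH_VK_NOT_READY             =  1,
    SKETCH_VK_TIMEOUT               =  2,
    SKETCH_VK_SUBOPTIMAL_KHR        =  1000001003,
    SKETCH_VK_ERROR_OUT_OF_DATE_KHR = -1000001004,
};

/* Hypothetical classification of what the render loop should do next. */
typedef enum {
    ACQUIRE_RENDER,               /* image acquired: record and submit */
    ACQUIRE_RENDER_THEN_RECREATE, /* image acquired, but the swapchain no longer matches the surface exactly */
    ACQUIRE_TRY_LATER,            /* no image yet: poll (timeout 0) or finite timeout expired */
    ACQUIRE_RECREATE,             /* no image: the swapchain must be recreated first */
    ACQUIRE_FATAL,                /* any other error */
} AcquireAction;

static AcquireAction classify_acquire(int32_t result, uint64_t timeout)
{
    switch (result) {
    case SKETCH_VK_SUCCESS:
        return ACQUIRE_RENDER;
    case SKETCH_VK_SUBOPTIMAL_KHR:
        return ACQUIRE_RENDER_THEN_RECREATE;
    case SKETCH_VK_NOT_READY:             /* returned only when timeout == 0 */
    case SKETCH_VK_TIMEOUT:               /* returned only for a finite, non-zero timeout */
        assert(timeout != UINT64_MAX);    /* an infinite wait never reports these */
        return ACQUIRE_TRY_LATER;
    case SKETCH_VK_ERROR_OUT_OF_DATE_KHR:
        return ACQUIRE_RECREATE;
    default:
        return ACQUIRE_FATAL;
    }
}
```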

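While vkAcquireNextImageKHR gives us a backbuffer index right away, we cannot start writing to that image immediately. The index only tells us which image will become available; the presentation engine may still be reading from it. Actual availability is signalled through the semaphore and/or fence passed to vkAcquireNextImageKHR, which are signalled once the image can be used by the GPU. We only pass a semaphore (the fence argument is VK_NULL_HANDLE), and we make the GPU wait on that semaphore when we submit the command buffer.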

result = vkBeginCommandBuffer (
    cmdBuffer,
    &(VkCommandBufferBeginInfo){
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
        .flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,
    }
);
if ( result != VK_SUCCESS )
    RETURN_ERROR(-1, "vkBeginCommandBuffer failed (0x%08X)", (uint32_t)result);

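// Transition the backbuffer into a layout suitable for rendering. The previous contents are
// discarded (oldLayout UNDEFINED), which is fine because we clear the image in the render pass.
// COLOR_ATTACHMENT_OUTPUT is included in the pWaitDstStageMask used for the acquire semaphore
// in vkQueueSubmit, so the transition only happens once the image is actually available.
vkCmdPipelineBarrier (
    cmdBuffer,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    0,
    0, NULL,
    0, NULL,
    1, (VkImageMemoryBarrier[1]){
        [0] = {
            .sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
            .srcAccessMask       = 0,
            .dstAccessMask       = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
            .oldLayout           = VK_IMAGE_LAYOUT_UNDEFINED,
            .newLayout           = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
            .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
            .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
            .image               = framebufferImage,
            .subresourceRange    = { .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT, .levelCount = 1, .layerCount = 1 },
        },
    }
);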

vkCmdBeginRenderPass (
    cmdBuffer,
    &(VkRenderPassBeginInfo){
        .sType           = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
        .pNext           = NULL,
        .renderPass      = renderer->renderPass,
        .framebuffer     = framebuffer,
        .renderArea      = { .offset = { 0, 0 }, .extent = { renderer->windowWidth, renderer->windowHeight } },
        .clearValueCount = 2,
        .pClearValues    = (VkClearValue[2]){
            [0] = { .color.float32 = { 1.0f, 0.0f, 1.0f, 1.0f } },
            [1] = { .depthStencil  = { 1.0f, 0 } },
        },
    },
    VK_SUBPASS_CONTENTS_INLINE
);

vkCmdBindPipeline ( cmdBuffer, VK_PIPELINE_BIND_POINT_GRAPHICS, renderer->pipeline );

vkCmdSetViewport (
    cmdBuffer,
    0, 1, (VkViewport[1]){
        [0] = {
            .x = 0.0f, .y = 0.0f,
            .width = (float)renderer->windowWidth, .height = (float)renderer->windowHeight,
            .minDepth = 0.0f, .maxDepth = 1.0f }  // Vulkan requires a depth range within [0, 1]
    }
);
vkCmdSetScissor (
    cmdBuffer,
    0, 1, (VkRect2D[1]){
        [0] = {
            .offset = { 0, 0 },
            .extent = { renderer->windowWidth, renderer->windowHeight }
        }
    }
);

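Here we first begin recording the command buffer. Because we waited on its fence, the GPU is no longer using it, and vkBeginCommandBuffer can implicitly reset it, provided the command pool was created with VK_COMMAND_POOL_CREATE_RESET_COMMAND_BUFFER_BIT (otherwise the command buffer or its pool must be reset explicitly first). The first command we record is an image memory barrier that transitions the backbuffer into VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL, so that we can render to it efficiently and safely. We then begin the render pass and bind the graphics pipeline.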

Note that since the viewport and scissor are dynamic state in our graphics pipeline, they are not baked into the pipeline object; we have to set them with vkCmdSetViewport and vkCmdSetScissor before the first draw call. We do so right after binding the pipeline.
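Because the viewport is dynamic state, its values come entirely from the vkCmdSetViewport call, so they are worth validating. One detail is the depth range: Vulkan requires minDepth and maxDepth to lie in [0, 1] unless the VK_EXT_depth_range_unrestricted extension is enabled, so a [-1, 1] range carried over from OpenGL's normalized depth convention is not valid. The sketch below redeclares VkViewport's fields in a local struct (a hypothetical Viewport type) so it compiles without vulkan.h; full_window_viewport and depth_range_valid are likewise hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field-for-field copy of VkViewport, redeclared so this compiles without vulkan.h. */
typedef struct {
    float x, y, width, height, minDepth, maxDepth;
} Viewport;

/* Full-window viewport using Vulkan's default [0, 1] depth range. */
static Viewport full_window_viewport(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);   /* a viewport's width must be greater than 0 */
    Viewport v = {
        .x = 0.0f, .y = 0.0f,
        .width = (float)width, .height = (float)height,
        .minDepth = 0.0f, .maxDepth = 1.0f,
    };
    return v;
}

/* Without VK_EXT_depth_range_unrestricted, both depth bounds must lie in [0, 1].
 * minDepth > maxDepth (a reversed depth range) is allowed. */
static bool depth_range_valid(const Viewport *v)
{
    return v->minDepth >= 0.0f && v->minDepth <= 1.0f &&
           v->maxDepth >= 0.0f && v->maxDepth <= 1.0f;
}
```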

Now we can start binding resources and start rendering.

vkCmdBindDescriptorSets (
    cmdBuffer,
    VK_PIPELINE_BIND_POINT_GRAPHICS,
    renderer->pipelineLayout,
    0,
    1,
    (VkDescriptorSet[1]) { renderer->descriptorSet },
    0,
    NULL
);

// Set view-projection matrix
vkCmdPushConstants (
    cmdBuffer,
    renderer->pipelineLayout,
    VK_SHADER_STAGE_VERTEX_BIT,
    0,
    16 * sizeof ( float ),
    vpMat.cells
);

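for ( uint32_t i = 0; i < renderer->meshCount; i++ )
{
    // Positions and texture coordinates do not change between the submeshes of a mesh,
    // so the vertex buffers only need to be bound once per mesh
    vkCmdBindVertexBuffers (
        cmdBuffer,
        0,
        2,
        (VkBuffer[2]) { renderer->vertexBuffer, renderer->vertexBuffer },
        (VkDeviceSize[2]) {
            renderer->meshes[i].positionAttributeOffset,
            renderer->meshes[i].texcoordAttributeOffset
        }
    );

    for ( uint32_t j = 0; j < renderer->meshes[i].submeshCount; j++ )
    {
        // Select this submesh's diffuse texture (an index into the descriptor set's texture array)
        vkCmdPushConstants (
            cmdBuffer,
            renderer->pipelineLayout,
            VK_SHADER_STAGE_FRAGMENT_BIT,
            16 * sizeof ( float ),
            1 * sizeof ( uint32_t ),
            &renderer->meshes[i].submeshes[j].diffuseTexture
        );

        // Each submesh uses its own range of the shared index buffer
        vkCmdBindIndexBuffer (
            cmdBuffer,
            renderer->indexBuffer,
            renderer->meshes[i].submeshes[j].indexOffset,
            renderer->meshes[i].submeshes[j].indexType
        );

        vkCmdDrawIndexed (
            cmdBuffer,
            (uint32_t)renderer->meshes[i].submeshes[j].indexCount,
            1,  // instanceCount
            0,  // firstIndex, relative to the index buffer offset bound above
            0,  // vertexOffset
            0   // firstInstance
        );
    }
}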

The first thing we do is bind the descriptor set that holds all of our textures, which allows the shaders to index into them. Note that we do not bind each texture individually: while possible, that would require a separate descriptor set per draw call, which adds complexity to the code. Instead, the shader indexes the texture array dynamically, using a per-submesh texture index that we supply through the vkCmdPushConstants call in the inner loop. Because this index is the same for every invocation within a draw (dynamically uniform), this relies on the shaderSampledImageArrayDynamicIndexing device feature rather than on non-uniform indexing.
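The offsets and sizes passed to the two vkCmdPushConstants calls (offset 0 with 16 * sizeof(float) bytes for the vertex stage, offset 16 * sizeof(float) with sizeof(uint32_t) bytes for the fragment stage) must match the push-constant block declared in the shaders and the ranges in the pipeline layout. One way to keep the C side in sync is a mirror struct queried with offsetof. PushConstants, Range and the two helpers are hypothetical names, and the layout is inferred from the offsets and sizes used in the draw code, not taken from the demo's shaders.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the shader's push-constant block:
 * a mat4 for the vertex stage, then a texture index for the fragment stage. */
typedef struct {
    float    viewProj[16];    /* vertex stage, offset 0    */
    uint32_t diffuseTexture;  /* fragment stage, offset 64 */
} PushConstants;

static_assert(offsetof(PushConstants, diffuseTexture) == 16 * sizeof(float),
              "texture index must directly follow the view-projection matrix");

/* Byte range to pass as offset/size to vkCmdPushConstants
 * (and to the matching VkPushConstantRange of the pipeline layout). */
typedef struct { uint32_t offset, size; } Range;

static Range vertex_range(void)
{
    Range r = { (uint32_t)offsetof(PushConstants, viewProj),
                (uint32_t)sizeof(((PushConstants *)0)->viewProj) };
    return r;
}

static Range fragment_range(void)
{
    Range r = { (uint32_t)offsetof(PushConstants, diffuseTexture),
                (uint32_t)sizeof(((PushConstants *)0)->diffuseTexture) };
    return r;
}
```

Both offsets and sizes come out as multiples of 4, which vkCmdPushConstants requires.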

The first call to vkCmdPushConstants pushes the view-projection matrix to the vertex shader. Since all meshes in the scenes we render are placed at (0, 0, 0), we can get away without model matrices. To supply per-mesh model matrices dynamically, you can use one of three methods:

Specify the model matrix in push constants as well
Create a buffer containing all model matrices and index it dynamically in the shader
Use vkCmdUpdateBuffer to record commands that update portions of a buffer with the new model data. Note that vkCmdUpdateBuffer is a transfer command: it must be recorded outside a render pass, and a pipeline barrier is required before shaders can safely read the updated data.
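For the first option, keep in mind that the specification only guarantees 128 bytes of push-constant space (the minimum for maxPushConstantsSize). The view-projection matrix and texture index already use 68 bytes, so a second 64-byte matrix would bring the total to 132 bytes. A common workaround is to multiply the view-projection and model matrices on the CPU and push a single model-view-projection matrix per draw. The sketch below assumes column-major 4x4 matrices (GLSL's default layout); whether vpMat.cells uses that layout is an assumption, and fits_guaranteed_push_constants and mat4_mul are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Minimum value of VkPhysicalDeviceLimits::maxPushConstantsSize guaranteed by the spec. */
#define GUARANTEED_PUSH_CONSTANT_BYTES 128u

/* Does a push-constant block of this size fit within the guaranteed minimum? */
static bool fits_guaranteed_push_constants(size_t bytes)
{
    return bytes <= GUARANTEED_PUSH_CONSTANT_BYTES;
}

/* out = a * b for column-major 4x4 matrices, element (row, col) stored at [col * 4 + row].
 * out must not alias a or b. */
static void mat4_mul(float out[16], const float a[16], const float b[16])
{
    assert(out != a && out != b);
    for (int col = 0; col < 4; col++) {
        for (int row = 0; row < 4; row++) {
            float sum = 0.0f;
            for (int k = 0; k < 4; k++)
                sum += a[k * 4 + row] * b[col * 4 + k];
            out[col * 4 + row] = sum;
        }
    }
}
```

Calling mat4_mul(mvp, viewProj, model) per mesh and pushing mvp at offset 0 instead of the view-projection matrix would keep the push-constant block at 68 bytes, at the cost of a small amount of CPU work per draw.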

Lastly, for every mesh we bind the vertex and index buffers we created earlier and draw each of its submeshes. Note that our vertex buffer stores positions and texture coordinates in a Structure-of-Arrays layout: a mesh's positions are stored in one contiguous array and its texture coordinates in a separate array, rather than being interleaved per vertex. That is why we bind the same VkBuffer twice, to bindings 0 and 1, with different offsets: binding 0 points at the mesh's positions and binding 1 at its texture coordinates.
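To make those two offsets concrete, here is a sketch of how meshes could be packed back to back into one vertex buffer in this Structure-of-Arrays layout. It assumes 3-float positions and 2-float texture coordinates, and that each mesh's position array directly precedes its texture-coordinate array; this section does not show how the demo actually packs its buffer, and MeshOffsets and pack_soa are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-mesh offsets into the shared vertex buffer. */
typedef struct {
    uint64_t positionAttributeOffset;  /* start of this mesh's positions (3 floats each)     */
    uint64_t texcoordAttributeOffset;  /* start of this mesh's texcoords (2 floats each)     */
} MeshOffsets;

/* Lays out meshes back to back; within each mesh, all positions come first,
 * followed by all texture coordinates. Returns the total buffer size in bytes. */
static uint64_t pack_soa(const uint32_t *vertexCounts, size_t meshCount, MeshOffsets *out)
{
    assert(meshCount == 0 || (vertexCounts != NULL && out != NULL));
    uint64_t cursor = 0;
    for (size_t i = 0; i < meshCount; i++) {
        out[i].positionAttributeOffset = cursor;
        cursor += (uint64_t)vertexCounts[i] * 3 * sizeof(float);
        out[i].texcoordAttributeOffset = cursor;
        cursor += (uint64_t)vertexCounts[i] * 2 * sizeof(float);
    }
    return cursor;
}
```

Binding 0 then receives positionAttributeOffset and binding 1 texcoordAttributeOffset, so vertex i of a mesh reads its position and texture coordinate from two different places in the same buffer.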

vkCmdEndRenderPass ( cmdBuffer );

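// Transition the backbuffer to the layout required for presentation once all color attachment
// writes have finished. No later pipeline stage accesses the image; the presentation engine is
// synchronised through the semaphore signalled by vkQueueSubmit, hence BOTTOM_OF_PIPE and no dst access.
vkCmdPipelineBarrier (
    cmdBuffer,
    VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
    VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT,
    0,
    0, NULL,
    0, NULL,
    1, (VkImageMemoryBarrier[1]){
        [0] = {
            .sType               = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
            .pNext               = NULL,
            .srcAccessMask       = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
            .dstAccessMask       = 0,
            .oldLayout           = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
            .newLayout           = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR,
            .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
            .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
            .image               = framebufferImage,
            .subresourceRange    = { .aspectMask = VK_IMAGE_ASPECT_COLOR_BIT, .levelCount = 1, .layerCount = 1 },
        },
    }
);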

result = vkEndCommandBuffer ( cmdBuffer );
if ( result != VK_SUCCESS )
    RETURN_ERROR(-1, "vkEndCommandBuffer failed (0x%08X)", (uint32_t)result);

result = vkQueueSubmit (
    renderer->renderQueue,
    1, (VkSubmitInfo[1]){
        [0] = {
            .sType                = VK_STRUCTURE_TYPE_SUBMIT_INFO,
            .pNext                = NULL,
            .waitSemaphoreCount   = 1,
            .pWaitSemaphores      = (VkSemaphore[1]){ framebufferWritable },
            .signalSemaphoreCount = 1,
            .pSignalSemaphores    = (VkSemaphore[1]){ commandBufferCompletedSemaphore },
            .pWaitDstStageMask    = (VkPipelineStageFlags[1]) { VK_PIPELINE_STAGE_ALL_GRAPHICS_BIT },
            .commandBufferCount   = 1,
            .pCommandBuffers      = (VkCommandBuffer[1]){
                cmdBuffer,
            },
        },
    },
    commandbufferCompletedFence
);
if ( result != VK_SUCCESS )
    RETURN_ERROR(-1, "vkQueueSubmit failed (0x%08X)", (uint32_t)result);

In the code above, we end the render pass and transition the backbuffer back to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR, the layout it needs to be in for presentation, since its next use will be presentation.

After recording these two commands, we end the command buffer and immediately submit it to the render queue.

Note that the submission lists framebufferWritable in pWaitSemaphores. The work of the command buffer in the stages given by pWaitDstStageMask (here, all graphics stages) may not start until this semaphore is signalled, and the semaphore is signalled once the image acquired by vkAcquireNextImageKHR is actually available for rendering.

The call to vkQueueSubmit signals two objects:

commandBufferCompletedSemaphore, once the command buffer has finished executing; we wait on this semaphore in the next step
commandbufferCompletedFence, once all command buffers in the submission have finished executing (our submission contains only one). This is the fence we wait on at the start of this function before re-recording the command buffer.
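The chain of signals described above can be modelled on the CPU with plain booleans, which makes the ordering constraints easy to check. This is an illustrative model only: real semaphores and fences are signalled and waited on by the GPU and the presentation engine, and FrameSync, acquire, submit and present are hypothetical stand-ins for the Vulkan calls named in their comments.

```c
#include <assert.h>
#include <stdbool.h>

/* CPU-only model of one frame's synchronisation objects (true = signalled). */
typedef struct {
    bool framebufferWritable;             /* semaphore: acquire -> submit        */
    bool commandBufferCompletedSemaphore; /* semaphore: submit  -> present       */
    bool commandbufferCompletedFence;     /* fence:     submit  -> next CPU wait */
} FrameSync;

/* Stands in for vkAcquireNextImageKHR signalling its semaphore once the image is free. */
static void acquire(FrameSync *s)
{
    assert(!s->framebufferWritable);      /* a binary semaphore must be unsignalled before it is signalled */
    s->framebufferWritable = true;
}

/* Stands in for vkQueueSubmit plus GPU execution. Returns false while the
 * submission would still be waiting on framebufferWritable. */
static bool submit(FrameSync *s)
{
    if (!s->framebufferWritable)
        return false;
    s->framebufferWritable = false;       /* waiting on a binary semaphore unsignals it */
    assert(!s->commandBufferCompletedSemaphore);
    s->commandBufferCompletedSemaphore = true;
    s->commandbufferCompletedFence = true;
    return true;
}

/* Stands in for vkQueuePresentKHR. Returns false while presentation would
 * still be waiting on commandBufferCompletedSemaphore. */
static bool present(FrameSync *s)
{
    if (!s->commandBufferCompletedSemaphore)
        return false;
    s->commandBufferCompletedSemaphore = false;
    return true;
}
```

In this model, presenting before rendering or submitting before the image is acquired simply cannot proceed, which mirrors how the two semaphores order the GPU work.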

With the rendering work out of the way, we only need to instruct the API to show our backbuffer on screen once that work has completed, before moving on to the next frame:

result = renderer->vkDeviceVtbl.vkQueuePresentKHR (
    renderer->presentQueue,
    &(VkPresentInfoKHR){
        .sType              = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
        .pNext              = NULL,
        .waitSemaphoreCount = 1,
        .pWaitSemaphores    = (VkSemaphore[1]) { commandBufferCompletedSemaphore },
        .swapchainCount     = 1,
        .pSwapchains        = (VkSwapchainKHR[1]) { renderer->swapChain },
        .pImageIndices      = &renderer->backbufferIndex,
        .pResults           = &presentResult,
    }
);
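// VK_SUBOPTIMAL_KHR is a success code: the image was still presented
if ( result != VK_SUCCESS && result != VK_SUBOPTIMAL_KHR )
    RETURN_ERROR(-1, "vkQueuePresentKHR failed (0x%08X)", (uint32_t)result);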

vkQueuePresentKHR queues the swapchain image whose index is given in pImageIndices for presentation on the surface associated with the swapchain in pSwapchains. Presentation does not automatically wait for previously submitted rendering work to complete, and here it even runs on a separate present queue rather than the graphics queue, so we instruct it to wait until commandBufferCompletedSemaphore is signalled. Since vkQueueSubmit signals this semaphore as soon as the command buffer completes, the present operation waits until the backbuffer has been fully rendered before showing it to the user of the application.