Textures

With the vertex and index data uploaded, it is now time to upload a texture. Creating a texture is similar to creating a buffer, but there are a few important differences that make uploading a texture considerably more involved than uploading vertex or index data. This section explains how to upload a texture with multiple mipmaps.

Textures are commonly stored in a so-called “linear” or “row-major” layout. This means that, for every pixel coordinate, the byte offset of its color can be calculated as follows, where pitch is the number of pixels per row:

\begin{equation} (y \times pitch + x) \times pixelSize \end{equation}

If we are to lay out the bits of the address, where the base is the start of the image buffer, it would look something like this for a 256×256 texture:

yyyy yyyy xxxx xxxx
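
As a minimal sketch of the formula above, the byte offset of a pixel could be computed as follows. The function name row_major_offset is hypothetical, and pitch is assumed to be measured in pixels. For a 256×256 texture with one byte per pixel, the result reduces to (y << 8) | x, which matches the bit layout shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of pixel (x, y) in a row-major image.
   pitch is the number of pixels per row (assumed to be at least the image width),
   pixelSize is the size of one pixel in bytes. */
static size_t row_major_offset ( uint32_t x, uint32_t y, uint32_t pitch, uint32_t pixelSize )
{
    assert ( x < pitch ); /* x must lie inside the row */
    return ( (size_t)y * pitch + x ) * pixelSize;
}
```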

However, for internal reasons it can be more efficient for GPUs to use a swizzled format. This can improve cache behavior when sampling pixels directly above or below a given pixel. Such formats can look more like these:

yxyx yxyx yxyx yxyx (Z-order curve)
xyxy xyxy xyxy yyxx (D3D12_TEXTURE_LAYOUT_64KB_STANDARD_SWIZZLE)
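
To illustrate the Z-order pattern, here is a small sketch that interleaves the bits of x and y so that y occupies the odd bit positions and x the even ones, as in the yxyx layout above. The helper names part1by1 and morton2d are hypothetical, and this only shows the generic Z-order curve, not the layout any particular GPU uses.

```c
#include <assert.h>
#include <stdint.h>

/* Spread the lower 16 bits of v so that there is a zero bit between each of them. */
static uint32_t part1by1 ( uint32_t v )
{
    v &= 0x0000FFFFu;
    v = ( v | ( v << 8 ) ) & 0x00FF00FFu;
    v = ( v | ( v << 4 ) ) & 0x0F0F0F0Fu;
    v = ( v | ( v << 2 ) ) & 0x33333333u;
    v = ( v | ( v << 1 ) ) & 0x55555555u;
    return v;
}

/* Z-order (Morton) index of pixel (x, y): x in the even bits, y in the odd bits. */
static uint32_t morton2d ( uint32_t x, uint32_t y )
{
    assert ( x <= 0xFFFFu && y <= 0xFFFFu ); /* only 16 bits per axis fit in the result */
    return ( part1by1 ( y ) << 1 ) | part1by1 ( x );
}
```

Note how neighbouring pixels in both directions end up close together: (1,0) maps to index 1 and (0,1) maps to index 2, whereas in a row-major 256×256 texture the pixel below is 256 entries away.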

Because every GPU might have a different preferred encoding scheme, determining the exact encoding scheme needed and converting to the wide variety of schemes is difficult. That is why the ideal texture layout - D3D12_TEXTURE_LAYOUT_UNKNOWN - is kept undefined and left up to the implementation, but selectable nonetheless.

In order to convert to this format where the properties are intentionally left unknown, you will need to give the API a format it understands so the driver can internally convert it to the format it prefers. The two formats which are well defined are as follows:

D3D12_TEXTURE_LAYOUT_ROW_MAJOR
D3D12_TEXTURE_LAYOUT_64KB_STANDARD_SWIZZLE

However, there are various problems with both formats. For a start, D3D12_TEXTURE_LAYOUT_ROW_MAJOR is defined not to support mipmaps. This means only the main mipmap can be uploaded, and with the absence of a function to let the hardware create mipmaps for you, you will need to set up a pipeline to create mipmaps using multiple render passes or using a compute shader.
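
For reference, the following is a CPU-side sketch of what one mipmap generation step computes: each destination pixel is the average of a 2×2 block of source pixels (a box filter). This is not the render pass or compute shader pipeline mentioned above, only an illustration of the operation such a pipeline would perform. The function name downsample_rgba8 is hypothetical, and the image is assumed to be tightly packed RGBA8 with even dimensions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Halve an RGBA8 image in both directions using a 2x2 box filter.
   src holds srcWidth x srcHeight tightly packed pixels (both dimensions assumed even).
   dst must have room for (srcWidth / 2) * (srcHeight / 2) pixels. */
static void downsample_rgba8 ( const uint8_t* src, uint32_t srcWidth, uint32_t srcHeight, uint8_t* dst )
{
    assert ( srcWidth % 2 == 0 && srcHeight % 2 == 0 );
    const uint32_t dstWidth  = srcWidth / 2;
    const uint32_t dstHeight = srcHeight / 2;
    const size_t   rowBytes  = (size_t)srcWidth * 4;

    for ( uint32_t y = 0; y < dstHeight; y++ )
    {
        for ( uint32_t x = 0; x < dstWidth; x++ )
        {
            for ( uint32_t c = 0; c < 4; c++ )
            {
                /* Top-left pixel of the 2x2 source block, channel c. */
                const uint8_t* p = src + (size_t)( 2 * y ) * rowBytes + (size_t)( 2 * x ) * 4 + c;
                uint32_t sum = p[0] + p[4] + p[rowBytes] + p[rowBytes + 4];
                /* Add 2 before dividing so the average is rounded to nearest. */
                dst[( (size_t)y * dstWidth + x ) * 4 + c] = (uint8_t)( ( sum + 2 ) / 4 );
            }
        }
    }
}
```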

While D3D12_TEXTURE_LAYOUT_64KB_STANDARD_SWIZZLE does not share this problem, this format is not common on CPUs, and the operations involved in swizzling the bits are either fairly expensive or unintuitive. In addition, not all DirectX 12 capable GPUs support this layout.

In addition, creating a texture on the upload heap is not allowed. As such, it is impossible to call Map on any texture, which rules out writing pixel data into a texture directly.

The correct way is initially rather unintuitive: You are supposed to upload your row-major pixels into a buffer, and then copy the contents of this buffer to a texture. This will allow the driver and/or hardware to implicitly convert from the buffer to their preferred format.

Before we continue, we need a command list to record into, referred to here as preloaderCommandList. It will be required for uploading the data a couple of steps from now.

ID3D12GraphicsCommandList* preloaderCommandList = renderer->frame[0].commandList;
result = preloaderCommandList->lpVtbl->Reset (
    preloaderCommandList,
    renderer->frame[0].commandAllocator,
    NULL
);
if ( !SUCCEEDED(result) )
    RETURN_ERROR(-1, "Reset failed (0x%08X)", result );

In this example, we are using the command list of the first frame as our preload command list. A separate command list can also be used, but there is no inherent penalty to using the one from the first frame, as long as you wait for completion at the end.

ID3D12Resource* texRes;

D3D12_RESOURCE_DESC texDesc = {
    .Dimension = D3D12_RESOURCE_DIMENSION_TEXTURE2D,
    .Alignment = D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT,
    .Width     = imageDesc->width, .Height = imageDesc->height, .DepthOrArraySize = 1,
    .MipLevels = imageDesc->mipCount,
    .Format    = DXGI_FORMAT_R8G8B8A8_UNORM,
    .Layout    = D3D12_TEXTURE_LAYOUT_UNKNOWN,
    .SampleDesc.Count = 1,
};

result = renderer->device->lpVtbl->CreateCommittedResource (
    renderer->device,
    &(D3D12_HEAP_PROPERTIES){ .Type = D3D12_HEAP_TYPE_DEFAULT },
    0,
    &texDesc,
    D3D12_RESOURCE_STATE_COPY_DEST,
    NULL,
    &IID_ID3D12Resource,
    (void**)&texRes
);
if ( !SUCCEEDED(result) )
    RETURN_ERROR(-1, "CreateCommittedResource failed (0x%08X)", result );

A couple of things to point out here. First of all, while we normally embed the description structures within the function calls themselves, here we keep texDesc in a variable because we will need the same description later. Furthermore, the texture is created on D3D12_HEAP_TYPE_DEFAULT this time. As you might remember, this means we cannot access it directly from the CPU. Lastly, we set the initial resource state to D3D12_RESOURCE_STATE_COPY_DEST. This means the resource is prepared for receiving data from a copy operation and is not required to support any other operation. We will switch this state once the copy operation is done.

As established before, we need a separate buffer to actually upload the pixel data. Before we create it, however, we need to know how large this buffer should be, and where in the buffer each mipmap has to be placed for the driver and/or hardware to find it. This is what the GetCopyableFootprints function is for. It is used as follows:

uint64_t uploadSize = 0;
D3D12_PLACED_SUBRESOURCE_FOOTPRINT fp[16] = { 0 };
renderer->device->lpVtbl->GetCopyableFootprints ( renderer->device, &texDesc, 0, imageDesc->mipCount, 0, fp, NULL, NULL, &uploadSize );

This function returns two key pieces of information we are interested in: the total upload size required, and the placed footprints of all of the mipmaps. The placed footprint is worth expanding upon: this structure contains the offset, dimensions, format and row pitch of one mipmap within the upload buffer. We need it to make sure we fill in the mipmaps properly. The array has to be large enough to hold the number of subresources passed as the NumSubresources parameter. In this case, that is imageDesc->mipCount.
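
To give an idea of where these offsets and pitches come from, the sketch below estimates a layout for an uncompressed 2D texture with 4 bytes per pixel. It relies on two alignment rules of Direct3D 12: row pitches are multiples of 256 bytes (D3D12_TEXTURE_DATA_PITCH_ALIGNMENT) and subresource offsets are multiples of 512 bytes (D3D12_TEXTURE_DATA_PLACEMENT_ALIGNMENT). The MipLayout type and the function names are hypothetical, and the returned total is a conservative estimate that may differ from what the API reports, so real code should always use the values from GetCopyableFootprints.

```c
#include <assert.h>
#include <stdint.h>

#define PITCH_ALIGNMENT     256u /* mirrors D3D12_TEXTURE_DATA_PITCH_ALIGNMENT */
#define PLACEMENT_ALIGNMENT 512u /* mirrors D3D12_TEXTURE_DATA_PLACEMENT_ALIGNMENT */

/* Hypothetical stand-in for the fields of D3D12_PLACED_SUBRESOURCE_FOOTPRINT used here. */
typedef struct { uint64_t offset; uint32_t width, height, rowPitch; } MipLayout;

/* Round value up to the next multiple of alignment (a power of two). */
static uint64_t align_up ( uint64_t value, uint64_t alignment )
{
    assert ( alignment != 0 && ( alignment & ( alignment - 1 ) ) == 0 );
    return ( value + alignment - 1 ) & ~( alignment - 1 );
}

/* Fill out[0..mipCount-1] for a 4-byte-per-pixel texture and return the estimated buffer size. */
static uint64_t estimate_mip_layouts ( uint32_t width, uint32_t height, uint32_t mipCount, MipLayout* out )
{
    assert ( mipCount <= 16 );
    uint64_t offset = 0;
    for ( uint32_t mip = 0; mip < mipCount; mip++ )
    {
        uint32_t w = width  >> mip; if ( w == 0 ) w = 1; /* each mip halves, down to 1 */
        uint32_t h = height >> mip; if ( h == 0 ) h = 1;

        offset = align_up ( offset, PLACEMENT_ALIGNMENT );
        out[mip].offset   = offset;
        out[mip].width    = w;
        out[mip].height   = h;
        out[mip].rowPitch = (uint32_t)align_up ( (uint64_t)w * 4, PITCH_ALIGNMENT );

        offset += (uint64_t)out[mip].rowPitch * h;
    }
    return offset;
}
```

The important consequence is that small mipmaps are padded: a 1×1 mip still occupies a 256-byte row in the upload buffer, which is why the source rows cannot simply be copied in one block.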

ID3D12Resource* texUploadRes;
result = renderer->device->lpVtbl->CreateCommittedResource (
    renderer->device,
    &(D3D12_HEAP_PROPERTIES){ .Type = D3D12_HEAP_TYPE_UPLOAD },
    0,
    &(D3D12_RESOURCE_DESC) {
        .Dimension = D3D12_RESOURCE_DIMENSION_BUFFER,
        .Alignment = D3D12_DEFAULT_RESOURCE_PLACEMENT_ALIGNMENT,
        .Width     = uploadSize, .Height = 1, .DepthOrArraySize = 1,
        .MipLevels = 1,
        .Format    = DXGI_FORMAT_UNKNOWN,
        .Layout    = D3D12_TEXTURE_LAYOUT_ROW_MAJOR,
        .SampleDesc.Count = 1,
    },
    D3D12_RESOURCE_STATE_GENERIC_READ,
    NULL,
    &IID_ID3D12Resource,
    (void**)&texUploadRes
);
if ( !SUCCEEDED(result) )
    RETURN_ERROR(-1, "CreateCommittedResource failed (0x%08X)", result );

Here we are creating the upload buffer in a similar fashion to how we created the vertex and index buffers before. We use the uploadSize value returned by GetCopyableFootprints to specify the size of the buffer.

uint8_t* texUploadData = NULL;
result = texUploadRes->lpVtbl->Map ( texUploadRes, 0, NULL, (void**)&texUploadData );
if ( !SUCCEEDED(result) )
    RETURN_ERROR(-1, "Map failed (0x%08X)", result );

for ( uint32_t mip = 0; mip < imageDesc->mipCount; mip++ )
{
    uint8_t* uploadStart = texUploadData + fp[mip].Offset;
    uint8_t* sourceStart = pixelData     + imageDesc->mips[mip].offset;
    uint32_t sourcePitch = (imageDesc->mips[mip].width * sizeof(uint32_t));
    for ( uint32_t i = 0; i < fp[mip].Footprint.Height; i++ )
    {
        memcpy (
            uploadStart + i * fp[mip].Footprint.RowPitch,
            sourceStart + i * sourcePitch,
            sourcePitch
        );
    }
}

texUploadRes->lpVtbl->Unmap ( texUploadRes, 0, NULL );

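This block of code maps the upload resource, fills in the individual mips, and then unmaps it again. Each mip is copied row by row, because the row pitch in the upload buffer (fp[mip].Footprint.RowPitch) can be larger than the tightly packed pitch of the source data. Afterwards, the upload resource contains the data we need to copy to the actual texture.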
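
The inner loop can be isolated into a small helper, sketched here without any Direct3D types so that it can be tried on its own. The name copy_rows is hypothetical. The destination pitch plays the role of Footprint.RowPitch, and the source pitch that of the tightly packed source row size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy rowCount rows of rowBytes bytes each between two buffers whose rows
   start dstPitch and srcPitch bytes apart. Padding bytes in dst are left untouched. */
static void copy_rows ( uint8_t* dst, size_t dstPitch, const uint8_t* src, size_t srcPitch, size_t rowBytes, uint32_t rowCount )
{
    assert ( rowBytes <= dstPitch && rowBytes <= srcPitch );
    for ( uint32_t i = 0; i < rowCount; i++ )
        memcpy ( dst + i * dstPitch, src + i * srcPitch, rowBytes );
}
```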

Now we just need to record the copies themselves. This is done using ID3D12GraphicsCommandList::CopyTextureRegion.

for ( uint32_t mip = 0; mip < imageDesc->mipCount; mip++ )
{
    preloaderCommandList->lpVtbl->CopyTextureRegion (
        preloaderCommandList,
        &(D3D12_TEXTURE_COPY_LOCATION){
            .pResource = texRes,
            .Type      = D3D12_TEXTURE_COPY_TYPE_SUBRESOURCE_INDEX,
            .SubresourceIndex = mip,
        },
        0,0,0,
        &(D3D12_TEXTURE_COPY_LOCATION){
            .pResource = texUploadRes,
            .Type      = D3D12_TEXTURE_COPY_TYPE_PLACED_FOOTPRINT,
            .PlacedFootprint = fp[mip],
        },
        NULL
    );
}

First we specify the destination. The resource we want to copy to is the texture, so we specify that as the resource. We then specify the mip index as the subresource index, as each mip level is a separate subresource.

We then specify the texture upload buffer as the source, with the placed footprint for the current mipmap as the copy location within that resource.

We have now prepared our command list to copy the data for our texture into the texture in the proper format. Now we just need to do two more things before executing the command list. First of all, we want to stop copying to the texture and be able to use the texture for rendering now. This is done using a resource barrier.

preloaderCommandList->lpVtbl->ResourceBarrier (
    preloaderCommandList,
    1,
    (D3D12_RESOURCE_BARRIER[1]){
        {
            .Type       = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION,
            .Transition = {
                .pResource   = texRes,
                .Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES,
                .StateBefore = D3D12_RESOURCE_STATE_COPY_DEST,
                .StateAfter  = D3D12_RESOURCE_STATE_GENERIC_READ
            }
        },
    }
);

This function adds a command to the command list that transitions the state of a resource. In this case, we transition from D3D12_RESOURCE_STATE_COPY_DEST (the state we were in before) to D3D12_RESOURCE_STATE_GENERIC_READ. D3D12_RESOURCE_STATE_GENERIC_READ allows the texture to be used for most purposes that do not involve writing to the resource. It might prove beneficial to use a more specific set of states instead, e.g. D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE combined with D3D12_RESOURCE_STATE_NON_PIXEL_SHADER_RESOURCE to allow use as a texture only.

The last thing left to do is to update our descriptor heaps to contain references to our created texture resources.

for ( uint32_t j = 0; j < FRAME_BUFFER_COUNT; j++ )
{
    D3D12_CPU_DESCRIPTOR_HANDLE cpuDesc;
    //TODO(Rick): The function calling signature is broken in the C interface of Direct3D 12, so we cast the function to the proper function signature
    ((C_D3D12DescriptorHeap_GetCPUDescriptorHandleForHeapStart)(renderer->frame[j].gpuDescriptors.cbvSrvUav->lpVtbl->GetCPUDescriptorHandleForHeapStart)) ( renderer->frame[j].gpuDescriptors.cbvSrvUav, &cpuDesc );
    cpuDesc.ptr += i * renderer->device->lpVtbl->GetDescriptorHandleIncrementSize ( renderer->device, D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV );

    renderer->device->lpVtbl->CreateShaderResourceView (
        renderer->device,
        texRes,
        &(D3D12_SHADER_RESOURCE_VIEW_DESC){
            .Format                  = texDesc.Format,
            .ViewDimension           = D3D12_SRV_DIMENSION_TEXTURE2D,
            .Shader4ComponentMapping = D3D12_DEFAULT_SHADER_4_COMPONENT_MAPPING,
            .Texture2D.MipLevels     = imageDesc->mipCount,
        },
        cpuDesc
    );
}

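This portion creates a shader resource view in each of the per-frame descriptor heaps, so that the texture is available for sampling in shaders using the proper format. Note that i is not declared in this snippet; it is assumed to be the index of the descriptor slot reserved for this texture within the heap.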

With all command list operations done, we can now close the command list and execute it.

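result = preloaderCommandList->lpVtbl->Close ( preloaderCommandList );
if ( !SUCCEEDED(result) )
    RETURN_ERROR(-1, "Close failed (0x%08X)", result );

renderer->commandQueue->lpVtbl->ExecuteCommandLists ( renderer->commandQueue, 1, (ID3D12CommandList* const*)&preloaderCommandList );

// Waiting on fence value 0 returns immediately, because every completed value is >= 0.
// Signal a value the fence has not reached yet instead (assumes no other signals are pending on this fence).
const UINT64 preloadFenceValue = renderer->frameFence->lpVtbl->GetCompletedValue ( renderer->frameFence ) + 1;
result = renderer->commandQueue->lpVtbl->Signal ( renderer->commandQueue, renderer->frameFence, preloadFenceValue );
if ( !SUCCEEDED(result) )
    RETURN_ERROR(-1, "Signal failed (0x%08X)", result );
result = renderer->frameFence->lpVtbl->SetEventOnCompletion ( renderer->frameFence, preloadFenceValue, renderer->eofEvent );
if ( !SUCCEEDED(result) )
    RETURN_ERROR(-1, "SetEventOnCompletion failed (0x%08X)", result );

WaitForSingleObject ( renderer->eofEvent, INFINITE );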

In the above example, we wait for the command list to complete before continuing to execute on the CPU. This is not strictly required, as long as you do not reset the same command list, release the upload buffer, or submit work that depends on the preloader to a different command queue before the copy has finished. You can prepare the command lists for the first couple of frames and even execute them on the same queue, as tasks submitted to the same queue are guaranteed to be synchronized. However, because synchronization issues can be quite hard to track down, it is advised to exercise caution in prematurely optimizing waits and barriers.