Vulkan Textures Unbound

Problem Statement

I have recently been working on implementing a Vulkan 1.1 backend for my engine. Since the project started with a Direct3D 12 graphics backend, all of the shaders are written in HLSL. While DXC has done wonders to reduce the amount of rework needed for many of the project’s shaders, one feature that did not work directly out of the box when cross-compiling to SPIR-V for Vulkan was unbounded arrays of textures. There seems to be limited information available online about how to support these, so that will be the focus of this post.

To be clear, when I say “array of textures” I am talking about the following:

Texture2D mytextures[10] : register(t0); // array of textures <- this post is about this

Texture2DArray othertexture : register(t0); // texture array <- not talking about this

HLSL and Direct3D 12 allow for truly unbounded arrays of textures. The shader author does not need to know the upper limit of the array, and on the application side the implementer only needs to be sure they do not cause the shader to index outside the valid range of bound descriptors. Since not indexing outside the bounds of an array is a characteristic of a well-formed application in the first place, this seems a reasonable requirement for the flexibility it introduces.

To be a bit more specific, here’s what we are trying to support in a cross-platform way:

// instead of this
Texture2D materialTextures[1024] : register(t0); // array of textures with upper limit

// we want to write this
Texture2D materialTextures[] : register(t0); // unbounded array of textures

Error Messages

When first starting a Vulkan build of the application with an unbounded array of textures in use by a shader and none of the required extended features enabled, the first error we stumble upon is something similar to the following. Let’s call this Error 1:

Shader requires VkPhysicalDeviceDescriptorIndexingFeaturesEXT::runtimeDescriptorArray but is not enabled on the device.

Easy enough: the message tells us directly which feature we need to enable for this syntax to be allowed. I’ll show later in the post how to query for support and enable that feature, but for now let’s move on. With that feature enabled, the initial error goes away, but now we have a new validation error, which we’ll refer to as Error 2:

Descriptor set 0x4c6 bound as set #1 encountered the following validation error at vkCmdDrawIndexed() time: Descriptor in binding #1 index 18 is being used in draw but has not been updated.

What this is basically saying is that if you have a descriptor set layout that declares itself as having some number of descriptors, but you have only written the first few descriptors into your descriptor set, and the shader could possibly index past where you have written, you are in error. Since we updated our shader and enabled the runtimeDescriptorArray feature, the validation layer assumes that the entire descriptor set could be accessed by the shader and will validate against the entire contents, including locations that may not have actually been written.

For example, say our descriptor set layout declares that it uses 1000 descriptors, but so far we have only needed to load the first 100. The validation layer will see the uninitialized contents after our first 100 textures and assume that we may intend to index into them in the shader, which would be an error. I will show a fairly straightforward workaround that does not require enabling another feature, but if the feature descriptorBindingPartiallyBound is available, I recommend enabling it, as it provides true support for partial binding.

Quick and Dirty Way to Address Error 2

All it boils down to is this. After the valid entries have been written into the VkDescriptorImageInfo array referenced by the VkWriteDescriptorSet struct, fill all of the remaining slots with the first descriptor, and write the full range into the set. The only assumption made here is that at least the first descriptor is valid. I feel this is a safe-enough assumption when combined with some well-placed assertions; otherwise, why make a call to update the set in the first place? This way technically works, insofar as draws take place as expected with no validation layer warnings.

// after cycling through all of the "real" descriptors desired to be set, find out if any descriptor slots are left
if(descriptorCount < descriptorRange.numDescriptors)
{
	// for each unwritten descriptor in the layout, copy the first descriptor into those locations
	for(uint32 i = descriptorCount; i < descriptorRange.numDescriptors; ++i)
	{
		// the first descriptor is assumed to be valid
		imageInfos[i] = imageInfos[0];
	}
}

VkWriteDescriptorSet writeDescriptorSet{};
writeDescriptorSet.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
writeDescriptorSet.pNext = nullptr;
writeDescriptorSet.dstSet = descriptorSet;
writeDescriptorSet.dstBinding = binding;
writeDescriptorSet.dstArrayElement = 0u;
// write every slot in the range, padding included, so no descriptor is left un-updated
writeDescriptorSet.descriptorCount = descriptorRange.numDescriptors;
writeDescriptorSet.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE;
writeDescriptorSet.pImageInfo = &imageInfos[0];
writeDescriptorSet.pBufferInfo = nullptr;
writeDescriptorSet.pTexelBufferView = nullptr;

vkUpdateDescriptorSets(...); // as usual

Even with the extended feature enabled, filling in the remaining descriptors with a known error texture would be a good way of detecting out-of-bounds accesses at runtime. Thanks to Alex Tardif for bringing that to my attention.
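
The error-texture idea can be sketched on the CPU side roughly as follows. This is only an illustration, not the engine's code: ImageInfo is a hypothetical stand-in for VkDescriptorImageInfo (plain integers instead of Vulkan handles) so that the snippet compiles without the Vulkan headers, and padWithFallback is a made-up helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for VkDescriptorImageInfo; real code would hold
// VkSampler / VkImageView / VkImageLayout values instead of integers.
struct ImageInfo
{
    std::uint64_t sampler = 0;
    std::uint64_t imageView = 0;
    std::uint32_t imageLayout = 0;
};

// Overwrites every slot from validCount to the end of infos with the fallback
// (e.g. error texture) descriptor and returns how many slots were padded.
inline std::size_t padWithFallback(std::vector<ImageInfo>& infos,
                                   std::size_t validCount,
                                   const ImageInfo& fallback)
{
    assert(validCount <= infos.size()); // cannot have more valid entries than slots
    for (std::size_t i = validCount; i < infos.size(); ++i)
    {
        infos[i] = fallback;
    }
    return infos.size() - validCount;
}
```

The vector here would be sized to the layout's descriptor count, and the descriptor write then has to cover every slot, padding included, for the fallback entries to actually land in the set.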

A Cleaner Approach

The following approach solves both of the issues presented above, and is a simpler and cleaner approach compared to copying a single descriptor all over the place.

We want to query for and enable two extended features in order to enable truly unbounded arrays of textures. As luck would have it, both features come from the same set, namely VkPhysicalDeviceDescriptorIndexingFeaturesEXT. The features we want enabled are:

  • runtimeDescriptorArray - for Error 1
  • descriptorBindingPartiallyBound - for Error 2

This code block shows how to query for support for these features for a given VkPhysicalDevice:

VkPhysicalDeviceDescriptorIndexingFeaturesEXT indexingFeatures{};
indexingFeatures.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DESCRIPTOR_INDEXING_FEATURES_EXT;
indexingFeatures.pNext = nullptr;

VkPhysicalDeviceFeatures2 deviceFeatures{};
deviceFeatures.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
deviceFeatures.pNext = &indexingFeatures;
vkGetPhysicalDeviceFeatures2(physicalDevice, &deviceFeatures);

if(indexingFeatures.descriptorBindingPartiallyBound && indexingFeatures.runtimeDescriptorArray)
{
	// all set to use unbounded arrays of textures
}
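
The query result can also drive a fallback rather than a single yes/no check. The sketch below is one possible way to express that decision; the enum and function names are hypothetical, and the two booleans are assumed to come from the runtimeDescriptorArray and descriptorBindingPartiallyBound members queried above.

```cpp
#include <cassert>

// Hypothetical names for the three situations discussed in this post.
enum class TextureArrayStrategy
{
    PartiallyBound,   // both features present: unused slots may stay unwritten
    PadDescriptors,   // only runtimeDescriptorArray: fill unused slots with a valid descriptor
    BoundedArrayOnly  // no runtimeDescriptorArray: the shader needs a fixed-size array
};

inline TextureArrayStrategy chooseStrategy(bool runtimeDescriptorArray,
                                           bool descriptorBindingPartiallyBound)
{
    if (!runtimeDescriptorArray)
    {
        return TextureArrayStrategy::BoundedArrayOnly;
    }
    return descriptorBindingPartiallyBound ? TextureArrayStrategy::PartiallyBound
                                           : TextureArrayStrategy::PadDescriptors;
}
```

Whether the padding path is worth keeping depends on the hardware being targeted; it is shown here only because the post describes both approaches.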

Next, enable those two features when creating the logical device (VkDevice):

VkPhysicalDeviceDescriptorIndexingFeaturesEXT indexingFeatures{};
indexingFeatures.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_DESCRIPTOR_INDEXING_FEATURES_EXT;
indexingFeatures.pNext = nullptr;
indexingFeatures.descriptorBindingPartiallyBound = VK_TRUE;
indexingFeatures.runtimeDescriptorArray = VK_TRUE;

VkDeviceCreateInfo createInfo{};
createInfo.sType = VK_STRUCTURE_TYPE_DEVICE_CREATE_INFO;
createInfo.pNext = &indexingFeatures;
// the rest of the createInfo is filled out as normal
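
On a Vulkan 1.1 device these features come from the VK_EXT_descriptor_indexing device extension, so its name generally also needs to be reported by the device and passed in ppEnabledExtensionNames. Below is a small sketch of the name check. It assumes the extension names have already been copied into strings (in real code they would come from vkEnumerateDeviceExtensionProperties), and the helper name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Name of the device extension that provides the descriptor indexing features on Vulkan 1.1.
constexpr const char* kDescriptorIndexingExtension = "VK_EXT_descriptor_indexing";

// Returns true if the extension name appears in the list reported by the device.
inline bool supportsDescriptorIndexing(const std::vector<std::string>& availableExtensions)
{
    return std::find(availableExtensions.begin(), availableExtensions.end(),
                     kDescriptorIndexingExtension) != availableExtensions.end();
}
```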

The last adjustment we need to make is to the descriptor set layout. For layouts requiring an unbounded array of textures, we want to add the VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT_EXT flag by filling out a VkDescriptorSetLayoutBindingFlagsCreateInfoEXT struct and chaining it to the layout’s create info. This will allow the validation layer to ease up and trust that the implementer is not going to index outside the valid range of bound descriptors for the given set.

// update these values to be useful for your specific use case
VkDescriptorSetLayoutBinding layoutBinding{};
layoutBinding.binding = 0u;
layoutBinding.descriptorType = VK_DESCRIPTOR_TYPE_SAMPLED_IMAGE;
layoutBinding.descriptorCount = 10000u;
layoutBinding.stageFlags = VK_SHADER_STAGE_FRAGMENT_BIT;

// one flags entry per binding in the layout
VkDescriptorBindingFlagsEXT bindFlag = VK_DESCRIPTOR_BINDING_PARTIALLY_BOUND_BIT_EXT;

VkDescriptorSetLayoutBindingFlagsCreateInfoEXT extendedInfo{};
extendedInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_BINDING_FLAGS_CREATE_INFO_EXT;
extendedInfo.pNext = nullptr;
extendedInfo.bindingCount = 1u;
extendedInfo.pBindingFlags = &bindFlag;

VkDescriptorSetLayoutCreateInfo layoutInfo{};
layoutInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
layoutInfo.pNext = &extendedInfo;
layoutInfo.flags = 0;
layoutInfo.bindingCount = 1u;
layoutInfo.pBindings = &layoutBinding;

vkCreateDescriptorSetLayout(...); // as usual
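
A hard-coded descriptorCount such as 10000 may exceed what a given device allows, so it can be worth clamping it first. The sketch below assumes the two limits are read from VkPhysicalDeviceLimits (maxPerStageDescriptorSampledImages and maxDescriptorSetSampledImages). SampledImageLimits is a hypothetical stand-in struct so the snippet compiles without the Vulkan headers, and the helper ignores any other sampled-image bindings that would share the same budget.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the two relevant members of VkPhysicalDeviceLimits.
struct SampledImageLimits
{
    std::uint32_t maxPerStageDescriptorSampledImages;
    std::uint32_t maxDescriptorSetSampledImages;
};

// Returns the desired descriptor count, reduced to the smaller of the two device limits.
inline std::uint32_t clampDescriptorCount(std::uint32_t desired, const SampledImageLimits& limits)
{
    const std::uint32_t deviceMax = std::min(limits.maxPerStageDescriptorSampledImages,
                                             limits.maxDescriptorSetSampledImages);
    return std::min(desired, deviceMax);
}
```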

Final Thoughts

There you have it. The second way requires a little more initialization code at startup to query for and enable the required features, but that seems a small price to pay for a cleaner implementation and being able to bind partially-filled descriptor sets.

That said, it is also compatible with copying a known texture descriptor (error or otherwise) as mentioned earlier, if desired. Use cases vary across and within projects, and the implementer should choose the best fit for the problem they’re solving.

Being a sole developer, I don’t have access to the swaths of hardware configurations available to larger studios. I can however state that the above features are available on my 980 Ti, which is over four years old at the time of writing this post. I imagine most desktop GPUs will have had the features available for some time now, though mobile-centric GPUs may not be as accommodating (mobile is not currently one of my target ecosystems). Specific device support can be looked up in Sascha Willems’ Vulkan Hardware Database.