illwieckz
Member

Extracted from #1842:

Extracted and polished. The design is now stable enough to be submitted for merging.

The bug I face in #1842 is unrelated to that code.

@illwieckz illwieckz added T-Improvement Improvement for an existing feature A-Build T-Feature-Request Proposed new feature labels Oct 6, 2025
@illwieckz
Member Author

illwieckz commented Oct 6, 2025

Once this is merged, here are examples of things you can do in a C++ file:

#include "DaemonEmbeddedFiles/EngineShaders.h"

void test()
{
	const char* filename = "common.glsl";

	// Reading a file content by its name:
	const char* file1 = EngineShaders::ReadFile(filename);

	// Using a file directly using its symbol:
	const unsigned char* file2 = EngineShaders::common_glsl;

	// Alternatively:
	const char* file3 = (const char*) EngineShaders::common_glsl;

	// Testing the presence of a file with a given name:
	Log::Debug("File %s is %s", filename,
		EngineShaders::ReadFile(filename) == nullptr ? "absent" : "present");

	// Iterating the files:
	for ( auto it : EngineShaders::FileMap )
	{
		Log::Debug("Embedded file %s", it.first);
	}
}

The unsigned char* and char* inconsistency was already there before.

@illwieckz
Member Author

illwieckz commented Oct 6, 2025

Listing files to embed:

set(GLSL_EMBED_DIR "${ENGINE_DIR}/renderer/glsl_source")
set(GLSL_EMBED_LIST
	# Common shader libraries
	common.glsl
	common_cp.glsl
	fogEquation_fp.glsl
	shaderProfiler_vp.glsl
	shaderProfiler_fp.glsl
	# …
)

Generating the embedded files, etc:

	# Generate GLSL include files.
	daemon_embed_files("EngineShaders" "GLSL" "TEXT" "client-objects")

The first argument ("EngineShaders") is the desired namespace. The second ("GLSL") is the SLUG used to name the GLSL_EMBED_DIR variable and the GLSL_EMBED_LIST list. The third ("TEXT") selects the file format: "TEXT" strips the \r characters, while "BINARY" applies no transformation. The last ("client-objects") is the target the generated source files contribute to.

The variable and the list should be named ${SLUG}_EMBED_DIR and ${SLUG}_EMBED_LIST, with the same ${SLUG}, because they are discovered from the SLUG string passed to the daemon_embed_files() macro.

This daemon_embed_files() macro should be called after the target is defined (here, client-objects).

And that's all. It's not even necessary anymore to add the entry points by hand like this:

set(RENDERERLIST
	${DAEMON_EMBEDDED_DIR}/EngineShaders.cpp
	${DAEMON_EMBEDDED_DIR}/EngineShaders.h
	${ENGINE_DIR}/renderer/BufferBind.h
	# …
)

@illwieckz illwieckz force-pushed the illwieckz/cmake-embed branch from 9b96078 to 1313666 Compare October 6, 2025 01:58
@illwieckz illwieckz force-pushed the illwieckz/cmake-embed branch 2 times, most recently from a7ff720 to 945511f Compare October 6, 2025 05:31
@slipher
Member

slipher commented Oct 6, 2025

So this will be used for the Vulkan binary shaders list?

I guess the separate EMBED_DIR and EMBED_LIST thing was designed for VFS embedding; if we don't need that it seems over-complicated. Just use a single list of absolute paths like everything else.

Note that the GLSL map that we have now is coded in a rather inefficient way. The map object stores the files as std::string, but the original char arrays are still there, so we have 2 copies of the files at all times. Then when you read out a file a 3rd copy is created... it would be better for the map to have a StringView or something.

It doesn't make sense to me that some auxiliary functions return char* because that would be unusable for binary files. In any case I doubt we need those; just the map is a good enough interface.

@VReaperV
Contributor

VReaperV commented Oct 7, 2025

So this will be used for the Vulkan binary shaders list?

Vulkan will use functionality that was already added in #1845, which is now rebased on this pr.

@illwieckz illwieckz force-pushed the illwieckz/cmake-embed branch from 95677da to ff5cf6c Compare October 7, 2025 06:14
@illwieckz
Member Author

illwieckz commented Oct 7, 2025

So this will be used for the Vulkan binary shaders list?

I don't know yet if that's usable for the Vulkan binary, but with the latest iteration the embedded file should be directly accessible as a constexpr unsigned char*.

I guess the separate EMBED_DIR and EMBED_LIST thing was designed for VFS embedding; if we don't need that it seems over-complicated. Just use a single list of absolute paths like everything else.

Not only that: it's easier to paste a path without having to add the prefix boilerplate. I prefer setting the DIR once over copy-pasting the DIR for every file.

Note that the GLSL map that we have now is coded in a rather inefficient way. The map object stores the files as std::string, but the original char arrays are still there, so we have 2 copies of the files at all times. Then when you read out a file a 3rd copy is created... it would be better for the map to have a StringView or something.

Ouch that's awful!

That should be better now: the map now associates a name with a pair made of a const unsigned char* data pointer and a size_t size.

It doesn't make sense to me that some auxiliary functions return char* because that would be unusable for binary files. In any case I doubt we need those; just the map is a good enough interface.

I removed the ReadFile() function because with the latest iteration those two lines were fundamentally different and we should avoid confusion:

const char* text_ptr = EngineShaders::ReadFile(filename);

audioFile = FS::PakPath::ReadFile(filename);

Our other ReadFile() functions (like the second one) return a std::string with a known size, while we now embed a const char* without bounds checking (but the size is stored right next to it). So I removed the code for the first line to avoid dangerous confusion.

It's unsafe to access the const unsigned char* array without reading the size too. For the GLSL code that expects text, we don't have to read the size because I added a null terminator at the end of every file. The size stored in the pair is the size of the original file (without the null terminator), so when copying it into a std::string (for example in the FileSystem code) we would not copy that extra null terminator, which is only there for code like the GLSL code that reads the const unsigned char* array directly as a text buffer.
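
A minimal sketch of the layout described above, to make the size-versus-terminator distinction concrete. The EmbeddedEntry struct, the example_txt array and the two helper functions are hypothetical names for illustration, not the generated code itself; the sketch only assumes what the comment states, namely that a 0x00 byte follows the data and that the stored size excludes it.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a generated entry: a data pointer plus the original file size.
struct EmbeddedEntry
{
	const unsigned char* data;
	std::size_t size; // size of the original file, terminator excluded
};

// Hypothetical generated array for a 3-byte file "abc", with the extra 0x00 appended.
static const unsigned char example_txt[] = { 0x61, 0x62, 0x63, 0x00 };
static const EmbeddedEntry example_entry = { example_txt, sizeof( example_txt ) - 1 };

// Text consumers can treat the buffer as a C string thanks to the terminator.
inline std::size_t TextLength( const EmbeddedEntry& entry )
{
	return std::strlen( reinterpret_cast<const char*>( entry.data ) );
}

// Size-aware consumers copy exactly `size` bytes, so the terminator is not copied.
inline std::string CopyContents( const EmbeddedEntry& entry )
{
	return std::string( reinterpret_cast<const char*>( entry.data ), entry.size );
}
```

TextLength() relies on the terminator and only makes sense for text without embedded NUL bytes; binary consumers should go through the size, as CopyContents() does.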

@illwieckz
Member Author

Here is an example of how the current implementation works in the C++ code:

void test()
{
	// Using a file directly using its symbol:
	const char* file = (const char*) EngineShaders::common_glsl;

	// Loading a file by its name:
	const char* filename = "common.glsl";
	auto found = EngineShaders::FileMap.find(filename);
	const char* foundfile = (const char*) found->second.data;

	// Iterating the files:
	for ( auto it : EngineShaders::FileMap )
	{
		Log::Debug("Embedded file %s at address %p with size %d", it.first, it.second.data, it.second.size);
	}
}
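
The lookup above dereferences the iterator returned by find() without checking it, which is undefined behavior when the name is absent. A hedged sketch of a checked lookup follows; the EmbeddedEntry struct, the std::unordered_map type and the FindEmbeddedFile() helper are assumptions for illustration, since the actual generated map type is not shown here.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical entry type: pointer to the embedded bytes plus their size.
struct EmbeddedEntry
{
	const unsigned char* data;
	std::size_t size;
};

// Assumed map type; the real generated map may use a different container.
using EmbeddedFileMap = std::unordered_map<std::string, EmbeddedEntry>;

// Hypothetical embedded file: the three bytes "//\n" followed by a terminator.
static const unsigned char common_glsl[] = { '/', '/', '\n', 0x00 };

static const EmbeddedFileMap FileMap = {
	{ "common.glsl", { common_glsl, sizeof( common_glsl ) - 1 } },
};

// Returns a pointer to the entry, or nullptr when no file has that name.
inline const EmbeddedEntry* FindEmbeddedFile( const EmbeddedFileMap& map, const std::string& name )
{
	auto found = map.find( name );
	return found == map.end() ? nullptr : &found->second;
}
```

Returning a pointer keeps the "absent" case explicit at the call site, which also covers the presence test that ReadFile() used to provide.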

@illwieckz
Member Author

illwieckz commented Oct 7, 2025

The map is just a convenience for accessing the files: it only stores the name, the size and the pointer to the data, so I assume the compiler will drop the map when the data is only accessed directly.

This should fit the need for:

  • What we already do for the GLSL shaders.
  • Embedding a file and loading it directly, for things like builtin images such as $white, as an alternative to writing a generator when that would be a bit complex to write.
  • Exposing a builtin pak in memory to the VFS, including exposing it to the virtualized game, with the ability to list files and load them using the existing VFS mechanism.

This code was tested with my builtin pak implementation, so the only thing I haven't tested (beyond compiling the above code) is the direct access, but that should work.

@@ -0,0 +1,7 @@
struct embeddedFileMapEntry_t
Member

You can use Str::StringRef. There's no reason to be using unsigned char instead of char for the data

Member Author

There's no reason to be using unsigned char instead of char for the data

I tried:

In file included from GeneratedSource/DaemonEmbeddedFiles/EngineShaders.cpp:7:
GeneratedSource/DaemonEmbeddedFiles/EngineShaders/fogEquation_fp_glsl.h:137:1:
 error: narrowing conversion of ‘226’ from ‘int’ to ‘char’ [-Wnarrowing]
  137 | };
      | ^
GeneratedSource/DaemonEmbeddedFiles/EngineShaders/fogEquation_fp_glsl.h:137:1:
 error: narrowing conversion of ‘137’ from ‘int’ to ‘char’ [-Wnarrowing]
GeneratedSource/DaemonEmbeddedFiles/EngineShaders/fogEquation_fp_glsl.h:137:1:
 error: narrowing conversion of ‘164’ from ‘int’ to ‘char’ [-Wnarrowing]
GeneratedSource/DaemonEmbeddedFiles/EngineShaders/fogEquation_fp_glsl.h:137:1:
 error: narrowing conversion of ‘226’ from ‘int’ to ‘char’ [-Wnarrowing]
GeneratedSource/DaemonEmbeddedFiles/EngineShaders/fogEquation_fp_glsl.h:137:1:
 error: narrowing conversion of ‘136’ from ‘int’ to ‘char’ [-Wnarrowing]
GeneratedSource/DaemonEmbeddedFiles/EngineShaders/fogEquation_fp_glsl.h:137:1:
 error: narrowing conversion of ‘154’ from ‘int’ to ‘char’ [-Wnarrowing]
GeneratedSource/DaemonEmbeddedFiles/EngineShaders/fogEquation_fp_glsl.h:137:1:
 error: narrowing conversion of ‘226’ from ‘int’ to ‘char’ [-Wnarrowing]
GeneratedSource/DaemonEmbeddedFiles/EngineShaders/fogEquation_fp_glsl.h:137:1:
 error: narrowing conversion of ‘137’ from ‘int’ to ‘char’ [-Wnarrowing]
GeneratedSource/DaemonEmbeddedFiles/EngineShaders/fogEquation_fp_glsl.h:137:1:
 error: narrowing conversion of ‘165’ from ‘int’ to ‘char’ [-Wnarrowing]
GeneratedSource/DaemonEmbeddedFiles/EngineShaders/fogEquation_fp_glsl.h:137:1:
 error: narrowing conversion of ‘226’ from ‘int’ to ‘char’ [-Wnarrowing]
GeneratedSource/DaemonEmbeddedFiles/EngineShaders/fogEquation_fp_glsl.h:137:1:
 error: narrowing conversion of ‘136’ from ‘int’ to ‘char’ [-Wnarrowing]
GeneratedSource/DaemonEmbeddedFiles/EngineShaders/fogEquation_fp_glsl.h:137:1:
 error: narrowing conversion of ‘154’ from ‘int’ to ‘char’ [-Wnarrowing]

Member

Ah yes that's very inconvenient to work around at the level of defining the actual array, but we should reinterpret_cast any pointers to char as soon as possible because unsigned char is worse for all purposes.
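
For illustration, the values in that error log (226, 137, 164) are the UTF-8 bytes of the ≤ character, which cannot be stored in a char where char is a signed 8-bit type. A minimal sketch of keeping the array as unsigned char and casting at the boundary, with hypothetical names (bytes, AsChars, AsString) that are not part of the PR:

```cpp
#include <cstddef>
#include <string>

// Values above 127 do not fit in a signed 8-bit char, so writing
// `const char bytes[] = { 226, 137, 164 };` is a narrowing conversion
// on platforms where char is signed. An unsigned char array accepts them.
static const unsigned char bytes[] = { 226, 137, 164, 0x00 };
static const std::size_t bytes_size = sizeof( bytes ) - 1; // terminator excluded

// Convert the pointer once, at the boundary, so callers only ever see char.
inline const char* AsChars( const unsigned char* data )
{
	return reinterpret_cast<const char*>( data );
}

// Build a std::string from the embedded bytes using the stored size.
inline std::string AsString( const unsigned char* data, std::size_t size )
{
	return std::string( AsChars( data ), size );
}
```

Accessing unsigned char storage through a const char* is permitted by the language's aliasing rules, so the cast itself does not copy or alter the data.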

endif()

# Add null terminator.
string(REGEX REPLACE ",$" ",0x00," contents "${contents}")
Member

This shouldn't be using a regex replace since it is in a fixed location at the end.

Member Author

Right, fixed.

@@ -0,0 +1,139 @@
set(DAEMON_TEXT_EMBEDDER "${CMAKE_CURRENT_SOURCE_DIR}/cmake/EmbedText.cmake")
Member

This "Text" naming scheme doesn't make that much sense now that it can handle binary files.

Member Author

Yes, but I didn't want to rename that one in that PR for now. I was going to name it DaemonEmbeddedFile or something like that

file(MAKE_DIRECTORY "${embed_dir}")

foreach(kind CPP H)
Member

The code in this loop is too confusing. I shouldn't have to sit here half an hour trying to decipher this code using multiple variable indirection to do something that probably could have been done by just setting a variable to a hard-coded string.

Member Author

This kind of indirection saves me a lot of time when rewriting and re-rewriting across my various iterations, and it has already saved me a lot of time. It only looks better to hardcode those strings when all the work was done by someone else, because then you don't know about the hundreds of different strings that were used before the code reached its final state. Those indirections aren't only written to reduce the number of lines to edit when redesigning things, but also to avoid introducing mistakes when editing the alternative expanded boilerplate.

And we know well who is doing all those iterations and who will suffer from editing the duplicated boilerplate once it is expanded:

git ls-files | grep -i cmake | while read f; do git blame -w -M -C -C --line-porcelain "$f" | grep -I '^author '; done | sort -f | uniq -ic | sort -n --reverse
   2135 author Thomas Debesse
   1286 author TimePath
    612 author slipher
    334 author Daniel Maloney
    243 author Corentin Wallez
    175 author Darren Salt
    161 author dolcetriade
    135 author VReaperV
    108 author Amanieu d'Antras
    107 author perturbed
     43 author Tsarevich Dmitry
     29 author Morel Bérenger
     12 author Mattia Basaglia
      7 author Tim
      3 author cmf028
      2 author Jesper B. Christensen
      1 author maek
      1 author Keziolio

Member

It only looks better to hardcode those strings when all the work was done by someone else,

Yes that's the point. Other people besides you should be able to read the code.

Member Author

I meant: you don't yet know what the cost of maintaining it is.

I can tell you it's easier this way, because I already maintained it.

Said another way: this is the solution I chose because I first tried the hardcoded way and it was painful, and doing it this way made it easier. Lessons were learned, I'm sharing the result with you, and I'm saving you time.

endif()

include(DaemonBuildInfo)
include(DaemonSourceGenerator)
Member

Name seems inapt, how about DaemonFileEmbedding? Or alternatively something about "resources" since that is a commonly used term for embedding files in this way.

Contributor

Just DaemonEmbed? That's what I did in #1845 until I rebased on this pr.

Member Author

DaemonEmbed or a variant of it (DaemonEmbeddedFile?) is the naming I'm planning to use for EmbedText, which doesn't only produce embedded text files anymore.

Also, this CMake file does more code generation than just embedding files (like the buildinfo thing).

endmacro()

macro(daemon_embed_files basename slug format targetname)
set(embed_source_dir "${slug}_EMBED_DIR")
Member

This variable name concatenation will make the code hard to understand later. For example if you search for GLSL_EMBED_DIR you will not find anything; the variable seems to be unused.

Member Author

It's better than boilerplate.

Member Author

This mechanism also makes it possible to embed files with paths that contain directories; the previous code only supported basenames, and it also made assumptions behind the user's back that were more painful and more limiting.

Member

This mechanism also makes it possible to embed files with paths that contain directories; the previous code only supported basenames, and it also made assumptions behind the user's back that were more painful and more limiting.

That's a separate question. You can have directory names in the path without magical variable concatenation. The function can take two arguments - base dir variable and file list variable.

Member Author

I modified it to pass variables instead.

@illwieckz illwieckz force-pushed the illwieckz/cmake-embed branch from ff5cf6c to 6978bf5 Compare October 8, 2025 23:39
@illwieckz
Member Author

So, now there is:

  • DaemonSourceGenerator.cmake, which provides the functions for the “BuildInfo” system and drives the file embedding, used for GLSL shaders for now. It replaces DaemonBuildInfo.cmake, and it merges and generalizes the code for embedding GLSL shaders so that it is reusable for any set of files of any kind. This DaemonSourceGenerator.cmake file also features the reusable mechanism that writes a file without rewriting it if the content didn't change, already used for both BuildInfo and file set embedding. This DaemonSourceGenerator.cmake file is meant to be the place to add more source generation code if the need arises.
  • DaemonFileEmbedder.cmake, previously named EmbedText.cmake, which is just that: it embeds an arbitrary file into a C++ file. All the plumbing to leverage it in an actual CMake build configuration is done in DaemonSourceGenerator.cmake.

So, to reword it, the reasons why DaemonSourceGenerator.cmake isn't named DaemonFileEmbedder.cmake are:

  • EmbedText.cmake is now named DaemonFileEmbedder.cmake, so that name is already taken.
  • DaemonSourceGenerator.cmake isn't only about embedding files: embedding files is just one of the things it does among the sources it generates. It already provides the BuildInfo system and the generic mechanism to write arbitrary content to a file only if it changed, a feature that is used by both embedding and BuildInfo but that is also usable by itself if needed.
  • DaemonSourceGenerator.cmake is meant to be the place to add more source generation code we may need in the future.

The reason why I didn't rename EmbedText.cmake to DaemonFileEmbedder.cmake before was that I wanted to limit possible merge conflicts with other branches, but it looks like that just induced confusion, so it's now properly renamed.

I want to remind everyone that I have a longer-term plan to gather this kind of generic, reusable CMake tooling into a single library (in a folder; I will not merge all the CMake files into one):

@illwieckz illwieckz force-pushed the illwieckz/cmake-embed branch from 6978bf5 to d2a19bc Compare October 9, 2025 00:45
@illwieckz illwieckz force-pushed the illwieckz/cmake-embed branch 2 times, most recently from 45fab90 to c5da974 Compare October 9, 2025 02:51
@illwieckz illwieckz force-pushed the illwieckz/cmake-embed branch from c5da974 to 2bdccd1 Compare October 16, 2025 15:29
@illwieckz
Member Author

This now uses file(GENERATE …) instead of daemon_write_generated().

That now looks ready to me.

@slipher
Member

slipher commented Oct 19, 2025

Looking pretty good but I still want to ask that Str::StringRef or Str::StringView be used to provide a handle to the data instead of a thing with unsigned char *. It's an annoying interface to give people a string made of unsigned chars.
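
A sketch of what such a handle could look like, using std::string_view (C++17) as a stand-in because the exact interfaces of Str::StringRef and Str::StringView are not shown in this conversation. EmbeddedEntry, ViewOf() and the fog_glsl sample data are hypothetical.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical entry type matching the pointer-plus-size pair discussed in this PR.
struct EmbeddedEntry
{
	const unsigned char* data;
	std::size_t size;
};

// Hypothetical embedded file: "fog" followed by the null terminator.
static const unsigned char fog_glsl[] = { 'f', 'o', 'g', 0x00 };
static const EmbeddedEntry fog_entry = { fog_glsl, sizeof( fog_glsl ) - 1 };

// Wraps the embedded bytes in a char-based, non-owning view; nothing is copied.
inline std::string_view ViewOf( const EmbeddedEntry& entry )
{
	return std::string_view( reinterpret_cast<const char*>( entry.data ), entry.size );
}
```

The view carries the size with the pointer, so callers get a char-based interface without the extra copies mentioned earlier in the thread.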
