Code Generation in C++
The two main ways to generate code in C++ are templates and macros. Macros can manipulate text, but they lack the compile-time information and constraints that the compiler provides. Templates, on the other hand, are incredibly powerful, especially with C++20’s concepts, non-type template parameters and increased constexpr support. However, they still cannot manipulate strings such as identifiers and keywords.
Metaclasses
A proposal by Herb Sutter to add metaclasses to the C++ language.
Metaclasses can be used to generate regular C++ classes, and the proposal relies on three foundations:
Reflection - no standard support yet, but introspection is possible with template metaprogramming.
Compile-time code - supported by the constexpr keyword.
Injection - no support.
Metaclasses aren’t part of C++20 and might not be available in the next versions of the language either.
For more information I recommend watching the CppCon talk by Herb Sutter. (link)
Code Generation in Other Languages
Rust macros can generate code in a functional style or by modifying the abstract syntax tree.
Go templates can be used to manipulate strings, for example to generate a static site or code in other languages (even C++).
D mixins can inject a string computed at compile time and compile it.
In Jai, #run allows arbitrary compile-time code execution.
Examples
I’ve found a different way to generate C++ and used it to implement the examples in the metaclasses talk and some of the features from other languages. Before we get into the implementation details, let’s take a look at some examples:
// generate interface module file
Interface<{"Shape"},
"import GenerateExamples;",
InterfaceFunc<void, "area", {"const"}>,
InterfaceFunc<double, "scale_by", { "const","noexcept" }, { Tw<double>, "factor" }>,
InterfaceFunc<Test::MyStruct, "GetMyStruct">
>();
// generate a class module file that behaves like an enum class
EnumClass<{"MyEnum"}, { "a" }, { "b", 1 }, { "c", 5 }, { "d" }>(Tw<long>);
// inject content into a file
Inject<{"Example.template.h"},{"MyClass.ixx"},
"/*GENERATE_EXPORT_HERE*/", "export module MyClass;",
"/*GENERATE_MY_CLASS_HERE*/", "class MyClass{};",
"/*GENERATE_MY_ARG_HERE*/", "export struct MyArg{int x;};">();
// open the calculator during compilation
System<"calc.exe">();
/*...*/
/*INSERT_POST_GENERATE_MACRO*/
/*...*/
export class C
{
public:
/*INSERT_FUNC*/
};
/*...*/
void Generate()
{
// inject strings when this call to Inject<...>() is being compiled
Inject<
{"PreGenerate.ixx"}, {"PostGenerate.ixx"},
"/*INSERT_POST_GENERATE_MACRO*/", "#define POST_GENERATE",
"/*INSERT_FUNC*/", "void Foo(){}">();
}
The examples and implementation can be found here: https://github.com/a10nw01f/Gen
It currently supports only Visual Studio with MSVC.
Overview
- Use C++20’s compile-time computation abilities to generate code as a constexpr string
- Make the compiler output the generated string
- Hook the compiler output
- Parse the string and perform arbitrary logic
- Write the generated code to a file which will be compiled
Constexpr String Generation
Constexpr string formatting in C++20 is easier than ever, since std::string, std::vector and many other types can be used in a constexpr context. Before formatting, we can validate the input by using concepts, static_assert or by throwing an exception from a consteval function. Each of these methods has its own pros and cons in terms of integration and the information that the compiler provides when an error occurs.
Static Print
There is no static print in the standard, so we have to make our own. This is how it can be done:
- Convert from std::string to std::array so it can be passed as a non-type template parameter
- Create a template class which takes non-type template parameters so that our generated string will be part of the signature
- Use __FUNCSIG__ to get a string literal of our signature, which contains the string
- Output this string literal with #pragma message
export template<auto...>
struct StaticPrint
{
constexpr static void Print()
{
#pragma message(__FUNCSIG__)
}
};
Note: for gcc/clang we can generate a warning which will output the string, in place of the __FUNCSIG__ and #pragma message steps.
Hook
The compiler output is written into the {project_name}.log file. By looking at the call stack in procmon.exe, we can see that just before NtWriteFile and the kernel functions, the call passes through the WriteFile function in KERNELBASE.dll, which is where we will set our hook.
The hook is set by injecting Hook.dll into the devenv.exe process. When the DLL is loaded into the process, it sets up the hook using the Detours library. (link)
After compiling Hook.dll and DllInjector.exe, the hook can be set with a single click by running the inject_dll.bat file.
Extracting the Strings
As a convention, only projects whose names end with _GENERATE invoke code generation. We can filter the calls to WriteFile by checking whether the target file name ends with _GENERATE.log and whether the data written begins with the StaticPrint function signature, and then extract the strings from the signature.
Compiling the Generated Code
Now that we have the generated code, we want to pass it back to the compiler. However, we have no control over the file compilation order within a project. The easiest solution is to move all the code generation files into a separate static library project and link it to the original project.
Pros
- Has access to compile time information such as types and constants.
- Can be used to generate code in c++ or in other languages during compilation.
- Enables arbitrary compile time execution.
- Works with types and strings which can be used to represent identifiers, keywords, brackets etc.
- Can set restrictions and validate inputs.
- Can be extended to have multiple code generation steps (code which generates code which generates code…).
Cons
- Portability - might not be portable to all compilers and development environments, and porting it isn’t trivial.
- Requires setting up the hook and static library project.
- Can’t modify or generate files in the same project.
- Relies on undefined behaviour.
Conclusion
This technique enables arbitrary compile-time execution and code generation. Unlike macros, which are executed by the preprocessor, this technique gives us the clarity, safety and type information of the compiler, so we can import and call a C++ function instead of a macro. We can also pass and generate identifiers, keywords and other syntax elements which can’t be passed to a template. This comes at the cost of relying on undefined behaviour, additional setup and difficult portability across compilers and development environments.