The venerable C preprocessor (cpp) – the part of the compilation process that interprets hash-prefixed directives such as #include and #define, and substitutes for the macros defined by the latter – is undoubtedly one of the backbones of the C/C++ ecosystem. However, its macro functionality suffers from known limitations: macros may not call themselves recursively, meaningfully work with mutable state or introduce syntax that does not obey the shape of either single keywords or function calls. This severely limits its utility for metaprogramming, necessitating the proliferation of idiosyncratic boilerplate-generation tools of high complexity but limited scope such as Yacc or Qt's moc.
This project is an attempt to create a Turing-complete general-purpose preprocessor for C and C++ that is powerful enough to subsume all of the above and more: indeed, we shall aim to be able to build on top of either C or C++ in the way the latter was originally built upon the former, while seamlessly blending in with existing code as the C preprocessor does.
We draw significant inspiration from Rust's macro system, which appears to be the most ambitious such effort this side of LISP, without binding ourselves to its sometimes curious choice of syntax or its demand of hygiene (this is C, after all!). Since all respectable programming language projects are self-hosting and this is not one of them respectable programming language projects, the preprocessor itself is written in Haskell. The usual warnings about alpha-quality software apply, and all syntax and semantics is subject to change.
struct {
int value;
LinkedList *next;
} LinkedList;
// define recursive macro to create a linked list
@define MakeList {
( {@^[,]$head, @^$tail} ) => (
new LinkedList( {$head, MakeList {$tail}} )
)
( {@^[,]$singleton} ) => (
new LinkedList( {$singleton, NULL} )
)
( {} ) => ( NULL )
}
// create a linked list with 5 elements
LinkedList *l = MakeList {1,2,3,4,5};Make sure you have a recent version of ghc, mtl>=2.2.1 and parsec installed. The appropriate packages exist in Debian testing repositories as ghc, libghc-mtl-dev and libghc-parsec3-dev respectively; building under Windows or other unixoid OS families is currently untested. Run make in the root directory of the repository. This generates an executable file named macros. See INSTALL.windows.md for Windows installation instructions.
Running
./macros <input file>
will emit the processed output to stdout. If any errors are encountered, they will be printed to stderr.
For convenience, a wrapper script with-macro is included.
Running
./with-macro g++ <input file> -O2 -Dotheroptions
(with the input filename being in the second position, i.e. immediately following the compiler executable!) is equivalent to running
g++ -O2 -Dotheroptions on the output of the preprocessor on <input file>.
If you don't have access to a system package manager or it does not provide the necessary GHC packages, you can instead choose to use Haskell Stack. This may be the easiest way to build the project on non-Linux systems. After potentially installing Stack and running stack build, you should be able to execute the preprocessor by running
stack exec macros <input file>
which will preprocess the file and emit output to stdout as above.
Use the additional flag -n after the input file, which suppresses emission of source-line hints (which greatly help with debugging, but get in the way of comprehension), and pipe through clang-format, like so:
./macros <input file> -n | clang-format
The fundamental principle of macros is keyword-triggered pattern-matching and substitution on token streams. A typical macro definition takes the following form:
@define macroname {
( pattern one ) => ( printf("first pattern encountered") )
( pattern two ) => ( printf("second pattern encountered") )
( pattern three) => ( printf("third pattern encountered") )
// ...
}This defines a macro that is triggered by encountering the keyword macroname anywhere in the program text following the definition. If the keyword is encountered, the macro processor will proceed to try and match the token stream following it against the patterns – the token streams inside the left-hand side parentheses of a pair of the form (stream) => (stream) in the definition body in turn. For the first left-hand pattern that fully matches the token stream following the keyword, the corresponding right-hand stream will be substituted in for the keyword and the matching tokens. If no patterns match, an error is thrown.
So for instance,
macroname pattern two;
macroname pattern one two three four;
/* will become:
* printf("second pattern encountered");
* printf("first pattern encountered") two three four; */The pattern of a macro can also capture tokens in compile-time variables, which are recognisable by the sigil $. The results of such a capture can be reused on the right-hand side. In the simplest case, we capture a single token:
@define greet {
( $name ) => ( printf("Hello %s!", $name) )
}
greet "world";
Multiple tokens can be captured in one variable by using the modifier @^, which is followed by any number of stop patterns [token-stream]. If any stop pattern is encountered or the token stream ends, capture terminates.
@define bracketless_printf {
( $pattern, @^[;]$args ) => (
printf($pattern, $args)
)
}
bracketless_printf "%d %d",1,2;
Macros and pattern matching
@define keyword { ( pattern ) => ( outcome ) ... }: Defines a new macrokeyword. The token stream afterkeywordwill be matched in turn against eachpattern. If a pattern is matched against successfully, a new stack frame is instantiated with all variables captured by the pattern, the correspondingoutcomeis processed in the new stack frame and emplaced instead ofkeywordplus the matching token stream.@undef keyword: Undefines the macrokeyword.@match ( token-stream ) { ( pattern ) => ( outcome ) ... }: Processestoken-stream, and attempts to match the result against eachpatternin turn, analogously to an@defined macro.
Variable capture
The following are only valid in patterns.
$var: Matches any single token. Stores the result in$var.@#$var: Matches any name, i.e. C/C++ identifier.@^[stop-token-stream-1]...[stop-token-stream-n]$var: Matches an arbitrarily long token stream until any one of thestop-token-streams is encountered, or the token stream ends (via end-of-file or a closing parenthesis). The form@^$varis permissible. Eachstop-token-streammust be shallow: behaviour is unspecified if it contains any brackets with nonempty contents. Since all token streams must be bracket-matched, use@^[{}]$varto stop when a block of curly braces is encountered.- Example:
@define test { ( @^[a][b][{}]$v ) => () } test c d a b { e }; // -> a b { e }; test c d b a { e }; // -> b a { e }; test { e }; // -> { e };
- Example:
@*[separator-token-stream]( pattern ): Matches zero or more ofpattern, optionally separated byseparator-token-stream. (The parameter, along with its[], may be omitted.) Any variables captured inpatternwill be put into a list.- So for instance,
@*[,]( @^[,]$param )will capture a comma-separated list of token streams, and put them in the list$param. Given the inputhello world, goodbye world, 1+2, this will generate avalue-specanalogous to@[ (hello world), (goodbye world), (1+2) ].
- So for instance,
@+[separator-token-stream]( pattern ): As above, but matches one or more.
Variables
$varname: Evaluates to the currently visible instance of$varname. The visible instance is either the topmost (most recent) definition of$varnameon the stack,@set $varname value-spec: Processes thevalue-spec. Then sets the value of the currently visible instance of$varnameto the result.- A
value-specis either a singleton(token-stream), or a list@[ value-spec, ..., value-spec ].
- A
@var $varname value-spec: Defines a variable$varnamein the local stack frame, processes thevalue-specand sets the value of$varnameto the result.@global $varname value-spec: Defines a global variable$varname, processes thevalue-specand sets the value of$varnameto the result.@push_back $varname value-spec: Processes thevalue-spec. Then appends the result to the currently visible instance$varname. If this instance is not a list, an error is thrown.@for[ separator-token-steam ]( bind-spec )( body-token-stream ), where[ separator-token-stream ]is optional: iterates over lists according tobind-spec. For each entry, instantiates a new stack frame with the bound variables, processesbody-token-streamin the frame and emits the result. Ifseparator-token-streamis nonempty, it is processed and emitted between each twobody-token-streams.bind-spectakes the form$v1,...,$vn : $l1,...,$ln, where the$l1,...,$lnare lists of identical length.- Example usage:
@var $listone @[ (1), (2), (3) ] @var $listtwo @[ (a), (b), (c) ] @for[,]( $a,$b : $listone,$listtwo )( $a @@ $b ) // outputs: 1a, 2b, 3c
Token stream operations
@@: Concatenates the preceding and following token:a @@ bevaluates toab. Compare##incpp.@!(token-stream)/@![token-stream]/@!{token-stream}/@!single-token: Emitstoken-streamorsingle-tokenwithout processing.@eval (token-stream)/@eval [token-stream]/@eval {token-stream}/@eval single-token: Processestoken-streamorsingle-tokentwice and emits the result, so e.g. a variable containing an unprocessed token stream is evaluated.@unquote "any string": Emitsany string.@quote (token-stream): Emits"token-stream"as a string literal.@calc (token-stream): Processestoken-stream, then attempts to evaluate it as an integer arithmetic expression. Substitutes in the result. Supported operators are+,-,*and+. Throws an error if the parameter is not an integer arithmetic expression.
File management
@include "filename": Includes and processes the filefilename, relative to the current file's directory.