diff --git a/README.md b/README.md
index baaf64bbf..b577f7a39 100644
--- a/README.md
+++ b/README.md
@@ -40,7 +40,7 @@
 <!-- intro -->
 
 > [!NOTE]
-> QuEST `v4` has been released which re-designed QuEST from the ground up. Read about the exciting new features [here](/docs/v4.md).
+> QuEST `v4` has been released which re-designed QuEST from the ground up. Read about the exciting new features [here](docs/v4.md).
 
 The **Quantum Exact Simulation Toolkit** (QuEST) is a high-performance simulator of quantum statevectors and density matrices.
 It hybridises **multithreading**, **GPU acceleration** and **distribution** to run lightning fast on laptops, desktops and 
@@ -83,19 +83,16 @@ In particular, QuEST `v4` was made possible through the support of the UK Nation
 
 </div>
 
-To learn more:
 
+<!-- <a> used below for doxygen compatibility -->
+
+To learn more:
+- view the <a href="#main_documentation">documentation</a>
 - visit the [website](https://quest.qtechtheory.org/)
-- see some [examples](/examples/)
-- view the [documentation](#documentation)
-- browse the [API](https://quest-kit.github.io/QuEST/group__api.html)
 - read the [whitepaper](https://www.nature.com/articles/s41598-019-47174-9), which featured in Scientific Report's [Top 100 in Physics](https://www.nature.com/collections/ecehgdfcba/) :trophy:
 
-<div align="center">
 
 
-</div>
-
 ---------------------------------
 
 
@@ -105,7 +102,7 @@ To learn more:
 ## 🎉  Introduction
 
 QuEST has a simple interface which is agnostic to whether it's running on CPUs, GPUs or a networked supercomputer.
-```C++
+```cpp
 Qureg qureg = createQureg(30);
 initRandomPureState(qureg);
 
@@ -119,7 +116,7 @@ qreal prob  = calcProbOfQubitOutcome(qureg, 0, outcome);
 qreal expec = calcExpecPauliStr(qureg, getPauliStr("XYZ"));
 ```
 Yet, it is flexible
-```C++
+```cpp
 mixDepolarising(qureg, targ, prob);
 mixKrausMap(qureg, targs, ntargs, krausmap);
 
@@ -133,7 +130,7 @@ multiplyCompMatr1(qureg, targ, getInlineCompMatr1( {{1,2i},{3i,4}} ));
 multiplyDiagMatrPower(qureg, targs, ntargs, diagmatr, exponent);
 ```
 and extremely powerful
-```C++
+```cpp
 setFullStateDiagMatrFromMultiVarFunc(fullmatr, myfunc, ntargsPerVar, nvars);
 applyFullStateDiagMatrPower(qureg, fullmatr, exponent);
 
@@ -181,19 +178,15 @@ QuEST supports:
 
 ---------------------------------
 
+<!-- permit doxygen to reference section -->
+<a id="main_documentation"></a>
+
 ## 📖  Documentation
 
 > [!IMPORTANT]
 > QuEST v4's documentation is still under construction!
 
-Visit the [docs](docs/) to:
-  - [see what's new in v4](docs/v4.md)
-  - [compile with cmake](docs/compile.md)
-  - [find compatible compilers](docs/compilers.md)
-  - [launch your simulations](docs/run.md)
-  - [view some examples](examples/)
-
-The [API](https://quest-kit.github.io/QuEST/group__api.html) documentation is divided into the following groups:
+Visit the [docs](docs/README.md) for guides and tutorials, or the [API](https://quest-kit.github.io/QuEST/group__api.html) which is divided into:
   - [calculations](https://quest-kit.github.io/QuEST/group__calculations.html)
   - [channels](https://quest-kit.github.io/QuEST/group__channels.html)
   - [debug](https://quest-kit.github.io/QuEST/group__debug.html)
@@ -241,10 +234,6 @@ You can also browse QuEST's extensive [tests](https://quest-kit.github.io/QuEST/
   - [deprecated test utilities](https://quest-kit.github.io/QuEST/group__deprecatedutils.html)
 -->
 
-Contributers to QuEST should also check out the:
-  - [software architecture](docs/architecture.md)
-  - [style guide](docs/styleguide.md)
-
 ---------------------------------
 
 ## 🚀  Getting started 
@@ -270,7 +259,7 @@ then run it with
 ./min_example
 ```
 
-See the [docs](docs/) for enabling acceleration and running the unit tests.
+See the [docs](docs/README.md) for enabling acceleration and running the unit tests.
 
 ---------------------------------
 
diff --git a/docs/README.md b/docs/README.md
index 4b7c7da00..4bf0ea018 100644
--- a/docs/README.md
+++ b/docs/README.md
@@ -1,17 +1,53 @@
-> TODO
+# 📖  Documentation
 
-> not sure where to put below (snippet from DiRAC)
+<!--
+  Doc overview
+  (this comment must be under the title for valid doxygen rendering)
 
-gpu_launch.sh wrapper script
-The gpu_launch.sh wrapper script is required to set the correct binding of GPU to MPI processes and the correct binding of interconnect interfaces to MPI process and GPU. We provide this centrally for convenience but its contents are simple:
+  @author Tyson Jones
+-->
 
-```
-#!/bin/bash
+<!-- @todo remove this when doc done -->
+> [!IMPORTANT]  
+> QuEST's `v4` documentation is still under construction.
+
+QuEST has been overhauled! See
+
+- 🎉  [`v4.md`](v4.md) for the exciting new features.
+
+To get started with QuEST, check out
+
+- 🔧  [`compilers.md`](compilers.md) for a list of compatible compilers.
+- 🔗  [`qtechtheory.org`](https://quest.qtechtheory.org/download/) for some help downloading compilers.
+- 🛠️  [`compile.md`](compile.md) for instructions on compiling.
+- ⚙️  [`cmake.md`](cmake.md) for a list of compiler variables.
+- 🚀  [`launch.md`](launch.md) to learn how to run QuEST on anything from laptops to supercomputers.
+- 🎓  [`tutorial.md`](tutorial.md) for an introductory tutorial.
+- 📋  [API](https://quest-kit.github.io/QuEST/group__api.html) for the documentation of each function.
 
-# Compute the raw process ID for binding to GPU and NIC
-lrank=$((SLURM_PROCID % SLURM_NTASKS_PER_NODE))
+Interested in contributing? Then check out:
 
-# Bind the process to the correct GPU and NIC
-export CUDA_VISIBLE_DEVICES=${lrank}
-export UCX_NET_DEVICES=mlx5_${lrank}:1
+- ❤️  [`contributing.md`](contributing.md) to learn how to make a pull request.
+- 🏗️  [`architecture.md`](architecture.md) to understand the code structure.
+- 🎨  [`styleguide.md`](styleguide.md) for some tips on writing neat code.
+
+Want to learn what's under the hood? Read the
+- 🏆  [whitepaper](https://www.nature.com/articles/s41598-019-47174-9) which featured in Scientific Reports' [Top 100 in Physics](https://www.nature.com/collections/ecehgdfcba/).
+- 📝  [preprint](https://arxiv.org/abs/2311.01512) which derives `v4`'s optimised algorithms.
+- 🧪  [tests](/tests) which compare QuEST's outputs to non-optimised calculations.
+- 📈  [benchmarks](https://www.youtube.com/watch?v=dQw4w9WgXcQ) which are coming soon!
+
+
+If QuEST is useful to you, feel free to cite
+```
+@article{jones2019quest,
+  title={QuEST and high performance simulation of quantum computers},
+  author={Jones, Tyson and Brown, Anna and Bush, Ian and Benjamin, Simon C},
+  journal={Scientific reports},
+  volume={9},
+  number={1},
+  pages={10736},
+  year={2019},
+  publisher={Nature Publishing Group UK London}
+}
 ```
\ No newline at end of file
diff --git a/docs/architecture.md b/docs/architecture.md
index 6358a4616..1ad8f3a40 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -1,11 +1,12 @@
+# 🏗️  Architecture
+
 <!--
   Explanation of QuEST's software architecture
+  (this comment must be under the title for valid doxygen rendering)
   
   @author Tyson Jones
 -->
 
-# Architecture
-
 All user-visible API signatures are contained in `include/`, divided into semantic submodules (like `calculations.h` and `qureg.h`), but all exposed by `quest.h`. They are all strictly `C` _and_ `C++` compatible, hence their `.h` file extension.
 
 The source code within `src/` is divided between five subdirectories, listed below in order of increasing control flow depth. All code is parsed strictly by `C++`, hence all files have `.cpp` and `.hpp` extensions.
diff --git a/docs/cmake.md b/docs/cmake.md
index bc3d539c9..e0594f202 100644
--- a/docs/cmake.md
+++ b/docs/cmake.md
@@ -1,13 +1,13 @@
+# ⚙️  CMake
+
 <!--
   Instructions for compiling QuEST with CMake
+  (this comment must be under the title for valid doxygen rendering)
 
   @author Oliver Thomson Brown
   @author Tyson Jones (test variables)
 -->
 
-
-# CMake Configuration Options in QuEST
-
 Version 4 of QuEST includes reworked CMake to support library builds, CMake export, and installation. Here we detail useful variables to configure the compilation of QuEST. If using a Unix-like operating system any of these variables can be set using the `-D` flag when invoking CMake, for example:
 
 ```
diff --git a/docs/compile.md b/docs/compile.md
index 9921986ab..b4a26a021 100644
--- a/docs/compile.md
+++ b/docs/compile.md
@@ -1,5 +1,8 @@
+# 🛠️  Compile
+
 <!--
   Instructions for compiling QuEST with CMake
+  (this comment must be under the title for valid doxygen rendering)
 
   @author Tyson Jones
 
@@ -11,36 +14,42 @@
     use-cases before progressively more visually complicated examples
 -->
 
-# Compile
-
 QuEST can be compiled with [CMake](https://cmake.org/) to make a standalone executable, or an exported library, or a library installed on the system. 
 Compiling is configured with variables supplied by the [`-D` flag](https://cmake.org/cmake/help/latest/command/add_definitions.html) to the [CMake CLI](https://cmake.org/cmake/help/latest/guide/user-interaction/index.html#command-line-cmake-tool). This page details _how_ to compile QuEST for varying purposes and hardwares.
 
-**TOC**:
-- [Basic](#basic)
-- [Optimising](#optimising)
-- [Linking](#linking)
-- [Configuring](#configuring)
-   * [Precision](#precision)
-   * [Compilers](#compilers)
-   * [Flags](#flags)
-- [Examples](#examples)
-- [Tests](#tests)
-   * [v4](#v4)
-   * [v3](#v3)
-- [Multithreading](#multithreading)
-- [GPU-acceleration](#gpu-acceleration)
-   * [NVIDIA](#nvidia)
-   * [AMD](#amd)
-- [cuQuantum](#cuquantum)
-- [Distribution](#distribution)
-- [Multi-GPU](#multi-gpu)
+
+<!-- 
+    we are using explicit <a>, rather than markdown links,
+    for Doxygen compatibility. It cannot handle [](#sec)
+    links, and its <a> anchors are not scoped to files, so
+    we here prefix each name with the filename. Grr!
+-->
+
+> **TOC**:
+> - <a href="#compile_basic">Basic</a>
+> - <a href="#compile_optimising">Optimising</a>
+> - <a href="#compile_linking">Linking</a>
+> - <a href="#compile_configuring">Configuring</a>
+>    * <a href="#compile_precision">Precision</a>
+>    * <a href="#compile_compilers">Compilers</a>
+>    * <a href="#compile_flags">Flags</a>
+> - <a href="#compile_examples">Examples</a>
+> - <a href="#compile_tests">Tests</a>
+>    * <a href="#compile_v4">v4</a>
+>    * <a href="#compile_v3">v3</a>
+> - <a href="#compile_multithreading">Multithreading</a>
+> - <a href="#compile_gpu-acceleration">GPU-acceleration</a>
+>    * <a href="#compile_nvidia">NVIDIA</a>
+>    * <a href="#compile_amd">AMD</a>
+> - <a href="#compile_cuquantum">cuQuantum</a>
+> - <a href="#compile_distribution">Distribution</a>
+> - <a href="#compile_multi-gpu">Multi-GPU</a>
 
 > **See also**:
 > - [`cmake.md`](cmake.md) for the full list of passable compiler variables.
 > - [`compilers.md`](compilers.md) for a list of compatible and necessary compilers.
 > - [`qtechtheory.org`](https://quest.qtechtheory.org/download/) for help downloading the necessary compilers.
-> - [`run.md`](run.md) for a guide to executing the compiled application.
+> - [`launch.md`](launch.md) for a guide to executing the compiled application.
 
 > [!TIP]
 > QuEST's [Github Actions](https://github.com/QuEST-Kit/QuEST/actions/workflows/compile.yml) regularly test QuEST compilation using a broad combination of deployment settings; presently `108` combinations! The [`compile.yml`](/.github/workflows/compile.yml) workflow can serve as a concrete example of how to compile QuEST in a sanitised, virtual setting.
@@ -48,6 +57,10 @@ Compiling is configured with variables supplied by the [`-D` flag](https://cmake
 
 ------------------
 
+
+<!-- permit doxygen to reference section -->
+<a id="compile_basic"></a>
+
 ## Basic
 
 Compilation is a two-step process which can generate lots of temporary files and so should be performed in a `build/` folder to avoid clutter. From the `QuEST/` root, run (in terminal):
@@ -95,6 +108,10 @@ How _boring_! We must pass additional arguments in order to link QuEST to our ow
 ------------------
 
 
+
+<!-- permit doxygen to reference section -->
+<a id="compile_optimising"></a>
+
 ## Optimising
 
 QuEST's source code is careful to enable a myriad of optimisations such as [inlining](https://en.wikipedia.org/wiki/Inline_expansion), [loop unrolling](https://en.wikipedia.org/wiki/Loop_unrolling), [auto-vectorisation](https://en.wikipedia.org/wiki/Automatic_vectorization) and [cache optimisations](https://en.wikipedia.org/wiki/Cache_replacement_policies). To utilise them fully, we must instruct our compilers to enable them; like we might do with the [`-O3`](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html) flag when invoking a compiler like `gcc` directly.
@@ -106,7 +123,7 @@ On most platforms (with the exception of Windows), this is automatic with the co
 cmake .. -D CMAKE_BUILD_TYPE=Release
 ```
 
-When compiling on **Windows** however (using [Visual Studio](https://cmake.org/cmake/help/latest/manual/cmake-generators.7.html#visual-studio-generators)), or otherwise using a "[_multi-config generator_](https://cmake.org/cmake/help/latest/manual/cmake-generators.7.html#other-generators)", we must always supply the build type at **_build time_** via [`config`](https://cmake.org/cmake/help/latest/manual/cmake.1.html#cmdoption-cmake-build-config):
+When compiling on **Windows** however (using [Visual Studio](https://cmake.org/cmake/help/latest/manual/cmake-generators.7.html#visual-studio-generators)), or otherwise using a "[_multi-config generator_](https://cmake.org/cmake/help/latest/manual/cmake-generators.7.html#other-generators)", we must always supply the build type at _build time_ via [`config`](https://cmake.org/cmake/help/latest/manual/cmake.1.html#cmdoption-cmake-build-config):
 ```bash
 # build
 cmake --build . --config Release
@@ -141,10 +158,14 @@ Read more about CMake generator configurations [here](https://cmake.org/cmake/he
 
 ------------------
 
+
+<!-- permit doxygen to reference section -->
+<a id="compile_linking"></a>
+
 ## Linking
 
 QuEST can be pre-compiled and later linked to other binaries, _or_ compiled directly alongside the user's source code. 
-We focus on the latter use-case, common among scientists when writing simulation scripts. Users seeking to integrate QuEST into larger stacks are likely already familiar with linking libraries through CMake and should check out [`cmake.md`](/docs/cmake.md) directly.
+We focus on the latter use-case, common among scientists when writing simulation scripts. Users seeking to integrate QuEST into larger stacks are likely already familiar with linking libraries through CMake and should check out [`cmake.md`](cmake.md) directly.
 
 To compile a `C` or `C++` file such as
 ```C
@@ -168,7 +189,7 @@ where
 - `myexec` is the output executable name, which will be saved in `build`.
 
 To compile multiple dependent files, such as
-```C++
+```cpp
 /* myfile.cpp */
 
 #include "quest.h"
@@ -182,7 +203,7 @@ int main() {
     return 0;
 }
 ```
-```C++
+```cpp
 /* otherfile.cpp */
 
 #include <stdio.h>
@@ -191,7 +212,7 @@ void myfunc() {
     printf("hello quworld!\n");
 }
 ```
-simply separate them by `;` in `USER_SOURCE`, wrapped in `"`:
+simply separate them by `;` in `USER_SOURCE`, wrapped in quotation marks:
 ```bash
 # configure
 cmake .. -D USER_SOURCE="myfile.cpp;otherfile.cpp" -D OUTPUT_EXE=myexec
@@ -208,16 +229,20 @@ and the executable can thereafter be run (from within `build`) via
 ./myexec
 ```
 
-You can pass compiler and linker flags needed by your source files through the [`CMAKE_C_FLAGS`](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html), [`CMAKE_CXX_FLAGS`](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html) and [`CMAKE_EXE_LINKER_FLAGS`](https://cmake.org/cmake/help/latest/variable/CMAKE_EXE_LINKER_FLAGS.html) CMake flags as detailed in the [below section](#flags). Note however that if your configuration becomes complicated or your source code requires different `C`/`C++` standards than the QuEST source, you should consider separately compiling QuEST then linking it
+You can pass compiler and linker flags needed by your source files through the [`CMAKE_C_FLAGS`](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html), [`CMAKE_CXX_FLAGS`](https://cmake.org/cmake/help/latest/variable/CMAKE_LANG_FLAGS.html) and [`CMAKE_EXE_LINKER_FLAGS`](https://cmake.org/cmake/help/latest/variable/CMAKE_EXE_LINKER_FLAGS.html) CMake flags as detailed in the <a href="#compile_flags">below section</a>. Note however that if your configuration becomes complicated or your source code requires different `C`/`C++` standards than the QuEST source, you should consider separately compiling QuEST then linking it
 to your project as a library!
 
+------------------
 
 
+<!-- permit doxygen to reference section -->
+<a id="compile_configuring"></a>
 
-------------------
+## Configuring
 
 
-## Configuring
+<!-- permit doxygen to reference section -->
+<a id="compile_precision"></a>
 
 ### Precision
 
@@ -244,8 +269,10 @@ The values inform types:
 
 
 > [!NOTE]
-> When enabling [GPU-acceleration](#gpu-acceleration), the precision _must_ be set to `1` or `2` since GPUs do not support quad precision.
+> When enabling <a href="#compile_gpu-acceleration">GPU-acceleration</a>, the precision _must_ be set to `1` or `2` since GPUs do not support quad precision.
 
+<!-- permit doxygen to reference section -->
+<a id="compile_compilers"></a>
 
 ### Compilers
 
@@ -259,8 +286,13 @@ replacing `gcc` and `g++` with e.g. [`clang`](https://clang.llvm.org/), [`cl`](h
 These compilers will also be used as the _host compilers_ (around which bespoke compilers _wrap_) when enabling GPU-acceleration or distribution.
 
 > [!IMPORTANT]
-> It is _not_ correct to specify GPU and MPI compilers, like `nvcc` or `mpicc`, via the above flags. See the respective [GPU](#gpu-acceleration) and [MPI](#distribution) sections.
+> It is _not_ correct to specify GPU and MPI compilers, like `nvcc` or `mpicc`, via the above flags. See the respective <a href="#compile_gpu-acceleration">GPU</a> and <a href="#compile_distribution">MPI</a> sections.
+
+
+
 
+<!-- permit doxygen to reference section -->
+<a id="compile_flags"></a>
 
 ### Flags
 
@@ -289,6 +321,10 @@ QuEST itself accepts a variety of its preprocessors (mostly related to testing)
 
 ------------------
 
+
+<!-- permit doxygen to reference section -->
+<a id="compile_examples"></a>
+
 ## Examples
 
 To compile all of QuEST's [`examples/`](/examples/), use
@@ -303,13 +339,23 @@ The executables will be saved in the (current) `build` directory, in a sub-direc
 ```bash
 ./examples/matrices/cpp_initialisation
 ```
-as elaborated upon in [`run.md`](run.md#tests).
+as elaborated upon in [`launch.md`](launch.md#tests).
+<!-- @todo the above link fails in Doxygen; it's too stupid to recognise the section ref -->
+
 
 
 ------------------
 
+
+<!-- permit doxygen to reference section -->
+<a id="compile_tests"></a>
+
 ## Tests
 
+
+<!-- permit doxygen to reference section -->
+<a id="compile_v4"></a>
+
 ### v4
 
 To compile QuEST's latest unit and integration tests, use
@@ -321,7 +367,13 @@ cmake .. -D ENABLE_TESTING=ON
 # build
 cmake --build .
 ```
-This will compile an executable `tests` in subdirectory `build/tests/`, which can be run as explained in [`run.md`](run.md#tests).
+This will compile an executable `tests` in subdirectory `build/tests/`, which can be run as explained in [`launch.md`](launch.md#tests).
+<!-- @todo the above link fails in Doxygen; it's too stupid to recognise the section ref -->
+
+
+
+<!-- permit doxygen to reference section -->
+<a id="compile_v3"></a>
 
 ### v3
 
@@ -333,17 +385,24 @@ cmake .. -D ENABLE_TESTING=ON -D ENABLE_DEPRECATED_API=ON
 # build
 cmake --build .
 ```
-and run as explained in [`run.md`](run.md#v3).
+and run as explained in [`launch.md`](launch.md#v3).
+<!-- @todo the above link fails in Doxygen; it's too stupid to recognise the section ref -->
 
 
 
 ------------------
 
+
+<!-- permit doxygen to reference section -->
+<a id="compile_multithreading"></a>
+
 ## Multithreading
 
-Multithreading allows multiple cores of a CPU, or even multiple connected CPUs, to cooperatively perform and ergo accelerate QuEST's expensive functions. Practically all modern computers have the capacity for, and benefit from, multithreading. Note it requires that the CPUs have shared memory (such as through [NUMA](https://learn.microsoft.com/en-us/windows/win32/procthread/numa-support)) and so ergo live in the same machine. CPUs on _different_ machines, connected via a network, can be parallelised over using [distribution](#distribution).
+Multithreading allows multiple cores of a CPU, or even multiple connected CPUs, to cooperatively perform and ergo accelerate QuEST's expensive functions. Practically all modern computers have the capacity for, and benefit from, multithreading. Note it requires that the CPUs have shared memory (such as through [NUMA](https://learn.microsoft.com/en-us/windows/win32/procthread/numa-support)) and so ergo live in the same machine. CPUs on _different_ machines, connected via a network, can be parallelised over using <a href="#compile_distribution">distribution</a>.
 
 QuEST uses [OpenMP](https://www.openmp.org/) to perform multithreading, so accelerating QuEST over multiple CPUs or cores requires a compiler integrated with OpenMP. This is true of almost all major compilers - see a list of tested compilers in [`compilers.md`](compilers.md#cpu).
+<!-- @todo the above link fails in Doxygen; it's too stupid to recognise the section ref -->
+
 
 > [!IMPORTANT]  
 > Using [`Clang`](https://clang.llvm.org/) on MacOS requires use of the `libomp` library, obtainable via [Homebrew](https://brew.sh/):
@@ -365,13 +424,17 @@ cmake --build .
 ```
 This is in fact the default behaviour!
 
-The number of threads over which to parallelise QuEST's execution is chosen through setting environment variables, like [`OMP_NUM_THREADS`](https://www.openmp.org/spec-html/5.0/openmpse50.html), immediately before execution. See [`run.md`](run.md#multithreading) for a general guide on multithreaded deployment.
+The number of threads over which to parallelise QuEST's execution is chosen through setting environment variables, like [`OMP_NUM_THREADS`](https://www.openmp.org/spec-html/5.0/openmpse50.html), immediately before execution. See [`launch.md`](launch.md#multithreading) for a general guide on multithreaded deployment.
+<!-- @todo the above link fails in Doxygen; it's too stupid to recognise the section ref -->
 
 
 
 ------------------
 
 
+<!-- permit doxygen to reference section -->
+<a id="compile_gpu-acceleration"></a>
+
 ## GPU-acceleration
 
 QuEST's core functions perform simple mathematical transformations on very large arrays, and are ergo well suited to parallelisation using general purpose GPUs. This involves creating persistent memory in the GPU VRAM which mirrors the ordinary CPU memory in RAM, and dispatching the transformations to the GPU, updating the GPU memory. The greater number of cores and massive internal memory bandwidth of the GPU can make this extraordinarily faster than using multithreading. 
@@ -379,6 +442,10 @@ QuEST's core functions perform simple mathematical transformations on very large
 QuEST supports parallelisation using both NVIDIA GPUs (using CUDA) and AMD GPUs (using HIP). Using either requires obtaining a specialised compiler and passing some GPU-specific compiler flags.
 
 
+
+<!-- permit doxygen to reference section -->
+<a id="compile_nvidia"></a>
+
 ### NVIDIA
 
 > TODO!
@@ -404,8 +471,11 @@ For example, compiling for the [NVIDIA A100](https://www.nvidia.com/en-us/data-c
 cmake .. -D ENABLE_CUDA=ON -D CMAKE_CUDA_ARCHITECTURES=80
 ```
 
+
+<!-- @todo the below link fails in Doxygen; it's too stupid to recognise the section ref -->
 > [!CAUTION]
-> Setting the wrong compute capability will cause silently erroneous results. Always run the [unit tests](run.md#tests) after compiling for the first time to confirm it was set correctly.
+> Setting the wrong compute capability will cause silently erroneous results. Always run the [unit tests](launch.md#tests) after compiling for the first time to confirm it was set correctly.
+
 
 Building then proceeds as normal, e.g.
 ```bash
@@ -413,9 +483,15 @@ Building then proceeds as normal, e.g.
 cmake --build . --parallel
 ```
 
-See [`run.md`](run.md#gpu-acceleration) for information on 
+<!-- @todo the below link fails in Doxygen; it's too stupid to recognise the section ref -->
+The compiled executable can be run like any other, though the GPU behaviour can be configured beforehand with environment variables. See [`launch.md`](launch.md#gpu-acceleration) for a general guide on GPU-accelerated deployment.
+
+
 
 
+<!-- permit doxygen to reference section -->
+<a id="compile_amd"></a>
+
 ### AMD
 
 > TODO!
@@ -439,8 +515,10 @@ For example, compiling for the [AMD Instinct MI210 accelerator](https://www.amd.
 cmake .. -D ENABLE_HIP=ON -D CMAKE_HIP_ARCHITECTURES=gfx90a
 ```
 
+
+<!-- @todo the below link fails in Doxygen; it's too stupid to recognise the section ref -->
 > [!CAUTION]
-> Setting the wrong LLVM target name can cause silently erroneous results. Always run the [unit tests](run.md#tests) after compiling for the first time to confirm it was set correctly.
+> Setting the wrong LLVM target name can cause silently erroneous results. Always run the [unit tests](launch.md#tests) after compiling for the first time to confirm it was set correctly.
 
 
 Building then proceeds as normal, e.g.
@@ -449,12 +527,16 @@ Building then proceeds as normal, e.g.
 cmake --build . --parallel
 ```
 
-The compiled executable can be run like any other, though the GPU behaviour can be prior configured with environment variables. See [`run.md`](run.md#gpu-acceleration) for a general guide on GPU-accelerated deployment.
+<!-- @todo the below link fails in Doxygen; it's too stupid to recognise the section ref -->
+The compiled executable can be run like any other, though the GPU behaviour can be configured beforehand with environment variables. See [`launch.md`](launch.md#gpu-acceleration) for a general guide on GPU-accelerated deployment.
 
 
 
 ------------------
 
+<!-- permit doxygen to reference section -->
+<a id="compile_cuquantum"></a>
+
 ## cuQuantum
 
 When compiling for NVIDIA GPUs, you can choose to optionally enable [_cuQuantum_](https://docs.nvidia.com/cuda/cuquantum/latest/index.html). This will replace some of QuEST's custom GPU functions with [_cuStateVec_](https://docs.nvidia.com/cuda/cuquantum/latest/custatevec/index.html) routines which are likely to use tailored optimisations for your particular GPU and ergo run faster.
@@ -483,17 +565,24 @@ cmake .. -D ENABLE_CUDA=ON -D CMAKE_CUDA_ARCHITECTURES=80 -D ENABLE_CUQUANTUM=ON
 cmake --build . --parallel
 ```
 
-No other changes are necessary, nor does cuQuantum affect [hybridising](#multi-gpu) GPU acceleration and distribution. Launching the executable is the same as in the above section. See [`run.md`](run.md#gpu-acceleration).
+<!-- @todo the below link fails in Doxygen; it's too stupid to recognise the section ref -->
+No other changes are necessary, nor does cuQuantum affect <a href="#compile_multi-gpu">hybridising</a> GPU acceleration and distribution. Launching the executable is the same as in the above section. See [`launch.md`](launch.md#gpu-acceleration).
+
 
 
 
 ------------------
 
+
+<!-- permit doxygen to reference section -->
+<a id="compile_distribution"></a>
+
 ## Distribution
 
 Because statevectors grow exponentially with the number of simulated qubits, it is easy to run out of memory. In such settings, we may seek to use _distribution_ whereby multiple cooperating machines on a network each store a tractable partition of the state. Distribution can also be useful to speed up our simulations, when the benefit of additional parallelisation outweighs the inter-machine communication penalties.
 
 
+<!-- @todo the below link fails in Doxygen; it's too stupid to recognise the section ref -->
 Enabling distribution requires compiling QuEST with an MPI compiler, such as those listed in [`compilers.md`](compilers.md#comm). Test your compiler is working via
 ```bash
 mpicxx --version
@@ -511,12 +600,17 @@ cmake .. -D ENABLE_DISTRIBUTION=ON
 cmake --build . --parallel
 ```
 
-Note that distributed executables are launched in a distinct way to the other deployment mods, as explained in [`run.md`](run.md#distribution),
+<!-- @todo the below link fails in Doxygen; it's too stupid to recognise the section ref -->
+Note that distributed executables are launched in a distinct way to the other deployment modes, as explained in [`launch.md`](launch.md#distribution).
 
 
 
 ------------------
 
+
+<!-- permit doxygen to reference section -->
+<a id="compile_multi-gpu"></a>
+
 ## Multi-GPU
 
 > TODO!
diff --git a/docs/compilers.md b/docs/compilers.md
index f18bc83e2..6c4f44303 100644
--- a/docs/compilers.md
+++ b/docs/compilers.md
@@ -1,25 +1,34 @@
+# 🔧  Compilers
+
 <!--
   A summary of necessary compilers to use QuEST's
   various backend parallelisation deployments
+  (this comment must be under the title for valid doxygen rendering)
   
   @author Tyson Jones
 -->
 
-# Compilers
-
-QuEST separates compilation of the **_frontend_**, **_backend_** and the **_tests_**, which have progressively stricter compiler requirements.
+QuEST separates compilation of the _frontend_, _backend_ and the _tests_, which have progressively stricter compiler requirements.
 This page details the specialised compilers necessary to enable specific features hardware accelerators, and lists such compilers which are
 known to be compatible with QuEST.
 
-**TOC**:
-- [Frontend](#frontend)
-- [Backend](#backend)
-   * [comm](#comm)
-   * [cpu](#cpu)
-   * [gpu](#gpu)
-   * [comm + gpu](#comm-gpu)
-   * [gpu + cuquantum](#gpu-cuquantum)
-- [Tests](#tests)
+
+<!-- 
+    we are using explicit <a>, rather than markdown links,
+    for Doxygen compatibility. It cannot handle [](#sec)
+    links, and its <a> anchors are not scoped to files, so
+    we here prefix each name with the filename. Grr!
+-->
+
+> **TOC**:
+> - <a href="#compilers_frontend">Frontend</a>
+> - <a href="#compilers_backend">Backend</a>
+>    * <a href="#compilers_comm">comm</a>
+>    * <a href="#compilers_cpu">cpu</a>
+>    * <a href="#compilers_gpu">gpu</a>
+>    * <a href="#compilers_comm-gpu">comm + gpu</a>
+>    * <a href="#compilers_gpu-cuquantum">gpu + cuquantum</a>
+> - <a href="#compilers_tests">Tests</a>
 
 > **See also**:
 > - [`compile.md`](compile.md) for a guide to compiling QuEST.
@@ -34,6 +43,9 @@ known to be compatible with QuEST.
 ---------------
 
 
+<!-- permit doxygen to reference section -->
+<a id="compilers_frontend"></a>
+
 ## Frontend
 
 [![Languages](https://img.shields.io/badge/C-11-ff69b4.svg)](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3631.pdf)
@@ -49,6 +61,10 @@ User code can be written in either `C11` or `C++14`, and has so far been tested
 
 ---------------
 
+
+<!-- permit doxygen to reference section -->
+<a id="compilers_backend"></a>
+
 ## Backend
 
 [![Languages](https://img.shields.io/badge/C++-17-ff69b4.svg)](https://en.cppreference.com/w/cpp/17)
@@ -56,6 +72,9 @@ User code can be written in either `C11` or `C++14`, and has so far been tested
 The backend is divided into subdirectories [`api/`](/quest/src/api), [`core/`](/quest/src/core), [`comm/`](/quest/src/comm),  [`cpu/`](/quest/src/cpu) and [`gpu/`](/quest/src/gpu). All can be compiled with a generic `C++17` compiler, but enabling distribution, multithreading and GPU-acceleration requires using specialised compilers for the latter three. Each can be toggled and compiled independently. Note however that tightly-coupled multi-GPU simulations (`comm` + `gpu`) can be accelerated using bespoke compilers, and use of [cuQuantum](https://developer.nvidia.com/cuquantum-sdk) requires modern compilers (`gpu + cuquantum`), detailed below.
 
 
+<!-- permit doxygen to reference section -->
+<a id="compilers_comm"></a>
+
 ### comm
 
 Enabling distribution requires compiling `comm/` with an [MPI](https://en.wikipedia.org/wiki/Message_Passing_Interface)-compatible compiler, which has so far been tested with
@@ -67,6 +86,10 @@ Enabling distribution requires compiling `comm/` with an [MPI](https://en.wikipe
 
 when wrapping all previously mentioned compilers.
 
+
+<!-- permit doxygen to reference section -->
+<a id="compilers_cpu"></a>
+
 ### cpu
 
 Enabling multithreading requires compiling `cpu/` with an [OpenMP](https://www.openmp.org/)-compatible compiler. Versions
@@ -76,18 +99,29 @@ Enabling multithreading requires compiling `cpu/` with an [OpenMP](https://www.o
 have been explicitly tested, as used by the aforementioned compilers.
 
 
+<!-- permit doxygen to reference section -->
+<a id="compilers_gpu"></a>
+
 ### gpu
 
 Enabling acceleration on NVIDIA or AMD GPUs requires compiling `gpu/` with a [CUDA](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/) or [ROCm](https://rocm.docs.amd.com/en/docs-6.0.2/) compiler respectively. These must be compatible with [Thrust](https://developer.nvidia.com/thrust) and [rocThrust](https://github.com/ROCm/rocThrust) respectively. QuEST v4 has been so far tested with
 - [cuda](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html) 11
 - [cuda](https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html) 12
 
+
+<!-- permit doxygen to reference section -->
+<a id="compilers_comm-gpu"></a>
+
 ### comm + gpu
 
 Simultaneously enabling both distribution _and_ GPU-acceleration is possible with use of the separate compilers above. However, simulation can be accelerated by using a [CUDA-aware MPI](https://developer.nvidia.com/blog/introduction-cuda-aware-mpi/) compiler, enabling QuEST to use [GPUDirect](https://developer.nvidia.com/gpudirect) and avoid superfluous exchanges of CPU and GPU memories. So far, QuEST has been tested with:
 - [UCX](https://openucx.org/) 1.13
 - [UCX](https://openucx.org/) 1.15
 
+
+<!-- permit doxygen to reference section -->
+<a id="compilers_gpu-cuquantum"></a>
+
 ### gpu + cuquantum
 
 Enabling [cuQuantum](https://developer.nvidia.com/cuquantum-sdk) on NVIDIA GPUs with [compute-capability](https://developer.nvidia.com/cuda-gpus) >= `7.0` requires use of a modern CUDA compiler, specifically
@@ -97,6 +131,10 @@ Enabling [cuQuantum](https://developer.nvidia.com/cuquantum-sdk) on NVIDIA GPUs
 
 ---------------
 
+
+<!-- permit doxygen to reference section -->
+<a id="compilers_tests"></a>
+
 ## Tests
 
 [![Languages](https://img.shields.io/badge/C++-20-ff69b4.svg)](https://en.cppreference.com/w/cpp/20)
diff --git a/docs/contributing.md b/docs/contributing.md
new file mode 100644
index 000000000..cdeb61d67
--- /dev/null
+++ b/docs/contributing.md
@@ -0,0 +1,15 @@
+# ❤️  Contributing
+
+<!--
+  How to contribute
+  (this comment must be under the title for valid doxygen rendering)
+  
+  @author Tyson Jones
+-->
+
+> [!IMPORTANT]  
+> This page is under construction!
+
+<!--- @todo -->
+
+In the meantime, feel free to open an issue, a discussion or a pull request, or reach out to `tyson.jones.input@gmail.com`.
diff --git a/docs/run.md b/docs/launch.md
similarity index 79%
rename from docs/run.md
rename to docs/launch.md
index 9b1fd2239..447e0ff01 100644
--- a/docs/run.md
+++ b/docs/launch.md
@@ -1,35 +1,47 @@
+# 🚀  Launching
+
 <!--
   Instructions for running QuEST with different parallelisations
+  (this comment must be under the title for valid doxygen rendering)
 
   @author Tyson Jones
 -->
 
-# Run
-
-Running your [compiled](compile.md) QuEST application can be as straightforward as running any other executable, though some additional steps are needed to make use of hardware acceleration. This page how to launch your own QuEST applications on different platforms, how to run the examples and unit tests, how to make use of multithreading, GPU-acceleration, distribution and supercomputer job schedulers, and monitor the hardware utilisation.
-
-**TOC**:
-- [Examples](#examples)
-- [Tests](#tests)
-   * [v4](#v4)
-   * [v3](#v3)
-- [Multithreading](#multithreading)
-   * [Choosing threads](#choosing-threads)
-   * [Monitoring utilisation](#monitoring-utilisation)
-   * [Improving performance](#improving-performance)
-- [GPU-acceleration](#gpu-acceleration)
-   * [Launching](#launching)
-   * [Monitoring](#monitoring)
-   * [Configuring](#configuring)
-   * [Benchmarking](#benchmarking)
-- [Distribution](#distribution)
-   * [Launching](#launching-1)
-   * [Configuring](#configuring-1)
-   * [Benchmarking](#benchmarking-1)
-- [Multi-GPU](#multi-gpu)
-- [Supercomputers](#supercomputers)
-   * [SLURM](#slurm)
-   * [PBS](#pbs)
+Launching your [compiled](compile.md) QuEST application can be as straightforward as running any other executable, though some additional steps are needed to make use of hardware acceleration. This page explains how to launch your own QuEST applications on different platforms, how to run the examples and unit tests, how to make use of multithreading, GPU-acceleration, distribution and supercomputer job schedulers, and how to monitor hardware utilisation.
+
+
+<!-- 
+    we are using explicit <a>, rather than markdown links,
+    for Doxygen compatibility. It cannot handle [](#sec)
+    links, and its <a> anchors are not scoped to files, so
+    we here prefix each name with the filename. Grr!
+-->
+
+> **TOC**:
+> - <a href="#launch_examples">Examples</a>
+> - <a href="#launch_tests">Tests</a>
+>    * <a href="#launch_v4">v4</a>
+>    * <a href="#launch_v3">v3</a>
+> - <a href="#launch_multithreading">Multithreading</a>
+>    * <a href="#launch_choosing-threads">Choosing threads</a>
+>    * <a href="#launch_monitoring-utilisation">Monitoring utilisation</a>
+>    * <a href="#launch_improving-performance">Improving performance</a>
+> - <a href="#launch_gpu-acceleration">GPU-acceleration</a>
+>    * <a href="#launch_launching">Launching</a>
+>    * <a href="#launch_monitoring">Monitoring</a>
+>    * <a href="#launch_configuring">Configuring</a>
+>    * <a href="#launch_benchmarking">Benchmarking</a>
+> - <a href="#launch_distribution">Distribution</a>
+>    * <a href="#launch_launching-1">Launching</a>
+>    * <a href="#launch_configuring-1">Configuring</a>
+>    * <a href="#launch_benchmarking-1">Benchmarking</a>
+> - <a href="#launch_multi-gpu">Multi-GPU</a>
+> - <a href="#launch_supercomputers">Supercomputers</a>
+>    * <a href="#launch_slurm">SLURM</a>
+>    * <a href="#launch_pbs">PBS</a>
+
+
+
 
 > [!NOTE]
 > This page assumes you are working in a `build` directory into which all executables have been compiled.
@@ -39,9 +51,13 @@ Running your [compiled](compile.md) QuEST application can be as straightforward
 ---------------------
 
 
+<!-- permit doxygen to reference section -->
+<a id="launch_examples"></a>
+
 ## Examples
 
 > See [`compile.md`](compile.md#examples) for instructions on compiling the examples.
+<!-- @todo the above link fails in Doxygen; it's too stupid to recognise the section ref -->
 
 The example source codes are located in [`examples/`](/examples/) and are divided into subdirectories, e.g.
 ```
@@ -92,9 +108,17 @@ Must pass single cmd-line argument:
 
 ---------------------
 
+
+<!-- permit doxygen to reference section -->
+<a id="launch_tests"></a>
+
 ## Tests
 
 > See [`compile.md`](compile.md#tests) for instructions on compiling the `v4` and `v3` unit tests.
+<!-- @todo the above link fails in Doxygen; it's too stupid to recognise the section ref -->
+
+<!-- permit doxygen to reference section -->
+<a id="launch_v4"></a>
 
 ### v4
 
@@ -135,6 +159,7 @@ or specific test sections and subsections:
 ./tests/tests -c "validation" -c "matrix uninitialised"
 ```
 
+<!-- @todo the below link fails in Doxygen; it's too stupid to recognise the section ref -->
 If the tests were compiled with [distribution enabled](compile.md#distribution), they can be distributed via
 ```bash
 mpirun -np 8 ./tests/tests
@@ -168,8 +193,13 @@ Test project /build
 Alas tests launched in this way cannot be deployed with distribution.
 
 
+
+<!-- permit doxygen to reference section -->
+<a id="launch_v3"></a>
+
 ### v3
 
+<!-- @todo the below link fails in Doxygen; it's too stupid to recognise the section ref -->
 The deprecated tests, when [compiled](compile.md#v3), can be run from the `build` directory via
 ```bash
 ./tests/deprecated/dep_tests
@@ -193,11 +223,20 @@ ctest
 
 ---------------------
 
+
+<!-- permit doxygen to reference section -->
+<a id="launch_multithreading"></a>
+
 ## Multithreading
 
 > [!NOTE]
 > Parallelising QuEST over multiple cores and CPUs requires first compiling with 
 > multithreading enabled, as detailed in [`compile.md`](compile.md#multithreading). 
+<!-- @todo the above link fails in Doxygen; it's too stupid to recognise the section ref -->
+
+
+<!-- permit doxygen to reference section -->
+<a id="launch_choosing-threads"></a>
 
 ### Choosing threads
 
@@ -221,9 +260,14 @@ It is prudent to choose as many threads as your CPU(s) have total hardware threa
 <!-- the doxygen-doc hyperlink above includes a hash of the function name which should be unchanging! -->
 
 > [!NOTE]
-> When running [distributed](#distribution), variable `OMP_NUM_THREADS` specifies the number of threads _per node_ and so should ordinarily be the number of hardware threads (or cores) **_per machine_**.
+> When running <a href="#launch_distribution">distributed</a>, variable `OMP_NUM_THREADS` specifies the number of threads _per node_ and so should ordinarily be the number of hardware threads (or cores) _per machine_.
+
+
 
 
+<!-- permit doxygen to reference section -->
+<a id="launch_monitoring-utilisation"></a>
+
 ### Monitoring utilisation
 
 
@@ -252,6 +296,10 @@ Note however that QuEST will not leverage multithreading at runtime when either:
 Usage of multithreading can be (inadvisably) forced using [`createForcedQureg()`](https://quest-kit.github.io/QuEST/group__qureg__create.html#ga619bbba1cbc2f7f9bbf3d3b86b3f02be) or [`createCustomQureg()`](https://quest-kit.github.io/QuEST/group__qureg__create.html#ga849971f43e246d103da1731d0901f2e6).
 
 
+
+<!-- permit doxygen to reference section -->
+<a id="launch_improving-performance"></a>
+
 ### Improving performance
 
 Performance may be improved by setting other [OpenMP variables](https://www.openmp.org/spec-html/5.0/openmpch6.html). Keep in mind that for large `Qureg`, QuEST's runtime is dominated by the costs of modifying large memory structures during long, uninterrupted loops: namely the updating of statevector amplitudes. Some sensible settings include
@@ -264,11 +312,11 @@ Performance may be improved by setting other [OpenMP variables](https://www.open
 
 
 OpenMP experts may further benefit from knowing that QuEST's multithreaded source code, confined to [`cpu_subroutines.cpp`](/quest/src/cpu/cpu_subroutines.cpp), is almost exclusively code similar to
-```C++
+```cpp
 #pragma omp parallel for if(qureg.isMultithreaded)
 for (qindex n=0; n<numIts; n++)
 ```
-```C++
+```cpp
 #pragma omp parallel for reduction(+:val)
 for (qindex n=0; n<numIts; n++)
     val += 
@@ -282,17 +330,25 @@ and never specifies [`schedule`](https://rookiehpc.org/openmp/docs/schedule/inde
 
 
 > [!TIP]
-> Sometimes the memory bandwidth between different sockets of a machine is poor, and it is substantially better to exchange memory in bulk between their NUMA nodes, rather than through repeated random access. In such settings, it can be worthwhile to hybridise multithreading and distribution, even upon a single machine, partitioning same-socket threads into their own MPI node. This forces inter-socket communication to happen in-batch, via message-passing, at the expense of using _double_ total memory (to store buffers). See the [distributed](#distribution) section.
+> Sometimes the memory bandwidth between different sockets of a machine is poor, and it is substantially better to exchange memory in bulk between their NUMA nodes, rather than through repeated random access. In such settings, it can be worthwhile to hybridise multithreading and distribution, even upon a single machine, partitioning same-socket threads into their own MPI node. This forces inter-socket communication to happen in-batch, via message-passing, at the expense of using _double_ total memory (to store buffers). See the <a href="#launch_distribution">distributed</a> section.
 
 
 
 ---------------------
 
 
+<!-- permit doxygen to reference section -->
+<a id="launch_gpu-acceleration"></a>
+
 ## GPU-acceleration
 
 > [!NOTE]
 > Using GPU-acceleration requires first compiling QuEST with `CUDA` or `HIP` enabled (to utilise NVIDIA and AMD GPUs respectively) as detailed in [`compile.md`](compile.md#gpu-acceleration).
+<!-- @todo the above link fails in Doxygen; it's too stupid to recognise the section ref -->
+
+
+<!-- permit doxygen to reference section -->
+<a id="launch_launching"></a>
 
 ### Launching
 
@@ -301,9 +357,14 @@ The compiled executable is launched like any other, via
 ./myexec
 ```
 
-Using _multiple_ available GPUs, regardless of whether they are local or distributed, is done through additionally enabling [distribution](#multi-gpu).
+Using _multiple_ available GPUs, regardless of whether they are local or distributed, is done through additionally enabling <a href="#launch_multi-gpu">distribution</a>.
 
 
+
+
+<!-- permit doxygen to reference section -->
+<a id="launch_monitoring"></a>
+
 ### Monitoring
 
 
@@ -344,6 +405,10 @@ Note however that GPU-acceleration might not be leveraged at runtime when either
 Usage of GPU-acceleration can be (inadvisably) forced using [`createForcedQureg()`](https://quest-kit.github.io/QuEST/group__qureg__create.html#ga619bbba1cbc2f7f9bbf3d3b86b3f02be) or [`createCustomQureg()`](https://quest-kit.github.io/QuEST/group__qureg__create.html#ga849971f43e246d103da1731d0901f2e6).
 
 
+
+<!-- permit doxygen to reference section -->
+<a id="launch_configuring"></a>
+
 ### Configuring
 
 There are a plethora of [environment variables](https://askubuntu.com/questions/58814/how-do-i-add-environment-variables) which can be used to control the execution on [NVIDIA](https://docs.nvidia.com/cuda/cuda-c-programming-guide/#env-vars) and [AMD](https://rocm.docs.amd.com/projects/HIP/en/docs-develop/reference/env_variables.html) GPUs. We highlight only some below.
@@ -355,6 +420,9 @@ There are a plethora of [environment variables](https://askubuntu.com/questions/
 
 
 
+<!-- permit doxygen to reference section -->
+<a id="launch_benchmarking"></a>
+
 ### Benchmarking
 
 Beware that the CPU dispatches tasks to the GPU _asynchronously_. Control flow returns immediately to the CPU, which will proceed to other duties (like dispatching the next several quantum operations' worth of instructions to the GPU) while the GPU undergoes independent computation (goes _brrrrr_).
@@ -366,16 +434,25 @@ However, it _does_ mean codes which seeks to benchmark QuEST must be careful to
 ---------------------
 
 
+
+<!-- permit doxygen to reference section -->
+<a id="launch_distribution"></a>
+
 ## Distribution
 
 
 > [!NOTE]
 > Distributing QuEST over multiple machines requires first compiling with 
 > distribution enabled, as detailed in [`compile.md`](compile.md#distribution). 
+<!-- @todo the above link fails in Doxygen; it's too stupid to recognise the section ref -->
 
 > [!IMPORTANT]
-> Simultaneously using distribution _and_ GPU-acceleration introduces additional considerations detailed in the [proceeding section](#multi-gpu).
+> Simultaneously using distribution _and_ GPU-acceleration introduces additional considerations detailed in the <a href="#launch_multi-gpu">following section</a>.
+
+
 
+<!-- permit doxygen to reference section -->
+<a id="launch_launching-1"></a>
 
 ### Launching
 
@@ -388,17 +465,18 @@ or on some platforms (such as with Intel and Microsoft MPI):
 mpiexec -n 32 myexec.exe
 ```
 
-Some supercomputing facilities however may require custom or additional commands, like [SLURM](https://slurm.schedmd.com/documentation.html)'s [`srun`](https://slurm.schedmd.com/srun.html) command. See an excellent guide [here](https://docs.lumi-supercomputer.eu/runjobs/scheduled-jobs/distribution-binding/#distribution), and the job submission guide [below](#supercomputers).
+Some supercomputing facilities however may require custom or additional commands, like [SLURM](https://slurm.schedmd.com/documentation.html)'s [`srun`](https://slurm.schedmd.com/srun.html) command. See an excellent guide [here](https://docs.lumi-supercomputer.eu/runjobs/scheduled-jobs/distribution-binding/#distribution), and the job submission guide <a href="#launch_supercomputers">below</a>.
 ```bash
 srun --nodes=8 --ntasks-per-node=4 --distribution=block:block
 ```
 
 
+
 > [!IMPORTANT]
 > QuEST can only be distributed with a _power of `2`_ number of nodes, i.e. `1`, `2`, `4`, `8`, `16`, ...
 
 > [!NOTE]
-> When [multithreading](#multithreading) is also enabled, the environment variable `OMP_NUM_THREADS` 
+> When <a href="#launch_multithreading">multithreading</a> is also enabled, the environment variable `OMP_NUM_THREADS` 
 > will determine how many threads are used by _each node_ (i.e. each MPI process). Ergo optimally
 > deploying to `8` machines, each with `64` CPUs (a total of `512` CPUs), might resemble:
 > ```bash
@@ -406,12 +484,16 @@ srun --nodes=8 --ntasks-per-node=4 --distribution=block:block
 > ```
 
 
+
 It is sometimes convenient (mostly for testing) to deploy QuEST across more nodes than there are available machines and sockets, inducing a gratuitous slowdown. Some MPI compilers like [OpenMPI](https://www.open-mpi.org/) forbid this by default, requiring additional commands to permit [oversubscription](https://docs.open-mpi.org/en/main/launching-apps/scheduling.html).
 ```bash
 mpirun -np 1024 --oversubscribe ./mytests
 ```
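The two launch constraints above (a power-of-`2` number of nodes, and `OMP_NUM_THREADS` counting threads _per process_) can be checked before submitting a job. Below is a minimal sketch of that arithmetic; the helper names `isPowerOfTwo` and `threadsPerProcess` are hypothetical and are not part of QuEST or MPI.

```cpp
#include <cassert>

// True when n is a power of two (1, 2, 4, 8, ...), which is the only
// kind of node count the note above says QuEST accepts.
bool isPowerOfTwo(long n) {
    return n > 0 && (n & (n - 1)) == 0;
}

// Hypothetical helper: the number of threads to give each MPI process so
// that every core of a machine is used by exactly one thread. Assumes the
// processes on a machine share its cores evenly.
int threadsPerProcess(int coresPerMachine, int processesPerMachine) {
    assert(processesPerMachine > 0);
    assert(coresPerMachine % processesPerMachine == 0);
    return coresPerMachine / processesPerMachine;
}
```

For instance, `isPowerOfTwo(12)` is false, so a 12-node launch would be rejected, while one process on each of `8` machines with `64` cores gives `threadsPerProcess(64, 1) == 64` threads per process.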
 
 
+<!-- permit doxygen to reference section -->
+<a id="launch_configuring-1"></a>
+
 ### Configuring
 
 
@@ -419,6 +501,9 @@ mpirun -np 1024 --oversubscribe ./mytests
 > - detail environment variables
 
 
+<!-- permit doxygen to reference section -->
+<a id="launch_benchmarking-1"></a>
+
 ### Benchmarking
 
 QuEST strives to reduce inter-node communication when performing distributed simulation, which can otherwise dominate runtime. Between these rare communications, nodes work in complete independence and are likely to desynchronise, especially when performing operations with non-uniform loads. In fact, many-controlled quantum gates are skipped by non-participating nodes which would otherwise wait idly!
@@ -433,6 +518,9 @@ It is ergo always prudent to explicitly call [`syncQuESTEnv()`](https://quest-ki
 ---------------------
 
 
+<!-- permit doxygen to reference section -->
+<a id="launch_multi-gpu"></a>
+
 ## Multi-GPU
 
 
@@ -445,8 +533,24 @@ It is ergo always prudent to explicitly call [`syncQuESTEnv()`](https://quest-ki
 > - detail controlling local vs distributed gpus with device visibility
 
 
+
+> helpful ARCHER2 snippet:
+> ```bash
+> # Compute the raw process ID for binding to GPU and NIC
+> lrank=$((SLURM_PROCID % SLURM_NTASKS_PER_NODE))
+> 
+> # Bind the process to the correct GPU and NIC
+> export CUDA_VISIBLE_DEVICES=${lrank}
+> export UCX_NET_DEVICES=mlx5_${lrank}:1
+> ```
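The modulo in the snippet above maps each global process index to an index local to its machine, which then selects a GPU and NIC. A small sketch of the same arithmetic follows; the function name `localRank` is hypothetical, and it assumes the scheduler assigns consecutive ranks to the same machine (block placement).

```cpp
#include <cassert>

// Index of a process among those sharing its machine, given its global
// rank and the number of processes launched per machine. Only meaningful
// when consecutive ranks are placed on the same machine (an assumption).
int localRank(int globalRank, int tasksPerNode) {
    assert(globalRank >= 0 && tasksPerNode > 0);
    return globalRank % tasksPerNode;
}
```

With `4` tasks per machine, global ranks `4` to `7` live on the second machine and receive local ranks `0` to `3`, so each binds to a distinct device there.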
+
+
 ---------------------
 
+
+<!-- permit doxygen to reference section -->
+<a id="launch_supercomputers"></a>
+
 ## Supercomputers
 
 A QuEST executable is launched like any other in supercomputing settings, including when distributed.
@@ -457,6 +561,11 @@ For convenience however, we offer some example [SLURM](https://slurm.schedmd.com
 > These submission scripts are only illustrative. It is likely the necessary configuration and commands on
 > your own supercomputing facility differ!
 
+
+
+<!-- permit doxygen to reference section -->
+<a id="launch_slurm"></a>
+
 ### SLURM
 
 4 machines each with 8 CPUs:
@@ -489,6 +598,9 @@ srun ./myexec
 
 
 
+<!-- permit doxygen to reference section -->
+<a id="launch_pbs"></a>
+
 ### PBS
 
 4 machines each with 8 CPUs:
diff --git a/docs/styleguide.md b/docs/styleguide.md
index 6e8f435ba..bb48822ea 100644
--- a/docs/styleguide.md
+++ b/docs/styleguide.md
@@ -1,38 +1,42 @@
+# 🎨  Style guide
+
 <!--
   A style guide for QuEST contributors
+  (this comment must be under the title for valid doxygen rendering)
   
   @author Tyson Jones
 -->
 
 
+
 Don't agonise about style - write your code as you see fit and we can address major issues in review/PR.
-Some encouraged convetions include:
+Some encouraged conventions include:
 
 - use `camelCase` for everything except:
   - constants which use `CAPITALS_AND_UNDERSCORES`
   - related function prefixes, like `prefix_someFunction()`
 - favour clarity over concision, for example
-  ```C++
+  ```cpp
   qcomp elem = state[ind][ind];
   qreal prob = std::real(elem);
   return prob;
   ```
   over
-  ```C++
+  ```cpp
   return std::real(state[ind][ind]);
   ```
 - never ever do:
-  ```C++
+  ```cpp
   using namespace std;
   ```
   but _do_ shorten common containers like `vector`:
-  ```C++
+  ```cpp
   using std::vector;
 
   vector<int> mylist;
   ```
 - whitespace is free; use it wherever it can improve clarity, like to separate subroutines.
-  ```C++
+  ```cpp
   // i000 = nth local index where all suffix bits are 0
   qindex i000 = insertThreeZeroBits(n, braQb1, ketQb2, ketQb1);
   qindex i0b0 = setBit(i000, ketQb2, braBit2);
@@ -50,7 +54,7 @@ Some encouraged convetions include:
   ```
 - use `auto` where it improves readability, discretionarily. Obviously it is better than massive, unimportant types of objects or heavily templated collections, but sometimes knowing the precise type of a primitive is helpful
 - It is permissible to avoid superfluous braces around single-line branches:
-  ```C++
+  ```cpp
   if (cond)
       return x;
   ```
diff --git a/docs/tutorial.md b/docs/tutorial.md
new file mode 100644
index 000000000..68b3fbf1b
--- /dev/null
+++ b/docs/tutorial.md
@@ -0,0 +1,868 @@
+# 🎓  Tutorial
+
+<!--
+  Tutorial
+  (this comment must be under the title for valid doxygen rendering)
+  
+  @author Tyson Jones
+-->
+
+QuEST is included into a `C` or `C++` project via
+```cpp
+#include "quest.h"
+```
+
+<!-- @todo the below link fails in Doxygen; it's too stupid to recognise the section ref -->
+> [!TIP]
+> Some of QuEST's deprecated `v3` API can be accessed by specifying `ENABLE_DEPRECATED_API` when [compiling](/docs/compile.md#v3), or defining it before import, i.e. 
+> ```cpp
+> #define ENABLE_DEPRECATED_API 1
+> #include "quest.h"
+> ```
+> We recommend migrating to the latest `v4` API, however, as showcased below.
+
+Simulation typically proceeds as:
+1. [Initialise](https://quest-kit.github.io/QuEST/group__environment.html#gab89cfc1bf94265f4503d504b02cf54d4) the QuEST [environment](https://quest-kit.github.io/QuEST/group__environment.html), preparing available GPUs and networks.
+2. [Configure](https://quest-kit.github.io/QuEST/group__debug.html) the environment, such as through [seeding](https://quest-kit.github.io/QuEST/group__debug__seed.html).
+3. [Create](https://quest-kit.github.io/QuEST/group__qureg__create.html) a [`Qureg`](https://quest-kit.github.io/QuEST/structQureg.html), allocating memory for its amplitudes.
+4. Prepare its [initial state](https://quest-kit.github.io/QuEST/group__initialisations.html), overwriting its amplitudes.
+5. Apply [operators](https://quest-kit.github.io/QuEST/group__operations.html) and [decoherence](https://quest-kit.github.io/QuEST/group__decoherence.html), expressed as [matrices](https://quest-kit.github.io/QuEST/group__matrices.html) and [channels](https://quest-kit.github.io/QuEST/group__channels.html).
+6. Perform [calculations](https://quest-kit.github.io/QuEST/group__calculations.html), potentially using [Pauli](https://quest-kit.github.io/QuEST/group__paulis.html) observables.
+7. [Report](https://quest-kit.github.io/QuEST/group__types.html) or log the results to file.
+8. Destroy any heap-allocated [`Qureg`](https://quest-kit.github.io/QuEST/group__qureg__destroy.html) or [matrices](https://quest-kit.github.io/QuEST/group__matrices__destroy.html).
+9. [Finalise](https://quest-kit.github.io/QuEST/group__environment.html#ga428faad4d68abab20f662273fff27e39) the QuEST environment.
+
+Of course, the procedure is limited only by the programmer's imagination `¯\_(ツ)_/¯`. Let's see an example of these steps below.
+
+
+<!-- 
+    we are using explicit <a>, rather than markdown links,
+    for Doxygen compatibility. It cannot handle [](#sec)
+    links, and its <a> anchors are not scoped to files, so
+    we here prefix each name with the filename. Grr!
+-->
+
+> **TOC**:
+> - <a href="#tutorial_initialise-the-environment">Initialise the environment</a>
+> - <a href="#tutorial_configure-the-environment">Configure the environment</a>
+> - <a href="#tutorial_create-a-qureg">Create a `Qureg`</a>
+> - <a href="#tutorial_prepare-an-initial-state">Prepare an initial state</a>
+> - <a href="#tutorial_apply-operators">Apply operators</a>
+>   * <a href="#tutorial_controls">controls</a>
+>   * <a href="#tutorial_paulis">paulis</a>
+>   * <a href="#tutorial_matrices">matrices</a>
+>   * <a href="#tutorial_circuits">circuits</a>
+>   * <a href="#tutorial_measurements">measurements</a>
+>   * <a href="#tutorial_decoherence">decoherence</a>
+> - <a href="#tutorial_perform-calculations">Perform calculations</a>
+> - <a href="#tutorial_report-the-results">Report the results</a>
+> - <a href="#tutorial_cleanup">Cleanup</a>
+> - <a href="#tutorial_finalise-quest">Finalise QuEST</a>
+
+
+
+--------------------------------------------
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_initialise-the-environment"></a>
+
+## Initialise the environment
+
+
+Before calling any other QuEST functions, we must [_initialise_](https://quest-kit.github.io/QuEST/group__environment.html#gab89cfc1bf94265f4503d504b02cf54d4) the QuEST [_environment_](https://quest-kit.github.io/QuEST/group__environment.html).
+```cpp
+initQuESTEnv();
+```
+This does several things, such as
+- assessing which hardware accelerations (multithreading, GPU-acceleration, distribution, cuQuantum) were compiled and are currently available to use.
+- initialising any external libraries as needed, like MPI, CUDA and cuQuantum.
+- seeding the random number generators (informing measurements and random states), using a [CSPRNG](https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator) if available.
+
+We could instead forcefully [disable](https://quest-kit.github.io/QuEST/group__environment.html#ga485268e52f838743357e7a4c8c241e57) certain hardware accelerations
+```cpp
+int useMPI = 0;
+int useGPU = 0;
+int useOMP = 0;
+initCustomQuESTEnv(useMPI, useGPU, useOMP);
+```
+
+> [!TIP]
+> We recommend enabling _all_ deployments, as automated by `initQuESTEnv()`, which
+> permits QuEST to choose how to best accelerate subsequently created `Qureg`.
+
+We can [view](https://quest-kit.github.io/QuEST/group__environment.html#ga08bf98478c4bf21b0759fa7cd4a97496) the environment configuration at runtime, via
+```cpp
+reportQuESTEnv();
+```
+which might output something like
+```
+QuEST execution environment:
+  [precision]
+    qreal.................double (8 bytes)
+    qcomp.................std::__1::complex<double> (16 bytes)
+    qindex................long long int (8 bytes)
+    validationEpsilon.....1e-12
+  [compilation]
+    isMpiCompiled...........1
+    isGpuCompiled...........1
+    isOmpCompiled...........1
+    isCuQuantumCompiled.....0
+  [deployment]
+    isMpiEnabled.....0
+    isGpuEnabled.....1
+    isOmpEnabled.....1
+  [cpu]
+    numCpuCores.......10 per machine
+    numOmpProcs.......10 per machine
+    numOmpThrds.......8 per node
+    cpuMemory.........32 GiB per node
+    cpuMemoryFree.....7.1 GiB per node
+  [gpu]
+    numGpus...........1
+    gpuDirect.........1
+    gpuMemPools.......1
+    gpuMemory.........15.9 GiB per node
+    gpuMemoryFree.....15.2 GiB per node
+    gpuCache..........1 GiB
+  [distribution]
+    isMpiGpuAware.....0
+    numMpiNodes.......8
+  [statevector limits]
+    minQubitsForMpi.............3
+    maxQubitsForCpu.............30
+    maxQubitsForGpu.............29
+    maxQubitsForMpiCpu..........35
+    maxQubitsForMpiGpu..........34
+    maxQubitsForMemOverflow.....59
+    maxQubitsForIndOverflow.....63
+  [density matrix limits]
+    minQubitsForMpi.............2
+    maxQubitsForCpu.............15
+    maxQubitsForGpu.............14
+    maxQubitsForMpiCpu..........17
+    maxQubitsForMpiGpu..........16
+    maxQubitsForMemOverflow.....29
+    maxQubitsForIndOverflow.....31
+  [statevector autodeployment]
+    8 qubits.....[omp]
+    12 qubits....[gpu]
+    29 qubits....[gpu] [mpi]
+  [density matrix autodeployment]
+    4 qubits.....[omp]
+    6 qubits.....[gpu]
+    15 qubits....[gpu] [mpi]
+```
+
+We can also [obtain](https://quest-kit.github.io/QuEST/group__environment.html#ga6b9e84b462a999a1fbb9a372f990c491) some of the environment information [programmatically](https://quest-kit.github.io/QuEST/structQuESTEnv.html)
+```cpp
+QuESTEnv env = getQuESTEnv();
+
+if (env.isGpuAccelerated)
+    printf("vroom vroom");
+```
+
+
+
+--------------------------------------------
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_configure-the-environment"></a>
+
+## Configure the environment
+
+
+Configuring the environment is ordinarily not necessary, but convenient in certain applications.
+
+For example, we may wish our simulations to deterministically obtain the same measurement outcomes and random states as a previous or future run, and ergo choose to [override](https://quest-kit.github.io/QuEST/group__debug__seed.html#ga9e3a6de413901afbf50690573add1587) the default seeds.
+```cpp
+unsigned seeds[] = {123u, 1u << 10};
+setSeeds(seeds, 2);
+```
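To see why fixing the seeds makes runs repeatable, here is a standalone analogy using the C++ standard library. It is only an illustration: the generator, the `drawOutcomes` helper and the way seeds are combined are assumptions and need not match QuEST's internals.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Draw `count` pseudo-random bits (stand-ins for measurement outcomes)
// from a generator initialised with the given list of seeds.
std::vector<int> drawOutcomes(const std::vector<unsigned>& seeds, int count) {
    assert(count >= 0);
    std::seed_seq seq(seeds.begin(), seeds.end());
    std::mt19937_64 gen(seq);

    std::vector<int> outcomes;
    for (int i = 0; i < count; i++)
        outcomes.push_back(static_cast<int>(gen() & 1u));
    return outcomes;
}
```

Calling `drawOutcomes({123u, 1u << 10}, 16)` twice returns identical sequences, which is the property that overriding the default seeds is meant to provide.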
+
+We may wish further to [adjust](https://quest-kit.github.io/QuEST/group__debug__reporting.html) how subsequent functions will display information to the screen
+```cpp
+int maxRows = 8;
+int maxCols = 4;
+setMaxNumReportedItems(maxRows, maxCols);
+setMaxNumReportedSigFigs(3);
+```
+or [add](https://quest-kit.github.io/QuEST/group__debug__reporting.html#ga29413703d609254244d6b13c663e6e06) extra spacing between QuEST's printed outputs
+```cpp
+setNumReportedNewlines(3);
+```
+
+Perhaps we also wish to relax the [precision](https://quest-kit.github.io/QuEST/group__debug__validation.html#gae395568df6def76045ec1881fcb4e6d1) with which our future inputs will be asserted unitary or Hermitian
+```cpp
+setValidationEpsilon(0.001);
+```
+but when unitarity _is_ violated, or we otherwise pass an invalid input, we wish to execute a [custom function](https://quest-kit.github.io/QuEST/group__debug__validation.html#ga14b6e7ce08465e36750da3acbc41062f) before exiting.
+```cpp
+#include <stdio.h>
+#include <stdlib.h>
+
+void myErrorHandler(const char *func, const char *msg) {
+    printf("QuEST function '%s' encountered error '%s'\n", func, msg);
+    printf("Exiting...\n");
+    exit(1);
+}
+
+setInputErrorHandler(myErrorHandler);
+```
+
+> [!TIP]
+> `C++` users may prefer to throw an exception which can be caught, safely permitting execution to continue. In such cases, the erroneous function will _never_ corrupt any passed inputs like `Qureg` nor matrices, nor cause leaks.
+> ```cpp
+> #include <stdexcept>
+> #include <string>
+> void myErrorHandler(const char* errFunc, const char* errMsg) {
+>     std::string func(errFunc);
+>     std::string msg(errMsg);
+>     throw std::runtime_error(func + ": " + msg);
+> }
+> setInputErrorHandler(myErrorHandler);
+> ```
+<!-- newlines removed above because doxygen renders them as <br> text, how stupid! -->
+
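The interplay between a validation tolerance and a throwing error handler can be pictured with the self-contained sketch below. Everything in it is a hypothetical stand-in: `isApproxUnitary`, `applyGate` and `wasRejected` are not QuEST functions, and QuEST's actual unitarity criterion may differ from the element-wise test used here.

```cpp
#include <cassert>
#include <complex>
#include <stdexcept>
#include <string>

using cplx = std::complex<double>;

// Handler in the style of the tip above: turn the error into an exception.
void throwingHandler(const char* errFunc, const char* errMsg) {
    throw std::runtime_error(std::string(errFunc) + ": " + errMsg);
}

// Hypothetical check: does U * U^dagger equal the identity to within
// eps, compared element by element?
bool isApproxUnitary(const cplx u[2][2], double eps) {
    assert(eps >= 0);
    for (int r = 0; r < 2; r++)
        for (int c = 0; c < 2; c++) {
            cplx sum(0.0, 0.0);
            for (int k = 0; k < 2; k++)
                sum += u[r][k] * std::conj(u[c][k]);
            cplx expected((r == c) ? 1.0 : 0.0, 0.0);
            if (std::abs(sum - expected) > eps)
                return false;
        }
    return true;
}

// Hypothetical API function which validates its input before acting.
void applyGate(const cplx u[2][2], double eps) {
    if (!isApproxUnitary(u, eps))
        throwingHandler("applyGate", "matrix is not unitary");
    // a real implementation would only modify state beyond this point
}

// Returns true if the call was rejected; either way, execution continues.
bool wasRejected(const cplx u[2][2], double eps) {
    try {
        applyGate(u, eps);
    } catch (const std::runtime_error&) {
        return true;
    }
    return false;
}
```

Because validation happens before any state is touched, catching the exception leaves the caller free to correct the input and try again. Loosening `eps` lets a slightly non-unitary matrix through, which is the trade-off made when relaxing the validation precision.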
+
+
+
+--------------------------------------------
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_create-a-qureg"></a>
+
+## Create a Qureg
+
+
+To [create](https://quest-kit.github.io/QuEST/group__qureg__create.html) a statevector of `10` qubits, we call
+```cpp
+Qureg qureg = createQureg(10);
+```
+which we can [verify](https://quest-kit.github.io/QuEST/group__qureg__report.html#ga2a9df2538e537332b1aef8596ce337b2) has begun in the very boring zero state.
+```cpp
+reportQureg(qureg);
+```
+```
+Qureg (10 qubit statevector, 1024 qcomps, 16.1 KiB):
+    1  |0⟩
+    0  |1⟩
+    0  |2⟩
+    0  |3⟩
+    ⋮
+    0  |1020⟩
+    0  |1021⟩
+    0  |1022⟩
+    0  |1023⟩
+```
+> This printed only `8` amplitudes as per our setting of [`setMaxNumReportedItems()`](https://quest-kit.github.io/QuEST/group__debug__reporting.html#ga093c985b1970a0fd8616c01b9825979a) above.
+
+Behind the scenes, the function `createQureg` did something clever; it consulted the compiled deployments and available hardware to decide whether to distribute `qureg`, or dedicate it persistent GPU memory, and marked whether or not to multithread its subsequent modification. It attempts to choose _optimally_, avoiding gratuitous parallelisation if the overheads outweigh the benefits, or if the hardware devices have insufficient memory.
+
+We call this _auto-deployment_, and the chosen configuration can be [previewed](https://quest-kit.github.io/QuEST/group__qureg__report.html#ga97d96af7c7ea7b31e32cbe3b25377e09) via
+```cpp
+reportQuregParams(qureg);
+```
+```
+Qureg:
+  [deployment]
+    isMpiEnabled.....0
+    isGpuEnabled.....0
+    isOmpEnabled.....1
+  [dimension]
+    isDensMatr.....0
+    numQubits......10
+    numCols........N/A
+    numAmps........2^10 = 1024
+  [distribution]
+    numNodes.....N/A
+    numCols......N/A
+    numAmps......N/A
+  [memory]
+    cpuAmps...........16 KiB
+    gpuAmps...........N/A
+    cpuCommBuffer.....N/A
+    gpuCommBuffer.....N/A
+    globalTotal.......16 KiB
+```
+The above output informs us that the `qureg` has been neither distributed nor GPU-accelerated, but _will_ be multithreaded.
+
+If we so wished, we could [_force_](https://quest-kit.github.io/QuEST/group__qureg__create.html#ga619bbba1cbc2f7f9bbf3d3b86b3f02be) the use of all deployments available to the environment
+```cpp
+Qureg qureg = createForcedQureg(10);
+reportQuregParams(qureg);
+```
+```
+Qureg:
+  [deployment]
+    isMpiEnabled.....1
+    isGpuEnabled.....1
+    isOmpEnabled.....1
+  [dimension]
+    isDensMatr.....0
+    numQubits......10
+    numCols........N/A
+    numAmps........2^10 = 1024
+  [distribution]
+    numNodes.....2^3 = 8
+    numCols......N/A
+    numAmps......2^7 = 128 per node
+  [memory]
+    cpuAmps...........2 KiB per node
+    gpuAmps...........2 KiB per node
+    cpuCommBuffer.....2 KiB per node
+    gpuCommBuffer.....2 KiB per node
+    globalTotal.......64 KiB
+```
+or [select](https://quest-kit.github.io/QuEST/group__qureg__create.html#ga849971f43e246d103da1731d0901f2e6) specific deployments
+```cpp
+int useMPI = 1;
+int useGPU = 0;
+int useOMP = 0;
+Qureg qureg = createCustomQureg(10, 0, useMPI, useGPU, useOMP);
+```
+
+In lieu of a statevector, we could create a [density matrix](https://quest-kit.github.io/QuEST/group__qureg__create.html#ga1470424b0836ae18b5baab210aedf5d9)
+```cpp
+Qureg qureg = createDensityQureg(10);
+```
+which is also auto-deployed. Note this contains the _square_ of the number of amplitudes of the equal-dimension statevector, and ergo requires correspondingly more memory.
+```cpp
+reportQureg(qureg);
+reportQuregParams(qureg);
+```
+```
+Qureg (10 qubit density matrix, 1024x1024 qcomps, 16 MiB):
+    1  0  …  0  0
+    0  0  …  0  0
+    0  0  …  0  0
+    0  0  …  0  0
+    ⋮
+    0  0  …  0  0
+    0  0  …  0  0
+    0  0  …  0  0
+    0  0  …  0  0
+
+
+Qureg:
+  ...
+  [dimension]
+    isDensMatr.....1
+    numQubits......10
+    numCols........2^10 = 1024
+    numAmps........2^20 = 1048576
+  ...
+  [memory]
+    cpuAmps...........16 MiB
+    ...
+    globalTotal.......16 MiB
+```
+
+> The spacing between the outputs of those two consecutive QuEST functions was determined by our earlier call to [`setNumReportedNewlines()`](https://quest-kit.github.io/QuEST/group__debug__reporting.html#ga29413703d609254244d6b13c663e6e06).
+
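+The memory figures reported above can be reproduced by hand. The sketch below is not part of QuEST; it assumes each amplitude is a double-precision complex number of `16` bytes, and the helper names are hypothetical.
+
+```cpp
+#include <cstdint>
+
+// Assumption: one amplitude is a double-precision complex (2 x 8 bytes)
+constexpr std::uint64_t BYTES_PER_AMP = 16;
+
+// A statevector of n qubits stores 2^n amplitudes
+std::uint64_t statevectorBytes(int numQubits) {
+    return (std::uint64_t{1} << numQubits) * BYTES_PER_AMP;
+}
+
+// A density matrix of n qubits stores (2^n)^2 = 2^(2n) amplitudes
+std::uint64_t densityMatrixBytes(int numQubits) {
+    return (std::uint64_t{1} << (2 * numQubits)) * BYTES_PER_AMP;
+}
+```
+
+For `10` qubits these give `16 KiB` and `16 MiB` respectively, matching the `cpuAmps` fields printed by `reportQuregParams()`.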
+
+A density matrix `Qureg` can model classical uncertainty as results from [decoherence](https://quest-kit.github.io/QuEST/group__decoherence.html), and proves useful when simulating quantum operations on a noisy quantum computer.
+
+
+
+
+--------------------------------------------
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_prepare-an-initial-state"></a>
+
+## Prepare an initial state
+
+
+In lieu of manually [modifying](https://quest-kit.github.io/QuEST/group__init__amps.html) the state amplitudes, QuEST includes functions to prepare a `Qureg` in some common [initial states](https://quest-kit.github.io/QuEST/group__init__states.html)
+
+```cpp
+initZeroState(qureg);         // |0> or |0><0|
+initPlusState(qureg);         // |+> or |+><+|
+initClassicalState(qureg, i); // |i> or |i><i|
+initPureState(rho, psi);      // rho = |psi><psi|
+```
+or random states
+```cpp
+initRandomPureState(psi);
+
+int numPureStates = 15;
+initRandomMixedState(rho, numPureStates);
+
+reportQureg(psi);
+reportQureg(rho);
+```
+```
+Qureg (5 qubit statevector, 32 qcomps, 616 bytes):
+    0.0884-0.164i     |0⟩
+    0.149+0.207i      |1⟩
+    0.232+0.0656i     |2⟩
+    -0.0435+0.0332i   |3⟩
+            ⋮
+    -0.108-0.0431i    |28⟩
+    -0.0161-0.121i    |29⟩
+    -0.0463+0.00341i  |30⟩
+    -0.0491-0.186i    |31⟩
+
+
+Qureg (5 qubit density matrix, 32x32 qcomps, 16.1 KiB):
+    0.0256+(1.08e-19)i  -0.000876+0.00412i  …  0.000912+0.00869i   -0.00597+0.00615i
+    -0.000876-0.00412i  0.033-(6.78e-20)i   …  0.000223+0.00369i   -0.00207+0.00451i
+    -0.00443-0.00871i   0.0155-0.000843i    …  0.00375+0.00669i    (8.5e-5)-0.000851i
+    0.00287-0.00397i    0.00637-0.000315i   …  0.00486+0.00218i    0.00268+0.0053i
+             ⋮
+    -0.00385-0.000732i  0.00965+0.00542i    …  0.00162-0.0112i     0.00404+0.00685i
+    0.00491+0.00245i    -0.000319+0.0021i   …  -0.00902-0.00312i   -0.00465+0.00275i
+    0.000912-0.00869i   0.000223-0.00369i   …  0.0183+(1.32e-19)i  0.000509+0.00401i
+    -0.00597-0.00615i   -0.00207-0.00451i   …  0.000509-0.00401i   0.0173+(3.12e-19)i
+```
+
+> The number of printed significant figures above results from our earlier calling of [`setMaxNumReportedSigFigs()`](https://quest-kit.github.io/QuEST/group__debug__reporting.html#ga15d46e5d813f70b587762814964e1994).
+
+
+
+--------------------------------------------
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_apply-operators"></a>
+
+## Apply operators
+
+
+QuEST supports an extensive set of [operators](https://quest-kit.github.io/QuEST/group__operations.html) to effect upon a `Qureg`. 
+```cpp
+int target = 2;
+applyHadamard(qureg, target);
+
+qreal angle = 3.14 / 5;
+int targets[]  = {4,5,6};
+applyPhaseGadget(qureg, targets, 3, angle);
+```
+
+> [!IMPORTANT]  
+> Notice the type of `angle` is [`qreal`](https://quest-kit.github.io/QuEST/group__types.html#ga2d479c159621c76ca6f96abe66f2e69e) rather than the expected `double`. This is a precision agnostic alias for a floating-point, real scalar which allows you to recompile QuEST with a varying [precision](/docs/compile.md#precision) with no modifications to your code. 
+<!-- @todo the above link fails in Doxygen; it's too stupid to recognise the section ref -->
+
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_controls"></a>
+
+### controls
+
+
+All unitary operations accept any number of control qubits
+```cpp
+int controls[] = {0,1,2,3,7,8,9};
+applyMultiControlledSqrtSwap(qureg, controls, 7, targets[0], targets[1]);
+```
+and even _control states_ which specify the bits (`0` or `1`) that the respective controls must be in to effect the non-identity operation.
+```cpp
+int states[] = {0,0,0,1,1,1,0};
+applyMultiStateControlledRotateX(qureg, controls, states, 7, target, angle);
+```
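+
+Conceptually, a control state restricts the operation to basis states whose control bits match the given pattern. The following standalone sketch is not QuEST code, and `controlsMatch` is a hypothetical helper. It tests that condition for a single basis-state index, assuming qubit `q` corresponds to bit `q` of the index.
+
+```cpp
+#include <cstddef>
+#include <vector>
+
+// True if every control qubit of basis state `index` holds its required bit.
+// Assumption: qubit q is stored in bit q of the index (least significant first).
+bool controlsMatch(long long index, const std::vector<int>& controls, const std::vector<int>& states) {
+    for (std::size_t i = 0; i < controls.size(); i++) {
+        int bit = (index >> controls[i]) & 1;
+        if (bit != states[i])
+            return false;
+    }
+    return true;
+}
+```
+
+With the `controls` and `states` lists above, only basis states with qubits `3`, `7` and `8` set, and qubits `0`, `1`, `2` and `9` unset, would be affected by the rotation.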
+
+> [!TIP]
+> `C` users can pass inline list arguments using [compound literals](https://en.cppreference.com/w/c/language/compound_literal)
+> ```C
+> applyMultiControlledMultiQubitNot(qureg, (int[]) {0,1,2}, 3, (int[]) {4,5}, 2);
+> ```
+> while `C++` users can pass [vector](https://en.cppreference.com/w/cpp/container/vector) literals or [initializer lists](https://en.cppreference.com/w/cpp/utility/initializer_list), alleviating the need to specify the list lengths.
+> ```cpp
+> applyMultiControlledMultiQubitNot(qureg, {0,1,2}, {4,5});
+> ```
+
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_paulis"></a>
+
+### paulis
+
+
+Some operators accept [`PauliStr`](https://quest-kit.github.io/QuEST/structPauliStr.html) which can be [constructed](https://quest-kit.github.io/QuEST/group__paulis__create.html) in all sorts of ways - even inline!
+```cpp
+applyPauliGadget(qureg, getPauliStr("XYZ"), angle);
+```
+
+> [!TIP]
+> Using _one_ QuEST function is _always_ faster than using an equivalent sequence. So
+> ```cpp
+> applyPauliStr(qureg, getPauliStr("YYYYYYY"));
+> ```
+> is _much_ faster than
+> ```cpp
+> for (int i=0; i<7; i++)
+>     applyPauliY(qureg, i);
+> ```
+
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_matrices"></a>
+
+### matrices
+
+
+<!-- giving no hyperlink -->
+
+#### `CompMatr1`
+
+Don't see your operation in the API? You can specify it as a general [matrix](https://quest-kit.github.io/QuEST/group__matrices.html).
+```cpp
+qcomp x = 1i/sqrt(2);
+CompMatr1 matrix = getInlineCompMatr1({{-x,x},{-x,-x}});
+applyCompMatr1(qureg, target, matrix);
+```
+
+> [!IMPORTANT]  
+> The type [`qcomp`](https://quest-kit.github.io/QuEST/group__types.html#ga4971f489e74bb185b9b2672c14301983) above is a precision agnostic complex scalar, and has beautiful arithmetic overloads!
+> ```cpp
+> qcomp x = 1.5 + 3.14i;
+> x *= 1E3i - 1E-5i;
+> ```
+> Beware that in `C++`, `1i` is a _double precision_ literal, so `C++` users should instead
+> use the custom precision-agnostic literal `1_i`.
+> ```cpp
+> qcomp x = 1.5 + 3.14_i;
+> ```
+
+
+<!-- giving no hyperlink -->
+
+#### `CompMatr`
+
+Want a bigger matrix? No problem - they can be [any size](https://quest-kit.github.io/QuEST/group__matrices__create.html#ga634309472d1edf400174680af0685b89), with many ways to [initialise](https://quest-kit.github.io/QuEST/group__matrices__setters.html) them.
+```cpp
+CompMatr bigmatrix = createCompMatr(8);
+setCompMatr(bigmatrix, {{1,2,3,...}});
+applyCompMatr(qureg, ..., bigmatrix);
+```
+Matrix elements can be manually modified, though this requires we [synchronise](https://quest-kit.github.io/QuEST/group__matrices__sync.html) them with GPU memory once finished.
+```cpp
+qindex dim = bigmatrix.numRows;
+
+// initialise random diagonal unitary
+for (qindex r=0; r<dim; r++)
+    for (qindex c=0; c<dim; c++)
+        bigmatrix.cpuElems[r][c] = exp(rand() * 1i) * (r==c);
+
+// update the GPU copy 
+syncCompMatr(bigmatrix);
+```
+
+> [!IMPORTANT]  
+> The created `CompMatr` is a [heap object](https://craftofcoding.wordpress.com/2015/12/07/memory-in-c-the-stack-the-heap-and-static/) and must be [destroyed](https://quest-kit.github.io/QuEST/group__matrices__destroy.html) when we are finished with it, to free up its memory and avoid leaks.
+> ```cpp
+> destroyCompMatr(bigmatrix);
+> ```
+> This is true of any QuEST structure returned by a `create*()` function. It is _not_ true of structures returned by `get*()` functions, which are always [stack variables](https://craftofcoding.wordpress.com/2015/12/07/memory-in-c-the-stack-the-heap-and-static/), hence why functions like `getCompMatr1()` can be called inline!
+
+
+<!-- giving no hyperlink -->
+
+#### `FullStateDiagMatr`
+
+Above, we initialised [`CompMatr`](https://quest-kit.github.io/QuEST/structCompMatr.html) to a diagonal unitary. This is incredibly wasteful; only `256` of its `65536` elements are non-zero! We should instead use [`DiagMatr`](https://quest-kit.github.io/QuEST/structDiagMatr.html) or [`FullStateDiagMatr`](https://quest-kit.github.io/QuEST/structFullStateDiagMatr.html). The latter is even distributed (if chosen by the autodeployer), permitting it to be as large as a `Qureg` itself!
+```cpp
+FullStateDiagMatr fullmatrix = createFullStateDiagMatr(qureg.numQubits);
+```
+It can be [initialised](https://quest-kit.github.io/QuEST/group__matrices__setters.html) in many ways, including from all-`Z` Pauli sums!
+```cpp
+PauliStrSum sum = createInlinePauliStrSum(R"(
+    1   II
+    1i  ZI
+    1i  IZ
+    -1  ZZ
+)");
+
+setFullStateDiagMatrFromPauliStrSum(fullmatrix, sum);
+```
+> [!IMPORTANT]  
+> The argument to `createInlinePauliStrSum` is a multiline string for which the syntax differs between `C` and `C++`; we used the latter above. See examples [`initialisation.c`](/examples/paulis/initialisation.c) and [`initialisation.cpp`](/examples/paulis/initialisation.cpp) for clarity.
+
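+To illustrate what setting a diagonal matrix from an all-`Z` Pauli sum means, the standalone sketch below computes the diagonal directly. It is not QuEST's implementation; `ZTerm` and `diagonalOfZSum` are hypothetical names, and it assumes the rightmost Pauli character acts upon qubit `0`.
+
+```cpp
+#include <complex>
+#include <cstddef>
+#include <string>
+#include <vector>
+
+struct ZTerm {
+    std::complex<double> coeff;
+    std::string paulis;   // only 'I' and 'Z'; assumed rightmost character acts on qubit 0
+};
+
+// Diagonal of sum_j coeff_j * paulis_j, with one element per computational basis state
+std::vector<std::complex<double>> diagonalOfZSum(const std::vector<ZTerm>& terms, int numQubits) {
+    std::vector<std::complex<double>> diag(1ULL << numQubits);
+    for (std::size_t basis = 0; basis < diag.size(); basis++)
+        for (const ZTerm& term : terms) {
+            double sign = 1;
+            for (int q = 0; q < numQubits; q++) {
+                char pauli = term.paulis[numQubits - 1 - q];
+                bool bitIsOne = (basis >> q) & 1;
+                // Z contributes -1 when the qubit is |1>, while I always contributes +1
+                if (pauli == 'Z' && bitIsOne)
+                    sign = -sign;
+            }
+            diag[basis] += sign * term.coeff;
+        }
+    return diag;
+}
+```
+
+For the four-term sum above, this yields the diagonal elements `2i`, `2`, `2` and `-2i`.
+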
+> [!CAUTION]
+> Beware that in distributed settings, because `fullmatrix` _may_ be distributed, we must exercise extreme caution when modifying its `fullmatrix.cpuElems` directly. 
+
+
+A `FullStateDiagMatr` acts upon all qubits of a qureg
+```cpp
+applyFullStateDiagMatr(qureg, fullmatrix);
+```
+and can be raised to an arbitrary power, helpful for example in simulating [quantum spectral methods](https://www.science.org/doi/10.1126/sciadv.abo7484).
+```cpp
+qcomp exponent = 3.5;
+applyFullStateDiagMatrPower(qureg, fullmatrix, exponent);
+```
+
+Notice the `exponent` is a `qcomp` and ergo permitted to be a complex number. Unitarity requires `exponent` is strictly real, but we can always relax the unitarity validation...
+
+
+<!-- giving no hyperlink -->
+
+#### validation
+
+
+Our example above initialised `CompMatr` to a diagonal because it is tricky to generate random non-diagonal _unitary_ matrices - and QuEST checks for unitarity!
+```cpp
+// m * dagger(m) != identity
+CompMatr1 m = getCompMatr1({{.1,.2},{.3,.4}});
+applyCompMatr1(qureg, 0, m);
+```
+```
+QuEST encountered a validation error during function 'applyCompMatr1':
+The given matrix was not (approximately) unitary.
+Exiting...
+```
+If we're satisfied our matrix _is_ sufficiently approximately unitary, we can [adjust](https://quest-kit.github.io/QuEST/group__debug__validation.html#gae395568df6def76045ec1881fcb4e6d1) or [disable](https://quest-kit.github.io/QuEST/group__debug__validation.html#ga5999824df0785ea88fb2d5b5582f2b46) the validation.
+```cpp
+// max(norm(m * dagger(m) - identity)) = 0.9025
+setValidationEpsilon(0.903);
+applyCompMatr1(qureg, 0, m);
+```
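+
+The deviation quoted in the comment above can be checked independently of QuEST. The sketch below uses hypothetical names (`Matrix2`, `unitarityDeviation`, `isApproxUnitary`) and assumes the matrix passes when the largest squared magnitude among the elements of `m * dagger(m) - identity` does not exceed the epsilon; QuEST's actual comparison may differ in detail.
+
+```cpp
+#include <algorithm>
+#include <array>
+#include <complex>
+
+using Matrix2 = std::array<std::array<std::complex<double>, 2>, 2>;
+
+// Largest squared magnitude among the elements of (m * dagger(m) - identity)
+double unitarityDeviation(const Matrix2& m) {
+    double worst = 0;
+    for (int r = 0; r < 2; r++)
+        for (int c = 0; c < 2; c++) {
+            // element (r,c) of m * dagger(m)
+            std::complex<double> elem = 0;
+            for (int k = 0; k < 2; k++)
+                elem += m[r][k] * std::conj(m[c][k]);
+            if (r == c)
+                elem -= 1.0;
+            // std::norm returns the squared magnitude
+            worst = std::max(worst, std::norm(elem));
+        }
+    return worst;
+}
+
+// Assumption: a matrix passes validation when its deviation does not exceed epsilon
+bool isApproxUnitary(const Matrix2& m, double epsilon) {
+    return unitarityDeviation(m) <= epsilon;
+}
+```
+
+For the matrix `m` above, the deviation evaluates to approximately `0.9025`, so the check passes with an epsilon of `0.903` but fails with a much smaller one.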
+
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_circuits"></a>
+
+### circuits
+
+
+QuEST includes a few convenience functions for effecting [QFT](https://quest-kit.github.io/QuEST/group__op__qft.html) and [Trotter](https://quest-kit.github.io/QuEST/group__op__paulistrsum.html) circuits.
+
+```cpp
+applyQuantumFourierTransform(qureg, targets, 3);
+
+qreal time = .3;
+int order = 4;
+int reps = 10;
+applyTrotterizedPauliStrSumGadget(qureg, sum, time, order, reps);
+```
+
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_measurements"></a>
+
+### measurements
+
+
+We can also effect a wide range of non-unitary operations, such as destructive [measurements](https://quest-kit.github.io/QuEST/group__op__measurement.html)
+```cpp
+int outcome1 = applyQubitMeasurement(qureg, 0);
+
+qreal prob;
+qindex outcome2 = applyMultiQubitMeasurementAndGetProb(qureg, targets, 3, &prob);
+```
+and conveniently [report](https://quest-kit.github.io/QuEST/group__types.html#ga2be8a4433585a8d737c02128b4754a03) their outcome.
+```cpp
+reportScalar("one qubit outcome", outcome1);
+reportScalar("three qubit outcome", outcome2);
+```
+
+> [!IMPORTANT]  
+> Notice the type of `outcome2` is a [`qindex`](https://quest-kit.github.io/QuEST/group__types.html#ga6017090d3ed4063ee7233e20c213424b) rather than an `int`. This is a larger type which can store much larger numbers without overflow - up to `2^63` - and is always used by the API for many-qubit indices.
+
+Should we wish to leave the state unnormalised, we can instead use [projectors](https://quest-kit.github.io/QuEST/group__op__projectors.html).
+
+
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_decoherence"></a>
+
+### decoherence
+
+
+Density matrices created with [`createDensityQureg()`](https://quest-kit.github.io/QuEST/group__qureg__create.html#ga1470424b0836ae18b5baab210aedf5d9) can undergo [decoherence](https://quest-kit.github.io/QuEST/group__decoherence.html) channels.
+
+```cpp
+qreal prob = 0.1;
+mixDamping(rho, target, prob);
+mixDephasing(rho, target, prob);
+mixTwoQubitDepolarising(rho, targets[0], targets[1], prob);
+```
+which we can specify as inhomogeneous Pauli channels
+```cpp
+// passing probabilities of X, Y, Z errors respectively
+mixPaulis(rho, target, .05, .10, .15);
+```
+or completely generally as [Kraus maps](https://quest-kit.github.io/QuEST/group__channels.html) and [superoperators](https://quest-kit.github.io/QuEST/group__channels.html)!
+```cpp
+int numTargets = 1;
+int numOperators = 4;
+
+qreal p = 0.1;
+qreal l = 0.3;
+
+// generalised amplitude damping
+KrausMap map = createInlineKrausMap(numTargets, numOperators, {
+    {
+        {sqrt(p), 0},
+        {0, sqrt(p*(1-l))}
+    }, {
+        {0, sqrt(p*l)}, 
+        {0, 0}
+    }, {
+        {sqrt((1-p)*(1-l)), 0},
+        {0, sqrt(1-p)}
+    }, {
+        {0, 0},
+        {sqrt((1-p)*l), 0}
+    }
+});
+
+int victims[] = {2};
+mixKrausMap(rho, victims, 1, map);
+```
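+
+A valid Kraus map must be trace preserving, meaning the products `dagger(K) * K` of its operators sum to the identity. The standalone sketch below checks this for the generalised amplitude damping operators above. It does not call QuEST, the names `Real2x2`, `makeMatrix`, `dampingKrausOps` and `completenessError` are hypothetical, and it exploits that these particular operators are real.
+
+```cpp
+#include <algorithm>
+#include <array>
+#include <cmath>
+#include <vector>
+
+using Real2x2 = std::array<std::array<double, 2>, 2>;
+
+// Builds the matrix {{a, b}, {c, d}}
+Real2x2 makeMatrix(double a, double b, double c, double d) {
+    Real2x2 m;
+    m[0] = {a, b};
+    m[1] = {c, d};
+    return m;
+}
+
+// The four Kraus operators of generalised amplitude damping, as listed above
+std::vector<Real2x2> dampingKrausOps(double p, double l) {
+    return {
+        makeMatrix(std::sqrt(p), 0, 0, std::sqrt(p*(1-l))),
+        makeMatrix(0, std::sqrt(p*l), 0, 0),
+        makeMatrix(std::sqrt((1-p)*(1-l)), 0, 0, std::sqrt(1-p)),
+        makeMatrix(0, 0, std::sqrt((1-p)*l), 0)
+    };
+}
+
+// Largest absolute deviation of sum_k transpose(K_k) * K_k from the identity
+double completenessError(const std::vector<Real2x2>& ops) {
+    double worst = 0;
+    for (int r = 0; r < 2; r++)
+        for (int c = 0; c < 2; c++) {
+            double elem = (r == c) ? -1 : 0;
+            for (const Real2x2& K : ops)
+                for (int k = 0; k < 2; k++)
+                    elem += K[k][r] * K[k][c];   // element (r,c) of transpose(K) * K
+            worst = std::max(worst, std::fabs(elem));
+        }
+    return worst;
+}
+```
+
+For any `p` and `l` between `0` and `1`, the error should vanish up to rounding, whereas omitting an operator leaves a visible deviation.
+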
+We can even directly mix density matrices together
+```cpp
+mixQureg(rho1, rho2, prob);
+```
+
+Sometimes we wish to left-multiply general operators upon density matrices without also right-multiplying their adjoint - i.e. our operators should _not_ be effected as unitaries. We can do this with the `multiply*()` functions.
+```cpp
+multiplyFullStateDiagMatrPower(rho, fullmatrix, 0.5);
+```
+
+
+
+
+--------------------------------------------
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_perform-calculations"></a>
+
+## Perform calculations
+
+
+After so much modification to our state, we will find that its amplitudes have changed substantially. But it's impractical to inspect the exponentially-many amplitudes with [`reportQureg()`](https://quest-kit.github.io/QuEST/group__qureg__report.html#ga2a9df2538e537332b1aef8596ce337b2). We can instead ask QuEST the [questions](https://quest-kit.github.io/QuEST/group__calculations.html) we wish to answer about the resulting state.
+
+For example, we can find the [probability](https://quest-kit.github.io/QuEST/group__calc__prob.html) of measurement outcomes _without_ modifying the state.
+```cpp
+int outcome = 1;
+qreal prob1 = calcProbOfQubitOutcome(qureg, target, outcome);
+
+int qubits[]   = {2,3,4};
+int outcomes[] = {0,1,1};
+qreal prob2 = calcProbOfMultiQubitOutcome(qureg, qubits, outcomes, 3);
+```
+We can obtain _all_ outcome probabilities in one swoop:
+```cpp
+qreal probs[8];
+calcProbsOfAllMultiQubitOutcomes(probs, qureg, qubits, 3);
+```
+
+> [!TIP]
+> `C++` users can also obtain the result as a natural `std::vector<qreal>`.
+> ```cpp
+> auto probs = calcProbsOfAllMultiQubitOutcomes(qureg, {2,3,4});
+> ```
+
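+For a statevector, such a probability is the sum of the squared magnitudes of all amplitudes whose basis state is consistent with the outcome. The sketch below computes it naively without QuEST; `probOfOutcome` is a hypothetical helper, and it assumes qubit `q` corresponds to bit `q` of the amplitude index.
+
+```cpp
+#include <complex>
+#include <cstddef>
+#include <vector>
+
+// Probability that the given qubits are found in the given outcomes.
+// Assumption: qubit q is stored in bit q of the amplitude index.
+double probOfOutcome(const std::vector<std::complex<double>>& amps,
+                     const std::vector<int>& qubits,
+                     const std::vector<int>& outcomes) {
+    double prob = 0;
+    for (std::size_t i = 0; i < amps.size(); i++) {
+        bool matches = true;
+        for (std::size_t j = 0; j < qubits.size(); j++)
+            if (((i >> qubits[j]) & 1) != static_cast<std::size_t>(outcomes[j]))
+                matches = false;
+        // only basis states consistent with the outcome contribute
+        if (matches)
+            prob += std::norm(amps[i]);
+    }
+    return prob;
+}
+```
+
+This visits every amplitude once per query, which is why obtaining all outcome probabilities in a single call is preferable to many individual calls.
+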
+It is similarly trivial to find [expectation values](https://quest-kit.github.io/QuEST/group__calc__expec.html)
+```cpp
+qreal expec1 = calcExpecPauliStr(qureg, getPauliStr("XYZIII"));
+qreal expec2 = calcExpecPauliStrSum(qureg, sum);
+qreal expec3 = calcExpecFullStateDiagMatr(qureg, fullmatrix);
+```
+or [distance measures](https://quest-kit.github.io/QuEST/group__calc__comparisons.html) between states, including between statevectors and density matrices.
+```cpp
+qreal pur = calcPurity(rho);
+qreal fid = calcFidelity(rho, psi);
+qreal dist = calcDistance(rho, psi);
+```
+
+We can even find reduced density matrices resulting from [partially tracing](https://quest-kit.github.io/QuEST/group__calc__partialtrace.html) out qubits.
+```cpp
+Qureg reduced = calcPartialTrace(qureg, targets, 3);
+
+reportScalar("entanglement", calcPurity(reduced));
+```
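+
+When the full state is pure, the purity of a reduced state reveals entanglement: it is `1` when the traced-out qubits were unentangled with the rest, and smaller otherwise. The following two-qubit sketch is independent of QuEST; the type and function names are hypothetical, and it assumes qubit `1` is the most significant bit of each row and column index.
+
+```cpp
+#include <array>
+#include <complex>
+
+using Complex = std::complex<double>;
+using Density2 = std::array<std::array<Complex, 4>, 4>;  // two qubits
+using Density1 = std::array<std::array<Complex, 2>, 2>;  // one qubit
+
+// Trace out qubit 1 (assumed the most significant bit of each index), keeping qubit 0
+Density1 traceOutQubit1(const Density2& rho) {
+    Density1 reduced{};
+    for (int r = 0; r < 2; r++)
+        for (int c = 0; c < 2; c++)
+            for (int t = 0; t < 2; t++)
+                reduced[r][c] += rho[2*t + r][2*t + c];
+    return reduced;
+}
+
+// Purity Tr(rho^2), which is 1 for pure states and smaller for mixed states
+double purity(const Density1& rho) {
+    Complex trace = 0;
+    for (int r = 0; r < 2; r++)
+        for (int c = 0; c < 2; c++)
+            trace += rho[r][c] * rho[c][r];
+    return trace.real();
+}
+```
+
+For a two-qubit Bell state the reduced state is maximally mixed with purity `0.5`, while a product state gives `1`.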
+
+
+
+--------------------------------------------
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_report-the-results"></a>
+
+## Report the results
+
+
+We've seen above that [scalars](https://quest-kit.github.io/QuEST/group__types.html) can be reported, handling the pretty formatting of real and complex numbers, controlled by settings like [`setMaxNumReportedSigFigs()`](https://quest-kit.github.io/QuEST/group__debug__reporting.html#ga15d46e5d813f70b587762814964e1994). But we can also report every data structure in the QuEST API, such as Pauli strings
+```cpp
+reportPauliStr(
+    getInlinePauliStr("XXYYZZ", {5,50, 10,60, 30,40})
+);
+```
+```
+YIIIIIIIIIXIIIIIIIIIZIIIIIIIIIZIIIIIIIIIIIIIIIIIIIYIIIIXIIIII
+```
+and their weighted sums
+```cpp
+reportPauliStrSum(sum);
+```
+```
+PauliStrSum (4 terms, 160 bytes):
+    1   II
+    i   ZI
+    i   IZ
+    -1  ZZ
+```
+All outputs are affected by the [reporter settings](https://quest-kit.github.io/QuEST/group__debug__reporting.html).
+```cpp
+setMaxNumReportedItems(4,4);
+setMaxNumReportedSigFigs(1);
+reportCompMatr(bigmatrix);
+```
+```
+CompMatr (8 qubits, 256x256 qcomps, 1 MiB):
+    0.9-0.5i  0         …  0          0
+    0         0.8-0.6i  …  0          0
+        ⋮
+    0         0         …  -0.5-0.9i  0
+    0         0         …  0          0.4+0.9i
+```
+
+
+
+> [!NOTE]  
+> Facilities for automatically logging to file are coming soon!
+
+
+
+--------------------------------------------
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_cleanup"></a>
+
+## Cleanup
+
+
+While not strictly necessary before the program ends, it is a good habit to destroy data structures as soon as you are finished with them, freeing their memory.
+
+```cpp
+destroyQureg(qureg);
+destroyCompMatr(bigmatrix);
+destroyFullStateDiagMatr(fullmatrix);
+destroyPauliStrSum(sum);
+destroyKrausMap(map);
+```
+
+
+
+--------------------------------------------
+
+<!-- permit doxygen to reference section -->
+<a id="tutorial_finalise-quest"></a>
+
+## Finalise QuEST
+
+
+The _final_ [step](https://quest-kit.github.io/QuEST/group__environment.html#ga428faad4d68abab20f662273fff27e39) of our program should be to call
+```cpp
+finalizeQuESTEnv();
+```
+which ensures everything is synchronised, frees accelerator resources, and finalises MPI.
+This is important because it ensures:
+- _everything is done_, and that distributed nodes that are still working (e.g. haven't yet logged to their own file) are not interrupted by early termination of another node.
+- the MPI process ends gracefully, and doesn't spew out messy errors!
+- our GPU processes are killed quickly, freeing resources for other processes.
+
+> [!CAUTION]
+> After calling `finalizeQuESTEnv()`, MPI will close and, if being accessed directly by the user, will enter an undefined state. Subsequent calls to MPI routines may return gibberish, and distributed machines will have lost their ability to communicate. It is recommended to call `finalizeQuESTEnv()` immediately before exiting.
+
+You are now a QuEST expert 🎉 though there are _many_ more functions in the [API](https://quest-kit.github.io/QuEST/group__api.html) not covered here. Go forth and simulate!
\ No newline at end of file
diff --git a/docs/v4.md b/docs/v4.md
index 77df7ea62..bc8018355 100644
--- a/docs/v4.md
+++ b/docs/v4.md
@@ -1,17 +1,36 @@
+# 🎉  What's new in v4
 
-# What's new in v4
+<!--
+  Version 4 feature list
+  (this comment must be under the title for valid doxygen rendering)
+  
+  @author Tyson Jones
+-->
 
 QuEST `v4` has completely overhauled the API, software architecture, algorithms, implementations and testing. This page details the new features, divided into those relevant to _users_, _developers_ who integrate QuEST into larger software stacks, and _contributors_ who develop QuEST or otherwise peep at the source code!
 
-**TOC**:
-- [For users](#for-users)
-- [For developers](#for-developers)
-- [For contributors](#for-contributors)
-- [Acknowledgements](#acknowledgements)
 
+<!-- 
+    we are using explicit <a>, rather than markdown links,
+    for Doxygen compatibility. It cannot handle [](#sec)
+    links, and its <a> anchors are not scoped to files, so
+    we here prefix each name with the filename. Grr!
+-->
+
+> **TOC**:
+> - <a href="#v4_for-users">For users</a>
+> - <a href="#v4_for-developers">For developers</a>
+> - <a href="#v4_for-contributors">For contributors</a>
+> - <a href="#v4_acknowledgements">Acknowledgements</a>
+
+
+
+<!-- permit doxygen to reference section -->
+<a id="v4_for-users"></a>
 
 ## For users
 
+
 - **auto-deployer** <br>
   Functions like [`createQureg()`](https://quest-kit.github.io/QuEST/group__qureg__create.html#gab3a231fba4fd34ed95a330c91fcb03b3) and [`createFullStateDiagMatr()`](https://quest-kit.github.io/QuEST/group__matrices__create.html#ga3f4b64689928ea8489a4860e3a7a530f) will _automatically decide_ whether to make use of the compiled and available hardware facilities, like multithreading, GPU-acceleration and distribution. The user no longer needs to consider which deployments are optimal for their simulation sizes, nor which devices have sufficient memory to fit their `Qureg`!
   <br><br>
@@ -40,6 +59,10 @@ QuEST `v4` has completely overhauled the API, software architecture, algorithms,
   The [documentation](/docs/) has been rewritten from the ground-up, and the [API doc](https://quest-kit.github.io/QuEST/topics.html) grouped into sub-categories and aesthetically overhauled with [Doxygen Awesome](https://jothepro.github.io/doxygen-awesome-css/). It is now more consistently structured, mathematically explicit, and is a treat on the eyes!
 
 
+
+<!-- permit doxygen to reference section -->
+<a id="v4_for-developers"></a>
+
 ## For developers
 
 - **new build** <br>
@@ -48,6 +71,11 @@ QuEST `v4` has completely overhauled the API, software architecture, algorithms,
 - **easier integration** <br>
   QuEST's backend now uses the standard `C++` [complex primitive](https://en.cppreference.com/w/cpp/numeric/complex) to represent quantum amplitudes and matrix elements, made precision agnostic via the new [`qcomp`](https://quest-kit.github.io/QuEST/group__types.html#ga4971f489e74bb185b9b2672c14301983) type. Further, [dense matrices](https://quest-kit.github.io/QuEST/structCompMatr.html) now have both 1D row-major and 2D (aliasing the 1D) memory pointers. This permits `Qureg` and matrix data to be seamlessly accessed by third-party libraries, such as for linear algebra, without the need for adapters nor expensive copying.
 
+
+
+<!-- permit doxygen to reference section -->
+<a id="v4_for-contributors"></a>
+
 ## For contributors
 
 - **modular architecture** <br>
@@ -62,6 +90,9 @@ QuEST `v4` has completely overhauled the API, software architecture, algorithms,
   This greatly aids the development process and helps spot bugs earlier, as well as making the assumptions more explicit and ergo the code easier to read and understand.
 
 
+<!-- permit doxygen to reference section -->
+<a id="v4_acknowledgements"></a>
+
 ## Acknowledgements
 
 QuEST `v4` development was led by [Tyson Jones](https://tysonjones.io/index.html), with notable contributions from [Oliver Thomson Brown](https://www.epcc.ed.ac.uk/about-us/our-team/dr-oliver-brown), [Richard Meister](https://github.com/rrmeister), [Erich Essmann](https://www.research.ed.ac.uk/en/persons/erich-essmann), [Ali Rezaei](https://www.research.ed.ac.uk/en/persons/ali-rezaei) and [Simon C. Benjamin](https://www.materials.ox.ac.uk/peoplepages/benjamin.html). Development was financially supported by the UK National Quantum Computing Centre (_NQCC200921_), the [UKRI SEEQA](https://gtr.ukri.org/projects?ref=EP%2FY004655%2F1#/tabOverview) project, the University of Oxford, and the University of Edinburgh’s Chancellor’s Fellowship scheme. Developer time was contributed by [AMD](https://www.amd.com/en.html), the [QTechTheory](https://qtechtheory.org/) group at the University of Oxford, the [EPCC](https://www.epcc.ed.ac.uk/) of the University of Edinburgh, and [Quantum Motion Technologies](https://quantummotion.tech/). Many helpful discussions were had with, and troubleshooting support given by, [NVIDIA](https://www.nvidia.com)'s [cuQuantum](https://developer.nvidia.com/cuquantum-sdk) team.
diff --git a/examples/README.md b/examples/README.md
index a70a78864..7fe49d61c 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -1,369 +1,12 @@
+# 🔖  Examples
+
 <!--
   Examples and tutorials
+  (this comment must be under the title for valid doxygen rendering)
   
   @author Tyson Jones
 -->
 
-# Examples
-
-The above folders contain example `C` and `C++` files which use QuEST's [API](https://quest-kit.github.io/QuEST/group__api.html), helping illustrate how to use specific functions. Instructions for compiling and running them are given in [`compile.md`](/docs/compile.md#tests) and [`run.md`](/docs/run.md#tests) respectively.
-
-
-# Tutorial
-
-QuEST is included into a `C` or `C++` project via
-```C++
-#include "quest.h"
-```
-
-> [!TIP]
-> Some of QuEST's deprecated `v3` API can be accessed by specifying `ENABLE_DEPRECATED_API` when [compiling](/docs/compile.md), or defining it before import, i.e. 
-> ```C++
-> #define ENABLE_DEPRECATED_API 1
-> #include "quest.h"
-> ```
-> We recommend migrating to the latest `v4` API however, demonstrated below.
-
-Simulation typically proceeds as:
-1. [Initialise](https://quest-kit.github.io/QuEST/group__environment.html#gab89cfc1bf94265f4503d504b02cf54d4) the QuEST [environment](https://quest-kit.github.io/QuEST/group__environment.html), preparing available GPUs and networks.
-2. [Configure](https://quest-kit.github.io/QuEST/group__debug.html) the environment, such as through [seeding](https://quest-kit.github.io/QuEST/group__debug__seed.html).
-3. [Create](https://quest-kit.github.io/QuEST/group__qureg__create.html) a [`Qureg`](https://quest-kit.github.io/QuEST/structQureg.html), allocating memory for its amplitudes.
-4. Prepare its [initial state](https://quest-kit.github.io/QuEST/group__initialisations.html), overwriting its amplitudes.
-5. Apply [operators](https://quest-kit.github.io/QuEST/group__operations.html) and [decoherence](https://quest-kit.github.io/QuEST/group__decoherence.html), expressed as [matrices](https://quest-kit.github.io/QuEST/group__matrices.html) and [channels](https://quest-kit.github.io/QuEST/group__channels.html).
-6. Perform [calculations](https://quest-kit.github.io/QuEST/group__calculations.html), potentially using [Pauli](https://quest-kit.github.io/QuEST/group__paulis.html) observables.
-7. [Report](https://quest-kit.github.io/QuEST/group__types.html) or log the results to file.
-8. Destroy any heap-allocated [`Qureg`](https://quest-kit.github.io/QuEST/group__qureg__destroy.html) or [matrices](https://quest-kit.github.io/QuEST/group__matrices__destroy.html).
-8. [Finalise](https://quest-kit.github.io/QuEST/group__environment.html#ga428faad4d68abab20f662273fff27e39) the QuEST environment.
-
-Of course, the procedure is limited only by the programmers imagination `¯\_(ツ)_/¯` Let's see an example of these steps below.
-
-
-## 1. Initialise the environment
-
-Before calling any other QuEST functions, we must [_initialise_](https://quest-kit.github.io/QuEST/group__environment.html#gab89cfc1bf94265f4503d504b02cf54d4) the QuEST [_environment_](https://quest-kit.github.io/QuEST/group__environment.html).
-```C++
-initQuESTEnv();
-```
-This does several things, such as
-- assessing which hardware accelerations (multithreading, GPU-acceleration, distribution, cuQuantum) were compiled and are currently available to use.
-- initialising any external libraries as needed, like MPI, CUDA and cuQuantum.
-- seeding the random number generators (informing measurements and random states), using a [CSPRNG](https://en.wikipedia.org/wiki/Cryptographically_secure_pseudorandom_number_generator) if available.
-
-We could instead forcefully [disable](https://quest-kit.github.io/QuEST/group__environment.html#ga485268e52f838743357e7a4c8c241e57) certain hardware accelerations
-```C++
-int useMPI = 0;
-int useGPU = 0;
-int useOMP = 0;
-initCustomQuESTEnv(useMPI, useGPU, useOMP);
-```
-
-> [!TIP]
-> We recommend enabling _all_ deployments, as automated by `initQuESTEnv()`, which
-> permits QuEST to choose how to best accelerate subsequently created `Qureg`.
-
-We can [view](https://quest-kit.github.io/QuEST/group__environment.html#ga08bf98478c4bf21b0759fa7cd4a97496) the environment configuration at runtime, via
-```C++
-reportQuESTEnv();
-```
-which might output something like
-```
-QuEST execution environment:
-  [precision]
-    qreal.................double (8 bytes)
-    qcomp.................std::__1::complex<double> (16 bytes)
-    qindex................long long int (8 bytes)
-    validationEpsilon.....1e-12
-  [compilation]
-    isMpiCompiled...........1
-    isGpuCompiled...........1
-    isOmpCompiled...........1
-    isCuQuantumCompiled.....0
-  [deployment]
-    isMpiEnabled.....0
-    isGpuEnabled.....1
-    isOmpEnabled.....1
-  [cpu]
-    numCpuCores.......10 per machine
-    numOmpProcs.......10 per machine
-    numOmpThrds.......8 per node
-    cpuMemory.........32 GiB per node
-    cpuMemoryFree.....7.1 GiB per node
-  [gpu]
-    numGpus...........1
-    gpuDirect.........1
-    gpuMemPools.......1
-    gpuMemory.........15.9 GiB per node
-    gpuMemoryFree.....15.2 GiB per node
-    gpuCache..........1 GiB
-  [distribution]
-    isMpiGpuAware.....0
-    numMpiNodes.......8
-  [statevector limits]
-    minQubitsForMpi.............3
-    maxQubitsForCpu.............30
-    maxQubitsForGpu.............29
-    maxQubitsForMpiCpu..........35
-    maxQubitsForMpiGpu..........34
-    maxQubitsForMemOverflow.....59
-    maxQubitsForIndOverflow.....63
-  [density matrix limits]
-    minQubitsForMpi.............2
-    maxQubitsForCpu.............15
-    maxQubitsForGpu.............14
-    maxQubitsForMpiCpu..........17
-    maxQubitsForMpiGpu..........16
-    maxQubitsForMemOverflow.....29
-    maxQubitsForIndOverflow.....31
-  [statevector autodeployment]
-    8 qubits.....[omp]
-    12 qubits....[gpu]
-    29 qubits....[gpu] [mpi]
-  [density matrix autodeployment]
-    4 qubits.....[omp]
-    6 qubits.....[gpu]
-    15 qubits....[gpu] [mpi]
-```
-
-We can also [obtain](https://quest-kit.github.io/QuEST/group__environment.html#ga6b9e84b462a999a1fbb9a372f990c491) some of the environment information [programmatically](https://quest-kit.github.io/QuEST/structQuESTEnv.html)
-```C++
-QuESTEnv env = getQuESTEnv();
-
-if (env.isGpuAccelerated)
-   printf("vroom vroom");
-```
-
-
-## 2. Configure the environment
-
-Configuring the environment is ordinarily not necessary, but convenient in certain applications.
-
-For example, we may wish our simulations to deterministically obtain the same measurement outcomes and random states as a previous or future run, and ergo choose to [override](https://quest-kit.github.io/QuEST/group__debug__seed.html#ga9e3a6de413901afbf50690573add1587) the default seeds.
-```C++
-unsigned seeds[] = {123u, 1u << 10};
-setSeeds(seeds, 2);
-```
-
-We may wish further to [adjust](https://quest-kit.github.io/QuEST/group__debug__reporting.html) how subsequent functions will display information to the screen
-```C++
-int maxRows = 8;
-int maxCols = 4;
-setMaxNumReportedItems(maxRows, maxCols);
-setMaxNumReportedSigFigs(3);
-```
-or [add](https://quest-kit.github.io/QuEST/group__debug__reporting.html#ga29413703d609254244d6b13c663e6e06) extra spacing between QuEST's printed outputs
-```C++
-setNumReportedNewlines(3);
-```
-
-Perhaps we also wish to relax the [precision](https://quest-kit.github.io/QuEST/group__debug__validation.html#gae395568df6def76045ec1881fcb4e6d1) with which our future inputs will be asserted unitary or Hermitian
-```C++
-setValidationEpsilon(0.001);
-```
-but when unitarity _is_ violated, or we otherwise pass an invalid input, we wish to execute a [custom function](https://quest-kit.github.io/QuEST/group__debug__validation.html#ga14b6e7ce08465e36750da3acbc41062f) before exiting.
-```C++
-#include <stdlib.h>
-
-void myErrorHandler(const char *func, const char *msg) {
-    printf("QuEST function '%s' encountered error '%s'\n", func, msg);
-    printf("Exiting...\n");
-    exit(1);
-}
-
-setInputErrorHandler(myErrorHandler);
-```
-`C++` users may prefer to throw an exception which can be caught, safely permitting execution to continue. In such cases, the erroneous function will _never_ corrupt any passed inputs like `Qureg` nor matrices, nor cause leaks.
-```C++
-#include <stdexcept>
-#include <string>
-
-void myErrorHandlerA(const char* errFunc, const char* errMsg) {
-    std::string func(errFunc);
-    std::string msg(errMsg);
-    throw std::runtime_error(func + ": " + msg);
-}
-
-setInputErrorHandler(myErrorHandler);
-```
-
-## 3. Create a `Qureg`
-
-To [create](https://quest-kit.github.io/QuEST/group__qureg__create.html) a statevector of `10` qubits, we call
-```C++
-Qureg qureg = createQureg(10);
-```
-which we can [verify](https://quest-kit.github.io/QuEST/group__qureg__report.html#ga2a9df2538e537332b1aef8596ce337b2) has begun in the very boring zero state.
-```C++
-reportQureg(qureg);
-```
-```
-Qureg (10 qubit statevector, 1024 qcomps, 16.1 KiB):
-    1  |0⟩
-    0  |1⟩
-    0  |2⟩
-    0  |3⟩
-    ⋮
-    0  |1020⟩
-    0  |1021⟩
-    0  |1022⟩
-    0  |1023⟩
-```
-> This printed only `8` amplitudes as per our setting of [`setMaxNumReportedItems()`](https://quest-kit.github.io/QuEST/group__debug__reporting.html#ga093c985b1970a0fd8616c01b9825979a) above.
-
-Behind the scenes, the function `createQureg` did something clever; it consulted the compiled deployments and available hardware to decide whether to distribute `qureg`, or dedicate it persistent GPU memory, and marked whether or not to multithread its subsequent modification. It attempts to choose _optimally_, avoiding gratuitous parallelisation if the overheads outweigh the benefits, or if the hardware devices have insufficient memory.
-
-We call this **_auto-deployment_**, and the chosen configuration can be [previewed](https://quest-kit.github.io/QuEST/group__qureg__report.html#ga97d96af7c7ea7b31e32cbe3b25377e09) via
-```C++
-reportQuregParams(qureg);
-```
-```
-Qureg:
-  [deployment]
-    isMpiEnabled.....0
-    isGpuEnabled.....0
-    isOmpEnabled.....1
-  [dimension]
-    isDensMatr.....0
-    numQubits......10
-    numCols........N/A
-    numAmps........2^10 = 1024
-  [distribution]
-    numNodes.....N/A
-    numCols......N/A
-    numAmps......N/A
-  [memory]
-    cpuAmps...........16 KiB
-    gpuAmps...........N/A
-    cpuCommBuffer.....N/A
-    gpuCommBuffer.....N/A
-    globalTotal.......16 KiB
-```
-The above output informs us that the `qureg` has not been distributed nor GPU-accelerated, but _will_ be multithreaded.
-
-If we so wished, we could [_force_](https://quest-kit.github.io/QuEST/group__qureg__create.html#ga619bbba1cbc2f7f9bbf3d3b86b3f02be) the use of all deployments available to the environment
-```C++
-Qureg qureg = createForcedQureg(10);
-reportQuregParams(qureg);
-```
-```
-Qureg:
-  [deployment]
-    isMpiEnabled.....1
-    isGpuEnabled.....1
-    isOmpEnabled.....1
-  [dimension]
-    isDensMatr.....0
-    numQubits......10
-    numCols........N/A
-    numAmps........2^10 = 1024
-  [distribution]
-    numNodes.....2^3 = 8
-    numCols......N/A
-    numAmps......2^7 = 128 per node
-  [memory]
-    cpuAmps...........2 KiB per node
-    gpuAmps...........2 KiB per node
-    cpuCommBuffer.....2 KiB per node
-    gpuCommBuffer.....2 KiB per node
-    globalTotal.......64 KiB
-```
-or [select](https://quest-kit.github.io/QuEST/group__qureg__create.html#ga849971f43e246d103da1731d0901f2e6) specific deployments
-```C++
-int useMPI = 1;
-int useGPU = 0;
-int useOMP = 0;
-Qureg qureg = createCustomQureg(10, 0, useMPI, useGPU, useOMP);
-```
-
-In lieu of a statevector, we could create a [density matrix](https://quest-kit.github.io/QuEST/group__qureg__create.html#ga1470424b0836ae18b5baab210aedf5d9)
-```C++
-Qureg qureg = createDensityQureg(10);
-```
-which is also auto-deployed. Note this contains _square_ as many amplitudes as the equal-dimensin statevector, and ergo requires _square_ as much memory.
-```C++
-reportQureg(qureg);
-reportQuregParams(qureg);
-```
-```
-Qureg (10 qubit density matrix, 1024x1024 qcomps, 16 MiB):
-    1  0  …  0  0
-    0  0  …  0  0
-    0  0  …  0  0
-    0  0  …  0  0
-    ⋮
-    0  0  …  0  0
-    0  0  …  0  0
-    0  0  …  0  0
-    0  0  …  0  0
-
-
-Qureg:
-  ...
-  [dimension]
-    isDensMatr.....1
-    numQubits......10
-    numCols........2^10 = 1024
-    numAmps........2^20 = 1048576
-  ...
-  [memory]
-    cpuAmps...........16 MiB
-    ...
-    globalTotal.......16 MiB
-```
-
-> The spacing between the outputs of those two consecutive QuEST functions was determined by our earlier call to [`setMaxNumReportedSigFigs()`](https://quest-kit.github.io/QuEST/group__debug__reporting.html#ga29413703d609254244d6b13c663e6e06).
-
-
-A density matrix `Qureg` can model classical uncertainty as results from [decoherence](https://quest-kit.github.io/QuEST/group__decoherence.html), and proves useful when simulating quantum operations on a noisy quantum computer.
-
-
-## 4. Prepare an initial state
-
-In lieu of manually [modifying](https://quest-kit.github.io/QuEST/group__init__amps.html) the state amplitudes, QuEST includes functions to prepare a `Qureg` in some common [initial states](https://quest-kit.github.io/QuEST/group__init__states.html)
-
-```C++
-initZeroState(qureg);         // |0> or |0><0|
-initPlusState(qureg);         // |+> or |+><+|
-initClassicalState(qureg, i); // |i> or |i><i|
-initPureState(rho, psi);      // rho = |psi><psi|
-```
-or random states
-```C++
-initRandomPureState(psi);
-
-int numPureStates = 15;
-initRandomMixedState(rho, numPureStates);
-
-reportQureg(psi);
-reportQureg(rho);
-```
-```
-Qureg (5 qubit statevector, 32 qcomps, 616 bytes):
-    0.0884-0.164i     |0⟩
-    0.149+0.207i      |1⟩
-    0.232+0.0656i     |2⟩
-    -0.0435+0.0332i   |3⟩
-            ⋮
-    -0.108-0.0431i    |28⟩
-    -0.0161-0.121i    |29⟩
-    -0.0463+0.00341i  |30⟩
-    -0.0491-0.186i    |31⟩
-
-
-Qureg (5 qubit density matrix, 32x32 qcomps, 16.1 KiB):
-    0.0256+(1.08e-19)i  -0.000876+0.00412i  …  0.000912+0.00869i   -0.00597+0.00615i
-    -0.000876-0.00412i  0.033-(6.78e-20)i   …  0.000223+0.00369i   -0.00207+0.00451i
-    -0.00443-0.00871i   0.0155-0.000843i    …  0.00375+0.00669i    (8.5e-5)-0.000851i
-    0.00287-0.00397i    0.00637-0.000315i   …  0.00486+0.00218i    0.00268+0.0053i
-             ⋮
-    -0.00385-0.000732i  0.00965+0.00542i    …  0.00162-0.0112i     0.00404+0.00685i
-    0.00491+0.00245i    -0.000319+0.0021i   …  -0.00902-0.00312i   -0.00465+0.00275i
-    0.000912-0.00869i   0.000223-0.00369i   …  0.0183+(1.32e-19)i  0.000509+0.00401i
-    -0.00597-0.00615i   -0.00207-0.00451i   …  0.000509-0.00401i   0.0173+(3.12e-19)i
-```
-
-> The number of printed significant figures above results from our earlier calling of [`setMaxNumReportedSigFigs()`](https://quest-kit.github.io/QuEST/group__debug__reporting.html#ga15d46e5d813f70b587762814964e1994).
-
-
-## 5. Apply operators
-
-> TODO
\ No newline at end of file
+The above folders contain example `C` and `C++` files which use QuEST's [API](https://quest-kit.github.io/QuEST/group__api.html), helping illustrate how to use specific functions. Instructions for compiling and running them are given in [`compile.md`](/docs/compile.md#tests) and [`launch.md`](/docs/launch.md#tests) respectively.
+<!-- @todo the above links would fail Doxygen, which does not recognise the #section syntax.
+     no problem here however because doxygen fails to render this page altogether -->
diff --git a/quest/include/channels.h b/quest/include/channels.h
index 5dd62cc13..b8c36799b 100644
--- a/quest/include/channels.h
+++ b/quest/include/channels.h
@@ -335,22 +335,38 @@ extern "C" {
     // C then overloads setKrausMap() to call the above VLA when given arrays, using C11 Generics.
     // See the doc of getCompMatr1() in matrices.h for an explanation of Generic, and its nuances.
 
-    /// @ingroup channels_setters
-    /// @notdoced
+    /// @neverdoced
     #define setKrausMap(map, ...) \
         _Generic((__VA_ARGS__), \
             qcomp*** : setKrausMap, \
             default  : _setKrausMapFromArr \
         )((map), (__VA_ARGS__))
 
-    /// @ingroup channels_setters
-    /// @notdoced
+    /// @neverdoced
     #define setSuperOp(op, ...) \
         _Generic((__VA_ARGS__), \
             qcomp** : setSuperOp, \
             default : _setSuperOpFromArr \
         )((op), (__VA_ARGS__))
 
+    // spoofing macros as functions
+    #if 0
+
+        /// @ingroup channels_setters
+        /// @notdoced
+        /// @conly
+        /// @macrodoc
+        void setKrausMap(KrausMap map, qcomp matrices[map.numMatrices][map.numRows][map.numRows]);
+
+        /// @ingroup channels_setters
+        /// @notdoced
+        /// @conly
+        /// @macrodoc
+        void setSuperOp(SuperOp op, qcomp matrix[op.numRows][op.numRows]);
+
+    #endif
+
+
 #else
 
     // MSVC's C11 does not support C99 VLAs, so there is no way to support _setKrausMapFromArr(),
@@ -383,11 +399,13 @@ extern "C" {
 
     /// @ingroup channels_setters
     /// @notdoced
+    /// @cpponly
     void setInlineKrausMap(KrausMap map, int numQb, int numOps, std::vector<std::vector<std::vector<qcomp>>> matrices);
 
 
     /// @ingroup channels_setters
     /// @notdoced
+    /// @cpponly
     void setInlineSuperOp(SuperOp op, int numQb, std::vector<std::vector<qcomp>> matrix);
 
 
@@ -422,17 +440,30 @@ extern "C" {
     }
 
 
-    /// @ingroup channels_setters
-    /// @notdoced
+    /// @neverdoced
     #define setInlineKrausMap(map, numQb, numOps, ...) \
         _setInlineKrausMap((map), (numQb), (numOps), (qcomp[(numOps)][1<<(numQb)][1<<(numQb)]) __VA_ARGS__)
 
 
-    /// @ingroup channels_setters
-    /// @notdoced
+    /// @neverdoced
     #define setInlineSuperOp(matr, numQb, ...) \
         _setInlineSuperOp((matr), (numQb), (qcomp[1<<(2*(numQb))][1<<(2*(numQb))]) __VA_ARGS__)
 
+    // spoofing macros as functions
+    #if 0
+
+        /// @ingroup channels_setters
+        /// @notdoced
+        /// @macrodoc
+        void setInlineKrausMap(KrausMap map, int numQb, int numOps, {{{ matrices }}});
+
+        /// @ingroup channels_setters
+        /// @notdoced
+        /// @macrodoc
+        void setInlineSuperOp(SuperOp op, int numQb, {{ matrix }});
+
+    #endif
+
 #else
 
     // MSVC's C11 does not support C99 VLA, so the inner *FromArr() functions have not
@@ -458,11 +489,13 @@ extern "C" {
 
     /// @ingroup channels_create
     /// @notdoced
+    /// @cpponly
     KrausMap createInlineKrausMap(int numQubits, int numOperators, std::vector<std::vector<std::vector<qcomp>>> matrices);
 
 
     /// @ingroup channels_create
     /// @notdoced
+    /// @cpponly
     SuperOp createInlineSuperOp(int numQubits, std::vector<std::vector<qcomp>> matrix);
 
 
@@ -499,17 +532,29 @@ extern "C" {
     }
 
 
-    /// @ingroup channels_create
-    /// @notdoced
+    /// @neverdoced
     #define createInlineKrausMap(numQb, numOps, ...) \
         _createInlineKrausMap((numQb), (numOps), (qcomp[(numOps)][1<<(numQb)][1<<(numQb)]) __VA_ARGS__)
 
-
-    /// @ingroup channels_create
-    /// @notdoced
+    /// @neverdoced
     #define createInlineSuperOp(numQb, ...) \
         _createInlineSuperOp((numQb), (qcomp[1<<(2*(numQb))][1<<(2*(numQb))]) __VA_ARGS__)
 
+    // spoofing macros as functions
+    #if 0
+
+        /// @ingroup channels_create
+        /// @notdoced
+        /// @macrodoc
+        KrausMap createInlineKrausMap(int numQb, int numOps, {{{ matrices }}});
+
+        /// @ingroup channels_create
+        /// @notdoced
+        /// @macrodoc
+        SuperOp createInlineSuperOp(int numQb, {{ matrix }});
+
+    #endif
+
 #else
 
     // MSVC's C11 does not support C99 VLA, so none of the necessary inner functions are defined,
diff --git a/quest/include/matrices.h b/quest/include/matrices.h
index c613a6607..d901623fb 100644
--- a/quest/include/matrices.h
+++ b/quest/include/matrices.h
@@ -471,8 +471,7 @@ static inline CompMatr2 _getCompMatr2FromArr(qcomp in[4][4]) {
     //   e.g. default: _Pragma("GCC error \"arg not allowed\"").
     
 
-    /// @ingroup matrices_getters
-    /// @notdoced
+    /// @neverdoced
     #define getCompMatr1(...) \
         _Generic((__VA_ARGS__), \
             qcomp** : getCompMatr1, \
@@ -480,14 +479,17 @@ static inline CompMatr2 _getCompMatr2FromArr(qcomp in[4][4]) {
         )((__VA_ARGS__))
 
 
-    /// @ingroup matrices_getters
-    /// @notdoced
+    /// @neverdoced
     #define getCompMatr2(...) \
         _Generic((__VA_ARGS__), \
             qcomp** : getCompMatr2, \
             default : _getCompMatr2FromArr \
         )((__VA_ARGS__))
 
+
+    // note the above macros do not need explicit, separate doxygen
+    // doc because the C++ overloads above them have identical signatures
+
 #endif
 
 
@@ -509,27 +511,19 @@ static inline CompMatr2 _getCompMatr2FromArr(qcomp in[4][4]) {
 
     // C++ merely invokes the std::vector initialiser overload
 
-
-    /// @ingroup matrices_getters
-    /// @notdoced
+    /// @neverdoced
     #define getInlineCompMatr1(...) \
         getCompMatr1(__VA_ARGS__)
 
-
-    /// @ingroup matrices_getters
-    /// @notdoced
+    /// @neverdoced
     #define getInlineCompMatr2(...) \
         getCompMatr2(__VA_ARGS__)
 
-
-    /// @ingroup matrices_getters
-    /// @notdoced
+    /// @neverdoced
     #define getInlineDiagMatr1(...) \
         getDiagMatr1(__VA_ARGS__)
 
-
-    /// @ingroup matrices_getters
-    /// @notdoced
+    /// @neverdoced
     #define getInlineDiagMatr2(...) \
         getDiagMatr2(__VA_ARGS__)
 
@@ -538,30 +532,46 @@ static inline CompMatr2 _getCompMatr2FromArr(qcomp in[4][4]) {
     // C adds compound literal syntax to make a temporary array. Helpfully, 
     // explicitly specifying the DiagMatr dimension enables defaulting-to-zero
 
-
-    /// @ingroup matrices_getters
-    /// @notdoced
+    /// @neverdoced
     #define getInlineCompMatr1(...) \
         _getCompMatr1FromArr((qcomp[2][2]) __VA_ARGS__)
 
-
-    /// @ingroup matrices_getters
-    /// @notdoced
+    /// @neverdoced
     #define getInlineCompMatr2(...) \
         _getCompMatr2FromArr((qcomp[4][4]) __VA_ARGS__)
 
+    /// @neverdoced
+    #define getInlineDiagMatr1(...) \
+        getDiagMatr1((qcomp[2]) __VA_ARGS__)
+
+    /// @neverdoced
+    #define getInlineDiagMatr2(...) \
+        getDiagMatr2((qcomp[4]) __VA_ARGS__)
+
+#endif
+
+// spoofing above macros as functions to doc
+#if 0
 
     /// @ingroup matrices_getters
     /// @notdoced
-    #define getInlineDiagMatr1(...) \
-        getDiagMatr1((qcomp[2]) __VA_ARGS__)
+    /// @macrodoc
+    CompMatr1 getInlineCompMatr1({{ matrix }});
 
+    /// @ingroup matrices_getters
+    /// @notdoced
+    /// @macrodoc
+    CompMatr2 getInlineCompMatr2({{ matrix }});
 
     /// @ingroup matrices_getters
     /// @notdoced
-    #define getInlineDiagMatr2(...) \
-        getDiagMatr2((qcomp[4]) __VA_ARGS__)
+    /// @macrodoc
+    DiagMatr1 getInlineDiagMatr1({ list });
 
+    /// @ingroup matrices_getters
+    /// @notdoced
+    /// @macrodoc
+    DiagMatr2 getInlineDiagMatr2({ list });
 
 #endif
 
@@ -749,14 +759,24 @@ extern "C" {
     // See the doc of getCompMatr1() above for an explanation of Generic, and its nuances
 
 
-    /// @ingroup matrices_setters
-    /// @notdoced
+    /// @neverdoced
     #define setCompMatr(matr, ...) \
         _Generic((__VA_ARGS__), \
             qcomp** : setCompMatr, \
             default : _setCompMatrFromArr \
         )((matr), (__VA_ARGS__))
 
+    // spoofing above macro as function to doc
+    #if 0
+
+        /// @ingroup matrices_setters
+        /// @notdoced
+        /// @macrodoc
+        /// @conly
+        void setCompMatr(CompMatr matr, qcomp arr[matr.numRows][matr.numRows]);
+
+    #endif
+
 
     // no need to define bespoke overload for diagonal matrices, because 1D arrays decay to pointers
 
@@ -791,17 +811,20 @@ extern "C" {
 
     /// @ingroup matrices_setters
     /// @notdoced
+    /// @cpponly
     void setInlineCompMatr(CompMatr matr, int numQb, std::vector<std::vector<qcomp>> in);
 
 
     /// @ingroup matrices_setters
     /// @notdoced
+    /// @cpponly
     void setInlineDiagMatr(DiagMatr matr, int numQb, std::vector<qcomp> in);
 
 
     /// @ingroup matrices_setters
     /// @notdoced
     /// @nottested
+    /// @cpponly
     void setInlineFullStateDiagMatr(FullStateDiagMatr matr, qindex startInd, qindex numElems, std::vector<qcomp> in);
 
 
@@ -848,24 +871,40 @@ extern "C" {
     // unexpectedly re-evaluating user expressions due to its repetition in the macro
 
 
-    /// @ingroup matrices_setters
-    /// @notdoced
+    /// @neverdoced
     #define setInlineCompMatr(matr, numQb, ...) \
         _setInlineCompMatr((matr), (numQb), (qcomp[1<<(numQb)][1<<(numQb)]) __VA_ARGS__)
 
-
-    /// @ingroup matrices_setters
-    /// @notdoced
+    /// @neverdoced
     #define setInlineDiagMatr(matr, numQb, ...) \
         _setInlineDiagMatr((matr), (numQb), (qcomp[1<<(numQb)]) __VA_ARGS__)
 
-
-    /// @ingroup matrices_setters
-    /// @notdoced
-    /// @nottested
+    /// @neverdoced
     #define setInlineFullStateDiagMatr(matr, startInd, numElems, ...) \
         _setInlineFullStateDiagMatr((matr), (startInd), (numElems), (qcomp[(numElems)]) __VA_ARGS__)
 
+    // spoofing above macros as functions to doc
+    #if 0
+
+        /// @ingroup matrices_setters
+        /// @notdoced
+        /// @macrodoc
+        void setInlineCompMatr(CompMatr matr, int numQb, {{ matrix }});
+
+        /// @ingroup matrices_setters
+        /// @notdoced
+        /// @macrodoc
+        void setInlineDiagMatr(DiagMatr matr, int numQb, { list });
+
+        /// @ingroup matrices_setters
+        /// @nottested
+        /// @notdoced
+        /// @macrodoc
+        void setInlineFullStateDiagMatr(FullStateDiagMatr matr, qindex startInd, qindex numElems, { list });
+
+    #endif
+
+
 #else
 
     // MSVC C11 does not support C99 VLAs, so the inner functions above are illegal.
@@ -887,8 +926,7 @@ extern "C" {
     extern void _validateParamsToSetInlineFullStateDiagMatr(FullStateDiagMatr matr, qindex startInd, qindex numElems);
 
 
-    /// @ingroup matrices_setters
-    /// @notdoced
+    /// @neverdoced
     #define setInlineDiagMatr(matr, numQb, ...) \
         do { \
             _validateParamsToSetInlineDiagMatr((matr), (numQb)); \
@@ -896,14 +934,16 @@ extern "C" {
         } while (0)
 
 
-    /// @ingroup matrices_setters
-    /// @notdoced
+    /// @neverdoced
     #define setInlineFullStateDiagMatr(matr, startInd, numElems, ...) \
         do { \
             _validateParamsToSetInlineFullStateDiagMatr((matr), (startInd), (numElems)); \
             setFullStateDiagMatr((matr), (startInd), (elems), (numElems)); \
         } while (0)
 
+    
+    // the above macros are documented in the previous #if branch
+
 #endif
 
 
@@ -930,11 +970,13 @@ extern "C" {
 
     /// @ingroup matrices_create
     /// @notdoced
+    /// @cpponly
     CompMatr createInlineCompMatr(int numQb, std::vector<std::vector<qcomp>> elems);
 
 
     /// @ingroup matrices_create
     /// @notdoced
+    /// @cpponly
     DiagMatr createInlineDiagMatr(int numQb, std::vector<qcomp> elems);
 
 
@@ -971,17 +1013,29 @@ extern "C" {
     }
 
 
-    /// @ingroup matrices_create
-    /// @notdoced
+    /// @neverdoced
     #define createInlineCompMatr(numQb, ...) \
         _createInlineCompMatr((numQb), (qcomp[1<<(numQb)][1<<(numQb)]) __VA_ARGS__)
 
-    
-    /// @ingroup matrices_create
-    /// @notdoced
+    /// @neverdoced
     #define createInlineDiagMatr(numQb, ...) \
         _createInlineDiagMatr((numQb), (qcomp[1<<(numQb)]) __VA_ARGS__)
 
+    // spoofing above macros as functions to doc
+    #if 0
+
+        /// @ingroup matrices_create
+        /// @notdoced
+        /// @macrodoc
+        CompMatr createInlineCompMatr(int numQb, {{ matrix }});
+
+        /// @ingroup matrices_create
+        /// @notdoced
+        /// @macrodoc
+        DiagMatr createInlineDiagMatr(int numQb, { list });
+
+    #endif
+
 #else
 
     // MSVC's C11 does not support C99 VLA, so we cannot use the above inner functions.
@@ -1002,6 +1056,11 @@ extern "C" {
 #endif
 
 
+    /// @todo
+    /// add std::vector<int> overloads for C++ users for the
+    /// below functions (missed during original overload work)
+
+
     /// @ingroup matrices_setters
     /// @notdoced
     /// @nottested
diff --git a/quest/include/modes.h b/quest/include/modes.h
index 49c5f8d43..61eaf77b4 100644
--- a/quest/include/modes.h
+++ b/quest/include/modes.h
@@ -89,6 +89,27 @@
 
 // further macros are defined in precision.h
 
+// spoofing above macros as consts to doc
+#if 0
+
+
+    /// @notdoced
+    /// @macrodoc
+    const int PERMIT_NODES_TO_SHARE_GPU = 0;
+
+
+    /// @notdoced
+    /// @macrodoc
+    const int INCLUDE_DEPRECATED_FUNCTIONS = 0;
+
+
+    /// @notdoced
+    /// @macrodoc
+    const int DISABLE_DEPRECATION_WARNINGS = 0;
+
+
+#endif
+
 
 
 // user flags for choosing automatic deployment; only accessible by C++ 
diff --git a/quest/include/operations.h b/quest/include/operations.h
index 0f2c0b8e0..ee5506df7 100644
--- a/quest/include/operations.h
+++ b/quest/include/operations.h
@@ -979,7 +979,7 @@ void applyMultiStateControlledPhaseGadget(Qureg qureg, int* controls, int* state
 
 
 /// @notdoced
-void applyPhaseFlip (Qureg qureg, int target);
+void applyPhaseFlip(Qureg qureg, int target);
 
 
 /// @notdoced
@@ -1047,7 +1047,7 @@ void applyTwoQubitPhaseShift(Qureg qureg, int target1, int target2, qreal angle)
 
 
 /// @notdoced
-void applyMultiQubitPhaseFlip (Qureg qureg, int* targets, int numTargets);
+void applyMultiQubitPhaseFlip(Qureg qureg, int* targets, int numTargets);
 
 
 /// @notdoced
diff --git a/quest/include/paulis.h b/quest/include/paulis.h
index 09b4220ae..a04ed39eb 100644
--- a/quest/include/paulis.h
+++ b/quest/include/paulis.h
@@ -101,9 +101,23 @@ typedef struct {
  */
 
 
+// base method is C and C++ compatible
 #ifdef __cplusplus
+extern "C" {
+#endif
 
-    // C++ users can access the base C method, along with direct overloads 
+    /// @ingroup paulis_create
+    /// @notdoced
+    PauliStr getPauliStr(const char* paulis, int* indices, int numPaulis);
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#ifdef __cplusplus
+
+    // C++ users can access the above C method, along with direct overloads 
     // to accept integers (in lieu of chars), natural C++ string types
     // (like literals), and C++ vector types for brevity. Furthermore, C++
     // gets an overload which accepts only a string (no additional args)
@@ -113,11 +127,6 @@ typedef struct {
     // {0,3,1} are valid std::string instances, causing overload ambiguity. Blegh!
 
 
-    /// @ingroup paulis_create
-    /// @notdoced
-    extern "C" PauliStr getPauliStr(const char* paulis, int* indices, int numPaulis);
-
-
     /// @ingroup paulis_create
     /// @notdoced
     PauliStr getPauliStr(int* paulis, int* indices, int numPaulis);
@@ -141,8 +150,9 @@ typedef struct {
     PauliStr getPauliStr(std::string paulis);
 
 
-    /// @ingroup paulis_create
-    /// @notdoced
+    // never needs to be doc'd
+    /// @private
+    /// @neverdoced
     #define getInlinePauliStr(str, ...) \
         getPauliStr(str, __VA_ARGS__)
 
@@ -157,18 +167,13 @@ typedef struct {
     // many elements as claimed, avoiding seg-faults if the user provides too few indices 
 
 
-    /// @ingroup paulis_create
-    /// @notdoced
-    PauliStr getPauliStr(const char* paulis, int* indices, int numPaulis);
-
-
     /// @ingroup paulis_create
     /// @private
     PauliStr _getPauliStrFromInts(int* paulis, int* indices, int numPaulis);
 
 
-    /// @ingroup paulis_create
-    /// @notdoced
+    // documented above (identical signatures to C)
+    /// @neverdoced
     #define getPauliStr(paulis, ...) \
         _Generic((paulis), \
             int*    : _getPauliStrFromInts, \
@@ -176,11 +181,21 @@ typedef struct {
         )(paulis, __VA_ARGS__) 
 
 
-    /// @ingroup paulis_create
-    /// @notdoced
+    // documented below
+    /// @neverdoced
     #define getInlinePauliStr(str, ...) \
         getPauliStr((str), (int[sizeof(str)-1]) __VA_ARGS__, sizeof(str)-1)
 
+    // spoofing above macro as function to doc
+    #if 0
+
+        /// @ingroup paulis_create
+        /// @notdoced
+        /// @macrodoc
+        PauliStr getInlinePauliStr(const char* paulis, { list });
+
+    #endif
+
 
 #endif
 
@@ -249,6 +264,7 @@ extern "C" {
     /// @cpponly
     PauliStrSum createPauliStrSumFromReversedFile(std::string fn);
 
+
 #endif
 
 
@@ -268,6 +284,7 @@ extern "C" {
     /// @notdoced
     void destroyPauliStrSum(PauliStrSum sum);
 
+
 // end de-mangler
 #ifdef __cplusplus
 }
diff --git a/quest/include/precision.h b/quest/include/precision.h
index b39a18fad..b4540b3fc 100644
--- a/quest/include/precision.h
+++ b/quest/include/precision.h
@@ -30,8 +30,20 @@
 // benefit in shrinking the type size and facing the associated precision risks. Similarly,
 // there is little benefit in making it larger since a 'long long int' can represent 62 qubits,
 // which is already well beyond simulability, requiring 64 EiB total at double precision.
+// Still, we use a #define, rather than a typedef, so that the type can be overridden at compile time.
+
+/// @neverdoced
 #define INDEX_TYPE long long int
 
+// spoofing above macro as typedef to doc
+#if 0
+
+    /// @notdoced
+    /// @macrodoc
+    typedef long long int INDEX_TYPE;
+
+#endif
+
 
 
 /*
@@ -45,8 +57,19 @@
 // base-4 numeral encoding the Pauli string. A single 64-bit 'long long unsigned' can ergo
 // specify only 32 qubits, whereas two can specify more qubits (64) than we can simulate.
 // This type is defined purely to avoid littering the source with explicit typing.
+
+/// @neverdoced
 #define PAULI_MASK_TYPE long long unsigned int
 
+// spoofing above macro as typedef to doc
+#if 0
+
+    /// @notdoced
+    /// @macrodoc
+    typedef long long unsigned int PAULI_MASK_TYPE;
+
+#endif
+
 
 
 /*
@@ -61,7 +84,7 @@
 // validate precision is 1 (float), 2 (double) or 4 (long double)
 #if ! (FLOAT_PRECISION == 1 || FLOAT_PRECISION == 2 || FLOAT_PRECISION == 4)
     #error "FLOAT_PRECISION must be 1 (float), 2 (double) or 4 (long double)"
-#endif 
+#endif
 
 // infer floating-point type from precision
 #if FLOAT_PRECISION == 1
@@ -72,6 +95,19 @@
     #define FLOAT_TYPE long double
 #endif
 
+// spoofing above macros as typedefs and consts to doc
+#if 0
+
+    /// @notdoced
+    /// @macrodoc
+    const int FLOAT_PRECISION = 2;
+
+    /// @notdoced
+    /// @macrodoc
+    typedef double FLOAT_TYPE;
+
+#endif
+
 
 
 /*
@@ -107,6 +143,15 @@
 
 #endif
 
+// spoofing above macro as const to doc
+#if 0
+
+    /// @notdoced
+    /// @macrodoc
+    const qreal DEFAULT_VALIDATION_EPSILON = 1E-12;
+
+#endif
+
 
 
 /*
@@ -124,6 +169,15 @@
     
 #endif
 
+// spoofing above macro as const to doc
+#if 0
+
+    /// @notdoced
+    /// @macrodoc
+    const char* QREAL_FORMAT_SPECIFIER = "%.14g";
+
+#endif
+
 
 
 #endif // PRECISION_H
diff --git a/quest/include/quest.h b/quest/include/quest.h
index a9e1632e5..e3fde6756 100644
--- a/quest/include/quest.h
+++ b/quest/include/quest.h
@@ -11,6 +11,21 @@
  * @defgroup api 📋 API
  */
 
+/**
+ * @page apilink 📋 API
+ * The API documentation can be viewed at @ref api.
+ * 
+ * We're working hard to move that page up one level. 😎
+ */
+
+/**
+ * @page testlink 🧪 Tests
+ * 
+ * The unit and integration tests can be viewed at @ref tests.
+ * 
+ * We're working hard to move that page up one level. 😎
+ */
+
 #ifndef QUEST_H
 #define QUEST_H
 
diff --git a/quest/include/types.h b/quest/include/types.h
index a04350982..b7ea5550e 100644
--- a/quest/include/types.h
+++ b/quest/include/types.h
@@ -145,8 +145,6 @@ static inline qcomp getQcomp(qreal re, qreal im) {
 // C11 arithmetic is already defined in complex header, and beautifully
 // permits mixing of parameterised types and precisions
 
-/// @cond EXCLUDE_FROM_DOXYGEN
-
 #ifdef __cplusplus
 
     // <complex> defines overloads between complex and same-precision floats,
@@ -168,8 +166,20 @@ static inline qcomp getQcomp(qreal re, qreal im) {
     #define DEFINE_ARITHMETIC_OVERLOADS 1
     #endif
 
+    // spoofing above macro as const to doc
+    #if 0
+
+        /// @notdoced
+        /// @macrodoc
+        const int DEFINE_ARITHMETIC_OVERLOADS = 1;
+
+    #endif
+
+
     #if DEFINE_ARITHMETIC_OVERLOADS
 
+    /// @cond EXCLUDE_FROM_DOXYGEN
+
     // shortcuts for below overload definitions
     #define COMP_TO_QCOMP(a) \
         qcomp( \
@@ -262,12 +272,12 @@ static inline qcomp getQcomp(qreal re, qreal im) {
     #undef DEFINE_ARITHMETIC_BETWEEN_COMPLEX_AND_COMPLEX
     #undef DEFINE_SINGLE_DIRECTION_ARITHMETIC_BETWEEN_COMPLEX_AND_COMPLEX
 
+    /// @endcond // EXCLUDE_FROM_DOXYGEN
+
     #endif // DEFINE_ARITHMETIC_OVERLOADS
 
 #endif
 
-/// @endcond // EXCLUDE_FROM_DOXYGEN
-
 
 
 /*
@@ -288,24 +298,32 @@ static inline qcomp getQcomp(qreal re, qreal im) {
     /// @nottested
     extern "C" void reportStr(const char* str);
 
+
     /// @notdoced
     /// @nottested
+    /// @cpponly
     void reportStr(std::string str);
 
+
     /// @notdoced
     /// @nottested
     extern "C" void reportScalar(const char* label, qcomp num);
 
+
     /// @notdoced
     /// @nottested
     void reportScalar(const char* label, qreal num);
 
+
     /// @notdoced
     /// @nottested
+    /// @cpponly
     void reportScalar(std::string label, qcomp num);
 
+
     /// @notdoced
     /// @nottested
+    /// @cpponly
     void reportScalar(std::string label, qreal num);
 
 #else
@@ -314,15 +332,18 @@ static inline qcomp getQcomp(qreal re, qreal im) {
     /// @nottested
     void reportStr(const char* str);
 
+
     /// @notdoced
     /// @nottested
-    void reportScalar      (const char* label, qcomp num);
+    void reportScalar(const char* label, qcomp num);
+
 
     /// @private
     void _reportScalar_real(const char* label, qreal num);
 
-    /// @notdoced
-    /// @nottested
+
+    // no need to be doc'd since signatures identical to C++ above
+    /// @neverdoced
     #define reportScalar(label, num) \
         _Generic((num), \
             qcomp   : reportScalar,       \
diff --git a/tests/README.md b/tests/README.md
new file mode 100644
index 000000000..adee0a8ab
--- /dev/null
+++ b/tests/README.md
@@ -0,0 +1,47 @@
+<!--
+  Tests
+  
+  @author Tyson Jones
+-->
+
+# 🧪  Tests
+
+This folder contains QuEST's extensive tests. See [`compile.md`](/docs/compile.md#tests) and [`launch.md`](/docs/launch.md#tests) to get them running.
+
+The subdirectories are:
+- [`utils`](utils/) containing test utilities, including non-optimised functions against which QuEST output is compared.
+- [`unit`](unit/) containing [unit tests](https://en.wikipedia.org/wiki/Unit_testing) which test individual QuEST functions in isolation, under their entire input domains (where feasible).
+- [`integration`](integration/) containing [integration tests](https://en.wikipedia.org/wiki/Integration_testing) which test multiple QuEST functions working at scale.
+- [`deprecated`](deprecated/) containing `v3`'s tests and utilities, only used when explicitly [activated](/docs/compile.md#v3).
+
+The tests use [Catch2](https://github.com/catchorg/Catch2) and are generally structured as
+```cpp
+TEST_CASE( "someApiFunc", "[funcs]" ) {
+
+    PREPARE_TEST(...)
+
+    SECTION( "correctness" ) {
+
+        SECTION( "statevector" ) {
+
+            auto a = getApiResult();
+            auto b = getReferenceResult();
+            REQUIRE_AGREE(a, b);
+        }
+
+        SECTION( "density matrix" ) {
+
+            auto a = getApiResult();
+            auto b = getReferenceResult();
+        }
+    }
+
+    SECTION( "validation" ) {
+
+        SECTION( "some way to mess it up" ) {
+
+            REQUIRE_THROWS( someApiFunc(badArgs) );
+        }
+    }
+}
+```
diff --git a/tests/main.cpp b/tests/main.cpp
index 8648cf446..190b58ef1 100644
--- a/tests/main.cpp
+++ b/tests/main.cpp
@@ -3,7 +3,7 @@
  *
  * @author Tyson Jones
  * 
- * @defgroup tests 🔧 Tests
+ * @defgroup tests 🧪 Tests
  * 
  * @defgroup testutils Utilities
  * @ingroup tests
diff --git a/tests/utils/macros.hpp b/tests/utils/macros.hpp
index 2a1cc28f5..7d5882051 100644
--- a/tests/utils/macros.hpp
+++ b/tests/utils/macros.hpp
@@ -47,6 +47,23 @@
 #define TEST_NUM_MIXED_DEPLOYMENT_REPETITIONS 10
 #endif
 
+// spoofing above macros as consts to doc
+#if 0
+
+    /// @macrodoc
+    const int TEST_MAX_NUM_QUBIT_PERMUTATIONS = 0;
+
+    /// @macrodoc
+    const int TEST_MAX_NUM_SUPEROP_TARGETS = 4;
+
+    /// @macrodoc
+    const int TEST_ALL_DEPLOYMENTS = 1;
+
+    /// @macrodoc
+    const int TEST_NUM_MIXED_DEPLOYMENT_REPETITIONS = 10;
+
+#endif
+
 
 /*
  * preconditions to the internal unit testing functions are checked using 
diff --git a/utils/docs/Doxyfile b/utils/docs/Doxyfile
index 435c03e8f..4a721e1c8 100644
--- a/utils/docs/Doxyfile
+++ b/utils/docs/Doxyfile
@@ -299,6 +299,9 @@ ALIASES += "nottested=@warning This function has not yet been unit tested and ma
 ALIASES += "notvalidated=@attention This function's input validation has not yet been tested, so erroneous usage may produce unexpected output. Please use with caution!"
 ALIASES += "notdoced=@note Documentation for this function or struct is under construction!"
 ALIASES += "cpponly=@remark This function is only available in C++."
+ALIASES += "conly=@remark This function is only available in C."
+ALIASES += "macrodoc=@note This entity is actually a macro."
+ALIASES += "neverdoced=@warning This entity is a macro, undocumented directly due to a Doxygen limitation. If you see this doc rendered, contact the devs!"
 ALIASES += "myexample=@par Example"
 ALIASES += "equivalence=@par Equivalences"
 ALIASES += "constraints=@par Constraints"
@@ -1013,13 +1016,31 @@ WARN_LOGFILE           =
 # Note: If this tag is empty the current directory is searched.
 
 # note we include both the include/ headers and corresponding src/api files,
-# so that doxygen can find the definitions when INLINE_SOURCES = YES
-# note too that we currently exclude the /docs and /examples folders, which
-# override the automatic expansions of the README.md sections into separate
-# pages under
+# so that doxygen can find the definitions when INLINE_SOURCES = YES.
+# we also explicitly list each file (rather than just the directory) where we
+# want to decide the ordering of the generated doxygen pages, and THEN include
+# the entire directory (in case new files are later added therein).
+# @todo /examples/ are being ignored below for some reason!
 INPUT                  = . \
-                         ./quest/include ./quest/src/api \
-                         ./tests ./tests/utils ./tests/unit ./tests/integration ./tests/deprecated
+                         ./docs/v4.md \
+                         ./docs/tutorial.md \
+                         ./docs/compile.md \
+                         ./docs/compilers.md \
+                         ./docs/cmake.md \
+                         ./docs/launch.md \
+                         ./docs/contributing.md \
+                         ./docs/architecture.md \
+                         ./docs/styleguide.md \
+                         ./docs \
+                         ./examples/README.md \
+                         ./examples \
+                         ./quest/include \
+                         ./quest/src/api \
+                         ./tests \
+                         ./tests/utils \
+                         ./tests/unit \
+                         ./tests/integration \
+                         ./tests/deprecated
 
 # This tag can be used to specify the character encoding of the source files
 # that doxygen parses. Internally doxygen uses the UTF-8 encoding. Doxygen uses