Merged
2 changes: 2 additions & 0 deletions Cargo.lock


1 change: 1 addition & 0 deletions examples/README.md
@@ -4,3 +4,4 @@ Examples contained in this directory:

- [`read_write`](./read_write/README.md) - Generates code from a simple schema using a build script and then uses the generated code to deserialize and serialize an XML file at runtime.
- [`bpmn`](./bpmn/README.md) - Example that generates code for BPMN 2.0 and loads a diagram file.
- [`vsme`](./vsme/README.md) - Example that generates code for the XBRL-based VSME taxonomy. It shows how a custom render step can be used to collapse the huge `xbrli:item` substitution group into a few `ItemWrapper` based types instead of generating one type per element.
2 changes: 2 additions & 0 deletions examples/vsme/Cargo.toml
@@ -16,6 +16,8 @@ xsd-parser-types = { workspace = true, features = [ "quick-xml" ] }

[build-dependencies]
anyhow = { workspace = true }
proc-macro2 = { workspace = true }
quote = { workspace = true }
xsd-parser = { workspace = true }

[lints]
51 changes: 51 additions & 0 deletions examples/vsme/README.md
@@ -0,0 +1,51 @@
# VSME example

This example generates and uses the code for the XBRL-based [VSME](https://xbrl.efrag.org/) taxonomy (the *Voluntary Sustainability Reporting Standard for non-listed SMEs*).

XBRL instance documents store their facts as members of the `xbrli:item` substitution group. The VSME taxonomy defines **thousands** of such facts, but most of them share the same underlying type and only differ by their XML tag name (e.g. `vsme:Assets`, `vsme:AverageNumberOfAnnualTrainingHoursPerMaleEmployee`, ...). Generating a dedicated Rust type (and `enum` variant) for every single element would produce a huge, unwieldy amount of code.

## What this example shows

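Instead of generating one type per element, the [build script](build.rs) runs the generation pipeline step by step and uses a custom `FixItemType`, which acts both as a meta type transformation (applied between the optimizer and the generator) and as a `RenderStep`. It: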

1. Looks up the content of the `xbrli:item` element (a big `xs:choice`).
2. Groups all concrete element variants of the choice by their type and removes them from the choice.
3. Adds a single group element per type that references a synthetic `XxxWrapped` custom type.
4. Emits a type definition and an [`ItemTags`](src/item.rs) implementation for each of those wrapper types:

```rust,ignore
pub type AmountOfEmissionToAirWrapped =
    crate::item::ItemWrapper<vsme::AmountOfEmissionToAirDyn, AmountOfEmissionToAirWrappedTags>;

pub struct AmountOfEmissionToAirWrappedTags;

impl crate::item::ItemTags for AmountOfEmissionToAirWrappedTags {
    fn tags() -> &'static [crate::item::ItemTag] {
        static TAGS: [crate::item::ItemTag; 78] = [
            crate::item::ItemTag {
                tag: "vsme:Assets",
                name: "Assets",
                namespace: NS_VSME,
            },
            /* ... */
        ];

        &TAGS
    }
}
```
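The grouping in steps 1 to 3 can be sketched in plain Rust as follows. This is a simplified model, not the xsd-parser API: `Variant`, `collapse`, and all tag and type names are hypothetical stand-ins for `ElementMeta`, `FixItemType::prepare_types`, and the real VSME types. The build script itself works on `MetaTypes` and also keeps the element form next to each identifier.

```rust
use std::collections::BTreeMap;

/// Hypothetical, simplified stand-in for an entry of the `xs:choice`.
#[derive(Debug, Clone, PartialEq)]
enum Variant {
    /// A concrete element: its tag name and the name of its type.
    Element { tag: String, type_name: String },
    /// A group entry that references a (synthetic) type by name.
    Group { type_name: String },
}

/// Removes all concrete elements from `choice`, groups their tags by type and
/// appends one `Group` entry per type that references an `XxxWrapped` type.
fn collapse(choice: &mut Vec<Variant>) -> BTreeMap<String, Vec<String>> {
    let mut map = BTreeMap::<String, Vec<String>>::new();

    // `retain` keeps the entries for which the closure returns `true`.
    // Concrete elements are recorded in `map` and dropped from the choice.
    choice.retain(|v| match v {
        Variant::Element { tag, type_name } => {
            map.entry(type_name.clone()).or_default().push(tag.clone());
            false
        }
        Variant::Group { .. } => true,
    });

    // `BTreeMap` iterates in key order, so the appended entries are stable.
    for type_name in map.keys() {
        choice.push(Variant::Group {
            type_name: format!("{type_name}Wrapped"),
        });
    }

    map
}

fn main() {
    let element = |tag: &str, ty: &str| Variant::Element {
        tag: tag.to_string(),
        type_name: ty.to_string(),
    };
    let mut choice = vec![
        Variant::Group { type_name: "OtherGroup".to_string() },
        element("ex:Liabilities", "AmountType"),
        element("ex:Employees", "CountType"),
        element("ex:Assets", "AmountType"),
    ];

    let groups = collapse(&mut choice);
    println!("{groups:?}");
    println!("{choice:?}");
}
```

Using a `BTreeMap` rather than a `HashMap` mirrors the build script's choice: the order of the generated wrapper types does not change between builds.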

The [`ItemWrapper`](src/item.rs) type defined in this example provides the runtime support for this. It wraps the shared inner type and implements a `Serializer` and `Deserializer` that:

- on **deserialization**, only accept an element whose namespace-resolved tag is part of the associated `ItemTags` (so a document may use any prefix for the namespace), and remember which tag was used; and
- on **serialization**, write the value back using that remembered tag.

This way, the many item elements collapse into just a handful of generated types, while the round trip still preserves the exact XML tag of every fact.
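The runtime behavior of `ItemWrapper` described above can be illustrated with a minimal sketch under a simplified model. `src/item.rs` is not part of this diff, so only the `ItemTag` fields (`tag`, `name`, `namespace`) and the `ItemTags::tags()` signature are taken from the generated code. `ItemWrapper::accept`, `tag_name`, `NS_EX`, and `AmountTags` are hypothetical, and the actual quick-xml `Serializer`/`Deserializer` plumbing is left out.

```rust
use std::marker::PhantomData;

/// Mirrors the fields of the generated `crate::item::ItemTag` values.
#[derive(Debug, PartialEq)]
pub struct ItemTag {
    pub tag: &'static str,
    pub name: &'static str,
    pub namespace: &'static str,
}

pub trait ItemTags {
    fn tags() -> &'static [ItemTag];
}

/// Hypothetical wrapper: the shared inner value plus the tag it was read from.
pub struct ItemWrapper<T, Tags> {
    pub value: T,
    pub tag: &'static ItemTag,
    _tags: PhantomData<Tags>,
}

impl<T, Tags: ItemTags> ItemWrapper<T, Tags> {
    /// Deserialization side: accept the element only if its namespace-resolved
    /// name (namespace URI + local name) is one of the supported tags. The
    /// prefix used in the document plays no role in the comparison.
    pub fn accept(namespace: &str, local_name: &str, value: T) -> Option<Self> {
        let tag = Tags::tags()
            .iter()
            .find(|t| t.namespace == namespace && t.name == local_name)?;
        Some(Self { value, tag, _tags: PhantomData })
    }

    /// Serialization side: the value is written back using the remembered tag.
    pub fn tag_name(&self) -> &'static str {
        self.tag.tag
    }
}

// Hypothetical namespace and tag list, shaped like the generated code.
const NS_EX: &str = "http://example.com/hypothetical";

struct AmountTags;

impl ItemTags for AmountTags {
    fn tags() -> &'static [ItemTag] {
        static TAGS: [ItemTag; 2] = [
            ItemTag { tag: "ex:Assets", name: "Assets", namespace: NS_EX },
            ItemTag { tag: "ex:Liabilities", name: "Liabilities", namespace: NS_EX },
        ];
        &TAGS
    }
}

fn main() {
    // A document may bind NS_EX to any prefix (e.g. `<other:Assets>`); only
    // the resolved (namespace, local name) pair is compared.
    let item = ItemWrapper::<f64, AmountTags>::accept(NS_EX, "Assets", 42.0).unwrap();
    println!("{} = {}", item.tag_name(), item.value);

    // Unknown elements are rejected, so another wrapper type can try them.
    assert!(ItemWrapper::<f64, AmountTags>::accept(NS_EX, "Revenue", 1.0).is_none());
}
```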

## Running

```sh
cargo run -p vsme
```

This deserializes [`xml/example.xml`](xml/example.xml), prints the parsed object and serializes it back to XML.
287 changes: 281 additions & 6 deletions examples/vsme/build.rs
@@ -1,15 +1,35 @@
//! This is a build script to generate the code for the `vsme` schema.

use std::cell::RefCell;
use std::collections::BTreeMap;
use std::env::var;
use std::fs::{create_dir_all, remove_dir_all};
use std::path::PathBuf;
use std::rc::Rc;

use anyhow::{Context, Error};
use proc_macro2::TokenStream;
use quote::{format_ident, quote};

use xsd_parser::models::data::TagName;
use xsd_parser::{
    config::{GeneratorFlags, IdentQuadruple, InterpreterFlags, OptimizerFlags, Schema},
    exec_generator_with_ident_cache, exec_interpreter_with_ident_cache, exec_optimizer,
    exec_parser, exec_render,
    models::{
        code::{Module, ModulePath},
        data::PathData,
        meta::{
            CustomMeta, ElementMeta, ElementMetaVariant, ElementMode, MetaType, MetaTypeVariant,
            MetaTypes,
        },
        schema::{xs::FormChoiceType, Schemas},
        ElementIdent, IdentType, Naming,
    },
    pipeline::{
        generator::{Context as GeneratorContext, Error as GeneratorError},
        renderer::{MetaData, RenderStep, RenderStepType},
    },
    traits::{NameBuilder as NameBuilderTrait, Naming as NamingTrait},
    Config, Name, TypeIdent,
};
@@ -26,16 +46,33 @@ fn main() -> Result<(), Error> {
        .context("Missing or invalid schema file!")?;

    // This is almost the starting point defined in the main `README.md`.
    let fix_item_type = FixItemType::default();
    let config = Config::default()
        .with_schema(Schema::File(schema_file))
        .with_generate([(IdentType::Element, "xbrli:xbrl")])
        .with_interpreter_flags(InterpreterFlags::all() - InterpreterFlags::WITH_NUM_BIG_INT)
        .with_optimizer_flags(OptimizerFlags::all())
        .with_generator_flags(GeneratorFlags::all() - GeneratorFlags::ADVANCED_ENUMS)
        .with_naming(CustomNaming::default())
        .with_quick_xml()
        .with_render_step(fix_item_type.clone());

    // Generate the code based on the configuration above. We run the pipeline
    // manually instead of using `generate_modules`, because we need to inject
    // the `FixItemType` transformation between the optimizer and the
    // generator.
    let schemas = exec_parser(config.parser)?;
    let (meta_types, ident_cache) =
        exec_interpreter_with_ident_cache(config.interpreter, &schemas)?;
    let meta_types = exec_optimizer(config.optimizer, meta_types)?;
    let meta_types = fix_item_type.prepare_types(meta_types, &schemas)?;
    let data_types = exec_generator_with_ident_cache(
        config.generator,
        &schemas,
        Some(&ident_cache),
        &meta_types,
    )?;
    let modules = exec_render(config.renderer, &data_types)?;

    // Write the generated code to the module directory specified by Cargo.
    let target_dir = cargo_dir.join("src/schema");
@@ -99,3 +136,241 @@ impl NamingTrait for CustomNaming {
        format!("{s}_attr")
    }
}

/// Render step (and meta type transformation) that collapses the many concrete
/// element variants of the `xbrli:item` choice into a few `ItemWrapper` based
/// types.
///
/// XBRL defines hundreds of facts as members of the `xbrli:item` substitution
/// group. Most of them share the same (Rust) type and only differ by their XML
/// tag name. Generating a dedicated enum variant for each of them would be
/// wasteful, so we instead group the elements by their type and represent each
/// group with a single [`ItemWrapper`](crate::item::ItemWrapper) that keeps the
/// list of supported tags around at runtime.
#[derive(Default, Debug, Clone)]
struct FixItemType(Rc<RefCell<Vec<SharedWrapped>>>);

/// A [`WrappedType`] that is shared between the custom generator step (which
/// resolves and stores the path to the concrete type) and the render step
/// (which uses that information to render the actual code).
type SharedWrapped = Rc<RefCell<WrappedType>>;

/// Information about a single synthetic `XxxWrapped` type.
#[derive(Debug)]
struct WrappedType {
    /// Identifier of the synthetic `XxxWrapped` custom type.
    ident: TypeIdent,

    /// Identifier of the concrete type that is shared by all elements
    /// represented by this wrapper.
    type_: TypeIdent,

    /// XML tags of all elements that are represented by this wrapper.
    tags: Vec<TagInfo>,

    /// Path to the concrete type relative to the root module (e.g.
    /// `vsme :: AmountOfEmissionToAirDyn`). This is resolved and stored by the
    /// custom generator step (see [`WrappedType::resolve`]).
    target_type: Option<PathData>,
}

/// Information about a single XML tag represented by an [`ItemWrapper`].
#[derive(Debug)]
struct TagInfo {
    /// Identifier (namespace and local name) of the element.
    ident: ElementIdent,

    /// Form of the element, used to decide whether the tag needs a namespace
    /// prefix.
    form: FormChoiceType,
}

impl FixItemType {
    fn prepare_types(&self, mut types: MetaTypes, schemas: &Schemas) -> Result<MetaTypes, Error> {
        let item_ident = IdentQuadruple::from((IdentType::Element, "xbrli:item"));
        let item_ident = item_ident
            .resolve(schemas)
            .context("Unable to resolve `xbrli:item` element")?;

        let item_ty = types
            .items
            .get(&item_ident)
            .context("Unknown element: `xbrli:item`")?;
        let MetaTypeVariant::ComplexType(meta) = &item_ty.variant else {
            anyhow::bail!("`xbrli:item` is not a complex type")
        };
        let content_ident = meta
            .content
            .clone()
            .context("`xbrli:item` is missing a content type")?;

        let content_ty = types
            .items
            .get_mut(&content_ident)
            .context("Unknown content type for `xbrli:item`")?;
        let MetaTypeVariant::Choice(meta) = &mut content_ty.variant else {
            anyhow::bail!("Content type of `xbrli:item` is not a choice")
        };

        // Group all concrete element variants of the choice by their type and
        // remove them from the choice. We use a `BTreeMap` to get a stable
        // order of the generated types. For each removed element we remember the
        // information needed to reconstruct its XML tag name later on.
        let mut map = BTreeMap::<TypeIdent, Vec<(ElementIdent, FormChoiceType)>>::new();
        meta.elements.0.retain(|el| {
            let ElementMetaVariant::Type {
                type_,
                mode: ElementMode::Element,
            } = &el.variant
            else {
                return true;
            };

            map.entry(type_.clone())
                .or_default()
                .push((el.ident.clone(), el.form));

            false
        });

        // Add one group element per type that references the synthetic wrapper
        // type instead of the removed elements.
        let mut pending = Vec::new();
        for (concrete, tags) in map {
            let mut wrapped = concrete.clone();
            wrapped.name = Name::new_named(format!("{}Wrapped", wrapped.name));

            meta.elements.0.push(ElementMeta::new(
                concrete.to_property_ident(),
                wrapped.clone(),
                ElementMode::Group,
                FormChoiceType::Unqualified,
            ));

            pending.push((wrapped, concrete, tags));
        }

        // `meta` (and therefore the mutable borrow of `types`) is not used
        // beyond this point, so we can now resolve the tag names and register
        // the synthetic wrapper types as custom types.
        for (ident, type_, tags) in pending {
            let tags = tags
                .into_iter()
                .map(|(ident, form)| TagInfo::new(ident, form))
                .collect::<Vec<_>>();

            let wrapped = Rc::new(RefCell::new(WrappedType {
                ident: ident.clone(),
                type_,
                tags,
                target_type: None,
            }));
            self.0.borrow_mut().push(wrapped.clone());

            // The custom generator step resolves the path to the concrete type
            // during code generation and stores it in the (shared) wrapper.
            let custom = CustomMeta::new(ident.name.clone()).with_generator(
                move |ctx: &mut GeneratorContext<'_, '_>, _: &CustomMeta| {
                    wrapped.borrow_mut().resolve(ctx)
                },
            );
            types
                .items
                .insert(ident, MetaType::new(MetaTypeVariant::Custom(custom)));
        }

        Ok(types)
    }
}

impl RenderStep for FixItemType {
    fn render_step_type(&self) -> RenderStepType {
        RenderStepType::ExtraTypes
    }

    fn finish(&mut self, meta: &MetaData<'_>, module: &mut Module) {
        for wrapped in self.0.borrow().iter() {
            wrapped.borrow().render(meta, module);
        }
    }
}

impl WrappedType {
    /// Resolve the path to the concrete type (relative to the root module) and
    /// store it. This is called by the custom generator step during code
    /// generation, where the path information is available. Requesting the type
    /// reference also makes sure the concrete type is actually generated.
    fn resolve(&mut self, ctx: &mut GeneratorContext<'_, '_>) -> Result<(), GeneratorError> {
        let target_type = ctx.get_or_create_type_ref(&self.type_)?.path.clone();

        self.target_type = Some(target_type);

        Ok(())
    }

    /// Render the type definition and the `ItemTags` implementation for this
    /// wrapper into the given `module`.
    fn render(&self, meta: &MetaData<'_>, module: &mut Module) {
        let wrapped_ident = format_ident!("{}", self.ident.name.as_str());
        let tags_ident = format_ident!("{}Tags", self.ident.name.as_str());

        let target_type = self
            .target_type
            .as_ref()
            .expect("the concrete type path is resolved by the custom generator step")
            .resolve_relative_to(&ModulePath::root());

        let tags = self.tags.iter().map(|tag| tag.render(meta));
        let count = self.tags.len();

        module.append(quote! {
            pub type #wrapped_ident = crate::item::ItemWrapper<#target_type, #tags_ident>;

            #[derive(Debug)]
            pub struct #tags_ident;

            impl crate::item::ItemTags for #tags_ident {
                fn tags() -> &'static [crate::item::ItemTag] {
                    static TAGS: [crate::item::ItemTag; #count] = [ #( #tags ),* ];

                    &TAGS
                }
            }
        });
    }
}

impl TagInfo {
    /// Create the information for a single tag from the passed element data.
    fn new(ident: ElementIdent, form: FormChoiceType) -> Self {
        Self { ident, form }
    }

    /// Render this tag as a `crate::item::ItemTag` value.
    fn render(&self, meta: &MetaData<'_>) -> TokenStream {
        let Self { ident, form } = self;

        let types = meta.types.meta.types;
        let module = types
            .modules
            .get(&ident.ns)
            .expect("the module for the tag's namespace exists");

        // Reuse the namespace constant generated next to the schema types
        // (e.g. `NS_VSME`) instead of repeating the namespace URI.
        let namespace = module
            .make_ns_const()
            .expect("the tag has a namespace")
            .resolve_relative_to(&ModulePath::root());
        let tag = TagName::new(types, ident.ns, &ident.name, *form).get(true);
        let name = ident.name.as_str();

        quote! {
            crate::item::ItemTag {
                tag: #tag,
                name: #name,
                namespace: #namespace,
            }
        }
    }
}
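`FixItemType` keeps its state behind `Rc<RefCell<...>>` and derives `Clone` because one clone is handed to `with_render_step` while the original instance runs `prepare_types`; the custom generator closures later fill in `target_type`, which `finish` then reads. The sketch below isolates that sharing pattern with hypothetical names (`Collector`, `Wrapped`, `register`, `render`) and uses plain strings where the build script uses `PathData` and `TokenStream`. It assumes, as in the build script, that all pipeline stages run sequentially on one thread, since `Rc` and `RefCell` only support single-threaded sharing.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical stand-in for `WrappedType`: a name plus a field filled in late.
#[derive(Debug, Default)]
struct Wrapped {
    name: String,
    target_type: Option<String>,
}

type Shared = Rc<RefCell<Wrapped>>;

/// Cloning only clones the outer `Rc`, so every clone sees the same list.
#[derive(Debug, Default, Clone)]
struct Collector(Rc<RefCell<Vec<Shared>>>);

impl Collector {
    /// "Prepare" phase: register an entry and return a closure that fills it
    /// in later (the role played by the custom generator step).
    fn register(&self, name: &str) -> impl FnMut(&str) {
        let wrapped: Shared = Rc::new(RefCell::new(Wrapped {
            name: name.to_string(),
            target_type: None,
        }));
        self.0.borrow_mut().push(wrapped.clone());

        move |path: &str| wrapped.borrow_mut().target_type = Some(path.to_string())
    }

    /// "Render" phase: read what the closures stored (the role of `finish`).
    fn render(&self) -> Vec<String> {
        self.0
            .borrow()
            .iter()
            .map(|w| {
                let w = w.borrow();
                let target = w.target_type.as_deref().expect("resolved before rendering");
                format!("pub type {} = ItemWrapper<{}>;", w.name, target)
            })
            .collect()
    }
}

fn main() {
    let collector = Collector::default();
    // One clone would go into the config as the render step.
    let render_step = collector.clone();

    let mut resolve = collector.register("AmountWrapped");
    resolve("vsme::AmountDyn");

    for line in render_step.render() {
        println!("{line}");
    }
}
```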