Add configurable list formatting for CSV/TSV serialization by turbomam · Pull Request #3134 · linkml/linkml

turbomam · 2026-02-04T17:42:56Z

New Summary from 2026-02-11

Adds configurable multivalued field formatting for CSV/TSV serialization, via schema-level annotations and CLI options.

Before: Multivalued fields always serialize with brackets: [value1|value2|value3]
After: With list_syntax: plaintext, fields serialize without brackets: value1|value2|value3

Closes #3041. Addresses the core of #2581 (filed by @matentzn as a blocker for supporting common delimited formats like pipe-separated, semicolon-separated, etc.).

Origin and design

This follows the design @cmungall and I agreed on in our Dec 15 rolling meeting notes:

annotations:
  list_syntax: plaintext   # python (default) | plaintext
  list_delimiter: "; "     # any string; space must be explicit

With mapping to json-flattener: list_syntax: plaintext → csv_list_markers=("", ""), list_delimiter → csv_inner_delimiter.
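
The mapping above can be sketched in a few lines of plain Python. This is an illustration only, not the PR's code: `ListFormat`, `list_format_from_annotations`, and `render` are hypothetical names, the annotations are passed as a plain dict rather than read through SchemaView, and applying `list_delimiter` in `python` mode is an assumption.

```python
from typing import NamedTuple


class ListFormat(NamedTuple):
    """Hypothetical container for the two json-flattener settings named above."""

    csv_list_markers: tuple[str, str]
    csv_inner_delimiter: str


def list_format_from_annotations(annotations: dict[str, str]) -> ListFormat:
    """Translate schema-level annotations into list markers and an inner delimiter."""
    syntax = annotations.get("list_syntax", "python")
    # plaintext drops the brackets; anything else keeps the bracketed default
    markers = ("", "") if syntax == "plaintext" else ("[", "]")
    # Assumption: the delimiter annotation is honoured in both modes
    delimiter = annotations.get("list_delimiter", "|")
    return ListFormat(markers, delimiter)


def render(values: list[str], fmt: ListFormat) -> str:
    """Serialize one multivalued cell using the given format."""
    opening, closing = fmt.csv_list_markers
    return f"{opening}{fmt.csv_inner_delimiter.join(values)}{closing}"


print(render(["value1", "value2"], list_format_from_annotations({})))
print(render(["value1", "value2"],
             list_format_from_annotations({"list_syntax": "plaintext", "list_delimiter": "; "})))
```

The first call shows the bracketed default and the second shows the MIxS-style semicolon form.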

Deviation from spec: schema-level only

The Dec 15 spec discussed slot-level annotations overriding schema-level defaults via SchemaView. The implementation is schema-level only. json-flattener's GlobalConfig defines csv_list_markers and csv_inner_delimiter at the top level with no per-column configuration path, so slot-level overrides would require extending json-flattener itself. The primary use case (MIxS-style "semicolon-delimited, no brackets") is uniform across all multivalued fields in a schema, so this felt like the right scope for now.

No changes to csvutils.py

Per Chris's guidance ("prefer no changes in csvutils.py"), configuration is handled in the loader and dumper rather than in the shared utility layer.

SSSOM alignment

@matentzn suggested checking how SSSOM handles multivalued field packing. With list_syntax: plaintext and list_delimiter: "|", our output matches SSSOM's TSV spec exactly (plain a|b|c, no brackets, strip whitespace). LinkML generalizes what SSSOM hardcodes — appropriate for a general-purpose modeling language where different schemas need different conventions.

The SSSOM ecosystem is actively working on delimiter-in-value escaping (sssom#507, sssom-java#17). This PR doesn't implement escaping either, but the annotation-based configuration provides the right foundation to add it later.

What's not in scope

linkml-validate loader: This PR modifies the linkml_runtime loader/dumper (used by linkml-convert), not the separate linkml.validator.loaders.delimited_file_loader. Filed #3147, "linkml-validate CSV/TSV loader lacks schema-aware parsing (boolean coercion, list splitting)", to track unifying them.
Pandera / column-oriented data: @tfliss and @sneakers-the-rat raised broader tabular concerns in Discussion #1996. This is row-oriented only — those feel like follow-up work for the tabular data library discussion.
Delimiter-in-value escaping: Neither this PR nor SSSOM 1.0 implement escaping. Instead, the refuse_delimiter_in_data annotation/CLI flag raises a ValueError before serialization if any value contains the delimiter — preventing silent data corruption. Full escaping (e.g., SSSOM 1.1's backslash approach) can be added later.
RDF order preservation: @gouttegd clarified that multivalued slot order non-preservation is a LinkML-wide property (not SSSOM-specific), since the RDF translation rules use unstructured triples even when list_elements_ordered: true. Worth noting but orthogonal to this PR.
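
For the `refuse_delimiter_in_data` item above, a minimal sketch of a pre-serialization check might look like the following. The function name, the row-as-dict representation, and the error message are assumptions for illustration; the PR's actual check may be structured differently.

```python
def check_no_delimiter_in_values(rows, multivalued_keys, delimiter="|"):
    """Raise ValueError if any item of a multivalued field contains the delimiter.

    rows: list of dicts (one per output row); hypothetical representation.
    multivalued_keys: names of the fields whose values are lists.
    """
    for row_number, row in enumerate(rows, start=1):
        for key in multivalued_keys:
            # Missing or None fields are treated as empty lists
            for item in row.get(key) or []:
                if delimiter in str(item):
                    raise ValueError(
                        f"row {row_number}, field {key!r}: value {item!r} "
                        f"contains the list delimiter {delimiter!r}"
                    )


rows = [{"id": "P1", "aliases": ["a", "b"]}, {"id": "P2", "aliases": ["c|d"]}]
try:
    check_no_delimiter_in_values(rows, ["aliases"])
except ValueError as err:
    print(f"refused: {err}")
```

Failing at write time is the point of the flag: without it, `["c|d"]` would be written as `c|d` and read back as two items.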

Configuration reference

Schema annotations

id: https://example.org/myschema
name: myschema
annotations:
  list_syntax: plaintext
  list_delimiter: "|"
  list_strip_whitespace: "true"
  refuse_delimiter_in_data: "true"

| Annotation | Values | Default | Description |
| --- | --- | --- | --- |
| `list_syntax` | `python`, `plaintext` | `python` | `python` wraps lists in brackets `[a\|b\|c]`, `plaintext` has no brackets `a\|b\|c` |
| `list_delimiter` | any string | `\|` (pipe) | Character(s) used to separate list items |
| `list_strip_whitespace` | `true`, `false` | `true` | Strip whitespace around delimiters when loading and dumping |
| `refuse_delimiter_in_data` | `true`, `false` | `false` | Raise `ValueError` if any multivalued field value contains the delimiter, preventing silent data corruption |

CLI options (override schema annotations)

linkml-convert -s schema.yaml -C Container -S items -t tsv \
  --list-syntax plaintext \
  --list-delimiter "|" \
  --list-strip-whitespace \
  --refuse-delimiter-in-data \
  input.yaml

| CLI Option | Default | Description |
| --- | --- | --- |
| `--list-syntax` | None (use schema) | `python` or `plaintext` |
| `--list-delimiter` | None (use schema) | Delimiter string |
| `--list-strip-whitespace` / `--no-list-strip-whitespace` | None (use schema) | Strip whitespace from list values |
| `--refuse-delimiter-in-data` / `--no-refuse-delimiter-in-data` | None (use schema) | Raise error if any value contains the delimiter |
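
The precedence implied by the "None (use schema)" defaults can be sketched as below: a CLI value wins when given, otherwise the schema annotation, otherwise the documented default. `resolve_setting` and `DEFAULTS` are hypothetical names, and the sketch returns values as-is without any type coercion.

```python
# Documented defaults from the annotation reference above
DEFAULTS = {
    "list_syntax": "python",
    "list_delimiter": "|",
    "list_strip_whitespace": "true",
    "refuse_delimiter_in_data": "false",
}


def resolve_setting(name, cli_value, schema_annotations):
    """Return the effective value for one setting: CLI > schema annotation > default."""
    if cli_value is not None:
        return cli_value
    return schema_annotations.get(name, DEFAULTS[name])


annotations = {"list_syntax": "plaintext", "list_delimiter": ";"}
print(resolve_setting("list_delimiter", None, annotations))  # schema value is used
print(resolve_setting("list_delimiter", ",", annotations))   # CLI override is used
```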

Review feedback addressed

From @cmungall's review (Feb 5):

✅ Converted all tests to pure idiomatic pytest (no unittest classes, no hybrid styles)
✅ Removed verbose agent-conversation-style comments
✅ Made helper functions public (dropped underscore prefix)

From Copilot:

✅ Fixed schema-level vs slot-level annotation mismatch in tests
✅ Removed unused variables
✅ Added warning log for invalid list_syntax values

Coverage:

✅ Added CLI integration tests — 25 converter tests + 29 CSV/TSV runtime tests pass

Files changed

docs/data/csvs.md — documentation
packages/linkml/src/linkml/converter/cli.py — CLI options
packages/linkml_runtime/src/linkml_runtime/dumpers/delimited_file_dumper.py — output formatting
packages/linkml_runtime/src/linkml_runtime/loaders/delimited_file_loader.py — input parsing
tests/linkml/test_utils/test_converter.py — CLI tests
tests/linkml_runtime/test_loaders_dumpers/test_csv_tsv_loader_dumper.py — runtime tests
tests/linkml_runtime/test_utils/test_csv_utils.py — utility tests

codecov · 2026-02-04T17:47:28Z

Codecov Report

❌ Patch coverage is 61.11111% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.77%. Comparing base (d187949) to head (751a917).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| packages/linkml/src/linkml/converter/cli.py | 61.11% | 2 Missing and 5 partials ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #3134      +/-   ##
==========================================
+ Coverage   79.92%   83.77%   +3.85%     
==========================================
  Files         144      144              
  Lines       16579    16597      +18     
  Branches     3421     3428       +7     
==========================================
+ Hits        13250    13904     +654     
+ Misses       2606     1918     -688     
- Partials      723      775      +52

| Flag | Coverage Δ | |
| --- | --- | --- |
| linkml | `79.92% <61.11%> (+0.01%)` | ⬆️ |
| runtime | `79.92% <61.11%> (+0.01%)` | ⬆️ |


Copilot

Pull request overview

This PR adds configurable list formatting for CSV/TSV serialization to address issue #3041, enabling users to control how multivalued fields are serialized (with or without brackets, custom delimiters, and whitespace handling).

Changes:

Adds schema-level annotations (list_syntax, list_delimiter, list_strip_whitespace) to control multivalued field formatting in CSV/TSV output
Implements CLI options (--list-syntax, --list-delimiter, --list-strip-whitespace) to override schema annotations
Extends CSV/TSV loaders and dumpers to handle plaintext-style lists (e.g., a|b|c) in addition to python-style lists (e.g., [a|b|c])

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.


| File | Description |
| --- | --- |
| docs/data/csvs.md | Comprehensive documentation of the new configuration options with examples and usage instructions |
| packages/linkml/src/linkml/converter/cli.py | Adds three new CLI options for list formatting that apply to both input and output CSV/TSV operations |
| packages/linkml_runtime/src/linkml_runtime/dumpers/delimited_file_dumper.py | Implements list formatting configuration for CSV/TSV output, reading from schema annotations or CLI overrides |
| packages/linkml_runtime/src/linkml_runtime/loaders/delimited_file_loader.py | Implements list formatting configuration for CSV/TSV input, including helper functions for annotation reading and whitespace stripping |
| tests/linkml_runtime/test_loaders_dumpers/test_csv_tsv_loader_dumper.py | Comprehensive integration tests covering plaintext mode, custom delimiters, whitespace handling, and edge cases |
| tests/linkml_runtime/test_utils/test_csv_utils.py | Unit tests for annotation reading (contains a test schema that uses slot-level annotations inconsistently with implementation) |


tests/linkml_runtime/test_utils/test_csv_utils.py

tests/linkml_runtime/test_loaders_dumpers/test_csv_tsv_loader_dumper.py

turbomam · 2026-02-05T15:50:17Z

Re: patch coverage

~~The 12 uncovered lines are in cli.py where the new CLI options are passed through to the loader/dumper.~~

Update: Added CLI integration tests in commit a1db0e5. The CLI options (--list-syntax, --list-delimiter, --list-strip-whitespace) are now tested directly via CliRunner.

Test coverage includes:

4 CLI tests in test_converter.py (linkml package)
33 tests in test_csv_tsv_loader_dumper.py and test_csv_utils.py (linkml_runtime package)

Copilot

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


packages/linkml_runtime/src/linkml_runtime/loaders/delimited_file_loader.py

cmungall · 2026-02-05T20:52:09Z

tests/linkml_runtime/test_utils/test_csv_utils.py

+
+
+# =============================================================================
+# pytest-style unit tests for annotation-based CSV configuration (issue #3041)


This test file is now a hybrid of 3 styles:

UnitTest

pure pytest

non-idiomatic pytest (using classes)

The guidelines aren't clear what to do when contributing a new test to an existing UnitTest file:

https://linkml.io/linkml/maintainers/contributing.html#unittest-to-pytest-conversion

I favor consistency here, and I'd just convert this all to pure pytest

Fixed in 9b1e739 and 19b9006. Converted the entire file to pure pytest — both the old CsvUtilTestCase and the new annotation tests are now plain functions. Removed verbose comments and section headers.

cmungall · 2026-02-05T20:53:31Z

tests/linkml_runtime/test_utils/test_csv_utils.py

+# -----------------------------------------------------------------------------
+# Inline test schemas for annotation testing
+#
+# We use inline schemas here because:


This is a bit too much verbiage, sounds like the results of a conversation with an agent

Fixed. Removed the verbose comment blocks.

cmungall · 2026-02-05T20:56:37Z

tests/linkml_runtime/test_utils/test_csv_utils.py

+# -----------------------------------------------------------------------------
+# Note on KeyConfig generation for multivalued primitive slots
+#
+# The _get_key_config() function in csvutils.py is NOT modified per Chris's


Adding comments to tests is good if they help future maintenance or explain the purpose or function of the test, but this looks like a conversation that has lost its context.

Fixed. Removed the KeyConfig note — it was stale context.

cmungall · 2026-02-05T20:57:09Z

tests/linkml_runtime/test_loaders_dumpers/test_csv_tsv_loader_dumper.py

+    return SchemaView(SCHEMA_WHITESPACE_PRESERVE)
+
+
+class TestWhitespaceStripping:


this style of pytest seems non-idiomatic for this repo

Fixed. Converted all test classes to plain test_* functions, including the pre-existing CsvAndTsvGenTestCase (mechanical conversion, no behavior change).

cmungall

make the tests more consistent with other tests

cmungall · 2026-02-05T20:59:36Z

tests/linkml_runtime/test_loaders_dumpers/test_csv_tsv_loader_dumper.py

 from linkml_runtime.dumpers import csv_dumper, json_dumper, tsv_dumper, yaml_dumper
 from linkml_runtime.loaders import csv_loader, tsv_loader, yaml_loader
+from linkml_runtime.loaders.delimited_file_loader import (
+    _get_list_config_from_annotations,


consider making more clearly intended as public

(doctests might work better if it's intended as private)

Made them public by dropping the underscore prefix: get_list_config_from_annotations, enhance_configmap_for_multivalued_primitives, strip_whitespace_from_lists.

Considered doctests, but the repo has no doctest infrastructure — there's no --doctest-modules in pyproject.toml and no doctest step in CI. The existing >>> examples in ~9 source files are documentation-only and never executed as tests. Making the functions public with dedicated pytest coverage seemed more practical.

turbomam · 2026-02-11T16:45:36Z

Consolidated unaddressed feedback — list formatting (PR #3134)

Gathering all outstanding feedback from multiple sources so nothing falls through the cracks.

1. Chris's CHANGES_REQUESTED review (Feb 5) — addressed in code, awaiting re-review

All 5 inline comments have been addressed:

✅ Converted test file from hybrid unittest/pytest to pure pytest (9b1e739, 19b9006)
✅ Removed verbose agent-conversation-style comments
✅ Removed stale KeyConfig context note
✅ Converted test classes to plain test_* functions
✅ Made helper functions public (dropped underscore prefix)

Status: Code changes pushed, Chris has not re-reviewed yet.

2. Chris's design spec from Dec 15 rolling meeting notes

The agreed-upon design from our Dec 15 meeting (Chris & Mark rolling notes):

# Schema annotation spec
attributes:
  name_list:
    multivalued: true
    annotations:
      list_syntax: python  ## allowed: python | plaintext
      list_delimiter: "; "  ## must include space explicitly. No effect if list_syntax == 'python'

Mapping to json-flattener:

If list_syntax == "python" → use defaults
Else → csv_list_markers = ("", ""), csv_inner_delimiter = $list_delimiter

Cascading: Use SchemaView — default to schema-level annotations, slot-level annotations override.

Additional guidance from Chris:

"prefer no changes in packages/linkml_runtime/src/linkml_runtime/utils/csvutils.py"
"remember that json-flattener isn't really schema aware"
"pass csv list marker (a tuple) and inner delimiter (or a csv style syntax enum) in schema"

Need to verify: Does the current implementation match this spec exactly? Specifically:

Slot-level annotation override of schema-level annotations
Correct mapping to json-flattener's csv_list_markers and csv_inner_delimiter
No changes in csvutils.py

3. Chris's helper function visibility comment (PR inline)

Chris said "consider making more clearly intended as public (doctests might work better if it's intended as private)". I made them public and noted the repo has no doctest infrastructure — filed #3146 to track adding it. Chris hasn't responded to this.

4. Copilot review — invalid `list_syntax` values

Copilot flagged that invalid list_syntax values (e.g. "foobar") silently default to python style. Fixed in 971b3ec — added a warning log. ✅
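
A minimal sketch of this kind of validation follows. The function name and message wording are hypothetical; the only behaviour taken from the thread is that an unrecognized value falls back to `python` style and now produces a warning.

```python
import logging

logger = logging.getLogger(__name__)

VALID_LIST_SYNTAXES = ("python", "plaintext")


def normalize_list_syntax(value):
    """Return a valid list_syntax, warning and falling back to 'python' on bad input."""
    if value is None:
        return "python"
    if value not in VALID_LIST_SYNTAXES:
        # Typos such as "plainetxt" previously fell through silently
        logger.warning(
            "Invalid list_syntax %r; expected one of %s. Using 'python'.",
            value,
            VALID_LIST_SYNTAXES,
        )
        return "python"
    return value
```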

5. Discussion #1996 context

My progress update in the "Improved ways of working with tabular data" discussion links this PR. Broader context from the discussion:

@tfliss raised interaction with Pandera generator for inlined-as-simple-dict and range classes
@sneakers-the-rat raised the two orientations of tabular data (column-oriented vs row-oriented)
Chris's original post discusses whether we should have a separate tabular data library with plugin architecture

6. Feb 9 rolling notes

Chris noted "Finish linkml PRs" as a current action item.

7. Related issues

CSV/TSV loader does not split brackets-free, multivalued primitive slots (with pipe delimiter) #3041 — parent issue (Nico asked about ownership Jan 19, I confirmed Feb 4)
Make it possible to configure inlined multivalued strings syntax #2581 — original feature request (Nico asked for Chris/Sierra input Mar 2025)
specify how the LinkML multivalued metaslot should be used with MIxS terms GenomicsStandardsConsortium/mixs#952 — MIxS multivalued field conventions
allow whitespace between delimiters in Value syntax patterns? GenomicsStandardsConsortium/mixs#465 — whitespace between delimiters

Next steps

Verify implementation matches Chris's Dec 15 spec (especially slot-level override cascade and json-flattener mapping)
Rebase if needed
Request re-review from @cmungall

🤖 Generated with Claude Code

turbomam · 2026-02-11T16:59:40Z

@cmungall Heads up on one deviation from our Dec 15 spec. We discussed slot-level annotations overriding schema-level defaults, but the current implementation only supports schema-level annotations.

Why: json-flattener's GlobalConfig defines csv_list_markers and csv_inner_delimiter at the top level, not on KeyConfig. These get applied uniformly to all columns — there's no per-column configuration path. So slot-level overrides would require extending json-flattener itself.

I think schema-level-only is the right call here. The main use case driving this (MIxS-style "semicolon-delimited, no brackets") is uniform across all multivalued fields in a given schema anyway. If a per-slot use case comes up later, we can add it via json-flattener at that point.

Also rebased onto current main and resolved the conflict with #3118's new converter tests. All 54 tests pass (25 converter + 29 CSV/TSV runtime). Ready for re-review when you get a chance.

turbomam · 2026-02-11T18:39:13Z

Scope and known limitations

What this PR does and doesn't touch

This PR modifies the linkml_runtime loader/dumper (used by linkml-convert). It does not touch the separate linkml.validator.loaders.delimited_file_loader (used by linkml-validate), which is a simpler 79-line loader built on bare csv.DictReader without json-flattener.

That means after this merges, linkml-convert will correctly split a|b into ['a', 'b'], but linkml-validate on the same CSV will still see it as the raw string a|b. I filed #3147 to track unifying these two loaders — that felt like a separate effort.
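
The gap is easy to reproduce with the standard library alone. The sketch below uses a made-up two-column TSV (the `id` and `aliases` column names are hypothetical): a bare `csv.DictReader` hands back the packed cell as one string, and splitting only happens if something schema-aware does it afterwards.

```python
import csv
import io

# Hypothetical TSV with one multivalued column
tsv_text = "id\taliases\nP1\ta|b\n"

rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

raw_cell = rows[0]["aliases"]      # what a bare DictReader-based loader sees
split_cell = raw_cell.split("|")   # what a list-aware loader would produce

print(repr(raw_cell))
print(split_cell)
```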

Schema-level only annotations

Our Dec 15 spec discussed slot-level annotation overrides via SchemaView. The implementation is schema-level only — see my earlier comment for why (json-flattener's GlobalConfig has no per-column delimiter support).

Known edge cases

Delimiter-in-value: If a value contains the delimiter character, round-tripping will break. No escaping mechanism yet. Tracked in a skipped test with a note.
Empty multivalued fields: Skipped test due to a json-flattener json_clean issue — empty lists don't roundtrip cleanly.
Pandera / column-oriented data: @tfliss and @sneakers-the-rat raised broader tabular concerns in Discussion #1996 (Improved ways of working with tabular data). This PR is row-oriented only and doesn't address those; they feel like follow-up work for the tabular data library discussion.

`list_strip_whitespace` accepts only `true`/`false`

Tightened in e4d955e to only accept case-insensitive "true" or "false". Previously accepted YAML 1.1 conventions (yes/no, 0/1). Changed to stay consistent with the direction in #3144 per Chris's feedback about not mixing boolean conventions.
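
A sketch of the stricter parsing is shown below. `parse_bool_annotation` is a hypothetical name, and two details are assumptions rather than facts from the PR: surrounding whitespace is trimmed before comparison, and an invalid value falls back to the annotation's default after the warning.

```python
import logging

logger = logging.getLogger(__name__)


def parse_bool_annotation(value, default=True):
    """Accept only case-insensitive 'true'/'false'; warn and use the default otherwise."""
    if value is None:
        return default
    text = str(value).strip().lower()
    if text == "true":
        return True
    if text == "false":
        return False
    # YAML 1.1 spellings such as yes/no, on/off, 0/1 are deliberately rejected
    logger.warning("Invalid boolean annotation value %r; using default %r", value, default)
    return default


for candidate in ("true", "FALSE", "yes", "0"):
    print(candidate, "->", parse_bool_annotation(candidate))
```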

turbomam · 2026-02-11T19:01:44Z

SSSOM alignment — context from today's linkml-dev meeting

Nico suggested looking at how SSSOM handles multivalued field packing in TSV. Turns out the SSSOM ecosystem is actively debating the same problems we're solving here, literally today.

How SSSOM does it

From the SSSOM/TSV spec:

"Multi-valued slots MUST be serialised as a list of values separated by | characters."

No brackets — plain value1|value2|value3
Pipe is hardcoded in the spec, not configurable
sssom-py strips whitespace: [s.strip() for s in v.split("|")]
Schema-driven — checks multivalued: true to know which columns to split
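
The sssom-py one-liner quoted above generalizes naturally to a configurable delimiter and an optional strip step, which is roughly what `list_delimiter` and `list_strip_whitespace` expose. The helper names below are hypothetical and the sketch ignores escaping and empty cells.

```python
def split_list(cell, delimiter="|", strip=True):
    """Split one packed cell into items, optionally stripping whitespace (loading)."""
    items = cell.split(delimiter)
    return [item.strip() for item in items] if strip else items


def join_list(values, delimiter="|", strip=True):
    """Pack items into one cell, optionally stripping whitespace (dumping)."""
    return delimiter.join(value.strip() if strip else value for value in values)


print(split_list("a | b"))
print(split_list("a | b", strip=False))
print(join_list(["dog ", "cat"]))
print(join_list(["dog ", "cat"], strip=False))
```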

Active spec work happening right now

mapping-commons/sssom#507 (opened Feb 10) — proposes backslash escaping (\| for literal pipe) within multivalued values, targeting SSSOM 1.1
mapping-commons/sssom#429 — the underlying debate about whether to forbid pipe in values, escape it, or percent-encode it
gouttegd/sssom-java#17 — concrete Java implementation of the escape mechanism
mapping-commons/sssom#504 — Nico is promoting sssom-java as the reference implementation, so its escaping approach will likely set the standard

How PR #3134 compares

| Aspect | SSSOM | LinkML PR #3134 |
| --- | --- | --- |
| Format | No brackets, plain `a\|b\|c` | Configurable: brackets by default, no brackets with `list_syntax: plaintext` |
| Delimiter | Pipe, fixed by spec | Configurable via `list_delimiter` annotation |
| Whitespace | `strip()` on each value | Configurable via `list_strip_whitespace` |
| Escaping | None in 1.0; backslash escaping proposed for 1.1 | None (same gap) |
| Schema-driven | Checks `multivalued: true` | Same, via SchemaView + `enhance_configmap_for_multivalued_primitives()` |

With list_syntax: plaintext and list_delimiter: "|", our output matches SSSOM's format exactly. LinkML generalizes what SSSOM hardcodes — which makes sense for a general-purpose modeling language where different schemas need different conventions (SSSOM uses pipe, MIxS has historically used semicolons, commas, etc.).

Escaping

Neither SSSOM (1.0) nor this PR handle delimiter-in-value escaping. SSSOM is actively working on backslash escaping for 1.1 (backslash-pipe for literal pipe, double-backslash for literal backslash). If/when that lands and we want to support it in LinkML, it would be a follow-up — the annotation-based configuration in this PR provides the right foundation to build on.
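
To make the proposed scheme concrete, here is a sketch of backslash escaping as summarized above (backslash-delimiter for a literal delimiter, double backslash for a literal backslash). It is not part of this PR or of SSSOM 1.0, the function names are hypothetical, it assumes a single-character delimiter, and the final SSSOM 1.1 rules may differ in detail.

```python
def escape_item(item, delimiter="|"):
    """Escape backslashes first, then the delimiter, so the result splits unambiguously."""
    return item.replace("\\", "\\\\").replace(delimiter, "\\" + delimiter)


def split_escaped(cell, delimiter="|"):
    """Split a packed cell on unescaped delimiters and undo the escaping."""
    items, current, i = [], [], 0
    while i < len(cell):
        char = cell[i]
        if char == "\\" and i + 1 < len(cell):
            # Escaped character: take the next character literally
            current.append(cell[i + 1])
            i += 2
        elif char == delimiter:
            items.append("".join(current))
            current = []
            i += 1
        else:
            current.append(char)
            i += 1
    items.append("".join(current))
    return items


values = ["a|b", "c\\d", "e"]
packed = "|".join(escape_item(value) for value in values)
print(packed)
print(split_escaped(packed) == values)
```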

Nico filed #2581 (the origin issue for this work), rating configurable delimiters as a "blocker"
SSSOM discussion #428 found 4 of 5 published SSSOM datasets on Zenodo don't follow the spec — reinforcing why good serialization tooling matters
SSSOM issue #491 notes SSSOM does not guarantee order preservation in multivalued slots (for RDF simplicity) — something to be aware of for LinkML's semantics

Add unit and integration tests for configurable multivalued field delimiters in CSV/TSV serialization. Tests follow Chris Mungall's design guidance: logic should be in loader/dumper files, not csvutils.py. Tests include: - Annotation reading (list_syntax, list_delimiter) via SchemaView - Integration tests using personinfo.yaml with dynamic alias injection - Parametrized tests for different delimiter configurations - Edge case tests (empty lists, single values, delimiter in values) All new tests are skipped pending implementation, following TDD approach. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add support for customizing multivalued field formatting in CSV/TSV serialization via slot annotations: - list_syntax: "python" (default, with brackets) or "plaintext" (no brackets) - list_delimiter: custom delimiter between list items (default "|") Implementation: - Add _get_list_config_from_annotations() to read annotations from schema - Add _enhance_configmap_for_multivalued_primitives() for plaintext mode - Update loader and dumper to use annotation-derived configuration - Logic is in loader/dumper files per Chris Mungall's guidance Tests: - Enable plaintext roundtrip tests (now passing) - Enable custom delimiter tests for |, ;, and , (now passing) - 16 tests passing, 14 skipped (edge cases for future work) Documentation: - Add "Customizing multivalued field formatting" section to docs/data/csvs.md - Document list_syntax and list_delimiter annotations with examples Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

) json-flattener's GlobalConfig applies the same csv_list_markers and csv_inner_delimiter to all columns, so slot-level overrides don't make sense. Simplified implementation to only read schema-level annotations. Updated docs and tests to reflect this design constraint. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>


- Remove TestCsvConfigFromAnnotations and test_list_syntax_to_markers (tested helper functions we never implemented - logic is in loader/dumper) - Remove TestPersoninfoAliasesIntegration (used schema without annotations) - Remove unused fixtures and inline schemas - Update comments to reflect schema-level only design Tests: 17 passed, 3 skipped (pre-existing issues outside PR scope) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Test edge cases for _get_list_config_from_annotations and _enhance_configmap_for_multivalued_primitives: - None schemaview returns defaults - Schema without annotations returns defaults - plaintext_mode=False returns original configmap Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add YAML→TSV conversion example using existing test files - Add TSV→YAML conversion example showing plaintext parsing - Use markdown table to show sample TSV data - Update terminology to "python style (bracketed)" Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Add list_strip_whitespace annotation (default true) to control whether whitespace around delimiters is stripped when loading - Add CLI options to linkml-convert to override schema annotations: --list-syntax, --list-delimiter, --list-strip-whitespace - Update documentation with new annotation and CLI options - Add tests for whitespace stripping functionality Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Whitespace stripping now works for both loading and dumping - On input: "a | b" → ['a', 'b'] (stripped) or ['a ', ' b'] (preserved) - On output: ['dog ', 'cat'] → "dog|cat" (stripped) or "dog |cat" (preserved) - Add tests for output whitespace stripping - Update documentation to clarify bidirectional behavior Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

- Move inline schemas to module-level constants (SCHEMA_WITHOUT_ANNOTATIONS, SCHEMA_WHITESPACE_STRIP, SCHEMA_WHITESPACE_PRESERVE) - Add make_delimiter_schema() factory for parametrized delimiter tests - Move fixtures to module level for reusability - Convert loop-based annotation value tests to @pytest.mark.parametrize Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Wrap pipe characters in double backticks in --list-syntax and --list-delimiter help strings. This prevents Sphinx's sphinx-click extension from interpreting them as RST substitution references, which was causing the docs build to fail with warnings. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Pipe characters inside markdown table cells were being interpreted as column delimiters, causing truncated content in the rendered HTML. Escaped with backslash (\|) to render as literal pipes. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>


- Convert test_csv_utils.py from hybrid unittest/pytest to pure pytest - Convert class-based tests to plain functions in test_csv_tsv_loader_dumper.py - Remove verbose comment blocks and chatty docstrings - Rename helper functions to public (drop underscore prefix): get_list_config_from_annotations, enhance_configmap_for_multivalued_primitives, strip_whitespace_from_lists Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Mechanical conversion: drop class wrapper, remove self, remove import unittest. No behavior change. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Only accept case-insensitive "true" or "false" for the list_strip_whitespace annotation, with a warning for invalid values. Aligns with the direction in #3144 to avoid YAML 1.1 boolean conventions (yes/no, on/off, 0/1) in CSV-related configuration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>


gouttegd · 2026-02-11T19:46:50Z

Not strictly related to the issue of list delimiters, but since this is noted here:

SSSOM issue #491 notes SSSOM does not guarantee order preservation in multivalued slots (for RDF simplicity) — something to be aware of for LinkML's semantics

This is not an SSSOM-specific limitation. SSSOM does not guarantee the order of the values in a multi-valued slot, because LinkML itself does not guarantee that. The behaviour of SSSOM here was directly taken from the behaviour of the LinkML runtime.

The rules for RDF translations do not cover the case of multi-valued slots, so I don’t know what the intention was here, but in effect the LinkML runtime translates multi-valued slots as a simple unstructured list of triples (even if the slot is defined with list_elements_ordered: true). Such a translation cannot ensure that the order of values is preserved. So LinkML may preserve the order of values in all other formats, but as soon as you convert to or from RDF the order of values cannot be expected to be preserved. Ergo, more generally, LinkML does not guarantee that the order of values in multi-valued slots is preserved.

When enabled (via schema annotation or CLI flag), raises ValueError before serializing if any multivalued field value contains the list delimiter character. This catches round-trip corruption at write time rather than silently producing corrupt output. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

turbomam changed the title ~~Add annotation-based CSV delimiter configuration~~ Add annotation-based xSV delimiter configuration Feb 4, 2026

turbomam force-pushed the issue-3041-annotation-based-csv-delimiters branch from e788b02 to 2264cc3 Compare February 4, 2026 17:49

matentzn assigned turbomam Feb 5, 2026

turbomam changed the title ~~Add annotation-based xSV delimiter configuration~~ Add configurable list formatting for CSV/TSV serialization Feb 5, 2026

turbomam marked this pull request as ready for review February 5, 2026 15:23

Copilot AI review requested due to automatic review settings February 5, 2026 15:23

Copilot started reviewing on behalf of turbomam February 5, 2026 15:24

Copilot AI reviewed Feb 5, 2026

tests/linkml_runtime/test_utils/test_csv_utils.py

tests/linkml_runtime/test_loaders_dumpers/test_csv_tsv_loader_dumper.py Outdated

tests/linkml_runtime/test_loaders_dumpers/test_csv_tsv_loader_dumper.py Outdated

turbomam requested review from cmungall, Copilot and matentzn February 5, 2026 16:20

Copilot started reviewing on behalf of turbomam February 5, 2026 16:20

Copilot AI reviewed Feb 5, 2026

packages/linkml_runtime/src/linkml_runtime/loaders/delimited_file_loader.py

This was referenced Feb 5, 2026

specify how the LinkML multivalued metaslot should be used with MIxS terms GenomicsStandardsConsortium/mixs#952

Open

add MISIP-MIMS checklist GenomicsStandardsConsortium/mixs#1002

Merged

No truthy values are accepted by CSV validation #2580

Open

cmungall reviewed Feb 5, 2026

View reviewed changes

cmungall requested changes Feb 5, 2026

View reviewed changes

turbomam requested a review from cmungall February 5, 2026 21:41

This was referenced Feb 5, 2026

Add doctest infrastructure to CI #3146

Open

linkml-validate CSV/TSV loader lacks schema-aware parsing (boolean coercion, list splitting) #3147

Open

turbomam force-pushed the issue-3041-annotation-based-csv-delimiters branch from 9b1e739 to d0aa8d5 Compare February 11, 2026 16:58

turbomam and others added 22 commits February 11, 2026 14:11

Fix deprecated typing.Tuple usage to use builtin tuple

a7f68cc

Ruff UP006/UP035: Use lowercase tuple instead of typing.Tuple Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix Windows filename with reserved character in test

e2f7cac

Use ordinal for temp filename to avoid Windows reserved chars like | Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
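One way to read this fix, as a sketch: build the temporary filename from the code points of the delimiter rather than from the delimiter itself, since characters such as `|` are not allowed in Windows filenames. The helper name `temp_name_for_delimiter` and the `list_delim_` prefix are hypothetical.

```python
def temp_name_for_delimiter(delimiter: str, suffix: str = ".tsv") -> str:
    """Return a filename that encodes the delimiter by its code points.

    ord() maps each character to an integer, e.g. "|" -> 124, so the
    resulting name contains only digits, letters, and underscores.
    """
    ordinals = "_".join(str(ord(ch)) for ch in delimiter)
    return f"list_delim_{ordinals}{suffix}"
```

For example, `temp_name_for_delimiter("|")` gives `list_delim_124.tsv`.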

Fix consistency in CLI help and docs for list_strip_whitespace

f43c88e

- Update CLI help and docs to say "when loading and dumping" (not just loading)
- Simplify CLI help text to avoid formatting issues with special characters

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Fix test issues identified by Copilot review

f52d165

- Move annotations from slot-level to schema-level in test schema to match actual implementation behavior
- Remove unused variables from skipped test

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Add CLI tests for list formatting options

e01080c

Tests for --list-syntax, --list-delimiter, and --list-strip-whitespace options in linkml-convert CLI. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
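For orientation, the behavior these three options control can be sketched in plain Python. This is an illustration of the semantics described in the PR summary, not the json-flattener or linkml-runtime implementation; `format_list` and `parse_list` are hypothetical names.

```python
def format_list(values, list_syntax="python", delimiter="|"):
    """Join values into one cell; 'python' syntax wraps the cell in brackets."""
    body = delimiter.join(str(v) for v in values)
    return body if list_syntax == "plaintext" else f"[{body}]"


def parse_list(cell, list_syntax="python", delimiter="|", strip_whitespace=True):
    """Split one cell back into items, optionally trimming each item."""
    if list_syntax != "plaintext" and cell.startswith("[") and cell.endswith("]"):
        cell = cell[1:-1]  # drop the surrounding brackets
    if cell == "":
        return []
    items = cell.split(delimiter)
    return [item.strip() for item in items] if strip_whitespace else items


bracketed = format_list(["a", "b", "c"])                  # "[a|b|c]"
plain = format_list(["a", "b", "c"], "plaintext", "; ")   # "a; b; c"
```

With `list_syntax="plaintext"` and a `";"` delimiter, `parse_list("a; b; c", "plaintext", ";")` returns `["a", "b", "c"]`; passing `strip_whitespace=False` keeps the leading spaces on the second and third items.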

Format CLI tests with ruff

342b30e

Add validation warning for invalid list_syntax values

c3a7d3a

Warn users if they provide an invalid list_syntax annotation value (e.g., typo like "plainetxt" instead of "plaintext"). Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
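A sketch of such a validation step, assuming the two values listed in the design notes (`python` and `plaintext`). The function name `resolve_list_syntax` and the fallback to the default are assumptions made for illustration; the PR may handle an invalid value differently.

```python
import warnings

VALID_LIST_SYNTAXES = ("python", "plaintext")


def resolve_list_syntax(value, default="python"):
    """Return a valid list_syntax, warning when the given value is not recognized.

    Hypothetical helper: falling back to the default is an assumption.
    """
    if value in VALID_LIST_SYNTAXES:
        return value
    warnings.warn(
        f"Invalid list_syntax {value!r}; expected one of {VALID_LIST_SYNTAXES}. "
        f"Falling back to {default!r}.",
        stacklevel=2,
    )
    return default
```

A typo such as `resolve_list_syntax("plainetxt")` then emits a warning and returns `"python"` rather than failing silently.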

Convert old CsvAndTsvGenTestCase from unittest to pure pytest

1fe1edc

Mechanical conversion: drop class wrapper, remove self, remove import unittest. No behavior change. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix ruff formatting: join split f-string into single line

99696b2

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

turbomam force-pushed the issue-3041-annotation-based-csv-delimiters branch from ec4030c to 99696b2 Compare February 11, 2026 19:12

turbomam and others added 2 commits February 11, 2026 15:13

Merge branch 'main' into issue-3041-annotation-based-csv-delimiters

751a917



		# =============================================================================
		# pytest-style unit tests for annotation-based CSV configuration (issue #3041)

		return SchemaView(SCHEMA_WHITESPACE_PRESERVE)


		class TestWhitespaceStripping:

Conversation

turbomam commented Feb 4, 2026

New Summary from 2026-02-11

Origin and design

Deviation from spec: schema-level only

No changes to csvutils.py

SSSOM alignment

What's not in scope

Configuration reference

Schema annotations

CLI options (override schema annotations)

Review feedback addressed

Files changed

codecov bot commented Feb 4, 2026

Codecov Report

Copilot AI left a comment

Pull request overview

Reviewed changes

turbomam commented Feb 5, 2026

Copilot AI left a comment

Pull request overview

cmungall left a comment

turbomam Feb 5, 2026

turbomam commented Feb 11, 2026

Consolidated unaddressed feedback — list formatting (PR #3134)

1. Chris's CHANGES_REQUESTED review (Feb 5) — addressed in code, awaiting re-review

2. Chris's design spec from Dec 15 rolling meeting notes

3. Chris's helper function visibility comment (PR inline)

4. Copilot review — invalid list_syntax values

5. Discussion #1996 context

6. Feb 9 rolling notes

7. Related issues

Next steps

turbomam commented Feb 11, 2026

turbomam commented Feb 11, 2026

Scope and known limitations

What this PR does and doesn't touch

Schema-level only annotations

Known edge cases

list_strip_whitespace accepts only true/false
