
feat(parser_ros, mcap_source): builtin-object handlers + class-level schema catalog + zero-copy#82

Open
pabloinigoblasco wants to merge 20 commits into main from feat/parser-ros-canonical-objects


@pabloinigoblasco pabloinigoblasco commented May 11, 2026

Summary

Rework of parser_ros and mcap_source around the builtin-object pipeline introduced in the companion SDK PR. Builds on the SDK's SchemaHandler table + BufferAnchor + push_message_v2 contract; consumes the builtin types defined in pj_scene_protocol/builtin/.

Posted as a draft for review. Compiles cleanly with the SDK PR.

parser_ros

Schema dispatch (one place, one source of truth)

  • Class-level catalog (catalog() returns a static unordered_map) keyed by ROS type name. Each entry holds member-function pointers — no this capture at class scope. Adding a schema = one entry in the table.
  • bindSchema looks the bound type up and registers a single SchemaHandler tailored to it on this instance (builtin-object handler, scalar handler, or the generic flatten as fallback).
  • Covers 16 specialized scalar schemas (Imu, JointState, Pose, Transform, DiagnosticArray, TFMessage, DataTamer*, PalStatistics*, TSL*, …) plus the builtin-object schemas (sensor_msgs/Image, sensor_msgs/CompressedImage, sensor_msgs/PointCloud2).

Builtin-object handlers (zero-copy)

  • parseImage / parseCompressedImage: both produce the unified sdk::Image introduced in the SDK rework. ROS encodings (rgb8/rgba8/bgr8/bgra8/mono8/mono16/16UC1) flow into sdk::Image::encoding as their string name; the JPEG/PNG/compressedDepth codecs become "jpeg"/"png"/"compressedDepth". There is no longer a separate CompressedImage builtin — the encoding string distinguishes raw from compressed. compressedDepth carries compressed_depth_min/compressed_depth_max on the same struct. Pixel/bytes Span sits over the payload buffer shared via the BufferAnchor.
  • parsePointCloud: walks PointField[] metadata and emits an sdk::PointCloud whose data Span sits over the payload bytes.

Generic + specialized scalar paths

  • parseScalarsGeneric: rosx_introspection's flattenGeneric harvested into a vector<NamedFieldValue>. Used as the default-handler scalar route in bindSchema for any schema not in the specific table.
  • parseScalarsDiscardingLargeArrays: same flow with the array policy forced to DISCARD_LARGE_ARRAYS temporarily. Used by the builtin-object schemas so their bulk byte payload doesn't show up as scalar columns.
  • wrapVoidHandler<Handler> template member function: adapts the existing imperative handle*() void methods to the parse_scalars callable signature. Each instantiation is a member-fn-ptr with the right shape, slotting directly into the catalog.
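
One way the wrapVoidHandler adapter could look, as a sketch under simplifying assumptions: the payload is reduced to a single `double`, results are collected in a member vector `out_`, and `handleTemperature` is a made-up legacy handler. None of these names are taken from the PR.

```cpp
#include <string>
#include <utility>
#include <vector>

struct NamedFieldValue {
  std::string name;
  double value;
};

class RosParser {
 public:
  // The callable shape the catalog expects (simplified to a double payload).
  using ParseScalarsMemFn = std::vector<NamedFieldValue> (RosParser::*)(double);

  // Adapter: run a void handler that writes into out_, then return what it wrote.
  template <void (RosParser::*Handler)(double)>
  std::vector<NamedFieldValue> wrapVoidHandler(double raw) {
    out_.clear();
    (this->*Handler)(raw);
    return std::exchange(out_, {});
  }

  // Legacy-style imperative handler (hypothetical).
  void handleTemperature(double raw) { out_.push_back({"temperature", raw}); }

 private:
  std::vector<NamedFieldValue> out_;
};

// Each instantiation is an ordinary member-function pointer of the catalog's shape.
inline constexpr RosParser::ParseScalarsMemFn kTemperatureEntry =
    &RosParser::wrapVoidHandler<&RosParser::handleTemperature>;
```

Since the handler is a template argument, each instantiation has the same pointer type as a native parse function and can be stored in the table directly.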

mcap_source

  • pushMessage with a deferred FetchMessageData callable per message: closure captures the open McapReader (kept alive by a shared_ptr member of the source) and the message coordinates. The host decides via ObjectIngestPolicy when (and whether) to invoke it.
  • readMessageBytesAt returns sdk::PayloadView: a Span<const uint8_t> over a heap-held shared_ptr<vector<uint8_t>> that serves as the BufferAnchor. The chunk-iterator copy is unchanged for now; a direct seek+read path with an LRU of decompressed chunks is the next optimization step.
  • Reader keeper: shared_ptr<McapReader> stored as a member so the FetchMessageData closures pushed via pushMessage stay valid past importData(). Closed automatically when the source is destroyed.
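
The deferred-fetch idea can be illustrated with a small closure. This sketch makes two assumptions that the description does not confirm: the closure itself co-owns the reader through a captured `shared_ptr`, and `FetchMessageData` is modelled as a `std::function` returning a byte vector. `FakeReader` is a hypothetical stand-in for `McapReader`.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical reader; counts how often bytes were actually read.
struct FakeReader {
  int reads = 0;
  std::vector<std::uint8_t> readAt(std::uint16_t channel_id, std::uint64_t log_time) {
    ++reads;
    return {static_cast<std::uint8_t>(channel_id), static_cast<std::uint8_t>(log_time)};
  }
};

using FetchMessageData = std::function<std::vector<std::uint8_t>()>;

// The closure holds the reader and the message coordinates; no bytes are read
// until the host decides to invoke it.
FetchMessageData makeFetch(std::shared_ptr<FakeReader> reader, std::uint16_t channel_id, std::uint64_t log_time) {
  return [reader = std::move(reader), channel_id, log_time] { return reader->readAt(channel_id, log_time); };
}
```

The point of the sketch is the ordering: announcing a message costs one small closure, and the read happens only if and when the closure is called.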

Build

  • magic_enum/0.9.7 added to conanfile.txt for the SDK vocabulary helpers (BuiltinObjectKind, CommonImageEncoding).

Test plan

  • Build clean (RelWithDebInfo) with the companion SDK PR.
  • Load a real mcap with image and point-cloud topics — pending.
  • Verify lazy FetchMessageData invocations under each policy mode — pending.

Companion work

  • Foundation: PlotJuggler/plotjuggler_core#86 (SDK builtin-object pipeline: SchemaHandler table, BufferAnchor, push_message_v2 ABI, ObjectIngestPolicyResolver, type-erased BuiltinObject via std::any, PJ_message_data_fetcher_t).
  • Plugin author docs: PLUGIN_DEVELOPMENT.md in this repo + pj_plugins/docs/data-source-guide.md + pj_plugins/docs/message-parser-guide.md in the core PR.

…l schema catalog + zero-copy

Rework of parser_ros and mcap_source around the canonical-object pipeline
introduced in plotjuggler_core. Builds on the SDK's SchemaHandler table
+ BufferAnchor + push_message_v2 contract.

parser_ros — schema dispatch:
- Class-level catalog (catalog() returns a static unordered_map) keyed
  by canonical ROS type name. Each entry holds member-function
  pointers — no `this` capture at class scope.
- bindSchema looks the bound type up in the catalog and registers a
  single SchemaHandler tailored to it on this instance (specific
  canonical-object handler, specific scalar handler, or the generic
  flatten as fallback for any type not in the catalog).
- The class catalog covers all 16 specialized scalar schemas
  (Imu, JointState, Pose, Transform, …) plus the 3 canonical-object
  schemas (Image, CompressedImage, PointCloud2).

parser_ros — three canonical-object handlers:
- parseImage: maps ROS encodings (rgb8/rgba8/bgr8/bgra8/mono8/mono16/
  16UC1) to canonical PixelFormat with row_step preserved. BGR variants
  stay as kBGR*; the consumer handles channel order via texture-format
  selection. Zero-copy — pixels Span sits over the payload buffer
  shared via the anchor.
- parseCompressedImage: handles JPEG, PNG, and the ROS compressedDepth
  wrapper (depth_min/depth_max in CompressedImage::extras). Zero-copy
  slice of the payload at the right offset.
- parsePointCloud: walks PointField[] and emits a sdk::PointCloud whose
  data Span sits over the payload bytes.

parser_ros — generic / specialized scalar paths:
- parseScalarsGeneric: rosx_introspection's flattenGeneric harvested
  to a vector<NamedFieldValue>. Used as the default-handler scalar
  route in bindSchema for any schema not in the specific table.
- parseScalarsDiscardingLargeArrays: same flow but with the array
  policy forced to DISCARD_LARGE_ARRAYS temporarily. Used by the
  canonical-object schemas so their bulk byte payload doesn't show up
  as scalar columns.
- wrapVoidHandler<Handler> template member fn: adapts the existing
  imperative handle*() void methods to the parse_scalars callable
  signature. Each instantiation is a member-fn-ptr with the right
  shape, slotting directly into the catalog without bind_front gymnastics.
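
The commit text only says the array policy is forced "temporarily"; a scope guard is one way to get that behaviour. The sketch below is an assumption about how it could be done, with `Introspector`, `ArrayPolicy` and `ScopedArrayPolicy` as hypothetical names rather than rosx_introspection API.

```cpp
// Hypothetical policy and parser types, for illustration only.
enum class ArrayPolicy { kKeepLargeArrays, kDiscardLargeArrays };

struct Introspector {
  ArrayPolicy array_policy = ArrayPolicy::kKeepLargeArrays;
};

// Forces a policy for the lifetime of the guard and restores the old one after.
class ScopedArrayPolicy {
 public:
  ScopedArrayPolicy(Introspector& parser, ArrayPolicy temporary) : parser_(parser), saved_(parser.array_policy) {
    parser_.array_policy = temporary;
  }
  ~ScopedArrayPolicy() { parser_.array_policy = saved_; }
  ScopedArrayPolicy(const ScopedArrayPolicy&) = delete;
  ScopedArrayPolicy& operator=(const ScopedArrayPolicy&) = delete;

 private:
  Introspector& parser_;
  ArrayPolicy saved_;
};

// Returns the policy that flattening would observe inside the guarded scope.
ArrayPolicy policyDuringObjectParse(Introspector& parser) {
  const ScopedArrayPolicy guard(parser, ArrayPolicy::kDiscardLargeArrays);
  return parser.array_policy;  // the flatten call would run here
}
```

The guard also restores the policy on early return or exception, which a manual set/reset pair would not.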

mcap_source:
- pushMessage with a deferred byte fetcher per message: closure captures
  the open McapReader (shared_ptr held on the source for the session
  lifetime) and the message coordinates (channel id + log_time). The
  host decides via ObjectIngestPolicy when (and whether) to invoke it.
- readMessageBytesAt returns sdk::PayloadView — Span<const uint8_t> over
  a heap-held shared_ptr<vector<uint8_t>> that serves as the BufferAnchor.
  The chunk-iterator copy is unchanged for now; the seek+read direct
  path with an LRU of decompressed chunks is the next optimization step.
- Reader keeper: shared_ptr<McapReader> stored as a member so the
  fetcher closures pushed via pushMessage stay valid past importData().
  Closed automatically when the source is destroyed.

Status: design sketch posted as a draft. Compiles cleanly with the
companion SDK / runtime work; not yet exercised end-to-end against
real mcap files.
…chemaHandler architecture

Forward-looking developer guide for the canonical-object pipeline.

Contents:
- Root README: new "Plugin architecture — the declarative shape" section
  describing the DataSource pushMessage + fetcher contract, the
  MessageParser SchemaHandler catalog, and the end-to-end host dispatch
  flow (kEager / kLazyObjectsEagerScalars / kPureLazy).
- parser_ros/README.md: rewritten around the static schema catalog,
  three canonical-object handlers (Image, CompressedImage, PointCloud2)
  with zero-copy spans over the source payload, and steps to add a new
  schema entry.
- data_load_mcap/README.md: rewritten around the deferred byte fetcher,
  PayloadView + BufferAnchor lifetime, and the loader's role as a pure
  ingest pipe that lets the host decide eager vs lazy materialization.
…with wildcard entry

Contents:
- New PLUGIN_DEVELOPMENT.md as the authoring entry point. Walks
  developers through the two plugin families (DataLoader vs
  DataStream under DataSource, MessageParser) and what each is and is
  not. Frames the historical scalar-extraction model as the primary
  product of the ingest pipeline, then introduces canonical objects
  (sdk::Image, sdk::CompressedImage, sdk::PointCloud, with natural
  extension points for 2D laser scans, meshes, transformation trees)
  as the second narrow channel for non-scalar media, motivating the
  on-demand load callback as the consequence of aggregate payload
  size. Covers PayloadView + BufferAnchor, the DataSource pushMessage
  + fetcher shape, the MessageParser SchemaHandler catalog with
  optional CatalogEntry::kWildcard fallback, host-side eager / lazy /
  pure-lazy dispatch, end-to-end flow, and authoring checklists.
- Trimmed root README architecture section to a one-paragraph
  overview plus a link to PLUGIN_DEVELOPMENT.md.
- parser_ros/README.md: the catalog now declares the generic
  introspection handler as a wildcard ("*") entry, so bindSchema
  collapses to a single
  registerSchemaHandler(makeHandler(catalog().resolve(type_name)))
  call with no branching.
…cher, clarify host policy is not the plugin's problem

Contents:
- Define the fetcher in plain language at its first mention ("a small
  callback the host can invoke later to retrieve the bytes for that
  message"); drop the forward-reference style further down.
- Reframe "What plugins emit" as scalars and canonical objects "in
  time", emphasizing both channels are timestamped streams.
- Replace the "future video panels" placeholder with concrete
  extension examples (2D laser scans, meshes, transformation trees)
  and add a paragraph explaining why canonical objects are typically
  loaded on demand (200 GB MCAP example) and why this motivates the
  fetcher pattern.
- "How the host uses these declarations" now opens with an explicit
  "you do not need to write code for this" note, flags that ingest
  policy will become user-facing in PJ4, and adds a "What this means
  in practice for the plugin" subsection: keep the source open and
  seekable, do not cache decoded data inside the plugin, keep the
  fetcher idempotent.
- Soften the DataLoader sub-shape description: "enumerate" is lighter
  than it sounds — one announcement per message, the host decides
  what to do with each, multi-gigabyte recordings do not have to fit
  in memory.
- Rename CatalogEntry::kWildcard to CatalogEntry::kDefault throughout
  (PLUGIN_DEVELOPMENT.md and parser_ros/README.md); the conventional
  "*" key stays, the prose calls it the default entry.
@pabloinigoblasco pabloinigoblasco marked this pull request as ready for review May 11, 2026 09:25
pabloinigoblasco and others added 15 commits May 11, 2026 11:39
…talogEntry::kDefault

Aligns the implementation with the pattern documented in
PLUGIN_DEVELOPMENT.md and parser_ros/README.md: the catalog itself owns
the catch-all entry under the conventional "*" key, and bindSchema is a
single lookup with at most one fallback.

Behavior is unchanged: a catalog miss still resolves to
parseScalarsGeneric. The std::bind_front + SchemaHandler registration
path is identical.

Contents:
- Add static constexpr const char* CatalogEntry::kDefault = "*".
- Move the local kFallback entry into the catalog() map under
  CatalogEntry::kDefault.
- bindSchema lookup becomes: find(msg_type), and on miss
  find(CatalogEntry::kDefault). The default entry is guaranteed present
  by the catalog construction, so the second find always hits.
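
The two-step lookup can be sketched like this. `CatalogEntry::kDefault` is taken from the commit text; the `handler_name` field is a stand-in for the member-function pointers, and `resolve` is a hypothetical free function used only for illustration.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct CatalogEntry {
  static constexpr const char* kDefault = "*";
  std::string handler_name;  // stand-in for the real handler pointers
};

using Catalog = std::unordered_map<std::string, CatalogEntry>;

// find(msg_type), and on a miss find(kDefault); the default must exist.
const CatalogEntry& resolve(const Catalog& catalog, const std::string& msg_type) {
  auto it = catalog.find(msg_type);
  if (it == catalog.end()) {
    it = catalog.find(CatalogEntry::kDefault);
  }
  assert(it != catalog.end() && "catalog must contain the default entry");
  return it->second;
}
```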
…ose comment

bindSchema does six distinct things in sequence (schema string copy,
ROS 2 name normalisation + base bind, rosx_introspection parser
construction, schema feature/deserializer prep, catalog lookup with
kDefault fallback, SchemaHandler bind + register). Each block now
carries a short comment naming its role so the method reads as a
checklist instead of a wall.

No behavior change.

The word "surface" was being used loosely in five places to mean
different things (virtual API, public contract, UI controls, the
extent of a small interface). Replaced each with a term that names
what it actually is, so the prose stops papering over distinct ideas
with a single fuzzy word.

Contents:
- "and that is the entire surface" → "and nothing more"  (MessageParser is-not list).
- "Stable surface" → "Stable for viewers"  (canonical-object variant rationale).
- "no virtual override surface" → "no virtual methods to override"  (SchemaHandler catalog rationale, two locations).
- "that surface lands" → "those controls land"  (future PJ4 ingest-policy UI).
…tput

DataSources hand the host raw bytes via a fetcher; only MessageParsers
emit named scalar columns and canonical-object variants. Renaming the
section makes the source of each output explicit, and the example
plugin list now matches (parser_json, parser_protobuf,
parser_data_tamer instead of a mixed list that included data_load_csv
and the streaming sources, which do not emit scalars/canonical objects
themselves).
…t handlers

Real-world ROS2 MCAP files fall into two camps: those generated by
Python tooling write CDR without post-string alignment padding, while
files recorded by standard ROS2 DDS tools (rosbag2, Foxglove) follow
XCDR1 and insert padding bytes so each uint32 field starts at a 4-byte
boundary.

CdrSeqReader previously assumed no padding, silently mis-reading the
format/encoding string for aligned files and returning an error from
parseCompressedImage/parseImage, leaving the media preview empty with
no visible diagnostic.

Add a base pointer to track offset from the CDR data start and apply a
heuristic in u32(): if the current offset is not 4-byte aligned and
the next byte is 0x00, treat it as a padding byte and skip it. String
lengths in sensor_msgs are always < 256, so their first LE byte is
never 0x00, making the heuristic reliable for all known sensor_msgs
image types.
…l-object handlers

Replace the local CdrSeqReader with RosMsgParser::Deserializer
(reused from the scalar path). This:

- Drops the "skip-if-next-byte-is-zero" CDR padding heuristic from
  4da1b77; nanocdr::Decoder applies proper XCDR1 origin-relative
  alignment without inspecting payload bytes.
- Honours the CDR encapsulation header's endianness flag (the custom
  reader silently assumed little-endian).
- Gains ROS1 support: readHeader() already branches on isROS2() for
  the ROS1-only seq field, and ROS_Deserializer handles the
  unaligned ROS1 wire format.
- Fixes parsePointCloud2 reading is_dense from the first byte of
  data[] instead of after it.

Zero-copy semantics preserved: deserializeByteSequence() returns a
Span over the payload and advances the cursor past it;
payload.anchor is still propagated into the canonical object.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

No behavior change. Re-flows initializer-list entries and
declaration wrapping per .clang-format (Google style, 120-col,
InsertBraces: true). Output of running pre-commit's clang-format
hook over the files touched in this branch.

Resolves conflicts from PR #84 (chore: apply clang-format) landing on
main. The only conflicting file was data_load_mcap/mcap_source.cpp;
all three hunks were lines where this PR introduces the canonical-
object pipeline:

  - reader_keeper_.reset() vs reader.close() on summary-read failure.
    PR keeps the keeper so the deferred fetcher cannot outlive a
    closed reader; resolved by taking PR's version.
  - early-exit on ensureParserBinding error vs old branching shape.
    PR's continue-on-error form is what the rest of the loop expects
    (the bindings.emplace below runs only on the success path).
  - runtimeHost().pushMessage(binding, ts, fetcher) vs the older
    pushRawMessage(binding, ts, span). PR's fetcher-closure path is
    the whole point of the canonical-object pipeline.

Each conflict resolved by taking the PR side (HEAD); no semantic loss.

bindSchema used to call the SDK base class with the *normalised* name
("sensor_msgs/CompressedImage", stripped of "/msg/") and register the
SchemaHandler under the same normalised key. The internal catalog lookup
needs the normalised name, but the SDK's classifySchema /
parseScalars / parseObject all look up by whatever name the host passes
through — which is the *original* "sensor_msgs/msg/CompressedImage".
The mismatch made classifySchema always return kNone for canonical
schemas in real-world ROS2 MCAPs, demoting them to scalar ingest and
collapsing the canonical-object pipeline.

Hand the SDK base the original type_name (so bound_type_name_ matches
the host's lookup key) and register the SchemaHandler under it too.
The internal `msg_type` (with "/msg/" stripped) is still used to look
up the catalog entry inside this function — that's purely an internal
concern about how the catalog keys are spelled, kept out of the
host-facing surface.
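
The two-key arrangement can be sketched as below. `SchemaBinding`, `catalogKeyFor` and `makeBinding` are hypothetical names, and the normalisation shown (dropping the first "/msg/" infix) is only what the example in this message implies, not necessarily the full rule used by the parser.

```cpp
#include <string>
#include <string_view>

// Internal spelling: "sensor_msgs/msg/Image" becomes "sensor_msgs/Image".
std::string catalogKeyFor(std::string_view type_name) {
  constexpr std::string_view kMsgInfix = "/msg/";
  std::string key(type_name);
  if (const auto pos = key.find(kMsgInfix); pos != std::string::npos) {
    key.replace(pos, kMsgInfix.size(), "/");
  }
  return key;
}

struct SchemaBinding {
  std::string host_key;     // what the host passes and later looks up by
  std::string catalog_key;  // what the class-level catalog is keyed by
};

// Register under the original name; normalise only for the catalog lookup.
SchemaBinding makeBinding(std::string_view type_name) {
  return SchemaBinding{std::string(type_name), catalogKeyFor(type_name)};
}
```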
…bjects' into feat/parser-ros-canonical-objects
Adopt the renamed and reorganized SDK from plotjuggler_core:

- canonical-object → builtin-object (types, includes, identifiers).
- Switch to pj_plugin_sdk INTERFACE library; parser SDK headers move
  from pj_base/sdk/ to pj_plugins/sdk/.
- parser_ros emits unified sdk::Image with encoding string ("rgb8",
  "jpeg", "compressedDepth", …) instead of the dropped sdk::Image +
  sdk::CompressedImage split.
- Reduce the "fetcher" shorthand in favour of FetchMessageData /
  fetch_message_data (concept name + identifier).
- Add magic_enum/0.9.7 to conanfile (transitively required by the
  scene-protocol vocabulary helpers).

No behaviour change.

Bring the guide up to the post-restructure shape of the SDK:

- canonical-object → builtin-object across naming, headers, and
  conceptual prose.
- `BuiltinObject` is `std::any` (defined in
  `pj_scene_protocol/builtin/BuiltinObject.h`); consumers recover the
  concrete type with `std::any_cast<sdk::Image>(&obj)`.
- Builtin type catalog updated: `Image` unified with `std::string
  encoding` ("rgb8" / "jpeg" / "compressedDepth" / …, replacing
  the old `PixelFormat` enum and the separate `CompressedImage`
  type), `DepthImage` (camera intrinsics: K + D + distortion_model),
  `PointCloud`, `ImageAnnotations` (first-class, with its own
  sub-types).
- Reduce "fetcher" vocabulary: refer to the deferred byte-producing
  callable as `FetchMessageData` (concept) / `fetchMessageData` (C
  ABI field) consistently.
- End-to-end dispatch ASCII updated to match.
- parser_ros reference now describes the unified `sdk::Image` flow.
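
A small consumer-side sketch of the `std::any` recovery described above. The `Image` and `PointCloud` structs here are simplified assumptions (the real definitions are in `pj_scene_protocol/builtin/`), and `isCompressedImage` is a hypothetical helper; only `std::any` and the pointer form of `std::any_cast` are standard-library facts.

```cpp
#include <any>
#include <string>

// Simplified stand-ins for the SDK builtin types.
struct Image {
  std::string encoding;  // "rgb8", "jpeg", "compressedDepth", ...
  int width = 0;
  int height = 0;
};
struct PointCloud {
  int point_count = 0;
};

using BuiltinObject = std::any;

// The pointer overload of any_cast returns nullptr on a type mismatch
// instead of throwing std::bad_any_cast.
bool isCompressedImage(const BuiltinObject& object) {
  const Image* image = std::any_cast<Image>(&object);
  if (image == nullptr) {
    return false;
  }
  return image->encoding == "jpeg" || image->encoding == "png" || image->encoding == "compressedDepth";
}
```

With a single `Image` type, the raw-versus-compressed distinction becomes a string comparison on `encoding` rather than a type check.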
@pabloinigoblasco pabloinigoblasco changed the title feat(parser_ros, mcap_source): canonical-object handlers + class-level schema catalog + zero-copy feat(parser_ros, mcap_source): builtin-object handlers + class-level schema catalog + zero-copy May 15, 2026
…ting DataSource vs MessageParser)

Restructure the guide around the three plugin shapes that actually
exist in this collection, recovered from earlier prose:

- Shape A — Self-parsing DataSource (kCapabilityDirectIngest):
  CSV / ULog / Parquet. Owns its decoder; writes via
  writeHost().appendRecord(...). No parser binding.
- Shape B — Delegating DataSource (kCapabilityDelegatedIngest):
  MCAP / streams. Two ingest calls:
    File   → pushMessage(handle, ts, fetchMessageData) (lazy callable)
    Stream → pushRawMessage(handle, ts, bytes) (eager span)
  Streams cannot offer a deferred callable; only kEager applies.
- Shape C — MessageParser: no I/O, decodes bytes by schema.

Add a worked example showing one parser (parser_protobuf) bound by
three sources (mcap / zmq / mqtt) through a single encoding-name key,
to make the 'any source × any parser' decoupling claim concrete.
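
The decoupling claim can be pictured with a registry keyed by encoding name. This is an illustrative sketch only: `ParserRegistry`, `Parser`, `Source` and `boundParserName` are hypothetical, and the real host binding mechanism is not shown in this PR.

```cpp
#include <functional>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

struct Parser {
  std::string name;
};
using ParserFactory = std::function<Parser()>;

class ParserRegistry {
 public:
  void add(const std::string& encoding, ParserFactory factory) { factories_[encoding] = std::move(factory); }

  Parser create(const std::string& encoding) const {
    const auto it = factories_.find(encoding);
    if (it == factories_.end()) throw std::out_of_range("no parser for encoding: " + encoding);
    return it->second();
  }

 private:
  std::unordered_map<std::string, ParserFactory> factories_;
};

// A source only knows the encoding name it read from its file or transport.
struct Source {
  std::string transport;
  std::string encoding;
};

std::string boundParserName(const ParserRegistry& registry, const Source& source) {
  return registry.create(source.encoding).name;
}
```

Neither side names the other: the source supplies a string, the registry supplies a parser, and adding a fourth source needs no parser change.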

Refresh the authoring checklist with one section per shape.

All content uses the post-restructure SDK shape (BuiltinObject as
std::any, unified sdk::Image with encoding string, DepthImage and
ImageAnnotations as first-class builtins, PJ_message_data_fetcher_t /
FetchMessageData vocabulary).