feat(parser_ros, mcap_source): builtin-object handlers + class-level schema catalog + zero-copy #82
Open
pabloinigoblasco wants to merge 20 commits into
…l schema catalog + zero-copy
Rework of parser_ros and mcap_source around the canonical-object pipeline
introduced in plotjuggler_core. Builds on the SDK's SchemaHandler table +
BufferAnchor + push_message_v2 contract.
parser_ros — schema dispatch:
- Class-level catalog (catalog() returns a static unordered_map) keyed by
  canonical ROS type name. Each entry holds member-function pointers — no
  `this` capture at class scope.
- bindSchema looks the bound type up in the catalog and registers a single
  SchemaHandler tailored to it on this instance (specific canonical-object
  handler, specific scalar handler, or the generic flatten as fallback for
  any type not in the catalog).
- The class catalog covers all 16 specialized scalar schemas (Imu,
  JointState, Pose, Transform, …) plus the 3 canonical-object schemas
  (Image, CompressedImage, PointCloud2).
parser_ros — three canonical-object handlers:
- parseImage: maps ROS encodings (rgb8/rgba8/bgr8/bgra8/mono8/mono16/16UC1)
  to canonical PixelFormat with row_step preserved. BGR variants stay as
  kBGR*; the consumer handles channel order via texture-format selection.
  Zero-copy — the pixels Span sits over the payload buffer shared via the
  anchor.
- parseCompressedImage: handles JPEG, PNG, and the ROS compressedDepth
  wrapper (depth_min/depth_max in CompressedImage::extras). Zero-copy slice
  of the payload at the right offset.
- parsePointCloud: walks PointField[] and emits an sdk::PointCloud whose
  data Span sits over the payload bytes.
parser_ros — generic / specialized scalar paths:
- parseScalarsGeneric: rosx_introspection's flattenGeneric harvested into a
  vector<NamedFieldValue>. Used as the default-handler scalar route in
  bindSchema for any schema not in the specific table.
- parseScalarsDiscardingLargeArrays: same flow, but with the array policy
  temporarily forced to DISCARD_LARGE_ARRAYS. Used by the canonical-object
  schemas so their bulk byte payload doesn't show up as scalar columns.
- wrapVoidHandler<Handler> template member function: adapts the existing
  imperative handle*() void methods to the parse_scalars callable signature.
  Each instantiation is a member-function pointer with the right shape, so
  it slots directly into the catalog without bind_front gymnastics.
mcap_source:
- pushMessage with a deferred byte fetcher per message: the closure
  captures the open McapReader (a shared_ptr held on the source for the
  session lifetime) and the message coordinates (channel id + log_time).
  The host decides via ObjectIngestPolicy when (and whether) to invoke it.
- readMessageBytesAt returns sdk::PayloadView — a Span<const uint8_t> over a
  heap-held shared_ptr<vector<uint8_t>> that serves as the BufferAnchor.
  The chunk-iterator copy is unchanged for now; the seek+read direct path
  with an LRU of decompressed chunks is the next optimization step.
- Reader keeper: shared_ptr<McapReader> stored as a member so the fetcher
  closures pushed via pushMessage stay valid past importData(). Closed
  automatically when the source is destroyed.
Status: design sketch posted as a draft. Compiles cleanly with the
companion SDK / runtime work; not yet exercised end-to-end against real
mcap files.
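The fetcher and anchor lifetime rules can be made concrete with a minimal sketch. Every type in it (`ByteSpan`, `PayloadView`, `FakeReader`, `makeFetcher`) is a hypothetical stand-in written for illustration, not the SDK's or the MCAP library's actual API. It assumes only two things: a view carries a shared owner of its bytes, and the closure owns a shared handle to the reader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the types named above; the real sdk::PayloadView,
// BufferAnchor and McapReader have different (richer) interfaces.
struct ByteSpan {
  const uint8_t* data = nullptr;
  std::size_t size = 0;
};

using BufferAnchor = std::shared_ptr<const void>;  // any owner of the bytes

struct PayloadView {
  ByteSpan bytes;       // zero-copy view into the anchored buffer
  BufferAnchor anchor;  // the view is valid while this is held
};

// Stand-in for an open MCAP reader: message bytes keyed by (channel id, log_time).
class FakeReader {
 public:
  void add(uint16_t channel_id, uint64_t log_time, std::vector<uint8_t> bytes) {
    messages_[{channel_id, log_time}] = std::move(bytes);
  }

  // Like the current chunk-iterator path: copy the message into a heap buffer
  // and return a view whose anchor owns that buffer.
  PayloadView readMessageBytesAt(uint16_t channel_id, uint64_t log_time) const {
    auto buffer = std::make_shared<std::vector<uint8_t>>(messages_.at({channel_id, log_time}));
    return PayloadView{ByteSpan{buffer->data(), buffer->size()}, buffer};
  }

 private:
  std::map<std::pair<uint16_t, uint64_t>, std::vector<uint8_t>> messages_;
};

using FetchMessageData = std::function<PayloadView()>;

// The closure owns a shared handle to the reader plus the message coordinates,
// so it outlives the import loop and can be called any number of times.
FetchMessageData makeFetcher(std::shared_ptr<const FakeReader> reader, uint16_t channel_id,
                             uint64_t log_time) {
  return [reader = std::move(reader), channel_id, log_time]() {
    return reader->readMessageBytesAt(channel_id, log_time);
  };
}
```

Because the closure holds its own `shared_ptr` to the reader, dropping every other handle does not invalidate it. Each call returns a fresh view whose anchor alone owns the copied bytes.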
…chemaHandler architecture
Forward-looking developer guide for the canonical-object pipeline.
Contents:
- Root README: new "Plugin architecture — the declarative shape" section
  describing the DataSource pushMessage + fetcher contract, the
  MessageParser SchemaHandler catalog, and the end-to-end host dispatch
  flow (kEager / kLazyObjectsEagerScalars / kPureLazy).
- parser_ros/README.md: rewritten around the static schema catalog, the
  three canonical-object handlers (Image, CompressedImage, PointCloud2) with
  zero-copy spans over the source payload, and steps to add a new schema
  entry.
- data_load_mcap/README.md: rewritten around the deferred byte fetcher,
  PayloadView + BufferAnchor lifetime, and the loader's role as a pure
  ingest pipe that lets the host decide eager vs lazy materialization.
…with wildcard entry
Contents:
- New PLUGIN_DEVELOPMENT.md as the authoring entry point. Walks
developers through the two plugin families (DataLoader vs
DataStream under DataSource, MessageParser) and what each is and is
not. Frames the historical scalar-extraction model as the primary
product of the ingest pipeline, then introduces canonical objects
(sdk::Image, sdk::CompressedImage, sdk::PointCloud, with natural
extension points for 2D laser scans, meshes, transformation trees)
as the second narrow channel for non-scalar media, motivating the
on-demand load callback as the consequence of aggregate payload
size. Covers PayloadView + BufferAnchor, the DataSource pushMessage
+ fetcher shape, the MessageParser SchemaHandler catalog with
optional CatalogEntry::kWildcard fallback, host-side eager / lazy /
pure-lazy dispatch, end-to-end flow, and authoring checklists.
- Trimmed root README architecture section to a one-paragraph
overview plus a link to PLUGIN_DEVELOPMENT.md.
- parser_ros/README.md: the catalog now declares the generic
introspection handler as a wildcard ("*") entry, so bindSchema
collapses to a single
registerSchemaHandler(makeHandler(catalog().resolve(type_name)))
call with no branching.
…cher, clarify host policy is not the plugin's problem
Contents:
- Define the fetcher in plain language at its first mention ("a small
callback the host can invoke later to retrieve the bytes for that
message"); drop the forward-reference style further down.
- Reframe "What plugins emit" as scalars and canonical objects "in
time", emphasizing both channels are timestamped streams.
- Replace the "future video panels" placeholder with concrete
extension examples (2D laser scans, meshes, transformation trees)
and add a paragraph explaining why canonical objects are typically
loaded on demand (200 GB MCAP example) and why this motivates the
fetcher pattern.
- "How the host uses these declarations" now opens with an explicit
"you do not need to write code for this" note, flags that ingest
policy will become user-facing in PJ4, and adds a "What this means
in practice for the plugin" subsection: keep the source open and
seekable, do not cache decoded data inside the plugin, keep the
fetcher idempotent.
- Soften the DataLoader sub-shape description: "enumerate" is lighter
than it sounds — one announcement per message, the host decides
what to do with each, multi-gigabyte recordings do not have to fit
in memory.
- Rename CatalogEntry::kWildcard to CatalogEntry::kDefault throughout
(PLUGIN_DEVELOPMENT.md and parser_ros/README.md); the conventional
"*" key stays, the prose calls it the default entry.
…talogEntry::kDefault
Aligns the implementation with the pattern documented in
PLUGIN_DEVELOPMENT.md and parser_ros/README.md: the catalog itself owns
the catch-all entry under the conventional "*" key, and bindSchema is a
single lookup with at most one fallback.
Behavior is unchanged: a catalog miss still resolves to
parseScalarsGeneric. The std::bind_front + SchemaHandler registration
path is identical.
Contents:
- Add static constexpr const char* CatalogEntry::kDefault = "*".
- Move the local kFallback entry into the catalog() map under
  CatalogEntry::kDefault.
- bindSchema lookup becomes: find(msg_type), and on miss
  find(CatalogEntry::kDefault). The default entry is guaranteed present by
  the catalog construction, so the second find always hits.
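Below is a hypothetical miniature of the catalog shape these commits describe. It shows a static map of member-function pointers, a `"*"` default entry, the two-step lookup in `bindSchema`, and a `wrapVoidHandler`-style adapter. `MiniParser`, `Scalars`, and the handler signatures are invented for illustration. The real parser produces `NamedFieldValue`s and registers a `SchemaHandler` through `std::bind_front`.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

class MiniParser {
 public:
  using Scalars = std::vector<std::string>;  // hypothetical stand-in for NamedFieldValue

  struct CatalogEntry {
    static constexpr const char* kDefault = "*";
    Scalars (MiniParser::*parse_scalars)(const std::string& payload) const;
  };

  // Class-level table of member-function pointers: no `this` is captured,
  // so one static map serves every parser instance.
  static const std::unordered_map<std::string, CatalogEntry>& catalog() {
    static const std::unordered_map<std::string, CatalogEntry> table = {
        {"sensor_msgs/Imu", {&MiniParser::wrapVoidHandler<&MiniParser::handleImu>}},
        {CatalogEntry::kDefault, {&MiniParser::parseGeneric}},
    };
    return table;
  }

  // One lookup plus at most one fallback; the "*" entry always exists.
  void bindSchema(const std::string& msg_type) {
    auto it = catalog().find(msg_type);
    if (it == catalog().end()) {
      it = catalog().find(CatalogEntry::kDefault);
    }
    bound_ = it->second;
  }

  Scalars parseScalars(const std::string& payload) const { return (this->*bound_.parse_scalars)(payload); }

 private:
  // Legacy imperative style: results are appended to an out-parameter.
  void handleImu(const std::string& payload, Scalars& out) const { out.push_back("imu/" + payload); }

  // Adapts a void handler to the value-returning signature. Each instantiation
  // is an ordinary member function, so its address fits the catalog slot.
  template <void (MiniParser::*Handler)(const std::string&, Scalars&) const>
  Scalars wrapVoidHandler(const std::string& payload) const {
    Scalars out;
    (this->*Handler)(payload, out);
    return out;
  }

  Scalars parseGeneric(const std::string& payload) const { return {"generic/" + payload}; }

  CatalogEntry bound_{&MiniParser::parseGeneric};
};
```

Because `wrapVoidHandler<&MiniParser::handleImu>` is a distinct function per handler, the catalog stores plain member-function pointers. No per-entry closure or binding is required.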
…ose comment
bindSchema does six distinct things in sequence (schema string copy,
ROS 2 name normalisation + base bind, rosx_introspection parser
construction, schema feature/deserializer prep, catalog lookup with
kDefault fallback, SchemaHandler bind + register). Each block now carries
a short comment naming its role, so the method reads as a checklist
instead of a wall. No behavior change.
The word "surface" was being used loosely in five places to mean
different things (virtual API, public contract, UI controls, the extent
of a small interface). Replaced each with a term that names what it
actually is, so the prose stops papering over distinct ideas with a
single fuzzy word.
Contents:
- "and that is the entire surface" → "and nothing more" (MessageParser
  is-not list).
- "Stable surface" → "Stable for viewers" (canonical-object variant
  rationale).
- "no virtual override surface" → "no virtual methods to override"
  (SchemaHandler catalog rationale, two locations).
- "that surface lands" → "those controls land" (future PJ4 ingest-policy
  UI).
…tput
DataSources hand the host raw bytes via a fetcher; only MessageParsers
emit named scalar columns and canonical-object variants. Renaming the
section makes the source of each output explicit, and the example plugin
list now matches (parser_json, parser_protobuf, parser_data_tamer instead
of a mixed list that included data_load_csv and the streaming sources,
which do not emit scalars/canonical objects themselves).
…t handlers
Real-world ROS2 MCAP files fall into two camps: those generated by Python
tooling write CDR without post-string alignment padding, while files
recorded by standard ROS2 DDS tools (rosbag2, Foxglove) follow XCDR1 and
insert padding bytes so each uint32 field starts at a 4-byte boundary.
CdrSeqReader previously assumed no padding. For aligned files it silently
misread the format/encoding string and returned an error from
parseCompressedImage/parseImage, which left the media preview empty with
no visible diagnostic.
Add a base pointer to track the offset from the start of the CDR data and
apply a heuristic in u32(): if the current offset is not 4-byte aligned
and the next byte is 0x00, treat it as a padding byte and skip it. String
lengths in sensor_msgs are always < 256 and never 0 (a CDR string length
counts the trailing NUL), so their first LE byte is never 0x00, making the
heuristic reliable for all known sensor_msgs image types.
…l-object handlers
Replace the local CdrSeqReader with RosMsgParser::Deserializer (reused
from the scalar path). This:
- Drops the "skip-if-next-byte-is-zero" CDR padding heuristic from
  4da1b77; nanocdr::Decoder applies proper XCDR1 origin-relative alignment
  without inspecting payload bytes.
- Honours the CDR encapsulation header's endianness flag (the custom
  reader silently assumed little-endian).
- Gains ROS1 support: readHeader() already branches on isROS2() for the
  ROS1-only seq field, and ROS_Deserializer handles the unaligned ROS1
  wire format.
- Fixes parsePointCloud2 reading is_dense from the first byte of data[]
  instead of after it.
Zero-copy semantics preserved: deserializeByteSequence() returns a Span
over the payload and advances the cursor past it; payload.anchor is still
propagated into the canonical object.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
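Origin-relative alignment works without peeking: padding is computed from the cursor's offset relative to the first byte after the 4-byte encapsulation header, so the reader never has to inspect a byte to decide whether it is padding. The sketch below is a hypothetical reader written for illustration. It is not the `RosMsgParser::Deserializer` or `nanocdr::Decoder` API, and it models only XCDR1 `uint32` and `string` reads. ROS1's unaligned format is out of scope.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical minimal XCDR1 reader (not the nanocdr or RosMsgParser API).
// Alignment is measured from the origin, i.e. the first byte after the
// 4-byte encapsulation header, so padding is skipped without inspecting it.
class CdrReader {
 public:
  // The buffer must outlive the reader; only a reference is kept.
  explicit CdrReader(const std::vector<uint8_t>& buf) : buf_(buf) {
    if (buf_.size() < kHeaderSize) {
      throw std::runtime_error("missing CDR encapsulation header");
    }
    // Encapsulation id: 0x0000 = CDR big-endian, 0x0001 = CDR little-endian.
    little_endian_ = (buf_[1] & 0x01) != 0;
  }

  uint32_t u32() {
    align(4);
    need(4);
    const uint8_t* p = buf_.data() + pos_;
    pos_ += 4;
    if (little_endian_) {
      return uint32_t(p[0]) | (uint32_t(p[1]) << 8) | (uint32_t(p[2]) << 16) | (uint32_t(p[3]) << 24);
    }
    return uint32_t(p[3]) | (uint32_t(p[2]) << 8) | (uint32_t(p[1]) << 16) | (uint32_t(p[0]) << 24);
  }

  // CDR string: uint32 length that counts the trailing NUL, then the bytes.
  std::string str() {
    const uint32_t len = u32();
    need(len);
    const char* chars = reinterpret_cast<const char*>(buf_.data() + pos_);
    std::string s(chars, len > 0 ? len - 1 : 0);
    pos_ += len;
    return s;
  }

 private:
  static constexpr std::size_t kHeaderSize = 4;

  void align(std::size_t n) {
    const std::size_t offset_from_origin = pos_ - kHeaderSize;
    pos_ += (n - offset_from_origin % n) % n;
  }

  void need(std::size_t n) const {
    if (pos_ + n > buf_.size()) {
      throw std::runtime_error("truncated CDR buffer");
    }
  }

  const std::vector<uint8_t>& buf_;
  std::size_t pos_ = kHeaderSize;
  bool little_endian_ = true;
};
```

Take a 2-character string followed by a `uint32`. The string ends 7 bytes past the origin, so one pad byte is skipped. A non-zero pad byte (0xFF in a test buffer) is skipped just the same, which is exactly the case the zero-byte heuristic could not handle.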
No behavior change. Re-flows initializer-list entries and declaration wrapping per .clang-format (Google style, 120-col, InsertBraces: true). Output of running pre-commit's clang-format hook over the files touched in this branch.
Resolves conflicts from PR #84 (chore: apply clang-format) landing on
main. The only conflicting file was data_load_mcap/mcap_source.cpp; all
three hunks were lines where this PR introduces the canonical-object
pipeline:
- reader_keeper_.reset() vs reader.close() on summary-read failure. The PR
  keeps the keeper so the deferred fetcher cannot outlive a closed reader;
  resolved by taking the PR's version.
- early-exit on ensureParserBinding error vs the old branching shape. The
  PR's continue-on-error form is what the rest of the loop expects (the
  bindings.emplace below runs only on the success path).
- runtimeHost().pushMessage(binding, ts, fetcher) vs the older
  pushRawMessage(binding, ts, span). The PR's fetcher-closure path is the
  whole point of the canonical-object pipeline.
Each conflict resolved by taking the PR side (HEAD); no semantic loss.
bindSchema used to call the SDK base class with the *normalised* name
("sensor_msgs/CompressedImage", stripped of "/msg/") and register the
SchemaHandler under the same normalised key. The internal catalog lookup
needs the normalised name, but the SDK's classifySchema /
parseScalars / parseObject all look up by whatever name the host passes
through — which is the *original* "sensor_msgs/msg/CompressedImage".
The mismatch made classifySchema always return kNone for canonical
schemas in real-world ROS2 MCAPs, demoting them to scalar ingest and
collapsing the canonical-object pipeline.
Hand the SDK base the original type_name (so bound_type_name_ matches
the host's lookup key) and register the SchemaHandler under it too.
The internal `msg_type` (with "/msg/" stripped) is still used to look
up the catalog entry inside this function — that's purely an internal
concern about how the catalog keys are spelled, kept out of the
host-facing surface.
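Here is a minimal sketch of the two-name split, assuming the catalog key is obtained by stripping the `/msg/` infix. `catalogKeyFor`, `BoundNames`, and `bindNames` are hypothetical helpers, not functions from parser_ros.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: derive the internal catalog key by dropping the ROS 2
// "/msg/" infix. Names without it (e.g. ROS 1 "sensor_msgs/Image") pass through.
std::string catalogKeyFor(const std::string& type_name) {
  const std::string infix = "/msg/";
  const std::size_t pos = type_name.find(infix);
  if (pos == std::string::npos) {
    return type_name;
  }
  return type_name.substr(0, pos) + "/" + type_name.substr(pos + infix.size());
}

// Hypothetical pair of names kept by bindSchema: only catalog_key is internal.
struct BoundNames {
  std::string host_key;     // passed to the SDK base and used to register the SchemaHandler
  std::string catalog_key;  // used only for the catalog lookup
};

BoundNames bindNames(const std::string& type_name) {
  return BoundNames{type_name, catalogKeyFor(type_name)};
}
```

The host keeps seeing `sensor_msgs/msg/CompressedImage`, so `classifySchema` and friends hit. The normalised spelling never leaves the parser.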
…bjects' into feat/parser-ros-canonical-objects
Adopt the renamed and reorganized SDK from plotjuggler_core:
- canonical-object → builtin-object (types, includes, identifiers).
- Switch to pj_plugin_sdk INTERFACE library; parser SDK headers move
from pj_base/sdk/ to pj_plugins/sdk/.
- parser_ros emits unified sdk::Image with encoding string ("rgb8",
"jpeg", "compressedDepth", …) instead of the dropped sdk::Image +
sdk::CompressedImage split.
- Reduce the "fetcher" shorthand in favour of FetchMessageData /
fetch_message_data (concept name + identifier).
- Add magic_enum/0.9.7 to conanfile (transitively required by the
scene-protocol vocabulary helpers).
No behaviour change.
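With the unified `sdk::Image`, a parser has to turn `sensor_msgs/CompressedImage::format` into one of the encoding strings. The following is a hypothetical sketch: `encodingForCompressedFormat` and `bytesPerPixel` are illustrative names. The format spellings it matches (`"bgr8; jpeg compressed bgr8"`, `"16UC1; compressedDepth png"`, bare `"jpeg"`/`"png"`) are assumed image_transport-style conventions, not rules taken from parser_ros. The raw byte sizes follow the standard sensor_msgs image encodings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical: map a CompressedImage::format string to a unified encoding.
// Returns an empty string for codecs this sketch does not recognize.
std::string encodingForCompressedFormat(const std::string& format) {
  // Test compressedDepth first: its payload is PNG-based, so "png" may appear too.
  if (format.find("compressedDepth") != std::string::npos) {
    return "compressedDepth";
  }
  if (format.find("jpeg") != std::string::npos || format.find("jpg") != std::string::npos) {
    return "jpeg";
  }
  if (format.find("png") != std::string::npos) {
    return "png";
  }
  return "";
}

// Bytes per pixel for the raw encodings listed in this PR; 0 means "not a raw
// layout" (compressed or unknown), so row_step checks do not apply.
std::size_t bytesPerPixel(const std::string& encoding) {
  if (encoding == "mono8") {
    return 1;
  }
  if (encoding == "mono16" || encoding == "16UC1") {
    return 2;
  }
  if (encoding == "rgb8" || encoding == "bgr8") {
    return 3;
  }
  if (encoding == "rgba8" || encoding == "bgra8") {
    return 4;
  }
  return 0;
}
```

Under these assumptions, a raw image is self-consistent when `row_step >= width * bytesPerPixel(encoding)`. A zero result tells the consumer to hand the bytes to a decoder instead.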
Bring the guide up to the post-restructure shape of the SDK:
- canonical-object → builtin-object across naming, headers, and
conceptual prose.
- `BuiltinObject` is `std::any` (defined in
`pj_scene_protocol/builtin/BuiltinObject.h`); consumers recover the
concrete type with `std::any_cast<sdk::Image>(&obj)`.
- Builtin type catalog updated: `Image` unified with `std::string
encoding` ("rgb8" / "jpeg" / "compressedDepth" / …, replacing
the old `PixelFormat` enum and the separate `CompressedImage`
type), `DepthImage` (camera intrinsics: K + D + distortion_model),
`PointCloud`, `ImageAnnotations` (first-class, with its own
sub-types).
- Reduce "fetcher" vocabulary: refer to the deferred byte-producing
callable as `FetchMessageData` (concept) / `fetchMessageData` (C
ABI field) consistently.
- End-to-end dispatch ASCII updated to match.
- parser_ros reference now describes the unified `sdk::Image` flow.
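The following sketch shows the consumer side of `BuiltinObject` as `std::any`. The `sdk::Image` and `sdk::PointCloud` structs here are hypothetical, trimmed-down stand-ins; only the `std::any_cast<T>(&obj)` pattern is taken from the text.

```cpp
#include <any>
#include <cassert>
#include <cstdint>
#include <string>

namespace sdk {
// Hypothetical, trimmed-down stand-ins for the builtin types; the real structs
// in pj_scene_protocol/builtin/ carry spans, anchors and more metadata.
struct Image {
  uint32_t width = 0;
  uint32_t height = 0;
  std::string encoding;  // "rgb8", "jpeg", "compressedDepth", ...
};

struct PointCloud {
  uint32_t point_count = 0;
};
}  // namespace sdk

using BuiltinObject = std::any;

// Probe with the pointer overload of std::any_cast: it returns nullptr on a
// type mismatch instead of throwing std::bad_any_cast.
std::string describe(const BuiltinObject& obj) {
  if (const auto* image = std::any_cast<sdk::Image>(&obj)) {
    return "image " + image->encoding;
  }
  if (const auto* cloud = std::any_cast<sdk::PointCloud>(&obj)) {
    return "cloud " + std::to_string(cloud->point_count);
  }
  return "unknown";
}
```

The pointer form suits probing an object of unknown kind. Note that `std::any_cast` matches the exact stored type only, so a consumer must ask for `sdk::Image` itself, not a base or a similar struct.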
…ting DataSource vs MessageParser)
Restructure the guide around the three plugin shapes that actually
exist in this collection, recovered from earlier prose:
- Shape A — Self-parsing DataSource (kCapabilityDirectIngest):
CSV / ULog / Parquet. Owns its decoder; writes via
writeHost().appendRecord(...). No parser binding.
- Shape B — Delegating DataSource (kCapabilityDelegatedIngest):
MCAP / streams. Two ingest calls:
File → pushMessage(handle, ts, fetchMessageData) (lazy callable)
Stream → pushRawMessage(handle, ts, bytes) (eager span)
Streams cannot offer a deferred callable; only kEager applies.
- Shape C — MessageParser: no I/O, decodes bytes by schema.
Add a worked example showing one parser (parser_protobuf) bound by
three sources (mcap / zmq / mqtt) through a single encoding-name key,
to make the 'any source × any parser' decoupling claim concrete.
Refresh the authoring checklist with one section per shape.
All content uses the post-restructure SDK shape (BuiltinObject as
std::any, unified sdk::Image with encoding string, DepthImage and
ImageAnnotations as first-class builtins, PJ_message_data_fetcher_t /
FetchMessageData vocabulary).
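The "any source × any parser" claim can be illustrated with a hypothetical registry keyed by encoding name. `MessageParser`, `ProtobufParser`, and `ParserRegistry` are illustrative names, not the SDK's types. The key `"protobuf"` matches MCAP's well-known message-encoding name.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical parser interface: it only decodes bytes and never does I/O.
struct MessageParser {
  virtual ~MessageParser() = default;
  virtual std::string name() const = 0;
};

struct ProtobufParser : MessageParser {
  std::string name() const override { return "parser_protobuf"; }
};

// Sources only report a channel's encoding name; the host resolves it here,
// so an MCAP file, a ZMQ stream and an MQTT stream all reach the same parser.
class ParserRegistry {
 public:
  using Factory = std::function<std::unique_ptr<MessageParser>()>;

  void add(const std::string& encoding, Factory factory) { factories_[encoding] = std::move(factory); }

  std::unique_ptr<MessageParser> bind(const std::string& encoding) const {
    auto it = factories_.find(encoding);
    if (it == factories_.end()) {
      return nullptr;  // no parser registered for this encoding
    }
    return it->second();  // a fresh parser instance per binding
  }

 private:
  std::map<std::string, Factory> factories_;
};
```

Each binding gets its own parser instance, so per-schema state bound by one source cannot leak into another.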
Summary
Rework of `parser_ros` and `mcap_source` around the builtin-object pipeline introduced in the companion SDK PR. Builds on the SDK's `SchemaHandler` table + `BufferAnchor` + `push_message_v2` contract; consumes the builtin types defined in `pj_scene_protocol/builtin/`.
Posted as a draft for review. Compiles cleanly with the SDK PR.
parser_ros
Schema dispatch (one place, one source of truth)
- Class-level catalog (`catalog()` returns a `static unordered_map`) keyed by ROS type name. Each entry holds member-function pointers — no `this` capture at class scope. Adding a schema = one entry in the table.
- `bindSchema` looks the bound type up and registers a single `SchemaHandler` tailored to it on this instance (builtin-object handler, scalar handler, or the generic flatten as fallback).
- Catalog entries also cover the builtin-object schemas (`sensor_msgs/Image`, `sensor_msgs/CompressedImage`, `sensor_msgs/PointCloud2`).
Builtin-object handlers (zero-copy)
- `parseImage` / `parseCompressedImage`: both produce the unified `sdk::Image` introduced in the SDK rework. ROS encodings (`rgb8`/`rgba8`/`bgr8`/`bgra8`/`mono8`/`mono16`/`16UC1`) flow into `sdk::Image::encoding` as their string name; the JPEG/PNG/`compressedDepth` codecs become `"jpeg"`/`"png"`/`"compressedDepth"`. There is no longer a separate `CompressedImage` builtin — the `encoding` string distinguishes raw from compressed. `compressedDepth` carries `compressed_depth_min`/`compressed_depth_max` on the same struct. The pixel/bytes `Span` sits over the payload buffer shared via the `BufferAnchor`.
- `parsePointCloud`: walks `PointField[]` metadata and emits an `sdk::PointCloud` whose `data` Span sits over the payload bytes.
Generic + specialized scalar paths
- `parseScalarsGeneric`: rosx_introspection's `flattenGeneric` harvested into a `vector<NamedFieldValue>`. Used as the default-handler scalar route in `bindSchema` for any schema not in the specific table.
- `parseScalarsDiscardingLargeArrays`: same flow with the array policy temporarily forced to `DISCARD_LARGE_ARRAYS`. Used by the builtin-object schemas so their bulk byte payload doesn't show up as scalar columns.
- `wrapVoidHandler<Handler>` template member function: adapts the existing imperative `handle*()` void methods to the `parse_scalars` callable signature. Each instantiation is a member-fn-ptr with the right shape, slotting directly into the catalog.
mcap_source
- `pushMessage` with a deferred `FetchMessageData` callable per message: the closure captures the open `McapReader` (kept alive by a `shared_ptr` member of the source) and the message coordinates. The host decides via `ObjectIngestPolicy` when (and whether) to invoke it.
- `readMessageBytesAt` returns `sdk::PayloadView` — a `Span<const uint8_t>` over a heap-held `shared_ptr<vector<uint8_t>>` that serves as the `BufferAnchor`. The chunk-iterator copy is unchanged for now; seek+read direct with an LRU of decompressed chunks is the next optimization step.
- Reader keeper: `shared_ptr<McapReader>` stored as a member so the `FetchMessageData` closures pushed via `pushMessage` stay valid past `importData()`. Closed automatically when the source is destroyed.
Build
- `magic_enum/0.9.7` added to `conanfile.txt` for the SDK vocabulary helpers (`BuiltinObjectKind`, `CommonImageEncoding`).
Test plan
- `FetchMessageData` invocations under each policy mode — pending.
Companion work
- PlotJuggler/plotjuggler_core#86 (SDK builtin-object pipeline: `SchemaHandler` table, `BufferAnchor`, `push_message_v2` ABI, `ObjectIngestPolicyResolver`, type-erased `BuiltinObject` via `std::any`, `PJ_message_data_fetcher_t`).
- `PLUGIN_DEVELOPMENT.md` in this repo + `pj_plugins/docs/data-source-guide.md` + `pj_plugins/docs/message-parser-guide.md` in the core PR.