
AprilTag marker 3D detector #2107

Open
bogwi wants to merge 50 commits into main from feat/2036-with-2037

Conversation

Collaborator

@bogwi bogwi commented May 16, 2026

Supersedes #2098 so self-hosted CI can run. Prior review discussion is preserved there.


This ships the AprilTag marker 3D detector per #2036.

It comes in two parts:

I) A camera calibration tool: a dedicated utility to calibrate a camera and produce camera_info.yaml, located at dimos/dimos/utils/cli/cameracalibrate

uv run pytest dimos/utils/cli/cameracalibrate

How to calibrate is explained in dimos/docs/usage/camera_calibration.md
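
For orientation, the calibration such a tool performs boils down to the standard OpenCV chessboard flow. A minimal sketch, not the cameracalibrate code itself; pattern size, square size, input folder, and YAML key names are assumptions:

import glob

import cv2
import numpy as np
import yaml

PATTERN = (9, 6)    # inner chessboard corners (assumed)
SQUARE_M = 0.024    # printed square edge length in meters (assumed)

# One set of 3D board points, reused for every frame where the board is found.
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_M

obj_pts, img_pts, size = [], [], None
for path in glob.glob("calib_frames/*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)
        size = gray.shape[::-1]  # (width, height)

rms, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)

# ROS-style CameraInfo fields; key names assumed to match the fixture format.
info = {
    "image_width": size[0],
    "image_height": size[1],
    "camera_matrix": {"rows": 3, "cols": 3, "data": K.ravel().tolist()},
    "distortion_coefficients": {"rows": 1, "cols": dist.size, "data": dist.ravel().tolist()},
}
with open("camera_info.yaml", "w") as f:
    yaml.safe_dump(info, f)
print(f"RMS reprojection error: {rms:.3f} px")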


II) The detector module itself: dimos/dimos/perception/fiducial
The module has been tested in real life, detecting

  • individual tags
  • a group of twelve tags

all at distances up to 4 m, with various yaw, pitch, roll, and slant changes.

Streaming into Rerun has been verified:
APRILTAG_36h11_12_A4_rerun_screenshot_2026-05-16 at 7 06 52

Partial detection also works; note the correctly detected tags at the bottom:

APRILTAG_36h11_12_A4_rerun_screenshot_2026-05-16 at 7 17 08

If you have obtained a camera_info.yaml for your camera from the calibration step, you can substitute it at dimos/dimos/perception/fiducial/blueprints/fixtures/camera_info.yaml and reproduce the testing steps as follows.

Manual sequence (two terminals from dimos repo, uv run dimos):

  1. term1: uv run dimos stop then uv run dimos run desk-marker-tf --daemon — note the printed Log: path.
  2. term2: uv run dimos rerun-bridge — default opens a native viewer (--rerun-open web for browser, none if headless). Waits until Ctrl+C.
  3. In the Rerun timeline / 3D view, expand entities under world/tf/ and confirm base_link, camera_optical, marker_tf/markers, marker_tf/marker_<id> when the printed tag is in view (markers appear in bursts matching detection).
  4. End: Ctrl+C on the bridge, uv run dimos stop.

The Python tests cover:

  • fixture PNGs (captured from Rerun) load;
  • OpenCV detects expected AprilTag IDs;
  • detected corners match the generated 12-tag PDF layout via homography (see the sketch below);
  • swapped IDs/layout mismatches fail;
  • MarkerTfModule publishes expected marker frames with finite transforms;
  • AprilTag PDF generator layout remains stable;
  • existing MarkerTf unit behavior still passes.

uv run pytest dimos/perception/fiducial
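
The corners-vs-layout homography check is, in essence, the following; a sketch with assumed array shapes and an assumed error threshold, not the repo's fixture_verification.py:

import cv2
import numpy as np


def layout_matches_detection(layout_corners_mm: np.ndarray,
                             detected_corners_px: np.ndarray,
                             max_p95_err_px: float = 3.0) -> bool:
    """Fit a plane-to-image homography and gate on the p95 reprojection error.

    layout_corners_mm: (N, 2) tag-corner positions from the generated PDF layout.
    detected_corners_px: (N, 2) matching corners reported by the detector.
    """
    src = layout_corners_mm.astype(np.float32)
    dst = detected_corners_px.astype(np.float32)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)
    if H is None:
        return False
    projected = cv2.perspectiveTransform(src.reshape(-1, 1, 2), H).reshape(-1, 2)
    err = np.linalg.norm(projected - dst, axis=1)
    return float(np.percentile(err, 95)) <= max_p95_err_px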

leshy and others added 30 commits May 9, 2026 22:01
Switch from mypy-ignore to types-reportlab>=4.5.0 (matches reportlab 4.5
in deps), matching the project's pattern for the other ~15 types-* packages.
The stubs immediately caught a real bug — Canvas.setKeywords expects str |
None, not list[str].
Add a top-level `pytest.importorskip("cv2.aruco")` where not already present, so CI runners without the extra skip the tests instead of erroring.
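
For context, that guard is the stock pytest idiom: placed at module scope it turns a missing optional dependency into a skip rather than a collection error. An illustrative test module header, not code from this PR:

import pytest

# Skip this whole test module (instead of erroring) when OpenCV's aruco extra is absent.
cv2 = pytest.importorskip("cv2")
pytest.importorskip("cv2.aruco")


def test_apriltag_dictionary_loads() -> None:
    # Minimal smoke check that the 36h11 dictionary is available.
    assert cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_APRILTAG_36h11) is not None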
Contributor

greptile-apps Bot commented May 16, 2026

Greptile Summary

This PR ships an AprilTag 3D marker detection pipeline (Part II of #2036), including a camera calibration CLI tool and a MarkerTfModule that subscribes to color images + camera intrinsics, runs OpenCV ArucoDetector + solvePnP, and publishes per-marker TF transforms into the world frame. A DeskStaticTfModule republishes the fixed camera-to-base TF chain so TF lookups stay within the tolerance window.

  • marker_tf_module.py: Core perception module — detects AprilTag 36h11 markers per frame via ArucoDetector, estimates 3D pose with SOLVEPNP_IPPE_SQUARE, chains TF through world → base_link → camera_optical → marker, and publishes all marker frames in one batch (see the sketch after this list).
  • cameracalibrate.py: Full camera-calibration workflow supporting webcam interactive capture or offline folder mode, writing ROS-style CameraInfo YAML.
  • fixture_verification.py: Board-image quality verifier — reprojects PDF layout via homography, classifies coverage, and gates on p95 reprojection error.
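
For readers unfamiliar with the OpenCV calls involved, the per-frame detect + pose step amounts to roughly the following. A standalone sketch; the function name, inputs, and return shape are assumptions rather than the module's API (requires OpenCV >= 4.7 for ArucoDetector):

import cv2
import numpy as np


def detect_marker_poses(gray: np.ndarray, K: np.ndarray, dist: np.ndarray,
                        marker_length_m: float) -> dict:
    """Detect 36h11 tags and return {id: (rvec, tvec)} in the camera optical frame."""
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_APRILTAG_36h11)
    detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())
    corners, ids, _rejected = detector.detectMarkers(gray)
    if ids is None:
        return {}

    # Square marker model centered at the origin, in the corner order
    # SOLVEPNP_IPPE_SQUARE expects (matches detectMarkers' corner order).
    h = marker_length_m / 2.0
    obj = np.array([[-h, h, 0], [h, h, 0], [h, -h, 0], [-h, -h, 0]], dtype=np.float32)

    poses = {}
    for tag_corners, tag_id in zip(corners, ids.ravel()):
        ok, rvec, tvec = cv2.solvePnP(
            obj, tag_corners.reshape(-1, 2).astype(np.float32), K, dist,
            flags=cv2.SOLVEPNP_IPPE_SQUARE,
        )
        if ok:
            poses[int(tag_id)] = (rvec, tvec)
    return poses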

Confidence Score: 5/5

Safe to merge; the one finding is a defensive threading hygiene fix that only matters if stop() join times out, which is unlikely given the lightweight work done in publish_static_chain().

The detection pipeline, TF chaining, calibration workflow, and fixture verification are all well-implemented and covered by a thorough test suite including a live LCM integration test with a NumPy SE(3) oracle. The one concrete issue is the republish thread closure reading self._republish_stop via the instance on every iteration — if stop() join ever times out, the subsequent nullification of the attribute could cause an AttributeError in the daemon thread. In normal operation this path is never reached.

dimos/perception/fiducial/blueprints/desk_marker_tf.py — the _republish_loop closure
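
The fix the bot is hinting at is usually just a matter of having the loop close over a local Event instead of re-reading the instance attribute each iteration. A hypothetical sketch; apart from _republish_stop, _republish_loop, and publish_static_chain, the names here are invented:

import threading


class RepublishMixin:
    """Sketch: the loop holds its own reference to the stop event, so stop()
    nulling out self._republish_stop can never raise in the daemon thread."""

    def start_republish(self, publish, period_s: float = 0.1) -> None:
        stop_event = threading.Event()
        self._republish_stop = stop_event

        def _republish_loop() -> None:
            # Reads the locally captured event, not self._republish_stop.
            while not stop_event.is_set():
                publish()                  # e.g. publish_static_chain()
                stop_event.wait(period_s)  # ~10 Hz; returns early on stop()

        self._republish_thread = threading.Thread(target=_republish_loop, daemon=True)
        self._republish_thread.start()

    def stop_republish(self) -> None:
        self._republish_stop.set()
        self._republish_thread.join(timeout=1.0)
        self._republish_stop = None   # safe even if join timed out
        self._republish_thread = None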

Important Files Changed

  • dimos/perception/fiducial/marker_tf_module.py: Core AprilTag detection module: subscribes to image + camera_info, runs ArucoDetector + solvePnP (IPPE_SQUARE), chains TF through world→base→optical→marker. Logic is sound.
  • dimos/perception/fiducial/blueprints/desk_marker_tf.py: Blueprint wiring DeskStaticTfModule + CameraModule + MarkerTfModule. The republish thread has a minor closure safety issue (nullified event reference after stop()).
  • dimos/utils/cli/cameracalibrate/cameracalibrate.py: Full camera calibration pipeline: SB/classic chessboard fallback, multi-candidate pattern detection, ROS CameraInfo YAML output. Both RuntimeError and ValueError are caught at CLI boundaries.
  • dimos/perception/fiducial/fixture_verification.py: Board-image quality verifier using homography reprojection. Depends on the private _grid_layout symbol (noted in prior threads).
  • dimos/robot/cli/topic.py: Refactors on_msg into _decode_typed_lcm_message backed by resolve_msg_type; adds a round-trip test. importlib is still used by _resolve_type, so no regression.
  • dimos/perception/fiducial/test_marker_tf_integration.py: End-to-end LCM integration test with a NumPy SE(3) oracle derived from OpenCV solvePnP. Tolerances are appropriate for synthetic detection.

Sequence Diagram

sequenceDiagram
    participant WC as Webcam
    participant CM as CameraModule
    participant MTF as MarkerTfModule
    participant DST as DeskStaticTfModule
    participant TF as TF Bus

    DST->>TF: "Publish world->base_link (identity)"
    DST->>TF: "Publish base_link->camera_optical (fixed offset)"
    Note over DST,TF: Republish at 10 Hz

    WC->>CM: BGR frame + CameraInfo
    CM->>MTF: color_image
    CM->>MTF: camera_info

    MTF->>TF: "Lookup world->base_link at image.ts"
    TF-->>MTF: T_world_base
    MTF->>TF: "Lookup base_link->camera_optical at image.ts"
    TF-->>MTF: T_base_optical

    MTF->>MTF: ArucoDetector.detectMarkers(gray)
    loop For each detected marker
        MTF->>MTF: solvePnP(IPPE_SQUARE)
        MTF->>MTF: "T_world_marker = T_wb x T_bo x T_om"
    end

    MTF->>TF: "Publish world->markers_parent"
    MTF->>TF: "Publish markers_parent->marker_N"

Reviews (2): Last reviewed commit: "Merge upstream/main into feat/2036-with-..."

Comment thread dimos/robot/cli/dimos.py
Comment thread dimos/utils/cli/cameracalibrate/cameracalibrate.py
Comment thread dimos/perception/fiducial/fixture_verification.py
Comment thread dimos/perception/fiducial/fixture_verification.py
@bogwi bogwi mentioned this pull request May 16, 2026

class DeskStaticTfModuleConfig(ModuleConfig):
    world_frame: str = "world"
    base_frame: str = "base_link"
Contributor

why does the marker module care about world, base_link etc? Those frames are emitted by other modules; it should only care about camera_optical -> its own detections.

Collaborator Author

@bogwi bogwi May 16, 2026

Strictly speaking it is not required by the math of marker detection. OpenCV only gives camera optical <- marker.

My point was: publish marker transforms so that everything downstream can treat markers like any other object in the robot’s world frame.

a) Without folding in world -> base -> optical, you would only have optical <- marker.
Then:

A marker sitting on a desk would jump in world whenever the robot moves because optical moves with the robot.
Nav, planning, maps, and multi-module stacks that already reason in world / map and base_link would each have to repeat the same chain, as I understand it: look up base and camera, compose with the marker, and stay in sync on timestamps.

b) With it, the module publishes markers -> marker_id (and the world -> markers identity) so that marker poses are stable in world (given a world -> base and base -> optical) and consumers do not need camera TF or PnP details.
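
The chaining described here is plain matrix composition. A minimal NumPy sketch; the module itself composes dimos Transform objects rather than raw 4x4 matrices, so the names below are illustrative:

import cv2
import numpy as np


def to_homogeneous(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Pack a 3x3 rotation and a translation into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def world_from_marker(T_world_base: np.ndarray, T_base_optical: np.ndarray,
                      rvec: np.ndarray, tvec: np.ndarray) -> np.ndarray:
    """T_world_marker = T_world_base @ T_base_optical @ T_optical_marker,
    where the last factor comes straight from solvePnP's rvec/tvec."""
    R_om, _ = cv2.Rodrigues(rvec)
    return T_world_base @ T_base_optical @ to_homogeneous(R_om, tvec.ravel())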

Contributor


oh I understand, true yes. What you could do is ask a tf module for

world -> optical_frame transform.
then because you know optical_frame -> marker_1 you can publish

world -> markers -> marker_1

do we need base_link?

Collaborator Author

@bogwi bogwi May 16, 2026

We do not need base_link as a mathematical step.

I wanted it to do two lookups, to separate failure lanes:

a) world -> base fails -- you blame localization | odom | world naming.
b) base -> optical fails -- you blame the camera stuff

dimos/dimos/perception/fiducial/marker_tf_module.py, lines 251 and 266:

class MarkerTfModuleConfig(ModuleConfig):
    """Configuration for :class:`MarkerTfModule`.

    ``marker_length_m`` is the physical edge length of the printed square marker
    in meters (required; no default).
    """

    world_frame: str = "world"
    base_frame: str = "base_link"
    ...

we have:

if t_world_base is None:
    logger.debug(
        "MarkerTfModule: no TF %s -> %s at ts=%s",
        self.config.world_frame,
        self.config.base_frame,
        image.ts,
    )
    return

and

if t_base_optical is None:
    logger.debug(
        "MarkerTfModule: no TF %s -> %s at ts=%s",
        self.config.base_frame,
        optical,
        image.ts,
    )
    return

and then at the bottom of def _process_color_image(self, image: Image) -> None:

    t_base_marker = t_base_optical + t_optical_marker
    t_world_marker = t_world_base + t_base_marker
    out.append(
        Transform(
            translation=t_world_marker.translation,
            rotation=t_world_marker.rotation,
            frame_id=markers_parent,
            child_frame_id=self._marker_child_frame(mid),
            ts=ts,
        )
    )

In the desk demo we do not publish one world -> camera_optical transform; we publish two edges, world -> base_link and base_link -> camera_optical: localization / world attachment on one side, and the robot + camera mount on the other.

class DeskStaticTfModuleConfig(ModuleConfig):
    world_frame: str = "world"
    base_frame: str = "base_link"
    ...
def publish_static_chain(self) -> None:
    ts = time.time()
    self._last_publish_ts = ts
    roll, pitch, yaw = self.config.camera_rotation_rpy_rad
    x, y, z = self.config.camera_translation_m

    self.tf.publish(
        Transform(
            translation=Vector3(0.0, 0.0, 0.0),
            rotation=Quaternion(0.0, 0.0, 0.0, 1.0),
            # edge 1
            frame_id=self.config.world_frame,
            child_frame_id=self.config.base_frame,
            ts=ts,
        ),
        Transform(
            # Default desk camera pose: about 25 cm forward and 15 cm above base_link.
            translation=Vector3(x, y, z),
            rotation=Quaternion.from_euler(Vector3(roll, pitch, yaw)),
            # edge 2
            frame_id=self.config.base_frame,
            child_frame_id=self.config.camera_optical_frame,
            ts=ts,
        ),
    )

base_link here is the parent of the camera in the TF tree, not extra stuff the marker module invented. That seemed logical.


class DeskStaticTfModuleConfig(ModuleConfig):
Contributor


if I want to plug this into, for example, go2
https://github.com/dimensionalOS/dimos/blob/main/dimos/robot/unitree/go2/blueprints/smart/unitree_go2.py#L34

how should I add this module? How do I tell it what the CameraInfo is for that robot?

Collaborator Author

@bogwi bogwi May 16, 2026

unitree_go2 = autoconnect(
    unitree_go2_basic,
    VoxelGridMapper.blueprint(),
    CostMapper.blueprint(),
    ReplanningAStarPlanner.blueprint(),
    WavefrontFrontierExplorer.blueprint(),
    PatrollingModule.blueprint(),
    MovementManager.blueprint(),
).global_config(n_workers=10, robot_model="unitree_go2")

You add the fiducial module the same way:

from dimos.perception.fiducial.marker_tf_module import MarkerTfModule

unitree_go2 = autoconnect(
    unitree_go2_basic,
    MarkerTfModule.blueprint(
        marker_length_m=...,  # physical edge length of the printed tag, meters
        # optional: aruco_dictionary, marker_namespace_prefix, world_frame, base_frame, max_freq, ...
    ),
    VoxelGridMapper.blueprint(),
    # ... rest unchanged
).global_config(n_workers=10, robot_model="unitree_go2")

You do not pass CameraInfo into MarkerTfModule config. That module has In[CameraInfo] (and In[Image]) and uses whatever stream is connected.

Contributor

@leshy leshy May 16, 2026

tested, it's all good! we considered using CameraInfo as just a static cli argument vs a topic, but now with autoconnect the topic is more convenient

@bogwi bogwi changed the title from Feat/2036 with 2037 to AprilTag marker 3D detector, May 16, 2026