
AprilTag marker 3D detector #2107

Open
bogwi wants to merge 50 commits into main from feat/2036-with-2037

Conversation

Collaborator

@bogwi bogwi commented May 16, 2026

Supersedes #2098 so self-hosted CI can run. Prior review discussion is preserved there.


This ships the AprilTag marker 3D detector per #2036.

It comes in two parts:

I) A camera calibration tool: a dedicated utility to calibrate a camera and produce camera_info.yaml, located at dimos/dimos/utils/cli/cameracalibrate

uv run pytest dimos/utils/cli/cameracalibrate

How to calibrate is explained in dimos/docs/usage/camera_calibration.md
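
For orientation, the calibration such a tool performs boils down to the standard OpenCV chessboard flow. A minimal sketch, not the cameracalibrate code itself; pattern size, square size, input folder, and YAML key names are assumptions:

import glob

import cv2
import numpy as np
import yaml

PATTERN = (9, 6)    # inner chessboard corners (assumed)
SQUARE_M = 0.024    # printed square edge length in meters (assumed)

# One set of 3D board points, reused for every frame where the board is found.
objp = np.zeros((PATTERN[0] * PATTERN[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:PATTERN[0], 0:PATTERN[1]].T.reshape(-1, 2) * SQUARE_M

obj_pts, img_pts, size = [], [], None
for path in glob.glob("calib_frames/*.png"):
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, PATTERN)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)
        size = gray.shape[::-1]  # (width, height)

rms, K, dist, _, _ = cv2.calibrateCamera(obj_pts, img_pts, size, None, None)

# ROS-style CameraInfo fields; key names assumed to match the fixture format.
info = {
    "image_width": size[0],
    "image_height": size[1],
    "camera_matrix": {"rows": 3, "cols": 3, "data": K.ravel().tolist()},
    "distortion_coefficients": {"rows": 1, "cols": dist.size, "data": dist.ravel().tolist()},
}
with open("camera_info.yaml", "w") as f:
    yaml.safe_dump(info, f)
print(f"RMS reprojection error: {rms:.3f} px")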


II) The detector module itself: dimos/dimos/perception/fiducial
The module has been tested in real life, detecting

  • individual tags
  • a group of twelve tags

all at distances up to 4 m, with various yaw, pitch, roll, and slant changes.

Streaming into Rerun has been verified:
APRILTAG_36h11_12_A4_rerun_screenshot_2026-05-16 at 7 06 52

Partial detection also works; note the correctly detected tags at the bottom:

APRILTAG_36h11_12_A4_rerun_screenshot_2026-05-16 at 7 17 08

If you have obtained a camera_info.yaml for your camera from the calibration step, you can substitute it at dimos/dimos/perception/fiducial/blueprints/fixtures/camera_info.yaml and reproduce the testing steps as follows.

Manual sequence (two terminals from dimos repo, uv run dimos):

  1. term1: uv run dimos stop then uv run dimos run desk-marker-tf --daemon — note the printed Log: path.
  2. term2: uv run dimos rerun-bridge — default opens a native viewer (--rerun-open web for browser, none if headless). Waits until Ctrl+C.
  3. In the Rerun timeline / 3D view, expand entities under world/tf/ and confirm base_link, camera_optical, marker_tf/markers, marker_tf/marker_<id> when the printed tag is in view (markers appear in bursts matching detection).
  4. End: Ctrl+C on the bridge, uv run dimos stop.

The Python tests cover:

  • fixture PNGs (captured from Rerun) load;
  • OpenCV detects expected AprilTag IDs;
  • detected corners match the generated 12-tag PDF layout via homography (see the sketch below);
  • swapped IDs/layout mismatches fail;
  • MarkerTfModule publishes expected marker frames with finite transforms;
  • AprilTag PDF generator layout remains stable;
  • existing MarkerTf unit behavior still passes.

uv run pytest dimos/perception/fiducial
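
The corners-vs-layout homography check is, in essence, the following; a sketch with assumed array shapes and an assumed error threshold, not the repo's fixture_verification.py:

import cv2
import numpy as np


def layout_matches_detection(layout_corners_mm: np.ndarray,
                             detected_corners_px: np.ndarray,
                             max_p95_err_px: float = 3.0) -> bool:
    """Fit a plane-to-image homography and gate on the p95 reprojection error.

    layout_corners_mm: (N, 2) tag-corner positions from the generated PDF layout.
    detected_corners_px: (N, 2) matching corners reported by the detector.
    """
    src = layout_corners_mm.astype(np.float32)
    dst = detected_corners_px.astype(np.float32)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC)
    if H is None:
        return False
    projected = cv2.perspectiveTransform(src.reshape(-1, 1, 2), H).reshape(-1, 2)
    err = np.linalg.norm(projected - dst, axis=1)
    return float(np.percentile(err, 95)) <= max_p95_err_px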

leshy and others added 30 commits May 9, 2026 22:01
Switch from mypy-ignore to types-reportlab>=4.5.0 (matches reportlab 4.5
in deps), matching the project's pattern for the other ~15 types-* packages.
The stubs immediately caught a real bug — Canvas.setKeywords expects str |
None, not list[str].
Add a top-level `pytest.importorskip("cv2.aruco")` where not already present, so CI runners without the extra skip the tests instead of erroring.
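
For context, that guard is the stock pytest idiom: placed at module scope it turns a missing optional dependency into a skip rather than a collection error. An illustrative test module header, not code from this PR:

import pytest

# Skip this whole test module (instead of erroring) when OpenCV's aruco extra is absent.
cv2 = pytest.importorskip("cv2")
pytest.importorskip("cv2.aruco")


def test_apriltag_dictionary_loads() -> None:
    # Minimal smoke check that the 36h11 dictionary is available.
    assert cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_APRILTAG_36h11) is not None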
Contributor

greptile-apps Bot commented May 16, 2026

Greptile Summary

This PR ships an AprilTag 3D marker detection pipeline (Part II of #2036), including a camera calibration CLI tool and a MarkerTfModule that subscribes to color images + camera intrinsics, runs OpenCV ArucoDetector + solvePnP, and publishes per-marker TF transforms into the world frame. A DeskStaticTfModule republishes the fixed camera-to-base TF chain so TF lookups stay within the tolerance window.

  • marker_tf_module.py: Core perception module — detects AprilTag 36h11 markers per frame via ArucoDetector, estimates 3D pose with SOLVEPNP_IPPE_SQUARE, chains TF through world → base_link → camera_optical → marker, and publishes all marker frames in one batch (see the sketch after this list).
  • cameracalibrate.py: Full camera-calibration workflow supporting webcam interactive capture or offline folder mode, writing ROS-style CameraInfo YAML.
  • fixture_verification.py: Board-image quality verifier — reprojects PDF layout via homography, classifies coverage, and gates on p95 reprojection error.
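
For readers unfamiliar with the OpenCV calls involved, the per-frame detect + pose step amounts to roughly the following. A standalone sketch; the function name, inputs, and return shape are assumptions rather than the module's API (requires OpenCV >= 4.7 for ArucoDetector):

import cv2
import numpy as np


def detect_marker_poses(gray: np.ndarray, K: np.ndarray, dist: np.ndarray,
                        marker_length_m: float) -> dict:
    """Detect 36h11 tags and return {id: (rvec, tvec)} in the camera optical frame."""
    dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_APRILTAG_36h11)
    detector = cv2.aruco.ArucoDetector(dictionary, cv2.aruco.DetectorParameters())
    corners, ids, _rejected = detector.detectMarkers(gray)
    if ids is None:
        return {}

    # Square marker model centered at the origin, in the corner order
    # SOLVEPNP_IPPE_SQUARE expects (matches detectMarkers' corner order).
    h = marker_length_m / 2.0
    obj = np.array([[-h, h, 0], [h, h, 0], [h, -h, 0], [-h, -h, 0]], dtype=np.float32)

    poses = {}
    for tag_corners, tag_id in zip(corners, ids.ravel()):
        ok, rvec, tvec = cv2.solvePnP(
            obj, tag_corners.reshape(-1, 2).astype(np.float32), K, dist,
            flags=cv2.SOLVEPNP_IPPE_SQUARE,
        )
        if ok:
            poses[int(tag_id)] = (rvec, tvec)
    return poses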

Confidence Score: 5/5

Safe to merge; the one finding is a defensive threading hygiene fix that only matters if stop() join times out, which is unlikely given the lightweight work done in publish_static_chain().

The detection pipeline, TF chaining, calibration workflow, and fixture verification are all well-implemented and covered by a thorough test suite including a live LCM integration test with a NumPy SE(3) oracle. The one concrete issue is the republish thread closure reading self._republish_stop via the instance on every iteration — if stop() join ever times out, the subsequent nullification of the attribute could cause an AttributeError in the daemon thread. In normal operation this path is never reached.

dimos/perception/fiducial/blueprints/desk_marker_tf.py — the _republish_loop closure
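
The fix the bot is hinting at is usually just a matter of having the loop close over a local Event instead of re-reading the instance attribute each iteration. A hypothetical sketch; apart from _republish_stop, _republish_loop, and publish_static_chain, the names here are invented:

import threading


class RepublishMixin:
    """Sketch: the loop holds its own reference to the stop event, so stop()
    nulling out self._republish_stop can never raise in the daemon thread."""

    def start_republish(self, publish, period_s: float = 0.1) -> None:
        stop_event = threading.Event()
        self._republish_stop = stop_event

        def _republish_loop() -> None:
            # Reads the locally captured event, not self._republish_stop.
            while not stop_event.is_set():
                publish()                  # e.g. publish_static_chain()
                stop_event.wait(period_s)  # ~10 Hz; returns early on stop()

        self._republish_thread = threading.Thread(target=_republish_loop, daemon=True)
        self._republish_thread.start()

    def stop_republish(self) -> None:
        self._republish_stop.set()
        self._republish_thread.join(timeout=1.0)
        self._republish_stop = None   # safe even if join timed out
        self._republish_thread = None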

Important Files Changed

  • dimos/perception/fiducial/marker_tf_module.py: Core AprilTag detection module: subscribes to image + camera_info, runs ArucoDetector + solvePnP (IPPE_SQUARE), chains TF through world→base→optical→marker. Logic is sound.
  • dimos/perception/fiducial/blueprints/desk_marker_tf.py: Blueprint wiring DeskStaticTfModule + CameraModule + MarkerTfModule. The republish thread has a minor closure safety issue (nullified event reference after stop()).
  • dimos/utils/cli/cameracalibrate/cameracalibrate.py: Full camera calibration pipeline: SB/classic chessboard fallback, multi-candidate pattern detection, ROS CameraInfo YAML output. Both RuntimeError and ValueError are caught at CLI boundaries.
  • dimos/perception/fiducial/fixture_verification.py: Board-image quality verifier using homography reprojection. Depends on the private _grid_layout symbol (noted in prior threads).
  • dimos/robot/cli/topic.py: Refactors on_msg into _decode_typed_lcm_message backed by resolve_msg_type; adds a round-trip test. importlib is still used by _resolve_type, so no regression.
  • dimos/perception/fiducial/test_marker_tf_integration.py: End-to-end LCM integration test with a NumPy SE(3) oracle derived from OpenCV solvePnP. Tolerances are appropriate for synthetic detection.

Sequence Diagram

sequenceDiagram
    participant WC as Webcam
    participant CM as CameraModule
    participant MTF as MarkerTfModule
    participant DST as DeskStaticTfModule
    participant TF as TF Bus

    DST->>TF: "Publish world->base_link (identity)"
    DST->>TF: "Publish base_link->camera_optical (fixed offset)"
    Note over DST,TF: Republish at 10 Hz

    WC->>CM: BGR frame + CameraInfo
    CM->>MTF: color_image
    CM->>MTF: camera_info

    MTF->>TF: "Lookup world->base_link at image.ts"
    TF-->>MTF: T_world_base
    MTF->>TF: "Lookup base_link->camera_optical at image.ts"
    TF-->>MTF: T_base_optical

    MTF->>MTF: ArucoDetector.detectMarkers(gray)
    loop For each detected marker
        MTF->>MTF: solvePnP(IPPE_SQUARE)
        MTF->>MTF: "T_world_marker = T_wb x T_bo x T_om"
    end

    MTF->>TF: "Publish world->markers_parent"
    MTF->>TF: "Publish markers_parent->marker_N"

Reviews (2): Last reviewed commit: "Merge upstream/main into feat/2036-with-..."

Comment thread dimos/robot/cli/dimos.py
Comment thread dimos/utils/cli/cameracalibrate/cameracalibrate.py
Comment thread dimos/perception/fiducial/fixture_verification.py
Comment thread dimos/perception/fiducial/fixture_verification.py
@bogwi bogwi mentioned this pull request May 16, 2026

class DeskStaticTfModuleConfig(ModuleConfig):
    world_frame: str = "world"
    base_frame: str = "base_link"
Contributor

why does the marker module care about world, base_link etc? Those frames are emitted by other modules; it should only care about camera_optical -> its own detections.

Collaborator Author

@bogwi bogwi May 16, 2026

Strictly speaking it is not required by the math of marker detection. OpenCV only gives camera optical <- marker.

My point was: publish marker transforms so that everything downstream can treat markers like any other object in the robot’s world frame.

a) Without folding in world -> base -> optical, you would only have optical <- marker.
Then:

A marker sitting on a desk would jump in world whenever the robot moves because optical moves with the robot.
Nav, planning, maps, and multi-module stacks that already reason in world / map and base_link would each have to repeat the same chain, as I understand it: look up base and camera, compose with the marker, and stay in sync on timestamps.

b) With it, the module publishes markers -> marker_id (and the world -> markers identity) so that marker poses are stable in world (given a world -> base and base -> optical) and consumers do not need camera TF or PnP details.
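
The chaining described here is plain matrix composition. A minimal NumPy sketch; the module itself composes dimos Transform objects rather than raw 4x4 matrices, so the names below are illustrative:

import cv2
import numpy as np


def to_homogeneous(R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Pack a 3x3 rotation and a translation into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def world_from_marker(T_world_base: np.ndarray, T_base_optical: np.ndarray,
                      rvec: np.ndarray, tvec: np.ndarray) -> np.ndarray:
    """T_world_marker = T_world_base @ T_base_optical @ T_optical_marker,
    where the last factor comes straight from solvePnP's rvec/tvec."""
    R_om, _ = cv2.Rodrigues(rvec)
    return T_world_base @ T_base_optical @ to_homogeneous(R_om, tvec.ravel())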

Contributor


oh I understand, true yes. What you could do is ask a tf module for

world -> optical_frame transform.
then because you know optical_frame -> marker_1 you can publish

world -> markers -> marker_1

do we need base_link?

Collaborator Author

@bogwi bogwi May 16, 2026

We do not need base_link as a mathematical step.

I wanted it to do two lookups, to separate failure lanes:

a) world -> base fails -- you blame localization | odom | world naming.
b) base -> optical fails -- you blame the camera stuff

dimos/dimos/perception/fiducial/marker_tf_module.py, lines 251 and 266:

class MarkerTfModuleConfig(ModuleConfig):
    """Configuration for :class:`MarkerTfModule`.

    ``marker_length_m`` is the physical edge length of the printed square marker
    in meters (required; no default).
    """

    world_frame: str = "world"
    base_frame: str = "base_link"
    ...

we have:

if t_world_base is None:
    logger.debug(
        "MarkerTfModule: no TF %s -> %s at ts=%s",
        self.config.world_frame,
        self.config.base_frame,
        image.ts,
    )
    return

and

if t_base_optical is None:
    logger.debug(
        "MarkerTfModule: no TF %s -> %s at ts=%s",
        self.config.base_frame,
        optical,
        image.ts,
    )
    return

and then at the bottom of def _process_color_image(self, image: Image) -> None:

    t_base_marker = t_base_optical + t_optical_marker
    t_world_marker = t_world_base + t_base_marker
    out.append(
        Transform(
            translation=t_world_marker.translation,
            rotation=t_world_marker.rotation,
            frame_id=markers_parent,
            child_frame_id=self._marker_child_frame(mid),
            ts=ts,
        )
    )

In the desk demo we do not publish one world -> camera_optical transform; we publish two edges, world -> base_link and base_link -> camera_optical: localization / world attachment on one side, and the robot + camera mount on the other.

class DeskStaticTfModuleConfig(ModuleConfig):
    world_frame: str = "world"
    base_frame: str = "base_link"
    ...
def publish_static_chain(self) -> None:
    ts = time.time()
    self._last_publish_ts = ts
    roll, pitch, yaw = self.config.camera_rotation_rpy_rad
    x, y, z = self.config.camera_translation_m

    self.tf.publish(
        Transform(
            translation=Vector3(0.0, 0.0, 0.0),
            rotation=Quaternion(0.0, 0.0, 0.0, 1.0),
            # edge 1
            frame_id=self.config.world_frame,
            child_frame_id=self.config.base_frame,
            ts=ts,
        ),
        Transform(
            # Default desk camera pose: about 25 cm forward and 15 cm above base_link.
            translation=Vector3(x, y, z),
            rotation=Quaternion.from_euler(Vector3(roll, pitch, yaw)),
            # edge 2
            frame_id=self.config.base_frame,
            child_frame_id=self.config.camera_optical_frame,
            ts=ts,
        ),
    )

base_link here is the parent of the camera in the TF tree, not extra stuff the marker module invented. That seemed logical.


class DeskStaticTfModuleConfig(ModuleConfig):
Contributor


if I want to plug this into, for example, go2
https://github.com/dimensionalOS/dimos/blob/main/dimos/robot/unitree/go2/blueprints/smart/unitree_go2.py#L34

how should I add this module? How do I tell it what the CameraInfo is for that robot?

Collaborator Author

@bogwi bogwi May 16, 2026

unitree_go2 = autoconnect(
    unitree_go2_basic,
    VoxelGridMapper.blueprint(),
    CostMapper.blueprint(),
    ReplanningAStarPlanner.blueprint(),
    WavefrontFrontierExplorer.blueprint(),
    PatrollingModule.blueprint(),
    MovementManager.blueprint(),
).global_config(n_workers=10, robot_model="unitree_go2")

You add the fiducial module the same way:

from dimos.perception.fiducial.marker_tf_module import MarkerTfModule

unitree_go2 = autoconnect(
    unitree_go2_basic,
    MarkerTfModule.blueprint(
        marker_length_m=...,  # physical edge length of the printed tag, meters
        # optional: aruco_dictionary, marker_namespace_prefix, world_frame, base_frame, max_freq, ...
    ),
    VoxelGridMapper.blueprint(),
    # ... rest unchanged
).global_config(n_workers=10, robot_model="unitree_go2")

You do not pass CameraInfo into MarkerTfModule config. That module has In[CameraInfo] (and In[Image]) and uses whatever stream is connected.

Contributor

@leshy leshy May 16, 2026

tested, it's all good! we considered using CameraInfo as just a static cli argument vs a topic, but now with autoconnect the topic is more convenient

@bogwi bogwi changed the title from Feat/2036 with 2037 to AprilTag marker 3D detector, May 16, 2026