AprilTag marker 3D detector #2107
Conversation
Switch from mypy-ignore to types-reportlab>=4.5.0 (matching reportlab 4.5 in deps), following the project's pattern for the other ~15 types-* packages. The stubs immediately caught a real bug: Canvas.setKeywords expects str | None, not list[str].
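For context, the kind of call the stubs flag looks roughly like this (a minimal sketch; the file name and keyword values are made up):

```python
from reportlab.pdfgen.canvas import Canvas

c = Canvas("tags.pdf")
# types-reportlab flags passing a list here: setKeywords expects str | None.
c.setKeywords("apriltag, fiducial")        # OK
# c.setKeywords(["apriltag", "fiducial"])  # the kind of bug the stubs caught
c.save()
```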
Add a top-level `pytest.importorskip("cv2.aruco")` (if not already present) so CI without the aruco extra skips instead of erroring.
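The pattern in question sits at the top of the test module and would look something like this (a sketch; only the `importorskip` call is stated above):

```python
import pytest

# Skip the whole test module, rather than error at import time,
# when OpenCV's contrib/aruco extra is not installed.
cv2 = pytest.importorskip("cv2")
pytest.importorskip("cv2.aruco")
```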
Greptile Summary: This PR ships an AprilTag 3D marker detection pipeline (Part II of #2036), including a camera calibration CLI tool and a desk demo blueprint.
Confidence Score: 5/5. Safe to merge; the one finding is a defensive threading-hygiene fix that only matters if stop()'s join times out, which is unlikely given the lightweight work done in publish_static_chain(). The detection pipeline, TF chaining, calibration workflow, and fixture verification are all well implemented and covered by a thorough test suite, including a live LCM integration test with a NumPy SE(3) oracle. The one concrete issue: the republish thread closure reads self._republish_stop via the instance on every iteration, so if stop()'s join ever times out, the subsequent nullification of the attribute could cause an AttributeError in the daemon thread. In normal operation this path is never reached.

Important file changed: dimos/perception/fiducial/blueprints/desk_marker_tf.py (the _republish_loop closure).
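A minimal sketch of the suggested hygiene fix, assuming the attribute names from the finding (the loop body and timeout are illustrative): bind the event to a local before spawning the thread, so the closure never dereferences `self._republish_stop` after `stop()` nulls it.

```python
import threading

# Inside DeskStaticTfModule (sketch):
def start(self) -> None:
    self._republish_stop = threading.Event()
    stop = self._republish_stop  # local binding survives attribute nulling in stop()

    def _republish_loop() -> None:
        while not stop.wait(timeout=0.1):  # ~10 Hz republish
            self.publish_static_chain()

    self._republish_thread = threading.Thread(target=_republish_loop, daemon=True)
    self._republish_thread.start()
```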
Sequence Diagram

```mermaid
sequenceDiagram
    participant WC as Webcam
    participant CM as CameraModule
    participant MTF as MarkerTfModule
    participant DST as DeskStaticTfModule
    participant TF as TF Bus

    DST->>TF: "Publish world->base_link (identity)"
    DST->>TF: "Publish base_link->camera_optical (fixed offset)"
    Note over DST,TF: Republish at 10 Hz
    WC->>CM: BGR frame + CameraInfo
    CM->>MTF: color_image
    CM->>MTF: camera_info
    MTF->>TF: "Lookup world->base_link at image.ts"
    TF-->>MTF: T_world_base
    MTF->>TF: "Lookup base_link->camera_optical at image.ts"
    TF-->>MTF: T_base_optical
    MTF->>MTF: ArucoDetector.detectMarkers(gray)
    loop For each detected marker
        MTF->>MTF: solvePnP(IPPE_SQUARE)
        MTF->>MTF: "T_world_marker = T_wb x T_bo x T_om"
    end
    MTF->>TF: "Publish world->markers_parent"
    MTF->>TF: "Publish markers_parent->marker_N"
```
```python
class DeskStaticTfModuleConfig(ModuleConfig):
    world_frame: str = "world"
    base_frame: str = "base_link"
```
Why does the marker module care about world, base_link, etc.? Those frames are emitted by other modules; it should only care about camera_optical -> its own detections.
Strictly speaking it is not required by the math of marker detection; OpenCV only gives camera_optical <- marker.
My point was: publish marker transforms so that everything downstream can treat markers like any other object in the robot's world frame.
a) Without folding in world -> base -> optical, you would only have optical <- marker.
Then:
A marker sitting on a desk would jump in world whenever the robot moves, because optical moves with the robot.
Nav, planning, maps, and multi-module stacks that already reason in world/map and base_link would each have to repeat the same chain, as I understand it: look up base and camera, compose with the marker, and stay in sync on timestamps.
b) With it, the module publishes markers -> marker_id (and the world -> markers identity) so that marker poses are stable in world (given a world -> base and a base -> optical) and consumers do not need camera TF or PnP details.
Oh, I understand; true, yes. What you could do is ask a TF module for the
world -> optical_frame transform.
Then, because you know optical_frame -> marker_1, you can publish
world -> markers -> marker_1.
Do we need base_link?
We do not need base_link as a mathematical step.
I wanted the two lookups in order to separate failure lanes:
a) world -> base fails: you blame localization / odom / world naming.
b) base -> optical fails: you blame the camera stack.
In dimos/dimos/perception/fiducial/marker_tf_module.py (lines 251 and 266):
```python
class MarkerTfModuleConfig(ModuleConfig):
    """Configuration for :class:`MarkerTfModule`.

    ``marker_length_m`` is the physical edge length of the printed square marker
    in meters (required; no default).
    """

    world_frame: str = "world"
    base_frame: str = "base_link"
```
...we have:

```python
if t_world_base is None:
    logger.debug(
        "MarkerTfModule: no TF %s -> %s at ts=%s",
        self.config.world_frame,
        self.config.base_frame,
        image.ts,
    )
    return
```

and

```python
if t_base_optical is None:
    logger.debug(
        "MarkerTfModule: no TF %s -> %s at ts=%s",
        self.config.base_frame,
        optical,
        image.ts,
    )
    return
```

and then at the bottom of `def _process_color_image(self, image: Image) -> None:`

```python
t_base_marker = t_base_optical + t_optical_marker
t_world_marker = t_world_base + t_base_marker
out.append(
    Transform(
        translation=t_world_marker.translation,
        rotation=t_world_marker.rotation,
        frame_id=markers_parent,
        child_frame_id=self._marker_child_frame(mid),
        ts=ts,
    )
)
```

On the desk demo we do not publish a single world -> camera_optical transform; we publish two edges, world -> base_link and base_link -> camera_optical: the localization/world attachment on one side, and the robot + camera mount on the other.
```python
class DeskStaticTfModuleConfig(ModuleConfig):
    world_frame: str = "world"
    base_frame: str = "base_link"
    ...

def publish_static_chain(self) -> None:
    ts = time.time()
    self._last_publish_ts = ts
    roll, pitch, yaw = self.config.camera_rotation_rpy_rad
    x, y, z = self.config.camera_translation_m
    self.tf.publish(
        Transform(
            translation=Vector3(0.0, 0.0, 0.0),
            rotation=Quaternion(0.0, 0.0, 0.0, 1.0),
            # edge 1
            frame_id=self.config.world_frame,
            child_frame_id=self.config.base_frame,
            ts=ts,
        ),
        Transform(
            # Default desk camera pose: about 25 cm forward and 15 cm above base_link.
            translation=Vector3(x, y, z),
            rotation=Quaternion.from_euler(Vector3(roll, pitch, yaw)),
            # edge 2
            frame_id=self.config.base_frame,
            child_frame_id=self.config.camera_optical_frame,
            ts=ts,
        ),
    )
```

base_link here is the parent of the camera in the TF tree, not an extra frame the marker module invented. That seemed logical.
If I want to plug this into, for example, the go2:
https://github.com/dimensionalOS/dimos/blob/main/dimos/robot/unitree/go2/blueprints/smart/unitree_go2.py#L34
how should I add this module? And how do I tell it what the CameraInfo for that robot is?
```python
unitree_go2 = autoconnect(
    unitree_go2_basic,
    VoxelGridMapper.blueprint(),
    CostMapper.blueprint(),
    ReplanningAStarPlanner.blueprint(),
    WavefrontFrontierExplorer.blueprint(),
    PatrollingModule.blueprint(),
    MovementManager.blueprint(),
).global_config(n_workers=10, robot_model="unitree_go2")
```

You add the fiducial module the same way:
```python
from dimos.perception.fiducial.marker_tf_module import MarkerTfModule

unitree_go2 = autoconnect(
    unitree_go2_basic,
    MarkerTfModule.blueprint(
        marker_length_m=...,  # physical edge length of the printed tag, meters
        # optional: aruco_dictionary, marker_namespace_prefix, world_frame, base_frame, max_freq, ...
    ),
    VoxelGridMapper.blueprint(),
    # ... rest unchanged
).global_config(n_workers=10, robot_model="unitree_go2")
```

You do not pass CameraInfo into the MarkerTfModule config. The module has `In[CameraInfo]` (and `In[Image]`) and uses whatever stream is connected.
Tested, it's all good! We considered using CameraInfo as just a static CLI argument vs. a topic, but now with autoconnect the topic is more convenient.
Supersedes #2098 so self-hosted CI can run. Prior review discussion is preserved there.
This ships the AprilTag marker 3D detector per #2036.
It comes in two parts:

I) A camera calibration tool: a dedicated utility to calibrate a camera and obtain camera_info.yaml.
dimos/dimos/utils/cli/cameracalibrate
`uv run pytest dimos/utils/cli/cameracalibrate`
How to calibrate is explained in dimos/docs/usage/camera_calibration.md.
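The CLI's internals aren't reproduced here, but the standard OpenCV chessboard flow such a tool typically wraps looks like this (a sketch; the pattern size and file names are made up):

```python
import cv2
import numpy as np

pattern = (9, 6)  # inner-corner count of the printed chessboard (assumed)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in ["calib_01.png", "calib_02.png"]:  # hypothetical capture files
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, c = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(c)

# The intrinsic matrix K and distortion coefficients are what camera_info.yaml stores.
ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None
)
```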
II) The detector module, in dimos/dimos/perception/fiducial.
The module has been tested in real-life detection,
all tried at distances up to 4 m, with various yaw, pitch, roll, and slant changes.
Verified streaming into rerun.

Or partial detection; notice the correctly detected tags at the bottom.
If you have obtained a camera_info.yaml for your camera after the calibration step, you can substitute it at dimos/dimos/perception/fiducial/blueprints/fixtures/camera_info.yaml and reproduce the testing steps as follows.
Manual sequence (two terminals from the `dimos` repo, using `uv run dimos`):

1. `uv run dimos stop`, then `uv run dimos run desk-marker-tf --daemon`; note the printed `Log:` path.
2. `uv run dimos rerun-bridge`; by default this opens a native viewer (`--rerun-open web` for a browser, `none` if headless). It waits until Ctrl+C.
3. In the viewer, expand `world/tf/` and confirm `base_link`, `camera_optical`, `marker_tf/markers`, and `marker_tf/marker_<id>` appear when the printed tag is in view (markers appear in bursts matching detection).
4. `uv run dimos stop`.

Python tests are based on:

```
uv run pytest dimos/perception/fiducial
```