
feat(deployment): centerpoint deployment integration#181

Open
vividf wants to merge 27 commits intotier4:feat/new_deployment_and_evaluation_pipelinefrom
vividf:feat/centerpoint_deployment_integration

Conversation

Collaborator

@vividf vividf commented Feb 2, 2026

Summary

Integrates CenterPoint into the unified deployment framework, enabling deployment and evaluation of ONNX and TensorRT models.

Note: this PR includes the changes from #180.

Changes

  • Integrated CenterPoint with deployment framework:
    • Moved deployment code from projects/CenterPoint to deployment/projects/centerpoint
    • Implemented component-based export pipeline for ONNX and TensorRT
    • Added runtime inference support with PyTorch, ONNX Runtime, and TensorRT backends
  • Deployment capabilities:
    • Export CenterPoint models to ONNX format
    • Export CenterPoint models to TensorRT engines
    • Component-based architecture (voxel encoder, backbone+head) for flexible deployment
  • Evaluation capabilities:
    • Evaluate ONNX models using ONNX Runtime
    • Evaluate TensorRT engines
    • Integrated metrics evaluation with deployment pipeline
  • Updated CLI: Replaced old deploy.py script with new unified CLI (deployment.cli.main)
  • Added Docker support: Created Dockerfile for deployment environment with TensorRT dependencies
  • Updated documentation: Added deployment and evaluation instructions in README

Migration Notes

  • Old deployment script (projects/CenterPoint/scripts/deploy.py) is removed
  • Use new CLI: python -m deployment.cli.main centerpoint <deploy_config> <model_config>
  • ONNX model variants are now registered via deployment.projects.centerpoint.onnx_models

How to run

python -m deployment.cli.main centerpoint \
  deployment/projects/centerpoint/config/deploy_config.py \
  projects/CenterPoint/configs/t4dataset/Centerpoint/second_secfpn_4xb16_121m_j6gen2_base_amp_t4metric_v2.py \
  --rot-y-axis-reference

Exported ONNX (unchanged)

Voxel Encoder
[screenshot of exported ONNX graph]

Backbone Head
[screenshot of exported ONNX graph]

@vividf vividf changed the title Feat/centerpoint deployment integration feat(deployment): centerpoint deployment integration Feb 2, 2026
@vividf vividf requested review from KSeangTan and yamsam February 2, 2026 16:33
@vividf vividf self-assigned this Feb 2, 2026
@vividf vividf marked this pull request as ready for review February 3, 2026 04:31
@vividf vividf force-pushed the feat/centerpoint_deployment_integration branch 2 times, most recently from bfb778f to 441d06e Compare February 16, 2026 06:08
Collaborator

@KSeangTan KSeangTan left a comment


Done with the first round of review. Please consider using dataclasses or pydantic models for the configs, and do the type checking there.

That way, we can remove all of the ad-hoc type checks from the code.
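As a rough sketch of that suggestion (class and field names are illustrative, not the actual deployment API), a dataclass can carry the `verification` settings and validate them once, at construction time:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerificationConfig:
    """Illustrative, typed replacement for the dict-based `verification` config."""

    enabled: bool = False
    tolerance: float = 1.0

    def __post_init__(self) -> None:
        # Type and range checks live here, so callers need no ad-hoc checks.
        if not isinstance(self.enabled, bool):
            raise TypeError(f"enabled must be bool, got {type(self.enabled).__name__}")
        if not self.tolerance > 0:
            raise ValueError(f"tolerance must be positive, got {self.tolerance}")


cfg = VerificationConfig(enabled=True, tolerance=0.5)
```

A pydantic `BaseModel` would give the same effect with coercion and better error messages, at the cost of a third-party dependency.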

verification = dict(
    enabled=False,
-   tolerance=1e-1,
+   tolerance=1,
Collaborator


Please explain what tolerance means here, and why it was updated from 0.1 to 1.

Collaborator Author

@vividf vividf Mar 4, 2026


The value was originally set for calibration classification and later copied to CenterPoint, but it does not work correctly for CenterPoint.

INFO:deployment.core.evaluation.verification_mixin:  tensorrt (cuda:0) latency: 205.08 ms
INFO:deployment.core.evaluation.verification_mixin:  output[heatmap]: shape=(1, 5, 510, 510), max_diff=0.070197, mean_diff=0.007674
INFO:deployment.core.evaluation.verification_mixin:  output[reg]: shape=(1, 2, 510, 510), max_diff=0.007944, mean_diff=0.001120
INFO:deployment.core.evaluation.verification_mixin:  output[height]: shape=(1, 1, 510, 510), max_diff=0.025401, mean_diff=0.002122
INFO:deployment.core.evaluation.verification_mixin:  output[dim]: shape=(1, 3, 510, 510), max_diff=0.031920, mean_diff=0.001143
INFO:deployment.core.evaluation.verification_mixin:  output[rot]: shape=(1, 2, 510, 510), max_diff=0.075215, mean_diff=0.004582
INFO:deployment.core.evaluation.verification_mixin:  output[vel]: shape=(1, 2, 510, 510), max_diff=0.221999, mean_diff=0.004940
INFO:deployment.core.evaluation.verification_mixin:  Overall Max difference: 0.221999
INFO:deployment.core.evaluation.verification_mixin:  Overall Mean difference: 0.004347
WARNING:deployment.core.evaluation.verification_mixin:  tensorrt (cuda:0) verification FAILED ✗ (max diff: 0.221999 > tolerance: 0.100000)
INFO:deployment.core.evaluation.verification_mixin:

Collaborator


Do you know why it fails? Since this is a verification step, it is better to investigate the cause than to loosen the tolerance.

Collaborator Author


It doesn't necessarily indicate a failure.
When converting from PyTorch to TensorRT, some numerical differences are expected due to different kernels, precision handling, and TensorRT optimizations.

The verification is mainly used as a safeguard to detect major issues (e.g., incorrect conversion settings) rather than to enforce exact numerical equivalence.
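That safeguard can be sketched as a simple max-absolute-difference check (a toy version using plain lists; the real `verification_mixin` works on backend output tensors, and the names below are illustrative):

```python
def verify_outputs(reference, candidate, tolerance):
    """Compare per-output max/mean absolute differences against a tolerance.

    Returns (ok, report) where report maps output name -> (max_diff, mean_diff).
    """
    report = {}
    for name, ref in reference.items():
        diffs = [abs(r - c) for r, c in zip(ref, candidate[name])]
        report[name] = (max(diffs), sum(diffs) / len(diffs))
    overall_max = max(max_diff for max_diff, _ in report.values())
    return overall_max <= tolerance, report


# Toy values standing in for heatmap/vel outputs from two backends.
ref = {"heatmap": [0.1, 0.5, 0.9], "vel": [0.0, 0.2]}
cand = {"heatmap": [0.15, 0.5, 0.88], "vel": [0.1, 0.2]}
ok, report = verify_outputs(ref, cand, tolerance=1.0)
```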

Collaborator Author


The 1e-1 value is what we set for ResNet-18 in calibration classification, so the two cases are different.

Collaborator


By the way, this is the verification result for TensorRT FP16, right? If so, it makes sense.

Collaborator


Anyway, 5e-1 might be a better value.

Collaborator Author


Running onnx (cuda:0) reference...
2026-03-10 15:20:07.511273431 [V:onnxruntime:, execution_steps.cc:103 Execute] stream 0 activate notification with index 0
2026-03-10 15:20:07.567219724 [V:onnxruntime:, execution_steps.cc:47 Execute] stream 0 wait on Notification with id: 0
INFO:deployment.core.evaluation.verification_mixin:  onnx (cuda:0) latency: 1423.80 ms
INFO:deployment.core.evaluation.verification_mixin:
Running tensorrt (cuda:0) test...
INFO:deployment.core.evaluation.verification_mixin:  tensorrt (cuda:0) latency: 1141.26 ms
INFO:deployment.core.evaluation.verification_mixin:  output[heatmap]: shape=(1, 5, 510, 510), max_diff=0.464849, mean_diff=0.056135
INFO:deployment.core.evaluation.verification_mixin:  output[reg]: shape=(1, 2, 510, 510), max_diff=0.056639, mean_diff=0.006198
INFO:deployment.core.evaluation.verification_mixin:  output[height]: shape=(1, 1, 510, 510), max_diff=0.227012, mean_diff=0.065522
INFO:deployment.core.evaluation.verification_mixin:  output[dim]: shape=(1, 3, 510, 510), max_diff=0.336713, mean_diff=0.028087
INFO:deployment.core.evaluation.verification_mixin:  output[rot]: shape=(1, 2, 510, 510), max_diff=0.515039, mean_diff=0.023962
INFO:deployment.core.evaluation.verification_mixin:  output[vel]: shape=(1, 2, 510, 510), max_diff=0.932002, mean_diff=0.034206
INFO:deployment.core.evaluation.verification_mixin:  Overall Max difference: 0.932002
INFO:deployment.core.evaluation.verification_mixin:  Overall Mean difference: 0.037279
WARNING:deployment.core.evaluation.verification_mixin:  tensorrt (cuda:0) verification FAILED ✗ (max diff: 0.932002 > tolerance: 0.500000)

The values can differ across machines, so I will leave the tolerance at 1 for now.

Collaborator


Did you set a random seed for this validation? Randomness (for example, point cloud shuffling) significantly affects the results; otherwise, I believe the difference between machines is too large.

@property
def _components_cfg(self) -> Dict[str, Any]:
    """Get unified components configuration."""
    if "components" not in self.config.deploy_cfg:
Collaborator


We can use an assert here.

Collaborator Author


Thanks, this is fixed in the big refactor b3b8355

def _onnx_config(self) -> Dict[str, Any]:
    """Get shared ONNX export settings."""
    onnx_config_raw = self.config.onnx_config
    if onnx_config_raw is None:
Collaborator


Same here, consider using an assert.

use ONNX model variants.
"""
# Import triggers @MODELS.register_module() registrations
from deployment.projects.centerpoint.onnx_models import centerpoint_head_onnx as _head # noqa: F401
Collaborator


Please remove register_models, move these imports to top-level statements, and list them individually in __all__.

Collaborator Author


Thanks, fixed in b3b8355

try:
    outputs = self._components_cfg["backbone_head"]["io"]["outputs"]
except KeyError as exc:
    raise KeyError("Missing required config path: components_cfg['backbone_head']['io']['outputs']") from exc
Collaborator


Do we need these? I believe we should just use a dataclass instead of a dict, which would let us remove these checks.
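A sketch of the dataclass alternative described here (the field names mirror the dict path components['backbone_head']['io']['outputs']; everything else is illustrative):

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComponentIO:
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)


@dataclass
class ComponentConfig:
    io: ComponentIO = field(default_factory=ComponentIO)


@dataclass
class ComponentsConfig:
    voxel_encoder: ComponentConfig = field(default_factory=ComponentConfig)
    backbone_head: ComponentConfig = field(default_factory=ComponentConfig)


cfg = ComponentsConfig(
    backbone_head=ComponentConfig(io=ComponentIO(outputs=["heatmap", "reg", "height"]))
)
# Attribute access replaces the try/except KeyError lookup:
outputs = cfg.backbone_head.io.outputs
```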

@vividf vividf force-pushed the feat/centerpoint_deployment_integration branch from caa92a6 to 93e5558 Compare March 5, 2026 17:24
@vividf vividf changed the base branch from feat/new_deployment_and_evaluation_pipeline to main March 5, 2026 17:27
@vividf vividf changed the base branch from main to feat/new_deployment_and_evaluation_pipeline March 5, 2026 17:27
@vividf vividf force-pushed the feat/centerpoint_deployment_integration branch 3 times, most recently from de7020e to 6470ac5 Compare March 10, 2026 14:40
@KSeangTan
Copy link
Collaborator

Some of the modules, for example the dataloader, should be reusable across detection3d tasks, right?

model_cfg = Config.fromfile(args.model_cfg)
config = BaseDeploymentConfig(deploy_cfg)

_validate_required_components(config.components_cfg)
Collaborator


move _validate_required_components to BaseDeploymentConfig
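Moving the check into the config class could look like this (class and function names follow the snippet above; the required-component list is an assumption):

```python
from typing import Any, Dict


class BaseDeploymentConfig:
    """Sketch: required-component validation lives in the config itself."""

    REQUIRED_COMPONENTS = ("voxel_encoder", "backbone_head")  # assumed list

    def __init__(self, deploy_cfg: Dict[str, Any]) -> None:
        self.deploy_cfg = deploy_cfg
        self.components_cfg = deploy_cfg.get("components", {})
        # Validate at construction time, so callers no longer need to call
        # a free-standing _validate_required_components() themselves.
        self._validate_required_components()

    def _validate_required_components(self) -> None:
        missing = [c for c in self.REQUIRED_COMPONENTS if c not in self.components_cfg]
        if missing:
            raise ValueError(f"Missing required components: {missing}")


config = BaseDeploymentConfig({"components": {"voxel_encoder": {}, "backbone_head": {}}})
```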


context = CenterPointExportContext(rot_y_axis_reference=bool(getattr(args, "rot_y_axis_reference", False)))
runner.run(context=context)
return 0
Collaborator


Do we need to return a status code here?

def _release_gpu_resources(self) -> None:
    """Release TensorRT resources (engines and contexts) and CUDA events."""
    # Destroy CUDA events
    if hasattr(self, "_backbone_start_event"):
Collaborator


Use a for-loop to achieve this.
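The suggested loop might look like this (the event attribute names follow the snippet above; the destroy step is a stand-in for the real CUDA event API, which would be called in its place):

```python
class _Runtime:
    """Toy stand-in for the TensorRT runtime wrapper in this PR."""

    def __init__(self) -> None:
        self._backbone_start_event = object()
        self._backbone_end_event = object()
        self.destroyed = []

    def _release_gpu_resources(self) -> None:
        """Release CUDA events with one loop instead of repeated hasattr blocks."""
        for name in ("_backbone_start_event", "_backbone_end_event"):
            event = getattr(self, name, None)
            if event is not None:
                self.destroyed.append(name)  # real code would destroy the CUDA event here
                setattr(self, name, None)


rt = _Runtime()
rt._release_gpu_resources()
```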

@vividf vividf requested a review from KSeangTan March 11, 2026 04:01
}

for component_name, engine_path in engine_files.items():
    if not osp.exists(engine_path):
Collaborator


This error validation should be done in resolve_artifact_path
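Folding the existence check into the resolver, as suggested, could look like this (the function name comes from the comment; the body is a sketch):

```python
import os.path as osp
import tempfile


def resolve_artifact_path(path: str) -> str:
    """Resolve an artifact path, failing early if the file does not exist."""
    if not osp.exists(path):
        raise FileNotFoundError(f"Artifact not found: {path}")
    return osp.abspath(path)


# Callers then never see a missing engine file at load time:
with tempfile.NamedTemporaryFile(suffix=".engine") as f:
    resolved = resolve_artifact_path(f.name)
```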

@vividf vividf force-pushed the feat/new_deployment_and_evaluation_pipeline branch from 5256306 to 2b28f60 Compare March 11, 2026 04:27
vividf added 7 commits March 11, 2026 13:28
vividf and others added 20 commits March 11, 2026 13:28
@vividf vividf force-pushed the feat/centerpoint_deployment_integration branch from 1ca0e1c to a6b9840 Compare March 11, 2026 04:28