FineTec: Fine-Grained Action Recognition Under Temporal Corruption via Skeleton Decomposition and Sequence Completion
Dian Shaoβ , Mingfei Shi, Like Liu
β Corresponding Author
Northwestern Polytechnical University
The 40th Annual AAAI Conference on Artificial Intelligence (AAAI-26)
Recognizing fine-grained actions from temporally corrupted skeleton sequences remains a significant challenge, particularly in real-world scenarios where online pose estimation often yields substantial missing data. Existing methods often struggle to accurately recover temporal dynamics and fine-grained spatial structures, resulting in the loss of subtle motion cues crucial for distinguishing similar actions. To address this, we propose FineTec, a unified framework for Fine-grained action recognition under Temporal Corruption. FineTec first restores a base skeleton sequence from corrupted input using context-aware completion with diverse temporal masking. Next, a skeleton-based spatial decomposition module partitions the skeleton into five semantic regions, further divides them into dynamic and static subgroups based on motion variance, and generates two augmented skeleton sequences via targeted perturbation. These, along with the base sequence, are then processed by a physics-driven estimation module, which utilizes Lagrangian dynamics to estimate joint accelerations. Finally, both the fused skeleton position sequence and the fused acceleration sequence are jointly fed into a GCN-based action recognition head. Extensive experiments on both coarse-grained (NTU-60, NTU-120) and fine-grained (Gym99, Gym288) benchmarks show that FineTec significantly outperforms previous methods under various levels of temporal corruption. Specifically, FineTec achieves top-1 accuracies of 89.1% and 78.1% on the challenging Gym99-severe and Gym288-severe settings, respectively, demonstrating its robustness and generalizability.
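The dynamic/static split described above can be illustrated with a minimal sketch. It is not the released FineTec code: the five-region grouping of the 17 COCO joints, the function names, and the "above the region mean" threshold are all assumptions made for illustration, and the sequence is a plain list of frames, each a list of `(x, y)` tuples.

```python
from statistics import pvariance

# Hypothetical grouping of the 17 COCO joints into five regions
# (an assumption for illustration; the paper's exact grouping may differ).
REGIONS = {
    "head": [0, 1, 2, 3, 4],
    "left_arm": [5, 7, 9],
    "right_arm": [6, 8, 10],
    "left_leg": [11, 13, 15],
    "right_leg": [12, 14, 16],
}


def joint_motion_variance(sequence, joint):
    """Sum of the per-axis variance of one joint's position over time."""
    xs = [frame[joint][0] for frame in sequence]
    ys = [frame[joint][1] for frame in sequence]
    return pvariance(xs) + pvariance(ys)


def split_dynamic_static(sequence, regions=REGIONS):
    """Label a joint 'dynamic' if its variance exceeds its region's mean variance.

    The mean-variance threshold is an assumed rule, not the paper's.
    """
    result = {}
    for name, joints in regions.items():
        variances = {j: joint_motion_variance(sequence, j) for j in joints}
        threshold = sum(variances.values()) / len(variances)
        result[name] = {
            "dynamic": [j for j in joints if variances[j] > threshold],
            "static": [j for j in joints if variances[j] <= threshold],
        }
    return result


# Toy clip: every joint is still except the left wrist (COCO index 9).
frames = []
for t in range(4):
    frame = [(0.0, 0.0)] * 17
    frame[9] = (float(t), 0.0)
    frames.append(frame)

groups = split_dynamic_static(frames)
print(groups["left_arm"])
```

In this toy clip only the left wrist moves, so it is the only joint placed in a dynamic subgroup; every other joint falls into a static subgroup.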
- [2025.12.31]: Paper released on arXiv.
- [2025.12.30]: Dataset Gym288-skeleton released on Hugging Face.
- [2025.11.08]: GitHub repository initialized.
- Release paper.
- Release dataset.
- Release training code.
- Release inference code.
- Release model weights.
The Gym288-skeleton dataset is a human skeleton-based action recognition benchmark derived from the Gym288 subset of the FineGym dataset. It provides temporally precise, fine-grained annotations of gymnastic actions along with 2D human pose sequences extracted from original video frames.
This dataset is designed to support research in:
- Fine-grained action recognition
- Temporally corrupted or incomplete action modeling
- Skeleton-based representation learning
- Physics-aware motion understanding
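For readers who want to experiment with temporally corrupted input, the sketch below drops a fraction of frames and fills them back in by linear interpolation. This is a naive baseline for comparison only, not FineTec's context-aware completion module; `mask_frames`, `interpolate_missing`, and the drop ratio are hypothetical names and values.

```python
import random


def mask_frames(num_frames, drop_ratio, seed=0):
    """Return a sorted list of frame indices to drop (simulated corruption)."""
    rng = random.Random(seed)
    k = int(num_frames * drop_ratio)
    return sorted(rng.sample(range(num_frames), k))


def interpolate_missing(sequence, missing):
    """Fill missing frames by linear interpolation between observed neighbors.

    `sequence` is a list of frames; each frame is a list of (x, y) tuples.
    Entries at the `missing` indices are ignored (they may be None).
    Gaps at the start or end are filled by copying the nearest observed frame.
    """
    missing = set(missing)
    observed = [t for t in range(len(sequence)) if t not in missing]
    if not observed:
        raise ValueError("at least one frame must be observed")
    restored = list(sequence)
    for t in sorted(missing):
        prev = max((o for o in observed if o < t), default=None)
        nxt = min((o for o in observed if o > t), default=None)
        if prev is None:
            restored[t] = sequence[nxt]
        elif nxt is None:
            restored[t] = sequence[prev]
        else:
            w = (t - prev) / (nxt - prev)
            restored[t] = [
                (px + w * (nx - px), py + w * (ny - py))
                for (px, py), (nx, ny) in zip(sequence[prev], sequence[nxt])
            ]
    return restored


# Toy clip with a single joint moving along a straight line.
clip = [[(float(t), 2.0 * t)] for t in range(5)]
corrupted = list(clip)
corrupted[0] = None
corrupted[2] = None
restored = interpolate_missing(corrupted, [0, 2])
dropped = mask_frames(num_frames=20, drop_ratio=0.5, seed=0)
```

Linear interpolation recovers straight-line motion exactly but smooths over the fast, subtle movements that distinguish fine-grained actions, which is why a learned completion step is of interest.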
Key Statistics:
- Total instances: 38,223 action sequences
- Action classes: 288 fine-grained gymnastic elements
- Training samples: 28,739
- Test samples: 9,484
- Keypoint format: 17 COCO-style 2D joints per frame
- Apparatuses: Floor Exercise (FX), Balance Beam (BB), Uneven Bars (UB), Vault (VT)
- Pose estimator: HRNet
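The statistics above can be turned into a small sanity check for loaded samples. The sketch below assumes a hypothetical in-memory layout (a list of frames, each with 17 `(x, y)` pairs, plus a zero-based integer label); the actual file format is documented on the dataset's Hugging Face page and may differ.

```python
NUM_JOINTS = 17          # COCO-style 2D joints per frame
NUM_CLASSES = 288        # fine-grained gymnastic elements
SPLIT_SIZES = {"train": 28739, "test": 9484}
TOTAL_INSTANCES = 38223


def validate_sample(keypoints, label):
    """Check one sample against the stated format and return its frame count.

    Assumes zero-based labels, which is an illustrative choice, not a
    documented property of the dataset.
    """
    if not 0 <= label < NUM_CLASSES:
        raise ValueError(f"label {label} is outside [0, {NUM_CLASSES})")
    for t, frame in enumerate(keypoints):
        if len(frame) != NUM_JOINTS:
            raise ValueError(f"frame {t} has {len(frame)} joints, expected {NUM_JOINTS}")
        for joint in frame:
            if len(joint) != 2:
                raise ValueError(f"frame {t} contains a joint that is not a 2D point")
    return len(keypoints)


# The two splits should add up to the reported total.
assert sum(SPLIT_SIZES.values()) == TOTAL_INSTANCES

# A dummy three-frame sample passes the check.
dummy = [[(0.0, 0.0)] * NUM_JOINTS for _ in range(3)]
num_frames = validate_sample(dummy, label=5)
```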
The dataset and detailed information are available on Hugging Face.
Coming Soon~
Please consider citing our paper if you find our work useful. If you use the Gym288-skeleton dataset, please also cite FineGym.
@misc{shao2025finetec,
title={FineTec: Fine-Grained Action Recognition Under Temporal Corruption via Skeleton Decomposition and Sequence Completion},
author={Dian Shao and Mingfei Shi and Like Liu},
year={2025},
eprint={2512.25067},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2512.25067}
}
For any questions, feel free to email mingfeishi5@mail.nwpu.edu.cn.