This is the official implementation of TACMT: Text-Aware Cross-Modal Transformer for Visual Grounding on High-Resolution SAR Images, which constructs a dataset and develops a multimodal deep learning model for the SAR visual grounding (SARVG) task.
We have built a new SARVG benchmark based on images from different SAR sensors to fully promote SARVG research. On top of this benchmark, we propose a novel text-aware cross-modal Transformer (TACMT), which follows DETR's architecture. We develop a cross-modal encoder to enhance the visual features associated with the textual descriptions. Next, a text-aware query selection module is devised to select relevant context features as the decoder queries. To retrieve objects from various scenes, we further design a cross-scale fusion module that fuses features from different levels for accurate target localization.
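For illustration, here is a minimal PyTorch sketch of the text-aware query selection idea: score each encoder feature against a pooled sentence embedding and keep the top-k most text-relevant features as decoder queries. The class and argument names here are ours, not the repository's; see the source code for the actual implementation.

```python
# Illustrative sketch of text-aware query selection (not the repo's exact code).
import torch
import torch.nn as nn

class TextAwareQuerySelect(nn.Module):
    """Score encoder features against the sentence embedding and keep
    the top-k most text-relevant features as decoder queries."""
    def __init__(self, dim: int, num_queries: int = 100):
        super().__init__()
        self.num_queries = num_queries
        self.text_proj = nn.Linear(dim, dim)  # project text into the visual space

    def forward(self, visual: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        # visual: (B, HW, C) fused features from the cross-modal encoder
        # text:   (B, C)     pooled sentence embedding
        t = self.text_proj(text).unsqueeze(-1)               # (B, C, 1)
        scores = torch.bmm(visual, t).squeeze(-1)            # (B, HW) text relevance
        topk = scores.topk(self.num_queries, dim=1).indices  # (B, k)
        idx = topk.unsqueeze(-1).expand(-1, -1, visual.size(-1))
        return visual.gather(1, idx)                         # (B, k, C) decoder queries
```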

1. Clone the repository:

```bash
git clone https://github.com/CAESAR-Radi/TACMT.git
```

2. Install PyTorch 1.9.1 and torchvision 0.10.1:

```bash
pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 torchaudio==0.9.1 -f https://download.pytorch.org/whl/torch_stable.html
```

3. Install the other dependencies:
```bash
pip install pytorch-pretrained-bert
pip install rasterio
```
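As an optional sanity check, you can verify that the pinned versions installed correctly:

```python
# Verify the pinned PyTorch/torchvision versions and CUDA availability.
import torch
import torchvision

print(torch.__version__)          # expected: 1.9.1+cu111
print(torchvision.__version__)    # expected: 0.10.1+cu111
print(torch.cuda.is_available())  # True if the CUDA build sees a GPU
```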
You can download the dataset and checkpoint weights from either of the following sources:

- Google Drive
- Baidu Netdisk
This dataset is created based on data from the following sources:

- Capella SAR Data
  - Data source: "Capella SAR Dataset" by Capella Space.
  - Licensed under the CC BY 4.0 License: https://creativecommons.org/licenses/by/4.0/
- GF-3 SAR Data
  - Data source: "GF-3 SAR Dataset" by the China Center for Resources Satellite Data and Application.
- Iceye SAR Data
  - Data source: "Iceye SAR Dataset" by Iceye.
  - Licensed under the CC BY-NC 4.0 License: https://creativecommons.org/licenses/by-nc/4.0/
This dataset has been processed and annotated for this project. Modifications include:

- Image cropping: images have been cropped to focus on relevant areas of interest.
- Image stretching: grayscale pixel values have been stretched to the range 0-255 for better visualization (see the sketch after this list).
- Data annotation: relevant features in the images have been annotated for the visual grounding task.
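The exact stretching procedure is not specified here, but a simple percentile-based linear stretch like the following (the 2-98 percentile cut-offs and the file name are assumptions) produces 8-bit imagery of the kind described above:

```python
# Sketch of a percentile linear stretch to 0-255 (parameters are assumed).
import numpy as np
import rasterio

with rasterio.open("scene.tif") as src:      # hypothetical input file
    band = src.read(1).astype(np.float32)    # first band as float

lo, hi = np.percentile(band, (2, 98))        # clip extreme backscatter values
stretched = np.clip((band - lo) / max(hi - lo, 1e-6), 0.0, 1.0) * 255.0
stretched = stretched.astype(np.uint8)       # grayscale values now in 0-255
```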
The following is an example of model training, distributed across two GPUs:
```bash
python -m torch.distributed.launch --nproc_per_node=2 --use_env train.py --config configs/SARVG_R101.py --world_size 2 --checkpoint_best --enable_batch_accum --batch_size 10 --freeze_epochs 10
```
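If you have only a single GPU, a non-distributed invocation along these lines may also work; this command is an assumption, so check the argument parser in train.py and lower --batch_size if you run out of memory:

```bash
python train.py --config configs/SARVG_R101.py --batch_size 10 --checkpoint_best
```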
Run the following script to evaluate the trained model with a single GPU.
```bash
python inference.py --config configs/SARVG_R101.py --checkpoint ./checkpoint_best_acc.pth --test_split val
```
Part of our code is based on the previous works VLTVG and RT-DETR. Thanks to the authors.