
Vision Transformer (ViT) Replication

This repository contains the code and resources for replicating the Vision Transformer (ViT) architecture, a deep learning model that has shown remarkable performance in computer vision tasks.

Introduction

The Vision Transformer (ViT) is a neural network architecture that applies the Transformer, originally designed for natural language processing, to computer vision: an image is split into fixed-size patches that are treated as a sequence of tokens. ViT achieves competitive performance on image classification benchmarks and is known for its simplicity and scalability.
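The core idea above can be sketched in a few lines of PyTorch. This is an illustrative example (the image size and batch are arbitrary; the 16x16 patch size comes from the paper's title): a 224x224 image is cut into non-overlapping 16x16 patches, each flattened into a vector, giving a sequence of 196 "visual words".

```python
import torch

# Illustrative sizes: one 224x224 RGB image, 16x16 patches (from the paper title).
image = torch.randn(1, 3, 224, 224)  # (batch, channels, height, width)
patch = 16

b, c, h, w = image.shape
# Cut the height and width axes into non-overlapping 16x16 windows...
patches = image.unfold(2, patch, patch).unfold(3, patch, patch)  # (1, 3, 14, 14, 16, 16)
# ...then flatten each window into one token vector of dim 3*16*16 = 768.
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c * patch * patch)

print(patches.shape)  # torch.Size([1, 196, 768]): 196 tokens, 768 dims each
```

Each of these 196 vectors is then linearly projected and fed to a standard Transformer encoder, exactly as word embeddings would be in NLP.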

This project aims to replicate the Vision Transformer (ViT) paper, "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", in PyTorch, providing a complete codebase for training and evaluating the model on standard image classification datasets.
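To make the replication target concrete, here is a minimal ViT-style classifier sketch. All hyperparameters (embedding dim, depth, heads, number of classes) are illustrative placeholders, not the paper's ViT-Base configuration, and the patch embedding is implemented as a strided convolution, which is equivalent to a linear projection of flattened patches.

```python
import torch
from torch import nn

class MiniViT(nn.Module):
    """Small ViT-style classifier sketch; hyperparameters are illustrative."""

    def __init__(self, image_size=224, patch=16, in_ch=3,
                 embed_dim=192, depth=4, heads=4, num_classes=10):
        super().__init__()
        num_patches = (image_size // patch) ** 2
        # Patch embedding: a stride-16 conv = linear projection of 16x16 patches.
        self.proj = nn.Conv2d(in_ch, embed_dim, kernel_size=patch, stride=patch)
        # Learnable [class] token and position embeddings, as in the paper.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(
            embed_dim, heads, dim_feedforward=embed_dim * 4,
            batch_first=True, norm_first=True)  # pre-norm, as ViT uses
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, x):
        x = self.proj(x).flatten(2).transpose(1, 2)      # (B, N, D) patch tokens
        cls = self.cls_token.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1) + self.pos_embed  # prepend [class] token
        x = self.encoder(x)
        return self.head(x[:, 0])                        # classify from [class] token

logits = MiniViT()(torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 10])
```

A full replication additionally needs the training recipe (Adam with weight decay, learning-rate warmup, augmentation) described in the paper; this sketch only covers the model's forward pass.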

Paper & Code Reference

You can access the paper on arXiv at https://arxiv.org/abs/2010.11929. It provides in-depth information about the ViT architecture, its applications, and experimental results.

The official implementation by Google Research is available at https://github.com/google-research/vision_transformer. That repository contains the source code, pre-trained models, and related resources.

Citation

If you use this code or replicate the results, please consider citing the original paper:

@inproceedings{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  booktitle={International Conference on Learning Representations},
  year={2021}
}

Contribution

Contributions to this replication project are welcome. If you have suggestions, improvements, or new findings, please submit issues and pull requests.

License

This project is licensed under the MIT License - see the LICENSE file for details.
