This repository contains the code and resources for replicating the Vision Transformer (ViT) architecture, a deep learning model that has shown remarkable performance on computer vision tasks.
The Vision Transformer applies the Transformer architecture, originally developed for natural language processing, directly to sequences of image patches. ViT achieves competitive performance on image classification benchmarks and is known for its simplicity and scalability.
This project aims to replicate the paper "An Image Is Worth 16x16 Words: Transformers for Image Recognition at Scale" using PyTorch, providing a complete codebase for training and evaluating the model on standard image classification datasets.
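The core idea of the paper is to split each image into fixed-size 16x16 patches and linearly embed them into tokens before feeding them to a standard Transformer encoder. As a rough illustration of that first step (a minimal sketch, not this repository's actual implementation; module and parameter names are illustrative), the patch embedding can be written in PyTorch as:

```python
import torch
from torch import nn


class PatchEmbedding(nn.Module):
    """Minimal sketch of ViT patch embedding (hypothetical names/defaults)."""

    def __init__(self, img_size=224, patch_size=16, in_channels=3, embed_dim=768):
        super().__init__()
        self.num_patches = (img_size // patch_size) ** 2
        # A convolution with kernel_size == stride == patch_size is equivalent
        # to cutting the image into non-overlapping patches, flattening each,
        # and applying one shared linear projection.
        self.proj = nn.Conv2d(in_channels, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        x = self.proj(x)                      # (B, embed_dim, H/16, W/16)
        return x.flatten(2).transpose(1, 2)   # (B, num_patches, embed_dim)


patches = PatchEmbedding()(torch.randn(1, 3, 224, 224))
print(tuple(patches.shape))  # (1, 196, 768): 224/16 = 14, and 14*14 = 196 patches
```

The resulting token sequence, plus a learnable class token and position embeddings, is what the Transformer encoder consumes.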
You can access the paper here: [ViT Paper](https://arxiv.org/abs/2010.11929). It provides in-depth information about the ViT architecture, its applications, and experimental results.
The official implementation by Google Research is available at the [ViT GitHub Repository](https://github.com/google-research/vision_transformer), which contains the source code, pre-trained models, and related resources.
If you use this code or replicate the results, please consider citing the original paper:
@article{dosovitskiy2020vit,
  title={An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale},
  author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and Uszkoreit, Jakob and Houlsby, Neil},
  journal={ICLR},
  year={2021}
}

Contributions to this replication project are welcome. Whether you have suggestions, improvements, or new findings, please feel free to open issues and pull requests. Collaborative effort will help achieve a faithful replication.
This project is licensed under the MIT License - see the LICENSE file for details.