After merging #80, further optimizations can possibly be applied:
- Evaluate whether the attn_mask (which masks out padded inputs and the cls token) is necessary for training
- Evaluate whether we can change the feed-forward dimension back to 512 (as in torch.v1)
- Try to implement `torch.compile` for deployment (probably not working due to variable input shapes) and for preprocessing