Some problems when using RICE-ViT #8

@zhangym0213

Hello @anxiangsir 🤗

I'm Yiming, a student at NTU. Recently I've been working with RICE-ViT and trying to reproduce the baseline built on Qwen2.5-7B-Instruct. I ran into a couple of questions and would really appreciate your help:

About reproducing ViT-L-14-336px results

I used rice-vit-large-patch14-560 and modified crop_size and shortest_edge in preprocessor_config to 336, attempting to match the ViT-L-14-336px setup. Is this the correct way to reproduce the 336px version? If not, where can I find the checkpoint specifically trained for ViT-L-14-336px?
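
To make the change concrete, here is a minimal sketch of the override described above. It assumes the preprocessor config uses a CLIP-style layout (a `crop_size` dict with `height`/`width` and a `size` dict with `shortest_edge`); the `override_resolution` helper and the `example_config` dict are hypothetical stand-ins, not the actual contents of the released file.

```python
import copy


def override_resolution(config: dict, resolution: int) -> dict:
    """Return a copy of a preprocessor config with crop_size and
    shortest_edge both set to `resolution` (hypothetical helper)."""
    updated = copy.deepcopy(config)  # leave the original config untouched
    updated["crop_size"] = {"height": resolution, "width": resolution}
    updated["size"] = {"shortest_edge": resolution}
    return updated


# Hypothetical stand-in for the contents of the preprocessor config;
# the real file may contain different or additional keys.
example_config = {
    "crop_size": {"height": 560, "width": 560},
    "size": {"shortest_edge": 560},
    "do_center_crop": True,
}

config_336 = override_resolution(example_config, 336)
```

This only changes the resolution of the images fed to the model; whether that is equivalent to the ViT-L-14-336px training setup is exactly the open question.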

Which MLCDVisionModel to use

I noticed that there are two versions of MLCDVisionModel.

For RICE-ViT, I used the version from Transformers.
Is this the correct choice?

Thanks a lot for your time! 🙏
