Skip to content
This repository was archived by the owner on Oct 7, 2022. It is now read-only.
This repository was archived by the owner on Oct 7, 2022. It is now read-only.

复现Roberta-Large和ELECTRA-Large的问题 #46

@yiyaxiaozhi

Description

@yiyaxiaozhi

我使用的环境是
pytorch 1.4.0
transformers 2.8.0
参照着文档https://github.com/thunlp/OpenMatch/blob/master/docs/experiments-msmarco.md 中的训练命令

CUDA_VISIBLE_DEVICES=0 \
python train.py\
        -task ranking \
        -model bert \
        -train ./data/train.jsonl \
        -max_input 3000000 \
        -save ./checkpoints/electra_large.bin \
        -dev queries=./data/queries.dev.small.tsv,docs=./data/collection.tsv,qrels=./data/qrels.dev.small.tsv,trec=./data/run.msmarco-passage.dev.small.100.trec \
        -qrels ./data/qrels.dev.small.tsv \
        -vocab google/electra-large-discriminator \
        -pretrain google/electra-large-discriminator \
        -res ./results/electra_large.trec \
        -metric mrr_cut_10 \
        -max_query_len 32 \
        -max_doc_len 256 \
        -epoch 1 \
        -batch_size 2 \
        -lr 5e-6 \
        -eval_every 10000

训练到global step 约18w local step约72w的时候,训练的验证MRR就会从0.33一直往下降,且整个训练过程结束后,最高的MRR只到了0.336.是什么地方遗漏了会导致这样的问题呢?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions