PyTorch implementation of Chaudhary et al. 2020's TopicBERT
Install conda if you have not already done so, then run

```sh
conda env create -f environment.yml
```

This will create a Python environment that strictly adheres to the versioning indicated in the project proposal. It is intended to closely mirror Google Colab.
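Once created, activate the environment before running anything else. The name below is a placeholder assumption; use whatever name `environment.yml` declares (conda also prints it after creation):

```sh
# "topicbert" is a placeholder; substitute the environment name defined in environment.yml
conda activate topicbert
```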
Then train the model via `main.py`. There are many options that can be set; run `python main.py -h` to see them all.
One particularly helpful option is `-s PATH` or `--save PATH`, which saves the given options as a JSON file that can easily be reused with `--load PATH`.
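For example, assuming the keys in the sample config below correspond to command-line flags of the same names (an assumption for illustration; `python main.py -h` is the authoritative list), a typical workflow might look like:

```sh
# Show all available options
python main.py -h

# Train, saving the options used for this run to config.json
python main.py --dataset reuters8 --batch_size 16 --num_epochs 2 --save config.json

# Later, rerun with exactly the same options
python main.py --load config.json
```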
Sample `config.json`:

```jsonc
{
  "dataset": "reuters8",
  "label_path": ".../labels.txt",
  "train_dataset_path": ".../training.tsv",
  "val_dataset_path": ".../validation.tsv",
  "test_dataset_path": ".../test.tsv",
  "num_workers": 8,
  "batch_size": 16,
  "warmup_steps": 10,
  "lr": 2e-05,
  "alpha": 0.9,
  "num_epochs": 2,
  "clip": 1.0,
  "seed": 42,
  "device": "cuda",
  "val_freq": 0.0,
  "test_freq": 0.0,
  "disable_tensorboard": false,
  "tensorboard_dir": "runs/topicbert-512",
  // directory where checkpoints should be stored
  "resume": ".../checkpoints/",
  // whether to look for a checkpoint in the above directory or just save a new one there
  "save_checkpoint_only": true,
  "verbose": true,
  "silent": false,
  "load": null,
  "save": "config.json"
}
```

Alternatively, open `experiment.ipynb` in Google Colab.

Milestones:
- Have working BERT on some dataset (SST-2)
  - Completed on 4/8/21, Liam
- Reuters8 Dataset & DataLoader set up
  - Dataset & DataLoader done on 4/9/21, Liam
- BERT doing standalone prediction on Reuters8
  - Done: achieves 99.5% train and 98.0% val accuracy when run on Google Colab, 4/10/21, Liam
- Set up NVDM topic model on some dataset
- NVDM working on Reuters8
  - Done: error behaves as expected when training, needs further analysis, 4/18/21, Liam
- Create joint model (TopicBERT)
  - Coding complete, 4/19/21, Liam
- Achieve near baselines with TopicBERT
  - We achieve a 0.96 F1 score on Reuters8 with TopicBERT-512, marginally outperforming the original paper. See the differences section for potential factors.
  - Done, 4/19/21, Liam
- Move from Jupyter to Python modules
  - All "modules" converted, 4/25/21, Liam.
  - `training` package and `main.py` complete, 4/26/21, Liam.
- Measure performance baselines
  - All baselines finalized, 5/3/21, Liam.
  - Happy to report that the model has the expected performance (runtime & accuracy) characteristics!
Non-modification Extensions Pursued:
- Pre-train VAE.
  - Implemented HR-VAE as a model compatible with TopicBERT. The TopicBERT main script can now pre-train an HR-VAE model on a dataset. 5/8/21, Liam.
More Extension Ideas:
- Test new datasets in topic classification
- Test datasets in a different domain (e.g. NLI, GLUE)
Differences from the original implementation:

This section maintains a (non-definitive) list of differences between the original implementation and this repository's code.
- `F_MIN` set to `10` on the Reuters8 dataset yields a vocab size of `K = 4832` rather than the `K = 4813` reported in the original paper, despite following the same text-cleaning guidelines. We assume this will not significantly affect results.
- `F_MIN` set to `100` on the IMDB dataset yields a vocab size of `K = 7358` rather than the `K = 6823` reported in the original paper, despite following the same text-cleaning guidelines. We assume this will not significantly affect results.
- We use a 1k validation set for IMDB (24k train), whereas the original authors used a 5k validation set.
- The original authors use `bert-base-cased`. As all data is lowercased across datasets in the original experiments, we change this to `bert-base-uncased`.
- Labels are encoded one-hot. We use `torch.max(...)[1]` to extract prediction & label indices. These indices can be converted back and forth with label strings via `dataset.label_mapping[index]` and `dataset.label_mapping[label_str]`.
- The original paper uses `tanh` activation for the multilayer perceptron in NVDM. However, the authors' TensorFlow implementation uses `sigmoid`. We use `GELU`, as the NVDM paper (Miao et al. 2016) does as well.
- TopicBERT as described in the paper has a projection layer consisting of a single matrix $\mathbf{P} \in \mathbb{R}^{\hat{H} \times H_B}$. We add `GELU` activation after $\mathbf{P}$. The original authors' TensorFlow implementation uses a `tf.keras.layers.Dense` layer, which adds a bias vector and `GELU` activation after $\mathbf{P}$; see the sketch after this list.
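To make the last point concrete, here is a minimal PyTorch sketch of the projection as written in the paper versus as implemented here (a `Linear` layer plus bias followed by `GELU`, matching the behavior of the authors' `tf.keras.layers.Dense`). The dimension values are placeholders, and this is an illustration rather than the repository's exact code.

```python
import torch
import torch.nn as nn

# Placeholder dimensions for illustration only; the real values depend on the
# model configuration (H_HAT: fused representation size, H_B: BERT hidden size).
H_HAT, H_B = 868, 768

# Projection as written in the paper: a single matrix P, no bias, no activation.
projection_paper = nn.Linear(H_HAT, H_B, bias=False)

# Projection as implemented in this repository (and effectively by the authors'
# tf.keras.layers.Dense): P plus a bias vector, followed by GELU activation.
projection_ours = nn.Sequential(
    nn.Linear(H_HAT, H_B, bias=True),
    nn.GELU(),
)

x = torch.randn(16, H_HAT)       # a batch of 16 fused document representations
print(projection_ours(x).shape)  # torch.Size([16, 768])
```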