- [ ] Implement `compute_recall`
  - Currently this involves a separate forward pass of the model. We should be able to combine this with `evaluation_loss`
  - `model.generate` isn't working atm
- [x] flash-attention implementation
- [ ] Improve logging format: `log_predictions`
  - Can we make this interactive via [gradio](https://www.gradio.app/)?
- [ ] Plot learning rate
- [ ] Early Stopping
- [ ] two 'yes' tokens
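
One possible way to fold `compute_recall` into `evaluation_loss` is to derive both numbers from the same logits, so that no second forward pass or `model.generate` call is needed. The sketch below is only an illustration in plain Python: the function name `loss_and_recall`, its arguments, and the assumption that each example has a single answer position with one target token id are hypothetical and not taken from this repo.

```python
import math


def loss_and_recall(logits, targets, positive_id):
    """Return (mean cross-entropy, recall for positive_id) from one set of logits.

    Hypothetical helper: `logits` holds one row of vocabulary scores per example
    (at the answer position), `targets` holds the gold token id per example.
    """
    total_loss = 0.0
    true_pos = 0
    actual_pos = 0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp for the softmax normaliser.
        peak = max(row)
        log_norm = peak + math.log(sum(math.exp(x - peak) for x in row))
        total_loss += log_norm - row[target]
        # Greedy prediction from the same logits (argmax over the row).
        predicted = max(range(len(row)), key=row.__getitem__)
        if target == positive_id:
            actual_pos += 1
            if predicted == positive_id:
                true_pos += 1
    mean_loss = total_loss / len(targets)
    # Recall is undefined when the batch contains no positive examples.
    recall = true_pos / actual_pos if actual_pos else float("nan")
    return mean_loss, recall


# Toy batch: two-token vocabulary, token id 1 plays the role of the positive answer.
loss, recall = loss_and_recall(
    logits=[[2.0, 0.0], [0.0, 1.0], [3.0, 1.0]],
    targets=[0, 1, 1],
    positive_id=1,
)
print(f"loss={loss:.4f} recall={recall:.2f}")
```

In a real evaluation loop the rows would come from the logits already computed for the loss, and the positive/negative counts would be accumulated across batches before dividing.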