Conversation
Awesome! I'll take a look at this today or tomorrow.

@etredal While the cosine similarity is high, there seems to be a mode-collapse issue with the replacement model. I'm going to add the attention mask, set …
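As a rough sketch of the attention-mask fix, assuming the standard Hugging Face forward pass (the repo's actual training loop may differ):

```python
# A minimal sketch of passing the attention mask through the forward pass;
# model/tokenizer loading here is illustrative, not the repo's actual code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilgpt2")
model = AutoModelForCausalLM.from_pretrained("distilgpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default

batch = tokenizer(
    ["a short example", "a somewhat longer example sentence"],
    padding=True,
    return_tensors="pt",
)

# Without attention_mask, padded positions bleed into the hidden states
# that the replacement model is trained against.
with torch.no_grad():
    outputs = model(
        input_ids=batch["input_ids"],
        attention_mask=batch["attention_mask"],
        output_hidden_states=True,
    )
```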
Just did another practice run and got some clues about the repetitive text patterns. The replacement model has over 50% lower variance, giving a flatter output distribution, which could explain the mode collapse. Reconstruction loss is also high (1.27). I'm going to add more epochs, increase the learning rate, and look into other tweaks to reduce reconstruction loss and preserve variance. I'll keep you updated.
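For reference, a rough sketch of how these three diagnostics (cosine similarity, variance, reconstruction loss) could be computed side by side; `orig_acts` and `repl_acts` are assumed to be matching `(tokens, features)` activation tensors from the original and replacement models:

```python
import torch
import torch.nn.functional as F

def compare_activations(orig_acts: torch.Tensor, repl_acts: torch.Tensor) -> None:
    """Compare matching (tokens, features) activations from the two models."""
    # High cosine similarity can coexist with collapsed variance:
    # directions match while the spread of values does not.
    cos = F.cosine_similarity(orig_acts, repl_acts, dim=-1).mean()
    var_ratio = repl_acts.var(dim=0).mean() / orig_acts.var(dim=0).mean()
    recon = F.mse_loss(repl_acts, orig_acts)
    print(f"mean cosine similarity:     {cos:.3f}")
    print(f"variance ratio (repl/orig): {var_ratio:.2f}")
    print(f"reconstruction MSE:         {recon:.3f}")
```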

New features

- `num_features` based on the model being used (`num_layers * hidden_size`)
- `total_loss_per_feature` to make a fairer comparison between models of different sizes (see the sketch below)

Also, I ran into some issues with the Poetry config file. I had to change some of the syntax and version constraints to get it working on my Linux machine.
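For concreteness, a sketch of how these two quantities could be derived from a model config; the helper names here are illustrative, not necessarily the repo's actual API:

```python
from transformers import AutoConfig

def num_features(model_name: str) -> int:
    """Feature count scales with model size: num_layers * hidden_size."""
    cfg = AutoConfig.from_pretrained(model_name)
    # GPT-2-family configs name these fields n_layer / n_embd.
    return cfg.n_layer * cfg.n_embd

def total_loss_per_feature(total_loss: float, model_name: str) -> float:
    """Normalize by feature count so differently sized models compare fairly."""
    return total_loss / num_features(model_name)

print(num_features("distilgpt2"))  # 6 layers * 768 hidden -> 4608
```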
I've also attached the data from running DistilGPT2 below. Some of the metrics, like sparsity loss, aren't as useful, so I'm going to rerun it with the added loss metric mentioned above.
run_distilgpt2.zip
Let me know if this looks good and what other verifications of the CLT you had in mind.