It seems that cosine similarity was not used during training. Below are some of the logits I obtained while reproducing the training run. They are significantly greater than 1, whereas cosine similarity is bounded to the range [-1, 1].
logits tensor([[15.2290, 7.5998, 9.8649, ..., 7.7345, 9.5413, 16.3422],
[ 7.4700, 15.4301, 6.9831, ..., 11.5478, 6.7609, 16.8512],
[10.8048, 9.7472, 11.9755, ..., 7.9214, 9.1346, 14.9441],
...,
[ 9.1644, 8.0641, 8.9382, ..., 9.6266, 11.3413, 16.5944],
[ 7.5931, 10.8228, 6.1488, ..., 15.4235, 6.2452, 16.2435],
[11.1124, 6.7673, 9.4720, ..., 7.7954, 14.2624, 16.0656]],
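To make the check concrete, here is a minimal, dependency-free sketch of how such logits can arise. It is not this repository's code: the helper names (`l2_normalize`, `similarity_logits`), the embedding sizes, and the scale value of 20.0 are all hypothetical. It only illustrates that cosine similarities stay within [-1, 1], so values around 15 imply either unnormalized embeddings or a multiplicative scale (temperature) applied after the cosine step. Which of the two applies here is something the maintainers would need to confirm.

```python
import math
import random


def l2_normalize(v):
    """Return v scaled to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def similarity_logits(a_rows, b_rows, normalize=True, scale=1.0):
    """Pairwise dot products between rows of a_rows and b_rows.

    With normalize=True and scale=1.0 this is plain cosine similarity.
    `scale` is a hypothetical temperature-style multiplier.
    """
    if normalize:
        a_rows = [l2_normalize(r) for r in a_rows]
        b_rows = [l2_normalize(r) for r in b_rows]
    return [
        [scale * sum(x * y for x, y in zip(a, b)) for b in b_rows]
        for a in a_rows
    ]


# Random stand-in embeddings (assumption: 4 pairs, 64 dimensions).
random.seed(0)
emb_a = [[random.gauss(0, 1) for _ in range(64)] for _ in range(4)]
emb_b = [[random.gauss(0, 1) for _ in range(64)] for _ in range(4)]

# Pure cosine similarity: every entry lies in [-1, 1].
cos = similarity_logits(emb_a, emb_b)
max_abs_cos = max(abs(v) for row in cos for v in row)

# Cosine similarity times a scale: entries can exceed 1 in magnitude.
scaled = similarity_logits(emb_a, emb_b, scale=20.0)

# No normalization at all: entries are unbounded dot products.
raw = similarity_logits(emb_a, emb_b, normalize=False)

print("max |cosine|:", max_abs_cos)
print("max |scaled|:", max(abs(v) for row in scaled for v in row))
print("max |raw dot|:", max(abs(v) for row in raw for v in row))
```

If the training code divides the logits by their scale and the result falls within [-1, 1], cosine similarity is being used with a temperature. If not, the embeddings are probably not normalized before the dot product.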