Benchmark GPT-tfjs #659
Conversation
thanks, very interesting!

@tharvik I'd be curious to hear your opinion on a few things:
tharvik left a comment:

superbe! that's very nice to have metrics on what we're doing, thanks!
- Where do you think we should report the benchmark? I was thinking of reporting them all in this PR and linking to it where relevant (e.g. in gpt/config.ts or the GPT class docstring)
the nicest thing would be to be able to generate such metrics via the CLI, with the example output of the command being what you have here (not the tables I mean, but the same content).
- Benchmarking performance requires modifying the gpt source code to keep track of memory. Do you think it's worth keeping around, or should we leave the benchmark on this branch and not merge it?
not merging it means that it'll slowly drift out of date. I think adding the memory usage to the EpochLogs as you did is the way to go; see my comments related to it.
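As a rough illustration of that idea, here is a minimal sketch of how peak memory could be tracked and surfaced alongside the epoch logs. The EpochLogs shape and field names below are hypothetical (the real class in discojs may differ); only tf.memory() is an actual tf.js API.

```ts
import * as tf from "@tensorflow/tfjs";

// Hypothetical shape: the real EpochLogs in discojs may carry different fields.
interface EpochLogs {
  epoch: number;
  trainingLoss: number;
  peakMemoryGB: number; // proposed addition discussed above
}

// Tracks the highest number of bytes allocated by the tf.js backend.
class MemoryTracker {
  private peakBytes = 0;

  // Call this after memory-heavy steps (e.g. attention, gradient computation).
  sample(): void {
    this.peakBytes = Math.max(this.peakBytes, tf.memory().numBytes);
  }

  peakGB(): number {
    return this.peakBytes / 1024 ** 3;
  }
}
```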
The memory values need to be slightly updated when #807 is merged

up or down? ;)

A very superficial benchmark showed a 10-20% decrease in memory usage!
Training
Benchmark on a 2022 MacBook Air M2 with 16GB of RAM.
To reproduce, check out 58f018f and run, for example:
npm -w cli run benchmark_gpt -- --contextLength 128 --batchSize 8
Time per token is obtained by measuring the time of 10 training update iterations and dividing by (batch size * context length).
Memory values are the max memory allocated between the attention mechanism and the memory after computing the gradients. So far, the attention mechanism always had higher memory requirements. The actual peak memory allocated during training may be different, but tfjs doesn't let us get this information easily.
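For reference, here is a minimal sketch (not the actual benchmark_gpt code) of how such numbers can be obtained with tf.js. The runTrainingStep callback is a placeholder for one training update of the real model; performance.now() and tf.memory() are the only actual APIs used.

```ts
import * as tf from "@tensorflow/tfjs-node";

// Illustrative only: `runTrainingStep` stands in for one forward/backward/update
// pass of the GPT-tfjs training loop, which is not shown here.
async function benchmarkTraining(
  runTrainingStep: () => Promise<void>,
  batchSize: number,
  contextLength: number,
  iterations = 10,
): Promise<{ msPerToken: number; approxPeakGB: number }> {
  let peakBytes = 0;
  const start = performance.now();
  for (let i = 0; i < iterations; i++) {
    await runTrainingStep();
    // tf.memory().numBytes reports the bytes currently allocated by the backend;
    // sampling it between steps only approximates the true peak, as noted above.
    peakBytes = Math.max(peakBytes, tf.memory().numBytes);
  }
  const elapsedMs = performance.now() - start;
  // Time per token = total time / (iterations * batch size * context length).
  const msPerToken = elapsedMs / (iterations * batchSize * contextLength);
  return { msPerToken, approxPeakGB: peakBytes / 1024 ** 3 };
}
```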
I leave cells empty where I deemed the benchmark too slow to perform. If needed, missing values can be extrapolated.

gpt-nano:

| gpt-nano | batch_size=8 | batch_size=16 | batch_size=32 | batch_size=64 |
| --- | --- | --- | --- | --- |
| context_length=128 | 0.33 GB | 0.56 GB | 1.12 GB | 2.18 GB |
| context_length=256 | 0.64 GB | 1.22 GB | 2.36 GB | 4.66 GB |
| context_length=512 | 1.42 GB | 2.75 GB | 5.42 GB | |
| context_length=1024 | 3.56 GB | 6.98 GB | | |
| context_length=2048 | 10.2 GB | | | |

gpt-micro:

| gpt-micro | batch_size=8 | batch_size=16 | batch_size=32 |
| --- | --- | --- | --- |
| context_length=128 | 0.6 GB | 1 GB | 1.86 GB |
| context_length=256 | 1.1 GB | 2 GB | 3.8 GB |
| context_length=512 | 2.3 GB | 4.4 GB | |
| context_length=1024 | 5.8 GB | | |

gpt-mini:

| gpt-mini | batch_size=8 | batch_size=16 |
| --- | --- | --- |
| context_length=128 | 1 GB | 1.75 GB |
| context_length=256 | 1.9 GB | 3.5 GB |

gpt2:

| gpt2 | batch_size=8 |
| --- | --- |
| context_length=128 | 7.7 GB |
| context_length=256 | 12.7 GB |
Comparisons
Using the Python nanoGPT benchmark script on the same machine, I get the following comparisons between Python and JS:
| gpt-nano | gpt-tfjs | python (nanoGPT repo) |
| --- | --- | --- |
| batch size=8 and context_length=128 | | |
| batch size=32 and context_length=512 | | |

Inference
Run
npm -w cli run benchmark_gpt -- --inference --modelPath <path to trained model>
For gpt-nano trained with context length 128, inference time averages between 6 and 8 ms/token. WebGPT reports 3 ms/token at 5M parameters, which is between gpt-nano (2.5M) and gpt-micro (7.2M). They also managed to scale up to 1.5B parameters on an M1 Mac with WebGPU.
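Similarly, a minimal sketch (not the actual CLI code) of how ms/token can be measured for generation; generateNextToken is a placeholder for one forward pass of the trained model.

```ts
// Illustrative only: `generateNextToken` stands in for sampling one token from
// the trained model; the actual generation loop in GPT-tfjs is not shown here.
async function benchmarkInference(
  generateNextToken: () => Promise<void>,
  tokensToGenerate = 100,
): Promise<number> {
  const start = performance.now();
  for (let i = 0; i < tokensToGenerate; i++) {
    await generateNextToken();
  }
  // Average wall-clock time per generated token, in milliseconds.
  return (performance.now() - start) / tokensToGenerate;
}
```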