@ranlu ranlu commented Oct 15, 2025

Run multiple iterations before backprop to reduce traffic between distributed workers. Right now the code does not work with a size-averaged loss.
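
For illustration only, here is a minimal pure-Python sketch of why a size-averaged loss needs extra care when gradients are accumulated over several iterations. It is not the code in this pull request: the toy model `w * x`, the squared-error loss, and every name below (`grad_sum`, `grad_mean`, `chunks`, `num_iters`) are assumptions made for the example. With a summed loss, adding the per-iteration gradients reproduces the full-batch gradient. With a mean loss, the accumulated value is too large by a factor of the number of iterations (assuming equal-sized micro-batches) and has to be rescaled.

```python
# Hypothetical illustration; none of these names come from the pull request.

def grad_sum(w, xs, ys):
    """Gradient w.r.t. w of the summed squared error sum((w*x - y)**2)."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))

def grad_mean(w, xs, ys):
    """Gradient w.r.t. w of the size-averaged (mean) squared error."""
    return grad_sum(w, xs, ys) / len(xs)

def chunks(seq, size):
    """Split seq into consecutive pieces of length size."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Two iterations of two samples each, accumulated before a single update.
x_parts, y_parts = chunks(xs, 2), chunks(ys, 2)
num_iters = len(x_parts)

# Summed loss: accumulated gradients equal the full-batch gradient.
acc_sum = sum(grad_sum(w, xp, yp) for xp, yp in zip(x_parts, y_parts))

# Mean loss: accumulated gradients are num_iters times too large.
acc_mean = sum(grad_mean(w, xp, yp) for xp, yp in zip(x_parts, y_parts))
acc_mean_rescaled = acc_mean / num_iters

full_sum = grad_sum(w, xs, ys)
full_mean = grad_mean(w, xs, ys)
```

The rescaling by `num_iters` is only exact when every iteration sees the same number of samples. With uneven micro-batches, each contribution would have to be weighted by its sample count instead.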
