Pull requests: meta-pytorch/torchft
Super tiny fix typo
CLA Signed
This label is managed by the Meta Open Source bot.
#321
opened Mar 27, 2026 by
fzyzcjy
Fix NCCL Error 7 by removing eager_connect_single_device
CLA Signed
#320
opened Mar 24, 2026 by
d4l3k
Member
Cleanup deprecated usages of package monarch.utils in torchft
CLA Signed
fb-exported
meta-exported
#315
opened Feb 24, 2026 by
dulinriley
fbcode/torchft/torchft
CLA Signed
fb-exported
meta-exported
#313
opened Feb 3, 2026 by
facebook-github-bot
Contributor
fbcode/torchft/torchft
CLA Signed
fb-exported
meta-exported
#311
opened Jan 31, 2026 by
facebook-github-bot
Contributor
[WIP] Refactoring test cases and dependent files to be device agnostic instead of using cuda specific references
CLA Signed
#308
opened Jan 5, 2026 by
AnantGulati
Draft
Add device-agnostic process group wrappers for distributed setup initialization
CLA Signed
#307
opened Dec 31, 2025 by
AnantGulati
Configure expected_replicas to avoid running tasks with unexpected re…
CLA Signed
#303
opened Dec 19, 2025 by
zhengchenyu
Contributor
Use fully qualified domain name instead of hostname.
CLA Signed
#296
opened Dec 5, 2025 by
zhengchenyu
Contributor
Fix inconsistent return types.
CLA Signed
#294
opened Nov 27, 2025 by
zhengchenyu
Contributor
Keep the training data continuous and the total batch size constant regardless of changes in the replica world size.
CLA Signed
#292
opened Nov 18, 2025 by
zhengchenyu
Contributor
integrate torchcomms
CLA Signed
fb-exported
meta-exported
#290
opened Nov 7, 2025 by
tushar00jain
Contributor
Fixing the issue with indentation on the landing page
CLA Signed
#227
opened Jul 9, 2025 by
svekars
Contributor
Add config sharing from Lighthouse with UI support (#130)
CLA Signed
#202
opened May 24, 2025 by
WarrenZhu050413
Contributor
Draft
Added example training scripts for localsgd, DiLoCo, Live Checkpoint Recovery, and proactive failure detection with DDP (#198)
CLA Signed
#200
opened May 22, 2025 by
WarrenZhu050413
Contributor
ParallelProcessGroup: 200gbps with Gloo -- what if we just run like 20 of them in parallel???
CLA Signed
#199
opened May 21, 2025 by
d4l3k
Member
Added proactive heartbeat timeout failure propagation (#164) (#188)
CLA Signed
#196
opened May 20, 2025 by
WarrenZhu050413
Contributor
Support multiple quorums on a single LighthouseServer using gRPC metadata-based room assignment
CLA Signed
#189
opened May 5, 2025 by
MattKotzbauer
Contributor
Disable async quorum for the first quorum sync
CLA Signed