120 commits
4c9fb9e
online training components wip
Feb 22, 2025
6995cb9
wip
Feb 23, 2025
1a4f79a
Merge branch 'main' into online_training
Feb 23, 2025
8ca6d71
wip
Feb 23, 2025
d776eb5
Merge branch 'main' into online_training
Feb 23, 2025
d9f3bf0
wip changes
Feb 28, 2025
bcf05f4
wip
Feb 28, 2025
e915733
more
Mar 5, 2025
5e9dd9a
changes
Mar 10, 2025
9a39fca
changes
Mar 11, 2025
18f609d
changes
Mar 11, 2025
211dc92
optional weight update group since we dont need it for stationary mod…
Mar 11, 2025
e2569a8
changes
Mar 11, 2025
bf901b3
switch gsm8k rewards back to 0 and 1
Mar 12, 2025
500e28a
change batching logic in grpo loss
Mar 13, 2025
62d6c8e
metric formatters added for grpo
Mar 13, 2025
5bb5c06
more changes to math utils and rewards
Mar 18, 2025
d94ce26
allow original text in the example if needed
Mar 19, 2025
3bbbe68
microbatching wip
Mar 24, 2025
0c3f69f
microbatching and validation implemented
Mar 25, 2025
13e49d3
safetensor loading fix
Mar 26, 2025
33318b9
validation during training, reference model offloading, global actor…
Mar 28, 2025
3229a40
online dpo typo and small fixes
Apr 1, 2025
42f6ee5
formatting
Apr 1, 2025
7bd4c0d
_add_ directive removed
Apr 1, 2025
20e3ede
hf tokenizer support added
Apr 2, 2025
b4537ec
default fix
Apr 2, 2025
d8e45ec
Jacklanchantin/online training (#1101)
jacklanchantin Apr 3, 2025
e3996aa
grpo preset added
Apr 5, 2025
cdb4c24
type fix to allow none value
Apr 10, 2025
2cb30f8
flexible sampling params
Apr 17, 2025
163f776
logit entropy track
Apr 18, 2025
caa6541
force v0 engine with newer vllm versions
Apr 18, 2025
d1db955
make validation unit separately so that total examples counter is not…
Apr 24, 2025
6b3284e
Record rollouts in logs, update num examples, separate train/valid in…
jacklanchantin Apr 29, 2025
dd4ed22
validate before training (#1153)
jacklanchantin Apr 29, 2025
5fb86ab
Add athene rm (#1154)
jacklanchantin Apr 30, 2025
dd03a36
math-verify verifier added
Apr 30, 2025
cf5703b
loss zeroer log, use prompt batch size in loss normalizer
May 1, 2025
128dd3d
GRPO len norm support
May 1, 2025
0610d4e
Add validation sampling params & force sync when starting train unit …
jacklanchantin May 1, 2025
6ec348d
check if self._step_nr exists when syncing (#1160)
jacklanchantin May 2, 2025
aef5df4
group DPO added
May 6, 2025
edc8688
different normalization
May 6, 2025
fd2b939
normalizer fix
May 6, 2025
fba975f
only force sync if sync_vllm_model_every_n_steps>0
jacklanchantin May 6, 2025
1b50aad
Merge branch 'online_training' of github.com:facebookresearch/fairseq…
jacklanchantin May 6, 2025
33d9e00
only force sync if sync_vllm_model_every_n_steps>0 (#1169)
jacklanchantin May 8, 2025
869aa90
use simple grad acc
May 8, 2025
9744bb6
nll len norm flag
May 9, 2025
182ab2c
online dpo NLL len norm
May 9, 2025
3292773
typo on NLL len norm
May 9, 2025
45c8f8d
Merge branch 'online_training' of github.com:facebookresearch/fairseq…
jacklanchantin May 21, 2025
effbf55
Remove Skywork-RM (#1180)
jacklanchantin May 21, 2025
eaef36f
Merge branch 'online_training' of github.com:facebookresearch/fairseq…
jacklanchantin May 21, 2025
f86dc67
removing old math verification logic (#1181)
uralik May 21, 2025
b03a60d
Merge branch 'online_training' of github.com:facebookresearch/fairseq…
jacklanchantin May 22, 2025
04c4064
replicas support added to vllm models (#1184)
uralik May 23, 2025
1b77139
stateful rollout bag in online recipe, microbatching in GRPO (#1185)
uralik May 27, 2025
93be66b
Jacklanchantin/normalized rewards (#1161)
jacklanchantin May 28, 2025
d92feb2
Merge branch 'online_training' of github.com:facebookresearch/fairseq…
jacklanchantin May 28, 2025
aa7e775
Initial changes for pointwise GRM. (#1182)
swarnaHub May 28, 2025
5007c2c
Merge branch 'online_training' of github.com:facebookresearch/fairseq…
jacklanchantin May 28, 2025
ac40152
fix grad acc bug (#1191)
jacklanchantin May 29, 2025
8e8043c
Merge branch 'online_training' of github.com:facebookresearch/fairseq…
jacklanchantin May 30, 2025
9b0036e
average and length normalized scores over multiple judgments
Jun 2, 2025
9a85388
Merge branch 'online_training' of github.com:facebookresearch/fairseq…
Jun 2, 2025
e986ade
Removing unused import
Jun 2, 2025
57abfbd
Remote HuggingFace (#1193)
jacklanchantin Jun 4, 2025
dda2dfd
Merge branch 'online_training' of github.com:facebookresearch/fairseq…
jacklanchantin Jun 4, 2025
8f1c96b
Remove reward grpo batching (#1196)
jacklanchantin Jun 4, 2025
0d699e4
Merge branch 'online_training' of github.com:facebookresearch/fairseq…
jacklanchantin Jun 4, 2025
f78110b
fix export keymap by reusing and reversing existing importing key map
Jun 8, 2025
d588b52
typing style
Jun 8, 2025
457eb36
sort
Jun 8, 2025
2a8f7f2
typing fix
Jun 8, 2025
c075248
hg config export
Jun 8, 2025
666d9e5
mypy
Jun 9, 2025
9e235b9
styling
Jun 9, 2025
1a7187a
mypy
Jun 9, 2025
845a8f5
mypy
Jun 9, 2025
581c972
mypy
Jun 9, 2025
f7e2297
mypy
Jun 9, 2025
7ea63c3
mypy
Jun 9, 2025
76be659
raise keyerror if keep_jsonl_keys not in data file (#1201)
jacklanchantin Jun 9, 2025
644d58e
mypy and refactor
Jun 9, 2025
4ee2a5b
mypy
Jun 9, 2025
a3c9aa4
mypy
Jun 9, 2025
4fc88d4
style
Jun 9, 2025
1d316b8
r1 distilled llama asset added
Jun 11, 2025
72cd66d
Merge branch 'main' into ot_merge
Jun 11, 2025
f0bae87
Merge branch 'kulikov/qwen_export_fix' into ot_merge
Jun 11, 2025
627e3f6
fixing
Jun 25, 2025
9bd4745
Merge branch 'main' into ot_merge
Jun 25, 2025
53f79c2
merge fix
Jun 25, 2025
f3f24a8
merge fix
Jun 25, 2025
598c3d2
refactoring
Jun 27, 2025
4bd9016
dpo refactoring
Jul 1, 2025
8dab75f
missing
Jul 1, 2025
2778dea
missing
Jul 1, 2025
f6a9fe4
remove old files
Jul 1, 2025
85cf9aa
more vllm engine args, qwen arch saver fix
Jul 2, 2025
df267e9
moving to vllm worker extension class rather than separate class, ena…
Jul 3, 2025
a7ffaa5
Support for pairwise judges in online training (#1194)
swarnaHub Jul 7, 2025
1cc84ac
run black formatter (#1225)
jacklanchantin Jul 8, 2025
14155b2
Add General Verifier for Math (#1224)
jacklanchantin Jul 8, 2025
c362f7d
Jacklanchantin/general verifier fix (#1227)
jacklanchantin Jul 16, 2025
cbdc0eb
Merge branch 'main' into ot_merge
Jul 22, 2025
af1e84e
v0.5 breaking change
Jul 22, 2025
69351e3
fix qwen parameter name converter, merge leak
Jul 23, 2025
11a8e5f
Jacklanchantin/fix online dpo rm args (#1233)
jacklanchantin Jul 25, 2025
d806bee
Jacklanchantin/add qwen25 3b instruct (#1234)
jacklanchantin Jul 25, 2025
b8bbbba
converter fetch on recipe level (#1236)
uralik Jul 28, 2025
1799948
restricting sampling in vllm with qwen models to avoid OOVs (#1237)
uralik Jul 29, 2025
dbf3a3e
Merge branch 'main' into online_training
Aug 8, 2025
221c83b
parallel vllm worker init, online dpo ref score fix when bs>1 and rew…
uralik Aug 27, 2025
606459b
Add truncated importance sampling and DrGRPO args (#1394)
jacklanchantin Oct 27, 2025
9c70654
set VLLM_ALLOW_INSECURE_SERIALIZATION=1 for newer vllm versions and a…
lydiadli Oct 29, 2025
34fd1b3
remove unused vllm library (Worker) (#1441)
jacklanchantin Nov 18, 2025
7d68329
do not throttle client-server port (#1452)
uralik Nov 19, 2025
2 changes: 1 addition & 1 deletion src/fairseq2/assets/cards/models/llama.yaml
@@ -168,4 +168,4 @@ model_arch: llama3_1_8b
checkpoint: "hg://deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
tokenizer: "hg://deepseek-ai/DeepSeek-R1-Distill-Llama-8B"
tokenizer_family: llama
use_v2_tokenizer: true
use_v2_tokenizer: true
15 changes: 15 additions & 0 deletions src/fairseq2/assets/cards/models/qwen.yaml
@@ -4,6 +4,21 @@
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

---

name: qwen25_3b_instruct
model_family: qwen
model_arch: qwen25_3b
model_config:
_set_:
max_seq_len: 32768
checkpoint: "hg://qwen/qwen2.5-3b-instruct"
tokenizer: "hg://qwen/qwen2.5-3b-instruct"
tokenizer_family: qwen
use_im_end: true

---

name: qwen25_7b
model_family: qwen
model_arch: qwen25_7b
15 changes: 15 additions & 0 deletions src/fairseq2/cli/_setup.py
@@ -62,11 +62,13 @@
CausalLMLossEvalConfig,
CausalLMTrainConfig,
InstructionFinetuneConfig,
OnlineFinetuneConfig,
POFinetuneConfig,
TextGenerateConfig,
load_clm_loss_evaluator,
load_clm_trainer,
load_instruction_finetuner,
load_online_finetuner,
load_po_finetuner,
load_text_generator,
)
@@ -235,6 +237,19 @@ def _register_clm_cli(cli: Cli) -> None:
help="generate text",
)

# Online Finetune
online_finetune_handler = RecipeCommandHandler(
loader=load_online_finetuner,
config_kls=OnlineFinetuneConfig,
default_preset="llama3_1_instruct",
)

group.add_command(
name="online_finetune",
handler=online_finetune_handler,
help="online-finetune a language model",
)


def _register_convert_cli(cli: Cli) -> None:
group = cli.add_group("convert", help="model conversion utilities")
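The hunk above registers the new `online_finetune` command by pairing a config class, a loader, and a default preset in a `RecipeCommandHandler`. A minimal sketch of that dispatch pattern — the `CommandGroup` and handler below are hypothetical stand-ins for illustration, not the actual fairseq2 API:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class RecipeCommandHandler:
    # Hypothetical minimal analogue of the handler registered above.
    loader: Callable[[Any], Any]  # builds/runs the recipe from a config
    config_kls: type              # expected config type
    default_preset: str           # preset used when none is given


class CommandGroup:
    """Toy command group: maps a command name to its handler."""

    def __init__(self) -> None:
        self._commands: dict[str, RecipeCommandHandler] = {}

    def add_command(
        self, name: str, handler: RecipeCommandHandler, help: str = ""
    ) -> None:
        self._commands[name] = handler

    def run(self, name: str, config: Any) -> Any:
        # Validate the config type before handing off to the loader.
        handler = self._commands[name]
        if not isinstance(config, handler.config_kls):
            raise TypeError(f"expected {handler.config_kls.__name__}")
        return handler.loader(config)
```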
144 changes: 144 additions & 0 deletions src/fairseq2/data/text/tokenizers/huggingface_tokenizer.py
@@ -0,0 +1,144 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the BSD-style license found in the
# LICENSE file in the root directory of this source tree.

from __future__ import annotations

from collections.abc import Sequence
from pathlib import Path
from typing import final

import torch
from torch import Tensor
from transformers import AutoTokenizer
from typing_extensions import override

from fairseq2.data import VocabularyInfo
from fairseq2.data.text.tokenizers import (
TextTokenDecoder,
TextTokenEncoder,
)
from fairseq2.typing import Device


@final
class HuggingfaceTokenizerEncoder(TextTokenEncoder):
"""Represents a Hugging Face tokenizer encoder."""

_tokenizer: AutoTokenizer
_prefix_indices: list[int]
_suffix_indices: list[int]
_prefix_index_tensor: Tensor | None
_suffix_index_tensor: Tensor | None
_device: Device | None
_pin_memory: bool

def __init__(
self,
tokenizer: AutoTokenizer,
*,
prefix_tokens: Sequence[str] | None = None,
suffix_tokens: Sequence[str] | None = None,
device: Device | None = None,
pin_memory: bool = False,
) -> None:
"""
:param tokenizer:
The huggingface :class:`AutoTokenizer` object.
:param prefix_tokens:
The prefix tokens to encode with input text.
:param suffix_tokens:
The suffix tokens to encode with input text.
:param device:
The device on which to construct tensors.
:param pin_memory:
If ``True``, uses pinned memory while constructing tensors.
"""
self._tokenizer = tokenizer

# Prefix
if prefix_tokens:
self._prefix_indices = self._tokenizer.convert_tokens_to_ids(prefix_tokens)

self._prefix_index_tensor = torch.tensor(
self._prefix_indices, dtype=torch.int64, device=device
)
else:
self._prefix_indices = []

self._prefix_index_tensor = None

# Suffix
if suffix_tokens:
self._suffix_indices = self._tokenizer.convert_tokens_to_ids(suffix_tokens)

self._suffix_index_tensor = torch.tensor(
self._suffix_indices, dtype=torch.int64, device=device
)
else:
self._suffix_indices = []

self._suffix_index_tensor = None

self._device = device
self._pin_memory = pin_memory

@override
def __call__(self, text: str) -> Tensor:
# fairseq2 tokenizer adds special tokens on its own
indices = self._tokenizer.encode(text, add_special_tokens=False)

if self._prefix_indices:
indices = self._prefix_indices + indices

if self._suffix_indices:
indices.extend(self._suffix_indices)

return torch.tensor(
indices, dtype=torch.int64, device=self._device, pin_memory=self._pin_memory
)

@override
def encode_as_tokens(self, text: str) -> list[str]:
indices = self(text).tolist()

tokens = self._tokenizer.convert_ids_to_tokens(indices)

return tokens

@property
@override
def prefix_indices(self) -> Tensor | None:
return self._prefix_index_tensor

@property
@override
def suffix_indices(self) -> Tensor | None:
return self._suffix_index_tensor


@final
class HuggingfaceTokenizerDecoder(TextTokenDecoder):
"""Represents a Hugging Face tokenizer decoder."""

_tokenizer: AutoTokenizer

def __init__(self, tokenizer: AutoTokenizer) -> None:
self._tokenizer = tokenizer

@override
def __call__(self, token_indices: Tensor) -> str:
if token_indices.dim() != 1:
raise ValueError(
f"`token_indices` must be one dimensional, but has {token_indices.dim()} dimensions instead."
)

return self._tokenizer.decode(token_indices)

@override
def decode_from_tokens(self, tokens: Sequence[str]) -> str:
indices = self._tokenizer.convert_tokens_to_ids(tokens)

return self._tokenizer.decode(indices)
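The encoder above deliberately calls `encode(..., add_special_tokens=False)` and splices the prefix/suffix indices in itself, so the fairseq2 wrapper — not the underlying tokenizer — controls special tokens. A sketch of that flow; the stub tokenizer is hypothetical and stands in for a real `transformers.AutoTokenizer` so the example needs no model download (the real encoder also wraps the result in a `torch.Tensor`):

```python
class StubTokenizer:
    # Hypothetical stand-in for transformers.AutoTokenizer, used only
    # to illustrate the encode path.
    def __init__(self) -> None:
        self._vocab = {"<s>": 0, "</s>": 1, "hello": 2, "world": 3}

    def encode(self, text: str, add_special_tokens: bool = False) -> list:
        return [self._vocab[tok] for tok in text.split()]

    def convert_tokens_to_ids(self, tokens: list) -> list:
        return [self._vocab[tok] for tok in tokens]


def encode_with_affixes(tokenizer, text, prefix_tokens=None, suffix_tokens=None):
    # Mirrors HuggingfaceTokenizerEncoder.__call__: the wrapper adds the
    # special tokens, so the plain encode must not.
    indices = tokenizer.encode(text, add_special_tokens=False)
    if prefix_tokens:
        indices = tokenizer.convert_tokens_to_ids(prefix_tokens) + indices
    if suffix_tokens:
        indices = indices + tokenizer.convert_tokens_to_ids(suffix_tokens)
    return indices  # the real encoder returns torch.tensor(indices, ...)
```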
21 changes: 21 additions & 0 deletions src/fairseq2/data/text/tokenizers/llama.py
@@ -203,6 +203,27 @@ def vocab_info(self) -> VocabularyInfo:


def load_llama_tokenizer(path: Path, card: AssetCard) -> TextTokenizer:

# First, check whether the card specifies a HuggingFace tokenizer.
try:
use_hf = card.field("use_hf_tokenizer").as_(bool)
except AssetCardFieldNotFoundError:
use_hf = False
except AssetCardError as ex:
raise text_tokenizer_asset_card_error(card.name) from ex

if use_hf:
try:
return LLaMA3TokenizerHuggingFace(path)
except ValueError as ex:
raise TextTokenizerLoadError(
card.name, f"The '{card.name}' asset card does not contain a valid text tokenizer configuration of the '{LLAMA_TOKENIZER_FAMILY}' family. See the nested exception for details." # fmt: skip
) from ex
except RuntimeError as ex:
raise TextTokenizerLoadError(
card.name, f"The '{card.name}' text tokenizer cannot be loaded. See the nested exception for details." # fmt: skip
) from ex

try:
use_v2 = card.field("use_v2_tokenizer").as_(bool)
except AssetCardFieldNotFoundError:
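The `use_hf_tokenizer` lookup in the hunk above treats a missing card field as `False` while letting genuine card errors propagate. A sketch of that fallback pattern, with a toy card class standing in for fairseq2's `AssetCard`:

```python
class AssetCardFieldNotFoundError(Exception):
    # Stand-in for fairseq2's exception of the same name.
    pass


class ToyCard:
    # Hypothetical minimal card: just a dict of fields.
    def __init__(self, fields: dict) -> None:
        self._fields = fields

    def field(self, name: str):
        if name not in self._fields:
            raise AssetCardFieldNotFoundError(name)
        return self._fields[name]


def wants_hf_tokenizer(card: ToyCard) -> bool:
    # Mirrors load_llama_tokenizer: an absent field defaults to False,
    # so existing cards keep their current behavior.
    try:
        return bool(card.field("use_hf_tokenizer"))
    except AssetCardFieldNotFoundError:
        return False
```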