
Placeholder tokens update #44467

Open
itazap wants to merge 48 commits into main from placeholder_tokens_update



itazap (Collaborator) commented Mar 5, 2026

Replace placeholder tokens with the tokens specified in added_tokens_decoder.

If added_tokens_decoder assigns tokens to specific token IDs, those tokens must overwrite the pieces at the same IDs in the SentencePiece (spm) model.

Example: [UNUSED_TOKEN_146] -> <|im_start|>

See InternLM2 (the internlm2_5-7b-chat tokenizer_config.json): https://huggingface.co/internlm/internlm2_5-7b-chat/blob/main/tokenizer_config.json
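A minimal sketch of the overwrite described above, assuming the spm vocabulary is available as a plain Python list where the list index is the token ID. The helper name apply_added_tokens_decoder is hypothetical and is not the code in this PR. The token IDs and pieces below are made up for illustration; only the added_tokens_decoder shape (string ID keys mapping to objects with a "content" field) follows the tokenizer_config.json format.

```python
from typing import Any


def apply_added_tokens_decoder(
    pieces: list[str], added_tokens_decoder: dict[str, dict[str, Any]]
) -> list[str]:
    """Hypothetical helper: return a copy of `pieces` (index = token id) in which
    each id listed in added_tokens_decoder is overwritten by its "content"."""
    updated = list(pieces)  # leave the original vocab untouched
    for token_id_str, token_info in added_tokens_decoder.items():
        token_id = int(token_id_str)  # JSON object keys are strings
        if not 0 <= token_id < len(updated):
            # Assumption: ids outside the spm vocab are not placeholders to overwrite.
            raise IndexError(
                f"token id {token_id} is outside the spm vocab (size {len(updated)})"
            )
        updated[token_id] = token_info["content"]
    return updated


# Toy vocab: ids and pieces are invented for this sketch.
pieces = ["<unk>", "<s>", "</s>", "[UNUSED_TOKEN_146]", "▁hello"]
added_tokens_decoder = {"3": {"content": "<|im_start|>", "special": True}}

print(apply_added_tokens_decoder(pieces, added_tokens_decoder))
# ['<unk>', '<s>', '</s>', '<|im_start|>', '▁hello']
```

Raising on out-of-range IDs is a choice made for this sketch only. A real converter might instead append such tokens after the base vocabulary, since those entries are genuinely new tokens rather than placeholders.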

