fix: eliminate hard-coded vocab definitions to make the Whisper model compatible with custom vocabularies and embedding layer lengths #3555
base: master
Conversation
```diff
 int num_languages() const {
-    return n_vocab - 51765 - (is_multilingual() ? 1 : 0);
+    return n_vocab - token_to_id.size() - 1509;
 }
```
Modified num_languages() Function
The num_languages() function has been redesigned to dynamically calculate the number of supported languages, replacing the original hardcoded logic that relied on fixed vocabulary size thresholds (e.g., 51865 for multilingual models).
Rationale (Aligned with OpenAI Whisper's Tokenizer Design)
Per OpenAI’s official Whisper tokenizer implementation (tokenizer.py#L340-L351):
- Language-specific special tokens (e.g., `<|ja|>`, `<|en|>`) are arranged consecutively between the `<|startoftranscript|>` and `<|translate|>` tokens in the vocabulary.
- The total number of non-language special tokens is fixed at 1509 (1501 timestamp tokens + 8 core functional tokens: `<|endoftext|>`, `<|startoftranscript|>`, `<|translate|>`, `<|transcribe|>`, `<|startoflm|>`, `<|startofprev|>`, `<|nospeech|>`, `<|notimestamps|>`).
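As a sanity check, the formula can be verified against the official models (a sketch; the regular-vocabulary sizes 50256/50257 below are the known sizes of OpenAI's gpt2/multilingual tiktoken files, not values taken from this PR):

```python
# Check: num_languages = n_vocab - common_vocab_size - 1509,
# where common_vocab_size is the number of regular (non-special) tokens.
models = {
    # model: (n_vocab, common_vocab_size)
    "tiny.en (gpt2 vocab)":      (51864, 50256),
    "tiny (multilingual vocab)": (51865, 50257),
    "large-v3 (adds <|yue|>)":   (51866, 50257),
}
for name, (n_vocab, common) in models.items():
    # Matches the old hardcoded n_vocab - 51765 - (is_multilingual ? 1 : 0)
    print(f"{name}: {n_vocab - common - 1509} languages")
# tiny.en: 99, tiny: 99, large-v3: 100
```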
I think the method that calculates the token IDs between `<|startoftext|>` and `<|translate|>` is better, because the sizes of `token_to_id` and `id_to_token` change after the special tokens are loaded in whisper.cpp#L1641-L1672.
| if "<|endoftext|>" in tokens: | ||
| del tokens["<|endoftext|>"] |
When I run recognition with ggml-tiny.en.bin, the result is also empty; this is the same reason the tests fail.
If the last commit was "Migrate from HG dataset into HG model", models that were last generated with this script need to be reconverted; otherwise `<|endoftext|>` will be written into the common tokens.
Just like the test models, GGML models actually converted from OpenAI's official models should not record special tokens in the vocabulary; otherwise the IDs of the subsequent special tokens will be positioned incorrectly.

So you may need to regenerate the empty models used in the tests. Compared to the old ones, the newly generated test models guarantee that special tokens are excluded from the normal vocabulary:

```python
import os
import base64
import struct
import numpy as np

# ggml magic number
GGML_FILE_MAGIC = 0x67676d6c  # "ggml"

# Hyperparameter settings (defaults follow the tiny model configuration)
class HyperParams:
    def __init__(self,
                 n_vocab=51865,
                 n_audio_ctx=1500,
                 n_audio_state=384,
                 n_audio_head=6,
                 n_audio_layer=4,
                 n_text_ctx=448,
                 n_text_state=384,
                 n_text_head=6,
                 n_text_layer=4,
                 n_mels=80):
        self.n_vocab = n_vocab
        self.n_audio_ctx = n_audio_ctx
        self.n_audio_state = n_audio_state
        self.n_audio_head = n_audio_head
        self.n_audio_layer = n_audio_layer
        self.n_text_ctx = n_text_ctx
        self.n_text_state = n_text_state
        self.n_text_head = n_text_head
        self.n_text_layer = n_text_layer
        self.n_mels = n_mels
        self.ftype = True  # True: fp16, False: fp32

def write_ggml_metadata(fout, hparams):
    # write magic number
    fout.write(struct.pack("i", GGML_FILE_MAGIC))
    # write hyperparameters
    fout.write(struct.pack("i", hparams.n_vocab))
    fout.write(struct.pack("i", hparams.n_audio_ctx))
    fout.write(struct.pack("i", hparams.n_audio_state))
    fout.write(struct.pack("i", hparams.n_audio_head))
    fout.write(struct.pack("i", hparams.n_audio_layer))
    fout.write(struct.pack("i", hparams.n_text_ctx))
    fout.write(struct.pack("i", hparams.n_text_state))
    fout.write(struct.pack("i", hparams.n_text_head))
    fout.write(struct.pack("i", hparams.n_text_layer))
    fout.write(struct.pack("i", hparams.n_mels))
    fout.write(struct.pack("i", hparams.ftype))

def write_mel_filters(fout, hparams, mel_filters_path):
    print("loading real Mel filter data...")
    # load the Mel filters from the npz file
    with np.load(mel_filters_path) as f:
        filters = f[f"mel_{hparams.n_mels}"]
    fout.write(struct.pack("i", filters.shape[0]))
    fout.write(struct.pack("i", filters.shape[1]))
    for i in range(filters.shape[0]):
        for j in range(filters.shape[1]):
            fout.write(struct.pack("f", filters[i][j]))

def write_tokenizer(fout, tokenizer_path):
    with open(tokenizer_path, "rb") as f:
        contents = f.read()
    tokens = {base64.b64decode(token): int(rank)
              for token, rank in (line.split() for line in contents.splitlines() if line)}
    # write size of tokenizer
    fout.write(struct.pack("i", len(tokens)))
    # write vocabulary
    for key in tokens:
        fout.write(struct.pack("i", len(key)))
        fout.write(key)

def generate_empty_model(filename, hparams):
    print(f"generate empty model file: {filename}")
    with open(filename, "wb") as f:
        write_ggml_metadata(f, hparams)
        write_mel_filters(f, hparams, "whisper/whisper/assets/mel_filters.npz")
        write_tokenizer(f, f"whisper/whisper/assets/{'gpt2' if hparams.n_vocab < 51865 else 'multilingual'}.tiktoken")
        # ignore the rest of the model

if __name__ == "__main__":
    os.system("git clone https://github.com/openai/whisper.git")
    os.makedirs("empty_models", exist_ok=True)

    # Base models
    generate_empty_model("empty_models/for-tests-ggml-base.bin", HyperParams(
        n_vocab=51865, n_audio_state=512, n_audio_head=8, n_audio_layer=6,
        n_text_state=512, n_text_head=8, n_text_layer=6
    ))
    generate_empty_model("empty_models/for-tests-ggml-base.en.bin", HyperParams(
        n_vocab=51864, n_audio_state=512, n_audio_head=8, n_audio_layer=6,
        n_text_state=512, n_text_head=8, n_text_layer=6
    ))

    # Small models
    generate_empty_model("empty_models/for-tests-ggml-small.bin", HyperParams(
        n_vocab=51865, n_audio_state=768, n_audio_head=12, n_audio_layer=12,
        n_text_state=768, n_text_head=12, n_text_layer=12
    ))
    generate_empty_model("empty_models/for-tests-ggml-small.en.bin", HyperParams(
        n_vocab=51864, n_audio_state=768, n_audio_head=12, n_audio_layer=12,
        n_text_state=768, n_text_head=12, n_text_layer=12
    ))

    # Medium models
    generate_empty_model("empty_models/for-tests-ggml-medium.bin", HyperParams(
        n_vocab=51865, n_audio_state=1024, n_audio_head=16, n_audio_layer=24,
        n_text_state=1024, n_text_head=16, n_text_layer=24
    ))
    generate_empty_model("empty_models/for-tests-ggml-medium.en.bin", HyperParams(
        n_vocab=51864, n_audio_state=1024, n_audio_head=16, n_audio_layer=24,
        n_text_state=1024, n_text_head=16, n_text_layer=24
    ))

    # Large models
    generate_empty_model("empty_models/for-tests-ggml-large.bin", HyperParams(
        n_vocab=51865, n_audio_state=1280, n_audio_head=20, n_audio_layer=32,
        n_text_state=1280, n_text_head=20, n_text_layer=32
    ))
    generate_empty_model("empty_models/for-tests-ggml-large-v3.bin", HyperParams(  # add <|yue|>
        n_vocab=51866, n_audio_state=1280, n_audio_head=20, n_audio_layer=32,
        n_text_state=1280, n_text_head=20, n_text_layer=32
    ))

    # Tiny models
    generate_empty_model("empty_models/for-tests-ggml-tiny.bin", HyperParams(n_vocab=51865))
    generate_empty_model("empty_models/for-tests-ggml-tiny.en.bin", HyperParams(n_vocab=51864))

    # Turbo model (based on large-v3 with optimizations)
    generate_empty_model("empty_models/for-tests-ggml-turbo.bin", HyperParams(  # add <|yue|>
        n_vocab=51866, n_audio_state=1280, n_audio_head=20, n_audio_layer=32,
        n_text_state=1280, n_text_head=20, n_text_layer=32
    ))
```
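To double-check a generated file, the header can be read back in the same order `write_ggml_metadata()` writes it (a quick sketch based only on the script above; the path is one of the files it generates):

```python
import struct

def read_header(path):
    # Read back the magic number and the 11 int32 hyperparameter
    # fields in the exact order write_ggml_metadata() wrote them.
    with open(path, "rb") as f:
        magic = struct.unpack("i", f.read(4))[0]
        assert magic == 0x67676d6c, "not a ggml file"
        names = ["n_vocab", "n_audio_ctx", "n_audio_state", "n_audio_head",
                 "n_audio_layer", "n_text_ctx", "n_text_state", "n_text_head",
                 "n_text_layer", "n_mels", "ftype"]
        return dict(zip(names, struct.unpack("11i", f.read(44))))

print(read_header("empty_models/for-tests-ggml-tiny.en.bin"))
```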
Theoretically, the newly converted test models and GGML models with the special tokens removed are compatible with the old code, because the old code still relies on hard-coded constants to locate the special token IDs. However, the code I submitted calculates the special token IDs from the number of ordinary tokens in the vocabulary and the size of the embedding layer, so if a ggml model does contain special tokens in its vocabulary, the IDs of the subsequent special tokens will be calculated incorrectly.
```cpp
        vocab.token_beg += dt;
    }
    vocab.token_eot = n_vocab;     // <|endoftext|> 50256 for en, 50257 for multilingual, others for custom model
    vocab.token_sot = n_vocab + 1; // <|startoftext|>
```
Sorry, I made a mistake in the comment. It should be `<|startoftranscribe|>`.

Description
This PR addresses the tokenizer index out-of-bounds issue when using custom Whisper models with modified vocabulary sizes, as reported in #3392.
The problem occurs when converting models like efwkjn/whisper-ja-anime-v0.1, efwkjn/whisper-ja-anime-v0.2, and efwkjn/whisper-ja-anime-v0.3 to ggml format using `convert-h5-to-ggml.py` and running them with whisper.cpp. These models have a vocabulary size of 20480 (including special tokens), which differs from the official Whisper models (51864 for monolingual, 51865 for multilingual). The hardcoded special token IDs in whisper.cpp cause index out-of-bounds errors when using these custom models.

Solution
The solution dynamically calculates special token IDs based on the actual vocabulary size and structure, instead of using hardcoded values:
After loading the vocabulary and establishing id-to-token mappings, we determine:
- `vocab.n_vocab`: the total size of the embedding layer
- `common_vocab_size`: the number of regular (non-special) tokens (`size_t common_vocab_size = vocab.token_to_id.size()`)

Following OpenAI's Whisper token arrangement principles (special tokens are placed consecutively after regular tokens), we calculate the ranges:

- `[0, common_vocab_size)`: regular tokens
- `common_vocab_size`: `<|endoftext|>`
- `common_vocab_size + 1`: `<|startoftranscript|>`
- `[common_vocab_size + 2, emb_size - 1507)`: language mark tokens
- `emb_size - 1507`: `<|translate|>`
- `emb_size - 1506`: `<|transcribe|>`
- `emb_size - 1505`: `<|startoflm|>`
- `emb_size - 1504`: `<|startofprev|>`
- `emb_size - 1503`: `<|nospeech|>`
- `emb_size - 1502`: `<|notimestamps|>`
- `[emb_size - 1501, emb_size)`: timestamp tokens (1501 tokens from `<|0.00|>` to `<|30.00|>`)

The total number of non-language special tokens is 1509 (1501 timestamps + 8 other special tokens).
The number of language tokens is calculated as `vocab.n_vocab - common_vocab_size - 1509`.

This approach dynamically adapts to different vocabulary sizes and maintains compatibility with both official and custom Whisper models (see the sketch below).
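To make the arithmetic concrete, here is a minimal Python sketch of the ID layout described above (`emb_size` plays the role of `vocab.n_vocab`; the values in the assertion are the known IDs of the official multilingual models):

```python
def compute_special_ids(emb_size: int, common_vocab_size: int):
    """Derive the special-token IDs from the embedding size and the
    number of regular tokens, per the layout described above."""
    ids = {
        "<|endoftext|>":         common_vocab_size,
        "<|startoftranscript|>": common_vocab_size + 1,
        "<|translate|>":         emb_size - 1507,
        "<|transcribe|>":        emb_size - 1506,
        "<|startoflm|>":         emb_size - 1505,
        "<|startofprev|>":       emb_size - 1504,
        "<|nospeech|>":          emb_size - 1503,
        "<|notimestamps|>":      emb_size - 1502,
        "<|0.00|>":              emb_size - 1501,  # first timestamp token
    }
    num_languages = emb_size - common_vocab_size - 1509
    return ids, num_languages

# Official multilingual tiny: 50257 regular tokens, 51865 embeddings.
ids, n_lang = compute_special_ids(51865, 50257)
assert ids["<|endoftext|>"] == 50257
assert ids["<|translate|>"] == 50358
assert n_lang == 99
```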