
Suggested functionality: Estimate by model_type #1

@Somerandomguy10111

Description


First off: great tool, it saved me the headache of trying to trace the functions' token usage myself.
A final touch could be to introduce an option for a token estimator class (tokenizer class?) which takes the model type as an attribute and then uses the tiktoken.encoding_for_model() function to retrieve the matching encoding.

That way, if OpenAI ever changes the encoding or uses a different encoding for newer models, the package can stay up to date.
On a side note, I think the following functions would also be useful, e.g. to prevent logging huge inputs to the model:

def get_string_tokens(self, the_str: str) -> int:
    return len(self.encode(the_str))


def get_limited_string(self, the_str: str, max_tokens: int) -> str:
    encoded_str = self.encode(the_str)
    return self.decode(encoded_str[:max_tokens])

Best
Somerandomguy10111
