Added zero token_type_ids if tokenizer output does not have them #17

Open
GrayWizard12345 wants to merge 4 commits into Den4ikAI:main from GrayWizard12345:main

@GrayWizard12345

The ONNX model expects token_type_ids as an input parameter, but the CharTokenizer doesn't automatically include them when using return_tensors="np".
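A quick way to confirm this kind of mismatch is to compare the input names the model declares with the keys the tokenizer returns. The sketch below is illustrative only: `missing_model_inputs` is a hypothetical helper, the list of expected names is assumed rather than read from the actual model, and the `encoded` dict is a stand-in for real tokenizer output.

```python
import numpy as np


def missing_model_inputs(expected_names, encoded):
    """Return the model input names that are absent from the tokenizer output."""
    return [name for name in expected_names if name not in encoded]


# Assumed input names. With onnxruntime they could be read from a session via
# [i.name for i in session.get_inputs()].
expected = ["input_ids", "attention_mask", "token_type_ids"]

# Stand-in for tokenizer output produced with return_tensors="np".
encoded = {
    "input_ids": np.array([[2, 15, 8, 3]], dtype=np.int64),
    "attention_mask": np.ones((1, 4), dtype=np.int64),
}

missing = missing_model_inputs(expected, encoded)
print(missing)
```

If the list is non-empty, the feed dict passed to the model is incomplete and the run call would be expected to reject it.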

Fixed accent_model.py: Modified the put_accent method (lines 32-51) to add token_type_ids if they're missing from the tokenizer output. The fix adds zeros with the same shape as input_ids, which is the correct default value for single-sequence inputs.
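For readers who want the idea without opening the diff, the change can be sketched roughly as follows. This is not the code from the PR: `ensure_token_type_ids` is a hypothetical helper name, and the dict below stands in for the tokenizer output.

```python
import numpy as np


def ensure_token_type_ids(inputs):
    """Return a copy of `inputs` with zero-valued token_type_ids added if absent."""
    feed = dict(inputs)  # shallow copy so the caller's dict is left untouched
    if "token_type_ids" not in feed:
        # All zeros marks every token as belonging to the first (only) sequence.
        feed["token_type_ids"] = np.zeros_like(feed["input_ids"])
    return feed


# Stand-in for tokenizer output produced with return_tensors="np".
encoded = {
    "input_ids": np.array([[2, 15, 8, 3]], dtype=np.int64),
    "attention_mask": np.ones((1, 4), dtype=np.int64),
}

feed = ensure_token_type_ids(encoded)
print(feed["token_type_ids"].shape, feed["token_type_ids"].dtype)
```

`np.zeros_like` keeps the shape and dtype of `input_ids`, so the added array matches whatever batch size and sequence length the tokenizer produced. An existing `token_type_ids` entry is left as-is.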

Added .gitignore: Created a .gitignore file to keep build artifacts such as __pycache__ from being committed to the repository.

Copilot AI and others added 4 commits February 4, 2026 10:21
Co-authored-by: GrayWizard12345 <24612767+GrayWizard12345@users.noreply.github.com>
Co-authored-by: GrayWizard12345 <24612767+GrayWizard12345@users.noreply.github.com>
…oken-type-ids

Add missing token_type_ids to accent model inputs
Copilot AI review requested due to automatic review settings February 4, 2026 10:30

Copilot AI left a comment

Pull request overview

This PR fixes a compatibility issue between the CharTokenizer and ONNX models by ensuring token_type_ids are present in tokenizer outputs, and adds standard Python build artifact exclusions to version control.

Changes:

  • Added automatic insertion of zero-valued token_type_ids when missing from CharTokenizer output in accent_model.py
  • Created .gitignore file with standard Python project exclusions

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| ruaccent/accent_model.py | Adds token_type_ids (zeros with same shape as input_ids) if missing from tokenizer output, fixing ONNX model compatibility |
| .gitignore | Adds standard Python project ignore patterns for build artifacts, virtual environments, IDE files, and OS-specific files |
