Added zero token_type_ids if tokenizer output does not have them by GrayWizard12345 · Pull Request #17 · Den4ikAI/ruaccent

GrayWizard12345 · 2026-02-04T10:30:50Z

The ONNX model expects token_type_ids as an input parameter, but the CharTokenizer doesn't automatically include them when using return_tensors="np".
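
A mismatch like this can be caught before inference by comparing the input names the model declares against the keys the tokenizer returned. The sketch below is illustrative only: `missing_inputs` is a hypothetical helper, and the `required` list is an assumed set of input names rather than one read from the actual ruaccent model. With onnxruntime, the real names could be collected as `[i.name for i in session.get_inputs()]`.

```python
def missing_inputs(required_names, feed):
    """Return the required input names that are absent from the feed dict."""
    return [name for name in required_names if name not in feed]


# Assumed input names for illustration; a real session would report its own.
required = ["input_ids", "attention_mask", "token_type_ids"]

# Stand-in for a tokenizer output that lacks token_type_ids.
feed = {"input_ids": [[2, 15, 3]], "attention_mask": [[1, 1, 1]]}

missing = missing_inputs(required, feed)
print(missing)
```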

Fixed accent_model.py: Modified the put_accent method (lines 32-51) to add token_type_ids if they're missing from the tokenizer output. The fix adds zeros with the same shape as input_ids, which is the correct default value for single-sequence inputs.
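
The fix described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `ensure_token_type_ids` is a hypothetical helper name, and the `encoded` dict is a hand-built stand-in for what a tokenizer might return with return_tensors="np".

```python
import numpy as np


def ensure_token_type_ids(inputs):
    """Return a copy of `inputs` with zero-valued token_type_ids added if absent."""
    patched = dict(inputs)
    if "token_type_ids" not in patched:
        # Zeros with the same shape and dtype as input_ids: every token
        # is marked as belonging to the first (and only) sequence.
        patched["token_type_ids"] = np.zeros_like(patched["input_ids"])
    return patched


# Hand-built stand-in for a tokenizer output (values are arbitrary).
encoded = {
    "input_ids": np.array([[2, 15, 8, 23, 3]], dtype=np.int64),
    "attention_mask": np.ones((1, 5), dtype=np.int64),
}

patched = ensure_token_type_ids(encoded)
print(patched["token_type_ids"])
```

Using `np.zeros_like` keeps the dtype aligned with input_ids, which matters because ONNX Runtime checks input dtypes against the model's declared types. Returning a copy rather than mutating the tokenizer output is a design choice of this sketch, not something the PR states.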

Added .gitignore: Created a proper .gitignore file to prevent build artifacts like __pycache__ from being committed to the repository.

Co-authored-by: GrayWizard12345 <24612767+GrayWizard12345@users.noreply.github.com>

…oken-type-ids Add missing token_type_ids to accent model inputs

Copilot

Pull request overview

This PR fixes a compatibility issue between the CharTokenizer and ONNX models by ensuring token_type_ids are present in tokenizer outputs, and adds standard Python build artifact exclusions to version control.

Changes:

Added automatic insertion of zero-valued token_type_ids when missing from CharTokenizer output in accent_model.py
Created .gitignore file with standard Python project exclusions

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated no comments.

File	Description
ruaccent/accent_model.py	Adds token_type_ids (zeros with same shape as input_ids) if missing from tokenizer output, fixing ONNX model compatibility
.gitignore	Adds standard Python project ignore patterns for build artifacts, virtual environments, IDE files, and OS-specific files


Copilot AI and others added 4 commits February 4, 2026 10:21

Initial plan

a96881a

Fix accent model to add token_type_ids if missing

dc8c66f

Co-authored-by: GrayWizard12345 <24612767+GrayWizard12345@users.noreply.github.com>

Add .gitignore and remove __pycache__

c0fa056

Co-authored-by: GrayWizard12345 <24612767+GrayWizard12345@users.noreply.github.com>

Merge pull request #1 from GrayWizard12345/copilot/fix-accent-model-t…

4d005a5

…oken-type-ids Add missing token_type_ids to accent model inputs

Copilot AI review requested due to automatic review settings February 4, 2026 10:30

Copilot started reviewing on behalf of GrayWizard12345 February 4, 2026 10:31

Copilot AI reviewed Feb 4, 2026

GrayWizard12345 wants to merge 4 commits into Den4ikAI:main from GrayWizard12345:main
