Korean TN fixes: cardinal, decimal, fraction, date #374
tbartley94 merged 11 commits into NVIDIA:ko_tn_staging_v1
Signed-off-by: Jinwoo Bae <34386414+bbae0312@users.noreply.github.com>
for more information, see https://pre-commit.ci
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
nemo_text_processing/text_normalization/ko/taggers/electronic.py
nemo_text_processing/text_normalization/ko/taggers/telephone.py
nemo_text_processing/text_normalization/ko/taggers/tokenize_and_classify.py
@bbae0312 Can you confirm that the tests are passing (Sparrowhawk and unit)?
Both unit tests and Sparrowhawk tests are passing locally.
tbartley94 left a comment:
Naming and readability issues. post_processing.py is completely unnecessary.
nemo_text_processing/text_normalization/ko/taggers/electronic.py
cc16_grouped = four + sep_del + four + sep_del + four + sep_del + four
sep_to_space = pynutil.delete(sep_token) + insert_space
cc16_grouped = four + sep_to_space + four + sep_to_space + four + sep_to_space + four
Got it, I’ll simplify with **3. Thanks!
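The review suggestion itself is not preserved in this thread, so the following is only a rough illustration of the idea in the reply: folding the three repeated "separator + four digits" terms into a single repetition. It is a plain `re` sketch, not the pynini grammar from electronic.py. `FOUR`, `SEP`, and `group_card_number` are hypothetical names, and the separator set is an assumption.

```python
import re

# Hypothetical stand-ins for the grammar pieces named in the diff.
FOUR = r"\d{4}"   # plays the role of `four`
SEP = r"[-. ]"    # plays the role of `sep_token` (assumed separator set)

# One leading group followed by exactly three "separator + group" repeats,
# i.e. the same shape as four + (sep_to_space + four) repeated 3 times.
CC16_GROUPED = re.compile(rf"{FOUR}(?:{SEP}{FOUR}){{3}}")


def group_card_number(text: str) -> str:
    """Rewrite a separator-delimited 16-digit number with single spaces."""
    if CC16_GROUPED.fullmatch(text) is None:
        # Anything that is not four groups of four digits is left untouched.
        return text
    # Dropping the separators and re-joining with spaces mirrors
    # "delete the separator, insert a space" in the original line.
    return " ".join(re.findall(FOUR, text))
```

For example, `group_card_number("1234-5678-9012-3456")` yields the four groups joined by single spaces, while a shorter input such as `"1234-5678"` is returned unchanged.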
nemo_text_processing/text_normalization/ko/taggers/tokenize_and_classify.py
nemo_text_processing/text_normalization/ko/verbalizers/post_processing.py
What does this PR do?
Add fixes and improvements for Korean TN: cardinal, decimal, ordinal, fraction, date, and post-processing.
Before your PR is "Ready for review"
Pre checks:
Have you signed your commits? Use git commit -s to sign.
Do all unit tests pass? Run pytest or (if your machine does not have a GPU) pytest --cpu from the root folder (given you marked your test cases accordingly with @pytest.mark.run_only_on('CPU')).
Sparrowhawk tests: bash tools/text_processing_deployment/export_grammars.sh --MODE=test ...
Have you added test cases for both pytest and Sparrowhawk?
Have you added __init__.py for every folder and subfolder, including the data folder, which has .TSV files?
Have you added the license header "Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved." to all newly added Python files?
Where applicable, the header's second line should be "Copyright 2015 and onwards Google, Inc."
Remove import guards (try import: ... except: ...) if not already done.
PR Type:
If you haven't finished some of the above items, you can still open a "Draft" PR.