[pull] master from BYVoid:master #152
Merged
…inline format (#1300)

This script merges configuration files and text dictionaries on the fly. The output is formatted as JSONC (with header comments for version, compile time, and source info), which differs slightly from the original pure JSON files but is fully supported by OpenCC. The output is not meant to replace the config format; it is a way to compare changes to dictionary files.
…oss all test suites (#1301)

As the test suite grows, test case IDs alone are often insufficient to convey the intent or context behind individual cases. This change adds JSONC support across all test suite parsers (C++, Node.js, and Python) so that contributors can annotate test cases inline, for example to explain why a particular input/output pair exists or to flag non-obvious edge cases.

Trailing comma support is included as a minor convenience: when appending a new entry to the cases array, the last existing entry does not need to be modified just to add a comma, keeping diffs minimal and focused.

This is a non-breaking, infrastructure-only change. No existing test case data is modified.
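To make the JSONC behaviour described above concrete, here is a minimal Python sketch of a tolerant loader that strips `//` and `/* */` comments and trailing commas before handing the text to the standard `json` module. It is not the parser added in #1301; the function names `strip_jsonc` and `load_jsonc` and the sample field names (`id`, `input`, `expected`) are assumptions made for illustration only.

```python
import json


def strip_jsonc(text: str) -> str:
    """Drop // and /* */ comments and trailing commas that sit outside strings."""
    out = []  # list of single characters that survive
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character verbatim so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip up to (not including) the newline.
            j = text.find("\n", i)
            i = n if j == -1 else j
        elif text.startswith("/*", i):
            # Block comment: skip past the closing */.
            j = text.find("*/", i + 2)
            i = n if j == -1 else j + 2
        elif ch in "]}":
            # Trailing comma: remove a comma that directly precedes the closer.
            k = len(out) - 1
            while k >= 0 and out[k].isspace():
                k -= 1
            if k >= 0 and out[k] == ",":
                del out[k]
            out.append(ch)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_jsonc(text: str):
    """Parse JSONC text into Python objects."""
    return json.loads(strip_jsonc(text))


# Hypothetical test-case file; the schema is assumed, not taken from the repository.
SAMPLE = '''
{
  "cases": [
    // Explains why this pair exists.
    {"id": "case_001", "input": "a//b", "expected": "c"}, /* block comment */
    {"id": "case_002", "input": "x", "expected": "y"},
  ],
}
'''

cases = load_jsonc(SAMPLE)["cases"]
```

Because the scanner tracks whether it is inside a string, comment-like sequences in test data (such as `a//b` above) are left untouched.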
* Add Taiwan medical phrase conversions (s2twp / tw2sp)

Add cross-strait medical vocabulary differences that character-level conversion cannot handle, so s2twp lands on Taiwan usage and tw2sp converts back. Covers blood tests, hepatitis A-E, neurology/psychiatry, cardiology, imaging, drugs, and syndrome terms.

- TWPhrases.txt: 70 forward entries (keys are post-s2t standard traditional forms, e.g. 白細胞->白血球, 乙肝->B肝, 阻滯劑->阻斷劑, 他汀類->史他汀類, 代謝綜合徵->代謝症候群)
- TWPhrasesRev.txt: 61 reverse entries
- STPhrases.txt: 19 whole-word segmentation entries so compound terms (综合征-suffixed syndromes, 计算机断层) survive segmentation before the Taiwan vocabulary stage
- testcases.json: 10 consolidated s2twp / tw2sp cases

Conventions:
- Abbreviation<->abbreviation, full<->full (乙肝<->B肝, 乙型肝炎<->B型肝炎).
- tw2sp keeps the full form rather than emitting an abbreviation (心房顫動 stays, not 房顫) via self-mappings; 心肌梗塞 reverses to the common mainland 心肌梗死.
- Ultrasound: 超聲波/B超 -> 超音波; tw2sp 超音波 -> 超声波 (the general term), avoiding over-conversion of 超音波清洗機 etc.
- Multiple Taiwan variants are accepted on tw2sp where both are in use (妥瑞氏症/妥瑞症, 馬凡氏症候群/馬凡氏症, 阿莫西林/安莫西林).
- Pharmacology terms scoped to category-level (阻滯劑, 他汀類) to avoid corrupting individual drug names.

Reverse mapping and Taiwan phrase segmentation invariants verified.
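The phrase-level mappings above can be illustrated with a simplified Python sketch of greedy longest-prefix replacement. This is not OpenCC's implementation: the `convert` function is hypothetical, the dictionary holds only a few entries quoted in the commit message, and the reverse table is built by plain inversion purely for illustration (the real TWPhrasesRev.txt is maintained separately and has a different entry count).

```python
def convert(text: str, phrases: dict) -> str:
    """Replace the longest matching phrase at each position; pass other characters through."""
    max_len = max(map(len, phrases), default=1)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so 乙型肝炎 wins over any shorter key.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in phrases:
                out.append(phrases[chunk])
                i += size
                break
        else:
            # No phrase starts here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# Subset of forward entries quoted in the commit message (standard traditional -> Taiwan usage).
TW_PHRASES = {
    "白細胞": "白血球",
    "乙肝": "B肝",
    "乙型肝炎": "B型肝炎",
    "代謝綜合徵": "代謝症候群",
}

# Assumption for illustration only: a reverse table obtained by inverting the forward one.
TW_PHRASES_REV = {value: key for key, value in TW_PHRASES.items()}

forward = convert("乙型肝炎和乙肝", TW_PHRASES)
round_trip = convert(forward, TW_PHRASES_REV)
```

Keeping abbreviation-to-abbreviation and full-to-full pairs separate, as the conventions state, is what lets a round trip like this return the original text instead of collapsing both forms into one.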
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)