Improve tofu handling and the maintenance and checking scripts #1285
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9d9a7c434e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Reviewed commit: 2322f8b6b6
7972543 to dded0b5
Reviewed commit: dded0b5623
"../data/config/hk2sp.json",
"../data/config/hk2t.json",
"../data/config/jp2t.json",
"../data/config/s2hk.json",
"../data/config/s2hkp.json",
Register new configs in the npm CLI
Adding `hk2sp.json` and `s2hkp.json` to the packaged assets is not enough for the npm CLI to load them by mode name. `resolveConfigPath` in `node/cli.js` treats only the entries present in `BUILT_IN_CONFIGS` as asset-relative; anything else, including `-c s2hkp` and `-c s2hkp.json`, is resolved against the user's current working directory. As a result, these newly documented and packaged configs work through native OpenCC but fail from the npm CLI unless the caller passes an absolute path. Add them to the CLI's built-in config list/stem set alongside the existing configs.
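The failure mode described here can be modelled in a few lines. This is a minimal sketch in Python, not the CLI's code: the real logic is JavaScript in `node/cli.js` and is not reproduced in this thread. Only the names `BUILT_IN_CONFIGS` and `resolveConfigPath` come from the review comment; the set contents, the `ASSET_DIR` path, and the function body are assumptions made for illustration.

```python
import posixpath

# Hypothetical subset of built-in mode names; the real list lives in node/cli.js.
BUILT_IN_CONFIGS = {"s2t", "t2s", "s2hk", "hk2s"}
# Hypothetical location of the packaged config assets.
ASSET_DIR = "/pkg/assets/config"


def resolve_config_path(arg, cwd):
    """Model of the lookup: known stems map to packaged assets, the rest to cwd."""
    stem = arg[:-len(".json")] if arg.endswith(".json") else arg
    if stem in BUILT_IN_CONFIGS:
        return posixpath.join(ASSET_DIR, stem + ".json")
    # Unknown names are treated as user files relative to the working directory.
    return posixpath.normpath(posixpath.join(cwd, arg))


print(resolve_config_path("s2hk", "/home/user"))        # packaged asset
print(resolve_config_path("s2hkp.json", "/home/user"))  # falls back to cwd
```

In this model, registering the new stem (for example `BUILT_IN_CONFIGS.add("s2hkp")`) is what makes `-c s2hkp` resolve to the packaged file instead of a path under the caller's directory.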
'hk2sp_jieba.json',
's2hk_jieba.json',
's2hkp_jieba.json',
Register new Jieba configs in the npm CLI
These new plugin configs are copied into the opencc-jieba package, but the npm CLI only recognizes Jieba mode names listed in `OPTIONAL_JIEBA_CONFIGS` in `node/cli.js`. For any other name, `resolveConfigPath` turns `-c s2hkp_jieba` or `-c hk2sp_jieba.json` into an absolute path under the caller's cwd, which bypasses `resolveJiebaConfigPath` and fails to load the packaged plugin config. Add the new Jieba config names to that CLI allowlist when packaging them here.
d0ccabc to 151c4b3
Reviewed commit: 151c4b32ce
18ddad9 to 318b9bd
Reviewed commit: 318b9bd5c7
26dfb45 to 9cb624c
Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Reviewed commit: a5eab16af2
{ "type": "ocd2", "file": "TSPhrases.ocd2" },
{
"type": "ocd2",
"file": "TSPhrasesExt.ocd2",
Keep inline config compilation working
These new generated dictionary names are now present in standard configs, but `scripts/compile_to_inline_config.py` still only knows how to load source `.txt` files from `data/dictionary` (plus reversed dictionaries and the old `TSCharactersExt` special case). With this config, the documented command `python3 scripts/compile_to_inline_config.py -c data/config/t2s.json -d data/dictionary ...` now fails with `Dictionary file not found for: TSPhrasesExt`. Inline-config users therefore cannot compile any of the updated `*2s` configs unless the script is taught to generate or load the Base/Ext dictionaries, or these sources are made available.
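One way to surface this kind of breakage early is to list every dictionary a config references and check that a matching source file exists. The following is a minimal sketch under stated assumptions: each dictionary entry is a JSON object with `"type"` and `"file"` keys, as in the fragment quoted in this review, and a source is simply `<stem>.txt` in the dictionary directory. `iter_dict_files` and `missing_sources` are hypothetical helpers, not part of `scripts/compile_to_inline_config.py`, and they ignore the reversed-dictionary and special-case handling that script has. The demo config is made up.

```python
import json
import os
import tempfile


def iter_dict_files(node):
    """Yield every "file" value of an ocd2 entry found anywhere in a config tree."""
    if isinstance(node, dict):
        if node.get("type") == "ocd2" and "file" in node:
            yield node["file"]
        for value in node.values():
            yield from iter_dict_files(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_dict_files(item)


def missing_sources(config, dictionary_dir):
    """Return stems of referenced dictionaries that have no <stem>.txt source."""
    stems = sorted({os.path.splitext(name)[0] for name in iter_dict_files(config)})
    return [s for s in stems
            if not os.path.isfile(os.path.join(dictionary_dir, s + ".txt"))]


# Demo: a made-up config and a temporary directory holding only one source file.
config = json.loads("""
{"conversion_chain": [{"dict": {"type": "group", "dicts": [
    {"type": "ocd2", "file": "TSPhrases.ocd2"},
    {"type": "ocd2", "file": "TSPhrasesExt.ocd2"}
]}}]}
""")
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "TSPhrases.txt"), "w").close()
    demo_missing = missing_sources(config, tmp)
print(demo_missing)
```

Because the walker recurses through the whole JSON tree, it does not depend on how deeply the dictionary groups are nested.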
Reviewed commit: 9da2b67a3d
- `opencc` raises in such a case.
- Make public core attributes more explicit.
- Prevent error when dumping a new dict.
- For easier test discovery using something like `python -m unittest discover scripts/`
- Keys and values are now accessed like a dict.
- Rework comment handling:
  - Comment blocks for an entry are accessed via `entry.block`.
  - Header and footer are now accessed via `Table.header` and `Table.footer`.
- Support auto-deduplication.
- Implement `dump(sort=bool)` and remove `Table.sort()`.
- Define tofu chars in a separate `AllowedSmpChars.txt` scheme file rather than modifying the source dict file.
- Split tofu-related files into `TSCharacters{Base,Ext}.txt` and `TSPhrases{Base,Ext}.txt`.
- Check values only.
- Declare allowed chars in `scheme/AllowedSmpChars.txt`.
- Move the CJK compatibility table to `scheme/CJKCompatibilityIdeographs.txt` and reformat it to be more compact.
- Check only chars in entry values.
- Also improves performance.
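The Base/Ext split mentioned in these commits might look roughly like the following. This is only a sketch: neither the real format of `AllowedSmpChars.txt` nor the generating script appears in this thread, so the mapping (tofu-prone character to replacement), the choice to carry every entry in both variants, the tab-separated output layout, and the names `split_base_ext` and `dump_entries` are all assumptions. The sample characters are arbitrary code points, not real dictionary data.

```python
def split_base_ext(entries, replacements):
    """Split dictionary entries into a Base and an Ext variant.

    entries: list of (key, [values]) pairs from a source dictionary.
    replacements: hypothetical mapping of tofu-prone char -> safe replacement.
    Ext keeps the tofu-prone characters; Base substitutes the replacements.
    """
    table = str.maketrans(replacements)
    base, ext = [], []
    for key, values in entries:
        ext.append((key, list(values)))
        base.append((key, [v.translate(table) for v in values]))
    return base, ext


def dump_entries(entries):
    """Serialize as 'key<TAB>value value ...' lines (assumed layout)."""
    return "".join(f"{k}\t{' '.join(vs)}\n" for k, vs in entries)


# Made-up data: U+20000 stands in for a character that may render as tofu.
source = [("A", ["\U00020000"]), ("B", ["b"])]
base, ext = split_base_ext(source, {"\U00020000": "a"})
print(dump_entries(base), end="")
```

Only the values are rewritten, which matches the commits' statement that the checks apply to entry values.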
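Improve the Python maintenance scripts
Similar to #1264. The main changes:
- [...] `test_common.py`, to follow Python conventions and to make it easy to run the tests with something like `python -m unittest discover ...`.
Improve tofu handling
- Dictionary files no longer use the `@tofu-risk` annotation. `scheme/AllowedSmpChars.txt` is introduced to define every character that may render as tofu together with its replacement character, and a script automatically generates `STCharactersBase.txt`, which uses the replacement characters, and `STCharactersExt.txt`, which uses the tofu-prone characters.
- To accommodate related phrases that may appear in the future, `STPhrasesBase.txt` and `STPhrasesExt.txt` are generated the same way.
Improve the dictionary checking scripts
- [...] `dictionary/*.cpp`; `scripts/test_dictionary.py` is used instead to check that the dictionaries follow a consistent format.
- [...] defined in `scheme/AllowedSmpChars.txt`, and only the output side is now checked. A new check also requires that when an input SMP character may render as tofu and has a replacement, the corresponding entry with the replacement applied is added as well.
- [...] the exemption of `{TS,ST}Characters.txt` from the SMP output check, so that system font compatibility is checked uniformly through `scheme/AllowedSmpChars.txt`.
- [...] `scheme/CJKCompatibilityIdeographs.txt`, with a more compact format, and only the output side is checked.
- The `phrase_character_dependency` test now also checks for cases where the input and output lengths do not match.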
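The output-side check can be sketched as follows. The code assumes that "SMP character" here means any code point above U+FFFF and that the allowed set has already been loaded from `scheme/AllowedSmpChars.txt`, whose format is not shown in this thread. The function name and the sample data are made up for illustration and do not mirror `scripts/test_dictionary.py`.

```python
def disallowed_output_chars(entries, allowed):
    """Return (key, value, char) for each non-BMP char in a value not in `allowed`.

    Keys (the input side) are deliberately not inspected.
    """
    problems = []
    for key, values in entries:
        for value in values:
            for ch in value:
                if ord(ch) > 0xFFFF and ch not in allowed:
                    problems.append((key, value, ch))
    return problems


# Made-up entries using arbitrary code points above U+FFFF.
entries = [
    ("\U00020000", ["a"]),    # non-BMP key: ignored, only values are checked
    ("b", ["\U00020001"]),    # non-BMP value that is in the allowed set
    ("c", ["x\U00020002"]),   # non-BMP value that is not allowed
]
allowed = {"\U00020001"}
problems = disallowed_output_chars(entries, allowed)
print(len(problems))
```

Restricting the scan to values keeps dictionary keys free to contain any input character, which is the behaviour the description asks for.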