Merged
13 changes: 9 additions & 4 deletions CONTRIBUTING.md
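@@ -42,10 +42,9 @@ OpenCC is released under the [Apache License 2.0](LICENSE). Submitting a Pull Request, issue/co
   - `HKVariantsPhrases.txt` - Hong Kong variant phrase exceptions used when converting to Hong Kong character forms (e.g. `s2hk`, `t2hk`)
   - `HKVariantsRevPhrases.txt` - Hong Kong variant phrase exceptions used when converting from Hong Kong character forms (e.g. `hk2s`, `hk2t`)

-- **Japanese new and old character forms**
-  - `JPShinjitaiCharacters.txt` - Japanese Shinjitai (single characters)
-  - `JPShinjitaiPhrases.txt` - Japanese Shinjitai (phrases)
-  - `JPVariants.txt` - Japanese variant characters
+- **Japanese new and old character forms** (for exploratory research only; not recommended for production use)
+  - `JPShinjitaiCharacters.txt` - Shinjitai/Kyūjitai correspondence (single characters)
+  - `JPShinjitaiPhrases.txt` - Shinjitai to Kyūjitai (phrases; also includes a small number of wasei-kango word conversions)

 ### 2. Dictionary format specification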

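@@ -315,6 +314,12 @@ python3 scripts/add_testcase.py \
 - `t2s` - OpenCC Standard Traditional Chinese to Simplified Chinese
 - `t2tw` - OpenCC Standard Traditional Chinese to Traditional Chinese (Taiwan Standard)
 - `t2hk` - OpenCC Standard Traditional Chinese to Traditional Chinese (Hong Kong variant)

+The following modes are currently missing a large number of phrases; contributions of new phrases are welcome:
+- `s2hkp` - Simplified Chinese to Traditional Chinese (Hong Kong variant), with regional vocabulary conversion
+- `hk2sp` - Traditional Chinese (Hong Kong variant) to Simplified Chinese, with regional vocabulary conversion

+The following modes are for exploratory research only and are not recommended for production use:
+- `jp2t` - Japanese Shinjitai to Kyūjitai
+- `t2jp` - Japanese Kyūjitai to Shinjitai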

7 changes: 5 additions & 2 deletions README.md
@@ -230,14 +230,17 @@ Rules:
 * `tw2t.json` **Traditional Chinese (Taiwan Standard)** to **Traditional Chinese (OpenCC Standard)** / **台灣正體** 到 **OpenCC 標準繁體**
 * `t2hk.json` **Traditional Chinese (OpenCC Standard)** to **Traditional Chinese (Hong Kong variant)** / **OpenCC 標準繁體** 到 **香港繁體**
 * `hk2t.json` **Traditional Chinese (Hong Kong variant)** to **Traditional Chinese (OpenCC Standard)** / **香港繁體** 到 **OpenCC 標準繁體**
-* `t2jp.json` **Traditional Chinese Characters (Kyūjitai)** to **New Japanese Kanji (Shinjitai)** / **OpenCC 標準繁體(日文舊字體)** 到 **日文新字體**
-* `jp2t.json` **New Japanese Kanji (Shinjitai)** to **Traditional Chinese Characters (Kyūjitai)** / **日文新字體** 到 **OpenCC 標準繁體(日文舊字體)**

 The following configuration files are still under development; contributions of new phrases are welcome:

 * `s2hkp.json` **Simplified Chinese** to **Traditional Chinese (Hong Kong variant, with Hong Kong Phrases)** / **簡體** 到 **香港繁體(香港常用詞彙)**
 * `hk2sp.json` **Traditional Chinese (Hong Kong variant)** to **Simplified Chinese (Mainland China Phrases)** / **香港繁體** 到 **簡體(含中國大陸常用詞彙)**

+The following configuration files are for exploratory research only and are not recommended for production use:

+* `t2jp.json` **Old Japanese Kanji (Kyūjitai)** to **New Japanese Kanji (Shinjitai)** / **日文舊字體** 到 **日文新字體**
+* `jp2t.json` **New Japanese Kanji (Shinjitai)** to **Old Japanese Kanji (Kyūjitai)** / **日文新字體** 到 **日文舊字體**,並將少量日文詞組轉換爲對應中文

 #### Specifying a configuration file

 Load configuration files from a specified path via the `OPENCC_DATA_DIR` environment variable
11 changes: 5 additions & 6 deletions data/CMakeLists.txt
@@ -22,7 +22,6 @@ set(
   HKVariantsRevPhrases
   HKPhrases
   HKPhrasesRev
-  JPVariants
   JPShinjitaiCharacters
   JPShinjitaiPhrases
 )
@@ -32,7 +31,7 @@ set(
   TSCharactersExt
   TWVariantsRev
   HKVariantsRev
-  JPVariantsRev
+  JPShinjitaiCharactersRev
 )

 set(DICTS ${DICTS_RAW} ${DICTS_GENERATED})
@@ -84,12 +83,12 @@ set(
 )

 set(
-  DICT_JPVariantsRev_GENERATING_INPUT
-  ${DICT_DIR}/JPVariants.txt
+  DICT_JPShinjitaiCharactersRev_GENERATING_INPUT
+  ${DICT_DIR}/JPShinjitaiCharacters.txt
 )
 set(
-  DICT_JPVariantsRev_GENERATING_COMMAND
-  ${DICT_REVERSE_BIN} ${DICT_JPVariantsRev_GENERATING_INPUT} JPVariantsRev.txt
+  DICT_JPShinjitaiCharactersRev_GENERATING_COMMAND
+  ${DICT_REVERSE_BIN} ${DICT_JPShinjitaiCharactersRev_GENERATING_INPUT} JPShinjitaiCharactersRev.txt
 )

 foreach(DICT ${DICTS_GENERATED})
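
The generating command in this hunk runs `${DICT_REVERSE_BIN}` over `JPShinjitaiCharacters.txt` to produce `JPShinjitaiCharactersRev.txt`. The Python sketch below illustrates what such a reversal step could look like, assuming the text dictionary format is one entry per line: a key, a tab, then space-separated values. `reverse_dictionary` is a hypothetical helper written for illustration, not the project's actual tool, and the sample entries are illustrative rather than copied from the repository. The real tool may order or deduplicate its output differently.

```python
from collections import defaultdict


def reverse_dictionary(lines):
    """Swap keys and values of tab-separated dictionary entries.

    Each input line is assumed to look like "key<TAB>value1 value2 ...".
    Every value becomes a key in the output, mapped to the original keys
    that listed it, in first-seen order.
    """
    reversed_entries = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        if not line or "\t" not in line:
            continue  # skip blank or malformed lines
        key, values = line.split("\t", 1)
        for value in values.split(" "):
            if value and key not in reversed_entries[value]:
                reversed_entries[value].append(key)
    return [f"{value}\t{' '.join(keys)}" for value, keys in reversed_entries.items()]


# Illustrative Shinjitai -> Kyūjitai entries (assumed, not taken from the repo).
sample = ["国\t國", "弁\t辨 瓣 辯"]
reversed_sample = reverse_dictionary(sample)
for entry in reversed_sample:
    print(entry)
```

One Shinjitai character can list several Kyūjitai forms, so the reversed table maps each of those forms back to the same Shinjitai key. That is the direction `t2jp.json` needs.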
5 changes: 2 additions & 3 deletions data/config/jp2t.json
@@ -1,5 +1,5 @@
 {
-  "name": "New Japanese Kanji (Shinjitai) to Traditional Chinese Characters (Kyūjitai)",
+  "name": "New Japanese Kanji (Shinjitai) to Old Japanese Kanji (Kyūjitai)",
   "segmentation": {
     "type": "mmseg",
     "dict": { "type": "ocd2", "file": "JPShinjitaiPhrases.ocd2" }
@@ -10,8 +10,7 @@
       "type": "group",
       "dicts": [
         { "type": "ocd2", "file": "JPShinjitaiPhrases.ocd2" },
-        { "type": "ocd2", "file": "JPShinjitaiCharacters.ocd2" },
-        { "type": "ocd2", "file": "JPVariantsRev.ocd2" }
+        { "type": "ocd2", "file": "JPShinjitaiCharacters.ocd2" }
       ]
     }
   }
6 changes: 3 additions & 3 deletions data/config/t2jp.json
@@ -1,10 +1,10 @@
 {
-  "name": "Traditional Chinese Characters (Kyūjitai) to New Japanese Kanji (Shinjitai)",
+  "name": "Old Japanese Kanji (Kyūjitai) to New Japanese Kanji (Shinjitai)",
   "segmentation": {
     "type": "mmseg",
-    "dict": { "type": "ocd2", "file": "JPVariants.ocd2" }
+    "dict": { "type": "ocd2", "file": "JPShinjitaiCharactersRev.ocd2" }
   },
   "conversion_chain": [
-    { "dict": { "type": "ocd2", "file": "JPVariants.ocd2" } }
+    { "dict": { "type": "ocd2", "file": "JPShinjitaiCharactersRev.ocd2" } }
   ]
 }
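
As a quick sanity check on this rename, one can list the dictionary files a configuration refers to and confirm that nothing still points at `JPVariants.ocd2`. The following is a minimal Python sketch, assuming the post-change contents of `t2jp.json` shown in this diff. `referenced_dict_files` is a hypothetical helper for illustration and is not part of OpenCC.

```python
import json

# t2jp.json as it reads after this change (new-side lines of the diff).
T2JP_CONFIG = """
{
  "name": "Old Japanese Kanji (Kyūjitai) to New Japanese Kanji (Shinjitai)",
  "segmentation": {
    "type": "mmseg",
    "dict": { "type": "ocd2", "file": "JPShinjitaiCharactersRev.ocd2" }
  },
  "conversion_chain": [
    { "dict": { "type": "ocd2", "file": "JPShinjitaiCharactersRev.ocd2" } }
  ]
}
"""


def referenced_dict_files(node):
    """Recursively collect every "file" value found in a parsed config."""
    found = []
    if isinstance(node, dict):
        if "file" in node:
            found.append(node["file"])
        for value in node.values():
            found.extend(referenced_dict_files(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(referenced_dict_files(item))
    return found


config = json.loads(T2JP_CONFIG)
files = sorted(set(referenced_dict_files(config)))
print(files)
```

The same walk works on configurations that nest dictionaries inside a `group`, as `jp2t.json` does, because it descends through both objects and arrays.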
4 changes: 2 additions & 2 deletions data/dictionary/BUILD.bazel
@@ -12,7 +12,7 @@ package(default_visibility = ["//visibility:public"])
     for txt in [
         "TWVariants",
         "HKVariants",
-        "JPVariants",
+        "JPShinjitaiCharacters",
     ]
 ]

@@ -32,7 +32,7 @@ TEXT_DICTS = glob(
     "TSCharactersExt.txt",
     "TWVariantsRev.txt",
     "HKVariantsRev.txt",
-    "JPVariantsRev.txt",
+    "JPShinjitaiCharactersRev.txt",
 ]

 NON_BMP_TEXT_DICTS = [