feat(ndc): configurable input_encoding (names + code pages) for NdcContentParser #16
Merged
ResolveEncoding now accepts any encoding name (e.g. iso-8859-5) or numeric code page (e.g. 28595) in addition to the latin1/utf-8/ascii fast paths. Unknown or invalid values throw NotSupportedException with a clear message instead of silently falling back to Latin1. CodePagesEncodingProvider is auto-registered in a static constructor so non-Latin1 ISO/Windows code pages (Cyrillic via ISO-8859-5, etc.) are available without caller setup. The binary parse path coerces numeric input_encoding values via ToString, matching NdcOptions.FromDictionary. Closes #15
Closes #15
What
NdcContentParser.ResolveEncoding now accepts any encoding rather than a hardcoded set:
Names: iso-8859-5, etc. (via Encoding.GetEncoding(name))
Code pages: 28595 (via Encoding.GetEncoding(int))
Fast paths: latin1/iso-8859-1, utf-8/utf8, ascii return the framework singletons without consulting the provider.
The main case from the issue, printing Cyrillic through ISO-8859-5, works.
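The resolution order described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the class name EncodingResolverSketch, the method name Resolve, the trimming and lowercasing of the input, and the exact exception message are all assumptions made for the example.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical stand-in for NdcContentParser; only the resolution logic is sketched.
public static class EncodingResolverSketch
{
    // The runtime runs a static constructor once, before the first access to the
    // type's static members, so the provider is registered before any lookup.
    static EncodingResolverSketch()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
    }

    public static Encoding Resolve(string value)
    {
        // Normalisation is an assumption of this sketch.
        string key = value.Trim().ToLowerInvariant();

        // Fast paths: framework singletons, no provider lookup.
        switch (key)
        {
            case "latin1":
            case "iso-8859-1":
                return Encoding.Latin1;
            case "utf-8":
            case "utf8":
                return Encoding.UTF8;
            case "ascii":
                return Encoding.ASCII;
        }

        try
        {
            // All-digit input is treated as a code page, anything else as a name.
            if (int.TryParse(key, NumberStyles.None, CultureInfo.InvariantCulture, out int codePage))
            {
                return Encoding.GetEncoding(codePage);
            }
            return Encoding.GetEncoding(key);
        }
        catch (Exception ex) when (ex is ArgumentException or NotSupportedException)
        {
            // Surface the offending value instead of silently falling back to Latin1.
            throw new NotSupportedException($"Unsupported input_encoding '{value}'.", ex);
        }
    }
}
```

With this sketch, Resolve("iso-8859-5") and Resolve("28595") should both yield an encoding whose CodePage is 28595, while a value such as "no-such-encoding" surfaces as NotSupportedException.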
Error handling
An unknown or invalid encoding now throws NotSupportedException with a clear message that includes the offending value, instead of the previous silent fallback to Latin1.
Provider auto-registration
CodePagesEncodingProvider.Instance is registered in the static constructor of NdcContentParser (thread-safe, runs before the first GetEncoding call), so non-Latin1 ISO/Windows code pages are available without any caller setup. The System.Text.Encoding.CodePages package is not needed: the type ships in the shared framework on net8.0/net10.0.
Binary path
Parse(ReadOnlyMemory<byte>) coerces numeric input_encoding values via ToString() (as NdcOptions.FromDictionary does); previously a boxed int from YAML silently fell back to Latin1.
AOT
The provider is data-driven, with no reflection or dynamic code. IsAotCompatible=true is preserved, with 0 trim/AOT warnings.
Tests
NdcBinaryParserTests: resolve by name and by number (CodePage == 28595), throw on an unknown name and on an unknown code page, end-to-end decoding of Cyrillic in ISO-8859-5, end-to-end numeric boxed-int input_encoding. NDC: 22 passed. Full run: 3322 passed, 0 failed (net10.0).
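The end-to-end cases listed above (Cyrillic in ISO-8859-5, and a boxed int arriving as input_encoding) might look roughly like the following reduced sketch. It does not use the real NdcContentParser API: Iso88595DecodeSketch, CoerceOption, and Decode are hypothetical names, the fast paths and error wrapping are omitted, and using Convert.ToString with the invariant culture for the coercion is a choice made for this example.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical helper; the real binary parse path lives in NdcContentParser.
public static class Iso88595DecodeSketch
{
    // A YAML loader may hand back a boxed int (28595) rather than the string
    // "28595", so the option is turned into a string before it is resolved.
    public static string CoerceOption(object raw) =>
        Convert.ToString(raw, CultureInfo.InvariantCulture) ?? string.Empty;

    public static string Decode(byte[] payload, object inputEncoding)
    {
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string key = CoerceOption(inputEncoding);

        // All-digit values are looked up as code pages, everything else as names.
        Encoding encoding =
            int.TryParse(key, NumberStyles.None, CultureInfo.InvariantCulture, out int codePage)
                ? Encoding.GetEncoding(codePage)
                : Encoding.GetEncoding(key);

        return encoding.GetString(payload);
    }
}
```

For example, the bytes BF E0 D8 D2 D5 E2 are "Привет" in ISO-8859-5, so Decode should return that string whether inputEncoding is the boxed int 28595 or the name "iso-8859-5".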