Improve Hidden Character Finder implementation#300

Draft
Copilot wants to merge 2 commits into main from copilot/port-php-to-csharp


Copilot AI commented Mar 18, 2026

Ports and improves the Hidden Character Finder from the PHP reference implementation to C#.

References:

Changes Made

New File: HCFService.cs

Extracted all character analysis logic into a public static HCFService class, making it independently unit-testable without Blazor rendering.

Bug Fixes

  • Correct Unicode code point calculation: The old implementation summed the raw UTF-8 byte values instead of computing the code point (e.g., U+00A0 showed 354, the sum of its UTF-8 bytes 0xC2 and 0xA0, instead of 160). The new code uses (int)c[0] for BMP characters and char.ConvertToUtf32() for surrogate pairs.
  • Correct output format: Now matches the PHP reference — &#160;<br>\u00A0 and U+00A0 for multi-byte chars; 83<br>0x53 for ASCII chars.
  • Surrogate pair handling: Text2Html now uses StringInfo.SubstringByTextElements instead of string.Substring(i, 1) to correctly process characters outside the BMP (emoji, etc.).
  • Unicode escapes for supplementary characters: \uXXXX (4 hex digits) for BMP, \UXXXXXXXX (8 hex digits) for code points > U+FFFF.
  • Whitespace symbol CSS class: Changed from undefined white class to symbol class.
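
The code point and escape-format fixes above can be illustrated with a minimal sketch. The class and method names here are hypothetical and are not taken from HCFService.cs, whose actual members are not shown in this conversation. For brevity the sketch calls char.ConvertToUtf32(string, int) for both BMP characters and surrogate pairs, rather than branching on (int)c[0] as the description states.

```csharp
using System;
using System.Text;

// Hypothetical helper; names are illustrative, not the PR's actual API.
public static class CodePointSketch
{
    // Reproduces the kind of bug described above: adding up UTF-8 byte values.
    // For U+00A0 the bytes are 0xC2 and 0xA0, so the sum is 194 + 160 = 354.
    public static int SumUtf8Bytes(string textElement)
    {
        int sum = 0;
        foreach (byte b in Encoding.UTF8.GetBytes(textElement))
        {
            sum += b;
        }
        return sum;
    }

    // Returns the code point that starts at index 0.
    // Works for BMP characters and for well-formed surrogate pairs;
    // it throws ArgumentException on a lone surrogate.
    public static int CodePoint(string textElement)
    {
        return char.ConvertToUtf32(textElement, 0);
    }

    // \uXXXX (4 hex digits) for BMP code points,
    // \UXXXXXXXX (8 hex digits) for code points above U+FFFF.
    public static string UnicodeEscape(int codePoint)
    {
        return codePoint <= 0xFFFF
            ? "\\u" + codePoint.ToString("X4")
            : "\\U" + codePoint.ToString("X8");
    }
}
```

Under these assumptions, CodePointSketch.SumUtf8Bytes("\u00A0") yields 354 while CodePointSketch.CodePoint("\u00A0") yields 160, and CodePointSketch.UnicodeEscape(0x1F600) produces \U0001F600.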

Improvements

  • Real character/byte counts: Replaced the # characters, # bytes placeholder with actual computed values displayed after processing.
  • Cleaner initial state: Output section is hidden until "Show me the characters" is clicked.
  • Cleanup: Removed debug Console.WriteLine calls, empty Decode() method, commented-out code, and static example HTML section.
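
One plausible way to compute the character and byte counts is sketched below. The description does not say how the PR defines either value, so this is an assumption: "characters" are counted as text elements via StringInfo, and "bytes" as the UTF-8 encoded length. The class and method names are hypothetical.

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper; the PR's real counting logic is not shown in this conversation.
public static class CountSketch
{
    // Assumption: a "character" is one text element, so a surrogate pair
    // such as an emoji counts once rather than twice.
    public static int CountTextElements(string input)
    {
        return new StringInfo(input).LengthInTextElements;
    }

    // Assumption: "bytes" means the length of the UTF-8 encoding.
    public static int CountUtf8Bytes(string input)
    {
        return Encoding.UTF8.GetByteCount(input);
    }
}
```

For the input "a\u00A0b" this gives 3 characters and 4 bytes, since the no-break space takes two bytes in UTF-8.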

Testing

  • ✅ 10 new tests added to HCFTest.cs covering rendering, ASCII char tooltips, multi-byte Unicode, whitespace symbols, hidden characters, and char/byte counts.
  • ✅ All 33 project tests pass.
  • ✅ CodeQL analysis found no alerts.
Original prompt

This section details the original issue you should resolve

<issue_title>Hidden Character Finder</issue_title>
<issue_description>Port php to C#

Comments on the Issue (you are @copilot in this section)

@AlexHedley e029e33

@AlexHedley ![image](https://user-images.githubusercontent.com/1573469/181906890-e936456f-5cca-4b39-903a-84961a8146dd.png) ![image](https://user-images.githubusercontent.com/1573469/181907679-489f8bbb-6070-4603-9fd5-3e0f600e236d.png)

@AlexHedley Review the current implementation of Hidden Character Finder and suggest any improvements


@AlexHedley
Owner

@copilot retry please

…e pairs, add HCFService, tests

Co-authored-by: AlexHedley <1573469+AlexHedley@users.noreply.github.com>
Author

Copilot AI commented Mar 19, 2026

@copilot retry please

Done in commit dadf7d5. Here's a summary of the improvements made to the Hidden Character Finder:

Bug fixes:

  • Fixed HtmlChar to correctly compute Unicode code points — the old implementation summed raw UTF-8 byte values instead of using the actual code point (e.g., U+00A0 was showing 354 instead of 160)
  • Fixed Text2Html to use StringInfo.SubstringByTextElements to correctly handle surrogate pairs (emoji, etc.)
  • Fixed Unicode escape format: BMP chars use \uXXXX, supplementary chars (> U+FFFF) use \UXXXXXXXX
  • Fixed space/tab symbols: were using an undefined CSS class white, changed to symbol class consistent with other whitespace
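
The Text2Html fix listed above comes down to iterating text elements instead of UTF-16 code units. The sketch below contrasts the two; it is illustrative only, with hypothetical names, and is not the PR's Text2Html code.

```csharp
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helper showing the difference between the two iteration styles.
public static class TextElementSketch
{
    // Splits the input into text elements, keeping surrogate pairs intact.
    public static List<string> SplitTextElements(string input)
    {
        var elements = new List<string>();
        var info = new StringInfo(input);
        for (int i = 0; i < info.LengthInTextElements; i++)
        {
            // One text element starting at text-element index i.
            elements.Add(info.SubstringByTextElements(i, 1));
        }
        return elements;
    }

    // The older approach: one UTF-16 code unit at a time.
    // A character outside the BMP is split into two lone surrogates.
    public static List<string> SplitUtf16Units(string input)
    {
        var units = new List<string>();
        for (int i = 0; i < input.Length; i++)
        {
            units.Add(input.Substring(i, 1));
        }
        return units;
    }
}
```

For "a\U0001F600b", SplitTextElements returns three items with the emoji intact, while SplitUtf16Units returns four, two of which are unpaired surrogates. Note that a text element can also span several code points (for example a base letter plus a combining mark), so a caller that needs a single code point per item may have to handle that case separately.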

Improvements:

  • Extracted logic to HCFService.cs (public static class) — makes the logic independently unit-testable
  • Character/byte count now shows real values (replaced "# characters, # bytes" placeholder)
  • Output section is hidden until "Show me the characters" is clicked
  • Removed debug Console.WriteLine calls, commented-out code, empty Decode() method, and static demo section

Tests: Added 10 tests covering rendering, ASCII char tooltips, multi-byte Unicode, whitespace symbols, hidden chars, and char/byte counts — all 33 project tests pass.

Copilot AI changed the title from "[WIP] Port Hidden Character Finder from PHP to C#" to "Improve Hidden Character Finder implementation" on Mar 19, 2026
