@dkeegan-figma
Contributor

Summary

Cherry-picks Shopify's UTF-8 encoding fix (commit ed4ae26). Prevents JSON dump failures when test output contains invalid UTF-8 bytes, ensuring failure reports are always written successfully.

Changes

Modified Files

  • ruby/lib/minitest/queue/failure_formatter.rb - Added UTF-8 encoding with replacement

New Files

  • ruby/test/minitest/queue/failure_formatter_test.rb - Test for UTF-8 handling

The Bug

When test output contains invalid UTF-8 bytes (e.g., from binary data or malformed strings), the Ruby JSON library raises an exception because it requires strings to be valid UTF-8:

JSON.generate({output: "\x80\x81\x82"})
# => raises JSON::GeneratorError (the exact exception class can depend on the json version and the string's encoding)

This prevents the entire failure report from being written, losing valuable debugging information.
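
A minimal reproduction sketch of the failure described above, outside of the queue code. The `output` value is a made-up stand-in for captured test output; the rescue is deliberately broad because the exception class may differ between json versions.

```ruby
require 'json'

# Hypothetical captured test output with one byte that is not valid UTF-8.
output = "Test \x80 output"

puts output.valid_encoding? # false: the string is tagged UTF-8 but is malformed

error = nil
begin
  JSON.generate({ output: output })
rescue StandardError => e
  # Rescue broadly: the concrete class depends on the json gem version.
  error = e
  puts "#{e.class}: #{e.message}"
end
```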

The Fix

Encode the output as UTF-8 before JSON serialization, replacing invalid bytes with an empty string:

@output = output.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
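
A small sketch of how that call behaves in isolation. `scrub_output` is a hypothetical helper name used only for illustration; the actual change assigns the result to `@output` in the formatter.

```ruby
require 'json'

# Hypothetical helper wrapping the same encode call as the fix.
def scrub_output(output)
  output.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')
end

raw   = "Test \x80 output"
clean = scrub_output(raw)
json  = JSON.generate({ output: clean })

puts clean.inspect # "Test  output"
puts json          # {"output":"Test  output"}

# Caveat: a string tagged as binary (ASCII-8BIT) is transcoded rather than
# scrubbed, so every byte >= 0x80 is dropped, even if the bytes would have
# formed valid UTF-8.
binary = "caf\xC3\xA9".b
puts scrub_output(binary).inspect # "caf"
```

The `undef: :replace` option only matters for the transcoding case (source encoding differs from UTF-8); for strings already tagged as UTF-8, `invalid: :replace` does the work.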

Example

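# Before: JSON serialization raises on invalid UTF-8
JSON.generate({output: "\x80\x81\x82"}) # => raises JSON::GeneratorError

# After: invalid bytes are removed before serialization
"\x80\x81\x82".encode('UTF-8', invalid: :replace, undef: :replace, replace: '') # => ""
"Test \x80 output".encode('UTF-8', invalid: :replace, undef: :replace, replace: '') # => "Test  output"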

Conflict Resolution

Applied cleanly - No conflicts!

Benefits

  • Reliability: Failure reports always written, even with binary data
  • Debugging: Never lose failure information due to encoding issues
  • Robustness: Handles edge cases gracefully

Real-World Scenarios

This fixes failures when tests interact with:

  • Binary files
  • Network responses with incorrect encoding
  • External processes with corrupted output
  • Legacy systems with non-UTF-8 data

Testing

  • ✅ Ruby syntax checks pass
  • ✅ UTF-8 encoding logic verified
  • ✅ New test file included
  • ⏳ Full test suite requires Redis (run in CI)

Original Shopify PR

Risk Assessment

Risk Level: Very Low

  • Defensive fix, only activates on invalid input
  • No behavior change for valid UTF-8
  • Well-tested encoding approach
  • Small, focused change

Additional Notes

The fix uses replace: '' (empty string) to remove invalid bytes. An alternative would be replace: '?' to show where bytes were removed, but an empty string is cleaner for most use cases.
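
To make the trade-off concrete, here is a short comparison of the replacement options on the same input. The variable names are illustrative only; the third variant (omitting `replace:`) is not discussed in this PR and is shown just for reference.

```ruby
raw = "Test \x80 output"

# What this PR does: drop invalid bytes entirely.
removed = raw.encode('UTF-8', invalid: :replace, undef: :replace, replace: '')

# The alternative mentioned above: leave a visible marker.
marked = raw.encode('UTF-8', invalid: :replace, undef: :replace, replace: '?')

# With no replace: option, Ruby uses U+FFFD for Unicode target encodings.
default_marker = raw.encode('UTF-8', invalid: :replace, undef: :replace)

puts removed.inspect        # "Test  output"
puts marked.inspect         # "Test ? output"
puts default_marker.inspect # "Test \uFFFD output" (printed as the replacement character)
```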


cc: @Dkeegan

Previously, any invalid UTF-8 in test output would prevent the entire
JSON failure file from being dumped, because the Ruby JSON library
validates that its output is UTF-8.

This commit aims to fix this issue by encoding the output as UTF-8
before it gets to JSON so that the file can always be dumped.

@dkeegan-figma marked this pull request as ready for review December 12, 2025 19:18
@dkeegan-figma merged commit 293bd1e into master Dec 16, 2025
5 checks passed
4 participants