Skip to content

bound the trailing hex read in my_mb_wc_filename#5187

Open
jmestwa-coder wants to merge 1 commit into
MariaDB:mainfrom
jmestwa-coder:filename-mbwc-overread
Open

bound the trailing hex read in my_mb_wc_filename#5187
jmestwa-coder wants to merge 1 commit into
MariaDB:mainfrom
jmestwa-coder:filename-mbwc-overread

Conversation

@jmestwa-coder
Copy link
Copy Markdown
Contributor

my_mb_wc_filename decodes the 5-byte '@hhhh' escape but the length guard is off by one:

  • the s + 4 > e check only covers 4 bytes while the hex branch reads s[4]
  • a truncated @HHH whose buffer ends at s+4 over-reads one byte past e, reachable through well_formed_length/charpos on a my_charset_filename string with tight bounds
  • the existing s[3] ? guard only stops at a NUL terminator, not at the end pointer
    bound the s[4] read against e; added a strings-t case that returns ILSEQ instead of over-reading (ASAN flags the read of size 1 past a 4-byte buffer before the fix).

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes an out-of-bounds read vulnerability in my_mb_wc_filename by ensuring that the end pointer is checked before reading the 4th hex digit of a truncated escape sequence. It also adds a unit test to verify this behavior. The review comments suggest improving the unit test by returning the actual return value of my_ci_mb_wc and asserting the expected error code (MY_CS_ILSEQ) directly, which makes the test more idiomatic and direct.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +1714 to +1721
static int test_mb_wc_filename_truncated()
{
CHARSET_INFO *cs= &my_charset_filename;
uchar buf[5]= {'@', '0', '0', '0', '1'};
my_wc_t wc= 0;
int rc= my_ci_mb_wc(cs, &wc, buf, buf + 4);
return rc != MY_CS_ILSEQ;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Returning the actual return value of my_ci_mb_wc rather than a boolean mapping makes the test more direct and easier to understand. We can then assert the expected error code (MY_CS_ILSEQ) directly in the test runner.

static int test_mb_wc_filename_truncated()
{
  CHARSET_INFO *cs= &my_charset_filename;
  uchar buf[5]= {'@', '0', '0', '0', '1'};
  my_wc_t wc= 0;
  return my_ci_mb_wc(cs, &wc, buf, buf + 4);
}

Comment on lines +1757 to +1758
ok(test_mb_wc_filename_truncated() == 0,
"filename decoder does not read past the end pointer");
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Asserting the expected error code (MY_CS_ILSEQ) directly is more idiomatic and provides clearer test output if the assertion fails.

  ok(test_mb_wc_filename_truncated() == MY_CS_ILSEQ,
     "filename decoder does not read past the end pointer");

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

1 participant