Skip to content

Conversation

@fogelito
Copy link
Contributor

@fogelito fogelito commented Dec 18, 2025

Summary by CodeRabbit

  • New Features

    • Added capability to detect and sanitize non-UTF-8 characters in database operations across all adapters.
  • Bug Fixes

    • Improved error handling and reporting for character encoding issues in MariaDB and MySQL adapters.
  • Chores

    • Introduced a new Character exception type for better error classification.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 18, 2025

Warning

Rate limit exceeded

@fogelito has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 9 minutes and 25 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 78c35b8 and 4d171d3.

📒 Files selected for processing (8)
  • src/Database/Adapter.php (1 hunks)
  • src/Database/Adapter/MariaDB.php (3 hunks)
  • src/Database/Adapter/Mongo.php (1 hunks)
  • src/Database/Adapter/Pool.php (1 hunks)
  • src/Database/Adapter/Postgres.php (1 hunks)
  • src/Database/Adapter/SQLite.php (1 hunks)
  • src/Database/Database.php (0 hunks)
  • tests/e2e/Adapter/Scopes/DocumentTests.php (2 hunks)

Walkthrough

This PR introduces non-UTF-8 character handling across the database layer. It adds an abstract method to the base Adapter class, implements capability checks in all concrete adapter classes, creates a new Character exception type, and maps specific MySQL/MariaDB character encoding errors to this exception. A sanitization utility is added to the Database class, along with corresponding test coverage.

Changes

Cohort / File(s) Summary
Adapter capability interface
src/Database/Adapter.php
Added abstract method getSupportNonUtcCharacters(): bool to enforce implementations across all adapters.
Adapter implementations
src/Database/Adapter/MariaDB.php, src/Database/Adapter/Mongo.php, src/Database/Adapter/MySQL.php, src/Database/Adapter/Pool.php, src/Database/Adapter/Postgres.php, src/Database/Adapter/SQLite.php
Implemented getSupportNonUtcCharacters() in all adapters. MariaDB and MySQL return true and false respectively; Mongo, Postgres, and SQLite return false; Pool delegates via __FUNCTION__.
Character exception handling
src/Database/Exception/Character.php
New Character exception class extending base Exception for character-related database errors.
Error mapping for character encoding
src/Database/Adapter/MariaDB.php, src/Database/Adapter/MySQL.php
Enhanced processException() to map MySQL error code 1366 (invalid character) to CharacterException.
Database sanitization
src/Database/Database.php
Added public method sanitizeNonUtf8(string $string): string that returns input unchanged.
Test updates
tests/e2e/Adapter/Base.php, tests/e2e/Adapter/Scopes/DocumentTests.php
Commented out CollectionTests and CustomDocumentTypeTests in Base; added new testNonUtfChars() in DocumentTests to validate non-UTF-8 character handling when adapters support it.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

  • The changes follow a clear, repetitive pattern across adapters (capability method implementations)
  • Error mapping additions are straightforward conditional logic
  • New exception class is minimal with no complex initialization
  • Test case is straightforward validation logic

Areas for attention:

  • Verify error code 1366 mapping is correct for both MySQL and MariaDB variants
  • Confirm sanitizeNonUtf8() placeholder implementation aligns with intended future behavior
  • Ensure trait disabling in Base test class doesn't break existing test coverage expectations

Possibly related PRs

  • Feat mongo tmp #647: Both PRs modify database adapter capability methods and surface APIs across multiple adapter implementations.

Suggested reviewers

  • abnegate

Poem

🐰 Hop through databases with UTF-8 care,
Characters handled with adapter awareness,
MariaDB says yes, but others say no,
Exceptions now catch what strings overflow! 🌟

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 27.78% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Handle non utf chars' directly reflects the main change: adding support for handling non-UTF-8 characters across database adapters.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (4)
src/Database/Adapter/Pool.php (1)

646-649: Consider renaming to match the getSupportFor* convention.

The method name getSupportNonUtcCharacters() breaks the established naming pattern. All other capability checks in this class use getSupportFor* (e.g., getSupportForSchemas(), getSupportForAttributes(), getSupportForTimeouts()).

For consistency, consider renaming to getSupportForNonUtcCharacters() across the entire codebase (including the abstract definition in Adapter.php and all concrete implementations).

🔎 View suggested naming pattern
-    public function getSupportNonUtcCharacters(): bool
+    public function getSupportForNonUtcCharacters(): bool
     {
         return $this->delegate(__FUNCTION__, \func_get_args());
     }

Note: This change would also require updating the abstract method definition in src/Database/Adapter.php and all concrete adapter implementations (MariaDB, SQLite, Mongo, Postgres).

src/Database/Adapter/MySQL.php (1)

8-8: MySQL: new CharacterException mapping for error 1366 is sensible but broad

The new import and early branch in processException() cleanly surface character/encoding issues as a dedicated CharacterException, which fits the PR goal.

Be aware though that MySQL error 1366 can also be raised for non-encoding problems (e.g. “Incorrect integer value”), so this check will classify all such cases as CharacterException. If you ever need to narrow this to “incorrect string value” only, you might additionally inspect errorInfo[2] (the vendor message) before mapping.

Also applies to: 151-153

src/Database/Adapter.php (1)

1468-1473: Naming nit: getSupportNonUtcCharacters looks like a typo for UTF

The docblock talks about “non utf characters”, but the method is spelled getSupportNonUtcCharacters (“Utc” as in time). Since this is a newly introduced abstract, you may want to fix the spelling now (e.g. getSupportNonUtfCharacters or getSupportForNonUtfCharacters) across adapters before it becomes baked into the public surface.

tests/e2e/Adapter/Scopes/DocumentTests.php (1)

40-47: Use assertInstanceOf instead of instanceof + assertTrue for the CharacterException check.

Functionally fine, but PHPUnit’s dedicated assertion is clearer and more idiomatic.

Suggested diff
-        try {
-            $database->createDocument(__FUNCTION__, new Document([
-                'title' => $nonUtfString,
-            ]));
-            $this->fail('Failed to throw exception');
-        } catch (Throwable $e) {
-            $this->assertTrue($e instanceof CharacterException);
-        }
+        try {
+            $database->createDocument(__FUNCTION__, new Document([
+                'title' => $nonUtfString,
+            ]));
+            $this->fail('Failed to throw exception');
+        } catch (Throwable $e) {
+            $this->assertInstanceOf(CharacterException::class, $e);
+        }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbe23a3 and 78c35b8.

📒 Files selected for processing (11)
  • src/Database/Adapter.php (1 hunks)
  • src/Database/Adapter/MariaDB.php (3 hunks)
  • src/Database/Adapter/Mongo.php (1 hunks)
  • src/Database/Adapter/MySQL.php (2 hunks)
  • src/Database/Adapter/Pool.php (1 hunks)
  • src/Database/Adapter/Postgres.php (1 hunks)
  • src/Database/Adapter/SQLite.php (1 hunks)
  • src/Database/Database.php (1 hunks)
  • src/Database/Exception/Character.php (1 hunks)
  • tests/e2e/Adapter/Base.php (1 hunks)
  • tests/e2e/Adapter/Scopes/DocumentTests.php (2 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-16T09:37:33.531Z
Learnt from: fogelito
Repo: utopia-php/database PR: 733
File: src/Database/Adapter/MariaDB.php:1801-1806
Timestamp: 2025-10-16T09:37:33.531Z
Learning: In the MariaDB adapter (src/Database/Adapter/MariaDB.php), only duplicate `_uid` violations should throw `DuplicateException`. All other unique constraint violations, including `PRIMARY` key collisions on the internal `_id` field, should throw `UniqueException`. This is the intended design to distinguish between user-facing document duplicates and internal/user-defined unique constraint violations.

Applied to files:

  • src/Database/Adapter/MySQL.php
  • src/Database/Exception/Character.php
  • src/Database/Adapter/MariaDB.php
  • tests/e2e/Adapter/Scopes/DocumentTests.php
📚 Learning: 2025-10-03T02:04:17.803Z
Learnt from: abnegate
Repo: utopia-php/database PR: 721
File: tests/e2e/Adapter/Scopes/DocumentTests.php:6418-6439
Timestamp: 2025-10-03T02:04:17.803Z
Learning: In tests/e2e/Adapter/Scopes/DocumentTests::testSchemalessDocumentInvalidInteralAttributeValidation (PHP), when the adapter reports getSupportForAttributes() === false (schemaless), the test should not expect exceptions from createDocuments for “invalid” internal attributes; remove try/catch and ensure the test passes without exceptions, keeping at least one assertion.

Applied to files:

  • tests/e2e/Adapter/Scopes/DocumentTests.php
  • tests/e2e/Adapter/Base.php
📚 Learning: 2025-10-16T08:48:36.715Z
Learnt from: fogelito
Repo: utopia-php/database PR: 739
File: src/Database/Adapter/Postgres.php:154-158
Timestamp: 2025-10-16T08:48:36.715Z
Learning: For the utopia-php/database repository, no migration scripts are needed for the collation change from utf8_ci to utf8_ci_ai in the Postgres adapter because there is no existing production data.

Applied to files:

  • src/Database/Adapter/Postgres.php
📚 Learning: 2025-10-29T12:27:57.071Z
Learnt from: ArnabChatterjee20k
Repo: utopia-php/database PR: 747
File: src/Database/Adapter/Mongo.php:1449-1453
Timestamp: 2025-10-29T12:27:57.071Z
Learning: In src/Database/Adapter/Mongo.php, when getSupportForAttributes() returns false (schemaless mode), the updateDocument method intentionally uses a raw document without $set operator for replacement-style updates, as confirmed by the repository maintainer ArnabChatterjee20k.

Applied to files:

  • src/Database/Adapter/Mongo.php
🧬 Code graph analysis (7)
src/Database/Adapter/MySQL.php (2)
src/Database/Exception.php (1)
  • Exception (7-21)
src/Database/Exception/Character.php (1)
  • Character (7-9)
src/Database/Adapter/SQLite.php (5)
src/Database/Adapter.php (1)
  • getSupportNonUtcCharacters (1473-1473)
src/Database/Adapter/MariaDB.php (1)
  • getSupportNonUtcCharacters (2239-2242)
src/Database/Adapter/Mongo.php (1)
  • getSupportNonUtcCharacters (3225-3228)
src/Database/Adapter/Pool.php (1)
  • getSupportNonUtcCharacters (646-649)
src/Database/Adapter/Postgres.php (1)
  • getSupportNonUtcCharacters (2742-2745)
src/Database/Exception/Character.php (1)
src/Database/Exception.php (1)
  • Exception (7-21)
src/Database/Adapter/Postgres.php (5)
src/Database/Adapter.php (1)
  • getSupportNonUtcCharacters (1473-1473)
src/Database/Adapter/SQLite.php (1)
  • getSupportNonUtcCharacters (1880-1883)
src/Database/Adapter/MariaDB.php (1)
  • getSupportNonUtcCharacters (2239-2242)
src/Database/Adapter/Mongo.php (1)
  • getSupportNonUtcCharacters (3225-3228)
src/Database/Adapter/Pool.php (1)
  • getSupportNonUtcCharacters (646-649)
src/Database/Adapter/Mongo.php (5)
src/Database/Adapter.php (1)
  • getSupportNonUtcCharacters (1473-1473)
src/Database/Adapter/SQLite.php (1)
  • getSupportNonUtcCharacters (1880-1883)
src/Database/Adapter/MariaDB.php (1)
  • getSupportNonUtcCharacters (2239-2242)
src/Database/Adapter/Pool.php (1)
  • getSupportNonUtcCharacters (646-649)
src/Database/Adapter/Postgres.php (1)
  • getSupportNonUtcCharacters (2742-2745)
src/Database/Adapter.php (5)
src/Database/Adapter/SQLite.php (1)
  • getSupportNonUtcCharacters (1880-1883)
src/Database/Adapter/MariaDB.php (1)
  • getSupportNonUtcCharacters (2239-2242)
src/Database/Adapter/Mongo.php (1)
  • getSupportNonUtcCharacters (3225-3228)
src/Database/Adapter/Pool.php (1)
  • getSupportNonUtcCharacters (646-649)
src/Database/Adapter/Postgres.php (1)
  • getSupportNonUtcCharacters (2742-2745)
src/Database/Adapter/Pool.php (6)
src/Database/Adapter.php (1)
  • getSupportNonUtcCharacters (1473-1473)
src/Database/Adapter/SQLite.php (1)
  • getSupportNonUtcCharacters (1880-1883)
src/Database/Adapter/MariaDB.php (1)
  • getSupportNonUtcCharacters (2239-2242)
src/Database/Adapter/Mongo.php (1)
  • getSupportNonUtcCharacters (3225-3228)
src/Database/Adapter/Postgres.php (1)
  • getSupportNonUtcCharacters (2742-2745)
src/Database/Mirror.php (1)
  • delegate (88-103)
🪛 GitHub Actions: CodeQL
tests/e2e/Adapter/Scopes/DocumentTests.php

[error] 50-50: PHPStan: Parameter #3 $subject of function str_replace expects array|string, string|false given.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Setup & Build Docker Image
🔇 Additional comments (5)
tests/e2e/Adapter/Base.php (1)

26-27: The review comment is based on incorrect code assumptions. Lines 26-27 of tests/e2e/Adapter/Base.php show CollectionTests and CustomDocumentTypeTests are currently enabled, not commented out. If these traits were previously disabled and have been re-enabled, verify the reason for their reactivation and ensure all tests pass. If the comment refers to an earlier state, update it to reflect the current code.

Likely an incorrect or invalid review comment.

src/Database/Adapter/Mongo.php (1)

3224-3227: Mongo: non-UTF character support flag wired correctly

Returning false here is consistent with the new abstract in Adapter and with the other non-SQL adapters; no issues from this implementation.

src/Database/Exception/Character.php (1)

1-9: CharacterException type is well-scoped and consistent

This minimal Character exception cleanly reuses the existing base Exception behavior and gives adapters a dedicated type for character-related failures; no changes needed.

src/Database/Adapter/MariaDB.php (2)

10-10: MariaDB: CharacterException mapping for 1366 integrates cleanly

Importing CharacterException and handling 22007 / 1366 first in processException() gives a clear, adapter-specific signal for invalid character/value errors without disturbing the existing timeout/duplicate/not-found mappings. This looks consistent with how the MySQL adapter is treated.

Also applies to: 1842-1845


2239-2242: MariaDB: non-UTF character support flag matches adapter behavior

Returning true from getSupportNonUtcCharacters() is consistent with the new abstract and with the fact that MariaDB now exposes dedicated CharacterException handling; no issues here.

@utopia-php utopia-php deleted a comment from coderabbitai bot Dec 18, 2025
@abnegate abnegate merged commit 414dbde into main Dec 18, 2025
18 checks passed
@abnegate abnegate deleted the handle-non-utf-chars branch December 18, 2025 12:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants