Add ClickHouse adapter and Usage analytics library #3
lohanidamodar wants to merge 90 commits into main from
Conversation
- Database adapter - ClickHouse adapter
- Removed hardcoded column definitions in the Usage class, replacing them with a dynamic schema derived from the SQL adapter.
- Introduced a new Query class for building ClickHouse queries with a fluent interface.
- Added support for advanced query operations, including find and count methods.
- Enhanced error handling and SQL-injection prevention mechanisms.
- Created a comprehensive usage guide for the ClickHouse adapter.
- Added unit tests for the Query class to ensure functionality and robustness.
- Maintained backward compatibility with existing methods while improving the overall architecture.
…metric logging with deterministic IDs
…ed tags in ClickHouse and Database adapters
…pdate tests for new behavior
feat: integrate Utopia\Query for usage metrics queries and remove leg…
Redesigned the library from scratch to support two key use cases:
1. **Events** (API analytics + metered billing):
- SummingMergeTree table with rich dimensions: metric, projectId, path,
method, status, userAgent, resource, resourceId, tags
- Cloudflare-style flexible filtering via queryEvents() with arbitrary
dimension combinations, GROUP BY, and time-bucket aggregation
- sumEvents()/countEvents() for billing calculations (bandwidth, requests,
compute hours per project/resource)
- Hourly and daily period aggregation support
2. **Gauges** (resource count snapshots):
- ReplacingMergeTree for point-in-time snapshots (projects, databases,
functions, storage counts)
- Replace-upsert semantics: latest value per (metric, resource, period, time)
- Both hourly and daily snapshot periods with historical retention
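As an illustration of the two table shapes described above, a DDL sketch follows. This is an assumption-laden illustration only: the adapter's actual column names, types, codecs, index definitions, and version column may differ.

```sql
-- Events: SummingMergeTree accumulates `value` for rows sharing the sorting key.
CREATE TABLE usage_events
(
    projectId  String,
    metric     LowCardinality(String),
    path       String,
    method     LowCardinality(String),
    status     UInt16,
    userAgent  String,
    resource   String,
    resourceId String,
    tags       String,
    timestamp  DateTime,
    value      Int64,
    INDEX idx_resource resourceId TYPE bloom_filter GRANULARITY 4
)
ENGINE = SummingMergeTree(value)
ORDER BY (projectId, metric, resource, resourceId, path, method, status, timestamp);

-- Gauges: ReplacingMergeTree keeps the latest row per sorting key, giving the
-- replace-upsert semantics per (metric, resource, period, time).
CREATE TABLE usage_gauges
(
    projectId  String,
    metric     LowCardinality(String),
    resource   String,
    period     LowCardinality(String), -- '1h' or '1d'
    time       DateTime,
    value      Int64,
    updatedAt  DateTime DEFAULT now()  -- hypothetical version column
)
ENGINE = ReplacingMergeTree(updatedAt)
ORDER BY (projectId, metric, resource, period, time);

-- Reading gauges with FINAL collapses duplicates to the latest snapshot:
SELECT metric, resource, period, time, value FROM usage_gauges FINAL;
```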
Key changes:
- New Adapter interface with events/gauges separation (replaces old increment/set)
- New ClickHouse adapter with LowCardinality columns, bloom filter indexes,
and parameterized queries for SQL injection safety
- Usage class with dual buffer (events append, gauges last-write-wins)
- Removed Database/SQL adapters and utopia-php/database dependency (ClickHouse-focused)
- Removed utopia-php/query dependency (replaced with flexible filter maps)
- 46 integration tests + 15 unit tests covering analytics, billing, gauges,
buffering, multi-tenancy, validation, and purge scenarios
https://claude.ai/code/session_01UCN4sKD3Zkdbimn3QgTHSB
…hierarchy
- Add 'inf' (infinite/lifetime) period support to events and gauges queries
- Add Adapter::resolveMetric() for template placeholder resolution
(e.g. '{databaseInternalId}.collections' → 'db123.collections')
- Add reduce() method for cascading delete deductions via negative events
- ClickHouse: handle inf period in queryEvents (no time bucketing),
queryGauges, and buildGaugeRow (epoch timestamp)
- Tests: unit tests for metric resolution, integration tests for
inf period, reduce, and Appwrite resource hierarchy scenarios
(database→collections→documents, buckets→files)
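The template-placeholder resolution described above can be sketched roughly as follows. This is a hypothetical stand-in for `Adapter::resolveMetric()`; the real signature and placeholder rules may differ.

```php
<?php
// Hypothetical sketch of template-placeholder resolution; not the library's
// actual Adapter::resolveMetric() implementation.
function resolveMetric(string $template, array $params): string
{
    return preg_replace_callback(
        '/\{(\w+)\}/',
        // Substitute known placeholders; leave unknown ones untouched.
        fn (array $m): string => (string) ($params[$m[1]] ?? $m[0]),
        $template
    );
}

$resolved = resolveMetric('{databaseInternalId}.collections', ['databaseInternalId' => 'db123']);
// $resolved is 'db123.collections'
```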
Snapshots (gauges) will handle resource counting, so we don't need the negative-event deduction pattern for cascading deletes.
Region is a frequently filtered and grouped-by dimension that belongs in a native LowCardinality(String) column rather than buried in the JSON tags field. Added to both the events and gauges tables with bloom_filter indexes, row builders, dimension constants, and tests.
- getMetrics(): multi-metric time-series with zero-filled gaps, returns
{total, data: [{value, date}]} per metric — serves console dashboards
- getMetricsTotals(): batch totals for multiple metrics — serves billing
- getBreakdown(): top-N per-resource breakdown by any dimension — serves
per-function/bucket/database drill-down views
All built on top of existing adapter methods, no ClickHouse changes needed.
When false, returns only data points that have actual values (sparse), skipping zero-value gap filling. Defaults to true for backwards compatibility.
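The zero-fill behaviour this flag toggles can be sketched with a hypothetical helper (the names and the hourly step are assumptions, not the library's actual code):

```php
<?php
// Hypothetical zero-fill helper: expand sparse [timestamp => value] rows into a
// dense series, inserting zero-value points for missing buckets. When the
// sparse option described above is enabled, this filling step would be skipped.
function zeroFill(array $sparse, int $start, int $end, int $step = 3600): array
{
    $dense = [];
    for ($t = $start; $t <= $end; $t += $step) {
        $dense[] = ['date' => $t, 'value' => $sparse[$t] ?? 0];
    }
    return $dense;
}

$rows = [0 => 5, 7200 => 3];        // data at hour 0 and hour 2 only
$series = zeroFill($rows, 0, 7200); // hour 1 is filled with value 0
```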
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.
Walkthrough: Adds a usage/metrics subsystem and related infrastructure: an abstract Adapter API, a ClickHouse adapter implementation, a Usage manager with buffering/auto-flush and multi-tenant support, and a Metric value object. Adds PHPUnit tests for adapter and Usage logic, Docker and docker-compose configs, a multi-stage Dockerfile, Composer configuration (autoload, scripts, dev deps), CI workflows (lint, CodeQL, tests), coding-standard config (pint), CONTRIBUTING/CODE_OF_CONDUCT/LICENSE/README, and a .gitignore update.

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
- Fix validatePeriod() using isset(), which returns false for null values; use array_key_exists() instead so the 'inf' period is accepted correctly.
- Include all dimension columns (region, resource, resourceId, path, method, status) in the events table ORDER BY key so SummingMergeTree preserves distinct dimension values instead of collapsing rows with different dimensions.
- Fix validatePeriod() using isset(), which returns false for null values; use array_key_exists() so the 'inf' period is accepted correctly.
- Include all dimension columns in the events ORDER BY key so SummingMergeTree preserves rows with distinct dimension values.
- Add region to the gauges ORDER BY key for ReplacingMergeTree.
- Use the raw table (no FINAL) in countEvents() to count actual inserts.
- Normalize time keys in getMetrics() zero-fill matching to handle ClickHouse DateTime vs DateTime64 format differences.
- Fix Pint linter issues (single_line_after_imports, method_argument_space).
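The `isset()` pitfall fixed here is easy to reproduce in isolation (the shape of the periods map below is an assumption based on the commit message):

```php
<?php
// 'inf' maps to null (no time bucketing), which is exactly the case isset()
// gets wrong: isset() returns false for existing keys whose value is null.
$periods = ['1h' => 'toStartOfHour', '1d' => 'toStartOfDay', 'inf' => null];

$viaIsset  = isset($periods['inf']);            // false: the null value looks missing
$viaExists = array_key_exists('inf', $periods); // true: the key really exists
```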
Actionable comments posted: 13
🧹 Nitpick comments (7)
README.md (1)

**6-8: Minor grammar suggestion: use hyphenated compound adjectives.** Consider using "dependency-free" (hyphenated) when used as a compound adjective before a noun.

✏️ Suggested fix

```diff
-Although this library is part of the [Utopia Framework](https://github.com/utopia-php/framework) project it is dependency free and can be used as standalone with any other PHP project or framework.
+Although this library is part of the [Utopia Framework](https://github.com/utopia-php/framework) project it is dependency-free and can be used as standalone with any other PHP project or framework.
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `README.md` around lines 6-8, update the text to use the hyphenated compound adjective "dependency-free" where the phrase "dependency free" appears (in the sentence starting "Although this library is part of the Utopia Framework..."), so it reads "dependency-free and can be used as standalone..." to follow compound-adjective grammar conventions.

CONTRIBUTING.md (1)
**9-11: Consider linking to the local Code of Conduct.** The link points to the `appwrite/appwrite` repository's Code of Conduct, but this PR adds a `CODE_OF_CONDUCT.md` to this repository. Consider linking locally for consistency.

✏️ Suggested fix

```diff
-Help us keep Utopia-php open and inclusive. Please read and follow our [Code of Conduct](https://github.com/appwrite/appwrite/blob/master/CODE_OF_CONDUCT.md).
+Help us keep Utopia-php open and inclusive. Please read and follow our [Code of Conduct](CODE_OF_CONDUCT.md).
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `CONTRIBUTING.md` around lines 9-11, update the external Code of Conduct link to point to the local repository's CODE_OF_CONDUCT.md file instead of the appwrite/appwrite URL; locate the heading "## Code of Conduct" and replace the hyperlink target so it references the local CODE_OF_CONDUCT.md, ensuring the doc links to this repo's file.

.github/workflows/codeql-analysis.yml (1)
**10-15: Consider updating `actions/checkout` to v4 and removing the fragile `HEAD^2` checkout.** The `actions/checkout@v3` action is outdated. Additionally, `git checkout HEAD^2` assumes a merge-commit structure and will fail on non-merge commits or when running on the default branch.

♻️ Suggested improvement

```diff
 - name: Checkout repository
-  uses: actions/checkout@v3
+  uses: actions/checkout@v4
   with:
-    fetch-depth: 2
-
-- run: git checkout HEAD^2
+    ref: ${{ github.event.pull_request.head.sha }}
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `.github/workflows/codeql-analysis.yml` around lines 10-15, replace the fragile checkout steps: update the actions/checkout usage from actions/checkout@v3 to actions/checkout@v4 and remove the explicit `git checkout HEAD^2` command; instead configure the checkout step to fetch the correct ref (for example set `fetch-depth: 0` and/or `ref: ${{ github.sha }}`, or use the pull_request context when appropriate) so the workflow no longer depends on a merge-commit HEAD^2 assumption.

Dockerfile (2)
**1: Update `composer:2.0` to a more recent version.** `composer:2.0` is from late 2020 and is significantly outdated. Consider using `composer:2` (latest 2.x) or a specific recent version like `composer:2.8` for better compatibility, security patches, and performance improvements.

Proposed fix

```diff
-FROM composer:2.0 as step0
+FROM composer:2 as step0
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In the `Dockerfile` at line 1, replace the outdated base image reference `FROM composer:2.0` with a newer Composer 2.x tag (e.g., `composer:2` for the latest 2.x, or a specific recent patch like `composer:2.8`), then rebuild and test the image to confirm compatibility.
**11-24: Container runs as root.** Static analysis flags that no non-root USER is specified. While this is acceptable for a development/test container (which this appears to be, given the `tail -f /dev/null` command), consider adding a non-root user if this image might be used in production-like environments.

Optional: add a non-root user

```diff
 FROM php:8.3.3-cli-alpine3.19 as final
 LABEL maintainer="team@appwrite.io"
+RUN adduser -D -u 1000 appuser
 WORKDIR /code
 COPY --from=step0 /src/vendor /code/vendor
 # Add Source Code
 COPY ./tests /code/tests
 COPY ./src /code/src
 COPY ./phpunit.xml /code/phpunit.xml
+USER appuser
 CMD [ "tail", "-f", "/dev/null" ]
```

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In the `Dockerfile` around lines 11-24, the container currently runs as root; create and switch to a non-root user by adding a system group/user with Alpine-compatible commands, ensure ownership of the copied directories (/code, /code/vendor, /code/src, /code/tests) is changed to that user, and add a final USER directive so the container process (CMD) runs as the non-root user.

tests/Usage/MetricTest.php (1)
**29-30: Consider adding test coverage for `getValue` edge cases.** The tests verify integer value retrieval, but consider adding tests for:

- Float values: `['value' => 42.5]`
- Numeric strings: `['value' => '42']` and `['value' => '42.5']`
- Non-numeric strings: `['value' => 'abc']` (should return the default)

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `tests/Usage/MetricTest.php` around lines 29-30, add unit tests covering getValue edge cases: assert that a Metric constructed with `['value' => 42.5]` returns 42.5, with `['value' => '42']` returns 42, with `['value' => '42.5']` returns 42.5, and with `['value' => 'abc']` returns the default value; use the existing pattern (instantiate Metric and call getValue) and assert the expected results with assertEquals so numeric strings are handled and non-numeric strings fall back to the default.

src/Usage/Metric.php (1)
**19-22: Add documentation clarifying that `getId()` always returns an empty string for ClickHouse adapter results.** The ClickHouse adapter's `parseRows` method (lines 660-700) and all SELECT queries do not populate `id` or `$id` fields. Since ClickHouse is the only adapter implementation, `getId()` will always return an empty string. Either document this behavior with a brief comment explaining that it is reserved for potential future use, or remove the method if IDs are not required.

🤖 Prompt for AI Agents: Verify each finding against the current code and only fix it if needed. In `src/Usage/Metric.php` around lines 19-22, add a PHPDoc comment above Metric::getId() explaining that the current ClickHouse adapter's parseRows implementation and its SELECT queries do not populate 'id' or '$id', so getId() always returns an empty string; state that the method is kept for potential future adapters or legacy support and leave its signature (public function getId(): string) unchanged. Mention the ClickHouse adapter's parseRows method as the source of this behavior so readers know why IDs are not present, and keep the behavior intact rather than removing the method.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/codeql-analysis.yml:
- Around line 17-20: The workflow step named "Run CodeQL" calls `composer
check`, which is not defined in composer.json (only `lint`, `format`, and `test`
exist); update that step to call one of the existing scripts (e.g., replace
`composer check` with `composer lint` or `composer test`) so the CI command
matches a defined composer script and the job no longer fails.
In @.github/workflows/linter.yml:
- Line 11: Update the GitHub Actions checkout action from actions/checkout@v3 to
actions/checkout@v4 by replacing the reference "actions/checkout@v3" with
"actions/checkout@v4" in the workflow file so the step using the checkout action
uses the current stable release; ensure no other workflows still reference the
v3 tag.
- Line 15: Remove the problematic git checkout invocation by deleting the step
that runs "git checkout HEAD^2" from the workflow; the actions/checkout step
already checks out the correct PR commit for pull_request events, so remove the
run: git checkout HEAD^2 command to avoid failures on regular commits, squash
merges, and rebase merges.
In @.github/workflows/tests.yml:
- Around line 5-6: The job ID "lint" is inconsistent with its display name
"Tests"; update one to match the other for clarity: either rename the job ID
from "lint" to "tests" or change the name from "Tests" to "Lint" so the job
identifier and display name align (refer to the job block using the ID "lint"
and its name "Tests" in the workflow).
- Around line 11-15: Update the workflow to use the newer checkout action and
remove the unsafe manual ref switch: replace the usage reference
actions/checkout@v3 with actions/checkout@v4 and delete the git checkout HEAD^2
run step so the job does not attempt to check out a parent commit (which fails
for non-merge commits); ensure the workflow relies on the checkout action's
options (e.g., fetch-depth) instead of the explicit git checkout command.
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 683-685: queryEvents() currently coerces every aggregate result to
an integer in ClickHouse.php (the block that sets $row['value'] = (int)
$row['value']), which truncates fractional aggregates like avg; change this to
preserve decimals by either removing the int cast or casting to float (e.g. use
(float) $row['value'] or leave numeric strings as-is after an is_numeric check)
so aggregate: 'avg' returns non-truncated values; update the value handling code
in the ClickHouse.php method that processes query results (the block referencing
$row['value']) accordingly.
- Around line 593-596: The loop over $filters currently skips unknown keys,
widening queries and risking destructive operations (see purgeTable); update the
validation in the foreach that iterates $filters in ClickHouse::purgeTable (and
any filter-handling methods) to reject unknown filter keys instead of
continuing: check each $key against $validDimensions and throw an
InvalidArgumentException (or similar domain-specific exception) with a clear
message referencing the invalid filter key when it is not in $validDimensions,
mirroring how invalid groupBy dimensions are handled so callers fail fast on
typos or unexpected filters.
- Around line 871-875: The validatePeriod method currently uses
isset(self::PERIODS[$period]) which incorrectly rejects keys set to null (like
'inf'); change the existence check to use array_key_exists($period,
self::PERIODS) inside validatePeriod so keys with null values are accepted,
keeping the same InvalidArgumentException throw and message when the key truly
does not exist; locate validatePeriod and the PERIODS constant to make this
replacement.
- Around line 130-158: The current SummingMergeTree ORDER BY key (variable
$orderKey used in the CREATE TABLE within the query() call for eventsTable() )
omits several dimension columns (region, path, method, status, userAgent,
resource, resourceId, tags), causing distinct series to be collapsed during
merges; fix by including all queryable dimension columns in the ORDER BY key
(i.e. extend $orderKey for events to include region, path, method, status,
userAgent, resource, resourceId, tags — and likewise extend the ORDER BY for the
gauges table creation around lines 161-185 to include region and tags) so merges
preserve distinct rows per-dimension rather than collapsing them. Ensure the
sharedTables branch that builds $orderKey (and the CREATE TABLE calls that
reference $orderKey) is updated consistently.
In `@src/Usage/Metric.php`:
- Around line 29-36: The getValue method currently casts numeric strings to int
causing loss of decimals; in Metric::getValue capture $v (from $this['value'] ??
$default), and when is_numeric($v) coerce it to the proper numeric type instead
of forcing (int) — e.g., use the unary plus or cast to float (e.g., +$v or
(float)$v) so "42.5" becomes 42.5; ensure the return type still matches the
declared int|float and adjust any subsequent checks if needed.
In `@src/Usage/Usage.php`:
- Around line 476-490: These setter methods (setNamespace, setTenant,
setSharedTables) mutate the adapter context immediately while any buffered rows
still reference the previous context; before mutating the adapter you must
either flush buffered data or snapshot the current context into those buffered
rows so they remain tied to the original namespace/tenant/sharedTables. Update
setNamespace, setTenant, and setSharedTables in Usage to call flush() (or
perform a snapshot of the current context into the buffer) before delegating to
$this->adapter->setNamespace(...), $this->adapter->setTenant(...), and
$this->adapter->setSharedTables(...), respectively, so buffered rows are written
under the correct context.
- Around line 144-149: The current collectGauge method builds a dedupe key that
collapses distinct gauge snapshots by only using metric, resource and projectId;
update collectGauge (method name) to include all identity fields used for
uniqueness — e.g. $gauge['period'], $gauge['resourceId'], $gauge['region'],
$gauge['time'] and tenant (tenant or tenantId) — when composing the key so
hourly vs daily and different regions/resourceIds/tenants don't overwrite each
other; construct a deterministic composite key (for example by concatenating or
JSON-encoding the selected fields in a fixed order) and continue storing to
$this->gaugeBuffer[$key] and incrementing $this->bufferCount.
- Around line 409-424: The current flush logic clears both $this->eventBuffer
and $this->gaugeBuffer unconditionally even when $this->adapter->addEvents(...)
or $this->adapter->setGauges(...) returns false; change it so each buffer is
cleared only if its corresponding adapter call returned true (i.e. if
$eventsResult from addEvents is true then set $this->eventBuffer = [] and
decrement buffer counts accordingly, and if $gaugesResult from setGauges is true
then set $this->gaugeBuffer = []), update $this->bufferCount to reflect any
remaining items, and only update $this->lastFlushTime if at least one write
succeeded; use the existing symbols ($this->eventBuffer, $this->gaugeBuffer,
$this->adapter->addEvents, $this->adapter->setGauges, $this->bufferCount,
$this->lastFlushTime) to locate and apply the changes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 509c6c5b-bd08-4efa-8748-68efeba3c640
⛔ Files ignored due to path filters (1)

- `composer.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (20)

- .github/workflows/codeql-analysis.yml
- .github/workflows/linter.yml
- .github/workflows/tests.yml
- .gitignore
- CODE_OF_CONDUCT.md
- CONTRIBUTING.md
- Dockerfile
- LICENSE.md
- README.md
- composer.json
- docker-compose.yml
- phpunit.xml
- pint.json
- src/Usage/Adapter.php
- src/Usage/Adapter/ClickHouse.php
- src/Usage/Metric.php
- src/Usage/Usage.php
- tests/Usage/Adapter/ClickHouseTest.php
- tests/Usage/MetricTest.php
- tests/Usage/UsageUnitTest.php
```php
$result = true;

if (!empty($this->eventBuffer)) {
    $result = $this->adapter->addEvents($this->eventBuffer);
}

if ($result && !empty($this->gaugeBuffer)) {
    $result = $this->adapter->setGauges(array_values($this->gaugeBuffer));
}

$this->eventBuffer = [];
$this->gaugeBuffer = [];
$this->bufferCount = 0;
$this->lastFlushTime = microtime(true);

return $result;
```
Don't clear the buffers when a write returned false.
The adapter contract is boolean, but Line 419 clears both buffers unconditionally. If addEvents() or setGauges() returns false, you silently drop pending usage data; if events succeed and gauges fail, the gauge batch is lost.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Usage/Usage.php` around lines 409 - 424, The current flush logic clears
both $this->eventBuffer and $this->gaugeBuffer unconditionally even when
$this->adapter->addEvents(...) or $this->adapter->setGauges(...) returns false;
change it so each buffer is cleared only if its corresponding adapter call
returned true (i.e. if $eventsResult from addEvents is true then set
$this->eventBuffer = [] and decrement buffer counts accordingly, and if
$gaugesResult from setGauges is true then set $this->gaugeBuffer = []), update
$this->bufferCount to reflect any remaining items, and only update
$this->lastFlushTime if at least one write succeeded; use the existing symbols
($this->eventBuffer, $this->gaugeBuffer, $this->adapter->addEvents,
$this->adapter->setGauges, $this->bufferCount, $this->lastFlushTime) to locate
and apply the changes.
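A minimal sketch of the conditional clearing this comment asks for, under stated assumptions: the adapter below is a hypothetical stand-in, and the real `Usage::flush()` also tracks `bufferCount` and `lastFlushTime`.

```php
<?php
// Sketch: clear each buffer only when its corresponding write succeeded, so a
// failed batch stays queued for the next flush instead of being dropped.
function flushBuffers(object $adapter, array &$eventBuffer, array &$gaugeBuffer): bool
{
    $eventsOk = empty($eventBuffer) || $adapter->addEvents($eventBuffer);
    $gaugesOk = empty($gaugeBuffer) || $adapter->setGauges(array_values($gaugeBuffer));

    if ($eventsOk) {
        $eventBuffer = [];
    }
    if ($gaugesOk) {
        $gaugeBuffer = [];
    }

    return $eventsOk && $gaugesOk;
}

// Stand-in adapter where events succeed but gauges fail:
$adapter = new class {
    public function addEvents(array $rows): bool { return true; }
    public function setGauges(array $rows): bool { return false; }
};

$events = [['metric' => 'requests', 'value' => 1]];
$gauges = ['key' => ['metric' => 'databases', 'value' => 7]];
$ok = flushBuffers($adapter, $events, $gauges);
// $events is now empty; $gauges is retained for retry; $ok is false
```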
Actionable comments posted: 1
♻️ Duplicate comments (3)
src/Usage/Adapter/ClickHouse.php (3)
**597-600: ⚠️ Potential issue | 🔴 Critical.** Reject unknown filters instead of silently dropping them.

Line 598 currently `continue`s. A typo can widen the query and, in `purgeTable()`, degrade into `WHERE 1=1`.

Suggested fix

```diff
-if (!in_array($key, $validDimensions, true)) {
-    continue;
-}
+if (!in_array($key, $validDimensions, true)) {
+    throw new \InvalidArgumentException("Invalid {$type} filter '{$key}'");
+}
```
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 597 - 600, The loop over $filters should reject unknown filter keys instead of silently continuing; update the foreach that checks $key against $validDimensions to collect any invalid keys (using the existing $filters and $validDimensions variables) and throw an InvalidArgumentException (or return a clear error) listing the invalid keys so callers (including purgeTable()) cannot accidentally widen the query to WHERE 1=1; ensure the validation happens before building the WHERE clause so purgeTable() and other callers fail fast on typos.
**134-136: ⚠️ Potential issue | 🔴 Critical.** Include all queryable dimensions in the MergeTree ORDER BY keys.

Line 135 and Line 167 still omit dimensions that are queryable (`userAgent` and `tags` for events; `tags` for gauges). With Summing/ReplacingMergeTree, that can collapse distinct series during merges.

Suggested fix

```diff
-$orderKey = $this->sharedTables
-    ? '(tenant, projectId, metric, region, resource, resourceId, path, method, status, timestamp)'
-    : '(projectId, metric, region, resource, resourceId, path, method, status, timestamp)';
+$orderKey = $this->sharedTables
+    ? '(tenant, projectId, metric, region, resource, resourceId, path, method, status, userAgent, tags, timestamp)'
+    : '(projectId, metric, region, resource, resourceId, path, method, status, userAgent, tags, timestamp)';
@@
-$gaugeOrderKey = $this->sharedTables
-    ? '(tenant, projectId, metric, region, resource, resourceId, period, time)'
-    : '(projectId, metric, region, resource, resourceId, period, time)';
+$gaugeOrderKey = $this->sharedTables
+    ? '(tenant, projectId, metric, region, resource, resourceId, tags, period, time)'
+    : '(projectId, metric, region, resource, resourceId, tags, period, time)';
```

Also applies to: 166-168
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 134 - 136, The ORDER BY key ($orderKey) for MergeTree tables is missing queryable dimensions and can collapse distinct series; update the ORDER BY definitions in ClickHouse.php to include the queryable fields: add userAgent and tags to the events $orderKey (both shared and non-shared variants) and add tags to the gauges ORDER BY key used for Summing/ReplacingMergeTree so that all queryable dimensions are part of the sort key (ensure you update both tenant-aware and non-tenant variants where $orderKey and the gauges order key are defined).
**687-689: ⚠️ Potential issue | 🟠 Major.** Preserve fractional aggregate values in `parseRows()`.

Line 688 forces `value` to `int`, truncating results for `avg` (e.g., 1.75 → 1).

Suggested fix

```diff
-if (isset($row['value'])) {
-    $row['value'] = (int) $row['value'];
-}
+if (isset($row['value']) && is_numeric($row['value'])) {
+    $row['value'] += 0;
+}
```
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 687 - 689, In parseRows(), the code currently forces $row['value'] to an int which truncates fractional aggregates like AVG; update the logic to preserve fractional values by checking numeric form and casting appropriately: if isset($row['value']) and is_numeric($row['value']), cast to float when the value contains a decimal point or exponent (or when (float)$value != (int)$value), otherwise cast to int for whole numbers; replace the unconditional (int) $row['value'] with this conditional numeric cast so AVG results remain fractional while counts remain integers.
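The conditional numeric cast suggested here can be sketched as a small helper. This is a hypothetical illustration, not the adapter's actual code:

```php
<?php
// Coerce numeric strings to int or float without truncating decimals; PHP's
// `+ 0` arithmetic picks int for whole numbers and float otherwise.
function coerceNumeric(mixed $value): mixed
{
    return is_numeric($value) ? $value + 0 : $value;
}

$count = coerceNumeric('42');   // int 42: counts stay integers
$avg   = coerceNumeric('1.75'); // float 1.75: averages keep their fraction
```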
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 604-611: When building IN clauses for array filter values, skip
adding a condition when the array is empty to avoid producing invalid SQL like
"column IN ()"; in the block that iterates over is_array($value) (referencing
$value, $paramCounter, $params, $conditions, $escaped), detect empty($value) and
either continue/omit pushing a condition or add a safe always-false condition
(e.g., "0 = 1") depending on desired semantics, ensuring you do not generate an
empty IN () and that $paramCounter and $params are only updated for non-empty
arrays.
---
Duplicate comments:
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 597-600: The loop over $filters should reject unknown filter keys
instead of silently continuing; update the foreach that checks $key against
$validDimensions to collect any invalid keys (using the existing $filters and
$validDimensions variables) and throw an InvalidArgumentException (or return a
clear error) listing the invalid keys so callers (including purgeTable()) cannot
accidentally widen the query to WHERE 1=1; ensure the validation happens before
building the WHERE clause so purgeTable() and other callers fail fast on typos.
- Around line 134-136: The ORDER BY key ($orderKey) for MergeTree tables is
missing queryable dimensions and can collapse distinct series; update the ORDER
BY definitions in ClickHouse.php to include the queryable fields: add userAgent
and tags to the events $orderKey (both shared and non-shared variants) and add
tags to the gauges ORDER BY key used for Summing/ReplacingMergeTree so that all
queryable dimensions are part of the sort key (ensure you update both
tenant-aware and non-tenant variants where $orderKey and the gauges order key
are defined).
- Around line 687-689: In parseRows(), the code currently forces $row['value']
to an int which truncates fractional aggregates like AVG; update the logic to
preserve fractional values by checking numeric form and casting appropriately:
if isset($row['value']) and is_numeric($row['value']), cast to float when the
value contains a decimal point or exponent (or when (float)$value !=
(int)$value), otherwise cast to int for whole numbers; replace the unconditional
(int) $row['value'] with this conditional numeric cast so AVG results remain
fractional while counts remain integers.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 89ae55e6-6108-4dbc-9712-4241bda0c437
📒 Files selected for processing (2)
- src/Usage/Adapter/ClickHouse.php
- src/Usage/Usage.php
🚧 Files skipped from review as they are similar to previous changes (1)
- src/Usage/Usage.php
- Add millisecond offsets to events without explicit timestamps in addEvents() so each row gets a unique ORDER BY key and is not collapsed by SummingMergeTree
- Add ceilDateTime() for end-date range queries: rounds up to .999ms when no sub-second precision is given, so events within the same second are not excluded by the upper bound
- Normalize time keys in getMetrics() to handle ClickHouse DateTime vs DateTime64 format differences in zero-fill bucket matching
https://claude.ai/code/session_01UCN4sKD3Zkdbimn3QgTHSB
Actionable comments posted: 1
♻️ Duplicate comments (4)
src/Usage/Adapter/ClickHouse.php (4)
609-612: ⚠️ Potential issue | 🔴 Critical
Fail fast on unknown filter keys instead of silently widening queries.
Silently `continue`-ing on invalid filters makes read results broader, and in purge flows can turn typos into destructive deletes.
Suggested fix
  foreach ($filters as $key => $value) {
      if (!in_array($key, $validDimensions, true)) {
-         continue;
+         throw new \InvalidArgumentException("Invalid {$type} filter '{$key}'");
      }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 609 - 612, The loop over $filters currently skips unknown keys which widens queries; instead validate and fail fast by checking each $key against $validDimensions and throwing a clear exception (e.g. InvalidArgumentException) when an invalid key is encountered; update the foreach block that iterates $filters (the code referencing $filters and $validDimensions in ClickHouse.php) to raise an error with the offending $key so callers (purge/read flows) cannot proceed with a silently broadened query.
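The fail-fast validation described above can be sketched as a standalone helper (the name `assertValidFilters` is hypothetical, not the adapter's actual method):

```php
<?php
// Reject unknown filter keys up front so a typo cannot silently widen a
// query (or a purge) to an unconstrained WHERE clause.
function assertValidFilters(array $filters, array $validDimensions): void
{
    // Keys present in $filters but absent from the allow-list.
    $invalid = array_diff(array_keys($filters), $validDimensions);
    if ($invalid !== []) {
        throw new InvalidArgumentException(
            'Invalid filter keys: ' . implode(', ', $invalid)
        );
    }
}
```

Running this before building the WHERE clause means purgeTable() and other callers fail on the first typo rather than matching every row.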
134-136: ⚠️ Potential issue | 🔴 Critical
Include all queryable dimensions in MergeTree sort keys to prevent unintended row collapse.
The events ORDER BY still omits `userAgent` and `tags`, and the gauges ORDER BY omits `tags`. For SummingMergeTree/ReplacingMergeTree, that can merge or replace logically distinct rows that differ only on those dimensions.
Suggested fix
- $orderKey = $this->sharedTables
-     ? '(tenant, projectId, metric, region, resource, resourceId, path, method, status, timestamp)'
-     : '(projectId, metric, region, resource, resourceId, path, method, status, timestamp)';
+ $orderKey = $this->sharedTables
+     ? '(tenant, projectId, metric, region, resource, resourceId, path, method, status, userAgent, tags, timestamp)'
+     : '(projectId, metric, region, resource, resourceId, path, method, status, userAgent, tags, timestamp)';
- $gaugeOrderKey = $this->sharedTables
-     ? '(tenant, projectId, metric, region, resource, resourceId, period, time)'
-     : '(projectId, metric, region, resource, resourceId, period, time)';
+ $gaugeOrderKey = $this->sharedTables
+     ? '(tenant, projectId, metric, region, resource, resourceId, tags, period, time)'
+     : '(projectId, metric, region, resource, resourceId, tags, period, time)';
ClickHouse official docs: In SummingMergeTree and ReplacingMergeTree, is merge/dedup identity defined by ORDER BY (sorting key), and can rows differing only in non-key columns be collapsed/replaced?
Also applies to: 166-168
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 134 - 136, The ORDER BY key built in the $orderKey assignment and the MergeTree definitions for the events and gauges tables are missing queryable dimensions (events: userAgent and tags; gauges: tags), which can cause SummingMergeTree/ReplacingMergeTree to merge distinct rows that differ only by those columns; update the $orderKey construction (respecting the $this->sharedTables branch) and the ORDER BY clauses used for the events and gauges table definitions to include userAgent and tags so the sorting/merge identity includes all queryable dimensions (refer to the $orderKey variable, events table and gauges table / SummingMergeTree or ReplacingMergeTree usages) and ensure both branches (sharedTables true/false) include those fields.
142-143: ⚠️ Potential issue | 🟠 Major
The `int|float` adapter contract is violated by int-only storage and coercion.
Values are forced to `Int64` and repeatedly cast to `int`, which drops fractional usage and truncates aggregate results (notably `avg`).
Suggested fix
- value Int64,
+ value Float64,
...
- value Int64,
+ value Float64,
...
- 'value' => (int) ($event['value'] ?? 0),
+ 'value' => (float) ($event['value'] ?? 0),
...
- 'value' => (int) ($gauge['value'] ?? 0),
+ 'value' => (float) ($gauge['value'] ?? 0),
...
- return (int) ($json['data'][0]['total'] ?? 0);
+ $total = $json['data'][0]['total'] ?? 0;
+ return is_numeric($total) ? $total + 0 : 0;
...
- return (int) ($json['data'][0]['value'] ?? 0);
+ $value = $json['data'][0]['value'] ?? 0;
+ return is_numeric($value) ? $value + 0 : 0;
...
- if (isset($row['value'])) {
-     $row['value'] = (int) $row['value'];
+ if (isset($row['value']) && is_numeric($row['value'])) {
+     $row['value'] += 0;
  }
#!/bin/bash
# Verify declared contract vs int-only coercion in ClickHouse adapter paths.
rg -n "int\|float" src/Usage/Adapter.php -C2
rg -n "value Int64|\\(int\\) \\(\\$event\\['value'\\]|\\(int\\) \\(\\$gauge\\['value'\\]|\\(int\\) \\(\\$json\\['data'\\]\\[0\\]\\['total'\\]|\\(int\\) \\(\\$json\\['data'\\]\\[0\\]\\['value'\\]|\\$row\\['value'\\] = \\(int\\)" src/Usage/Adapter/ClickHouse.php
Also applies to: 175-176, 539-540, 578-579, 363-364, 478-479, 699-701
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 142 - 143, The ClickHouse adapter violates the declared int|float contract by storing value as Int64 and coercing inputs with (int), truncating floats; update the ClickHouse schema to use Float64 for the value column instead of Int64 (change lines around "value Int64" to "value Float64") and remove all integer casts in ClickHouse.php (replace occurrences like (int) ($event['value']), (int) ($gauge['value']), (int) ($json['data'][0]['value']), and assignments like $row['value'] = (int) with safe float-preserving conversions such as (float) or floatval), ensuring functions/methods that build rows/insert payloads in ClickHouse.php preserve fractional values so aggregates like avg() are correct.
616-624: ⚠️ Potential issue | 🟠 Major
Guard empty array filters to avoid invalid `IN ()` SQL.
An empty array currently builds `column IN ()`, which is invalid SQL.
Suggested fix
  if (is_array($value)) {
+     if ($value === []) {
+         $conditions[] = '1 = 0';
+         continue;
+     }
      $inParts = [];
      foreach ($value as $v) {
          $pn = 'p_' . $paramCounter++;
          $inParts[] = "{{$pn}:String}";
          $params[$pn] = (string) $v;
      }
      $conditions[] = "{$escaped} IN (" . implode(', ', $inParts) . ")";
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 616 - 624, The is_array($value) branch builds "column IN ()" for empty arrays which produces invalid SQL; inside that branch (where $paramCounter, $params, $conditions and $escaped are used) add a guard for empty arrays — if empty either skip adding the condition or append a safe false condition (e.g. "0=1" or "{$escaped} IS NULL AND 0=1") to $conditions — otherwise proceed to build $inParts and populate $params as before.
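The empty-array guard above can be sketched as a self-contained helper (the name `buildInClause` and its signature are illustrations, not the adapter's real API):

```php
<?php
// Build a parameterized IN clause in ClickHouse's {name:Type} placeholder
// style, degrading to a match-nothing predicate for an empty value list
// instead of emitting the invalid SQL "column IN ()".
function buildInClause(string $column, array $values, int &$counter, array &$params): string
{
    if ($values === []) {
        return '0 = 1'; // match nothing; never "IN ()"
    }
    $parts = [];
    foreach ($values as $v) {
        $name = 'p_' . $counter++;
        $parts[] = "{{$name}:String}";
        $params[$name] = (string) $v;
    }
    return "{$column} IN (" . implode(', ', $parts) . ")";
}
```

Note that the counter and params are only touched for non-empty arrays, matching the behavior the review asks for.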
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 794-835: The insert() method currently sends writes in a single
shot; add the same retry-with-exponential-backoff logic used by query() to make
inserts resilient to transient network errors and retryable HTTP responses (5xx
and 429). Wrap the $this->client->fetch(...) call inside a retry loop that
catches transport exceptions and non-200 responses, retrying up to a
configurable max attempts with increasing delays (exponential jitter), and only
throw after exhausting attempts; ensure headers added (X-ClickHouse-Database,
Content-Type, Connection) are still removed in the finally block and reuse the
existing retry/backoff configuration or helper used by query() to keep behavior
consistent.
---
Duplicate comments:
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 609-612: The loop over $filters currently skips unknown keys which
widens queries; instead validate and fail fast by checking each $key against
$validDimensions and throwing a clear exception (e.g. InvalidArgumentException)
when an invalid key is encountered; update the foreach block that iterates
$filters (the code referencing $filters and $validDimensions in ClickHouse.php)
to raise an error with the offending $key so callers (purge/read flows) cannot
proceed with a silently broadened query.
- Around line 134-136: The ORDER BY key built in the $orderKey assignment and
the MergeTree definitions for the events and gauges tables are missing queryable
dimensions (events: userAgent and tags; gauges: tags), which can cause
SummingMergeTree/ReplacingMergeTree to merge distinct rows that differ only by
those columns; update the $orderKey construction (respecting the
$this->sharedTables branch) and the ORDER BY clauses used for the events and
gauges table definitions to include userAgent and tags so the sorting/merge
identity includes all queryable dimensions (refer to the $orderKey variable,
events table and gauges table / SummingMergeTree or ReplacingMergeTree usages)
and ensure both branches (sharedTables true/false) include those fields.
- Around line 142-143: The ClickHouse adapter violates the declared int|float
contract by storing value as Int64 and coercing inputs with (int), truncating
floats; update the ClickHouse schema to use Float64 for the value column instead
of Int64 (change lines around "value Int64" to "value Float64") and remove all
integer casts in ClickHouse.php (replace occurrences like (int)
($event['value']), (int) ($gauge['value']), (int) ($json['data'][0]['value']),
and assignments like $row['value'] = (int) with safe float-preserving
conversions such as (float) or floatval), ensuring functions/methods that build
rows/insert payloads in ClickHouse.php preserve fractional values so aggregates
like avg() are correct.
- Around line 616-624: The is_array($value) branch builds "column IN ()" for
empty arrays which produces invalid SQL; inside that branch (where
$paramCounter, $params, $conditions and $escaped are used) add a guard for empty
arrays — if empty either skip adding the condition or append a safe false
condition (e.g. "0=1" or "{$escaped} IS NULL AND 0=1") to $conditions —
otherwise proceed to build $inParts and populate $params as before.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 024cc42d-d4be-43cf-aafa-ce0b032402de
📒 Files selected for processing (1)
src/Usage/Adapter/ClickHouse.php
private function insert(string $table, array $rows): void
{
    if (empty($rows)) {
        return;
    }

    $scheme = $this->secure ? 'https' : 'http';
    $escaped = $this->esc($table);
    $queryParams = ['query' => "INSERT INTO {$escaped} FORMAT JSONEachRow"];
    $url = "{$scheme}://{$this->host}:{$this->port}/?" . http_build_query($queryParams);

    $this->client->addHeader('X-ClickHouse-Database', $this->database);
    $this->client->addHeader('Content-Type', 'application/x-ndjson');
    $this->client->addHeader('Connection', 'keep-alive');

    $jsonRows = array_map(function ($row) {
        $encoded = json_encode($row);
        if ($encoded === false) {
            throw new Exception("Failed to JSON encode row: " . json_last_error_msg());
        }
        return $encoded;
    }, $rows);

    $body = implode("\n", $jsonRows);

    try {
        $response = $this->client->fetch(
            url: $url,
            method: Client::METHOD_POST,
            body: $body,
        );

        $httpCode = $response->getStatusCode();
        if ($httpCode !== 200) {
            $responseBody = $response->getBody();
            $responseBody = is_string($responseBody) ? $responseBody : '';
            throw new Exception("ClickHouse insert HTTP {$httpCode}: {$responseBody}");
        }
    } finally {
        $this->client->removeHeader('Content-Type');
    }
}
Add retry/backoff to insert() to match resilience guarantees in query().
Reads are retried, but writes are single-shot. A transient network or 5xx/429 on insert can drop an entire batch.
Suggested fix
- try {
- $response = $this->client->fetch(
- url: $url,
- method: Client::METHOD_POST,
- body: $body,
- );
-
- $httpCode = $response->getStatusCode();
- if ($httpCode !== 200) {
- $responseBody = $response->getBody();
- $responseBody = is_string($responseBody) ? $responseBody : '';
- throw new Exception("ClickHouse insert HTTP {$httpCode}: {$responseBody}");
- }
- } finally {
+ try {
+ $attempt = 0;
+ while (true) {
+ try {
+ $response = $this->client->fetch(
+ url: $url,
+ method: Client::METHOD_POST,
+ body: $body,
+ );
+ $httpCode = $response->getStatusCode();
+ if ($httpCode !== 200) {
+ $responseBody = $response->getBody();
+ $responseBody = is_string($responseBody) ? $responseBody : '';
+ throw new Exception("ClickHouse insert HTTP {$httpCode}: {$responseBody}");
+ }
+ break;
+ } catch (Exception $e) {
+ if ($attempt >= $this->maxRetries || !$this->isRetryable($e)) {
+ throw $e;
+ }
+ $attempt++;
+ usleep($this->retryDelay * (2 ** ($attempt - 1)) * 1000);
+ }
+ }
+ } finally {
$this->client->removeHeader('Content-Type');
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/Usage/Adapter/ClickHouse.php` around lines 794 - 835, The insert() method
currently sends writes in a single shot; add the same
retry-with-exponential-backoff logic used by query() to make inserts resilient
to transient network errors and retryable HTTP responses (5xx and 429). Wrap the
$this->client->fetch(...) call inside a retry loop that catches transport
exceptions and non-200 responses, retrying up to a configurable max attempts
with increasing delays (exponential jitter), and only throw after exhausting
attempts; ensure headers added (X-ClickHouse-Database, Content-Type, Connection)
are still removed in the finally block and reuse the existing retry/backoff
configuration or helper used by query() to keep behavior consistent.
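The retry shape the prompt describes can be sketched generically. This is a hedged sketch under assumptions: `withRetry`, `$maxRetries`, and `$baseDelayMs` are illustrative names, not the adapter's actual helper or configuration.

```php
<?php
// Exponential backoff with full jitter around an arbitrary operation.
// On each failure the base delay doubles, and a random jitter of up to
// the same amount is added before retrying; the last error is rethrown
// once attempts are exhausted.
function withRetry(callable $op, int $maxRetries = 3, int $baseDelayMs = 100): mixed
{
    $attempt = 0;
    while (true) {
        try {
            return $op();
        } catch (Exception $e) {
            if ($attempt >= $maxRetries) {
                throw $e; // out of attempts: surface the failure
            }
            $attempt++;
            $delayMs = $baseDelayMs * (2 ** ($attempt - 1));
            usleep(($delayMs + random_int(0, $delayMs)) * 1000);
        }
    }
}
```

Wrapping the `fetch()` call in something like this (while keeping header cleanup in `finally`) gives inserts the same resilience as reads.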
CI Workflows:
- Upgrade actions/checkout from v3 to v4
- Remove broken `git checkout HEAD^2` that fails on non-merge commits
- Fix codeql-analysis.yml: use `composer lint` instead of nonexistent `composer check`
Buffer safety (Usage.php):
- Include resourceId, region, and period in gauge buffer key to prevent incorrect overwrites of distinct gauge entries
- Only clear buffers on successful flush; retain data for retry on failure
- Auto-flush before mutating adapter context (namespace/tenant/sharedTables) so buffered rows are written with their original context
Validation (ClickHouse.php):
- Reject unknown filter dimensions with InvalidArgumentException instead of silently skipping them, preventing accidentally widened queries
Metric.php:
- Use unary + operator instead of (int) cast for numeric strings to preserve decimal values (e.g. "3.14" now returns 3.14 not 3)
Minor:
- Dockerfile: update composer:2.0 to composer:2
- README: use hyphenated "dependency-free" compound adjective
- CONTRIBUTING: link to local CODE_OF_CONDUCT.md instead of external URL
https://claude.ai/code/session_01UCN4sKD3Zkdbimn3QgTHSB
- Guard against empty array filter values producing invalid IN () SQL; short-circuit to match-nothing condition instead
- Use +0 instead of (int) cast for query result values to preserve fractional aggregates (e.g. avg returns floats)
- Add retry with exponential backoff to insert(), matching the existing read-path retry in query(), so transient network errors or 5xx/429 responses don't silently drop entire batches
https://claude.ai/code/session_01UCN4sKD3Zkdbimn3QgTHSB
Actionable comments posted: 1
🧹 Nitpick comments (9)
src/Usage/Metric.php (1)
107-110: hasAttribute() returns false for keys with null values.
Using isset() returns false when a key exists but its value is null. If distinguishing between "key absent" and "key present with null value" matters, consider using array_key_exists() or offsetExists().
♻️ Optional fix if null-valued keys should return true
  public function hasAttribute(string $name): bool
  {
-     return isset($this[$name]);
+     return $this->offsetExists($name);
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Metric.php` around lines 107 - 110, The hasAttribute method currently uses isset($this[$name]) which returns false for existing keys with null values; change it to check presence rather than truthiness by using array_key_exists($name, $this->attributes) or the existing ArrayAccess method offsetExists($name) (i.e. replace the isset(...) call in Metric::hasAttribute with array_key_exists(...) or $this->offsetExists($name) depending on how the Metric stores attributes) so keys that exist with null values return true.
src/Usage/Usage.php (3)
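The isset()-versus-array_key_exists() behavior called out for Metric::hasAttribute() above is easy to demonstrate in isolation:

```php
<?php
// isset() treats a key holding null as if it were absent, while
// array_key_exists() reports true key presence regardless of value.
$attributes = ['region' => null, 'metric' => 'requests'];

$issetNull  = isset($attributes['region']);            // null hides the key
$existsNull = array_key_exists('region', $attributes); // key is present
```

This is exactly why a null-valued attribute currently makes hasAttribute() return false.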
151-158: bufferCount increments even when updating an existing gauge key.
When collectGauge() updates an existing key in gaugeBuffer, bufferCount still increments. This means bufferCount tracks total calls rather than actual buffered items, which could trigger shouldFlush() prematurely. Consider incrementing only for new keys, or document that bufferCount represents call count, not item count.
♻️ Increment only for new keys
  public function collectGauge(array $gauge): self
  {
      $key = ($gauge['metric'] ?? '') . ':' . ($gauge['resource'] ?? '') . ':' . ($gauge['projectId'] ?? '') . ':' . ($gauge['resourceId'] ?? '') . ':' . ($gauge['region'] ?? '') . ':' . ($gauge['period'] ?? '');
+     if (!isset($this->gaugeBuffer[$key])) {
+         $this->bufferCount++;
+     }
      $this->gaugeBuffer[$key] = $gauge;
-     $this->bufferCount++;
      return $this;
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Usage.php` around lines 151 - 158, The bufferCount currently increments on every call to collectGauge() causing it to reflect call count rather than the number of buffered items; change collectGauge() so it only increments $this->bufferCount when the computed $key does not already exist in $this->gaugeBuffer (i.e., check isset($this->gaugeBuffer[$key]) before incrementing), or alternatively update shouldFlush() to rely on count($this->gaugeBuffer) instead of $this->bufferCount; target the collectGauge method and the $gaugeBuffer/$bufferCount fields when applying the fix.
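The "increment only for new keys" suggestion above can be sketched without the class context. This is a simplified illustration: the key here uses only two dimensions, whereas the real buffer key uses six.

```php
<?php
// Count a buffered item only when its composite key is new, so repeated
// upserts of the same gauge keep the flush-trigger count stable.
$gaugeBuffer = [];
$bufferCount = 0;

function collectGauge(array $gauge, array &$buffer, int &$count): void
{
    $key = ($gauge['metric'] ?? '') . ':' . ($gauge['resourceId'] ?? '');
    if (!isset($buffer[$key])) {
        $count++; // new entry; overwrites of the same key do not increment
    }
    $buffer[$key] = $gauge; // replace-upsert: latest value wins
}

collectGauge(['metric' => 'databases', 'resourceId' => 'db1', 'value' => 5], $gaugeBuffer, $bufferCount);
collectGauge(['metric' => 'databases', 'resourceId' => 'db1', 'value' => 7], $gaugeBuffer, $bufferCount);
```

After both calls the buffer holds one entry with the latest value, and the count reflects items rather than calls.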
368-392: generateTimeBuckets() could produce many buckets for large date ranges.
For a 1-year range with hourly buckets, this generates ~8,760 entries. The limit at Line 246 is count($timeBuckets) + 1, which could exceed reasonable query limits. Consider adding a cap or warning for very large ranges.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Usage.php` around lines 368 - 392, generateTimeBuckets() can produce excessive bucket counts (e.g., ~8,760 hourly buckets for a year) which may exceed query limits where count($timeBuckets)+1 is used; modify generateTimeBuckets to enforce a configurable maximum bucket cap (e.g., MAX_BUCKETS constant or parameter) and either truncate the returned array to that cap or throw/return an explicit error when the requested range would exceed it, and update callers (where count($timeBuckets) + 1 is used) to handle the capped result or error; locate the function generateTimeBuckets and implement the cap check before building/returning buckets, exposing the maximum via configuration or an optional argument so tests can override it.
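The capped bucket generation described above could look roughly like this. Everything here is an assumption for illustration: `MAX_BUCKETS` and this standalone `generateTimeBuckets` are not the library's actual constant or signature.

```php
<?php
// Generate hourly time buckets between two instants, failing loudly when
// the range would exceed an explicit cap instead of building an unbounded
// array that later blows past query limits.
const MAX_BUCKETS = 1000; // assumed cap, not a library constant

function generateTimeBuckets(DateTimeImmutable $start, DateTimeImmutable $end): array
{
    $buckets = [];
    $cursor = $start;
    while ($cursor <= $end) {
        if (count($buckets) >= MAX_BUCKETS) {
            throw new RangeException('Range exceeds ' . MAX_BUCKETS . ' buckets');
        }
        $buckets[] = $cursor->format('Y-m-d H:00:00');
        $cursor = $cursor->add(new DateInterval('PT1H')); // hourly step
    }
    return $buckets;
}
```

Callers using count($timeBuckets) + 1 as a query limit would then either handle the exception or receive a bounded array.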
462-470: getBufferCount() and getBufferSize() have different semantics that may confuse callers.
getBufferCount() returns total collect*() calls, while getBufferSize() returns actual buffered items. This inconsistency could lead to incorrect assumptions. Consider aligning these or clearly documenting the difference.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Usage.php` around lines 462 - 470, The two methods expose inconsistent semantics: getBufferCount() returns the cumulative collect*() call counter ($this->bufferCount) while getBufferSize() returns the actual number of buffered items (count($this->eventBuffer) + count($this->gaugeBuffer)); align them by making getBufferCount() return the actual buffered item count (use count($this->eventBuffer) + count($this->gaugeBuffer)) so both methods report the same concept, and if you still need the cumulative collect call counter keep bufferCount but add a new method getCollectCallCount() (or rename) to expose $this->bufferCount to avoid breaking callers, updating usages of getBufferCount() accordingly.
src/Usage/Adapter/ClickHouse.php (4)
461-479: getGauge() may truncate fractional values.
Similar to sumEvents(), Line 478 casts to (int) while the return type is int|float. For consistency with parseRows() which uses +0, consider the same approach here.
♻️ Preserve potential fractional gauge values
- return (int) ($json['data'][0]['value'] ?? 0);
+ return ($json['data'][0]['value'] ?? 0) + 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 461 - 479, The getGauge() method currently forces an integer return by casting the retrieved value with (int), which will truncate fractional gauge values; update the return conversion to preserve decimals (use the same numeric conversion approach as parseRows(), e.g. coerce $json['data'][0]['value'] to a numeric type without forcing an int) so getGauge() returns int|float correctly while keeping the existing null/default handling.
966-976: Retry detection relies on substring matching, which may miss some transient errors.
The isRetryable() method checks for specific substrings in error messages. This is pragmatic but could miss errors with different wording or localized messages. Consider also checking HTTP status codes directly if available.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 966 - 976, The isRetryable(Exception $e) method currently relies solely on substring matches of $e->getMessage(); enhance it to also inspect structured error info where available: if $e is an instance of any HTTP/Client exception or a ClickHouse-specific exception (e.g., check for classes like ClientException, HttpException, PDOException or the adapter's ClickHouseException), extract and evaluate the HTTP status code or numeric SQL/driver error code (treat 429, 502–504 and other transient numeric codes as retryable) and treat network/connection exception classes as retryable; keep the existing substring fallback in isRetryable() so you cover localized messages and unknown exceptions.
534-556: Event value is coerced to Int64, which truncates fractional values.
Line 539 casts value to (int), and the schema defines value Int64. If the API needs to support fractional event values (e.g., 0.5 units), the schema would need Float64 instead. Current design is internally consistent but worth documenting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 534 - 556, The buildEventRow currently casts 'value' to int in buildEventRow which truncates fractions; change the cast to float (use (float) ($event['value'] ?? 0.0) in buildEventRow) and update the ClickHouse table schema/type for the value column from Int64 to Float64 (and any related type annotations, migrations, or query expectations) so fractional values are preserved; run/update tests and any aggregation logic that assumes integer semantics to ensure correctness after the type change.
341-364: sumEvents() may truncate fractional aggregates.
The return type is int|float, but Line 363 casts to (int). ClickHouse's sum() of Int64 values returns an integer, so this is currently correct. However, if the schema changes to support fractional values (Float64), this cast would truncate results.
♻️ Preserve potential fractional sums
- return (int) ($json['data'][0]['total'] ?? 0);
+ return ($json['data'][0]['total'] ?? 0) + 0;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 341 - 364, In sumEvents(), avoid forcibly casting the ClickHouse aggregate to (int) which truncates fractional sums; locate the return in sumEvents() that currently does "return (int) ($json['data'][0]['total'] ?? 0);" and change it to preserve fractional values (e.g. read $json['data'][0]['total'] ?? 0 into a $total variable and coerce to a numeric type without truncation like $total + 0 or (float) when appropriate) so the method still returns int|float based on the actual value from query()/json decode; keep the null-to-zero fallback.
Dockerfile (1)
1-24: Consider adding a non-root user for defense-in-depth.
The Trivy static analysis flags that the container runs as root. While this appears to be a dev/test container (given the `tail -f /dev/null` CMD), adding a non-root user is a good security practice even for development images.
🔧 Optional improvement
  FROM php:8.3.3-cli-alpine3.19 as final
  LABEL maintainer="team@appwrite.io"
+ RUN addgroup -g 1000 appwrite && adduser -u 1000 -G appwrite -D appwrite
  WORKDIR /code
  COPY --from=step0 /src/vendor /code/vendor
  # Add Source Code
  COPY ./tests /code/tests
  COPY ./src /code/src
  COPY ./phpunit.xml /code/phpunit.xml
+ RUN chown -R appwrite:appwrite /code
+ USER appwrite
  CMD [ "tail", "-f", "/dev/null" ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@Dockerfile` around lines 1 - 24, Add a non-root user in the final stage: create a group and a non-root user (e.g., "app") using Alpine commands (addgroup/adduser), chown the WORKDIR (/code) and copied artifacts (including /code/vendor) to that user, and set USER to that non-root user before the CMD; update the final stage around the existing FROM php:8.3.3-cli-alpine3.19, WORKDIR /code, and COPY --from=step0 /src/vendor /code/vendor so all runtime files are owned by the non-root user and the container no longer runs as root.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 287-292: The README example calls a non-existent method
setAsyncInserts() on $adapter; either remove that call from the README or
implement the method on the ClickHouse adapter class (e.g., add
setAsyncInserts(bool $enabled, bool $waitForConfirmation = false) to the
adapter), or replace the example with existing configuration methods such as
$adapter->setMaxRetries(...), $adapter->setRetryDelay(...), or
$adapter->setTimeout(...) so the Usage setup() example uses valid API calls
(update the README lines showing $adapter->setAsyncInserts(...) and ensure Usage
instantiation and setup() remain unchanged).
---
Nitpick comments:
In `@Dockerfile`:
- Around line 1-24: Add a non-root user in the final stage: create a group and a
non-root user (e.g., "app") using Alpine commands (addgroup/adduser), chown the
WORKDIR (/code) and copied artifacts (including /code/vendor) to that user, and
set USER to that non-root user before the CMD; update the final stage around the
existing FROM php:8.3.3-cli-alpine3.19, WORKDIR /code, and COPY --from=step0
/src/vendor /code/vendor so all runtime files are owned by the non-root user and
the container no longer runs as root.
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 461-479: The getGauge() method currently forces an integer return
by casting the retrieved value with (int), which will truncate fractional gauge
values; update the return conversion to preserve decimals (use the same numeric
conversion approach as parseRows(), e.g. coerce $json['data'][0]['value'] to a
numeric type without forcing an int) so getGauge() returns int|float correctly
while keeping the existing null/default handling.
- Around line 966-976: The isRetryable(Exception $e) method currently relies
solely on substring matches of $e->getMessage(); enhance it to also inspect
structured error info where available: if $e is an instance of any HTTP/Client
exception or a ClickHouse-specific exception (e.g., check for classes like
ClientException, HttpException, PDOException or the adapter's
ClickHouseException), extract and evaluate the HTTP status code or numeric
SQL/driver error code (treat 429, 502–504 and other transient numeric codes as
retryable) and treat network/connection exception classes as retryable; keep the
existing substring fallback in isRetryable() so you cover localized messages and
unknown exceptions.
- Around line 534-556: The buildEventRow currently casts 'value' to int in
buildEventRow which truncates fractions; change the cast to float (use (float)
($event['value'] ?? 0.0) in buildEventRow) and update the ClickHouse table
schema/type for the value column from Int64 to Float64 (and any related type
annotations, migrations, or query expectations) so fractional values are
preserved; run/update tests and any aggregation logic that assumes integer
semantics to ensure correctness after the type change.
- Around line 341-364: In sumEvents(), avoid forcibly casting the ClickHouse
aggregate to (int) which truncates fractional sums; locate the return in
sumEvents() that currently does "return (int) ($json['data'][0]['total'] ?? 0);"
and change it to preserve fractional values (e.g. read $json['data'][0]['total']
?? 0 into a $total variable and coerce to a numeric type without truncation like
$total + 0 or (float) when appropriate) so the method still returns int|float
based on the actual value from query()/json decode; keep the null-to-zero
fallback.
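A sketch of that return path, keeping the null-to-zero fallback; ClickHouse's JSON format serializes 64-bit aggregates as strings, so `+ 0` coercion returns int or float as appropriate:

```php
// Inside sumEvents(), replacing the (int) cast:
$total = $json['data'][0]['total'] ?? 0;
return is_numeric($total) ? $total + 0 : 0;
```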
In `@src/Usage/Metric.php`:
- Around line 107-110: The hasAttribute method currently uses
isset($this[$name]) which returns false for existing keys with null values;
change it to check presence rather than truthiness by using
array_key_exists($name, $this->attributes) or the existing ArrayAccess method
offsetExists($name) (i.e. replace the isset(...) call in Metric::hasAttribute
with array_key_exists(...) or $this->offsetExists($name) depending on how the
Metric stores attributes) so keys that exist with null values return true.
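A sketch of the fixed method, assuming `Metric` extends `ArrayObject` as the PR summary describes; the actual attribute storage may differ:

```php
public function hasAttribute(string $name): bool
{
    // array_key_exists() reports key presence even when the stored
    // value is null, unlike isset(), which treats null as absent.
    return array_key_exists($name, $this->getArrayCopy());
}
```

If copying the backing array is a concern, overriding `offsetExists()` to use `array_key_exists()` on the internal storage achieves the same semantics without the copy.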
In `@src/Usage/Usage.php`:
- Around line 151-158: The bufferCount currently increments on every call to
collectGauge() causing it to reflect call count rather than the number of
buffered items; change collectGauge() so it only increments $this->bufferCount
when the computed $key does not already exist in $this->gaugeBuffer (i.e., check
isset($this->gaugeBuffer[$key]) before incrementing), or alternatively update
shouldFlush() to rely on count($this->gaugeBuffer) instead of
$this->bufferCount; target the collectGauge method and the
$gaugeBuffer/$bufferCount fields when applying the fix.
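The first option can be sketched as below; the signature and key derivation are illustrative, not the library's actual API:

```php
// Sketch: count a key only the first time it is buffered, so
// bufferCount tracks buffered items rather than call volume.
public function collectGauge(string $metric, int|float $value): void
{
    $key = $metric; // stand-in for the computed dedup key
    if (!isset($this->gaugeBuffer[$key])) {
        $this->bufferCount++;
    }
    // Replace-upsert: the latest value for a key wins.
    $this->gaugeBuffer[$key] = $value;
}
```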
- Around line 368-392: generateTimeBuckets() can produce excessive bucket counts
(e.g., ~8,760 hourly buckets for a year) which may exceed query limits where
count($timeBuckets)+1 is used; modify generateTimeBuckets to enforce a
configurable maximum bucket cap (e.g., MAX_BUCKETS constant or parameter) and
either truncate the returned array to that cap or throw/return an explicit error
when the requested range would exceed it, and update callers (where
count($timeBuckets) + 1 is used) to handle the capped result or error; locate
the function generateTimeBuckets and implement the cap check before
building/returning buckets, exposing the maximum via configuration or an
optional argument so tests can override it.
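One way to express the cap, with an illustrative constant, signature, and bucket format (all assumptions, not the library's actual API):

```php
// Sketch: reject ranges that would exceed a configurable bucket cap.
private const MAX_BUCKETS = 1000; // hypothetical default

private function generateTimeBuckets(
    \DateTimeInterface $start,
    \DateTimeInterface $end,
    string $period,
    ?int $maxBuckets = null
): array {
    $max = $maxBuckets ?? self::MAX_BUCKETS;
    $step = $period === 'hourly' ? 3600 : 86400;

    $count = intdiv($end->getTimestamp() - $start->getTimestamp(), $step) + 1;
    if ($count > $max) {
        throw new \InvalidArgumentException(
            "Range would produce {$count} buckets; maximum is {$max}."
        );
    }

    $buckets = [];
    for ($t = $start->getTimestamp(); $t <= $end->getTimestamp(); $t += $step) {
        $buckets[] = date('Y-m-d H:i:s', $t);
    }

    return $buckets;
}
```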
- Around line 462-470: The two methods expose inconsistent semantics:
getBufferCount() returns the cumulative collect*() call counter
($this->bufferCount) while getBufferSize() returns the actual number of buffered
items (count($this->eventBuffer) + count($this->gaugeBuffer)); align them by
making getBufferCount() return the actual buffered item count (use
count($this->eventBuffer) + count($this->gaugeBuffer)) so both methods report
the same concept, and if you still need the cumulative collect call counter keep
bufferCount but add a new method getCollectCallCount() (or rename) to expose
$this->bufferCount to avoid breaking callers—update usages of getBufferCount()
accordingly.
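The aligned accessors could be sketched as follows, with the cumulative counter moved behind a new, explicitly named method (`getCollectCallCount()` is the suggested name, not an existing one):

```php
// Sketch: both buffer getters report actual buffered items.
public function getBufferCount(): int
{
    return count($this->eventBuffer) + count($this->gaugeBuffer);
}

public function getCollectCallCount(): int
{
    // Cumulative number of collect*() invocations since construction.
    return $this->bufferCount;
}
```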
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: d063cb07-84f9-4f7d-a0fa-58efe49e2103
📒 Files selected for processing (9)
- .github/workflows/codeql-analysis.yml
- .github/workflows/linter.yml
- .github/workflows/tests.yml
- CONTRIBUTING.md
- Dockerfile
- README.md
- src/Usage/Adapter/ClickHouse.php
- src/Usage/Metric.php
- src/Usage/Usage.php
🚧 Files skipped from review as they are similar to previous changes (3)
- .github/workflows/codeql-analysis.yml
- .github/workflows/linter.yml
- .github/workflows/tests.yml
Add userAgent and tags to SummingMergeTree ORDER BY so distinct series are not collapsed during merges. Add tags to ReplacingMergeTree ORDER BY for the same reason. https://claude.ai/code/session_01UCN4sKD3Zkdbimn3QgTHSB
Rename job ID from 'lint' to 'tests' to match its display name. https://claude.ai/code/session_01UCN4sKD3Zkdbimn3QgTHSB
Adds the method documented in README that enables ClickHouse server-side async inserts via settings headers. https://claude.ai/code/session_01UCN4sKD3Zkdbimn3QgTHSB
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/Usage/Adapter/ClickHouse.php (1)
883-885: Uncaught exception on malformed date string. If `$value` is an invalid date string (e.g., `"not-a-date"`), the `DateTime` constructor throws an exception. Consider catching and either throwing a domain-specific exception with context, or falling back to current time.

♻️ Proposed defensive handling

```diff
 if (is_string($value)) {
-    return (new \DateTime($value))->format('Y-m-d H:i:s.v');
+    try {
+        return (new \DateTime($value))->format('Y-m-d H:i:s.v');
+    } catch (\Exception $e) {
+        throw new \InvalidArgumentException("Invalid date string: {$value}", 0, $e);
+    }
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 883 - 885, The current conversion that does (new \DateTime($value))->format('Y-m-d H:i:s.v') can throw for malformed strings; wrap that DateTime construction in a try/catch around the DateTime creation (the branch that checks is_string($value) in the ClickHouse adapter) and either return a safe fallback like (new \DateTime())->format('Y-m-d H:i:s.v') or throw a clearer domain-specific exception (e.g., \InvalidArgumentException) that includes the offending $value and context (ClickHouse adapter date formatting) so callers can handle bad input. Ensure you catch \Exception or \Throwable, include $value in the error message, and preserve the original behavior/format when returning a fallback.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/tests.yml:
- Around line 18-19: Update the "Run Tests" workflow step so the Docker exec
call includes the -T flag; specifically locate the step named "Run Tests" and
change the command string that currently starts with "docker compose exec usage
vendor/bin/phpunit --configuration phpunit.xml tests" to include "-T" (e.g.,
"docker compose exec -T usage ...") to ensure the step runs reliably in GitHub
Actions CI.
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 769-772: The URL construction in executeQuery (and similarly in
insert) creates invalid URLs for IPv6 hosts because it doesn't bracket
addresses; update the code that builds $url in executeQuery(string $sql, array
$params) to detect when $this->host contains a colon (and is not already wrapped
in '[' and ']') and wrap it in square brackets before interpolating into
"{$scheme}://{$this->host}:{$this->port}/"; apply the identical change to the
host handling in insert() so IPv6 addresses become "http://[::1]:8123/" instead
of "http://::1:8123/".
---
Nitpick comments:
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 883-885: The current conversion that does (new
\DateTime($value))->format('Y-m-d H:i:s.v') can throw for malformed strings;
wrap that DateTime construction in a try/catch around the DateTime creation (the
branch that checks is_string($value) in the ClickHouse adapter) and either
return a safe fallback like (new \DateTime())->format('Y-m-d H:i:s.v') or throw
a clearer domain-specific exception (e.g., \InvalidArgumentException) that
includes the offending $value and context (ClickHouse adapter date formatting)
so callers can handle bad input. Ensure you catch \Exception or \Throwable,
include $value in the error message, and preserve the original behavior/format
when returning a fallback.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b137e463-0e14-4faf-afb3-574a89d178bd
📒 Files selected for processing (2)
- .github/workflows/tests.yml
- src/Usage/Adapter/ClickHouse.php
```yaml
- name: Run Tests
  run: docker compose exec usage vendor/bin/phpunit --configuration phpunit.xml tests
```
Add -T flag to docker compose exec for CI compatibility.
GitHub Actions runners don't allocate a pseudo-TTY by default. Without the `-T` flag (which disables pseudo-TTY allocation), `docker compose exec` may fail or behave unexpectedly in CI environments.
Proposed fix

```diff
 - name: Run Tests
-  run: docker compose exec usage vendor/bin/phpunit --configuration phpunit.xml tests
+  run: docker compose exec -T usage vendor/bin/phpunit --configuration phpunit.xml tests
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
- name: Run Tests
  run: docker compose exec -T usage vendor/bin/phpunit --configuration phpunit.xml tests
```
- Include time and tenant in gauge buffer dedup key to prevent overwrites of distinct snapshots - Add non-root user to Dockerfile for production safety - Add getValue() edge case tests for floats, numeric strings, and non-numeric fallback https://claude.ai/code/session_01UCN4sKD3Zkdbimn3QgTHSB
- Update type hints across Adapter, ClickHouse, Usage, and Metric - Change ClickHouse column type from Nullable(UInt64) to Nullable(String) to support string tenant identifiers - Preserve numeric tenants as int when reading back from ClickHouse - Add tests for string and int tenant values https://claude.ai/code/session_01UCN4sKD3Zkdbimn3QgTHSB
Summary
This PR introduces a complete usage analytics and metered billing library with pluggable adapter support. It includes a high-performance ClickHouse adapter for storing and querying usage metrics at scale, along with comprehensive test coverage and documentation.
Key Changes
- `src/Usage/Usage.php`: Core manager for collecting and querying usage data with in-memory buffering, auto-flush thresholds, and interval-based flushing
- `src/Usage/Adapter.php`: Defines the interface for storage backends with support for events (append-only logs) and gauges (point-in-time snapshots)
- `src/Usage/Adapter/ClickHouse.php`: Production-ready adapter featuring:
- `src/Usage/Metric.php`: ArrayObject-based result wrapper with typed getters for query results

Notable Implementation Details
https://claude.ai/code/session_01UCN4sKD3Zkdbimn3QgTHSB