Skip to content

fix: reduce public content false positives#1633

Merged
HanShaoshuai-k merged 1 commit into
mainfrom
fix/public_content_detect
Jun 29, 2026
Merged

fix: reduce public content false positives#1633
HanShaoshuai-k merged 1 commit into
mainfrom
fix/public_content_detect

Conversation

@HanShaoshuai-k

@HanShaoshuai-k HanShaoshuai-k commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator

Summary

Reduce public-content deterministic gate false positives for documented API identifiers and conventional placeholders while preserving detection for real JWTs, bearer tokens, and credential-shaped values.

Changes

  • Parse JWT-like findings as actual JWTs by decoding the header and requiring a JSON header with alg, instead of rejecting every long a.b.c dotted value.
  • Keep non-JWT dotted taxonomy such as mail API methods, MIME types, and scope names from being reported or semantically redacted as JWT-like tokens.
  • Allow conventional documentation placeholders such as YOUR_ACCESS_TOKEN, ACCESS_TOKEN_HERE, and YOUR_CLIENT_SECRET.
  • Keep positive controls for real JWTs, real bearer headers, credential assignments, and credential-shaped placeholder lookalikes.

Test Plan

  • go test ./internal/qualitygate/publiccontent -count=1
  • go test ./internal/qualitygate/... -count=1
  • make quality-gate QUALITY_GATE_CHANGED_FROM=upstream/main
  • git diff --check

Related Issues

Summary by CodeRabbit

  • Bug Fixes

    • Improved detection of placeholder-style text so common token and credential examples are no longer flagged incorrectly.
    • Tightened token scanning to better distinguish real JWT-like values from lookalikes, reducing false positives.
    • Preserved non-token dotted text in sanitized excerpts more accurately.
  • Tests

    • Expanded coverage for bearer placeholders, conventional credential placeholders, and JWT-like detection behavior.
    • Updated existing scan tests to keep expected results and redaction behavior aligned.

@github-actions github-actions Bot added the size/M Single-domain feat or fix with limited business impact label Jun 29, 2026
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Refines credential scanning: rules.go gains conventionalNamedPlaceholderValue and conventionalCredentialPlaceholderName to recognize your_<token>/<token>_here patterns. scan.go validates JWT-like regex matches via structural base64url/JSON header decoding, filters placeholder bearer headers, and makes JWT redaction in sanitizeSemanticExcerpt conditional. Tests add four new cases and migrate fixture paths to a consolidated fixtures/ directory.

Changes

Credential scanning improvements

Layer / File(s) Summary
Conventional placeholder name recognition
internal/qualitygate/publiccontent/rules.go
namedPlaceholderValue now delegates to new conventionalNamedPlaceholderValue, which normalizes dashes to underscores and matches your_<credential> / <credential>_here forms via conventionalCredentialPlaceholderName.
JWT validation and bearer placeholder filtering
internal/qualitygate/publiccontent/scan.go
Adds isJWTToken/decodeBase64URLSegment helpers to gate JWT-like regex matches on structural validity; adds isPlaceholderBearerHeader to skip bearer header findings for placeholder values; removes isSchemaDottedIdentifier; makes sanitizeSemanticExcerpt JWT redaction conditional.
New tests and fixture path migrations
internal/qualitygate/publiccontent/scan_test.go
Adds TestScanFileAllowsBearerHeaderPlaceholders, TestSemanticCandidateKeepsNonJWTDottedTaxonomy, TestScanFileAllowsConventionalCredentialPlaceholders, and TestScanFileDetectsCredentialShapedPlaceholderLookalikes; migrates many existing tests from scattered source paths to fixtures/ files.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • larksuite/cli#1582: Modifies the same jwtLikeRE handling in scan.go and placeholder-detection logic in rules.go, including credential/token-shaped placeholder rules and related tests.

Suggested reviewers

  • liangshuo-1

🐇 A rabbit hops through tokens with glee,
your_secret_here is no secret, you see!
JWT headers decoded with care,
Bearer placeholders float free in the air.
False alarms are banished — only real secrets beware! 🔑

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly matches the PR’s main goal of reducing public content false positives.
Description check ✅ Passed The description follows the template and includes summary, changes, test plan, and related issues.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/public_content_detect

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/qualitygate/publiccontent/scan_test.go`:
- Around line 1074-1086: The placeholder coverage in
TestScanFileAllowsConventionalCredentialPlaceholders only exercises
underscore-based variants, so it does not lock down the dash-normalization
behavior. Update this test in scan_test.go to include at least one dashed
placeholder case such as YOUR-API-KEY or ACCESS-TOKEN-HERE, and keep asserting
that ScanFile does not return public_content_generic_credential for those
values. This should verify the new normalization path alongside the existing
credential placeholder examples.
- Around line 991-1002: Tighten TestScanFileAllowsBearerHeaderPlaceholders so it
asserts the scan produces no findings at all for bearer-token placeholder
content, not just that public_content_bearer_header is absent. Update the loop
over ScanFile results to fail on any item, or otherwise verify the returned
slice is empty, so placeholders like those in docs/auth.md cannot slip through
as public_content_generic_credential or any other rule.

In `@internal/qualitygate/publiccontent/scan.go`:
- Around line 432-440: The isPlaceholderBearerHeader helper is mixing an index
computed from the lowercased normalized string with a slice taken from the
original match string, which can misalign offsets and panic on non-ASCII input.
Update the slicing to use the same normalized value used for LastIndex so the
offset stays consistent, and keep the rest of the placeholder check logic in
isPlaceholderBearerHeader unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e0dc5d73-3481-4413-a11b-0c119b1478a7

📥 Commits

Reviewing files that changed from the base of the PR and between 297776e and d7aaa1a.

📒 Files selected for processing (3)
  • internal/qualitygate/publiccontent/rules.go
  • internal/qualitygate/publiccontent/scan.go
  • internal/qualitygate/publiccontent/scan_test.go

Comment thread internal/qualitygate/publiccontent/scan_test.go
Comment thread internal/qualitygate/publiccontent/scan_test.go
Comment thread internal/qualitygate/publiccontent/scan.go
@codecov

codecov Bot commented Jun 29, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 87.75510% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.69%. Comparing base (4c31323) to head (d7aaa1a).
⚠️ Report is 2 commits behind head on main.

Files with missing lines Patch % Lines
internal/qualitygate/publiccontent/scan.go 80.64% 3 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1633      +/-   ##
==========================================
+ Coverage   74.66%   74.69%   +0.03%     
==========================================
  Files         806      808       +2     
  Lines       81391    81527     +136     
==========================================
+ Hits        60769    60896     +127     
- Misses      16089    16093       +4     
- Partials     4533     4538       +5     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@github-actions

Copy link
Copy Markdown

🚀 PR Preview Install Guide

🧰 CLI update

npm i -g https://pkg.pr.new/larksuite/cli/@larksuite/cli@d7aaa1a46cd6a5acdc411e27b90404baac7a49ac

🧩 Skill update

npx skills add larksuite/cli#fix/public_content_detect -y -g

@HanShaoshuai-k HanShaoshuai-k merged commit 30b28cf into main Jun 29, 2026
43 checks passed
@HanShaoshuai-k HanShaoshuai-k deleted the fix/public_content_detect branch June 29, 2026 06:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/M Single-domain feat or fix with limited business impact

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants