[Bug] False positive in pre-publish validator: function-call assignment to token variable is treated as leaked secret #234

@dingyufei615

Summary

The pre-publish validator is producing a false positive for normal Python code that assigns a function return value to a variable named token.

Steps To Reproduce

Current behavior

Publishing fails with an error like:

Pre-publish validation failed:
scripts/f2e_mock.py line 1842 contains a value that looks like a secret or token. Replace real credentials with placeholders before publishing.

A representative line that triggers the error is:

token = extract_group_token_value(response, group_choice.group_id)

This line does not contain a hardcoded credential. It only assigns the return value of a function call.

Minimal example

def maybe_generate_group_token(...):
    requested_token = secrets.token_hex(20)
    ...
    token = extract_group_token_value(response, group_choice.group_id)
    if token:
        return token
    ...

Why this looks like a false positive

The pre-publish validator currently uses a regex-based heuristic in:

  • server/skillhub-domain/src/main/java/com/iflytek/skillhub/domain/skill/validation/BasicPrePublishValidator.java

Relevant rule:

(?i)(api[_-]?key|access[_-]?key|secret|password|token)\s*[:=]\s*['"]?([A-Za-z0-9_-]{12,})

This regex scans text line-by-line and does not parse Python syntax. As a result:

  • the left-hand side matches token
  • the right-hand side begins with the identifier extract_group_token_value, which satisfies [A-Za-z0-9_-]{12,}
  • the validator treats that identifier as a “secret-like value”

So function calls / identifiers can be mistaken for leaked secrets.
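
The match can be reproduced outside the validator. The following is a minimal sketch that transcribes the quoted rule into Python's re module; it is not the Java code in BasicPrePublishValidator. The name find_secret_like_value is hypothetical, and the separator class is written as [_-] on the assumption that underscores are meant to match.

```python
import re

# Python transcription of the rule quoted above (assumption: the Java
# validator applies it with find-style, leftmost-match semantics).
# The separator class is written as [_-], which is also an assumption.
SECRET_PATTERN = re.compile(
    r"""(?i)(api[_-]?key|access[_-]?key|secret|password|token)\s*[:=]\s*['"]?([A-Za-z0-9_-]{12,})"""
)


def find_secret_like_value(line):
    """Return the text the rule treats as a secret value, or None."""
    match = SECRET_PATTERN.search(line)
    return match.group(2) if match else None


line = "token = extract_group_token_value(response, group_choice.group_id)"
print(find_secret_like_value(line))
```

The captured "value" is the function name extract_group_token_value, because the optional quote ['"]? lets the rule accept an unquoted identifier of 12 or more characters.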

Related code path

The validator is invoked during publish here:

  • server/skillhub-domain/src/main/java/com/iflytek/skillhub/domain/skill/service/SkillPublishService.java

Expected behavior

The validator should block obvious hardcoded credentials, but should not reject:

  • assignments from other variables
  • function-call return values
  • ordinary identifiers that happen to contain words like token, secret, etc.

Example that should be allowed:

token = extract_group_token_value(response, group_choice.group_id)

Expected Behavior

Suggested regression test

A test case similar to this should pass:

token = extract_group_token_value(response, group_choice.group_id)

while real hardcoded secrets such as:

token = "ghp_xxxxxxxxxxxxxxxxxxxx"
api_key = "sk-xxxxxxxxxxxxxxxxxxxx"

should still fail.
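
One possible direction, sketched in Python rather than in the validator's Java, is to require the value to be a quoted string literal. This is only an illustration of the idea, not a proposed patch: QUOTED_SECRET_PATTERN and looks_like_hardcoded_secret are hypothetical names, and the [_-] separator class is an assumption.

```python
import re

# Hypothetical stricter rule: the value must be a quoted literal whose
# closing quote matches its opening quote (backreference \2).
QUOTED_SECRET_PATTERN = re.compile(
    r"""(?i)(api[_-]?key|access[_-]?key|secret|password|token)\s*[:=]\s*(['"])([A-Za-z0-9_-]{12,})\2"""
)


def looks_like_hardcoded_secret(line):
    """Return True if the line assigns a quoted, secret-like literal."""
    return QUOTED_SECRET_PATTERN.search(line) is not None


# The function-call assignment from this report is allowed.
assert not looks_like_hardcoded_secret(
    "token = extract_group_token_value(response, group_choice.group_id)"
)
# Quoted literals are still rejected.
assert looks_like_hardcoded_secret('token = "ghp_xxxxxxxxxxxxxxxxxxxx"')
assert looks_like_hardcoded_secret('api_key = "sk-xxxxxxxxxxxxxxxxxxxx"')
```

The trade-off is that unquoted values, such as those in YAML or .env files, would no longer be flagged by this rule, so a real fix may need to treat those file types separately.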


Labels

  • bug: Something isn't working
  • effort/m: Medium change with noticeable coordination cost.
  • priority/p1: High priority triage bucket.
  • risk/high: Touches security, auth, migrations, or public contracts.
  • triage/core: Issue should be handled by a core maintainer with AI support.
