Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions packages/core/src/l1/reviewer.ts
Original file line number Diff line number Diff line change
Expand Up @@ -462,6 +462,9 @@ Format: \`CRITICAL (85%)\` or \`WARNING (60%)\`
- **"What if" speculation** — cite concrete code, not hypotheticals
- **Config values** — JSON/YAML values are intentional choices
- **Test patterns** — mocks, stubs, simplified logic are intentional in tests
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 SUGGESTION — Guarded values are not explicitly handled in documentation

Confidence: 🟡 79%

Problem: In packages/core/src/l1/reviewer.ts:463

The documentation for the new Guarded values section does not explicitly explain how to handle early returns, null checks, or ?? fallbacks.

Evidence:

  1. The section mentions that "Guarded values" are not bugs if there is an early return, null check, or ?? fallback that handles the value BEFORE the line you're looking at.
  2. However, the documentation does not provide clear guidance on how to properly handle these cases.
Suggested change
- **Test patterns** mocks, stubs, simplified logic are intentional in tests
const value = /* some expression */;
if (!value) return; // early return
const result = value?.someMethod(); // ?? fallback

Flagged by: r-llama31  |  CodeAgora

- **Guarded values** — if there is an early return, null check, or ?? fallback that handles the value BEFORE the line you're looking at, it is NOT a bug
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Guarded Values Not Considered for Null/Undefined Checking

Confidence: 🟢 92%

Problem: In packages/core/src/l1/reviewer.ts:464-465

The documentation suggests that guarded values (early returns, null checks, or ?? fallbacks) are not bugs, but the logic does not account for these cases when checking for issues like division by zero or null/undefined with ?? or ?. .

Evidence:

  1. Early return, null check, or ?? fallback is not considered as handled value in logic.
  2. This can lead to false positives for issues that are already guarded against.

Suggestion: Modify the logic to check for early returns, null checks, or ?? fallbacks in the surrounding code before flagging issues.

🔍 Individual Reviews (2 reviewers)

r-llama31 💬 llama-3.1-8b-instant (🟡 WARNING)

Problem: In packages/core/src/l1/reviewer.ts:464-465 The documentation suggests that guarded values (early returns, null checks, or ?? fallbacks) are not bugs, but the logic does not account for these cases w...

Evidence:

  • Early return, null check, or ?? fallback is not considered as handled value in logic.
  • This can lead to false positives for issues that are already guarded against.

Suggestion: Modify the logic to check for early returns, null checks, or ?? fallbacks in the surrounding code before flagging issues.

r-llama31 💬 llama-3.1-8b-instant (🟡 WARNING)

Problem: In packages/core/src/l1/reviewer.ts:464-465 The documentation suggests that null/undefined with ?? or ?. is handled when the same expression is used, but the logic does not actually check for this.

Evidence:

  • Same expression is not checked for ?? or ?. in logic.
  • This can lead to false positives for null/undefined issues.

Suggestion: Modify the logic to check for ?? or ?. when used on the same expression in the surrounding code before flagging null/undefined issues.

✅ Discussion d002 — 1 round(s), consensus

Verdict: WARNING — Majority consensus (2/3 agree)

Flagged by: r-llama31  |  CodeAgora

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Null/Undefined Not Handled When ?? or ?. Used

Confidence: 🟢 92%

Problem: In packages/core/src/l1/reviewer.ts:464-465

The documentation suggests that null/undefined with ?? or ?. is handled when the same expression is used, but the logic does not actually check for this.

Evidence:

  1. Same expression is not checked for ?? or ?. in logic.
  2. This can lead to false positives for null/undefined issues.

Suggestion: Modify the logic to check for ?? or ?. when used on the same expression in the surrounding code before flagging null/undefined issues.

🔍 Individual Reviews (2 reviewers)

r-llama31 💬 llama-3.1-8b-instant (🟡 WARNING)

Problem: In packages/core/src/l1/reviewer.ts:464-465 The documentation suggests that guarded values (early returns, null checks, or ?? fallbacks) are not bugs, but the logic does not account for these cases w...

Evidence:

  • Early return, null check, or ?? fallback is not considered as handled value in logic.
  • This can lead to false positives for issues that are already guarded against.

Suggestion: Modify the logic to check for early returns, null checks, or ?? fallbacks in the surrounding code before flagging issues.

r-llama31 💬 llama-3.1-8b-instant (🟡 WARNING)

Problem: In packages/core/src/l1/reviewer.ts:464-465 The documentation suggests that null/undefined with ?? or ?. is handled when the same expression is used, but the logic does not actually check for this.

Evidence:

  • Same expression is not checked for ?? or ?. in logic.
  • This can lead to false positives for null/undefined issues.

Suggestion: Modify the logic to check for ?? or ?. when used on the same expression in the surrounding code before flagging null/undefined issues.

✅ Discussion d002 — 1 round(s), consensus

Verdict: WARNING — Majority consensus (2/3 agree)

Flagged by: r-llama31  |  CodeAgora

- **Division by zero with prior guard** — if the divisor is checked (<=0, ===0, etc.) with an early return above, do NOT flag division by zero
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 SUGGESTION — Division by zero is not explicitly handled in documentation

Confidence: 🟡 79%

Problem: In packages/core/src/l1/reviewer.ts:465

The documentation for the new Division by zero with prior guard section does not explicitly explain how to handle division by zero cases.

Evidence:

  1. The section mentions that "Division by zero with prior guard" is not a bug if the divisor is checked (<=0, ===0, etc.) with an early return above.
  2. However, the documentation does not provide clear guidance on how to properly handle division by zero cases.
Suggested change
- **Division by zero with prior guard** if the divisor is checked (<=0, ===0, etc.) with an early return above, do NOT flag division by zero
const value = /* some expression */;
if (value <= 0) return; // early return
const result = value / /* some divisor */;

Flagged by: r-llama31  |  CodeAgora

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Division by Zero Not Handled When Prior Guard Present

Confidence: 🟢 92%

Problem: In packages/core/src/l1/reviewer.ts:465

The documentation suggests that division by zero is not a bug when there is a prior guard (e.g., divisor checked for <=0 or ===0), but the logic does not actually check for this prior guard.

Evidence:

  1. Prior guard is not checked in logic.
  2. This can lead to false positives for division by zero issues.

Suggestion: Modify the logic to check for prior guards (early returns, null checks, or ?? fallbacks) in the surrounding code before flagging division by zero issues.

Flagged by: r-llama31  |  CodeAgora

- **Null/undefined with ?? or ?.** — if the value uses ?? (nullish coalescing) or ?. (optional chaining) on the SAME expression, the null case is handled

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 SUGGESTION — Null/undefined values are not explicitly handled in documentation

Confidence: 🟡 79%

Problem: In packages/core/src/l1/reviewer.ts:467

The documentation for the new Null/undefined with ?? or ?. section does not explicitly explain how to handle null/undefined values.

Evidence:

  1. The section mentions that "Null/undefined with ?? or .?" is not a bug if the value uses ?? (nullish coalescing) or ?. (optional chaining) on the SAME expression.
  2. However, the documentation does not provide clear guidance on how to properly handle null/undefined values.
Suggested change
const value = /* some expression */;
const result = value?.someMethod(); // ?? (nullish coalescing) or ?. (optional chaining)

Flagged by: r-llama31  |  CodeAgora

**Example Evidence Document:**

Expand Down
130 changes: 130 additions & 0 deletions packages/core/src/pipeline/hallucination-filter.ts
Original file line number Diff line number Diff line change
Expand Up @@ -83,8 +83,138 @@ export function filterHallucinations(
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CRITICAL — Potential for false positives in hallucination filtering

Confidence: 🟢 93%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-148

The implementation of hallucination filtering has potential for false positives that could cause valid issues to be incorrectly filtered as contradictions of themselves.

Evidence:

  1. The pattern /already\s+returns?\s+(?:early|before|50|default)/i matches any mention of "already returns early" or "already returns 50" but might also match legitimate code patterns
  2. The pattern /not\s+a\s+concern/i would filter out any finding that mentions "not a security concern" or "not a performance concern" which could be legitimate
  3. The patterns are overly simplistic and don't account for negation patterns like "but this is a real issue"
Suggested change
}
### Potential for incomplete evidence merging in deduplication
✅ Discussion d002 — 1 round(s), consensus

Verdict: HARSHLY_CRITICAL — All supporters agreed on the issue

Flagged by: r-qwen3  |  CodeAgora

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 HARSHLY CRITICAL — Missing Validation for Guarded Division by Zero

Confidence: 🟢 100%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90

The code does not validate if the divisor is checked for being non-zero before division. This can lead to a division by zero error if the divisor is zero.

Evidence:

  1. The code checks for self-contradictory findings, but it does not check if the divisor is zero or not. (line 86)
  2. The code does not include any validation for guarded division by zero. (line 86-90)
  3. The code assumes that the divisor is non-zero, but it does not check for this. (line 86-90)
Suggested change
}
if (a.divisor === 0) {
doc.confidence = 0;
return;
}
🔍 Individual Reviews (4 reviewers)

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code does not validate if the divisor is checked for being non-zero before division....

Evidence:

  • The code checks for self-contradictory findings, but it does not check if the divisor is zero or not. (line 86)
  • The code does not include any validation for guarded division by zero. (line 86-90)
  • The code assumes that the divisor is non-zero, but it does not check for this. (line 86-90)

Suggestion: Add a check for the divisor being non-zero before division. This can be done by using a guard clause to check if the divisor is zero or not....

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code checks for self-contradictory findings, but it may not handle guarded values correctly....

Evidence:

  • The code checks for self-contradictory findings, but it does not check if the values are guarded or not. (line 86)
  • The code does not include any validation for guarded values. (line 86-90)
  • The code assumes that the values are not guarded, but it does not check for this. (line 86-90)

Suggestion: Add a check for guarded values before using them. This can be done by using a guard clause to check if the values are non-zero or non-null before use. ```typescript if (a.value === 0 || a....

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code checks for self-contradictory findings, but it may not handle null or undefined values correctly....

Evidence:

  • The code checks for self-contradictory findings, but it does not check for null or undefined values. (line 86)
  • The code assumes that the values are not null or undefined, but it does not check for this. (line 86-90)

Suggestion: Add a check for null or undefined values before using them. This can be done by using a guard clause to check if the values are null or undefined before use. ```typescript if (a.value === null || a....

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code checks for self-contradictory findings, but it may not handle "what if" speculation correctly....

Evidence:

  • The code checks for self-contradictory findings, but it does not check for "what if" speculation. (line 86)
  • The code assumes that the findings are not "what if" speculation, but it does not check for this. (line 86-90)

Suggestion: Add a check for "what if" speculation before using it. This can be done by using a guard clause to check if the findings are "what if" speculation before use. ```typescript if (a....

Flagged by: r-llama31  |  CodeAgora

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 HARSHLY CRITICAL — Guarded Values May Not Be Handled Correctly

Confidence: 🟢 100%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90

The code checks for self-contradictory findings, but it may not handle guarded values correctly. Guarded values are values that are checked for being non-zero or non-null before use. The code may not handle these values correctly, which can lead to errors.

Evidence:

  1. The code checks for self-contradictory findings, but it does not check if the values are guarded or not. (line 86)
  2. The code does not include any validation for guarded values. (line 86-90)
  3. The code assumes that the values are not guarded, but it does not check for this. (line 86-90)
Suggested change
}
if (a.value === 0 || a.value === null) {
doc.confidence = 0;
return;
}
🔍 Individual Reviews (4 reviewers)

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code does not validate if the divisor is checked for being non-zero before division....

Evidence:

  • The code checks for self-contradictory findings, but it does not check if the divisor is zero or not. (line 86)
  • The code does not include any validation for guarded division by zero. (line 86-90)
  • The code assumes that the divisor is non-zero, but it does not check for this. (line 86-90)

Suggestion: Add a check for the divisor being non-zero before division. This can be done by using a guard clause to check if the divisor is zero or not....

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code checks for self-contradictory findings, but it may not handle guarded values correctly....

Evidence:

  • The code checks for self-contradictory findings, but it does not check if the values are guarded or not. (line 86)
  • The code does not include any validation for guarded values. (line 86-90)
  • The code assumes that the values are not guarded, but it does not check for this. (line 86-90)

Suggestion: Add a check for guarded values before using them. This can be done by using a guard clause to check if the values are non-zero or non-null before use. ```typescript if (a.value === 0 || a....

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code checks for self-contradictory findings, but it may not handle null or undefined values correctly....

Evidence:

  • The code checks for self-contradictory findings, but it does not check for null or undefined values. (line 86)
  • The code assumes that the values are not null or undefined, but it does not check for this. (line 86-90)

Suggestion: Add a check for null or undefined values before using them. This can be done by using a guard clause to check if the values are null or undefined before use. ```typescript if (a.value === null || a....

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code checks for self-contradictory findings, but it may not handle "what if" speculation correctly....

Evidence:

  • The code checks for self-contradictory findings, but it does not check for "what if" speculation. (line 86)
  • The code assumes that the findings are not "what if" speculation, but it does not check for this. (line 86-90)

Suggestion: Add a check for "what if" speculation before using it. This can be done by using a guard clause to check if the findings are "what if" speculation before use. ```typescript if (a....

Flagged by: r-llama31  |  CodeAgora

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 HARSHLY CRITICAL — Null/Undefined Values May Not Be Handled Correctly

Confidence: 🟢 100%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90

The code checks for self-contradictory findings, but it may not handle null or undefined values correctly. The code assumes that the values are not null or undefined, but it does not check for this.

Evidence:

  1. The code checks for self-contradictory findings, but it does not check for null or undefined values. (line 86)
  2. The code assumes that the values are not null or undefined, but it does not check for this. (line 86-90)
Suggested change
}
if (a.value === null || a.value === undefined) {
doc.confidence = 0;
return;
}
🔍 Individual Reviews (4 reviewers)

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code does not validate if the divisor is checked for being non-zero before division....

Evidence:

  • The code checks for self-contradictory findings, but it does not check if the divisor is zero or not. (line 86)
  • The code does not include any validation for guarded division by zero. (line 86-90)
  • The code assumes that the divisor is non-zero, but it does not check for this. (line 86-90)

Suggestion: Add a check for the divisor being non-zero before division. This can be done by using a guard clause to check if the divisor is zero or not....

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code checks for self-contradictory findings, but it may not handle guarded values correctly....

Evidence:

  • The code checks for self-contradictory findings, but it does not check if the values are guarded or not. (line 86)
  • The code does not include any validation for guarded values. (line 86-90)
  • The code assumes that the values are not guarded, but it does not check for this. (line 86-90)

Suggestion: Add a check for guarded values before using them. This can be done by using a guard clause to check if the values are non-zero or non-null before use. ```typescript if (a.value === 0 || a....

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code checks for self-contradictory findings, but it may not handle null or undefined values correctly....

Evidence:

  • The code checks for self-contradictory findings, but it does not check for null or undefined values. (line 86)
  • The code assumes that the values are not null or undefined, but it does not check for this. (line 86-90)

Suggestion: Add a check for null or undefined values before using them. This can be done by using a guard clause to check if the values are null or undefined before use. ```typescript if (a.value === null || a....

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code checks for self-contradictory findings, but it may not handle "what if" speculation correctly....

Evidence:

  • The code checks for self-contradictory findings, but it does not check for "what if" speculation. (line 86)
  • The code assumes that the findings are not "what if" speculation, but it does not check for this. (line 86-90)

Suggestion: Add a check for "what if" speculation before using it. This can be done by using a guard clause to check if the findings are "what if" speculation before use. ```typescript if (a....

Flagged by: r-llama31  |  CodeAgora

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 HARSHLY CRITICAL — "What If" Speculation May Not Be Handled Correctly

Confidence: 🟢 100%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90

The code checks for self-contradictory findings, but it may not handle "what if" speculation correctly. "What if" speculation is a type of speculation that assumes a hypothetical situation. The code may not handle this type of speculation correctly, which can lead to errors.

Evidence:

  1. The code checks for self-contradictory findings, but it does not check for "what if" speculation. (line 86)
  2. The code assumes that the findings are not "what if" speculation, but it does not check for this. (line 86-90)
Suggested change
}
if (a.speculation === "what if") {
doc.confidence = 0;
return;
}
🔍 Individual Reviews (4 reviewers)

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code does not validate if the divisor is checked for being non-zero before division....

Evidence:

  • The code checks for self-contradictory findings, but it does not check if the divisor is zero or not. (line 86)
  • The code does not include any validation for guarded division by zero. (line 86-90)
  • The code assumes that the divisor is non-zero, but it does not check for this. (line 86-90)

Suggestion: Add a check for the divisor being non-zero before division. This can be done by using a guard clause to check if the divisor is zero or not....

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code checks for self-contradictory findings, but it may not handle guarded values correctly....

Evidence:

  • The code checks for self-contradictory findings, but it does not check if the values are guarded or not. (line 86)
  • The code does not include any validation for guarded values. (line 86-90)
  • The code assumes that the values are not guarded, but it does not check for this. (line 86-90)

Suggestion: Add a check for guarded values before using them. This can be done by using a guard clause to check if the values are non-zero or non-null before use. ```typescript if (a.value === 0 || a....

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code checks for self-contradictory findings, but it may not handle null or undefined values correctly....

Evidence:

  • The code checks for self-contradictory findings, but it does not check for null or undefined values. (line 86)
  • The code assumes that the values are not null or undefined, but it does not check for this. (line 86-90)

Suggestion: Add a check for null or undefined values before using them. This can be done by using a guard clause to check if the values are null or undefined before use. ```typescript if (a.value === null || a....

r-llama31 💬 llama-3.1-8b-instant (🔴 HARSHLY_CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90 The code checks for self-contradictory findings, but it may not handle "what if" speculation correctly....

Evidence:

  • The code checks for self-contradictory findings, but it does not check for "what if" speculation. (line 86)
  • The code assumes that the findings are not "what if" speculation, but it does not check for this. (line 86-90)

Suggestion: Add a check for "what if" speculation before using it. This can be done by using a guard clause to check if the findings are "what if" speculation before use. ```typescript if (a....

Flagged by: r-llama31  |  CodeAgora

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Improper Hallucination Penalty for Self-Contradiction

Confidence: 🟢 100%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:80-83

The current implementation of hallucination filtering penalizes findings for self-contradiction by reducing their confidence to 30% (80 * 0.3). However, this approach might be too harsh and could lead to false negatives, especially in cases where the self-contradiction is minor or context-dependent.

Evidence:

  1. The reduction factor of 0.3 seems arbitrary and not empirically validated.
  2. Penalizing findings based on a simple string matching with predefined patterns may not accurately capture the nuance of self-contradictions.
  3. A more dynamic approach to evaluating self-contradictions could provide a more accurate assessment.

Suggestion: Implement a more nuanced approach to evaluating self-contradictions, potentially involving machine learning models or more sophisticated natural language processing techniques to better assess the context and severity of self-contradictions.

✅ Discussion d001 — 1 round(s), consensus

Verdict: CRITICAL — All supporters rejected HARSHLY_CRITICAL claim — downgraded to CRITICAL for human review

Flagged by: r-scout  |  CodeAgora

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Potential False Positives in Hallucination Filter

Confidence: 🔴 25%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:83-90

The hallucination filter currently checks for self-contradiction by admitting the issue is handled. However, this approach may lead to potential false positives. For instance, findings with evidence that mentions a prior check or guard might be incorrectly down-weighted if the context of the check is not properly considered.

Evidence:

  1. The current implementation uses a simple regex to detect self-contradiction, which might not cover all edge cases.
  2. The detectSelfContradiction function does not account for the context of the check or guard.
  3. The function reduces confidence based solely on the presence of certain phrases, without considering the actual logic.
Suggested change
}
// Example improvement: Add context-aware logic to detect self-contradiction
export function detectSelfContradiction(doc: EvidenceDocument, context: string): boolean {
// Implement more sophisticated logic here
// ...
}

Flagged by: r-scout  |  CodeAgora


// Check 4: Self-contradiction — finding admits the issue is already handled
if (detectSelfContradiction(doc)) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 HARSHLY CRITICAL — Potential for false positives in hallucination filtering

Confidence: 🟢 93%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:86-93

The implementation of the self-contradiction detection has patterns that may cause legitimate findings to be incorrectly marked as self-contradictory.

Evidence:

  1. The pattern /already\s+returns?\s+(?:early|before|50|default)/i will match any mention of "already returns", not just the intended pattern of a finding admission
  2. The pattern /not\s+a\s+concern/i will incorrectly filter findings that mention "not a security concern" when appropriate
  3. The pattern /already\s+returns?\s+(?:early|before|50|default)/i incorrectly includes 50 as a match pattern, which could match numeric values

Suggestion: 1. Refactor pattern to /already\s+returns?\s+(?:early|before|default)/i to remove the numeric match
2. Add more context to the patterns to ensure they only match admissions. For example: /admitted|acknowledged|already\s+returns?\s+(early|before|default)/i
3. Increase pattern specificity to reduce false positives while maintaining coverage for self-contradictory phrases


Flagged by: r-qwen3  |  CodeAgora

doc.confidence = Math.round((doc.confidence ?? 50) * 0.3);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Improper Confidence Adjustment for Self-Contradicting Findings

Confidence: 🔴 22%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:87-90

The code checks for self-contradicting findings and adjusts their confidence levels. However, it does not handle cases where the evidence list is empty.

Evidence:

  1. The function detectSelfContradiction checks if a document contains self-contradicting statements by searching for specific phrases in the problem, evidence, and suggestion fields.
  2. If a self-contradicting statement is found, the confidence level is adjusted by multiplying it by 0.3.
  3. However, the function does not check if the evidence list is empty before accessing its elements.
Suggested change
doc.confidence = Math.round((doc.confidence ?? 50) * 0.3);
// Add a check for empty evidence list
export function detectSelfContradiction(doc: EvidenceDocument): boolean {
if (!doc) return false;
const text = [
doc.problem || '',
...(doc.evidence || []), // Check if evidence list is empty
doc.suggestion || ''
].join(' ').trim();
if (!text) return false;
return SELF_CONTRADICTION_PATTERNS.some(p => p.test(text));
}

Flagged by: r-scout  |  CodeAgora

}

filtered.push(doc);
}

return { filtered, removed };
}

const SELF_CONTRADICTION_PATTERNS = [
/already\s+(?:handled|checked|validated|prevented|avoided|guarded|addressed)/i,
/prior\s+check/i,
/(?:is|are)\s+avoided\s+due\s+to/i,
/not\s+a\s+concern/i,
/already\s+returns?\s+(?:early|before|50|default)/i,
/(?:guard|check)\s+(?:above|before|prevents?|ensures?)/i,
/however.*(?:is|are)\s+(?:already|properly)\s+(?:handled|checked)/i,
];

/**
* Detect findings that contradict themselves by admitting the issue is handled.
* Example: "Division by zero possible" + evidence "avoided due to prior check"
*/
export function detectSelfContradiction(doc: EvidenceDocument): boolean {
if (!doc) return false;
const text = [
doc.problem || '',
...(doc.evidence || []),
doc.suggestion || ''
].join(' ').trim();

if (!text) return false;
return SELF_CONTRADICTION_PATTERNS.some(p => p.test(text));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Incomplete handling of optional evidence fields

Confidence: 🟢 99%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:112

The detectSelfContradiction function doesn't properly handle optional fields in the evidence document.

Evidence:

  1. Line 112: The function uses [doc.problem, ...doc.evidence, doc.suggestion].join(' ') which may include undefined values if any of these fields are missing
  2. The spread operator ...doc.evidence assumes evidence is always an array
  3. There's no fallback in case any of these fields are missing
Suggested change
return SELF_CONTRADICTION_PATTERNS.some(p => p.test(text));
I've also noticed some test coverage issues that might be worth mentioning.
## Issues in Test Coverage
### Incomplete test coverage for edge cases in deduplication

Flagged by: r-qwen3  |  CodeAgora

}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CRITICAL — Missing Input Validation

Confidence: 🟡 51%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:120-123

The input doc in detectSelfContradiction function is not validated for null or undefined, which could lead to a null pointer dereference if the input is not properly handled.

Evidence:

  1. The function does not check if doc is null or undefined before accessing its properties.
  2. If doc is null or undefined, the function will throw a null pointer exception.
  3. The function does not have any input validation to prevent such scenarios.
Suggested change
}
export function detectSelfContradiction(doc: EvidenceDocument | null): boolean {
if (!doc) return false;
// ...
}
✅ Discussion d001 — 1 round(s), consensus

Verdict: CRITICAL — Majority rejected HARSHLY_CRITICAL claim (2/3 disagree) — downgraded to CRITICAL

Flagged by: r-llama31  |  CodeAgora


/**
* Deduplicate evidence documents at the individual finding level.
* Merges findings on the same file + overlapping line range + similar title.
* Keeps highest confidence, combines evidence lists.
*
* Optimized: Groups by filePath first to avoid O(N^2) comparison across different files.
*/
export function deduplicateEvidence(docs: EvidenceDocument[]): EvidenceDocument[] {
if (!docs || docs.length <= 1) return docs || [];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 HARSHLY CRITICAL — Missing Security Boundary Checks

Confidence: 🔴 33%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:130-137

The code is not performing any security boundary checks on the docs input. If the input is not properly sanitized, it could lead to a security vulnerability.

Evidence:

  1. The code does not have any security boundary checks on the input docs.
  2. If the input is not properly sanitized, it could lead to a security vulnerability.
  3. The code does not have any input validation to prevent such scenarios.
Suggested change
if (!docs || docs.length <= 1) return docs || [];
export function deduplicateEvidence(docs: EvidenceDocument[]): EvidenceDocument[] {
if (!docs || docs.length === 0) {
throw new Error('Invalid input: empty array');
}
// ...
}

Flagged by: r-llama31  |  CodeAgora


// Group by filePath to reduce comparison space
const fileGroups = new Map<string, EvidenceDocument[]>();
for (const doc of docs) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Potential Performance Issue in deduplicateEvidence Function

Confidence: 🔴 0%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:134-139

The deduplicateEvidence function groups by filePath and then merges findings on the same file and similar title. However, this approach can lead to performance issues when dealing with a large number of findings.

Evidence:

  1. Grouping by filePath first can lead to an exponential number of groups.
  2. Merging findings on similar title can lead to an exponential number of merges.

Suggestion: Consider optimizing the deduplicateEvidence function to use a more efficient data structure or algorithm, such as using a Trie or a suffix tree.

Flagged by: r-llama31  |  CodeAgora

const list = fileGroups.get(doc.filePath || 'unknown') || [];
list.push(doc);
fileGroups.set(doc.filePath || 'unknown', list);
}

const result: EvidenceDocument[] = [];

for (const group of fileGroups.values()) {
if (group.length === 1) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Inefficient Algorithm in Deduplication

Confidence: 🔴 22%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:143-176

The deduplicateEvidence function uses a naive approach to merge findings, which could lead to inefficient performance for large inputs.

Evidence:

  1. The function uses a nested loop to compare findings, resulting in O(n^2) complexity.
  2. This could lead to performance issues if the input array is large.
Suggested change
if (group.length === 1) {
// Example improvement: Use a hash table to group findings
export function deduplicateEvidence(docs: EvidenceDocument[]): EvidenceDocument[] {
const findingGroups = new Map<string, EvidenceDocument[]>();
// ...
}

Flagged by: r-scout  |  CodeAgora

result.push(group[0]);
continue;
}

const mergedIndices = new Set<number>();
for (let i = 0; i < group.length; i++) {
if (mergedIndices.has(i)) continue;

let primary = { ...group[i], evidence: [...(group[i].evidence || [])] };
let hasSelfContradiction = detectSelfContradiction(primary);

for (let j = i + 1; j < group.length; j++) {
if (mergedIndices.has(j)) continue;

if (shouldMergeEvidence(primary, group[j])) {
// Merge: keep higher confidence, combine evidence
const primaryConf = primary.confidence ?? 0;
const otherConf = group[j].confidence ?? 0;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Missing Confidence Check

Confidence: 🔴 25%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:161-162

The code is not checking if the confidence level of a finding is within the expected range (0-100). If the confidence level is outside this range, it could lead to incorrect calculation of the L1 confidence.

Evidence:

  1. The code does not have any confidence check validation.
  2. If the confidence level is outside the expected range, it could lead to incorrect calculation of the L1 confidence.
  3. The code does not have any input validation to prevent such scenarios.
Suggested change
const otherConf = group[j].confidence ?? 0;
if (primary.confidence < 0 || primary.confidence > 100) {
throw new Error('Invalid confidence level');
}

Flagged by: r-llama31  |  CodeAgora

if (otherConf > primaryConf) {
primary.confidence = otherConf;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Potential for evidence ordering issues

Confidence: 🔴 27%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:143-144

The implementation adds evidence items to the primary document without preserving their original order.

Evidence:

  1. Line 143: Evidence items are added in the order they appear in the documents being merged
  2. Line 130-148: The algorithm doesn't account for the importance or relevance of evidence items
  3. There's no consideration for the order of evidence items as provided by the original documents
Suggested change
}
### Incomplete handling of optional evidence fields

Flagged by: r-qwen3  |  CodeAgora


// Add unique evidence items
const otherEvidence = group[j].evidence || [];
for (const e of otherEvidence) {
if (!primary.evidence.includes(e)) {
primary.evidence.push(e);
}
}
hasSelfContradiction = hasSelfContradiction || detectSelfContradiction(group[j]);
mergedIndices.add(j);
}
}
if (hasSelfContradiction) {
primary.confidence = Math.round((primary.confidence ?? 50) * 0.3);
}
result.push(primary);
}
}

return result;
}

function shouldMergeEvidence(a: EvidenceDocument, b: EvidenceDocument): boolean {
// Same file is guaranteed by caller (deduplicateEvidence groups by filePath)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Potential Edge Case in shouldMergeEvidence Function

Confidence: 🔴 16%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:184-187

The shouldMergeEvidence function checks for overlapping line ranges and similar titles before merging findings. However, this approach may not handle edge cases where the line ranges or titles are very similar but not exactly the same.

Evidence:

  1. Line ranges may be very similar but not exactly the same.
  2. Titles may be very similar but not exactly the same.

Suggestion: Consider modifying the shouldMergeEvidence function to use a more robust algorithm for comparing line ranges and titles, such as using a tolerance value or a fuzzy matching algorithm.

No other issues found.

Flagged by: r-llama31  |  CodeAgora

// Overlapping line ranges (within 15 lines)
const LINE_TOLERANCE = 15;
const overlap =
a.lineRange[0] <= b.lineRange[1] + LINE_TOLERANCE &&
b.lineRange[0] <= a.lineRange[1] + LINE_TOLERANCE;

if (!overlap) return false;

// Similar titles (Jaccard > 0.5)
const titleSim = jaccardSimilarity(a.issueTitle || '', b.issueTitle || '');
return titleSim > 0.5;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — jaccardSimilarity function does not handle cases with non-string inputs

Confidence: 🟡 72%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:166

The jaccardSimilarity function does not handle cases where the input is not a string.

Evidence:

  1. The function uses the Set data structure to calculate the intersection and union of the two sets.
  2. However, the function does not handle cases where the input is not a string.
Suggested change
return titleSim > 0.5;
if (typeof text !== 'string') {
return 0;
}

Flagged by: r-llama31  |  CodeAgora

}

function jaccardSimilarity(a: string, b: string): number {
if (!a && !b) return 1;
if (!a || !b) return 0;

const wordsA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 CRITICAL — Potential Divide by Zero Error in jaccardSimilarity function

Confidence: 🟢 86%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:170-179

The jaccardSimilarity function calculates the Jaccard similarity between two sets of words. However, it does not check if the union of the two sets is zero before calculating the similarity. This could lead to a divide by zero error if the union is zero.

Evidence:

  1. The jaccardSimilarity function calculates the union of the two sets of words.
  2. The function uses the union to calculate the Jaccard similarity.
  3. The function does not check if the union is zero before calculating the similarity.
Suggested change
const wordsA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
function jaccardSimilarity(a: string, b: string): number {
const wordsA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
const wordsB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
let intersection = 0;
for (const w of wordsA) {
if (wordsB.has(w)) intersection++;
}
const union = wordsA.size + wordsB.size - intersection;
if (union === 0) return 0; // add check for zero union
return intersection / union;
}
🔍 Individual Reviews (2 reviewers)

r-scout 💬 meta-llama/llama-4-scout-17b-16e-instruct (🟡 WARNING)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:170-170 The jaccardSimilarity function does not validate its inputs, which could lead to errors or unexpected behavior if the inputs are not as...

Evidence:

  • The function assumes that the input strings a and b are non-empty and can be split into words.
  • If either a or b is an empty string, the function will return 1, which might not be the expected behavior.
  • If either a or b is not a string, the function will throw an error.

Suggestion: Add input validation to ensure that a and b are non-empty strings before processing them. ```typescript function jaccardSimilarity(a: string, b: string): number { if (typeof a !== 'string' || t...

r-llama33 💬 llama-3.3-70b-versatile (🔴 CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:170-179 The jaccardSimilarity function calculates the Jaccard similarity between two sets of words....

Evidence:

  • The jaccardSimilarity function calculates the union of the two sets of words.
  • The function uses the union to calculate the Jaccard similarity.
  • The function does not check if the union is zero before calculating the similarity.

Suggestion: Add a check to ensure that the union is not zero before calculating the Jaccard similarity. ```typescript function jaccardSimilarity(a: string, b: string): number { const wordsA = new Set(a....

Flagged by: r-scout, r-llama33  |  CodeAgora

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Inadequate Input Validation in jaccardSimilarity Function

Confidence: 🟡 79%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:170-170

The jaccardSimilarity function does not validate its inputs, which could lead to errors or unexpected behavior if the inputs are not as expected.

Evidence:

  1. The function assumes that the input strings a and b are non-empty and can be split into words.
  2. If either a or b is an empty string, the function will return 1, which might not be the expected behavior.
  3. If either a or b is not a string, the function will throw an error.
Suggested change
const wordsA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
function jaccardSimilarity(a: string, b: string): number {
if (typeof a !== 'string' || typeof b !== 'string') {
throw new Error('Both inputs must be strings');
}
if (a.trim() === '' || b.trim() === '') {
return 0; // or throw an error, depending on the desired behavior
}
const wordsA = new Set(a.toLowerCase().split(/\s+/).filter(Boolean));
const wordsB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));
let intersection = 0;
for (const w of wordsA) {
if (wordsB.has(w)) intersection++;
}
const union = wordsA.size + wordsB.size - intersection;
return union === 0 ? 1 : intersection / union;
}
🔍 Individual Reviews (2 reviewers)

r-scout 💬 meta-llama/llama-4-scout-17b-16e-instruct (🟡 WARNING)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:170-170 The jaccardSimilarity function does not validate its inputs, which could lead to errors or unexpected behavior if the inputs are not as...

Evidence:

  • The function assumes that the input strings a and b are non-empty and can be split into words.
  • If either a or b is an empty string, the function will return 1, which might not be the expected behavior.
  • If either a or b is not a string, the function will throw an error.

Suggestion: Add input validation to ensure that a and b are non-empty strings before processing them. ```typescript function jaccardSimilarity(a: string, b: string): number { if (typeof a !== 'string' || t...

r-llama33 💬 llama-3.3-70b-versatile (🔴 CRITICAL)

Problem: In packages/core/src/pipeline/hallucination-filter.ts:170-179 The jaccardSimilarity function calculates the Jaccard similarity between two sets of words....

Evidence:

  • The jaccardSimilarity function calculates the union of the two sets of words.
  • The function uses the union to calculate the Jaccard similarity.
  • The function does not check if the union is zero before calculating the similarity.

Suggestion: Add a check to ensure that the union is not zero before calculating the Jaccard similarity. ```typescript function jaccardSimilarity(a: string, b: string): number { const wordsA = new Set(a....

Flagged by: r-scout, r-llama33  |  CodeAgora

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 SUGGESTION — Inefficient Jaccard Similarity Computation

Confidence: 🔴 13%

Problem: In packages/core/src/pipeline/hallucination-filter.ts:170-179

The Jaccard similarity computation involves set operations that could be inefficient for very large sets of words.

Evidence:

  1. The current implementation may not scale well for large inputs.
  2. A more efficient algorithm or data structure could improve performance.

Suggestion: Explore more efficient algorithms for computing Jaccard similarity, such as using bitwise operations or optimized set libraries.

Flagged by: r-scout  |  CodeAgora

const wordsB = new Set(b.toLowerCase().split(/\s+/).filter(Boolean));

if (wordsA.size === 0 && wordsB.size === 0) return 1;

let intersection = 0;
for (const w of wordsA) {
if (wordsB.has(w)) intersection++;
}

const union = wordsA.size + wordsB.size - intersection;
return union <= 0 ? 0 : intersection / union;
}
4 changes: 4 additions & 0 deletions packages/core/src/pipeline/orchestrator.ts
Original file line number Diff line number Diff line change
Expand Up @@ -750,6 +750,10 @@ export async function runPipeline(input: PipelineInput, progress?: ProgressEmitt
}
allEvidenceDocs = hallucinationResult.filtered;

// === EVIDENCE DEDUP: Merge duplicate findings (#439) ===
const { deduplicateEvidence } = await import('./hallucination-filter.js');
allEvidenceDocs = deduplicateEvidence(allEvidenceDocs);
Comment on lines +753 to +755
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Score corroboration before deduplicating evidence.

deduplicateEvidence() here collapses independent L1 findings before computeL1Confidence() runs on Line 762. Because computeL1Confidence() derives corroboration from allDocs, two reviewers reporting the same issue become a single-doc finding and can get the single-reviewer penalty instead of a corroboration boost.

💡 Suggested change
-    // === EVIDENCE DEDUP: Merge duplicate findings (`#439`) ===
-    const { deduplicateEvidence } = await import('./hallucination-filter.js');
-    allEvidenceDocs = deduplicateEvidence(allEvidenceDocs);
-
     // === CONFIDENCE: Compute L1 confidence for non-rule docs ===
     const totalReviewers = allReviewerInputs.length;
     const totalDiffLines = filteredDiffContent.split('\n').length;
     for (const doc of allEvidenceDocs) {
       if (doc.source !== 'rule') {
         doc.confidence = computeL1Confidence(doc, allEvidenceDocs, totalReviewers, totalDiffLines);
       }
     }
+
+    // === EVIDENCE DEDUP: Merge duplicate findings (`#439`) ===
+    const { deduplicateEvidence } = await import('./hallucination-filter.js');
+    allEvidenceDocs = deduplicateEvidence(allEvidenceDocs);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/pipeline/orchestrator.ts` around lines 753 - 755, The
pipeline is deduplicating evidence with deduplicateEvidence(allEvidenceDocs)
before computeL1Confidence runs, which collapses independent reviewer findings
and prevents corroboration counting; update the sequence so
computeL1Confidence(allEvidenceDocs) is called before deduplicateEvidence —
i.e., call computeL1Confidence(...) on the original allEvidenceDocs (using the
computeL1Confidence function) and only then import/invoke deduplicateEvidence to
collapse duplicates, ensuring corroboration is computed from the full set of
docs.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 SUGGESTION — Inconsistent Import Statements

Confidence: 🔴 28%

Problem: In packages/core/src/pipeline/orchestrator.ts:754-756

The import statement for deduplicateEvidence is dynamic, but it should be a static import for consistency.

Evidence:

  1. The import statement is currently dynamic, which can lead to issues with tree shaking and code optimization.
  2. Static imports are more consistent and efficient.
Suggested change
allEvidenceDocs = deduplicateEvidence(allEvidenceDocs);
// Change dynamic import to static import
import { deduplicateEvidence } from './hallucination-filter.js';
// ...
const { filtered } = hallucinationResult;
allEvidenceDocs = deduplicateEvidence(allEvidenceDocs);

Flagged by: r-scout  |  CodeAgora


// === CONFIDENCE: Compute L1 confidence for non-rule docs ===
const totalReviewers = allReviewerInputs.length;
const totalDiffLines = filteredDiffContent.split('\n').length;
Expand Down
112 changes: 111 additions & 1 deletion packages/core/src/tests/hallucination-filter.test.ts
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
import { describe, it, expect } from 'vitest';
import { filterHallucinations } from '../pipeline/hallucination-filter.js';
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 SUGGESTION — Missing Test Cases for Edge Scenarios

Confidence: 🔴 25%

Problem: In packages/core/src/tests/hallucination-filter.test.ts:1-245

The test cases do not cover edge scenarios, such as empty evidence lists or documents with missing fields.

Evidence:

  1. The current test cases do not cover all possible edge scenarios.
  2. Additional test cases are needed to ensure the function behaves correctly in these scenarios.
Suggested change
import { filterHallucinations } from '../pipeline/hallucination-filter.js';
// Add test cases for edge scenarios
it('should handle empty evidence list', () => {
const docs = [makeDoc({
issueTitle: 'Null pointer dereference',
filePath: 'src/utils.ts',
lineRange: [10, 12],
evidence: [], // Empty evidence list
})];
const result = deduplicateEvidence(docs);
expect(result).toHaveLength(1);
});

Flagged by: r-scout  |  CodeAgora

import { filterHallucinations, detectSelfContradiction, deduplicateEvidence } from '../pipeline/hallucination-filter.js';
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove the unused detectSelfContradiction import or add a direct suite for it.

Line 2 currently fails @typescript-eslint/no-unused-vars. A small describe('detectSelfContradiction') block would also let this file cover the helper independently of filterHallucinations.

🧰 Tools
🪛 ESLint

[error] 2-2: 'detectSelfContradiction' is defined but never used. Allowed unused vars must match /^_/u.

(@typescript-eslint/no-unused-vars)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/core/src/tests/hallucination-filter.test.ts` at line 2, The import
detectSelfContradiction is unused and triggers a linter error; either remove
detectSelfContradiction from the import list in the top-level import that
currently reads "filterHallucinations, detectSelfContradiction,
deduplicateEvidence", or add a small test suite named
describe('detectSelfContradiction') that calls detectSelfContradiction with
representative inputs and asserts expected output to exercise the helper (keep
existing filterHallucinations and deduplicateEvidence tests untouched).

import type { EvidenceDocument } from '../types/core.js';

// Minimal diff for testing
Expand Down Expand Up @@ -115,6 +115,30 @@ describe('filterHallucinations', () => {
expect(result.removed).toHaveLength(0);
});

it('should penalize confidence for self-contradicting findings', () => {
const docs = [makeDoc({
evidence: ['Division by zero is avoided due to prior check on line 5'],
confidence: 80,
})];
const result = filterHallucinations(docs, SAMPLE_DIFF);

expect(result.filtered).toHaveLength(1);
expect(result.filtered[0].confidence).toBe(24); // 80 * 0.3
});

it('should not penalize confidence for findings without self-contradiction', () => {
const docs = [makeDoc({
problem: 'Division by zero on line 12',
evidence: ['computeValue() may return 0'],
suggestion: 'Validate the divisor is non-zero',
confidence: 80,
})];
const result = filterHallucinations(docs, SAMPLE_DIFF);

expect(result.filtered).toHaveLength(1);
expect(result.filtered[0].confidence).toBe(80);
});

it('should correctly split mixed valid/invalid docs', () => {
const docs = [
makeDoc({ filePath: 'src/utils.ts', lineRange: [11, 12] }), // valid
Expand All @@ -133,3 +157,89 @@ describe('filterHallucinations', () => {
]);
});
});

describe('deduplicateEvidence', () => {
it('should merge findings on same file, same line, similar title', () => {
const docs = [
makeDoc({
issueTitle: 'Null pointer dereference risk',
filePath: 'src/utils.ts',
lineRange: [10, 12],
confidence: 70,
evidence: ['evidence A'],
}),
makeDoc({
issueTitle: 'Null pointer dereference possible',
filePath: 'src/utils.ts',
lineRange: [11, 13],
confidence: 85,
evidence: ['evidence B'],
}),
];
const result = deduplicateEvidence(docs);

expect(result).toHaveLength(1);
expect(result[0].confidence).toBe(85); // higher confidence kept
expect(result[0].evidence).toContain('evidence A');
expect(result[0].evidence).toContain('evidence B');
});

it('should not merge findings on different files', () => {
const docs = [
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Incomplete test coverage for edge cases in deduplication

Confidence: 🔴 22%

Problem: In packages/core/src/tests/hallucination-filter.test.ts:187-203, packages/core/src/tests/hallucination-filter.test.ts:205-221, packages/core/src/tests/hallucination-filter.test.ts:223-248

The test cases for evidence deduplication don't cover several important edge cases.

Evidence:

  1. No test for documents with empty evidence arrays
  2. No test for handling documents with undefined or null fields
  3. No test for documents with overlapping but not touching line numbers
  4. No test for documents with very similar but distinct issue titles
  5. No test for performance with large input sets

Suggestion: 1. Add test for documents with no evidence
2. Add test for documents with undefined fields
3. Add test covering edge cases in line range matching
4. Add test with very similar but distinct issue titles
5. Add performance test with 100+ documents to identify scalability issues
6. Add test for empty input and single-document inputs


Overall, these are the main issues I've identified in the added code. While the code appears to work for the basic cases covered in the tests, there are several potential problems that could manifest in production with more complex inputs.
</think>

<sub>Flagged by: r-qwen3  |  CodeAgora</sub>

makeDoc({
issueTitle: 'Null pointer dereference',
filePath: 'src/utils.ts',
lineRange: [10, 12],
}),
makeDoc({
issueTitle: 'Null pointer dereference',
filePath: 'src/index.ts',
lineRange: [10, 12],
}),
];
const result = deduplicateEvidence(docs);

expect(result).toHaveLength(2);
});

it('should not merge findings with very different titles', () => {
const docs = [
makeDoc({
issueTitle: 'SQL injection vulnerability',
filePath: 'src/utils.ts',
lineRange: [10, 12],
}),
makeDoc({
issueTitle: 'Memory leak in event handler',
filePath: 'src/utils.ts',
lineRange: [10, 12],
}),
];
const result = deduplicateEvidence(docs);

expect(result).toHaveLength(2);
});

it('should preserve self-contradiction penalty after merge', () => {
const docs = [
makeDoc({
issueTitle: 'Division by zero risk in helper',
filePath: 'src/utils.ts',
lineRange: [10, 12],
confidence: 24,
evidence: ['Division by zero is avoided due to prior check on line 5'],
}),
makeDoc({
issueTitle: 'Division by zero possible in helper',
filePath: 'src/utils.ts',
lineRange: [11, 13],
confidence: 80,
evidence: ['Potential divide by zero in compute path'],
}),
];
const result = deduplicateEvidence(docs);

expect(result).toHaveLength(1);
expect(result[0].confidence).toBe(24); // max(24,80)=80 then contradiction penalty => 24
});
});
1 change: 1 addition & 0 deletions src/tests/critical-core-errors-orchestrator.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -84,6 +84,7 @@ vi.mock('../../packages/core/src/l0/quality-tracker.js', () => ({

vi.mock('../../packages/shared/src/utils/diff.js', () => ({
extractMultipleSnippets: vi.fn(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Potential Memory Leak in Mocked Functions

Confidence: 🔴 22%

Problem: In src/tests/critical-core-errors-orchestrator.test.ts:85-89, src/tests/e2e-mock-scenarios.test.ts:82-87, src/tests/orchestrator-branches.test.ts:80-85, src/tests/pipeline-chunk-parallel.test.ts:80-85

The vi.mock function is used to mock several functions from the ../../packages/shared/src/utils/diff.js module. However, the extractFileListFromDiff, parseDiffFileRanges, and readSurroundingContext functions are mocked to return empty arrays or null values. This could potentially lead to memory leaks or unexpected behavior if the mocked functions are not properly restored after each test.

Evidence:

  1. The vi.mock function is used to mock several functions without properly restoring them after each test.
  2. The mocked functions return empty arrays or null values, which could lead to unexpected behavior or memory leaks.
  3. The tests do not appear to properly clean up after themselves, which could lead to memory leaks or other issues.
Suggested change
extractMultipleSnippets: vi.fn(),
afterEach(() => {
vi.restoreAllMocks();
});

Flagged by: r-llama33  |  CodeAgora

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Missing Input Validation for extractFileListFromDiff Mock

Confidence: 🔴 22%

Problem: In src/tests/critical-core-errors-orchestrator.test.ts:85-86

The extractFileListFromDiff function is mocked to return an empty array, but it does not validate its input. This could lead to issues if the function is called with malicious or malformed input.

Evidence:

  1. The extractFileListFromDiff function is mocked to return an empty array without any input validation.
  2. The function is called with a diff string as an argument, which could potentially be malformed or contain security vulnerabilities.
  3. The lack of input validation could lead to errors or security breaches if the function is called with unexpected input.
Suggested change
extractMultipleSnippets: vi.fn(),
vi.mock('../../packages/shared/src/utils/diff.js', () => ({
extractFileListFromDiff: vi.fn().mockImplementation((diffString) => {
if (typeof diffString !== 'string') {
throw new Error('Invalid diff string');
}
return [];
}),
}));

Flagged by: r-llama33  |  CodeAgora

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Inconsistent Mocking of extractFileListFromDiff

Confidence: 🔴 22%

Problem: In src/tests/critical-core-errors-orchestrator.test.ts:85, src/tests/e2e-mock-scenarios.test.ts:83, src/tests/orchestrator-branches.test.ts:80, src/tests/pipeline-chunk-parallel.test.ts:81

The extractFileListFromDiff function is being mocked to return an empty array in multiple test files. However, in src/tests/e2e-mock-scenarios.test.ts, this line was previously commented out, but then re-added with the same implementation. This inconsistency may indicate that the mocking of this function is not properly considered across different test files.

Evidence:

  1. The extractFileListFromDiff function is mocked in multiple test files with the same implementation.
  2. The previous version of src/tests/e2e-mock-scenarios.test.ts had this line commented out, indicating potential uncertainty about its necessity.
  3. The function extractFileListFromDiff is part of the diff.js utilities, which are mocked differently across test files.
Suggested change
extractMultipleSnippets: vi.fn(),
vi.mock('../../packages/shared/src/utils/diff.js', () => ({
extractMultipleSnippets: vi.fn(),
extractFileListFromDiff: vi.fn().mockReturnValue([]), // Always return empty array for testing purposes
parseDiffFileRanges: vi.fn().mockReturnValue([]),
readSurroundingContext: vi.fn().mockResolvedValue(null),
}));

Flagged by: r-llama33  |  CodeAgora

extractFileListFromDiff: vi.fn().mockReturnValue([]),
parseDiffFileRanges: vi.fn().mockReturnValue([]),
readSurroundingContext: vi.fn().mockResolvedValue(null),
}));
Expand Down
3 changes: 1 addition & 2 deletions src/tests/e2e-mock-scenarios.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -79,12 +79,11 @@ vi.mock('../../packages/core/src/l0/quality-tracker.js', () => ({
getRecords: vi.fn().mockReturnValue([]),
})),
}));

vi.mock('../../packages/shared/src/utils/diff.js', () => ({
extractMultipleSnippets: vi.fn(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Inconsistent Mocking of extractFileListFromDiff

Confidence: 🔴 16%

Problem: In src/tests/e2e-mock-scenarios.test.ts:82-83

The extractFileListFromDiff function is mocked to return an empty array, but it is duplicated in the same file. This could lead to maintenance issues and inconsistencies in the code.

Evidence:

  1. The extractFileListFromDiff function is mocked twice in the same file with the same implementation.
  2. The duplication could lead to maintenance issues if one of the mocks needs to be updated and the other is forgotten.
  3. The inconsistency could also lead to confusion when reading the code.

Suggestion: Remove the duplicated mock and keep only one implementation of the extractFileListFromDiff mock. If the mock needs to be updated, it can be done in one place, reducing the risk of maintenance issues and inconsistencies.

Flagged by: r-llama33  |  CodeAgora

extractFileListFromDiff: vi.fn().mockReturnValue([]),
parseDiffFileRanges: vi.fn().mockReturnValue([]),
readSurroundingContext: vi.fn().mockResolvedValue(null),
extractFileListFromDiff: vi.fn().mockReturnValue([]),
}));

vi.mock('../../packages/shared/src/utils/logger.js', () => ({
Expand Down
3 changes: 3 additions & 0 deletions src/tests/orchestrator-branches.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,9 @@ vi.mock('../../packages/core/src/l0/quality-tracker.js', () => ({

vi.mock('../../packages/shared/src/utils/diff.js', () => ({
extractMultipleSnippets: vi.fn(),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 WARNING — Missing Error Handling for readSurroundingContext Mock

Confidence: 🔴 19%

Problem: In src/tests/orchestrator-branches.test.ts:80-81

The readSurroundingContext function is mocked to return a resolved promise with a value of null, but it does not handle potential errors. This could lead to issues if the function is called with input that causes an error.

Evidence:

  1. The readSurroundingContext function is mocked to return a resolved promise with a value of null.
  2. The function does not handle potential errors that could occur when reading the surrounding context.
  3. The lack of error handling could lead to issues if the function is called with input that causes an error.
Suggested change
extractMultipleSnippets: vi.fn(),
vi.mock('../../packages/shared/src/utils/diff.js', () => ({
readSurroundingContext: vi.fn().mockImplementation((context) => {
try {
// Simulate reading the surrounding context
return Promise.resolve(null);
} catch (error) {
return Promise.reject(error);
}
}),
}));

Flagged by: r-llama33  |  CodeAgora

extractFileListFromDiff: vi.fn().mockReturnValue([]),
parseDiffFileRanges: vi.fn().mockReturnValue([]),
readSurroundingContext: vi.fn().mockResolvedValue(null),
}));

vi.mock('../../packages/shared/src/utils/logger.js', () => ({
Expand Down
3 changes: 3 additions & 0 deletions src/tests/pipeline-chunk-parallel.test.ts
Original file line number Diff line number Diff line change
Expand Up @@ -79,6 +79,9 @@ vi.mock('../../packages/core/src/l0/quality-tracker.js', () => ({

vi.mock('../../packages/shared/src/utils/diff.js', () => ({
extractMultipleSnippets: vi.fn(),
extractFileListFromDiff: vi.fn().mockReturnValue([]),
parseDiffFileRanges: vi.fn().mockReturnValue([]),
readSurroundingContext: vi.fn().mockResolvedValue(null),
}));

vi.mock('../../packages/shared/src/utils/logger.js', () => ({
Expand Down