Changes from all commits
44 commits
6c80884
fix: Allow async dispatches and error page rendering without authenti…
rostilos Jan 23, 2026
b5cab46
fix: Add analysis lock checks to prevent duplicate PR analysis in web…
rostilos Jan 23, 2026
d3a4c5f
Merge pull request #87 from rostilos/bugfix/analysis-issues
rostilos Jan 23, 2026
3bc5967
fix: Implement atomic upsert for command rate limiting to prevent rac…
rostilos Jan 23, 2026
8f3a3bb
Merge pull request #89 from rostilos/bugfix/analysis-issues
rostilos Jan 23, 2026
28a4007
fix: Improve alias management by ensuring direct collections are dele…
rostilos Jan 23, 2026
efc42d2
fix: Enhance locking mechanism in PR webhook handlers to prevent race…
rostilos Jan 23, 2026
c673dce
Merge pull request #90 from rostilos/bugfix/analysis-issues
rostilos Jan 23, 2026
13a63c8
fix: Enhance alias management by implementing backup and migration st…
rostilos Jan 23, 2026
adb68cb
Merge pull request #91 from rostilos/bugfix/analysis-issues
rostilos Jan 23, 2026
95d74e1
fix: Enhance AI analysis by incorporating full PR issue history and r…
rostilos Jan 23, 2026
85c47e3
Merge pull request #92 from rostilos/bugfix/analysis-issues
rostilos Jan 23, 2026
e509ffc
fix: Update issue reconciliation logic to handle previous issues in b…
rostilos Jan 23, 2026
ead4b08
Merge pull request #93 from rostilos/bugfix/analysis-issues
rostilos Jan 23, 2026
a2889c7
fix: Improve handling of issue resolution status and logging for bett…
rostilos Jan 23, 2026
7a30044
Merge pull request #94 from rostilos/bugfix/analysis-issues
rostilos Jan 23, 2026
6fb5693
fix: Implement method to retrieve branch differences from GitLab API
rostilos Jan 23, 2026
c6337aa
Merge pull request #95 from rostilos/bugfix/analysis-issues
rostilos Jan 23, 2026
56761fb
fix: Enhance logging and implement deterministic context retrieval in…
rostilos Jan 23, 2026
8fc46f3
Refactor AI connection handling and improve job deletion logic
rostilos Jan 26, 2026
7c78057
feat: Add pre-acquired lock key to prevent double-locking in PR analy…
rostilos Jan 26, 2026
b7be7fe
Merge pull request #97 from rostilos/feature/rag-smart-querying
rostilos Jan 26, 2026
6d80d71
feat: Implement handling for AnalysisLockedException and DiffTooLarge…
rostilos Jan 26, 2026
e2c1474
feat: Re-fetch job entities in transaction methods to handle detached…
rostilos Jan 27, 2026
342c4fa
feat: Update JobService and WebhookAsyncProcessor to manage job entit…
rostilos Jan 27, 2026
409c42d
feat: Enable transaction management in processWebhookAsync to support…
rostilos Jan 27, 2026
11c983c
feat: Re-fetch job entities in JobService methods to ensure consisten…
rostilos Jan 27, 2026
c75eaba
feat: Add @Transactional annotation to processWebhookAsync for lazy l…
rostilos Jan 27, 2026
8afc0ad
feat: Implement self-injection in WebhookAsyncProcessor for proper tr…
rostilos Jan 27, 2026
402486b
feat: Enhance logging and error handling in processWebhookAsync for i…
rostilos Jan 27, 2026
fdcdca0
feat: Implement webhook deduplication service to prevent duplicate co…
rostilos Jan 27, 2026
e321361
feat: Enhance job deletion process with logging and persistence conte…
rostilos Jan 27, 2026
ebd0fad
feat: Improve job deletion process with enhanced logging and error ha…
rostilos Jan 27, 2026
092b361
feat: Add method to delete job by ID in JobRepository and update JobS…
rostilos Jan 27, 2026
61d2620
feat: Simplify job handling by marking ignored jobs as SKIPPED instea…
rostilos Jan 27, 2026
704a7a2
feat: Enhance AI connection logging and refactor placeholder manageme…
rostilos Jan 28, 2026
2e42ebc
feat: Add logging for LLM creation and enhance diff snippet extractio…
rostilos Jan 28, 2026
d036fa9
feat: Implement AST-based code splitter and scoring configuration
rostilos Jan 28, 2026
642bda0
feat: Enhance lock management in PullRequestAnalysisProcessor and imp…
rostilos Jan 28, 2026
5add89c
feat: Enhance AST processing and metadata extraction in RAG pipeline …
rostilos Jan 28, 2026
1fc484c
feat: Improve deduplication strategy in RAGQueryService to prioritize…
rostilos Jan 28, 2026
c03591e
feat: Enhance comments for clarity on target branch indexing and incr…
rostilos Jan 28, 2026
0bb9ca8
feat: Update default configuration values for chunk size and text chu…
rostilos Jan 28, 2026
585ab6d
Merge pull request #106 from rostilos/feature/pr-analysis-rate-limiting
rostilos Jan 29, 2026
6 changes: 6 additions & 0 deletions java-ecosystem/libs/analysis-engine/pom.xml
@@ -68,6 +68,12 @@
<artifactId>okhttp</artifactId>
</dependency>

<!-- JTokkit for token counting -->
<dependency>
<groupId>com.knuddels</groupId>
<artifactId>jtokkit</artifactId>
</dependency>

<!-- Test Dependencies -->
<dependency>
<groupId>org.junit.jupiter</groupId>
@@ -18,6 +18,7 @@
requires com.fasterxml.jackson.annotation;
requires jakarta.persistence;
requires kotlin.stdlib;
requires jtokkit;

exports org.rostilos.codecrow.analysisengine.aiclient;
exports org.rostilos.codecrow.analysisengine.config;
@@ -1,3 +1,3 @@
package org.rostilos.codecrow.analysisengine.dto.request.ai;

import com.fasterxml.jackson.annotation.JsonProperty;
@@ -278,6 +278,29 @@
return self();
}

/**
* Set previous issues from ALL PR analysis versions.
* This provides the LLM with complete issue history including resolved issues,
* helping it understand what was already found and fixed.
*
* Issues are deduplicated by keeping only the most recent version of each issue.
* Resolved issues are included so the LLM knows what was already addressed.
*
* @param allPrAnalyses List of all analyses for this PR, ordered by version DESC (newest first)
*/
public T withAllPrAnalysesData(List<CodeAnalysis> allPrAnalyses) {
if (allPrAnalyses == null || allPrAnalyses.isEmpty()) {
return self();
}

this.previousCodeAnalysisIssues = allPrAnalyses.stream()
.flatMap(analysis -> analysis.getIssues().stream())
.map(AiRequestPreviousIssueDTO::fromEntity)
.toList();

return self();
}
Comment on lines +281 to +302

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find CodeAnalysisIssue class definition
find . -name "*.java" -type f | xargs grep -l "class CodeAnalysisIssue" | head -5

Repository: rostilos/CodeCrow

Length of output: 284


🏁 Script executed:

#!/bin/bash
# Search for CodeAnalysis class to understand the structure
find . -name "*.java" -type f | xargs grep -l "class CodeAnalysis\b" | head -5

Repository: rostilos/CodeCrow

Length of output: 165


🏁 Script executed:

#!/bin/bash
# Look for any deduplication logic in the codebase
rg "LinkedHashMap|toMap.*Function.identity" --type java -C2 | head -30

Repository: rostilos/CodeCrow

Length of output: 4362


🏁 Script executed:

#!/bin/bash
# Find usages of withAllPrAnalysesData to understand how it's called
rg "withAllPrAnalysesData" --type java -C3

Repository: rostilos/CodeCrow

Length of output: 5696


🏁 Script executed:

#!/bin/bash
# Read CodeAnalysisIssue class to find stable identifier fields
cat java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/codeanalysis/CodeAnalysisIssue.java

Repository: rostilos/CodeCrow

Length of output: 5060


🏁 Script executed:

#!/bin/bash
# Read CodeAnalysis class to understand the structure
cat java-ecosystem/libs/core/src/main/java/org/rostilos/codecrow/core/model/codeanalysis/CodeAnalysis.java

Repository: rostilos/CodeCrow

Length of output: 5970


🏁 Script executed:

#!/bin/bash
# Find AiRequestPreviousIssueDTO class
find . -name "*.java" -type f | xargs grep -l "class AiRequestPreviousIssueDTO"

Repository: rostilos/CodeCrow

Length of output: 199


🏁 Script executed:

#!/bin/bash
# Check how the Javadoc says "deduplicated by keeping only the most recent version"
# Let's verify the exact context in the file
sed -n '281,302p' java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/dto/request/ai/AiAnalysisRequestImpl.java

Repository: rostilos/CodeCrow

Length of output: 1080


🏁 Script executed:

#!/bin/bash
# Find and read AiRequestPreviousIssueDTO class
find . -name "AiRequestPreviousIssueDTO.java" -type f -exec cat {} \;

Repository: rostilos/CodeCrow

Length of output: 2395


🏁 Script executed:

#!/bin/bash
# Check if there are any other dedup patterns or discussions in related code
rg "deduplicate|dedup|unique.*issue|fingerprint" --type java -C2 | head -50

Repository: rostilos/CodeCrow

Length of output: 971


🏁 Script executed:

#!/bin/bash
# Check test file to understand expected behavior
cat java-ecosystem/libs/analysis-engine/src/test/java/org/rostilos/codecrow/analysisengine/dto/request/ai/AiRequestPreviousIssueDTOTest.java

Repository: rostilos/CodeCrow

Length of output: 12371


Deduplication promised in Javadoc is not implemented.

The method flattens all issues from all PR analysis versions without deduplicating them. The Javadoc states "Issues are deduplicated by keeping only the most recent version of each issue," but the implementation contains no dedup logic. This creates duplicate entries in the stream when the same logical issue (identified by file, line, reason, severity, category) appears across multiple PR versions, inflating the data sent to the LLM and contradicting the documented behavior.

Use a composite key based on issue identity fields (file path + line number + reason + severity + category) to deduplicate while preserving the newest-first order:

Suggested implementation pattern
-            this.previousCodeAnalysisIssues = allPrAnalyses.stream()
-                    .flatMap(analysis -> analysis.getIssues().stream())
-                    .map(AiRequestPreviousIssueDTO::fromEntity)
-                    .toList();
+            this.previousCodeAnalysisIssues = allPrAnalyses.stream()
+                    .flatMap(analysis -> analysis.getIssues().stream())
+                    .map(AiRequestPreviousIssueDTO::fromEntity)
+                    // Keep first occurrence (newest) based on issue identity
+                    .collect(java.util.stream.Collectors.toMap(
+                            dto -> dto.file() + "|" + dto.line() + "|" + dto.reason() + "|" + dto.severity() + "|" + dto.category(),
+                            java.util.function.Function.identity(),
+                            (first, ignored) -> first,  // Keep first (newest) when key collision
+                            java.util.LinkedHashMap::new  // Preserve order
+                    ))
+                    .values()
+                    .stream()
+                    .toList();
🤖 Prompt for AI Agents
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/dto/request/ai/AiAnalysisRequestImpl.java`
around lines 281 - 302, The Javadoc for withAllPrAnalysesData promises
deduplication but the method currently just flattens all issues; update
withAllPrAnalysesData in AiAnalysisRequestImpl to deduplicate by a composite
identity (file path + line number + reason + severity + category) while
preserving newest-first order: when streaming allPrAnalyses -> flatMap ->
map(AiRequestPreviousIssueDTO::fromEntity), collect into a LinkedHashMap keyed
by the composite key and keep the first occurrence (newest) for each key, then
set previousCodeAnalysisIssues to the map.values() list; ensure you reference
the existing AiRequestPreviousIssueDTO.fromEntity conversion and the
previousCodeAnalysisIssues field and return self() as before.
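The LinkedHashMap-based dedup pattern suggested above can be sketched in isolation. This is a hypothetical, self-contained version: the `IssueDto` record here is a simplified stand-in for `AiRequestPreviousIssueDTO`, not the project's actual type.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

public class DedupDemo {
    // Simplified stand-in for AiRequestPreviousIssueDTO
    record IssueDto(String file, int line, String reason, String severity, String category) {}

    // Keep the first occurrence (newest, given newest-first input) per composite identity key
    static List<IssueDto> dedupe(List<IssueDto> newestFirst) {
        return newestFirst.stream()
                .collect(Collectors.toMap(
                        dto -> dto.file() + "|" + dto.line() + "|" + dto.reason()
                                + "|" + dto.severity() + "|" + dto.category(),
                        Function.identity(),
                        (first, ignored) -> first,   // on key collision, keep the newest
                        LinkedHashMap::new))         // preserve encounter order
                .values().stream().toList();
    }

    public static void main(String[] args) {
        List<IssueDto> issues = List.of(
                new IssueDto("A.java", 10, "npe", "HIGH", "BUG"),     // newest version
                new IssueDto("B.java", 5, "style", "LOW", "QUALITY"),
                new IssueDto("A.java", 10, "npe", "HIGH", "BUG"));    // older duplicate
        List<IssueDto> result = dedupe(issues);
        if (result.size() != 2) throw new AssertionError(result);
        System.out.println(result.size()); // prints 2
    }
}
```

Because `Collectors.toMap` applies the merge function on key collisions and `LinkedHashMap` preserves insertion order, the newest copy of each issue survives and the newest-first ordering is retained.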


public T withMaxAllowedTokens(int maxAllowedTokens) {
this.maxAllowedTokens = maxAllowedTokens;
return self();
@@ -15,12 +15,23 @@ public record AiRequestPreviousIssueDTO(
String branch,
String pullRequestId,
String status, // open|resolved|ignored
String category
String category,
// Resolution tracking fields
Integer prVersion, // Which PR iteration this issue was found in
String resolvedDescription, // Description of how the issue was resolved
String resolvedByCommit, // Commit hash that resolved the issue
Long resolvedInPrVersion // PR version where this was resolved (null if still open)
) {
public static AiRequestPreviousIssueDTO fromEntity(CodeAnalysisIssue issue) {
String categoryStr = issue.getIssueCategory() != null
? issue.getIssueCategory().name()
: IssueCategory.CODE_QUALITY.name();

Integer prVersion = null;
if (issue.getAnalysis() != null) {
prVersion = issue.getAnalysis().getPrVersion();
}

return new AiRequestPreviousIssueDTO(
String.valueOf(issue.getId()),
categoryStr,
@@ -33,7 +33,11 @@ public static AiRequestPreviousIssueDTO fromEntity(CodeAnalysisIssue issue) {
issue.getAnalysis() == null ? null : issue.getAnalysis().getBranchName(),
issue.getAnalysis() == null || issue.getAnalysis().getPrNumber() == null ? null : String.valueOf(issue.getAnalysis().getPrNumber()),
issue.isResolved() ? "resolved" : "open",
categoryStr
categoryStr,
prVersion,
issue.getResolvedDescription(),
issue.getResolvedCommitHash(),
issue.getResolvedAnalysisId()
);
}
}
@@ -35,6 +35,13 @@ public class PrProcessRequest implements AnalysisProcessRequest {
public String prAuthorId;

public String prAuthorUsername;

/**
* Optional pre-acquired lock key. If set, the processor will skip lock acquisition
* and use this lock key directly. This prevents double-locking when the webhook handler
* has already acquired the lock before calling the processor.
*/
public String preAcquiredLockKey;


public Long getProjectId() {
@@ -64,4 +71,6 @@
public String getPrAuthorId() { return prAuthorId; }

public String getPrAuthorUsername() { return prAuthorUsername; }

public String getPreAcquiredLockKey() { return preAcquiredLockKey; }
}
@@ -0,0 +1,47 @@
package org.rostilos.codecrow.analysisengine.exception;

/**
* Exception thrown when a diff exceeds the configured token limit for analysis.
* This is a soft skip - the analysis is not performed but the job is not marked as failed.
*/
public class DiffTooLargeException extends RuntimeException {

private final int estimatedTokens;
private final int maxAllowedTokens;
private final Long projectId;
private final Long pullRequestId;

public DiffTooLargeException(int estimatedTokens, int maxAllowedTokens, Long projectId, Long pullRequestId) {
super(String.format(
"PR diff exceeds token limit: estimated %d tokens, max allowed %d tokens (project=%d, PR=%d)",
estimatedTokens, maxAllowedTokens, projectId, pullRequestId
));
Comment on lines +14 to +18

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n "java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/exception/DiffTooLargeException.java"

Repository: rostilos/CodeCrow

Length of output: 1889


🏁 Script executed:

rg -n "new DiffTooLargeException" -C 3

Repository: rostilos/CodeCrow

Length of output: 4371


🏁 Script executed:

# Test String.format behavior with null values in Java
python3 << 'EOF'
# String.format behavior in Java:
# %d expects int, so if you pass null to %d, it will cause NPE
# %s with null is safe - it converts null to "null" string
print("Java String.format behavior:")
print("- %d with null: NPE (because %d expects primitive int)")
print("- %s with null: works fine, outputs 'null'")
print("\nHowever, Long is an Object that can be null.")
print("If you pass Long value to %d, Java auto-unboxes it.")
print("If the Long is null, auto-unboxing throws NPE before format is called.")
EOF

Repository: rostilos/CodeCrow

Length of output: 356


Replace %d with %s for nullable Long IDs to avoid NPE during auto-unboxing.

When projectId or pullRequestId are null, String.format("%d", null) throws NullPointerException during auto-unboxing, masking the intended "diff exceeds token limit" error. Using %s safely handles null values.

Safer formatting
-        super(String.format(
-            "PR diff exceeds token limit: estimated %d tokens, max allowed %d tokens (project=%d, PR=%d)",
-            estimatedTokens, maxAllowedTokens, projectId, pullRequestId
-        ));
+        super(String.format(
+            "Diff exceeds token limit: estimated %d tokens, max allowed %d tokens (project=%s, PR=%s)",
+            estimatedTokens, maxAllowedTokens, projectId, pullRequestId
+        ));
🤖 Prompt for AI Agents
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/exception/DiffTooLargeException.java`
around lines 14 - 18, The DiffTooLargeException constructor currently uses
String.format with %d for projectId and pullRequestId which will NPE on null
Longs due to auto-unboxing; update the format string in the
DiffTooLargeException(int estimatedTokens, int maxAllowedTokens, Long projectId,
Long pullRequestId) constructor to use %s for the nullable Long parameters and
pass them unchanged so nulls are rendered safely (leave the integer token
placeholders as %d).
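As a sanity check on the suggested fix, `%s` formats a null reference as the literal text `null` instead of failing. This standalone snippet (class and method names are illustrative, not the project's) reproduces the message format with null IDs:

```java
public class FormatNullDemo {
    static String message(int estimated, int max, Long projectId, Long prId) {
        // %s renders null references as the text "null", so null IDs are safe here
        return String.format(
                "Diff exceeds token limit: estimated %d tokens, max allowed %d tokens (project=%s, PR=%s)",
                estimated, max, projectId, prId);
    }

    public static void main(String[] args) {
        System.out.println(message(5000, 4000, null, null));
        // prints: Diff exceeds token limit: estimated 5000 tokens, max allowed 4000 tokens (project=null, PR=null)
    }
}
```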

this.estimatedTokens = estimatedTokens;
this.maxAllowedTokens = maxAllowedTokens;
this.projectId = projectId;
this.pullRequestId = pullRequestId;
}

public int getEstimatedTokens() {
return estimatedTokens;
}

public int getMaxAllowedTokens() {
return maxAllowedTokens;
}

public Long getProjectId() {
return projectId;
}

public Long getPullRequestId() {
return pullRequestId;
}

/**
* Returns the percentage of the token limit that would be used.
*/
public double getUtilizationPercentage() {
return maxAllowedTokens > 0 ? (estimatedTokens * 100.0 / maxAllowedTokens) : 0;
}
}
@@ -29,6 +29,7 @@
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

@@ -90,34 +91,45 @@
// Publish analysis started event
publishAnalysisStartedEvent(project, request, correlationId);

Optional<String> lockKey = analysisLockService.acquireLockWithWait(
project,
request.getSourceBranchName(),
AnalysisLockType.PR_ANALYSIS,
request.getCommitHash(),
request.getPullRequestId(),
consumer::accept
);

if (lockKey.isEmpty()) {
String message = String.format(
"Failed to acquire lock after %d minutes for project=%s, PR=%d, branch=%s. Another analysis is still in progress.",
analysisLockService.getLockWaitTimeoutMinutes(),
project.getId(),
request.getPullRequestId(),
request.getSourceBranchName()
);
log.warn(message);

// Publish failed event due to lock timeout
publishAnalysisCompletedEvent(project, request, correlationId, startTime,
AnalysisCompletedEvent.CompletionStatus.FAILED, 0, 0, "Lock acquisition timeout");

throw new AnalysisLockedException(
AnalysisLockType.PR_ANALYSIS.name(),
// Check if a lock was already acquired by the caller (e.g., webhook handler)
// to prevent double-locking which causes unnecessary 2-minute waits
String lockKey;
boolean isPreAcquired = false;
if (request.getPreAcquiredLockKey() != null && !request.getPreAcquiredLockKey().isBlank()) {
lockKey = request.getPreAcquiredLockKey();
isPreAcquired = true;
log.info("Using pre-acquired lock: {} for project={}, PR={}", lockKey, project.getId(), request.getPullRequestId());
} else {
Optional<String> acquiredLock = analysisLockService.acquireLockWithWait(
project,
request.getSourceBranchName(),
project.getId()
AnalysisLockType.PR_ANALYSIS,
request.getCommitHash(),
request.getPullRequestId(),
consumer::accept
);

if (acquiredLock.isEmpty()) {
String message = String.format(
"Failed to acquire lock after %d minutes for project=%s, PR=%d, branch=%s. Another analysis is still in progress.",
analysisLockService.getLockWaitTimeoutMinutes(),
project.getId(),
request.getPullRequestId(),
request.getSourceBranchName()
);
log.warn(message);

// Publish failed event due to lock timeout
publishAnalysisCompletedEvent(project, request, correlationId, startTime,
AnalysisCompletedEvent.CompletionStatus.FAILED, 0, 0, "Lock acquisition timeout");

throw new AnalysisLockedException(
AnalysisLockType.PR_ANALYSIS.name(),
request.getSourceBranchName(),
project.getId()
);
}
lockKey = acquiredLock.get();
}

try {
@@ -139,16 +151,24 @@
return Map.of("status", "cached", "cached", true);
}

Optional<CodeAnalysis> previousAnalysis = codeAnalysisService.getPreviousVersionCodeAnalysis(
// Get all previous analyses for this PR to provide full issue history to AI
List<CodeAnalysis> allPrAnalyses = codeAnalysisService.getAllPrAnalyses(
project.getId(),
request.getPullRequestId()
);

// Get the most recent analysis for incremental diff calculation
Optional<CodeAnalysis> previousAnalysis = allPrAnalyses.isEmpty()
? Optional.empty()
: Optional.of(allPrAnalyses.get(0));

// Ensure branch index exists for target branch if configured
// Ensure branch index exists for TARGET branch (e.g., "1.2.1-rc")
// This is where the PR will merge TO - we want RAG context from this branch
ensureRagIndexForTargetBranch(project, request.getTargetBranchName(), consumer);

VcsAiClientService aiClientService = vcsServiceFactory.getAiClientService(provider);
AiAnalysisRequest aiRequest = aiClientService.buildAiAnalysisRequest(project, request, previousAnalysis);
AiAnalysisRequest aiRequest = aiClientService.buildAiAnalysisRequest(
project, request, previousAnalysis, allPrAnalyses);

Map<String, Object> aiResponse = aiAnalysisClient.performAnalysis(aiRequest, event -> {
try {
@@ -208,7 +228,9 @@

return Map.of("status", "error", "message", e.getMessage());
} finally {
analysisLockService.releaseLock(lockKey.get());
if (!isPreAcquired) {

Check warning on line 231 in java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/processor/analysis/PullRequestAnalysisProcessor.java

CodeCrow-Local / CodeCrow Analysis: MEDIUM severity issue

The logic only releases the lock if '!isPreAcquired'. While this prevents the processor from releasing a lock it didn't create, it assumes the caller will always handle cleanup. If the caller (e.g., a webhook handler) expects the processor to take ownership of the lifecycle once passed, the lock will leak. Pre-acquired locks used in a 'try-finally' block should have a clearly defined owner to prevent orphaned locks in the database.

Suggested fix: Ensure there is a clear contract on lock ownership. If the processor should not release the lock, add a comment explaining where the release happens, or ensure the 'isPreAcquired' flag is correctly managed by the caller's transaction scope. Given the existing code, ensure the caller has a finally block to release the lock, since this processor now skips it.
analysisLockService.releaseLock(lockKey);
}
}
}
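The ownership contract flagged in the warning above can be sketched as: whoever acquires the lock releases it in its own finally block, and the processor never releases a pre-acquired lock. This is a hypothetical, self-contained model; the in-memory `LockService` and method names here are illustrative stand-ins, not the project's actual classes.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

public class LockOwnershipDemo {
    // Minimal in-memory lock service for the demo
    static class LockService {
        final Set<String> held = new HashSet<>();
        Optional<String> acquire(String key) { return held.add(key) ? Optional.of(key) : Optional.empty(); }
        void release(String lockKey) { held.remove(lockKey); }
    }

    // Processor side: releases only locks it acquired itself
    static void process(LockService locks, String key, String preAcquiredLockKey) {
        boolean ownsLock = (preAcquiredLockKey == null);
        String lockKey = ownsLock
                ? locks.acquire(key).orElseThrow(() -> new IllegalStateException("already locked"))
                : preAcquiredLockKey;
        try {
            // ... analysis work would go here ...
        } finally {
            if (ownsLock) locks.release(lockKey); // only the acquirer releases
        }
    }

    // Caller side: acquired here, so released here, even if process() throws
    static void handleWebhook(LockService locks, String key) {
        String lockKey = locks.acquire(key).orElseThrow();
        try {
            process(locks, key, lockKey);
        } finally {
            locks.release(lockKey);
        }
    }

    public static void main(String[] args) {
        LockService locks = new LockService();
        handleWebhook(locks, "project:1:pr:42");
        if (!locks.held.isEmpty()) throw new AssertionError("lock leaked");
        System.out.println("no orphaned locks"); // prints: no orphaned locks
    }
}
```

With this split, each lock has exactly one owner with a finally block, so neither a processor exception nor a handler exception can orphan the lock.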

@@ -7,6 +7,7 @@
import org.rostilos.codecrow.analysisengine.dto.request.processor.AnalysisProcessRequest;

import java.security.GeneralSecurityException;
import java.util.List;
import java.util.Optional;

/**
@@ -31,4 +32,23 @@ AiAnalysisRequest buildAiAnalysisRequest(
AnalysisProcessRequest request,
Optional<CodeAnalysis> previousAnalysis
) throws GeneralSecurityException;

/**
* Builds an AI analysis request with full PR issue history.
*
* @param project The project being analyzed
* @param request The analysis process request
* @param previousAnalysis Optional previous analysis for incremental analysis (used for delta diff calculation)
* @param allPrAnalyses All analyses for this PR, ordered by version DESC (for issue history)
* @return The AI analysis request ready to be sent to the AI client
*/
default AiAnalysisRequest buildAiAnalysisRequest(
Project project,
AnalysisProcessRequest request,
Optional<CodeAnalysis> previousAnalysis,
List<CodeAnalysis> allPrAnalyses
) throws GeneralSecurityException {
// Default implementation falls back to the previous method for backward compatibility
return buildAiAnalysisRequest(project, request, previousAnalysis);
}
}
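The default-method fallback above is the standard Java idiom for evolving an interface without breaking existing implementors. A pared-down, hypothetical illustration (the interface and class names here are stand-ins, not the project's `VcsAiClientService`):

```java
import java.util.List;
import java.util.Optional;

public class DefaultMethodDemo {
    interface ClientService {
        String build(Optional<String> previous);

        // New overload added as a default method: existing implementors
        // keep compiling, and calls fall back to the original method.
        default String build(Optional<String> previous, List<String> history) {
            return build(previous);
        }
    }

    // An "old" implementor that predates the overload still satisfies the interface
    static class LegacyClient implements ClientService {
        public String build(Optional<String> previous) {
            return "built:" + previous.orElse("none");
        }
    }

    public static void main(String[] args) {
        ClientService c = new LegacyClient();
        System.out.println(c.build(Optional.empty(), List.of("v1", "v2"))); // prints: built:none
    }
}
```

Implementations that do care about the history (here, the GitLab/GitHub client services) can override the two-argument overload while legacy providers remain untouched.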
@@ -0,0 +1,83 @@
package org.rostilos.codecrow.analysisengine.util;

import com.knuddels.jtokkit.Encodings;
import com.knuddels.jtokkit.api.Encoding;
import com.knuddels.jtokkit.api.EncodingRegistry;
import com.knuddels.jtokkit.api.EncodingType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
* Utility class for estimating token counts in text content.
* Uses the cl100k_base encoding (used by GPT-4; a reasonable approximation for other modern LLMs).
*/
public class TokenEstimator {
private static final Logger log = LoggerFactory.getLogger(TokenEstimator.class);

private static final EncodingRegistry ENCODING_REGISTRY = Encodings.newDefaultEncodingRegistry();
private static final Encoding ENCODING = ENCODING_REGISTRY.getEncoding(EncodingType.CL100K_BASE);

/**
* Estimate the number of tokens in the given text.
*
* @param text The text to estimate tokens for
* @return The estimated token count, or 0 if text is null/empty
*/
public static int estimateTokens(String text) {
if (text == null || text.isEmpty()) {
return 0;
}
try {
return ENCODING.countTokens(text);
} catch (Exception e) {
log.warn("Failed to count tokens, using fallback estimation: {}", e.getMessage());
// Fallback: rough estimate of ~4 characters per token
return text.length() / 4;

Check notice on line 35 in java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/util/TokenEstimator.java

CodeCrow-Local / CodeCrow Analysis: INFO severity issue

The fallback estimation uses a fixed ratio of 4 characters per token. While standard for English, this can be significantly inaccurate for other languages or code-heavy content.

Suggested fix: Consider using a slightly more conservative fallback or documenting the language assumption, though for a fallback this is generally acceptable.
}
}

/**
* Check if the estimated token count exceeds the given limit.
*
* @param text The text to check
* @param maxTokens The maximum allowed tokens
* @return true if the text exceeds the limit, false otherwise
*/
public static boolean exceedsLimit(String text, int maxTokens) {
return estimateTokens(text) > maxTokens;
}
Comment on lines +46 to +48

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

rg -n "maxAnalysisTokenLimit" -C 2

Repository: rostilos/CodeCrow

Length of output: 28269


🏁 Script executed:

cat -n java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/util/TokenEstimator.java

Repository: rostilos/CodeCrow

Length of output: 3548


Inconsistent handling of maxTokens <= 0 between exceedsLimit and estimateAndCheck.

The exceedsLimit method lacks the defensive check present in estimateAndCheck. While estimateAndCheck guards against division by zero (line 75), it doesn't apply the same guard to the exceedsLimit boolean logic (line 79). If maxTokens <= 0, both methods will incorrectly mark any non-empty text as exceeding the limit. Add a guard to exceedsLimit for consistency, or apply it to the exceedsLimit field in estimateAndCheck:

Suggested fix
 public static boolean exceedsLimit(String text, int maxTokens) {
-    return estimateTokens(text) > maxTokens;
+    return maxTokens > 0 && estimateTokens(text) > maxTokens;
 }
@@
 public static TokenEstimationResult estimateAndCheck(String text, int maxTokens) {
     int estimated = estimateTokens(text);
     double utilization = maxTokens > 0 ? (estimated * 100.0 / maxTokens) : 0;
     return new TokenEstimationResult(
         estimated,
         maxTokens,
-        estimated > maxTokens,
+        maxTokens > 0 && estimated > maxTokens,
         utilization
     );
 }
🤖 Prompt for AI Agents
In
`@java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/util/TokenEstimator.java`
around lines 46 - 48, The exceedsLimit method should apply the same defensive
check as estimateAndCheck: ensure maxTokens > 0 before comparing; update
exceedsLimit(String text, int maxTokens) to return false when maxTokens <= 0 (or
use the same boolean expression pattern used in estimateAndCheck) and then
compare estimateTokens(text) > maxTokens, referencing the exceedsLimit and
estimateAndCheck methods and the estimateTokens call to keep behavior consistent
and avoid the incorrect true result for non-empty text when maxTokens <= 0.
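Both findings against TokenEstimator (the char/4 fallback and the missing maxTokens guard) can be modeled in one standalone sketch. This is hypothetical: the real class delegates to JTokkit's cl100k_base encoding, while this version only implements the fallback path plus the suggested guard.

```java
public class TokenGuardDemo {
    // Fallback heuristic from the reviewed code: ~4 characters per token (English-centric)
    static int estimateTokens(String text) {
        if (text == null || text.isEmpty()) return 0;
        return text.length() / 4;
    }

    // Suggested guard: a non-positive limit means "no limit", not "everything exceeds"
    static boolean exceedsLimit(String text, int maxTokens) {
        return maxTokens > 0 && estimateTokens(text) > maxTokens;
    }

    public static void main(String[] args) {
        String text = "x".repeat(400);               // ~100 estimated tokens
        System.out.println(exceedsLimit(text, 50));  // prints: true
        System.out.println(exceedsLimit(text, 200)); // prints: false
        System.out.println(exceedsLimit(text, 0));   // prints: false (guarded)
    }
}
```

Without the `maxTokens > 0` guard, the last call would return true for any non-empty text, which is exactly the inconsistency the review flags.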


/**
* Result of a token estimation check with details.
*/
public record TokenEstimationResult(
int estimatedTokens,
int maxAllowedTokens,
boolean exceedsLimit,
double utilizationPercentage
) {
public String toLogString() {
return String.format("Tokens: %d / %d (%.1f%%) - %s",
estimatedTokens, maxAllowedTokens, utilizationPercentage,
exceedsLimit ? "EXCEEDS LIMIT" : "within limit");
}
}

/**
* Estimate tokens and check against limit, returning detailed result.
*
* @param text The text to check
* @param maxTokens The maximum allowed tokens
* @return Detailed estimation result
*/
public static TokenEstimationResult estimateAndCheck(String text, int maxTokens) {
int estimated = estimateTokens(text);
double utilization = maxTokens > 0 ? (estimated * 100.0 / maxTokens) : 0;
return new TokenEstimationResult(
estimated,
maxTokens,
estimated > maxTokens,
utilization
);
}
}