1.2.1 rc #88
base: main
java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/exception/DiffTooLargeException.java
@@ -0,0 +1,47 @@

    package org.rostilos.codecrow.analysisengine.exception;

    /**
     * Exception thrown when a diff exceeds the configured token limit for analysis.
     * This is a soft skip - the analysis is not performed but the job is not marked as failed.
     */
    public class DiffTooLargeException extends RuntimeException {

        private final int estimatedTokens;
        private final int maxAllowedTokens;
        private final Long projectId;
        private final Long pullRequestId;

        public DiffTooLargeException(int estimatedTokens, int maxAllowedTokens, Long projectId, Long pullRequestId) {
            super(String.format(
                    "PR diff exceeds token limit: estimated %d tokens, max allowed %d tokens (project=%d, PR=%d)",
                    estimatedTokens, maxAllowedTokens, projectId, pullRequestId
            ));

Comment on lines +14 to +18

🧩 Analysis chain

🏁 Script executed:

    cat -n "java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/exception/DiffTooLargeException.java"

Repository: rostilos/CodeCrow
Length of output: 1889

🏁 Script executed:

    rg -n "new DiffTooLargeException" -C 3

Repository: rostilos/CodeCrow
Length of output: 4371

🏁 Script executed:

    # Notes on String.format behavior with null values in Java
    python3 << 'EOF'
    # String.format behavior in Java:
    # arguments are passed as Object..., so a Long is never auto-unboxed
    # %d with a null Long is safe - Formatter prints "null"
    # %s with null is also safe - it prints "null"
    print("Java String.format behavior:")
    print("- %d with null Long: works, outputs 'null' (no unboxing, no NPE)")
    print("- %s with null: works fine, outputs 'null'")
    print("\nLong is an Object that can be null.")
    print("String.format takes Object..., so a Long passed to %d is not auto-unboxed.")
    print("A null Long therefore reaches Formatter as null and is rendered as 'null'.")
    EOF

Repository: rostilos/CodeCrow

Prefer `%s` for the nullable `Long` arguments

`projectId` and `pullRequestId` are boxed `Long` values and may be null. The current format string does not throw in that case: `String.format` receives its arguments as `Object...`, so nothing is unboxed, and `java.util.Formatter` renders a null argument as `null` for `%d` as well as for `%s`. Switching the two placeholders to `%s` is therefore a readability change that makes the nullable arguments explicit, not a crash fix. The suggestion also shortens the message prefix from "PR diff" to "Diff".

Suggested formatting

    - super(String.format(
    -         "PR diff exceeds token limit: estimated %d tokens, max allowed %d tokens (project=%d, PR=%d)",
    -         estimatedTokens, maxAllowedTokens, projectId, pullRequestId
    - ));
    + super(String.format(
    +         "Diff exceeds token limit: estimated %d tokens, max allowed %d tokens (project=%s, PR=%s)",
    +         estimatedTokens, maxAllowedTokens, projectId, pullRequestId
    + ));

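Whether `%d` throws on a null `Long` is easy to check directly. The sketch below formats the same two identifiers with both placeholders; the class and method names are hypothetical and exist only for this check, they are not part of the PR.

```java
// Hypothetical helper, used only to compare %d and %s with nullable Long arguments.
class NullFormatSketch {

    // Same placeholders as the original message for the two identifiers.
    static String withDecimal(Long projectId, Long pullRequestId) {
        // Arguments are passed as Object..., so a null Long is not unboxed here.
        return String.format("project=%d, PR=%d", projectId, pullRequestId);
    }

    // Placeholders from the suggested change.
    static String withString(Long projectId, Long pullRequestId) {
        return String.format("project=%s, PR=%s", projectId, pullRequestId);
    }

    public static void main(String[] args) {
        System.out.println(withDecimal(null, null));
        System.out.println(withString(7L, null));
    }
}
```

Both variants produce the same text for null identifiers, so the choice between them is about intent rather than safety.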

            this.estimatedTokens = estimatedTokens;
            this.maxAllowedTokens = maxAllowedTokens;
            this.projectId = projectId;
            this.pullRequestId = pullRequestId;
        }

        public int getEstimatedTokens() {
            return estimatedTokens;
        }

        public int getMaxAllowedTokens() {
            return maxAllowedTokens;
        }

        public Long getProjectId() {
            return projectId;
        }

        public Long getPullRequestId() {
            return pullRequestId;
        }

        /**
         * Returns the percentage of the token limit that would be used.
         */
        public double getUtilizationPercentage() {
            return maxAllowedTokens > 0 ? (estimatedTokens * 100.0 / maxAllowedTokens) : 0;
        }
    }

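The Javadoc describes this exception as a soft skip: the analysis does not run, but the job is not marked as failed. A minimal sketch of how a caller might honor that contract follows. The `JobStatus` enum and the `runAnalysis` wrapper are assumptions made for illustration, and `DiffTooLargeSketch` is a trimmed stand-in for the exception so that the block compiles on its own; none of these names come from the PR.

```java
/** Hypothetical job outcome, not part of the PR. */
enum JobStatus { COMPLETED, SKIPPED, FAILED }

/** Trimmed stand-in for DiffTooLargeException so this sketch compiles on its own. */
class DiffTooLargeSketch extends RuntimeException {
    private final int estimatedTokens;
    private final int maxAllowedTokens;

    DiffTooLargeSketch(int estimatedTokens, int maxAllowedTokens) {
        super("Diff exceeds token limit: estimated " + estimatedTokens
                + " tokens, max allowed " + maxAllowedTokens + " tokens");
        this.estimatedTokens = estimatedTokens;
        this.maxAllowedTokens = maxAllowedTokens;
    }

    // Same formula as getUtilizationPercentage() in the diff.
    double getUtilizationPercentage() {
        return maxAllowedTokens > 0 ? (estimatedTokens * 100.0 / maxAllowedTokens) : 0;
    }
}

class SoftSkipSketch {

    /** Runs the analysis and maps an oversized diff to SKIPPED instead of FAILED. */
    static JobStatus runAnalysis(Runnable analysis) {
        try {
            analysis.run();
            return JobStatus.COMPLETED;
        } catch (DiffTooLargeSketch e) {
            // Soft skip: report the reason, but do not mark the job as failed.
            System.out.println("Skipping analysis: " + e.getMessage());
            return JobStatus.SKIPPED;
        } catch (RuntimeException e) {
            // Any other runtime failure is a real failure.
            return JobStatus.FAILED;
        }
    }
}
```

Catching the specific exception before the general `RuntimeException` is what keeps the skip distinct from a failure.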
java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/util/TokenEstimator.java
@@ -0,0 +1,83 @@

    package org.rostilos.codecrow.analysisengine.util;

    import com.knuddels.jtokkit.Encodings;
    import com.knuddels.jtokkit.api.Encoding;
    import com.knuddels.jtokkit.api.EncodingRegistry;
    import com.knuddels.jtokkit.api.EncodingType;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    /**
     * Utility class for estimating token counts in text content.
     * Uses the cl100k_base encoding (the OpenAI tokenizer used by GPT-4 and GPT-3.5-turbo).
     * Counts for models with a different tokenizer are approximations.
     */
    public class TokenEstimator {
        private static final Logger log = LoggerFactory.getLogger(TokenEstimator.class);

        private static final EncodingRegistry ENCODING_REGISTRY = Encodings.newDefaultEncodingRegistry();
        private static final Encoding ENCODING = ENCODING_REGISTRY.getEncoding(EncodingType.CL100K_BASE);

        /**
         * Estimate the number of tokens in the given text.
         *
         * @param text The text to estimate tokens for
         * @return The estimated token count, or 0 if text is null/empty
         */
        public static int estimateTokens(String text) {
            if (text == null || text.isEmpty()) {
                return 0;
            }
            try {
                return ENCODING.countTokens(text);
            } catch (Exception e) {
                log.warn("Failed to count tokens, using fallback estimation: {}", e.getMessage());
                // Fallback: rough estimate of ~4 characters per token
                return text.length() / 4;
            }
        }

        /**
         * Check if the estimated token count exceeds the given limit.
         *
         * @param text The text to check
         * @param maxTokens The maximum allowed tokens
         * @return true if the text exceeds the limit, false otherwise
         */
        public static boolean exceedsLimit(String text, int maxTokens) {
            return estimateTokens(text) > maxTokens;
        }

Comment on lines +46 to +48

🧩 Analysis chain

🏁 Script executed:

    rg -n "maxAnalysisTokenLimit" -C 2

Repository: rostilos/CodeCrow
Length of output: 28269

🏁 Script executed:

    cat -n java-ecosystem/libs/analysis-engine/src/main/java/org/rostilos/codecrow/analysisengine/util/TokenEstimator.java

Repository: rostilos/CodeCrow
Length of output: 3548

Inconsistent handling of `maxTokens <= 0`

The utilization calculation in `estimateAndCheck` guards against a non-positive `maxTokens` and reports 0, but `exceedsLimit` and the `exceedsLimit` flag in `estimateAndCheck` both compare the estimate against `maxTokens` directly. With `maxTokens` set to 0 or a negative value, any non-empty text is reported as exceeding the limit while its utilization reads 0%. If a non-positive limit is meant to disable the check, the comparison needs the same guard.

Suggested fix

     public static boolean exceedsLimit(String text, int maxTokens) {
    -    return estimateTokens(text) > maxTokens;
    +    return maxTokens > 0 && estimateTokens(text) > maxTokens;
     }
    @@
     public static TokenEstimationResult estimateAndCheck(String text, int maxTokens) {
         int estimated = estimateTokens(text);
         double utilization = maxTokens > 0 ? (estimated * 100.0 / maxTokens) : 0;
         return new TokenEstimationResult(
             estimated,
             maxTokens,
    -        estimated > maxTokens,
    +        maxTokens > 0 && estimated > maxTokens,
             utilization
         );
     }

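The effect of the guard can be seen with a small self-contained sketch. It assumes, purely for illustration, that token counts come from the ~4 characters per token fallback shown in the diff rather than from JTokkit, and the class and method names are hypothetical.

```java
// Hypothetical sketch comparing the unguarded and guarded limit checks.
class LimitGuardSketch {

    // Assumption: stand-in for TokenEstimator.estimateTokens that uses only
    // the ~4 characters per token fallback, so no tokenizer library is needed.
    static int roughTokens(String text) {
        return (text == null || text.isEmpty()) ? 0 : text.length() / 4;
    }

    // Behavior of the code under review: a limit of 0 flags any non-trivial text.
    static boolean exceedsLimitUnguarded(String text, int maxTokens) {
        return roughTokens(text) > maxTokens;
    }

    // Behavior after the suggested fix: a non-positive limit disables the check.
    static boolean exceedsLimitGuarded(String text, int maxTokens) {
        return maxTokens > 0 && roughTokens(text) > maxTokens;
    }

    public static void main(String[] args) {
        String text = "a".repeat(40); // 40 characters, roughly 10 tokens
        System.out.println(exceedsLimitUnguarded(text, 0));
        System.out.println(exceedsLimitGuarded(text, 0));
        System.out.println(exceedsLimitGuarded(text, 5));
    }
}
```

Whether a non-positive limit should mean "no limit" or "reject everything" is a policy decision for the project; the sketch only shows that the two checks disagree until one policy is applied consistently.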

        /**
         * Result of a token estimation check with details.
         */
        public record TokenEstimationResult(
                int estimatedTokens,
                int maxAllowedTokens,
                boolean exceedsLimit,
                double utilizationPercentage
        ) {
            public String toLogString() {
                return String.format("Tokens: %d / %d (%.1f%%) - %s",
                        estimatedTokens, maxAllowedTokens, utilizationPercentage,
                        exceedsLimit ? "EXCEEDS LIMIT" : "within limit");
            }
        }

        /**
         * Estimate tokens and check against limit, returning detailed result.
         *
         * @param text The text to check
         * @param maxTokens The maximum allowed tokens
         * @return Detailed estimation result
         */
        public static TokenEstimationResult estimateAndCheck(String text, int maxTokens) {
            int estimated = estimateTokens(text);
            double utilization = maxTokens > 0 ? (estimated * 100.0 / maxTokens) : 0;
            return new TokenEstimationResult(
                    estimated,
                    maxTokens,
                    estimated > maxTokens,
                    utilization
            );
        }
    }

Deduplication promised in Javadoc is not implemented.
The method flattens all issues from all PR analysis versions without deduplicating them. The Javadoc states "Issues are deduplicated by keeping only the most recent version of each issue," but the implementation contains no dedup logic. This creates duplicate entries in the stream when the same logical issue (identified by file, line, reason, severity, category) appears across multiple PR versions, inflating the data sent to the LLM and contradicting the documented behavior.
Deduplicating on a composite key built from the issue identity fields (file path, line number, reason, severity, category) would remove the repeats while preserving the newest-first order.
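One way to do this is sketched below, assuming the input list is already ordered newest first. The `IssueSketch` and `IssueKey` records and the `dedupNewestFirst` method are hypothetical names chosen for illustration; the real issue type and its accessors in the repository may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical issue shape; field names follow the identity fields named in the review. */
record IssueSketch(String filePath, int line, String reason,
                   String severity, String category, int prVersion) {}

/** Composite identity key: everything except the PR version. */
record IssueKey(String filePath, int line, String reason,
                String severity, String category) {
    static IssueKey of(IssueSketch issue) {
        return new IssueKey(issue.filePath(), issue.line(), issue.reason(),
                issue.severity(), issue.category());
    }
}

class IssueDedupSketch {

    /**
     * Keeps the first occurrence of each logical issue. Because the input is
     * assumed to be newest first, the first occurrence is the most recent one.
     */
    static List<IssueSketch> dedupNewestFirst(List<IssueSketch> newestFirst) {
        // LinkedHashMap preserves insertion order, so the output stays newest first.
        Map<IssueKey, IssueSketch> seen = new LinkedHashMap<>();
        for (IssueSketch issue : newestFirst) {
            seen.putIfAbsent(IssueKey.of(issue), issue);
        }
        return new ArrayList<>(seen.values());
    }
}
```

Records supply `equals` and `hashCode` over all components, which is what makes `IssueKey` usable as a map key without extra code.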