
Release model context fix and branching feature#326

Merged
locnguyen1986 merged 6 commits into release from main
Dec 19, 2025

Conversation

@locnguyen1986
Collaborator

Release model context fix and branching feature

Copilot AI review requested due to automatic review settings December 19, 2025 02:25
@locnguyen1986 locnguyen1986 merged commit d9deed8 into release Dec 19, 2025
11 of 12 checks passed
@github-project-automation github-project-automation bot moved this to QA in Jan Dec 19, 2025
Contributor

Copilot AI left a comment


Pull request overview

This PR introduces a comprehensive conversation branching feature and improves the model context/token management system. The branching feature allows users to fork conversations at any point, edit messages, regenerate responses, and manage multiple conversation paths. The context management improvements include better token budget calculation, content truncation for oversized inputs, and CJK character support.

Key Changes:

  • Conversation Branching: New branch CRUD operations (create, list, activate, delete) with fork-and-swap semantics for edit/regenerate/delete operations
  • Token Budget Management: Refactored token estimation with explicit budget validation, support for tool schemas, images, and CJK text
  • Message Actions: Edit, regenerate, and delete operations that create branches and promote them to MAIN while preserving history
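
The fork-and-swap semantics described above can be illustrated with a minimal in-memory sketch. The types, branch names, and the `forkAndSwap` helper below are illustrative stand-ins, not the PR's actual code: editing a message forks the MAIN branch up to the edit point, demotes the old MAIN to a backup, and promotes the fork.

```go
package main

import "fmt"

// Item is a simplified conversation item tagged with a branch name.
type Item struct {
	Branch string
	Text   string
}

// Conversation holds items and the currently active branch.
type Conversation struct {
	Items        []Item
	ActiveBranch string
}

// forkAndSwap copies MAIN items before editIndex into a fork, appends
// the edited message, then promotes the fork to MAIN while keeping the
// old MAIN under a backup name so history is preserved.
func forkAndSwap(c *Conversation, editIndex int, newText string) {
	var forked []Item
	i := 0
	for _, it := range c.Items {
		if it.Branch != "MAIN" {
			continue
		}
		if i < editIndex {
			forked = append(forked, Item{Branch: "fork", Text: it.Text})
		}
		i++
	}
	forked = append(forked, Item{Branch: "fork", Text: newText})

	// Swap: demote old MAIN, promote the fork.
	for idx := range c.Items {
		if c.Items[idx].Branch == "MAIN" {
			c.Items[idx].Branch = "MAIN-backup"
		}
	}
	for idx := range forked {
		forked[idx].Branch = "MAIN"
	}
	c.Items = append(c.Items, forked...)
	c.ActiveBranch = "MAIN"
}

func main() {
	c := &Conversation{
		Items:        []Item{{"MAIN", "hello"}, {"MAIN", "hi there"}},
		ActiveBranch: "MAIN",
	}
	forkAndSwap(c, 1, "hello, edited")
	for _, it := range c.Items {
		fmt.Println(it.Branch, it.Text)
	}
}
```

After the swap, listing the MAIN branch returns the edited history, while the pre-edit items survive under the backup branch name.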

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 16 comments.

Show a summary per file
File Description
tests/automation/conversations-postman-scripts.json Comprehensive test suite (19 tests) for branching feature covering create, list, activate, delete, edit, and regenerate operations
services/response-api/internal/domain/llm/message_trimmer.go Simplified trimming logic to remove oldest items chronologically instead of role-based prioritization
services/llm-api/internal/utils/platformerrors/errors.go Added IsValidationError helper function for error type checking
services/llm-api/internal/interfaces/httpserver/routes/v1/v1_route.go Registered new BranchRoute in v1 API
services/llm-api/internal/interfaces/httpserver/routes/v1/conversation/conversation_route.go Updated delete item documentation to reflect branch-based deletion
services/llm-api/internal/interfaces/httpserver/routes/v1/conversation/branch_route.go New file implementing branch endpoints (list, create, get, delete, activate) and message actions (edit, regenerate)
services/llm-api/internal/interfaces/httpserver/routes/v1/chat/completion_route.go Added logging and validation error handling for chat completions
services/llm-api/internal/interfaces/httpserver/routes/routes_provider.go Added BranchHandler to wire dependency injection
services/llm-api/internal/interfaces/httpserver/responses/chat/chat.go Added Trimmed boolean field to indicate if messages were trimmed
services/llm-api/internal/interfaces/httpserver/requests/conversation/conversation.go Added Branch query parameter for filtering items by branch
services/llm-api/internal/interfaces/httpserver/handlers/conversationhandler/conversation_handler.go Updated to support branch parameter in ListItems and changed DeleteItem to use branching
services/llm-api/internal/interfaces/httpserver/handlers/conversationhandler/branch_handler.go New handler implementing branch operations with proper validation and response formatting
services/llm-api/internal/interfaces/httpserver/handlers/chathandler/message_trimmer.go Major refactor with TokenBudget struct, tool/image token estimation, CJK support, and content truncation
services/llm-api/internal/interfaces/httpserver/handlers/chathandler/chat_handler.go Integrated token budget validation, user input size validation, duplicate message detection for branches
services/llm-api/internal/infrastructure/database/repository/conversationrepo/conversation_repository.go Implemented all branch repository operations including SwapBranchToMain for edit/regenerate semantics
services/llm-api/internal/domain/provider.go Added MessageActionService to service provider
services/llm-api/internal/domain/conversation/message_action_service.go New service implementing edit, regenerate, and delete operations with fork-and-swap semantics
services/llm-api/internal/domain/conversation/item.go Added Branch field to ItemFilter for branch-aware queries
services/llm-api/internal/domain/conversation/conversation_service.go Updated AddItemsToConversation to use branch-aware repository methods
services/llm-api/internal/domain/conversation/conversation.go Added SwapBranchToMain to repository interface and CreateBranchMetadata helper
services/llm-api/cmd/server/wire_gen.go Updated wire dependency injection for new handlers and services


Comment on lines +1125 to +1139
if userItem != nil {
skipUserItem := false

// Get the last item in the branch to check for duplicates
existingItems, err := h.conversationService.GetConversationItems(ctx, conv, branchName, nil)
if err == nil && len(existingItems) > 0 {
lastItem := existingItems[len(existingItems)-1]
// If the last item is a user message, check if it has the same content
if lastItem.Role != nil && *lastItem.Role == conversation.ItemRoleUser {
// Compare content - if it's the same, skip adding
if h.isSameMessageContent(userItem, &lastItem) {
skipUserItem = true
}
}
}

Copilot AI Dec 19, 2025


The duplicate detection logic using isSameMessageContent could fail to detect duplicates when the content format differs slightly (e.g., whitespace differences in JSON-formatted content, or when one message has TextString and another has Text.Text). The comparison only checks if normalized text matches, but the extractTextFromContent function tries multiple content field types which may not be semantically equivalent. Consider a more robust comparison that accounts for all content variations or log when duplicate detection fails.
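
One hedged way to make the comparison tolerant of formatting noise is to normalize both sides before comparing. The `normalizeContent` helper below is an illustrative sketch, not the PR's `isSameMessageContent`: it collapses whitespace and flattens a JSON-stringified MultiContent array to its text parts, so a plain string and its JSON form compare equal.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// normalizeContent collapses whitespace and, when the content is a
// JSON array of parts, concatenates the "text" fields so a plain
// string and its MultiContent-JSON form compare equal.
func normalizeContent(content string) string {
	trimmed := strings.TrimSpace(content)
	if strings.HasPrefix(trimmed, "[") {
		var parts []map[string]any
		if err := json.Unmarshal([]byte(trimmed), &parts); err == nil {
			var b strings.Builder
			for _, p := range parts {
				if s, ok := p["text"].(string); ok {
					b.WriteString(s)
				}
			}
			trimmed = b.String()
		}
	}
	// Collapse all runs of whitespace to single spaces.
	return strings.Join(strings.Fields(trimmed), " ")
}

func main() {
	a := `[{"type":"text","text":"hello  world"}]`
	b := "hello world"
	fmt.Println(normalizeContent(a) == normalizeContent(b)) // prints true
}
```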

Comment on lines +656 to +679
// Rename the promoted branch metadata to MAIN
_, err = q.ConversationBranch.WithContext(ctx).
Where(q.ConversationBranch.ConversationID.Eq(conversationID)).
Where(q.ConversationBranch.Name.Eq(branchToPromote)).
Update(q.ConversationBranch.Name, "MAIN")
if err != nil {
return "", platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to rename branch to MAIN")
}

// Update all items in promoted branch to use MAIN
_, err = q.ConversationItem.WithContext(ctx).
Where(q.ConversationItem.ConversationID.Eq(conversationID)).
Where(q.ConversationItem.Branch.Eq(branchToPromote)).
Update(q.ConversationItem.Branch, "MAIN")
if err != nil {
return "", platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to update promoted items to MAIN")
}

// Set MAIN as active branch
_, err = q.Conversation.WithContext(ctx).
Where(q.Conversation.ID.Eq(conversationID)).
Update(q.Conversation.ActiveBranch, "MAIN")
if err != nil {
return "", platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to set active branch to MAIN")

Copilot AI Dec 19, 2025


The SwapBranchToMain function performs multiple database operations (renaming branches, updating items) without transaction management. If any of these operations fail partway through, the database could be left in an inconsistent state where items point to the wrong branch or branch metadata is out of sync. This is a critical operation that should be wrapped in a transaction to ensure atomicity.

Suggested change
// Rename the promoted branch metadata to MAIN
_, err = q.ConversationBranch.WithContext(ctx).
Where(q.ConversationBranch.ConversationID.Eq(conversationID)).
Where(q.ConversationBranch.Name.Eq(branchToPromote)).
Update(q.ConversationBranch.Name, "MAIN")
if err != nil {
return "", platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to rename branch to MAIN")
}
// Update all items in promoted branch to use MAIN
_, err = q.ConversationItem.WithContext(ctx).
Where(q.ConversationItem.ConversationID.Eq(conversationID)).
Where(q.ConversationItem.Branch.Eq(branchToPromote)).
Update(q.ConversationItem.Branch, "MAIN")
if err != nil {
return "", platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to update promoted items to MAIN")
}
// Set MAIN as active branch
_, err = q.Conversation.WithContext(ctx).
Where(q.Conversation.ID.Eq(conversationID)).
Update(q.Conversation.ActiveBranch, "MAIN")
if err != nil {
return "", platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to set active branch to MAIN")
// Perform final branch swap updates in a single transaction to ensure atomicity
err = q.Transaction(func(tx *gormgen.Query) error {
// Rename the promoted branch metadata to MAIN
_, err := tx.ConversationBranch.WithContext(ctx).
Where(tx.ConversationBranch.ConversationID.Eq(conversationID)).
Where(tx.ConversationBranch.Name.Eq(branchToPromote)).
Update(tx.ConversationBranch.Name, "MAIN")
if err != nil {
return platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to rename branch to MAIN")
}
// Update all items in promoted branch to use MAIN
_, err = tx.ConversationItem.WithContext(ctx).
Where(tx.ConversationItem.ConversationID.Eq(conversationID)).
Where(tx.ConversationItem.Branch.Eq(branchToPromote)).
Update(tx.ConversationItem.Branch, "MAIN")
if err != nil {
return platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to update promoted items to MAIN")
}
// Set MAIN as active branch
_, err = tx.Conversation.WithContext(ctx).
Where(tx.Conversation.ID.Eq(conversationID)).
Update(tx.Conversation.ActiveBranch, "MAIN")
if err != nil {
return platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to set active branch to MAIN")
}
return nil
})
if err != nil {
return "", err

Comment on lines +639 to +642
} else {
// No MAIN items exist, no backup needed
oldMainBackupName = ""
}

Copilot AI Dec 19, 2025


When no MAIN items exist, oldMainBackupName is set to an empty string, but the code later checks if oldMainBackupName is not empty before updating items. However, if there are MAIN items but the branch record doesn't exist and count returns 0 incorrectly (due to a race condition or database issue), the backup would be skipped silently. This could lead to data loss. Consider adding logging or validation to ensure the count matches expectations.

Comment on lines +1129 to +1131
existingItems, err := h.conversationService.GetConversationItems(ctx, conv, branchName, nil)
if err == nil && len(existingItems) > 0 {
lastItem := existingItems[len(existingItems)-1]

Copilot AI Dec 19, 2025


The duplicate detection fetches all existing items from the branch using GetConversationItems with nil pagination. For conversations with thousands of items, this could be a significant performance overhead just to check if the last item is a duplicate. Consider optimizing by only fetching the last N items (e.g., last 5) or adding a dedicated repository method to get just the last item from a branch.

const (
// DefaultContextLength is used when model context length is unknown.
-	DefaultContextLength = 128000 // 128k tokens as fallback
+	DefaultContextLength = 220000 // 220k tokens as fallback

Copilot AI Dec 19, 2025


The DefaultContextLength has been increased from 128000 to 220000 tokens. This is a significant increase that could cause issues if models with smaller context windows are used. If a model has a context window of 128k tokens but the code uses 220k as the default, it will cause token budget validation failures or trimming issues. Consider whether this default should be model-specific or if there's a safer fallback value.

Suggested change
-	DefaultContextLength = 220000 // 220k tokens as fallback
+	DefaultContextLength = 128000 // 128k tokens as a safer conservative fallback
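
A hedged alternative to a single hard-coded fallback is to prefer whatever the model actually reports and only fall back when nothing is known. The function and constant names below are illustrative, not the PR's code:

```go
package main

import "fmt"

const conservativeFallback = 128000 // safer floor when the model is unknown

// effectiveContextLength prefers the model-reported context window and
// only falls back to the conservative default when it is unreported
// (zero or negative), so a 128k model is never budgeted at 220k.
func effectiveContextLength(modelReported int) int {
	if modelReported > 0 {
		return modelReported
	}
	return conservativeFallback
}

func main() {
	fmt.Println(effectiveContextLength(32768)) // model-reported wins: 32768
	fmt.Println(effectiveContextLength(0))     // unknown, use fallback: 128000
}
```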

Comment on lines +63 to +65
// TrimMessagesToFitContext removes oldest conversation items to fit within the context length limit.
// Removes oldest non-system messages first, regardless of role (user, assistant, tool).
// Never removes: system prompts at index 0

Copilot AI Dec 19, 2025


The comment states "Removes oldest conversation items first, regardless of role (user, assistant, tool)" but then the code at line 95 explicitly skips system messages with a continue statement. This is inconsistent - the comment should clarify that system messages are never removed, not just "system prompts at index 0".
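
The never-remove-system rule the comment should state can be sketched as follows. This is a simplified stand-in for `TrimMessagesToFitContext`, with illustrative types and a character-free token field:

```go
package main

import "fmt"

// Msg is a simplified message with a precomputed token estimate.
type Msg struct {
	Role   string
	Tokens int
}

// trimOldestNonSystem drops the oldest non-system messages until the
// total token count fits the budget. System messages are never removed,
// wherever they appear in the slice, so a budget smaller than the
// system-prompt total cannot be satisfied by trimming alone.
func trimOldestNonSystem(msgs []Msg, budget int) []Msg {
	total := 0
	for _, m := range msgs {
		total += m.Tokens
	}
	out := append([]Msg(nil), msgs...)
	for i := 0; i < len(out) && total > budget; {
		if out[i].Role == "system" {
			i++ // never remove system messages
			continue
		}
		total -= out[i].Tokens
		out = append(out[:i], out[i+1:]...)
	}
	return out
}

func main() {
	msgs := []Msg{
		{"system", 10}, {"user", 50}, {"assistant", 40}, {"user", 20},
	}
	// Budget 80: the oldest user message (50 tokens) is dropped,
	// the system message survives.
	fmt.Println(trimOldestNonSystem(msgs, 80))
}
```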

Comment on lines +81 to +99
// Calculate response reserve
if b.MaxCompletionTokens > 0 {
b.ResponseReserve = b.MaxCompletionTokens
} else {
b.ResponseReserve = int(float64(b.ContextLength) * (1 - SafetyMarginRatio))
}

// Calculate available space for messages
b.AvailableForMessages = b.ContextLength - b.ToolsTokens - b.ResponseReserve - b.FixedOverhead

// Hard floor check: if available space is too small, return error
if b.AvailableForMessages < MinMessagesTokenFloor {
return fmt.Errorf(
"token budget exhausted: context=%d, tools=%d, response_reserve=%d, overhead=%d → only %d tokens available (minimum required: %d). Reduce max_tokens, use fewer tools, or choose a model with larger context",
b.ContextLength, b.ToolsTokens, b.ResponseReserve, b.FixedOverhead,
b.AvailableForMessages, MinMessagesTokenFloor,
)
}


Copilot AI Dec 19, 2025


The token budget validation can fail with a descriptive error message, but the function proceeds to calculate AvailableForMessages which may be negative. When AvailableForMessages is less than MinMessagesTokenFloor, the error message is constructed and returned, but the struct fields have already been modified. If the budget validation fails and the error is ignored upstream, negative values could propagate. Consider returning early before modifying the struct fields if the budget is invalid.

Suggested change
// Calculate response reserve
if b.MaxCompletionTokens > 0 {
b.ResponseReserve = b.MaxCompletionTokens
} else {
b.ResponseReserve = int(float64(b.ContextLength) * (1 - SafetyMarginRatio))
}
// Calculate available space for messages
b.AvailableForMessages = b.ContextLength - b.ToolsTokens - b.ResponseReserve - b.FixedOverhead
// Hard floor check: if available space is too small, return error
if b.AvailableForMessages < MinMessagesTokenFloor {
return fmt.Errorf(
"token budget exhausted: context=%d, tools=%d, response_reserve=%d, overhead=%d → only %d tokens available (minimum required: %d). Reduce max_tokens, use fewer tools, or choose a model with larger context",
b.ContextLength, b.ToolsTokens, b.ResponseReserve, b.FixedOverhead,
b.AvailableForMessages, MinMessagesTokenFloor,
)
}
// Calculate response reserve using a local variable to avoid mutating
// the struct until we've confirmed the budget is valid.
var responseReserve int
if b.MaxCompletionTokens > 0 {
responseReserve = b.MaxCompletionTokens
} else {
responseReserve = int(float64(b.ContextLength) * (1 - SafetyMarginRatio))
}
// Calculate available space for messages (also using a local variable).
availableForMessages := b.ContextLength - b.ToolsTokens - responseReserve - b.FixedOverhead
// Hard floor check: if available space is too small, return error
if availableForMessages < MinMessagesTokenFloor {
return fmt.Errorf(
"token budget exhausted: context=%d, tools=%d, response_reserve=%d, overhead=%d → only %d tokens available (minimum required: %d). Reduce max_tokens, use fewer tools, or choose a model with larger context",
b.ContextLength, b.ToolsTokens, responseReserve, b.FixedOverhead,
availableForMessages, MinMessagesTokenFloor,
)
}
// Only commit computed values to the struct once validation passes.
b.ResponseReserve = responseReserve
b.AvailableForMessages = availableForMessages

Comment on lines +530 to +533
// ValidateUserInputSize checks if the last user message (current input) exceeds MaxUserContentTokens.
// Returns an error if the user input is too large, preventing the request from proceeding.
// This only validates the LAST user message (current input), not historical messages.
func ValidateUserInputSize(messages []openai.ChatCompletionMessage) error {

Copilot AI Dec 19, 2025


The comment states "This only validates the LAST user message (current input), not historical messages" which correctly describes the intent. However, the function iterates backwards through ALL messages looking for the last user message. If there are multiple user messages in the conversation history, this could be confusing. Consider adding a comment explaining why we iterate backwards instead of just checking the last message in the array.
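
The backwards scan the comment asks to document can be sketched like this. The message type and character limit below are simplified stand-ins (the real function works on `openai.ChatCompletionMessage` and a token count):

```go
package main

import (
	"errors"
	"fmt"
)

type Msg struct {
	Role    string
	Content string
}

const maxUserChars = 20 // illustrative stand-in for MaxUserContentTokens

// validateLastUserInput walks backwards because the final element of
// the slice may be an assistant or tool message; the *last user*
// message is the current input we want to bound, while earlier user
// messages are history that was already accepted.
func validateLastUserInput(msgs []Msg) error {
	for i := len(msgs) - 1; i >= 0; i-- {
		if msgs[i].Role != "user" {
			continue
		}
		if len(msgs[i].Content) > maxUserChars {
			return errors.New("user input exceeds size limit")
		}
		return nil // only the last user message is checked
	}
	return nil
}

func main() {
	msgs := []Msg{
		{"user", "this historical message is far over the limit"},
		{"user", "short"},
		{"assistant", "ok"},
	}
	// nil: only "short" (the last user message) is validated.
	fmt.Println(validateLastUserInput(msgs))
}
```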

Comment on lines +368 to +438
// truncateTextPreservingJSON truncates text content while trying to preserve JSON structure.
// If content is JSON-stringified MultiContent, it parses and truncates the nested text fields.
func truncateTextPreservingJSON(content string, maxTokens int) (string, bool) {
maxChars := maxTokens * TokenEstimateRatio
trimmed := strings.TrimSpace(content)

// Check if it looks like a JSON array (MultiContent format)
if len(trimmed) > 0 && trimmed[0] == '[' {
var parts []map[string]interface{}
if err := json.Unmarshal([]byte(content), &parts); err == nil {
// Successfully parsed as JSON array - truncate nested text fields
modified := false
for i := range parts {
if textVal, ok := parts[i]["text"]; ok {
if textStr, ok := textVal.(string); ok {
textTokens := estimateTokenCount(textStr)
if textTokens > maxTokens {
textRunes := []rune(textStr)
if len(textRunes) > maxChars {
parts[i]["text"] = string(textRunes[:maxChars]) + "\n\n[Content truncated]"
modified = true
}
}
}
}
}
if modified {
if newContent, err := json.Marshal(parts); err == nil {
return string(newContent), true
}
}
return content, false
}
}

// Check if it looks like a JSON object with nested content
if len(trimmed) > 0 && trimmed[0] == '{' {
var obj map[string]interface{}
if err := json.Unmarshal([]byte(content), &obj); err == nil {
modified := false
// Truncate common large text fields
for _, key := range []string{"text", "content", "markdown", "raw_text", "body"} {
if textVal, ok := obj[key]; ok {
if textStr, ok := textVal.(string); ok {
textTokens := estimateTokenCount(textStr)
if textTokens > maxTokens {
textRunes := []rune(textStr)
if len(textRunes) > maxChars {
obj[key] = string(textRunes[:maxChars]) + "\n\n[Content truncated]"
modified = true
}
}
}
}
}
if modified {
if newContent, err := json.Marshal(obj); err == nil {
return string(newContent), true
}
}
return content, false
}
}

// Plain text - simple truncation
runes := []rune(content)
if len(runes) > maxChars {
return string(runes[:maxChars]) + "\n\n[Content truncated - exceeded " + strconv.Itoa(maxTokens) + " token limit]", true
}
return content, false
}

Copilot AI Dec 19, 2025


The truncateTextPreservingJSON function has deeply nested conditional logic with multiple JSON parsing attempts. This creates high cyclomatic complexity and makes the function hard to test and maintain. Consider refactoring into separate functions for each truncation strategy: truncateJSONArray, truncateJSONObject, truncatePlainText. This would improve readability and allow for easier unit testing of each strategy.
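
For instance, the plain-text strategy alone could be factored out as below. The ratio constant and marker text are illustrative, echoing but not reproducing the PR's constants:

```go
package main

import (
	"fmt"
	"strconv"
)

const tokenEstimateRatio = 4 // rough chars-per-token heuristic (assumption)

// truncatePlainText is the simplest of the three strategies: cut at a
// rune boundary and append a marker. Returning the bool lets callers
// know whether anything changed, matching the original's signature.
func truncatePlainText(content string, maxTokens int) (string, bool) {
	maxChars := maxTokens * tokenEstimateRatio
	runes := []rune(content)
	if len(runes) <= maxChars {
		return content, false
	}
	return string(runes[:maxChars]) +
		"\n\n[Content truncated - exceeded " + strconv.Itoa(maxTokens) + " token limit]", true
}

func main() {
	out, changed := truncatePlainText("abcdefghij", 2) // 2 tokens ≈ 8 chars
	fmt.Println(changed, out)
}
```

With each strategy isolated like this, the dispatcher reduces to three early returns (array, object, plain text), and each path gets its own focused unit tests.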

Comment on lines +233 to +265

// isImageType checks if a map represents an image content type.
func isImageType(item map[string]any) bool {
for _, key := range []string{"type", "kind", "contentType"} {
if val, ok := item[key].(string); ok {
if val == "image" || strings.HasPrefix(val, "image/") {
return true
}
}
}
return false
}

// hasImageDataURL checks if a map contains an image data URL.
func hasImageDataURL(item map[string]any) bool {
for _, key := range []string{"data", "url", "src", "imageUrl"} {
if val, ok := item[key].(string); ok {
if strings.HasPrefix(val, "data:image/") {
return true
}
}
}
return false
}

// countDataURLImages counts data URL images in non-JSON content (fallback).
func countDataURLImages(content string) int {
count := 0
count += strings.Count(content, "data:image/png;base64")
count += strings.Count(content, "data:image/jpeg;base64")
count += strings.Count(content, "data:image/webp;base64")
count += strings.Count(content, "data:image/gif;base64")
return count

Copilot AI Dec 19, 2025


The image detection logic in countImagesInToolResult tries multiple approaches (JSON array parsing, fallback string counting) which is good, but the code is complex and deeply nested. Consider extracting the JSON parsing logic and string counting into separate helper functions to improve readability and testability. This would also make it easier to add new image format detection in the future.
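
One way to flatten the fallback counting into a single helper is a prefix scan over any `data:image/<subtype>;base64` URL rather than an enumerated list of formats. This is a hedged sketch of that refactor (the PR enumerates png/jpeg/webp/gif explicitly):

```go
package main

import (
	"fmt"
	"strings"
)

// countImageDataURLs counts occurrences of any data:image/<subtype>;base64
// prefix, so new formats (avif, heic, ...) are counted without having to
// extend an explicit format list.
func countImageDataURLs(content string) int {
	count := 0
	rest := content
	for {
		i := strings.Index(rest, "data:image/")
		if i < 0 {
			return count
		}
		rest = rest[i+len("data:image/"):]
		// Only count it when a ";base64" marker follows a short subtype.
		if j := strings.Index(rest, ";base64"); j >= 0 && j < 20 {
			count++
		}
	}
}

func main() {
	s := "data:image/png;base64,AAA and data:image/avif;base64,BBB"
	fmt.Println(countImageDataURLs(s)) // prints 2
}
```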


Labels

None yet

Projects

Status: QA


3 participants