Release model context fix and branching feature#326
Conversation
model context size checking
add branching
Pull request overview
This PR introduces a comprehensive conversation branching feature and improves the model context/token management system. The branching feature allows users to fork conversations at any point, edit messages, regenerate responses, and manage multiple conversation paths. The context management improvements include better token budget calculation, content truncation for oversized inputs, and CJK character support.
Key Changes:
- Conversation Branching: New branch CRUD operations (create, list, activate, delete) with fork-and-swap semantics for edit/regenerate/delete operations
- Token Budget Management: Refactored token estimation with explicit budget validation, support for tool schemas, images, and CJK text
- Message Actions: Edit, regenerate, and delete operations that create branches and promote them to MAIN while preserving history
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| tests/automation/conversations-postman-scripts.json | Comprehensive test suite (19 tests) for branching feature covering create, list, activate, delete, edit, and regenerate operations |
| services/response-api/internal/domain/llm/message_trimmer.go | Simplified trimming logic to remove oldest items chronologically instead of role-based prioritization |
| services/llm-api/internal/utils/platformerrors/errors.go | Added IsValidationError helper function for error type checking |
| services/llm-api/internal/interfaces/httpserver/routes/v1/v1_route.go | Registered new BranchRoute in v1 API |
| services/llm-api/internal/interfaces/httpserver/routes/v1/conversation/conversation_route.go | Updated delete item documentation to reflect branch-based deletion |
| services/llm-api/internal/interfaces/httpserver/routes/v1/conversation/branch_route.go | New file implementing branch endpoints (list, create, get, delete, activate) and message actions (edit, regenerate) |
| services/llm-api/internal/interfaces/httpserver/routes/v1/chat/completion_route.go | Added logging and validation error handling for chat completions |
| services/llm-api/internal/interfaces/httpserver/routes/routes_provider.go | Added BranchHandler to wire dependency injection |
| services/llm-api/internal/interfaces/httpserver/responses/chat/chat.go | Added Trimmed boolean field to indicate if messages were trimmed |
| services/llm-api/internal/interfaces/httpserver/requests/conversation/conversation.go | Added Branch query parameter for filtering items by branch |
| services/llm-api/internal/interfaces/httpserver/handlers/conversationhandler/conversation_handler.go | Updated to support branch parameter in ListItems and changed DeleteItem to use branching |
| services/llm-api/internal/interfaces/httpserver/handlers/conversationhandler/branch_handler.go | New handler implementing branch operations with proper validation and response formatting |
| services/llm-api/internal/interfaces/httpserver/handlers/chathandler/message_trimmer.go | Major refactor with TokenBudget struct, tool/image token estimation, CJK support, and content truncation |
| services/llm-api/internal/interfaces/httpserver/handlers/chathandler/chat_handler.go | Integrated token budget validation, user input size validation, duplicate message detection for branches |
| services/llm-api/internal/infrastructure/database/repository/conversationrepo/conversation_repository.go | Implemented all branch repository operations including SwapBranchToMain for edit/regenerate semantics |
| services/llm-api/internal/domain/provider.go | Added MessageActionService to service provider |
| services/llm-api/internal/domain/conversation/message_action_service.go | New service implementing edit, regenerate, and delete operations with fork-and-swap semantics |
| services/llm-api/internal/domain/conversation/item.go | Added Branch field to ItemFilter for branch-aware queries |
| services/llm-api/internal/domain/conversation/conversation_service.go | Updated AddItemsToConversation to use branch-aware repository methods |
| services/llm-api/internal/domain/conversation/conversation.go | Added SwapBranchToMain to repository interface and CreateBranchMetadata helper |
| services/llm-api/cmd/server/wire_gen.go | Updated wire dependency injection for new handlers and services |
```go
if userItem != nil {
	skipUserItem := false

	// Get the last item in the branch to check for duplicates
	existingItems, err := h.conversationService.GetConversationItems(ctx, conv, branchName, nil)
	if err == nil && len(existingItems) > 0 {
		lastItem := existingItems[len(existingItems)-1]
		// If the last item is a user message, check if it has the same content
		if lastItem.Role != nil && *lastItem.Role == conversation.ItemRoleUser {
			// Compare content - if it's the same, skip adding
			if h.isSameMessageContent(userItem, &lastItem) {
				skipUserItem = true
			}
		}
	}
```
The duplicate detection logic using isSameMessageContent could fail to detect duplicates when the content format differs slightly (e.g., whitespace differences in JSON-formatted content, or when one message uses TextString and another uses Text.Text). The comparison only checks whether the normalized text matches, but extractTextFromContent tries multiple content field types that may not be semantically equivalent. Consider a more robust comparison that accounts for all content variations, or log when duplicate detection fails.
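One way to make the comparison more tolerant of superficial differences is to normalize whitespace before comparing the extracted text. A minimal sketch (the function names here are illustrative, not the PR's actual helpers — the real isSameMessageContent would first extract text from each content variant before normalizing):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeMessageText collapses all runs of whitespace so that
// superficially different encodings of the same text compare equal.
func normalizeMessageText(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// sameNormalizedContent reports whether two extracted text payloads
// match after whitespace normalization.
func sameNormalizedContent(a, b string) bool {
	return normalizeMessageText(a) == normalizeMessageText(b)
}

func main() {
	fmt.Println(sameNormalizedContent("hello  world", " hello world\n")) // prints true
}
```

This addresses the whitespace case; the TextString vs Text.Text mismatch still needs extraction logic that maps every content variant to the same canonical string before this comparison runs.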
```go
// Rename the promoted branch metadata to MAIN
_, err = q.ConversationBranch.WithContext(ctx).
	Where(q.ConversationBranch.ConversationID.Eq(conversationID)).
	Where(q.ConversationBranch.Name.Eq(branchToPromote)).
	Update(q.ConversationBranch.Name, "MAIN")
if err != nil {
	return "", platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to rename branch to MAIN")
}

// Update all items in promoted branch to use MAIN
_, err = q.ConversationItem.WithContext(ctx).
	Where(q.ConversationItem.ConversationID.Eq(conversationID)).
	Where(q.ConversationItem.Branch.Eq(branchToPromote)).
	Update(q.ConversationItem.Branch, "MAIN")
if err != nil {
	return "", platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to update promoted items to MAIN")
}

// Set MAIN as active branch
_, err = q.Conversation.WithContext(ctx).
	Where(q.Conversation.ID.Eq(conversationID)).
	Update(q.Conversation.ActiveBranch, "MAIN")
if err != nil {
	return "", platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to set active branch to MAIN")
```
The SwapBranchToMain function performs multiple database operations (renaming branches, updating items) without transaction management. If any of these operations fail partway through, the database could be left in an inconsistent state where items point to the wrong branch or branch metadata is out of sync. This is a critical operation that should be wrapped in a transaction to ensure atomicity.
Suggested change (wrap the three updates in a single transaction):

```go
// Perform final branch swap updates in a single transaction to ensure atomicity
err = q.Transaction(func(tx *gormgen.Query) error {
	// Rename the promoted branch metadata to MAIN
	_, err := tx.ConversationBranch.WithContext(ctx).
		Where(tx.ConversationBranch.ConversationID.Eq(conversationID)).
		Where(tx.ConversationBranch.Name.Eq(branchToPromote)).
		Update(tx.ConversationBranch.Name, "MAIN")
	if err != nil {
		return platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to rename branch to MAIN")
	}
	// Update all items in promoted branch to use MAIN
	_, err = tx.ConversationItem.WithContext(ctx).
		Where(tx.ConversationItem.ConversationID.Eq(conversationID)).
		Where(tx.ConversationItem.Branch.Eq(branchToPromote)).
		Update(tx.ConversationItem.Branch, "MAIN")
	if err != nil {
		return platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to update promoted items to MAIN")
	}
	// Set MAIN as active branch
	_, err = tx.Conversation.WithContext(ctx).
		Where(tx.Conversation.ID.Eq(conversationID)).
		Update(tx.Conversation.ActiveBranch, "MAIN")
	if err != nil {
		return platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to set active branch to MAIN")
	}
	return nil
})
if err != nil {
	return "", err
```
```go
} else {
	// No MAIN items exist, no backup needed
	oldMainBackupName = ""
}
```
When no MAIN items exist, oldMainBackupName is set to an empty string, but the code later checks if oldMainBackupName is not empty before updating items. However, if there are MAIN items but the branch record doesn't exist and count returns 0 incorrectly (due to a race condition or database issue), the backup would be skipped silently. This could lead to data loss. Consider adding logging or validation to ensure the count matches expectations.
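A guard along these lines would surface the suspicious case instead of skipping the backup silently. This is a hypothetical sketch: chooseBackupName, mainItemCount, and branchRowCount stand in for the real repository code and its count queries, and the backup naming scheme is illustrative only.

```go
package main

import (
	"fmt"
	"log"
)

// chooseBackupName decides whether the old MAIN branch needs a backup name.
// mainItemCount is the number of items on MAIN; branchRowCount is the number
// of MAIN branch metadata rows (both would come from repository queries).
func chooseBackupName(conversationID string, mainItemCount, branchRowCount int) string {
	if mainItemCount == 0 {
		// No MAIN items exist, no backup needed -- but flag the suspicious
		// case where branch metadata says MAIN should have content.
		if branchRowCount > 0 {
			log.Printf("warn: MAIN metadata exists (%d rows) but no MAIN items for conversation %s; skipping backup",
				branchRowCount, conversationID)
		}
		return ""
	}
	// Illustrative naming scheme, not the PR's actual convention.
	return fmt.Sprintf("main-backup-%s", conversationID)
}

func main() {
	fmt.Println(chooseBackupName("conv-1", 0, 0)) // prints an empty line
}
```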
```go
existingItems, err := h.conversationService.GetConversationItems(ctx, conv, branchName, nil)
if err == nil && len(existingItems) > 0 {
	lastItem := existingItems[len(existingItems)-1]
```
The duplicate detection fetches all existing items from the branch using GetConversationItems with nil pagination. For conversations with thousands of items, this could be a significant performance overhead just to check if the last item is a duplicate. Consider optimizing by only fetching the last N items (e.g., last 5) or adding a dedicated repository method to get just the last item from a branch.
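The bounded-fetch idea can be sketched as a helper that asks the store for at most one row instead of the whole branch. Everything here is illustrative — Item, lastItemInBranch, and the fetchDesc callback are stand-ins; the real fix would push an ORDER BY ... DESC LIMIT 1 into a dedicated repository method:

```go
package main

import "fmt"

// Item is a minimal stand-in for the conversation item type.
type Item struct {
	Role, Text string
}

// lastItemInBranch fetches only the most recent item via a limit-aware
// fetch function, instead of loading every item in the branch.
// fetchDesc is a hypothetical repository callback returning items
// newest-first, capped at the given limit.
func lastItemInBranch(fetchDesc func(limit int) []Item) (Item, bool) {
	items := fetchDesc(1) // ask the store for at most one row
	if len(items) == 0 {
		return Item{}, false
	}
	return items[0], true
}

func main() {
	history := []Item{{"user", "hi"}, {"assistant", "hello"}, {"user", "bye"}}
	fetchDesc := func(limit int) []Item {
		// Simulate ORDER BY created_at DESC LIMIT n over the in-memory slice.
		var out []Item
		for i := len(history) - 1; i >= 0 && len(out) < limit; i-- {
			out = append(out, history[i])
		}
		return out
	}
	if last, ok := lastItemInBranch(fetchDesc); ok {
		fmt.Println(last.Text) // prints bye
	}
}
```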
```diff
 const (
 	// DefaultContextLength is used when model context length is unknown.
-	DefaultContextLength = 128000 // 128k tokens as fallback
+	DefaultContextLength = 220000 // 220k tokens as fallback
```
The DefaultContextLength has been increased from 128000 to 220000 tokens. This is a significant increase that could cause issues if models with smaller context windows are used. If a model has a context window of 128k tokens but the code uses 220k as the default, it will cause token budget validation failures or trimming issues. Consider whether this default should be model-specific or if there's a safer fallback value.
Suggested change:

```diff
-	DefaultContextLength = 220000 // 220k tokens as fallback
+	DefaultContextLength = 128000 // 128k tokens as a safer conservative fallback
```
```go
// TrimMessagesToFitContext removes oldest conversation items to fit within the context length limit.
// Removes oldest non-system messages first, regardless of role (user, assistant, tool).
// Never removes: system prompts at index 0
```
The comment states "Removes oldest conversation items first, regardless of role (user, assistant, tool)" but then the code at line 95 explicitly skips system messages with a continue statement. This is inconsistent - the comment should clarify that system messages are never removed, not just "system prompts at index 0".
```go
// Calculate response reserve
if b.MaxCompletionTokens > 0 {
	b.ResponseReserve = b.MaxCompletionTokens
} else {
	b.ResponseReserve = int(float64(b.ContextLength) * (1 - SafetyMarginRatio))
}

// Calculate available space for messages
b.AvailableForMessages = b.ContextLength - b.ToolsTokens - b.ResponseReserve - b.FixedOverhead

// Hard floor check: if available space is too small, return error
if b.AvailableForMessages < MinMessagesTokenFloor {
	return fmt.Errorf(
		"token budget exhausted: context=%d, tools=%d, response_reserve=%d, overhead=%d → only %d tokens available (minimum required: %d). Reduce max_tokens, use fewer tools, or choose a model with larger context",
		b.ContextLength, b.ToolsTokens, b.ResponseReserve, b.FixedOverhead,
		b.AvailableForMessages, MinMessagesTokenFloor,
	)
}
```
The token budget validation can fail with a descriptive error message, but the function proceeds to calculate AvailableForMessages which may be negative. When AvailableForMessages is less than MinMessagesTokenFloor, the error message is constructed and returned, but the struct fields have already been modified. If the budget validation fails and the error is ignored upstream, negative values could propagate. Consider returning early before modifying the struct fields if the budget is invalid.
Suggested change (compute into locals and only commit to the struct after validation passes):

```go
// Calculate response reserve using a local variable to avoid mutating
// the struct until we've confirmed the budget is valid.
var responseReserve int
if b.MaxCompletionTokens > 0 {
	responseReserve = b.MaxCompletionTokens
} else {
	responseReserve = int(float64(b.ContextLength) * (1 - SafetyMarginRatio))
}

// Calculate available space for messages (also using a local variable).
availableForMessages := b.ContextLength - b.ToolsTokens - responseReserve - b.FixedOverhead

// Hard floor check: if available space is too small, return error
if availableForMessages < MinMessagesTokenFloor {
	return fmt.Errorf(
		"token budget exhausted: context=%d, tools=%d, response_reserve=%d, overhead=%d → only %d tokens available (minimum required: %d). Reduce max_tokens, use fewer tools, or choose a model with larger context",
		b.ContextLength, b.ToolsTokens, responseReserve, b.FixedOverhead,
		availableForMessages, MinMessagesTokenFloor,
	)
}

// Only commit computed values to the struct once validation passes.
b.ResponseReserve = responseReserve
b.AvailableForMessages = availableForMessages
```
```go
// ValidateUserInputSize checks if the last user message (current input) exceeds MaxUserContentTokens.
// Returns an error if the user input is too large, preventing the request from proceeding.
// This only validates the LAST user message (current input), not historical messages.
func ValidateUserInputSize(messages []openai.ChatCompletionMessage) error {
```
The comment states "This only validates the LAST user message (current input), not historical messages" which correctly describes the intent. However, the function iterates backwards through ALL messages looking for the last user message. If there are multiple user messages in the conversation history, this could be confusing. Consider adding a comment explaining why we iterate backwards instead of just checking the last message in the array.
```go
// truncateTextPreservingJSON truncates text content while trying to preserve JSON structure.
// If content is JSON-stringified MultiContent, it parses and truncates the nested text fields.
func truncateTextPreservingJSON(content string, maxTokens int) (string, bool) {
	maxChars := maxTokens * TokenEstimateRatio
	trimmed := strings.TrimSpace(content)

	// Check if it looks like a JSON array (MultiContent format)
	if len(trimmed) > 0 && trimmed[0] == '[' {
		var parts []map[string]interface{}
		if err := json.Unmarshal([]byte(content), &parts); err == nil {
			// Successfully parsed as JSON array - truncate nested text fields
			modified := false
			for i := range parts {
				if textVal, ok := parts[i]["text"]; ok {
					if textStr, ok := textVal.(string); ok {
						textTokens := estimateTokenCount(textStr)
						if textTokens > maxTokens {
							textRunes := []rune(textStr)
							if len(textRunes) > maxChars {
								parts[i]["text"] = string(textRunes[:maxChars]) + "\n\n[Content truncated]"
								modified = true
							}
						}
					}
				}
			}
			if modified {
				if newContent, err := json.Marshal(parts); err == nil {
					return string(newContent), true
				}
			}
			return content, false
		}
	}

	// Check if it looks like a JSON object with nested content
	if len(trimmed) > 0 && trimmed[0] == '{' {
		var obj map[string]interface{}
		if err := json.Unmarshal([]byte(content), &obj); err == nil {
			modified := false
			// Truncate common large text fields
			for _, key := range []string{"text", "content", "markdown", "raw_text", "body"} {
				if textVal, ok := obj[key]; ok {
					if textStr, ok := textVal.(string); ok {
						textTokens := estimateTokenCount(textStr)
						if textTokens > maxTokens {
							textRunes := []rune(textStr)
							if len(textRunes) > maxChars {
								obj[key] = string(textRunes[:maxChars]) + "\n\n[Content truncated]"
								modified = true
							}
						}
					}
				}
			}
			if modified {
				if newContent, err := json.Marshal(obj); err == nil {
					return string(newContent), true
				}
			}
			return content, false
		}
	}

	// Plain text - simple truncation
	runes := []rune(content)
	if len(runes) > maxChars {
		return string(runes[:maxChars]) + "\n\n[Content truncated - exceeded " + strconv.Itoa(maxTokens) + " token limit]", true
	}
	return content, false
}
```
The truncateTextPreservingJSON function has deeply nested conditional logic with multiple JSON parsing attempts. This creates high cyclomatic complexity and makes the function hard to test and maintain. Consider refactoring into separate functions for each truncation strategy: truncateJSONArray, truncateJSONObject, truncatePlainText. This would improve readability and allow for easier unit testing of each strategy.
```go
// isImageType checks if a map represents an image content type.
func isImageType(item map[string]any) bool {
	for _, key := range []string{"type", "kind", "contentType"} {
		if val, ok := item[key].(string); ok {
			if val == "image" || strings.HasPrefix(val, "image/") {
				return true
			}
		}
	}
	return false
}

// hasImageDataURL checks if a map contains an image data URL.
func hasImageDataURL(item map[string]any) bool {
	for _, key := range []string{"data", "url", "src", "imageUrl"} {
		if val, ok := item[key].(string); ok {
			if strings.HasPrefix(val, "data:image/") {
				return true
			}
		}
	}
	return false
}

// countDataURLImages counts data URL images in non-JSON content (fallback).
func countDataURLImages(content string) int {
	count := 0
	count += strings.Count(content, "data:image/png;base64")
	count += strings.Count(content, "data:image/jpeg;base64")
	count += strings.Count(content, "data:image/webp;base64")
	count += strings.Count(content, "data:image/gif;base64")
	return count
```
The image detection logic in countImagesInToolResult tries multiple approaches (JSON array parsing, fallback string counting) which is good, but the code is complex and deeply nested. Consider extracting the JSON parsing logic and string counting into separate helper functions to improve readability and testability. This would also make it easier to add new image format detection in the future.
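For the fallback string counter specifically, a table-driven variant makes adding a new image format a one-line change. A sketch, assuming the same four prefixes as the reviewed code:

```go
package main

import (
	"fmt"
	"strings"
)

// imageDataURLPrefixes centralizes the formats counted by the fallback
// scanner; extending support means appending one entry here.
var imageDataURLPrefixes = []string{
	"data:image/png;base64",
	"data:image/jpeg;base64",
	"data:image/webp;base64",
	"data:image/gif;base64",
}

// countDataURLImages counts data URL images in non-JSON content (fallback).
func countDataURLImages(content string) int {
	count := 0
	for _, p := range imageDataURLPrefixes {
		count += strings.Count(content, p)
	}
	return count
}

func main() {
	fmt.Println(countDataURLImages("data:image/png;base64,AAA data:image/gif;base64,BBB")) // prints 2
}
```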