Release model context fix and branching feature#326
Conversation
model context size checking
add branching
Pull request overview
This PR introduces a comprehensive conversation branching feature and improves the model context/token management system. The branching feature allows users to fork conversations at any point, edit messages, regenerate responses, and manage multiple conversation paths. The context management improvements include better token budget calculation, content truncation for oversized inputs, and CJK character support.
Key Changes:
- Conversation Branching: New branch CRUD operations (create, list, activate, delete) with fork-and-swap semantics for edit/regenerate/delete operations
- Token Budget Management: Refactored token estimation with explicit budget validation, support for tool schemas, images, and CJK text
- Message Actions: Edit, regenerate, and delete operations that create branches and promote them to MAIN while preserving history
Reviewed changes
Copilot reviewed 21 out of 21 changed files in this pull request and generated 16 comments.
| File | Description |
|---|---|
| tests/automation/conversations-postman-scripts.json | Comprehensive test suite (19 tests) for branching feature covering create, list, activate, delete, edit, and regenerate operations |
| services/response-api/internal/domain/llm/message_trimmer.go | Simplified trimming logic to remove oldest items chronologically instead of role-based prioritization |
| services/llm-api/internal/utils/platformerrors/errors.go | Added IsValidationError helper function for error type checking |
| services/llm-api/internal/interfaces/httpserver/routes/v1/v1_route.go | Registered new BranchRoute in v1 API |
| services/llm-api/internal/interfaces/httpserver/routes/v1/conversation/conversation_route.go | Updated delete item documentation to reflect branch-based deletion |
| services/llm-api/internal/interfaces/httpserver/routes/v1/conversation/branch_route.go | New file implementing branch endpoints (list, create, get, delete, activate) and message actions (edit, regenerate) |
| services/llm-api/internal/interfaces/httpserver/routes/v1/chat/completion_route.go | Added logging and validation error handling for chat completions |
| services/llm-api/internal/interfaces/httpserver/routes/routes_provider.go | Added BranchHandler to wire dependency injection |
| services/llm-api/internal/interfaces/httpserver/responses/chat/chat.go | Added Trimmed boolean field to indicate if messages were trimmed |
| services/llm-api/internal/interfaces/httpserver/requests/conversation/conversation.go | Added Branch query parameter for filtering items by branch |
| services/llm-api/internal/interfaces/httpserver/handlers/conversationhandler/conversation_handler.go | Updated to support branch parameter in ListItems and changed DeleteItem to use branching |
| services/llm-api/internal/interfaces/httpserver/handlers/conversationhandler/branch_handler.go | New handler implementing branch operations with proper validation and response formatting |
| services/llm-api/internal/interfaces/httpserver/handlers/chathandler/message_trimmer.go | Major refactor with TokenBudget struct, tool/image token estimation, CJK support, and content truncation |
| services/llm-api/internal/interfaces/httpserver/handlers/chathandler/chat_handler.go | Integrated token budget validation, user input size validation, duplicate message detection for branches |
| services/llm-api/internal/infrastructure/database/repository/conversationrepo/conversation_repository.go | Implemented all branch repository operations including SwapBranchToMain for edit/regenerate semantics |
| services/llm-api/internal/domain/provider.go | Added MessageActionService to service provider |
| services/llm-api/internal/domain/conversation/message_action_service.go | New service implementing edit, regenerate, and delete operations with fork-and-swap semantics |
| services/llm-api/internal/domain/conversation/item.go | Added Branch field to ItemFilter for branch-aware queries |
| services/llm-api/internal/domain/conversation/conversation_service.go | Updated AddItemsToConversation to use branch-aware repository methods |
| services/llm-api/internal/domain/conversation/conversation.go | Added SwapBranchToMain to repository interface and CreateBranchMetadata helper |
| services/llm-api/cmd/server/wire_gen.go | Updated wire dependency injection for new handlers and services |
```go
if userItem != nil {
	skipUserItem := false

	// Get the last item in the branch to check for duplicates
	existingItems, err := h.conversationService.GetConversationItems(ctx, conv, branchName, nil)
	if err == nil && len(existingItems) > 0 {
		lastItem := existingItems[len(existingItems)-1]
		// If the last item is a user message, check if it has the same content
		if lastItem.Role != nil && *lastItem.Role == conversation.ItemRoleUser {
			// Compare content - if it's the same, skip adding
			if h.isSameMessageContent(userItem, &lastItem) {
				skipUserItem = true
			}
		}
	}
```
The duplicate detection logic using isSameMessageContent could fail to detect duplicates when the content format differs slightly (e.g., whitespace differences in JSON-formatted content, or when one message uses TextString and another uses Text.Text). The comparison only checks whether the normalized text matches, but extractTextFromContent tries multiple content field types that may not be semantically equivalent. Consider a more robust comparison that accounts for all content variations, or log when duplicate detection fails.
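One way to make the comparison more tolerant of superficial differences is to normalize whitespace before comparing the extracted text. A minimal sketch (the function names here are illustrative, not the PR's actual helpers — the real isSameMessageContent would first extract text from each content variant before normalizing):

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeMessageText collapses all runs of whitespace so that
// superficially different encodings of the same text compare equal.
func normalizeMessageText(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// sameNormalizedContent reports whether two extracted text payloads
// match after whitespace normalization.
func sameNormalizedContent(a, b string) bool {
	return normalizeMessageText(a) == normalizeMessageText(b)
}

func main() {
	fmt.Println(sameNormalizedContent("hello  world", " hello world\n")) // prints true
}
```

This addresses the whitespace case; the TextString vs Text.Text mismatch still needs extraction logic that maps every content variant to the same canonical string before this comparison runs.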
```go
// Rename the promoted branch metadata to MAIN
_, err = q.ConversationBranch.WithContext(ctx).
	Where(q.ConversationBranch.ConversationID.Eq(conversationID)).
	Where(q.ConversationBranch.Name.Eq(branchToPromote)).
	Update(q.ConversationBranch.Name, "MAIN")
if err != nil {
	return "", platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to rename branch to MAIN")
}

// Update all items in promoted branch to use MAIN
_, err = q.ConversationItem.WithContext(ctx).
	Where(q.ConversationItem.ConversationID.Eq(conversationID)).
	Where(q.ConversationItem.Branch.Eq(branchToPromote)).
	Update(q.ConversationItem.Branch, "MAIN")
if err != nil {
	return "", platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to update promoted items to MAIN")
}

// Set MAIN as active branch
_, err = q.Conversation.WithContext(ctx).
	Where(q.Conversation.ID.Eq(conversationID)).
	Update(q.Conversation.ActiveBranch, "MAIN")
if err != nil {
	return "", platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to set active branch to MAIN")
```
The SwapBranchToMain function performs multiple database operations (renaming branches, updating items) without transaction management. If any of these operations fail partway through, the database could be left in an inconsistent state where items point to the wrong branch or branch metadata is out of sync. This is a critical operation that should be wrapped in a transaction to ensure atomicity.
Suggested change (wrap the three updates in a single transaction):

```go
// Perform final branch swap updates in a single transaction to ensure atomicity
err = q.Transaction(func(tx *gormgen.Query) error {
	// Rename the promoted branch metadata to MAIN
	_, err := tx.ConversationBranch.WithContext(ctx).
		Where(tx.ConversationBranch.ConversationID.Eq(conversationID)).
		Where(tx.ConversationBranch.Name.Eq(branchToPromote)).
		Update(tx.ConversationBranch.Name, "MAIN")
	if err != nil {
		return platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to rename branch to MAIN")
	}
	// Update all items in promoted branch to use MAIN
	_, err = tx.ConversationItem.WithContext(ctx).
		Where(tx.ConversationItem.ConversationID.Eq(conversationID)).
		Where(tx.ConversationItem.Branch.Eq(branchToPromote)).
		Update(tx.ConversationItem.Branch, "MAIN")
	if err != nil {
		return platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to update promoted items to MAIN")
	}
	// Set MAIN as active branch
	_, err = tx.Conversation.WithContext(ctx).
		Where(tx.Conversation.ID.Eq(conversationID)).
		Update(tx.Conversation.ActiveBranch, "MAIN")
	if err != nil {
		return platformerrors.AsError(ctx, platformerrors.LayerRepository, err, "failed to set active branch to MAIN")
	}
	return nil
})
if err != nil {
	return "", err
```
```go
} else {
	// No MAIN items exist, no backup needed
	oldMainBackupName = ""
}
```
When no MAIN items exist, oldMainBackupName is set to an empty string, but the code later checks if oldMainBackupName is not empty before updating items. However, if there are MAIN items but the branch record doesn't exist and count returns 0 incorrectly (due to a race condition or database issue), the backup would be skipped silently. This could lead to data loss. Consider adding logging or validation to ensure the count matches expectations.
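A guard along these lines would surface the suspicious case instead of skipping the backup silently. This is a hypothetical sketch: chooseBackupName, mainItemCount, and branchRowCount stand in for the real repository code and its count queries, and the backup naming scheme is illustrative only.

```go
package main

import (
	"fmt"
	"log"
)

// chooseBackupName decides whether the old MAIN branch needs a backup name.
// mainItemCount is the number of items on MAIN; branchRowCount is the number
// of MAIN branch metadata rows (both would come from repository queries).
func chooseBackupName(conversationID string, mainItemCount, branchRowCount int) string {
	if mainItemCount == 0 {
		// No MAIN items exist, no backup needed -- but flag the suspicious
		// case where branch metadata says MAIN should have content.
		if branchRowCount > 0 {
			log.Printf("warn: MAIN metadata exists (%d rows) but no MAIN items for conversation %s; skipping backup",
				branchRowCount, conversationID)
		}
		return ""
	}
	// Illustrative naming scheme, not the PR's actual convention.
	return fmt.Sprintf("main-backup-%s", conversationID)
}

func main() {
	fmt.Println(chooseBackupName("conv-1", 0, 0)) // prints an empty line
}
```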
```go
existingItems, err := h.conversationService.GetConversationItems(ctx, conv, branchName, nil)
if err == nil && len(existingItems) > 0 {
	lastItem := existingItems[len(existingItems)-1]
```
The duplicate detection fetches all existing items from the branch using GetConversationItems with nil pagination. For conversations with thousands of items, this could be a significant performance overhead just to check if the last item is a duplicate. Consider optimizing by only fetching the last N items (e.g., last 5) or adding a dedicated repository method to get just the last item from a branch.
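The bounded-fetch idea can be sketched as a helper that asks the store for at most one row instead of the whole branch. Everything here is illustrative — Item, lastItemInBranch, and the fetchDesc callback are stand-ins; the real fix would push an ORDER BY ... DESC LIMIT 1 into a dedicated repository method:

```go
package main

import "fmt"

// Item is a minimal stand-in for the conversation item type.
type Item struct {
	Role, Text string
}

// lastItemInBranch fetches only the most recent item via a limit-aware
// fetch function, instead of loading every item in the branch.
// fetchDesc is a hypothetical repository callback returning items
// newest-first, capped at the given limit.
func lastItemInBranch(fetchDesc func(limit int) []Item) (Item, bool) {
	items := fetchDesc(1) // ask the store for at most one row
	if len(items) == 0 {
		return Item{}, false
	}
	return items[0], true
}

func main() {
	history := []Item{{"user", "hi"}, {"assistant", "hello"}, {"user", "bye"}}
	fetchDesc := func(limit int) []Item {
		// Simulate ORDER BY created_at DESC LIMIT n over the in-memory slice.
		var out []Item
		for i := len(history) - 1; i >= 0 && len(out) < limit; i-- {
			out = append(out, history[i])
		}
		return out
	}
	if last, ok := lastItemInBranch(fetchDesc); ok {
		fmt.Println(last.Text) // prints bye
	}
}
```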
```diff
 const (
 	// DefaultContextLength is used when model context length is unknown.
-	DefaultContextLength = 128000 // 128k tokens as fallback
+	DefaultContextLength = 220000 // 220k tokens as fallback
```
The DefaultContextLength has been increased from 128000 to 220000 tokens. This is a significant increase that could cause issues if models with smaller context windows are used. If a model has a context window of 128k tokens but the code uses 220k as the default, it will cause token budget validation failures or trimming issues. Consider whether this default should be model-specific or if there's a safer fallback value.
Suggested change:

```diff
-	DefaultContextLength = 220000 // 220k tokens as fallback
+	DefaultContextLength = 128000 // 128k tokens as a safer conservative fallback
```
```go
// TrimMessagesToFitContext removes oldest conversation items to fit within the context length limit.
// Removes oldest non-system messages first, regardless of role (user, assistant, tool).
// Never removes: system prompts at index 0
```
The comment states "Removes oldest conversation items first, regardless of role (user, assistant, tool)" but then the code at line 95 explicitly skips system messages with a continue statement. This is inconsistent - the comment should clarify that system messages are never removed, not just "system prompts at index 0".
```go
// Calculate response reserve
if b.MaxCompletionTokens > 0 {
	b.ResponseReserve = b.MaxCompletionTokens
} else {
	b.ResponseReserve = int(float64(b.ContextLength) * (1 - SafetyMarginRatio))
}

// Calculate available space for messages
b.AvailableForMessages = b.ContextLength - b.ToolsTokens - b.ResponseReserve - b.FixedOverhead

// Hard floor check: if available space is too small, return error
if b.AvailableForMessages < MinMessagesTokenFloor {
	return fmt.Errorf(
		"token budget exhausted: context=%d, tools=%d, response_reserve=%d, overhead=%d → only %d tokens available (minimum required: %d). Reduce max_tokens, use fewer tools, or choose a model with larger context",
		b.ContextLength, b.ToolsTokens, b.ResponseReserve, b.FixedOverhead,
		b.AvailableForMessages, MinMessagesTokenFloor,
	)
}
```
The token budget validation can fail with a descriptive error message, but the function proceeds to calculate AvailableForMessages which may be negative. When AvailableForMessages is less than MinMessagesTokenFloor, the error message is constructed and returned, but the struct fields have already been modified. If the budget validation fails and the error is ignored upstream, negative values could propagate. Consider returning early before modifying the struct fields if the budget is invalid.
Suggested change (compute into locals and only commit to the struct after validation passes):

```go
// Calculate response reserve using a local variable to avoid mutating
// the struct until we've confirmed the budget is valid.
var responseReserve int
if b.MaxCompletionTokens > 0 {
	responseReserve = b.MaxCompletionTokens
} else {
	responseReserve = int(float64(b.ContextLength) * (1 - SafetyMarginRatio))
}

// Calculate available space for messages (also using a local variable).
availableForMessages := b.ContextLength - b.ToolsTokens - responseReserve - b.FixedOverhead

// Hard floor check: if available space is too small, return error
if availableForMessages < MinMessagesTokenFloor {
	return fmt.Errorf(
		"token budget exhausted: context=%d, tools=%d, response_reserve=%d, overhead=%d → only %d tokens available (minimum required: %d). Reduce max_tokens, use fewer tools, or choose a model with larger context",
		b.ContextLength, b.ToolsTokens, responseReserve, b.FixedOverhead,
		availableForMessages, MinMessagesTokenFloor,
	)
}

// Only commit computed values to the struct once validation passes.
b.ResponseReserve = responseReserve
b.AvailableForMessages = availableForMessages
```
```go
// ValidateUserInputSize checks if the last user message (current input) exceeds MaxUserContentTokens.
// Returns an error if the user input is too large, preventing the request from proceeding.
// This only validates the LAST user message (current input), not historical messages.
func ValidateUserInputSize(messages []openai.ChatCompletionMessage) error {
```
The comment states "This only validates the LAST user message (current input), not historical messages" which correctly describes the intent. However, the function iterates backwards through ALL messages looking for the last user message. If there are multiple user messages in the conversation history, this could be confusing. Consider adding a comment explaining why we iterate backwards instead of just checking the last message in the array.
```go
// truncateTextPreservingJSON truncates text content while trying to preserve JSON structure.
// If content is JSON-stringified MultiContent, it parses and truncates the nested text fields.
func truncateTextPreservingJSON(content string, maxTokens int) (string, bool) {
	maxChars := maxTokens * TokenEstimateRatio
	trimmed := strings.TrimSpace(content)

	// Check if it looks like a JSON array (MultiContent format)
	if len(trimmed) > 0 && trimmed[0] == '[' {
		var parts []map[string]interface{}
		if err := json.Unmarshal([]byte(content), &parts); err == nil {
			// Successfully parsed as JSON array - truncate nested text fields
			modified := false
			for i := range parts {
				if textVal, ok := parts[i]["text"]; ok {
					if textStr, ok := textVal.(string); ok {
						textTokens := estimateTokenCount(textStr)
						if textTokens > maxTokens {
							textRunes := []rune(textStr)
							if len(textRunes) > maxChars {
								parts[i]["text"] = string(textRunes[:maxChars]) + "\n\n[Content truncated]"
								modified = true
							}
						}
					}
				}
			}
			if modified {
				if newContent, err := json.Marshal(parts); err == nil {
					return string(newContent), true
				}
			}
			return content, false
		}
	}

	// Check if it looks like a JSON object with nested content
	if len(trimmed) > 0 && trimmed[0] == '{' {
		var obj map[string]interface{}
		if err := json.Unmarshal([]byte(content), &obj); err == nil {
			modified := false
			// Truncate common large text fields
			for _, key := range []string{"text", "content", "markdown", "raw_text", "body"} {
				if textVal, ok := obj[key]; ok {
					if textStr, ok := textVal.(string); ok {
						textTokens := estimateTokenCount(textStr)
						if textTokens > maxTokens {
							textRunes := []rune(textStr)
							if len(textRunes) > maxChars {
								obj[key] = string(textRunes[:maxChars]) + "\n\n[Content truncated]"
								modified = true
							}
						}
					}
				}
			}
			if modified {
				if newContent, err := json.Marshal(obj); err == nil {
					return string(newContent), true
				}
			}
			return content, false
		}
	}

	// Plain text - simple truncation
	runes := []rune(content)
	if len(runes) > maxChars {
		return string(runes[:maxChars]) + "\n\n[Content truncated - exceeded " + strconv.Itoa(maxTokens) + " token limit]", true
	}
	return content, false
}
```
The truncateTextPreservingJSON function has deeply nested conditional logic with multiple JSON parsing attempts. This creates high cyclomatic complexity and makes the function hard to test and maintain. Consider refactoring into separate functions for each truncation strategy: truncateJSONArray, truncateJSONObject, truncatePlainText. This would improve readability and allow for easier unit testing of each strategy.
```go
// isImageType checks if a map represents an image content type.
func isImageType(item map[string]any) bool {
	for _, key := range []string{"type", "kind", "contentType"} {
		if val, ok := item[key].(string); ok {
			if val == "image" || strings.HasPrefix(val, "image/") {
				return true
			}
		}
	}
	return false
}

// hasImageDataURL checks if a map contains an image data URL.
func hasImageDataURL(item map[string]any) bool {
	for _, key := range []string{"data", "url", "src", "imageUrl"} {
		if val, ok := item[key].(string); ok {
			if strings.HasPrefix(val, "data:image/") {
				return true
			}
		}
	}
	return false
}

// countDataURLImages counts data URL images in non-JSON content (fallback).
func countDataURLImages(content string) int {
	count := 0
	count += strings.Count(content, "data:image/png;base64")
	count += strings.Count(content, "data:image/jpeg;base64")
	count += strings.Count(content, "data:image/webp;base64")
	count += strings.Count(content, "data:image/gif;base64")
	return count
```
The image detection logic in countImagesInToolResult tries multiple approaches (JSON array parsing, fallback string counting) which is good, but the code is complex and deeply nested. Consider extracting the JSON parsing logic and string counting into separate helper functions to improve readability and testability. This would also make it easier to add new image format detection in the future.
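For the fallback string counter specifically, a table-driven variant makes adding a new image format a one-line change. A sketch, assuming the same four prefixes as the reviewed code:

```go
package main

import (
	"fmt"
	"strings"
)

// imageDataURLPrefixes centralizes the formats counted by the fallback
// scanner; extending support means appending one entry here.
var imageDataURLPrefixes = []string{
	"data:image/png;base64",
	"data:image/jpeg;base64",
	"data:image/webp;base64",
	"data:image/gif;base64",
}

// countDataURLImages counts data URL images in non-JSON content (fallback).
func countDataURLImages(content string) int {
	count := 0
	for _, p := range imageDataURLPrefixes {
		count += strings.Count(content, p)
	}
	return count
}

func main() {
	fmt.Println(countDataURLImages("data:image/png;base64,AAA data:image/gif;base64,BBB")) // prints 2
}
```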