…and benchmark applications built on the FastGPT platform. (labring#5476)

- Adds a lightweight evaluation framework for app-level tracking and benchmarking.
- Changes: 28 files, +1455 additions, -66 deletions.
- Branch: add-evaluations -> main.
- PR: chanzhi82020#1

Applications built on FastGPT need repeatable, comparable benchmarks to measure regressions, track improvements, and validate releases. This initial implementation provides the primitives to define evaluation scenarios, run them against app endpoints or model components, and persist results for later analysis.

I updated the PR description to emphasize that the evaluation system is targeted at FastGPT-built apps and expanded the explanation of the core pieces so reviewers understand the scope and intended use. The new description outlines the feature intent, core components, and how results are captured and aggregated for benchmarking.

- Evaluation definitions
  - Define evaluation tasks that reference an app (app id, version, endpoint), test datasets or input cases, expected outputs (when applicable), and run configuration (parallelism, timeouts).
  - Support for custom metric plugins so teams can add domain-specific measures.
- Runner / Executor
  - Executes evaluation cases against app endpoints or internal model interfaces.
  - Captures raw responses, response times, status codes, and any runtime errors.
  - Computes per-case metrics (e.g., correctness, latency) immediately after each case run.
- Metrics & Aggregation
  - Built-in metrics: accuracy/success rate, latency (p50/p90/p99), throughput, error rate.
  - Aggregation produces per-run summaries and per-app historical summaries for trend analysis.
  - Allows combining metrics into composite scores for high-level benchmarking.
- Persistence & Logging
  - Stores run results, input/output pairs (when needed), timestamps, environment info, and app/version metadata so runs are reproducible and auditable.
  - Logs are retained to facilitate debugging and root-cause analysis of regressions.
- Reporting & Comparison
  - Produces aggregated reports suitable for CI gating, release notes, or dashboards.
  - Supports comparing multiple app versions or deployments side-by-side.
- Extensibility & Integration
  - Designed to plug into CI (automated runs on PRs or releases), dashboards, and downstream analysis tools.
  - Easy to add new metrics, evaluators, or dataset connectors.

By centering the evaluation system on FastGPT apps, teams can benchmark full application behavior (not only raw model outputs), correlate metrics with deployment configurations, and make informed release decisions.

Planned follow-ups:
- Expand the built-in metric suite (e.g., F1, BLEU/ROUGE where applicable), add dataset connectors, and provide example evaluation scenarios for sample apps.
- Integrate with CI pipelines and add basic dashboarding for trend visualization.

Related Issue: N/A
Co-authored-by: Archer <545436317@qq.com>
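The aggregation described above (success rate plus p50/p90/p99 latency) can be sketched in a few lines. This is an illustrative TypeScript sketch using the nearest-rank percentile method; the function and type names are hypothetical, not the actual FastGPT implementation:

```typescript
// Hypothetical sketch of per-run latency aggregation (all names are illustrative).
export function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  // Nearest-rank method: smallest index covering p percent of the samples.
  const idx = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.min(sorted.length - 1, Math.max(0, idx))];
}

export interface RunSummary {
  successRate: number;
  p50: number;
  p90: number;
  p99: number;
}

// Summarize one run from per-case latencies and a count of failed cases.
export function summarize(latenciesMs: number[], errors: number): RunSummary {
  const total = latenciesMs.length + errors;
  return {
    successRate: total === 0 ? 0 : latenciesMs.length / total,
    p50: percentile(latenciesMs, 50),
    p90: percentile(latenciesMs, 90),
    p99: percentile(latenciesMs, 99),
  };
}
```

Historical per-app summaries would then just be a time series of `RunSummary` records keyed by app id and version.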
…eval-dataset-dev' to 'eval-dev')

feat: Add comprehensive evaluation dataset management system

Summary

This PR introduces a comprehensive evaluation dataset management system that enables users to create, manage, and process evaluation datasets, with advanced features including smart generation and quality assessment.

Key Features Added:
- Dataset Collection Management: Complete CRUD operations for evaluation dataset collections with team/user isolation
- Dataset Data Management: Complete CRUD operations for evaluation dataset data with team/user isolation
- Smart Data Generation: AI-powered synthesis of evaluation data using background job processing
- Quality Assessment: Batch quality evaluation jobs with configurable assessment criteria
- Task Management: Queue-based processing system for long-running dataset operations

Technical Implementation:
- Database Schemas: New MongoDB schemas for dataset collections and data entries with proper indexing
- API Endpoints: RESTful APIs for all dataset operations with proper error handling and validation
- Background Processing: BullMQ integration for handling intensive data processing tasks
- Type Safety: Comprehensive TypeScript definitions and enums for improved data integrity
- Team Integration: Full support for team-based access control and data isolation

New API Endpoints:

Dataset Collection Management
- POST /api/core/evaluation/dataset/collection/create - Create evaluation dataset collection
- PUT /api/core/evaluation/dataset/collection/update - Update dataset collection
- POST /api/core/evaluation/dataset/collection/list - List dataset collections with pagination
- POST /api/core/evaluation/dataset/collection/failedTasks - Get failed processing tasks
- POST /api/core/evaluation/dataset/collection/retryTask - Retry failed processing task
- POST /api/core/evaluation/dataset/collection/deleteTask - Delete processing task

Dataset Data Management
- POST /api/core/evaluation/dataset/data/create - Create individual dataset data entry
- PUT /api/core/evaluation/dataset/data/update - Update dataset data entry
- POST /api/core/evaluation/dataset/data/list - List dataset data with pagination
- DELETE /api/core/evaluation/dataset/data/delete - Delete dataset data entry

Data Import & Processing
- POST /api/common/file/upload - Upload CSV files for dataset import
- POST /api/core/evaluation/dataset/data/fileId - Import dataset data from uploaded file

Quality Assessment
- POST /api/core/evaluation/dataset/data/qualityAssessment - Single data entry quality assessment
- POST /api/core/evaluation/dataset/collection/qualityAssessmentBatch - Batch quality assessment for entire collection

Smart Generation
- POST /api/core/evaluation/dataset/data/smartGenerate - AI-powered smart generation of evaluation data from dataset

TODO
- [x] Define a dedicated GridFS bucket for evaluation file uploads
- [ ] Complete authentication and authorization
- [ ] Quota limit checks
- [ ] Billing statistics
- [ ] Audit logs
- [ ] Error codes for exceptions
- [ ] Joint debugging with the Diting service, covering quality assessment and data synthesis
- [x] API unit tests and integration tests
- [ ] Confirm and expose concurrency configuration for the three queues

Design document: https://xcnxw5z29dc5.feishu.cn/base/BPIRbF5bsakhqIstp9ecp7K7ntc?table=ldx3tltRDjoWp0du

Run unit tests: pnpm exec vitest run test/cases/pages/api/core/evaluation/dataset/

@82020 @31202 @94619

View merge request AI-PaaS/FastGPT!22
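Several of the list endpoints above accept pagination parameters. As a minimal sketch of what the request/response shapes might look like (the field names `pageNum`, `pageSize`, `total`, and `list` are assumptions, not the documented FastGPT contract):

```typescript
// Hypothetical pagination shapes for the list endpoints above.
export interface PaginationBody {
  pageNum: number; // 1-based page index (assumed convention)
  pageSize: number;
}

export interface PaginatedList<T> {
  total: number; // total matching items across all pages
  list: T[];     // items on the requested page
}

// Slice an in-memory array the way a list endpoint would page a query result.
export function paginate<T>(items: T[], body: PaginationBody): PaginatedList<T> {
  const start = (body.pageNum - 1) * body.pageSize;
  return { total: items.length, list: items.slice(start, start + body.pageSize) };
}
```

In the real implementation the slicing would happen in the MongoDB query (skip/limit or an aggregation stage) rather than in memory.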
…tures ('14864/evaluation-container' to 'eval-dev')
feat: Add evaluation task, dataset and dimension management features
- Add evaluation dataset management page and detail page
- Add evaluation dimension management page
- Refactor evaluation homepage to tab layout, supporting task/dataset/dimension switching
- Add basic structure for evaluation task detail page
- Update navigation bar route configuration to support new page routes
View merge request AI-PaaS/FastGPT!26
…translations' to 'eval-dev')

[feat] Updated some English translations

View merge request AI-PaaS/FastGPT!32
…ze selectors and form validation ('14864/add-dimension' to 'eval-dev')
feat: Add evaluation dimension management functionality, optimize selectors and form validation
- Merge multilingual files for evaluation dimensions and datasets
- Add evaluation dimension creation and editing pages
- Implement evaluation dimension form validation and submission logic
- Add evaluation dimension trial run functionality
- Optimize resource selector component, support hiding root directory and avatar display control
- Add reference template component, provide standard evaluation templates
- Implement answer input component, support collapse and automatic height adjustment
- Add application selector component, support displaying all application options
View merge request AI-PaaS/FastGPT!30
…l-dev')

[feat] Add evaluation plugin: diting

New Addition: Diting - Evaluation and Data Generation

Diting Core: Evaluation and Data Generation Engine
- Implemented the evaluation engine, supporting multiple evaluation metrics for application performance assessment.
- Integrated the data generation engine to meet the needs for synthetic datasets and application evaluations.

Diting Server: API Server for Evaluation and Data Generation

Added New API Endpoints
- POST /api/v1/evaluations/runs - API interface for application evaluations.
- POST /api/v1/dataset-synthesis/runs - API interface for data synthesis.

@82020 @64078 @10037

View merge request AI-PaaS/FastGPT!25
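The PR does not show the request schema for these Diting endpoints. Purely as an illustration of what a run-request payload and its validation might look like (every field name below is an assumption, not the actual Diting API):

```typescript
// Hypothetical payload for POST /api/v1/evaluations/runs; field names are guesses.
export interface EvaluationRunRequest {
  appId: string;
  datasetId: string;
  metrics: string[]; // e.g. ["accuracy", "relevance"]
}

// Return a list of human-readable problems; empty means the payload looks valid.
export function validateRunRequest(body: Partial<EvaluationRunRequest>): string[] {
  const errors: string[] = [];
  if (!body.appId) errors.push("appId is required");
  if (!body.datasetId) errors.push("datasetId is required");
  if (!body.metrics || body.metrics.length === 0) errors.push("metrics must be non-empty");
  return errors;
}
```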
…tion functionality ('14864/evaluation-components' to 'eval-dev')
feat: Add evaluation dataset file import and intelligent generation functionality
- Add file import component with drag-and-drop upload and template download support
- Implement intelligent dataset generation modal with knowledge base selection and generation parameter configuration
- Add evaluation task creation modal with evaluation dimension and parameter configuration support
- Improve multi-language translation with new interface text additions
- Optimize file upload logic with progress display and error handling support
- Add evaluation dimension management component with dimension model selection and configuration support
- Implement evaluation parameter configuration modal with threshold and weight settings support
View merge request AI-PaaS/FastGPT!37
…tor-evaluation-backend' to 'eval-dev')
feat: implement comprehensive backend evaluation system
Enhanced Evaluation System - Complete Architecture Refactor
This PR introduces a comprehensive evaluation framework designed specifically for tracking and benchmarking applications built on the FastGPT platform.
📋 Key Features Added
1. Multi-Component Architecture
- Evaluation Datasets: Structured data management with CSV/JSON support
- Evaluation Targets: Configurable workflow-based evaluation targets
- Evaluation Metrics: AI model-based evaluation with custom prompts
- Evaluation Tasks: Orchestrated evaluation execution with item tracking
2. Comprehensive Type System
- Enhanced API Types: 160+ new API interfaces for full CRUD operations
- Complex Schema Types: Dataset, Target, Metric, and Task type definitions
- Display Types: Optimized types for UI presentation
- Validation Types: Import/export and validation result handling
3. Advanced Queue Management
- Specialized Queues: evaluation_task and evaluation_item queues
- Parallel Processing: Concurrent evaluation item processing
- Error Handling: Comprehensive error status tracking
- Retry Logic: Built-in retry mechanisms for failed evaluations
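The retry logic named above can be illustrated with a small exponential-backoff helper. In practice BullMQ expresses this declaratively through the `attempts` and `backoff` job options, so this hand-rolled sketch is for intuition only:

```typescript
// Minimal sketch of retrying a failed evaluation item with exponential backoff.
// Attempt counts and delays are illustrative, not the PR's actual configuration.
export async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts) {
        // Wait base * 2^(attempt-1) ms before the next attempt.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw lastError;
}
```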
4. Enhanced Status Management
- Added Error State: New error status for failed evaluations
- Status Tracking: Comprehensive status progression
- Progress Monitoring: Real-time evaluation progress
5. Chat Integration
- Evaluation Source: New chat source type for evaluation contexts
- Logging Support: Dedicated evaluation chat logging
🛠 Technical Improvements
Service Layer Enhancements
- Resource Validation: Unified resource access validation
- Permission Management: Team-based resource permissions
- Pagination Support: Consistent pagination across all endpoints
- Error Handling: Standardized error responses
Database Schema Updates
- Complex Schemas: Multi-level nested schema definitions
- Indexing Strategy: Optimized database indexes
- Relationship Management: Proper MongoDB relationships
Queue System Improvements
- Worker Management: Enhanced worker error handling
- Job Orchestration: Sophisticated job dependency management
- Background Processing: Efficient background task execution
📊 Architecture Benefits
1. Scalability: Modular design supports large-scale evaluations
2. Flexibility: Configurable evaluation targets and metrics
3. Reliability: Robust error handling and retry mechanisms
4. Maintainability: Clean separation of concerns
5. Extensibility: Plugin-like architecture for custom metrics
🏗 Architecture Overview
flowchart TD
    A[Dataset Management] --> B[Evaluation Task]
    C[Metric Configuration] --> B
    D[Target Configuration] --> B
    B --> E[Queue System]
    E --> F[Parallel Processing]
    F --> G[Result Aggregation]
    G --> H[Status Updates]
📋 Evaluation Task Execution Flow
sequenceDiagram
    participant User
    participant API
    participant Queue
    participant Worker
    participant Database
    User->>API: Create Evaluation Task
    API->>Database: Validate Components
    API->>Queue: Submit Task
    Queue->>Worker: Process Items
    Worker->>Database: Store Results
    Worker->>API: Update Status
    API->>User: Return Results
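The execution flow above can be reduced to a toy worker sketch: a batch of evaluation items is processed concurrently, and per-item failures are recorded as an error status instead of failing the whole run. All types and names below are illustrative, not the PR's actual code:

```typescript
// Toy model of the "Process Items" step in the diagram above.
type ItemStatus = "queued" | "completed" | "error";

export interface EvalItem {
  id: string;
  status: ItemStatus;
  result?: number; // metric score once evaluated
}

// Evaluate all items concurrently; an item that throws is marked "error"
// rather than aborting its siblings, mirroring the error-state tracking above.
export async function processItems(
  items: EvalItem[],
  evaluate: (item: EvalItem) => Promise<number>,
): Promise<EvalItem[]> {
  await Promise.all(
    items.map(async (item) => {
      try {
        item.result = await evaluate(item);
        item.status = "completed";
      } catch {
        item.status = "error";
      }
    }),
  );
  return items;
}
```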
This represents a major architectural advancement from a basic evaluation system to a comprehensive, enterprise-grade evaluation platform with advanced features for dataset management, flexible metrics, and robust processing capabilities.
View merge request AI-PaaS/FastGPT!38
…gement components ('76887/0904/eidt-modal' to 'eval-dev')
feat(evaluation): add evaluation result editing and status management components
- Add EditDataModal component for editing evaluation questions and answers
- Implement ModifyEvaluationModal for modifying evaluation results
- Add evaluation status constants and mapping
- Support multiple evaluation statuses (HighQuality, NeedsImprovement, Abnormal)
- Add i18n translations for new evaluation features
- Include save, cancel and save+next functionality in edit modal
- Support manual evaluation result modification with reasons
- Add evaluation status badges and feedback display
View merge request AI-PaaS/FastGPT!41
[update] Optimize code

View merge request AI-PaaS/FastGPT!72
…r evaluation tasks
…rds for evaluation tasks ('eval-task-optimization-code-format' to 'eval-dev')
feat: add task.dataItem layer API, and optimise the code standards for evaluation tasks
View merge request AI-PaaS/FastGPT!74
… 'eval-dev')

feat: add summary error code and test case completed

- Evaluation summary error codes
- End-to-end API unit tests
- Audit logs for changes to the summary-generation configuration

View merge request AI-PaaS/FastGPT!69
…sting ('eval-refactor-front' to 'eval-dev')
feat: Add evaluation dataset data detail and enhance dataset listing
- Add getEvalDatasetDataDetail API types and response structure
- Implement authEvaluationDatasetDataReadById authorization function
- Create new API endpoint for fetching evaluation dataset data details
- Enhance dataset list API with optional dataCount for evaluation scene
- Remove obsolete evaluation dataset listDataset.ts file
View merge request AI-PaaS/FastGPT!75
…ascader' to 'eval-dev')

feat: refactor expected answer annotation modal

- Replace dataset and collection selection with new FilesCascader component
- Add multi-language support
- Optimize selection flow with unified cascading selector
- Add "skip knowledge base" option with helpful hints
- Improve UI layout and interaction with consistent modal design
- Add state management for input data modal visibility

View merge request AI-PaaS/FastGPT!76
…on datasets ('76887/0909/01' to 'eval-dev')
feat(evaluation): Implement complete functionality for evaluation datasets
- Implement intelligent generation of evaluation datasets
- Add APIs for creating, updating, and deleting evaluation datasets
- Complete the display and manipulation of evaluation dataset data lists
- Implement manual addition and intelligent generation of dataset data
- Add data quality assessment functionality for datasets
- Optimize interaction logic on the dataset details page
View merge request AI-PaaS/FastGPT!67
…k-optimization-validate' to 'eval-dev')

refactor: optimize parameter validation of eval-task

Refactor: Optimization of Evaluation Task Parameter Validation

Summary

Unified validation framework for the evaluation module with enhanced error handling and async support.

Key Changes
- New validation framework - Added Validatable base class and ValidationResult structure
- Async validation - createEvaluatorInstance and createTargetInstance now support Promise-based validation
- Enhanced error reporting - Detailed error codes, field names, and debug information
- Performance optimization - Optional validation flag for high-performance scenarios
- Improved type safety - Better TypeScript support throughout the validation chain

Breaking Changes
- Function signatures changed from sync to async for evaluator/target creation

Files Modified
- packages/global/core/evaluation/validate.ts - NEW validation framework
- packages/service/core/evaluation/evaluator/index.ts - Async evaluator validation
- packages/service/core/evaluation/target/index.ts - Enhanced target validation
- packages/service/core/evaluation/utils/index.ts - Improved parameter validation
- packages/service/core/evaluation/task/processor.ts - Updated instance creation calls

@31202 @64078 @94619

View merge request AI-PaaS/FastGPT!77
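Only the names `Validatable` and `ValidationResult` come from the PR; their shapes below are assumptions, sketched to illustrate the Promise-based validation pattern the PR describes:

```typescript
// Assumed shape of the validation result: a flag plus structured errors
// carrying the error code and field name mentioned in the PR description.
export interface ValidationResult {
  valid: boolean;
  errors: { code: string; field: string; message: string }[];
}

// Assumed base class; validate() is async to match the PR's switch to
// Promise-based createEvaluatorInstance/createTargetInstance.
export abstract class Validatable {
  abstract validate(): Promise<ValidationResult>;
}

// Hypothetical concrete target config used purely for illustration.
export class TargetConfig extends Validatable {
  constructor(private appId?: string) {
    super();
  }
  async validate(): Promise<ValidationResult> {
    const errors: ValidationResult["errors"] = [];
    if (!this.appId) {
      errors.push({ code: "EVAL_TARGET_APP_REQUIRED", field: "appId", message: "appId is required" });
    }
    return { valid: errors.length === 0, errors };
  }
}
```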
…system ('eval-refact-errorcode' to 'eval-dev')
refactor: Standardize evaluation error handling and validation system
- Reorganize and expand evaluation error codes with better naming conventions
- Add comprehensive validation constants for name, description, model fields
- Standardize error enum usage across dataset collection and data APIs
- Add detailed field validation with proper length limits and type checking
- Update i18n translations for Chinese, English, and Traditional Chinese
- Enhance API error responses with consistent error enum references
- Update test cases to align with new error handling patterns
View merge request AI-PaaS/FastGPT!83
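As an illustration of the field validation with length limits described above (the actual limits and error enums in the PR are not shown here, so the constants and names below are assumptions):

```typescript
// Assumed validation constants; the PR's real limits may differ.
export const NAME_MAX_LENGTH = 100;
export const DESC_MAX_LENGTH = 500;

// Return an error message for an invalid name, or null when it passes.
export function validateName(name: string): string | null {
  if (name.trim().length === 0) return "name cannot be empty";
  if (name.length > NAME_MAX_LENGTH) return `name exceeds ${NAME_MAX_LENGTH} characters`;
  return null;
}
```

In the PR itself these messages would map to the standardized error enums and their i18n translations rather than raw strings.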
…' to 'eval-dev')

feat: enhance task creation and management

- Replace mock data with real API calls in task creation modal
- Optimize dimension management with scroll pagination and model selection
- Add multilingual prompts and error handling
- Update task list page with real API integration and status optimization
- Add default model selection utility function
- Improve type definitions and remove unused fields

View merge request AI-PaaS/FastGPT!79
…dels' to 'eval-dev')

feat: add evaluation model configuration options

- Add evaluation model selector to account model configuration table
- Add multi-language translation support
- Evaluation models will be used for app evaluation and data quality assessment scenarios
- Integrate AIModelSelector component and filter available model list

View merge request AI-PaaS/FastGPT!80
… format ('14864/test-run-score' to 'eval-dev')
feat: adjust test run result score display to full score of 100 format
View merge request AI-PaaS/FastGPT!81
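Converting a 0-1 metric score to the full-score-of-100 display format described above might look like this; the clamping and rounding choices are assumptions, not necessarily what the PR implements:

```typescript
// Sketch of mapping a raw 0-1 score onto a 0-100 display scale.
export function toDisplayScore(raw: number): number {
  // Clamp to [0, 1] first so out-of-range values cannot display above 100 or below 0.
  const clamped = Math.min(1, Math.max(0, raw));
  return Math.round(clamped * 100);
}
```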
…87/0909/01' to 'eval-dev')

feat(evaluation): Enhance the dataset file import feature

- Added navigation from the dataset details page to the file import page, passing the collectionId.
- Refactored the file import page to support two modes:
  - Create a new dataset and import files
  - Append files to an existing dataset
- Optimized the file import form layout and interaction:
  - Dynamically show/hide the dataset name input box based on the mode
  - Improved the file selector UI and error handling
  - Added more detailed template file content
- Implemented file import API call logic:
  - Supports serial import of multiple files
  - Provides detailed error feedback
  - Automatically redirects back to the previous page
- Optimized the automatic evaluation feature:
  - Display the model selector only when enabled
  - Updated prompt text and layout

View merge request AI-PaaS/FastGPT!78
…'eval-dataset-dev' to 'eval-dev')

feat: add collection detail API and refactor shared utilities

- Add getEvalDatasetCollectionDetailQuery/Response types for collection details
- Add retryAllTaskBody/Response types for batch retry functionality
- Create shared utils module to eliminate code duplication:
  - Extract getCollectionStatus function
  - Add buildCollectionAggregationPipeline for standardized queries
  - Add formatCollectionBase for consistent response formatting
- Enhance failedTasks endpoint to include datasetId mapping
- Refactor list.ts to use shared utilities

View merge request AI-PaaS/FastGPT!86