From b782de2d3dda4211a756e5498f37351c59a03635 Mon Sep 17 00:00:00 2001 From: SJTUyh Date: Tue, 31 Mar 2026 09:10:04 +0800 Subject: [PATCH] change base url of readthedocs --- .github/ISSUE_TEMPLATE/1_bug.yml | 4 +- .github/ISSUE_TEMPLATE/2_consult.yml | 4 +- .github/ISSUE_TEMPLATE/3_doc.yml | 2 +- .github/ISSUE_TEMPLATE/5_bug_en.yml | 4 +- .github/ISSUE_TEMPLATE/6_consult_en.yml | 4 +- .github/ISSUE_TEMPLATE/7_doc_en.yml | 2 +- .github/ISSUE_TEMPLATE/config.yml | 4 +- .github/workflows/issue_auto_reply_cn.yml | 4 +- .github/workflows/issue_auto_reply_en.yml | 4 +- README.md | 42 +++++++++---------- README_en.md | 42 +++++++++---------- .../benchmark/utils/logging/error_codes.py | 2 +- .../judge_model_evaluate.md | 2 +- docs/source_en/faqs/error_codes.md | 2 +- .../judge_model_evaluate.md | 2 +- docs/source_zh_cn/faqs/error_codes.md | 2 +- tests/UT/utils/logging/test_error_codes.py | 6 +-- 17 files changed, 66 insertions(+), 66 deletions(-) diff --git a/.github/ISSUE_TEMPLATE/1_bug.yml b/.github/ISSUE_TEMPLATE/1_bug.yml index c2a04341..342a5ed0 100644 --- a/.github/ISSUE_TEMPLATE/1_bug.yml +++ b/.github/ISSUE_TEMPLATE/1_bug.yml @@ -9,10 +9,10 @@ body: value: | ## 👉 遇到问题先看这里 ### 🌟 第一次使用工具遇到问题? - 按[🚀 快速入门](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/get_started/quick_start.html)走一遍能解决90%的工具基本使用问题! + 按[🚀 快速入门](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/get_started/quick_start.html)走一遍能解决90%的工具基本使用问题! 
### 🧭 尝试检索FAQ查看共性问题解决方法 - 检索[📑 FAQ](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/faqs/faq.html),目前FAQ可以解决0%的共性问题 + 检索[📑 FAQ](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/faqs/faq.html),目前FAQ可以解决0%的共性问题 ### ❓ 搜索历史issue,查看冷门的同类问题 在[🔖 Issue](https://github.com/AISBench/benchmark/issues)中搜索历史类似问题 diff --git a/.github/ISSUE_TEMPLATE/2_consult.yml b/.github/ISSUE_TEMPLATE/2_consult.yml index f716c0be..46ab4513 100644 --- a/.github/ISSUE_TEMPLATE/2_consult.yml +++ b/.github/ISSUE_TEMPLATE/2_consult.yml @@ -8,10 +8,10 @@ body: value: | ## 👉 遇到问题先看这里 ### 🌟 第一次使用工具遇到问题? - 按[🚀 快速入门](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/get_started/quick_start.html)走一遍能解决90%的工具基本使用问题! + 按[🚀 快速入门](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/get_started/quick_start.html)走一遍能解决90%的工具基本使用问题! ### 🧭 尝试检索FAQ查看共性问题解决方法 - 检索[📑 FAQ](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/faqs/faq.html),目前FAQ可以解决0%的共性问题 + 检索[📑 FAQ](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/faqs/faq.html),目前FAQ可以解决0%的共性问题 ### ❓ 搜索历史issue,查看冷门的同类问题 在[🔖 Issue](https://github.com/AISBench/benchmark/issues)中搜索历史类似问题 diff --git a/.github/ISSUE_TEMPLATE/3_doc.yml b/.github/ISSUE_TEMPLATE/3_doc.yml index b8520015..f6285b80 100644 --- a/.github/ISSUE_TEMPLATE/3_doc.yml +++ b/.github/ISSUE_TEMPLATE/3_doc.yml @@ -7,7 +7,7 @@ body: id: location attributes: label: 文档位置(可指定多个文档链接) - placeholder: 例:https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/get_started/install.html + placeholder: 例:https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/get_started/install.html validations: required: true - type: textarea diff --git a/.github/ISSUE_TEMPLATE/5_bug_en.yml b/.github/ISSUE_TEMPLATE/5_bug_en.yml index c687578b..5d931616 100644 --- a/.github/ISSUE_TEMPLATE/5_bug_en.yml +++ b/.github/ISSUE_TEMPLATE/5_bug_en.yml @@ -10,10 +10,10 @@ body: value: | ## 👉 Check Here First If You Encounter Issues ### 🌟 First Time Using the Tool and Facing Problems? 
- Following the [🚀 Quick Start](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/get_started/quick_start.html) guide can resolve 90% of basic tool usage issues! + Following the [🚀 Quick Start](https://ais-bench-benchmark.readthedocs.io/en/latest/get_started/quick_start.html) guide can resolve 90% of basic tool usage issues! ### 🧭 Try Searching the FAQ for Solutions to Common Issues - Search the [📑 FAQ](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/faqs/faq.html) — currently, the FAQ can resolve 0% of common issues. + Search the [📑 FAQ](https://ais-bench-benchmark.readthedocs.io/en/latest/faqs/faq.html) — currently, the FAQ can resolve 0% of common issues. ### ❓ Search Historical Issues for Rare Similar Problems Search for similar historical issues in [🔖 Issues](https://github.com/AISBench/benchmark/issues) diff --git a/.github/ISSUE_TEMPLATE/6_consult_en.yml b/.github/ISSUE_TEMPLATE/6_consult_en.yml index 411a78d5..8b82de05 100644 --- a/.github/ISSUE_TEMPLATE/6_consult_en.yml +++ b/.github/ISSUE_TEMPLATE/6_consult_en.yml @@ -8,10 +8,10 @@ body: value: | ## 👉 Check Here First If You Have Questions ### 🌟 First Time Using the Tool and Have Questions? - Following the [🚀 Quick Start](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/get_started/quick_start.html) guide can resolve 90% of basic tool usage issues! + Following the [🚀 Quick Start](https://ais-bench-benchmark.readthedocs.io/en/latest/get_started/quick_start.html) guide can resolve 90% of basic tool usage issues! ### 🧭 Try Searching the FAQ for Solutions to Common Issues - Search the [📑 FAQ](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/faqs/faq.html) — currently, the FAQ can resolve 0% of common issues. + Search the [📑 FAQ](https://ais-bench-benchmark.readthedocs.io/en/latest/faqs/faq.html) — currently, the FAQ can resolve 0% of common issues. 
### ❓ Search Historical Issues for Rare Similar Problems Search for similar historical issues in [🔖 Issues](https://github.com/AISBench/benchmark/issues) diff --git a/.github/ISSUE_TEMPLATE/7_doc_en.yml b/.github/ISSUE_TEMPLATE/7_doc_en.yml index 99a51c97..d7897407 100644 --- a/.github/ISSUE_TEMPLATE/7_doc_en.yml +++ b/.github/ISSUE_TEMPLATE/7_doc_en.yml @@ -7,7 +7,7 @@ body: id: location attributes: label: Documentation Location (Multiple document links can be specified) - placeholder: E.g., https://ais-bench-benchmark-rf.readthedocs.io/en/latest/get_started/install.html + placeholder: E.g., https://ais-bench-benchmark.readthedocs.io/en/latest/get_started/install.html validations: required: true - type: textarea diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml index 3fcbd6a2..6fb72c70 100644 --- a/.github/ISSUE_TEMPLATE/config.yml +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -2,8 +2,8 @@ blank_issues_enabled: false # 禁止空白 Issue,强制使用模板 contact_links: - name: 📚 官方文档 - url: https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/ + url: https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/ about: 遇到问题请在官方文档中搜索。 - name: 📚 Documentation - url: https://ais-bench-benchmark-rf.readthedocs.io/en/latest/ + url: https://ais-bench-benchmark.readthedocs.io/en/latest/ about: Check if your question is answered in the documentation. diff --git a/.github/workflows/issue_auto_reply_cn.yml b/.github/workflows/issue_auto_reply_cn.yml index 1419ee9e..ae3b99a9 100644 --- a/.github/workflows/issue_auto_reply_cn.yml +++ b/.github/workflows/issue_auto_reply_cn.yml @@ -36,12 +36,12 @@ jobs: bug: `🌟欢迎给AISBench benchmark测评工具提issue,您可以先尝试以下途径寻求issue的解决方法: 1. 【强烈推荐❤️‍🔥】确保issue描述完整后,可以试着将issue交给[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/AISBench/benchmark)回答,deepwiki包含了和工具相关的所有知识库 2. 在 [工具历史issue](https://github.com/AISBench/benchmark/issues/) 中通过问题的关键日志片段或关键词检索相似issue - 3. 
在 [官方文档](https://ais-bench-benchmark-rf.readthedocs.io/zh_cn/latest/) 的**搜索栏**中通过问题的关键日志片段或关键词检索 + 3. 在 [官方文档](https://ais-bench-benchmark.readthedocs.io/zh_cn/latest/) 的**搜索栏**中通过问题的关键日志片段或关键词检索 4. 寻求仓库维护者的帮助(issue评论区中 @SJTUyh)`, question: `🌟欢迎给AISBench benchmark测评工具提issue,您可以先尝试以下途径寻求issue的解决方法: 1. 【强烈推荐❤️‍🔥】确保issue描述完整后,可以试着将issue交给[![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/AISBench/benchmark)回答,deepwiki包含了和工具相关的所有知识库 2. 在 [工具历史issue](https://github.com/AISBench/benchmark/issues/) 中通过问题的关键日志片段或关键词检索相似issue - 3. 在 [官方文档](https://ais-bench-benchmark-rf.readthedocs.io/zh_cn/latest/) 的**搜索栏**中通过问题的关键日志片段或关键词检索 + 3. 在 [官方文档](https://ais-bench-benchmark.readthedocs.io/zh_cn/latest/) 的**搜索栏**中通过问题的关键日志片段或关键词检索 4. 寻求仓库维护者的帮助(issue评论区中 @SJTUyh)`, enhancement: `🌟欢迎给AISBench benchmark测评工具提新特性诉求,请确保在issue中描述清晰,以便仓库维护者能准确理解您的诉求。对于新特性诉求仓库维护者一般会在3天内给您答复。`, documentation: `🙏感谢您发现AISBench benchmark测评工具的文档漏洞,请确保在issue中描述清晰文档问题,以便我们能更快修复文档。对于文档漏洞仓库维护者一般会在1天内给您答复。` diff --git a/.github/workflows/issue_auto_reply_en.yml b/.github/workflows/issue_auto_reply_en.yml index ff21b9ef..51fd5e74 100644 --- a/.github/workflows/issue_auto_reply_en.yml +++ b/.github/workflows/issue_auto_reply_en.yml @@ -36,12 +36,12 @@ jobs: bug: `🌟 Welcome to file an issue for the AISBench benchmark tool. You can try the following ways to find solutions to your issue: 1. [Strongly recommended❤️‍🔥]Ensure your issue description is complete, then try to ask [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/AISBench/benchmark) for help, as deepwiki contains all the knowledge related to the tool 2. Search for similar issues in the [tool's historical issues](https://github.com/AISBench/benchmark/issues/) using key log fragments or keywords - 3. Search in the [official documentation](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/) search bar using key log fragments or keywords + 3. 
Search in the [official documentation](https://ais-bench-benchmark.readthedocs.io/en/latest/) search bar using key log fragments or keywords 4. Seek help from repository maintainers (mention @SJTUyh in the issue comment)`, question: `🌟 Welcome to file an issue for the AISBench benchmark tool. You can try the following ways to find solutions to your issue: 1. [Strongly recommended❤️‍🔥]Ensure your issue description is complete, then try to ask [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/AISBench/benchmark) for help, as deepwiki contains all the knowledge related to the tool 2. Search for similar issues in the [tool's historical issues](https://github.com/AISBench/benchmark/issues/) using key log fragments or keywords - 3. Search in the [official documentation](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/) search bar using key log fragments or keywords + 3. Search in the [official documentation](https://ais-bench-benchmark.readthedocs.io/en/latest/) search bar using key log fragments or keywords 4. Seek help from repository maintainers (mention @SJTUyh in the issue comment)`, enhancement: `🌟 Welcome to file a new feature request for the AISBench benchmark tool. Please ensure your issue description is clear so that repository maintainers can accurately understand your request. Maintainers will typically respond to new feature requests within 3 days.`, documentation: `🙏 Thank you for finding documentation issues in the AISBench benchmark tool. Please ensure your issue description clearly explains the documentation problem so we can fix it faster. Maintainers will typically respond to documentation issues within 1 day.` diff --git a/README.md b/README.md index 11d7c4e1..5e568965 100644 --- a/README.md +++ b/README.md @@ -17,8 +17,8 @@ [![Ask DeepWiki](https://deepwiki.com/badge.svg)](https://deepwiki.com/AISBench/benchmark)
[🌐官方网站](https://www.aisbench.com) | -[📖工具文档](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/) | -[👨‍💻开发者文档](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/develop_guide/contributing.html) | +[📖工具文档](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/) | +[👨‍💻开发者文档](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/develop_guide/contributing.html) | [🔥最新进展](#-最新进展)| [🤔报告问题](https://github.com/AISBench/benchmark/issues/new/choose)
简体中文 | [English](README_en.md) @@ -29,11 +29,11 @@ > **⭐️收藏项目**,你将能第一时间获取 AISBench评测工具 的最新动态~ ## 🔥 最新进展 -- **\[2026.3.10\]** 接入首个图像生成类评测基准GEdit-Bench, 支持对图像生成模型进行评测,详见[在AISBench中测评GEdit-Bench](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/extended_benchmark/lmm_generate/gedit_bench.html)。 🔥🔥🔥 -- **\[2026.3.1\]** 支持接入裁判模型进行评估,详见[使用裁判模型进行测评](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/advanced_tutorials/judge_model_evaluate.html)。 🔥🔥🔥 +- **\[2026.3.10\]** 接入首个图像生成类评测基准GEdit-Bench, 支持对图像生成模型进行评测,详见[在AISBench中测评GEdit-Bench](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/extended_benchmark/lmm_generate/gedit_bench.html)。 🔥🔥🔥 +- **\[2026.3.1\]** 支持接入裁判模型进行评估,详见[使用裁判模型进行测评](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/advanced_tutorials/judge_model_evaluate.html)。 🔥🔥🔥 - **\[2026.1.31\]** 支持 [Mooncake Trace](ais_bench/benchmark/configs/datasets/mooncake_trace/README.md) trace 数据集性能测评,支持按时间戳调度请求、hash_id 缓存与可复现 prompt 生成,详见数据集 README。🔥🔥🔥 - **\[2025.12.19\]** 🎉 **AISBench 架构全面重构完成!** - - ✨ **架构升级**:对cli、models、inferencer、tasks组件进行了全面重构,支持快速接入新的测试基准,参考📚 [开发者文档](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/develop_guide/contributing.html)了解详情! + - ✨ **架构升级**:对cli、models、inferencer、tasks组件进行了全面重构,支持快速接入新的测试基准,参考📚 [开发者文档](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/develop_guide/contributing.html)了解详情! - 🖥️ **任务管理界面**:全新的任务UI管理界面,支持同时监控每个任务的详细执行状态,包括任务名称、进度、时间成本、状态、日志路径、扩展参数等,让任务执行状态一目了然! - ⚡ **并行执行增强**:扩展了多任务并行功能,支持多个性能或精度测评任务并行执行,大幅提升评测效率! - 📊 **新增15+测评基准**:新增docvqa、infovqa、ocrbench_v2、omnidocbench、mmmu、mmmu_pro、mmstar、videomme、FewCLUE系列、dapo_math、leval等多模态和文本测评基准! @@ -41,29 +41,29 @@ - 🔧 **功能增强**:新增流式推理开关、自定义URL路径、API key配置;支持API模型推理warmup;支持自定义多模态数据集性能测评;部分数据集支持服务化PPL(困惑度)测评等多项功能! - 🏗️ **基础设施优化**:重构local models和api models组件,统一流式和非流式实现;重构inferencer组件,采用多进程+协程调用方式,提高并发能力;测试结果数据格式优化为jsonl,降低IO压力;采用错误码统一管理错误信息等! 
- **\[2025.11.25\]** 支持服务化模型PPL(Perplexity-based,困惑度)模式精度测评。🔥🔥🔥 -- **\[2025.9.08\]** 支持📚[模拟真实业务流量](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/advanced_tutorials/rps_distribution.html):通过控制请求发送速率波动,感知在模拟真实场景下服务化的性能测评结果!🔥🔥🔥 +- **\[2025.9.08\]** 支持📚[模拟真实业务流量](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/advanced_tutorials/rps_distribution.html):通过控制请求发送速率波动,感知在模拟真实场景下服务化的性能测评结果!🔥🔥🔥 -- **\[2025.8.28\]** 支持📚[多次独立重复推理精度场景](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/accuracy_benchmark.html#id12),计算pass@k/cons@k/avg@n等不同维度的精度指标!🔬🔬🔬 +- **\[2025.8.28\]** 支持📚[多次独立重复推理精度场景](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/accuracy_benchmark.html#id12),计算pass@k/cons@k/avg@n等不同维度的精度指标!🔬🔬🔬 - **\[2025.8.19\]** - 新增Function Call专用模型配置 [vllm_api_function_call_chat](ais_bench/benchmark/configs/models/vllm_api/vllm_api_function_call_chat.py),支持 [BFCL 函数调用能力评估](ais_bench/benchmark/configs/datasets/BFCL/README.md) 🔥🔥🔥 - - 提供工具支持的[性能测试规格说明](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/performance_benchmark.html#id25),优化推理集群场景工具内存占用及性能计算。最大规格场景(250K条请求,输入/输出token 4K/4K)内存占用降低60%,内存占用小于64GB;性能结果计算效率提升20倍。🚀🚀🚀 + - 提供工具支持的[性能测试规格说明](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/performance_benchmark.html#id25),优化推理集群场景工具内存占用及性能计算。最大规格场景(250K条请求,输入/输出token 4K/4K)内存占用降低60%,内存占用小于64GB;性能结果计算效率提升20倍。🚀🚀🚀 - **\[2025.7.15\]** - - 支持[sharegpt](ais_bench/benchmark/configs/datasets/sharegpt/README.md)和[mtbench](ais_bench/benchmark/configs/datasets/mtbench/README.md)多轮对话数据集服务化性能测评和可视化,测评方式见📚[多轮对话测评指南](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/advanced_tutorials/multiturn_benchmark.html)!🔥🔥🔥 - - 性能评测场景使用[自定义数据集](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/advanced_tutorials/custom_dataset.html),支持按请求粒度指定最大输出长度!🔥🔥🔥 + - 
支持[sharegpt](ais_bench/benchmark/configs/datasets/sharegpt/README.md)和[mtbench](ais_bench/benchmark/configs/datasets/mtbench/README.md)多轮对话数据集服务化性能测评和可视化,测评方式见📚[多轮对话测评指南](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/advanced_tutorials/multiturn_benchmark.html)!🔥🔥🔥 + - 性能评测场景使用[自定义数据集](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/advanced_tutorials/custom_dataset.html),支持按请求粒度指定最大输出长度!🔥🔥🔥 -- **\[2025.6.19\]** 支持📚[性能评测结果可视化](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/base_tutorials/results_intro/performance_visualization.html),辅助定位推理服务性能瓶颈!🔥🔥🔥 +- **\[2025.6.19\]** 支持📚[性能评测结果可视化](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/base_tutorials/results_intro/performance_visualization.html),辅助定位推理服务性能瓶颈!🔥🔥🔥 - **\[2025.6.12\]** 支持[textvqa](ais_bench/benchmark/configs/datasets/textvqa/README.md)、[videobench](ais_bench/benchmark/configs/datasets/videobench/README.md)和[vocalsound](ais_bench/benchmark/configs/datasets/vocalsound/README.md)等多模态数据集的精度和性能评测!🔥🔥🔥 - **\[2025.6.6\]** AISBench支持稳态性能评测,获取系统真实最佳性能,参考📚 [服务化稳定状态性能测试](doc/users_guide/stable_stage.md)进行快速上手! 🔥🔥🔥 -- **\[2025.5.16\]** 支持3W+高并发服务化性能评测,📚 [性能指标](doc/users_guide/performance_metric.md)对齐🔗 [vllm benchmark](https://github.com/vllm-project/vllm/tree/main/benchmarks),参考📚 [服务化性能测评指南](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/performance_benchmark.html)了解详情!🔥🔥🔥 +- **\[2025.5.16\]** 支持3W+高并发服务化性能评测,📚 [性能指标](doc/users_guide/performance_metric.md)对齐🔗 [vllm benchmark](https://github.com/vllm-project/vllm/tree/main/benchmarks),参考📚 [服务化性能测评指南](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/performance_benchmark.html)了解详情!🔥🔥🔥 -- **\[2025.4.30\]** 精度评测支持断点续测和失败用例重测,大幅提高精度评测鲁棒性,参考📚 [中断续测 & 失败用例重测](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/accuracy_benchmark.html#id10)进行快速上手! 
🔥🔥🔥 +- **\[2025.4.30\]** 精度评测支持断点续测和失败用例重测,大幅提高精度评测鲁棒性,参考📚 [中断续测 & 失败用例重测](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/accuracy_benchmark.html#id10)进行快速上手! 🔥🔥🔥 - **\[2025.4.15\]** 优化固定batch发送请求的方式为continuous batch模式发送请求,大幅提高精度评测效率! 🔥🔥🔥 -- **\[2025.4.12\]** 支持合并MMLU、Ceval等所有多文件数据集为单个数据集任务进行精度评测,参考📚 [合并多文件数据集](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/accuracy_benchmark.html#id11)了解详情! 🔥🔥🔥 +- **\[2025.4.12\]** 支持合并MMLU、Ceval等所有多文件数据集为单个数据集任务进行精度评测,参考📚 [合并多文件数据集](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/accuracy_benchmark.html#id11)了解详情! 🔥🔥🔥 ## 🌏 简介 @@ -71,9 +71,9 @@ AISBench Benchmark 是基于 [OpenCompass](https://github.com/open-compass/openc 当前,AISBench 支持两大类推理任务的评测场景: -🔍 [精度测评](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/home.html#id2):支持对服务化模型和本地模型在各类问答、推理基准数据集上的精度验证,覆盖文本、多模态等多种场景。 +🔍 [精度测评](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/home.html#id2):支持对服务化模型和本地模型在各类问答、推理基准数据集上的精度验证,覆盖文本、多模态等多种场景。 -🚀 [性能测评](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/home.html#id5):支持对服务化模型的延迟与吞吐率评估,并可进行压测场景下的极限性能测试,支持稳态性能评测和真实业务流量模拟。 +🚀 [性能测评](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/base_tutorials/scenes_intro/home.html#id5):支持对服务化模型的延迟与吞吐率评估,并可进行压测场景下的极限性能测试,支持稳态性能评测和真实业务流量模拟。 ## 🛠️ 工具安装 ✅ 环境要求 @@ -150,11 +150,11 @@ ais_bench --models vllm_api_general_chat --datasets demo_gsm8k_gen_4_shot_cot_ch ### 任务含义查询(可选) 所选模型任务`vllm_api_general_chat`、数据集任务`demo_gsm8k_gen_4_shot_cot_chat_prompt`和结果呈现任务`example`的具体信息(简介,使用约束等)可以分别从如下链接中查询含义: -- `--models`: 📚 [服务化推理后端](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/base_tutorials/all_params/models.html#id2) +- `--models`: 📚 [服务化推理后端](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/base_tutorials/all_params/models.html#id2) -- `--datasets`: 📚 
[开源数据集](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/get_started/datasets.html#id3) → 📚 [详细介绍](ais_bench/benchmark/configs/datasets/demo/README.md) +- `--datasets`: 📚 [开源数据集](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/get_started/datasets.html#id3) → 📚 [详细介绍](ais_bench/benchmark/configs/datasets/demo/README.md) -- `--summarizer`: 📚 [结果汇总任务](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/base_tutorials/all_params/summarizer.html) +- `--summarizer`: 📚 [结果汇总任务](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/base_tutorials/all_params/summarizer.html) ### 运行命令前置准备 - `--models`: 使用`vllm_api_general_chat`模型任务,需要准备支持`v1/chat/completions`子服务的推理服务,可以参考🔗 [VLLM启动OpenAI 兼容服务器](https://docs.vllm.com.cn/en/latest/getting_started/quickstart.html#openai-compatible-server)启动推理服务 @@ -180,7 +180,7 @@ ais_bench --models vllm_api_general_chat --datasets demo_gsm8k_gen_4_shot_cot_ch ``` -- 快速入门中数据集任务配置文件`demo_gsm8k_gen_4_shot_cot_chat_prompt.py`不需要做额外修改,数据集任务配置文件内容介绍可参考📚 [配置开源数据集](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/base_tutorials/all_params/datasets.html#id6) +- 快速入门中数据集任务配置文件`demo_gsm8k_gen_4_shot_cot_chat_prompt.py`不需要做额外修改,数据集任务配置文件内容介绍可参考📚 [配置开源数据集](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/base_tutorials/all_params/datasets.html#id6) 模型配置文件`vllm_api_general_chat.py`中包含了模型运行相关的配置内容,是需要依据实际情况修改的。快速入门中需要修改的内容用注释标明。 ```python @@ -277,7 +277,7 @@ dataset version metric mode vllm_api_general_chat demo_gsm8k 401e4c accuracy gen 62.50 ``` -更多教程请查看我们的👉[文档](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/) +更多教程请查看我们的👉[文档](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/) ## 🔜 即将推出 diff --git a/README_en.md b/README_en.md index 4a91f475..e49e22c6 100644 --- a/README_en.md +++ b/README_en.md @@ -18,8 +18,8 @@
[🌐 Official Website](https://www.aisbench.com) | -[📖 Tool Documentation](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/) | -[👨‍💻 Developer Documentation](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/develop_guide/contributing.html) | +[📖 Tool Documentation](https://ais-bench-benchmark.readthedocs.io/en/latest/) | +[👨‍💻 Developer Documentation](https://ais-bench-benchmark.readthedocs.io/en/latest/develop_guide/contributing.html) | [🔥 Latest Updates](#-latest-updates)| [🤔 Report Issues](https://github.com/AISBench/benchmark/issues/new/choose)
[简体中文](README.md) | English @@ -30,11 +30,11 @@ > **⭐️Star this project** to get the latest updates of AISBench Benchmark Tool in real time! ## 🔥 Latest Updates -- **\[2026.3.10\]** Integrated the first image generation evaluation benchmark GEdit-Bench, supporting evaluation of image generation models. See [Evaluate GEdit-Bench in AISBench](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/extended_benchmark/lmm_generate/gedit_bench.html) for details. 🔥🔥🔥 -- **\[2026.3.1\]** Supports integrating judge models for evaluation. See [Evaluate with Judge Models](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/advanced_tutorials/judge_model_evaluate.html). 🔥🔥🔥 +- **\[2026.3.10\]** Integrated the first image generation evaluation benchmark GEdit-Bench, supporting evaluation of image generation models. See [Evaluate GEdit-Bench in AISBench](https://ais-bench-benchmark.readthedocs.io/en/latest/extended_benchmark/lmm_generate/gedit_bench.html) for details. 🔥🔥🔥 +- **\[2026.3.1\]** Supports integrating judge models for evaluation. See [Evaluate with Judge Models](https://ais-bench-benchmark.readthedocs.io/en/latest/advanced_tutorials/judge_model_evaluate.html). 🔥🔥🔥 - **\[2026.1.31\]** Support for [Mooncake Trace](ais_bench/benchmark/configs/datasets/mooncake_trace/README_en.md) trace dataset performance evaluation; supports timestamp-based request scheduling, hash_id caching, and reproducible prompt generation. See the dataset README for details. 🔥🔥🔥 - **\[2025.12.19\]** 🎉 **AISBench Architecture Refactoring Completed!** - - ✨ **Architecture Upgrade**: Comprehensive refactoring of cli, models, inferencer, and tasks components, supporting rapid integration of new test benchmarks. See 📚 [Developer Documentation](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/develop_guide/contributing.html) for details! 
+ - ✨ **Architecture Upgrade**: Comprehensive refactoring of cli, models, inferencer, and tasks components, supporting rapid integration of new test benchmarks. See 📚 [Developer Documentation](https://ais-bench-benchmark.readthedocs.io/en/latest/develop_guide/contributing.html) for details! - 🖥️ **Task Management Interface**: Brand new task UI management interface that supports simultaneous monitoring of detailed execution status for each task, including task name, progress, time cost, status, log path, extended parameters, etc., making task execution status clear at a glance! - ⚡ **Enhanced Parallel Execution**: Extended multi-task parallel functionality, supporting parallel execution of multiple performance or accuracy evaluation tasks, significantly improving evaluation efficiency! - 📊 **15+ New Evaluation Benchmarks**: Added docvqa, infovqa, ocrbench_v2, omnidocbench, mmmu, mmmu_pro, mmstar, videomme, FewCLUE series, dapo_math, leval and other multimodal and text evaluation benchmarks! @@ -42,31 +42,31 @@ - 🔧 **Feature Enhancements**: Added streaming inference switch, custom URL path, API key configuration; supports API model inference warmup; supports custom multimodal dataset performance evaluation; some datasets support service-based PPL (perplexity) evaluation and many other features! - 🏗️ **Infrastructure Optimization**: Refactored local models and api models components, unified streaming and non-streaming implementations; refactored inferencer component, adopted multi-process + coroutine calling approach to improve concurrency; optimized test result data format to jsonl, reducing IO pressure; adopted error codes for unified error information management and more! 
- **\[2025.11.25\]** Support for PPL (Perplexity-based) mode accuracy evaluation for service-deployed models.🔥🔥🔥 -- **\[2025.9.08\]** Support for 📚[Simulating Real Business Traffic](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/advanced_tutorials/rps_distribution.html): By controlling fluctuations in request sending rates, perceive the performance evaluation results of service deployment in simulated real-world scenarios! 🔥🔥🔥 +- **\[2025.9.08\]** Support for 📚[Simulating Real Business Traffic](https://ais-bench-benchmark.readthedocs.io/en/latest/advanced_tutorials/rps_distribution.html): By controlling fluctuations in request sending rates, perceive the performance evaluation results of service deployment in simulated real-world scenarios! 🔥🔥🔥 -- **\[2025.8.28\]** Support for 📚[Multiple Independent Repeated Inference Accuracy Scenarios](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/base_tutorials/scenes_intro/accuracy_benchmark.html#id12), calculating accuracy metrics across different dimensions such as pass@k/cons@k/avg@n! 🔬🔬🔬 +- **\[2025.8.28\]** Support for 📚[Multiple Independent Repeated Inference Accuracy Scenarios](https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/scenes_intro/accuracy_benchmark.html#id12), calculating accuracy metrics across different dimensions such as pass@k/cons@k/avg@n! 🔬🔬🔬 - **\[2025.8.19\]** - Added a dedicated model configuration for Function Call: [vllm_api_function_call_chat](ais_bench/benchmark/configs/models/vllm_api/vllm_api_function_call_chat.py), supporting [BFCL Function Calling Capability Evaluation](ais_bench/benchmark/configs/datasets/BFCL/README_en.md) 🔥🔥🔥 - - Provided [Performance Test Specification Documentation](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/base_tutorials/scenes_intro/performance_benchmark.html#id25) supported by the tool, optimizing memory usage and performance calculation of the tool in inference cluster scenarios. 
For the maximum specification scenario (250K requests, input/output tokens: 4K/4K), memory usage is reduced by 60% (now less than 64GB), and performance result calculation efficiency is improved by 20x. 🚀🚀🚀 + - Provided [Performance Test Specification Documentation](https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/scenes_intro/performance_benchmark.html#id25) supported by the tool, optimizing memory usage and performance calculation of the tool in inference cluster scenarios. For the maximum specification scenario (250K requests, input/output tokens: 4K/4K), memory usage is reduced by 60% (now less than 64GB), and performance result calculation efficiency is improved by 20x. 🚀🚀🚀 - **\[2025.7.15\]** - - Supported service deployment performance evaluation and visualization for multi-turn dialogue datasets such as [sharegpt](ais_bench/benchmark/configs/datasets/sharegpt/README_en.md) and [mtbench](ais_bench/benchmark/configs/datasets/mtbench/README_en.md). See 📚[Multi-Turn Dialogue Evaluation Guide](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/advanced_tutorials/multiturn_benchmark.html) for evaluation methods! 🔥🔥🔥 - - Enabled the use of [custom datasets](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/advanced_tutorials/custom_dataset.html) in performance evaluation scenarios, supporting the specification of maximum output length at the request granularity! 🔥🔥🔥 + - Supported service deployment performance evaluation and visualization for multi-turn dialogue datasets such as [sharegpt](ais_bench/benchmark/configs/datasets/sharegpt/README_en.md) and [mtbench](ais_bench/benchmark/configs/datasets/mtbench/README_en.md). See 📚[Multi-Turn Dialogue Evaluation Guide](https://ais-bench-benchmark.readthedocs.io/en/latest/advanced_tutorials/multiturn_benchmark.html) for evaluation methods! 
🔥🔥🔥
+- Enabled the use of [custom datasets](https://ais-bench-benchmark.readthedocs.io/en/latest/advanced_tutorials/custom_dataset.html) in performance evaluation scenarios, supporting the specification of maximum output length at the request granularity! 🔥🔥🔥
-- **\[2025.6.19\]** Support for 📚[Performance Evaluation Result Visualization](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/base_tutorials/results_intro/performance_visualization.html) to help locate performance bottlenecks of inference services! 🔥🔥🔥
+- **\[2025.6.19\]** Support for 📚[Performance Evaluation Result Visualization](https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/results_intro/performance_visualization.html) to help locate performance bottlenecks of inference services! 🔥🔥🔥
 - **\[2025.6.12\]** Supported accuracy and performance evaluation for multimodal datasets including [textvqa](ais_bench/benchmark/configs/datasets/textvqa/README_en.md), [videobench](ais_bench/benchmark/configs/datasets/videobench/README_en.md), and [vocalsound](ais_bench/benchmark/configs/datasets/vocalsound/README_en.md)! 🔥🔥🔥
 - **\[2025.6.6\]** AISBench supports steady-state performance evaluation to obtain the true optimal performance of the system. Refer to 📚 [Service Deployment Steady-State Performance Test](doc/users_guide/stable_stage.md) to get started quickly! 🔥🔥🔥
-- **\[2025.5.16\]** Supported performance evaluation for high concurrency service deployment (up to 30,000+ concurrent requests). 📚 [Performance Metrics](doc/users_guide/performance_metric.md) are aligned with 🔗 [vllm benchmark](https://github.com/vllm-project/vllm/tree/main/benchmarks). See 📚 [Service Deployment Performance Evaluation Guide](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/base_tutorials/scenes_intro/performance_benchmark.html) for details! 🔥🔥🔥
+- **\[2025.5.16\]** Supported performance evaluation for high concurrency service deployment (up to 30,000+ concurrent requests). 📚 [Performance Metrics](doc/users_guide/performance_metric.md) are aligned with 🔗 [vllm benchmark](https://github.com/vllm-project/vllm/tree/main/benchmarks). See 📚 [Service Deployment Performance Evaluation Guide](https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/scenes_intro/performance_benchmark.html) for details! 🔥🔥🔥
-- **\[2025.4.30\]** Accuracy evaluation supports resuming from breakpoints and re-evaluating failed cases, significantly improving the robustness of accuracy evaluation. Refer to 📚 [Resume from Interruption & Re-evaluate Failed Cases](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/base_tutorials/scenes_intro/accuracy_benchmark.html#id10) to get started quickly! 🔥🔥🔥
+- **\[2025.4.30\]** Accuracy evaluation supports resuming from breakpoints and re-evaluating failed cases, significantly improving the robustness of accuracy evaluation. Refer to 📚 [Resume from Interruption & Re-evaluate Failed Cases](https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/scenes_intro/accuracy_benchmark.html#id10) to get started quickly! 🔥🔥🔥
 - **\[2025.4.15\]** Optimized the request sending method from fixed-batch to continuous batch mode, significantly improving accuracy evaluation efficiency! 🔥🔥🔥
-- **\[2025.4.12\]** Supported merging all multi-file datasets (such as MMLU, Ceval) into a single dataset task for accuracy evaluation. See 📚 [Merge Multi-File Datasets](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/base_tutorials/scenes_intro/accuracy_benchmark.html#id11) for details! 🔥🔥🔥
+- **\[2025.4.12\]** Supported merging all multi-file datasets (such as MMLU, Ceval) into a single dataset task for accuracy evaluation. See 📚 [Merge Multi-File Datasets](https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/scenes_intro/accuracy_benchmark.html#id11) for details! 🔥🔥🔥
 
 ## 🌏 Introduction
 
@@ -74,9 +74,9 @@ AISBench Benchmark is a model evaluation tool built based on [OpenCompass](https
 
 Currently, AISBench supports evaluation scenarios for two major types of inference tasks:
 
-🔍 [Accuracy Evaluation](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/base_tutorials/scenes_intro/home.html#id2): Supports accuracy verification of service-deployed models and local models on various question-answering and reasoning benchmark datasets, covering text, multimodal and other scenarios.
+🔍 [Accuracy Evaluation](https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/scenes_intro/home.html#id2): Supports accuracy verification of service-deployed models and local models on various question-answering and reasoning benchmark datasets, covering text, multimodal and other scenarios.
 
-🚀 [Performance Evaluation](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/base_tutorials/scenes_intro/home.html#id5): Supports latency and throughput evaluation of service-deployed models, as well as extreme performance testing under stress test scenarios, supporting steady-state performance evaluation and real business traffic simulation.
+🚀 [Performance Evaluation](https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/scenes_intro/home.html#id5): Supports latency and throughput evaluation of service-deployed models, as well as extreme performance testing under stress test scenarios, supporting steady-state performance evaluation and real business traffic simulation.
 
 ## 🛠️ Tool Installation
 
@@ -150,9 +150,9 @@ This command does not specify other command-line options, so it defaults to an a
 ### Task Meaning Query (Optional)
 
 Detailed information (introduction, usage constraints, etc.) about the selected model task (`vllm_api_general_chat`), dataset task (`demo_gsm8k_gen_4_shot_cot_chat_prompt`), and result presentation task (`example`) can be queried from the following links:
-- `--models`: 📚 [Service Deployment Inference Backend](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/base_tutorials/all_params/models.html#id2)
-- `--datasets`: 📚 [Open-Source Datasets](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/base_tutorials/all_params/datasets.html#id3) → 📚 [Detailed Introduction](ais_bench/benchmark/configs/datasets/demo/README_en.md)
-- `--summarizer`: 📚 [Result Summary Tasks](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/base_tutorials/all_params/summarizer.html)
+- `--models`: 📚 [Service Deployment Inference Backend](https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/all_params/models.html#id2)
+- `--datasets`: 📚 [Open-Source Datasets](https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/all_params/datasets.html#id3) → 📚 [Detailed Introduction](ais_bench/benchmark/configs/datasets/demo/README_en.md)
+- `--summarizer`: 📚 [Result Summary Tasks](https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/all_params/summarizer.html)
 
 # Modification of Configuration Files Corresponding to Tasks
 
@@ -174,7 +174,7 @@ After executing the query command, you will get the following query results:
 ```
 
-- The dataset task configuration file `demo_gsm8k_gen_4_shot_cot_chat_prompt.py` in the quick start does not require additional modifications. For an introduction to the content of the dataset task configuration file, please refer to 📚 [Configure Open-Source Datasets](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/base_tutorials/all_params/datasets.html#id6)
+- The dataset task configuration file `demo_gsm8k_gen_4_shot_cot_chat_prompt.py` in the quick start does not require additional modifications. For an introduction to the content of the dataset task configuration file, please refer to 📚 [Configure Open-Source Datasets](https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/all_params/datasets.html#id6)
 
 The model configuration file `vllm_api_general_chat.py` contains configuration content related to model operation and needs to be modified according to actual conditions. The content that needs to be modified in the quick start is marked with comments.
 ```python
@@ -270,7 +270,7 @@ dataset version metric mode vllm_api_general_chat
 demo_gsm8k 401e4c accuracy gen 62.50
 ```
 
-For more tutorials, please refer to our 👉[Documentation](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/)
+For more tutorials, please refer to our 👉[Documentation](https://ais-bench-benchmark.readthedocs.io/en/latest/)
 
 ## 🔜 Coming Soon
 
diff --git a/ais_bench/benchmark/utils/logging/error_codes.py b/ais_bench/benchmark/utils/logging/error_codes.py
index 0bc57aff..bdc1023d 100644
--- a/ais_bench/benchmark/utils/logging/error_codes.py
+++ b/ais_bench/benchmark/utils/logging/error_codes.py
@@ -40,7 +40,7 @@ class ErrorType(Enum):
     TASK = "TASK"  # task error type
 
 
 class BaseErrorCode:
-    FAQ_BASE_URL = "https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/faqs/error_codes.html#"
+    FAQ_BASE_URL = "https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/faqs/error_codes.html#"
 
     def __init__(self, code_name: str, module: ErrorModule, err_type: ErrorType, code: int, message: str):
diff --git a/docs/source_en/advanced_tutorials/judge_model_evaluate.md b/docs/source_en/advanced_tutorials/judge_model_evaluate.md
index 0c27ae1e..9d383fab 100644
--- a/docs/source_en/advanced_tutorials/judge_model_evaluate.md
+++ b/docs/source_en/advanced_tutorials/judge_model_evaluate.md
@@ -29,7 +29,7 @@ graph LR;
 
 ## Quick Start
 
-Taking the aime2025 dataset evaluation as an example, the usage is basically consistent with [AISBench Quick Start](https://ais-bench-benchmark-rf.readthedocs.io/en/latest/get_started/quick_start.html#). This quick start section only covers the differences.
+Taking the aime2025 dataset evaluation as an example, the usage is basically consistent with [AISBench Quick Start](https://ais-bench-benchmark.readthedocs.io/en/latest/get_started/quick_start.html#). This quick start section only covers the differences.
 
 ### Command Meaning
 
diff --git a/docs/source_en/faqs/error_codes.md b/docs/source_en/faqs/error_codes.md
index 2f2726f2..49bd5a72 100644
--- a/docs/source_en/faqs/error_codes.md
+++ b/docs/source_en/faqs/error_codes.md
@@ -394,7 +394,7 @@ This indicates that all requests during the inference process failed. You need t
 ### Error Description
 When calculating steady-state performance metrics, no requests belonging to the steady state were found among all request information, and steady-state metrics cannot be calculated.
 ### Solution
-You can check the concurrency graph of inference requests (reference document: https://ais-bench-benchmark-rf.readthedocs.io/en/latest/base_tutorials/results_intro/performance_visualization.html) to confirm whether the `Request Concurrency Count` in the concurrency step graph reaches the concurrency number set in the model configuration file (the `batch_size` parameter) **and at least two requests reach the maximum concurrency number**.
+You can check the concurrency graph of inference requests (reference document: https://ais-bench-benchmark.readthedocs.io/en/latest/base_tutorials/results_intro/performance_visualization.html) to confirm whether the `Request Concurrency Count` in the concurrency step graph reaches the concurrency number set in the model configuration file (the `batch_size` parameter) **and at least two requests reach the maximum concurrency number**.
 
 If the above conditions are not met, you can try the following methods to achieve a steady state:
 
diff --git a/docs/source_zh_cn/advanced_tutorials/judge_model_evaluate.md b/docs/source_zh_cn/advanced_tutorials/judge_model_evaluate.md
index 533ce41d..0dd6c731 100644
--- a/docs/source_zh_cn/advanced_tutorials/judge_model_evaluate.md
+++ b/docs/source_zh_cn/advanced_tutorials/judge_model_evaluate.md
@@ -24,7 +24,7 @@ graph LR;
 ```
 
 ## 快速上手
-以aime2025数据集评测为例,使能方式与[AISBench工具快速入门](https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/get_started/quick_start.html#)基本一致,快速上手中仅做差异化说明
+以aime2025数据集评测为例,使能方式与[AISBench工具快速入门](https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/get_started/quick_start.html#)基本一致,快速上手中仅做差异化说明
 ### 命令含义
 AISBench命令中通过`--datasets`指定的裁判模型数据集任务`aime2025_gen_0_shot_llmjudge`。
diff --git a/docs/source_zh_cn/faqs/error_codes.md b/docs/source_zh_cn/faqs/error_codes.md
index 03d87674..db3f9817 100644
--- a/docs/source_zh_cn/faqs/error_codes.md
+++ b/docs/source_zh_cn/faqs/error_codes.md
@@ -365,7 +365,7 @@ All requests failed, cannot calculate performance results. Please check the erro
 ### 错误描述
 计算稳态性能指标时,在所有请求信息中找不到属于稳定阶段的请求,无法计算稳态指标。
 ### 解决办法
-可以检查一下推理请求的并发图(参考文档:https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/base_tutorials/results_intro/performance_visualization.html),确认并发阶梯图中`Request Concurrency Count`是否达到模型配置文件中设置的并发数(`batch_size`参数)**且至少存在两个请求达到最大并发数**。
+可以检查一下推理请求的并发图(参考文档:https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/base_tutorials/results_intro/performance_visualization.html),确认并发阶梯图中`Request Concurrency Count`是否达到模型配置文件中设置的并发数(`batch_size`参数)**且至少存在两个请求达到最大并发数**。
 若未满足上述条件,可以尝试以下方式达到稳定状态:
 #### 并发阶梯图中`Request Concurrency Count`持续增长之后直接持续下降
 1. 降低推理请求的并发数(模型配置文件中的`batch_size`参数)。
diff --git a/tests/UT/utils/logging/test_error_codes.py b/tests/UT/utils/logging/test_error_codes.py
index 79a238ae..5e2db42e 100644
--- a/tests/UT/utils/logging/test_error_codes.py
+++ b/tests/UT/utils/logging/test_error_codes.py
@@ -79,7 +79,7 @@ def test_str(self):
 
     def test_faq_url(self):
         """测试faq_url是否正确生成"""
-        expected_url = "https://ais-bench-benchmark-rf.readthedocs.io/zh-cn/latest/faqs/error_codes.html#utils-cfg-001"
+        expected_url = "https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/faqs/error_codes.html#utils-cfg-001"
         self.assertEqual(self.error_code.faq_url, expected_url)
 
     def test_full_code_formatting(self):
@@ -95,12 +95,12 @@ def test_full_code_formatting(self):
         # 测试代码大于等于100的情况
         error_code_triple = BaseErrorCode("UTILS-CFG-123", ErrorModule.UTILS, ErrorType.CONFIG, 123, "test")
         self.assertEqual(error_code_triple.full_code, "UTILS-CFG-123")
-    
+
     def test_invalid_code_name(self):
         """测试code_name与full_code不匹配时抛出ValueError"""
         with self.assertRaises(ValueError) as context:
             BaseErrorCode("INVALID-CODE", ErrorModule.UTILS, ErrorType.CONFIG, 1, "test")
-    
+
         self.assertIn("code_name INVALID-CODE is not equal to full_code UTILS-CFG-001", str(context.exception))
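
The patch itself only swaps the base URL, but the updated `test_faq_url` pins down how the FAQ link is derived: `FAQ_BASE_URL` plus the lowercased full error code (`UTILS-CFG-001` → anchor `utils-cfg-001`). A minimal standalone sketch of that asserted behavior follows; the helper name `faq_url_for` is ours for illustration and is not the project's actual `BaseErrorCode.faq_url` implementation:

```python
# Illustrative sketch, not the project's code: reproduces the faq_url
# behavior asserted by tests/UT/utils/logging/test_error_codes.py.
# Only FAQ_BASE_URL and the expected URL below come from the patch itself;
# the lowercasing step is inferred from the asserted anchor "utils-cfg-001".
FAQ_BASE_URL = "https://ais-bench-benchmark.readthedocs.io/zh-cn/latest/faqs/error_codes.html#"


def faq_url_for(full_code: str) -> str:
    """Return the FAQ anchor URL for a full error code such as 'UTILS-CFG-001'."""
    # Read the Docs renders heading anchors in lowercase, so the code is
    # lowercased before being appended to the base URL.
    return FAQ_BASE_URL + full_code.lower()


print(faq_url_for("UTILS-CFG-001"))
```

Keeping the anchor construction in one place like this is why the patch only has to touch a single `FAQ_BASE_URL` constant (plus the expected string in the test) to retarget every error-code FAQ link.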