Fix validate consistency #7679
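Thank you for contributing to the PaddlePaddle documentation. The documentation preview is being built; once the Docs-New job finishes, it can be viewed at: http://preview-pr-7679.paddle-docs-preview.paddlepaddle.org.cn/documentation/docs/zh/api/index_cn.html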
Pull request overview
This PR fixes validation consistency issues in the PyTorch to PaddlePaddle API documentation conversion tools. The changes improve the API difference validation script, refactor the API discovery logic to recursively scan all markdown files, and reorganize API documentation by moving files from category-specific subdirectories to more appropriate locations.
Changes:
- Enhanced validation script with overloaded API support and optimized lookup performance using dictionary mapping
- Refactored API discovery to recursively scan all markdown files regardless of directory structure
- Reorganized API documentation files, moving transformers APIs from torch_more_args/paddle_more_args/others to invok_only_diff/args_name_diff/composite_implement
- Updated various API signatures with missing asterisks (*) to denote keyword-only parameters
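
As a brief illustration of why the asterisk in a signature matters, the sketch below uses a made-up function (`round_like` is hypothetical and not part of PyTorch or PaddlePaddle). Any parameter after a bare `*` must be passed by name, so a signature that omits the `*` misdescribes how the API can be called.

```python
def round_like(x, *, decimals=0):
    """Hypothetical function: `decimals` follows the bare `*`, so it is keyword-only."""
    return round(x, decimals)


# Passing the keyword-only parameter by name works.
print(round_like(3.14159, decimals=2))

# Passing it positionally is rejected by Python itself.
try:
    round_like(3.14159, 2)
except TypeError as exc:
    print("rejected:", exc)
```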
Reviewed changes
Copilot reviewed 153 out of 153 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| validate_api_difference_consistency.py | Added OVERLOADED_APIS dictionary, ALLOW_MISSING_DIFF_DOCS list, api_diff_map for O(1) lookup, optimized validation logic |
| get_api_difference_info.py | Refactored discover_all_metas to recursively scan all .md files with automatic library prefix detection |
| transformers.PretrainedConfig.md | Deleted from torch_more_args (moved to invok_only_diff) |
| transformers.GenerationConfig.md | Deleted from torch_more_args (moved to invok_only_diff) |
| transformers.AddedToken.md | Deleted from torch_more_args (moved to invok_only_diff) |
| torchvision.models.inception.*.md | Added new InceptionA-E documentation |
| torchvision.models.Inception3.md | Added new Inception3 documentation |
| torch.*.md (multiple) | Updated API signatures with asterisks for keyword-only parameters |
| transformers.*.md (multiple) | Changed references from paddlenlp to paddleformers |
| torch.nn.Module.*.md (multiple) | Deleted files moved from paddle_more_args/input_args_usage_diff to other categories |
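
The two script changes in the table can be sketched roughly as follows. This is not the code from the PR: `discover_api_docs` and `KNOWN_PREFIXES` are hypothetical names, and the prefix detection here is simply assumed to read the first dotted component of the file name. The sketch only shows the general idea of scanning every `.md` file at any depth and building a dictionary for constant-time lookup.

```python
import tempfile
from pathlib import Path

# Hypothetical prefix list; the real script's detection logic is not shown in this thread.
KNOWN_PREFIXES = ("torch", "torchvision", "transformers")


def discover_api_docs(root):
    """Map API name -> markdown path for every .md file under root, at any depth."""
    api_diff_map = {}
    for md_path in sorted(Path(root).rglob("*.md")):
        api_name = md_path.stem  # "torch.std_mean.md" -> "torch.std_mean"
        library = api_name.split(".", 1)[0]
        if library in KNOWN_PREFIXES:
            api_diff_map[api_name] = md_path
    return api_diff_map


# Demo on a throwaway directory tree; the category folders mirror names from the PR.
with tempfile.TemporaryDirectory() as tmp:
    for rel in (
        "composite_implement/torch.std_mean.md",
        "invok_only_diff/transformers.AddedToken.md",
        "README.md",  # no known library prefix, so it is skipped
    ):
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("placeholder\n", encoding="utf-8")

    docs = discover_api_docs(tmp)

print(sorted(docs))
print("torch.std_mean" in docs)  # dictionary membership test, no list scan
```

Because the lookup key is the API name rather than the directory, moving a file between category folders does not affect whether it is found.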
zhwesky2010
left a comment
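- Some of these docs do not follow the difference-doc specification and need to be brought in line with pytorch_api_mapping_format_cn.md; CI has an automated tool that blocks non-conforming docs.
- One possible issue: some no_need_convert entries in paconvert have not been updated yet, and quite a few APIs were modified recently, so paconvert may need to be tested first and every API that is already aligned added to it.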
...ides/model_convert/convert_from_pytorch/api_difference/composite_implement/torch.std_mean.md
.../guides/model_convert/convert_from_pytorch/api_difference/args_name_diff/torch.Tensor.std.md
...ides/model_convert/convert_from_pytorch/api_difference/composite_implement/torch.var_mean.md
...s/model_convert/convert_from_pytorch/api_difference/input_args_type_diff/torch.block_diag.md
..._convert/convert_from_pytorch/api_difference/input_args_type_diff/torch.broadcast_tensors.md
...ides/model_convert/convert_from_pytorch/api_difference/torch_more_args/torch.Tensor.round.md
...des/model_convert/convert_from_pytorch/api_difference/torch_more_args/torch.Tensor.round_.md
docs/guides/model_convert/convert_from_pytorch/api_difference/torch_more_args/torch.baddbmm.md
...el_convert/convert_from_pytorch/api_difference/torch_more_args/torch.nn.functional.kl_div.md
# functions currently. Currently, we hard code the check of overloaded functions
# in this file.

OVERLOADED_APIS = {
Why is this list so long? Could it be added to the pre-commit whitelist?
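> Why is this list so long?

I removed the overloads that can be merged into a single signature.

> Could it be added to the pre-commit whitelist?

I don't think that's necessary; it reads a bit better once formatted.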
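> > Why is this list so long?
>
> I removed the overloads that can be merged into a single signature.
>
> > Could it be added to the pre-commit whitelist?
>
> I don't think that's necessary; it reads a bit better once formatted.

A whitelist only needs to work; readability generally isn't a concern. It shouldn't take up this many lines; otherwise, move it into a separate file.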
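
One way the "separate file" suggestion could look is sketched below. This is an assumption rather than what the PR does: the JSON file name, the `load_overloaded_apis` helper, and the entry `torch.example_api` are all made up, and the real structure of `OVERLOADED_APIS` is not visible in this thread.

```python
import json
import tempfile
from pathlib import Path


def load_overloaded_apis(path):
    """Read a JSON object assumed to map API name -> list of overload signatures."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Demo only: write a tiny data file, then load it the way a validation script might.
with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "overloaded_apis.json"  # hypothetical file name
    sample = {"torch.example_api": ["(input, dim)", "(input, *, other)"]}
    data_file.write_text(json.dumps(sample), encoding="utf-8")
    overloaded = load_overloaded_apis(data_file)

print(len(overloaded["torch.example_api"]))
```

Keeping the list in a data file leaves the script short, at the cost of one more file to keep in sync.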
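There still seem to be some errors, yet CI passed. Both of these tools still have a fair amount of room for improvement:
- the difference-doc format checker
- the difference-doc content checker
While fixing the existing docs, record the cases the tools miss or flag incorrectly; once the existing docs are fixed, start improving the tools.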
.../guides/model_convert/convert_from_pytorch/api_difference/args_name_diff/torch.Tensor.var.md
...l_convert/convert_from_pytorch/api_difference/args_name_diff/transformers.LogitsProcessor.md
...ides/model_convert/convert_from_pytorch/api_difference/composite_implement/torch.std_mean.md
...ides/model_convert/convert_from_pytorch/api_difference/composite_implement/torch.std_mean.md
...ides/model_convert/convert_from_pytorch/api_difference/composite_implement/torch.var_mean.md
## [ Only the API invocation differs ]transformers.LogitsProcessorList
### [transformers.LogitsProcessorList](https://hf-mirror.com/docs/transformers/v4.42.0/en/internal/generation_utils#transformers.LogitsProcessorList)
```python
transformers.LogitsProcessorList()
```
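Does it take any parameters? If there are too many, write them as **kwargs; otherwise users may mistake it for a function with no arguments.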
Same as above: formers implements this by inheriting from list, so the parameters are hard to write out.
...odel_convert/convert_from_pytorch/api_difference/torch_more_args/torch.linalg.matrix_rank.md
...odel_convert/convert_from_pytorch/api_difference/torch_more_args/torch.linalg.matrix_rank.md
...convert/convert_from_pytorch/api_difference/torch_more_args/torchvision.models.Inception3.md
zhwesky2010
left a comment
Let's merge this first and fix the rest in the next PR.
| transformers | PaddlePaddle | Notes |
| ----------------- | ----------------- | --------------------------------------- |
| input_ids | input_ids | Tensor of input token ids. |
| scores | logits | Tensor of scores; only the parameter name differs. |
Paddle has no scores parameter.
### [transformers.StoppingCriteriaList](https://github.com/huggingface/transformers/blob/d625294d79341662784495551abdf45e6cb9372f/src/transformers/generation/stopping_criteria.py#L503)

```python
transformers.StoppingCriteriaList()
```
This implementation inherits from Python's built-in List, and its parameters are the same as List's, so they are hard to write out.
Just write *args, **kwargs.
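
The point about inheriting from the built-in list can be illustrated with a small stand-in class. `CriteriaList` below is hypothetical and is not the transformers or Paddle implementation; it only shows that a `list` subclass accepts whatever `list` accepts, so an empty `()` in the docs understates what callers can pass.

```python
class CriteriaList(list):
    """Hypothetical stand-in for a class that inherits from the built-in list."""

    def __call__(self, value):
        # Report True if any stored criterion accepts the value.
        return any(criterion(value) for criterion in self)


# The constructor is inherited from list, so it takes an optional iterable.
criteria = CriteriaList([lambda v: v > 10, lambda v: v < 0])
empty = CriteriaList()

print(len(criteria))
print(criteria(42))
print(len(empty))
```

Writing the documented signature as `*args, **kwargs` signals that arguments are accepted without having to restate the constructor of `list`.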
This PR fixes inconsistencies between the difference docs and the mapping rules in PaConvert, mainly in the following categories