Add ascend 310p runtime #78

base: master
Conversation
Walkthrough

Reworks the custom inference runtime doc to use Xinference as the standard workflow example, adds concrete YAML runtime examples for MLServer and MindIE (Ascend NPU 310P), requires specific annotations and root/security context notes for MindIE, and introduces a comparison table summarizing runtimes and special requirements.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 3
🤖 Fix all issues with AI Agents
In @docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx:
- Around lines 597-599: The table entry uses an annotation key with an erroneous trailing colon; change `storage.kserve.io/readonly:` to `storage.kserve.io/readonly` in the Markdown table so the annotation key is valid in Kubernetes manifests and docs (update the cell under "Configuration Key" where `storage.kserve.io/readonly:` appears). Ensure no other instances of the key include the trailing colon. See the annotation sketch after this list.
- Around lines 601-603: The "Root Access" subsection currently states MindIE must run as root but lacks the actual configuration. Update the text under the heading to either clarify that the existing YAML example already runs as root, or add an explicit securityContext snippet showing how to run as root: set securityContext with runAsUser: 0 and runAsNonRoot: false and, if required for NPU access, privileged: true (apply this to the ClusterServingRuntime or InferenceService pod/container spec). Include a short note about the security implications and when privileged is necessary. See the securityContext sketch after this list.
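For the trailing-colon comment, a minimal sketch of how the corrected key would look on an InferenceService; the `"false"` value is illustrative, not taken from the doc:

```yaml
# Fragment of an InferenceService manifest (annotation value is illustrative)
metadata:
  annotations:
    storage.kserve.io/readonly: "false"  # the key itself carries no trailing colon
```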
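For the root-access comment, a sketch of the securityContext it describes; treating `privileged: true` as needed only for Ascend NPU device access is an assumption:

```yaml
# Container-level securityContext for the MindIE container in the
# ClusterServingRuntime (or InferenceService) spec
securityContext:
  runAsUser: 0          # run as root
  runAsNonRoot: false
  privileged: true      # assumption: only if NPU device access requires it; omit otherwise
```

Note that `privileged: true` grants broad host-device access, which is why the accompanying security note in the doc matters.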
🧹 Nitpick comments (4)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (4)
605-613: Consider relocating the comparison table earlier in the document.

The comparison table currently appears after all the detailed runtime examples. For a better user experience, consider moving it to appear right after the "Prerequisites" section (around line 49) or at the beginning of the "Standard Workflow" section. This allows readers to understand the differences between runtimes before diving into specific configurations.
324-561: Consider alternative approaches for the large embedded script.

The MindIE startup script is embedded as a 230+ line environment variable, with `__LT__` placeholders to work around YAML parsing issues. While functional, this approach has maintainability drawbacks:

- Difficult to version control, test, and debug separately
- The `__LT__` workaround adds cognitive overhead
- Changes require redeploying the entire ClusterServingRuntime

Alternative approaches to consider:

- ConfigMap: Store the script in a ConfigMap and mount it as a volume (see the sketch after this comment)
- Image bundling: Include the script in the MindIE image itself
- Init container: Use an init container to prepare the script from a ConfigMap

These alternatives improve maintainability while preserving functionality. However, if portability and self-contained configuration are priorities, the current approach is acceptable.
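As a concrete illustration of the ConfigMap option, a minimal sketch; the `mindie-startup` name, `/scripts` mount path, and container name are hypothetical:

```yaml
# ConfigMap holding the startup script (names/paths are hypothetical)
apiVersion: v1
kind: ConfigMap
metadata:
  name: mindie-startup
data:
  startup.sh: |
    #!/bin/bash
    # ... full MindIE startup script, with literal '<' characters ...
---
# Matching fragments in the ClusterServingRuntime container spec
containers:
- name: kserve-container
  command: ["/bin/bash", "/scripts/startup.sh"]
  volumeMounts:
  - name: startup-script
    mountPath: /scripts
volumes:
- name: startup-script
  configMap:
    name: mindie-startup
    defaultMode: 0755
```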
318-318: Be aware of the security implications of file permissions in the MindIE script.

The embedded startup script sets broad permissions (`chmod -R 755`) on `MODEL_PATH` and modifies permissions on the MindIE installation directory. Combined with running as root (documented in section 3), this creates a more permissive security posture than other runtimes such as MLServer, which runs as non-root user 1000. While these permissions may be necessary for NPU hardware access and the Ascend driver requirements, ensure that:

- The model storage volume doesn't contain sensitive data beyond the model files
- Network policies restrict access to the MindIE pods
- Users understand the security tradeoffs when deploying MindIE

Also applies to: 528-530
290-603: Consider consistent subsection formatting across runtime examples.

The MindIE section uses numbered subsections (1. ClusterServingRuntime, 2. Mandatory Annotations, 3. User Privileges), while the Xinference and MLServer examples don't use this format. For consistency, consider either:

- Using the same numbered subsection format for all runtime examples, or
- Using unnumbered subsections or plain headings throughout

Consistent formatting improves document readability and makes it easier for users to navigate between different runtime configurations.
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-31T02:30:16.360Z
Learnt from: EdisonSu768
Repo: alauda/aml-docs PR: 73
File: docs/en/monitoring_ops/resource_monitoring/how_to/add_monitor_dashboard.mdx:28-45
Timestamp: 2025-12-31T02:30:16.360Z
Learning: In MDX documentation files (e.g., docs/.../*.mdx), when including PromQL code blocks, use bash as the syntax highlighter fallback because the rspress system does not support PromQL highlighting. Ensure the code blocks specify the language as bash (e.g., ```bash) where PromQL would appear, to maintain readability and avoid broken highlighting.
Applied to files:
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Pipelines as Code CI / doc-pr-build-ai
Deploying alauda-ai with Cloudflare Pages

Latest commit: b9dac4d
Status: ✅ Deploy successful!
Preview URL: https://c06ff28e.alauda-ai.pages.dev
Branch Preview URL: https://add-310p-rt.alauda-ai.pages.dev
Actionable comments posted: 4
🤖 Fix all issues with AI Agents
In @docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx:
- Around lines 318-320: Replace the recursive permission change (`chmod -R 755 "${MODEL_PATH}"`) with a non-recursive change to the directory (`chmod 755` on MODEL_PATH), or remove it entirely and instead investigate/fix the storage initializer that sets model file permissions. Document the security implications of making model files world-readable, and only widen permissions when absolutely required. Also reduce noisy logging by keeping a single `ls -ld "${MODEL_PATH}"` (or removing the pre-change listing) rather than logging before and after on every startup. See the chmod sketch after this list.
- Line 453: The default PORT value of 8080 conflicts with the port-range validation that enforces 1025-65535. Either change the default PORT to a value that satisfies the validator (e.g., 1025 or 8081), or relax the validator to accept 8080 by adjusting the lower-bound check in the port validation logic (the check enforcing 1025-65535 on lines 512-514) so that the PORT environment variable is accepted. See the PORT sketch after this list.
- Around lines 582-589: The startupProbe's per-attempt timeout is excessively long (timeoutSeconds: 180). Reduce timeoutSeconds to a short value (e.g., 10) while keeping failureThreshold: 60 and periodSeconds: 10 so the overall retry window remains unchanged. Update the startupProbe block (the httpGet path/port/scheme and timeoutSeconds field) to set timeoutSeconds: 10 so individual probes fail fast while the probe still retries up to failureThreshold × periodSeconds. See the probe sketch after this list.
- Line 394: The error message "Error: Unable to retrieve device info. Please check if npu-smi is available for current user (id 1001), or if you are specifying an occupied device." should be updated to remove or clarify the hard-coded user id. Replace it with a generic reference like "current user", or interpolate the actual runtime UID if available, so it matches how the MindIE runtime actually runs (often root). Locate that exact echo message in custom_inference_runtime.mdx and either omit "(id 1001)" or substitute a dynamic UID placeholder, adjusting the surrounding text to remain consistent. See the echo sketch after this list.
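For the permission comment above, a sketch of the non-recursive change, assuming the script's existing `MODEL_PATH` variable:

```bash
# Only make the model directory itself traversable; leave file modes alone
chmod 755 "${MODEL_PATH}"
ls -ld "${MODEL_PATH}"  # single diagnostic listing instead of before/after logging
```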
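For the PORT comment, a sketch of the simpler option; the 8081 default is illustrative. The alternative of relaxing the lower bound would instead touch the range check at lines 512-514, shown in `-lt`/`-gt` form under the nitpick that follows.

```bash
# Illustrative: move the default into the validated range
PORT="${PORT:-8081}"
```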
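For the startupProbe comment, a sketch with the suggested values; the `/health` path and port name are illustrative, not taken from the doc:

```yaml
startupProbe:
  httpGet:
    path: /health   # illustrative; keep the doc's actual path/port/scheme
    port: http1
  periodSeconds: 10
  failureThreshold: 60   # 60 × 10s keeps the same overall startup window
  timeoutSeconds: 10     # each attempt now fails fast instead of hanging for 180s
```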
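For the error-message comment, a sketch interpolating the runtime UID via `id -u`:

```bash
# Interpolate the actual UID instead of hard-coding 1001
echo "Error: Unable to retrieve device info. Please check if npu-smi is available for current user (id $(id -u)), or if you are specifying an occupied device." >&2
```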
🧹 Nitpick comments (2)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (2)
310-312: Consider simplifying the `__LT__` workaround.

The current approach uses `__LT__` as a placeholder for `<` in the embedded script, which is then replaced at runtime. While functional, this adds cognitive overhead and an extra processing step. Alternative approaches to consider:

- Use a YAML multiline literal (`|`) with proper escaping – the current approach already uses `|`, so shell comparison operators like `<` should work directly in most cases
- Use arithmetic comparison syntax that doesn't require `<` at all (see the sketch after this comment): `if [ "$PORT" -lt 1025 ] || [ "$PORT" -gt 65535 ]; then`
- Test whether the current `<` actually causes YAML parsing issues – modern YAML parsers handle `<` in literal blocks without problems

If you choose to keep the current approach, document it with a comment explaining why the workaround is necessary.

Also applies to: 512-513
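A sketch of the `-lt`/`-gt` form, which needs no `<` at all and therefore no placeholder substitution pass:

```bash
# No '<' characters, so no __LT__ replacement is required
if [ "$PORT" -lt 1025 ] || [ "$PORT" -gt 65535 ]; then
  echo "Error: PORT out of range: $PORT" >&2
  exit 1
fi
```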
310-561: Consider externalizing the MindIE startup script.

The embedded bash script spans ~235 lines (lines 326-561), making the YAML configuration difficult to read and maintain. While functionally correct, this approach has several drawbacks:

- Harder to review and test the script independently
- YAML syntax highlighting doesn't help with bash code
- Version control diffs become less readable
- The `__LT__` workaround adds complexity

Consider these alternatives:

- Ship the script as part of the container image at a known path like `/opt/mindie/startup.sh`, then reference it directly (see the sketch after this comment)
- Use a ConfigMap to store the script and mount it into the container
- Keep the current approach if dynamic script generation from the YAML is a hard requirement

Note: This is a maintainability suggestion, not a functional issue. The current implementation works correctly.
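For the image-bundling option, a sketch of the container fragment; the image reference is illustrative, and `/opt/mindie/startup.sh` is the path suggested above:

```yaml
# ClusterServingRuntime fragment invoking a script baked into the image
containers:
- name: kserve-container
  image: registry.example.com/mindie:latest  # illustrative image reference
  command: ["/bin/bash", "/opt/mindie/startup.sh"]
```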
🔇 Additional comments (5)

docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (5)

50-53: LGTM! Clear section restructuring.

Renaming the section to "Standard Workflow (Example: Xinference)" effectively establishes Xinference as the baseline reference implementation while making it clear that the workflow applies more broadly.

201-283: LGTM! Well-structured MLServer runtime example.

The MLServer configuration demonstrates best practices with a proper security context (non-root user), startup probes, and flexible model directory handling for different storage initializers.

609-613: LGTM! Root access requirements clearly documented.

The explanation effectively clarifies that MindIE requires root privileges by contrasting it with MLServer's explicit non-root security context. The note that the absence of a `securityContext` in the YAML means the container runs with image defaults (typically root) is helpful.

615-623: LGTM! Helpful comparison table.

The comparison table provides a clear, actionable summary of the key differences between runtimes, making it easy for users to understand special requirements at a glance before diving into detailed configurations.

601-608: No action required: the annotation documentation is complete.

The section correctly identifies `storage.kserve.io/readonly` as the required InferenceService metadata annotation for NPU binding. A codebase search for other Ascend/NPU-related annotations found no additional requirements for InferenceService metadata. The `aml-model-repo` and `aml-pipeline-tag` annotations referenced in the runtime templates are custom application-level annotations, not NPU-specific binding requirements.