
Conversation

@fyuan1316 (Contributor) commented Jan 6, 2026

Summary by CodeRabbit

  • Documentation
    • Reorganized custom inference runtime guide with a clear "Standard Workflow (Example: Xinference)" and step-by-step flow.
    • Added concrete runtime examples (Xinference, MLServer, MindIE) with full configuration and operational guidance.
    • Introduced a comparison table summarizing runtimes, hardware targets, frameworks, and special requirements.
    • Documented MindIE-specific requirements: mandatory NPU-binding annotations and root/security-context adjustments for the image.


@coderabbitai bot commented Jan 6, 2026

Walkthrough

Reworks the custom inference runtime doc to use Xinference as the standard workflow example, adds concrete YAML runtime examples for MLServer and MindIE (Ascend NPU 310P), requires specific annotations and root/security context notes for MindIE, and introduces a comparison table summarizing runtimes and special requirements.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation**<br/>`docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx` | Replaced generic "Steps" with "Standard Workflow (Example: Xinference)"; clarified runtime creation to reference ClusterServingRuntime; added "Specific Runtime Examples" containing full YAMLs for MLServer and MindIE (including startup scripts, env setup, probes); documented mandatory MindIE annotations and root/security-context requirement; added a comparison table of runtimes, hardware targets, frameworks, and special requirements. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Poem

🐰 In docs we hop and nibble clear,
Xinference first, then runtimes near,
MLServer sings, MindIE hums in root,
YAML snug, annotations suit,
A rabbit cheers—deploy with cheer! 🥕

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add ascend 310p runtime' accurately describes the main addition in the PR: introducing MindIE runtime documentation for Ascend NPU 310P hardware. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Fix all issues with AI Agents
In `docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx`:

- Around lines 597-599: The table entry uses an annotation key with an erroneous trailing colon. Change `storage.kserve.io/readonly:` to `storage.kserve.io/readonly` in the Markdown table so the annotation key is valid in Kubernetes manifests and docs (update the cell under "Configuration Key" where `storage.kserve.io/readonly:` appears), and ensure no other instances of the key include the trailing colon.
- Around lines 601-603: The "Root Access" subsection states that MindIE must run as root but lacks the actual configuration. Either clarify that the existing YAML example already runs as root, or add an explicit securityContext snippet showing how to run as root: set `runAsUser: 0` and `runAsNonRoot: false` and, if required for NPU access, `privileged: true` (applied to the ClusterServingRuntime or InferenceService pod/container spec). Include a short note about the security implications and when privileged is necessary. A sketch of both fixes follows this list.
🧹 Nitpick comments (4)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (4)

605-613: Consider relocating the comparison table earlier in the document.

The comparison table currently appears after all the detailed runtime examples. For better user experience, consider moving this table to appear right after the "Prerequisites" section (around line 49) or at the beginning of the "Standard Workflow" section. This allows readers to understand the differences between runtimes before diving into specific configurations.


324-561: Consider alternative approaches for the large embedded script.

The MindIE startup script is embedded as a 230+ line environment variable, with __LT__ placeholders to work around YAML parsing issues. While functional, this approach has maintainability drawbacks:

  • Difficult to version control, test, and debug separately
  • __LT__ workaround adds cognitive overhead
  • Changes require redeploying the entire ClusterServingRuntime

Alternative approaches to consider:

  1. ConfigMap: Store the script in a ConfigMap and mount it as a volume
  2. Image bundling: Include the script in the MindIE image itself
  3. Init container: Use an init container to prepare the script from a ConfigMap

These alternatives improve maintainability while preserving functionality. However, if portability and self-contained configuration are priorities, the current approach is acceptable.
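For alternative 1, a minimal ConfigMap-based sketch (the `mindie-startup` name and `/scripts` mount path are hypothetical):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mindie-startup
data:
  startup.sh: |
    #!/bin/bash
    # ...the MindIE startup logic, kept as plain bash (no __LT__ escaping needed)...
---
# Fragment of the ClusterServingRuntime container spec that consumes it:
spec:
  containers:
    - name: kserve-container
      command: ["/scripts/startup.sh"]
      volumeMounts:
        - name: startup
          mountPath: /scripts
  volumes:
    - name: startup
      configMap:
        name: mindie-startup
        defaultMode: 0755
```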


318-318: Be aware of the security implications of file permissions in the MindIE script.

The embedded startup script sets broad permissions (chmod -R 755) on MODEL_PATH and modifies permissions on the MindIE installation directory. Combined with running as root (documented in section 3), this creates a more permissive security posture compared to other runtimes like MLServer, which runs as non-root user 1000.

While these permissions may be necessary for NPU hardware access and the Ascend driver requirements, ensure that:

  1. The model storage volume doesn't contain sensitive data beyond the model files
  2. Network policies restrict access to the MindIE pods
  3. Users understand the security tradeoffs when deploying MindIE

Also applies to: 528-530
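If the broad permissions turn out to be unnecessary, a narrower variant of the script's permission step might look like this (sketch only; `MODEL_PATH` as in the embedded script):

```bash
# Open only the directory itself rather than every file under it
chmod 755 "${MODEL_PATH}"
ls -ld "${MODEL_PATH}"   # single listing for the startup log
```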


290-603: Consider consistent subsection formatting across runtime examples.

The MindIE section uses numbered subsections (1.ClusterServingRuntime, 2.Mandatory Annotations, 3.User Privileges), while the Xinference and MLServer examples don't use this format. For consistency, consider either:

  1. Using the same numbered subsection format for all runtime examples, or
  2. Using unnumbered subsections or plain headings throughout

Consistent formatting improves document readability and makes it easier for users to navigate between different runtime configurations.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bded472 and 22710af.

📒 Files selected for processing (1)
  • docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-31T02:30:16.360Z
Learnt from: EdisonSu768
Repo: alauda/aml-docs PR: 73
File: docs/en/monitoring_ops/resource_monitoring/how_to/add_monitor_dashboard.mdx:28-45
Timestamp: 2025-12-31T02:30:16.360Z
Learning: In MDX documentation files (e.g., docs/.../*.mdx), when including PromQL code blocks, use bash as the syntax highlighter fallback because the rspress system does not support PromQL highlighting. Ensure the code blocks specify the language as bash (e.g., ```bash) where PromQL would appear, to maintain readability and avoid broken highlighting.

Applied to files:

  • docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pipelines as Code CI / doc-pr-build-ai

@cloudflare-workers-and-pages bot commented Jan 6, 2026

Deploying alauda-ai with Cloudflare Pages

Latest commit: b9dac4d
Status: ✅  Deploy successful!
Preview URL: https://c06ff28e.alauda-ai.pages.dev
Branch Preview URL: https://add-310p-rt.alauda-ai.pages.dev



@coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Fix all issues with AI Agents
In `docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx`:

- Around lines 318-320: Replace the recursive permission change (`chmod -R 755 "${MODEL_PATH}"`) with a non-recursive `chmod 755` on `MODEL_PATH`, or remove it entirely and instead fix the storage initializer that sets the model file permissions. Document the security implications of making model files world-readable and only widen permissions when absolutely required. Also reduce noisy logging by keeping a single `ls -ld "${MODEL_PATH}"` rather than logging before and after on every startup.
- Line 453: The default `PORT` value of 8080 conflicts with the port-range validation that enforces 1025-65535. Either change the default to a value the validator accepts (e.g., 1025 or 8081) or relax the validator's lower bound (the check on lines 512-514) so that 8080 is allowed.
- Around lines 582-589: The startupProbe's per-attempt timeout is excessively long (`timeoutSeconds: 180`). Reduce it to a short value (e.g., 10) while keeping `failureThreshold: 60` and `periodSeconds: 10`, so individual probes fail fast but the overall retry window (failureThreshold × periodSeconds) is unchanged. A sketch follows this list.
- Line 394: The error message "Error: Unable to retrieve device info. Please check if npu-smi is available for current user (id 1001), or if you are specifying an occupied device." hard-codes a user id that does not match how the MindIE runtime usually runs (often root). Either omit "(id 1001)" or interpolate the actual runtime UID, and adjust the surrounding text to remain consistent; a sketch follows this list.
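A sketch of the adjusted probe (the httpGet path and port are placeholders; keep the doc's actual values):

```yaml
startupProbe:
  httpGet:
    path: /health        # placeholder; keep the doc's path/port/scheme
    port: http1
  periodSeconds: 10
  failureThreshold: 60   # overall window unchanged: 60 × 10s ≈ 600s
  timeoutSeconds: 10     # was 180; each attempt now fails fast
```

And a possible form of the UID-neutral error message, interpolating the runtime user instead of hard-coding it:

```bash
echo "Error: Unable to retrieve device info. Please check if npu-smi is available for current user (id $(id -u)), or if you are specifying an occupied device."
```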
🧹 Nitpick comments (2)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (2)

310-312: Consider simplifying the __LT__ workaround.

The current approach uses __LT__ as a placeholder for < in the embedded script, which is then replaced at runtime. While functional, this adds cognitive overhead and an extra processing step.

Alternative approaches to consider:

  1. Use YAML multiline literal (|) with proper escaping – the current approach already uses |, so shell comparison operators like < should work directly in most cases
  2. Use arithmetic comparison syntax that doesn't require <:
    if [ "$PORT" -lt 1025 ] || [ "$PORT" -gt 65535 ]; then
  3. Test if the current < actually causes YAML parsing issues – modern YAML parsers handle < in literal blocks without problems

If you choose to keep the current approach, ensure it's documented with a comment explaining why the workaround is necessary.

Also applies to: 512-513
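A sketch of option 2, which sidesteps the `__LT__` placeholder entirely by using test operators rather than a literal `<`:

```bash
# Range check with -lt/-gt instead of the < comparison operator
if [ "${PORT}" -lt 1025 ] || [ "${PORT}" -gt 65535 ]; then
  echo "Error: PORT must be within 1025-65535, got ${PORT}" >&2
  exit 1
fi
```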


310-561: Consider externalizing the MindIE startup script.

The embedded bash script spans ~235 lines (lines 326-561), making the YAML configuration difficult to read and maintain. While functionally correct, this approach has several drawbacks:

  • Harder to review and test the script independently
  • YAML syntax highlighting doesn't help with bash code
  • Version control diffs become less readable
  • The __LT__ workaround adds complexity

Consider these alternatives:

  1. Ship the script as part of the container image at a known path like /opt/mindie/startup.sh, then reference it directly
  2. Use a ConfigMap to store the script and mount it into the container
  3. Keep the current approach if dynamic script generation from the YAML is a hard requirement

Note: This is a maintainability suggestion, not a functional issue. The current implementation works correctly.
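For alternative 1, the runtime entry could then shrink to a single command line (the image reference and the `/opt/mindie/startup.sh` path are the hypothetical ones from the comment above):

```yaml
spec:
  containers:
    - name: kserve-container
      image: example.registry/mindie:310p   # hypothetical image reference
      command: ["/opt/mindie/startup.sh"]
```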

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22710af and b9dac4d.

📒 Files selected for processing (1)
  • docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-12-31T02:30:16.360Z
Learnt from: EdisonSu768
Repo: alauda/aml-docs PR: 73
File: docs/en/monitoring_ops/resource_monitoring/how_to/add_monitor_dashboard.mdx:28-45
Timestamp: 2025-12-31T02:30:16.360Z
Learning: In MDX documentation files (e.g., docs/.../*.mdx), when including PromQL code blocks, use bash as the syntax highlighter fallback because the rspress system does not support PromQL highlighting. Ensure the code blocks specify the language as bash (e.g., ```bash) where PromQL would appear, to maintain readability and avoid broken highlighting.

Applied to files:

  • docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Pipelines as Code CI / doc-pr-build-ai
🔇 Additional comments (5)
docs/en/model_inference/inference_service/how_to/custom_inference_runtime.mdx (5)

50-53: LGTM! Clear section restructuring.

The renaming of the section to "Standard Workflow (Example: Xinference)" effectively establishes Xinference as the baseline reference implementation while making it clear that the workflow applies more broadly.


201-283: LGTM! Well-structured MLServer runtime example.

The MLServer configuration demonstrates best practices with proper security context (non-root user), startup probes, and flexible model directory handling for different storage initializers.


609-613: LGTM! Root access requirements clearly documented.

The explanation effectively clarifies that MindIE requires root privileges by contrasting it with MLServer's explicit non-root security context. The note that the absence of a securityContext in the YAML means the container runs with image defaults (typically root) is helpful.


615-623: LGTM! Helpful comparison table.

The comparison table provides a clear, actionable summary of the key differences between runtimes, making it easy for users to understand special requirements at a glance before diving into detailed configurations.


601-608: No action required—the annotation documentation is complete.

The section correctly identifies storage.kserve.io/readonly as the required InferenceService metadata annotation for NPU binding. A codebase search for other Ascend/NPU-related annotations found no additional requirements for InferenceService metadata. The aml-model-repo and aml-pipeline-tag annotations referenced in the runtime templates are custom application-level annotations, not NPU-specific binding requirements.

