[Docs] updated lora adapter doc for replica feature redesign (#1813)

nurali-techie · web-flow · commit 4e0ec708df9a · 2025-12-03T10:24:15.000+08:00
docs: updated lora adapter doc for replica feature redesign

Signed-off-by: Nurali Techie <nurali.techie@gmail.com>
diff --git a/docs/source/features/lora-dynamic-loading.rst b/docs/source/features/lora-dynamic-loading.rst
@@ -336,16 +336,9 @@ Robust error handling for various network and startup conditions:
 Advanced Configuration Examples
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-**High-Availability Setup:**
+**High-Availability Setup with Multi-Replica:**
 
-Here's a production-ready configuration demonstrating reliability features:
-
-.. literalinclude:: ../../../samples/adapter/adapter-reliability-demo.yaml
-   :language: yaml
-
-**Multi-Replica Distribution:**
-
-For load balancing across multiple pods:
+Here’s a production-ready configuration demonstrating load balancing across multiple pods:
 
 .. literalinclude:: ../../../samples/adapter/adapter-multi-replica.yaml
    :language: yaml
@@ -355,11 +348,11 @@ For load balancing across multiple pods:
 
 .. code-block:: bash
 
-    # 1. Apply the reliability demo adapter:
-    kubectl apply -f samples/adapter/adapter-reliability-demo.yaml
+    # 1. Apply the multi-replica adapter:
+    kubectl apply -f samples/adapter/adapter-multi-replica.yaml
     
     # 2. Monitor phase transitions:
-    watch kubectl describe modeladapter reliability-demo-lora
+    watch kubectl describe modeladapter sample-lora-multi-replica
     
     # 3. Check controller logs for retry behavior:
     kubectl logs -n aibrix-system deployment/aibrix-controller-manager -f
@@ -369,7 +362,7 @@ For load balancing across multiple pods:
     # Watch how controller handles pod loss and reschedules
     
     # 5. Verify final state:
-    kubectl get modeladapter reliability-demo-lora -o yaml
+    kubectl get modeladapter sample-lora-multi-replica -o yaml
 
 Monitoring and Observability
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -432,9 +425,10 @@ Best Practices for Production
 
    .. code-block:: yaml
    
-       # Distribute adapter across multiple pods
+       # Distribute adapter across all pods
        spec:
-         replicas: 2  # Load adapter on 2 different pods
+         # Omit the replicas field to load the adapter on all matching pods
+         # replicas: 1  # Uncomment to load on a single pod instead
 
 3. **Configure Appropriate Health Checks:**
 
@@ -652,7 +646,7 @@ Setup: AWS S3 Storage
       # Reference to S3 credentials
       credentialsSecretRef:
         name: s3-credentials
-      replicas: 2
+      replicas: 1
 
 **Step 4: Verify Artifact Delegation**
 
diff --git a/samples/adapter/adapter-multi-replica.yaml b/samples/adapter/adapter-multi-replica.yaml
@@ -10,10 +10,10 @@ spec:
   # REQUIRED: Base model that this LoRA adapter extends
   baseModel: qwen-coder-1-5b-instruct
   
-  # Specify the number of replicas for the adapter
-  # The adapter will be loaded on this many pods matching the selector
+  # Leave replicas unset for this adapter
+  # The adapter is then loaded on all pods matching the selector
   # Provides high availability and load distribution
-  replicas: 3
+  # replicas: 1
   
   # Pod selector to identify which pods can host this adapter
   # Requires pods to have both labels for proper selection
@@ -41,7 +41,6 @@ spec:
     timeout: "120s"
   
   # Multi-replica behavior:
-  # - Controller ensures adapters are loaded on exactly 3 different pods
-  # - Automatic failover if any pod becomes unavailable
+  # - Controller ensures the adapter is loaded on all matching pods
   # - Load balancing across all healthy replicas
   # - Service discovery includes all replica endpoints
diff --git a/samples/adapter/adapter-reliability-demo.yaml b/samples/adapter/adapter-reliability-demo.yaml
diff --git a/samples/adapter/adapter.yaml b/samples/adapter/adapter.yaml
@@ -21,10 +21,11 @@ spec:
   # Supported formats: huggingface://, s3://, or absolute local path
   artifactURL: huggingface://ai-blond/Qwen-Qwen2.5-Coder-1.5B-Instruct-lora
   
-  # Optional: Number of replicas for the adapter (default: 1)
-  # The controller will load the adapter on this many pods
-  # Uncomment to enable high availability across multiple pods
-  # replicas: 2
+  # Replicas controls adapter distribution across pods:
+  # - nil (omitted): Load adapter on ALL matching pods (recommended)
+  # - 1: Load adapter on a single pod selected by the scheduler
+  # Only nil or 1 are supported. Other values will be rejected.
+  # replicas: 1
   
   # Optional: Scheduler to use for pod selection (default: "default")
   # Available schedulers: "default", "least-adapters"
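
The `replicas` rule stated in the new `adapter.yaml` comments (omitted means all matching pods, `1` means a single scheduler-selected pod, anything else is rejected) can be sketched as a small check. This is only an illustration: `replicas_mode` and its return strings are hypothetical names, not part of AIBrix, and the controller's actual validation code is not shown in this diff.

```python
from typing import Optional


def replicas_mode(replicas: Optional[int]) -> str:
    """Hypothetical helper mirroring the rule in the adapter.yaml comments."""
    if replicas is None:
        # Field omitted: the adapter is loaded on every pod matching the selector.
        return "all-matching-pods"
    if replicas == 1:
        # Exactly one pod, chosen by the configured scheduler.
        return "single-pod"
    # Any other value is unsupported according to the sample comments.
    raise ValueError(f"replicas must be omitted or 1, got {replicas!r}")
```

Under these assumptions, `replicas_mode(None)` returns `"all-matching-pods"`, while `replicas_mode(3)` raises `ValueError`, which is consistent with the multi-replica sample dropping `replicas: 3`.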