Commit 4e0ec70

[Docs] updated lora adapter doc for replica feature redesign (#1813)
docs: updated lora adapter doc for replica feature redesign

Signed-off-by: Nurali Techie <nurali.techie@gmail.com>
1 parent f12a6a7 commit 4e0ec70

4 files changed: +19 -58 lines changed

docs/source/features/lora-dynamic-loading.rst

Lines changed: 10 additions & 16 deletions
@@ -336,16 +336,9 @@ Robust error handling for various network and startup conditions:
 Advanced Configuration Examples
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

-**High-Availability Setup:**
+**High-Availability Setup with Multi-Replica:**

-Here's a production-ready configuration demonstrating reliability features:
-
-.. literalinclude:: ../../../samples/adapter/adapter-reliability-demo.yaml
-   :language: yaml
-
-**Multi-Replica Distribution:**
-
-For load balancing across multiple pods:
+Here’s a production-ready configuration demonstrating load balancing across multiple pods:

 .. literalinclude:: ../../../samples/adapter/adapter-multi-replica.yaml
    :language: yaml
@@ -355,11 +348,11 @@ For load balancing across multiple pods:

 .. code-block:: bash

-   # 1. Apply the reliability demo adapter:
-   kubectl apply -f samples/adapter/adapter-reliability-demo.yaml
+   # 1. Apply the multi-replica adapter:
+   kubectl apply -f samples/adapter/adapter-multi-replica.yaml

    # 2. Monitor phase transitions:
-   watch kubectl describe modeladapter reliability-demo-lora
+   watch kubectl describe modeladapter sample-lora-multi-replica

    # 3. Check controller logs for retry behavior:
    kubectl logs -n aibrix-system deployment/aibrix-controller-manager -f
@@ -369,7 +362,7 @@ For load balancing across multiple pods:
    # Watch how controller handles pod loss and reschedules

    # 5. Verify final state:
-   kubectl get modeladapter reliability-demo-lora -o yaml
+   kubectl get modeladapter sample-lora-multi-replica -o yaml

 Monitoring and Observability
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
@@ -432,9 +425,10 @@ Best Practices for Production

 .. code-block:: yaml

-   # Distribute adapter across multiple pods
+   # Distribute adapter across all pods
    spec:
-     replicas: 2 # Load adapter on 2 different pods
+     # Load adapter on all pods by omitting replicas field
+     # replicas: 1

 3. **Configure Appropriate Health Checks:**

@@ -652,7 +646,7 @@ Setup: AWS S3 Storage
 # Reference to S3 credentials
 credentialsSecretRef:
   name: s3-credentials
-replicas: 2
+replicas: 1

 **Step 4: Verify Artifact Delegation**

samples/adapter/adapter-multi-replica.yaml

Lines changed: 4 additions & 5 deletions
@@ -10,10 +10,10 @@ spec:
 # REQUIRED: Base model that this LoRA adapter extends
 baseModel: qwen-coder-1-5b-instruct

-# Specify the number of replicas for the adapter
-# The adapter will be loaded on this many pods matching the selector
+# Do not specify the replicas for the adapter
+# The adapter will be loaded on all pods
 # Provides high availability and load distribution
-replicas: 3
+# replicas: 1

 # Pod selector to identify which pods can host this adapter
 # Requires pods to have both labels for proper selection
@@ -41,7 +41,6 @@ spec:
 timeout: "120s"

 # Multi-replica behavior:
-# - Controller ensures adapters are loaded on exactly 3 different pods
-# - Automatic failover if any pod becomes unavailable
+# - Controller ensures adapters are loaded on all pods
 # - Load balancing across all healthy replicas
 # - Service discovery includes all replica endpoints

samples/adapter/adapter-reliability-demo.yaml

Lines changed: 0 additions & 33 deletions
This file was deleted.

samples/adapter/adapter.yaml

Lines changed: 5 additions & 4 deletions
@@ -21,10 +21,11 @@ spec:
 # Supported formats: huggingface://, s3://, or absolute local path
 artifactURL: huggingface://ai-blond/Qwen-Qwen2.5-Coder-1.5B-Instruct-lora

-# Optional: Number of replicas for the adapter (default: 1)
-# The controller will load the adapter on this many pods
-# Uncomment to enable high availability across multiple pods
-# replicas: 2
+# Replicas controls adapter distribution across pods:
+# - nil (omitted): Load adapter on ALL matching pods (recommended)
+# - 1: Load adapter on a single pod selected by the scheduler
+# Only nil or 1 are supported. Other values will be rejected.
+# replicas: 1

 # Optional: Scheduler to use for pod selection (default: "default")
 # Available schedulers: "default", "least-adapters"
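
The new `replicas` comments in `samples/adapter/adapter.yaml` describe a three-way rule: omitted means all matching pods, `1` means a single scheduler-selected pod, and anything else is rejected. As a reading aid, that rule can be sketched in Python as below. This is an illustration only, not the controller's actual validation code; the function name `resolve_replica_mode` and the returned mode strings are hypothetical.

```python
from typing import Optional


def resolve_replica_mode(replicas: Optional[int]) -> str:
    """Map a ModelAdapter `replicas` value to a distribution mode.

    Hypothetical helper that mirrors the comments in adapter.yaml;
    the mode names are made up for this sketch.
    """
    if replicas is None:
        # Field omitted (nil): adapter is loaded on ALL matching pods.
        return "all-matching-pods"
    # Reject non-integers; bool is excluded because True == 1 in Python.
    if isinstance(replicas, bool) or not isinstance(replicas, int):
        raise ValueError(f"replicas must be omitted or 1, got {replicas!r}")
    if replicas == 1:
        # Adapter is loaded on a single pod chosen by the scheduler.
        return "single-pod"
    # Per the sample comments, only nil or 1 are supported.
    raise ValueError(f"replicas must be omitted or 1, got {replicas!r}")


if __name__ == "__main__":
    for value in (None, 1, 3):
        try:
            print(repr(value), "->", resolve_replica_mode(value))
        except ValueError as err:
            print(repr(value), "-> rejected:", err)
```

Under this reading, the old samples that set `replicas: 2` or `replicas: 3` would now be rejected, which matches the diff replacing them with an omitted field or `replicas: 1`.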
