Commit 8b5b2d1

docs: move network volume troubleshooting to dedicated guide
1 parent 1f813ea commit 8b5b2d1

3 files changed

+150
-83
lines changed

docs/configuration.md

Lines changed: 1 addition & 81 deletions
Original file line number | Diff line number | Diff line change
@@ -23,91 +23,11 @@ This document outlines the environment variables available for configuring the `
2323
| `WEBSOCKET_RECONNECT_ATTEMPTS` | Number of websocket reconnection attempts when connection drops during job execution. | `5` |
2424
| `WEBSOCKET_RECONNECT_DELAY_S` | Delay in seconds between websocket reconnection attempts. | `3` |
2525
| `WEBSOCKET_TRACE` | Enable low-level websocket frame tracing for protocol debugging. Set to `true` only when diagnosing connection issues. | `false` |
26-
| `NETWORK_VOLUME_DEBUG` | Enable detailed network volume diagnostics in worker logs. Useful for debugging model path issues. See [Network Volume Configuration](#network-volume-configuration) below. | `false` |
26+
| `NETWORK_VOLUME_DEBUG` | Enable detailed network volume diagnostics in worker logs. Useful for debugging model path issues. See [Network Volumes & Model Paths](network-volumes.md). | `false` |
2727

2828
> [!TIP]
2929
> **For troubleshooting:** Set `COMFY_LOG_LEVEL=DEBUG` to get detailed logs when ComfyUI crashes or behaves unexpectedly. This helps identify the exact point of failure in your workflows.
3030
31-
## Network Volume Configuration
32-
33-
When using a RunPod network volume to store your models, the worker expects a specific directory structure. If ComfyUI is not finding your models, enable diagnostics by setting `NETWORK_VOLUME_DEBUG=true`.
34-
35-
### Expected Directory Structure
36-
37-
Models must be placed in the following structure on your network volume:
38-
39-
```
40-
/runpod-volume/
41-
└── models/
42-
├── checkpoints/ # Stable Diffusion checkpoints (.safetensors, .ckpt)
43-
├── loras/ # LoRA files (.safetensors, .pt)
44-
├── vae/ # VAE models (.safetensors, .pt)
45-
├── clip/ # CLIP models (.safetensors, .pt)
46-
├── clip_vision/ # CLIP Vision models
47-
├── controlnet/ # ControlNet models (.safetensors, .pt)
48-
├── embeddings/ # Textual inversion embeddings (.safetensors, .pt)
49-
├── upscale_models/ # Upscaling models (.safetensors, .pt)
50-
├── unet/ # UNet models
51-
└── configs/ # Model configs (.yaml, .json)
52-
```
53-
54-
### Supported File Extensions
55-
56-
ComfyUI only recognizes files with specific extensions:
57-
58-
| Model Type | Supported Extensions |
59-
| ---------------- | ----------------------------------- |
60-
| Checkpoints | `.safetensors`, `.ckpt`, `.pt`, `.pth`, `.bin` |
61-
| LoRAs | `.safetensors`, `.pt` |
62-
| VAE | `.safetensors`, `.pt`, `.bin` |
63-
| CLIP | `.safetensors`, `.pt`, `.bin` |
64-
| ControlNet | `.safetensors`, `.pt`, `.pth`, `.bin` |
65-
| Embeddings | `.safetensors`, `.pt`, `.bin` |
66-
| Upscale Models | `.safetensors`, `.pt`, `.pth` |
67-
68-
> [!WARNING]
69-
> **Common Issues:**
70-
> - Models placed directly in `/runpod-volume/checkpoints/` instead of `/runpod-volume/models/checkpoints/` will not be found.
71-
> - Files with incorrect extensions (e.g., `.txt`, `.zip`) will be ignored.
72-
> - Empty directories or missing subdirectories are fine—only create the folders you need.
73-
74-
### Debugging Network Volume Issues
75-
76-
1. **Enable diagnostics** by adding `NETWORK_VOLUME_DEBUG=true` to your endpoint's environment variables.
77-
78-
2. **Send a test request** to your endpoint (any request will trigger the diagnostics).
79-
80-
3. **Check the worker logs** in the RunPod console. You'll see detailed output like:
81-
82-
```
83-
======================================================================
84-
NETWORK VOLUME DIAGNOSTICS (NETWORK_VOLUME_DEBUG=true)
85-
======================================================================
86-
87-
[1] Checking extra_model_paths.yaml configuration...
88-
✓ FOUND: /comfyui/extra_model_paths.yaml
89-
90-
[2] Checking network volume mount at /runpod-volume...
91-
✓ MOUNTED: /runpod-volume
92-
93-
[3] Checking directory structure...
94-
✓ FOUND: /runpod-volume/models
95-
96-
[4] Scanning model directories...
97-
98-
checkpoints/:
99-
- my-model.safetensors (6.5 GB)
100-
101-
loras/:
102-
- style-lora.safetensors (144.2 MB)
103-
104-
[5] Summary
105-
✓ Models found on network volume!
106-
======================================================================
107-
```
108-
109-
4. **Disable diagnostics** once your issue is resolved by removing the environment variable or setting it to `false`.
110-
11131
## AWS S3 Upload Configuration
11232

11333
Configure these variables **only** if you want the worker to upload generated images directly to an AWS S3 bucket. If these are not set, images will be returned as base64-encoded strings in the API response.

docs/deployment.md

Lines changed: 2 additions & 2 deletions
@@ -16,7 +16,7 @@ This is the simplest method if the official images meet your needs.
1616
- Container Registry Credentials: Leave as default (images are public).
1717
- Container Disk: Adjust based on the chosen image tag, see [GPU Recommendations](#gpu-recommendations).
1818
- (optional) Environment Variables: Configure S3 or other settings (see [Configuration Guide](configuration.md)).
19-
- Note: If you don't configure S3, images are returned as base64. For persistent storage across jobs without S3, consider using a [Network Volume](customization.md#method-2-network-volume-alternative-for-models).
19+
- Note: If you don't configure S3, images are returned as base64. For persistent storage across jobs without S3, consider using a [Network Volume](customization.md#method-2-network-volume-alternative-for-models). If models on your network volume are not being detected, see [Network Volumes & Model Paths](network-volumes.md) for troubleshooting steps.
2020
- Click on `Save Template`
2121

2222
### Create your endpoint
@@ -32,7 +32,7 @@ This is the simplest method if the official images meet your needs.
3232
- Idle Timeout: `5` (Default is usually fine, adjust if needed).
3333
- Flash Boot: `enabled` (Recommended for faster worker startup).
3434
- Select Template: `worker-comfyui` (or the name you gave your template).
35-
- (optional) Advanced: If you are using a Network Volume, select it under `Select Network Volume`. See the [Customization Guide](customization.md#method-2-network-volume-alternative-for-models).
35+
- (optional) Advanced: If you are using a Network Volume, select it under `Select Network Volume`. See the [Customization Guide](customization.md#method-2-network-volume-alternative-for-models). For detailed model path layout and debugging tips, see [Network Volumes & Model Paths](network-volumes.md).
3636

3737
- Click `deploy`
3838
- Your endpoint will be created. You can click on it to view the dashboard and find its ID.

docs/network-volumes.md

Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
1+
# Network Volumes & Model Paths
2+
3+
This document explains how to use RunPod **Network Volumes** with `worker-comfyui`, how model paths are resolved inside the container, and how to debug cases where models are not detected.
4+
5+
> **Scope**
6+
>
7+
> These instructions apply to **serverless endpoints** using this worker. Pods mount network volumes at `/workspace` by default, while serverless workers see them at `/runpod-volume`.
8+
9+
## Directory Mapping
10+
11+
For **serverless endpoints**:
12+
13+
- Network volume root is mounted at: `/runpod-volume`
14+
- ComfyUI models are expected under: `/runpod-volume/models/...`
15+
16+
For **Pods**:
17+
18+
- Network volume root is mounted at: `/workspace`
19+
- Equivalent ComfyUI model path: `/workspace/models/...`
20+
21+
If you use the S3-compatible API, the same paths map as:
22+
23+
- Serverless: `/runpod-volume/my-folder/file.txt`
24+
- Pod: `/workspace/my-folder/file.txt`
25+
- S3 API: `s3://<NETWORK_VOLUME_ID>/my-folder/file.txt`
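
The mapping above can be sketched as a small helper that turns one volume-relative path into its three equivalent forms. This is only an illustration: `volume_paths` and its parameter names are hypothetical and are not part of `worker-comfyui` or any RunPod SDK.

```python
from pathlib import PurePosixPath

# Mount points as described in this guide.
SERVERLESS_ROOT = PurePosixPath("/runpod-volume")
POD_ROOT = PurePosixPath("/workspace")


def volume_paths(relative_path: str, volume_id: str) -> dict[str, str]:
    """Return the serverless, Pod, and S3 API forms of a volume-relative path.

    `relative_path` is relative to the network volume root;
    `volume_id` stands in for <NETWORK_VOLUME_ID>.
    """
    rel = PurePosixPath(relative_path.lstrip("/"))
    return {
        "serverless": str(SERVERLESS_ROOT / rel),
        "pod": str(POD_ROOT / rel),
        "s3": f"s3://{volume_id}/{rel}",
    }


if __name__ == "__main__":
    for kind, path in volume_paths("my-folder/file.txt", "<NETWORK_VOLUME_ID>").items():
        print(f"{kind}: {path}")
```

A file uploaded to a Pod under `/workspace/models/checkpoints/` is therefore visible to a serverless worker under `/runpod-volume/models/checkpoints/`.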
26+
27+
## Expected Directory Structure
28+
29+
Models must be placed in the following structure on your network volume:
30+
31+
```text
32+
/runpod-volume/
33+
└── models/
34+
├── checkpoints/ # Stable Diffusion checkpoints (.safetensors, .ckpt)
35+
├── loras/ # LoRA files (.safetensors, .pt)
36+
├── vae/ # VAE models (.safetensors, .pt)
37+
├── clip/ # CLIP models (.safetensors, .pt)
38+
├── clip_vision/ # CLIP Vision models
39+
├── controlnet/ # ControlNet models (.safetensors, .pt)
40+
├── embeddings/ # Textual inversion embeddings (.safetensors, .pt)
41+
├── upscale_models/ # Upscaling models (.safetensors, .pt)
42+
├── unet/ # UNet models
43+
└── configs/ # Model configs (.yaml, .json)
44+
```
45+
46+
> **Note**
47+
>
48+
> Only create the subdirectories you actually need; empty or missing folders are fine.
49+
50+
## Supported File Extensions
51+
52+
ComfyUI only recognizes files with specific extensions when scanning model directories.
53+
54+
| Model Type | Supported Extensions |
55+
| -------------- | ------------------------------------------- |
56+
| Checkpoints | `.safetensors`, `.ckpt`, `.pt`, `.pth`, `.bin` |
57+
| LoRAs | `.safetensors`, `.pt` |
58+
| VAE | `.safetensors`, `.pt`, `.bin` |
59+
| CLIP | `.safetensors`, `.pt`, `.bin` |
60+
| ControlNet | `.safetensors`, `.pt`, `.pth`, `.bin` |
61+
| Embeddings | `.safetensors`, `.pt`, `.bin` |
62+
| Upscale Models | `.safetensors`, `.pt`, `.pth` |
63+
64+
Files with other extensions (for example `.txt`, `.zip`) are **ignored** by ComfyUI’s model discovery.
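
As a rough pre-upload check, the table above can be encoded as a lookup. The sketch below is not ComfyUI's own discovery code. The folder keys are assumed to correspond to the directory names in the expected structure, `is_recognized` is a hypothetical helper, and the comparison is an exact match on the suffix because the table does not specify case handling.

```python
from pathlib import PurePosixPath

# Extensions per model folder, copied from the table in this guide.
SUPPORTED_EXTENSIONS = {
    "checkpoints": {".safetensors", ".ckpt", ".pt", ".pth", ".bin"},
    "loras": {".safetensors", ".pt"},
    "vae": {".safetensors", ".pt", ".bin"},
    "clip": {".safetensors", ".pt", ".bin"},
    "controlnet": {".safetensors", ".pt", ".pth", ".bin"},
    "embeddings": {".safetensors", ".pt", ".bin"},
    "upscale_models": {".safetensors", ".pt", ".pth"},
}


def is_recognized(folder: str, filename: str) -> bool:
    """Return True if `filename` has an extension listed for `folder`."""
    if folder not in SUPPORTED_EXTENSIONS:
        raise KeyError(f"no extension list for folder: {folder}")
    return PurePosixPath(filename).suffix in SUPPORTED_EXTENSIONS[folder]


if __name__ == "__main__":
    print(is_recognized("checkpoints", "my-model.safetensors"))
    print(is_recognized("loras", "notes.txt"))
```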
65+
66+
## Common Issues
67+
68+
- **Wrong root directory**
69+
- Models placed directly under `/runpod-volume/checkpoints/...` instead of `/runpod-volume/models/checkpoints/...`.
70+
- **Incorrect extensions**
71+
- Files named without one of the supported extensions are skipped.
72+
- **Empty directories**
73+
- No actual model files present in `models/checkpoints` (or other folders).
74+
- **Volume not attached**
75+
- Endpoint created without selecting a network volume under **Advanced → Select Network Volume**.
76+
77+
In each of these cases, ComfyUI does not discover the affected models on the network volume and reports no error.
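
The issues listed above can also be checked by hand from any environment where the volume is mounted, for example a Pod. The following is a minimal sketch, not the worker's diagnostics code: `find_layout_problems` and `MODEL_FOLDERS` are hypothetical names, and the folder list is taken from the expected directory structure in this guide.

```python
from pathlib import Path

# Subdirectories from the expected structure in this guide.
MODEL_FOLDERS = (
    "checkpoints", "loras", "vae", "clip", "clip_vision",
    "controlnet", "embeddings", "upscale_models", "unet", "configs",
)


def find_layout_problems(volume_root: Path) -> list[str]:
    """Return human-readable descriptions of common layout mistakes."""
    if not volume_root.is_dir():
        return [f"{volume_root} is not mounted (is a network volume attached?)"]

    problems = []
    # Wrong root directory: model folders placed next to models/ instead of inside it.
    for name in MODEL_FOLDERS:
        if (volume_root / name).is_dir():
            problems.append(f"{name}/ sits at the volume root; expected models/{name}/")

    models_dir = volume_root / "models"
    if not models_dir.is_dir():
        problems.append("models/ directory is missing")
        return problems

    # Empty directories: no files in any of the known subfolders.
    has_files = any(
        entry.is_file()
        for name in MODEL_FOLDERS
        if (models_dir / name).is_dir()
        for entry in (models_dir / name).iterdir()
    )
    if not has_files:
        problems.append("no files found in any models/ subdirectory")
    return problems


if __name__ == "__main__":
    # Use Path("/workspace") when running inside a Pod.
    for problem in find_layout_problems(Path("/runpod-volume")):
        print("-", problem)
```

An empty result only means none of these specific mistakes were detected; file extensions still need to match the supported list.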
78+
79+
## Debugging with `NETWORK_VOLUME_DEBUG`
80+
81+
The worker exposes an opt-in debug mode controlled via the `NETWORK_VOLUME_DEBUG` environment variable.
82+
83+
### When to Use
84+
85+
Enable this when:
86+
87+
- Models on your network volume are not appearing in ComfyUI
88+
- You suspect the directory structure or file extensions are wrong
89+
- You want to quickly verify what the worker can actually see on `/runpod-volume`
90+
91+
### How to Enable
92+
93+
1. Go to your serverless **Endpoint → Manage → Edit**.
94+
2. Under **Environment Variables**, add:
95+
96+
- `NETWORK_VOLUME_DEBUG=true`
97+
98+
3. Save and wait for workers to restart (or scale to zero and back up).
99+
4. Send any request to your endpoint (even a minimal one) to trigger the diagnostics.
100+
101+
### Reading the Diagnostics
102+
103+
When enabled, each request prints a detailed report to the worker logs, for example:
104+
105+
```text
106+
======================================================================
107+
NETWORK VOLUME DIAGNOSTICS (NETWORK_VOLUME_DEBUG=true)
108+
======================================================================
109+
110+
[1] Checking extra_model_paths.yaml configuration...
111+
✓ FOUND: /comfyui/extra_model_paths.yaml
112+
113+
[2] Checking network volume mount at /runpod-volume...
114+
✓ MOUNTED: /runpod-volume
115+
116+
[3] Checking directory structure...
117+
✓ FOUND: /runpod-volume/models
118+
119+
[4] Scanning model directories...
120+
121+
checkpoints/:
122+
- my-model.safetensors (6.5 GB)
123+
124+
loras/:
125+
- style-lora.safetensors (144.2 MB)
126+
127+
[5] Summary
128+
✓ Models found on network volume!
129+
======================================================================
130+
```
131+
132+
If there is a problem, the diagnostics will instead highlight it, for example:
133+
134+
- Missing `models/` directory
135+
- No valid model files in any subdirectory
136+
- Files present but ignored due to wrong extensions
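
If you copy a report out of the worker logs, the scanned files can be extracted with a short script. This sketch assumes the report follows the layout of the sample output shown in this section (folder lines ending in `/:` and file lines of the form `- name (size unit)`). The real log format may differ, and `parse_scanned_models` is a hypothetical helper.

```python
import re

# Matches folder headings such as "checkpoints/:".
FOLDER_RE = re.compile(r"^\s*([\w-]+)/:\s*$")
# Matches file entries such as "- my-model.safetensors (6.5 GB)".
FILE_RE = re.compile(r"^\s*-\s+(.+?)\s+\(([\d.]+)\s*(\w+)\)\s*$")


def parse_scanned_models(report: str) -> dict[str, list[tuple[str, str]]]:
    """Map each scanned folder to a list of (filename, size) pairs."""
    found: dict[str, list[tuple[str, str]]] = {}
    current = None
    for line in report.splitlines():
        folder = FOLDER_RE.match(line)
        if folder:
            current = folder.group(1)
            found[current] = []
            continue
        entry = FILE_RE.match(line)
        if entry and current is not None:
            found[current].append((entry.group(1), f"{entry.group(2)} {entry.group(3)}"))
    return found


if __name__ == "__main__":
    sample = """\
[4] Scanning model directories...

checkpoints/:
- my-model.safetensors (6.5 GB)

loras/:
- style-lora.safetensors (144.2 MB)
"""
    for folder, files in parse_scanned_models(sample).items():
        print(folder, files)
```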
137+
138+
### Disabling Debug Mode
139+
140+
Once you have resolved your issue, disable diagnostics to keep logs clean:
141+
142+
- Remove the `NETWORK_VOLUME_DEBUG` environment variable, **or**
143+
- Set `NETWORK_VOLUME_DEBUG=false`
144+
145+
This returns the worker to normal behavior without extra log noise.
146+
147+
