/v1/models exposes models that are not actually loaded #414

The server currently loads exactly one GGUF model at startup, but the model metadata endpoints make it look like both DeepSeek V4 Flash and DeepSeek V4 Pro are available at the same time.

For example, when the server is started with a Flash GGUF:

```sh
./ds4-server -m <flash.gguf> --host 127.0.0.1 --port 8000
```

and /v1/models is queried:

```sh
curl http://127.0.0.1:8000/v1/models
```

The response advertises both:

```
deepseek-v4-flash
deepseek-v4-pro
```

This is misleading because the model field in API requests does not actually switch the loaded GGUF. Inference always uses the single model loaded at server startup.

The same problem also affects GET /v1/models/<id>: the server accepts both deepseek-v4-flash and deepseek-v4-pro as valid metadata endpoints, even if only one of those models is actually loaded.

Expected behavior:
- If a Flash GGUF is loaded, /v1/models should expose only deepseek-v4-flash
- If a Pro GGUF is loaded, /v1/models should expose only deepseek-v4-pro
- GET /v1/models/<id> should return 404 for the non-loaded model id
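
The expected behavior above can be sketched as a lookup against the one model loaded at startup. This is a hypothetical Python model of the endpoint logic, not the ds4 server code: the function names, the `owned_by` placeholder, and the 404 error body are assumptions made for illustration, loosely following the OpenAI-style response layout.

```python
# Hypothetical model of the expected endpoint behavior; not the ds4 implementation.
KNOWN_IDS = ("deepseek-v4-flash", "deepseek-v4-pro")


def model_object(model_id):
    # Minimal OpenAI-style model object; "owned_by" is a placeholder value.
    return {"id": model_id, "object": "model", "owned_by": "ds4"}


def list_models(loaded_id):
    """GET /v1/models: advertise only the model loaded at startup."""
    if loaded_id not in KNOWN_IDS:
        raise ValueError(f"unknown model id: {loaded_id}")
    return 200, {"object": "list", "data": [model_object(loaded_id)]}


def get_model(loaded_id, requested_id):
    """GET /v1/models/<id>: 404 for any id other than the loaded one."""
    if requested_id != loaded_id:
        # Assumed error shape, chosen for illustration only.
        error = {
            "message": f"model '{requested_id}' is not loaded",
            "type": "invalid_request_error",
        }
        return 404, {"error": error}
    return 200, model_object(loaded_id)
```

With a Flash GGUF loaded, `list_models("deepseek-v4-flash")` returns a list containing only the Flash entry, and `get_model("deepseek-v4-flash", "deepseek-v4-pro")` returns status 404.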

This matters for OpenAI-compatible clients that inspect /v1/models to decide which model IDs are available. The current behavior can make clients believe both variants are selectable, while the server can only run the already-loaded GGUF.
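
To illustrate the client-side impact, here is a minimal sketch of a client that picks a model ID from the /v1/models response. The helper names and the hand-written JSON bodies are assumptions for illustration; real responses carry more fields, and real clients may select models differently.

```python
import json


def available_model_ids(models_response_text):
    """Return the model ids advertised in a /v1/models JSON response body."""
    payload = json.loads(models_response_text)
    return [entry["id"] for entry in payload.get("data", [])]


def pick_model(models_response_text, preferred):
    """Pick the first preferred id that the server advertises, else None."""
    advertised = set(available_model_ids(models_response_text))
    for model_id in preferred:
        if model_id in advertised:
            return model_id
    return None


# Current behavior with a Flash GGUF loaded: both ids are advertised, so a
# client preferring Pro selects an id the server cannot actually run.
current = (
    '{"object": "list", "data": '
    '[{"id": "deepseek-v4-flash"}, {"id": "deepseek-v4-pro"}]}'
)
# Expected behavior: only the loaded model is advertised.
expected = '{"object": "list", "data": [{"id": "deepseek-v4-flash"}]}'

preference = ["deepseek-v4-pro", "deepseek-v4-flash"]
print(pick_model(current, preference))
print(pick_model(expected, preference))
```

Against the current response the client selects deepseek-v4-pro even though only Flash is loaded. Against the expected response it falls back to deepseek-v4-flash.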

## Proposed changes

Fix this by making model metadata reflect the single GGUF loaded at startup. Implemented in https://github.com/antirez/ds4/pull/287.
