
Multimodal: Add Support for Image, PDF & Mixed Content for AI Assessment #636

@vprashrex

Description


Is your feature request related to a problem? Please describe.
The current llm/call endpoint only supports text and audio input types. However, the AI Assessment module from InquiLab requires support for images and PDFs (including multiple files) as well as mixed-content understanding (text + image/PDF). This limitation prevents the existing LLM integration from handling the required multimodal assessment workflows.

Describe the solution you'd like

  • Extend the existing openai and google providers to support both images and PDFs.
  • The input field of QueryParams should accept either a single dict or a list of dicts, allowing mixed content in a single request (see the schema sketch after the examples below).
  • input: dict | list[dict]
  • Single image input:
{
  "input": {
    "type": "image",
    "content": {
      "format": "url",
      "value": "public_url"
    }
  }
}
  • Mixed content (image and text):
{
  "input": [
    {
      "type": "image",
      "content": {
        "format": "url",
        "value": "public_url"
      }
    },
    {
      "type": "text",
      "content": {
        "format": "text",
        "value": "What is in the image?"
      }
    }
  ]
}
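
A minimal sketch of how the QueryParams schema could accept both shapes, assuming Pydantic models are used for request validation; the Content and InputPart names and the parts() helper are hypothetical and only mirror the JSON examples above:

from typing import Literal, Union
from pydantic import BaseModel

class Content(BaseModel):
    # "url" / "base64" for images and PDFs, "text" for plain text parts.
    format: Literal["url", "base64", "text"]
    value: str

class InputPart(BaseModel):
    type: Literal["text", "image", "pdf"]
    content: Content

class QueryParams(BaseModel):
    # Accept either a single part or a list of parts for mixed content.
    input: Union[InputPart, list[InputPart]]

    def parts(self) -> list[InputPart]:
        """Normalize to a list so provider adapters can iterate uniformly."""
        return self.input if isinstance(self.input, list) else [self.input]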

Additional context
Image and PDF content should support both base64 and public URL formats.
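
For illustration, a minimal sketch of how a provider adapter could normalize both formats for images, using the OpenAI chat-completions content shape as an example; the helper name and the assumed image/png MIME type are hypothetical, and PDF handling would follow each provider's own file/document conventions:

def to_openai_image_part(fmt: str, value: str) -> dict:
    """Map an image part to the OpenAI chat-completions image_url format.

    fmt is "url" or "base64", matching the content.format field above.
    Base64 payloads are wrapped in a data URI; the MIME type is assumed
    here, a real implementation would detect or accept it explicitly.
    """
    url = value if fmt == "url" else f"data:image/png;base64,{value}"
    return {"type": "image_url", "image_url": {"url": url}}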
