
Multimodal: Add Support for Image, PDF & Mixed Content for AI Assessment #636

@vprashrex

Description


Is your feature request related to a problem? Please describe.
The current llm/call endpoint only supports text and audio input types. However, the AI Assessment module from InquiLab requires support for images and PDFs (including multiple files) as well as mixed-content understanding (text + image/PDF). This limitation prevents the existing LLM integration from handling the required multimodal assessment workflows.

Describe the solution you'd like

  • Extend the existing openai and google providers to support both images and PDFs.
  • The input field of QueryParams should accept either a single dict or a list of dicts, allowing mixed content in a single request (see the schema sketch after the examples below).
  • input: dict | list[dict]
  • Single image input:
{
  "input": {
    "type": "image",
    "content": {
      "format": "url",
      "value": "public_url"
    }
  }
}
  • Mixed content (image and text):
{
  "input": [
    {
      "type": "image",
      "content": {
        "format": "url",
        "value": "public_url"
      }
    },
    {
      "type": "text",
      "content": {
        "format": "text",
        "value": "What is in the image?"
      }
    }
  ]
}
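
A minimal sketch of how the QueryParams schema could accept both shapes, assuming Pydantic models are used for request validation; the Content and InputPart names and the parts() helper are hypothetical and only mirror the JSON examples above:

from typing import Literal, Union
from pydantic import BaseModel

class Content(BaseModel):
    # "url" / "base64" for images and PDFs, "text" for plain text parts.
    format: Literal["url", "base64", "text"]
    value: str

class InputPart(BaseModel):
    type: Literal["text", "image", "pdf"]
    content: Content

class QueryParams(BaseModel):
    # Accept either a single part or a list of parts for mixed content.
    input: Union[InputPart, list[InputPart]]

    def parts(self) -> list[InputPart]:
        """Normalize to a list so provider adapters can iterate uniformly."""
        return self.input if isinstance(self.input, list) else [self.input]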

Additional context
Image and PDF content should support both base64 and public URL formats.
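
For illustration, a minimal sketch of how a provider adapter could normalize both formats for images, using the OpenAI chat-completions content shape as an example; the helper name and the assumed image/png MIME type are hypothetical, and PDF handling would follow each provider's own file/document conventions:

def to_openai_image_part(fmt: str, value: str) -> dict:
    """Map an image part to the OpenAI chat-completions image_url format.

    fmt is "url" or "base64", matching the content.format field above.
    Base64 payloads are wrapped in a data URI; the MIME type is assumed
    here, a real implementation would detect or accept it explicitly.
    """
    url = value if fmt == "url" else f"data:image/png;base64,{value}"
    return {"type": "image_url", "image_url": {"url": url}}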
