[Bug] Multimodal query fails to recognize image data from screenshots on the clipboard #5

**Description:**

**Problem:**
The application's multimodal functionality fails to work with images captured via screenshot tools (e.g., Windows Snipping Tool, `Win+Shift+S`). While a user can paste the screenshot image into other applications like Paint or Discord, Blink does not detect the image data on the clipboard. The feature only works if the user manually saves the screenshot as a file and then copies the file itself.

**Steps to Reproduce:**
1.  Use the Windows Snipping Tool (`Win+Shift+S`) to capture a portion of the screen. A notification confirms the snip has been copied to the clipboard.
2.  Verify the image is on the clipboard by pasting it into an application like MS Paint.
3.  In a text editor, type and select an instruction (e.g., "What is in this image?").
4.  Press the multimodal hotkey (`Ctrl+Alt+/`).

**Expected Behavior:**
Blink should detect the raw image data on the clipboard, combine it with the selected text prompt, and send both to the configured multimodal LLM for processing.

**Actual Behavior:**
The query either fails silently or shows an error notification such as "Unsupported clipboard content" or "Clipboard is empty". In both cases the multimodal feature is not triggered.

**Root Cause Analysis:**
This is not a bug in the file-handling logic, but a missing feature in the clipboard inspection module.
*   When a user copies a **file**, the clipboard receives data in a special format (`CF_HDROP` on Windows) which contains a list of file **paths**. Our current `clipboard_manager.py` is correctly designed to handle this.
*   When a user takes a **screenshot**, the clipboard receives raw image data, typically in a bitmap format (`CF_DIB`). Our `clipboard_manager.py` currently has no logic to detect or handle this data type. It only looks for file paths or plain text.
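To confirm this diagnosis on an affected machine, it may help to list which formats the clipboard actually offers after a snip versus after copying a file. The following is a minimal sketch, not existing Blink code: `available_formats` and `probe_real_clipboard` are hypothetical helper names, and the numeric IDs are the standard Win32 clipboard format values. The probe is passed in as a callable so the logic can be exercised without Windows.

```python
from typing import Callable, List

# Standard Win32 clipboard format IDs.
CF_DIB = 8           # device-independent bitmap (typical for screenshots)
CF_UNICODETEXT = 13  # plain Unicode text
CF_HDROP = 15        # list of file paths (copied files)

FORMAT_NAMES = {
    CF_DIB: "CF_DIB",
    CF_HDROP: "CF_HDROP",
    CF_UNICODETEXT: "CF_UNICODETEXT",
}


def available_formats(is_available: Callable[[int], bool]) -> List[str]:
    """Return the names of the known formats the clipboard currently offers."""
    return [name for fmt, name in FORMAT_NAMES.items() if is_available(fmt)]


def probe_real_clipboard() -> List[str]:
    """Windows only: query the live clipboard through pywin32."""
    import win32clipboard  # imported lazily so the module loads on any OS

    return available_formats(win32clipboard.IsClipboardFormatAvailable)
```

If the analysis above is right, calling `probe_real_clipboard()` after `Win+Shift+S` should include `CF_DIB` and not `CF_HDROP`, while copying a file in Explorer should show the reverse.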

**Proposed Solution:**
The `clipboard_manager.py` and the `hotkey_manager.py` workflow must be enhanced to handle raw image data.

1.  **Add Dependency:** The `Pillow` library is required for this. Add `Pillow` to `requirements.txt`.
2.  **Refactor `clipboard_manager.py`:**
    *   Modify the function that inspects the clipboard (e.g., `get_clipboard_contents`).
    *   It must now check for clipboard content in a specific order of priority:
        1.  **Check for an Image First:** Call `PIL.ImageGrab.grabclipboard()`. Treat the clipboard as holding an image only if the result is an `Image` instance; the call returns `None` when nothing usable is present and can return a list of file names when files were copied.
        2.  **Then Check for Files:** If no image is found, proceed with the existing `win32clipboard` logic to check for file paths (`CF_HDROP`).
        3.  **Finally, Check for Text:** If neither of the above is found, fall back to checking for plain text.
    *   The function should return a structured object that can handle different data types, for example: `{"type": "image_data", "content": <Pillow Image Object>}` or `{"type": "file_list", "content": ["C:\\path..."]}`.

3.  **Refactor `hotkey_manager.py`:**
    *   The `process_clipboard_context()` method must be updated to handle the new `image_data` type from the clipboard manager.
    *   If it receives `image_data`, it needs to:
        a. Save the Pillow `Image` object to an in-memory byte stream (e.g., using `io.BytesIO`) in an encoded format such as PNG.
        b. **Base64 encode** this byte stream.
        c. Pass this Base64 string to the `llm_interface` as part of the multimodal payload, just as it would for an image read from a file.
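The steps above could be sketched roughly as follows. This is an illustration under stated assumptions, not the current contents of `clipboard_manager.py` or `hotkey_manager.py`: the three reader callables, the `"text"` and `"empty"` type names, and the `image_to_base64` helper are all hypothetical. In the application, `grab_image` would presumably be `PIL.ImageGrab.grabclipboard`, and `read_file_paths` would wrap the existing `win32clipboard` logic.

```python
import base64
import io
from typing import Any, Callable, Dict, List, Optional


def get_clipboard_contents(
    grab_image: Callable[[], Any],
    read_file_paths: Callable[[], List[str]],
    read_text: Callable[[], Optional[str]],
) -> Dict[str, Any]:
    """Inspect the clipboard in priority order: image, then files, then text."""
    grabbed = grab_image()
    # grabclipboard() can return a list of file names instead of an image,
    # so only a non-list object with a save() method is treated as an image.
    if grabbed is not None and not isinstance(grabbed, list) and hasattr(grabbed, "save"):
        return {"type": "image_data", "content": grabbed}

    paths = read_file_paths()
    if paths:
        return {"type": "file_list", "content": list(paths)}

    text = read_text()
    if text:
        return {"type": "text", "content": text}  # assumed type name

    return {"type": "empty", "content": None}  # assumed type name


def image_to_base64(image: Any, image_format: str = "PNG") -> str:
    """Encode an image-like object (anything with Pillow's save API) as Base64."""
    buffer = io.BytesIO()
    image.save(buffer, format=image_format)  # write encoded bytes in memory
    return base64.b64encode(buffer.getvalue()).decode("ascii")
```

PNG is used as the default here because it is lossless and accepts both RGB and RGBA images; whether the `llm_interface` expects PNG or another format is not stated in this issue and would need to be checked.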

**Acceptance Criteria:**
- [ ] After taking a screenshot with `Win+Shift+S`, using the multimodal hotkey successfully sends the image to the LLM.
- [ ] Copying an image *file* from Windows Explorer still works as expected (no regressions).
- [ ] Copying plain text and using the clipboard context feature still works as expected.
- [ ] The system prioritizes image data over file paths when the clipboard offers both.
