🤗 HuggingFace | 🤖 ModelScope | 🖥️ Demo | 📄 Technical Report
FireRed-Image-Edit is a general-purpose image editing model that delivers high-fidelity and consistent editing across a wide range of scenarios.
- Strong Editing Performance: FireRed-Image-Edit delivers leading open-source results with accurate instruction following, high image quality, and consistent visual coherence.
- Native Editing Capability from a T2I Backbone: Built upon an open-source text-to-image foundation model, we introduce editing ability through a full pipeline of Pretrain, SFT, and RL. This training paradigm is backbone-agnostic and can be applied to other T2I models (currently based on Qwen-Image for better community support). We will progressively open-source additional features, including our self-developed FireRed T2I foundation model.
- Text Style Preservation: Maintains text styles with high fidelity, achieving performance comparable to closed-source solutions.
- Photo Restoration: High-quality old photo restoration and enhancement.
- Multi-Image Editing: Flexible editing of multiple images such as virtual try-on.
- 2026.03.01: We released a lightweight inference script (with quantization, db_cache, and static compilation), now requiring only 30 GB of VRAM and ~20 s/sample. 🚀 Try it by simply running `python inference.py --optimized True`.
- 2026.02.28: We released the training code, supporting HSDP/FSDP, Disaggregated Training, and a Multi-Condition-Aware Bucket Sampler.
- 2026.02.27: We released the Agent module for instruction rewriting and multi-image preprocessing, supporting automatic ROI detection and image stitching for editing with more than 3 images.
- 2026.02.27: We provided a FireRed-Image-Edit-1.0-ComfyUI workflow. Check more details on HuggingFace.
- 2026.02.14: We released the FireRed-Image-Edit-1.0 model weights. Check more details on HuggingFace and ModelScope.
- 2026.02.10: We released the Technical Report of FireRed-Image-Edit-1.0.
FireRed-Image-Edit establishes a new state of the art among open-source models on ImgEdit, GEdit, and REDEdit-Bench, while surpassing closed-source competitors in specific dimensions; this is further corroborated by human evaluations highlighting its superior prompt following and visual consistency.
Some real outputs produced by FireRed-Image-Edit across general editing scenarios.
- Creative scenarios:
- Text scenarios:
- Try-on scenarios:
| Models | Task | Description | Download Link |
|---|---|---|---|
| FireRed-Image-Edit-1.0 | Image-Editing | General-purpose image editing model | 🤗 HuggingFace 🤖 ModelScope |
| FireRed-Image-Edit-1.0-Distilled | Image-Editing | Distilled version of FireRed-Image-Edit-1.0 for faster inference | To be released |
| FireRed-Image | Text-to-Image | High-quality text-to-image generation model | To be released |
- Install dependencies:
pip install -r requirements.txt
- Use the following command to generate or edit images.
python inference.py \
--input_image ./examples/edit_example.png \
--prompt "在书本封面Python的下方,添加一行英文文字2nd Edition" \
--output_image output_edit.png \
--seed 43
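For batch editing, the same CLI can be scripted. The sketch below builds one inference.py invocation per prompt; it assumes only the flags shown in the example above (--input_image, --prompt, --output_image, --seed), so adjust it if your local script exposes different options.

```python
# Sketch: batch editing by invoking the CLI shown above once per prompt.
# Only the flags from the example are assumed to exist.
import subprocess

def build_edit_command(input_image, prompt, output_image, seed=43):
    """Assemble the argv list for a single inference.py run."""
    return [
        "python", "inference.py",
        "--input_image", input_image,
        "--prompt", prompt,
        "--output_image", output_image,
        "--seed", str(seed),
    ]

prompts = [
    "Add a red scarf to the person",
    "Replace the background with a snowy street",
]
commands = [
    build_edit_command("./examples/edit_example.png", p, f"output_{i}.png", seed=43 + i)
    for i, p in enumerate(prompts)
]
# Uncomment to actually run the edits sequentially:
# for cmd in commands:
#     subprocess.run(cmd, check=True)
```

Varying the seed per prompt keeps runs reproducible while avoiding identical sampling noise across edits.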
The Agent module provides Recaption & Multi-Image Preprocessing capabilities.
FireRed-Image-Edit natively supports 1–3 input images. When users need to edit with more than 3 images, the built-in Agent module automatically:
- ROI Detection – Sends all images + the user instruction to a Gemini function-calling model that returns a bounding-box for the most relevant region in each image.
- Crop & Stitch – Crops each image to its ROI, then partitions and stitches them into 2–3 composite images (≈1024×1024 each) while minimising whitespace and preserving content at maximum resolution.
- Recaption – Rewrites the user instruction so that image references (图1/图2/image N …) correctly point to the new composite images, and expands the prompt to ~512 words/characters for richer editing context. The user's original language is preserved.
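The Crop & Stitch step above can be sketched as pure layout arithmetic. The grouping heuristic below (greedy balancing by ROI area) and the single-row placement are our assumptions for illustration, not the Agent's actual algorithm, which also minimises whitespace across multiple layouts.

```python
# Sketch of "Crop & Stitch": partition ROI crops into at most three groups,
# then compute side-by-side placements on a ~1024x1024 canvas per group.
def partition_rois(roi_sizes, max_groups=3):
    """Greedily balance ROIs across groups by total area (largest first)."""
    n_groups = min(max_groups, max(1, (len(roi_sizes) + 1) // 2))
    groups = [[] for _ in range(n_groups)]
    areas = [0] * n_groups
    for idx, (w, h) in sorted(enumerate(roi_sizes), key=lambda t: -t[1][0] * t[1][1]):
        g = areas.index(min(areas))      # group with the least area so far
        groups[g].append(idx)
        areas[g] += w * h
    return [g for g in groups if g]

def stitch_layout(roi_sizes, canvas=1024):
    """Place ROIs side by side, scaled to fill the canvas height."""
    scaled, total_w = [], 0
    for w, h in roi_sizes:
        s = canvas / h                   # scale each crop to full height
        scaled.append((round(w * s), canvas))
        total_w += round(w * s)
    if total_w > canvas:                 # shrink uniformly on overflow
        f = canvas / total_w
        scaled = [(max(1, round(w * f)), max(1, round(h * f))) for w, h in scaled]
    x, placements = 0, []
    for w, h in scaled:
        placements.append((x, 0, w, h))  # (left, top, width, height)
        x += w
    return placements

groups = partition_rois([(400, 300), (640, 480), (500, 500), (300, 600)])
```

Each group would then be rendered by cropping the source images to their ROIs and pasting them at the computed placements.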
(Optional) To enable the Recaption feature (rewriting instructions via Gemini for better editing results), set your Gemini API key:
export GEMINI_API_KEY="your-gemini-api-key"
Note: The Gemini API is not required. Without it, the Agent still performs ROI detection and image stitching normally, but skips the instruction-rewriting step. Setting a Gemini API key is recommended for best results.
agent/
├── __init__.py # Package entry – exports AgentPipeline
├── config.py # Configuration (API keys, stitch parameters, etc.)
├── gemini_agent.py # Gemini function-calling for ROI detection
├── image_tools.py # Image tools: crop, resize, stitch, partition
├── recaption.py # Instruction rewriting & expansion via Gemini
└── pipeline.py # End-to-end orchestration pipeline
Training is a two-step process:
- Extract VLM embeddings — Run offline extraction on your image–text JSONL.
- SFT training — Train on the extracted embeddings (HSDP/FSDP, multi-node supported).
→ Full details: train/README.md (data format, environment, commands).
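For orientation, an image–text JSONL file is simply one JSON object per line. The field names below ("image", "prompt") are placeholders; consult train/README.md for the actual schema expected by the extraction step.

```python
# Minimal sketch of an image-text JSONL file for offline embedding extraction.
# Field names here are hypothetical; see train/README.md for the real schema.
import io
import json

records = [
    {"image": "data/img_0001.png", "prompt": "Remove the watermark"},
    {"image": "data/img_0002.png", "prompt": "Change the sky to sunset"},
]

# Write: one JSON object per line (use a real file path in practice).
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec, ensure_ascii=False) + "\n")

# Read it back line by line.
loaded = [json.loads(line) for line in buf.getvalue().splitlines()]
```

The line-per-record layout is what lets the extraction job stream large datasets without loading everything into memory.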
To better validate the capabilities of our model, we propose a benchmark called REDEdit-Bench. Our main goal is to cover more diverse scenarios and editing instructions that better align with natural human language, enabling a more comprehensive evaluation of current editing models. We collected over 3,000 images from the internet and, after careful expert curation, constructed 1,673 bilingual (Chinese–English) editing pairs across 15 categories.
We provide the inference and evaluation code for REDEdit-Bench. Please refer to the redbench_infer.py and redbench_eval.py scripts in the tools directory for more details.
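As a rough illustration of how per-category scores like those in the tables below can be reduced to an Overall value, here is a plain macro-average over categories; redbench_eval.py may weight or aggregate categories differently, so treat this as an assumption.

```python
# Macro-average sketch: mean score per category, then mean over categories.
# This is an assumed aggregation, not necessarily what redbench_eval.py does.
from collections import defaultdict

def macro_average(samples):
    """samples: iterable of (category, score) pairs -> (per-category means, overall)."""
    by_cat = defaultdict(list)
    for cat, score in samples:
        by_cat[cat].append(score)
    per_cat = {cat: sum(v) / len(v) for cat, v in by_cat.items()}
    overall = sum(per_cat.values()) / len(per_cat)
    return per_cat, overall

per_cat, overall = macro_average([
    ("Add", 4.5), ("Add", 4.7), ("Remove", 4.2), ("Style", 5.0),
])
```

A macro-average keeps small categories (e.g. Extract) from being drowned out by categories with many samples.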
The REDEdit-Bench dataset will be available soon.
| Model | Overall ↑ | Add | Adjust | Extract | Replace | Remove | BG | Style | Hybrid | Action |
|---|---|---|---|---|---|---|---|---|---|---|
| 🔹 Proprietary Models | ||||||||||
| Nano-Banana | 4.29 | 4.62 | 4.41 | 3.68 | 4.34 | 4.39 | 4.40 | 4.18 | 3.72 | 4.83 |
| Seedream4.0 | 4.30 | 4.33 | 4.38 | 3.89 | 4.65 | 4.57 | 4.35 | 4.22 | 3.71 | 4.61 |
| Seedream4.5 | 4.32 | 4.57 | 4.65 | 2.97 | 4.66 | 4.46 | 4.37 | 4.92 | 3.71 | 4.56 |
| Nano-Banana-Pro | 4.37 | 4.44 | 4.62 | 3.42 | 4.60 | 4.63 | 4.32 | 4.97 | 3.64 | 4.69 |
| 🔹 Open-source Models | ||||||||||
| FLUX.1 Kontext [Dev] | 3.71 | 3.99 | 3.88 | 2.19 | 4.27 | 3.13 | 3.98 | 4.51 | 3.23 | 4.18 |
| Step1X-Edit-v1.2 | 3.95 | 3.91 | 4.04 | 2.68 | 4.48 | 4.26 | 3.90 | 4.82 | 3.23 | 4.22 |
| Qwen-Image-Edit-2509 | 4.31 | 4.34 | 4.27 | 3.42 | 4.73 | 4.36 | 4.37 | 4.91 | 3.56 | 4.80 |
| FLUX.2 [Dev] | 4.35 | 4.50 | 4.18 | 3.83 | 4.65 | 4.65 | 4.31 | 4.88 | 3.46 | 4.70 |
| LongCat-Image-Edit | 4.45 | 4.44 | 4.53 | 3.83 | 4.80 | 4.60 | 4.33 | 4.92 | 3.75 | 4.82 |
| Qwen-Image-Edit-2511 | 4.51 | 4.54 | 4.57 | 4.13 | 4.70 | 4.46 | 4.36 | 4.89 | 4.16 | 4.81 |
| FireRed-Image-Edit | 4.56 | 4.55 | 4.66 | 4.34 | 4.75 | 4.58 | 4.45 | 4.97 | 4.07 | 4.71 |
| Model | G_SC ↑ (EN) | G_PQ ↑ (EN) | G_O ↑ (EN) | G_SC ↑ (CN) | G_PQ ↑ (CN) | G_O ↑ (CN) |
|---|---|---|---|---|---|---|
| 🔹 Proprietary Models | ||||||
| Nano-Banana | 7.396 | 8.454 | 7.291 | 7.540 | 8.424 | 7.399 |
| Seedream4.0 | 8.143 | 8.124 | 7.701 | 8.159 | 8.074 | 7.692 |
| Nano-Banana-Pro | 8.102 | 8.344 | 7.738 | 8.135 | 8.306 | 7.799 |
| Seedream4.5 | 8.268 | 8.167 | 7.820 | 8.254 | 8.167 | 7.800 |
| 🔹 Open-source Models | ||||||
| FLUX.2 [Dev] | 7.835 | 8.064 | 7.413 | 7.697 | 8.046 | 7.278 |
| Qwen-Image-Edit-2509 | 7.974 | 7.714 | 7.480 | 7.988 | 7.679 | 7.467 |
| Step1X-Edit-v1.2 | 7.974 | 7.714 | 7.480 | 7.988 | 7.679 | 7.467 |
| LongCat-Image-Edit | 8.128 | 8.177 | 7.748 | 8.141 | 8.117 | 7.731 |
| Qwen-Image-Edit-2511 | 8.297 | 8.202 | 7.877 | 8.252 | 8.134 | 7.819 |
| FireRed-Image-Edit | 8.363 | 8.245 | 7.943 | 8.287 | 8.227 | 7.887 |
| Model | Overall | Add | Adjust | BG | Beauty | Color | Compose | Extract | Portrait | Low-level | Motion | Remove | Replace | Stylize | Text | Viewpoint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🔹 Proprietary Models | ||||||||||||||||
| Seedream4.0 | 4.15 | 4.55 | 4.11 | 4.61 | 3.83 | 4.14 | 4.16 | 2.48 | 4.77 | 4.17 | 4.68 | 4.02 | 4.53 | 4.94 | 3.94 | 3.29 |
| Seedream4.5 | 4.18 | 4.58 | 4.09 | 4.57 | 3.97 | 4.12 | 4.05 | 2.56 | 4.80 | 3.99 | 4.78 | 4.12 | 4.53 | 4.94 | 4.07 | 3.53 |
| Nano-Banana | 4.13 | 4.66 | 4.26 | 4.63 | 4.37 | 4.13 | 3.94 | 3.17 | 4.83 | 4.05 | 4.75 | 4.07 | 4.74 | 3.63 | 3.69 | 3.09 |
| Nano-Banana-Pro | 4.48 | 4.66 | 4.41 | 4.58 | 4.35 | 4.58 | 4.36 | 3.42 | 4.86 | 4.46 | 4.91 | 4.54 | 4.79 | 4.85 | 4.69 | 3.75 |
| 🔹 Open-source Models | ||||||||||||||||
| Qwen-Image-Edit-2509 | 4.00 | 4.45 | 4.04 | 4.48 | 3.36 | 4.20 | 3.92 | 2.64 | 4.16 | 3.52 | 4.66 | 4.27 | 4.66 | 4.81 | 3.53 | 3.32 |
| FLUX.2 [Dev] | 4.05 | 4.31 | 3.88 | 4.57 | 3.80 | 3.91 | 3.85 | 2.47 | 4.50 | 4.43 | 4.68 | 3.50 | 4.47 | 4.95 | 3.53 | 3.88 |
| LongCat-Image-Edit | 4.12 | 4.34 | 4.25 | 4.54 | 3.72 | 4.12 | 3.92 | 2.48 | 4.49 | 4.31 | 4.67 | 4.27 | 4.61 | 4.94 | 3.83 | 3.30 |
| Qwen-Image-Edit-2511 | 4.18 | 4.50 | 4.23 | 4.52 | 3.61 | 4.09 | 4.00 | 3.22 | 4.31 | 4.19 | 4.66 | 4.41 | 4.68 | 4.83 | 4.08 | 3.51 |
| FireRed-Image-Edit | 4.33 | 4.57 | 4.37 | 4.64 | 3.69 | 4.45 | 4.29 | 3.49 | 4.50 | 4.56 | 4.65 | 4.47 | 4.81 | 4.93 | 4.49 | 3.14 |
| Model | Overall | Add | Adjust | BG | Beauty | Color | Compose | Extract | Portrait | Low-level | Motion | Remove | Replace | Stylize | Text | Viewpoint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 🔹 Proprietary Models | ||||||||||||||||
| Nano-Banana | 4.15 | 4.65 | 4.23 | 4.60 | 4.37 | 4.08 | 3.98 | 3.39 | 4.72 | 4.03 | 4.63 | 4.07 | 4.68 | 3.68 | 3.87 | 3.23 |
| Seedream4.0 | 4.18 | 4.59 | 4.12 | 4.63 | 3.89 | 4.10 | 4.14 | 2.28 | 4.77 | 4.12 | 4.73 | 4.23 | 4.56 | 4.98 | 4.21 | 3.42 |
| Seedream4.5 | 4.20 | 4.66 | 4.08 | 4.64 | 4.12 | 4.07 | 4.10 | 2.23 | 4.74 | 4.28 | 4.75 | 4.24 | 4.58 | 4.97 | 4.20 | 3.44 |
| Nano-Banana-Pro | 4.42 | 4.72 | 4.40 | 4.64 | 4.37 | 4.43 | 4.32 | 3.25 | 4.82 | 4.36 | 4.85 | 4.52 | 4.75 | 4.90 | 4.54 | 3.51 |
| 🔹 Open-source Models | ||||||||||||||||
| Qwen-Image-Edit-2509 | 3.99 | 4.47 | 4.06 | 4.49 | 3.13 | 3.98 | 3.85 | 2.91 | 4.30 | 3.71 | 4.58 | 4.40 | 4.67 | 4.77 | 3.77 | 2.85 |
| FLUX.2 [Dev] | 4.07 | 4.37 | 3.96 | 4.47 | 3.72 | 3.86 | 3.87 | 2.36 | 4.44 | 4.45 | 4.67 | 4.02 | 4.48 | 4.87 | 3.80 | 3.84 |
| LongCat-Image-Edit | 4.12 | 4.38 | 4.04 | 4.49 | 3.89 | 4.10 | 3.93 | 2.98 | 4.47 | 4.27 | 4.69 | 4.24 | 4.51 | 4.86 | 3.83 | 3.25 |
| Qwen-Image-Edit-2511 | 4.23 | 4.55 | 4.17 | 4.56 | 3.49 | 4.07 | 4.07 | 3.54 | 4.42 | 4.52 | 4.72 | 4.42 | 4.65 | 4.85 | 4.06 | 3.38 |
| FireRed-Image-Edit | 4.26 | 4.41 | 4.33 | 4.60 | 3.55 | 4.47 | 4.25 | 3.49 | 4.50 | 4.44 | 4.65 | 4.46 | 4.70 | 4.94 | 4.44 | 2.78 |
The code and the weights of FireRed-Image-Edit are licensed under Apache 2.0.
- Release FireRed-Image-Edit-1.0 model.
- Release REDEdit-Bench, a comprehensive benchmark for image editing evaluation.
- Release FireRed-Image-Edit-1.0-Distilled model, a distilled version of FireRed-Image-Edit-1.0 for few-step generation.
- Release FireRed-Image model, a text-to-image generative model.
We kindly encourage citation of our work if you find it useful.
@article{firered2026rededit,
title={FireRed-Image-Edit: A General-Purpose Image Editing Model},
author={Super Intelligence Team},
year={2026},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/xxxx.xxxxx},
}

FireRed-Image-Edit has not been specifically designed or comprehensively evaluated for every possible downstream application. Users should be aware of the potential risks and ethical considerations when using this project, and should use it responsibly and in compliance with all applicable laws and regulations.
- Prohibited Use: This project must not be used to generate content that is illegal, defamatory, pornographic, harmful, or that violates the privacy, rights, or interests of individuals or organizations.
- User Responsibility: Users are solely responsible for any content generated using this project. The authors and contributors assume no responsibility or liability for any misuse of the codebase or for any consequences resulting from its use.
We would like to thank the developers of the amazing open-source projects we build on, especially Qwen-Image for providing a powerful text-to-image foundation model, as well as Diffusers and HuggingFace.
Please contact us and join our Xiaohongshu Group if you have any questions.