This project implements a service for generating stylized images using the DreamBooth approach. By fine-tuning a pre-trained text-to-image diffusion model on a small set of subject images (typically 3–5), the system binds a unique identifier to the subject. The service then leverages FastAPI for a robust backend and Gradio to provide an interactive user interface for generating novel, stylized images based on user-supplied references.
Key Features:
- Personalized Generation: Fine-tunes a diffusion model using a few reference images to capture subject identity.
- Interactive UI: Uses Gradio to let users upload images and specify text prompts for stylization.
- State-of-the-Art Methodology: Inspired by the DreamBooth approach, which incorporates a class-specific prior preservation loss to maintain subject fidelity while generating diverse outputs.
- Multi-Animal Support: Currently supports fine-tuning and generation for multiple animal types (dogs and ducks).
- Flexible Prediction Pipeline: Robust inference system with configurable parameters and automatic GPU selection (a device-selection sketch follows this list).
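As a rough sketch of what automatic GPU selection can look like (assuming PyTorch; the project's actual policy may differ):

```python
# A minimal sketch, assuming PyTorch. The selection policy shown here
# (most free memory wins) is an assumption, not the project's actual code.
import torch

def pick_device() -> torch.device:
    if not torch.cuda.is_available():
        return torch.device("cpu")
    # Pick the CUDA device with the most free memory.
    free_mem = [torch.cuda.mem_get_info(i)[0] for i in range(torch.cuda.device_count())]
    return torch.device(f"cuda:{free_mem.index(max(free_mem))}")
```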
The core idea behind DreamBooth is to "implant" a subject into a text-to-image diffusion model using a few images. Key highlights include:
- Unique Identifier Binding: A rare token (or unique identifier) is attached to the subject, enabling the model to generate the subject in a variety of contexts.
- Fine-Tuning: The model is fine-tuned with both subject images and corresponding prompts (e.g., "a [V] dog"), leveraging a class-specific prior preservation loss to prevent overfitting and language drift.
- Applications: This method allows for subject recontextualization, text-guided view synthesis, and artistic rendering—paving the way for creative applications like stylized image generation.
For more details, refer to the paper:
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation by Nataniel Ruiz et al. (Project Page).
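Concretely, the paper's training objective combines a standard denoising reconstruction loss on the subject images with a weighted prior preservation term (notation follows the paper):

$$
\mathbb{E}_{x, c, \epsilon, \epsilon', t}\Big[ w_t \lVert \hat{x}_\theta(\alpha_t x + \sigma_t \epsilon, c) - x \rVert_2^2 + \lambda\, w_{t'} \lVert \hat{x}_\theta(\alpha_{t'} x_{\mathrm{pr}} + \sigma_{t'} \epsilon', c_{\mathrm{pr}}) - x_{\mathrm{pr}} \rVert_2^2 \Big]
$$

Here $c$ is the prompt containing the unique identifier (e.g., "a [V] dog"), $x_{\mathrm{pr}}$ and $c_{\mathrm{pr}}$ are class prior images and the plain class prompt (e.g., "a dog"), and $\lambda$ controls the strength of the prior preservation term.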
- Backend: Built using FastAPI to serve REST endpoints for image generation (see the sketch after this list).
- User Interface: Gradio is integrated for interactive testing—users can upload five reference images and provide text prompts to generate stylized images.
- Modular Code Structure: The project is divided into modules and functions to enhance readability and maintainability.
- Dependency Management: We use Poetry along with a `requirements.txt` file for managing dependencies and ensuring reproducibility.
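To make the architecture concrete, here is a minimal sketch of a FastAPI endpoint with a Gradio UI mounted on the same app. The route name, request schema, and `/ui` path are illustrative assumptions, not the project's actual API:

```python
# Minimal sketch only: route names, request schema, and UI layout are assumed.
import gradio as gr
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    animal: str  # "dog" or "duck"

@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    # In the real service this would call the inference pipeline
    # (e.g. src.predict.predict) and return the generated image path.
    return {"output_path": f"outputs/{req.animal}.png", "prompt": req.prompt}

def ui_generate(prompt: str, animal: str) -> str:
    # Gradio callback delegating to the same generation logic.
    return generate(GenerateRequest(prompt=prompt, animal=animal))["output_path"]

demo = gr.Interface(
    fn=ui_generate,
    inputs=[gr.Textbox(label="Prompt"), gr.Radio(["dog", "duck"], label="Animal")],
    outputs=gr.Textbox(label="Output path"),
)

# Serve the Gradio UI from the same FastAPI process.
app = gr.mount_gradio_app(app, demo, path="/ui")
```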
```
GAI_course_project/
├── configs/          # Configuration files (e.g., YAML, JSON settings)
├── data/             # Datasets and data files (raw, processed, etc.)
├── deploy/           # Deployment scripts and container configurations
├── images/           # Visual assets such as logos and diagrams
├── notebooks/        # Jupyter notebooks for experiments and analysis
├── paper/            # Research paper files and related documentation
├── scripts/          # Utility and automation scripts
├── src/              # Source code for the project (modules, functions, etc.)
├── LICENSE           # License file
├── pyproject.toml    # Project configuration file for Poetry
├── poetry.lock       # Lock file for dependency management with Poetry
├── README.md         # Project overview and setup instructions
└── requirements.txt  # Additional dependency list
```
- Python >=3.11
- Poetry for dependency management
- Additional libraries as listed in `requirements.txt`
- Clone the Repository:

```bash
git clone https://github.com/IVproger/GAI_course_project.git
cd GAI_course_project
```

- Install Dependencies:

```bash
# Option 1: Using env activate (recommended)
poetry install
poetry env use python3.11
poetry env activate

# Option 2: Using the shell plugin
poetry self add poetry-plugin-shell
poetry install
poetry shell
```

OR

```bash
python -m venv venv
source venv/bin/activate  # For Windows: venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
```
- Model Training:
  - The system supports training models for different animal types (currently dogs and ducks)
  - Training configurations are stored in the `configs/training/` directory
  - Each animal type has its own configuration file (a sketch of the assumed fields follows this list)
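The exact schema is defined by the files in `configs/training/`; as a rough sketch, a training config might be consumed like this (field names below are assumptions, not read from the repository):

```python
# Hypothetical sketch: illustrative field names for a training config.
import yaml

with open("configs/training/dog.yaml") as f:  # hypothetical file name
    cfg = yaml.safe_load(f)

instance_prompt = cfg["instance_prompt"]      # e.g. "a xon dog" (unique identifier)
class_prompt = cfg["class_prompt"]            # e.g. "a dog" (prior preservation)
prior_loss_weight = cfg.get("prior_loss_weight", 1.0)
learning_rate = cfg.get("learning_rate", 5e-6)
max_train_steps = cfg.get("max_train_steps", 800)
```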
- Image Generation:

```python
from src.predict import predict
from src.enums import AnimalType

# Generate an image of a dog
output_path = predict("a dog in space suit on the moon", AnimalType.DOG)

# Generate an image of a duck
output_path = predict("a duck swimming in a pond", AnimalType.DUCK)
```
- Configuration:
  - Inference configurations are stored in the `configs/inference/` directory
  - Each animal type has its specific configuration file (e.g., `dog.yaml`, `duck.yaml`)
  - Configurations include model paths, generation parameters, and output settings (see the loading sketch after this list)
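As a rough illustration of how such a config might be consumed (the key names here are assumptions; the real schema lives in `configs/inference/`):

```python
# Illustrative only: key names are assumed, not taken from the repository.
import yaml

with open("configs/inference/dog.yaml") as f:
    cfg = yaml.safe_load(f)

model_path = cfg["model_path"]                    # fine-tuned checkpoint
num_steps = cfg.get("num_inference_steps", 50)    # diffusion sampling steps
guidance_scale = cfg.get("guidance_scale", 7.5)   # classifier-free guidance
output_dir = cfg.get("output_dir", "outputs")     # where images are written
```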
- Output:
  - Generated images are saved in the configured output directory
  - Filenames include the animal type and a timestamp for easy tracking (an illustrative helper follows this list)
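One plausible way such filenames could be constructed (illustrative only, not the project's actual code):

```python
# Hypothetical helper: the project's exact naming scheme is not shown here.
from datetime import datetime
from pathlib import Path

def make_output_path(output_dir: str, animal: str) -> Path:
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(output_dir) / f"{animal}_{stamp}.png"  # e.g. dog_20250101_120000.png
```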
- Ivan Golov (i.golov@innopolis.university)
- Roman Makeev
- Maxim Martyshov
This stage represents the initial proof-of-concept where the DreamBooth fine-tuning process was applied. In this phase, the model was fine-tuned on the provided reference images (a Corgi dog from the DreamBooth reference dataset) to capture the unique subject characteristics, resulting in early experimental outputs.
The images below are sample outputs obtained after tuning:
Prompt:
"a xon dog in beautifyl landscape with river, forest and mountines"

Prompt:
"a xon dog in astronaut costume against moon and stars. The xon dog stands proudly on the rocky lunar surface, with its paw slightly raised as if exploring"

Prompt:
"'a xon dog in cool sunglasses sitting in the sport car, smillings and have good time"

- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation, Nataniel Ruiz, Yuanzhen Li, Varun Jampani, et al. (Project Page)
- Additional literature on text-to-image diffusion models and generative adversarial networks.
This project is licensed under the MIT License.
We thank the course instructors, collaborators, and the open-source community for providing the tools and libraries that made this project possible.


