This project implements a service for generating stylized images using the DreamBooth approach. By fine-tuning a pre-trained text-to-image diffusion model on a small set of subject images (typically 3–5), the system binds a unique identifier to the subject. The service then leverages FastAPI for a robust backend and Gradio to provide an interactive user interface for generating novel, stylized images based on user-supplied references.
Key Features:
- Personalized Generation: Fine-tunes a diffusion model using a few reference images to capture subject identity.
- Interactive UI: Uses Gradio to let users upload images and specify text prompts for stylization.
- State-of-the-Art Methodology: Inspired by the DreamBooth approach, which incorporates a class-specific prior preservation loss to maintain subject fidelity while generating diverse outputs.
- Multi-Animal Support: Currently supports fine-tuning and generation for multiple animal types (dogs and ducks).
- Flexible Prediction Pipeline: Robust inference system with configurable parameters and automatic GPU selection (a device-selection sketch follows this list).
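As a rough sketch of what automatic GPU selection can look like (assuming PyTorch; the project's actual policy may differ):

```python
# A minimal sketch, assuming PyTorch. The selection policy shown here
# (most free memory wins) is an assumption, not the project's actual code.
import torch

def pick_device() -> torch.device:
    if not torch.cuda.is_available():
        return torch.device("cpu")
    # Pick the CUDA device with the most free memory.
    free_mem = [torch.cuda.mem_get_info(i)[0] for i in range(torch.cuda.device_count())]
    return torch.device(f"cuda:{free_mem.index(max(free_mem))}")
```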
The core idea behind DreamBooth is to "implant" a subject into a text-to-image diffusion model using a few images. Key highlights include:
- Unique Identifier Binding: A rare token (or unique identifier) is attached to the subject, enabling the model to generate the subject in a variety of contexts.
- Fine-Tuning: The model is fine-tuned with both subject images and corresponding prompts (e.g., "a [V] dog"), leveraging a class-specific prior preservation loss to prevent overfitting and language drift.
- Applications: This method allows for subject recontextualization, text-guided view synthesis, and artistic rendering—paving the way for creative applications like stylized image generation.
For more details, refer to the paper:
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation by Nataniel Ruiz et al. (Project Page).
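Concretely, the paper's training objective combines a standard denoising reconstruction loss on the subject images with a weighted prior preservation term (notation follows the paper):

$$
\mathbb{E}_{x, c, \epsilon, \epsilon', t}\Big[ w_t \lVert \hat{x}_\theta(\alpha_t x + \sigma_t \epsilon, c) - x \rVert_2^2 + \lambda\, w_{t'} \lVert \hat{x}_\theta(\alpha_{t'} x_{\mathrm{pr}} + \sigma_{t'} \epsilon', c_{\mathrm{pr}}) - x_{\mathrm{pr}} \rVert_2^2 \Big]
$$

Here $c$ is the prompt containing the unique identifier (e.g., "a [V] dog"), $x_{\mathrm{pr}}$ and $c_{\mathrm{pr}}$ are class prior images and the plain class prompt (e.g., "a dog"), and $\lambda$ controls the strength of the prior preservation term.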
- Backend: Built using FastAPI to serve REST endpoints for image generation (see the sketch after this list).
- User Interface: Gradio is integrated for interactive testing—users can upload five reference images and provide text prompts to generate stylized images.
- Modular Code Structure: The project is divided into modules and functions to enhance readability and maintainability.
- Dependency Management: We use Poetry along with a `requirements.txt` file for managing dependencies and ensuring reproducibility.
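To make the architecture concrete, here is a minimal sketch of a FastAPI endpoint with a Gradio UI mounted on the same app. The route name, request schema, and `/ui` path are illustrative assumptions, not the project's actual API:

```python
# Minimal sketch only: route names, request schema, and UI layout are assumed.
import gradio as gr
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class GenerateRequest(BaseModel):
    prompt: str
    animal: str  # "dog" or "duck"

@app.post("/generate")
def generate(req: GenerateRequest) -> dict:
    # In the real service this would call the inference pipeline
    # (e.g. src.predict.predict) and return the generated image path.
    return {"output_path": f"outputs/{req.animal}.png", "prompt": req.prompt}

def ui_generate(prompt: str, animal: str) -> str:
    # Gradio callback delegating to the same generation logic.
    return generate(GenerateRequest(prompt=prompt, animal=animal))["output_path"]

demo = gr.Interface(
    fn=ui_generate,
    inputs=[gr.Textbox(label="Prompt"), gr.Radio(["dog", "duck"], label="Animal")],
    outputs=gr.Textbox(label="Output path"),
)

# Serve the Gradio UI from the same FastAPI process.
app = gr.mount_gradio_app(app, demo, path="/ui")
```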
```
GAI_course_project/
├── configs/          # Configuration files (e.g., YAML, JSON settings)
├── data/             # Datasets and data files (raw, processed, etc.)
├── deploy/           # Deployment scripts and container configurations
├── images/           # Visual assets such as logos and diagrams
├── notebooks/        # Jupyter notebooks for experiments and analysis
├── paper/            # Research paper files and related documentation
├── scripts/          # Utility and automation scripts
├── src/              # Source code for the project (modules, functions, etc.)
├── LICENSE           # License file
├── pyproject.toml    # Project configuration file for Poetry
├── poetry.lock       # Lock file for dependency management with Poetry
├── README.md         # Project overview and setup instructions
└── requirements.txt  # Additional dependency list
```
- Python >=3.11
- Poetry for dependency management
- Additional libraries as listed in `requirements.txt`
- Clone the Repository:

```bash
git clone https://github.com/IVproger/GAI_course_project.git
cd GAI_course_project
```

- Install Dependencies:

```bash
# Option 1: Using env activate (recommended)
poetry install
poetry env use python3.11
poetry env activate

# Option 2: Using the shell plugin
poetry self add poetry-plugin-shell
poetry install
poetry shell
```

OR

```bash
python -m venv venv
source venv/bin/activate  # For Windows: venv\Scripts\activate
pip install --upgrade pip
pip install -r requirements.txt
```
- Model Training:
  - The system supports training models for different animal types (currently dogs and ducks)
  - Training configurations are stored in the `configs/training/` directory
  - Each animal type has its own configuration file (a sketch of the assumed fields follows this list)
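The exact schema is defined by the files in `configs/training/`; as a rough sketch, a training config might be consumed like this (field names below are assumptions, not read from the repository):

```python
# Hypothetical sketch: illustrative field names for a training config.
import yaml

with open("configs/training/dog.yaml") as f:  # hypothetical file name
    cfg = yaml.safe_load(f)

instance_prompt = cfg["instance_prompt"]      # e.g. "a xon dog" (unique identifier)
class_prompt = cfg["class_prompt"]            # e.g. "a dog" (prior preservation)
prior_loss_weight = cfg.get("prior_loss_weight", 1.0)
learning_rate = cfg.get("learning_rate", 5e-6)
max_train_steps = cfg.get("max_train_steps", 800)
```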
- Image Generation:

```python
from src.predict import predict
from src.enums import AnimalType

# Generate an image of a dog
output_path = predict("a dog in space suit on the moon", AnimalType.DOG)

# Generate an image of a duck
output_path = predict("a duck swimming in a pond", AnimalType.DUCK)
```
- Configuration:
  - Inference configurations are stored in the `configs/inference/` directory
  - Each animal type has its specific configuration file (e.g., `dog.yaml`, `duck.yaml`)
  - Configurations include model paths, generation parameters, and output settings (see the loading sketch after this list)
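As a rough illustration of how such a config might be consumed (the key names here are assumptions; the real schema lives in `configs/inference/`):

```python
# Illustrative only: key names are assumed, not taken from the repository.
import yaml

with open("configs/inference/dog.yaml") as f:
    cfg = yaml.safe_load(f)

model_path = cfg["model_path"]                    # fine-tuned checkpoint
num_steps = cfg.get("num_inference_steps", 50)    # diffusion sampling steps
guidance_scale = cfg.get("guidance_scale", 7.5)   # classifier-free guidance
output_dir = cfg.get("output_dir", "outputs")     # where images are written
```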
- Output:
  - Generated images are saved in the configured output directory
  - Filenames include the animal type and a timestamp for easy tracking (an illustrative helper follows this list)
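One plausible way such filenames could be constructed (illustrative only, not the project's actual code):

```python
# Hypothetical helper: the project's exact naming scheme is not shown here.
from datetime import datetime
from pathlib import Path

def make_output_path(output_dir: str, animal: str) -> Path:
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return Path(output_dir) / f"{animal}_{stamp}.png"  # e.g. dog_20250101_120000.png
```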
- Ivan Golov (i.golov@innopolis.university)
- Roman Makeev
- Maxim Martyshov
This stage represents the initial proof-of-concept where the DreamBooth fine-tuning process was applied. In this phase, the model was fine-tuned on the provided reference images (a Corgi dog from the DreamBooth reference dataset) to capture the unique subject characteristics, resulting in early experimental outputs.
The images below are sample outputs obtained after tuning:
Prompt:
"a xon dog in beautifyl landscape with river, forest and mountines"

Prompt:
"a xon dog in astronaut costume against moon and stars. The xon dog stands proudly on the rocky lunar surface, with its paw slightly raised as if exploring"

Prompt:
"'a xon dog in cool sunglasses sitting in the sport car, smillings and have good time"

- DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation, Nataniel Ruiz, Yuanzhen Li, Varun Jampani, et al. (Project Page)
- Additional literature on text-to-image diffusion models and generative adversarial networks.
This project is licensed under the MIT License.
We thank the course instructors, collaborators, and the open-source community for providing the tools and libraries that made this project possible.


