A fully local, web-based visual search engine for fabric design thumbnails. It uses OpenAI's CLIP for multimodal embeddings and FAISS for fast similarity search. You can search by text (e.g., "red floral fabric pattern") or by drag-and-drop image, and get the top 10 most similar thumbnails with their corresponding TIF paths and similarity scores.
- Features
- Tech Stack
- Project Structure
- Prerequisites
- Installation
- Configuration
- Building the Index
- Adding New Images Periodically
- Time filter (design age)
- Running the Application
- Web Interface
- API Reference
- Troubleshooting
- Optimization & Scaling
- License & Credits
- Dual query modes
  - Text search: Natural-language queries (e.g., "blue geometric stripes", "floral print"). Thumbnail filenames are treated as searchable metadata: if the query terms appear in the file name (e.g. `dupatta_floral.jpg`, `3inch_border_saree.png`), those results are ranked at the top even when CLIP visual similarity is lower.
  - Image search: Drag-and-drop or file-pick an image to find visually similar thumbnails.
- Fully local
  - No cloud APIs; the CLIP model and FAISS index run on your machine. Internet access is needed only for initial setup (downloading CLIP weights and dependencies).
- Efficient search
  - One-time indexing: all thumbnails are embedded with CLIP at index-build time.
  - FAISS provides fast nearest-neighbor search on the embeddings (exact with the default `IndexFlatIP`; approximate index types are an option for very large corpora).
- Flexible image input
  - Handles mixed JPG/PNG thumbnails with non-uniform dimensions; CLIP preprocessing (resize/normalize) is applied automatically.
- Clear results
  - Top 10 matches with similarity scores, thumbnail previews, and paths to the original TIF files.
- Cross-platform
  - Python backend (Flask), HTML/CSS/JS frontend. Works on Linux, macOS, and Windows.
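The filename-first ranking described under text search can be illustrated with a minimal sketch. The function name `rank_with_filename_boost` and the matching rule (every query term must appear in the lower-cased file name) are assumptions made for illustration; the actual logic in `search_engine.py` may differ.

```python
import re


def rank_with_filename_boost(query, candidates):
    """Order candidates so filename matches come first, then by score.

    candidates: list of (filename, clip_score) pairs.
    Assumed rule: a candidate is a filename match when every query
    term appears in its lower-cased file name.
    """
    terms = [t for t in re.split(r"\W+", query.lower()) if t]

    def is_match(filename):
        name = filename.lower()
        return bool(terms) and all(t in name for t in terms)

    # False sorts before True, so matches lead; ties break on higher score.
    return sorted(candidates, key=lambda c: (not is_match(c[0]), -c[1]))


results = rank_with_filename_boost(
    "dupatta floral",
    [("blue_stripes.png", 0.31), ("dupatta_floral.jpg", 0.22), ("rose_print.png", 0.29)],
)
print([name for name, _ in results])
```

Here `dupatta_floral.jpg` is listed first despite having the lowest similarity score, and the remaining files follow in score order.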
| Component | Technology |
|---|---|
| Embeddings | OpenAI CLIP (ViT-B/32 by default) |
| Vector search | FAISS (CPU; faiss-cpu) |
| Image I/O | Pillow (PIL) |
| Backend | Flask 3.x, Python 3.10+ |
| Frontend | HTML5, CSS3, vanilla JavaScript |
| Deep learning | PyTorch, torchvision |
All dependencies are open-source. Optional GPU support via PyTorch CUDA for faster indexing and querying.
TextileSearchApp/
├── README.md # This file
├── requirements.txt # Python dependencies
├── config.yaml # Main configuration (paths, index, server)
├── config.example.yaml # Template with comments
├── backend/
│ ├── app.py # Flask web app & API
│ ├── config_loader.py # Loads config.yaml + env overrides
│ ├── search_engine.py # CLIP + FAISS search engine
│ ├── indexer.py # CLI: full index build (one-time or rebuild)
│ ├── incremental_indexer.py # CLI: add new thumbnails to existing index
│ ├── download_clip_model.py # CLI: download CLIP model with retries
│ ├── data/ # Generated at index time (create if missing)
│ │ ├── faiss_index.bin # FAISS index
│ │ └── metadata.npy # Thumbnail ↔ TIF path mapping
│ └── uploads/ # Unused; image search uses a temp file (deleted after each query)
├── static/
│ └── app.js # Frontend logic (search, drag-drop, results)
└── templates/
  └── index.html # Main UI
- Full indexing: Run `indexer.py` once to build the index from all thumbnails. It writes `backend/data/faiss_index.bin` and `backend/data/metadata.npy`.
- Incremental indexing: Run `incremental_indexer.py` periodically to add only new thumbnails without rebuilding the full index (see Adding New Images Periodically).
- Serving: `app.py` loads the index at startup and serves the UI and API. Thumbnails are served from your configured thumbnails directory.
- Python: 3.10 or newer
- Disk: Enough space for PyTorch, CLIP, and the FAISS index (roughly 2–4 GB for the stack; index size depends on number of thumbnails)
- RAM: 4 GB minimum; 8 GB+ recommended for larger datasets
- Optional: NVIDIA GPU + CUDA for faster CLIP inference (use `device="cuda"` when building/running)
Ensure you have the TextileSearchApp directory with `backend/`, `static/`, `templates/`, and `requirements.txt`.

Create a virtual environment:

    cd TextileSearchApp
    python3 -m venv .venv

Activate it:

- Linux / macOS: `source .venv/bin/activate`
- Windows (PowerShell): `.venv\Scripts\Activate.ps1`

Install the dependencies:

    pip install -r requirements.txt

This installs Flask, PyTorch, torchvision, FAISS (CPU), Pillow, NumPy, tqdm, and OpenAI CLIP from GitHub. The first run may download CLIP model weights; after that, the app can run offline.

Create the data directories:

    mkdir -p backend/data backend/uploads

Configuration uses a config file for defaults and environment variables for overrides, a common practice (e.g. the twelve-factor app approach).
Edit config.yaml in the project root (or copy from config.example.yaml). This is the single source of default values for both the web app and the indexer.

    paths:
      thumbnails_dir: "/path/to/thumbnails"  # Your thumbnail images (JPG/PNG)
      tifs_dir: "/path/to/tifs"              # Optional; if empty, thumbnail filename is shown in results
    index:
      index_path: "backend/data/faiss_index.bin"
      metadata_path: "backend/data/metadata.npy"
      upload_folder: "backend/uploads"
    server:
      host: "0.0.0.0"
      port: 8000

- Thumbnails directory: All JPG/PNG thumbnails; scanned recursively. If `tifs_dir` is set, each thumbnail is mapped to a TIF by filename stem (e.g. `design_001.png` → `design_001.tif`). If `tifs_dir` is empty, the thumbnail filename is shown in results instead.
- Paths under `index` can be relative to the project root or absolute.
- If your naming or folder layout differs, adjust `_thumbnail_to_tif()` in `backend/search_engine.py`.
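The stem-based mapping can be sketched as follows. This is not the project's `_thumbnail_to_tif()`; the function name `thumbnail_to_tif`, its signature, and the fixed `.tif` extension are assumptions for illustration, and POSIX-style paths are used so the output is the same on every platform.

```python
from pathlib import PurePosixPath


def thumbnail_to_tif(thumbnail_path, tifs_dir):
    """Return the TIF path that shares the thumbnail's stem.

    If tifs_dir is empty, fall back to the thumbnail file name,
    mirroring the behaviour described for an empty tifs_dir.
    """
    thumb = PurePosixPath(thumbnail_path)
    if not tifs_dir:
        return thumb.name
    # Assumption: TIFs live directly in tifs_dir with a lower-case .tif suffix.
    return str(PurePosixPath(tifs_dir) / (thumb.stem + ".tif"))


print(thumbnail_to_tif("/path/to/thumbnails/design_001.png", "/path/to/tifs"))
print(thumbnail_to_tif("/path/to/thumbnails/design_001.png", ""))
```

If your TIFs sit in subfolders or use a different extension, this is the kind of function you would adapt.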
Any value can be overridden by environment variables (useful for deployment or different machines without editing the file):
| Variable | Overrides |
|---|---|
| `THUMBNAILS_DIR` | `paths.thumbnails_dir` |
| `TIFS_DIR` | `paths.tifs_dir` |
| `INDEX_PATH` | `index.index_path` |
| `METADATA_PATH` | `index.metadata_path` |
| `UPLOAD_FOLDER` | `index.upload_folder` |
| `CLIP_MODEL_PATH` | Local CLIP `.pt` file path |
| `CONFIG_PATH` | Path to a different config file |
| `SERVER_HOST` / `SERVER_PORT` | `server.host` / `server.port` |
Optional: put variables in a .env file in the project root; python-dotenv loads it automatically (do not commit secrets to .env if you add any later).
Precedence, from lowest to highest: config file (defaults) → `.env` (if present) → environment variables. For the indexer, CLI arguments take priority over all three (e.g. `--thumbnails_dir` overrides both `THUMBNAILS_DIR` and the config file).
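The override order can be sketched in a few lines. This is an illustration only, not the code in `config_loader.py`: the function `resolve_setting` and its parameters are hypothetical, the config is passed as an already-parsed dictionary, and `.env` values are assumed to have been loaded into the environment mapping beforehand.

```python
def resolve_setting(dotted_key, env_var, config, env, cli_value=None):
    """Resolve one setting: CLI argument > environment variable > config file."""
    if cli_value is not None:
        return cli_value
    if env.get(env_var):
        return env[env_var]
    # Walk the dotted key (e.g. "paths.thumbnails_dir") through the nested config.
    node = config
    for part in dotted_key.split("."):
        node = node[part]
    return node


config = {"paths": {"thumbnails_dir": "/path/to/thumbnails"}}
key, var = "paths.thumbnails_dir", "THUMBNAILS_DIR"

print(resolve_setting(key, var, config, env={}))
print(resolve_setting(key, var, config, env={var: "/env/thumbs"}))
print(resolve_setting(key, var, config, env={var: "/env/thumbs"}, cli_value="/cli/thumbs"))
```

In a real loader, `env` would typically be `os.environ`.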
Build the FAISS index and metadata before starting the web app. Re-run whenever you add, remove, or change thumbnails.
From the project root (`TextileSearchApp/`):

    source .venv/bin/activate   # or .venv\Scripts\Activate.ps1 on Windows
    cd backend

If `config.yaml` has `paths.thumbnails_dir` and `paths.tifs_dir` set, you can run:

    python indexer.py

Otherwise pass paths explicitly (CLI overrides config):

    python indexer.py \
      --thumbnails_dir "/path/to/thumbnails" \
      --tifs_dir "/path/to/tifs"

Indexer arguments (all optional when config is set):
| Argument | Default (from config) | Description |
|---|---|---|
| `--thumbnails_dir` | config `paths.thumbnails_dir` | Root directory of thumbnail images (JPG/PNG). |
| `--tifs_dir` | config `paths.tifs_dir` | Root directory of TIF files. |
| `--index_path` | config `index.index_path` | Where to save the FAISS index. |
| `--metadata_path` | config `index.metadata_path` | Where to save path metadata. |
| `--device` | auto (`cuda` if available) | `cuda` or `cpu`. |
| `--clip_model_path` | config / `CLIP_MODEL_PATH` | Local CLIP `.pt` file. |
Example (CPU only):

    python indexer.py --thumbnails_dir /data/thumbnails --tifs_dir /data/tifs --device cpu

When finished, you should see `data/faiss_index.bin` and `data/metadata.npy` under `backend/`. The app reads the same paths from `config.yaml`.
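For readers who want to see how such a command line might be parsed, here is a minimal `argparse` sketch. It is not the parser in `indexer.py`; `build_parser` is a hypothetical helper, and `type=str.strip` is one possible way to get the whitespace stripping mentioned under Troubleshooting. Defaults of `None` stand for "fall back to the config or environment".

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Build the FAISS index (sketch).")
    # type=str.strip removes accidental leading/trailing whitespace from paths.
    for name in ("--thumbnails_dir", "--tifs_dir", "--index_path",
                 "--metadata_path", "--clip_model_path"):
        parser.add_argument(name, type=str.strip, default=None)
    parser.add_argument("--device", choices=["cuda", "cpu"], default=None)
    return parser


# A path with an accidental leading space, as in the quoting pitfall.
args = build_parser().parse_args(["--thumbnails_dir", " /data/thumbnails", "--device", "cpu"])
print(repr(args.thumbnails_dir), args.device)
```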
When you add new fabric designs (new thumbnails and optionally new TIFs) to your folders, you can update the search index without rebuilding from scratch by using the incremental indexer. This keeps your faiss_index.bin and metadata.npy in sync with the latest designs.
Think of it like this:
- `indexer.py` (full index) = build everything from zero.
- `incremental_indexer.py` (incremental) = only add the new pictures you just copied in.
You never edit faiss_index.bin manually; the scripts do it for you.
- Copy new thumbnails
  - Put new JPG/PNG thumbnails into the folder configured as `paths.thumbnails_dir` in `config.yaml`. Example (Linux / Windows WSL):

        paths:
          thumbnails_dir: "/mnt/c/Users/you/Designs/thumbs"
          tifs_dir: "/mnt/c/Users/you/Designs/tifs"  # optional

  - If you also store TIFs, copy the matching TIFs into `paths.tifs_dir` using the same stem: `KD00256_2.png` → `KD00256_2.tif`
- Run the incremental indexer

  From the project root (`TextileSearchApp/`), with your virtual environment activated:

      # 1) Activate the virtual environment (Linux/macOS)
      source .venv/bin/activate
      # 2) Go to the backend folder
      cd backend
      # 3) Update the index with only the new thumbnails
      python incremental_indexer.py

  On Windows PowerShell (if not using WSL), it looks like:

      .venv\Scripts\Activate.ps1
      cd backend
      python incremental_indexer.py

  What this script does (in simple terms):
  - Loads your existing `data/faiss_index.bin` and `data/metadata.npy`.
  - Scans `paths.thumbnails_dir` for all thumbnail files.
  - Compares them to what is already in `metadata.npy`.
  - For files that are new:
    - Calculates their CLIP embeddings.
    - Appends them to the FAISS index (`faiss_index.bin`).
    - Adds entries to `metadata.npy` (including thumbnail path, TIF path, and file date).
  - Leaves already-indexed files unchanged.

  If there are no new thumbnails, it prints a message and exits without modifying the index.
- Restart the web app

  The Flask app keeps a copy of the index in memory. To see the newly indexed designs:

      # In backend/, where app.py lives
      python app.py

  If the app is already running, stop it with Ctrl+C in that terminal, then run `python app.py` again.

  After restart:
  - The new designs participate in text search (including filename-based matching).
  - The new designs participate in image search.
  - The time filter (Designs: Show all / Up to 1 week / etc.) will treat them as “new” based on their file date.
- Run the incremental indexer every time you add a batch of new thumbnails (e.g. once a day or once a week).
- It is safe to run even if there are no new files; it will simply do nothing.
To add new images automatically on a schedule (e.g. nightly), run the incremental indexer from cron (Linux/macOS) or Task Scheduler (Windows), then restart the app or use a process manager that reloads after the script runs.
Example cron job (Linux, run at 2 a.m. every day):

    0 2 * * * cd /path/to/TextileSearchApp/backend && /path/to/.venv/bin/python incremental_indexer.py >> /path/to/TextileSearchApp/incremental.log 2>&1

After this cron job runs, make sure your app is restarted or configured to reload (for a simple setup, you can just restart `python app.py` manually each morning).
- Full index (`indexer.py`):
  - Use the first time you set up the project.
  - Use when you move/rename many files, or drastically change the contents of `thumbnails_dir` and `tifs_dir`.
  - Rebuilds the entire `faiss_index.bin` and `metadata.npy` from what is currently on disk.
- Incremental (`incremental_indexer.py`):
  - Use when you add new designs but keep the old ones.
  - Only processes the new thumbnails and appends them to the existing index.
  - Much faster when you have thousands of existing images and only a few new ones.
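The "compare against what is already indexed" step of the incremental indexer can be sketched with the standard library alone. This is an illustration, not the code in `incremental_indexer.py`: `find_new_thumbnails` is a hypothetical helper, it assumes metadata stores paths relative to the thumbnails directory, and it also accepts `.jpeg` as a suffix, which the project may not. Embedding the new files with CLIP and appending them to FAISS is left out.

```python
import tempfile
from pathlib import Path

# Assumption: these suffixes count as thumbnails.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def find_new_thumbnails(thumbnails_dir, indexed_paths):
    """Return thumbnails under thumbnails_dir (recursive) that are not yet indexed."""
    root = Path(thumbnails_dir)
    indexed = set(indexed_paths)
    new_files = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            rel = path.relative_to(root).as_posix()
            if rel not in indexed:
                new_files.append(rel)
    return new_files


# Demo with a throwaway directory: one indexed file, one new file, one non-image.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("old.png", "KD00256_2.png", "notes.txt"):
        (Path(tmp) / name).touch()
    new_files = find_new_thumbnails(tmp, indexed_paths=["old.png"])
    print(new_files)
```

Only the files returned here would need CLIP embeddings, which is why the incremental run is much cheaper than a full rebuild.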
You can restrict search results by design age, based on each file’s modification time as recorded at index time.
Next to Show X results, a Designs dropdown offers:
- Show all (default) — no time filter
- Up to 1 week — only designs whose file date (at index time) is within the last 7 days
- Up to 1 month, Up to 3 months, Up to 6 months, Up to 1 year — same idea for 30, 90, 180, and 365 days
Changing the dropdown re-runs the current search with the selected filter. Pagination and result count apply to the filtered set.
- At index time (full index or incremental), the indexer stores each thumbnail’s file modification time (`mtime`) in the metadata.
- When you choose a time range, the backend only returns results whose stored `mtime` is within that many days from “now”.
- Existing indices built before this feature have no `mtime` in metadata; they are treated as very old, so they appear in Show all but not in any “Up to X” filter. To use the time filter on an old index, rebuild the index once (`python indexer.py`). Running only the incremental indexer is not enough: new entries will get `mtime`, but old entries still won’t show in time-filtered results until you do a full rebuild.
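The age filter described above can be sketched as follows. The function `filter_by_age`, the dictionary layout of a result, and the use of `0` as the stand-in for a missing `mtime` are assumptions for illustration; the backend may store and compare timestamps differently.

```python
import time

SECONDS_PER_DAY = 86400


def filter_by_age(results, max_age_days, now=None):
    """Keep results whose stored mtime is within the last max_age_days days.

    Entries without an mtime are treated as very old (mtime 0), so they
    pass only when no filter is active (max_age_days is None or 0).
    """
    if not max_age_days:
        return list(results)
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return [r for r in results if r.get("mtime", 0) >= cutoff]


now = 1_700_000_000  # fixed timestamp so the demo is reproducible
results = [
    {"thumbnail_path": "a.png", "mtime": now - 2 * SECONDS_PER_DAY},
    {"thumbnail_path": "b.png", "mtime": now - 40 * SECONDS_PER_DAY},
    {"thumbnail_path": "c.png"},  # indexed before mtime was stored
]
print([r["thumbnail_path"] for r in filter_by_age(results, 7, now)])
print([r["thumbnail_path"] for r in filter_by_age(results, 90, now)])
print(len(filter_by_age(results, 0, now)))
```

The entry without `mtime` appears only in the unfiltered call, matching the behaviour described for indices built before the feature existed.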
From the project root:

    source .venv/bin/activate
    cd backend
    python app.py

The server starts at http://0.0.0.0:8000 (all interfaces). Open a browser to http://localhost:8000.
Ensure config.yaml has valid paths.thumbnails_dir (paths.tifs_dir is optional; if empty, results show the thumbnail filename). Build the index so data/faiss_index.bin and data/metadata.npy exist; otherwise the app will exit at startup. Host and port come from config.yaml or SERVER_HOST / SERVER_PORT.
- Text query: Type a phrase (e.g. "red floral fabric pattern" or "dupatta floral") and click Search by text or press Enter. Matches in thumbnail filenames (e.g. `dupatta_floral.jpg`) are shown first; remaining results are ordered by CLIP similarity.
- Image query: Drag and drop an image onto the drop zone, or click the zone to choose a file (JPG/PNG).
- Results: Matches are shown in a grid with thumbnail, similarity score, and TIF path. Use Show X results (10/20/50/100) and Designs: Show all | Up to 1 week | … | Up to 1 year to filter by design age (file date at index time). Pagination applies to the (possibly filtered) set.
- Clear: Resets the text box and results.
Thumbnails in the results are served by the Flask app from THUMBNAILS_DIR. If thumbnails do not show, check Troubleshooting (path/relative path handling).
Base URL: http://localhost:8000 (or your host/port).
- Endpoint: `POST /api/search/text`
- Headers: `Content-Type: application/json`
- Body:

      { "query": "red floral fabric pattern", "top_k": 10 }

- Optional request fields: `offset` (for pagination), `top_k` (1–500, default 10), `max_age_days` (restrict to designs with file mtime within the last N days; omit or 0 for no filter).
- Response (200):

      {
        "results": [
          {
            "thumbnail_path": "/path/to/thumbnails/design_001.png",
            "thumbnail_url": "/thumbnails/design_001.png",
            "tif_path": "/path/to/tifs/design_001.tif",
            "score": 0.312
          }
        ],
        "total": 1500
      }

- Errors: 400 if `query` is missing/empty; 500 on server errors.

- Endpoint: `POST /api/search/image`
- Content-Type: `multipart/form-data`
- Fields:
  - `image`: image file (e.g. JPG/PNG)
  - `top_k`: optional integer (default 10)
  - `offset`: optional integer (for pagination)
  - `max_age_days`: optional integer (restrict to designs with file mtime within the last N days; omit or 0 for no filter)
- Response: Same `results` structure as text search.
- Errors: 400 if no file or unsupported type; 500 on server errors.

- Endpoint: `GET /thumbnails/<path:filename>`
- Serves files from `THUMBNAILS_DIR`. `filename` should be the path relative to `THUMBNAILS_DIR` (e.g. if the thumbnail is `THUMBNAILS_DIR/abc/def.png`, use `/thumbnails/abc/def.png`).
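As a rough client-side illustration of the text search endpoint, the sketch below builds the request with the standard library. The helper `build_text_search_request` is hypothetical; the URL, field names, and response keys are taken from the reference above. The request is only constructed here, and sending it is shown in comments because it needs a running server.

```python
import json
import urllib.request


def build_text_search_request(query, top_k=10, offset=0, max_age_days=0,
                              base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /api/search/text."""
    payload = {"query": query, "top_k": top_k, "offset": offset,
               "max_age_days": max_age_days}
    return urllib.request.Request(
        url=f"{base_url}/api/search/text",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_text_search_request("red floral fabric pattern", top_k=10)
print(request.get_method(), request.full_url)

# With the server running, the request could be sent like this:
# with urllib.request.urlopen(request) as response:
#     body = json.load(response)
#     for hit in body["results"]:
#         print(hit["score"], hit["tif_path"])
```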
- Run the indexer first and use the same `--index_path` and `--metadata_path` as in `app.py`.
- Ensure paths in `app.py` point to existing `faiss_index.bin` and `metadata.npy` (e.g. `backend/data/`).
- The frontend requests thumbnails via `/thumbnails/<path>`. If metadata stores absolute paths, the frontend may be requesting a path the server doesn’t recognize. Store relative paths in metadata (relative to `THUMBNAILS_DIR`) and use `thumbnail_path` as the `<path>` in `/thumbnails/`. Adjust `build_index()` in `search_engine.py` to save `os.path.relpath(p, self.thumbnails_dir)` and ensure `serve_thumbnail` serves with `send_from_directory(THUMBNAILS_DIR, filename)`.
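The relative-path fix can be sketched like this. The helper `thumbnail_url` is hypothetical and uses `posixpath` so the result is the same on every platform; it only shows how an absolute or relative stored path might be turned into the URL the frontend requests.

```python
import posixpath
from urllib.parse import quote


def thumbnail_url(thumbnail_path, thumbnails_dir):
    """Map a stored thumbnail path to the URL served under /thumbnails/."""
    if posixpath.isabs(thumbnail_path):
        rel = posixpath.relpath(thumbnail_path, thumbnails_dir)
    else:
        rel = thumbnail_path
    # Refuse paths that escape the thumbnails directory.
    if rel == ".." or rel.startswith("../"):
        raise ValueError(f"{thumbnail_path} is outside {thumbnails_dir}")
    # quote() keeps "/" and escapes characters such as spaces.
    return "/thumbnails/" + quote(rel)


print(thumbnail_url("/path/to/thumbnails/abc/def.png", "/path/to/thumbnails"))
print(thumbnail_url("abc/def.png", "/path/to/thumbnails"))
```

Both calls yield the same URL, so metadata that mixes absolute and relative paths would still resolve consistently.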
- To force CPU: in both `indexer.py` and `app.py`, pass `device="cpu"` when creating `ClipSearchEngine`.
- The first run downloads the CLIP model from the internet; firewalls, proxies, or unstable networks can cause `Connection reset by peer` or `URLError`.
- Fix 1 – Retry script: From `backend/` run:

      python download_clip_model.py

  It retries the download with a longer timeout; if it still fails, it prints manual download instructions.
- Fix 2 – Manual download: Open this URL in a browser (or on another machine with stable internet), then save the file as `~/.cache/clip/ViT-B-32.pt` (create `~/.cache/clip` if needed):

      https://openaipublic.azureedge.net/clip/models/40d365715913c9da98579312b702a82c18be219cc2a73407c4526f58eba950af/ViT-B-32.pt

  Then run the indexer again; CLIP will use the cached file.
- Fix 3 – Local path: Download the `.pt` file anywhere, then run the indexer with:

      python indexer.py ... --clip_model_path /path/to/ViT-B-32.pt

  Or set the environment variable: `export CLIP_MODEL_PATH=/path/to/ViT-B-32.pt`.
- If you pass paths in quotes, avoid a leading space (e.g. use `"/mnt/c/Users/..."` not `" /mnt/c/Users/..."`). The indexer now strips whitespace from directory arguments.
- Activate the same venv and reinstall: `pip install -r requirements.txt`. Use Python 3.10+.
- Use the same CLIP model (e.g. `ViT-B/32`) for indexing and querying (default in `ClipSearchEngine`).
- Ensure images are valid (not corrupted); CLIP’s `preprocess` resizes/normalizes automatically.
- Increase batch size in `search_engine.py` (`build_index(batch_size=128)` or 256) if you have enough RAM.
- Use GPU: `--device cuda` for the indexer.
- Batch size: In `ClipSearchEngine.build_index()`, increase `batch_size` (e.g. 128–256) for faster indexing when RAM allows.
- GPU: Set `device="cuda"` in the indexer and in `app.py` for faster embedding and search.
- Very large corpora (e.g. >100k images): Consider switching from `IndexFlatIP` to an approximate FAISS index (e.g. `IndexIVFFlat` or `IndexHNSWFlat`) for faster search at a slight recall trade-off; keep `METRIC_INNER_PRODUCT` for normalized vectors.
- Maintenance: For new images, run `incremental_indexer.py` then restart the app. For a full refresh, re-run `indexer.py`. Back up `data/faiss_index.bin` and `data/metadata.npy` if needed.
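To see why the inner-product metric is kept for normalized vectors, the sketch below does an exact brute-force search in plain Python. It is a conceptual stand-in for what a flat inner-product index computes, not FAISS code; the helper names are made up for illustration. After L2 normalization, the inner product equals cosine similarity, so vector length no longer affects the ranking.

```python
import heapq
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def exact_top_k(query, vectors, k):
    """Brute-force search: score every stored vector, keep the k best."""
    q = l2_normalize(query)
    scored = ((inner_product(q, l2_normalize(v)), i) for i, v in enumerate(vectors))
    return heapq.nlargest(k, scored)


# The vectors differ in length; only their direction matters after normalization.
vectors = [[5.0, 0.0], [2.0, 2.0], [-1.0, 0.5]]
top = exact_top_k([1.0, 1.0], vectors, k=2)
print([(round(score, 3), index) for score, index in top])
```

An approximate index trades this exhaustive scoring for speed, which is where the slight recall loss comes from.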
- CLIP: OpenAI CLIP (MIT-style).
- FAISS: Facebook AI Research FAISS.
- Flask: Pallets.
- PyTorch: PyTorch.
This project is for local use and does not send data to external services after initial dependency and model download.