419de9c
Migrate to uv
nouamanecodes Dec 5, 2025
2699ffc
First
nouamanecodes Dec 5, 2025
b323193
Ignore builds
nouamanecodes Dec 5, 2025
b2855bf
Setup cli + google as provider
nouamanecodes Dec 5, 2025
9d3840f
Integrate google provider
nouamanecodes Dec 5, 2025
7256f8e
Add nano banana image generation and CLI
nouamanecodes Dec 5, 2025
d650bb0
Ignore output files
nouamanecodes Dec 5, 2025
e7f56f3
Add clean token counter and provider interfaces
nouamanecodes Dec 5, 2025
e1de623
Refactor provider interface with explicit capabilities
nouamanecodes Dec 5, 2025
81e176e
Update Google provider to use clean interface
nouamanecodes Dec 5, 2025
1b033fe
Refactor optimizer to use dependency injection
nouamanecodes Dec 5, 2025
b314992
Add custom exceptions for better error handling
nouamanecodes Dec 5, 2025
327dbb8
Add proper error handling with custom exceptions
nouamanecodes Dec 5, 2025
0e6c605
Improve CLI error handling and user feedback
nouamanecodes Dec 5, 2025
9c9f25b
Add type hints and error handling to GoogleProvider
nouamanecodes Dec 5, 2025
7a966ce
Optimize SDK performance: 7x faster token counting, 6x faster batch s…
nouamanecodes Dec 5, 2025
54b5aa7
Enhance CLI UX: add verbose mode, clean output, remove emojis
nouamanecodes Dec 5, 2025
e3b99c4
Extensive CLI testing: fix permission errors, add test fixtures
nouamanecodes Dec 5, 2025
8ea3cf1
Add token pricing and budget limiting: default for optimization, fo…
nouamanecodes Dec 5, 2025
e4856b8
Update help messages to showcase budget limiting feature
nouamanecodes Dec 5, 2025
ab14f39
Add testing libraries
nouamanecodes Dec 5, 2025
fa814b9
Cover codebase with unit tests
nouamanecodes Dec 5, 2025
e51272b
Readme for cli (merge into one after)
nouamanecodes Dec 5, 2025
b746372
Add CI that runs tests
nouamanecodes Dec 5, 2025
fb331aa
Cleanup dependencies
nouamanecodes Dec 5, 2025
2c089a8
More extensive coverage of files to ignore
nouamanecodes Dec 5, 2025
b866803
Cleanup pycache files
nouamanecodes Dec 5, 2025
d4e1d7f
Versioning
nouamanecodes Dec 5, 2025
e2e07cf
Fix uv command
nouamanecodes Dec 5, 2025
d7cedae
Use python 3.12+
nouamanecodes Dec 5, 2025
bd497cc
Add code formatting
nouamanecodes Dec 5, 2025
c4032c4
Ignore venv from code formatting and cleanup version
nouamanecodes Dec 5, 2025
a9b893d
Keep code formatting only
nouamanecodes Dec 5, 2025
Binary file removed .DS_Store
Binary file not shown.
70 changes: 70 additions & 0 deletions .github/workflows/test.yml
@@ -0,0 +1,70 @@
name: Tests

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.12", "3.13", "3.14"]

    steps:
    - uses: actions/checkout@v4

    - name: Install uv
      uses: astral-sh/setup-uv@v4

    - name: Set up Python ${{ matrix.python-version }}
      run: uv python install ${{ matrix.python-version }}

    - name: Install dependencies
      run: |
        uv sync --all-extras

    - name: Run unit tests
      run: |
        uv run pytest tests/unit/ -v --tb=short

    - name: Run linting
      run: |
        uv run black --check .


  integration-test:
    runs-on: ubuntu-latest
    needs: test
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'

    steps:
    - uses: actions/checkout@v4

    - name: Install uv
      uses: astral-sh/setup-uv@v4

    - name: Set up Python 3.12
      run: uv python install 3.12

    - name: Install dependencies
      run: |
        uv sync --all-extras

    - name: Test CLI installation and help
      run: |
        uv run prompt-learn --help
        uv run prompt-learn optimize --help
        uv run prompt-learn image --help

    - name: Test package build
      run: |
        uv build

    - name: Upload build artifacts
      uses: actions/upload-artifact@v4
      with:
        name: dist
        path: dist/
61 changes: 61 additions & 0 deletions .gitignore
@@ -0,0 +1,61 @@
# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# IDEs
.vscode/
.idea/
*.swp
*.swo
*~

# OS
.DS_Store
.DS_Store?
._*
.Spotlight-V100
.Trashes
ehthumbs.db
Thumbs.db

# Project specific
image_outputs/
test_dataset.csv
metaprompt.txt

# Temporary files
*.tmp
*.temp
.pytest_cache/
.coverage
htmlcov/

# Logs
*.log
97 changes: 97 additions & 0 deletions PR.md
@@ -0,0 +1,97 @@
# Architecture Refactoring & Performance Optimization

## Summary

Refactors the prompt-learning codebase with cleaner architecture, dependency injection, and performance optimizations. Approximate token counting is now about 7x faster on large datasets (10K rows), and the PR adds Google AI integration and image generation support.

## Architecture Changes

The codebase had coupling issues that made it hard to extend. This PR addresses them by separating concerns:

**Interface Abstractions:**
- `TokenCounter` interface separates token counting from model validation logic
- `ModelProvider` abstraction makes adding new AI providers straightforward
- Configuration management centralized instead of scattered throughout files
- Custom exception hierarchy instead of generic ValueError everywhere

**New Files:**
- `interfaces/token_counter.py` - Token counting without model-specific logic
- `providers/base_provider.py` - Abstract interface for AI providers
- `config/settings.py` - Environment-based configuration management
- `core/exceptions.py` - Structured error handling
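
As a rough illustration of the abstractions listed above, a minimal sketch might look like the following. The method names (`count_tokens`, `generate`), the `ApproximateTokenCounter` and `EchoProvider` classes, and the 4-characters-per-token heuristic are assumptions made for this example; the actual definitions live in `interfaces/token_counter.py` and `providers/base_provider.py` and may differ.

```python
from abc import ABC, abstractmethod


class TokenCounter(ABC):
    """Counts tokens; knows nothing about model limits or validation."""

    @abstractmethod
    def count_tokens(self, text: str) -> int:
        ...


class ApproximateTokenCounter(TokenCounter):
    """Hypothetical counter: assumes roughly 4 characters per token."""

    def count_tokens(self, text: str) -> int:
        return (len(text) + 3) // 4  # ceiling division by 4


class ModelProvider(ABC):
    """Hypothetical provider interface; the method name is an assumption."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...


class EchoProvider(ModelProvider):
    """Stand-in provider for local testing; makes no network calls."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"
```

Keeping the two interfaces separate means a new provider can be added without touching token counting, and a counter can be swapped without touching any provider.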

## Features Added

**Google AI Integration:**
Full Google AI support with dependency injection. Includes Gemini image generation and search grounding capabilities. For image generation, added human-in-the-loop evaluation since image quality assessment is subjective.

**CLI Tool:**
Click-based CLI with commands for optimization, evaluation, testing, and image generation. Error handling gives useful feedback, and the CLI supports both OpenAI and Google providers.

## Performance Optimizations

Found and fixed several bottlenecks:

**Token Counting Vectorization** (`interfaces/token_counter.py`)
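The original code used `df.iterrows()`, which builds a new Series for every row and processes rows one by one. Replaced it with column-wise operations that call `.apply()` on entire columns. The tokenizer still runs once per cell, so the gain comes from removing the per-row overhead.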

```python
# Before: Row-by-row iteration
for _, row in df.iterrows():
    total_tokens += self.count_tokens(row[col])

# After: Vectorized operations
col_tokens = df[col].fillna('').astype(str).apply(self.count_tokens)
token_counts += col_tokens
```

**Batch Splitting Optimization** (`core/dataset_splitter.py`)
Changed from creating DataFrame copies using index lists to pre-calculating boundaries and using efficient slicing.
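
The boundary pre-calculation can be sketched as follows. This is an illustrative version that assumes fixed-size batches; the function names `batch_boundaries` and `split_batches` are hypothetical, and the real splitter in `core/dataset_splitter.py` may size batches differently (for example by token budget).

```python
def batch_boundaries(n_rows: int, batch_size: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) row ranges that cover n_rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [
        (start, min(start + batch_size, n_rows))
        for start in range(0, n_rows, batch_size)
    ]


def split_batches(rows, batch_size: int):
    """Slice a sequence (or a DataFrame, via .iloc) using precomputed bounds."""
    # A DataFrame exposes positional slicing through .iloc; plain
    # sequences such as lists are sliced directly.
    slicer = getattr(rows, "iloc", rows)
    return [
        slicer[start:end]
        for start, end in batch_boundaries(len(rows), batch_size)
    ]
```

Computing the boundaries up front means each batch is a single contiguous slice rather than a lookup through a list of row indices.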

**Regex Compilation** (`optimizer_sdk/prompt_learning_optimizer.py`)
Moved regex compilation to module level instead of recompiling on every method call.
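
A minimal sketch of the module-level pattern is shown here. The regex itself and the names `_TEMPLATE_VAR_RE` and `find_template_variables` are hypothetical and chosen only for illustration; the patterns actually used in `optimizer_sdk/prompt_learning_optimizer.py` are not shown in this PR description.

```python
import re

# Compiled once at import time. Hypothetical pattern: matches {name}
# style placeholders and captures the name.
_TEMPLATE_VAR_RE = re.compile(r"\{(\w+)\}")


def find_template_variables(prompt: str) -> list[str]:
    """Return placeholder names in order of appearance."""
    return _TEMPLATE_VAR_RE.findall(prompt)
```

Python's `re` module keeps an internal cache of recently compiled patterns, so the saving per call is mostly the cache lookup and argument handling. It adds up when the method runs once per row on a large dataset.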

## Performance Results

| Dataset Size | Component | Before | After | Improvement |
|-------------|-----------|---------|-------|-------------|
| Large (10K rows) | TikToken | 55.3s | 49.8s | 1.1x faster |
| Large (10K rows) | Approximate | 1.6s | 0.2s | 7.3x faster |
| Large (10K rows) | Batch Split | 2.0s | 0.4s | 5.7x faster |

## Technical Implementation

**Dependency Injection Pattern:**
```python
# Clean dependency injection
class PromptLearningOptimizer:
    def __init__(self, prompt, model_choice, provider=None, token_counter=None):
        self.provider = provider
        self.token_counter = token_counter or self._get_default_counter()
```
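
To show what the injection buys in practice, here is a self-contained sketch that mirrors the constructor above. `FixedTokenCounter`, `WhitespaceTokenCounter`, the `prompt_tokens` helper, and the body of `_get_default_counter` are all hypothetical stand-ins, not the SDK's real classes.

```python
class FixedTokenCounter:
    """Hypothetical test double: every text counts as a fixed number of tokens."""

    def __init__(self, tokens_per_text: int = 1):
        self.tokens_per_text = tokens_per_text

    def count_tokens(self, text: str) -> int:
        return self.tokens_per_text


class WhitespaceTokenCounter:
    """Hypothetical default: one token per whitespace-separated word."""

    def count_tokens(self, text: str) -> int:
        return len(text.split())


class PromptLearningOptimizer:
    """Trimmed-down stand-in that mirrors the constructor shown above."""

    def __init__(self, prompt, model_choice, provider=None, token_counter=None):
        self.prompt = prompt
        self.model_choice = model_choice
        self.provider = provider
        self.token_counter = token_counter or self._get_default_counter()

    def _get_default_counter(self):
        # Assumption: the real default is chosen elsewhere in the SDK.
        return WhitespaceTokenCounter()

    def prompt_tokens(self) -> int:
        """Hypothetical helper used only by this example."""
        return self.token_counter.count_tokens(self.prompt)


default = PromptLearningOptimizer("a b c", "example-model")
injected = PromptLearningOptimizer(
    "a b c", "example-model", token_counter=FixedTokenCounter(42)
)
```

Unit tests can pass a deterministic counter without loading a tokenizer. One design caveat: `token_counter or ...` falls back to the default for any falsy object, so an explicit `is None` check may be safer if a counter could ever define `__len__` or `__bool__`.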

**Error Handling:**
```python
class PromptLearningError(Exception): pass
class DatasetError(PromptLearningError): pass
class ProviderError(PromptLearningError): pass
```
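
A short sketch of how callers might use the hierarchy follows. The three exception classes repeat the definitions above so that the block runs on its own; `load_rows` and `describe_failure` are hypothetical helpers invented for this example.

```python
class PromptLearningError(Exception):
    """Base class for all errors raised by the package."""


class DatasetError(PromptLearningError):
    """Problems with input data."""


class ProviderError(PromptLearningError):
    """Failures reported by a model provider."""


def load_rows(rows):
    """Hypothetical loader that rejects empty datasets."""
    if not rows:
        raise DatasetError("dataset is empty")
    return rows


def describe_failure(rows) -> str:
    """Map exceptions to user-facing messages, most specific first."""
    try:
        load_rows(rows)
    except DatasetError as exc:
        return f"dataset problem: {exc}"
    except PromptLearningError as exc:
        return f"unexpected library error: {exc}"
    return "ok"
```

Because every custom error derives from `PromptLearningError`, a CLI can catch the base class once at the top level and still handle specific cases, such as bad datasets, with tailored messages.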

## Testing Infrastructure

Added comprehensive benchmarking framework in `tests/benchmarks/` with performance tracking and analysis. The benchmark runner tests different dataset sizes and provides detailed timing and memory usage reports.
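
For readers unfamiliar with this kind of harness, a timing and memory measurement could be sketched with the standard library as below. The `benchmark` function is a hypothetical illustration and is not taken from `tests/benchmarks/`, whose actual runner may be structured differently.

```python
import time
import tracemalloc


def benchmark(func, *args, repeats: int = 3):
    """Return (best wall-clock seconds, peak traced bytes) for func(*args)."""
    best = float("inf")
    peak = 0
    for _ in range(repeats):
        tracemalloc.start()
        start = time.perf_counter()
        func(*args)
        elapsed = time.perf_counter() - start
        _, run_peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        best = min(best, elapsed)
        peak = max(peak, run_peak)
    return best, peak


seconds, peak_bytes = benchmark(sorted, list(range(1000, 0, -1)))
```

Note that `tracemalloc` adds overhead of its own, so a stricter harness would measure time and memory in separate runs.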

## Package Management

Replaced `requirements.txt` with `pyproject.toml` and added a CLI entry point for easy installation and usage.

## Commits

- `1b033fe` - Refactor optimizer to use dependency injection
- `b314992` - Add custom exceptions for better error handling
- `327dbb8` - Add proper error handling with custom exceptions
- `0e6c605` - Improve CLI error handling and user feedback
- `9c9f25b` - Add type hints and error handling to GoogleProvider
- `7a966ce` - Optimize SDK performance: 7x faster token counting, 6x faster batch splitting

The refactoring makes the codebase more maintainable and, per the benchmarks above, roughly 6-7x faster at approximate token counting and batch splitting on 10K-row datasets.