11 changes: 11 additions & 0 deletions .env.example
@@ -0,0 +1,11 @@
# Use one backend (or set TOCIFY_BACKEND=openai|cursor to force).

# OpenAI: easiest for most users — just set this and run.
OPENAI_API_KEY=
OPENAI_MODEL=gpt-4o-mini

# Cursor CLI: needs `agent` on PATH and this key.
CURSOR_API_KEY=

# Optional: openai | cursor (default: auto from which key is set)
# TOCIFY_BACKEND=
60 changes: 60 additions & 0 deletions .github/workflows/weekly-digest-cursor.yml
@@ -0,0 +1,60 @@
name: Weekly ToC Digest (Cursor)

on:
schedule:
# Mondays ~08:00 America/Los_Angeles: 16:00 UTC during PST, 09:00 local during PDT (cron is UTC; adjust if you like)
- cron: "00 16 * * 1"
workflow_dispatch:

permissions:
contents: write

jobs:
digest:
runs-on: ubuntu-latest
steps:
- name: Checkout
uses: actions/checkout@v4

- name: Set Python version
run: echo "PYTHON_VERSION=$(cat .python-version)" >> $GITHUB_ENV

- name: Install uv
uses: astral-sh/setup-uv@v4
with:
python-version: ${{ env.PYTHON_VERSION }}
enable-cache: true
activate-environment: true

- name: Install deps
run: uv sync

- name: Install Cursor CLI
run: |
curl https://cursor.com/install -fsS | bash
echo "$HOME/.cursor/bin" >> $GITHUB_PATH

- name: Run digest
env:
TOCIFY_BACKEND: "cursor"
CURSOR_API_KEY: ${{ secrets.CURSOR_API_KEY }}
HTTP_PROXY: ""
HTTPS_PROXY: ""
ALL_PROXY: ""
NO_PROXY: "api.openai.com"
MIN_SCORE_READ: "0.35"
LOOKBACK_DAYS: "7"
SUMMARY_MAX_CHARS: "500"
PREFILTER_KEEP_TOP: "200"
BATCH_SIZE: "50"
run: |
export PATH="$HOME/.cursor/bin:$PATH"
uv run python digest.py

- name: Commit digest.md
run: |
git config user.name "toc-digest-bot"
git config user.email "toc-digest-bot@users.noreply.github.com"
git add digest.md
git commit -m "Update weekly ToC digest" || exit 0
git push
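The `|| exit 0` in the commit step above keeps the job green when the digest is unchanged (a plain `git commit` exits non-zero with nothing staged). A standalone sketch of that idiom, in a throwaway repo under a temp dir:

```shell
set -eu
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.name "toc-digest-bot"
git config user.email "toc-digest-bot@users.noreply.github.com"
echo "# digest" > digest.md
git add digest.md
git commit -q -m "Update weekly ToC digest" || exit 0   # commits; a no-op would exit 0 instead of failing
git add digest.md                                       # nothing changed this second time
git commit -q -m "Update weekly ToC digest" || echo "nothing to commit; job stays green"
```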
32 changes: 21 additions & 11 deletions .github/workflows/weekly-digest.yml
@@ -1,4 +1,4 @@
name: Weekly ToC Digest
name: Weekly ToC Digest (OpenAI)

on:
schedule:
@@ -16,30 +16,40 @@ jobs:
- name: Checkout
uses: actions/checkout@v4

- name: Setup Python
uses: actions/setup-python@v5
- name: Set Python version
run: echo "PYTHON_VERSION=$(cat .python-version)" >> $GITHUB_ENV

- name: Install uv
uses: astral-sh/setup-uv@v4
with:
python-version: "3.11"
python-version: ${{ env.PYTHON_VERSION }}
enable-cache: true
activate-environment: true

- name: Install deps
run: |
python -m pip install --upgrade pip
pip install -r requirements.txt
pip install --upgrade openai httpx certifi
run: uv sync

- name: Network check (OpenAI)
run: |
python - << 'PY'
uv run python - << 'PY'
import socket
host = "api.openai.com"
print("Resolving:", host)
print(socket.gethostbyname(host))
print("OK: DNS resolve")
PY
curl -I https://api.openai.com/v1/models --max-time 20
curl -I https://api.openai.com/v1/models --max-time 20 || true

- name: Show proxy-related env (debug)
run: |
echo "HTTP_PROXY=$HTTP_PROXY"
echo "HTTPS_PROXY=$HTTPS_PROXY"
echo "ALL_PROXY=$ALL_PROXY"
echo "NO_PROXY=$NO_PROXY"

- name: Run digest
env:
TOCIFY_BACKEND: "openai"
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
HTTP_PROXY: ""
HTTPS_PROXY: ""
@@ -50,7 +60,7 @@ jobs:
SUMMARY_MAX_CHARS: "500"
PREFILTER_KEEP_TOP: "200"
BATCH_SIZE: "50"
run: python digest.py
run: uv run python digest.py

- name: Commit digest.md
run: |
5 changes: 5 additions & 0 deletions .gitignore
@@ -200,8 +200,13 @@ cython_debug/
# refer to https://docs.cursor.com/context/ignore-files
.cursorignore
.cursorindexingignore
.cursor/

# Marimo
marimo/_static/
marimo/_lsp/
__marimo__/

# uv
uv.lock
pyproject.toml
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.11
63 changes: 35 additions & 28 deletions README.md
@@ -1,55 +1,62 @@
# tocify — Weekly Journal ToC Digest (RSS → OpenAI → `digest.md`)
# tocify — Weekly Journal ToC Digest (RSS → triage → `digest.md`)

This repo runs a GitHub Action once a week (or on-demand) that:

1. pulls new items from a list of journal RSS feeds
2. uses OpenAI to triage which items match your research interests
2. triages items against your research interests (OpenAI API or Cursor CLI)
3. writes a ranked digest to `digest.md` and commits it back to the repo

It’s meant to be forked and customized.

This was almost entirely vibe-coded as an exercise (I'm pleased at how well it works!).

---

## What’s in this repo

- **`digest.py`** — the pipeline (fetch RSS → filter → OpenAI triage → render markdown)
- **`feeds.txt`** — RSS feed list (supports comments; optionally supports `Name | URL`)
- **`interests.md`** — your keywords + narrative seed (used for relevance)
- **`prompt.txt`** — the prompt template (easy to tune without editing Python)
- **`digest.py`** — pipeline (fetch RSS → filter → triage → render markdown)
- **`integrations/`** — triage backends (OpenAI API by default; optional Cursor CLI)
- **`feeds.txt`** — RSS feed list (comments; optional `Name | URL`)
- **`interests.md`** — keywords + narrative (used for relevance)
- **`prompt.txt`** — prompt template (used by OpenAI and Cursor backends)
- **`digest.md`** — generated output (auto-updated)
- **`.github/workflows/weekly-digest.yml`** — scheduled GitHub Action runner
- **`.github/workflows/weekly-digest.yml`** — scheduled GitHub Action
- **`requirements.txt`** — Python dependencies
- **`.python-version`** — pinned Python version (used by uv, pyenv, etc.)

---

## Quick start (fork + run)
## Environment

Python version is pinned in **`.python-version`** (e.g. `3.11`). The repo supports **[uv](https://docs.astral.sh/uv/)** for fast, reproducible installs:

### 1) Fork the repo
- Click **Fork** on GitHub to copy this repo into your account.
```bash
# Install uv (https://docs.astral.sh/uv/getting-started/installation/), then:
uv venv
uv pip install -r requirements.txt
uv run python digest.py
```

### 2) Enable OpenAI billing / credits
The OpenAI API requires an active billing setup or credits.
- Go to the OpenAI Platform and ensure billing is enabled and/or credits are available.
- If you see errors like `insufficient_quota` or `You exceeded your current quota`, this is the cause.
- I recommend putting in spending limits. This uses very little compute, but it's nice to be careful.
Alternatively, use pip and a venv as usual; the GitHub workflow uses uv and reads `.python-version`.

---

### 3) Create an OpenAI API key
Create an API key in the OpenAI Platform and copy it.
## Quick start (layperson: OpenAI)

**Important:** never commit this key to the repo.
1. **Fork** the repo.
2. Set **`OPENAI_API_KEY`** (get one from platform.openai.com). Never commit it.
3. Locally: copy `.env.example` to `.env`, add your key, run `python digest.py`.
4. For GitHub Actions: add secret **`OPENAI_API_KEY`** in Settings → Secrets. The workflow will use it; no CLI needed.
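A minimal sketch of the local path in step 3 (assumes you are in the repo root; the key is a placeholder, use your real one):

```shell
set -eu
# Write a .env with the settings .env.example describes
# (in practice: cp .env.example .env, then edit by hand).
printf 'OPENAI_API_KEY=sk-your-key-here\nOPENAI_MODEL=gpt-4o-mini\n' > .env
grep -c '=' .env          # 2 settings written; digest.py picks them up via load_dotenv()
# python digest.py        # then run the pipeline; it writes digest.md
```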

### 4) Add the API key as a GitHub Actions secret
In your forked repo:
- Go to **Settings → Secrets and variables → Actions**
- Click **New repository secret**
- Name: `OPENAI_API_KEY`
- Value: paste your OpenAI API key
## Quick start (Cursor CLI)

That’s it—GitHub will inject it into the workflow at runtime.
1. **Fork** the repo.
2. Install the Cursor CLI and set **`CURSOR_API_KEY`** (Cursor settings).
3. For GitHub Actions: add secret **`CURSOR_API_KEY`** and keep the workflow’s Cursor install step.

The backend is auto-chosen from whichever key is set; set **`TOCIFY_BACKEND=openai`** or **`TOCIFY_BACKEND=cursor`** to force one.
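That auto-choice can be sketched as a hypothetical `pick_backend` helper (the real logic lives in `integrations/` and may differ in detail):

```python
import os

def pick_backend() -> str:
    """Explicit TOCIFY_BACKEND wins; otherwise infer from whichever key is set."""
    forced = os.getenv("TOCIFY_BACKEND", "").strip().lower()
    if forced in ("openai", "cursor"):
        return forced
    if os.getenv("OPENAI_API_KEY"):
        return "openai"
    if os.getenv("CURSOR_API_KEY"):
        return "cursor"
    raise RuntimeError("Set OPENAI_API_KEY or CURSOR_API_KEY, or force TOCIFY_BACKEND.")

# demo: an explicit setting overrides any keys that happen to be set
os.environ["TOCIFY_BACKEND"] = "cursor"
print(pick_backend())  # → cursor
```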

---

### 5) Configure your feeds
## Configure your feeds
Edit **`feeds.txt`**.

You can use comments:
98 changes: 10 additions & 88 deletions digest.py
@@ -1,14 +1,13 @@
import os, re, json, time, math, hashlib
import os, re, math, hashlib
from datetime import datetime, timezone, timedelta

import feedparser
import httpx
from dateutil import parser as dtparser
from openai import OpenAI, APITimeoutError, APIConnectionError, RateLimitError
from dotenv import load_dotenv

load_dotenv()

# ---- config (env-tweakable) ----
MODEL = os.getenv("OPENAI_MODEL", "gpt-4o")
MAX_ITEMS_PER_FEED = int(os.getenv("MAX_ITEMS_PER_FEED", "50"))
MAX_TOTAL_ITEMS = int(os.getenv("MAX_TOTAL_ITEMS", "400"))
LOOKBACK_DAYS = int(os.getenv("LOOKBACK_DAYS", "7"))
@@ -19,34 +18,6 @@
MIN_SCORE_READ = float(os.getenv("MIN_SCORE_READ", "0.65"))
MAX_RETURNED = int(os.getenv("MAX_RETURNED", "40"))

SCHEMA = {
"type": "object",
"additionalProperties": False,
"properties": {
"week_of": {"type": "string"},
"notes": {"type": "string"},
"ranked": {
"type": "array",
"items": {
"type": "object",
"additionalProperties": False,
"properties": {
"id": {"type": "string"},
"title": {"type": "string"},
"link": {"type": "string"},
"source": {"type": "string"},
"published_utc": {"type": ["string", "null"]},
"score": {"type": "number"},
"why": {"type": "string"},
"tags": {"type": "array", "items": {"type": "string"}},
},
"required": ["id", "title", "link", "source", "published_utc", "score", "why", "tags"],
},
},
},
"required": ["week_of", "notes", "ranked"],
}


# ---- tiny helpers ----
def load_feeds(path: str) -> list[dict]:
@@ -83,12 +54,6 @@ def load_feeds(path: str) -> list[dict]:
def read_text(path: str) -> str:
with open(path, "r", encoding="utf-8") as f:
return f.read()

def load_prompt_template(path: str = "prompt.txt") -> str:
if not os.path.exists(path):
raise RuntimeError("prompt.txt not found in repo root")
with open(path, "r", encoding="utf-8") as f:
return f.read()

def sha1(s: str) -> str:
return hashlib.sha1(s.encode("utf-8")).hexdigest()
@@ -181,61 +146,17 @@ def hits(it):
return matched[:keep_top]


# ---- openai ----
def make_openai_client() -> OpenAI:
key = os.environ.get("OPENAI_API_KEY", "").strip()
if not key.startswith("sk-"):
raise RuntimeError("OPENAI_API_KEY missing/invalid (expected to start with 'sk-').")
http_client = httpx.Client(
timeout=httpx.Timeout(connect=30.0, read=300.0, write=30.0, pool=30.0),
http2=False,
trust_env=False,
headers={"Connection": "close", "Accept-Encoding": "gzip"},
)
return OpenAI(api_key=key, http_client=http_client)

def call_openai_triage(client: OpenAI, interests: dict, items: list[dict]) -> dict:
lean_items = [{
"id": it["id"],
"source": it["source"],
"title": it["title"],
"link": it["link"],
"published_utc": it.get("published_utc"),
"summary": (it.get("summary") or "")[:SUMMARY_MAX_CHARS],
} for it in items]

template = load_prompt_template()

prompt = (
template
.replace("{{KEYWORDS}}", json.dumps(interests["keywords"], ensure_ascii=False))
.replace("{{NARRATIVE}}", interests["narrative"])
.replace("{{ITEMS}}", json.dumps(lean_items, ensure_ascii=False))
)

last = None
for attempt in range(6):
try:
resp = client.responses.create(
model=MODEL,
input=prompt,
text={"format": {"type": "json_schema", "name": "weekly_toc_digest", "schema": SCHEMA, "strict": True}},
)
return json.loads(resp.output_text)
except (APITimeoutError, APIConnectionError, RateLimitError) as e:
last = e
time.sleep(min(60, 2 ** attempt))
raise last

def triage_in_batches(client: OpenAI, interests: dict, items: list[dict], batch_size: int) -> dict:
# ---- triage (backend-agnostic batch loop) ----
def triage_in_batches(interests: dict, items: list[dict], batch_size: int, triage_fn) -> dict:
"""triage_fn(interests, batch) -> dict with keys notes, ranked (and optionally week_of)."""
week_of = datetime.now(timezone.utc).date().isoformat()
total = math.ceil(len(items) / batch_size)
all_ranked, notes_parts = [], []

for i in range(0, len(items), batch_size):
batch = items[i:i + batch_size]
print(f"Triage batch {i // batch_size + 1}/{total} ({len(batch)} items)")
res = call_openai_triage(client, interests, batch)
res = triage_fn(interests, batch)
if res.get("notes", "").strip():
notes_parts.append(res["notes"].strip())
all_ranked.extend(res.get("ranked", []))
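The `triage_fn` contract documented above is deliberately small: interests and a batch of items in, a dict with `notes` and `ranked` out. A hypothetical stub backend makes the shape concrete (and lets you test batching and rendering without any API key):

```python
from datetime import datetime, timezone

def stub_triage(interests: dict, batch: list[dict]) -> dict:
    """Hypothetical backend: flat 0.5 score for every item (real ones call OpenAI or Cursor)."""
    return {
        "week_of": datetime.now(timezone.utc).date().isoformat(),
        "notes": "stub backend",
        "ranked": [
            {
                "id": it["id"],
                "title": it["title"],
                "link": it["link"],
                "source": it["source"],
                "published_utc": it.get("published_utc"),
                "score": 0.5,
                "why": "placeholder",
                "tags": [],
            }
            for it in batch
        ],
    }

batch = [{"id": "a1", "title": "Example", "link": "https://example.org",
          "source": "Journal", "published_utc": None}]
print(len(stub_triage({"keywords": [], "narrative": ""}, batch)["ranked"]))  # → 1
```

Passed as `triage_fn` to `triage_in_batches`, a stub like this exercises the whole pipeline end to end.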
@@ -308,9 +229,10 @@ def main():
print(f"Sending {len(items)} RSS items to model (post-filter)")

items_by_id = {it["id"]: it for it in items}
client = make_openai_client()

result = triage_in_batches(client, interests, items, batch_size=BATCH_SIZE)
from integrations import get_triage_backend
triage_fn = get_triage_backend()
result = triage_in_batches(interests, items, BATCH_SIZE, triage_fn)
md = render_digest_md(result, items_by_id)

with open("digest.md", "w", encoding="utf-8") as f: