```bash
# 1. Navigate to project
cd ~/dev/python-llama-demo/python-ai-crawler-scraper

# 2. Activate virtual environment
source ../venv/bin/activate

# 3. Run setup
./setup.sh

# 4. Create basic .env (if needed)
echo "SEED_URLS=https://example.com" > .env

# 5. Run your first crawl!
python main.py --seeds https://example.com --max-pages 5 --skip-llm
```
```bash
# Small test crawl (no LLM, fast)
python main.py --seeds https://example.com --max-pages 5 --skip-llm

# Full crawl with LLM enhancement
python main.py --seeds https://docs.python.org --max-pages 25 --max-depth 2

# Resume interrupted crawl
python main.py --resume

# Domain-restricted crawl
python main.py --seeds https://example.com --allowed-domains example.com
```
After crawling, find your results in:

- Database: `crawler.db` (SQLite with all pages and links)
- Obsidian Vault: `obsidian_vault/` directory with `.md` files
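If you want a quick look at what a crawl actually stored, you can inspect `crawler.db` from Python without assuming anything about its schema; this sketch relies only on the filename given above and discovers the table names at runtime:

```python
import sqlite3

# Open the SQLite database produced by the crawler.
conn = sqlite3.connect("crawler.db")

# Enumerate tables via sqlite_master so no crawler-specific schema
# knowledge is needed, then print a row count for each.
for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
):
    (count,) = conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()
    print(f"{name}: {count} rows")

conn.close()
```

The vault itself holds one note per crawled page: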
```
obsidian_vault/
├── example-com-homepage.md
├── about-us.md
├── contact-page.md
└── ... (one .md file per page)
```
Each file contains:
- YAML frontmatter with metadata
- Clean Markdown content
- Wiki-links to other pages
- Backlinks list
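To make that list concrete, here is a purely hypothetical sketch of what one generated note could look like; the actual frontmatter keys are not documented in this quick start, so treat every field below as an illustration rather than the tool's real output:

```markdown
---
# Hypothetical fields -- the real keys are set by the crawler; check a
# generated note or README.md for the actual frontmatter.
title: About Us
url: https://example.com/about
---

Clean Markdown extracted from the page, with [[contact-page]] wiki-links
wherever the crawler found internal links.

## Backlinks

- [[example-com-homepage]]
```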
- Full docs: See `README.md`
- Architecture: See `SUMMARY.md`
- Test modules: `python <module>.py`
- Configuration: Edit the `.env` file
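For reference, a minimal `.env` could combine the two settings this guide actually mentions (`SEED_URLS` from step 4 and `REQUEST_DELAY` from the troubleshooting section below); any other supported options are covered in `README.md`:

```
# Where the crawl starts (step 4 of the quick start)
SEED_URLS=https://example.com

# Delay between requests; 2.0 is the value shown in the troubleshooting below
REQUEST_DELAY=2.0
```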
**LLM not working?**

```bash
python main.py --skip-llm
```

**Too slow?**

```
# Increase delay in .env
REQUEST_DELAY=2.0
```

**Want more control?**

```bash
python main.py --help
```

That's it! You're ready to crawl. See `README.md` for advanced features.