How Generative AI Concentrates Rather Than Degrades Technical Knowledge Communities
A natural experiment using Stack Overflow data to test whether ChatGPT's launch degraded the cognitive quality of process automation questions. ICIS 2026 submission.
python -m venv venv
venv/Scripts/pip install -r requirements.txt # Windows
# or: venv/bin/pip install -r requirements.txt # Unix
# Copy .env.example to .env and add your API keys
cp .env.example .env

The project runs in 5 stages. Each stage is independently executable.
# Stage 1: Download SO questions via Stack Exchange API
python -m src.data_acquisition.main fetch-data
# Stage 2: Stratified sampling + feature engineering
python -m src.sampling.main run-all
# Stage 3: Rate questions with Claude API (LLM-as-judge)
python -m src.rating_pipeline.main rate --mode batch --yes
# Stage 4: DiD regression + robustness checks + figures
python -m src.analysis.main run-all
# Stage 5: Generate LaTeX paper
python -m src.paper.main run-all

Project structure:

src/
data_acquisition/ # Stage 1: SO API client, CSV processing, validation
sampling/ # Stage 2: Stratified sampling, feature engineering
rating_pipeline/ # Stage 3: Claude API batch rating, JSON parsing
analysis/ # Stage 4: DiD regression, robustness, figures
paper/ # Stage 5: LaTeX paper generation
data/
raw/ # Downloaded CSVs (gitignored)
processed/ # Parquet files (gitignored)
ratings/ # LLM rating outputs (gitignored)
figures/ # Publication-quality PNGs
paper/ # Generated LaTeX paper (gitignored)
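Stage 3 rates each question with an LLM-as-judge and parses the JSON it returns. A minimal sketch of that parsing step is below; the dimension names and the `parse_rating` helper are illustrative assumptions, not the repo's actual rubric or API.

```python
import json
import re

# Hypothetical rating dimensions -- the actual rubric lives in the repo's prompts.
DIMENSIONS = ["problem_articulation", "prior_effort", "reproducibility"]

def parse_rating(raw: str) -> dict:
    """Extract the first JSON object from an LLM reply, tolerating code fences."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object in model output")
    rating = json.loads(match.group(0))
    missing = [d for d in DIMENSIONS if d not in rating]
    if missing:
        raise ValueError(f"rating missing dimensions: {missing}")
    return rating

# Example: a typical judge reply wrapped in a markdown fence.
reply = """```json
{"problem_articulation": 4, "prior_effort": 3, "reproducibility": 5}
```"""
print(parse_rating(reply)["reproducibility"])  # -> 5
```

Defensive parsing like this matters in batch mode: a single malformed reply should raise loudly rather than silently dropping a rated question.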
Key findings

No significant decline in cognitive quality across any dimension (all p > 0.36), despite a 55% drop in question volume. ChatGPT acts as a filter -- absorbing routine questions -- rather than eroding the quality of those that remain. A significant increase in minimal reproducible examples (p = 0.036) supports this interpretation.
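The Stage 4 specification can be sketched as a canonical two-group difference-in-differences on synthetic data; the column names (`treated`, `post`, `quality`) and the robust-SE choice are illustrative assumptions, not the repo's actual model.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-in for the question panel: `treated` marks the focal tag group,
# `post` marks questions asked after ChatGPT's launch (Nov 2022).
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),
    "post": rng.integers(0, 2, n),
    "quality": rng.normal(3.0, 1.0, n),  # placeholder LLM quality score
})

# The coefficient on the treated:post interaction is the DiD effect of interest.
model = smf.ols("quality ~ treated + post + treated:post", data=df).fit(
    cov_type="HC1"  # heteroskedasticity-robust standard errors
)
print(model.params["treated:post"])
```

On the paper's data, a null interaction term across quality dimensions is what grounds the "no decline" finding; the same frame with an indicator for minimal reproducible examples yields the significant p = 0.036 result.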