From fce28ad64bbb8d435aaadce4ebc36e142eb2986d Mon Sep 17 00:00:00 2001
From: Vishal Motwani
Date: Mon, 19 Jan 2026 12:37:32 -0500
Subject: [PATCH 1/2] chore: add contributors

---
 README.md | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/README.md b/README.md
index f76957b..5038b08 100644
--- a/README.md
+++ b/README.md
@@ -327,6 +327,12 @@ To cite dataset used in this work, please use:
 }
 ```
 
+## Contributors
+
+- **Vishal Motwani** — Founding Head of Product, Numbers Station AI
+- **Sen Wu** — Co-founder, Numbers Station AI
+- **Laurel Orr** — Principal Developer, Numbers Station AI
+
 ## Acknowledgement
 
 We are appreciative to the work done by the all authors for those datasets that made this project possible.
\ No newline at end of file

From 4be1eeb8653e630e17b6f760f101458c9a9ab4d1 Mon Sep 17 00:00:00 2001
From: Vishal Motwani
Date: Sun, 25 Jan 2026 11:50:14 -0500
Subject: [PATCH 2/2] Eval Results

---
 README.md | 103 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 102 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 5038b08..75cde79 100644
--- a/README.md
+++ b/README.md
@@ -327,12 +327,113 @@ To cite dataset used in this work, please use:
 }
 ```
 
+## Benchmark Results
+
+### Spider Benchmark Performance
+
+NSQL-llama-2-7B achieves near-parity with GPT-4 on overall execution accuracy while being **~250× smaller**, and **significantly outperforms GPT-4 on complex queries**.
+
+#### Overall Results
+
+| Model | Parameters | Execution Acc | Matching Acc |
+|-------|------------|---------------|--------------|
+| **NSQL-llama-2-7B** | **7B** | **75.0%** | **66.3%** |
+| GPT-4 | ~1.8T | 76.2% | 41.9% |
+| GPT-3.5 | — | 72.8% | 44.2% |
+| Llama-2-7B (base) | 7B | 29.1% | 19.3% |
+
+#### Performance by Query Complexity
+
+| Query Type | NSQL-7B | GPT-4 | Δ (relative) |
+|------------|---------|-------|--------------|
+| **Join** | **53.7%** | ~37.6% | **+43%** |
+| **Nested** | **57.2%** | ~37.1% | **+54%** |
+| Simple | 91.4% | Higher | — |
+
+**Key insight:** NSQL outperforms GPT-4 on complex queries (joins, nested) while achieving near-parity overall at ~250× smaller size.
+
+### Why This Matters
+
+- ✅ **+43% better on Join queries** — critical for multi-table enterprise data
+- ✅ **+54% better on Nested queries** — complex business logic
+- ✅ **+24.4 points on Matching Accuracy** — structurally correct SQL
+- ✅ **~250× smaller** — runs locally, no API calls, complete privacy
+- ✅ **Near-parity overall** — 75.0% vs 76.2% execution accuracy
+
+### Full Model Comparison
+
+<details>
+<summary>Click to expand full benchmark table</summary>
+
+#### Open Models
+
+| Model | Size | Spider Exec | Spider Match | GeoQuery Exec | GeoQuery Match |
+|-------|------|-------------|--------------|---------------|----------------|
+| NSQL-350M | 350M | 51.7% | 45.6% | 18.4% | 4.3% |
+| NSQL-2B | 2B | 59.3% | 53.2% | 18.4% | 15.2% |
+| NSQL-6B | 6B | 63.6% | 57.4% | 26.5% | 15.2% |
+| **NSQL-llama-2-7B** | **7B** | **75.0%** | **66.3%** | **26.5%** | **30.4%** |
+| Llama-2-7B | 7B | 29.1% | 19.3% | 0% | 8.7% |
+| Llama-2-13B | 13B | 44.9% | 32.2% | 12.2% | 26.1% |
+| Llama-2-70B | 70B | 61.5% | 35.4% | 12.2% | 28.3% |
+| StarCoder | 15.5B | 57.7% | 36.0% | 12.2% | 13.0% |
+
+#### Closed Models
+
+| Model | Size | Spider Exec | Spider Match | GeoQuery Exec | GeoQuery Match |
+|-------|------|-------------|--------------|---------------|----------------|
+| Text-Davinci-003 | 175B | 72.0% | 49.9% | 24.5% | 41.3% |
+| GPT-3.5 Chat | — | 72.8% | 44.2% | 46.9% | 39.1% |
+| **GPT-4** | **~1.8T** | **76.2%** | **41.9%** | **55.1%** | **39.1%** |
+
+</details>
+
+### Query Type Breakdown (NSQL Models)
+
+| Model | Join | Nested | Simple |
+|-------|------|--------|--------|
+| NSQL-6B | 37.5% | 37.1% | 83.5% |
+| **NSQL-llama-2-7B** | **53.7%** | **57.2%** | **91.4%** |
+
+---
+
+## Evaluation
+
+**Benchmark:** Spider (Yu et al., 2018) — the standard academic evaluation for Text-to-SQL
+
+**Metrics:**
+- **Execution Accuracy:** Does the SQL return correct results?
+- **Matching Accuracy:** Does the SQL structure match ground truth?
+
+**Query Categories:**
+- **Join:** Queries requiring multiple table joins
+- **Nested:** Queries with subqueries
+- **Simple:** Single-table queries
+
+---
+
+
 ## Contributors
 
-- **Vishal Motwani** — Founding Head of Product, Numbers Station AI
+- **Vishal Motwani** — Founding Product Engineer, Numbers Station AI
 - **Sen Wu** — Co-founder, Numbers Station AI
 - **Laurel Orr** — Principal Developer, Numbers Station AI
 
+---
+
+## Citation
+
+If you use NSQL in your research, please cite:
+
+```bibtex
+@misc{nsql2024,
+  title={NSQL: An Open-Source Text-to-SQL Model Family},
+  author={Numbers Station AI},
+  year={2024},
+  url={https://huggingface.co/NumbersStation}
+}
+```
+
 ## Acknowledgement
 
 We are appreciative to the work done by the all authors for those datasets that made this project possible.
\ No newline at end of file
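
The **Execution Accuracy** metric in the Evaluation section above can be illustrated with a short Python sketch. It runs a predicted query and a gold query against the same SQLite database and counts the pair as correct when both return the same rows. The helper names `execution_match` and `execution_accuracy` and the toy `singer` table are hypothetical; this is not the official Spider evaluation script, and it deliberately simplifies by ignoring row order even when a query uses `ORDER BY`.

```python
import sqlite3
from collections import Counter
from typing import Sequence, Tuple


def execution_match(conn: sqlite3.Connection, pred_sql: str, gold_sql: str) -> bool:
    """Return True if pred_sql and gold_sql return the same rows on conn.

    Hypothetical helper: rows are compared as multisets (row order ignored,
    duplicates counted), and a prediction that fails to execute is scored
    as incorrect. The gold query is assumed to be valid SQL.
    """
    try:
        pred_rows = conn.execute(pred_sql).fetchall()
    except sqlite3.Error:
        return False
    gold_rows = conn.execute(gold_sql).fetchall()
    return Counter(pred_rows) == Counter(gold_rows)


def execution_accuracy(conn: sqlite3.Connection, pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (predicted, gold) SQL pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(execution_match(conn, pred, gold) for pred, gold in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    # Toy in-memory database standing in for a Spider database.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE singer (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
        INSERT INTO singer VALUES (1, 'Ann', 'US'), (2, 'Bo', 'FR'), (3, 'Cy', 'US');
        """
    )
    pairs = [
        # Written differently from the gold query, but returns the same rows.
        ("SELECT name FROM singer WHERE country = 'US'",
         "SELECT name FROM singer WHERE country IN ('US')"),
        # Wrong filter: returns different rows.
        ("SELECT name FROM singer WHERE country = 'FR'",
         "SELECT name FROM singer WHERE country = 'US'"),
        # Does not execute (no such column), so it is scored as incorrect.
        ("SELECT nme FROM singer", "SELECT name FROM singer"),
    ]
    print(execution_accuracy(conn, pairs))  # prints 0.3333333333333333
```

The first pair shows how the two metrics can diverge: `country = 'US'` and `country IN ('US')` are written differently, so a structural metric such as Matching Accuracy would likely score the prediction as wrong, while Execution Accuracy counts it as correct because the returned rows are identical.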