This repository implements a multi-agent, entropy-driven decoding system for large language models (LLMs), using parallel sampling and self-evaluation to generate coherent long-form text.
Rather than sampling from a single decoding configuration, multiple LLM workers propose next tokens in parallel under different temperature / top-k settings. At each step, the system selects the lowest-entropy proposal — the one the model is most confident about — to extend the text. A secondary “critic” model periodically evaluates the coherence of the generated text and rolls back recent tokens if quality degrades.
Traditional LLM decoding (greedy, sampling, top-k, nucleus, etc.) commits to a single temperature and sampling strategy. This project explores a parallel confidence-based generation loop, where each decoding job acts as an agent exploring a slightly different probabilistic regime.
Key idea:

> "Let multiple minds propose, but let entropy decide."
Each agent produces its candidate next token, entropy (uncertainty), and temperature/top-k metadata. The system selects the token with the lowest entropy — the one representing the most decisive prediction — while still benefiting from diverse search branches.
To maintain global coherence, a secondary evaluator periodically reviews the text and reverts to a stable checkpoint when coherence drops.
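In pseudocode, the control loop looks roughly like this (a minimal sketch: `propose` and `judge` are illustrative names, not the script's actual API):

```python
def generate(workers, critic, tokens, max_tokens, check_every=5, rollback=5):
    for step in range(max_tokens):
        # Every agent proposes a next token under its own (temperature, top_k).
        proposals = [worker.propose(tokens) for worker in workers]
        # Entropy decides: keep the most confident (lowest-entropy) proposal.
        token, entropy = min(proposals, key=lambda p: p[1])
        tokens.append(token)
        # Periodically let the critic veto the recent continuation.
        if (step + 1) % check_every == 0 and critic.judge(tokens) == "BAD":
            del tokens[-rollback:]  # roll back and continue from the shorter text
    return tokens
```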
- **Parallel Generation:** multiple threads run across multiple GPUs (`cuda:0`, `cuda:1`), each loading a quantized copy of the base model (e.g., `Qwen/Qwen3-0.6B`); see the worker sketch after this list.
- **Entropy-Based Selection:** each worker:
  - computes the softmax probabilities for the next token,
  - samples under its `(temperature, top_k)` configuration,
  - returns the sampled token and its entropy.

  The controller then picks the lowest-entropy token (as in the `propose_token` sketch below).
- **Self-Evaluation Loop:** every few steps, another instance of the same base model (used as a critic) judges text quality with a simple prompt:

  ```
  Text: ...
  Question: Is this text coherent and not nonsense? Reply GOOD or BAD.
  ```

  If it replies "BAD," the system rolls back several tokens and re-generates (a critic sketch follows this list).
- **Logging:** each step's results (tokens, entropy, chosen candidate, evaluator verdicts) are written to `generation_log.txt`.
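As a concrete sketch of the per-worker logic (illustrative only: the 4-bit quantization settings and function names are assumptions, not necessarily the repository's exact code):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

def load_worker(model_id="Qwen/Qwen3-0.6B", device="cuda:0"):
    # 4-bit quantization (bitsandbytes) lets several copies fit in memory at once.
    quant = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant, device_map={"": device}
    )
    return tokenizer, model

@torch.no_grad()
def propose_token(model, input_ids, temperature, top_k):
    logits = model(input_ids).logits[0, -1]  # next-token logits
    probs = F.softmax(logits, dim=-1)
    # Entropy of the full distribution: low entropy = a decisive prediction.
    entropy = -(probs * torch.log(probs + 1e-12)).sum().item()
    # Sample under this worker's (temperature, top_k) configuration.
    topk_vals, topk_idx = torch.topk(logits / temperature, top_k)
    choice = torch.multinomial(F.softmax(topk_vals, dim=-1), 1)
    return topk_idx[choice].item(), entropy
```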
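And a sketch of the critic check, continuing the code above (the prompt wording follows the snippet in the list; the greedy decode and string parsing are assumptions):

```python
@torch.no_grad()
def judge(model, tokenizer, text):
    prompt = (
        f"Text: {text}\n"
        "Question: Is this text coherent and not nonsense? Reply GOOD or BAD.\n"
        "Answer:"
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=3, do_sample=False)
    reply = tokenizer.decode(out[0][inputs["input_ids"].shape[1]:])
    return "BAD" if "BAD" in reply.upper() else "GOOD"
```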
| Parameter | Description | Default |
|---|---|---|
| `MODEL_ID` | Base model to use | `Qwen/Qwen3-0.6B` |
| `NUM_JOBS` | Number of decoding configurations | 8 |
| `MAX_PARALLEL` | Max threads per step | 8 |
| `MAX_TOKENS` | Maximum output length | 25000 |
| `ROLLBACK` | Tokens to roll back after a "BAD" verdict | 5 |
| `CHECK_EVERY` | Evaluate after this many steps | 5 |
| `DEVICES` | List of CUDA devices | `["cuda:0", "cuda:1"]` |
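A plausible layout for these settings as module-level constants (a sketch; the values are the defaults from the table, and the exact placement inside `parallel_entropy_gen.py` may differ):

```python
# Default configuration (assumed to live at the top of parallel_entropy_gen.py).
MODEL_ID = "Qwen/Qwen3-0.6B"    # base model for workers and critic
NUM_JOBS = 8                    # distinct (temperature, top_k) configurations
MAX_PARALLEL = 8                # max threads per generation step
MAX_TOKENS = 25000              # maximum output length
ROLLBACK = 5                    # tokens reverted after a "BAD" verdict
CHECK_EVERY = 5                 # steps between critic evaluations
DEVICES = ["cuda:0", "cuda:1"]  # CUDA devices to spread workers across
```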
```bash
python parallel_entropy_gen.py
```

Example log snippet:

```
Step 42: chosen=the (entropy=1.283)
Evaluator after step 45: GOOD
Step 50: chosen=universe (entropy=0.993)
Evaluator after step 55: BAD → rolling back 5 tokens
```
- Final text: printed to the console and appended to the end of the log file.
- Log file (`generation_log.txt`): includes per-step tokens, entropy values, the chosen configuration, and evaluator feedback.
- Implement beam-style entropy fusion across time steps.
- Add semantic evaluators using larger critic models (e.g., GPT-based reward model).
- Support dynamic agent allocation based on entropy variance.
- Visualize entropy progression and rollback patterns.
```bash
pip install torch transformers accelerate
```

Optional (for multi-GPU / quantized inference):

```bash
pip install bitsandbytes
```

This project introduces a self-correcting, entropy-aware decoding framework that balances diversity and coherence without fine-tuning. It can be extended for:
- Reinforcement-learning-free text quality improvement,
- Multi-agent model orchestration,
- Online adaptive temperature control.
In essence: multiple sampling heads act as explorers, entropy acts as a judge, and a critic model ensures sanity.