🧠 Parallel Entropy-Minimized Text Generation

This repository implements a multi-agent, entropy-driven decoding system for large language models (LLMs), using parallel sampling and self-evaluation to generate coherent long-form text.

Instead of sampling from a single decoding configuration, the system runs multiple LLM workers that propose next tokens in parallel under different temperature / top-k settings. At each step, it selects the lowest-entropy proposal — the one the model is most confident about — to extend the text. A secondary “critic” model periodically evaluates the coherence of the generated text and rolls back recent tokens if quality degrades.


🚀 Concept

Traditional LLM decoding (greedy, sampling, top-k, nucleus, etc.) commits to a single temperature and sampling strategy. This project explores a parallel confidence-based generation loop, where each decoding job acts as an agent exploring a slightly different probabilistic regime.

Key idea:

“Let multiple minds propose, but let entropy decide.”

Each agent produces its candidate next token, entropy (uncertainty), and temperature/top-k metadata. The system selects the token with the lowest entropy — the one representing the most decisive prediction — while still benefiting from diverse search branches.

To maintain global coherence, a secondary evaluator periodically reviews the text and reverts to a stable checkpoint when coherence drops.
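
To make the selection rule concrete, here is a minimal sketch of a single worker proposal and the controller's choice, assuming PyTorch logits for the current step are already available. The function names propose and select_lowest_entropy are illustrative rather than taken from the repository, and the entropy here is computed over the truncated top-k distribution; the actual script may compute it over the full softmax.

```python
import torch
import torch.nn.functional as F

def propose(logits: torch.Tensor, temperature: float, top_k: int) -> dict:
    """One agent's proposal: sample a next token under its (temperature, top_k)
    setting and report the entropy of the truncated distribution it sampled from."""
    scaled = logits / temperature                     # logits: 1-D tensor over the vocabulary
    topk_vals, topk_idx = torch.topk(scaled, top_k)   # keep only the top-k candidates
    probs = F.softmax(topk_vals, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum().item()
    token_id = topk_idx[torch.multinomial(probs, 1)].item()
    return {"token_id": token_id, "entropy": entropy,
            "temperature": temperature, "top_k": top_k}

def select_lowest_entropy(proposals: list[dict]) -> dict:
    """Controller rule: extend the text with the most decisive proposal."""
    return min(proposals, key=lambda p: p["entropy"])
```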


🧩 Method Overview

  1. Parallel Generation. Multiple worker threads run across multiple GPUs (cuda:0, cuda:1), each loading a quantized copy of the base model (e.g., Qwen/Qwen3-0.6B).

  2. Entropy-Based Selection. At each step, every worker:

    • computes the softmax probabilities for the next token,
    • samples under its own (temperature, top_k) configuration, and
    • returns the sampled token together with its entropy.

    The controller then extends the text with the lowest-entropy proposal.

  3. Self-Evaluation Loop. Every few steps, another instance of the same base model (used as a critic) judges text quality with a simple prompt (see the sketch after this list):

    Text:
    ...
    Question: Is this text coherent and not nonsense?
    Reply GOOD or BAD.

    If it replies “BAD,” the system rolls back several tokens and re-generates.

  4. Logging. Each step’s results (tokens, entropies, the chosen candidate, evaluator verdicts) are written to generation_log.txt.
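
The self-evaluation step can be sketched as below. Only the prompt wording and the GOOD/BAD convention come from this README; the helper name critic_verdict, the generation arguments, and the commented rollback snippet are illustrative assumptions.

```python
def critic_verdict(critic_model, critic_tokenizer, text: str) -> str:
    """Ask the critic model whether the text is coherent; returns "GOOD" or "BAD"."""
    prompt = (
        f"Text:\n{text}\n"
        "Question: Is this text coherent and not nonsense?\n"
        "Reply GOOD or BAD.\n"
    )
    inputs = critic_tokenizer(prompt, return_tensors="pt").to(critic_model.device)
    output = critic_model.generate(**inputs, max_new_tokens=3, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    reply = critic_tokenizer.decode(output[0][inputs["input_ids"].shape[1]:],
                                    skip_special_tokens=True)
    return "BAD" if "BAD" in reply.upper() else "GOOD"

# In the main loop (illustrative), a BAD verdict triggers the rollback:
# if step % CHECK_EVERY == 0 and critic_verdict(critic, tokenizer, text) == "BAD":
#     generated_ids = generated_ids[:-ROLLBACK]
```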


⚙️ Configuration

| Parameter    | Description                               | Default                |
|--------------|-------------------------------------------|------------------------|
| MODEL_ID     | Base model to use                         | Qwen/Qwen3-0.6B        |
| NUM_JOBS     | Number of decoding configurations         | 8                      |
| MAX_PARALLEL | Max threads per step                      | 8                      |
| MAX_TOKENS   | Maximum output length                     | 25000                  |
| ROLLBACK     | Tokens to roll back after a “BAD” verdict | 5                      |
| CHECK_EVERY  | Evaluate after this many steps            | 5                      |
| DEVICES      | List of CUDA devices                      | ["cuda:0", "cuda:1"]   |
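
For orientation, the defaults above might appear as simple module-level constants near the top of the script. This mirror of the table is illustrative only; parallel_entropy_gen.py remains the authoritative source.

```python
MODEL_ID     = "Qwen/Qwen3-0.6B"      # base model
NUM_JOBS     = 8                      # number of decoding configurations
MAX_PARALLEL = 8                      # max worker threads per step
MAX_TOKENS   = 25000                  # maximum output length
ROLLBACK     = 5                      # tokens to roll back after a "BAD" verdict
CHECK_EVERY  = 5                      # evaluate after this many steps
DEVICES      = ["cuda:0", "cuda:1"]   # CUDA devices used by the workers
```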

🧠 Example Flow

python parallel_entropy_gen.py

Example log snippet:

Step 42: chosen=the (entropy=1.283)
Evaluator after step 45: GOOD
Step 50: chosen=universe (entropy=0.993)
Evaluator after step 55: BAD → rolling back 5 tokens

📊 Output

  • Final text: printed to console and appended at the end of the log file.
  • Log file (generation_log.txt): includes per-step tokens, entropy values, chosen configuration, and evaluator feedback.
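
A per-step entry in the format shown in the example log could be appended with a small helper like the following (a sketch only; the helper name and signature are placeholders, not the repository's actual logging code).

```python
def log_step(step: int, token: str, entropy: float,
             path: str = "generation_log.txt") -> None:
    """Append one step's result in the format shown in the example log above."""
    with open(path, "a") as log:
        log.write(f"Step {step}: chosen={token} (entropy={entropy:.3f})\n")
```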

🧩 Future Work

  • Implement beam-style entropy fusion across time steps.
  • Add semantic evaluators using larger critic models (e.g., GPT-based reward model).
  • Support dynamic agent allocation based on entropy variance.
  • Visualize entropy progression and rollback patterns.

🧪 Requirements

pip install torch transformers accelerate

Optional (for multi-GPU / quantized inference):

pip install bitsandbytes

🧭 Summary

This project introduces a self-correcting, entropy-aware decoding framework that balances diversity and coherence without fine-tuning. It can be extended for:

  • Reinforcement-learning-free text quality improvement,
  • Multi-agent model orchestration,
  • Online adaptive temperature control.

In essence: multiple sampling heads act as explorers, entropy acts as a judge, and a critic model ensures sanity.


