Merged
4 changes: 3 additions & 1 deletion README.md
Original file line number Diff line number Diff line change
@@ -18,7 +18,7 @@ pip install logits-processor-zoo
## Supported Frameworks
* transformers
* vLLM
* TensorRT-LLM
* TensorRT-LLM (>=0.20.0)

## Usage

@@ -87,3 +87,5 @@ One common use case is to force writing python code just after thinking:
trigger_python = TriggerPhraseLogitsProcessor(phrase="\n```python", trigger_token_phrase="</think>",
tokenizer=tokenizer, trigger_count=1, trigger_after=True)
```
### PreventHallucinationLogitsProcessor
A logits processor that mitigates hallucinated model outputs by enforcing a predefined fallback phrase when token confidence falls below a specified threshold.
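For intuition, the confidence check can be sketched in a few lines of plain Python (an illustrative sketch only; the function name, threshold handling, and single-token fallback below are assumptions, not the library's actual implementation):

```python
import math

def apply_fallback(logits, fallback_token_id, threshold=0.5):
    """Force `fallback_token_id` when the model's top-token softmax
    probability drops below `threshold` (sketch of the idea)."""
    m = max(logits)                          # stabilize the softmax
    exps = [math.exp(x - m) for x in logits]
    top_prob = max(exps) / sum(exps)
    if top_prob < threshold:
        # Low confidence: allow only the fallback token.
        return [0.0 if i == fallback_token_id else float("-inf")
                for i in range(len(logits))]
    return logits
```

The real processor emits a whole tokenized fallback phrase rather than a single token, but the thresholding idea is the same.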
@@ -28,11 +28,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"/home/aerdem/projects/LLM/llmenv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
" warnings.warn(\n",
"Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n",
"/home/aerdem/projects/LLM/llmenv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
" warnings.warn(\n"
"Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.\n"
]
}
],
@@ -70,14 +66,9 @@
"name": "stderr",
"output_type": "stream",
"text": [
"/home/aerdem/projects/LLM/llmenv/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:392: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.\n",
" warnings.warn(\n",
"/home/aerdem/projects/LLM/llmenv/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:397: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.\n",
" warnings.warn(\n",
"/home/aerdem/projects/LLM/llmenv/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:407: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `None` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.\n",
" warnings.warn(\n",
"The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\n",
"Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.\n"
"Setting `pad_token_id` to `eos_token_id`:151643 for open-end generation.\n",
"The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.\n"
]
},
{
@@ -113,9 +104,9 @@
"\n",
"Let me test this function with some examples. For n=0, it returns 0. For n=1, returns 1. For n=2, it's F(1)+F(0) = 1+0=1. For n=3, F(2)+F(1)=1+1=2. That looks correct.\n",
"\n",
"Wait, but sometimes people define the Fibonacci sequence starting with F(1)=1, F(2)=1, F(3)=2, etc. So, if the function is called with n=5, it should return 5. Let me see: F(5) is 5, which matches the standard definition. So, the function should work regardless of the starting point as long as the base cases are correct.\n",
"Wait, but sometimes people define the Fibonacci sequence starting with F(1)=1, F(2)=1, F(3)=2, etc. So, if the function is called with n=5, it should return 5. Let me see: F(5) is 5, which is correct.\n",
"\n",
"Another thing to consider is the base cases. If the function is called with n=0, it returns 0, which is correct. For n=1, returns 1. For n=2, returns 1, which is correct. So, the function should handle all non-negative integers correctly.\n",
"Another test case: n=5. Let's compute it step by step. F(0)=0, F(1)=1, F(2)=1, F(3)=2, F(4)=3, F(5)=5. So the function should return 5 for n=5.\n",
"\n",
"I think this should work. So, the function is straightforward. It's a simple recursive implementation, but it's not the most efficient for large n. However, for the purpose of this problem, it's acceptable.\n",
"</think>\n",
@@ -215,39 +206,64 @@
"\n",
"Let me test this function with some examples. For n=0, it returns 0. For n=1, returns 1. For n=2, it's F(1)+F(0) = 1+0=1. For n=3, F(2)+F(1)=1+1=2. That looks correct.\n",
"\n",
"Wait, but sometimes people define the Fibonacci sequence starting with F(1)=1, F(2)=1, F(3)=2, etc. So, if the function is called with n=5, it should return 5. Let me see: F(5) is 5, which matches the standard definition. So, the function should work regardless of the starting point as long as the base cases are correct.\n",
"Wait, but sometimes people define the Fibonacci sequence starting with F(1)=1, F(2)=1, F(3)=2, etc. So, if the function is called with n=5, it should return 5. Let me see: F(5) is 5, which is correct.\n",
"\n",
"Another thing to consider is the base cases. If the function is called with n=0, it returns 0, which is correct. For n=1, returns 1. For n=2, returns 1, which is correct. So, the function should handle all non-negative integers correctly.\n",
"Another test case: n=5. Let's compute it step by step. F(0)=0, F(1)=1, F(2)=1, F(3)=2, F(4)=3, F(5)=5. So the function should return 5 for n=5.\n",
"\n",
"I think this should work. So, the function is straightforward. It's a simple recursive implementation, but it's not the most efficient for large n. However, for the purpose of this problem, it's acceptable.\n",
",,,\n",
"Wait, but in the problem statement, it says to make it recursive. So, the function as written is recursive, but it's not optimized. So, I think this should be the solution.\n",
"Wait, but the problem says to make it recursive, so perhaps the function should handle larger n efficiently. But without memoization, it's not efficient. So, maybe the function should be written with memoization or an iterative approach.\n",
"\n",
"Alternatively, I can implement it iteratively, which is more efficient. Let's think about that.\n",
"\n",
"An iterative approach would start from the bottom and build up the Fibonacci numbers. For example:\n",
"\n",
"def fibonacci(n):\n",
" if n == 0:\n",
" return 0\n",
" a, b = 0, 1\n",
" for _ in range(2, n+1):\n",
" a, b = b, a + b\n",
" return b\n",
"\n",
"Wait, let's test this. For n=5, the loop runs from 2 to 5, which is 4 iterations. Let's see:\n",
"\n",
"Initial a=0, b=1.\n",
"\n",
"After first iteration (i=2): a becomes 1, b becomes 0+1=1.\n",
"\n",
"After second iteration (i=3): a=1, b=1+1=2.\n",
"\n",
"After third iteration (i=4): a=2, b=1+2=3.\n",
"\n",
"After fourth iteration (i=5): a=3, b=2+3=5.\n",
"\n",
"So, return b=5, which is correct.\n",
"\n",
"This approach is more efficient as it runs in O(n) time and uses O(1) space, which is better for large n.\n",
"\n",
"Wait, but the problem says to make it recursive. So, the function should call itself with smaller arguments. The approach I have is correct and recursive.\n",
"But the problem says to make it recursive. So, perhaps the recursive approach is acceptable, but the iterative approach is better for larger n.\n",
"\n",
"So, the final function is as I wrote above.\n",
",,,\n",
"But since the problem doesn't specify, I'll proceed with the recursive approach as it's straightforward.\n",
"\n",
"Wait, but in the function, for n=2, it's F(1)+F(0) = 1+0=1, which is correct. For n=3, F(2)+F(1)=1+1=2. So, the function works as expected.\n",
"Wait, but the problem says to make it recursive. So, perhaps the function should be written recursively, but without memoization. So, the function as I wrote earlier is correct.\n",
"\n",
"I think this should solve the problem.\n",
"Alternatively, I can write it with memoization using a dictionary to store previously computed values. But that's more advanced and perhaps beyond the scope of this problem.\n",
"\n",
"So, the function I wrote earlier is correct and meets the requirements.\n",
"</think>\n",
"\n",
"To solve this problem, we need to generate the nth Fibonacci number using a recursive approach. The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones, starting from 0 and 1. \n",
"\n",
"### Approach\n",
"The approach to solve this problem involves using recursion, which is a method where a function calls itself with a modified parameter to achieve the desired result. Here's a step-by-step breakdown of the approach:\n",
"The Fibonacci sequence is defined as follows:\n",
"- F(0) = 0\n",
"- F(1) = 1\n",
"- F(n) = F(n-1) + F(n-2) for n >= 2\n",
"\n",
"1. **Base Cases**: \n",
" - If `n` is 0, return 0.\n",
" - If `n` is 1, return 1.\n",
" \n",
"2. **Recursive Case**:\n",
" - For any `n` greater than 1, the nth Fibonacci number is the sum of the (n-1)th and (n-2)th Fibonacci numbers. This is achieved by recursively calling the function with `n-1` and `n-2` and adding their results.\n",
"\n",
"This approach ensures that each Fibonacci number is computed by breaking down the problem into smaller subproblems, which are then solved recursively.\n",
"Given the requirement to use a recursive approach, we can define a function that calls itself with smaller values of n until it reaches the base cases. The function will handle the base cases directly and use recursion for the general case.\n",
"\n",
"### Solution Code\n",
"\n",
"```python\n",
"def fibonacci(n):\n",
" if n == 0:\n",
@@ -259,10 +275,16 @@
"```\n",
"\n",
"### Explanation\n",
"- **Base Cases**: The function first checks if `n` is 0 or 1. If `n` is 0, it returns 0. If `n` is 1, it returns 1. These are the simplest cases of the Fibonacci sequence.\n",
"- **Recursive Case**: For any `n` greater than 1, the function calls itself with `n-1` and `n-2`, and returns the sum of these two recursive calls. This builds up the solution by solving smaller subproblems and combining their results.\n",
"The function `fibonacci` takes an integer `n` as input and returns the nth Fibonacci number. \n",
"\n",
"1. **Base Cases**:\n",
" - If `n` is 0, the function returns 0.\n",
" - If `n` is 1, the function returns 1.\n",
"\n",
"2. **Recursive Case**:\n",
" - For `n >= 2`, the function calls itself with `n-1` and `n-2` and returns the sum of these two recursive calls. This builds up the Fibonacci sequence from the bottom up, ensuring that each value is computed only once.\n",
"\n",
"This approach is straightforward and leverages the divide-and-conquer strategy inherent in recursion, making it easy to understand and implement. However, it's important to note that this approach has a time complexity of O(2^n) due to the exponential number of function calls, which is not efficient for large values of `n`. For larger values, an iterative approach or memoization would be more efficient.\n",
"This approach is straightforward and leverages the recursive nature of the Fibonacci sequence, making it easy to understand and implement. However, it's important to note that for very large values of `n`, this approach can be inefficient due to repeated calculations. For larger values, an iterative approach or memoization would be more efficient.\n",
"-----END-----\n",
"\n"
]
@@ -332,9 +354,9 @@
"\n",
"Let me test this function with some examples. For n=0, it returns 0. For n=1, returns 1. For n=2, it's F(1)+F(0) = 1+0=1. For n=3, F(2)+F(1)=1+1=2. That looks correct.\n",
"\n",
"Wait, but sometimes people define the Fibonacci sequence starting with F(1)=1, F(2)=1, F(3)=2, etc. So, if the function is called with n=5, it should return 5. Let me see: F(5) is 5, which matches the standard definition. So, the function should work regardless of the starting point as long as the base cases are correct.\n",
"Wait, but sometimes people define the Fibonacci sequence starting with F(1)=1, F(2)=1, F(3)=2, etc. So, if the function is called with n=5, it should return 5. Let me see: F(5) is 5, which is correct.\n",
"\n",
"Another thing to consider is the base cases. If the function is called with n=0, it returns 0, which is correct. For n=1, returns 1. For n=2, returns 1, which is correct. So, the function should handle all non-negative integers correctly.\n",
"Another test case: n=5. Let's compute it step by step. F(0)=0, F(1)=1, F(2)=1, F(3)=2, F(4)=3, F(5)=5. So the function should return 5 for n=5.\n",
"\n",
"I think this should work. So, the function is straightforward. It's a simple recursive implementation, but it's not the most efficient for large n. However, for the purpose of this problem, it's acceptable.\n",
"</think>\n",
@@ -348,7 +370,7 @@
" return fibonacci(n-1) + fibonacci(n-2)\n",
"```\n",
"\n",
"This function calculates the nth Fibonacci number using a recursive approach. It handles the base cases where n is 0 or 1 and recursively computes the value for larger n by summing the two preceding Fibonacci numbers.\n",
"This function calculates the nth Fibonacci number using a recursive approach. It handles the base cases where n is 0 or 1 and for other values, it recursively calculates the sum of the two preceding Fibonacci numbers. While this implementation is straightforward, it's not the most efficient for large values of n due to repeated calculations.\n",
"-----END-----\n",
"\n"
]
50 changes: 45 additions & 5 deletions example_notebooks/trtllm/README.md
@@ -2,12 +2,52 @@

## Quick Start

Follow this guide to create an engine:
https://nvidia.github.io/TensorRT-LLM/quick-start-guide.html
It is recommended to use [TensorRT-LLM release containers](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/tensorrt-llm/containers/release/tags) (>= 0.20.0), which have TensorRT-LLM pre-installed.
Alternatively, follow [this documentation](https://nvidia.github.io/TensorRT-LLM/installation/linux.html) to install it in [NGC PyTorch containers](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch/tags) (>= 25.04).

## Examples

### GenLengthLogitsProcessor
A logits processor that adjusts the likelihood of the end-of-sequence (EOS) token based on the length of the generated sequence, encouraging or discouraging shorter answers.
```
python example_notebooks/trtllm/gen_length_logits_processor.py
```
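The core idea, shifting the EOS logit as generation gets longer, can be sketched in plain Python (illustrative only; the actual processor's boost schedule and parameter names may differ):

```python
def boost_eos(logits, eos_token_id, gen_length, boost_factor, p=2, max_length=256):
    """Scale the EOS logit by how far generation has progressed.
    A positive boost_factor encourages stopping early; a negative one
    discourages it."""
    boost = boost_factor * (min(gen_length, max_length) / max_length) ** p
    out = list(logits)
    out[eos_token_id] += boost
    return out
```

With this schedule the boost starts at zero and ramps up to the full `boost_factor` as the sequence approaches `max_length`.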

### CiteFromPromptLogitsProcessor
A logits processor which boosts or diminishes the likelihood of tokens present in the prompt (and optionally the EOS token), encouraging the model either to reuse tokens seen in the prompt or to avoid them.
```
python example_notebooks/trtllm/cite_prompt_logits_processor.py -p "Retrieved information:
Pokémon is a Japanese media franchise consisting of video games, animated series and films, a trading card game, and other related media.
The franchise takes place in a shared universe in which humans co-exist with creatures known as Pokémon, a large variety of species endowed with special powers.
The franchise's target audience is children aged 5 to 12, but it is known to attract people of all ages.

Can you shortly describe what Pokémon is?"
```
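Its effect on a logits vector can be sketched as follows (a plain-Python illustration; the real processor works on tokenized prompts and batched tensors):

```python
def boost_prompt_tokens(logits, prompt_token_ids, boost_factor):
    """Add `boost_factor` to every token id seen in the prompt; a negative
    factor instead steers the model away from prompt tokens."""
    prompt_set = set(prompt_token_ids)
    return [x + boost_factor if i in prompt_set else x
            for i, x in enumerate(logits)]
```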

### ForceLastPhraseLogitsProcessor
A logits processor which forces LLMs to use the given phrase before they finalize their answers. Common use cases include providing references or thanking the user before closing.
```
python example_notebooks/trtllm/last_phrase_logits_processor.py
```
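The mechanism can be sketched as a tiny state machine (illustrative; the state handling and names below are assumptions): when EOS becomes the top candidate, the processor first walks through the phrase tokens.

```python
def force_last_phrase(logits, eos_token_id, phrase_token_ids, state):
    """Redirect an imminent EOS through `phrase_token_ids` first.
    `state['pos']` is -1 while idle and indexes the phrase while forcing."""
    neg_inf = float("-inf")
    pos = state["pos"]
    if 0 <= pos < len(phrase_token_ids):
        forced = phrase_token_ids[pos]          # continue forcing the phrase
        state["pos"] += 1
        return [0.0 if i == forced else neg_inf for i in range(len(logits))]
    top = max(range(len(logits)), key=lambda i: logits[i])
    if pos == -1 and top == eos_token_id:       # about to stop: inject phrase
        state["pos"] = 1
        return [0.0 if i == phrase_token_ids[0] else neg_inf
                for i in range(len(logits))]
    return logits
```

Once the phrase is exhausted the logits pass through unchanged, so the model is free to emit EOS on the next step.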

### MultipleChoiceLogitsProcessor
A logits processor to answer multiple choice questions with one of the choices.
```
python example_notebooks/trtllm/multiple_choice_logits_processor.py -p "I am getting a lot of calls during the day. What is more important for me to consider when I buy a new phone?
0. Camera
1. Screen resolution
2. Operating System
3. Battery"
```
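Conceptually it masks everything except the choice labels (a sketch; the real processor also handles tokenizing the labels and delimiters):

```python
def restrict_to_choices(logits, choice_token_ids):
    """Keep only the logits of the allowed choice tokens; mask the rest."""
    allowed = set(choice_token_ids)
    return [x if i in allowed else float("-inf")
            for i, x in enumerate(logits)]
```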

### TriggerPhraseLogitsProcessor
A logits processor which generates a given phrase whenever it encounters a specified trigger token.
```
python example_notebooks/trtllm/trigger_phrase_logits_processor.py -p "Generate a python function to calculate nth fibonacci number. Make it recursive. Keep thinking short."
```
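The trigger-and-force loop can be sketched in plain Python (illustrative only; the state handling and names are assumptions, and the real processor matches tokenized trigger phrases rather than a single token id):

```python
def trigger_phrase(last_token_id, logits, trigger_token_id, phrase_token_ids, state):
    """Once `trigger_token_id` is generated, force the phrase token by token.
    `state['pos']` is -1 while idle and indexes the phrase while forcing."""
    neg_inf = float("-inf")
    if state["pos"] == -1 and last_token_id == trigger_token_id:
        state["pos"] = 0                        # trigger seen: start the phrase
    pos = state["pos"]
    if 0 <= pos < len(phrase_token_ids):
        forced = phrase_token_ids[pos]
        state["pos"] = pos + 1 if pos + 1 < len(phrase_token_ids) else -1
        return [0.0 if i == forced else neg_inf for i in range(len(logits))]
    return logits
```

In the README example above, the trigger token would be `</think>` and the phrase would be the tokenization of `"\n```python"`.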

### PreventHallucinationLogitsProcessor
A logits processor that mitigates hallucinated model outputs by enforcing a predefined fallback phrase when token confidence falls below a specified threshold.
```
python example_notebooks/trtllm/prevent_hallucination_logits_processor.py -p "What are Nobel Prizes? Name the winners in 1977"
```
```
-python example_notebooks/trtllm/gen_length_logits_processor.py --engine_path ../TensorRT-LLM/examples/llama/llama-engine/ --tokenizer_path ~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/x/
-python example_notebooks/trtllm/multiple_choice_logits_processor.py --engine_path ../TensorRT-LLM/examples/llama/llama-engine/ --tokenizer_path ~/.cache/huggingface/hub/models--meta-llama--Llama-2-7b-chat-hf/snapshots/x/ --prompt "Which one is heavier?\n1. 1 kg\n2. 100 kg\n3. 10 kg\nAnswer:"
```
10 changes: 6 additions & 4 deletions example_notebooks/trtllm/cite_prompt_logits_processor.py
@@ -5,10 +5,12 @@

 if __name__ == "__main__":
     args = get_parser()
-    beam_width = 1

-    tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_path)
+    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
+    llm_tester = TRTLLMTester(args.model_name)

-    lp = CiteFromPromptLogitsProcessor(tokenizer, [args.prompt], boost_factor=1.0)
+    lp = CiteFromPromptLogitsProcessor(tokenizer, boost_factor=1.0, boost_eos=False, conditional_boost_factor=3.0)
+    llm_tester.run([args.prompt], logits_processor=lp)

-    TRTLLMTester(lp, tokenizer, args).run(args.prompt, beam_width)
+    lp = CiteFromPromptLogitsProcessor(tokenizer, boost_factor=-1.0, boost_eos=False, conditional_boost_factor=-1.0)
+    llm_tester.run([args.prompt], logits_processor=lp)
8 changes: 5 additions & 3 deletions example_notebooks/trtllm/gen_length_logits_processor.py
@@ -5,10 +5,12 @@

 if __name__ == "__main__":
     args = get_parser()
-    beam_width = 1

-    tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_path)
+    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
+    llm_tester = TRTLLMTester(args.model_name)

     lp = GenLengthLogitsProcessor(tokenizer, boost_factor=1.0, complete_sentences=True)
+    llm_tester.run([args.prompt], logits_processor=lp)

-    TRTLLMTester(lp, tokenizer, args).run(args.prompt, beam_width)
+    lp = GenLengthLogitsProcessor(tokenizer, boost_factor=-1.0, p=0, complete_sentences=True)
+    llm_tester.run([args.prompt], logits_processor=lp)
9 changes: 4 additions & 5 deletions example_notebooks/trtllm/last_phrase_logits_processor.py
@@ -5,12 +5,11 @@

 if __name__ == "__main__":
     args = get_parser()
-    beam_width = 1

-    tokenizer = AutoTokenizer.from_pretrained(args.tokenizer_path)
+    tokenizer = AutoTokenizer.from_pretrained(args.model_name)
+    llm_tester = TRTLLMTester(args.model_name)

     phrase = "\n\nThanks for trying our application! If you have more questions about"
-    lp = ForceLastPhraseLogitsProcessor(phrase, tokenizer, batch_size=1)
-
-    TRTLLMTester(lp, tokenizer, args).run(args.prompt, beam_width)
+    lp = ForceLastPhraseLogitsProcessor(phrase, tokenizer)
+    llm_tester.run([args.prompt], logits_processor=lp)