diff --git a/example_notebooks/transformers/cite_prompt_logits_processor.ipynb b/example_notebooks/transformers/cite_prompt_logits_processor.ipynb
Lines changed: 28 additions & 19 deletions
diff --git a/example_notebooks/vllm/cite_prompt_logits_processor.ipynb b/example_notebooks/vllm/cite_prompt_logits_processor.ipynb
Lines changed: 32 additions & 55 deletions
@@ -23,7 +23,15 @@
    "execution_count": 2,
    "id": "a85f8503",
    "metadata": {},
-   "outputs": [],
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.\n"
+     ]
+    }
+   ],
    "source": [
     "from example_notebooks.transformers.utils import LLMRunner\n",
     "from logits_processor_zoo.transformers import CiteFromPromptLogitsProcessor\n",
@@ -136,7 +144,12 @@
       "    \n",
       "\n",
       "LLM response:\n",
-      "The user seems to have mixed feelings about the price of the product. They find it expensive, but they also appreciate its softness, colorfulness, and style.\n",
+      "The user's opinion about the product's price is mixed. They describe it as \"expensive,\" which could be interpreted in two ways:\n",
+      "\n",
+      "1. The user might consider the price to be high for its quality and features.\n",
+      "2. Alternatively, they may appreciate the product's price, considering its stylish design or unique qualities.\n",
+      "\n",
+      "Without more context, it's difficult to determine the user's opinion definitively.\n",
       "-----END-----\n",
       "\n",
       "Prompt: \n",
@@ -149,7 +162,7 @@
       "    \n",
       "\n",
       "LLM response:\n",
-      "A Pokémon is a fictional creature in the Pokémon franchise, which is a Japanese media franchise consisting of video games, animated series, films, a trading card game, and other related media. The franchise takes place in a shared universe where humans coexist with Pokémon, a large variety of species endowed with special powers. The franchise's target audience is children aged 5 to 12, but it is known to attract people of all ages.\n",
+      "A Pokémon is a fictional creature in the Pokémon franchise, a Japanese media franchise consisting of video games, animated series and films, a trading card game, and other related media. The franchise takes place in a shared universe in which humans co-exist with creatures known as Pokémon, a large variety of species endowed with special powers. The franchise's target audience is children aged 5 to 12, but it is known to attract people of all ages.\n",
       "-----END-----\n",
       "\n"
      ]
@@ -158,7 +171,8 @@
    "source": [
     "runner.generate_response(\n",
     "    example_prompts,\n",
-    "    [CiteFromPromptLogitsProcessor(runner.tokenizer, boost_factor=2.0, boost_eos=False)]\n",
+    "    [CiteFromPromptLogitsProcessor(runner.tokenizer, boost_factor=2.0, boost_eos=False,\n",
+    "                                   conditional_boost_factor=2.0)]\n",
     ")"
    ]
   },
@@ -187,17 +201,9 @@
       "    \n",
       "\n",
       "LLM response:\n",
-      "The reviewer seems to have mixed feelings towards the pricing of the product:\n",
-      "\n",
-      "- They describe it as \"very soft\" and \"colorful\", suggesting that they appreciate these qualities.\n",
+      "The reviewer seems to have mixed feelings about the pricing of the product. They describe it as \"expensive\" and \"deserves its price,\" which suggests that they find the high cost justified by the quality or value of the item. The use of words like \"stylish\" further emphasizes their positive impression of the style and design of the product.\n",
       "\n",
-      "- They also mention that it is \"expensive,\" which might be seen as negative if you're looking for an affordable option or if this was their first time buying something like this.\n",
-      "\n",
-      "- However, they state that it \"deserves its price,\" indicating that they believe the high cost reflects on quality or value.\n",
-      "\n",
-      "Overall, while they seem satisfied with the overall experience and don't mind paying more for what they perceive as good-quality materials and design, they may feel that the price point could be higher than expected for everyday use or budget-conscious shoppers.\n",
-      "\n",
-      "So in summary, they find the item to be well-made and aesthetically pleasing despite feeling that it might not be suitable for everyone due to being too pricey for some people's budgets. The reviewer seems generally positive toward the purchase decision itself rather than just the specific item.\n",
+      "So in summary, while the reviewer finds the price somewhat high, they believe it is worth the investment due to the overall quality and style of the item. This indicates an average-to-good level of satisfaction with the purchase experience.\n",
       "-----END-----\n",
       "\n",
       "Prompt: \n",
@@ -210,11 +216,13 @@
       "    \n",
       "\n",
       "LLM response:\n",
-      "A Pokémon is an imaginary creature that exists within the fictional world of the Pokémon franchise. These creatures have unique abilities or characteristics that allow them to interact with their environment or engage in battles against each other.\n",
+      "A Pokémon is an imaginary creature that exists within the fictional world of the Pokémon franchise. These creatures have unique abilities or characteristics that allow them to interact with their environment and engage in various activities.\n",
+      "\n",
+      "Pokémon can be found throughout the vast landscapes depicted in the franchise, including forests, mountains, rivers, cities, and even outer space. They come in different sizes, shapes, colors, and types (e.g., water-type, fire-type). Each Pokémon has its own distinct personality and backstory.\n",
       "\n",
-      "Pokémon can be found throughout various regions on Earth (or \"Gym Leaders\") where trainers can catch them using Poké Balls or capture them through battle encounters. Each type of Pokémon has specific attributes such as strength, speed, or ability to use certain moves.\n",
+      "The concept of Pokémon originated from Japan in the early 1990s when Satoshi Tajiri developed the first generation of the Pokémon Red and Blue games for Nintendo Entertainment System (NES). Since then, numerous generations of Pokémon games have been released across multiple platforms, expanding the world of Pokémon into anime, manga, movies, TV shows, books, trading cards, and more.\n",
       "\n",
-      "The concept behind Pokémon revolves around the idea that these magical beings exist alongside human society, providing entertainment through gameplay experiences like battling or collecting cards representing different types of Pokémon. This blend of fantasy elements combined with interactive gaming mechanics makes the Pokémon franchise appealing across diverse age groups due to its accessibility for young audiences while also attracting older fans who appreciate the depth and complexity of the character designs and storylines.\n",
+      "In summary, Pokémon are magical beings that exist alongside human characters in the Pokémon universe, each possessing unique traits and abilities that make them fascinating subjects for storytelling and gaming experiences.\n",
       "-----END-----\n",
       "\n"
      ]
@@ -223,7 +231,8 @@
    "source": [
     "runner.generate_response(\n",
     "    example_prompts,\n",
-    "    [CiteFromPromptLogitsProcessor(runner.tokenizer, boost_factor=-2.0, boost_eos=False)]\n",
+    "    [CiteFromPromptLogitsProcessor(runner.tokenizer, boost_factor=-1.0, boost_eos=False,\n",
+    "                                  conditional_boost_factor=-1.0)]\n",
     ")"
    ]
   },
@@ -252,7 +261,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.13"
+   "version": "3.10.17"
   }
  },
  "nbformat": 4,
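diff --git a/example_notebooks/vllm/cite_prompt_logits_processor.ipynb b/example_notebooks/vllm/cite_prompt_logits_processor.ipynb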
@@ -28,45 +28,22 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "WARNING 04-30 15:00:30 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead. See https://pypi.org/project/pynvml for more information.\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "/home/aerdem/projects/LLM/llmenv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
-      "  warnings.warn(\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "WARNING 04-30 15:00:33 config.py:1563] Casting torch.bfloat16 to torch.float16.\n",
-      "INFO 04-30 15:00:33 llm_engine.py:184] Initializing an LLM engine (v0.5.5) with config: model='Qwen/Qwen2.5-1.5B-Instruct', speculative_config=None, tokenizer='Qwen/Qwen2.5-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=Qwen/Qwen2.5-1.5B-Instruct, use_v2_block_manager=False, enable_prefix_caching=False)\n"
-     ]
-    },
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.\n"
-     ]
-    },
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "INFO 04-30 15:00:34 model_runner.py:879] Starting to load model Qwen/Qwen2.5-1.5B-Instruct...\n",
-      "INFO 04-30 15:00:34 weight_utils.py:236] Using model weights format ['*.safetensors']\n",
-      "INFO 04-30 15:00:35 weight_utils.py:280] No model.safetensors.index.json found in remote.\n"
+      "INFO 05-22 14:12:47 [__init__.py:239] Automatically detected platform cuda.\n",
+      "WARNING 05-22 14:12:50 [config.py:2972] Casting torch.bfloat16 to torch.float16.\n",
+      "INFO 05-22 14:12:55 [config.py:717] This model supports multiple tasks: {'reward', 'generate', 'classify', 'score', 'embed'}. Defaulting to 'generate'.\n",
+      "WARNING 05-22 14:12:55 [cuda.py:93] To see benefits of async output processing, enable CUDA graph. Since, enforce-eager is enabled, async output processor cannot be used\n",
+      "INFO 05-22 14:12:55 [llm_engine.py:240] Initializing a V0 LLM engine (v0.8.5.post1) with config: model='Qwen/Qwen2.5-1.5B-Instruct', speculative_config=None, tokenizer='Qwen/Qwen2.5-1.5B-Instruct', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=auto, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=True, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='auto', reasoning_backend=None), observability_config=ObservabilityConfig(show_hidden_metrics=False, otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=None, served_model_name=Qwen/Qwen2.5-1.5B-Instruct, num_scheduler_steps=1, multi_step_stream_outputs=True, enable_prefix_caching=None, chunked_prefill_enabled=False, use_async_output_proc=False, disable_mm_preprocessor_cache=False, mm_processor_kwargs=None, pooler_config=None, compilation_config={\"splitting_ops\":[],\"compile_sizes\":[],\"cudagraph_capture_sizes\":[],\"max_capture_size\":0}, use_cached_outputs=False, \n",
+      "INFO 05-22 14:12:56 [cuda.py:292] Using Flash Attention backend.\n",
+      "INFO 05-22 14:12:57 [parallel_state.py:1004] rank 0 in world size 1 is assigned as DP rank 0, PP rank 0, TP rank 0\n",
+      "INFO 05-22 14:12:57 [model_runner.py:1108] Starting to load model Qwen/Qwen2.5-1.5B-Instruct...\n",
+      "INFO 05-22 14:12:57 [weight_utils.py:265] Using model weights format ['*.safetensors']\n",
+      "INFO 05-22 14:12:58 [weight_utils.py:315] No model.safetensors.index.json found in remote.\n"
      ]
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
-       "model_id": "e9c350b056a04694bf4f2eade35244ba",
+       "model_id": "d29121d7259a47f5923ef4d1b3fa3138",
        "version_major": 2,
        "version_minor": 0
       },
@@ -81,8 +58,14 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "INFO 04-30 15:00:36 model_runner.py:890] Loading model weights took 2.8875 GB\n",
-      "INFO 04-30 15:00:38 gpu_executor.py:121] # GPU blocks: 37541, # CPU blocks: 9362\n"
+      "INFO 05-22 14:12:58 [loader.py:458] Loading weights took 0.57 seconds\n",
+      "INFO 05-22 14:12:58 [model_runner.py:1140] Model loading took 2.8876 GiB and 1.613375 seconds\n",
+      "INFO 05-22 14:13:00 [worker.py:287] Memory profiling takes 1.73 seconds\n",
+      "INFO 05-22 14:13:00 [worker.py:287] the current vLLM instance can use total_gpu_memory (23.66GiB) x gpu_memory_utilization (0.90) = 21.29GiB\n",
+      "INFO 05-22 14:13:00 [worker.py:287] model weights take 2.89GiB; non_torch_memory takes 0.06GiB; PyTorch activation peak memory takes 2.02GiB; the rest of the memory reserved for KV Cache is 16.32GiB.\n",
+      "INFO 05-22 14:13:00 [executor_base.py:112] # cuda blocks: 38207, # CPU blocks: 9362\n",
+      "INFO 05-22 14:13:00 [executor_base.py:117] Maximum concurrency for 32768 tokens per request: 18.66x\n",
+      "INFO 05-22 14:13:02 [llm_engine.py:437] init engine (profile, create kv cache, warmup model) took 3.24 seconds\n"
      ]
     }
    ],
@@ -93,8 +76,7 @@
     "\n",
     "example_prompts =[\n",
     "    \"\"\"\n",
-    "    A user review: very soft, colorful, expensive but deserves its price.\n",
-    "    I would like to wear it in my friend's wedding.\n",
+    "    A user review: very soft, colorful, expensive but deserves its price, stylish.\n",
     "    \n",
     "    What is the user's opinion about the product's price?\n",
     "    \"\"\",\n",
@@ -130,12 +112,11 @@
      "output_type": "stream",
      "text": [
       "Prompt: \n",
-      "    A user review: very soft, colorful, expensive but deserves its price.\n",
-      "    I would like to wear it in my friend's wedding.\n",
+      "    A user review: very soft, colorful, expensive but deserves its price, stylish.\n",
       "    \n",
       "    What is the user's opinion about the product's price?\n",
       "    \n",
-      "The user's opinion about the product's price is that it is expensive, but they believe it is worth the price.\n",
+      "The user's opinion about the product's price is that it is expensive, but they believe it is worth the price due to its softness, colorfulness, and stylish design.\n",
       "-----END-----\n",
       "\n",
       "Prompt: \n",
@@ -175,12 +156,11 @@
      "output_type": "stream",
      "text": [
       "Prompt: \n",
-      "    A user review: very soft, colorful, expensive but deserves its price.\n",
-      "    I would like to wear it in my friend's wedding.\n",
+      "    A user review: very soft, colorful, expensive but deserves its price, stylish.\n",
       "    \n",
       "    What is the user's opinion about the product's price?\n",
       "    \n",
-      "The user's opinion about the product's price is that it is expensive, but the user is willing to pay the price to wear it in a friend's wedding.\n",
+      "The user's opinion about the product's price is that it is expensive but deserves its price, stylish.\n",
       "-----END-----\n",
       "\n",
       "Prompt: \n",
@@ -191,15 +171,16 @@
       "    \n",
       "    Can you shortly describe what Pokémon is?\n",
       "    \n",
-      " Pokémon is a Japanese media franchise consisting of video games, animated series, and films. The franchise takes place in a shared universe in which humans co-exist with Pokémon, a large variety of species endowed with special powers. The franchise's target audience is children aged 5 to 12, but it is known to attract people of all ages.\n",
+      "Pokémon is a Japanese media franchise consisting of video games, animated series and films, a trading card game, and other related media. The franchise takes place in a shared universe in which humans co-exist with creatures known as Pokémon, a large variety of species endowed with special powers. The franchise's target audience is children aged 5 to 12, but it is known to attract people of all ages.\n",
       "-----END-----\n",
       "\n"
      ]
     }
    ],
    "source": [
     "runner.generate_response(example_prompts,\n",
-    "                         [CiteFromPromptLogitsProcessor(runner.tokenizer, boost_factor=5.0, boost_eos=False)])"
+    "                         [CiteFromPromptLogitsProcessor(runner.tokenizer, boost_factor=1.0, boost_eos=False,\n",
+    "                                                        conditional_boost_factor=3.0)])"
    ]
   },
   {
@@ -221,16 +202,11 @@
      "output_type": "stream",
      "text": [
       "Prompt: \n",
-      "    A user review: very soft, colorful, expensive but deserves its price.\n",
-      "    I would like to wear it in my friend's wedding.\n",
+      "    A user review: very soft, colorful, expensive but deserves its price, stylish.\n",
       "    \n",
       "    What is the user's opinion about the product's price?\n",
       "    \n",
-      "The user's opinion about the product's price seems to be mixed. They appreciate that the product is \"very soft\" and \"colorful,\" indicating that these features contribute positively to their satisfaction with the item. However, they also mention that the product is \"expensive,\" which suggests that they feel the price is justified based on the quality they perceive.\n",
-      "\n",
-      "The phrase \"deserves its price\" implies that the user believes the cost of the product is appropriate for what they have received. This indicates that they find value in the product and feel that they are getting good value for their money.\n",
-      "\n",
-      "In summary, while the user appreciates the product's qualities and finds them worth the price, they also acknowledge that the cost is higher than they might have expected for such features. This suggests that they view the product as a good investment for their needs and preferences.\n",
+      "The user's opinion about the product's price is that it is expensive, but they believe it is worth the cost due to its softness, colorfulness, and style.\n",
       "-----END-----\n",
       "\n",
       "Prompt: \n",
@@ -241,15 +217,16 @@
       "    \n",
       "    Can you shortly describe what Pokémon is?\n",
       "    \n",
-      "Pokémon is a popular Japanese media franchise that features a world where humans live alongside magical creatures called Pokémon. These Pokémon have unique abilities that allow them to fight alongside humans in various adventures. The franchise includes video games, animated series, films, trading cards, and other forms of media aimed at children aged 5 to 12, though it has also gained popularity among adults.\n",
+      "Pokémon is a popular Japanese media franchise that features a world where humans live alongside Pokémon, mythical creatures with unique abilities. It targets children aged 5 to 12 but has broad appeal across all age groups.\n",
       "-----END-----\n",
       "\n"
      ]
     }
    ],
    "source": [
     "runner.generate_response(example_prompts,\n",
-    "                         [CiteFromPromptLogitsProcessor(runner.tokenizer, boost_factor=-2.0, boost_eos=False)])"
+    "                         [CiteFromPromptLogitsProcessor(runner.tokenizer, boost_factor=-1.0, boost_eos=False,\n",
+    "                                                        conditional_boost_factor=-1.0)])"
    ]
   },
   {