Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1183
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures
As of commit 937e7ed with merge base 2cf4016, the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Jack-Khuu
left a comment
Reminder to test that Flamingo and 3.1 still work as expected
Also a reminder that to test convert_hf_checkpoint you need to delete your download/conversion and rerun.
```python
from torchchat.model import ModelArgs
```
```python
def remap_llava_checkpoint(llava_ckpt):
```
Was this written in-house?
I'm not quite following your question.
This function is consumed by convert_llava_checkpoint to get the remapped checkpoint.
I made it a separate function to simplify the logic.
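For context, a checkpoint remap like this typically renames HF-style state-dict keys into the target model's naming scheme. A minimal sketch of the pattern, where the prefix pairs are purely illustrative and not the actual mapping used in this PR:

```python
def remap_llava_checkpoint_sketch(llava_ckpt: dict) -> dict:
    """Rename HF-style llava state-dict keys to a target naming scheme.

    Illustrative only: the prefix pairs below are assumptions,
    not the mapping used in this PR.
    """
    prefix_map = {
        "language_model.model.": "text_transformer.",
        "vision_tower.": "vision_encoder.",
        "multi_modal_projector.": "mm_projector.",
    }
    remapped = {}
    for key, weight in llava_ckpt.items():
        new_key = key
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        remapped[new_key] = weight
    return remapped
```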
@@ -21,9 +24,176 @@
```python
from torchchat.model import ModelArgs
```
Suggested change:
```python
"""
Llava Conversion Code
"""
```
Code comment blocks to help us move things around later
```python
tokenizer_path = model_dir / "tokenizer.model"
shutil.copy(tokenizer_files[0], tokenizer_path)
```
Suggested change:
```python
"""
Text-Only Conversion Code
"""
```
```python
if batch and self.model.config.model_type == ModelType.Llava:
    context_len, next_token = next_token
else:
    context_len, next_token = T, next_token
```
Suggested change:
```diff
-    context_len, next_token = T, next_token
+    context_len = T
```
```python
    encoded = batch["tokens"]
elif self.model.config.model_type == ModelType.Llava:
    # TODO: double check the tokenizer.
    def find_subtensor(tensor, target):
```
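For reference, locating a target subsequence inside a 1-D token tensor (e.g. the image-placeholder tokens that image embeddings get spliced into) can be done with a simple window scan. A minimal sketch of the assumed behavior, returning the start index or -1:

```python
import torch

def find_subtensor_sketch(tensor: torch.Tensor, target: torch.Tensor) -> int:
    """Return the start index of the first occurrence of 1-D `target`
    inside 1-D `tensor`, or -1 if it does not occur. Illustrative only."""
    n, m = tensor.numel(), target.numel()
    for i in range(n - m + 1):
        if torch.equal(tensor[i : i + m], target):
            return i
    return -1

# e.g. find_subtensor_sketch(torch.tensor([1, 2, 3, 4]), torch.tensor([3, 4])) == 2
```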
| """Applies Rotary Position Embedding to the query and key tensors. | ||
|
|
||
| Args: | ||
| q (`torch.Tensor`): The query tensor. | ||
| k (`torch.Tensor`): The key tensor. | ||
| cos (`torch.Tensor`): The cosine part of the rotary embedding. | ||
| sin (`torch.Tensor`): The sine part of the rotary embedding. | ||
| unsqueeze_dim (`int`, *optional*, defaults to 1): | ||
| The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and | ||
| sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note | ||
| that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and | ||
| k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes | ||
| cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have | ||
| the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2. | ||
| Returns: | ||
| `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding. |
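The body that usually accompanies this docstring is the standard Hugging Face rotary-embedding application; a sketch for orientation (not copied from this PR):

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Split the last dim in half and rotate: (x1, x2) -> (-x2, x1).
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def hf_apply_rotary_emb_sketch(q, k, cos, sin, unsqueeze_dim=1):
    # Unsqueeze so cos/sin broadcast across the heads dimension of q and k.
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```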
@@ -0,0 +1,80 @@
```python
import torch
import torchvision as tv
```
```python
    padding with median RGB value to make a square, scaling, and normalizing.

    Args:
        img_address (str): Address of the local image file that will be forwarded to the model.
```
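A sketch of what this preprocessing might look like end to end; the target resolution, normalization statistics, and function name here are assumptions, not the PR's actual values:

```python
import torch
import torchvision as tv

def preprocess_image_sketch(img_address: str, target_size: int = 336) -> torch.Tensor:
    # Load the local image as a CHW uint8 tensor.
    img = tv.io.read_image(img_address, mode=tv.io.ImageReadMode.RGB)
    c, h, w = img.shape
    # Pad to a square using the per-channel median RGB value.
    side = max(h, w)
    median = img.reshape(c, -1).median(dim=1).values
    canvas = median.view(c, 1, 1).expand(c, side, side).clone()
    top, left = (side - h) // 2, (side - w) // 2
    canvas[:, top : top + h, left : left + w] = img
    # Scale to the assumed model input resolution.
    img = tv.transforms.functional.resize(canvas, [target_size, target_size], antialias=True)
    # Normalize with CLIP-style statistics (assumed, since llava uses a CLIP vision tower).
    img = img.float() / 255.0
    mean = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(3, 1, 1)
    std = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(3, 1, 1)
    return (img - mean) / std
```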
@@ -919,6 +937,58 @@ def apply_rotary_emb(x: Tensor, freqs_cis: Tensor) -> Tensor:
```python
return x_out2.type_as(x)
```
nit: Can we move apply_rotary_emb so that it sits sequentially after hf_apply_rotary_emb?
Mainly for keeping concepts together.
I'd like to keep the current structure, with all HF rotary embedding functions grouped together and all previous embedding functions in a separate section.
```python
encoded = batch["tokens"]
assert len(images) == 1, "Only one image prompt is supported for now"

# TODO: updated encoded variable for multi-modality models to include image tokens.
```
Can you explain this to me?
This PR enables llava1.5 on torchchat; it is the first multi-modality model supported in torchchat.
How to play?
You can use --prompt as the flag for text input, and --image-prompt for image input, e.g. as in the sketch below.
It can also handle input without an image prompt:
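A hypothetical invocation shape, where only the --prompt and --image-prompt flags are confirmed above; the generate entry point and the llava-1.5 model alias are assumptions:

```bash
# Hypothetical invocation: entry point and model alias are assumptions.
python3 torchchat.py generate llava-1.5 \
    --prompt "What is shown in this image?" \
    --image-prompt ./example.jpg

# Text-only input, without --image-prompt:
python3 torchchat.py generate llava-1.5 --prompt "Tell me a story."
```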