Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/torchchat/1183
Note: Links to docs will display an error until the docs builds have been completed.
❌ 2 New Failures
As of commit 937e7ed with merge base 2cf4016, the following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Jack-Khuu
left a comment
Reminder to test that Flamingo and 3.1 still work as expected
Also a reminder that to test convert_hf_checkpoint you need to delete your download/conversion and rerun.
```python
from torchchat.model import ModelArgs
```
```python
def remap_llava_checkpoint(llava_ckpt):
```
Was this written in-house?
I'm not quite following your question.
This function is consumed by convert_llava_checkpoint to get the remapped checkpoint.
I made it a separate function to simplify the logic.
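For context, a checkpoint remap like this typically renames HF-style state-dict keys into the target model's naming scheme. A minimal sketch of the pattern, where the prefix pairs are purely illustrative and not the actual mapping used in this PR:

```python
def remap_llava_checkpoint_sketch(llava_ckpt: dict) -> dict:
    """Rename HF-style llava state-dict keys to a target naming scheme.

    Illustrative only: the prefix pairs below are assumptions,
    not the mapping used in this PR.
    """
    prefix_map = {
        "language_model.model.": "text_transformer.",
        "vision_tower.": "vision_encoder.",
        "multi_modal_projector.": "mm_projector.",
    }
    remapped = {}
    for key, weight in llava_ckpt.items():
        new_key = key
        for old_prefix, new_prefix in prefix_map.items():
            if key.startswith(old_prefix):
                new_key = new_prefix + key[len(old_prefix):]
                break
        remapped[new_key] = weight
    return remapped
```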
@@ -21,9 +24,176 @@
```python
from torchchat.model import ModelArgs
```
Suggested change:
```python
"""
Llava Conversion Code
"""
```
Code comment blocks to help us move things around later
```python
tokenizer_path = model_dir / "tokenizer.model"
shutil.copy(tokenizer_files[0], tokenizer_path)
```
Suggested change:
```python
"""
Text-Only Conversion Code
"""
```
```python
if batch and self.model.config.model_type == ModelType.Llava:
    context_len, next_token = next_token
else:
    context_len, next_token = T, next_token
```
Suggested change:
```diff
-    context_len, next_token = T, next_token
+    context_len = T
```
```python
    encoded = batch["tokens"]
elif self.model.config.model_type == ModelType.Llava:
    # TODO: double check the tokenizer.
    def find_subtensor(tensor, target):
```
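For reference, locating a target subsequence inside a 1-D token tensor (e.g. the image-placeholder tokens that image embeddings get spliced into) can be done with a simple window scan. A minimal sketch of the assumed behavior, returning the start index or -1:

```python
import torch

def find_subtensor_sketch(tensor: torch.Tensor, target: torch.Tensor) -> int:
    """Return the start index of the first occurrence of 1-D `target`
    inside 1-D `tensor`, or -1 if it does not occur. Illustrative only."""
    n, m = tensor.numel(), target.numel()
    for i in range(n - m + 1):
        if torch.equal(tensor[i : i + m], target):
            return i
    return -1

# e.g. find_subtensor_sketch(torch.tensor([1, 2, 3, 4]), torch.tensor([3, 4])) == 2
```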
| """Applies Rotary Position Embedding to the query and key tensors. | ||
|
|
||
| Args: | ||
| q (`torch.Tensor`): The query tensor. | ||
| k (`torch.Tensor`): The key tensor. | ||
| cos (`torch.Tensor`): The cosine part of the rotary embedding. | ||
| sin (`torch.Tensor`): The sine part of the rotary embedding. | ||
| unsqueeze_dim (`int`, *optional*, defaults to 1): | ||
| The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and | ||
| sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note | ||
| that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and | ||
| k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes | ||
| cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have | ||
| the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2. | ||
| Returns: | ||
| `tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding. |
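The body that usually accompanies this docstring is the standard Hugging Face rotary-embedding application; a sketch for orientation (not copied from this PR):

```python
import torch

def rotate_half(x: torch.Tensor) -> torch.Tensor:
    # Split the last dim in half and rotate: (x1, x2) -> (-x2, x1).
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def hf_apply_rotary_emb_sketch(q, k, cos, sin, unsqueeze_dim=1):
    # Unsqueeze so cos/sin broadcast across the heads dimension of q and k.
    cos = cos.unsqueeze(unsqueeze_dim)
    sin = sin.unsqueeze(unsqueeze_dim)
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
```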
@@ -0,0 +1,80 @@
```python
import torch
import torchvision as tv
```
```python
    padding with median RGB value to make a square, scaling, and normalizing.

    Args:
        img_address (str): Address of the local image file that will be forwarded to the model.
```
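A sketch of what this preprocessing might look like end to end; the target resolution, normalization statistics, and function name here are assumptions, not the PR's actual values:

```python
import torch
import torchvision as tv

def preprocess_image_sketch(img_address: str, target_size: int = 336) -> torch.Tensor:
    # Load the local image as a CHW uint8 tensor.
    img = tv.io.read_image(img_address, mode=tv.io.ImageReadMode.RGB)
    c, h, w = img.shape
    # Pad to a square using the per-channel median RGB value.
    side = max(h, w)
    median = img.reshape(c, -1).median(dim=1).values
    canvas = median.view(c, 1, 1).expand(c, side, side).clone()
    top, left = (side - h) // 2, (side - w) // 2
    canvas[:, top : top + h, left : left + w] = img
    # Scale to the assumed model input resolution.
    img = tv.transforms.functional.resize(canvas, [target_size, target_size], antialias=True)
    # Normalize with CLIP-style statistics (assumed, since llava uses a CLIP vision tower).
    img = img.float() / 255.0
    mean = torch.tensor([0.48145466, 0.4578275, 0.40821073]).view(3, 1, 1)
    std = torch.tensor([0.26862954, 0.26130258, 0.27577711]).view(3, 1, 1)
    return (img - mean) / std
```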
@@ -919,6 +937,58 @@ def apply_rotary_emb(x: Tensor, freqs_cis: Tensor) -> Tensor:
```python
return x_out2.type_as(x)
```
nit: Can we move apply_rotary_emb so that it sits sequentially after hf_apply_rotary_emb?
Mainly for keeping concepts together.
I'd like to keep the current structure, with all HF rotary embedding functions grouped together and all previous embedding functions in a separate section.
```python
encoded = batch["tokens"]
assert len(images) == 1, "Only one image prompt is supported for now"

# TODO: updated encoded variable for multi-modality models to include image tokens.
```
Can you explain this to me?
This PR enables llava1.5 on torchchat; it is the first multi-modality model supported in torchchat.
How to play?
You can use --prompt as the flag for text input, and --image-prompt for image input, e.g. as in the sketch below.
It can also handle input without an image prompt:
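A hypothetical invocation shape, where only the --prompt and --image-prompt flags are confirmed above; the generate entry point and the llava-1.5 model alias are assumptions:

```bash
# Hypothetical invocation: entry point and model alias are assumptions.
python3 torchchat.py generate llava-1.5 \
    --prompt "What is shown in this image?" \
    --image-prompt ./example.jpg

# Text-only input, without --image-prompt:
python3 torchchat.py generate llava-1.5 --prompt "Tell me a story."
```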