
I think I found 2 mistakes #53

@deafTim


First mistake

When we call calculate_flops with our own kwargs like this, the code does not take them into account and just uses some default kwargs instead.

flops, macs, params = calculate_flops(model=model,
                    transformer_tokenizer=tokenizer,
                    kwargs={
                       'input_ids': input_ids,
                    },
                    forward_mode="forward",
                    include_backPropagation=False,
                    compute_bp_factor=2.0,
                    print_results=True,
                    print_detailed=True,
                    output_as_string=True,
                    output_precision=2,
                    output_unit=None,
                    ignore_modules=None)

To fix this, I had to change the code in calflops/flops_counter.py, starting at line 147:

        if transformer_tokenizer:
            kwargs = generate_transformer_input(input_shape=None,
                                                model_tokenizer=transformer_tokenizer,
                                                device=device)

Here we generate new kwargs and forget about the kwargs passed as a parameter to calculate_flops. I changed this to:

        if transformer_tokenizer:
            pass
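
A less invasive alternative (just a sketch on my side, only tested for my own case) would be to fall back to the tokenizer-generated input only when the caller did not pass their own kwargs:

        # fall back to tokenizer-generated input only if the caller
        # did not pass their own kwargs
        if transformer_tokenizer and not kwargs:
            kwargs = generate_transformer_input(input_shape=None,
                                                model_tokenizer=transformer_tokenizer,
                                                device=device)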

And then instead of:

    if kwargs:
        for key, value in kwargs.items():
            kwargs[key] = value.to(device)

I wrote:

    if kwargs:
        for key, value in kwargs.items():
            try:
                kwargs[key] = value.to(device)
            except AttributeError:
                # non-tensor values (e.g. max_new_tokens) have no .to()
                pass

Now it seems to be OK.
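
An equivalent variant without the try/except would be to check the type explicitly (a sketch, assuming only tensor values need to be moved to the device; torch is already imported in flops_counter.py as far as I can see):

    if kwargs:
        for key, value in kwargs.items():
            # move only tensors to the device; plain values such as
            # max_new_tokens stay as they are
            if isinstance(value, torch.Tensor):
                kwargs[key] = value.to(device)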

Second mistake

With gpt2-small and gpt2-xl this call works fine, but with any LLaMA model it raises errors:

flops_single, macs_single, params_single = calculate_flops(model=model,
                    kwargs={
                       'input_ids': input_ids_single,
                       'attention_mask': attention_mask_single,
                       'position_ids': position_ids_single,
                       'max_new_tokens': 10,
                    },
                    forward_mode='generate',
                    transformer_tokenizer=tokenizer,
                    include_backPropagation=False,
                    compute_bp_factor=2.0,
                    print_results=True,
                    print_detailed=True,
                    output_as_string=True,
                    output_precision=2,
                    output_unit=None,
                    ignore_modules=None)

So for LLaMA models it has to be called like this, i.e. without 'attention_mask' and 'position_ids':

flops_single, macs_single, params_single = calculate_flops(model=model,
                    kwargs={
                       'input_ids': input_ids_single,
                       # 'attention_mask': attention_mask_single,
                       # 'position_ids': position_ids_single,
                       'max_new_tokens': 10,
                    },
                    forward_mode='generate',
                    transformer_tokenizer=tokenizer,
                    include_backPropagation=False,
                    compute_bp_factor=2.0,
                    print_results=True,
                    print_detailed=True,
                    output_as_string=True,
                    output_precision=2,
                    output_unit=None,
                    ignore_modules=None)
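
In case it helps to reproduce this, the inputs can be built with the standard Hugging Face tokenizer call, e.g. like this (the model id and prompt are just placeholders; the variable names match my snippets above):

    import torch
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder model id
    batch = tokenizer("Hello world", return_tensors="pt")

    input_ids_single = batch["input_ids"]
    attention_mask_single = batch["attention_mask"]
    # a typical way to build position ids; for LLaMA these last two are the
    # ones that have to be left out of kwargs
    position_ids_single = torch.arange(input_ids_single.shape[1]).unsqueeze(0)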
