Can I use tensor_parallel for inference with a GPTQ quantized model? #131

@minlik

Description

What should I do if I want to use tensor_parallel with a GPTQ quantized model (Llama-2-7b-Chat-GPTQ, for example) for inference on 2 or more GPUs?

Currently, I am using AutoGPTQ to load the quantized model, and then tp.tensor_parallel to distribute the tensors across different devices. But I am getting the following error: TypeError: cannot pickle 'module' object

Do you have any suggestions on this? Thanks.
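For reference, a minimal sketch of what I am doing. This assumes the standard AutoGPTQ and tensor_parallel entry points; the exact model id and device list here are illustrative, and this snippet needs 2+ GPUs plus the model weights to actually run:

```python
import tensor_parallel as tp
from auto_gptq import AutoGPTQForCausalLM

# Load the GPTQ-quantized checkpoint onto a single GPU first.
# Model id is illustrative (a Llama-2-7b-Chat-GPTQ checkpoint).
model = AutoGPTQForCausalLM.from_quantized(
    "TheBloke/Llama-2-7b-Chat-GPTQ",
    device="cuda:0",
)

# Then try to shard it across two GPUs -- this is the call that fails
# for me with: TypeError: cannot pickle 'module' object
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1"])
```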
