Description
I’m trying to run Depth-Anything-V2 with xFormers on my system (CPU only), and I get the following error:
```
NotImplementedError: No operator found for `memory_efficient_attention_forward` with inputs:
     query       : shape=(1, 1531, 6, 64) (torch.float32)
     key         : shape=(1, 1531, 6, 64) (torch.float32)
     value       : shape=(1, 1531, 6, 64) (torch.float32)
     attn_bias   : <class 'NoneType'>
     p           : 0.0
`fa3F@2.8.3-133-gde1584b` is not supported because:
    device=cpu (supported: {'cuda'})
    dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})
```
It seems that memory-efficient attention in xFormers requires CUDA.
My environment:
- PyTorch 2.5.7 / 2.8.3
- xFormers installed from pip
- Running on CPU only (no GPU available)
- Python 3.10, Ubuntu 22.04
Question:
Is there a way to run Depth-Anything-V2 on CPU without a GPU, or do I have to disable memory-efficient attention? How can I fix this error on CPU?
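For context on what a CPU fallback could look like: xFormers' memory-efficient attention kernels are CUDA-only, but PyTorch's built-in `torch.nn.functional.scaled_dot_product_attention` runs fine on CPU. Below is a minimal sketch of a replacement with the same tensor layout as the failing call (`(batch, seq, heads, dim)`, per the shapes in the traceback). The function name `cpu_attention` is illustrative, not part of the repo's API:

```python
import torch
import torch.nn.functional as F

def cpu_attention(query, key, value, attn_bias=None, p=0.0):
    """CPU-friendly stand-in for xformers.ops.memory_efficient_attention.

    xFormers uses (batch, seq, heads, dim); PyTorch's SDPA expects
    (batch, heads, seq, dim), so we transpose in and back out.
    """
    q = query.transpose(1, 2)
    k = key.transpose(1, 2)
    v = value.transpose(1, 2)
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_bias, dropout_p=p)
    return out.transpose(1, 2)

# Same shapes as in the error message above
q = torch.randn(1, 1531, 6, 64)
out = cpu_attention(q, q, q)
print(out.shape)  # torch.Size([1, 1531, 6, 64])
```

Wiring this into the model would mean patching the vendored attention module wherever it calls into xFormers.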
Below is my code:
```python
# If this file gives an import error, run it from /testing_model/depth_models/src/Depth-Anything-V2
import sys
import cv2
import torch

sys.path.append('/home/wasiq/testing_model/depth_models/src/Depth-Anything-V2')
from depth_anything_v2.dpt import DepthAnythingV2

model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]}
}

encoder = 'vitb'      # or 'vits', 'vitl'
dataset = 'hypersim'  # 'hypersim' for the indoor model, 'vkitti' for the outdoor model
max_depth = 20        # 20 for the indoor model, 80 for the outdoor model

# Build the model once, passing max_depth for the metric checkpoint
model = DepthAnythingV2(**{**model_configs[encoder], 'max_depth': max_depth})
model.load_state_dict(torch.load(
    f'/home/wasiq/testing_model/depth_models/src/Depth-Anything-V2/metric_depth/checkpoints/depth_anything_v2_metric_{dataset}_{encoder}.pth',
    map_location='cpu'))
model.eval()

raw_img = cv2.imread('your/image/path')
depth = model.infer_image(raw_img)  # HxW depth map in meters (numpy array)
```

Note: the original script constructed the model twice, with the second `DepthAnythingV2(**{**model_configs[encoder]})` call discarding the `max_depth` setting; only one construction is kept here.