xFormers memory-efficient attention not supported on CPU  #310

@Wasiq1123

Description

I’m trying to run Depth-Anything-V2 with xFormers on my system (CPU only), and I get the following error:

NotImplementedError: No operator found for memory_efficient_attention_forward with inputs:
query : shape=(1, 1531, 6, 64) (torch.float32)
key : shape=(1, 1531, 6, 64) (torch.float32)
value : shape=(1, 1531, 6, 64) (torch.float32)
attn_bias : <class 'NoneType'>
p : 0.0
fa3F@2.8.3-133-gde1584b is not supported because:
device=cpu (supported: {'cuda'})
dtype=torch.float32 (supported: {torch.bfloat16, torch.float16})

It seems that memory-efficient attention in xFormers requires CUDA.
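The same attention computation runs fine on CPU in float32 through plain PyTorch. A minimal sketch with the exact shapes from the error message, using `torch.nn.functional.scaled_dot_product_attention` (available since PyTorch 2.0); note that xFormers uses a (batch, seq, heads, dim) layout while torch's SDPA expects (batch, heads, seq, dim), hence the transposes:

```python
import torch
import torch.nn.functional as F

# Shapes from the error message: (batch, seq_len, heads, head_dim)
q = torch.randn(1, 1531, 6, 64)
k = torch.randn(1, 1531, 6, 64)
v = torch.randn(1, 1531, 6, 64)

# xFormers' memory_efficient_attention takes (B, M, H, K); torch's SDPA
# expects (B, H, M, K), so swap the head and sequence axes around the call.
out = F.scaled_dot_product_attention(
    q.transpose(1, 2), k.transpose(1, 2), v.transpose(1, 2)
).transpose(1, 2)

print(out.shape)  # torch.Size([1, 1531, 6, 64])
```

So the limitation is in the xFormers kernels (CUDA-only, fp16/bf16-only), not in the attention math itself.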

My environment:

  • PyTorch 2.5.7 / 2.8.3
  • xFormers installed from pip
  • Running on CPU only (no GPU available)
  • Python 3.10, Ubuntu 22.04

Question:
Is there a way to run Depth-Anything-V2 on CPU without a GPU, or do I have to disable memory-efficient attention? How can I fix this error on CPU?
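One hedged workaround, assuming the repository's vendored DINOv2 layers follow the upstream pattern (worth verifying in `depth_anything_v2/dinov2_layers/attention.py` in your checkout): upstream DINOv2 guards its xFormers import with a try/except and, in newer releases, an `XFORMERS_DISABLED` environment variable. If your copy honours the variable, setting it before the model import forces the plain-attention path:

```python
import os

# Assumption: the vendored DINOv2 attention layers check XFORMERS_DISABLED
# at import time, as upstream DINOv2 does; verify this in
# depth_anything_v2/dinov2_layers/attention.py before relying on it.
os.environ["XFORMERS_DISABLED"] = "1"

# Import the model only AFTER setting the variable, because the
# xFormers availability check runs when the module is first imported.
# from depth_anything_v2.dpt import DepthAnythingV2

print(os.environ["XFORMERS_DISABLED"])
```

If your copy of the attention module ignores the variable, uninstalling xFormers (`pip uninstall xformers`) should have the same effect, since the guarded import falls back to standard attention on ImportError.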

Below is my code:

# If this file gives an import error, run it from this directory: /testing_model/depth_models/src/Depth-Anything-V2

import cv2
import torch
import sys
sys.path.append('/home/wasiq/testing_model/depth_models/src/Depth-Anything-V2')
from depth_anything_v2.dpt import DepthAnythingV2

model_configs = {
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]}
}

encoder = 'vitb' # or 'vits', 'vitl'
dataset = 'hypersim' # 'hypersim' for indoor model, 'vkitti' for outdoor model
max_depth = 20 # 20 for indoor model, 80 for outdoor model

# Construct the model once, keeping max_depth (the second construction in the
# original script overwrote the first and silently dropped max_depth)
model = DepthAnythingV2(**{**model_configs[encoder], 'max_depth': max_depth})
model.load_state_dict(torch.load(f'/home/wasiq/testing_model/depth_models/src/Depth-Anything-V2/metric_depth/checkpoints/depth_anything_v2_metric_{dataset}_{encoder}.pth', map_location='cpu'))
model.eval()

raw_img = cv2.imread('your/image/path')
depth = model.infer_image(raw_img) # HxW depth map in meters in numpy
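Once `infer_image` returns, a quick visual sanity check is to normalize the metric map to 8-bit grayscale. A minimal sketch, using a random array as a hypothetical stand-in for the real output:

```python
import numpy as np

# Hypothetical stand-in for model.infer_image() output: an HxW float32
# depth map in meters (518x518 used here purely for illustration).
depth = np.random.rand(518, 518).astype(np.float32) * 20.0

# Scale to 0-255 for an 8-bit preview; the epsilon guards against
# division by zero on a constant-depth map.
span = float(depth.max() - depth.min())
depth_vis = ((depth - depth.min()) / max(span, 1e-6) * 255.0).astype(np.uint8)

print(depth_vis.dtype, depth_vis.shape)
```

`cv2.imwrite('depth_preview.png', depth_vis)` would then write the preview, while the raw `depth` array keeps the metric values.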
