TL;DR
When I construct an OpenVINO model consisting of just a single, moderately sized square matmul (plus the parameter and result ops) and compile it for the NPU, the compile step takes extremely long.
The problem
I construct the model as follows:
```python
import openvino as ov
from openvino.runtime import opset13 as ops

def make_model(n):
    dtype = ov.Type.f16
    size = (n, n)
    A = ops.parameter(size, dtype, name="A")
    B = ops.parameter(size, dtype, name="B")
    C = ops.matmul(A, B, transpose_a=False, transpose_b=False)
    res = ops.result(C)
    return ov.Model([res], [A, B], "matmul")
```
And I compile it as follows:
```python
core = ov.Core()
compiled_model = core.compile_model(make_model(11264), "NPU")
```
Some numbers:
- n=11264 (i.e. matmul with 11264x11264 matrices) takes 10 minutes to compile
- n=12288 takes 23 minutes
- n=13312 takes 114 minutes
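For scale: an n×n matmul does roughly 2n³ FLOPs, so going from n=11264 to n=13312 increases the actual work by only ~1.65x, while the compile time above grows 11.4x. A quick sanity check of that arithmetic (pure Python, using only the numbers from this post):

```python
# Observed compile times from above: matrix size n -> minutes.
compile_minutes = {11264: 10, 12288: 23, 13312: 114}

# An n x n by n x n matmul performs ~2 * n^3 FLOPs.
def matmul_flops(n):
    return 2 * n ** 3

# Work grows ~1.65x between the smallest and largest case...
work_ratio = matmul_flops(13312) / matmul_flops(11264)
# ...while compile time grows 11.4x over the same range.
time_ratio = compile_minutes[13312] / compile_minutes[11264]
print(f"work ratio: {work_ratio:.2f}, compile-time ratio: {time_ratio:.1f}")
```

So the compiler's cost is growing far faster than the size of the computation it is compiling.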
After compilation, inference (running the matmul on random inputs) is as fast as I'd expect. Subsequent runs with the same matrix size spend no time on compilation, presumably because they hit the NPU model cache.

I've tried raising the thread limit for the compiler, but it still runs on a single thread.
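For reference, this is the kind of configuration I mean (a sketch, not verified against the NPU plugin: `COMPILATION_NUM_THREADS` and `CACHE_DIR` are documented OpenVINO properties, but whether the NPU compiler actually honors the thread count is exactly the open question here):

```python
import openvino as ov

core = ov.Core()
# Enable the on-disk model cache so identical models are not recompiled.
core.set_property({"CACHE_DIR": "ov_cache"})
# Ask the compiler to use more threads; the NPU compile step nevertheless
# stays on a single thread for me.
compiled_model = core.compile_model(
    make_model(11264), "NPU",
    {"COMPILATION_NUM_THREADS": 8},
)
```

(Device-specific config sketch; it needs NPU hardware to actually run, so no output is shown.)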
System info