Rather than creating the `Vec` in the inner loop, you can create it once in the outer loop with `Vec::with_capacity(batch_size)`, so the inner loop never allocates, and then reuse it.
After each `execute` call, call `clear()` on the `Vec`: it drops the items but keeps the allocated memory for the next batch. I wonder how much faster that would make it.
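A minimal sketch of the suggestion, assuming the original code fills one batch at a time and passes it to `execute`. Here `execute` and `process_in_batches` are hypothetical stand-ins (the real `execute` is not shown), and the items are plain `u64`s. The buffer is allocated once, filled in the inner loop, and cleared after each batch. The function also reports whether the buffer's heap pointer stayed the same throughout, which shows that no reallocation happened.

```rust
/// Hypothetical stand-in for the real batch `execute` call.
/// It just sums the batch so the result is easy to check.
fn execute(batch: &[u64]) -> u64 {
    batch.iter().sum()
}

/// Processes `items` in batches of `batch_size`, reusing a single buffer.
/// Returns the per-batch results and whether the buffer was never reallocated.
fn process_in_batches(items: &[u64], batch_size: usize) -> (Vec<u64>, bool) {
    assert!(batch_size > 0, "batch_size must be non-zero");
    let mut results = Vec::new();

    // Allocate once, outside the loops, with room for a full batch.
    let mut batch: Vec<u64> = Vec::with_capacity(batch_size);
    let buffer_ptr = batch.as_ptr();
    let mut reused = true;

    // Outer loop: one iteration per batch.
    for chunk in items.chunks(batch_size) {
        // Inner loop: len never exceeds the reserved capacity,
        // so `push` does not allocate here.
        for &item in chunk {
            batch.push(item);
        }
        results.push(execute(&batch));

        // `clear` drops the elements but keeps the allocation.
        batch.clear();
        reused &= batch.as_ptr() == buffer_ptr;
    }

    (results, reused)
}
```

Calling `process_in_batches(&(1..=10).collect::<Vec<u64>>(), 4)` yields the batch sums `[10, 26, 19]` from a single allocation. How much this saves in practice depends on how expensive the allocation is relative to the work done in `execute`, so it is worth benchmarking both versions.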