I trained my model last year with ML.NET Model Builder (ML.NET 1.7), using a set of images annotated with VoTT. Training ran in Azure because local training wasn't possible at the time. I then generated the sample console application, which loads my ONNX model (MLModel1.zip). With a reference image, after the first load, object detection takes 0.58 s on my current machine. My computer has a GPU, but I assume it isn't used at this stage.
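For reference, this is roughly how I time a single prediction with the older ONNX model. The class and member names (MLModel1, ModelInput, ImageSource, Predict) follow the code Model Builder generates for me; yours may differ.

```csharp
using System;
using System.Diagnostics;

class Program
{
    static void Main()
    {
        var input = new MLModel1.ModelInput
        {
            ImageSource = @"C:\images\reference.jpg" // hypothetical path to my reference image
        };

        // Warm-up: the first call loads the model, so it is excluded from timing.
        MLModel1.Predict(input);

        var sw = Stopwatch.StartNew();
        var result = MLModel1.Predict(input);
        sw.Stop();
        Console.WriteLine($"Inference: {sw.Elapsed.TotalSeconds:F2} s"); // ~0.58 s here
    }
}
```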
I have now retrained the model locally, on my GPU, with the same image set and annotations. This time a MLModel1.mlnet file is produced, and the sample console app uses TorchSharp instead of ONNX. Testing after the first load:

With mlContext.GpuDeviceId = null; mlContext.FallbackToCpu = true; it takes 2.1 s.
With mlContext.GpuDeviceId = 0; mlContext.FallbackToCpu = false; it takes 0.68 s.
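This is a sketch of the two configurations I compared with the new .mlnet model. GpuDeviceId and FallbackToCpu are set on the MLContext before the model is loaded; ModelInput and ModelOutput stand in for the input/output classes Model Builder generates, and the image path is hypothetical.

```csharp
using System;
using System.Diagnostics;
using Microsoft.ML;

var mlContext = new MLContext();

// CPU configuration (2.1 s per prediction on my machine):
mlContext.GpuDeviceId = null;
mlContext.FallbackToCpu = true;

// GPU configuration (0.68 s per prediction on my machine):
// mlContext.GpuDeviceId = 0;
// mlContext.FallbackToCpu = false;

ITransformer model = mlContext.Model.Load("MLModel1.mlnet", out _);
var engine = mlContext.Model.CreatePredictionEngine<ModelInput, ModelOutput>(model);

var input = new ModelInput { ImageSource = @"C:\images\reference.jpg" };
engine.Predict(input); // warm-up: first call is excluded from timing

var sw = Stopwatch.StartNew();
engine.Predict(input);
sw.Stop();
Console.WriteLine($"Inference: {sw.Elapsed.TotalSeconds:F2} s");
```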
The older ONNX version is faster even without using the GPU.
The new TorchSharp version is slower on both CPU and GPU, and the resulting application is far larger than the ONNX version.
What am I missing or doing wrong?