I'm running ONNX Runtime inference on runpod.io, so we pay per second. I know it is theoretically possible to cut the cost much further, but I am limited in the number of experiments I can run.
I wonder if there is anyone who could help me figure out low-level NVIDIA GPU optimisation stuff?
Please leave a DM here if you feel like you have expertise and can help! https://x.com/karmedge