So I built willitrun, a small CLI that tries to answer that upfront.
It checks whether a model is likely to fit and run on a given device. When benchmark data exists for that model and device, it uses that first; otherwise it falls back to a lightweight estimate. It currently covers 482 benchmarks across 88 devices (desktop GPUs, server hardware, Apple Silicon, and NVIDIA Jetson), with HuggingFace model name resolution built in.
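To make the "benchmark first, estimate second" idea concrete, here is a rough Python sketch. This is not willitrun's actual code: the names (`BENCHMARKS`, `estimate_memory_gb`, `will_it_run`), the placeholder benchmark entry, and the weights-times-overhead memory formula are all assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical benchmark table: (model, device) -> measured throughput.
# The entry below is a made-up placeholder, not a real measurement.
BENCHMARKS = {("example-7b", "example-gpu-24gb"): 42.0}

# Approximate bytes needed per parameter for common weight formats.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


@dataclass
class Verdict:
    fits: bool
    source: str  # "benchmark" or "estimate"
    required_gb: Optional[float] = None
    throughput: Optional[float] = None


def estimate_memory_gb(params_billion: float, dtype: str = "fp16",
                       overhead: float = 1.2) -> float:
    """Rough footprint: weight size times an assumed overhead factor."""
    # 1e9 params * bytes per param / 1e9 bytes per GB simplifies to this.
    return params_billion * BYTES_PER_PARAM[dtype] * overhead


def will_it_run(model: str, device: str, params_billion: float,
                device_mem_gb: float, dtype: str = "fp16") -> Verdict:
    # Step 1: prefer measured data when this exact pair was benchmarked.
    measured = BENCHMARKS.get((model, device))
    if measured is not None:
        return Verdict(fits=True, source="benchmark", throughput=measured)
    # Step 2: otherwise fall back to the lightweight memory estimate.
    required = estimate_memory_gb(params_billion, dtype)
    return Verdict(fits=required <= device_mem_gb, source="estimate",
                   required_gb=required)
```

In this sketch, a 13B model in fp16 on an 8 GB device has no benchmark entry, so the fallback estimates roughly 31 GB and reports that it does not fit. A real estimate would also have to account for things like context length and activation memory, which the fixed overhead factor here only approximates.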
Right now the goal is not to be perfect, but to be useful enough to rule out obviously bad choices before you spend time downloading or testing models manually. It's also useful for edge devices like a Jetson Orin, because you can check expected performance without having physical access to the hardware.
Most public benchmarks focus on LLMs, but out of personal interest I tried to include other categories as well.
I would be very interested in feedback, especially around cases where the estimates are off or where benchmark coverage is missing.