We recently did a major rework of Paddler: we essentially rewrote llama.cpp's llama-server to make it distributed and scalable. It also supports live model swapping, custom chat templates, and other useful features.
Show HN: Paddler – open-source LLMOps platform for hosting AI in your own infra