vLLM-mlx – 65 tok/s LLM inference on Mac with tool calling and prompt caching | Heykuki News