Apple predicted the rise of local LLMs, hence the M2 Ultra

3 years ago

When the Mac Studio with the M2 Ultra was announced, my first question was: why cram so much power into a device when no software needs it? I'm talking about the 192GB of RAM. When was the last time you needed that?

Apple was (and is) not a player in desktop gaming, and it shows: the M series is amazing in terms of power use, but not comparable to dedicated NVIDIA GPUs. So, again, why 192GB of RAM?

It wasn't until recently, when I read [0], that I realized Apple might very well have predicted the rise of local LLMs, and most likely has its own in-house LLM in the works. In [0], Georgi Gerganov (the GG in GGML and GGUF) writes that "The M2 Ultra is the absolute best personal LLM inference node that you can buy today."

And remember: llama.cpp is not even taking full advantage of the modern APIs Apple has announced. Still, it achieves amazing inference speeds. Imagine the day when llama.cpp goes all Apple and integrates those APIs...

Apple knew well that local LLMs are the future. They predicted that running a small model (up to 34B parameters) should not require much processing power, so the M series need not be comparable to NVIDIA RTX GPUs, and this translates directly into much lower power consumption. The bottleneck Apple foresaw was the amount of graphics memory these models would need, and on the M series the GPU shares the same unified memory as the CPU. So it's no wonder that you see high RAM and low GPU power in the M series.
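
To put rough numbers on the memory argument, here is a back-of-the-envelope sketch in Python. It only counts the weights (parameters times bits per weight), which is my own simplification: real quantized formats carry some extra overhead per block, and the KV cache and activations need memory on top of this. The function name and the list of model sizes are just for illustration.

```python
def weights_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB.

    Assumption: every weight takes exactly `bits_per_weight` bits.
    Real quantization formats add per-block overhead, and the KV cache
    and activations are not counted here.
    """
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


# Illustrative model sizes (billions of parameters) and precisions.
for params in (7, 13, 34, 70):
    for label, bits in (("fp16", 16), ("8-bit", 8), ("4-bit", 4)):
        size = weights_gib(params, bits)
        print(f"{params:>3}B {label:>5}: {size:6.1f} GiB")
```

By this estimate a 34B model is roughly 63 GiB at fp16 and roughly 16 GiB at 4 bits, so even the 4-bit version leaves little headroom on a 24GB consumer GPU, while the fp16 version fits comfortably in 192GB of unified memory.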

[0]: https://github.com/ggerganov/llama.cpp/discussions/3026
