- API solutions: I tried https://openrouter.ai to get access to llama-2-70b-chat models but it was so slow (high latency) that I gave up.
- On my MacBook Pro with an M1 Pro chip, I can only run models up to 34B, and even then the inference speed is not great.
- The Mac Studio with M2 Ultra costs around $7000 after tax.
> It's not upgradable but I think it's quite future-proof already with 192GB unified memory, no?
> I won't be able to run games on it but I'm not much of a gamer anyway.
> It weighs almost 8 pounds, meaning that I can carry it to work if I want to.
> It's energy-efficient and doesn't make me hate electricity...
> It mostly limits me to llama.cpp, since there's no CUDA support (so no exl2 or GPTQ).
> I might want to finetune/train models in the future. Is it possible to do LoRA/QLoRA on a Mac?
- On the other hand, a PC:
> Is upgradable, but the question is: at what cost? If I want to add more VRAM I'll have to buy GPUs that cost between $1,000 and $2,000.
> Draws a lot of power, esp. with multiple GPUs, so I'd have to keep it at work and SSH into it.
> The case will be heavy and I can't just carry it to places.
> I get to run games on it if I want.
> But even with 2x 4090s I only get 48GB of VRAM, way less than the 192GB on the Mac.
> I get full CUDA support for ML and finetuning.
> More hassle to set up, configure, and maintain (esp. if I use Linux) compared to a Mac, which works OOTB.
- I've also tried cloud GPUs but the costs quickly add up. A100s are basically never available, and the rest are so-so. Since I can't let the VM run 24/7, I have to configure the VM every single time I want to run something on GPU, which takes around 30-40 minutes (including downloading the 70B models).
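For a rough sense of what fits where, here's a back-of-the-envelope sketch in Python. It only counts the weights (parameters times bits per weight) and ignores the KV cache, context length, and runtime overhead, so real usage will be higher. The function names and the 16/8/4-bit choices are just my own illustrative picks, not anything specific to llama.cpp or another tool.

```python
# Rough weights-only memory estimate. Assumption: 1 GB = 1e9 bytes,
# and no allowance for KV cache, activations, or framework overhead.

def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GB."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, budget_gb: float) -> bool:
    """True if the weights alone fit in the given memory budget."""
    return weight_gb(params_billion, bits_per_weight) <= budget_gb


# Memory budgets taken from the numbers in this post.
BUDGETS_GB = {"2x 4090": 48, "Mac Studio (M2 Ultra)": 192}

if __name__ == "__main__":
    for bits in (16, 8, 4):
        size = weight_gb(70, bits)
        verdicts = ", ".join(
            f"{name}: {'fits' if fits(70, bits, gb) else 'too big'}"
            for name, gb in BUDGETS_GB.items()
        )
        print(f"70B @ {bits}-bit ~ {size:.0f} GB -> {verdicts}")
```

By this estimate a 70B model needs about 140 GB at 16-bit, 70 GB at 8-bit, and 35 GB at 4-bit for the weights alone, so 2x 4090s would only handle it at around 4-bit, while the 192GB Mac has room even for 16-bit weights on paper.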
I appreciate any comments you have about what I should do... Thanks!