* CCX13: Dedicated vCPU, 2 vCPUs, 8 GB RAM
* CX32: Shared vCPU, 4 vCPUs, 8 GB RAM
Now there are multiple options for deploying and serving LLMs:
* lmdeploy
* text-generation-inference
* TensorRT-LLM
* vllm
New frameworks for this keep appearing, and I am a bit lost. Which option would you suggest for deploying the model listed above on hardware without a GPU?
[1] https://huggingface.co/MoritzLaurer/roberta-large-zeroshot-v2.0-c
[2] https://www.hetzner.com/cloud/