Other notable features include many more downloadable models; support for App Intents (Siri, Apple Shortcuts); on-device grammar correction, summarization, etc. via macOS Services; and an iOS version (universal app), also with many smaller downloadable models and App Intents support. There's a small community of users building and sharing LLM-based shortcuts on the app's Discord.
Last week, I also shipped support for the bilingual Yi-34B Chat model, which consumes ~18GB of RAM. iOS users and users with low-memory Macs can download the related Yi-6B Chat model instead.
Unlike most popular offline LLM apps out there, this app uses mlc-llm for inference and not llama.cpp. Also, all models in the app are quantized with OmniQuant[4] rather than RTN (round-to-nearest) quantization.
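For anyone unfamiliar with the RTN baseline mentioned above, here is a rough sketch of what round-to-nearest quantization does to one group of weights. This is not the app's code or mlc-llm's implementation; the function names, the 4-bit width, and the asymmetric min/max scheme are assumptions chosen for illustration. As I understand the paper, OmniQuant differs mainly in learning the clipping range (and an equivalent transformation) instead of taking min/max as-is.

```python
def rtn_quantize(weights, bits=4):
    """Asymmetric round-to-nearest quantization of one weight group.

    Hypothetical helper for illustration: maps each float to an integer
    code in [0, 2**bits - 1] using the group's own min/max as the range.
    """
    qmax = (1 << bits) - 1
    lo, hi = min(weights), max(weights)
    # Guard against a constant group, where hi == lo.
    scale = (hi - lo) / qmax if hi > lo else 1.0
    codes = [min(qmax, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize(codes, scale, lo):
    """Map integer codes back to approximate float weights."""
    return [c * scale + lo for c in codes]


weights = [-1.0, -0.5, 0.0, 0.5, 1.0]
codes, scale, lo = rtn_quantize(weights, bits=4)
recon = dequantize(codes, scale, lo)
```

Each reconstructed weight lands within half a quantization step (`scale / 2`) of the original. With plain min/max ranges, a single outlier weight stretches `scale` for the whole group, which is the weakness that learned-clipping methods try to address.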
[1]: https://privatellm.app/
[2]: https://apps.apple.com/us/app/private-llm-local-ai-chatbot/id6448106860
[3]: https://www.youtube.com/watch?v=4AE8yXIWSAA
[4]: https://arxiv.org/abs/2308.13137