You may have read the news about the [Raspberry Pi AI Kit](https://www.raspberrypi.com/news/raspberry-pi-ai-kit-available-now-at-70/) (Hailo-8L) with 13 TOPS. There is also a [blog post](https://hacks.mozilla.org/2024/05/experimenting-with-local-alt-text-generation-in-firefox-nightly/) from Mozilla saying that the new Firefox will have local AI ("182M parameters model using a Distilled version of GPT-2 alongside a Vision Transformer (ViT) image encoder."). JetBrains also stated somewhere that they have a 100M-parameter model in their IDEs (?).
I read here that Phi-3 seems to perform really well despite its small parameter count. I understand that VRAM or RAM is often the bottleneck for generative AI such as local LLMs. So I was wondering: now or in the near future, would it be possible for edge devices, such as the RPi AI Kit, to run relatively small language models, such as the distilled GPT-2, with vision and/or audio functionality?
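To put rough numbers on the memory side, here is a minimal back-of-envelope sketch. It assumes the footprint is dominated by the weights (parameters times bytes per parameter) and ignores activations, KV cache, and runtime overhead, so real usage will be higher. The precision table and the function name `weight_memory_mb` are my own illustration, not taken from any of the linked posts.

```python
# Rough weights-only memory estimate: parameters x bytes per parameter.
# Assumption: activations, KV cache, and runtime overhead are ignored,
# so actual memory use on a device will be higher than this.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_mb(n_params: int, precision: str) -> float:
    """Return approximate weight storage in megabytes (1 MB = 1e6 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e6


if __name__ == "__main__":
    models = [
        ("182M model (Mozilla post)", 182_000_000),
        ("100M model", 100_000_000),
    ]
    for name, n_params in models:
        for precision in BYTES_PER_PARAM:
            mb = weight_memory_mb(n_params, precision)
            print(f"{name} @ {precision}: ~{mb:.0f} MB")
```

By this estimate the weights of a 182M-parameter model take a few hundred MB at most, which is why models in this size range look plausible for small devices at all.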
Also, the Microsoft AI PC with the Snapdragon X Elite seems to have 40-45 TOPS, while the Hailo-8L only has 13 TOPS. From your experience, roughly how many TOPS are necessary and/or sufficient to run local AI, whether it's vision, audio, or NLP, at a good-enough speed?
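For context on why I am unsure how to read the TOPS figures, here is a naive compute-only upper bound. It assumes roughly 2 operations per parameter per generated token (a common rule of thumb for a transformer forward pass) and that the accelerator sustains its full advertised TOPS, which real workloads rarely do. It ignores memory bandwidth, the image encoder, and precision differences entirely, so treat it as a ceiling, not a prediction. The function name is hypothetical.

```python
def max_tokens_per_second(tops: float, n_params: int,
                          ops_per_param: float = 2.0) -> float:
    """Naive compute-bound ceiling on decoder tokens per second.

    Assumptions (not measurements): ~2 ops per parameter per token,
    100% utilisation of the advertised TOPS, no memory-bandwidth limit.
    """
    ops_per_token = ops_per_param * n_params
    return tops * 1e12 / ops_per_token


if __name__ == "__main__":
    n_params = 182_000_000  # size quoted in the Mozilla post
    for label, tops in [("Hailo-8L", 13), ("Snapdragon X Elite", 40)]:
        ceiling = max_tokens_per_second(tops, n_params)
        print(f"{label}: compute ceiling ~{ceiling:,.0f} tokens/s")
```

The ceilings this gives are far above anything a person could read, which makes me suspect that raw TOPS is not the real limit for models this small, and that memory or software support matters more.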
Many thanks!