Ask HN: Do we need 100B+ parameters in a large language model?

3 years ago

Models like Cerebras-GPT and Databricks's Dolly perform reasonably well on many instruction-based tasks while being significantly smaller than GPT-3, challenging the notion that bigger is always better!

From my personal experience, the quality of a model depends a lot on the fine-tuning data, not just on its sheer size. If you choose your fine-tuning data carefully, you can get a smaller model to outperform the state-of-the-art GPT-X on your use case. The future of LLMs might look more open-source than we imagined three months ago!

I'd love to hear everyone's opinions on how the future of LLMs will evolve. Will a few players (e.g. OpenAI) crack AGI and conquer the whole world, or will we see lots of smaller open-source models that ML engineers fine-tune for their own use cases?

P.S. I'm kinda betting on the latter and building UpTrain (https://github.com/uptrain-ai/uptrain), an open-source project that helps you collect high-quality fine-tuning datasets.