The resulting InstructGPT models are much better at following instructions than GPT-3. They also make up facts less often, and show small decreases in toxic output generation. Our labelers prefer outputs from our 1.3B InstructGPT model over outputs from a 175B GPT-3 model, despite having more than 100x fewer parameters. At the same time, we show that we don’t have to compromise on GPT-3’s capabilities, as measured by our model’s performance on academic NLP evaluations.

https://openai.com/blog/instruction-following/
It seems like a 1.3B-parameter model wouldn't require massive resources.
I understand training data could be hard to come by, but is training one ourselves within the realm of possibility?
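For a rough sense of scale, here's a back-of-envelope memory estimate (my own sketch, not from the blog post). It assumes the common rule of thumb of about 2 bytes per parameter for fp16 inference weights and about 16 bytes per parameter for mixed-precision training with Adam. It ignores activations, which depend on batch size, sequence length, and whether you use activation checkpointing, so treat the numbers as a lower bound.

```python
# Back-of-envelope memory estimate for a 1.3B-parameter model.
# Assumption: activations are ignored (they depend on batch size,
# sequence length, and checkpointing), so these are lower bounds.
PARAMS = 1.3e9
GB = 1e9  # decimal gigabytes


def gigabytes(bytes_per_param: float, params: float = PARAMS) -> float:
    """Memory in decimal GB for a given per-parameter cost."""
    return params * bytes_per_param / GB


# Inference: fp16/bf16 weights only (2 bytes per parameter).
inference_fp16 = gigabytes(2)

# Mixed-precision Adam training (rule of thumb, assumed):
# fp16 weights (2) + fp16 grads (2) + fp32 master weights (4)
# + Adam momentum (4) + Adam variance (4) = 16 bytes per parameter.
training_adam = gigabytes(2 + 2 + 4 + 4 + 4)

print(f"inference (fp16 weights): {inference_fp16:.1f} GB")
print(f"training (mixed-precision Adam states): {training_adam:.1f} GB")
```

That works out to roughly 2.6 GB just to hold the weights for inference and roughly 20.8 GB of model and optimizer state for training, before activations. So fine-tuning at this size looks feasible on a single large GPU or a small multi-GPU setup, which supports the intuition that compute is less of a bottleneck than the labeled data.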