Ask HN: If you finetune GPT-3, are you cheating?

5 years ago

At its annual Build conference, Microsoft announced its first GPT-3-powered application, a code generator for the Power Fx data query language.

According to the Microsoft Blog, “For instance, the new AI-powered features will allow an employee building an e-commerce app to describe a programming goal using conversational language like... A fine-tuned GPT-3 model then offers choices for transforming the command into a Microsoft Power Fx formula, the open source programming language of the Power Platform.”

The question is: wasn't GPT-3 supposed to be an all-purpose language model that can perform multiple tasks without being fine-tuned? From the GPT-3 paper's abstract: “Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches.”
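
To make the few-shot vs. fine-tuning distinction concrete, here is a minimal sketch. In the few-shot case the task examples are packed into the prompt and the model's weights are untouched; in the fine-tuning case the same pairs would become training records. The example requests and formulas are made up for illustration and may not be valid Power Fx, and the prompt/completion record layout is an assumption, not Microsoft's or OpenAI's actual format.

```python
import json

# Hypothetical (request, formula) pairs, made up for illustration only.
EXAMPLES = [
    ("show products whose name starts with kids",
     'Filter(Products, StartsWith(Name, "kids"))'),
    ("count the orders",
     "CountRows(Orders)"),
]


def few_shot_prompt(examples, query):
    """Few-shot: the examples travel inside the prompt; no weights change."""
    lines = []
    for request, formula in examples:
        lines.append(f"Request: {request}")
        lines.append(f"Formula: {formula}")
        lines.append("")
    # The new request goes last, with the formula left for the model to fill in.
    lines.append(f"Request: {query}")
    lines.append("Formula:")
    return "\n".join(lines)


def fine_tuning_records(examples):
    """Fine-tuning: the same pairs become training records (assumed JSONL layout)."""
    return [
        json.dumps({"prompt": f"Request: {request}\nFormula:",
                    "completion": " " + formula})
        for request, formula in examples
    ]


prompt = few_shot_prompt(EXAMPLES, "show orders over 100 dollars")
records = fine_tuning_records(EXAMPLES)
print(prompt)
```

In the few-shot version the examples are sent again with every request, while in the fine-tuned version they are used once, at training time, and the prompt at inference only needs the new request.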

There isn't much detail on what Microsoft means by "fine-tuned GPT-3 model," but a few scenarios come to mind:

1- GPT-3 can't perform code generation on par with models that have been specially trained for this purpose.

2- The vanilla GPT-3 can perform quality code generation, but it is computationally expensive, consumes too many resources, and can't turn a profit.

This led me to the following questions:

1- Can the fine-tuned GPT-3 model perform other language-related tasks (text generation, question answering, etc.)?

2- If not, do you really need GPT-3 for this in the first place? I mean, there are several other Transformer-based architectures that can perform specialized tasks.

3- And finally, if you finetune GPT-3, are you cheating in the sense that you're defying the original goal of the model?

You can find my full analysis here:

https://bdtechtalks.com/2021/05/31/microsoft-gpt-3-and-the-future-of-openai/

Microsoft Blog: https://blogs.microsoft.com/ai/from-conversation-to-code-microsoft-introduces-its-first-product-features-powered-by-gpt-3/