Ask HN: Kickstarter for Data Science?


7 years ago

With OpenAI's decision not to release the fully trained GPT-2 NLP model, some ambitious people have attempted to 1) build a similar dataset (40 GB of text), and 2) find funds to train the model (~$40,000).

1) https://github.com/jcpeterson/openwebtext

2) https://www.reddit.com/r/MachineLearning/comments/av7s38/d_i_won_20000_maybe_120000_of_google_cloud_credit/

Over the past year I've noticed that the best ML work requires a scale that is impractical for individuals working without corporate funding. If this trend continues, it seems to me that innovation will be stifled. OpenAI wasn't wrong to be cautious about releasing an OP NLP model, because innovation is intrinsically amoral. A GPT-2 model fine-tuned for spear-phishing attacks could cause havoc. However, a GPT-2 model fine-tuned to identify spear-phishing attacks could be just as useful in the other direction. But the question of distributing power versus centralizing power is settled among Hackers: distributing power (i.e., solutions) is always better, because "no problem should be solved twice" and "drudgery is evil".

Which is why I offer this question to Hacker News.

Would a "Kickstarter for Data Science" work? What would it have to look like to work? By "work", I mean that models like GPT-2 would be efficiently funded, trained, and released to the public.
