Ask HN: How do I stop companies from scraping my site?

44 points

3 years ago

This article says that one of the datasets used for ChatGPT was obtained by scraping all links posted on Reddit with more than 2 upvotes: https://www.searchenginejournal.com/how-to-block-chatgpt-fro...

I don't want big companies to scrape my content and then sell it on their platform.

The novelty of LLM output may be an open question, but the input is just someone else's stuff. I assumed that default copyright protects against this kind of bullshittery, i.e. that a work cannot be used, adapted, or copied without the creator's permission. (My only guess is that it was allowed to happen because this is the first time someone has stolen IP in this particular manner and on this scale?) But now that we know it's a thing, how can we maintain ownership of our inputs, both legally and from an engineering standpoint?
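On the engineering side, one measure is to ask known crawlers to stay away via robots.txt and to refuse requests whose User-Agent header identifies them. Below is a minimal Python sketch of both, assuming a WSGI-based site. The blocklist (GPTBot, ChatGPT-User, CCBot) and the names `BLOCKED_AGENT_TOKENS`, `is_blocked_crawler`, `robots_txt`, and `BlockCrawlersMiddleware` are illustrative assumptions, not an official or exhaustive list and not any library's API. Keep in mind that robots.txt is only a request that well-behaved crawlers choose to honor, and User-Agent headers are trivially spoofed, so neither stops a determined scraper.

```python
from typing import Callable, Iterable

# Hypothetical blocklist: User-Agent substrings of some publicly documented
# AI/dataset crawlers. Not exhaustive; adjust it for your own site.
BLOCKED_AGENT_TOKENS = ("GPTBot", "ChatGPT-User", "CCBot")


def is_blocked_crawler(user_agent: str) -> bool:
    """Return True if the User-Agent header contains a blocklisted token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in BLOCKED_AGENT_TOKENS)


def robots_txt() -> str:
    """Build robots.txt rules asking each blocklisted crawler to skip the whole site."""
    lines = []
    for token in BLOCKED_AGENT_TOKENS:
        lines.append(f"User-agent: {token}")
        lines.append("Disallow: /")
        lines.append("")  # blank line separates rule groups
    return "\n".join(lines)


class BlockCrawlersMiddleware:
    """Hypothetical WSGI middleware that answers blocklisted crawlers with 403 Forbidden."""

    def __init__(self, app: Callable) -> None:
        self.app = app

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        if is_blocked_crawler(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application unchanged.
        return self.app(environ, start_response)
```

To use it, wrap an existing WSGI app (`app = BlockCrawlersMiddleware(app)`) and serve the string returned by `robots_txt()` at `/robots.txt`. The robots.txt rules cover crawlers that respect them; the middleware is a second layer for those that ignore robots.txt but still announce themselves honestly.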

56 comments