My idea is a web application for displaying posts and their comments in a generic way, built from two parts: a public API that matches the existing Reddit one, so that existing third-party clients can switch to it trivially; and an underlying broker that redirects incoming requests either to the internal DB or to user-specified data sources. The application itself would not store or scrape any Reddit content, to avoid legal action. Instead, users specify the data sources they wish to follow, and those sources provide the requested data. All data sources implement a common API protocol.
So ideally the legal situation would be similar to that of torrent clients: it is the user who might violate Reddit's TOS, not this website.
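To make the broker idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `DataSource` protocol, the `Post` shape, and the class names are made up, and a real version would have to mirror Reddit's actual API surface rather than a single `get_post` call.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass
class Post:
    id: str
    title: str
    source: str  # name of the data source that produced this post


class DataSource(Protocol):
    """Hypothetical common protocol every data source would implement."""

    name: str

    def get_post(self, post_id: str) -> Post | None: ...


class InternalDB:
    """Stand-in for the site's own database of natively created posts."""

    name = "internal"

    def __init__(self) -> None:
        self._posts: dict[str, Post] = {}

    def add(self, post_id: str, title: str) -> None:
        self._posts[post_id] = Post(post_id, title, self.name)

    def get_post(self, post_id: str) -> Post | None:
        return self._posts.get(post_id)


class StaticSource:
    """Stand-in for an external, independently hosted source such as a scraper."""

    def __init__(self, name: str, posts: dict[str, str]) -> None:
        self.name = name
        self._posts = posts

    def get_post(self, post_id: str) -> Post | None:
        title = self._posts.get(post_id)
        return Post(post_id, title, self.name) if title is not None else None


class Broker:
    def __init__(self, internal: DataSource) -> None:
        self._internal = internal
        self._user_sources: dict[str, list[DataSource]] = {}

    def follow(self, user: str, source: DataSource) -> None:
        """Register a data source that this particular user opted into."""
        self._user_sources.setdefault(user, []).append(source)

    def get_post(self, user: str, post_id: str) -> Post | None:
        # Try internal content first, then only the sources this user follows.
        for source in [self._internal, *self._user_sources.get(user, [])]:
            post = source.get_post(post_id)
            if post is not None:
                return post
        return None
```

The point of the per-user source list is that the site never decides on its own to fetch from a scraper; a request only reaches a source the requesting user has explicitly followed.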
A possible data source would be a Reddit scraper: it could use the existing APIs while they remain available and later switch to scraping the HTML content. Of course this would be a constant cat-and-mouse game, but I think it is absolutely feasible to make it work.
Scrapers are hosted completely independently of the main website (and can also be self-hosted). They do their own caching; at most, the main app stores some metadata about which scraper has which data readily available (would a Bloom filter be a good fit for this?).
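On the Bloom filter question, a rough sketch of what that metadata could look like is below. The `ScraperIndex` class and the sizes are hypothetical choices of mine. A Bloom filter can return false positives but never false negatives, so the main app would still have to handle a scraper answering "I don't have that after all". A plain Bloom filter also cannot delete entries, which matters if scrapers evict items from their caches.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    def __init__(self, size_bits: int = 1024, num_hashes: int = 3) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self._bits = 0  # a Python int used as a bit array

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self._bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all((self._bits >> pos) & 1 for pos in self._positions(key))


class ScraperIndex:
    """Hypothetical per-scraper metadata kept by the main app."""

    def __init__(self) -> None:
        self._filters: dict[str, BloomFilter] = {}

    def record(self, scraper: str, post_id: str) -> None:
        self._filters.setdefault(scraper, BloomFilter()).add(post_id)

    def candidates(self, post_id: str) -> list[str]:
        """Scrapers that may have the post cached (may include false positives)."""
        return [name for name, f in self._filters.items() if f.might_contain(post_id)]
```

So it seems usable as a cheap "who probably has this" hint, as long as a wrong hint only costs one wasted request.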
## Growth
While none of that is original so far, what came to mind is the EU's GDPR and the data portability it mandates. Reddit has to offer (and does offer) an option to download all data associated with a given user, who I believe is the owner of said data. So users should be free to upload their own data to a new service. That export could be parsed and hosted on the main site, since at that point this is completely legal (?).
Of course new content can also be posted directly to this new social media site, either by creating a new post or by commenting on existing, scraped Reddit data. The two sources will be seamlessly merged.
## User, content management
User accounts will be handled independently of existing Reddit users. Reddit accounts can be claimed and linked to accounts on the new site through some validation (for example, messaging a specific Reddit bot), and that link can be public or hidden.
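One way the bot-based validation could work, as a sketch only: the site shows the user a short code, the user sends it to the bot from their Reddit account, and the bot checks that the code matches the sender. The function names and the HMAC-based scheme are my assumptions, and how the bot actually receives messages is left out entirely.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical per-deployment secret; a real deployment would load it from config.
SERVER_SECRET = secrets.token_bytes(32)


def issue_challenge(site_user: str, reddit_user: str, secret: bytes = SERVER_SECRET) -> str:
    """Code shown to the site user, bound to both account names."""
    # JSON encoding keeps the two names unambiguous; Reddit names are case-insensitive.
    msg = json.dumps([site_user, reddit_user.lower()]).encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()[:16]


def verify_message(site_user: str, sender: str, body: str, secret: bytes = SERVER_SECRET) -> bool:
    """Called by the bot with the Reddit account that actually sent the message."""
    expected = issue_challenge(site_user, sender, secret)
    # Constant-time comparison on bytes, so non-ASCII input cannot raise.
    return hmac.compare_digest(expected.encode(), body.strip().encode())
```

Because the code is derived from both names, a code issued for one Reddit account is useless when sent from another, and nothing needs to be stored until the link is confirmed.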
Reddit's moderation leaves a lot to be desired, but anyone who has some experience moderating forums knows that unmoderated content quickly loses its value. Since in the initial phase we simply return Reddit's content, we serve data that is already censored. For content generated on this new site I think moderation is essential, but the editing history should be public so that mod power trips can be dealt with. The one exception where previous data would be removed is severely illegal content (e.g. CP), but the fact of removal would still be public. (Ideally I think mods should only be able to mark such content, which would temporarily block it from being served, and the content would be removed only after admin approval.)
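The moderation flow described above could be modelled roughly like this. It is a sketch under my own assumptions (the `State` names, the `LogEntry` shape, in-memory storage), not a worked-out design: edits keep the previous body in a public log, a mod flag only blocks serving, and an admin approval purges the bodies while leaving the log entries visible.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    VISIBLE = "visible"
    FLAGGED = "flagged"  # blocked from serving, awaiting admin review
    REMOVED = "removed"  # bodies purged, the removal itself stays public


@dataclass
class LogEntry:
    actor: str
    action: str
    previous_body: str | None = None


@dataclass
class Content:
    body: str | None
    state: State = State.VISIBLE
    history: list[LogEntry] = field(default_factory=list)  # public, append-only

    def edit(self, actor: str, new_body: str) -> None:
        if self.state is not State.VISIBLE:
            raise ValueError("only visible content can be edited")
        self.history.append(LogEntry(actor, "edit", self.body))
        self.body = new_body

    def flag(self, moderator: str) -> None:
        # Mods can only mark content; it stops being served but is not deleted.
        self.state = State.FLAGGED
        self.history.append(LogEntry(moderator, "flag"))

    def approve_removal(self, admin: str) -> None:
        if self.state is not State.FLAGGED:
            raise ValueError("only flagged content can be removed")
        # Purge the current and all previous bodies, but keep who did what.
        self.body = None
        for entry in self.history:
            entry.previous_body = None
        self.state = State.REMOVED
        self.history.append(LogEntry(admin, "remove"))

    def served_body(self) -> str | None:
        return self.body if self.state is State.VISIBLE else None
```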
## Feasibility
Of course, this whole thing is just a naive train of thought on my end, and the reason I'm posting it here is to get criticism. But if Reddit really does practically kill off its third-party clients, and if it dares touch its NSFW content, then I really do believe a critical mass could be achieved, especially since this approach could reuse those third-party clients as is.
## Open questions
Legality of operation: this is probably the loosest end of the idea. Of particular note are the scrapers: is fetching data on behalf of the user on the server side legal? Is there plausible deniability in the site only serving data from a given data source, leaving it to the user to confirm its legality? If that is not an option, the architecture could be altered so that the site returns scraper URLs to the client, which then requests the actual data from them directly, but that would require changes in the third-party clients.
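For that fallback architecture, the difference on the server side would be small. A sketch, where the response shape and the `/posts/<id>` path on the scraper are both made up for illustration:

```python
from urllib.parse import quote


def resolve(post_id: str, internal_posts: dict[str, dict], user_source_urls: list[str]) -> dict:
    """Serve native content directly; otherwise only point the client at its own sources."""
    if post_id in internal_posts:
        return {"kind": "data", "post": internal_posts[post_id]}
    # The site never fetches from the scrapers itself in this variant.
    safe_id = quote(post_id, safe="")
    return {
        "kind": "redirect",
        "urls": [f"{base.rstrip('/')}/posts/{safe_id}" for base in user_source_urls],
    }
```

The client would then have to understand the `redirect` answer and fetch from one of the URLs itself, which is exactly the change existing third-party clients would need.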