- I think it'd be cool to download the full Reddit dataset and apply NLP techniques to it. Is anybody out there interested? Maybe someone with multiple TB of disk space, a fast internet connection, willingness to set up an IPython notebook server, experience with RNNs, or just general interest and ideas?
- I've been wondering whether you can efficiently find pairs of related nodes in Wikidata and automatically extract insights from them (like "hmmm, this politician comes from the same city as this real estate tycoon, and he entered office the same year that a certain controversial mall was built, soon after a forest fire burned the same area"). Who would find that interesting?
- I'd like to take part in the Kaggle competition to detect diabetic retinopathy. Is there a doctor dipping into data analysis who'd like to join?
- If I had a dump of the positions of all planes, sampled every minute over the last month, would somebody find that interesting and want to propose an analysis? For example, detecting erratic flight patterns, or flights over potentially forbidden areas?
- I find HFT (high-frequency trading) interesting, so I listened to the raw transaction sockets of multiple Bitcoin exchanges and ran some basic analysis. Would somebody like to extend it?
- Clickbait is both eerie and exciting. How such headlines play on our basic instincts is worth analyzing. Inspired by [1], I dumped thousands of clickbait ads, and I'm trying to find the patterns they follow.
If there isn't one, I guess I'll build it myself as another side project :-).
[1] https://www.theawl.com/2015/06/a-complete-taxonomy-of-internet-chum