Well-organized data can be very valuable. When the data are so valuable that the companies providing them have revenues in the billions [1], distribution costs are negligible by comparison, and there is plenty of competition in such fields. However, there must be many less valuable data sets whose sale would cover the cost of gathering and organizing them, but whose distribution costs (hosting, coding a payment-processing backend, keeping track of legal issues, etc.) make selling them not worthwhile.
If so, would it make sense to create a website that lets people scrape their own data streams (for example, tagged and organized texts of political speeches from around the world) and focus on the quality of the data, while the site takes care of hosting and distribution? At the very least, it would be a searchable repository of organized data sets. Optimistically, it could be the search engine of the semantic web... ;)
What do you think, ladies and gents?
[1] The Thomson Corporation had revenues of $6.6 billion in 2006: http://en.wikipedia.org/wiki/The_Thomson_Corporation