For one, we have robots.txt: http://en.wikipedia.org/wiki/Robot.txt
Is that the only thing that prevents (or prohibits) a robot/spider from scraping a site?
Can there be copyrighted material that is not allowed to be scraped?
I ask because I want to search/scrape a few hundred pages with similar content and present the results in a way that leads all the way to buying the product. That is, not just a page that has the information, as you get when you google. I want to present users with the options (maybe in ranked order) so that pretty much one click is needed to choose the product to buy. Or one click to search and one to choose, which takes you to the product site, where you obviously have to do some more confirmation, but you get the point...
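
For context, here is a minimal sketch of how a scraper could consult robots.txt before fetching a page, using Python's standard urllib.robotparser module. The robots.txt content, the bot name "MyShopBot" and the shop.example URLs are all made up for illustration. Note that this check is something the scraper chooses to do; it says nothing about copyright or a site's terms of use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs offline.
# A real scraper would instead load it from the site, e.g. with
# RobotFileParser.set_url("http://shop.example/robots.txt") and read().
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Crawl-delay: 5
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(EXAMPLE_ROBOTS_TXT)

# "MyShopBot" is an assumed user-agent name for the scraper.
allowed = parser.can_fetch("MyShopBot", "http://shop.example/products/42")
blocked = parser.can_fetch("MyShopBot", "http://shop.example/checkout/cart")
delay = parser.crawl_delay("MyShopBot")  # seconds requested between hits

print(allowed, blocked, delay)
```

With these assumed rules, the product page is permitted and the checkout path is not, so a scraper honoring the file would stop at the product page and link out to the site for the actual purchase.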