1) It has a queue of domains that I have pre-processed. For initial purposes I've restricted it to pages that I think are ecommerce, based on $ signs, add to cart/basket type links, etc.
2) There is a visual tool that I then use to select certain parts of the page, e.g. price, product, image, etc. I save these out as XPaths.
3) Once I have done one URL, I send a crawler to that domain to find other pages that fit the profile of an ecommerce page, and try to extract the data using the same mapping from step 2 above.
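To make the steps above more concrete, here is a rough sketch of the idea (not my actual code). It uses only the Python standard library, so it assumes well-formed markup; real-world HTML would need a forgiving parser such as lxml. The `MAPPING` dictionary, the class names in the XPaths, the regexes, and the sample page are all made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mapping saved from the visual tool: field -> (XPath, attribute).
# An attribute of None means "take the element's text".
MAPPING = {
    "product": (".//h1[@class='title']", None),
    "price": (".//span[@class='price']", None),
    "image": (".//img[@class='main']", "src"),
}

# Assumed heuristics for the "ecommerce profile" check in steps 1 and 3.
PRICE_RE = re.compile(r"\$\s?\d")
CART_RE = re.compile(r"add to (cart|basket)", re.IGNORECASE)


def looks_like_ecommerce(markup):
    """Rough profile check: a $ amount plus an add to cart/basket link."""
    return bool(PRICE_RE.search(markup)) and bool(CART_RE.search(markup))


def extract(markup, mapping):
    """Apply each saved XPath; fields that do not match come back as None."""
    root = ET.fromstring(markup)
    record = {}
    for field, (xpath, attr) in mapping.items():
        node = root.find(xpath)
        if node is None:
            record[field] = None
        elif attr is not None:
            record[field] = node.get(attr)
        else:
            record[field] = "".join(node.itertext()).strip()
    return record


# Made-up, well-formed sample page standing in for a crawled URL.
PAGE = """<html><body>
<h1 class="title">Blue Widget</h1>
<span class="price">$19.99</span>
<img class="main" src="/img/widget.jpg"/>
<a href="/cart/add/1">Add to basket</a>
</body></html>"""

if looks_like_ecommerce(PAGE):
    print(extract(PAGE, MAPPING))
```

Returning None for a field that does not match, rather than raising, means one bad page does not stop the crawl of the rest of the domain.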
I have made a short video to show it in action:
http://www.screencast.com/t/riB3iiVMiSk
I'm not sure if I'm doing this the right way. If a site/page changes structure then I may have to re-map the data. I was hoping that someone would have some pointers for me in terms of any other ways to do this. I've also had some problems with JavaScript-heavy sites.
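One thing I could probably do is at least detect when a mapping has stopped working, so I know which domains need re-mapping. A minimal sketch, assuming each crawled page yields a dictionary of extracted fields with None for anything that did not match; the function names, the 0.8 threshold, and the sample records are hypothetical:

```python
def fill_rate(records, fields):
    """Fraction of records in which each field was extracted (non-empty)."""
    total = len(records)
    if total == 0:
        return {f: 0.0 for f in fields}
    return {f: sum(1 for r in records if r.get(f)) / total for f in fields}


def fields_needing_remap(records, fields, threshold=0.8):
    """Fields whose fill rate fell below the (assumed) threshold."""
    rates = fill_rate(records, fields)
    return sorted(f for f in fields if rates[f] < threshold)


# Made-up records from one crawl of a domain after a redesign moved the price.
records = [
    {"product": "Blue Widget", "price": None, "image": "/img/a.jpg"},
    {"product": "Red Widget", "price": None, "image": "/img/b.jpg"},
    {"product": "Green Widget", "price": "$5.00", "image": "/img/c.jpg"},
]
print(fields_needing_remap(records, ["product", "price", "image"]))
```

This does not fix a broken mapping, it only flags the fields whose XPaths have probably gone stale.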
If anyone has any knowledge of screen scraping, and of ways it can be done more automatically, I'd really appreciate a steer!
Thanks
Ade