How long would I need to run the experiment to get a reasonably reliable idea of which version is statistically better? (I have no statistical background or training.)
The Wikipedia page (http://en.wikipedia.org/wiki/A/B_Testing) is very light on details.
Any help would be greatly appreciated.