Here's what I came up with: http://gist.github.com/123400
Right now I'm only running Bonnie64 tests, and only on small and large Linux instances. There's also no fancy reporting, just Bonnie's plain-text output. My plan is to add other tests and then figure out something simple for reporting (Google Charts, probably). After that I'll add Solaris and other distros and machine sizes.
Do you have any suggestions for other tests or benchmarks to use? What do you think of the approach?