I've wanted to figure out Lucene for a while but never got around to it (the Lucene book, for instance, is badly outdated and none of its example code works). Today I did something simpler: a little experiment in indexing.
I have a directory of about 3.2 GB of XML documents (medical journal papers downloaded from ftp://ftp.ncbi.nlm.nih.gov/pub/pmc/articles.tar.gz; the download itself is a single archive of about 700 MB). I wondered how long the simple disk-based Lucene demo would take to index it with default settings (http://lucene.apache.org/java/2_3_1/demo.html).
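The demo walks a directory and adds each file as one document. The core idea underneath is an inverted index: a map from each term to the documents that contain it. Here is a toy, stdlib-only Java sketch of that idea. It is not Lucene and not how the demo is implemented: there is no scoring, no on-disk segments, and everything lives in memory. The class name `ToyIndexer` and its methods are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical toy inverted index: NOT Lucene, just a stdlib illustration of the idea. */
public class ToyIndexer {
    // term -> sorted set of files whose contents contain that term
    private final Map<String, Set<Path>> postings = new TreeMap<>();

    /** Recursively indexes every regular file under root, one "document" per file. */
    public void indexDirectory(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        for (Path file : files) {
            indexFile(file);
        }
    }

    /** Lowercases the file's text, splits on anything that is not a letter or digit,
     *  and records the file under each resulting term. XML tag names get indexed too. */
    private void indexFile(Path file) throws IOException {
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(file);
            }
        }
    }

    /** Returns the files containing the given term (case-insensitive), or an empty set. */
    public Set<Path> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java ToyIndexer <directory> <term>");
            return;
        }
        ToyIndexer index = new ToyIndexer();
        index.indexDirectory(Paths.get(args[0]));
        for (Path hit : index.search(args[1])) {
            System.out.println(hit);
        }
    }
}
```

Run it as `java ToyIndexer <dir> <term>`. Holding the whole term map in RAM like this would not cope well with 3.2 GB of XML. Part of what makes the Lucene demo's 30-40 MB footprint possible is that it writes index data to disk as it goes rather than keeping it all in memory.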
System stats: 7200 RPM 300 GB disk; Windows XP SP2; Core 2 Quad 2.4 GHz; 2 GB DDR2-800 RAM.
It took 23 minutes, the last 5 of which were spent merging the index segments into a single segment (the demo's optimize step) so that searches run faster.
So roughly 18 minutes of actual indexing for about 3 GB of text. The final index came to 646 MB, about 1/5 the size of the original source text.
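The timing and size figures reduce to a couple of rates. Here is a small Java sketch that works them out. It assumes 1 GB = 1024 MB and treats everything before the 5-minute optimize step as pure indexing time. The class and method names are just for illustration.

```java
/** Back-of-the-envelope numbers for the indexing run described above. */
public class IndexStats {
    // Assumption: 1 GB = 1024 MB.
    static final double SOURCE_MB = 3.2 * 1024;   // ~3.2 GB of XML source
    static final double INDEX_MB = 646;           // final optimized index
    static final double TOTAL_MINUTES = 23;       // whole run
    static final double OPTIMIZE_MINUTES = 5;     // final segment-merge step

    /** Minutes spent indexing, i.e. excluding the optimize step. */
    static double indexingMinutes() {
        return TOTAL_MINUTES - OPTIMIZE_MINUTES;
    }

    /** MB of source text consumed per second, excluding the optimize step. */
    static double indexingThroughputMBps() {
        return SOURCE_MB / (indexingMinutes() * 60);
    }

    /** MB of source text per second over the whole run, optimize included. */
    static double overallThroughputMBps() {
        return SOURCE_MB / (TOTAL_MINUTES * 60);
    }

    /** Index size as a fraction of the source size. */
    static double indexToSourceRatio() {
        return INDEX_MB / SOURCE_MB;
    }

    public static void main(String[] args) {
        System.out.printf("indexing time:       %.0f min%n", indexingMinutes());
        System.out.printf("indexing throughput: %.2f MB/s%n", indexingThroughputMBps());
        System.out.printf("overall throughput:  %.2f MB/s%n", overallThroughputMBps());
        System.out.printf("index/source ratio:  %.2f%n", indexToSourceRatio());
    }
}
```

That works out to about 3 MB/s of XML during the indexing phase, a bit under 2.4 MB/s once the optimize step is counted, and an index roughly 20% the size of the source.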
Memory usage was very reasonable: it hovered around 30-40 MB (unlike, say, Java IDEs, which use 200 MB or so).
Ultimately a benchmark like this is disk-bound, but that's still fast as shit in my opinion. I had to whip together a ghetto homegrown indexing system at work several months ago (I've never had time to optimize it), and this blows away what I created.