You can index all kinds of files (e.g. doc, docx, xls, ppt, pdf, txt, html, even OCR'd PDFs) and then search them with very advanced queries like "always contain X", "never contain X" and "X near Y", plus wildcard search, proper stemming support, etc.
We use it at work, where we have hundreds of thousands of doc/docx/pdf files, and it works flawlessly!