If you haven't solved this problem yet, maybe this link will help.
http://lucene.apache.org/solr/api-4_0_0-ALPHA/doc-files/tutorial.html
We use Solr for our site's search, and we have a daily cron job that indexes all of our html, pdf, xml, docx, pptx, and other files. We could run it more often, but for our needs we decided that once per day is enough.
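
In case it helps, here is a rough sketch of what such an indexing script could look like. It is not our actual job, just an illustration. It assumes Solr is running locally with the extracting request handler (Solr Cell) enabled at `/update/extract`, as in the example configuration; the URLs, the extension list, and the function names (`find_files`, `build_request`, `index_tree`) are all made up for this example.

```python
import os
import urllib.parse
import urllib.request

# Assumed values, adjust for your own Solr install and core.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"
SOLR_EXTRACT_URL = SOLR_UPDATE_URL + "/extract"
EXTENSIONS = {".html", ".pdf", ".xml", ".docx", ".pptx"}


def find_files(root):
    """Yield paths under root whose extension is in EXTENSIONS."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            if os.path.splitext(name)[1].lower() in EXTENSIONS:
                yield os.path.join(dirpath, name)


def build_request(path):
    """Build a POST request that sends one file to the extract handler.

    The file path is used as the document id (literal.id), so re-running
    the job replaces the earlier version of the same file.
    """
    query = urllib.parse.urlencode({"literal.id": path})
    with open(path, "rb") as fh:
        body = fh.read()
    return urllib.request.Request(
        SOLR_EXTRACT_URL + "?" + query,
        data=body,
        headers={"Content-Type": "application/octet-stream"},
    )


def index_tree(root):
    """Send every matching file under root to Solr, then commit once."""
    for path in find_files(root):
        with urllib.request.urlopen(build_request(path)) as resp:
            resp.read()
    # A single commit at the end makes the new documents searchable.
    with urllib.request.urlopen(SOLR_UPDATE_URL + "?commit=true") as resp:
        resp.read()
```

To run it daily, you could add a call such as `index_tree("/var/www/site")` at the bottom of the script (the path is a placeholder) and schedule it with a crontab entry along the lines of `0 2 * * * python3 /path/to/index_site.py`.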