
I'm developing a tool that searches a given site for a keyword entered by the user. My problem is that it searches only the HTML/web pages, not the PDF or MS Word files found on the site.

Can anyone suggest an API or tool, or provide code, that can search the text of an online PDF, MS Word, or text file? I need to download only those files that contain a particular keyword.

1 solution

Is this your own site, or is it hosted elsewhere? If you are searching your own site, you can use Index Server to search through the documents. If it is on a different server, you will have to adopt a more brute-force approach.

With the brute-force approach, you will need to download the document to your system and then use the IFilter COM API to extract the text from it. Have a look at this[^] article for more detail on how to use it.
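To make the brute-force flow concrete, here is a minimal Python sketch of the same idea: fetch the file, extract its text, and keep it only if the keyword is present. It does not use IFilter. Instead it assumes a small, hypothetical `EXTRACTORS` table keyed by file extension, with a plain-text decoder and a basic .docx reader (a .docx file is a ZIP archive whose body text lives in `word/document.xml`). The function names are made up for illustration, and a PDF or legacy .doc entry would have to wrap IFilter or some other extraction library. Note that the file still has to be fetched into memory to be searched; "download only matching files" here means only matches are handed back for saving.

```python
import io
import os
import urllib.parse
import urllib.request
import zipfile
from typing import Callable, Dict, Optional
from xml.etree import ElementTree

# WordprocessingML namespace used by paragraphs and text runs in a .docx file.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def text_from_plain(data: bytes) -> str:
    """Decode a plain-text file, replacing any undecodable bytes."""
    return data.decode("utf-8", errors="replace")


def text_from_docx(data: bytes) -> str:
    """Extract paragraph text from a .docx (a ZIP holding word/document.xml)."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        root = ElementTree.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Join the text runs of one paragraph without adding separators.
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


# Hypothetical registry (an assumption of this sketch): extension -> extractor.
# Add ".pdf" or ".doc" entries that call IFilter or a library of your choice.
EXTRACTORS: Dict[str, Callable[[bytes], str]] = {
    ".txt": text_from_plain,
    ".docx": text_from_docx,
}


def contains_keyword(text: str, keyword: str) -> bool:
    """Case-insensitive substring match."""
    return keyword.casefold() in text.casefold()


def fetch(url: str) -> bytes:
    """Download the raw bytes of a document over HTTP(S)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def matching_document(
    url: str, keyword: str, fetch_bytes: Callable[[str], bytes] = fetch
) -> Optional[bytes]:
    """Return the document's bytes if it contains the keyword, else None."""
    path = urllib.parse.urlparse(url).path
    extractor = EXTRACTORS.get(os.path.splitext(path)[1].lower())
    if extractor is None:
        return None  # Unsupported type: skip it without downloading.
    data = fetch_bytes(url)
    return data if contains_keyword(extractor(data), keyword) else None
```

A crawler would call `matching_document(url, keyword)` for each document link it finds and write the returned bytes to disk only when the result is not `None`. The `fetch_bytes` parameter exists so the download step can be swapped out, for example with a cached or authenticated fetcher.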

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
