Hello All,

Firstly, I am new to coding and am learning step by step.

I have been searching for a way to "search for a word in multiple PDF documents", but in vain. Could anyone please share their thoughts or the logic for doing this? It would be greatly helpful.

Let me explain what I need with an example.
I have multiple PDF files and need to check whether a particular word or number appears in them. If it appears in every PDF, the check should return True; if not, it should return False along with an error message naming the file that doesn't contain it.

Example:

1. ABS.pdf has "123" (Contains the number)
2. CCC.pdf has "123" (Contains the number)
3. XYZ.pdf has "145" (Doesn't contain the number)

If the above files are searched using the keyword "123", the application should report: XYZ.pdf doesn't contain "123".

Note: This check should be run on many PDFs in one go.
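One way to structure this check is to keep text extraction separate from the search logic. Below is a minimal sketch in Python for illustration, assuming some PDF library (for example iTextSharp in .NET) already turns each file into a string. `check_keyword_in_pdfs` and `extract_text` are hypothetical names, not part of any library, and a dictionary of sample text stands in for real PDFs.

```python
from typing import Callable, Iterable, List, Tuple


def check_keyword_in_pdfs(
    pdf_paths: Iterable[str],
    extract_text: Callable[[str], str],
    keyword: str,
) -> Tuple[bool, List[str]]:
    """Check every file for `keyword`.

    Returns (True, []) if every file contains the keyword, otherwise
    (False, messages) with one message per file that does not.
    """
    messages = []
    for path in pdf_paths:
        # Hypothetical hook: delegate extraction to whatever PDF library
        # you use; this function only sees the resulting plain text.
        text = extract_text(path)
        if keyword not in text:  # plain substring test
            messages.append(f'{path} doesn\'t contain "{keyword}"')
    return (not messages, messages)


if __name__ == "__main__":
    # Stand-in text for the three example files; a real run would read PDFs.
    sample_texts = {
        "ABS.pdf": "Invoice number 123",
        "CCC.pdf": "Reference: 123",
        "XYZ.pdf": "Reference: 145",
    }
    ok, problems = check_keyword_in_pdfs(sample_texts, sample_texts.__getitem__, "123")
    print(ok)
    for line in problems:
        print(line)
```

Run against the sample data, this prints `False` followed by `XYZ.pdf doesn't contain "123"`. Collecting every failing file, rather than stopping at the first, matches the requirement of reporting which file is missing the keyword.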

What I have tried:

I came across the iTextSharp DLL, but I am missing the logic for how to structure the code.

Any help will be greatly appreciated.

Thanks
Saikrishna
Updated 20-Mar-16 10:28am (v4)
Comments
LLLLGGGG 20-Mar-16 7:41am
   
If you have a method that searches for a word in a single PDF file, you can loop through all your PDFs: if at least one does not contain the entry you're looking for, return false; otherwise, return true.
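The loop described in this comment can be sketched as follows, in Python for illustration. `all_pdfs_contain` is a hypothetical name, and `pdf_contains` stands in for the assumed single-file search method; the loop returns as soon as one file fails, so the remaining files are not searched.

```python
from typing import Callable, Iterable


def all_pdfs_contain(
    pdf_paths: Iterable[str],
    pdf_contains: Callable[[str, str], bool],
    word: str,
) -> bool:
    """Return False as soon as one file lacks `word`, otherwise True."""
    for path in pdf_paths:
        # `pdf_contains` is the assumed single-file search method.
        if not pdf_contains(path, word):
            return False  # one miss is enough; skip the remaining files
    return True
```

This is equivalent to `all(pdf_contains(p, word) for p in pdf_paths)`. It answers only "do all files match?"; to name the failing file, collect the misses instead of returning early.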

LG
Richard MacCutchan 20-Mar-16 9:21am
   
The iTextSharp library will allow you to read the content of the PDF files. You can then search within the content for the word(s) you are looking for. However, be aware that PDFs have quite a complex structure and their content is not simple text.
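Because extracted PDF text follows the page layout, a phrase that reads as one line on the page can come back split by line breaks or extra spaces. A small sketch of normalizing the text before searching, in Python for illustration, is below. The helper names are hypothetical, and the whole-word and case-insensitive matching are assumptions about what "contains the word" should mean. This does not help if the PDF has no extractable text at all.

```python
import re


def normalize_pdf_text(text: str) -> str:
    """Collapse runs of whitespace (spaces, tabs, line breaks) into one space."""
    return re.sub(r"\s+", " ", text).strip()


def contains_word(text: str, word: str) -> bool:
    """Whole-word, case-insensitive search on normalized text.

    Assumption: "123" should not match inside "41234". The lookarounds
    require that no letter, digit or underscore touches the match.
    """
    normalized = normalize_pdf_text(text)
    phrase = normalize_pdf_text(word)
    pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
    return re.search(pattern, normalized, flags=re.IGNORECASE) is not None
```

With this, `contains_word("Invoice\nnumber 123", "number 123")` is true even though the extracted text has a line break, while `contains_word("Order 41234", "123")` is false.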

1 solution


Solution 1

You can use a PDF IFilter text extractor to get at the text inside a PDF.

If you want to search, start with this article: hOOt - full text search engine
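If the same set of PDFs is searched repeatedly, a full-text search engine avoids re-extracting every file by building an index once. The sketch below shows the general idea of an inverted index (word to set of files) in Python for illustration. It is not hOOt's API; `build_index` and `files_without` are hypothetical names, and the input is assumed to be already-extracted text.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

_TOKEN = re.compile(r"\w+")  # a "word" here is a run of letters, digits or _


def build_index(docs: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each lower-cased token to the set of file names containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for name, text in docs:
        for token in _TOKEN.findall(text.lower()):
            index[token].add(name)
    return index


def files_without(index: Dict[str, Set[str]], all_files: Set[str], word: str) -> Set[str]:
    """Files that never contain `word` as a whole token."""
    # .get does not insert missing keys, even on a defaultdict.
    return all_files - index.get(word.lower(), set())
```

For the question's example, indexing the three files and calling `files_without(index, {"ABS.pdf", "CCC.pdf", "XYZ.pdf"}, "123")` gives `{"XYZ.pdf"}`. This handles single-word queries only; phrase search would also need word positions in the index.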
   

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



