Hi there,

Using Excel or Python, I am trying to determine the number of Word documents in which a given word occurs. For example, let's say I have 100 Word documents. Of these documents, I would like to find out how many contain the word "Excel" at least once.

I don't need to know how many times the word occurs in each document, just if it occurs or not. Does anyone know how to do this?

What I have tried:

I've tried looking for ways to do this online, but all I found were tutorials that require me to use their documentation, and they only show how to count the number of times a single word occurs in one file.
Updated 27-Sep-20 20:22pm
F-ES Sitecore 27-Sep-20 14:20pm
If you have code that reports how many times the word appears in a file, then a count of one or more means that file is a match. Loop that code over all your files and count how many returned a result greater than 0.
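A minimal sketch of that idea in Python, assuming the text of each document has already been extracted into a string. The helper names `count_occurrences` and `documents_containing`, and the sample data, are made up for illustration; matching is whole-word and case-insensitive, which is an assumption about what the question wants.

```python
import re
from typing import Mapping


def count_occurrences(text: str, word: str) -> int:
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    return len(pattern.findall(text))


def documents_containing(texts: Mapping[str, str], word: str) -> int:
    """Return how many documents have an occurrence count greater than zero."""
    return sum(1 for text in texts.values() if count_occurrences(text, word) > 0)


# Hypothetical sample data: file name -> already extracted text.
sample = {
    "a.docx": "Excel is handy. I use Excel daily.",
    "b.docx": "Nothing relevant here.",
    "c.docx": "An excel sheet.",
}
print(documents_containing(sample, "Excel"))  # prints 2
```

The first document counts once even though the word appears twice, because only "count > 0" is tested per file.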

That's going to be complicated, as you will need to be able to read Word documents, extract the text, check for an instance of your word, and then open the next and repeat the process.

If you have code to read a file and count words, then all you need to do is repeat that process for each file, using a cut down version of your count software that stops after the first instance.
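One way to sketch such a cut-down check using only the Python standard library, assuming the files are .docx (which are zip archives holding the body text in `word/document.xml`). The function name `docx_contains` is hypothetical. It returns as soon as the first matching paragraph is found, and it does not look at headers, footers, or the older .doc format.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# XML namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_contains(path, word):
    """Return True as soon as `word` is found in the main body of a .docx file."""
    pattern = re.compile(rf"\b{re.escape(word)}\b", re.IGNORECASE)
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    for paragraph in root.iter(W_NS + "p"):
        # Word may split one word across several runs, so join the runs first.
        text = "".join(node.text or "" for node in paragraph.iter(W_NS + "t"))
        if pattern.search(text):
            return True  # stop at the first instance; no need to keep counting
    return False
```

Calling it once per file and adding up the `True` results gives the number of documents that contain the word.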

We can't do that for you - we have no idea what software you have found or how it works!
For each document, search for the requested word and exit reporting success as soon as you find it. That's all.
The trickiest part is how to perform such a search. For instance, if you stick to Python and your targets are Word documents, then python-docx could help; see, for instance, "How to find and replace text in a Word document using Python" on Quora.
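As a rough sketch of how that could look with python-docx (installed with `pip install python-docx`), assuming all the documents are .docx files in one folder. The function names and the folder layout are hypothetical. Only body paragraphs are searched here; text inside tables, headers, or footers would need extra handling.

```python
import re
from pathlib import Path


def has_word(text, word):
    """Whole-word, case-insensitive test for `word` in `text`."""
    return re.search(rf"\b{re.escape(word)}\b", text, re.IGNORECASE) is not None


def document_contains(path, word):
    """Return True if any body paragraph of the .docx file contains `word`."""
    # python-docx is imported here so has_word() stays usable without it.
    from docx import Document

    document = Document(str(path))
    # any() stops at the first matching paragraph.
    return any(has_word(p.text, word) for p in document.paragraphs)


def count_documents(folder, word):
    """Count the .docx files directly inside `folder` that contain `word`."""
    total = 0
    for path in sorted(Path(folder).glob("*.docx")):
        if path.name.startswith("~$"):
            continue  # skip the lock files Word creates while a document is open
        if document_contains(path, word):
            total += 1
    return total
```

With a folder of documents named, say, `reports`, `count_documents("reports", "Excel")` would return how many of them mention the word at least once.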

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
