I am trying to read a file and count the occurrences of a string regardless of case (upper or lower), but my code is not giving the desired result.

Why is that? Also, how can I make my search case-insensitive?
My code is:
Python
import os,re


fileName_path = input ("Please input the file name with location: ")
directory = os.path.dirname(fileName_path)
os.chdir(directory)

fileName = os.path.basename(fileName_path)
openFile = open(fileName ,"r")

cnt = 0

with openFile as readFile:
    for searchpattern in readFile:
        if 'tempCharSearch' in searchpattern:
            cnt += 1

openFile.close()
print (cnt)


In the text file there are 14 occurrences of tempCharSearch, but the result shows only 3. Why is that?
The text file is attached here:

Text
Lorem Ipsum is simply dummy text of tempCharSearch:='100-111-875' printing and typesetting industry. Lorem Ipsum has been tempCharSearch:='100-111-875' industry's standard dummy text ever since tempCharSearch:='100-111-875' 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

It has survived not only five centuries, but also tempCharSearch:='100-111-875' leap into electronic typesetting, remaining essentially unchanged. It was popularised in tempCharSearch:='100-111-875' 1960s with tempCharSearch:='100-111-875' release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.

tempCharSearch:='100-111-875're are many variations of passages of Lorem Ipsum available, but tempCharSearch:='100-111-875' majority have suffered alteration in some form, by injected humour, or randomised words which don't look even slightly believable. If you are going to use a passage of Lorem Ipsum, you need to be sure tempCharSearch:='100-111-875're isn't anything embarrassing hidden in tempCharSearch:='100-111-875' middle of text. All tempCharSearch:='100-111-875' Lorem Ipsum generators on tempCharSearch:='100-111-875' Internet tend to repeat predefined chunks as necessary, making this tempCharSearch:='100-111-875' first true generator on tempCharSearch:='100-111-875' Internet. It uses a dictionary of over 200 Latin words, combined with a handful of model sentence structures, to generate Lorem Ipsum which looks reasonable. tempCharSearch:='100-111-875' generated Lorem Ipsum is tempCharSearch:='100-111-875'refore always free from repetition, injected humour, or non-characteristic words etc.

1 solution

Your code is not counting the number of occurrences of 'tempCharSearch' in the file, but the number of lines in which the pattern occurs. The test `'tempCharSearch' in searchpattern` is true at most once per line, no matter how many times the string appears in it. As your input file appears to have just three lines, each containing multiple occurrences, your result is 3.

Use Python's built-in string method `count` to count all (non-overlapping) occurrences in a line:

Python
cnt += searchpattern.count('tempCharSearch')
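
To make the difference concrete, here is a minimal sketch that applies both approaches to a single line. The sample line is made up for illustration and is not taken from the attached file:

```python
# Hypothetical sample line containing the search string three times.
line = "tempCharSearch:='1' and tempCharSearch:='2' and tempCharSearch:='3'\n"

# The membership test is True at most once per line,
# however many matches the line contains.
lines_matched = 1 if 'tempCharSearch' in line else 0

# str.count returns the number of non-overlapping occurrences in the line.
occurrences = line.count('tempCharSearch')

print(lines_matched, occurrences)  # prints: 1 3
```

Summing `occurrences` over every line of the file gives the total count, whereas summing `lines_matched` gives what the original loop computes.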


If you want the comparison to be case-insensitive, convert both the line and your search string to lower case before running the count, for example:

Python
for line in readFile:
    cnt += line.lower().count('tempcharsearch')
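
Putting the pieces together, a complete version might look like the sketch below. The function names `count_in_lines` and `count_in_file` and the UTF-8 default encoding are assumptions made for illustration; they are not part of the original code:

```python
def count_in_lines(lines, needle):
    """Count non-overlapping, case-insensitive occurrences of needle across lines."""
    needle = needle.lower()
    return sum(line.lower().count(needle) for line in lines)


def count_in_file(path, needle, encoding="utf-8"):
    # The with statement closes the file, so no explicit close() is needed.
    # The encoding is an assumption; adjust it to match the actual file.
    with open(path, "r", encoding=encoding) as handle:
        return count_in_lines(handle, needle)


# Hypothetical sample data with mixed-case matches.
sample = [
    "TEMPCHARSEARCH and tempCharSearch\n",
    "no match here\n",
    "TempCharSearch once\n",
]
print(count_in_lines(sample, "tempCharSearch"))  # prints 3
```

Opening the file with its full path also makes the `os.chdir` call in the original code unnecessary. Because the count is done line by line, a match that is split across a line break would not be found.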
 

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


