The function countwords should remove all the stop words, but when I run the code I get a list of strings that still contains the stop words. Where am I going wrong?


The first few lines of the text file are:
and the evening and the morning were the first day.
and god said let there be a firmament in the midst of the waters and let it divide the waters from the waters.
and god made the firmament and divided the waters which were under the firmament from the waters which were above the firmamenin the beginning god created the heaven and the earth.
and the earth was without form and void; and darkness was upon the face of the deep.
and the spirit of god moved upon the face of the waters.


What I have tried:

import re
filename="bibleSentences.15.txt"

def getData(filename):
  with open(filename,'r') as f:
    #converting to list where each element is an individual line of text file
    lines=[line.rstrip() for line in f]
    return lines
getData(filename)

def normalize(filename):
    #converting all letters to lowercase
    lowercase_lines=[x.lower() for x in getData(filename)]
    #strip out all non-word or tab or space characters(remove punts)
    stripped_lines=[re.sub(r"[^\w \t]+", "", x) for x in lowercase_lines]
    print(stripped_lines)
    return stripped_lines
normalize(filename)

import nltk
nltk.download('stopwords')
from nltk.corpus import stopwords
stopwords.words('english')
stopwords=set(stopwords.words('english'))

def countwords(filename):
  output_array=[]
  for sentence in normalize(filename):
    temp_list=[]
    for word in sentence.split():
      if word.lower() not in stopwords:
        temp_list.append(word)
    output_array.append(''.join(temp_list))
    print(output_array)
    return output_array
output=countwords(filename)
print(output)
countwords(filename)
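A possible explanation, offered as a sketch rather than a definitive diagnosis: three things in countwords and normalize look suspicious. The `return output_array` is indented inside the `for` loop, so the function returns after the first sentence. `''.join(temp_list)` glues the remaining words together with no separator. And `print(stripped_lines)` inside normalize prints the unfiltered lines every time normalize is called, which is likely the list with stop words that shows up in the output. The version below assumes a small hand-picked `STOP_WORDS` set (hypothetical, chosen so the example runs without downloading anything); `set(stopwords.words('english'))` from NLTK could be used in its place. The name `remove_stop_words` is also an illustrative choice, and the functions take a list of lines instead of a file name so the sample is self-contained.

```python
import re

# Assumption: a tiny hand-picked stop word set so the sketch runs offline.
# set(nltk.corpus.stopwords.words('english')) could be substituted here.
STOP_WORDS = {"and", "the", "were", "was", "of", "in", "a", "be", "it",
              "there", "from", "upon"}


def normalize(lines):
    """Lowercase each line and strip everything except word characters, spaces and tabs."""
    return [re.sub(r"[^\w \t]+", "", line.lower()) for line in lines]


def remove_stop_words(lines, stop_words=STOP_WORDS):
    """Return one string per input line with the stop words filtered out."""
    output = []
    for sentence in normalize(lines):
        kept = [word for word in sentence.split() if word not in stop_words]
        # Join with a space so the remaining words stay separate.
        output.append(" ".join(kept))
    # Return after the loop has processed every sentence, not inside it.
    return output


sample = [
    "and the evening and the morning were the first day.",
    "and the spirit of god moved upon the face of the waters.",
]
result = remove_stop_words(sample)
print(result)
```

Keeping the `print` call outside the helper functions means the only list that appears on screen is the filtered one.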
Comments
Richard MacCutchan 11-Oct-20 4:01am
   
The structure of your program is a bit random. Put all the functions at the beginning and the main code at the bottom. Also do not put function calls after each function for no reason.
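One way to read the layout suggested in this comment, as a minimal sketch: all function definitions first, then a single entry point at the bottom behind an `if __name__ == "__main__":` guard, with no stray calls between the definitions. The names `get_data`, `main` and `FILENAME` are illustrative; the file name is the one from the question, and the guard simply reports if that file is not present.

```python
import os
import re

FILENAME = "bibleSentences.15.txt"  # file name taken from the question


def get_data(filename):
    """Read the file into a list with one stripped line per element."""
    with open(filename, "r") as f:
        return [line.rstrip() for line in f]


def normalize(lines):
    """Lowercase each line and strip everything except word characters, spaces and tabs."""
    return [re.sub(r"[^\w \t]+", "", line.lower()) for line in lines]


def main(filename):
    """Run the pipeline once and return the normalized lines."""
    return normalize(get_data(filename))


# Main code lives at the bottom; nothing runs when the file is imported.
if __name__ == "__main__":
    if os.path.exists(FILENAME):
        for line in main(FILENAME):
            print(line)
    else:
        print(f"{FILENAME} not found")
```

With this arrangement each function is called exactly once per run, so any printed output comes from one place.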

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)



