Python
import nltk
import random
#from nltk.corpus import movie_reviews
from nltk.classify.scikitlearn import SklearnClassifier
import pickle
from sklearn.naive_bayes import MultinomialNB, BernoulliNB
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.svm import SVC, LinearSVC, NuSVC
from nltk.classify import ClassifierI
from statistics import mode
from nltk.tokenize import word_tokenize



class VoteClassifier(ClassifierI):
    def __init__(self, *classifiers):
        self._classifiers = classifiers

    def classify(self, features):
        votes = []
        for c in self._classifiers:
            v = c.classify(features)
            votes.append(v)
        return mode(votes)

    def confidence(self, features):
        votes = []
        for c in self._classifiers:
            v = c.classify(features)
            votes.append(v)

        choice_votes = votes.count(mode(votes))
        conf = choice_votes / len(votes)
        return conf
    
short_pos = open("positive.txt","r").read()
short_neg = open("negative.txt","r").read()



I am getting this error:


UnicodeDecodeError: 'ascii' codec can't decode byte 0xf3 in position 4645: ordinal not in range(128)


How can I fix this?

What I have tried:

I have tried using a different file, and that works. However, with positive.txt it does not work.
Updated 27-Jan-18 23:58pm
Comments
Kornfeld Eliyahu Peter 28-Jan-18 5:11am    
Are you working with Unicode or with a local encoding?
Member 13647869 28-Jan-18 5:21am    
I am not quite sure what your question means. Could you please explain?

The ASCII character set only accepts values in the range 0 to 127 inclusive: your byte value of 0xF3 - a hexadecimal value equivalent to 243 in decimal - is outside that range and cannot be translated to an ASCII character.
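
To see this for yourself, here is a minimal sketch that decodes the single byte 0xF3 on its own. Latin-1 is used only as an illustration of an encoding that does assign a character to that byte; nothing in the question confirms that the file is actually Latin-1.

```python
raw = bytes([0xF3])  # the byte from the error message, 243 in decimal

try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as exc:
    # ASCII only covers 0..127, so this branch is taken
    ascii_ok = False
    print("ASCII decode failed:", exc)

# Latin-1 (assumed here purely for illustration) maps every byte value to a character
as_latin1 = raw.decode("latin-1")
print(repr(as_latin1))
```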

You are trying to read data as text, but the file does not contain the "right data" - I'd suggest you check the file content, and probably read it as binary data instead of text: Working with Binary Data in Python | DevDungeon[^]
 
 
Comments
Member 13647869 28-Jan-18 5:38am    
The file contains text and numbers, so if I read it as binary data, it wouldn't work, would it? As in, I can't use the splitlines function and so on.
OriginalGriff 28-Jan-18 5:59am    
But "straight text" is not all that it holds - you need to look at it closely and find out if it's your assumptions (that it's just text) or the wrong encoding. As Peter says, if it's Unicode you need to read it as that or it defaults to the much more limited ASCII set. But ... character 0xF3 in Unicode is ó - an accented "o" - and given where you are that's probably not a likely character to get in a string!

Look at the data files, and work out what you need to do with them.
Member 13647869 28-Jan-18 6:02am    
It's not just text, as I said it contains text AND numbers. What I am trying to do is use this file to create classifiers for sentiment analysis.
OriginalGriff 28-Jan-18 6:12am    
That's not a distinction you should be making. You are assuming that "Number 98765" is "text and numbers" and it isn't - it's all text, as the number 98765 is stored in the file as a sequence of readable characters. If it were stored in the file as a number, it would be stored as four bytes: 0x00, 0x01, 0x81, 0xCD - each occupying a "single character space".

"Text" is "anything human readable".
"Binary" is "anything machine readable, but probably not immediately human readable".
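
The difference can be sketched with the number 98765 itself. This assumes a 4-byte unsigned integer in big-endian order, which matches the byte order listed above; real binary formats may use a different size or byte order.

```python
import struct

# Stored as text: five readable digit characters, one byte each
as_text = "98765".encode("ascii")

# Stored as a number: four raw bytes (big-endian unsigned 32-bit is an assumption)
as_binary = struct.pack(">I", 98765)

print(as_text, len(as_text))
print(as_binary.hex(), len(as_binary))
```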

You need to examine your files, and see exactly what they contain. Starting with a Hex Editor is a good beginning - anything under 0x20 or above 0x7F may indicate it's a binary file (except for 0x0A and 0x0D)
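
If no hex editor is at hand, a few lines of Python can do a rough version of the same check. This is only a sketch: the helper name `find_suspect_bytes` is made up for illustration, and it also treats tab (0x09) as ordinary text, which is an assumption beyond the rule stated above.

```python
def find_suspect_bytes(data, limit=10):
    """Return up to `limit` (offset, value) pairs for bytes outside plain ASCII text."""
    allowed_controls = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return
    hits = []
    for offset, value in enumerate(data):
        if (value < 0x20 and value not in allowed_controls) or value > 0x7F:
            hits.append((offset, value))
            if len(hits) >= limit:
                break
    return hits

# Made-up sample bytes containing the two values from the error messages
sample = b"caf\xf3 \x97 ok\r\n"
suspects = find_suspect_bytes(sample)
for offset, value in suspects:
    print(f"offset {offset}: 0x{value:02X}")
```

For a real file, read it in binary mode first, for example `data = open("positive.txt", "rb").read()`, and pass `data` to the helper. A handful of bytes above 0x7F scattered through otherwise readable text usually points to a non-ASCII text encoding rather than to binary data.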
Member 13647869 28-Jan-18 6:17am    
That was my original assumption, which is why the code reads the file in "r" mode (short_pos = open("positive.txt","r").read()), but then the ascii codec gave me that error.
I am very confused, as every person I ask gives me a different answer.
The problem is that you are trying to open a file whose text is not encoded in ASCII. If you do not tell Python which encoding to use, open() falls back to the platform's default encoding (ASCII in your case, judging by the error message) and fails.
Add the 'encoding' parameter to your open() call to solve the problem.
2. Built-in Functions — Python 3.6.4 documentation[^]
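
A minimal sketch of what that looks like. The encoding name cp1252 is an assumption chosen for illustration (it is one encoding in which 0xF3 is a valid character); the temporary file stands in for positive.txt, whose real encoding is not known from the question.

```python
import os
import tempfile

# Create a stand-in file containing the byte 0xF3 (the letter o with an acute accent in cp1252)
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "wb") as f:
    f.write("coraz\u00f3n".encode("cp1252"))

# Name the encoding explicitly instead of relying on the platform default
with open(path, "r", encoding="cp1252") as f:
    text = f.read()

print(text)
```

Using a `with` block also closes the file promptly, which the bare `open(...).read()` form does not guarantee.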
 
 
Comments
Member 13647869 28-Jan-18 5:59am    
OK, please bear with me, I am new to all of this. As you said, I have to use the encoding parameter. I thought I should use utf-8, but it didn't work:

I wrote this:
short_pos = open("positive.txt", "r",encoding='utf-8').read()
short_neg = open("negative.txt","r",encoding='utf-8').read()

and got this error:
short_neg = open("negative.txt","r",encoding='utf-8').read()
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/codecs.py", line 321, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x97 in position 3118: invalid start byte
Kornfeld Eliyahu Peter 28-Jan-18 6:06am    
If I understand correctly, 'positive.txt' opens with utf-8, but 'negative.txt' does not?!
It seems your files are encoded differently (maybe they come from different sources).
As there is no foolproof way to determine the encoding of a text file, you have to resort to try/except.
Set up a list of possible encodings and try each of them until one succeeds or the list is exhausted.
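
A rough sketch of that approach follows. The function name `read_text_with_fallback` and the candidate list ("utf-8", then "cp1252") are assumptions for illustration; the right candidates depend on where the files came from. Strict encodings such as utf-8 should come first, because permissive single-byte encodings accept almost any input and would mask a better match.

```python
import os
import tempfile

def read_text_with_fallback(path, encodings=("utf-8", "cp1252")):
    """Return (text, encoding_name) for the first candidate that decodes cleanly."""
    with open(path, "rb") as f:
        raw = f.read()
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue  # try the next candidate
    raise ValueError(f"none of {encodings} could decode {path}")

# Stand-in file containing 0x97, the byte that broke the utf-8 attempt above
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "wb") as f:
    f.write(b"good \x97 bad")

demo_text, used = read_text_with_fallback(demo_path)
print(used, repr(demo_text))
```

A clean decode only proves that the bytes are valid in that encoding, not that it is the intended one, so it is worth reading a few of the affected lines afterwards to check that they look sensible.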
Member 13647869 28-Jan-18 6:09am    
The files were given to me, so I do not know which encoding was used. I changed only the positive one to see the difference, as in to compare my assumption with the original. I wanted to see what would happen if I read positive.txt as utf-8, would it work? So I left negative.txt the same.
Member 13647869 28-Jan-18 7:39am    
Thank you, Peter, for your help!!!!!!
Kornfeld Eliyahu Peter 28-Jan-18 7:51am    
You are welcome!
