I'm trying to compute the cosine similarity between two texts from their TF-IDF vectors. This is the code I'm using:

def cosine_sim(text1, text2):
    tfidf = vectorizer.fit_transform(text1, text2)
    return ((tfidf * tfidf.T).A)[0,1]

negitsimilarity = []
for c, p in zip(cnegitlist, panegitlist):
    cosinesimnegit = cosine_sim(c, p)
    negitsimilarity.append(cosinesimnegit)


However, whenever I run this code, I keep getting this error:
IndexError: index 1 is out of bounds for axis 1 with size 1


This function worked for my other dataset, so I'm not sure why it didn't work for this one. I've tried looking through the array sizes for the TF-IDF values, but haven't been able to find anything unusual. Does anyone have any advice?

What I have tried:

- Since this was an index error, I've tried looking through the array sizes for the TF-IDF values, but haven't been able to find anything unusual.
- As stated before, I've also tried using it with other datasets, and it worked. I'm not sure what is different about this dataset.
Comments
Richard MacCutchan 16-Feb-21 3:40am    
Look at the line of code where the error occurs; it should help you identify which data causes the error.
Maciej Los 16-Feb-21 3:40am    
This line:
return ((tfidf * tfidf.T).A)[0,1]

produces IndexError: index 1 is out of bounds for axis 1 with size 1

So, before you return a value from an array, you have to check its size.
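
A minimal sketch of that size check, assuming scikit-learn's TfidfVectorizer is the vectorizer in question. One likely cause of a 1x1 result is that fit_transform received only one document: its second positional argument is y, which the vectorizer ignores, so text2 never becomes a row of the matrix. The fresh vectorizer per call and the similarity_shape helper are illustrative assumptions, not part of the original code.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def cosine_sim(text1, text2):
    """Return the TF-IDF cosine similarity of two strings."""
    # Assumption: a fresh vectorizer per call; the original shares one.
    vectorizer = TfidfVectorizer()
    # Pass both documents as ONE list. The second positional argument
    # of fit_transform is y, which the vectorizer ignores.
    tfidf = vectorizer.fit_transform([text1, text2])
    # Rows are L2-normalised by default, so the product of the matrix
    # with its transpose holds the pairwise cosine similarities.
    sim = (tfidf @ tfidf.T).toarray()
    # Check the size before indexing.
    if sim.shape != (2, 2):
        raise ValueError(f"expected a 2x2 similarity matrix, got {sim.shape}")
    return float(sim[0, 1])


def similarity_shape(docs):
    """Hypothetical helper: shape of the similarity matrix for a list of documents."""
    tfidf = TfidfVectorizer().fit_transform(docs)
    return (tfidf @ tfidf.T).shape
```

Calling similarity_shape on a list holding a single document gives a 1x1 matrix, which is the size reported in the error message, so printing the shape for each pair in the loop should show which inputs arrive as one document instead of two.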

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)