I'm trying to compute the cosine similarity between two texts from their TF-IDF vectors. This is the code I'm using:
    def cosine_sim(text1, text2):
        tfidf = vectorizer.fit_transform(text1, text2)
        return ((tfidf * tfidf.T).A)[0,1]

    negitsimilarity = []
    for c,p in zip(cnegitlist,panegitlist):
        cosinesimnegit = cosine_sim(c,p)
        negitsimilarity.append(cosinesimnegit)
However, whenever I run this code, I keep getting this error:
IndexError: index 1 is out of bounds for axis 1 with size 1
This function worked for my other dataset, so I'm not sure why it fails on this one. Does anyone have any advice?
What I have tried:
- Since this was an index error, I checked the shapes of the TF-IDF arrays, but didn't find anything unusual.
- As stated before, I've also tried using it with other datasets, and it worked. I'm not sure what is different about this dataset.
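
One possible explanation, which can't be confirmed without seeing the data: `fit_transform` takes the documents as its first argument and treats a second positional argument as the ignored target `y`. So `fit_transform(text1, text2)` only vectorizes `text1`, and if `text1` holds a single document the result has one row, the product `tfidf * tfidf.T` is 1x1, and `[0, 1]` is out of bounds. The sketch below assumes scikit-learn's `TfidfVectorizer`, assumes each text is a single string, and creates a fresh vectorizer per call; none of that is confirmed by the code shown above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def cosine_sim(text1, text2):
    # Assumption: text1 and text2 are each ONE document given as a string.
    vectorizer = TfidfVectorizer()
    # Pass both documents together as a single list of two items.
    # A second positional argument would be read as `y` and ignored.
    tfidf = vectorizer.fit_transform([text1, text2])
    # TfidfVectorizer L2-normalises rows by default, so the dot product
    # of row 0 and row 1 is their cosine similarity.
    return (tfidf @ tfidf.T).toarray()[0, 1]


# Reproducing the suspected shape problem: one document in, one row out.
single = TfidfVectorizer().fit_transform(["only one document"], ["ignored"])
print(single.shape[0])  # number of rows in the TF-IDF matrix
```

If the items in `cnegitlist` and `panegitlist` turn out to be one-element lists rather than strings, the call would need to unwrap them first, for example `cosine_sim(c[0], p[0])`. Printing `tfidf.shape` inside the loop for the failing dataset should show whether the matrix really has only one row.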