I have to write an NLP algorithm that groups words from a dictionary into clusters (e.g. technology, food, color) and can then classify new words into those clusters. What could the code for such an algorithm look like? I am new to NLP, so please help me out.

What I have tried:

I tried K-means clustering but could not get it to work, as I am new to NLP.
Updated 20-May-21 3:41am
Dave Kreskowiak 20-May-21 10:37am
If you think someone is going to just hand over code, you are sorely mistaken.

The only code you're going to get is the code YOU write. In order to do that, you have to understand NLP. That's why you got the link from Richard that you did.
Member 14844003 20-May-21 11:02am
Obviously I know no one is going to give me the code. I just wanted to know the approach: whether I have to prepare training data for the different clusters, or whether there is a more direct approach.
Jarek Szczegielniak 20-May-21 11:33am
You have a few options here, but in any case you will need to start by converting the words into numeric vectors. One-hot encoding will not work in this case, because every one-hot vector is equally far from every other, so it carries no information about meaning.
Some approaches you can try are (starting with the simplest):
1. Rely on pre-calculated embeddings (GloVe, word2vec, BERT, etc.), taking advantage of the fact that words with similar meanings should be relatively "close" together (in some dimensions at least). With these embeddings, you can apply classic unsupervised (e.g. clustering) or supervised (e.g. classification) algorithms to the vectors, depending on whether you have labels in your dataset.
2. Alternatively, you can obtain a definition of each word (e.g. from Wikipedia or an online dictionary), and then use a pre-trained model (e.g. BERT) to calculate a sentence embedding for each definition. With such vectors, the rest is the same as in option 1 above.
3. Finally, if you have a labeled dataset, you can fine-tune a pre-trained model (e.g. BERT) on it to get an end-to-end classification model.
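
The supervised side of option 1 can be sketched roughly as below. This is only an illustration under loud assumptions: the 3-dimensional vectors in `EMBEDDINGS` are made-up toy values standing in for real pre-trained embeddings, the seed words in `SEEDS` are hypothetical training data, and the method shown is a simple nearest-centroid classifier with cosine similarity, not K-means. With real data you would load the vectors from GloVe, word2vec or a similar model instead of typing them in.

```python
import math

# ASSUMPTION: toy 3-d vectors invented for illustration only.
# Real pre-trained embeddings typically have 50-300+ dimensions.
EMBEDDINGS = {
    "laptop":   [0.90, 0.10, 0.00],
    "software": [0.80, 0.20, 0.10],
    "bread":    [0.10, 0.90, 0.10],
    "cheese":   [0.20, 0.80, 0.00],
    "red":      [0.00, 0.10, 0.90],
    "blue":     [0.10, 0.00, 0.80],
    "router":   [0.85, 0.05, 0.10],
    "pasta":    [0.15, 0.85, 0.05],
    "green":    [0.05, 0.15, 0.90],
}

# ASSUMPTION: a few hand-labeled seed words per cluster (hypothetical labels).
SEEDS = {
    "technology": ["laptop", "software"],
    "food": ["bread", "cheese"],
    "color": ["red", "blue"],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# One centroid per cluster, computed from the seed words' vectors.
CENTROIDS = {
    label: centroid([EMBEDDINGS[word] for word in words])
    for label, words in SEEDS.items()
}


def classify(word):
    """Return the label whose centroid is most similar to the word's vector."""
    vector = EMBEDDINGS[word]
    return max(CENTROIDS, key=lambda label: cosine(vector, CENTROIDS[label]))


for new_word in ("router", "pasta", "green"):
    print(new_word, "->", classify(new_word))
```

If you have no labels at all, the same vectors can be fed to a clustering algorithm such as K-means instead, and you name the resulting clusters by inspecting their members.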

If you are completely new to NLP, I recommend using a library like spaCy or Spark NLP, which lets you work with multiple models, including GloVe, word2vec and transformer networks (such as BERT). Note that while transformer networks are really good, they are very, very slow (unless you have a GPU/TPU cluster at your disposal, of course ;-)).
Member 14844003 20-May-21 11:47am
Thank you so much!!!

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)
