I worked on a predictive (classification) model. I used Word2Vec to convert the data in the textual columns to numeric form, after which I ran the machine learning algorithms.
I have the following doubts regarding the working of Word2Vec:
What I have tried:
1) When I check the vector representation of each word in a sentence, I get an array of 100 numbers. What do all these numbers mean? I know that each number corresponds to a dimension, but what is a dimension in this context (with regard to the vector space)?
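To show what I mean, here is a minimal NumPy sketch of the lookup I am doing (the vocabulary, word choices, and random matrix are made up purely for illustration; a real Word2Vec model would have learned these values during training):

```python
import numpy as np

# Hypothetical setup: a 7-word vocabulary and a Word2Vec-style
# embedding size of 100, matching the 100 numbers I see per word.
vocab = ["the", "cat", "sat", "on", "a", "soft", "mat"]
embedding_dim = 100

# A trained Word2Vec model stores one dense vector per word; here the
# matrix is filled with random values only to stand in for it.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((len(vocab), embedding_dim))

# Looking up a word returns its row: an array of 100 numbers, one per
# dimension of the vector space.
vec = embeddings[vocab.index("cat")]
print(vec.shape)  # (100,)
```

Each of the 100 positions in `vec` is one of the dimensions I am asking about.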
2) When training the Word2Vec model on a neural network, each word in a sentence is fed to the input layer, and the words are one-hot encoded. So the vector representations of the words being fed in would look like [1 0 0 0 0 0 0] and [0 0 1 0 0 0 0].
The weights connecting these inputs to the next layer (the hidden layer) are initialized randomly, and the weighted sum of the inputs is transmitted to that layer. My doubt is: what is the point of assigning random weights when the weights being multiplied by the 0s will remain 0 anyway?
How does the neural network transmit information forward with such sparse input data?
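Here is a small NumPy sketch of the first-layer step I am describing (the vocabulary size of 7 and hidden size of 5 are arbitrary, chosen only for illustration): multiplying a one-hot vector by the randomly initialized weight matrix zeroes out every row except the one matching the active word.

```python
import numpy as np

# Randomly initialized input->hidden weights for a 7-word vocabulary
# and 5 hidden units (sizes are illustrative, not from a real model).
rng = np.random.default_rng(1)
W = rng.standard_normal((7, 5))

# The word [0 0 1 0 0 0 0] from my question.
one_hot = np.zeros(7)
one_hot[2] = 1.0

# The weighted sum of inputs reaching the hidden layer.
hidden = one_hot @ W

# Every product involving a 0 vanishes, so the hidden activation is
# exactly row 2 of W: the one-hot input selects that word's weights.
print(np.allclose(hidden, W[2]))  # True
```

So the weights multiplied by the 0s do stay 0, and only one row of the weight matrix reaches the hidden layer; my question is how useful information flows through the network despite this.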