“Just as Van Gogh’s painting of sunflowers is a two-dimensional mixture of oil on canvas that represents vegetable matter in a three-dimensional space in Paris in the late 1880s, so 500 numbers arranged in a vector can represent a word or group of words.” –DL4J
Word2Vec can guess a word’s association with other words, or cluster documents and define them by topic. It turns qualities into quantities, so that similar things and ideas end up “close” to one another in its vector space, which has 500 dimensions in the example above.
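“Close” needs a distance measure, and cosine similarity (the angle between two vectors, ignoring their length) is a common choice for comparing word vectors. The sketch below computes it for three hand-made 4-dimensional vectors; the words and numbers are hypothetical stand-ins for real, much longer embeddings.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 means same direction, 0.0 orthogonal."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


# Hypothetical 4-dimensional "embeddings"; trained Word2Vec vectors are much longer (e.g. 500 numbers).
cat = [0.9, 0.1, 0.3, 0.0]
dog = [0.8, 0.2, 0.35, 0.05]
car = [0.0, 0.9, 0.0, 0.7]

print("cat/dog:", round(cosine_similarity(cat, dog), 3))
print("cat/car:", round(cosine_similarity(cat, car), 3))
```

With these made-up numbers, “cat” scores much closer to “dog” than to “car”, which is the kind of relationship a trained model is meant to capture.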
Word2Vec is not classified as “deep learning” because it is a shallow neural net with only two layers.
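To make the “two layers” concrete, here is a minimal forward-pass sketch of the skip-gram variant of Word2Vec: one weight matrix maps a word to its embedding, a second maps that embedding to a score for every vocabulary word. The class name TinySkipGram, the random initialization, and the sizes are hypothetical; training is omitted, and real implementations usually replace the full softmax with negative sampling or hierarchical softmax to keep it cheap.

```python
import math
import random


def softmax(scores):
    """Turn raw scores into probabilities (max subtracted for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class TinySkipGram:
    """Toy skip-gram network (hypothetical, for illustration only).

    Layer 1: W_in  (vocab_size x dim) maps a one-hot word to its dim-sized embedding.
    Layer 2: W_out (dim x vocab_size) maps that embedding to one score per vocabulary word.
    There is no nonlinearity between the two layers.
    """

    def __init__(self, vocab_size, dim, seed=0):
        rng = random.Random(seed)
        self.vocab_size = vocab_size
        self.dim = dim
        self.W_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
        self.W_out = [[rng.uniform(-0.5, 0.5) for _ in range(vocab_size)] for _ in range(dim)]

    def embedding(self, word_id):
        # A one-hot input times W_in simply selects row `word_id`: this row is the word vector.
        return self.W_in[word_id]

    def predict_context(self, word_id):
        h = self.embedding(word_id)
        scores = [
            sum(h[d] * self.W_out[d][j] for d in range(self.dim))
            for j in range(self.vocab_size)
        ]
        # Probability that each vocabulary word appears near `word_id`.
        return softmax(scores)


model = TinySkipGram(vocab_size=6, dim=3)
probs = model.predict_context(2)
print(len(model.embedding(2)), len(probs), round(sum(probs), 6))
```

After training, the rows of W_in are the word vectors; the output layer exists only to give the network a prediction task.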
Input → text corpus
Output → set of vectors, or neural word embeddings
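Between the corpus and the vectors, the skip-gram variant turns running text into (center word, context word) training pairs using a sliding window. The sketch below assumes whitespace tokenization and a fixed window; the function name skipgram_pairs and the window size are assumptions, and the original word2vec tool adds refinements such as subsampling frequent words and varying the window size.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


corpus = "the king spoke to the queen"
tokens = corpus.split()  # naive whitespace tokenization
pairs = skipgram_pairs(tokens, window=1)
print(pairs[:4])
```

Each pair becomes one training example: the network is asked to predict the context word from the center word.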
Rome - Italy = Beijing - China, so Rome - Italy + China = Beijing
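Learned vectors only satisfy such equations approximately, so the answer is usually taken to be the vocabulary word whose vector is most similar (here by cosine) to Rome - Italy + China, skipping the three query words. The vectors below are hypothetical and hand-built so the arithmetic works out exactly; a trained model would give a near match instead.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical hand-built vectors: dims 0-2 say which country, dim 3 is a "capital city" direction.
vectors = {
    "Italy": [1.0, 0.0, 0.0, 0.0],
    "Rome": [1.0, 0.0, 0.0, 1.0],
    "China": [0.0, 1.0, 0.0, 0.0],
    "Beijing": [0.0, 1.0, 0.0, 1.0],
    "France": [0.0, 0.0, 1.0, 0.0],
    "Paris": [0.0, 0.0, 1.0, 1.0],
}


def nearest(query, vectors, exclude):
    """Return (word, similarity) for the vector closest to `query`, skipping `exclude`."""
    best_word, best_sim = None, -math.inf
    for word, vec in vectors.items():
        if word in exclude:
            continue
        sim = cosine(query, vec)
        if sim > best_sim:
            best_word, best_sim = word, sim
    return best_word, best_sim


# Rome - Italy + China
query = [r - i + c for r, i, c in zip(vectors["Rome"], vectors["Italy"], vectors["China"])]
word, sim = nearest(query, vectors, exclude={"Rome", "Italy", "China"})
print(word, round(sim, 3))
```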
king : queen :: man : woman
house : roof :: castle : [dome, bell_tower, spire, crenellations, turrets]
China : Taiwan :: Russia : [Ukraine, Moscow, Moldova, Armenia]
knee - leg = elbow - arm
knee is to leg as elbow is to arm
knee : leg :: elbow : arm
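The three notations for the knee/elbow analogy say the same thing: “a : b :: c : d” is read as the vector equation a - b = c - d, which rearranges to d = c - a + b. The sketch below parses that notation and ranks candidates for the missing term, which is one way to produce ranked lists like the castle and Russia examples. The helper names and the king/queen/man/woman vectors are hypothetical.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def parse_analogy(text):
    """Split 'a : b :: c : d' into its four terms (d may be '?')."""
    left, right = text.split("::")
    a, b = (term.strip() for term in left.split(":"))
    c, d = (term.strip() for term in right.split(":"))
    return a, b, c, d


def as_equation(a, b, c, d):
    # 'a : b :: c : d' is read as the vector equation a - b = c - d.
    return f"{a} - {b} = {c} - {d}"


def solve(text, vectors, topn=3):
    """Rank candidates for d using d = c - a + b (rearranged from a - b = c - d)."""
    a, b, c, _ = parse_analogy(text)
    query = [vc - va + vb for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [
        (word, cosine(query, vec))
        for word, vec in vectors.items()
        if word not in {a, b, c}  # skip the words used in the question
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:topn]


# Hypothetical 3-d vectors: dims are (royal, male, female).
vectors = {
    "king": [1.0, 1.0, 0.0],
    "queen": [1.0, 0.0, 1.0],
    "man": [0.0, 1.0, 0.0],
    "woman": [0.0, 0.0, 1.0],
    "girl": [0.1, 0.1, 0.9],
    "boy": [0.1, 0.9, 0.1],
}

print(as_equation(*parse_analogy("knee : leg :: elbow : arm")))
ranked = solve("king : queen :: man : ?", vectors)
print([word for word, _ in ranked])
```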
Word2Vec implementations are available in DL4J and TensorFlow, among other libraries.