Efficient Estimation of Word Representations in Vector Space (2013): paper notes

Paper link: https://arxiv.org/pdf/1301.3781.pdf

Reference:

A Neural Probabilistic Language Model (2003) paper notes: https://www.cnblogs.com/yaoyaohust/p/11310774.html

 

- Linear regularities: vector("king") - vector("man") ≈ vector("queen") - vector("woman"), i.e., vector("king") - vector("man") + vector("woman") ≈ vector("queen") (see the numpy sketch after this list).

- Syntactic and semantic regularities: similar vector offsets capture both grammatical and meaning-level relations.
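A minimal numpy sketch of how such an analogy question is answered: find the vocabulary word whose vector is closest, by cosine similarity, to vector("king") - vector("man") + vector("woman"). The tiny 3-dimensional toy vectors below are invented purely for illustration; real word2vec vectors have tens to hundreds of dimensions.

```python
import numpy as np

# Toy vectors, made up for illustration only.
vecs = {
    "king":  np.array([0.8, 0.7, 0.1]),
    "queen": np.array([0.8, 0.1, 0.7]),
    "man":   np.array([0.2, 0.9, 0.1]),
    "woman": np.array([0.2, 0.1, 0.9]),
    "apple": np.array([0.1, 0.4, 0.2]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Analogy test: vector("king") - vector("man") + vector("woman")
# should be closest (by cosine similarity) to vector("queen").
target = vecs["king"] - vecs["man"] + vecs["woman"]
candidates = [w for w in vecs if w not in {"king", "man", "woman"}]
print(max(candidates, key=lambda w: cosine(vecs[w], target)))  # queen
```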

 

Distributed representations of words go back to Hinton et al. (1986).

A typical NNLM configuration:

3-50 training epochs, on the order of a billion training samples, sliding window width N = 10, vector dimension D = 50-200, hidden layer width H = 500-1000, vocabulary size |V| = 10^6.

The computational complexity is dominated by the hidden-to-output layer, i.e., the H * |V| term.
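Plugging the typical values above into the paper's per-example complexity formula for the NNLM, Q = N*D + N*D*H + H*|V|, makes the dominance of the output layer concrete. A back-of-the-envelope sketch, picking D = 200 and H = 500 from the ranges quoted above:

```python
# Per-training-example complexity of the NNLM: Q = N*D + N*D*H + H*|V|
N, D, H, V = 10, 200, 500, 10**6   # values from the typical configuration above

projection = N * D        # input -> projection layer:        2,000
hidden     = N * D * H    # projection -> hidden layer:   1,000,000
output     = H * V        # hidden -> output softmax:   500,000,000

print(projection, hidden, output)
# The H*|V| output term is roughly 500x larger than everything else combined.
```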

Hierarchical softmax with a Huffman-coded output layer reduces the output-layer complexity from |V| to about log2(|V|).

The paper goes further and removes the hidden layer entirely.
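Roughly what these two changes buy, using the same values and the paper's complexity formulas: the Huffman tree replaces the |V|-way softmax with about log2(|V|) binary decisions, and dropping the hidden layer removes the N*D*H term, leaving a CBOW-style cost of N*D + D*log2(|V|). A sketch of the arithmetic:

```python
import math

N, D, H, V = 10, 200, 500, 10**6

nnlm      = N * D + N * D * H + H * V             # full NNLM
nnlm_hs   = N * D + N * D * H + H * math.log2(V)  # + hierarchical softmax
no_hidden = N * D + D * math.log2(V)              # hidden layer removed (CBOW-style)

print(f"{nnlm:.1e}  {nnlm_hs:.1e}  {no_hidden:.1e}")
# ~5.0e+08  ~1.0e+06  ~6.0e+03 operations per training example
```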

 

Two architectures: CBOW (predict the current word from its averaged context) and Skip-gram (predict the surrounding context words from the current word).
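A hedged sketch of how the two architectures are typically selected with the gensim library (assuming gensim >= 4.0; the parameter names and the toy corpus are gensim's and mine, not the paper's):

```python
from gensim.models import Word2Vec

sentences = [["the", "king", "rules", "the", "kingdom"],
             ["the", "queen", "rules", "the", "kingdom"]]   # toy corpus

# sg=0: CBOW, predict the current word from the averaged context window
cbow = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=0)

# sg=1: Skip-gram, predict the context words from the current word
skipgram = Word2Vec(sentences, vector_size=100, window=5, min_count=1, sg=1)

print(cbow.wv["king"].shape)                      # (100,)
print(skipgram.wv.most_similar("king", topn=2))
```

In the paper's comparisons, Skip-gram does better on the semantic analogy questions, while CBOW trains faster and does slightly better on the syntactic ones.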

 

Scaling up with more data and higher-dimensional vectors:

Google News corpus: about 6 billion tokens, vocabulary limited to the 1 million most frequent words, with the 30,000 most common words used for the smaller comparison experiments.

Three training epochs, with a starting learning rate of 0.025 decayed linearly over the course of training.
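A sketch of that learning-rate schedule (the starting value 0.025 is from the paper; the floor value and the per-word step granularity below are illustrative assumptions, not the paper's code):

```python
def learning_rate(words_seen, total_words, alpha0=0.025, alpha_min=1e-4):
    """Decay the learning rate linearly from alpha0 toward zero over training."""
    alpha = alpha0 * (1.0 - words_seen / total_words)
    return max(alpha, alpha_min)   # small floor, an assumption for stability

print(learning_rate(0, 1_000_000))          # 0.025   (start of training)
print(learning_rate(500_000, 1_000_000))    # 0.0125  (halfway)
print(learning_rate(1_000_000, 1_000_000))  # 0.0001  (floor at the end)
```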

 

Source: https://www.cnblogs.com/yaoyaohust/p/11310905.html