Paper link: https://arxiv.org/pdf/1301.3781.pdf
Reference:
Notes on A Neural Probabilistic Language Model (2003): https://www.cnblogs.com/yaoyaohust/p/11310774.html
- Linear regularities: vector("king") - vector("man") ≈ vector("queen") - vector("woman") (see the gensim sketch after this list)
- Syntactic and semantic regularities
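The analogy can be checked directly against pretrained vectors. A minimal sketch using gensim's KeyedVectors API; the vectors filename is an assumption, not something from these notes:

```python
# A minimal sketch, assuming a pretrained word2vec-format vectors file
# is available locally (the filename below is an assumption).
from gensim.models import KeyedVectors

vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# "king - man + woman ≈ queen": the linear regularity noted above.
result = vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
print(result)  # expected top result: 'queen'
```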
Distributed representations date back to Hinton et al., 1986.
A typical NNLM setup:
3-50 training epochs, on the order of a billion training words; sliding window width N = 10, word vector dimension D = 50-200, hidden layer width H = 500-1000, vocabulary size |V| = 10^6.
The computational complexity is dominated by the hidden-to-output layer, i.e., the H × |V| term (worked out below).
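A quick arithmetic check using the paper's per-example complexity Q = N×D + N×D×H + H×|V|, with mid-range values picked from the typical setup above (D = 100 and H = 500 are arbitrary picks from the quoted ranges):

```python
# Per-example training cost of the NNLM: Q = N*D + N*D*H + H*V
# (formula from the paper; D and H are picks from the typical ranges).
N, D, H, V = 10, 100, 500, 10**6

projection = N * D   # input -> projection layer
hidden = N * D * H   # projection -> hidden layer
output = H * V       # hidden -> output layer (dominant term)

print(projection, hidden, output)  # 1000  500000  500000000
# The H*V output term is ~1000x the hidden term, which motivates
# hierarchical softmax (next) and removing the hidden layer entirely.
```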
Hierarchical softmax with a Huffman-coded output layer cuts the output-layer complexity from |V| to roughly log2 |V|.
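A toy sketch of why a Huffman-coded tree achieves this: a word's code length equals its tree depth, frequent words sit near the root, and predicting a word costs one binary decision per level. The frequency table below is made up for illustration:

```python
# Sketch: Huffman code lengths over a toy frequency table (an assumption,
# purely for illustration), compared to the balanced-tree log2|V| baseline.
import heapq
import math

freqs = {"the": 500, "of": 300, "cat": 50, "sat": 40, "zygote": 1}

# Standard Huffman construction: repeatedly merge the two lightest nodes.
heap = [(f, i, [w]) for i, (w, f) in enumerate(freqs.items())]
heapq.heapify(heap)
depth = {w: 0 for w in freqs}
counter = len(heap)
while len(heap) > 1:
    f1, _, ws1 = heapq.heappop(heap)
    f2, _, ws2 = heapq.heappop(heap)
    for w in ws1 + ws2:
        depth[w] += 1  # each merge adds one bit to every member's code
    heapq.heappush(heap, (f1 + f2, counter, ws1 + ws2))
    counter += 1

print(depth)                  # frequent words get short codes, rare ones long
print(math.log2(len(freqs)))  # balanced-tree baseline: log2|V| ≈ 2.3
# In hierarchical softmax, predicting a word costs one binary decision per
# tree level, so the |V| output units shrink to ~log2|V| decisions.
```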
Consider removing the hidden layer.
Two architectures: CBOW and Skip-gram (sketched below).
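A minimal sketch of how the two architectures turn a sentence into training examples: CBOW predicts the center word from its context, Skip-gram predicts each context word from the center word. The sentence and window size 2 are arbitrary choices here:

```python
# Generate (input -> target) training examples for both architectures.
tokens = ["the", "quick", "brown", "fox", "jumps"]
window = 2  # an arbitrary choice for illustration

for i, center in enumerate(tokens):
    context = [tokens[j]
               for j in range(max(0, i - window), min(len(tokens), i + window + 1))
               if j != i]
    print("CBOW:     ", context, "->", center)  # one example per position
    for ctx in context:
        print("Skip-gram:", center, "->", ctx)  # one example per pair
```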
With more data and higher-dimensional vectors:
Google News corpus: 6 billion tokens; vocabulary restricted to the 1 million most common words, with most experiments using the 30,000 most frequent words.
Three training epochs, with the learning rate starting at 0.025 and decaying linearly over time.
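A sketch of that schedule: the learning rate decays linearly with the fraction of training words processed. The min_alpha floor of 0.0001 matches gensim's default and is an assumption here, not a number from these notes:

```python
# Linear learning-rate decay over 3 epochs on a ~6B-token corpus,
# as in the setup above. min_alpha = 0.0001 is an assumption
# (gensim's default floor), not a value from the notes.
epochs, total_words = 3, 6_000_000_000
alpha0, min_alpha = 0.025, 0.0001

def alpha_at(words_seen: float) -> float:
    progress = words_seen / (epochs * total_words)
    return max(min_alpha, alpha0 * (1.0 - progress))

for frac in (0.0, 0.5, 1.0):
    print(f"{frac:.0%} trained -> alpha = {alpha_at(frac * epochs * total_words):.5f}")
```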