Deep learning sentence translation

Sequence-to-sequence model

(Figure: sequence-to-sequence model)
A machine translation model is closely related to a language model: it can be viewed as a conditional language model whose output is conditioned on the input sentence. The model:
(Figure: encoder (green) and decoder (purple))
Machine translation consists of two steps: green encoding and purple decoding.
The green encoder reads and memorizes the input sentence; its result is then passed to the purple decoder, which generates the translation.
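To make the two steps concrete, here is a minimal PyTorch sketch (not the post's original code) in which an encoder LSTM reads and summarizes the source sentence and a decoder LSTM generates the target sentence from that summary; all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, emb=64, hidden=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb)
        self.encoder = nn.LSTM(emb, hidden, batch_first=True)   # "green" encoder
        self.decoder = nn.LSTM(emb, hidden, batch_first=True)   # "purple" decoder
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src, tgt):
        # Encoding: read the whole input sentence, keep only the final state.
        _, state = self.encoder(self.src_emb(src))
        # Decoding: generate the output sequence conditioned on that state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt), state)
        return self.out(dec_out)                 # scores over the target vocabulary

model = Seq2Seq(src_vocab=1000, tgt_vocab=1000)
logits = model(torch.randint(0, 1000, (2, 7)),   # a batch of 2 source sentences
               torch.randint(0, 1000, (2, 5)))   # the (shifted) target sentences
print(logits.shape)                              # torch.Size([2, 5, 1000])
```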

1. Beam search algorithm

The specific process is as follows:

  1. Initialization: choose the beam width k (the number of candidates to keep), create an initial candidate sequence containing only a special start symbol, and set its initial probability to 1 (log probability 0).
  2. Generate candidate sequences: at the next time step, consider the possible next words for each candidate and, based on their probabilities, keep the k most likely ones.
  3. Pruning: for each of the k retained candidates, generate its k most likely next words; from these k^2 extended sequences, keep the k with the highest probability as the new candidates.
  4. Termination: stop when an end-of-sequence symbol is produced or the preset maximum sequence length is reached.
  5. Choose the best sequence: return the candidate with the highest overall probability (a code sketch of the whole procedure follows this list).
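A minimal pure-Python sketch of these five steps, scoring candidates by the product of probabilities as described above; the toy `next_word_probs` model and all names are illustrative assumptions.

```python
import heapq

def beam_search(next_word_probs, k=3, max_len=10, sos="<s>", eos="</s>"):
    """Minimal beam search over a toy model.

    next_word_probs(seq) must return a dict {word: P(word | seq)}.
    Scores are products of probabilities, as described above.
    """
    # 1. Initialization: one candidate holding the start symbol, probability 1.
    beams = [([sos], 1.0)]
    finished = []

    for _ in range(max_len):
        candidates = []
        # 2. Expand every kept sequence with its possible next words.
        for seq, prob in beams:
            for word, p in next_word_probs(seq).items():
                candidates.append((seq + [word], prob * p))
        # 3. Pruning: keep only the k highest-probability sequences.
        beams = heapq.nlargest(k, candidates, key=lambda c: c[1])
        # 4. Termination: set aside sequences that produced the end symbol.
        finished += [b for b in beams if b[0][-1] == eos]
        beams = [b for b in beams if b[0][-1] != eos]
        if not beams:
            break

    # 5. Choose the best sequence among everything that survived.
    return max(finished + beams, key=lambda c: c[1])

# Toy usage: next-word probabilities depend only on the previous word.
probs = {"<s>": {"hello": 0.6, "hi": 0.4},
         "hello": {"world": 0.9, "</s>": 0.1},
         "hi": {"there": 0.8, "</s>": 0.2},
         "world": {"</s>": 1.0},
         "there": {"</s>": 1.0}}
print(beam_search(lambda seq: probs.get(seq[-1], {"</s>": 1.0}), k=2))
```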

Advantages:

  • Compared with exhaustively searching every possible sequence, it is far more efficient because it limits the size of the search space, while still exploring more candidates than greedy search;
  • It is space efficient: only k candidates are kept at any time, so sequence generation remains possible under resource constraints.

Disadvantages:

  • It can settle on a locally optimal solution, so the truly best sequence has a higher chance of being eliminated.
    (In more detail: 1. Early pruning: only the k candidate sequences with the best scores are retained, and a discarded sequence may have been the best translation. 2. Cumulative errors: the choice at each time step is only locally optimal, and errors accumulate and cannot be corrected later.)
  • Lack of diversity: the retained candidates often share long common prefixes, which is likewise a consequence of pruning.
  • The beam width k must be chosen in advance.
  • Multiplying the per-step probabilities of a sequence yields an extremely small value, i.e. numerical underflow.

2. Improved beam search algorithm

Replace the product of probabilities with a sum of log probabilities.
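A small illustration of why this helps (the numbers are made up): multiplying many per-step probabilities drives the score toward zero and eventually underflows, while summing their logarithms keeps it in a safe range and preserves the ranking of candidates. In practice a length-normalization term is often added on top of the log-probability sum as well.

```python
import math

# An exaggerated example: 400 decoding steps, each with probability 0.1.
step_probs = [0.1] * 400

product, log_sum = 1.0, 0.0
for p in step_probs:
    product *= p              # drifts toward zero and finally underflows to 0.0
    log_sum += math.log(p)    # stays in a comfortable floating-point range

print(product)    # 0.0 -- the true value 1e-400 cannot be represented
print(log_sum)    # about -921.0, still easy to compare between candidates
```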

3. Attention model

The longer the sentence, the worse the basic encoder-decoder performs and the lower the translation score. How can this be solved?
Attention model:
(Figure: attention model)

  1. Encoder: an RNN or a variant such as LSTM or GRU; it converts each input word into a fixed-length hidden vector.
  2. Decoder: an RNN that generates the output sequence step by step.
  3. Attention mechanism: the decoder's hidden state at the current time step is compared with the encoder's hidden state at every time step to obtain similarity scores. This lets the decoder focus on specific parts of the input sequence while decoding, helping it generate accurate output.
  4. Compute attention weights: the similarity scores are usually computed with a small feedforward neural network or a simple dot-product operation and then normalized (e.g. with a softmax) into attention weights.
  5. The attention weights are multiplied with the encoder output at each time step and summed, giving a weighted summary (context vector) of the encoder outputs.
  6. This weighted context vector is fed into the decoder's prediction layer together with the decoder's hidden state at the current time step and is used to predict the next output word or token.

Specific examples:
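Here is a small NumPy sketch of steps 3 to 6 above, using simple dot-product attention; every array size and name below is an illustrative assumption.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

T, h = 6, 8                                  # 6 encoder time steps, hidden size 8
rng = np.random.default_rng(0)
enc_states = rng.normal(size=(T, h))         # encoder hidden state at each step
dec_state = rng.normal(size=h)               # decoder hidden state, current step

# Steps 3/4: similarity scores (dot product) -> normalized attention weights.
scores = enc_states @ dec_state              # shape (T,)
weights = softmax(scores)                    # sum to 1 over the input positions

# Step 5: weighted sum of encoder states = context vector.
context = weights @ enc_states               # shape (h,)

# Step 6: context vector + decoder state feed the prediction layer
# (here just a random projection to a toy vocabulary of 10 words).
W = rng.normal(size=(2 * h, 10))
logits = np.concatenate([context, dec_state]) @ W
print(weights.round(2), logits.argmax())
```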

4. Speech recognition

audio clip → transcript
spectrogram → attention network (bidirectional LSTM)
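A minimal PyTorch sketch of the encoder half of such a pipeline, assuming 80-dimensional spectrogram frames; an attention decoder like the one in section 3 would then attend over these encoder states to produce the transcript.

```python
import torch
import torch.nn as nn

# Bidirectional LSTM encoder over spectrogram frames (here: 80 features per frame).
encoder = nn.LSTM(input_size=80, hidden_size=256,
                  batch_first=True, bidirectional=True)

spectrogram = torch.randn(1, 500, 80)      # 1 audio clip, 500 frames, 80 features
enc_states, _ = encoder(spectrogram)       # (1, 500, 512): one state per frame
# An attention decoder (as in section 3) would attend over enc_states
# and emit the transcript one character or word at a time.
print(enc_states.shape)
```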


Origin blog.csdn.net/qq_53982314/article/details/131110775