[Machine Learning] HMM and CRF

CRF and HMM

Hidden Markov Model

Principle

Code
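
A minimal sketch of the forward algorithm for computing $P(O|\lambda)$, the evaluation problem discussed in the Interview Questions section below. This is not from the original post; the transition matrix `A`, emission matrix `B`, and initial distribution `pi` are made-up toy values.

```python
import numpy as np

def forward(A, B, pi, obs):
    """Forward algorithm: compute P(O | lambda) for an HMM.

    A   : (N, N) transition matrix, A[i, j] = P(q_{t+1} = j | q_t = i)
    B   : (N, M) emission matrix,  B[i, k] = P(o_t = k | q_t = i)
    pi  : (N,)   initial state distribution
    obs : list of observation symbol indices o_1, ..., o_T
    """
    alpha = pi * B[:, obs[0]]          # alpha_1(i) = pi_i * b_i(o_1)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # alpha_t(j) = [sum_i alpha_{t-1}(i) a_ij] * b_j(o_t)
    return alpha.sum()                 # P(O | lambda) = sum_i alpha_T(i)

# Toy example (made-up numbers): 2 hidden states, 2 observation symbols
A  = np.array([[0.7, 0.3],
               [0.4, 0.6]])
B  = np.array([[0.9, 0.1],
               [0.2, 0.8]])
pi = np.array([0.6, 0.4])
print(forward(A, B, pi, [0, 1, 0]))   # probability of observing the sequence 0, 1, 0
```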

 

Conditional Random Fields

Principle

A CRF is obtained by extending the probabilistic undirected graphical model (Markov random field). The joint probability distribution $P(Y)$ of a probabilistic undirected graph can be expressed as a product of potential functions $\Psi_{C}(Y_{C})$ over all maximal cliques $C$ of the graph, where $Y_{C}$ denotes the random variables corresponding to clique $C$, i.e.,

$$P(Y)=\frac{1}{Z} \prod_{C} \Psi_{C}(Y_{C})$$

Here $Z$ is a normalizing factor that ensures $P(Y)$ is a valid probability distribution. To guarantee that the potential function $\Psi_{C}(Y_{C})$ is strictly positive, it is usually defined as an exponential function:

$$\Psi_{C}(Y_{C})=\exp\left( -E(Y_{C}) \right)$$
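
To make the clique factorization concrete, here is a minimal sketch (not from the original post) that computes $Z$ and $P(Y)$ by brute force for a toy chain of three binary variables, using a made-up energy table $E(Y_{C})$ shared by both edge cliques:

```python
import numpy as np
from itertools import product

# Toy MRF: chain y1 - y2 - y3 of binary variables; the maximal cliques
# are the edges (y1, y2) and (y2, y3). The energy table is made up.
E = np.array([[0.0, 1.0],
              [1.0, 0.5]])

def unnormalized(y):
    # prod_C Psi_C(Y_C) with Psi_C(Y_C) = exp(-E(Y_C))
    return np.exp(-E[y[0], y[1]]) * np.exp(-E[y[1], y[2]])

# Z sums the unnormalized product over all 2^3 configurations
Z = sum(unnormalized(y) for y in product([0, 1], repeat=3))
print(unnormalized((0, 0, 0)) / Z)  # P(Y = (0, 0, 0))
```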

A conditional random field is a Markov random field over the random variables $Y$, given the random variables $X$. Here we consider the linear-chain conditional random field, where $X$ denotes the input observation sequence and $Y$ denotes the corresponding output state (label) sequence.

 

The conditional probability model of the linear-chain conditional random field is:

$$P(y|x)=\frac{1}{Z(x)}\exp \left( \sum_{i,k} \lambda_{k}t_{k}(y_{i-1},y_{i},x,i) + \sum_{i,l}\mu_{l}s_{l}(y_{i},x,i) \right)$$

$t_{k}$ is a feature function defined on the edges, called a transition feature, which depends on the current and the previous position; $s_{l}$ is a feature function defined on the nodes, called a state feature, which depends only on the current position. Typically the feature functions $t_{k}$ and $s_{l}$ take values 1 or 0, and $\lambda_{k}$ and $\mu_{l}$ are the weights of the corresponding feature functions.

A CRF is in fact a log-linear model defined over sequential data.

 

Code
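
A minimal sketch of the conditional probability formula above, not from the original post: the transition feature $t_1$, state feature $s_1$, and the weights are made-up examples, and $Z(x)$ is computed by brute-force enumeration, which is only feasible at toy scale.

```python
import numpy as np
from itertools import product

def t1(y_prev, y, x, i):
    """Transition feature t_k: fires when the label does not change."""
    return 1.0 if y_prev == y else 0.0

def s1(y, x, i):
    """State feature s_l: fires when the label matches the observation."""
    return 1.0 if y == x[i] else 0.0

LAMBDA, MU = 0.8, 1.5  # weights lambda_k and mu_l (made-up values)

def score(y, x):
    """Unnormalized log score: weighted sum of transition and state features."""
    s = sum(MU * s1(y[i], x, i) for i in range(len(x)))
    s += sum(LAMBDA * t1(y[i - 1], y[i], x, i) for i in range(1, len(x)))
    return s

def prob(y, x):
    """P(y | x) = exp(score(y, x)) / Z(x), enumerating all label sequences."""
    Z = sum(np.exp(score(yp, x)) for yp in product([0, 1], repeat=len(x)))
    return np.exp(score(y, x)) / Z

x = [0, 1, 1]              # toy observation sequence
print(prob((0, 1, 1), x))  # probability of the labeling that matches x
```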

 

Interview Questions

The three basic problems of hidden Markov models:

(1) Probability calculation

Given the model $\lambda$ and the observation sequence $O$, compute the probability $P(O|\lambda)$ of the observation sequence occurring under the model. In other words, how well does the model match the observed sequence?

(2) Learning problem

Given the observation sequence $O=(o_{1},o_{2},...,o_{T})$, estimate the model parameters $\lambda=(A,B,\pi)$ such that $P(O|\lambda)$ is maximized. In other words, how to train the model so that it best describes the observed data.

(3) Prediction problem

Also known as the decoding problem. Given the model $\lambda$ and the observation sequence $O$, find the most likely state sequence $I=(i_{1},i_{2},...,i_{T})$. In other words, how to infer the hidden state sequence from the observation sequence.

 

The basic algorithms for solving the three problems:

(1) Forward-backward algorithm (the forward pass is sketched in the Code section above)

(2) Baum-Welch algorithm (EM)

(3) Viterbi algorithm (sketched below)
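
A minimal sketch of Viterbi decoding for the prediction problem, not from the original post. It works in log space and uses the same toy `A`, `B`, and `pi` conventions as the forward-algorithm sketch in the Code section above.

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Viterbi algorithm: most likely state sequence I for observations O."""
    N, T = A.shape[0], len(obs)
    delta = np.log(pi) + np.log(B[:, obs[0]])  # log prob of best path ending in each state
    psi = np.zeros((T, N), dtype=int)          # backpointers
    for t in range(1, T):
        trans = delta[:, None] + np.log(A)     # trans[i, j]: best path reaching j via i
        psi[t] = trans.argmax(axis=0)
        delta = trans.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta.argmax())]               # best final state
    for t in range(T - 1, 0, -1):              # follow backpointers
        path.append(int(psi[t][path[-1]]))
    return path[::-1]

# Same made-up parameters as the forward-algorithm example
A  = np.array([[0.7, 0.3], [0.4, 0.6]])
B  = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.6, 0.4])
print(viterbi(A, B, pi, [0, 1, 0]))  # most likely hidden state sequence
```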

 

Relations among HMM, CRF, and LR

(1) CRF and LR are both log-linear models; a CRF can be seen as a sequential version of logistic regression. Logistic regression is a log-linear model for classification, while a CRF is a log-linear model for sequence labeling.

(2) HMM is a generative model, while LR and CRF are discriminative models.

(3) HMM can also be used for sequence labeling tasks. Every HMM can be represented as a CRF, but a CRF can define a far broader set of feature functions with arbitrary weights, so CRFs are strictly more expressive than HMMs.
