CRF and HMM
Hidden Markov Model
Principle
Code
Conditional Random Fields
Principle
A CRF is obtained by extending the probabilistic undirected graphical model (Markov Random Field): the joint probability distribution $P(Y)$ can be written as a product, over all maximal cliques $C$ of the graph, of potential functions $\Psi_{C}(Y_{C})$, where $Y_{C}$ denotes the random variables corresponding to clique $C$, i.e.,
$$P(Y)=\frac{1}{Z} \prod_{C} \Psi_{C}(Y_{C})$$
where $Z$ is a normalizing factor that ensures $P(Y)$ is a probability distribution. To guarantee that $\Psi_{C}(Y_{C})$ is strictly positive, it is usually defined as an exponential function:
$$\Psi_{C}(Y_{C})=\exp\left( -E(Y_{C}) \right)$$
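As a minimal sketch of the two formulas above (using a hypothetical three-variable chain MRF with made-up energies), the product of clique potentials can be normalized by brute force:

```python
import itertools
import math

# Hypothetical MRF over three binary variables Y = (y1, y2, y3) arranged in a
# chain, so the maximal cliques are C1 = {y1, y2} and C2 = {y2, y3}.
def energy(a, b):
    return 0.0 if a == b else 1.0       # made-up E(Y_C): favor equal neighbors

def potential(a, b):
    return math.exp(-energy(a, b))      # Psi_C(Y_C) = exp(-E(Y_C))

def score(y):
    y1, y2, y3 = y
    return potential(y1, y2) * potential(y2, y3)   # product over cliques

# Normalizing factor Z sums the product over all assignments of Y.
Z = sum(score(y) for y in itertools.product([0, 1], repeat=3))

def P(y):
    return score(y) / Z

# The normalized values form a probability distribution (they sum to 1).
print(sum(P(y) for y in itertools.product([0, 1], repeat=3)))
```

Because the energies favor equal neighbors, assignments such as $(0,0,0)$ receive higher probability than $(0,1,0)$.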
A conditional random field is the Markov random field of the random variable $Y$ given the random variable $X$ as a condition. Here we consider the conditional random field defined on a linear chain: $X$ denotes the input observation sequence, and $Y$ denotes the corresponding output state (label) sequence.
The conditional probability model of a linear-chain CRF is as follows:
$$P(y|x)=\frac{1}{Z(x)}\exp \left( \sum_{i,k} \lambda_{k}t_{k}(y_{i-1},y_{i},x,i) + \sum_{i,l}\mu_{l}s_{l}(y_{i},x,i) \right)$$
$t_{k}$ is a transition feature function defined on the edges; it depends on the current and the previous position. $s_{l}$ is a state feature function defined on the nodes; it depends only on the current position. Typically, the feature functions $t_{k}$ and $s_{l}$ take values 1 or 0, and $\lambda_{k}$ and $\mu_{l}$ are the weights of the corresponding feature functions.
A CRF is in fact a log-linear model defined on sequential data.
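The conditional probability formula can be evaluated directly on a tiny example. The following is a sketch with hypothetical binary feature functions and made-up weights; $Z(x)$ is computed by brute force over all label sequences (real implementations use the forward algorithm instead):

```python
import itertools
import math

# One hypothetical transition feature t_k(y_{i-1}, y_i, x, i) with weight lambda_k.
def t1(y_prev, y_cur, x, i):
    return 1 if y_prev == y_cur else 0      # reward label continuity

# One hypothetical state feature s_l(y_i, x, i) with weight mu_l.
def s1(y_cur, x, i):
    return 1 if y_cur == x[i] else 0        # reward matching the input symbol

LAMBDA1, MU1 = 0.8, 1.5                     # made-up weights

def unnormalized_score(y, x):
    s = sum(MU1 * s1(y[i], x, i) for i in range(len(y)))
    s += sum(LAMBDA1 * t1(y[i - 1], y[i], x, i) for i in range(1, len(y)))
    return math.exp(s)

def conditional_probability(y, x):
    Z = sum(unnormalized_score(yy, x)
            for yy in itertools.product([0, 1], repeat=len(x)))
    return unnormalized_score(y, x) / Z

x = (0, 1, 1, 0)
print(conditional_probability(x, x))   # with these features, copying x scores highest
```

Note that the feature functions here are illustrative, not part of any standard feature set; in practice they are extracted from the training data.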
Code
Interview Questions
The three basic problems of HMMs:
(1) Probability calculation
Given the model $\lambda$ and the observation sequence $O$, compute the probability $P(O|\lambda)$ that the model generates the observation sequence. In other words, how well does the model match the observed sequence?
(2) Learning problem
Given the observation sequence $O=(o_{1},o_{2},...,o_{T})$, estimate the model parameters $\lambda=(A,B,\pi)$ such that $P(O|\lambda)$ is maximized. In other words, how to train the model so that it best describes the observed data?
(3) Prediction problem
Also known as the decoding problem. Given the model $\lambda$ and the observation sequence $O$, find the most likely state sequence $I=(i_{1},i_{2},...,i_{T})$. In other words, how to infer the hidden state sequence from the observation sequence?
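Problem (1) can be sketched concretely with the forward algorithm. The two-state HMM below is hypothetical, with made-up parameters $\lambda=(A,B,\pi)$:

```python
# Hypothetical 2-state HMM with binary observations.
# A[i][j]: transition prob, B[i][o]: emission prob, pi[i]: initial prob.
A = [[0.7, 0.3],
     [0.4, 0.6]]
B = [[0.9, 0.1],   # state 0 mostly emits observation 0
     [0.2, 0.8]]   # state 1 mostly emits observation 1
pi = [0.6, 0.4]

def forward(O):
    # alpha[i] = P(o_1 .. o_t, state_t = i | lambda), updated left to right.
    alpha = [pi[i] * B[i][O[0]] for i in range(2)]
    for o in O[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(2)) * B[j][o]
                 for j in range(2)]
    return sum(alpha)    # P(O | lambda)

print(forward([0, 1, 0]))
```

Summing $\alpha$ over states at each step costs $O(TN^{2})$, versus $O(TN^{T})$ for enumerating all state sequences.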
The basic algorithms for solving the three problems:
(1) Forward-backward algorithm
(2) Baum-Welch algorithm (EM)
(3) Viterbi algorithm
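The Viterbi algorithm for the prediction problem can be sketched on the same kind of toy HMM (hypothetical parameters, binary observations):

```python
# Viterbi sketch: most likely state sequence for a hypothetical 2-state HMM.
A = [[0.7, 0.3], [0.4, 0.6]]    # transition probabilities
B = [[0.9, 0.1], [0.2, 0.8]]    # emission probabilities
pi = [0.6, 0.4]                 # initial distribution

def viterbi(O):
    # delta[i]: max prob of any state path ending in state i at time t.
    delta = [pi[i] * B[i][O[0]] for i in range(2)]
    back = []                   # backpointers psi for each step
    for o in O[1:]:
        psi = [max(range(2), key=lambda i: delta[i] * A[i][j])
               for j in range(2)]
        delta = [delta[psi[j]] * A[psi[j]][j] * B[j][o] for j in range(2)]
        back.append(psi)
    # Backtrack from the best final state.
    state = max(range(2), key=lambda i: delta[i])
    path = [state]
    for psi in reversed(back):
        state = psi[state]
        path.append(state)
    return list(reversed(path))

print(viterbi([0, 0, 1, 1]))    # -> [0, 0, 1, 1]
```

Viterbi replaces the sum in the forward recursion with a max, and the backpointers recover the argmax path.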
Relations among HMM, CRF, and LR
(1) Both CRF and LR are log-linear models; a CRF can be viewed as the sequence version of logistic regression. Logistic regression is a log-linear model for classification, while a CRF is a log-linear model for sequence labeling.
(2) HMM is a generative model, while LR and CRF are discriminative models.
(3) HMM can be used for sequence labeling tasks. Every HMM can be represented as a CRF, but a CRF can additionally define a much broader class of feature functions and use arbitrary feature weights, so CRFs are more expressive than HMMs.