Core WFST Algorithms for Speech Recognition (3. Determinization)

This article covers the determinization operation on WFSTs.

First, what is determinization? The OpenFst website describes it as follows: "The result will be an equivalent FST that has the property that no state has two transitions with the same input label."
Deterministic means that, among all the transitions leaving any given state, no two may carry the same input label.
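The property can be checked directly from the arc list. The snippet below is a minimal sketch, assuming a hypothetical arc representation `(src_state, input_label, output_label, weight, dst_state)`; it is not the OpenFst API.

```python
def is_deterministic(arcs):
    """Return True if no state has two outgoing arcs with the same input label.

    Each arc is a tuple (src_state, input_label, output_label, weight, dst_state).
    This tuple layout is an assumption made for illustration only.
    """
    seen = set()
    for src, ilabel, _olabel, _weight, _dst in arcs:
        if (src, ilabel) in seen:
            return False  # a second arc with the same input label leaves src
        seen.add((src, ilabel))
    return True


# State 0 has two arcs labeled "a": not deterministic.
nondet_arcs = [(0, "a", "x", 1.0, 1), (0, "a", "y", 2.0, 2)]
# State 0 has one arc per input label: deterministic.
det_arcs = [(0, "a", "x", 1.0, 1), (0, "b", "y", 2.0, 2)]

print(is_deterministic(nondet_arcs), is_deterministic(det_arcs))
```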
Note: for a WFST to be determinizable, it must be functional.
A transducer is functional if each input string is transduced to a unique output string. There may be multiple paths, however, that contain this input and output string pair.
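So for a WFST to be determinizable, any successful paths that share the same input string must produce exactly the same output string. Several successful paths carrying the identical input/output pair are still allowed.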
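For a small acyclic transducer, the functional property can be tested by brute force: enumerate every successful path and check that each input string maps to a single output string. This is only a sketch under stated assumptions: the arc layout `(src, input_label, output_label, dst)` is hypothetical, weights are left out because they do not affect functionality, epsilon is modeled as the empty string, and the enumeration would not terminate on a cyclic FST.

```python
from collections import defaultdict


def successful_paths(arcs, start, finals):
    """Yield (input_string, output_string) for every successful path.

    Assumes the FST is acyclic; arcs are (src, input_label, output_label, dst).
    """
    outgoing = defaultdict(list)
    for src, ilabel, olabel, dst in arcs:
        outgoing[src].append((ilabel, olabel, dst))
    stack = [(start, "", "")]
    while stack:
        state, istr, ostr = stack.pop()
        if state in finals:
            yield istr, ostr
        for ilabel, olabel, dst in outgoing[state]:
            stack.append((dst, istr + ilabel, ostr + olabel))


def is_functional(arcs, start, finals):
    """True if every input string is transduced to exactly one output string."""
    outputs = defaultdict(set)
    for istr, ostr in successful_paths(arcs, start, finals):
        outputs[istr].add(ostr)
    return all(len(outs) == 1 for outs in outputs.values())


# Two distinct paths, both mapping "ab" to "xy": still functional.
functional_arcs = [(0, "a", "x", 1), (0, "a", "x", 2),
                   (1, "b", "y", 3), (2, "b", "y", 3)]
# "ab" maps to both "xy" and "zy": not functional.
ambiguous_arcs = [(0, "a", "x", 1), (0, "a", "z", 2),
                  (1, "b", "y", 3), (2, "b", "y", 3)]

print(is_functional(functional_arcs, 0, {3}), is_functional(ambiguous_arcs, 0, {3}))
```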

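OK, that covers what determinization is. Next comes the algorithm itself; the pseudocode is as follows:
[Figure: pseudocode of the determinization algorithm]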

The determinization algorithm is slightly more complex than composition. As before, it is explained line by line.

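In composition, each state is a triple made of a T1 state, a T2 state, and a filter (F) state. In determinization, each state is a set of triples (state, leftover label sequence, residual weight). Initialize a set of triples containing the single element (start state, epsilon, initial weight) and put it into the queue S (lines 1-3).

While S is not empty, take out one set of triples. For each input label x, collect all transitions with input label x that leave any state in the set, and do the following. Line 8 takes the longest common prefix of their output sequences (each leftover label sequence followed by the transition's output label) as y′; if there is no common prefix, y′ is epsilon. Line 9 adds each transition's weight to the residual weight of its source triple and takes the minimum of these sums as w′; note that the semiring here is still the tropical semiring, so ⊗ is + and ⊕ is min. Line 10 builds the new set q′ of triples (state, leftover label sequence, residual weight): the transition's next state, the output sequence with the common prefix y′ removed, and the summed weight with w′ removed. q′ can be seen as one new state of the determinized graph, and a transition labeled x : y′ / w′ into q′ is added to that graph.

If q′ contains a final state of the original WFST, q′ is also a new final state (lines 12-13), and its final weight is the residual weight plus the weight of that final state (lines 14-16). Note that a leftover sequence (z) may still remain at this point; there are several ways to handle it, and Kaldi expands it. Finally, if this set of triples has not appeared before, it is put into the queue (line 18).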
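The steps above can be sketched in Python. This is an illustrative sketch under several assumptions, not the OpenFst or Kaldi implementation: the names `determinize` and `common_prefix` and the dictionary-based FST layout are hypothetical, every label is a single character so that string prefixes match label-sequence prefixes, epsilon is the empty string, weights are exactly comparable numbers in the tropical semiring, and a leftover sequence at a final state is stored as a final output string rather than expanded. The loop is not guaranteed to terminate on an input that is not determinizable.

```python
from collections import deque


def common_prefix(strings):
    """Longest common prefix of a non-empty list of strings ("" acts as epsilon)."""
    prefix = strings[0]
    for s in strings[1:]:
        n = 0
        while n < min(len(prefix), len(s)) and prefix[n] == s[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


def determinize(arcs, start, finals, initial_weight=0):
    """arcs: {state: [(ilabel, olabel, weight, next_state), ...]}
    finals: {state: final_weight}
    Returns (start_subset, det_arcs, det_finals); each subset is a frozenset
    of (state, leftover label sequence, residual weight) triples."""
    start_subset = frozenset({(start, "", initial_weight)})
    det_arcs = {}    # subset -> [(x, y', w', next_subset), ...]
    det_finals = {}  # subset -> (leftover output, final weight)
    queue = deque([start_subset])
    seen = {start_subset}
    while queue:
        p = queue.popleft()
        det_arcs[p] = []
        # A subset is final if it contains an original final state.
        final_triples = [(residual + finals[s], leftover)
                         for s, leftover, residual in p if s in finals]
        if final_triples:
            weight, leftover = min(final_triples)
            det_finals[p] = (leftover, weight)
        # Group every outgoing transition of the subset by input label x.
        by_label = {}
        for state, leftover, residual in p:
            for ilabel, olabel, weight, dst in arcs.get(state, []):
                by_label.setdefault(ilabel, []).append(
                    (leftover + olabel, residual + weight, dst))
        for x, cands in sorted(by_label.items()):
            y = common_prefix([out for out, _, _ in cands])  # line 8: y'
            w = min(total for _, total, _ in cands)          # line 9: w'
            merged = {}                                      # line 10: q'
            for out, total, dst in cands:
                key = (dst, out[len(y):])                    # strip the prefix y'
                merged[key] = min(merged.get(key, float("inf")), total - w)
            q = frozenset((dst, rest, res) for (dst, rest), res in merged.items())
            det_arcs[p].append((x, y, w, q))
            if q not in seen:                                # line 18: enqueue new subsets
                seen.add(q)
                queue.append(q)
    return start_subset, det_arcs, det_finals


# Hypothetical example: "ab" -> "xz" with weight 4, "ac" -> "yz" with weight 3.
example_arcs = {
    0: [("a", "x", 1, 1), ("a", "y", 2, 2)],
    1: [("b", "z", 3, 3)],
    2: [("c", "z", 1, 3)],
}
d_start, d_arcs, d_finals = determinize(example_arcs, 0, {3: 0})
for subset, out_arcs in d_arcs.items():
    for x, y, w, q in out_arcs:
        print(sorted(subset), "--", x, ":", repr(y), "/", w, "->", sorted(q))
```

In this example the two arcs labeled "a" leaving state 0 have no common output prefix, so the merged arc outputs epsilon and the outputs "x" and "y" are carried forward as leftover sequences until the next input label disambiguates them.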
An example is given below:
[Figure: determinization example]
