Information Extraction
Connectionist Temporal Classification
Consider the task of recognizing an M-length sequence of tokens, y_1^M, from a T-length input x_1^T. The CTC objective function is one objective for such sequence transduction tasks; it works by converting y_1^M into an alignment sequence π_1^T of the same length as x_1^T, using an additional blank symbol, ∅, to fill the extra space. Note that it assumes B^{-1}(y_1^M) represents the set of all possible T-length alignments, π_1^T, for the sequence y_1^M.

P(y_1^M | x_1^T) = \sum_{\pi_1^T \in B^{-1}(y_1^M)} \prod_{t=1}^{T} P(\pi_t | x_1^T)    (1)

We will consider that P(π_t | x_1^T) is obtained by normalizing the outputs of a neural network, φ. Element φ_{s,t} of the matrix φ can be used to compute the CTC objective, where

P(\pi_t = s | x_1^T) = \frac{\exp(\phi_{s,t})}{\sum_{s'=1}^{|S|} \exp(\phi_{s',t})},    (2)

and |S| is the number of output units in the neural network.
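To make equations (1) and (2) concrete, the following brute-force sketch (not part of the assignment) computes the CTC objective for a tiny example. The symbol set SYMS, the spelling "<b>" for the blank, the frames-as-rows layout of phi, and the helper names collapse, log_softmax, and ctc_objective are illustrative assumptions; the exhaustive enumeration is only feasible for very small T.

import itertools
import math

SYMS = ["<b>", "a", "l"]          # assumed toy output set; "<b>" stands for the blank

def collapse(path):
    """B: merge consecutive repeats, then drop blanks."""
    out = []
    for s in path:
        if out and out[-1] == s:
            continue
        out.append(s)
    return [s for s in out if s != "<b>"]

def log_softmax(row):
    """Equation (2): normalize one frame of raw network scores."""
    z = math.log(sum(math.exp(v) for v in row))
    return [v - z for v in row]

def ctc_objective(phi, target):
    """Equation (1): phi is a T x |S| list of raw scores, one row per frame."""
    logp = [log_softmax(row) for row in phi]
    T = len(phi)
    total = 0.0
    for path in itertools.product(range(len(SYMS)), repeat=T):
        if collapse([SYMS[i] for i in path]) == list(target):
            total += math.exp(sum(logp[t][i] for t, i in enumerate(path)))
    return math.log(total) if total > 0 else float("-inf")

# e.g. ctc_objective([[0.1, 0.9, -0.2]] * 5, ["a", "l", "l"])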
1. WFST Representation of the CTC Objective
Consider the task of modeling the word “all” using characters as the tokens. In this case
M = 3. For this question, we will model words using both the HMM, and CTC objective
functions and examine the different unweighted (i.e., the weights on each arc are 1.0)
WFST topologies used for each. Remember that final states are indicated by using a
double circle instead of a single circle around the state label. Mark all the input and
output labels.
(a) Draw the WFST corresponding to a 1-state HMM for each letter with uniform state transitions. Assume the output labels could be fed into a pronunciation lexicon to recover the word, i.e., a - l - l → all
(b) Draw the WFST corresponding to the CTC topology. Recall that it should be able to accept strings starting or ending with the blank symbol, ∅.
(c) Enumerate all possible length 5 alignments of the word “all” accepted by the HMM
topology where the HMMs for each letter are strung together with null arcs.
(d) Enumerate all possible length 5 alignments of the word “all” accepted by the CTC topology. (A brute-force check is sketched in the code after this list.)
(e) Draw the HMM trellis, often called a lattice, corresponding to all possible length 5
sequences as a WFST. (It should still look very similar to a normal trellis). Ignore
the weights on the arcs. Do not draw any unnecessary (unused) nodes. Please
include input and output labels.
(f) Repeat (e) but for the CTC trellis, corresponding to all possible length 5 sequences
as a WFST.
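As referenced in part (d), here is a brute-force sketch for checking an enumeration of CTC alignments. The blank spelling "<b>" and the helper name ctc_collapse are assumptions for illustration; the idea is simply to filter every length-5 string over the CTC output set through the collapse function B.

import itertools

BLANK = "<b>"                     # assumed spelling of the blank symbol
SYMS = [BLANK, "a", "l"]

def ctc_collapse(path):
    out = []
    for s in path:
        if not out or out[-1] != s:
            out.append(s)
    return "".join(s for s in out if s != BLANK)

# Every length-5 string over {blank, a, l} that collapses to "all" is a valid
# CTC alignment for part (d).
for path in itertools.product(SYMS, repeat=5):
    if ctc_collapse(path) == "all":
        print(" ".join(path))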
2. WFST Composition
The outputs of a neural network, φ, can themselves be represented as a WFST. Imagine the neural network has 4 outputs. In other words, it outputs vectors φ_t ∈ R^{1×4}. The first output corresponds to the blank symbol, ∅, the second corresponds to a, the third to b, and the fourth to l.
For this problem consider the matrix of scores, φ, where each column corresponds to a time index and each row corresponds to one of the network outputs. These are analogous to the observation probabilities in HMMs. We can draw this matrix as a WFST with 6 states
corresponding to the length of the neural network output (5+1, where +1 is for a start
state). Between each node there are as many arcs as there are output symbols (i.e., rows
in the matrix). The input and output labels for this WFST will be the same (i.e., it is a
WFSA), and the weights on the arcs correspond to the scores of those symbols at that
position in time. Use the following matrix.
\phi = \begin{bmatrix}
-0.3938 & -3.4938 & -7.7938 & -8.7938 & -1.2938 & -3.8938 \\
-7.7980 & -2.6980 & -0.0980 & -3.6980 & -7.3980 & -8.5980 \\
-1.5925 & -4.6925 & -3.6925 & -0.5925 & -1.5925 & -5.0925 \\
-8.8880 & -10.3880 & -6.4880 & -3.1880 & -3.1880 & -0.0880
\end{bmatrix}    (3)
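One possible machine-readable encoding of Φ, sketched here as a starting point rather than the drawing the question asks for. The Arc record, the helper name matrix_to_wfsa, and the frames-as-rows input layout (the transpose of equation (3)) are illustrative assumptions.

from typing import List, NamedTuple

class Arc(NamedTuple):
    src: int
    dst: int
    label: str      # input label == output label, so this is really a WFSA
    weight: float   # score of `label` at this frame

SYMS = ["<b>", "a", "b", "l"]     # the four network outputs, in the order given above

def matrix_to_wfsa(phi: List[List[float]]) -> List[Arc]:
    """phi has one row per frame; states 0..len(phi) form a left-to-right chain."""
    arcs = []
    for t, frame in enumerate(phi):
        for sym, w in zip(SYMS, frame):
            arcs.append(Arc(t, t + 1, sym, w))
    return arcs

# State 0 is the start state; state len(phi) is the single final state.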
(a) Draw this WFST. Feel free to use some shorthand notation, provided it is logical, if this process seems tedious. We will call it Φ.
(b) Use the WFST, Φ, from part (a) and compose it with the CTC WFST from problem (1.), which we will call T, using the log-semiring, i.e., Φ ◦ T. Assume unweighted arcs all had a weight of 1. Recall that in the log-semiring, a ⊕ b = log(e^a + e^b), a ⊗ b = a + b, the semiring one is 0, and the semiring zero is −∞. Show each step of the composition by denoting states with the pairs of labels on states used in Φ and T, like (Φ_i, T_j). Feel free to use any computational aid, i.e., a program or calculator, to help. Remember to remove any branches that don’t end in final states. A final state occurs when both states in the pair of WFSTs are final. (A sketch of this log-semiring arithmetic appears after this list.)
(c) Compare the resulting WFST with the CTC trellis computed in problem (1.f) and
comment.
(d) Use the forward algorithm on the result of part (b) to compute the CTC objective for this utterance. Again, feel free to use any computational aid.
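As referenced in part (b), here is a minimal sketch of the log-semiring arithmetic used in parts (b) and (d), assuming the hypothetical Arc encoding sketched above and integer, topologically ordered state ids; a toolkit such as OpenFst provides composition and shortest-distance routines that do this properly.

import math
from collections import defaultdict

def log_add(a: float, b: float) -> float:
    """a ⊕ b = log(e^a + e^b); -inf plays the role of the semiring zero."""
    if a == float("-inf"):
        return b
    if b == float("-inf"):
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def forward_score(arcs, start, finals):
    """⊕-sum over all start-to-final paths of the ⊗-sum (+) of arc weights."""
    alpha = defaultdict(lambda: float("-inf"))
    alpha[start] = 0.0                       # the semiring one
    # Assumes an acyclic machine whose integer state ids are topologically
    # ordered, as in the chain-shaped Φ (and any trellis renumbered that way).
    for src, dst, _label, w in sorted(arcs):
        alpha[dst] = log_add(alpha[dst], alpha[src] + w)
    total = float("-inf")
    for f in finals:
        total = log_add(total, alpha[f])
    return total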
3. Prove that the posterior distribution modeled by CTC is globally normalized, i.e., the probability of a sequence is computed as the score of the sequence normalized by the sum of scores over all possible sequences and can be expressed as

P(y_1^M | x_1^T) = \frac{\sum_{\pi_1^T \in B^{-1}(y_1^M)} \prod_{t=1}^{T} \exp(\phi_{\pi_t,t})}{\sum_{\pi_1^T \in S^T} \prod_{t=1}^{T} \exp(\phi_{\pi_t,t})}.    (4)

Make sure to state any assumptions used.
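A numerical sanity check of this claim (a sketch with made-up scores PHI and an assumed blank spelling "<b>", not the requested proof): on a tiny example, the locally normalized CTC probability from equations (1)–(2) agrees with the globally normalized ratio in equation (4).

import itertools
import math

BLANK = "<b>"
SYMS = [BLANK, "a", "l"]
PHI = [[0.1, 0.7, -0.4],          # made-up scores, one row per frame
       [-0.2, 0.3, 0.9],
       [0.5, -0.1, 0.2],
       [0.0, 0.4, -0.6],
       [0.3, -0.5, 0.8]]

def collapse(path):
    out = []
    for s in path:
        if not out or out[-1] != s:
            out.append(s)
    return "".join(s for s in out if s != BLANK)

def path_score(path_idx):
    return math.exp(sum(PHI[t][i] for t, i in enumerate(path_idx)))

paths = list(itertools.product(range(len(SYMS)), repeat=len(PHI)))
num = sum(path_score(p) for p in paths
          if collapse([SYMS[i] for i in p]) == "all")
den = sum(path_score(p) for p in paths)
globally = num / den              # the ratio in equation (4)

def softmax(row):
    z = sum(math.exp(v) for v in row)
    return [math.exp(v) / z for v in row]

probs = [softmax(row) for row in PHI]
locally = sum(math.prod(probs[t][i] for t, i in enumerate(p))
              for p in paths if collapse([SYMS[i] for i in p]) == "all")

print(globally, locally)          # the two values agree up to floating point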
4. Using the result of problem (3.), show that maximizing the CTC objective maximizes a lower bound on the mutual information, I(X;Y), between input and output sequences.