A Simple Introduction to Hidden Markov Models
Wesam Elshamy @ Diagnoss
Motivating Example*
- How to estimate average temp 1,000 years ago?
- Can't measure directly ... no records either
- Solution: Use indirect evidence

* A Revealing Introduction to Hidden Markov Models -- Mark Stamp 2018
- Each year, a tree grows new cells over old ones
- Creating one light and one dark growth ring
- Air temp affects size of growth rings

- High temps (usually) lead to larger rings
- Other factors affect ring size
- Ring size is observed
- Temp is hidden

Emission probabilities
- The effect of temp on ring size is not deterministic
- Only Hot and Cold temps
- Only Small, Medium and Large ring sizes
$$ \begin{array}{ccc}
& & \begin{array}{ccc} S & \quad M \quad & L \end{array}\\
& \begin{array}{c} H\\ C\end{array} &
\left[
\begin{array}{ccc}
0.1 & 0.4 & 0.5\\
0.7 & 0.2 & 0.1
\end{array}
\right]
\end{array}
$$
State Transition Matrix
We also know the probability of:
- Hot year → Hot year = 0.7
- Cold year → Cold year = 0.6
$$\begin{array}{ccc}
& & \begin{array}{cc} H & \quad C\end{array}\\
& \begin{array}{c} H\\ C\end{array} &
\left[
\begin{array}{cc}
0.7 & 0.3\\
0.4 & 0.6
\end{array}
\right]\\ \\
& \pi= & \left[\begin{array}{cc}0.6 & 0.4\end{array}\right]
\end{array}
$$
HMM Notation
$$\pi= \left[\begin{array}{cc}0.6 & 0.4\end{array}\right] \quad \text{Initial state distribution} \\
A = \left[
\begin{array}{cc}
0.7 & 0.3\\
0.4 & 0.6
\end{array}
\right] \quad \text{State transition matrix} \\
B = \left[
\begin{array}{ccc}
0.1 & 0.4 & 0.5\\
0.7 & 0.2 & 0.1
\end{array}
\right] \quad \text{Emission matrix} \\
\lambda = (A, B, \pi) \quad \text{The full model}
$$

HMM Problem
- Given tree rings $$\mathcal{O} = (S, M, S, L)$$
- What were the temps in those four years?

Probability of temp sequence HHCC generating rings SMSL
$$ P(HHCC) = 0.6(0.1)(0.7)(0.4)(0.3)(0.7)(0.6)(0.1) = 0.000212 $$
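The factors above are $$\pi_H b_H(S) \, a_{HH} b_H(M) \, a_{HC} b_C(S) \, a_{CC} b_C(L)$$. Continuing the numpy sketch, the product can be checked directly:

```python
# Joint probability of states HHCC emitting rings SMSL.
H, C = 0, 1          # states
S, M, L = 0, 1, 2    # ring sizes
states, obs = [H, H, C, C], [S, M, S, L]

p = pi[states[0]] * B[states[0], obs[0]]
for t in range(1, len(obs)):
    p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
print(p)  # 0.00021168, which rounds to 0.000212
```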

Probability of each state sequence given rings SMSL, normalized so they sum to 1
$$
\begin{array}{ll}
\text{state sequence} & \text{normalized probability}\\
\hline
HHHH & .042787\\
HHHC & .003635\\
HHCH & .073320\\
HHCC & .022017\\
HCHH & .005193\\
HCHC & .000415\\
HCCH & .031364\\
HCCC & .009451\\
CHHH & .114031\\
CHHC & .009762\\
CHCH & .195451\\
CHCC & .058573\\
CCHH & .048811\\
CCHC & .004154\\
CCCH & .293073\\
CCCC & .087963\\
\hline
\end{array}
$$
Method 1
State sequence with max probability $$ P(CCCH)= .293073$$
Method 2
State with max probability at each step
$$
\begin{array}{lcccc}
\hline
\text{step} & 0 & 1 & 2 & 3\\
\hline
P(H) & 0.188182 & 0.519576 & 0.228788 & 0.804029\\
P(C) & 0.811818 & 0.480424 & 0.771212 & 0.195971\\
\hline
\end{array}
$$
Most likely sequence of states
CHCH
Which method to use depends on the application
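Both methods can be verified by brute force over all $$2^4 = 16$$ state sequences, continuing the snippets above (feasible only for toy models; the algorithms below avoid this enumeration):

```python
from itertools import product

T = 4
joint = {}
for X in product([H, C], repeat=T):      # all 16 state sequences
    p = pi[X[0]] * B[X[0], obs[0]]
    for t in range(1, T):
        p *= A[X[t - 1], X[t]] * B[X[t], obs[t]]
    joint[X] = p

total = sum(joint.values())              # P(O | lambda) ~= 0.00963
post = {X: p / total for X, p in joint.items()}

# Method 1: the single most likely sequence -> (C, C, C, H)
print(max(post, key=post.get))

# Method 2: the most likely state at each step -> C H C H
for t in range(T):
    pH = sum(p for X, p in post.items() if X[t] == H)
    print('H' if pH > 0.5 else 'C', end=' ')
```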
Problem 1
$$
\begin{array}{rl}
\text{Given:} & \begin{array}{rl}
\pi & \text{Initial state prob.}\\
A & \text{State transition prob.}\\
B & \text{Emission prob.}\\
\mathcal{O}_0, \mathcal{O}_1, \dots & \text{Observation sequence}
\end{array}\\
\hline
\text{Find:} & \text{Probability of given observation sequence}
\end{array}
$$

Solution
$$
P(\mathcal{O}|\lambda) = \sum_X \pi_{x_0}b_{x_0}(\mathcal{O}_0)
a_{x_0,x_1}b_{x_1}(\mathcal{O}_1)
\dots a_{x_{T-2},x_{T-1}}b_{x_{T-1}}(\mathcal{O}_{T-1})\\
\approx 2TN^T \text{ multiplications} \qquad N: \text{number of states}, \; T: \text{sequence length}
$$
- For T=100 and N=10, that is about $$2 \times 10^{102}$$ multiplications
- Computationally infeasible for practical problems
- Instead, use the forward algorithm
- It computes probabilities of partial observation sequences and reuses them (dynamic programming)
- Requires only about $$N^2T$$ multiplications ($$10^4$$ vs $$2 \times 10^{102}$$), as sketched below
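A minimal forward pass, continuing the numpy sketch (the function name and layout are mine; a practical version rescales the $$\alpha$$ values at each step to avoid underflow, as Stamp's paper describes):

```python
def forward(pi, A, B, obs):
    """alpha[t, i] = P(O_0, ..., O_t, x_t = i | lambda)."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                      # initialization
    for t in range(1, T):                             # ~N^2 multiplications per step
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
    return alpha

alpha = forward(pi, A, B, obs)
print(alpha[-1].sum())  # P(O | lambda) ~= 0.00963, matches the brute force
```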

Problem 2
$$
\begin{array}{rl}
\text{Given:} & \begin{array}{rl}
\pi & \text{Initial state prob.}\\
A & \text{State transition prob.}\\
B & \text{Emission prob.}\\
\mathcal{O}_0, \mathcal{O}_1, \dots & \text{Observation sequence}
\end{array}\\
\hline
\text{Find:} & \text{Optimal state sequence (as we did above)}
\end{array}
$$

Solution
$$
x_t = \operatorname*{argmax}_{i} \; P(x_t = i \mid \mathcal{O}, \lambda) \quad \text{for each } t
$$
- Maximize expected number of correct states (Method 2)
- Not the single state sequence with the highest probability
- Use the backward algorithm combined with the forward pass, also $$N^2T$$ complexity
- It starts at the end of the sequence and works back to the beginning (see the sketch below)
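A matching backward pass, plus the per-step posteriors $$\gamma$$ that Method 2 maximizes; note that computing $$\gamma$$ uses both the forward and backward passes (again a sketch continuing the code above):

```python
def backward(A, B, obs):
    """beta[t, i] = P(O_{t+1}, ..., O_{T-1} | x_t = i, lambda)."""
    T, N = len(obs), A.shape[0]
    beta = np.ones((T, N))                            # base case: beta[T-1] = 1
    for t in range(T - 2, -1, -1):                    # from the end back to the start
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
    return beta

beta = backward(A, B, obs)
gamma = alpha * beta                                  # unnormalized posteriors
gamma /= gamma.sum(axis=1, keepdims=True)             # gamma[t, i] = P(x_t = i | O, lambda)
print(gamma[:, H])           # P(Hot) per step ~= [0.19, 0.52, 0.23, 0.80]
print(gamma.argmax(axis=1))  # [1, 0, 1, 0] -> C H C H
```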

Problem 3
$$
\begin{array}{rl}
\text{Given:} & \begin{array}{rl}
\mathcal{O}_0, \mathcal{O}_1, \dots & \text{Observation sequence}\\
N & \text{Number of states}\\
M & \text{Number of observation symbols}\\
\end{array}\\
\hline
\text{Find:} & \begin{array}{rl}
\pi & \text{Initial state prob.}\\
A & \text{State transition prob.}\\
B & \text{Emission prob.}\\
\end{array}
\end{array}
$$

Solution
$$
\operatorname*{argmax}_{\pi,A,B} P(\mathcal{O}|\lambda)
$$
- Tune π, A, B to maximize the probability of the observations
- Initialize π, A, B with a best guess (e.g., random values)
- Use the forward-backward algorithm to compute the state occupation probabilities
- Update π, A, B from those probabilities (Baum-Welch re-estimation)
- Repeat while P(O|λ) increases (see the sketch below)
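This hill-climbing procedure is the Baum-Welch algorithm. Below is a compact sketch of one re-estimation step, reusing forward and backward from above; the vectorized arrangement is mine, and a practical version would rescale $$\alpha$$ and $$\beta$$ to avoid underflow:

```python
def baum_welch_step(pi, A, B, obs):
    """One re-estimation of (pi, A, B); also returns P(O | lambda) for the old model."""
    obs = np.asarray(obs)
    alpha, beta = forward(pi, A, B, obs), backward(A, B, obs)
    prob = alpha[-1].sum()                           # P(O | lambda)
    gamma = alpha * beta / prob                      # P(x_t = i | O, lambda)
    # di[t, i, j] = P(x_t = i, x_{t+1} = j | O, lambda)
    di = (alpha[:-1, :, None] * A[None, :, :] *
          (B[:, obs[1:]].T * beta[1:])[:, None, :]) / prob
    pi_new = gamma[0]
    A_new = di.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    B_new = np.stack([gamma[obs == k].sum(axis=0)    # expected emissions of symbol k
                      for k in range(B.shape[1])], axis=1)
    B_new /= gamma.sum(axis=0)[:, None]
    return pi_new, A_new, B_new, prob

# Repeat while P(O | lambda) keeps increasing.
prev = 0.0
while True:
    pi, A, B, prob = baum_welch_step(pi, A, B, obs)
    if prob <= prev:
        break
    prev = prob
```

With a single length-4 observation sequence the fitted model is degenerate; the sketch only illustrates the shape of the updates.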

Second order Markov chains
- States have longer memory
- Each state depends on the two preceding states
- Could improve performance
- Complicates problem solving

When to use HMM
Time series problems with hidden, discrete states
Speech recognition
- Given: audio recordings of words
- Find: hidden states (words: yes, no)
Next word prediction
- Given: typed words
- Find: hidden sentence states
- Needs higher order models for good performance
Questions
