A Simple Introduction to Hidden Markov Models

Wesam Elshamy @ Diagnoss

Motivating Example*
  • How to estimate average temp 1,000 years ago?
  • Can't measure it directly ... and no records exist either
  • Solution: Use indirect evidence

* A Revealing Introduction to Hidden Markov Models -- Mark Stamp 2018

  • Each year, a tree grows new cells over old ones
  • Creating one light and one dark growth ring
  • Air temp affects size of growth rings

  • High temps (usually) lead to larger rings
  • Other factors affect ring size
  • Ring size is observed
  • Temp is hidden

Emission probabilities
  • Temp's effect on ring size is not deterministic
  • Assume only Hot and Cold temps
  • Assume only Small, Medium, and Large ring sizes

$$ \begin{array}{ccc} & & \begin{array}{ccc} S & \quad M \quad & L \end{array}\\ & \begin{array}{c} H\\ C\end{array} & \left[ \begin{array}{ccc} 0.1 & 0.4 & 0.5\\ 0.7 & 0.2 & 0.1 \end{array} \right] \end{array} $$

State Transition Matrix

We also know the probability of:

  • Hot year → Hot year = 0.7
  • Cold year → Cold year = 0.6

$$\begin{array}{ccc} & & \begin{array}{cc} H & \quad C\end{array}\\ & \begin{array}{c} H\\ C\end{array} & \left[ \begin{array}{cc} 0.7 & 0.3\\ 0.4 & 0.6 \end{array} \right]\\ \\ & \pi= & \left[\begin{array}{cc}0.6 & 0.4\end{array}\right] \end{array} $$

HMM Notation

$$\pi= \left[\begin{array}{cc}0.6 & 0.4\end{array}\right] \quad \text{Initial state} \\ A = \left[ \begin{array}{cc} 0.7 & 0.3\\ 0.4 & 0.6 \end{array} \right] \quad \text{State transition} \\ B = \left[ \begin{array}{ccc} 0.1 & 0.4 & 0.5\\ 0.7 & 0.2 & 0.1 \end{array} \right] \quad \text{Emission} $$
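For reference in the examples that follow, the three matrices can be written down directly; a minimal Python sketch (the array layout and variable names are illustrative, with states ordered Hot, Cold and ring sizes ordered Small, Medium, Large):

```python
import numpy as np

# Initial state distribution: P(first year is Hot), P(first year is Cold)
pi = np.array([0.6, 0.4])

# State transitions A[i, j] = P(next year is state j | this year is state i)
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])

# Emissions B[i, k] = P(ring size k | state i), ring sizes (S, M, L)
B = np.array([[0.1, 0.4, 0.5],
              [0.7, 0.2, 0.1]])
```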


HMM Problem
  • Given tree rings $$\mathcal{O} = (S, M, S, L)$$
  • What were the temps in those four years?

Probability of temps HHCC generating rings SMSL

$$ P(HHCC) = 0.6(0.1)(0.7)(0.4)(0.3)(0.7)(0.6)(0.1) = 0.000212 $$
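This product is easy to check numerically; a small sketch assuming the matrices above, with state indices 0 = H, 1 = C and observation indices 0 = S, 1 = M, 2 = L:

```python
import numpy as np

pi = np.array([0.6, 0.4])                          # (H, C)
A = np.array([[0.7, 0.3], [0.4, 0.6]])             # state transitions
B = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])   # emissions (S, M, L)

states = [0, 0, 1, 1]   # H, H, C, C
obs    = [0, 1, 0, 2]   # S, M, S, L

# pi * emission for the first year, then (transition * emission) for each later year
p = pi[states[0]] * B[states[0], obs[0]]
for t in range(1, len(obs)):
    p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]

print(round(p, 6))   # 0.000212
```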

Probability of each state sequence given the observed rings SMSL (the joint probabilities, normalized by their sum)
$$ \begin{array}{ll} \text{state} & \text{probability}\\ \hline HHHH & .042787\\ HHHC & .003635\\ HHCH & .073320\\ HHCC & .022017\\ HCHH & .005193\\ HCHC & .000415\\ HCCH & .031364\\ HCCC & .009451\\ CHHH & .114031\\ CHHC & .009762\\ CHCH & .195451\\ CHCC & .058573\\ CCHH & .048811\\ CCHC & .004154\\ CCCH & .293073\\ CCCC & .087963\\ \hline \end{array} $$
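The whole table can be reproduced by brute force: enumerate all 16 four-year state sequences, compute each joint probability as above, and normalize by their sum. A sketch (the printed values match the table to roughly three decimal places; the small differences are rounding):

```python
import itertools
import numpy as np

pi = np.array([0.6, 0.4])                          # (H, C)
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])   # (S, M, L)

obs = [0, 1, 0, 2]      # S, M, S, L
labels = "HC"

# Joint probability P(X, O) for every possible 4-year state sequence X
joint = {}
for X in itertools.product([0, 1], repeat=len(obs)):
    p = pi[X[0]] * B[X[0], obs[0]]
    for t in range(1, len(obs)):
        p *= A[X[t - 1], X[t]] * B[X[t], obs[t]]
    joint["".join(labels[i] for i in X)] = p

total = sum(joint.values())            # P(O): sum over all state sequences
for seq, p in joint.items():
    print(seq, round(p / total, 6))    # conditional probability P(X | O)
```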

Method 1
State sequence with max probability $$ P(CCCH)= .293073$$
Method 2
State with max probability at each step
$$ \begin{array}{lllll} \hline & 0 & 1 & 2 & 3\\ \hline P(H) & 0.188182 & 0.519576 & 0.228788 & 0.804029\\ P(C) & 0.811818 & 0.480424 & 0.771212 & 0.195971\\ \hline \end{array} $$
Choosing the most likely state at each step gives CHCH
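Method 2's per-step probabilities come from summing, at each step, the normalized probabilities of all sequences that are Hot (or Cold) at that position. A self-contained sketch that repeats the enumeration above (the printed values closely match the table; tiny differences are rounding):

```python
import itertools
import numpy as np

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])
obs = [0, 1, 0, 2]      # S, M, S, L

# Conditional probability of every 4-year state sequence given the rings
seqs, probs = [], []
for X in itertools.product([0, 1], repeat=4):      # 0 = H, 1 = C
    p = pi[X[0]] * B[X[0], obs[0]]
    for t in range(1, 4):
        p *= A[X[t - 1], X[t]] * B[X[t], obs[t]]
    seqs.append(X)
    probs.append(p)
probs = np.array(probs) / sum(probs)

# P(Hot at step t | observations) = sum over sequences that are Hot at step t
for t in range(4):
    p_hot = sum(p for X, p in zip(seqs, probs) if X[t] == 0)
    print(f"step {t}: P(H) = {p_hot:.6f}, P(C) = {1 - p_hot:.6f}")
```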

Method depends on application

Problem 1
$$ \begin{array}{rl} \text{Given:} & \begin{array}{rl} \pi & \text{Initial state probabilities}\\ A & \text{State transition probabilities}\\ B & \text{Emission probabilities}\\ \mathcal{O}_0, \mathcal{O}_1, \dots & \text{Observation sequence} \end{array}\\ \hline \text{Find:} & \text{Probability of the given observation sequence} \end{array} $$

Solution
$$ P(\mathcal{O}|\lambda) = \sum_X \pi_{x_0}b_{x_0}(\mathcal{O}_0) a_{x_0,x_1}b_{x_1}(\mathcal{O}_1) \dots a_{x_{T-2},x_{T-1}}b_{x_{T-1}}(\mathcal{O}_{T-1})\\ 2TN^T \text{ multiplications} \qquad N: \text{ Number of states} $$
  • For T=100 and N=10, that is about 2E+102 multiplications
  • Computationally infeasible for practical problems
  • Instead, use the forward algorithm (sketched below)
  • Computes probabilities of partial observation sequences and reuses them (dynamic programming)
  • Requires about TN² mults instead (1E+4 vs 2E+102)
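A minimal sketch of the forward (α) pass, assuming the matrices defined earlier; summing the last row gives P(O|λ):

```python
import numpy as np

def forward(pi, A, B, obs):
    """alpha[t, i] = P(O_0..O_t, x_t = i | lambda), filled in by dynamic programming."""
    T, N = len(obs), len(pi)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, obs[0]]                       # initialization
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]   # reuse the previous step's values
    return alpha

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])

alpha = forward(pi, A, B, [0, 1, 0, 2])   # S, M, S, L
print(alpha[-1].sum())                    # P(O | lambda), about 0.0096
```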

Problem 2
$$ \begin{array}{rl} \text{Given:} & \begin{array}{rl} \pi & \text{Initial state probabilities}\\ A & \text{State transition probabilities}\\ B & \text{Emission probabilities}\\ \mathcal{O}_0, \mathcal{O}_1, \dots & \text{Observation sequence} \end{array}\\ \hline \text{Find:} & \text{Optimal state sequence (as we did above)} \end{array} $$

Solution
$$ x_t = \operatorname*{argmax}_{i} P(x_t = i \mid \mathcal{O}, \lambda) \quad \text{for each } t $$
  • Maximize expected number of correct states (Method 2)
  • Not state sequence with highest probability
  • Use the backward algorithm (combined with the forward pass, sketched below), also TN² complexity
  • Starts at end of path and works back to beginning
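A sketch of the backward (β) pass and the per-step probabilities γ that Method 2 maximizes, assuming the same matrices; the forward pass is repeated so the example runs on its own:

```python
import numpy as np

def backward(A, B, obs):
    """beta[t, i] = P(O_{t+1}..O_{T-1} | x_t = i, lambda)."""
    T, N = len(obs), A.shape[0]
    beta = np.zeros((T, N))
    beta[-1] = 1.0                                       # initialization at the last step
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])   # work back toward the start
    return beta

pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.1, 0.4, 0.5], [0.7, 0.2, 0.1]])
obs = [0, 1, 0, 2]   # S, M, S, L

# Forward pass (alpha), repeated here so the example stands alone
alpha = np.zeros((len(obs), 2))
alpha[0] = pi * B[:, obs[0]]
for t in range(1, len(obs)):
    alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]

beta = backward(A, B, obs)
gamma = alpha * beta / alpha[-1].sum()   # gamma[t, i] = P(x_t = i | O, lambda)
print(gamma)   # rows closely match the Method 2 table; argmax per row gives C, H, C, H
```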

Problem 3
$$ \begin{array}{rl} \text{Given:} & \begin{array}{rl} \mathcal{O}_0, \mathcal{O}_1, \dots & \text{Observation sequence}\\ N & \text{Number of states}\\ M & \text{Number of observation symbols}\\ \end{array}\\ \hline \text{Find:} & \begin{array}{rl} \pi & \text{Initial state probabilities}\\ A & \text{State transition probabilities}\\ B & \text{Emission probabilities}\\ \end{array} \end{array} $$

Solution
$$ \operatorname*{argmax}_{\pi,A,B} P(\mathcal{O}|\lambda) $$
  • Tune π, A, B to maximize the probability of the observations
  • Initialize π, A, B with a best guess (random?)
  • Use the forward-backward algorithm to compute the state and transition probabilities given the observations
  • Update π, A, B using the new probability values
  • Repeat while P(O|λ) increases (sketched below)
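A compact sketch of one way to implement this re-estimation loop (single observation sequence, no scaling; practical implementations scale α and β or work in log space to avoid underflow, and the initialization shown is just one reasonable guess):

```python
import numpy as np

def baum_welch(obs, N, M, n_iter=100, seed=0):
    """Re-estimate pi, A, B to increase P(O | lambda); unscaled, single-sequence sketch."""
    obs = np.asarray(obs)
    T = len(obs)
    rng = np.random.default_rng(seed)

    # Best-guess initialization: near-uniform plus a little noise
    # (exactly uniform parameters are a fixed point of the re-estimation)
    pi = np.full(N, 1.0 / N) + rng.uniform(0, 0.01, N); pi /= pi.sum()
    A = np.full((N, N), 1.0 / N) + rng.uniform(0, 0.01, (N, N)); A /= A.sum(1, keepdims=True)
    B = np.full((N, M), 1.0 / M) + rng.uniform(0, 0.01, (N, M)); B /= B.sum(1, keepdims=True)

    prev_ll = -np.inf
    for _ in range(n_iter):
        # Forward (alpha) and backward (beta) passes
        alpha = np.zeros((T, N)); beta = np.zeros((T, N))
        alpha[0] = pi * B[:, obs[0]]
        for t in range(1, T):
            alpha[t] = (alpha[t - 1] @ A) * B[:, obs[t]]
        beta[-1] = 1.0
        for t in range(T - 2, -1, -1):
            beta[t] = A @ (B[:, obs[t + 1]] * beta[t + 1])
        prob = alpha[-1].sum()                      # P(O | lambda)

        # Expected state occupancies and transitions
        gamma = alpha * beta / prob                 # gamma[t, i] = P(x_t = i | O)
        xi = np.zeros((T - 1, N, N))                # xi[t, i, j] = P(x_t = i, x_{t+1} = j | O)
        for t in range(T - 1):
            xi[t] = alpha[t][:, None] * A * B[:, obs[t + 1]] * beta[t + 1] / prob

        # Re-estimate pi, A, B from the expected counts
        pi = gamma[0]
        A = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
        for k in range(M):
            B[:, k] = gamma[obs == k].sum(axis=0) / gamma.sum(axis=0)

        ll = np.log(prob)
        if ll <= prev_ll + 1e-10:                   # stop once P(O | lambda) stops increasing
            break
        prev_ll = ll
    return pi, A, B

# Toy call on the four-ring sequence; real training needs a much longer sequence
pi_hat, A_hat, B_hat = baum_welch([0, 1, 0, 2], N=2, M=3)
```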

Second order Markov chains
  • States have longer memory
  • Each state depends on two preceding states
  • Could improve performance
  • Complicates problem solving
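One standard workaround, sketched here with made-up second-order probabilities (not values from the temperature example): treat each pair of consecutive states as a single composite state, which turns the second-order chain back into a first-order one at the cost of N² composite states:

```python
import itertools

states = ["H", "C"]

# Second-order transitions P(next | prev, current); illustrative placeholders only
p2 = {("H", "H"): {"H": 0.8, "C": 0.2},
      ("H", "C"): {"H": 0.4, "C": 0.6},
      ("C", "H"): {"H": 0.6, "C": 0.4},
      ("C", "C"): {"H": 0.3, "C": 0.7}}

# Composite states (prev, current) give an equivalent first-order chain:
# (prev, cur) -> (cur, next) with probability P(next | prev, cur)
composite = list(itertools.product(states, repeat=2))   # HH, HC, CH, CC
A2 = {(s, (s[1], nxt)): p for s in composite for nxt, p in p2[s].items()}

for (src, dst), p in sorted(A2.items()):
    print("".join(src), "->", "".join(dst), p)
```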

When to use HMM

Time series problems with hidden, discrete states

Speech recognition

  • Given: audio recordings of words
  • Find: hidden states (the spoken words, e.g. yes / no)

When to use HMM

Time series problems with hidden, discrete states

Next word prediction

  • Given: typed words
  • Find: hidden sentence states
  • Needs higher order models for good performance

Questions