Hidden Markov Models (Unsupervised Learning Algorithms)

The Hidden Markov Model (HMM) is one of the more elaborate ML algorithms – a statistical model that analyzes the features of data and groups it accordingly.

The HMM is based on augmenting the Markov chain.
– A Markov chain is a model that tells us something about the probabilities of sequences of random variables (states), each of which can take on values from some set.
– These sets can be words, or tags, or symbols representing anything, like the weather.
– A Markov chain makes a very strong assumption: if we want to predict the future in the sequence, all that matters is the current state.
– The states before the current state have no impact on the future except via the current state.
– Example: It’s as if to predict tomorrow’s weather you could examine today’s weather but you weren’t allowed to look at yesterday’s weather.
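
The assumption described above can be sketched in a few lines of Python. This is only an illustration, not a full HMM: the `TRANSITIONS` table, the function names, and the probabilities in it are assumptions made for the example. The point is that `next_state` receives nothing but the current state.

```python
import random

# Hypothetical two-state chain; the probabilities are illustrative only.
TRANSITIONS = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def next_state(current, rng):
    """Sample the next state using only the current state (Markov property)."""
    options = TRANSITIONS[current]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]

def simulate(start, steps, seed=0):
    """Return a list of steps + 1 states, beginning with start."""
    rng = random.Random(seed)  # seeded so repeated runs give the same path
    path = [start]
    for _ in range(steps):
        # Earlier entries of path are never consulted, only the last one.
        path.append(next_state(path[-1], rng))
    return path

print(simulate("sunny", 7))
```

Calling `simulate("sunny", 7)` produces one possible week of weather; a different seed gives a different path drawn from the same probabilities.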

HMMs find use in Pattern Recognition, Natural Language Processing (NLP), data analytics, etc.

A simple weather model

The probabilities of weather conditions (modeled as either rainy or sunny), given the weather on the preceding day, can be represented by a transition matrix: $P={\begin{bmatrix}0.9&0.1\\0.5&0.5\end{bmatrix}}$

The matrix P represents the weather model in which a sunny day is 90% likely to be followed by another sunny day, and a rainy day is 50% likely to be followed by another rainy day. The columns can be labelled “sunny” and “rainy”, and the rows can be labelled in the same order.

[Figure: the above matrix as a graph.]

$(P)_{ij}$ is the probability that, if a given day is of type i, it will be followed by a day of type j.

Notice that the rows of P sum to 1: this is because P is a stochastic matrix.
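
As a quick check of the two statements above, here is a minimal sketch that stores P as a nested list and verifies that it is row-stochastic. The helper names `is_row_stochastic` and `transition_probability` are made up for this example.

```python
import math

STATES = ["sunny", "rainy"]

# Row i is today's weather, column j is tomorrow's weather.
P = [
    [0.9, 0.1],  # from sunny: to sunny, to rainy
    [0.5, 0.5],  # from rainy: to sunny, to rainy
]

def is_row_stochastic(matrix, tol=1e-9):
    """True if every entry is non-negative and every row sums to 1."""
    return all(
        all(p >= 0 for p in row) and math.isclose(sum(row), 1.0, abs_tol=tol)
        for row in matrix
    )

def transition_probability(matrix, today, tomorrow):
    """Look up (P)_{ij} by state name instead of by index."""
    return matrix[STATES.index(today)][STATES.index(tomorrow)]

print(is_row_stochastic(P))                         # True
print(transition_probability(P, "sunny", "rainy"))  # 0.1
```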

Predicting the weather

The weather on day 0 (today) is known to be sunny. This is represented by a vector in which the “sunny” entry is 100%, and the “rainy” entry is 0%: ${\mathbf {x}}^{{(0)}}={\begin{bmatrix}1&0\end{bmatrix}}$

The weather on day 1 (tomorrow) can be predicted by: ${\mathbf {x}}^{{(1)}}={\mathbf {x}}^{{(0)}}P={\begin{bmatrix}1&0\end{bmatrix}}{\begin{bmatrix}0.9&0.1\\0.5&0.5\end{bmatrix}}={\begin{bmatrix}0.9&0.1\end{bmatrix}}$

Thus, there is a 90% chance that day 1 will also be sunny.

The weather on day 2 (the day after tomorrow) can be predicted in the same way: ${\mathbf {x}}^{{(2)}}={\mathbf {x}}^{{(1)}}P={\mathbf {x}}^{{(0)}}P^{2}={\begin{bmatrix}1&0\end{bmatrix}}{\begin{bmatrix}0.9&0.1\\0.5&0.5\end{bmatrix}}^{2}={\begin{bmatrix}0.86&0.14\end{bmatrix}}$

or ${\mathbf {x}}^{{(2)}}={\mathbf {x}}^{{(1)}}P={\begin{bmatrix}0.9&0.1\end{bmatrix}}{\begin{bmatrix}0.9&0.1\\0.5&0.5\end{bmatrix}}={\begin{bmatrix}0.86&0.14\end{bmatrix}}$

General rules for day n are: ${\mathbf {x}}^{{(n)}}={\mathbf {x}}^{{(n-1)}}P$ and ${\mathbf {x}}^{{(n)}}={\mathbf {x}}^{{(0)}}P^{n}$
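
The day-by-day rule can be tried out with a short sketch in plain Python. The function names `step` and `predict` are assumptions for this example; `step` is one multiplication of a row vector by P, and `predict` repeats it n times.

```python
def step(x, P):
    """Multiply the row vector x by the matrix P (one day forward)."""
    return [sum(x[i] * P[i][j] for i in range(len(x))) for j in range(len(P[0]))]

def predict(x0, P, n):
    """Apply x(n) = x(n-1) P repeatedly, starting from x(0) = x0."""
    x = list(x0)
    for _ in range(n):
        x = step(x, P)
    return x

P = [[0.9, 0.1], [0.5, 0.5]]
x0 = [1.0, 0.0]  # day 0 is known to be sunny

for day in range(3):
    # Rounded only for display; the values match the hand calculation.
    print(day, [round(v, 4) for v in predict(x0, P, day)])
```

For days 1 and 2 this reproduces the vectors computed by hand, (0.9, 0.1) and (0.86, 0.14).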

Steady state of the weather

In this example, predictions for the weather on more distant days are increasingly inaccurate and tend towards a steady state vector. This vector represents the probabilities of sunny and rainy weather on all days, and is independent of the initial weather.

The steady state vector is defined as: ${\mathbf {q}}=\lim _{{n\to \infty }}{\mathbf {x}}^{{(n)}}$

This limit converges to a strictly positive vector only if P is a regular transition matrix (that is, there is at least one power $P^{n}$ whose entries are all non-zero).
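
The regularity condition can be tested by brute force, as in the rough sketch below. The name `is_regular` and the `max_power` cut-off are assumptions for this example; because the search stops at an arbitrary limit, a `False` result only means that no suitable power was found within that limit.

```python
def matmul(A, B):
    """Multiply two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def is_regular(P, max_power=10):
    """Return True if some P^k with 1 <= k <= max_power has only positive entries."""
    power = P
    for _ in range(max_power):
        if all(entry > 0 for row in power for entry in row):
            return True
        power = matmul(power, P)
    return False

weather = [[0.9, 0.1], [0.5, 0.5]]
flip = [[0.0, 1.0], [1.0, 0.0]]  # hypothetical chain that alternates forever

print(is_regular(weather))  # True
print(is_regular(flip))     # False
```

The weather matrix is regular already at the first power, since all four of its entries are positive. The `flip` matrix is a made-up counter-example: its powers alternate between itself and the identity, so every power contains zeros.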

Since $\mathbf {q}$ is independent of the initial conditions, it must be unchanged when transformed by P. This makes it an eigenvector of P (with eigenvalue 1), which means it can be derived from P. For the weather example: ${\begin{aligned}P&={\begin{bmatrix}0.9&0.1\\0.5&0.5\end{bmatrix}}\\\mathbf {q} P&=\mathbf {q} &&{\text{(}}\mathbf {q} {\text{ is unchanged by }}P{\text{.)}}\\&=\mathbf {q} I\\\mathbf {q} (P-I)&=\mathbf {0} \\\mathbf {q} \left({\begin{bmatrix}0.9&0.1\\0.5&0.5\end{bmatrix}}-{\begin{bmatrix}1&0\\0&1\end{bmatrix}}\right)&=\mathbf {0} \\\mathbf {q} {\begin{bmatrix}-0.1&0.1\\0.5&-0.5\end{bmatrix}}&=\mathbf {0} \\{\begin{bmatrix}q_{1}&q_{2}\end{bmatrix}}{\begin{bmatrix}-0.1&0.1\\0.5&-0.5\end{bmatrix}}&={\begin{bmatrix}0&0\end{bmatrix}}\\-0.1q_{1}+0.5q_{2}&=0\end{aligned}}$

Since $\mathbf {q}$ is a probability vector, we also know that $q_{1}+q_{2}=1.$

Solving this pair of simultaneous equations gives the steady state distribution: ${\begin{bmatrix}q_{1}&q_{2}\end{bmatrix}}={\begin{bmatrix}0.833&0.167\end{bmatrix}}$

In conclusion, in the long term, about 83.3% of days are sunny.
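
The same result can be approximated numerically by applying P over and over until the vector stops changing. This is a sketch of that repeated-multiplication approach, not the algebraic solution used above. The names `steady_state`, `tol`, and `max_iter` are assumptions for this example.

```python
def step(x, P):
    """Multiply the row vector x by the matrix P (one day forward)."""
    return [sum(x[i] * P[i][j] for i in range(len(x))) for j in range(len(P[0]))]

def steady_state(P, x0, tol=1e-12, max_iter=10_000):
    """Iterate x -> x P until successive vectors differ by less than tol."""
    x = list(x0)
    for _ in range(max_iter):
        nxt = step(x, P)
        if max(abs(a - b) for a, b in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    raise RuntimeError("no convergence within max_iter steps")

P = [[0.9, 0.1], [0.5, 0.5]]

from_sunny = steady_state(P, [1.0, 0.0])
from_rainy = steady_state(P, [0.0, 1.0])

print([round(v, 3) for v in from_sunny])  # [0.833, 0.167]
print([round(v, 3) for v in from_rainy])  # [0.833, 0.167]
```

Both starting points end at the same vector, which illustrates that the steady state does not depend on the initial weather.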

Let’s discuss the rest in the comments!

smriti-mishra