Recently I have been doing some revision on data science, having gotten a bit bored of Big Data. When one relearns a subject, one tends to understand it from a different perspective. Following is my short note.

**Example**: MCMC word decryption using the Metropolis variant, scored with word unigrams, word bigrams, and letter trigrams.
**Motivation**: Given a known equation with a set of parameters, one would like to a) get the probability of a set of parameters (and hence approximate each parameter’s distribution), and b) find the best set of parameters.
**Problem**: There are infinitely many combinations of parameters, so computing the normalizing denominator is intractable.
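Concretely, in the Bayesian setting the intractable denominator is the evidence integral in Bayes’ rule (writing $\theta$ for the parameters and $D$ for the data):

```latex
p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{\int p(D \mid \theta')\, p(\theta')\, d\theta'}
```

MCMC sidesteps this integral because its acceptance ratio only ever needs the posterior up to that constant.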
**Solution**: MCMC
**Intuition**:

a) Monte Carlo – sampling methods that follow a general format: i) identify parameters, ii) sample values from a distribution, iii) compute the output, iv) aggregate the results (a toy sketch of this format follows below).

b) Markov Chain – localized dependency: each new sample depends on the previous draw, which reduces the search space by introducing constraints that vary by algorithm.
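As a toy illustration of the four-step Monte Carlo format, here is a minimal sketch; the π-estimation task is my own example, not from the original note:

```python
import random

def estimate_pi(n_samples=100_000):
    """Monte Carlo estimate of pi, following the four-step format."""
    # i) Identify parameters: a point (x, y) in the unit square.
    inside = 0
    for _ in range(n_samples):
        # ii) Sample values from a distribution (uniform on [0, 1)).
        x, y = random.random(), random.random()
        # iii) Compute the output: is the point inside the quarter circle?
        if x * x + y * y <= 1.0:
            inside += 1
    # iv) Aggregate results: the fraction inside approximates pi / 4.
    return 4.0 * inside / n_samples

print(estimate_pi())  # ~3.14
```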
**Common variation (algorithm)**:

a) Metropolis – used when i) the proposal distribution of the random walk is symmetric, i.e. equal probability of moving in either direction, and ii) the full conditional distributions are unknown, meaning the distribution of each parameter given the rest is not known. (A minimal sketch follows this list.)

b) Metropolis-Hastings – same as a), but additionally handles an asymmetric proposal distribution by correcting the acceptance ratio.

c) Gibbs sampling – used when the full conditional distributions are known. The joint distribution can still be recovered from them, as established by the Hammersley-Clifford theorem.
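A minimal Metropolis sketch for a one-dimensional target; the standard-normal target and the `perturbation` step size are my own illustrative choices, not from the decryption example:

```python
import math
import random

def log_target(x):
    """Unnormalized log density of the target (standard normal here)."""
    return -0.5 * x * x

def metropolis(n_samples=10_000, perturbation=1.0, x0=0.0):
    """Metropolis sampler with a symmetric Gaussian random-walk proposal."""
    x = x0
    samples = []
    for _ in range(n_samples):
        # Symmetric proposal: equal probability of moving in either direction.
        proposal = x + random.gauss(0.0, perturbation)
        # Acceptance ratio p(proposal) / p(x); the normalizing constant cancels.
        log_alpha = log_target(proposal) - log_target(x)
        if random.random() < math.exp(min(0.0, log_alpha)):
            x = proposal  # accept the move
        samples.append(x)  # on rejection, the current state is repeated
    return samples

samples = metropolis()
```

Note that only an unnormalized density is needed, which is exactly why the intractable denominator from the Problem above never has to be computed.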
**MCMC parameters** to improve convergence:

a) Burn-in – discard the initial records from the sampled set, as the chain moves erratically before reaching the target distribution.

b) Sampling interval (thinning) – keep only every k-th sample so the retained draws are closer to i.i.d.

c) Number of samples – enough draws to approximate the intended distribution well.

d) Perturbation size – the magnitude of the random-walk proposal step.
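Applied to the chain from the sketch above (the burn-in count and thinning interval here are arbitrary illustrative values):

```python
burn_in = 1_000        # a) discard erratic initial draws
interval = 10          # b) thin to reduce autocorrelation
kept = samples[burn_in::interval]
```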

