Markov Chain Monte Carlo (MCMC)

Recently I have been doing some revision on Data Science as I have gotten bored from Big Data. When one relearns, one would understand the subject from a different perspective. Following is my short note.

  1. Result/ Example:  MCMC Word decryption of Metropolis variation, using Unigram of word, Bigrams of word and Trigrams of letter.
  2. Motivation: Given a known equation with a set of parameters, one would and could a). Get the probability of a set of parameters   (and hence can apporoximate each parameter’s distribution) b). Get the best set of parameters.
  3. Problem: Infinite combination of parameters and hence the calculation of denominator is intractable.
  4. Solution: MCMC
  5. Intuition: a) Monte Carlo – Sampling methods which follow a general format, i.e.: i) Identify parameters ii) Sample value from distribution iii) Compute the output iv)  Aggregate result b) Markov Chain – Localized (dependency, new sample depends on previous draw to reduce search space by introducing constraints which varies according to algorithms)
  6. Common variation (algorithm):
    a) Metropolis – for when i) probability distribution of random walk is “symmetric”, i.e.: equal in any direction. ii) Unknown full conditional distribution, meaning, distribution of parameters are not known.
    b) Metropolis-Hastings – same as a) however augments asymmetric probability distribution of  random walk.
    c) Gibbs-Sampling – Used when full conditional distribution is known. Can still get the joint probability, proven using Hammersly-Clifford Theorem.
  7. MCMC parameters to improve convergence: a) Burn in – To remove initial records from sampled set as they were erratic in movements b)  Sampling interval – i.i.d c) Number of samples – to get to the intended distribution d) Perturbation size – Random walks magnitude.


  1. MCMC

Leave a Reply