Learning Notes, Programming, Python

Stochastic – Particle Filtering & Markov Chain Monte Carlo (MCMC) with python example



A particle can be seen as an evaluation of all random variables in a joint distribution.


\displaystyle  \text{Particle A: } [X=1, Y=2] \\ \\ \text{Particle B: } [X=3, Y=1] \\ \\ \text{where } X, Y \in  \{1, 2, 3\}


MCMC refers to methods for randomly sample particles from a joint distribution with a Markov Chain.

Particle Filtering

Particle Filtering is also termed Sequential Monte Carlo. It refers to the process of repeatedly sampling, cast votes after each iteration based on sampled particles and modify the next sampling based on the votes in order to obtain the probability distribution of some un-observable states.

Formally, let x be the unobservable states and y be the observable states related to x. Suppose we receive observations of y at each time step k, we can write the probability based on a Markov Chain:

\displaystyle X_k|(X_{k-1} =x_{k-1}) \propto p(x_k|x_{k-1})

\displaystyle Y_k|(X_{k} =x_{k}) \propto p(y_k|x_{k})

Based on Chapman-Kolmogorov Equation and Bayes Theorem, the conditional probability distribution of latent states x based on priori knowledge y is:

\displaystyle p(x_k|y_{1:k}) \propto p(y_k|x_k)\int_k p(x_k|x_{k-1})p(x_{k-1}|Y_{1:K-1})

MCMC Methods

Gibbs Sampling

Unknown: Joint distribution P(X_1, X_2, \dots, X_n)

Known: Conditional Probability P(X_i|\vec{X}_{others})

Goal: Obtain an estimation of the joint distribution


  1. Choose an initial value  X^0_i for the variable of interest.
  2. Compute distribution by randomly fixing  “others” variable P(X_j|X_i, \vec{X}_{others}) for some j \neq i
  3. Sample from distribution to get a realization of X_j , then update the conditional probability P(X_i|X_j, \vec{X}_{others}) correspondingly,
  4. Sample the target
  5. Do step 2 to step 3 repeatedly for all j \in [1, n] \neq i for k iterations.

An implementation is given below:

def main():
    This program demonstrates a two-variable Gibbs sampling iteration.

    X(size), Y(size)    Samplers which realize corresponding variables.
    PX, PY              Predefined probability distribution of the two random variable.
                        PX and PY are what we wish to estimate and is often unknown in
    properties          Property of the pdf PX and PY, including the domain, resolution and
                        a norm constant which is for plotting p.m.f
    :return None:
    X, Y, PX, PY, properties = GenerateSamplers()
    w = np.linspace(

    Xcollection = []
    x_k = X(1)  # Initial sampling
    y_0 = Y(1)  # Initial sampling
    PYcX = PY/x_k   # P(Y|X=x_k), should be know from statistical data instead
    PXcY = PX/y_0   # P(X|Y=y_0), should be know from statistical data also
    PYcX /= PYcX.sum() # Normalizing the conditional probabilities
    PXcY /= PXcY.sum()
    for k in xrange(50000):
        PYcX /= x_k # Update conditional probability
        PYcX /= PYcX.sum() # Normalize
        y_k = np.random.choice(w, p=PYcX, size=1) # sample from new probability distribution

        PXcY /= y_k # Update conditional probability
        PXcY /= PXcY.sum() # Normalize
        x_k = np.random.choice(w, p=PXcY, size=1)
        Xcollection.append(x_k) # Record the sample

    # Plotting
    plt.hist(np.array(Xcollection), bins=200, normed=1, alpha=0.5)
    plt.plot(w, PX/properties['normConstant'])

if __name__ == '__main__':

And the GenerateSampler() function:

def GenerateSamplers():
    Creates a pair of random variables, one probability distribution is a
     gaussian mixture, another is a simple gaussian with mean 0 and sd 10.

    Domain of the sample is set to -10 to 10

    :return [lambda: sample1, lambda: sample2:
    # Properties settings
    resolution = 2000 # 2000 partitions between whole domain
    domain = [-10, 10]
    gm = {'means': [-1, 2, -4], 'sds': [0.4, 8, 3], 'weight': [0.1, 0.6, 0.3]}
    gy = {'means': 0, 'sds': 5}

    # define a normed gaussian
    def Gaussian(mean, var, x):
        return 1 / (var * np.sqrt(2 * np.pi)) * np.exp(-0.5 * (x - mean) ** 2 / var ** 2)

    w = np.linspace(domain[0], domain[1], resolution)

    # Generate pdf
    PX = np.sum([gm['weight'][i]*Gaussian(gm['means'][i], gm['sds'][i], w)
                 for i in xrange(len(gm['means']))], axis=0)
    PY = Gaussian(gy['means'], gy['sds'], w)

    # Normalization
    PX /= PX.sum()
    PY /= PY.sum()

    # Create sampler functions
    X = lambda size: np.random.choice(w, p=PX, size=size)
    Y = lambda size: np.random.choice(w, p=PY, size=size)
    properties = {'resolution': resolution, 'domain': domain, 'normConstant': (domain[1] - domain[0])/float(resolution - 1)}
    return X, Y, PX, PY, properties

The result is the following figure, where P(X) is a mixture of gaussians (linear combination of gaussians):



See Also

Stochastic – Stationary Process Stochastic

Stochastic – Poisson Process with Python example

Stochastic – Python Example of a Random Walk Implementation



Learning Notes

Stochastic – Common Distributions


Each random variables can have a different probability distribution. Such distribution has a very dominant effect on the behavior of a stochastic process as described in previous articles Stochastic – Poisson Process and Stochastic – Random Walk.

Some commonly used distribution are recorded here.


Gaussian Distribution

\displaystyle p(x) = \frac{1}{\sqrt{2\pi \sigma}} \exp \left[\frac{-(x - x_0)^2}{2\sigma^2} \right]

\displaystyle \langle p(x) \rangle = x_0 \indent \langle p(x)^2 \rangle = \sigma^2


Poisson Distribution

\displaystyle p(n, t;\lambda) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}

\displaystyle \langle p(n, t;\lambda) \rangle = \lambda \indent \langle p(n, t;\lambda)^2 \rangle = \lambda


\displaystyle b(n) =  \begin{cases} p, & n=1\\1-p,&i=0\\0,&\text{otherwise} \end{cases}

\displaystyle \langle b(n) \rangle = p \indent \langle b(x)^2 \rangle = p(1-p)


\displaystyle g(n) = (1-p)^{n-1} p, \forall n \geq 1

\displaystyle \langle g(n) \rangle = \frac{1}{p} \indent \langle g(x)^2 \rangle = \frac{1-p}{p^2}


\displaystyle p(x) = \lambda e ^{-\lambda x}, \forall x \geq 0

\displaystyle \langle p(x) \rangle = \frac{1}{\lambda} \indent \langle p(x)^2 \rangle = \frac{1}{\lambda ^2}


\displaystyle p(x;\alpha, n) = \frac{\alpha^n x^{n-1} e^{-\alpha x}}{\Gamma(n)}, \forall x \geq 0

\displaystyle \langle p(x;\alpha, n) \rangle = \frac{n}{\alpha} \indent \langle p(x;\alpha, n)^2 \rangle = \frac{n}{\alpha^2}


\displaystyle p(x;\sigma) = \frac{x}{\sigma^2} \exp \left[-\frac{x^2}{2 \sigma^2}\right], \forall r > 0

\displaystyle \langle p(x;\sigma) \rangle = \sigma \sqrt{\frac{\pi}{2}} \indent \langle p(x)^2 \rangle = \sigma^2 \left(2-\frac{pi}{2}\right)

Properties: If two Gaussian variable X and Y are independent, then \sqrt{X^2 +Y^2} has Rayleigh distribution of the same variance as X and Y.

Learning Notes, Uncategorized

Stochastic – Wiener Process


p(x, t|x_0, t_0) = Transition probability from state (x_0, t_0) to (x, t)

g(s, t) = Generating function

W(t) = Sample path of a Wiener process


The definition of Wiener process is derived from the Fokker-Planck Equation, where the jump term of the master equation (or the Differential Chapman-Komogorov Equation) vanishes, and the coefficient of drift term A is zero and of diffusion term B is 1 [Eq.1]:

\displaystyle \frac{\partial}{\partial t} p(x,t |x_0, t_0) = \frac{1}{2}\frac{\partial^2}{\partial x^2} p(x, t|x_0, t_0)

A Wiener process is a Markov process which transitional probabilities fufill the upper equation.

Solving the PDE

Introduce the generating function [Eq.2]:

\displaystyle g(s, t) = \int dx  \big[ p(x, t|x_0, t_0) e^{isx} \big]

and also the Bra-ket notation, defined as follow:

\displaystyle \langle m | n \rangle = \delta (m-n)

\displaystyle \sum_n |n\rangle \langle n| = I

A Bra or ket are often referred as “basis”, because of the properties stated above, it is clear that once you define one of the basis, you can readily construct the other complement basis.

Rewriting the generating function:

\displaystyle | x \rangle = e^{isx}

\displaystyle | g(s, t) \rangle = \int dx \big[ |x \rangle  p(x, t|x_0, t_0) \big]

The complement basis can be defined as follow w.r.t the definition of Dirac-Delta function:

\displaystyle \langle y | = \frac{1}{2\pi} \int_{-\infty}^{\infty} ds \cdot e^{-iys} 

The generating function is selected so that when combined with [Eq.1] (i.e. the Fokker-Plank Equation) such that satisfies the following:

\displaystyle \frac{\partial g}{\partial t} = -\frac{1}{2} s^2 g

\displaystyle | g \rangle = \exp ( -\frac{1}{2} s^2 t ) |g \rangle
(solve by separation of variables)

Consider the initial condition: |g \rangle_0 = |x \rangle_0 = e^{isx_0} , we can solve for g(s, t):

\displaystyle | g(s, t) \rangle = \exp \big[-\frac{1}{2}s^2(t - t_0)\big] | x \rangle_0

Then by the property of Bra-ket:

\displaystyle \langle y | g(s, t) \rangle = p(y, t|x_0, t_0)

\displaystyle p(y, t|x_0, t_0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} ds \Big[ e^{-isy} e^{isx_0} \exp \big[ -\frac{1}{2}s^2(t - t_0) \big] \Big]

\displaystyle \indent = \frac{1}{2\pi} \int_{-\infty}^{\infty} ds \Big[ \exp \big[ -isy + isx_0 -\frac{1}{2}s^2(t - t_0) \big] \Big]

Using integration by parts, we finally obtain the transition probability [Eq.3]:

\displaystyle p(x, t|x_0, t_0) = \frac{1}{\sqrt{2\pi(t-t_0)}} \exp \left[ -\frac{(x-x_0)^2}{2(t-t_0)} \right]

Interpretation of result

One can easily identify that the transitional probability is a Gaussian, then the actual process follows [Eq.3] and will have center and variance as follow:

\displaystyle \langle W(t) \rangle = x_0

\displaystyle \langle [W(t) - x_0]^2 \rangle = t - t_0


Gardiner, C. “Stochastic methods: a handbook for the natural and social sciences 4th ed.(2009).”

See also

Stochastic Process
Stochastic – Differential Chapman-Kolmogorov Equation

Programming, Python

Python – Scrapping Javascript Driven Web

Hi, I am migrating!

Because of the annoying fact that latex support is supper weak for official wordpress, I am moving to community wordpress.

For this post visit: http://learningnotes.fromosia.com/index.php/2017/04/06/python-scrapping-javascript-driven-web/

Required Packages


Note that this package has no official Windows release. This post will be based on Ubuntu.


sudo apt-get install qt5-default qt5-qmake libqt5webkit5-dev xvfb
sudo pip -H install webkit-server
sudo pip -H install dryscrape


Using XPath to locate web content

Commonly used syntax:

Syntax Effect
// Search all children recursively under current node
/ Search all children under current node
tag[@att=’val’] Search all ‘tag’ with ‘att’ attribute equal ‘val’


XML Content

<div><span id="DecentTag">First content to scrape </span>
<span class="Distraction"><span class="Distraction">
<span class="DecentClass"> Second content to scrape</span></span></span>
<div><span class="InnerSelf">Nope, nope, nope</span></div>

Then to get the three contents, you can use the following syntax


Using python to scrape web contents

If your target data doesn’t requires javascript running on the client, you can simply use python’s standard packages requests to obtain a string of web content following the example below

import lxml.html
import requests

url = "http://stackoverflow.com/help"
xpath = "id('help-index')/div[2]"

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get(url, headers=headers)
tree = lxml.html.fromstring(r.content)
element = tree.xpath(xpath)

content = element.text_content()

Using python to scrape javascript driven web

If your target is updated by javascript from time to time, simple python request will not obtain what you want to get. Here we introduce a linux python package dryscrape. A simple example is given below:

import  dryscrape

sess = dryscrape.Session()

q = sess.at_xpath("some path")
content = q.text()

As simple as that

Learning Notes

Stochastic – Differential Chapman-Kolmogorov Equation


Here we do not show the derivation of Differential Chapman-Kolmogorov Equation, instead, we only show how to interpret the result. In the following sections, it is assumed that the stochastic process has Markov properties and the sample paths are always continuous and satisfy [Eq.1]:

\displaystyle \lim_{\Delta t \rightarrow 0} \frac{1}{\Delta t} = \int_{|\vec{x}-\vec{z}| > \delta} p(\vec{x}, t+\Delta t|\vec{z}, t) d\vec{x} = 0

which practically says that integrating all possibility of changes of states to (i.e. the “distance”) that are greater than  δ with δ > 0, the result probability is zero and, thus, there are no possibility that there are any changes of states within an infinitesimal period of time Δt. This is also called the Lindeberg condition.

Chapman-Kolomogov Equation

The Chapman-Kolomogov Equation defines, for a Markov Process, the conditional probabilities [Eq.2]:

\displaystyle p(\vec{x}_1, t_1|\vec{x}_2, t_2) = \int_{\vec{x} \in \Omega} p(\vec{x}_1, t_1|\vec{x}, t) \cdot p(\vec{x}, t| \vec{x}_2, t_2) d\vec{x}

Differential Chapman-Kolomogov Equation


W.r.t that, we requires the following properties in order to reduce [Eq.2] into a differential equation for all δ > 0 and denote condition |x-z| < δ as ε:

\displaystyle W(\vec{x}|\vec{z}, t)=\lim_{\Delta t \rightarrow 0} \frac{1}{\Delta t} \cdot p(\vec{x}, t+\Delta t|\vec{z}, t) \text{, for }|x-z| \geq \delta

\displaystyle A_i(\vec{z}, t) + O(\delta)  = \lim_{\Delta t \rightarrow 0}\frac{1}{\Delta t} \int_{\epsilon} p(\vec{x}, t+\Delta t|\vec{z}, t) (x_i - z_i) d\vec{x}

\displaystyle B_{ij}(\vec{z}, t) + O(\delta) \\ \indent =\lim_{\Delta t \rightarrow 0} \frac{1}{\Delta t} \int_{\epsilon} p(\vec{x}, t+\Delta t|\vec{z}, t) (x_i - z_i)(x_j - z_j) d\vec{x} 

In the first requirement, W(z, t) is the jump term which describe discontinuous evolution for all +ve non-zero distance between x and z. Note that if W = 0, we restore [Eq.1] and the sample path is continuous.

In the second requirement, if δ limit to zero, the O(δ) term vanishes (i.e. continuous path), and the term A_i is often called the drift term because it suggest that E[dx] = A(x, t)dt, which can be interpret as the instantaneous rate of change of z..

In the third requirement, B(z, t) is often referred as the diffusion term, which can be seen as the instantaneous rate of change of covariance of all process close to z.


The full derivation will not be stated here, in fact it can be easily found on the internet by search the key words Differential Chapman-Kolmogorov. However, it is crucial to understand the origin of it and a brief introduction to the origin of differential Chapman-Kolmogorov equation will be written below referencing Crispin Gardiner’s book Stochastic methods: a handbook for the natural and social sciences.

First, we consider the time evolution of certain function f(z) which we required it to be twice differentiable. Then the instantaneous rate of change of expected value of this function can be written as:

\displaystyle \frac{\partial}{\partial t} \int_{\vec{x} \in \Omega} f(\vec{x})p(\vec{x}, t|\vec{y}, t') d\vec{x}

\displaystyle = \lim_{\Delta t \rightarrow 0} \left[ \frac{1}{\Delta t} \int_{\vec{x} \in \Omega} d\vec{x} \left\{ f(\vec{x}) \Big[p(\vec{x}, t+\Delta t|\vec{y}, t') - p(\vec{x}, t|\vec{y}, t') \Big] \right\} \right]
[by first principle]

\displaystyle = \lim_{\Delta t \rightarrow 0} \left[ \frac{1}{\Delta t} \int d\vec{x} \int d\vec{z} \left\{ f(\vec{x}) \Big[ p(\vec{x}, t+\Delta t|\vec{z}, t)p(\vec{z}, t|\vec{y}, t') - f(\vec{z})p(\vec{z}, t|\vec{y}, t') \Big] \right\} \right]
[by Chapman-Kolmogorov Equation]

Divide the integrals into two regions |x-z| > δ and |x-z| < δ and some manipulation, the result will be [Eq.3]:

\displaystyle \frac{\partial p(\vec{z}, t|\vec{y}, t')}{\partial t} = - \sum_i \frac{\partial}{\partial z_i} \big[ A_i(\vec{z}, t)p(\vec{z}, t|\vec{y},  t') \big] + \frac{1}{2} \sum_{i, j} \frac{\partial^2}{\partial z_i \partial z_j}\Big[B_{ij}(\vec{z}, t)p(\vec{z}, t|\vec{y}, t')\Big] \\ \indent + \int_{\vec{x} \in \Omega} \Big[ W(\vec{z}|\vec{x}, t)p(\vec{x}, t|\vec{y}, t') - W(\vec{x}|\vec{z}, t)p(\vec{z}, t|\vec{y}, t') \Big] d\vec{x} 

this is the differential Chapman-Kolmogorov equation or sometimes called the master equation

Interpreting the equation

Discontinuous jumps

If we deliberately force the master equation to disobey [Eq.1], we will obtain a discontinuous process. For example, forcing both A(z, t) and B(z, t) to be zero, the differential equation is left to be:

\displaystyle \frac{\partial}{\partial t} p(\vec{z}, t|\vec{y}, t') = \int_{\vec{x} \in \Omega} d\vec{x} \Big[ W(\vec{z}|\vec{x}, t) p(\vec{x}, t|\vec{y}, t') - W(\vec{x}|\vec{z}, t)p(\vec{z}, t|\vec{y}, t')\Big]

Using initial condition p(\vec{z}, t|\vec{y}, t') = \delta(\vec{y} - \vec{z}) , with delta being the Dirac-Delta function, the particular solution to the master equation is:

\displaystyle p(\vec{z}, t+\Delta t|\vec{y}, t)

\displaystyle = \delta(\vec{y} - \vec{z})\Big[ 1 - \int_{\vec{x} \in \Omega} W(\vec{x}|\vec{y}, t)\Delta t d\vec{x}\Big] + W(\vec{z}|\vec{y}, t)\Delta t

see that if we want to know the probability of staying in y after Δt, we substitute z as y and get the following equation [Eq.4]

\displaystyle p(\vec{y}, t+\Delta t)|\vec{y}, t) = 1\Big[ 1-\int_{\vec{x} \in \Omega} W(\vec{x}|\vec{y}, t)\Delta t d\vec{x} \Big] + W(\vec{y}|\vec{y}, t)\Delta t 

Let’s break this down terms by terms so that it is easier to understand. Firstly, W(z|y, t) is the instantaneous probability (or probability per time) of going to state z from y, therefore, W(z|y, t)Δt is the accumulated probability of going to z from y within time period Δt. Secondly, the integral inside the square bracket denotes the total probability of leaving state y and going to any other states x, thus, one minus the integral is the total probability of staying in state y. In the case of [Eq.4], W(y|y, t)Δt is the probability of going from state y to state y meaning it includes the probability of first leaving state y then go back to y within Δt, and the integral represents ONLY staying in state y. Finally, the sum of both terms converge to the probability of staying in state y within Δt in [Eq.4]. 

Fokker-Planck Equation

The Fokker-Planck Equation is obtained when forcing the Jump term W in the master equation to become zero [Eq.5]

\displaystyle \frac{\partial p(\vec{z}, t|\vec{y}, t')}{\partial t} = - \sum_i \frac{\partial}{\partial z_i} \big[ A_i(\vec{z}, t)p(\vec{z}, t|\vec{y},  t') \big] + \frac{1}{2} \sum_{i, j} \frac{\partial^2}{\partial z_i \partial z_j}\Big[B_{ij}(\vec{z}, t)p(\vec{z}, t|\vec{y}, t')\Big]  

This equation is sometimes know as the diffusion process with drift term A and diffusion matrix B. Based on this equation, if we wish to compute p(\vec{z}, t+\Delta t|\vec{y}, t') , we can neglect the partial derivatives of A and so that equation [Eq.5] becomes:

\displaystyle \frac{\partial p(\vec{z}, t|\vec{y}, t')}{\partial t} = - \sum_i A_i(\vec{z}, t)\frac{\partial}{\partial z_i} \big[ p(\vec{z}, t|\vec{y},  t') \big] + \frac{1}{2} \sum_{i, j} B_{ij}(\vec{z}, t) \frac{\partial^2}{\partial z_i \partial z_j}\Big[p(\vec{z}, t|\vec{y}, t')\Big]  

and solve this equation with the constrain p(\vec{z}, t|\vec{y}, t) = \delta(\vec{z} - \vec{y}) (i.e. only one state exist at a time) and obtain:

\displaystyle p(\vec{z}, t+\Delta t|\vec{y}, t) = \sqrt{\Big[(2\pi)^N \text{det}(B(\vec{y}, t)\Delta t) \Big]}^{-1} \cdot \\ \indent \exp \bigg[ -\frac{1}{2\Delta t} [\vec{z}-\vec{y}-A(\vec{y}, t)\Delta t]^T B(\vec{y}, t)^{-1} [\vec{z} - \vec{y} - A(\vec{y}, t)\Delta t]\bigg]

which is, in fact, a Gaussian distribution with variance as matrix B and mean \vec{y} + A(\vec{y}, t)\Delta t .


Crispin W.. Gardiner. Stochastic methods: a handbook for the natural and social sciences. Springer, 2009.

Hierro, Juan, and César Dopazo. “Singular boundaries in the forward Chapman-Kolmogorov differential equation.” Journal of Statistical Physics 137.2 (2009): 305-329.

Stochastic Differential Equation with Applications