Learning Notes, Programming, Python

Stochastic – Particle Filtering & Markov Chain Monte Carlo (MCMC) with python example



A particle can be seen as an evaluation of all random variables in a joint distribution.


\displaystyle  \text{Particle A: } [X=1, Y=2] \\ \\ \text{Particle B: } [X=3, Y=1] \\ \\ \text{where } X, Y \in  \{1, 2, 3\}


MCMC refers to methods for randomly sample particles from a joint distribution with a Markov Chain.

Particle Filtering

Particle Filtering is also termed Sequential Monte Carlo. It refers to the process of repeatedly sampling, cast votes after each iteration based on sampled particles and modify the next sampling based on the votes in order to obtain the probability distribution of some un-observable states.

Formally, let x be the unobservable states and y be the observable states related to x. Suppose we receive observations of y at each time step k, we can write the probability based on a Markov Chain:

\displaystyle X_k|(X_{k-1} =x_{k-1}) \propto p(x_k|x_{k-1})

\displaystyle Y_k|(X_{k} =x_{k}) \propto p(y_k|x_{k})

Based on Chapman-Kolmogorov Equation and Bayes Theorem, the conditional probability distribution of latent states x based on priori knowledge y is:

\displaystyle p(x_k|y_{1:k}) \propto p(y_k|x_k)\int_k p(x_k|x_{k-1})p(x_{k-1}|Y_{1:K-1})

MCMC Methods

Gibbs Sampling

Unknown: Joint distribution P(X_1, X_2, \dots, X_n)

Known: Conditional Probability P(X_i|\vec{X}_{others})

Goal: Obtain an estimation of the joint distribution


  1. Choose an initial value  X^0_i for the variable of interest.
  2. Compute distribution by randomly fixing  “others” variable P(X_j|X_i, \vec{X}_{others}) for some j \neq i
  3. Sample from distribution to get a realization of X_j , then update the conditional probability P(X_i|X_j, \vec{X}_{others}) correspondingly,
  4. Sample the target
  5. Do step 2 to step 3 repeatedly for all j \in [1, n] \neq i for k iterations.

An implementation is given below:

def main():
    This program demonstrates a two-variable Gibbs sampling iteration.

    X(size), Y(size)    Samplers which realize corresponding variables.
    PX, PY              Predefined probability distribution of the two random variable.
                        PX and PY are what we wish to estimate and is often unknown in
    properties          Property of the pdf PX and PY, including the domain, resolution and
                        a norm constant which is for plotting p.m.f
    :return None:
    X, Y, PX, PY, properties = GenerateSamplers()
    w = np.linspace(

    Xcollection = []
    x_k = X(1)  # Initial sampling
    y_0 = Y(1)  # Initial sampling
    PYcX = PY/x_k   # P(Y|X=x_k), should be know from statistical data instead
    PXcY = PX/y_0   # P(X|Y=y_0), should be know from statistical data also
    PYcX /= PYcX.sum() # Normalizing the conditional probabilities
    PXcY /= PXcY.sum()
    for k in xrange(50000):
        PYcX /= x_k # Update conditional probability
        PYcX /= PYcX.sum() # Normalize
        y_k = np.random.choice(w, p=PYcX, size=1) # sample from new probability distribution

        PXcY /= y_k # Update conditional probability
        PXcY /= PXcY.sum() # Normalize
        x_k = np.random.choice(w, p=PXcY, size=1)
        Xcollection.append(x_k) # Record the sample

    # Plotting
    plt.hist(np.array(Xcollection), bins=200, normed=1, alpha=0.5)
    plt.plot(w, PX/properties['normConstant'])

if __name__ == '__main__':

And the GenerateSampler() function:

def GenerateSamplers():
    Creates a pair of random variables, one probability distribution is a
     gaussian mixture, another is a simple gaussian with mean 0 and sd 10.

    Domain of the sample is set to -10 to 10

    :return [lambda: sample1, lambda: sample2:
    # Properties settings
    resolution = 2000 # 2000 partitions between whole domain
    domain = [-10, 10]
    gm = {'means': [-1, 2, -4], 'sds': [0.4, 8, 3], 'weight': [0.1, 0.6, 0.3]}
    gy = {'means': 0, 'sds': 5}

    # define a normed gaussian
    def Gaussian(mean, var, x):
        return 1 / (var * np.sqrt(2 * np.pi)) * np.exp(-0.5 * (x - mean) ** 2 / var ** 2)

    w = np.linspace(domain[0], domain[1], resolution)

    # Generate pdf
    PX = np.sum([gm['weight'][i]*Gaussian(gm['means'][i], gm['sds'][i], w)
                 for i in xrange(len(gm['means']))], axis=0)
    PY = Gaussian(gy['means'], gy['sds'], w)

    # Normalization
    PX /= PX.sum()
    PY /= PY.sum()

    # Create sampler functions
    X = lambda size: np.random.choice(w, p=PX, size=size)
    Y = lambda size: np.random.choice(w, p=PY, size=size)
    properties = {'resolution': resolution, 'domain': domain, 'normConstant': (domain[1] - domain[0])/float(resolution - 1)}
    return X, Y, PX, PY, properties

The result is the following figure, where P(X) is a mixture of gaussians (linear combination of gaussians):



See Also

Stochastic – Stationary Process Stochastic

Stochastic – Poisson Process with Python example

Stochastic – Python Example of a Random Walk Implementation




Python – Reminder to configuring Jupyter Qtconsole


The configuration details can be found here. You would need to create a file “jupyter_qtconsole_config.py” in your director ~/.jupyter.

Setting external editor

c.JupyterWidget.editor = u'Code' # Set visual studio code to the default editor


Programming, Python

Python – Scrapping Javascript Driven Web

Hi, I am migrating!

Because of the annoying fact that latex support is supper weak for official wordpress, I am moving to community wordpress.

For this post visit: http://learningnotes.fromosia.com/index.php/2017/04/06/python-scrapping-javascript-driven-web/

Required Packages


Note that this package has no official Windows release. This post will be based on Ubuntu.


sudo apt-get install qt5-default qt5-qmake libqt5webkit5-dev xvfb
sudo pip -H install webkit-server
sudo pip -H install dryscrape


Using XPath to locate web content

Commonly used syntax:

Syntax Effect
// Search all children recursively under current node
/ Search all children under current node
tag[@att=’val’] Search all ‘tag’ with ‘att’ attribute equal ‘val’


XML Content

<div><span id="DecentTag">First content to scrape </span>
<span class="Distraction"><span class="Distraction">
<span class="DecentClass"> Second content to scrape</span></span></span>
<div><span class="InnerSelf">Nope, nope, nope</span></div>

Then to get the three contents, you can use the following syntax


Using python to scrape web contents

If your target data doesn’t requires javascript running on the client, you can simply use python’s standard packages requests to obtain a string of web content following the example below

import lxml.html
import requests

url = "http://stackoverflow.com/help"
xpath = "id('help-index')/div[2]"

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get(url, headers=headers)
tree = lxml.html.fromstring(r.content)
element = tree.xpath(xpath)

content = element.text_content()

Using python to scrape javascript driven web

If your target is updated by javascript from time to time, simple python request will not obtain what you want to get. Here we introduce a linux python package dryscrape. A simple example is given below:

import  dryscrape

sess = dryscrape.Session()

q = sess.at_xpath("some path")
content = q.text()

As simple as that

Learning Notes, Python

Python – Matplotlib – Saving animation as .gif files


You would need the following to make this work:

  1. Matplotlib
  2. Imagemagick

Imagemagick has replace the binary “Convert” to “Imagemagick” in version 7.0. If you encounter error when using convert, try to use imagemagick instead (for both Windows and Linux, no idea about MAC OS).


import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.animation as animation

mpl.rcParams['animation.ffmpeg_path'] = "C:/Path/To/Image/Magick/ffmpeg.exe"
# For Imagemagick 7.0, convert.exe is replaced by magick.exe
mpl.rcParams['animation.convert_path'] = "C:/Path/To/Image/Magick/magick.exe"

# Save the animation
ani = animation.FuncAnimation(fig, animateFunction, frames=N, interval=2, blit=True, repeat=False)
ani.save("./output.gif", writer='imagemagick', fps=60, bitrate=-1)
Learning Notes, Python

Stochastic – Poisson Process with Python example

Poisson Distribution

The Poisson distribution is in fact originated from binomial distribution, which express probabilities of events counting over a certain period of time. When this period of time becomes infinitely small, the binomial distribution is reduced to the Poisson distribution. The proof can be found here.

The Poisson Distribution can be formulated as follow:

\displaystyle P(X=k)=\frac{\lambda ^k e^{-\lambda}}{k!}, k \in \mathbb{Z}^+

where X is a random variable.


Figure 1. Poisson Distribution

Poisson Process


For a random process X(t) , it is identified as a Poisson process if it satisfy the following conditions:

  1. X(0) = 0
  2. Each incremental process are independent (i.e. It is a Markov process)
  3. P(X(t)=k) = [(\lambda t)^k e^{-\lambda t}]/k!

One can think of it as an evolving Poisson distribution which intensity λ scales with time (λ becomes λt) as illustrated in latter parts (Figure 3).

Note: If λ stays constant for all t then the process is identified as a homogeneous Poisson process, which is stationary process.

Example – Simulation of Poisson processes

Similar to the case in random walk, the Poisson process \{X(t) \in \mathbb{Z}^+; t \in T\} can be formulated as follow [Eq.1]:

\displaystyle X(t) = X_0 + \sum_{i=1}^{t} X_i

where by definition we requires X_0 to be zero.

import numpy as np
import matplotlib.pyplot as plt

# Prepare data
N = 50 # step
lambdas = [1, 2, 5]
X_T = [np.random.poisson(lam, size=N) for lam in lambdas]
S = [[np.sum(X[0:i]) for i in xrange(N)] for X in X_T]
X = np.linspace(0, N, N)

# Plot the graph
graphs = [plt.step(X, S[i], label="Lambda = %d"%lambdas[i])[0] for i in xrange(len(lambdas))]
plt.legend(handles=graphs, loc=2)
plt.title("Poisson Process", fontdict={'fontname': 'Times New Roman', 'fontsize': 21}, y=1.03)


Figure 2. Poisson Process, k against t

Further Elaboration of results

To show the upper process follows definition 3, which said [Eq.2]:

\displaystyle P(X(t)=k) = \frac{(\lambda t)^k e^{-\lambda t}}{k!}

the graph P( X(t) = k ) against t is plotted w.r.t. different values of λ.

Sees each peaks of different k at different t is actually the expected value of the Poisson process at the same t in Figure 2, it can also be interpreted as the most possible k at time t. An annotated comparison is provided below:

Evolution of Poisson process

The following animation shows how the probability of a process X(t) = k evolve with time. One can observe two main features:

  1. The probability distribution spread wider as time passes.
  2. The peak of the probability distribution shifts as time passes, correspond to the simulation in Figure 2.

where both features are actually governed by definition 3 [Eq.2].


Figure 3. Evolution of Poisson Process

See Also

Stochastic Process
Stochastic – Python Example of a Random Walk Implementation
Stochastic – Stationary Process Stochastic
Python – Matplotlib – Saving animation as .gif files