Learning Notes

# Introduction

Each random variable can have a different probability distribution. The distribution strongly shapes the behavior of a stochastic process, as described in the previous articles Stochastic – Poisson Process and Stochastic – Random Walk.

Some commonly used distributions are recorded here.

# Distributions

## Gaussian Distribution

$\displaystyle p(x) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp \left[\frac{-(x - x_0)^2}{2\sigma^2} \right]$

$\displaystyle \langle x \rangle = x_0 \qquad \langle (x - \langle x \rangle)^2 \rangle = \sigma^2$

## Poisson Distribution

$\displaystyle p(n, t;\lambda) = \frac{(\lambda t)^n e^{-\lambda t}}{n!}$

$\displaystyle \langle n \rangle = \lambda t \qquad \langle (n - \langle n \rangle)^2 \rangle = \lambda t$
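
As a quick numerical check, the sketch below sums the Poisson probabilities directly and confirms that, for counts over an interval t, the mean and variance both come out as λt. The values of `lam` and `t` and the cutoff at n = 100 are illustrative assumptions, not part of the notes above.

```python
import math

def poisson_pmf(n, t, lam):
    """Probability of n events in time t for a Poisson process of rate lam."""
    return (lam * t) ** n * math.exp(-lam * t) / math.factorial(n)

lam, t = 1.5, 2.0  # illustrative values, so lam * t = 3.0
ns = range(100)    # terms beyond n = 100 are negligible for lam * t = 3

total = sum(poisson_pmf(n, t, lam) for n in ns)
mean = sum(n * poisson_pmf(n, t, lam) for n in ns)
variance = sum((n - mean) ** 2 * poisson_pmf(n, t, lam) for n in ns)

print(total, mean, variance)  # close to 1, lam * t and lam * t respectively
```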

## Bernoulli

$\displaystyle b(n) = \begin{cases} p, & n=1\\1-p,&n=0\\0,&\text{otherwise} \end{cases}$

$\displaystyle \langle n \rangle = p \qquad \langle (n - \langle n \rangle)^2 \rangle = p(1-p)$

## Geometric

$\displaystyle g(n) = (1-p)^{n-1} p, \forall n \geq 1$

$\displaystyle \langle n \rangle = \frac{1}{p} \qquad \langle (n - \langle n \rangle)^2 \rangle = \frac{1-p}{p^2}$
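
The geometric distribution counts the number of Bernoulli trials up to and including the first success. The simulation below is a minimal sketch of that link; the helper `trials_until_success`, the seed, the value of `p` and the sample size are all assumptions chosen for illustration.

```python
import random

def trials_until_success(p, rng):
    """Count Bernoulli(p) trials up to and including the first success."""
    n = 1
    while rng.random() >= p:  # a failure happens with probability 1 - p
        n += 1
    return n

rng = random.Random(0)  # fixed seed so the run is reproducible
p = 0.25
samples = [trials_until_success(p, rng) for _ in range(200_000)]

mean = sum(samples) / len(samples)
variance = sum((n - mean) ** 2 for n in samples) / len(samples)

print(mean, 1 / p)                 # sample mean vs 1/p
print(variance, (1 - p) / p ** 2)  # sample variance vs (1-p)/p^2
```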

## Exponential

$\displaystyle p(x) = \lambda e ^{-\lambda x}, \forall x \geq 0$

$\displaystyle \langle x \rangle = \frac{1}{\lambda} \qquad \langle (x - \langle x \rangle)^2 \rangle = \frac{1}{\lambda ^2}$

## Gamma

$\displaystyle p(x;\alpha, n) = \frac{\alpha^n x^{n-1} e^{-\alpha x}}{\Gamma(n)}, \forall x \geq 0$

$\displaystyle \langle x \rangle = \frac{n}{\alpha} \qquad \langle (x - \langle x \rangle)^2 \rangle = \frac{n}{\alpha^2}$

## Rayleigh

$\displaystyle p(x;\sigma) = \frac{x}{\sigma^2} \exp \left[-\frac{x^2}{2 \sigma^2}\right], \forall x \geq 0$

$\displaystyle \langle x \rangle = \sigma \sqrt{\frac{\pi}{2}} \qquad \langle (x - \langle x \rangle)^2 \rangle = \sigma^2 \left(2-\frac{\pi}{2}\right)$

Properties: If two Gaussian variables X and Y are independent, with zero mean and the same variance $\sigma^2$, then $\sqrt{X^2 +Y^2}$ has a Rayleigh distribution with parameter $\sigma$.
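
This property can be checked by simulation. The sketch below assumes X and Y are independent, zero-mean Gaussians with a common standard deviation `sigma` (the seed, `sigma` and the sample size are illustrative choices) and compares the sample mean and variance of $\sqrt{X^2 + Y^2}$ with the Rayleigh formulas above.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the run is reproducible
sigma = 2.0
n_samples = 200_000

# R = sqrt(X^2 + Y^2) with X, Y independent zero-mean Gaussians of std sigma
r = [math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
     for _ in range(n_samples)]

mean = sum(r) / n_samples
variance = sum((x - mean) ** 2 for x in r) / n_samples

print(mean, sigma * math.sqrt(math.pi / 2))      # sample vs Rayleigh mean
print(variance, sigma ** 2 * (2 - math.pi / 2))  # sample vs Rayleigh variance
```

Note that the Rayleigh variance is $\sigma^2 (2 - \pi/2)$, which is smaller than the variance $\sigma^2$ of X and Y themselves.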

# Notation

$p(x, t|x_0, t_0)$ = Transition probability from state $(x_0, t_0)$ to $(x, t)$

$g(s, t)$ = Generating function

$W(t)$ = Sample path of a Wiener process

# Definition

The Wiener process is defined through the Fokker-Planck equation, i.e. the master equation (the differential Chapman-Kolmogorov equation) with the jump term removed, in which the drift coefficient A is zero and the diffusion coefficient B is 1 [Eq.1]:

$\displaystyle \frac{\partial}{\partial t} p(x,t |x_0, t_0) = \frac{1}{2}\frac{\partial^2}{\partial x^2} p(x, t|x_0, t_0)$

A Wiener process is a Markov process whose transition probabilities satisfy the equation above.

# Solving the PDE

Introduce the generating function [Eq.2]:

$\displaystyle g(s, t) = \int dx \big[ p(x, t|x_0, t_0) e^{isx} \big]$

We also introduce bra-ket notation, defined as follows:

$\displaystyle \langle m | n \rangle = \delta (m-n)$
(Orthogonality)

$\displaystyle \sum_n |n\rangle \langle n| = I$
(Completeness)

Bras and kets are often referred to as “bases”. Given the properties stated above, once one basis is defined, the complementary basis can readily be constructed.

Rewriting the generating function:

$\displaystyle | x \rangle = e^{isx}$

$\displaystyle | g(s, t) \rangle = \int dx \big[ |x \rangle p(x, t|x_0, t_0) \big]$

The complementary basis follows from the integral representation of the Dirac delta function:

$\displaystyle \langle y | = \frac{1}{2\pi} \int_{-\infty}^{\infty} ds \cdot e^{-iys}$

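The generating function is chosen so that [Eq.1] (the Fokker-Planck equation) reduces to an ordinary differential equation in t. Multiplying [Eq.1] by $e^{isx}$ and integrating over x, with two integrations by parts on the right-hand side (the boundary terms vanish because p decays at infinity), gives: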

$\displaystyle \frac{\partial g}{\partial t} = -\frac{1}{2} s^2 g$

$\displaystyle | g(s, t) \rangle = \exp \big[ -\frac{1}{2} s^2 (t - t_0) \big] | g(s, t_0) \rangle$
(solved by separation of variables)

With the initial condition $p(x, t_0|x_0, t_0) = \delta(x - x_0)$, i.e. $|g(s, t_0) \rangle = |g \rangle_0 = |x \rangle_0 = e^{isx_0}$, we can solve for $g(s, t)$:

$\displaystyle | g(s, t) \rangle = \exp \big[-\frac{1}{2}s^2(t - t_0)\big] | x \rangle_0$

Then by the property of Bra-ket:

$\displaystyle \langle y | g(s, t) \rangle = p(y, t|x_0, t_0)$

$\displaystyle p(y, t|x_0, t_0) = \frac{1}{2\pi} \int_{-\infty}^{\infty} ds \Big[ e^{-isy} e^{isx_0} \exp \big[ -\frac{1}{2}s^2(t - t_0) \big] \Big]$

$\displaystyle \qquad = \frac{1}{2\pi} \int_{-\infty}^{\infty} ds \Big[ \exp \big[ -isy + isx_0 -\frac{1}{2}s^2(t - t_0) \big] \Big]$
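
The integral above is a Gaussian integral in $s$ and can be evaluated by completing the square in the exponent. A sketch of the steps, writing $\tau = t - t_0$ as shorthand (a symbol introduced here for brevity, not used elsewhere in these notes):

$\displaystyle -is(y - x_0) - \frac{1}{2}s^2\tau = -\frac{\tau}{2}\left[s + \frac{i(y - x_0)}{\tau}\right]^2 - \frac{(y - x_0)^2}{2\tau}$

The shift of $s$ by an imaginary constant does not change the value of the Gaussian integral, so

$\displaystyle \int_{-\infty}^{\infty} ds \exp \left[-\frac{\tau}{2}\left(s + \frac{i(y - x_0)}{\tau}\right)^2\right] = \sqrt{\frac{2\pi}{\tau}}$

$\displaystyle p(y, t|x_0, t_0) = \frac{1}{2\pi}\sqrt{\frac{2\pi}{\tau}} \exp \left[-\frac{(y - x_0)^2}{2\tau}\right] = \frac{1}{\sqrt{2\pi\tau}} \exp \left[-\frac{(y - x_0)^2}{2\tau}\right]$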

Completing the square in the exponent and evaluating the resulting Gaussian integral (then renaming y to x), we finally obtain the transition probability [Eq.3]:

$\displaystyle p(x, t|x_0, t_0) = \frac{1}{\sqrt{2\pi(t-t_0)}} \exp \left[ -\frac{(x-x_0)^2}{2(t-t_0)} \right]$
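
One way to sanity-check this result is to substitute it back into [Eq.1] numerically. The sketch below approximates both sides with central finite differences at a single point; the test point, the step size `h` and the defaults $x_0 = t_0 = 0$ are arbitrary assumptions.

```python
import math

def p(x, t, x0=0.0, t0=0.0):
    """Transition density of [Eq.3]."""
    tau = t - t0
    return math.exp(-(x - x0) ** 2 / (2 * tau)) / math.sqrt(2 * math.pi * tau)

x, t, h = 0.7, 1.3, 1e-3  # arbitrary test point and step size

# Central differences for dp/dt and d^2p/dx^2
dp_dt = (p(x, t + h) - p(x, t - h)) / (2 * h)
d2p_dx2 = (p(x + h, t) - 2 * p(x, t) + p(x - h, t)) / h ** 2

print(dp_dt, 0.5 * d2p_dx2)  # the two sides of [Eq.1] should nearly agree
```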

# Interpretation of result

The transition probability is a Gaussian, so a process obeying [Eq.3] that starts at $x_0$ at time $t_0$ has the following mean and variance:

$\displaystyle \langle W(t) \rangle = x_0$

$\displaystyle \langle [W(t) - x_0]^2 \rangle = t - t_0$
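
These two moments can also be seen in simulation. The sketch below builds sample paths by summing independent Gaussian increments of variance `dt`, a standard discretisation that is assumed here rather than derived in these notes; the start point, time horizon, step count and number of paths are illustrative.

```python
import math
import random

rng = random.Random(2)  # fixed seed so the run is reproducible
x0, t0, t = 1.0, 0.0, 2.0
n_steps, n_paths = 50, 20_000
dt = (t - t0) / n_steps

def sample_endpoint():
    """Simulate one path from (x0, t0) and return its value W(t)."""
    w = x0
    for _ in range(n_steps):
        w += rng.gauss(0.0, math.sqrt(dt))  # increment ~ N(0, dt)
    return w

endpoints = [sample_endpoint() for _ in range(n_paths)]
mean = sum(endpoints) / n_paths
variance = sum((w - x0) ** 2 for w in endpoints) / n_paths

print(mean, x0)          # sample mean vs x0
print(variance, t - t0)  # sample variance vs t - t0
```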

# Reference

Gardiner, C. Stochastic Methods: A Handbook for the Natural and Social Sciences, 4th ed. (2009).

Stochastic Process
Stochastic – Differential Chapman-Kolmogorov Equation

Python

# Tutorial

The configuration details can be found here. You need to create a file “jupyter_qtconsole_config.py” in your directory ~/.jupyter.

## Setting external editor

c.JupyterWidget.editor = u'Code' # Use Visual Studio Code as the default external editor


# Python – Scraping JavaScript-Driven Web Pages

## Hi, I am migrating!

Because LaTeX support on official WordPress is annoyingly weak, I am moving to community WordPress.

# Required Packages

## dryscrape

Note that this package has no official Windows release. This post will be based on Ubuntu.

### Installation

sudo apt-get install qt5-default qt5-qmake libqt5webkit5-dev xvfb
sudo -H pip install webkit-server
sudo -H pip install dryscrape


# Tutorial

## Using XPath to locate web content

Commonly used syntax:

| Syntax | Effect |
| --- | --- |
| // | Search all descendants recursively under the current node |
| / | Search the direct children of the current node |
| tag[@att='val'] | Select every 'tag' element whose 'att' attribute equals 'val' |

### Examples

XML Content

<div><span id="DecentTag">First content to scrape </span>
<span class="Distraction"><span class="Distraction">
<span class="DecentClass"> Second content to scrape</span></span></span>
<div><span class="InnerSelf">Nope, nope, nope</span></div>
</div>


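Assuming the snippet sits directly inside the body of an HTML page, the three pieces of content can be selected with the following expressions. The second one needs // because the target span is nested inside two other spans:

id('DecentTag')
/html/body/div//span[@class='DecentClass']
/html/body//span[@class='InnerSelf']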
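
To try the idea without installing anything, the sketch below runs equivalent queries against the snippet using Python's standard xml.etree.ElementTree. This module supports only a subset of XPath: there is no id() function and paths are relative to the parsed root element, so the expressions are adapted accordingly and are not the exact ones listed above.

```python
import xml.etree.ElementTree as ET

snippet = """
<div><span id="DecentTag">First content to scrape </span>
<span class="Distraction"><span class="Distraction">
<span class="DecentClass"> Second content to scrape</span></span></span>
<div><span class="InnerSelf">Nope, nope, nope</span></div>
</div>
"""

root = ET.fromstring(snippet.strip())  # root is the outer <div>

# ElementTree has no id() function, so match on the id attribute instead
first = root.find(".//span[@id='DecentTag']")
# './/' searches all descendants, however deeply nested
second = root.find(".//span[@class='DecentClass']")
# './' walks direct children only: root -> inner div -> span
third = root.find("./div/span[@class='InnerSelf']")

for element in (first, second, third):
    print(element.text.strip())
```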


## Using python to scrape web contents

If your target data does not require JavaScript to run on the client, you can simply use the third-party requests package to download the page and lxml to parse it, as in the example below:

import lxml.html
import requests

url = "http://stackoverflow.com/help"
xpath = "id('help-index')/div[2]"

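# Send a browser-like User-Agent, since some sites reject the default one
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get(url, headers=headers)  # download the page
tree = lxml.html.fromstring(r.content)
elements = tree.xpath(xpath)  # xpath() returns a list of matching elements

content = elements[0].text_content()  # text of the first match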


## Using python to scrape javascript driven web

If your target is updated by JavaScript after the page loads, a plain HTTP request will not return the content you want. Here we introduce dryscrape, a Python package for Linux. A simple example is given below:

import dryscrape

dryscrape.start_xvfb()
sess = dryscrape.Session()
sess.visit("http://stackoverflow.com/help")

q = sess.at_xpath("some path")
content = q.text()



As simple as that