# Definition

## Particle

A particle can be seen as an assignment of values to all random variables in a joint distribution.

Examples:

$\displaystyle \text{Particle A: } [X=1, Y=2] \\ \\ \text{Particle B: } [X=3, Y=1] \\ \\ \text{where } X, Y \in \{1, 2, 3\}$

## MCMC

Markov chain Monte Carlo (MCMC) refers to methods that randomly sample particles from a joint distribution by running a Markov chain whose stationary distribution is that joint distribution.
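To make this concrete with the particles defined above, the sketch below runs random-walk Metropolis, one MCMC method (the symmetric-proposal special case of Metropolis-Hastings), over the nine particles $[X=x, Y=y]$ with $x, y \in \{1, 2, 3\}$. The joint weights `JOINT` and the function `metropolis_particles` are hypothetical, chosen only for illustration. The fraction of steps the chain spends at each particle approaches the normalized joint distribution.

```python
import numpy as np

# Hypothetical unnormalized joint weights w(x, y) for x, y in {1, 2, 3};
# row index = x - 1, column index = y - 1
JOINT = np.array([[1.0, 2.0, 1.0],
                  [2.0, 4.0, 2.0],
                  [1.0, 2.0, 5.0]])


def metropolis_particles(n_steps=200000, seed=0):
    """Random-walk Metropolis chain over particles [X=x, Y=y].

    Returns the fraction of steps spent at each particle, which should
    approach JOINT / JOINT.sum() as n_steps grows.
    """
    rng = np.random.default_rng(seed)
    x, y = 1, 1  # initial particle [X=1, Y=1]
    counts = np.zeros((3, 3))
    for _ in range(n_steps):
        # Symmetric proposal: move X or Y (chosen at random) by +1 or -1
        step = 1 if rng.random() < 0.5 else -1
        nx, ny = (x + step, y) if rng.random() < 0.5 else (x, y + step)
        # Particles outside {1, 2, 3} have zero probability: always reject them
        if 1 <= nx <= 3 and 1 <= ny <= 3:
            # Accept with probability min(1, w(new) / w(current))
            if rng.random() < JOINT[nx - 1, ny - 1] / JOINT[x - 1, y - 1]:
                x, y = nx, ny
        counts[x - 1, y - 1] += 1  # record the current particle (repeats on rejection)
    return counts / counts.sum()


print(np.round(metropolis_particles(), 3))
print(JOINT / JOINT.sum())
```

Because the proposal is symmetric, only the ratio of unnormalized weights is needed, so the normalizing constant of the joint distribution never has to be computed.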

## Particle Filtering

Particle filtering is also known as Sequential Monte Carlo (SMC). It repeatedly samples particles, lets them cast "votes" after each iteration (weights based on how well each particle explains the newest observation), and uses those votes to guide the next round of sampling, in order to estimate the probability distribution of unobservable states.

Formally, let $x$ be the unobservable states and $y$ the observable states related to $x$. Suppose we receive an observation of $y$ at each time step $k$. Treating $x$ as a Markov chain, the model can be written as:

$\displaystyle X_k|(X_{k-1} =x_{k-1}) \sim p(x_k|x_{k-1})$

$\displaystyle Y_k|(X_{k} =x_{k}) \sim p(y_k|x_{k})$

Combining the Chapman-Kolmogorov equation (prediction step) with Bayes' theorem (update step), the conditional distribution of the latent state $x_k$ given the observations $y_{1:k}$ is:

$\displaystyle p(x_k|y_{1:k}) \propto p(y_k|x_k)\int p(x_k|x_{k-1})\,p(x_{k-1}|y_{1:k-1})\,dx_{k-1}$
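A bootstrap particle filter approximates this recursion: propagating particles through $p(x_k|x_{k-1})$ stands in for the integral, and weighting them by $p(y_k|x_k)$ is the Bayes update (the "votes"). The sketch below assumes a hypothetical one-dimensional model $x_k = x_{k-1} + \mathcal{N}(0, q)$, $y_k = x_k + \mathcal{N}(0, r)$ with prior $x_0 \sim \mathcal{N}(0, 1)$; the function name and parameters are illustrative, not taken from the original.

```python
import numpy as np


def bootstrap_particle_filter(ys, n_particles=1000, q=1.0, r=1.0, seed=0):
    """Bootstrap particle filter for the assumed model
    x_k = x_{k-1} + N(0, q),  y_k = x_k + N(0, r),  x_0 ~ N(0, 1).
    Returns the posterior-mean estimate of x_k for every observation y_k."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(0.0, 1.0, n_particles)  # samples from the assumed prior p(x_0)
    estimates = []
    for y in ys:
        # Predict: draw x_k ~ p(x_k | x_{k-1}) for every particle
        particles = particles + rng.normal(0.0, np.sqrt(q), n_particles)
        # Update ("vote"): weight each particle by the likelihood p(y_k | x_k)
        log_w = -0.5 * (y - particles) ** 2 / r
        w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
        w /= w.sum()
        # Weighted mean approximates E[x_k | y_{1:k}]
        estimates.append(np.sum(w * particles))
        # Resample: particles with many votes are duplicated, unlikely ones dropped
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
    return np.array(estimates)


# Usage: simulate the assumed model, then filter the noisy observations
rng = np.random.default_rng(1)
xs = np.cumsum(rng.normal(0.0, 1.0, 50))  # hidden states
ys = xs + rng.normal(0.0, 1.0, 50)        # noisy observations
est = bootstrap_particle_filter(ys)
print(np.mean(np.abs(est - xs)), np.mean(np.abs(ys - xs)))
```

Resampling at every step is the simplest choice; practical filters often resample only when the weights become too uneven.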

# MCMC Methods

## Gibbs Sampling

Unknown: Joint distribution $P(X_1, X_2, \dots, X_n)$

Known: The full conditional distributions $P(X_i|\vec{X}_{others})$ for every $i$

Goal: Obtain an estimate of the joint distribution

Steps:

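1. Choose initial values $X^0_1, \dots, X^0_n$, including the variable of interest $X_i$.
2. Pick a variable $X_j$ for some $j \neq i$ and compute its conditional distribution $P(X_j|X_i, \vec{X}_{others})$, with all other variables fixed at their current values.
3. Sample from that distribution to get a new realization of $X_j$, then update the conditional distribution $P(X_i|X_j, \vec{X}_{others})$ correspondingly.
4. Sample the target variable $X_i$ from its updated conditional distribution and record the sample.
5. Repeat steps 2 to 4 for every $j \in [1, n]$, $j \neq i$, for $k$ iterations. After discarding an initial burn-in period, the recorded samples approximate draws from the joint distribution.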
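As a self-contained illustration of these steps, here is a minimal sketch for a case where the full conditionals are known exactly: a standard bivariate normal with correlation $\rho$, for which $X|Y=y \sim \mathcal{N}(\rho y, 1-\rho^2)$ and $Y|X=x \sim \mathcal{N}(\rho x, 1-\rho^2)$. The function name, iteration count, and burn-in length are hypothetical choices.

```python
import numpy as np


def gibbs_bivariate_normal(rho, n_iter=20000, burn_in=1000, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.
    Both full conditionals are Gaussian with variance 1 - rho^2."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(1.0 - rho ** 2)
    x, y = 0.0, 0.0  # step 1: initial values
    samples = []
    for k in range(n_iter):
        y = rng.normal(rho * x, sd)  # steps 2-3: sample Y from P(Y | X = x)
        x = rng.normal(rho * y, sd)  # step 4: sample X from P(X | Y = y)
        if k >= burn_in:             # keep samples only after burn-in
            samples.append((x, y))
    return np.array(samples)


s = gibbs_bivariate_normal(0.8)
print(s.mean(axis=0), np.corrcoef(s[:, 0], s[:, 1])[0, 1])
```

Only the two conditionals are ever sampled, yet the sample correlation recovers $\rho$, which is the point of Gibbs sampling: the joint distribution is never evaluated directly.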

An implementation is given below:

```python
import numpy as np
import matplotlib.pyplot as plt


def main():
    """
    This program demonstrates a two-variable Gibbs sampling iteration.

    X(size), Y(size)    Samplers which realize the corresponding variables.
    PX, PY              Predefined probability distributions of the two random variables.
                        PX and PY are what we wish to estimate and are often unknown in
                        practice.
    properties          Properties of the pdfs PX and PY, including the domain, resolution
                        and a normalizing constant used for plotting the p.m.f.
    :return: None
    """
    X, Y, PX, PY, properties = GenerateSamplers()
    w = np.linspace(
        properties['domain'][0],
        properties['domain'][1],
        properties['resolution'])

    Xcollection = []
    x_k = X(1)  # Initial sampling
    y_0 = Y(1)  # Initial sampling
    # Placeholder conditionals: dividing by a scalar and renormalizing leaves the
    # shape unchanged, so these equal PY and PX. In practice P(Y|X) and P(X|Y)
    # should be known from statistical data instead.
    PYcX = PY / x_k  # P(Y|X=x_k)
    PXcY = PX / y_0  # P(X|Y=y_0)
    PYcX /= PYcX.sum()  # Normalize the conditional probabilities
    PXcY /= PXcY.sum()
    for k in range(50000):
        PYcX /= x_k  # Update conditional probability given the current x_k
        PYcX /= PYcX.sum()  # Normalize
        y_k = np.random.choice(w, p=PYcX, size=1)  # Sample Y from the new distribution

        PXcY /= y_k  # Update conditional probability given the current y_k
        PXcY /= PXcY.sum()  # Normalize
        x_k = np.random.choice(w, p=PXcY, size=1)  # Sample X
        Xcollection.append(x_k[0])  # Record the sample

    # Plot the histogram of the samples against the true density of X
    plt.hist(np.array(Xcollection), bins=200, density=True, alpha=0.5)
    plt.plot(w, PX / properties['normConstant'])
    plt.show()


# GenerateSamplers() must be defined before this block runs
if __name__ == '__main__':
    main()
```


And the GenerateSamplers() function:

```python
import numpy as np


def GenerateSamplers():
    """
    Creates a pair of random variables: one probability distribution is a
    Gaussian mixture, the other is a simple Gaussian with mean 0 and sd 5.

    The domain of the samples is set to -10 to 10.

    :return: X, Y, PX, PY, properties
    """
    # Properties settings
    resolution = 2000  # 2000 grid points across the whole domain
    domain = [-10, 10]
    gm = {'means': [-1, 2, -4], 'sds': [0.4, 8, 3], 'weight': [0.1, 0.6, 0.3]}
    gy = {'means': 0, 'sds': 5}

    # Define a normalized Gaussian density with standard deviation sd
    def Gaussian(mean, sd, x):
        return 1 / (sd * np.sqrt(2 * np.pi)) * np.exp(-0.5 * (x - mean) ** 2 / sd ** 2)

    w = np.linspace(domain[0], domain[1], resolution)

    # Generate the pdfs on the grid
    PX = np.sum([gm['weight'][i] * Gaussian(gm['means'][i], gm['sds'][i], w)
                 for i in range(len(gm['means']))], axis=0)
    PY = Gaussian(gy['means'], gy['sds'], w)

    # Normalize into p.m.f.s over the grid
    PX /= PX.sum()
    PY /= PY.sum()

    # Create sampler functions
    X = lambda size: np.random.choice(w, p=PX, size=size)
    Y = lambda size: np.random.choice(w, p=PY, size=size)
    properties = {'resolution': resolution,
                  'domain': domain,
                  'normConstant': (domain[1] - domain[0]) / float(resolution - 1)}
    return X, Y, PX, PY, properties
```


The result is the following figure, where P(X) is a mixture of Gaussians (a weighted linear combination of Gaussians):

## Metropolis-Hastings


# Reference

http://cs.stanford.edu/~ppasupat/a9online/1300.html

# Python – Scraping JavaScript-Driven Web Pages

## Hi, I am migrating!

Because LaTeX support on the official WordPress is annoyingly weak, I am moving to community WordPress.

# Required Packages

## dryscrape

Note that this package has no official Windows release. This post will be based on Ubuntu.

### Installation

```shell
sudo apt-get install qt5-default qt5-qmake libqt5webkit5-dev xvfb
sudo -H pip install webkit-server
sudo -H pip install dryscrape
```


# Tutorial

## Using XPath to locate web content

Commonly used syntax:

| Syntax | Effect |
|---|---|
| `//` | Search all descendants of the current node recursively |
| `/` | Search only the direct children of the current node |
| `tag[@att='val']` | Select all `tag` elements whose `att` attribute equals `val` |

### Examples

XML content:

```xml
<div><span id="DecentTag">First content to scrape </span>
<span class="Distraction"><span class="Distraction">
<span class="DecentClass"> Second content to scrape</span></span></span>
<div><span class="InnerSelf">Nope, nope, nope</span></div>
</div>
```


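Then, to get the three pieces of content, you can use the following expressions (assuming the snippet above is the content of the page's `<body>`):

```
id('DecentTag')
/html/body/div//span[@class='DecentClass']
/html/body//span[@class='InnerSelf']
```

The second expression needs `//` after `div` because the `DecentClass` span sits inside two `Distraction` spans rather than directly under the `div`.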
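The expressions can be checked locally with lxml, without fetching any page. This sketch wraps the example snippet in `<html><body>` (an assumption about where it lives in the page) and uses `//*[@id='DecentTag']`, which selects the same element as `id('DecentTag')`; the helper name `scrape_examples` is hypothetical.

```python
import lxml.html

# The example snippet, assumed to be the whole page body
HTML = """<html><body>
<div><span id="DecentTag">First content to scrape </span>
<span class="Distraction"><span class="Distraction">
<span class="DecentClass"> Second content to scrape</span></span></span>
<div><span class="InnerSelf">Nope, nope, nope</span></div>
</div>
</body></html>"""


def scrape_examples(html=HTML):
    """Evaluate the three example XPath expressions and return their text."""
    tree = lxml.html.document_fromstring(html)
    paths = [
        "//*[@id='DecentTag']",  # same element that id('DecentTag') selects
        "/html/body/div//span[@class='DecentClass']",
        "/html/body//span[@class='InnerSelf']",
    ]
    # xpath() returns a list of matching elements; take the text of the first match
    return [tree.xpath(p)[0].text_content().strip() for p in paths]


print(scrape_examples())
# '/' only looks at direct children, so this finds nothing:
print(lxml.html.document_fromstring(HTML).xpath("/html/body/div/span[@class='DecentClass']"))
```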


## Using python to scrape web contents

If your target data doesn't require JavaScript to run on the client, you can simply use the `requests` package to fetch the page and `lxml` to parse it, following the example below:

```python
import lxml.html
import requests

url = "http://stackoverflow.com/help"
xpath = "id('help-index')/div[2]"

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
r = requests.get(url, headers=headers)  # Fetch the raw HTML
tree = lxml.html.fromstring(r.content)
elements = tree.xpath(xpath)  # xpath() returns a list of matching elements

content = elements[0].text_content()
```


## Using python to scrape javascript driven web

If your target page is updated by JavaScript from time to time, a simple `requests` call will not return the content you want, because the JavaScript never runs. Here we introduce dryscrape, a Python package for Linux. A simple example is given below:

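```python
import dryscrape

dryscrape.start_xvfb()  # Start a virtual X server for the headless browser
sess = dryscrape.Session()
sess.visit("http://stackoverflow.com/help")  # Load the page and run its JavaScript

q = sess.at_xpath("some path")  # Replace "some path" with your XPath expression
content = q.text()
```

As simple as that.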