MCMC and other sampling algorithms

Original link: http://www.cnblogs.com/CSLaker/p/9962912.html

1. Direct Sampling

The idea of direct sampling is to use samples from the uniform distribution to realize sampling from an arbitrary target distribution. Sampling from the uniform distribution is easy, while the distribution we actually want is hard to sample from, so we use a simple strategy to reduce the hard sampling problem to the easy one.
Suppose y follows a distribution p(y) whose cumulative distribution function (CDF) is h(y). Draw a sample z ~ Uniform(0, 1) and set z = h(y), i.e. y = h^(-1)(z); then y is a sample from the distribution p(y).
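For instance, here is a minimal sketch of inverse-transform sampling using the exponential distribution, whose CDF \(h(y) = 1 - e^{-\lambda y}\) has the closed-form inverse \(h^{-1}(z) = -\ln(1 - z)/\lambda\) (this concrete example is ours, not from the original post):

import numpy as np
import matplotlib.pyplot as plt

lam = 1.5                             # rate parameter of Exp(lam)
z = np.random.random(100000)          # z ~ Uniform(0, 1)
y = -np.log(1 - z) / lam              # y = h^{-1}(z), so y ~ Exp(lam)

plt.hist(y, bins=100, density=True)   # should match lam * exp(-lam * y)
plt.show()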

[Figure: inverse-transform sampling via the CDF]

The core idea of direct sampling is the CDF and its inverse transform. If a region [a, b] carries a large share of the probability mass under p(y), the CDF curve is steep there, so the corresponding interval [h(a), h(b)] on the z-axis is wide. When z is sampled uniformly on the z-axis, it falls into that interval more often, and after the inverse transform the high-density parts of y therefore receive correspondingly more samples.

Limitations

In practice, most distributions of interest are more complex, and deriving the CDF, let alone its inverse, is often infeasible.

2. Rejection Sampling

Rejection sampling starts from a distribution that is easy to sample and generates samples from a target distribution that is hard to sample directly. Since p(x) is too complex to sample from directly, we choose an easy proposal distribution q(x), such as a Gaussian, and then reject some of the samples according to a certain rule, so that the accepted samples come close to the target distribution p(x).

[Figure: rejection sampling, with p(x) enveloped by k·q(x)]

Calculation steps

Choose a proposal distribution q(x) that is convenient to sample from, and a constant k such that p(x) lies entirely below k·q(x) (see the figure above). Then:

  • x-axis: draw a sample a from the distribution q(x);
  • y-axis: draw u from the uniform distribution Uniform(0, k·q(a));
  • If the point falls into the gray area, i.e. u > p(a), reject it; otherwise accept the sample a;
  • Repeat the process. (A runnable implementation appears in Section 7 below.)

Calculation steps (Bayesian networks)

  • Generate samples from the prior distribution specified by the network;
  • Reject all samples that do not match the evidence;
  • Count the frequency of the event X = x among the remaining samples to obtain the estimated probability (see the sketch below).
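A minimal sketch of these steps on a hypothetical two-node network Rain → WetGrass (the network and all CPT numbers below are illustrative assumptions, not from the original post):

import random

# Toy network: Rain -> WetGrass, with made-up probabilities.
P_RAIN = 0.2
P_WET_GIVEN_RAIN = {True: 0.9, False: 0.1}

def prior_sample():
    # Sample (Rain, WetGrass) from the prior defined by the network.
    rain = random.random() < P_RAIN
    wet = random.random() < P_WET_GIVEN_RAIN[rain]
    return rain, wet

def rejection_query(n=100000):
    # Estimate P(Rain = true | WetGrass = true): reject every sample
    # that contradicts the evidence WetGrass = true, then count.
    kept = rain_count = 0
    for _ in range(n):
        rain, wet = prior_sample()
        if not wet:        # evidence mismatch: reject
            continue
        kept += 1
        rain_count += rain
    return rain_count / kept

print(rejection_query())   # ~ 0.18 / 0.26 ≈ 0.69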

Limitations

  • Too many samples are rejected! As the number of evidence variables grows, the fraction of samples consistent with the evidence e shrinks exponentially, so for complex problems this approach is completely unusable.
  • It can be difficult to find a suitable k·q(a), and the acceptance probability may be low.

3. Importance Sampling (Likelihood Weighting)


Importance sampling is mainly used to estimate the mean of a complex distribution p(x); in the end it does not yield samples from p(x) itself.

The idea of importance sampling is again to work through an easy-to-sample distribution q(x), but this time all samples drawn from q(x) are accepted. Since those samples certainly do not follow the target distribution p(x), each sample is given an importance weight: for a sample x0, the weight is p(x0)/q(x0). Where p(x) is large relative to q(x), the weight is large; where p(x) is small relative to q(x), the weight is small. In this way the weighted samples, although drawn from q(z), behave after multiplication by their weights as if they followed p(z).

\[ E_p[f(x)] = \int f(x)\, p(x)\, dx = \int f(x)\, \frac{p(x)}{q(x)}\, q(x)\, dx \approx \frac{1}{N} \sum_{i=1}^{N} f(x_i)\, \frac{p(x_i)}{q(x_i)}, \qquad x_i \sim q(x) \]

This formula shows that importance sampling can be used to approximate the mean of a complex distribution.
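A minimal sketch of this estimator, with an assumed Gaussian-mixture target p(x) and a wide Gaussian proposal q(x) (both chosen here purely for illustration):

import numpy as np

rng = np.random.default_rng(0)

def p(x):
    # Assumed target: mixture 0.3*N(-2, 1) + 0.7*N(2, 1); true mean is 0.8.
    return (0.3 * np.exp(-(x + 2)**2 / 2)
            + 0.7 * np.exp(-(x - 2)**2 / 2)) / np.sqrt(2 * np.pi)

def q_pdf(x, scale=3.0):
    # Proposal: a wide zero-mean Gaussian that covers the target's support.
    return np.exp(-x**2 / (2 * scale**2)) / (scale * np.sqrt(2 * np.pi))

N = 100000
xs = rng.normal(0.0, 3.0, size=N)        # all samples from q are accepted
w = p(xs) / q_pdf(xs)                    # importance weights p(x)/q(x)
print(np.sum(w * xs) / np.sum(w))        # weighted (self-normalized) mean ~ 0.8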

4. Gibbs Sampling

Consider an example with three variables: activity E ∈ {eating, studying, playing}; time T ∈ {morning, afternoon, evening}; weather W ∈ {sunny, windy, rainy}. Triples (E, T, W) follow some probability distribution, and we want to draw samples from it, e.g. obtaining (playing, afternoon, sunny).

The problem is that we do not know p(E, T, W), i.e. the joint distribution of the three variables. (Of course, if it were known, there would be no need for Gibbs sampling.) We do, however, know the three conditional distributions: p(E | T, W), p(T | E, W), p(W | E, T). What Gibbs sampling does is use these known conditional distributions to obtain samples from the joint distribution.
The concrete method: first randomly initialize a combination, say (studying, evening, windy); then repeatedly change one variable at a time according to its conditional probability. Concretely, given (evening, windy) we resample the variable E, e.g. studying → eating. Next, in the same way, we resample the following variable: given (eating, windy), the evening may turn into morning; likewise windy may turn into sunny (a variable may also keep its value). Thus (studying, evening, windy) → (eating, evening, windy) → (eating, morning, windy) → ... Continuing in this way yields a sequence of triples, i.e. a Markov chain. We then skip a number of initial samples (say the first 100) and keep only every so-many-th sample thereafter (say one in every 20). The samples kept this way approximately follow the joint distribution. A toy sketch follows the figure below.

[Figure: Gibbs sampling]
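A toy sketch of this procedure in code. The joint table below is a random, illustrative assumption used only to build self-consistent conditionals; the sampler itself touches nothing but the conditional distributions:

import numpy as np

E_VALS = ['eat', 'study', 'play']
T_VALS = ['morning', 'afternoon', 'evening']
W_VALS = ['sunny', 'windy', 'rainy']

rng = np.random.default_rng(0)
joint = rng.random((3, 3, 3))   # assumed joint p(E, T, W); unknown in practice
joint /= joint.sum()

def sample_conditional(dim, state):
    # p(X_dim | all other variables): slice the joint, renormalize, sample.
    idx = list(state)
    idx[dim] = slice(None)
    probs = joint[tuple(idx)]
    return rng.choice(3, p=probs / probs.sum())

def gibbs_discrete(steps=20000, burn_in=100, thin=20):
    state = [0, 0, 0]           # arbitrary initial combination
    kept = []
    for i in range(steps):
        for dim in range(3):    # resample each variable given the others
            state[dim] = sample_conditional(dim, state)
        if i >= burn_in and i % thin == 0:
            kept.append(tuple(state))
    return kept

samples = gibbs_discrete()
# Empirical frequency of (play, afternoon, sunny) vs its true joint probability:
target = (E_VALS.index('play'), T_VALS.index('afternoon'), W_VALS.index('sunny'))
print(sum(s == target for s in samples) / len(samples), joint[target])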

5. Reservoir Sampling

Reservoir sampling (Reservoir Sampling) draws k items from n items with equal probability in O(n) time, for example drawing 100 items with equal probability from 1000. Moreover, the algorithm still samples with equal probability when the data set is extremely large or keeps growing (i.e. the total size of the data set is unknown in advance).

Algorithm steps:

  • First, keep the first k elements of the data stream in a set A;
  • For each element from position j = k + 1 to n, decide with probability p = k / j whether to keep the j-th element. If it is selected, replace a uniformly random element of A with it; otherwise, discard the element;
  • Repeat step 2 until the stream ends; the k elements left in A at the end are a uniform random sample (see the sketch below).
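A minimal sketch of these steps (the function name is ours):

import random

def reservoir_sample(stream, k):
    # Keep the first k items, then replace existing items with
    # probability k/j for the j-th arriving item.
    reservoir = []
    for j, item in enumerate(stream, start=1):
        if j <= k:
            reservoir.append(item)
        else:
            r = random.randint(1, j)     # uniform in [1, j]
            if r <= k:                   # happens with probability k/j
                reservoir[r - 1] = item  # replace a uniformly random slot
    return reservoir

print(reservoir_sample(range(1000), 100))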

6. MCMC Algorithms

Related reading:

  • Random sampling and stochastic simulation: Gibbs sampling
  • MCMC algorithm learning summary
  • [Recommended] Sampling methods (2): introduction to MCMC algorithms and code implementation

Markov chain convergence theorem

Markov chain convergence theorem: if an aperiodic Markov chain has a transition probability matrix P, and any two of its states communicate, then \(\lim_{n \to \infty} P_{ij}^n\) exists and is independent of i. Denoting \(\lim_{n \to \infty} P_{ij}^n = \pi(j)\), we have:

\[ \lim_{n \to \infty} P^n = \begin{pmatrix} \pi(1) & \pi(2) & \cdots & \pi(j) & \cdots \\ \pi(1) & \pi(2) & \cdots & \pi(j) & \cdots \\ \vdots & \vdots & & \vdots & \end{pmatrix}, \qquad \pi(j) = \sum_{i=0}^{\infty} \pi(i) P_{ij} \]

where \(\pi = [\pi(1), \pi(2), \ldots, \pi(j), \ldots]\) and \(\sum_{i=0}^{\infty} \pi(i) = 1\); \(\pi\) is called the stationary distribution of the Markov chain.

All MCMC (Markov Chain Monte Carlo) methods rely on this theorem as their theoretical basis.
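A quick numeric check of the theorem, using a small assumed transition matrix (aperiodic, with all states communicating):

import numpy as np

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

Pn = np.linalg.matrix_power(P, 50)
print(Pn)        # all rows converge to the same stationary distribution pi

pi = Pn[0]
print(pi @ P)    # pi P = pi: pi is indeed stationary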

Notes:

  1. The theorem does not require the state space of the Markov chain to be finite; it may be countably infinite;
  2. The theorem's notion of "aperiodic" is not explained here, because the vast majority of Markov chains we encounter are aperiodic;


The detailed balance condition

Detailed balance theorem: if an aperiodic Markov chain with transition matrix P and a distribution \(\pi(x)\) satisfy

\[ \pi(i) P_{ij} = \pi(j) P_{ji} \quad \text{for all } i, j, \]

then \(\pi(x)\) is the stationary distribution of the Markov chain. This follows by summing both sides over i: \(\sum_i \pi(i) P_{ij} = \pi(j) \sum_i P_{ji} = \pi(j)\).

Given a target distribution, how do we construct a corresponding transition matrix?

For a distribution \(\pi(x)\), according to the detailed balance condition, if we can construct a transition matrix P satisfying \(\pi(i) P_{ij} = \pi(j) P_{ji}\), then \(\pi(x)\) is the stationary distribution of the Markov chain; the transition matrix can therefore be constructed from this condition.

Typically, the initial transition matrix \(P\) does not satisfy the detailed balance condition; we construct a new transition matrix \(P'\) by introducing an acceptance rate, so that \(\pi(x)\) satisfies detailed balance with respect to \(P'\). In this way, any transition probability (uniform, Gaussian, ...) can be used to propose transitions between states.

If we further assume that the proposal transition probabilities between states are symmetric, the acceptance rate of the algorithm can simply be expressed using \(\pi(j) / \pi(i)\).
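For completeness (this is the standard Metropolis-Hastings construction, stated here explicitly): given a proposal probability q(i, j) of proposing state j from state i, define the acceptance rate

\[ \alpha(i, j) = \min\left(1, \frac{\pi(j)\, q(j, i)}{\pi(i)\, q(i, j)}\right). \]

Then \(\pi(i)\, q(i, j)\, \alpha(i, j) = \min(\pi(i) q(i, j),\ \pi(j) q(j, i))\), which is symmetric in i and j, so detailed balance holds and \(\pi\) is the stationary distribution. When the proposal is symmetric, \(q(i, j) = q(j, i)\), and the acceptance rate reduces to \(\min(1, \pi(j) / \pi(i))\), as stated above.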

Metropolis-Hastings sampling

Given a probability distribution p(x), we would like a convenient way to generate samples from it. Since a Markov chain converges to its stationary distribution, a very natural idea is: if we can construct a Markov chain whose transition matrix P has stationary distribution exactly p(x), then starting from any initial state x0 and transitioning along the chain, we obtain a sequence x0, x1, x2, ..., xn, xn+1, ...; if the chain has converged by step n, then xn, xn+1, ... are samples from π(x).


In the Markov chain, each state represents one sample \(x_n\), i.e. an assignment of all variables.

By reading an MCMC implementation one can see that, with equal (symmetric) transition probabilities between states, each new sample depends on the previous one. If the previous sample has a small probability \(\pi(x)\) under the target distribution, the next proposed sample is accepted with high probability (acceptance rate close to 1); conversely, if the previous sample has a large \(\pi(x)\), the next proposed sample is likely to be rejected (unless its own probability is also large). This mechanism ensures that the resulting samples follow the distribution \(\pi(x)\).

From the above analysis, if the initial state has very small probability under the distribution, then at the beginning of the run the generated samples (even samples with very small probability) are very likely to be accepted, so the samples drawn at the start of the run do not follow the original distribution \(\pi(x)\). Once the algorithm has reached samples in high-probability regions of the distribution (this is convergence!), the subsequently drawn samples essentially follow the original distribution. Of course, travelling from the initial state to the high-probability region takes some running time; this is known as the convergence process. After convergence, the MCMC algorithm guarantees that more samples are produced where the probability \(\pi(x)\) is large and fewer where \(\pi(x)\) is small.

A Markov chain must go through many state transitions before it reaches the stationary state; only then are the samples reasonably close to the real distribution. This warm-up phase is called burn-in, and is usually handled by discarding the first N samples.

Questions

  • What does MCMC convergence mean? Which parameters does the convergence process update? How do we determine when the chain has converged?

    The convergence process does not update any parameters; the idea behind convergence resembles the law of large numbers. When an MCMC sampling algorithm starts, the distribution of the initial samples may be very far from the complex target distribution \(\pi(x)\); but as the number of state transitions grows (i.e. the transition matrix P is applied repeatedly), by the convergence theorem the sample distribution gradually comes to follow the complex distribution \(\pi(x)\).

  • Is \(\pi\) the probability distribution over the states? If so, once an initial state is chosen, how should \(\pi\) be set? In MCMC, or in its proof, how is the initial \(\pi\) chosen?

    In the MCMC proof, \(\pi\) is indeed the probability distribution over states. The initial \(\pi\) in the proof only serves to show that no matter what distribution the initial samples follow, after enough transitions the resulting samples follow the complex distribution \(\pi(x)\); in the actual code implementation, this \(\pi\) never needs to be set.

7. Code

import numpy as np
import random
import matplotlib.pyplot as plt
import pandas as pd

Rejection Sampling

def f(x):
    # Target density: a triangle on [0, 1] peaking at x = 0.25 (height 2).
    if 0 <= x <= 0.25:
        return 8 * x
    elif 0.25 < x <= 1:
        return (1 - x) * 8 / 3
    else:
        return 0

def g(x):
    # Proposal density: Uniform(0, 1).
    return 1 if 0 <= x <= 1 else 0
def plot(fun):
    X = np.arange(0, 1.0, 0.01)
    Y = []
    for x in X:
        Y.append(fun(x))
    plt.plot(X, Y)
    plt.xlabel("x")
    plt.ylabel("y")
    plt.show() 
plot(f)
plot(g)

[Plots: the target density f(x) and the proposal density g(x)]

def rejection_sampling(N=10000):
    M = 3  # constant k with f(x) <= M * g(x) everywhere (max of f is 2)
    cnt = 0
    samples = {}

    while cnt < N:
        x = random.random()            # draw x from the proposal g
        acc_rate = f(x) / (M * g(x))   # acceptance probability
        u = random.random()
        if acc_rate >= u:              # accept with probability f(x) / (M * g(x))
            samples[x] = samples.get(x, 0) + 1
            cnt += 1

    return samples
s = rejection_sampling(100000)
X = list(s.keys())
plt.hist(X, bins=100, edgecolor='None')  # the accepted x's follow f(x)
plt.show()

[Plot: histogram of accepted samples, approximating f(x)]

MCMC Sampling

Metropolis-Hastings Algorithm

Reference: [Recommended] Sampling methods (2): introduction to MCMC algorithms and code implementation

PI = 3.1415926

def get_p(x):
    # The target density pi(x): a 2-D Gaussian-shaped function.
    return 1/(2*PI)*np.exp(- x[0]**2 - x[1]**2)

def get_tilde_p(x):
    # Simulates a pi whose normalizing constant Z is unknown:
    # the sampler only ever sees this unnormalized value.
    return get_p(x)
def domain_random(): # a random value in the domain [-1.9, 1.9]
    return np.random.random()*3.8-1.9
def metropolis(x):
    new_x = (domain_random(),domain_random()) # propose a new state
    # acceptance probability
    acc = min(1,get_tilde_p(new_x)/get_tilde_p(x))
    # accept or reject with a uniform random number
    u = np.random.random()
    if u<acc:
        return new_x
    return x
def testMetropolis(counts = 100,drawPath = False):
    plt.figure()
    # main loop
    x = (domain_random(),domain_random()) # initial state x0
    xs = [x] # sequence of sampled states
    for i in range(counts):
        xs.append(x)
        x = metropolis(x) # propose and accept/reject
    # optionally draw lines between consecutive states to visualize the path
    X1 = [x[0] for x in xs]
    X2 = [x[1] for x in xs]
    if drawPath:
        plt.plot(X1, X2, 'k-',linewidth=0.5)
    # plot the sampled points
    plt.scatter(X1, X2, c = 'g',marker='.')
    plt.show()
testMetropolis(5000)

[Plot: Metropolis samples from the 2-D Gaussian target]

def metropolis(x):
    new_x = domain_random() # propose a new state
    # acceptance probability
    acc = min(1,f(new_x)/f(x))
    # accept or reject with a uniform random number
    u = np.random.random()
    if u<acc:
        return new_x
    return x
def testMetropolis(counts = 100,drawPath = False):
    plt.figure()
    # main loop; start from a point with f(x) > 0, since the original
    # initialization x = domain_random() can land where f(x) = 0 and
    # cause a division by zero in the acceptance ratio
    x = 0.5
    xs = [x] # sequence of sampled states
    for i in range(counts):
        xs.append(x)
        x = metropolis(x) # propose and accept/reject
    plt.hist(xs, bins=100, edgecolor='None')
    plt.show()
testMetropolis(100000)

[Plot: histogram of Metropolis samples, matching f(x)]

Gibbs Sampling

def partialSampler(x,dim):
    xes = []
    for t in range(10): # draw 10 candidate points at random
        xes.append(domain_random())
    tilde_ps = []
    for t in range(10): # evaluate the unnormalized density at each candidate
        tmpx = np.array(x, dtype=float) # true copy (x[:] would be a view of an ndarray)
        tmpx[dim] = xes[t]
        tilde_ps.append(get_tilde_p(tmpx))
    # normalize over the 10 candidates, then pick one according to
    # these probabilities
    norm_tilde_ps = np.asarray(tilde_ps)/sum(tilde_ps)
    u = np.random.random()
    sums = 0.0
    for t in range(10):
        sums += norm_tilde_ps[t]
        if sums>=u:
            return xes[t]
    return xes[-1] # guard against floating-point rounding
def gibbs(x):
    rst = np.array(x, dtype=float) # copy, so the caller's state is not mutated
    path = [(x[0],x[1])]
    for dim in range(2): # cycle through the dimensions; a random order also works
        new_value = partialSampler(rst,dim)
        rst[dim] = new_value
        path.append([rst[0],rst[1]])
    # only the point after a full sweep counts as a sample,
    # but the whole path is recorded for plotting
    return rst,path
def testGibbs(counts = 100,drawPath = False):
    plt.figure()

    x = (domain_random(),domain_random())
    xs = [x]
    paths = [x]
    for i in range(counts):
        xs.append([x[0],x[1]])
        x,path = gibbs(x)
        paths.extend(path) # record the whole path
        
    p1 = [x[0] for x in paths]
    p2 = [x[1] for x in paths]
    xs1 = [x[0] for x in xs]
    xs2 = [x[1] for x in xs]
    if drawPath: 
        plt.plot(p1, p2, 'k-',linewidth=0.5)
    # plot the sampled points
    plt.scatter(xs1, xs2, c = 'g',marker='.')
    plt.show()
testGibbs(5000)

[Plot: Gibbs samples from the 2-D Gaussian target]
