On Scoring and Ranking Systems (3): Bayesian Updating / Bayesian Average

Tags (separated by spaces): blog statistics scoring-system to-be-completed


In a previous post we discussed using the Wilson score interval to deal with small-sample estimates. In principle, though, that approach is more of a trick: it does not fundamentally solve the problem that small-sample estimates are unreliable. It merely attaches a sample-size-dependent confidence lower bound to the estimate and uses that lower bound in place of the raw score.

To solve the unreliability of small-sample estimates in a more principled way, a natural line of thinking is this: start from an expectation based on our experience, then keep updating that expectation with the samples we collect. When the sample size is small, the samples have little influence on our expectation and the estimate stays close to the value preset by experience, which avoids the unreliability of estimating from a small sample.

Assume \(\theta\) is the quantity we want to estimate, x is the data we collect, and \(\pi(\theta)\) is the prior given by our experience. Bayes' rule is expressed as follows:

\[ \begin{align} p(\theta|x) \propto p(x|\theta) * \pi(\theta) \end{align} \]

The principle looks simple, but applying it in practice raises a few questions:

  • How to abstract the practical problem into a probability distribution \(p(x|\theta)\)
  • How to set the prior probability distribution \(\pi(\theta)\)
  • How to update the distribution with new samples and obtain parameter estimates

Let's answer these questions one by one, continuing with the upvote example from before.

Binomial Bayesian updating

  1. Abstracting the sample distribution \(p(x|\theta)\)
    In the previous post we discussed how to abstract a user's upvote/downvote behavior. Simply put, each user's upvote \(\sim Bernoulli(p)\), and users act independently of one another, so the number of upvotes an article receives from n users \(\sim Binomial(n, p) = \left(\!\begin{array}{c} n \\ k \end{array}\!\right) p^k (1-p)^{(n-k)}\). Having abstracted the sample distribution, how do we use the collected samples to update our estimate of the parameter p (the upvote rate)?

  2. Abstracting the prior distribution - the conjugate distribution \(\pi(\theta)\)
    This involves another concept: the conjugate prior distribution. The name sounds impressive and is hard to remember (I had to look up the Chinese term on the wiki...). A simple explanation: if the prior distribution and the posterior distribution belong to the same family, the prior is conjugate to the likelihood. This property is so attractive because it lets you keep updating the prior distribution with new samples: since the family stays the same, \(p(\theta|x_i) \propto p(x_i|\theta) * \pi(\theta)\) can be applied repeatedly for \(x_i, i \in (1,2,..,N)\).
    Several distributions have this property; the one that pairs with the binomial distribution is the Beta distribution. If the prior is a Beta distribution and we keep updating it with binomial sample data, the posterior remains a Beta distribution.

    Memory card ~ the Beta distribution
    Beta function: \(B(a,b) = \frac{(a-1)!(b-1)!}{(a+b-1)!}\)
    Beta density: \(f(x;a,b) = x^{(a-1)}(1-x)^{(b-1)} / B(a,b)\)
    Beta mean: \(\mu = \frac{a}{a+b}\)

  3. Updating the distribution - Bayesian updating
    Looking at the Beta density alongside the binomial distribution, it is easy to see why they pair up: the two are very similar. The binomial's x corresponds to the parameter p we want to estimate, and the Beta's two parameters a, b correspond to the counts of positive and negative samples k and n-k. In other words, the Beta is a distribution over the binomial distribution's parameter.
    Next we use the Beta's conjugacy (prior and posterior stay in the same family) to update the parameters:
    \[\begin{align} \pi(p|\alpha+k, \beta+n-k) &= P(X=k|p,n) * \pi(p|\alpha, \beta) \\ E(\hat{p}) &= \frac{\alpha+k}{\alpha+\beta+n} \leftarrow \frac{\alpha}{\alpha+\beta} \\ where &\quad \pi(\alpha, \beta) \sim Beta(\alpha, \beta) \\ &\quad x \sim Binomial(n, p) \end{align}\]
    If we expect upvotes and downvotes to split 50%/50%, then \(\alpha = \beta\). When we are not very confident in the prior (and trust user behavior more), we can give \(\alpha, \beta\) relatively small values, so the samples will quickly correct the prior probability; otherwise we give \(\alpha, \beta\) larger values. Here \(\alpha, \beta\) can be understood as a virtual sample set encoding our prior expectation. An intuitive example:
    \(\alpha=2, \beta=2, \hat{p}=0.5\); after one collected upvote sample, the updated parameters become \(\alpha=3, \beta=2, \hat{p} \to 0.6\)
    \(\alpha=10, \beta=10, \hat{p}=0.5\); after one collected upvote sample, the updated parameters become \(\alpha=11, \beta=10, \hat{p} \to 0.52\)
    The effect of different \(\alpha, \beta\) values on the distribution of the parameter p is shown more intuitively below:
    [Figure: the distribution of p under different \(\alpha, \beta\) values]
  • The larger \(\alpha, \beta\) are, the smaller the variance and the more concentrated the distribution of p
  • As \(\alpha\) increases, the estimated mean of p increases
  • As \(\beta\) increases, the estimated mean of p decreases
    Setting the math aside, the Bayesian-updated estimate of the user rating can be expressed very simply as follows, where \(\alpha\) is the preset number of upvotes, \(\beta\) is the preset number of downvotes, n is the total number of samples collected, and k is the number of upvotes among them.
    \[\hat{p} = \frac{\alpha + k}{\alpha + \beta + n}\]
    How \(\alpha, \beta\) are set determines both where the score starts before any updates and how sensitive the final score is to new samples.
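The update rule above takes only a few lines of code. A minimal Python sketch, where the function name and the prior/vote counts are illustrative examples (the post itself only gives the formula):

```python
# Sketch of the Beta-Binomial update described above.
# The prior counts alpha, beta act as a virtual sample set;
# the vote numbers below are illustrative examples.

def bayesian_upvote_rate(alpha, beta, k, n):
    """Posterior mean of the upvote rate p after observing k upvotes
    out of n votes, given a Beta(alpha, beta) prior."""
    return (alpha + k) / (alpha + beta + n)

# Weak 50/50 prior: one new upvote moves the estimate a lot.
print(bayesian_upvote_rate(2, 2, k=1, n=1))              # 0.6
# Strong 50/50 prior: the same single upvote barely moves it.
print(round(bayesian_upvote_rate(10, 10, k=1, n=1), 2))  # 0.52
```

With no samples at all (k = n = 0) the score is simply the prior mean \(\alpha/(\alpha+\beta)\), which is exactly the "small samples stay near the preset value" behavior we wanted.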

Multinomial Bayesian updating

Above we used a very simple abstraction of user behavior, with only two actions: upvote and downvote. Reality is often more complex, for example user star ratings (1-5 stars). How should we apply Bayesian estimation to such scores?

Let's work through it following the same steps as in the binomial case above.

  1. Abstracting the sample distribution \(p(x|\theta)\)
    Suppose users rate from 1 to 5 stars. A user who finally gives 3 stars can be abstracted as one draw from a multinomial distribution with five outcomes, the user's choice represented by the vector \((0,0,1,0,0)\). The multinomial distribution is as follows:
    \[\begin{align} P(x|\theta) &= \begin{pmatrix} N \\ x_1, ..., x_5 \\ \end{pmatrix} \prod_{i=1}^5 \theta_i^{x_i} \\ score &= \sum_{i=1}^5 \theta_i * i \end{align}\]
    where N is the number of users, \(x_i\) is the number of users who gave i stars, satisfying \(\sum_{i=1}^5 x_i = N\), and \(\theta_i\) is the probability of giving i stars, satisfying \(\sum_{i=1}^5 \theta_i = 1\).
    From the ratings collected from users we estimate the probability of each score from 1 to 5, and use the expectation of the multinomial distribution as the final score.

  2. Conjugate prior
    As with the binomial distribution, we now need the conjugate prior of the multinomial distribution - the Dirichlet distribution, the multivariate extension of the Beta distribution. The Dirichlet density is as follows:
    \[Dir(\theta|\alpha) = \frac{\Gamma(\alpha_0)}{\Gamma(\alpha_1)...\Gamma(\alpha_K)} \prod_{k=1}^K \theta_k^{\alpha_k - 1}\]
    where \(\alpha_0 = \sum_k^K \alpha_k\); when K = 2 the Dirichlet reduces to the Beta distribution. As with the Beta, each \(\alpha_i\) can be interpreted as the prior expected count of rating i.

  3. Bayesian updating
    Having determined the prior and posterior distributions, we update the parameters with collected samples using the same method as with the Beta.
    \[\begin{align} Dir(\theta|D) &\propto P(D|\theta) * Dir(\theta|\alpha) \\ Dir(\theta|D) &= \frac{\Gamma(\alpha_0 + N)}{\Gamma(\alpha_1 + m_1)...\Gamma(\alpha_K + m_K)} \prod_{k=1}^K \theta_k^{\alpha_k + m_k - 1} \\ \end{align}\]
    The conditional probability above can be understood simply: when, among the N collected samples, \(m_i\) users gave rating i, the probability of rating i is updated from the preset prior probability \(\frac{\alpha_i}{\sum_i^K \alpha_i}\) to the posterior probability \(\frac{\alpha_i + m_i}{\sum_i^K \alpha_i + N}\), and we obtain the final score as
    \[\frac{\sum_{i=1}^K i*(\alpha_i + m_i)}{\sum_i^K \alpha_i + N}\]

  4. Bayesian average
    A very common scoring method for small samples is the Bayesian average; the rating formulas of many movie sites derive from it. Its expression is:
    \[x = \frac{C*m + \sum_{i=1}^N x_i}{C + N}\]
    where C is a preset sample size (adding some prior samples to the small sample), N is the number of collected samples, m is the prior overall average score, and \(x_i\) is each user's rating. The Bayesian average can be understood simply as smoothing the sample average toward the prior estimate of the overall score.
    In fact, if we rewrite the Dirichlet-based score formula above by substituting \(\sum_{i=1}^K i*\alpha_i = C*m\) and \(\sum_{i=1}^K i*m_i = \sum_{i=1}^N x_i\), we find that the two calculations are exactly the same!!
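The equivalence claimed above is easy to check numerically. A small Python sketch, where the prior counts and ratings are made-up illustrative data, not numbers from the post:

```python
# Check that the Dirichlet posterior score equals the Bayesian average
# when C = sum(alpha) and m is the prior mean score.
# All counts and ratings here are made-up illustrative data.

def dirichlet_score(alpha, m_counts):
    """Expected star rating under the Dirichlet posterior.
    alpha[i], m_counts[i]: prior and observed counts for rating i+1."""
    total = sum(alpha) + sum(m_counts)
    return sum((i + 1) * (a + m)
               for i, (a, m) in enumerate(zip(alpha, m_counts))) / total

def bayesian_average(C, m, ratings):
    """Classic Bayesian average: C virtual samples at prior mean m."""
    return (C * m + sum(ratings)) / (C + len(ratings))

alpha = [2, 2, 2, 2, 2]       # prior: 2 virtual votes per star level
m_counts = [0, 1, 3, 4, 2]    # observed votes for stars 1..5 (N = 10)
ratings = [2, 3, 3, 3, 4, 4, 4, 4, 5, 5]  # the same votes, one per user

C = sum(alpha)                                         # 10 virtual samples
m = sum((i + 1) * a for i, a in enumerate(alpha)) / C  # prior mean = 3.0
print(dirichlet_score(alpha, m_counts))  # 3.35
print(bayesian_average(C, m, ratings))   # 3.35 - identical
```

Both paths give the same score, which is just the substitution in the paragraph above carried out on concrete numbers.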



So far we have discussed time decay for scores, as well as two methods for dealing with unreliable small-sample estimates. But this is only a small part of a scoring system; there is also the very interesting question of how to adjust the final score based on user preferences. We'll chat about that when we get the chance.


To be continued


