Laplace distribution operator development experience sharing

Summary: The Laplace API is used for probability statistics and random sampling from the Laplace distribution.

This article is shared from the Huawei Cloud Community post "Laplace Distribution Operator Development Experience Sharing", by Li Changan.

1. Task analysis

A detailed description:

The Laplace API is used for probability statistics and random sampling from the Laplace distribution. The goal of this task is to extend the existing probability distribution scheme in the Paddle framework with a new Laplace API, called as paddle.distribution.Laplace. The class signature and the signature of each method should be designed by studying Paddle and common industry implementations, and the code style and design ideas must be consistent with the existing probability distributions.

A lot is said there, but it boils down to one thing: implement the Laplace distribution operator. First, we need to know what the Laplace distribution is. In probability theory and statistics, the Laplace distribution is a continuous probability distribution. Because it can be seen as two exponential distributions at different positions joined back to back, it is also called the double exponential distribution. Whereas the normal distribution is expressed in terms of the squared difference from the mean μ, the Laplace probability density is expressed in terms of the absolute difference. As the code below shows, the shape of the Laplace distribution is somewhat similar to that of the normal distribution, so its formula is also similar.

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

def laplace_function(x, lambda_):
    return (1 / (2 * lambda_)) * np.e ** (-1 * (np.abs(x) / lambda_))

x = np.linspace(-5, 5, 10000)
y1 = [laplace_function(x_, 1) for x_ in x]
y2 = [laplace_function(x_, 2) for x_ in x]
y3 = [laplace_function(x_, 0.5) for x_ in x]

plt.plot(x, y1, color='r', label="lambda:1")
plt.plot(x, y2, color='g', label="lambda:2")
plt.plot(x, y3, color='b', label="lambda:0.5")
plt.title("Laplace distribution")
plt.legend()
plt.show()

2. Design document writing

Design documents are the embodiment of our API design ideas and a necessary part of the development work. From the task introduction above, we know that this API is mainly the Laplace distribution itself plus a set of corresponding methods. First of all, we need to understand the mathematical principles of the Laplace distribution; Wikipedia is a good place to look them up. In addition, we can refer to the implementations in NumPy, SciPy, PyTorch, and TensorFlow when writing the design document.

First of all, we should know the probability density function, the cumulative distribution function, and the inverse cumulative distribution function of the Laplace distribution, and develop the code according to these formulas:
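For a Laplace distribution with location parameter μ and scale parameter σ:

  • Probability density function: pdf(x; \mu, \sigma) = \frac{1}{2\sigma} e^{-\frac{|x - \mu|}{\sigma}}
  • Cumulative distribution function: cdf(x) = 0.5 - 0.5 \cdot sign(x - \mu) \cdot (e^{-\frac{|x - \mu|}{\sigma}} - 1)
  • Inverse cumulative distribution function: icdf(p) = \mu - \sigma \cdot sign(p - 0.5) \cdot \ln(1 - 2 \cdot |p - 0.5|)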

Referring to the implementations in NumPy, SciPy, PyTorch, and TensorFlow, we can easily implement the code corresponding to these formulas. The implementation scheme is described in Section 2.1 below.

2.1 API Implementation Scheme

This API is implemented as paddle.distribution.Laplace.

The implementation is based on the paddle.distribution.Distribution base class.

The specific implementation inside the class is as follows (some methods were already developed, so their source code is reused directly). This API has two parameters: the location parameter self.loc and the scale parameter self.scale. It contains the following methods (a small SciPy sanity check of these formulas is given after the list):

  • mean calculates the mean:
 self.loc
  • stddev calculates the standard deviation:
 (2 ** 0.5) * self.scale;
  • variance calculates the variance:
 self.stddev.pow(2)
  • sample random sampling (following PyTorch, it reuses the reparameterized sampling result):
 self.rsample(shape)
  • rsample reparameterized sampling:
 self.loc - self.scale * u.sign() * paddle.log1p(-u.abs())

where u = paddle.uniform(shape=shape, min=eps - 1, max=1), and eps is determined according to the dtype;

  • prob probability density (including parameter value):
 self.log_prob(value).exp()

prob directly inherits the parent class implementation.

  • log_prob logarithmic probability density (value):
 -paddle.log(2 * self.scale) - paddle.abs(value - self.loc) / self.scale
  • entropy entropy calculation:
 1 + paddle.log(2 * self.scale)
  • cdf cumulative distribution function (value):
 0.5 - 0.5 * (value - self.loc).sign() * paddle.expm1(-(value - self.loc).abs() / self.scale)
  • icdf inverse cumulative distribution function (value):
 self.loc - self.scale * (value - 0.5).sign() * paddle.log1p(-2 * (value - 0.5).abs())
  • kl_divergence kl divergence between two Laplace distributions (other – an instance of the Laplace class):
 (self.scale * paddle.exp(paddle.abs(self.loc - other.loc) / self.scale) + paddle.abs(self.loc - other.loc)) / other.scale + paddle.log(other.scale / self.scale) - 1
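Before writing any Paddle code, the formulas above can be sanity-checked against SciPy. The following snippet is an illustration only (it is not part of the API); scipy.stats.laplace is used purely as a reference implementation.

# Quick NumPy/SciPy sanity check of the formulas above (illustration only).
import numpy as np
from scipy import stats

loc, scale, value, p = 0.3, 2.0, 1.5, 0.8
ref = stats.laplace(loc=loc, scale=scale)

# stddev = sqrt(2) * scale
assert np.isclose((2 ** 0.5) * scale, ref.std())
# log_prob(value) = -log(2 * scale) - |value - loc| / scale
assert np.isclose(-np.log(2 * scale) - abs(value - loc) / scale, ref.logpdf(value))
# entropy = 1 + log(2 * scale)
assert np.isclose(1 + np.log(2 * scale), ref.entropy())
# cdf(value) = 0.5 - 0.5 * sign(value - loc) * expm1(-|value - loc| / scale)
assert np.isclose(0.5 - 0.5 * np.sign(value - loc) * np.expm1(-abs(value - loc) / scale),
                  ref.cdf(value))
# icdf(p) = loc - scale * sign(p - 0.5) * log1p(-2 * |p - 0.5|)
assert np.isclose(loc - scale * np.sign(p - 0.5) * np.log1p(-2 * abs(p - 0.5)), ref.ppf(p))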

References: https://openaccess.thecvf.com/content/CVPR2021/supplemental/Meyer_An_Alternative_Probabilistic_CVPR_2021_supplemental.pdf

At the same time, register the _kl_laplace_laplace function in paddle/distribution/kl.py. You can then directly call kl_divergence to calculate the KL divergence between two Laplace distributions, as sketched below.
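For illustration only, the registration typically follows the decorator pattern already used by the existing distributions in paddle/distribution/kl.py; a minimal sketch (the exact body in the merged code may differ) looks like this:

# Hypothetical sketch of the registration, mirroring the register_kl decorator
# pattern in paddle/distribution/kl.py (inside kl.py itself no import is needed).
from paddle.distribution import register_kl
from paddle.distribution import Laplace

@register_kl(Laplace, Laplace)
def _kl_laplace_laplace(p, q):
    # Delegate to the formula implemented on the Laplace class.
    return p.kl_divergence(q)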

2.2 Considerations for testing and acceptance

After we have developed the corresponding code, how do we prove that it is correct? This is where unit test code comes in. So what is a unit test? A unit test case is essentially a pair of "input data" and "expected output". You provide the input and derive the expected output from the required logical behavior, that is, from what the requirements document says the function should do, not from what the implemented code happens to return. This is the most easily overlooked point: if you derive the expected output from the implementation itself and the implementation's logic is wrong, the expected output is wrong too, and the unit test is meaningless. In fact, this part is arguably the most important and most difficult part of the whole job: figuring out what the expected output is and how to obtain it independently of the implementation. Only when the unit tests pass is the development task essentially complete.
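As a toy illustration of this idea (this is not the actual Paddle test code, and the names are hypothetical), the expected value below comes from the closed-form entropy formula computed with NumPy, never from the class under test:

# Toy example: expected output derived from the requirement (closed-form formula),
# not from the implementation being tested.
import unittest
import numpy as np
import paddle

class TestLaplaceEntropy(unittest.TestCase):
    def test_entropy(self):
        scale = 2.0
        m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([scale]))
        expected = 1 + np.log(2 * scale)  # derived from the formula, not the code
        np.testing.assert_allclose(m.entropy().numpy(), expected, rtol=1e-6)

if __name__ == '__main__':
    unittest.main()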

According to the different methods and characteristics of the API class, the unit tests are divided into three parts: testing the characteristics of the distribution (no extra parameters required), testing the probability density functions of the distribution (a value must be passed in), and testing the KL divergence (an instance must be passed in).

1. Test the characteristics of the Laplace distribution

  • Test method: This part mainly tests the distribution's mean, variance, entropy and other characteristics. The class TestLaplace inherits unittest.TestCase and implements the methods setUp (initialization), test_mean (mean single test), test_variance (variance single test), test_stddev (stddev single test), test_entropy (entropy single test), test_sample (sample single test).
    • The mean, variance, and standard deviation are computed with NumPy and compared with the return values of the corresponding properties of the Laplace class; if they agree, they are correct;
    • In addition to verifying that the returned data type and data shape are valid, the sampling method also needs to show that the samples follow the Laplace distribution. The verification strategy is as follows: randomly draw 30,000 samples from the Laplace distribution, compute the mean and variance of the samples, and compare them with the mean and variance returned by scipy.stats.laplace for the same distribution, checking that they are within a reasonable error range; a Kolmogorov-Smirnov test further verifies that the samples follow the Laplace distribution: if the computed KS statistic is less than 0.02, the hypothesis that the samples come from a different distribution is rejected and the two are considered to belong to the same distribution (see the sketch after this list);
    • The entropy calculation is verified by checking that the value of scipy.stats.laplace.entropy is consistent with the return value of the class method.
  • Test cases: the unit test needs to cover both the one-dimensional and the multi-dimensional Laplace distribution, so two sets of initialization parameters are used:
    • ‘one-dim’: loc=parameterize.xrand((2, )), scale=parameterize.xrand((2, ));
    • ‘multi-dim’: loc=parameterize.xrand((5, 5)), scale=parameterize.xrand((5, 5)).
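A minimal sketch of the sampling check described above (class names, sample count, and tolerances here are illustrative only, not the actual test code):

# Illustrative sketch of the sampling check; thresholds are examples only.
import numpy as np
import scipy.stats
import paddle

loc, scale = 1.0, 2.0
m = paddle.distribution.Laplace(paddle.to_tensor([loc]), paddle.to_tensor([scale]))
samples = m.sample((30000,)).numpy().flatten()

ref = scipy.stats.laplace(loc=loc, scale=scale)
# Moment check: sample mean/variance close to the analytic values.
np.testing.assert_allclose(samples.mean(), ref.mean(), atol=0.1)
np.testing.assert_allclose(samples.var(), ref.var(), rtol=0.1)
# Kolmogorov-Smirnov check: a small KS statistic means the samples are
# consistent with the reference Laplace distribution.
ks, _ = scipy.stats.kstest(samples, ref.cdf)
assert ks < 0.02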

2. Test the probability density function of the Laplace distribution

  • Test method: This part mainly tests the various probability density functions of the distribution. The class TestLaplacePDF inherits unittest.TestCase and implements the methods setUp (initialization), test_prob (prob single test), test_log_prob (log_prob single test), test_cdf (cdf single test), and test_icdf (icdf single test). All of these functions are available in scipy.stats.laplace, so, given an input value, we compare the results of the SciPy and Paddle implementations of the Laplace distribution under the same parameters. If the error is within the tolerance range, the implementation is proven correct.
  • Test cases: without loss of generality, the tests initialize the Laplace class with multi-dimensional location and scale parameters and cover both int and float inputs (a sketch of the log_prob comparison follows this list).
    • ‘value-float’: loc=np.array([0.2, 0.3]), scale=np.array([2, 3]), value=np.array([2., 5.]);
    • ‘value-int’: loc=np.array([0.2, 0.3]), scale=np.array([2, 3]), value=np.array([2, 5]);
    • ‘value-multi-dim’: loc=np.array([0.2, 0.3]), scale=np.array([2, 3]), value=np.array([[4., 6], [8, 2]]).
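For example, the log_prob comparison can be sketched as follows (illustrative only; scipy.stats.laplace serves as the reference implementation):

# Illustrative log_prob check against scipy.stats.laplace.
import numpy as np
import scipy.stats
import paddle

loc, scale = np.array([0.2, 0.3]), np.array([2., 3.])
value = np.array([2., 5.])
m = paddle.distribution.Laplace(paddle.to_tensor(loc), paddle.to_tensor(scale))
np.testing.assert_allclose(
    m.log_prob(paddle.to_tensor(value)).numpy(),
    scipy.stats.laplace(loc=loc, scale=scale).logpdf(value),
    rtol=1e-5)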

3. Test the KL divergence between Laplace distributions

  • Test method: This part tests the KL divergence between two Laplace distributions. The class TestLaplaceAndLaplaceKL inherits unittest.TestCase and implements setUp (initialization) and test_kl_divergence (kl_divergence single test). In SciPy, scipy.stats.entropy can be used to calculate the divergence between two distributions. Therefore, we compare the result of paddle.distribution.kl_divergence for two Laplace distributions with the divergence computed from scipy.stats.laplace; if the result is within the error range, the method is proven correct (a sketch follows this list).
  • Test case: distribution 1: loc=np.array([0.0]), scale=np.array([1.0]); distribution 2: loc=np.array([1.0]), scale=np.array([0.5]).
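A sketch of this check, assuming the _kl_laplace_laplace registration above is in place (the grid and tolerance are illustrative; scipy.stats.entropy computes the discrete KL divergence of the normalized density values):

# Illustrative KL check: discretize both densities on a grid and compare
# scipy.stats.entropy with paddle.distribution.kl_divergence.
import numpy as np
import scipy.stats
import paddle

p = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
q = paddle.distribution.Laplace(paddle.to_tensor([1.0]), paddle.to_tensor([0.5]))

x = np.linspace(-20, 20, 200001)
pdf_p = scipy.stats.laplace(loc=0.0, scale=1.0).pdf(x)
pdf_q = scipy.stats.laplace(loc=1.0, scale=0.5).pdf(x)
kl_ref = scipy.stats.entropy(pdf_p, pdf_q)  # KL(p || q) approximated on the grid

kl_paddle = paddle.distribution.kl_divergence(p, q).numpy()
np.testing.assert_allclose(kl_paddle, kl_ref, atol=1e-2)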

3. Code development

The code development mainly refers to PyTorch. It involves the distribution class itself, the unit test code, the KL divergence registration, and so on; it is necessary to read carefully how the other distributions are implemented in PaddlePaddle.

import numbers

import numpy as np
import paddle
from paddle.distribution import distribution
from paddle.fluid import framework as framework


class Laplace(distribution.Distribution):
    r"""
    Creates a Laplace distribution parameterized by :attr:`loc` and :attr:`scale`.

    Mathematical details

    The probability density function (pdf) is

    .. math::
        pdf(x; \mu, \sigma) = \frac{1}{2 * \sigma} * e^{\frac{-|x - \mu|}{\sigma}}

    In the above equation:

    * :math:`loc = \mu`: is the location parameter.
    * :math:`scale = \sigma`: is the scale parameter.

    Args:
        loc (scalar|Tensor): The mean of the distribution.
        scale (scalar|Tensor): The scale of the distribution.
        name (str, optional): Name for the operation (optional, default is None). For more information, please refer to :ref:`api_guide_Name`.

    Examples:
        .. code-block:: python

            import paddle

            m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
            m.sample()  # Laplace distributed with loc=0, scale=1
            # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
            #        [3.68546247])
    """

    def __init__(self, loc, scale):
        if not isinstance(loc, (numbers.Real, framework.Variable)):
            raise TypeError(
                f"Expected type of loc is Real|Variable, but got {type(loc)}")

        if not isinstance(scale, (numbers.Real, framework.Variable)):
            raise TypeError(
                f"Expected type of scale is Real|Variable, but got {type(scale)}"
            )

        if isinstance(loc, numbers.Real):
            loc = paddle.full(shape=(), fill_value=loc)

        if isinstance(scale, numbers.Real):
            scale = paddle.full(shape=(), fill_value=scale)

        if (len(scale.shape) > 0 or len(loc.shape) > 0) and (loc.dtype
                                                             == scale.dtype):
            self.loc, self.scale = paddle.broadcast_tensors([loc, scale])
        else:
            self.loc, self.scale = loc, scale

        super(Laplace, self).__init__(self.loc.shape)

    @property
    def mean(self):
        """Mean of distribution.

        Returns:
            Tensor: The mean value.
        """
        return self.loc

    @property
    def stddev(self):
        r"""Standard deviation.

        The stddev is

        .. math::
            stddev = \sqrt{2} * \sigma

        In the above equation:

        * :math:`scale = \sigma`: is the scale parameter.

        Returns:
            Tensor: The std value.
        """
        return (2**0.5) * self.scale

    @property
    def variance(self):
        r"""Variance of distribution.

        The variance is

        .. math::
            variance = 2 * \sigma^2

        In the above equation:

        * :math:`scale = \sigma`: is the scale parameter.

        Returns:
            Tensor: The variance value.
        """
        return self.stddev.pow(2)

    def _validate_value(self, value):
        """Argument dimension check for distribution methods such as `log_prob`,
        `cdf` and `icdf`.

        Args:
            value (Tensor|Scalar): The input value, which can be a scalar or a tensor.

        Returns:
            loc, scale, value: The broadcasted loc, scale and value, with the same dimension and data type.
        """
        if isinstance(value, numbers.Real):
            value = paddle.full(shape=(), fill_value=value)

        if value.dtype != self.scale.dtype:
            value = paddle.cast(value, self.scale.dtype)

        if len(self.scale.shape) > 0 or len(self.loc.shape) > 0 or len(
                value.shape) > 0:
            loc, scale, value = paddle.broadcast_tensors(
                [self.loc, self.scale, value])
        else:
            loc, scale = self.loc, self.scale

        return loc, scale, value

    def log_prob(self, value):
        r"""Log probability density/mass function.

        The log_prob is

        .. math::
            log\_prob(value) = -\log(2 * \sigma) - \frac{|value - \mu|}{\sigma}

        In the above equation:

        * :math:`loc = \mu`: is the location parameter.
        * :math:`scale = \sigma`: is the scale parameter.

        Args:
            value (Tensor|Scalar): The input value, can be a scalar or a tensor.

        Returns:
            Tensor: The log probability, whose data type is same with value.

        Examples:
            .. code-block:: python

                import paddle

                m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                value = paddle.to_tensor([0.1])
                m.log_prob(value)
                # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [-0.79314721])
        """
        loc, scale, value = self._validate_value(value)
        log_scale = -paddle.log(2 * scale)

        return (log_scale - paddle.abs(value - loc) / scale)

    def entropy(self):
        r"""Entropy of Laplace distribution.

        The entropy is:

        .. math::
            entropy() = 1 + log(2 * \sigma)

        In the above equation:

        * :math:`scale = \sigma`: is the scale parameter.

        Returns:
            The entropy of distribution.

        Examples:
            .. code-block:: python

                import paddle

                m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                m.entropy()
                # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [1.69314718])
        """
        return 1 + paddle.log(2 * self.scale)

    def cdf(self, value):
        r"""Cumulative distribution function.

        The cdf is

        .. math::
            cdf(value) = 0.5 - 0.5 * sign(value - \mu) * (e^{\frac{-|value - \mu|}{\sigma}} - 1)

        In the above equation:

        * :math:`loc = \mu`: is the location parameter.
        * :math:`scale = \sigma`: is the scale parameter.

        Args:
            value (Tensor): The value to be evaluated.

        Returns:
            Tensor: The cumulative probability of value.

        Examples:
            .. code-block:: python

                import paddle

                m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                value = paddle.to_tensor([0.1])
                m.cdf(value)
                # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [0.54758132])
        """
        loc, scale, value = self._validate_value(value)
        iterm = (0.5 * (value - loc).sign() *
                 paddle.expm1(-(value - loc).abs() / scale))

        return 0.5 - iterm

    def icdf(self, value):
        r"""Inverse Cumulative distribution function.

        The icdf is

        .. math::
            cdf^{-1}(value) = \mu - \sigma * sign(value - 0.5) * ln(1 - 2 * |value - 0.5|)

        In the above equation:

        * :math:`loc = \mu`: is the location parameter.
        * :math:`scale = \sigma`: is the scale parameter.

        Args:
            value (Tensor): The value to be evaluated.

        Returns:
            Tensor: The cumulative probability of value.

        Examples:
            .. code-block:: python

                import paddle

                m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                value = paddle.to_tensor([0.1])
                m.icdf(value)
                # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [-1.60943794])
        """
        loc, scale, value = self._validate_value(value)
        term = value - 0.5

        return (loc - scale * (term).sign() * paddle.log1p(-2 * term.abs()))

    def sample(self, shape=()):
        """Generate samples of the specified shape.

        Args:
            shape (tuple[int]): The shape of generated samples.

        Returns:
            Tensor: A sample tensor that fits the Laplace distribution.

        Examples:
            .. code-block:: python

                import paddle

                m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                m.sample()  # Laplace distributed with loc=0, scale=1
                # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [3.68546247])
        """
        if not isinstance(shape, tuple):
            raise TypeError(
                f'Expected shape should be tuple[int], but got {type(shape)}')

        with paddle.no_grad():
            return self.rsample(shape)

    def rsample(self, shape):
        """Reparameterized sample.

        Args:
            shape (tuple[int]): The shape of generated samples.

        Returns:
            Tensor: A sample tensor that fits the Laplace distribution.

        Examples:
            .. code-block:: python

                import paddle

                m = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                m.rsample((1,))  # Laplace distributed with loc=0, scale=1
                # Tensor(shape=[1, 1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [[0.04337667]])
        """
        eps = self._get_eps()
        shape = self._extend_shape(shape) or (1, )
        uniform = paddle.uniform(shape=shape,
                                 min=float(np.nextafter(-1, 1)) + eps / 2,
                                 max=1. - eps / 2,
                                 dtype=self.loc.dtype)

        if len(self.scale.shape) == 0 and len(self.loc.shape) == 0:
            loc, scale, uniform = paddle.broadcast_tensors(
                [self.loc, self.scale, uniform])
        else:
            loc, scale = self.loc, self.scale

        return (loc - scale * uniform.sign() * paddle.log1p(-uniform.abs()))

    def _get_eps(self):
        """
        Get the eps of certain data type.

        Note:
            Since paddle.finfo is temporarily unavailable, we
            use hard-coding style to get eps value.

        Returns:
            Float: An eps value by different data types.
        """
        eps = 1.19209e-07
        if (self.loc.dtype == paddle.float64
                or self.loc.dtype == paddle.complex128):
            eps = 2.22045e-16

        return eps

    def kl_divergence(self, other):
        r"""Calculate the KL divergence KL(self || other) with two Laplace instances.

        The kl_divergence between two Laplace distributions is

        .. math::
            KL\_divergence(\mu_0, \sigma_0; \mu_1, \sigma_1) = \frac{\sigma_0 * e^{\frac{-|diff|}{\sigma_0}} + |diff|}{\sigma_1} + \ln{ratio} - 1

        .. math::
            ratio = \frac{\sigma_1}{\sigma_0}

        .. math::
            diff = \mu_1 - \mu_0

        In the above equation:

        * :math:`loc = \mu_0`: is the location parameter of self.
        * :math:`scale = \sigma_0`: is the scale parameter of self.
        * :math:`loc = \mu_1`: is the location parameter of the reference Laplace distribution.
        * :math:`scale = \sigma_1`: is the scale parameter of the reference Laplace distribution.
        * :math:`ratio`: is the ratio between the two distributions.
        * :math:`diff`: is the difference between the two distributions.

        Args:
            other (Laplace): An instance of Laplace.

        Returns:
            Tensor: The kl-divergence between two laplace distributions.

        Examples:
            .. code-block:: python

                import paddle

                m1 = paddle.distribution.Laplace(paddle.to_tensor([0.0]), paddle.to_tensor([1.0]))
                m2 = paddle.distribution.Laplace(paddle.to_tensor([1.0]), paddle.to_tensor([0.5]))
                m1.kl_divergence(m2)
                # Tensor(shape=[1], dtype=float32, place=Place(cpu), stop_gradient=True,
                #        [1.04261160])
        """
        var_ratio = other.scale / self.scale
        t = paddle.abs(self.loc - other.loc)
        term1 = ((self.scale * paddle.exp(-t / self.scale) + t) / other.scale)
        term2 = paddle.log(var_ratio)

        return term1 + term2 - 1

4. Summary

Currently, the API task has been locked for contribution. Looking back at the process, developing the API itself is actually not difficult; the main problem is how to write unit tests that prove the developed API is correct, plus some related details such as the registration of the KL divergence. In addition, we took a detour at the beginning: following the development style of Normal, we wrote the API in the 2.0 style, which cost some time, and during the final unit testing we also found some bugs in the Uniform implementation. On the whole, the time-consuming part is the unit testing.

 

Click to follow and learn about Huawei Cloud's fresh technologies for the first time~
