Tuesday, March 19, 2019

SciPy - 9 (functions that deal with several common probability distributions)

Probability distributions

Scipy has functions that deal with several common probability distributions. Currently there are 81 continuous probability distributions and 10 discrete distributions. These are defined in the scipy.stats sub-package. This package also defines several statistical functions.

Examples of continuous probability distributions are:

norm : Normal or Gaussian
chi2 : Chi-squared
t : Student’s T
uniform : Uniform
Examples of discrete probability distributions are:

binom : Binomial
poisson : Poisson
Examples of statistical functions:

mode : Modal value
moment : central moment
describe: descriptive statistics
histogram: histogram of data

There are two ways of using probability distribution functions.

1. Generate a “frozen” distribution object and then work with the methods of this object. A “frozen” distribution is one with its parameters set to specific values. For example a Gaussian with mean = 3.5 and standard deviation = 2.0.

To create a “frozen” Gaussian or Normal distribution with mean = 3.5 and standard deviation = 2.0 we can use.

n = stats.norm(loc=3.5, scale=2.0)

Then to draw a random number from this distribution we can execute:

print( n.rvs())

For a normal distribution the keyword parameter loc defines the mean and the keyword parameter scale defines the standard deviation. For other distributions these will correspond to appropriate parameters of the distribution; the parameters needed by a distribution is specified in the docstring of the distribution, which can be viewed with the Python help function. For example, to view the docstring for the Poisson distribution we can use help(stats.poisson) or scipy.info(stats.poisson).

2. Use functions in the appropriate class by always passing the parameters that define the distribution, when calling functions associated with the distribution.

For example, to draw a random number from a Gaussian or Normal distribution with mean = 3.5 and standard deviation = 2.0 we can use:

print(stats.norm.rvs(loc=3.5, scale=2.0))

Given a probability distribution function we are interested in obtaining properties of the distribution. Functions are available for calculating the following important properties:

1.Probability density function (PDF) and probability mass function (PMF)

For continuous variates the probability density function (PDF) is proportional to the probability of the variate being in a small interval about the given value. The probability of the variate being in a finite interval is the integral of the PDF over the interval.

The value of the PDF at any value of the variate can be obtained using the function pdf of the concerned distribution.

# PDF of Gaussian of mean = 0.0 and std. deviation = 1.0 at 0.
print(stats.norm.pdf(0, loc=0.0, scale=1.0))

We can also pass an array of values to this function, to get the PDF at the specified values of the variate:

print(stats.norm.pdf([-0.1, 0.0, 0.1], loc=0.0, scale=1.0))

For discrete variates the probability mass function (PMF) gives the probability of the variate having a value x.

For example, for a binomial distribution with p = 0.5 and number of trials n = 10 we can calculate the PMF using the following:

tries = range(11) # 0 to 10
print(stats.binom.pmf(tries, 10, 0.5))


The following function plots the PMF of a binomial distribution with n tries and probability of success p:

def binom_pmf(n=4, p=0.5):
    # There are n+1 possible number of "successes": 0 to n.
    x = range(n+1)
    y = stats.binom.pmf(x, n, p)
    plt.plot(x,y,"o", color="black")

    # Format x-axis and y-axis.
    plt.axis([-(max(x)-min(x))*0.05, max(x)*1.05, -0.01, max(y)*1.10])
    plt.xticks(x)
    plt.title("Binomial distribution PMF for tries = {0} & p ={1}".format(
            n,p))
    plt.xlabel("Variate")
    plt.ylabel("Probability")

    plt.draw()

The plot for the binomial distribution with n = 0 and p = 0.5 is show below:




The Cumulative density function (CDF)

CDF gives the probability that the variate has a value less than or equal to the given value. The following function plots the CDF for a Gaussian distribution of given mean and standard deviation:

def norm_cdf(mean=0.0, std=1.0):
    # 50 numbers between -3σ and 3σ
    x = sp.linspace(-3*std, 3*std, 50)
    # CDF at these values
    y = stats.norm.cdf(x, loc=mean, scale=std)

    plt.plot(x,y, color="black")
    plt.xlabel("Variate")
    plt.ylabel("Cumulative Probability")
    plt.title("CDF for Gaussian of mean = {0} & std. deviation = {1}".format(
               mean, std))
    plt.draw()

The CDF for the Gaussian of mean = 0.0 and standard deviation = 1.0 is shown below:


Percent point function (PPF) or inverse cumulative function

PPF is the inverse of the CDF. That is, PPF gives the value of the variate for which the cumulative probability has the given value. The following function plots the PPF for a Gaussian distribution:

def norm_ppf(mean=0.0, std=1.0):
    # 100 numbers between 0 and 1.0 i.e., probabilities.
    x = sp.linspace(0, 1.0, 100)
    # PPF at these values
    y = stats.norm.ppf(x, loc=mean, scale=std)

    plt.plot(x,y, color="black")
    plt.xlabel("Cumulative Probability")
    plt.ylabel("Variate")

    plt.title("PPF for Gaussian of mean = {0} & std. deviation = {1}".format(
               mean, std))
    plt.draw()

The figure below shows plot of PPF for Gaussian of mean = 0.0 and std. deviation = 1.0:

Survival function (SF)

Survival function gives the probability that the variate has a value greater than the given value; SF = 1 - CDF. A function for plotting the SF for a Gaussian with a given mean and std. deviation is shown below. An example plot for mean = 0.0 and std. deviation = 1.0 is also shown:

def norm_sf(mean=0.0, std=1.0):
    # 50 numbers between -3σ and 3σ
    x = sp.linspace(-3*std, 3*std, 50)
    # SF at these values
    y = stats.norm.sf(x, loc=mean, scale=std)

    plt.plot(x,y, color="black")
    plt.xlabel("Variate")
    plt.ylabel("Probability")
    plt.title("SF for Gaussian of mean = {0} & std. deviation = {1}".format(
               mean, std))
    plt.draw()



Inverse survival function (ISF)

ISF is the inverse of the survival function. It gives the value of the variate for which the survival function has the given value. Function to plot the ISF, and an example plot, for a Gaussian is given below:

def norm_isf(mean=0.0, std=1.0):
    # 100 numbers between 0 and 1.0
    x = sp.linspace(0, 1.0, 100)
    # PPF at these values
    y = stats.norm.isf(x, loc=mean, scale=std)

    plt.plot(x,y, color="black")
    plt.xlabel("Probability")
    plt.ylabel("Variate")

    plt.title("ISF for Gaussian of mean = {0} & std. deviation = {1}".format(
               mean, std))
    plt.draw()



Random variate

A random value or several random values can be drawn from a distribution using the rvs method of the appropriate class.

The following plot shows the histogram of events recorded in a series of 2 second intervals. The data is simulated by drawing 100 random events from a Poisson distribution with mean μ = 1.69. Also plotted is the Probability Mass Function (PMF) of the Poisson distribution with mean μ = 1.69. Poisson is a discrete distribution and the solid line is just a smooth line drawn through the points of the Poisson distribution PMF. The mean μ and the standard deviation σ = sqrt(μ) are marked. See the program below:

import scipy as sp
from scipy import stats
from matplotlib import pyplot as plt
from scipy import interpolate

def simulate_poisson():
    # Mean is 1.69
    mu = 1.69
    sigma = sp.sqrt(mu)
    mu_plus_sigma = mu + sigma

    # Draw random samples from the Poisson distribution, to simulate
    # the observed events per 2 second interval.
    counts = stats.poisson.rvs(mu, size=100)

    # Bins for the histogram: only the last bin is closed on both
    # sides. We need one more bin than the maximum value of count, so
    # that the maximum value goes in its own bin instead of getting
    # added to the previous bin.
    # [0,1), [1, 2), ..., [max(counts), max(counts)+1]
    bins = range(0, max(counts)+2)

    # Plot histogram.
    plt.hist(counts, bins=bins, align="left", histtype="step", color="black")

    # Create Poisson distribution for given mu.
    x = range(0,10)
    prob = stats.poisson.pmf(x, mu)*100

    # Plot the PMF.
    plt.plot(x, prob, "o", color="black")

    # Draw a smooth curve through the PMF.
    l = sp.linspace(0,11,100)
    s = interpolate.spline(x, prob, l)
    plt.plot(l,s,color="black")

    plt.xlabel("Number of counts per 2 seconds")
    plt.ylabel("Number of occurrences (Poisson)")

    # Interpolated probability at x = μ + σ; for marking σ in the graph.
    xx = sp.searchsorted(l,mu_plus_sigma) - 1
    v = ((s[xx+1] -  s[xx])/(l[xx+1]-l[xx])) * (mu_plus_sigma - l[xx])
    v += s[xx]

    ax = plt.gca()
    # Reset axis range and ticks.
    ax.axis([-0.5,10, 0, 40])
    ax.set_xticks(range(1,10), minor=True)
    ax.set_yticks(range(0,41,8))
    ax.set_yticks(range(4,41,8), minor=True)

    # Draw arrow and then place an opaque box with μ in it.
    ax.annotate("", xy=(mu,29), xycoords="data", xytext=(mu, 13),
                textcoords="data", arrowprops=dict(arrowstyle="->",
                                                   connectionstyle="arc3"))
    bbox_props = dict(boxstyle="round", fc="w", ec="w")
    ax.text(mu, 21, r"$\mu$", va="center", ha="center",
            size=15, bbox=bbox_props)

    # Draw arrow and then place an opaque box with σ in it.
    ax.annotate("", xy=(mu,v), xytext=(mu_plus_sigma,v),
                arrowprops=dict(arrowstyle="<->", connectionstyle="arc3"))
    bbox_props = dict(boxstyle="round", fc="w", ec="w")
    ax.text(mu+(sigma/2.0), v, r"$\sigma$", va="center", ha="center",
            size=15, bbox=bbox_props)

    # Refresh plot and save figure.
    plt.draw()
    plt.savefig("simulate_poisson.png")

simulate_poisson()


The program generates the plot as shown below:



This is the end of this post. Till we meet again keep practicing and learning Python as Python is easy to learn!











Share:

0 comments:

Post a Comment