Probability — Gaussian Distribution

Ritik Jain
Jul 7, 2022


Many random phenomena observed in practice approximately follow a Gaussian distribution.


I am writing this article as a complete guide to the Gaussian distribution, trying to cover different aspects from understanding the distribution to deriving its formulae.

The Gaussian distribution, a.k.a. the normal distribution, is a probability distribution that is symmetric about the mean, meaning that values near the mean occur more frequently than values far from it. Its graph is a bell-shaped curve.

A distribution is a collection of discrete or continuous values together with the frequency of each observation, such as the weights of individuals in a population.

Distribution Function

A probability distribution function is a mathematical representation that gives the probabilities of occurrence of different possible outcomes for an experiment. It is a mathematical description of a random phenomenon in terms of its sample space and the probabilities of events.

Gaussian Distribution function

To calculate the probability of normally distributed data, we use the Gaussian probability density function:

f(x) = (1 / (σ√(2π))) · e^(−(x − μ)² / (2σ²))

where μ and σ are the natural parameters of the probability distribution.

The probability density function is a composition of two important factors:

The scaling factor, driven by σ, reduces the probability density at the center and distributes it among the other values. The shifting factor, μ, shifts the entire graph along the x-axis.

Tweaking the scaling and shifting factors gives a family of Gaussian distributions, which we will see in a later part of the post.
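As a quick illustration of both factors, here is a minimal Python sketch of the density formula above (NumPy is assumed to be available; the function name gaussian_pdf is illustrative):

```python
import numpy as np

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    # Scaling factor 1/(sigma*sqrt(2*pi)) times the exponential of the
    # squared, shifted, and scaled distance from the mean mu.
    scale = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return scale * np.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# The density at the mean drops as sigma grows: the scaling factor at work.
print(gaussian_pdf(0.0, sigma=1.0))  # ~0.3989
print(gaussian_pdf(0.0, sigma=2.0))  # ~0.1995
```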

Cumulative Distribution Function

The probability density function describes the relative likelihood of the random variable at an individual point. Note that for a continuous random variable the probability of any single exact value is zero, i.e., P(X = xᵢ) = 0, so probabilities are obtained by integrating the PDF over an interval.

The cumulative distribution function tells us the probability that a random variable takes on a value less than or equal to x. The CDF is defined as F(x) = P(X ≤ x). For the Gaussian distribution, it is given by:

F(x) = ∫₋∞ˣ (1 / (σ√(2π))) · e^(−(t − μ)² / (2σ²)) dt

This integral has no closed-form solution, and evaluating it numerically for every query is computationally expensive. To address this challenge, the Gaussian CDF is usually expressed through the error function (erf) or looked up in precomputed standard normal tables.
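For instance, the Gaussian CDF can be evaluated with Python's standard math.erf, with no explicit integration (a minimal sketch; the name normal_cdf is illustrative):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    # P(X <= x) for X ~ N(mu, sigma^2), expressed via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

print(normal_cdf(0.0))                     # 0.5: half the mass lies below the mean
print(normal_cdf(1.0) - normal_cdf(-1.0))  # ~0.6827: the 1-sigma interval
```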

Properties of CDF

  • Every CDF is right-continuous and non-decreasing, with F(x) → 0 as x → −∞ and F(x) → 1 as x → ∞.
  • If ‘X’ is a discrete random variable taking values x₁, x₂, … with probabilities pᵢ = p(xᵢ), then the CDF of ‘X’ is a step function that is discontinuous at the points xᵢ.
  • If the CDF of a real-valued random variable ‘X’ is continuous, then ‘X’ is called a continuous random variable.

Natural Parameters and their derivation

“Natural” is simply the qualification chosen for the parameters of this particular class of exponential families, so one can accept it as a definition without seeking further reasons.

In statistics, we generally use maximum likelihood estimation (MLE) to find the minimum-variance unbiased (MVU) estimator of the natural parameters. In the case of the Gaussian distribution, the two explicit natural parameters to find are:

  1. Mean (μ)
  2. Standard deviation, a.k.a. sigma (σ)

Please follow the GitHub notebook to understand the mathematics behind the MVU estimator of the Gaussian distribution.

In the repo, we:

  • Use the maximum likelihood estimator to find the MVU estimator
  • Validate the MVU estimator using the Central Limit Theorem (CLT) and the sampling distribution
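As a rough sketch of where that derivation lands (NumPy assumed; the variable names are illustrative), the MLE of μ is the sample mean and the MLE of σ is the sample standard deviation:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=10.0, scale=3.0, size=10_000)  # synthetic data with known parameters

mu_hat = sample.mean()          # MLE of mu: the sample mean
sigma_hat = sample.std(ddof=0)  # MLE of sigma; ddof=1 gives the unbiased variance estimator

print(mu_hat, sigma_hat)  # should land close to 10 and 3
```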

Family of Gaussian Distributions

In the Gaussian distribution, we can use the natural parameters (μ and σ) to generate a whole family of Gaussian distributions.

Scale Factor (change σ)

The scale factor is responsible for distributing the probabilities across the x-axis (the random variable). The scale factor has an inverse relationship with the height of the density.

With a smaller sigma (σ), the probability density at the mean is higher and the graph is narrower. With a larger sigma (σ), the density at the mean is lower, the probability is spread across the random variable, and the graph is wider.

Scale factor — DSClassroom

Shift Factor (change μ)

The shifting factor is responsible for the position of the distribution on the x-axis. The shift factor (μ) has a direct relationship with the location of the peak.

With a lower mean value, the probability distribution shifts toward the negative side of the x-axis. With a higher mean value, it shifts toward the positive side.

Family from Shift Factor — DSClassroom

With different combinations of the shift and scale factors, we can generate an entire family of Gaussian distributions, as the sketch below illustrates.
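Here is a minimal matplotlib sketch of such a family (the parameter choices are illustrative; NumPy and matplotlib are assumed to be available):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-10, 10, 500)

# A few (mu, sigma) pairs from the family: the shift moves the peak, the scale widens it
for mu, sigma in [(0, 1), (0, 2), (3, 1), (-3, 0.5)]:
    pdf = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    plt.plot(x, pdf, label=f"mu={mu}, sigma={sigma}")

plt.legend()
plt.title("Family of Gaussian distributions")
plt.show()
```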

Empirical Rule

In statistics, the empirical rule commonly refers to the “68–95–99.7 rule” or “three-sigma rule,” a shorthand used to remember the percentage of values that lie within an interval estimate in a normal distribution.

The empirical rule is determined from the standard Gaussian distribution.

The standard Gaussian distribution is a normal distribution with a mean(μ) of zero and a standard deviation(σ) of 1. The standard normal distribution is zero-centered and the degree to which a given measurement deviates from the mean is provided by the standard deviation. It has zero skew and kurtosis of 3.

The Empirical Rule states that 99.7% of data observed following a normal distribution lies within 3 standard deviations of the mean.


Under this rule,

  • Roughly 68.3% of the data is within 1 standard deviation of the average (μ ± 1σ)
  • Roughly 95.5% of the data is within 2 standard deviations of the average (μ ± 2σ)
  • Roughly 99.7% of the data is within 3 standard deviations of the average (μ ± 3σ)
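These percentages are easy to verify by simulation; a minimal sketch with NumPy (the counts fluctuate slightly from run to run):

```python
import numpy as np

rng = np.random.default_rng(42)
z = rng.standard_normal(1_000_000)  # standard Gaussian: mu = 0, sigma = 1

for k in (1, 2, 3):
    frac = np.mean(np.abs(z) <= k)  # fraction within k standard deviations
    print(f"within {k} sigma: {frac:.4f}")  # ~0.6827, ~0.9545, ~0.9973
```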

The empirical rule is beneficial because it serves as a means of forecasting data, especially with large datasets where the characteristics of the variables are not fully known.

The empirical rule is restricted to normal distributions, where three sigmas cover essentially the entire distribution. For a wider class of distributions, only weaker guarantees can be made about how much of the data lies within a given distance of the mean. That’s where Chebyshev’s inequality comes into the picture.

Chebyshev’s inequality is more general, stating that for any distribution with finite variance, a minimum of 75% of the data must lie within 2 standard deviations of the mean and 88.89% within 3 standard deviations. Pafnuty Chebyshev stated the inequality as follows:

P(|X − μ| ≥ kσ) ≤ 1/k²
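Because the bound holds for any distribution with finite variance, it can be checked on a decidedly non-normal one; a quick sketch with NumPy (the exponential distribution is just an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.exponential(scale=1.0, size=1_000_000)  # skewed, non-Gaussian data

mu, sigma = x.mean(), x.std()
for k in (2, 3):
    frac = np.mean(np.abs(x - mu) <= k * sigma)
    print(f"within {k} sigma: {frac:.4f} (Chebyshev guarantees >= {1 - 1/k**2:.4f})")
```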

Inverse Transform sampling

Inverse transform sampling is a method to generate new data points from a given distribution. It is a simple methodology for pseudo-random number sampling.

Inverse transform sampling draws uniform samples between 0 and 1, interprets each one as a probability, and maps it through the inverse of the cumulative distribution function (the quantile function) to return a number from the target distribution.

E.g., for a car traveling 30 miles per hour, the distance required to brake to a stop is normally distributed with a mean of 50 feet and a standard deviation of 8 feet. Suppose we want to generate 100 normally distributed data points in Python.
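One way to do this with only the standard library is to push uniform draws through the inverse CDF of N(50, 8); a minimal sketch (statistics.NormalDist requires Python 3.8+):

```python
import random
import statistics

random.seed(0)
braking = statistics.NormalDist(mu=50, sigma=8)  # braking distance in feet

# Inverse transform sampling: uniform draws interpreted as probabilities,
# mapped through the inverse CDF (quantile function) of the target distribution.
# random() returns values in [0, 1); clamp away from 0 since inv_cdf needs 0 < p < 1.
u = [max(random.random(), 1e-12) for _ in range(100)]
samples = [braking.inv_cdf(p) for p in u]
```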

After generating the data, let’s look at the distribution of the generated data using a histogram.
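A matplotlib sketch of the histogram, continuing from the samples generated above:

```python
import statistics

import matplotlib.pyplot as plt

plt.hist(samples, bins=15, edgecolor="black")  # `samples` from the previous snippet
plt.xlabel("Braking distance (feet)")
plt.ylabel("Frequency")
plt.title("100 draws from N(50, 8) via inverse transform sampling")
plt.show()

print(statistics.mean(samples), statistics.stdev(samples))  # roughly 50 and 8
```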

As you can see, the distribution of the generated data has a mean of roughly 50 and a standard deviation of roughly 8.

Applications of Gaussian Distribution

  • VaR (Value at Risk) is very popular in practice as it measures the maximum possible loss with a given probability (on the left tail) during a certain period; it is often computed under the assumption that returns are normally distributed.
  • The variance of returns is just one possible risk measure, i.e., different risk measures can be employed when dealing with the returns of risky assets.


Ritik Jain

Fallen for data and understanding the problems it can resolve. Passionate about ML and MLOps.