Data Science different topic's explanation -- Part-9 -- Normal Distribution

anish-arya · 23 July 2022 15:01

This is a conversation series explaining different data science topics.
If you have not been following along, the links to each earlier part are listed below.

Part-1:

Part-2:

Part-3:

Part-4:

Part-5:

Part-6:

Part-7:

Part-8:


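from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np


def normal() -> None:
    fig, ax = plt.subplots(1, 1)

    # first four moments of the standard normal N(0, 1):
    # mean, variance, skewness and (excess) kurtosis
    mean, var, skew, kurt = norm.stats(moments='mvsk')
    print(f"mean={float(mean)}, var={float(var)}, skew={float(skew)}, kurt={float(kurt)}")

    # x values from the 1st to the 99th percentile of the distribution
    x = np.linspace(norm.ppf(0.01), norm.ppf(0.99), 100)
    # probability density function (pdf) and cumulative distribution function (cdf)
    ax.plot(x, norm.pdf(x), 'r-', lw=5, alpha=0.6, label='norm pdf')
    ax.plot(x, norm.cdf(x), 'b-', lw=5, alpha=0.6, label='norm cdf')

    # check that cdf and ppf are inverses of each other
    vals = norm.ppf([0.001, 0.5, 0.999])
    print("cdf(ppf(p)) == p:", np.allclose([0.001, 0.5, 0.999], norm.cdf(vals)))

    # generate 1000 random numbers and compare their histogram with the pdf;
    # density=True normalises the histogram (older Matplotlib used normed=True,
    # which has since been removed)
    r = norm.rvs(size=1000)
    ax.hist(r, density=True, histtype='stepfilled', alpha=0.2)
    ax.legend(loc='best', frameon=False)
    plt.show()


normal()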

anish-arya · 23 July 2022 15:14

Continuing the leftover content from the post above.
At the heart of statistics lies the normal distribution, known to millions of people as the bell-shaped curve. It is a two-parameter family of curves, each of which is the plot of a probability density function:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

It looks a little scary, but we will figure it all out soon enough. The normal density function contains two mathematical constants:

π – the ratio of a circle's circumference to its diameter, about 3.142;
e – the base of the natural logarithm, about 2.718.

anish-arya · 23 July 2022 15:14

Continuing the leftover content from the post above.

And two parameters that set the shape of a particular curve:

µ – the mathematical expectation, or mean. The curve peaks at µ, so values near the mean occur more often than values far from it.
σ² – the variance, which will be discussed further in the next posts.

And, of course, the variable x itself, at which the function value (the probability density) is calculated.

The constants, of course, never change; it is the parameters that give a particular normal distribution its final shape.
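To make the formula a little less scary, here is a minimal sketch that evaluates it directly in Python. It assumes only the standard library: `normal_pdf` is a hypothetical helper written for illustration, and `statistics.NormalDist` (which takes the mean and the standard deviation σ, not the variance) is used just to cross-check the numbers.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float = 0.0, sigma2: float = 1.0) -> float:
    """Density of N(mu, sigma2) at x, written term by term from the formula."""
    sigma = math.sqrt(sigma2)  # standard deviation = square root of the variance
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))  # uses the constant pi
    exponent = -((x - mu) ** 2) / (2.0 * sigma2)
    return coeff * math.exp(exponent)  # uses the constant e via exp()


# Same constants, different parameters -> different curves
for mu, sigma2 in [(0.0, 1.0), (0.0, 4.0), (2.0, 1.0)]:
    ref = NormalDist(mu, math.sqrt(sigma2))  # NormalDist takes (mean, standard deviation)
    for x in (-1.0, 0.0, 2.0):
        print(f"N({mu}, {sigma2}) at x={x}: formula {normal_pdf(x, mu, sigma2):.5f}, "
              f"NormalDist {ref.pdf(x):.5f}")
```

With SciPy, as in the first post, the same value should come from `norm.pdf(x, loc=mu, scale=math.sqrt(sigma2))`, since `scale` is the standard deviation rather than the variance.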

anish-arya · 23 July 2022 15:17

Continuing the content that was left out above.

So, the specific form of the normal distribution depends on two parameters, the expectation (µ) and the variance (σ²), and is written briefly as N(µ, σ²). The expectation µ determines the centre of the distribution, which is where the graph reaches its maximum height. The variance σ² characterises the range of variation, that is, the “spread” of the data.
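A rough sketch of what the two parameters do, assuming Python 3.8+ and its standard `statistics` module: the curve's maximum sits at x = µ with height 1/(σ√(2π)), and a large sample drawn from N(µ, σ²) should have a mean close to µ and a variance close to σ². The helper `describe`, the seed and the parameter values are just for illustration.

```python
import math
from statistics import NormalDist, fmean, pvariance


def describe(mu: float, sigma2: float, n: int = 100_000, seed: int = 42):
    """Return (peak height, sample mean, sample variance) for N(mu, sigma2)."""
    dist = NormalDist(mu, math.sqrt(sigma2))  # NormalDist takes sigma, not sigma2
    peak = dist.pdf(mu)  # the curve is highest at x = mu: 1 / (sigma * sqrt(2*pi))
    sample = dist.samples(n, seed=seed)  # n reproducible random draws from N(mu, sigma2)
    return peak, fmean(sample), pvariance(sample)


# Shifting mu moves the centre; a larger sigma2 widens the curve and lowers the peak
for mu, sigma2 in [(0.0, 1.0), (5.0, 1.0), (0.0, 9.0)]:
    peak, m, v = describe(mu, sigma2)
    print(f"N({mu}, {sigma2}): peak {peak:.4f} at x = {mu}, "
          f"sample mean {m:.3f}, sample variance {v:.3f}")
```

Because the area under every density curve is 1, a larger variance has to spread that area out, which is why the peak gets lower as σ² grows.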

Another interesting property of this distribution appears when we measure distances from the mean in units of the standard deviation σ (the square root of the variance):

about 68% of values are within 1 standard deviation of the mean.
about 95% of values are within 2 standard deviations of the mean.
about 99.7% of values are within 3 standard deviations of the mean.
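These percentages can be checked numerically: the share of values within k standard deviations of the mean is F(µ + kσ) − F(µ − kσ), where F is the cumulative distribution function (the `cdf` plotted in the first post). A small sketch using the standard library's `statistics.NormalDist`; `within_k_sd` is a hypothetical helper name chosen for this example.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1); the result is the same for any mu and sigma


def within_k_sd(k: float, dist: NormalDist = std_normal) -> float:
    """Probability that a value lies within k standard deviations of the mean."""
    lo = dist.mean - k * dist.stdev
    hi = dist.mean + k * dist.stdev
    return dist.cdf(hi) - dist.cdf(lo)


for k in (1, 2, 3):
    print(f"within {k} sd: {within_k_sd(k):.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```

Passing a different distribution, e.g. `within_k_sd(2, NormalDist(10, 2))`, gives the same 0.9545, which is why the rule holds for every normal distribution regardless of µ and σ².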

Please feel free to share your thoughts on this post in the comment section, and give it a like to support the series and motivate fresh content.