This is part of a conversation series explaining different data science topics.
If you have not been following along, below are the links to each part.
Data Science different topic’s explanation – Part-7 – Probability Distributions
Many engineers have never worked in statistics or data science. Yet when they build data science pipelines, or rewrite code produced by data scientists into adequate, easily maintained code, many nuances and misunderstandings arise on the engineering side. I am writing this series of posts for those Data/ML engineers and for novice data scientists. I will try to explain some basic approaches in plain English and, based on them, explain some of the basic concepts of data science.
There are many distributions, but first we need to understand probability functions, starting with discrete random variables.
Let’s consider an experiment in which the outcomes 1, 2, 3 and 4 occur with probabilities 1/10, 2/10, 3/10 and 4/10 respectively. It would be more convenient to have an equation that gives us those values directly. For this experiment the equation can be written as f(x) = x/10, where x = 1, 2, 3, 4. Note that the four probabilities add up to 1 (1/10 + 2/10 + 3/10 + 4/10 = 10/10), as they must. This equation (or function) is called a probability distribution function, although some authors also call it a probability function, a frequency function or a probability mass function. It tells us how likely the random variable X is to take each value x.
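To make the example concrete, here is a minimal sketch of f(x) = x/10 in Python. The function name `pmf` is a hypothetical helper chosen for illustration, and `fractions.Fraction` is used only so the probabilities stay exact instead of turning into floating-point approximations. The sketch also checks the property mentioned above: the probabilities over the support {1, 2, 3, 4} sum to 1.

```python
from fractions import Fraction

# Possible outcomes of the experiment (the support of the distribution).
SUPPORT = (1, 2, 3, 4)

def pmf(x: int) -> Fraction:
    """Hypothetical helper: f(x) = x/10 for x in {1, 2, 3, 4}, and 0 for any other x."""
    if x in SUPPORT:
        return Fraction(x, 10)
    return Fraction(0)

# Print the probability of each outcome (Fraction reduces 2/10 to 1/5, etc.).
for x in SUPPORT:
    print(x, pmf(x))

# A valid probability mass function must sum to 1 over its support.
total = sum(pmf(x) for x in SUPPORT)
print("total:", total)
```

Returning 0 outside the support is a design choice that lets the function be called with any integer, which mirrors the usual mathematical convention that f(x) = 0 for impossible values.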
Continuing the conversation series, here is the next topic.
Cumulative Distribution Function (CDF)
The cumulative distribution function provides a running total of the probability distribution. As the name "cumulative" suggests, it is simply the probability that a variable will take a value less than or equal to a particular value: F(x) = P(X ≤ x). In the example above, for x = 3 the CDF gives the sum of the probabilities of all values from 1 to 3, that is F(3) = 1/10 + 2/10 + 3/10 = 6/10.
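The same idea can be sketched in Python by adding up the probabilities of every outcome that does not exceed x. This is a minimal illustration for the discrete example only; the helpers `pmf` and `cdf` are hypothetical names, and `itertools.accumulate` is used simply to show the running totals for all outcomes at once.

```python
from fractions import Fraction
from itertools import accumulate

# Outcomes of the experiment from the example above.
SUPPORT = (1, 2, 3, 4)

def pmf(x: int) -> Fraction:
    """Hypothetical helper: f(x) = x/10 on {1, 2, 3, 4}, and 0 elsewhere."""
    return Fraction(x, 10) if x in SUPPORT else Fraction(0)

def cdf(x: float) -> Fraction:
    """Hypothetical helper: F(x) = P(X <= x), the sum of f(k) for every outcome k <= x."""
    return sum((pmf(k) for k in SUPPORT if k <= x), Fraction(0))

# F(3) = 1/10 + 2/10 + 3/10 = 6/10, which Fraction prints in reduced form as 3/5.
print(cdf(3))

# Running totals of the probabilities: F(1), F(2), F(3), F(4).
running = list(accumulate(pmf(k) for k in SUPPORT))
print([str(p) for p in running])
```

Because `cdf` accepts any number, it also behaves like a step function between outcomes: F(2.5) equals F(2), and F(x) is 0 below the smallest outcome and 1 at or above the largest.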