What is Hypothesis testing?
The process of using probability and statistics to set up an experimental situation and decide whether or not to reject the “status quo” hypothesis based on sample data is called hypothesis testing.
Before getting into details of how Hypothesis testing works, let us get ourselves familiar with some terminology related to Hypothesis testing.
Terminology
Null Hypothesis [Ho]: It is the “status quo” or “prior belief”. It assumes that the observation is due to a chance factor. The null hypothesis is assumed to be true unless the data provide convincing evidence against it.
Alternative Hypothesis [H1]: Contrary to the null hypothesis, the alternative hypothesis shows that observations are the result of a real effect. We reject the null hypothesis in favor of the alternative hypothesis only if there is convincing statistical evidence against Ho. The alternative hypothesis is sometimes referred to as the research hypothesis.
Example:
Suppose we wanted to determine whether a coin was fair and balanced (unbiased). The null hypothesis states that half the flips would result in Heads and half in Tails. This can be written mathematically as follows:
Ho: P(Head) = 0.5
Now suppose the coin is tossed 10 times and 7 Heads and 3 Tails are observed. The alternative hypothesis states that the coin is biased; the unequal number of Heads and Tails in our experiment is the evidence we will weigh against Ho.
H1: P(Head) ≠ 0.5
Now let’s go back and revisit our definition of Ho: “It assumes that the observation is due to a chance factor”. A chance factor is an influence that contributes randomly to each observation and is unpredictable. In a simple sense, the null hypothesis argues that the prior belief (P(Head) = 0.5 in this case) is true and that the deviations observed in the experiment are due to randomness and hence can be ignored.
The definition of H1 says: “Contrary to the null hypothesis, the alternative hypothesis shows that observations are the result of a real effect”. The alternative hypothesis therefore argues that the observations are lopsided because the coin itself is biased, not because of some chance factor.
Now that we are done with Null hypothesis and Alternate Hypothesis, let’s move to the next terms.
The two types of hypothesis tests, based on the alternative hypothesis H1, are:
Two-sided, or two-tailed, tests:
When you want to detect a difference on either side of the mean, the test is said to be two-tailed and takes the form
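H1: μ ≠ value.
The two-sided test for the above example can be given as follows:
H1: P(Head) ≠ 0.5
Graphically, a two-tailed test places the rejection region in both tails of the distribution.
One-sided, or one-tailed, tests:
When you want to detect a difference in only one direction, the test is said to be one-tailed and takes the form H1: μ > value or H1: μ < value. A one-sided test for the above example can be given as
$$ H_1: P(Head) > 0.5$$
(or)
$$ H_1: P(Head) < 0.5$$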

Note: Although the plots show a normal distribution, hypothesis tests are not restricted to normally distributed data.
When a hypothesis test is performed, we either reject the null hypothesis or fail to reject it. The possible errors that may occur are:
Type-I error:
A Type I error occurs when the researcher rejects a null hypothesis when it is true.
The probability of committing a Type I error is called the significance level.
This probability is also called alpha, and is often denoted by α.
A Type I error can be interpreted as
$$\alpha = P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$$
Type-II error:
A Type II error occurs when the researcher fails to reject a null hypothesis that is false.
The probability of committing a Type II error is called Beta, and is often denoted by β.
The probability of not committing a Type II error is called the Power of the test.
A Type II error can be interpreted as
$$\beta = P(\text{Type II error}) = P(\text{fail to reject } H_0 \mid H_0 \text{ is false})$$
Note: We would like the probability of committing either of these errors to be as small as possible. Unfortunately, for a fixed sample size, decreasing the probability of one type of error increases the probability of the other. So our main focus of interest will be the Type I error, i.e. $\alpha$.
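The trade-off between the two error types can be made concrete with a small calculation. The sketch below assumes a one-sided z-test of H0: μ = μ0 against H1: μ > μ0 with known σ; the effect size, σ and sample size are made-up illustration values, and `type_ii_error` is a hypothetical helper name.

```python
from statistics import NormalDist

def type_ii_error(alpha, effect, sigma, n):
    """Beta for a one-sided z-test of H0: mu = mu0 vs H1: mu > mu0.

    `effect` is the assumed true difference mu1 - mu0 (illustrative value).
    """
    std_normal = NormalDist()                  # mean 0, standard deviation 1
    z_crit = std_normal.inv_cdf(1 - alpha)     # rejection threshold on the z scale
    shift = effect / (sigma / n ** 0.5)        # true difference in standard errors
    return std_normal.cdf(z_crit - shift)      # P(fail to reject H0 | H0 is false)

# Illustrative settings: true difference 2, sigma 10, sample size 50.
for alpha in (0.01, 0.05, 0.15):
    beta = type_ii_error(alpha, effect=2.0, sigma=10.0, n=50)
    print(f"alpha={alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

With everything else held fixed, the printed β shrinks as α grows, which is the trade-off in the note.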
Significance level (${\alpha}$)
Can we explain graphically what the significance level means?
The significance level determines how far out from the null hypothesis value we draw the line on the graph. For a significance level of 0.05, we shade the 5% of the distribution that is furthest away from the null hypothesis value.
The shaded region is also called the critical region; if the sample mean falls into that region, we reject the null hypothesis $H_0$.
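As a rough illustration of where the critical region begins, the sketch below computes the cutoffs on the z scale for α = 0.05, assuming the test statistic follows a standard normal distribution under H0 (the variable names are illustrative).

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed test: alpha is split equally between both tails.
two_tailed_cutoff = std_normal.inv_cdf(1 - alpha / 2)
# One-tailed (upper) test: all of alpha sits in the right tail.
one_tailed_cutoff = std_normal.inv_cdf(1 - alpha)

print(f"two-tailed: reject H0 if |z| > {two_tailed_cutoff:.3f}")
print(f"one-tailed: reject H0 if z > {one_tailed_cutoff:.3f}")
```

The two-tailed cutoff is further out because each tail holds only half of α.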
What does a significance level of α = 0.05 mean? It means that if H0 is actually true and the hypothesis test is repeated on different random samples of data from the same population, then we would expect H0 to be incorrectly rejected 5% of the time.
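This repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes H0 is true (data drawn from a normal population with mean 0 and known standard deviation 1) and applies a two-tailed z-test to many random samples; the sample size, number of trials and seed are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

def false_rejection_rate(alpha=0.05, n=30, trials=20_000, seed=0):
    """Fraction of samples in which a true H0 (mu = 0) is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff
    rejections = 0
    for _ in range(trials):
        # Draw a sample from the population described by H0.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / n ** 0.5)       # (x_bar - 0) / (sigma / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = false_rejection_rate()
print(f"H0 rejected in {rate:.1%} of samples")
```

The observed rate should land close to α, with some run-to-run variation if the seed is changed.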
$pValue$
The p-value is the probability of obtaining an effect at least as extreme as the one in your sample data, assuming the null hypothesis is true.
$$p\text{-value} = P(\text{result at least as extreme as the one observed} \mid H_0 \text{ is true})$$
We fail to reject the null hypothesis $H_0$ if $p\text{-value} > \alpha$ and reject it if $p\text{-value} < \alpha$.
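To see the decision rule at work on the coin example (7 Heads in 10 tosses, H0: P(Head) = 0.5), here is a minimal sketch of an exact two-sided binomial p-value. It sums the probabilities of all outcomes that are no more likely than the observed one under H0; the function name and the choice of α = 0.05 are assumptions made for illustration.

```python
from math import comb

def binomial_two_sided_p(heads, flips, p_null=0.5):
    """Exact two-sided p-value for a binomial count under H0: P(Head) = p_null."""
    def pmf(k):
        return comb(flips, k) * p_null ** k * (1 - p_null) ** (flips - k)

    observed = pmf(heads)
    # Outcomes at least as extreme = outcomes no more probable than the observed one.
    # The small tolerance guards against floating-point ties.
    return sum(pmf(k) for k in range(flips + 1) if pmf(k) <= observed * (1 + 1e-9))

alpha = 0.05  # illustrative significance level
p_value = binomial_two_sided_p(heads=7, flips=10)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p-value = {p_value:.4f} -> {decision}")
```

Here the p-value works out to 352/1024, roughly 0.34, so 7 Heads in 10 tosses is not convincing evidence that the coin is biased.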
The Misunderstood p Value
The p value is one of the most misunderstood quantities in psychological research. Even professional researchers misinterpret it, and it is not unusual for such misinterpretations to appear in statistics textbooks! The most common misinterpretation is that the p value is the probability that the null hypothesis is true—that the sample result occurred by chance. For example, a misguided researcher might say that because the p value is .02, there is only a 2% chance that the result is due to chance and a 98% chance that it reflects a real relationship in the population. But this is incorrect. The p value is really the probability of a result at least as extreme as the sample result if the null hypothesis were true. So a p value of .02 means that if the null hypothesis were true, a sample result this extreme would occur only 2% of the time. You can avoid this misunderstanding by remembering that the p value is not the probability that any particular hypothesis is true or false. Instead, it is the probability of obtaining the sample result if the null hypothesis were true.
Credit: Understanding Null Hypothesis Testing – Research Methods in Psychology – 2nd Canadian Edition
$z-Score$
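A z-score (also known as a standard score) indicates how many standard deviations an element is from the mean: for a single element $x$, $z = \frac{x-\mu}{\sigma}$. When the quantity being tested is the mean $\overline{x}$ of a sample of size $n$, the standard deviation is replaced by the standard error $\frac{\sigma}{\sqrt{n}}$, and the z-score is calculated from the following formula. $$z =\frac{(\overline{x}-\mu)}{\frac{\sigma}{\sqrt{n}}}$$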
Here is how to interpret z-scores.
- A z-score less than 0 represents an element less than the mean.
- A z-score greater than 0 represents an element greater than the mean.
- A z-score equal to 0 represents an element equal to the mean.
- A z-score equal to 1 represents an element that is 1 standard deviation greater than the mean; a z-score equal to 2, 2 standard deviations greater than the mean; etc.
- A z-score equal to -1 represents an element that is 1 standard deviation less than the mean; a z-score equal to -2, 2 standard deviations less than the mean; etc.
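The z-score of a sample mean can be computed directly from the formula above. The numbers in this sketch (sample mean 52, population mean 50, σ = 10, n = 25) are made up for illustration, and the upper-tail probability assumes the z-score is standard normal under H0.

```python
from statistics import NormalDist

def z_score(sample_mean, mu, sigma, n):
    """z = (x_bar - mu) / (sigma / sqrt(n)) for a sample mean."""
    return (sample_mean - mu) / (sigma / n ** 0.5)

# Illustrative values, not taken from any real data set.
z = z_score(sample_mean=52.0, mu=50.0, sigma=10.0, n=25)

# Probability of a z-score at least this large if H0 is true (one-tailed).
upper_tail_p = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, upper-tail probability = {upper_tail_p:.4f}")
```

In this example the sample mean lies exactly one standard error above the hypothesised mean, so z = 1.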
Enough theory; now let’s work through hypothesis testing with an example.
A survey shows that the average Black Friday spending of men is much higher ($500) than that of women. A company that is planning its Black Friday sale wants to know whether this is true, and so takes samples of different sizes (100, 500 and 1000) from the population and records their Black Friday spending. The company wants to know whether there is really a difference in spending or whether it is just due to chance (at a significance level of 15%). Can you help the company reach a conclusion with the data provided for the different samples?
Stating Null Hypothesis and Alternate Hypothesis
Null Hypothesis Ho: The average spending of men and women is the same, i.e. $\mu_m = \mu_f$
Alternative Hypothesis H1: The average spending of men is greater than that of women, i.e. $\mu_m > \mu_f$
Choosing significance level
As stated in the problem, we use a significance level of $\alpha=0.15$.
Setting up Test Statistic
How do we decide whether or not to reject the null hypothesis H0?
a. We start by computing a test statistic from our sample data.
What is a test statistic?
a. It is a quantity computed from the sample that measures the evidence against our null hypothesis.
b. The most natural choice of test statistic for a difference in population means is the difference in sample means, $\bar{x}_m-\bar{x}_f$.
Calculating the P-value using Permutation test
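A permutation test estimates the p-value by repeatedly shuffling the group labels and recomputing the difference in sample means; under H0 the labels are interchangeable, so the shuffled differences show what chance alone produces. The sketch below is one possible implementation, not necessarily the one used for the company's data. The survey data are not shown here, so the two samples are synthetic placeholders, and `permutation_p_value` and all numeric settings are assumptions.

```python
import random
from statistics import fmean

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test for H1: mean(group_a) > mean(group_b).

    Returns the observed difference in means and the estimated p-value.
    """
    rng = random.Random(seed)
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if diff >= observed:
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_permutations

# Synthetic placeholder samples (NOT the survey data); means and spread are made up.
data_rng = random.Random(42)
male = [data_rng.gauss(9500, 5000) for _ in range(100)]
female = [data_rng.gauss(9000, 5000) for _ in range(100)]

alpha = 0.15
observed, p_value = permutation_p_value(male, female)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"observed difference = {observed:.1f}, p-value = {p_value:.3f} -> {decision}")
```

The same function can be run on the samples of size 100, 500 and 1000 to see how the conclusion changes with sample size.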
Final Note:
If we reject the null hypothesis, we do not prove the alternative hypothesis is true. We merely state there is sufficient evidence to reject the null hypothesis.
If we fail to reject the null hypothesis, we do not prove the null hypothesis is true. We merely state there is not sufficient evidence to reject the null hypothesis.
Unfortunately, whatever the decision, there is always a chance we made an error!