Last Updated on May 9, 2024 by Editorial Team

Author(s): Karan Kaul | カラン

Originally published on Towards AI.

Photo by Dan Cristian Pădureț on Unsplash

Storytime

Imagine this: you got a new personal high score of 98 on your favourite game. You feel proud of this achievement & you share the news with a friend. However, your friend isn't impressed. He implies that a score of 98 is fairly common for that game & isn’t such a big deal. You don't believe him & you decide to challenge his statement. You propose that, by using statistics, you can show how rare it is to get a score of 98.

What we just saw is a scenario where we are trying to test a claim. Your friend said that a score of 98 is fairly common for the game. This statement is the status quo: the thing we assume holds true until the evidence says otherwise. By rejecting his statement, we indirectly support our own. We can prove him wrong if we manage to show that the mean score for this game is less than 98, which means our score is higher than what most people usually score.

Okay, so what are the Null & Alternate Hypotheses?

Whatever is already assumed to be true forms the Null Hypothesis.

H0 : “Our score is less than or equal to the mean scores”

The claim that we are trying to prove forms the Alternate Hypothesis.

H1 : “Our score is greater than the mean scores”

How do we test our claim & reject his statement?

First, we need to figure out whether our score depends on other players' scores or not. It is quite apparent that it is “Independent”: other players don't affect our scoring at all. Hence we will employ an Independent samples t-test to test our claim.

What is an Independent Sample t-test?

We employ this test when:

- The population standard deviation is unknown
- Samples are independent

(When a single sample is compared against one fixed value, as it is here, this variant is usually called a one-sample t-test.)

This test gives us a statistic, a single value. This value represents how far away from the mean our sample lies.
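
As a rough sketch of how such a statistic can be computed, assuming the common one-sample form (sample mean minus a reference value, divided by the standard error of the mean): the function name `t_statistic` and the five scores below are made up for illustration and are not from the article.

```python
import math
import statistics


def t_statistic(sample, reference_value):
    """Distance between the sample mean and a reference value,
    measured in standard errors of the mean (one-sample form)."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_std = statistics.stdev(sample)      # sample std. dev (divides by n - 1)
    standard_error = sample_std / math.sqrt(n)
    return (sample_mean - reference_value) / standard_error


# Hypothetical scores from five other players (illustration only).
scores = [55, 60, 65, 70, 50]
t_value = t_statistic(scores, 98)
print(round(t_value, 3))  # negative: the sample mean sits below 98
```

A negative value here means the sample mean lies below the reference score; the further from zero it is, the further apart the two are.
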
To be specific, it tells us how many std. devs away we are from the expected average score. We can also find the p-value (probability) associated with this result from a t-table.

Note: if we say that our sample is 2 std. devs away from the mean, we haven't said whether it is 2 std. devs above the mean or below it. If we only care about being different from the mean, it doesn't matter whether we score higher or lower; the score just has to be different from the mean. In this scenario, we employ a 2-tailed independent sample t-test.

However, in our case, we want to prove that we scored higher than the average. Hence, we need to be above the mean. This calls for a 1-tailed independent sample t-test.

This is a lot of information to grasp if you are a beginner. I recommend going through this wonderful playlist on statistics for a better understanding. (This is my fav stats playlist on YT!)

But how far above should our value be from the mean, to prove our friend wrong?

Suppose the average score is 60. Can we say that a score of 70, being 1.2 std. devs above the mean, is enough? Or a score of 75, being 1.5 std. devs above the mean? …

We need a fixed value above which we can reject our friend's claim & support ours. This value, measured in std. devs, is called the Critical Value.

The critical value is tied to a chosen significance level, alpha. An alpha of 0.05 (a 95% confidence level), when paired with a one-sided t-test, signifies that 95% of scores are expected to lie below the critical value, which sits roughly 1.6 to 1.7 std. devs above the mean. Only the top 5% are expected to lie beyond it.

(Figure: Most values are expected to lie within the 95% area, which is the acceptance region)

The critical value follows from the alpha chosen; it is not random. If we instead chose alpha = 0.01 (a 99% confidence level), the associated critical value would lie further to the right, higher above the mean.

You might now be able to infer: if we want to be very specific in our test, i.e.
our value should lie significantly far from the average values, then we employ a small alpha such as 0.01, which means 99% confidence. (The critical value to beat would be higher.)

If we don't want such specificity, we can choose a larger alpha such as 0.1, which means 90% confidence. This results in a higher probability of rejecting the null, since we would reject it even when our sample is not very far from the mean. (The critical value to beat would be smaller.)

(Figure: The critical value is nearer the mean for alpha = 0.1 (90%) than for alpha = 0.01 (99%))

Suppose the test statistic that we calculate using the t-test exceeds the critical value that we got using alpha & a t-table. In that case, observing the value we got would be considered a rare event if the null were true, & this gives us enough statistical evidence of a difference between the expected & observed values. Hence, we would reject the claim “Our score is less than or equal to the mean scores”.

Hypothesis testing (Python & Scipy)

Libraries

Import the following libraries. We will also employ some basic visualizations to understand the data better.

    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns
    from scipy import stats

    sns.set()

Data/Samples

We need a sample of scores from other players. A list called sample_scores contains these scores:

    sample_scores = [1,5,6,10,15,20,25,27,31,35,40,40,41,41,41,46,46,45,46,47,
                     50,51,52,58,60,60,60,60,60,61,61,62,65,66,67,70,70,71,71,73,
                     74,75,75,75,76,78,80,81,81,82,92,83,85,86,86,88,90,98,102,113]
    print(len(sample_scores))
    # 60

Let’s get a summary of our samples:

    summary = stats.describe(sample_scores)
    print(summary)
    # DescribeResult(nobs=60, minmax=(1, 113), mean=59.266666666666666, variance=637.012429378531, skewness=-0.43382097754515087, kurtosis=-0.2625823356606394)

Save the mean and std. dev in separate variables. […]
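
The ideas above (saving the mean and std. dev, looking up a critical value for a chosen alpha, and comparing the test statistic against it) can be sketched roughly as follows. This is not the article's own code: the variable names `sample_mean`, `sample_std`, `our_score`, `critical_values` and `reject_null` are assumptions, and the one-sample, lower-tailed form of the test is assumed because we want to show that the mean score is below 98.

```python
import numpy as np
from scipy import stats

# Scores copied from the sample_scores list in the article.
sample_scores = [1, 5, 6, 10, 15, 20, 25, 27, 31, 35, 40, 40, 41, 41, 41,
                 46, 46, 45, 46, 47, 50, 51, 52, 58, 60, 60, 60, 60, 60, 61,
                 61, 62, 65, 66, 67, 70, 70, 71, 71, 73, 74, 75, 75, 75, 76,
                 78, 80, 81, 81, 82, 92, 83, 85, 86, 86, 88, 90, 98, 102, 113]
our_score = 98  # the personal high score from the story

# Save the mean and std. dev in separate variables (names are assumptions).
n = len(sample_scores)
sample_mean = np.mean(sample_scores)
sample_std = np.std(sample_scores, ddof=1)  # ddof=1 matches stats.describe's variance

# One-tailed critical values from the t-distribution with n - 1 degrees of freedom.
df = n - 1
alphas = (0.10, 0.05, 0.01)
critical_values = {alpha: stats.t.ppf(1 - alpha, df) for alpha in alphas}

# Test statistic: distance between the sample mean and our score,
# measured in standard errors of the mean.
standard_error = sample_std / np.sqrt(n)
t_stat = (sample_mean - our_score) / standard_error

# Lower-tailed decision at alpha = 0.05: reject H0 when the statistic
# falls below the negative critical value.
alpha = 0.05
reject_null = t_stat < -critical_values[alpha]

for a in alphas:
    print(f"alpha = {a:.2f} -> critical value = {critical_values[a]:.3f}")
print(f"t statistic = {t_stat:.3f}, reject H0 at alpha = {alpha}: {reject_null}")
```

A smaller alpha gives a larger critical value, so the statistic has to lie further from zero before the null is rejected, which is the trade-off described earlier.
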