Perhaps a good way of explaining hypothesis testing is to consider an experiment.

Suppose we have a large group of patients, and we are testing a drug to lower their blood pressure. We first measure the blood pressures of the group before the treatment programme.

There will be some variability in the results, with most of the patients having a result close to some mean value (it would be highly unusual if they all had the same readings!). We can model this variability as a random process, say a Gaussian (also known as Normal) distribution, which has a bell-shaped curve. Most of the readings will be close to the mean, with a few noticeably lower or higher. If we had selected a group of people with high blood pressure, we would expect the mean value to be above what the medical community considers the acceptable range.
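As a rough sketch, here is how such Gaussian variability might be simulated. The mean of 160 mmHg and standard deviation of 10 mmHg are invented purely for illustration, not clinical figures:

```python
import random
import statistics

# Simulate baseline blood-pressure readings for a hypothetical group of
# patients with high blood pressure. Each reading is drawn from a
# Gaussian (Normal) distribution centred on 160 mmHg with a standard
# deviation of 10 mmHg.
random.seed(42)  # fixed seed so the sketch is reproducible
readings = [random.gauss(160, 10) for _ in range(500)]

sample_mean = statistics.mean(readings)
sample_sd = statistics.stdev(readings)
print(f"mean = {sample_mean:.1f} mmHg, sd = {sample_sd:.1f} mmHg")
```

Most of the simulated readings cluster near the mean, with a few in the tails, just as described above.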

After the treatment, we measure the blood pressures of the same group of patients. The question we pose is: is the drug effective?

A well-used scientific methodology for answering such questions is Hypothesis Testing. In Hypothesis Testing, we start off with a Null Hypothesis (H0), which can be regarded as our default assumption. In the case of our experiment, a suitable Null Hypothesis is that the before and after blood pressure readings are about the same (give or take the variability of the readings). In other words, the drug has no significant effect.

Next we analyse the before and after data, and calculate a *test statistic*, which is a single number that in effect summarises the data. The process by which the test statistic is calculated depends on what kind of Statistical Hypothesis Test we use, which in turn is based on what assumptions we make about the data.
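To make this concrete, here is a minimal sketch of how a test statistic might be computed for our experiment, using a paired t-test (one common choice when comparing before and after readings taken on the same patients). The readings below are invented for illustration:

```python
import math
import statistics

# Hypothetical before/after systolic readings (mmHg) for ten patients;
# the numbers are invented purely to illustrate the calculation.
before = [158, 164, 171, 160, 155, 168, 162, 159, 166, 173]
after = [150, 158, 165, 157, 149, 160, 158, 155, 160, 166]

# A paired test compares each patient with themselves: we work with the
# per-patient differences, not the raw readings.
diffs = [b - a for b, a in zip(before, after)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)  # sample standard deviation

# The test statistic: the mean difference scaled by its standard error.
t_stat = mean_diff / (sd_diff / math.sqrt(n))
print(f"t = {t_stat:.2f} on {n - 1} degrees of freedom")
```

Different tests (paired t-test, unpaired t-test, Wilcoxon, and so on) summarise the data into different statistics, which is what the paragraph above means by the calculation depending on the test and its assumptions.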

Now, this test statistic itself has a known distribution, so we can calculate a *p-value* from it. This is done using either statistical tables or statistical calculators and software packages (including, I daresay, some online calculators in my blog). It is important to understand what the p-value represents. The p-value is the *probability of obtaining the observed result, or a more extreme result, by chance alone, assuming the Null Hypothesis is true*. A low p-value tells us that the result is unlikely to be due to chance alone. We choose a *significance level* in advance, say 5% (or 0.05), and if the calculated p-value is less than this, we reject the Null Hypothesis and consider the result significant. On the other hand, if the p-value exceeds 0.05, we fail to reject the Null Hypothesis (which is not the same as proving it true). An extremely low p-value would be all the more striking, suggesting that something interesting is going on in terms of the potential effectiveness of the drug.
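As a sketch of this final step, the snippet below converts a test statistic into a two-sided p-value and compares it against the 5% significance level. For simplicity it uses the standard Normal approximation (reasonable for large samples) via Python's `math.erfc`; a real analysis would use the exact distribution of the chosen test statistic. The observed statistic of 2.3 is hypothetical:

```python
import math

def two_sided_p(z: float) -> float:
    # P(|Z| >= |z|) for a standard Normal variable Z, computed from the
    # complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))

alpha = 0.05        # significance level, chosen before the experiment
z_observed = 2.3    # hypothetical test statistic
p = two_sided_p(z_observed)
print(f"p = {p:.4f}")

if p < alpha:
    print("reject H0: the result is significant at the 5% level")
else:
    print("fail to reject H0")
```

With a statistic of 2.3 the p-value comes out at roughly 0.02, below the 0.05 threshold, so in this hypothetical case we would reject the Null Hypothesis.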

The p-value tells us how likely the observed result (or a more extreme one) is to arise from chance. A low p-value is not definitive proof that the drug is effective; it suggests that something significant is going on, and that further investigation is warranted. On the other hand, if the p-value is quite high (approaching the maximum value of 1), then, provided the experiment has been carefully designed, the result is not considered significant, and the researcher may not pursue the case further.