Hypothesis testing uses the information in a random sample from a population of interest to draw inferences about a certain characteristic of that population. It involves formulating hypotheses (both null and alternative), taking a random sample, computing a test statistic from the sample data, and then using the test statistic to make a decision about the null hypothesis. If the information (expressed in the test statistic) is consistent with the null hypothesis, we will not reject the hypothesis; however, if the information is inconsistent with the null hypothesis, we will conclude that the hypothesis is false.
For many beginners in statistics, the last sentence above is confusing, and to some, annoying, because it talks only about rejection and never says when we should accept a hypothesis. Why are professors and researchers so stingy with wholehearted acceptance?
Well, this is not because of any peculiarity of those professionals, but because of the nature of scientific research. To explain, let’s look at a simple example. Suppose we have collected a random sample of 9 compressive strength measurements for a given mix of concrete. From the data we calculate the sample mean and sample standard deviation to be, say, 49.2 MPa and 6.9 MPa, respectively. Suppose also that the design spec states that the compressive strength should not be lower than, say, 40 MPa. (Note that the language used in the design spec is a little sloppy here. What does the compressive strength really refer to? A mean value, a 5th percentile, or something else? Let us assume for now that it refers to the mean value of the compressive strength being no less than 40 MPa. If it refers to some percentile, then we would actually need the concept of a tolerance interval for the discussion.) Then, we can formulate the hypotheses as
- NH: \mu = 40 MPa
- AH: \mu < 40 MPa
where the Greek \mu refers to the population mean of the compressive strength.
To proceed with the hypothesis test, we reason as follows: all we have is some sample data and the statistics computed from them. If we can establish how probable it is that such a sample would occur if the population mean were indeed 40 MPa (i.e., the NH is true), then we can make a statement about the truth of the null hypothesis.
For this example, it can be shown that, given a population mean of 40 MPa, the probability of observing a sample mean at least as high as the observed one is only about 0.2%, so the data are very hard to reconcile with a population mean of 40 MPa. Hence we reject the NH. (The details are not the focus of this blog, but the t-statistic can be used for this simple case. Using the results above, we obtain t = (49.2-40)/(6.9/sqrt(9)) = 4, with 9 - 1 = 8 degrees of freedom.)
Do we accept the AH then? Not really. For a population with a mean smaller than 40 MPa, we would expect the sample mean to be even smaller, which makes the observed 49.2 MPa even less likely. Therefore, in this specific case, we will reject both NH and AH.
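The arithmetic behind the 0.2% figure can be checked with a short script. The following is only a minimal sketch using the Python standard library: the helper names `t_pdf` and `upper_tail` are illustrative, and the tail area of the t-distribution is approximated by Simpson's rule rather than taken from a statistics package.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) by Simpson's rule on [0, |t|] plus symmetry.

    steps must be even for Simpson's rule.
    """
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # approximates P(0 <= T <= |t|)
    return 0.5 - area if t >= 0 else 0.5 + area

# Values from the concrete example in the text
n, xbar, s, mu0 = 9, 49.2, 6.9, 40.0

t_stat = (xbar - mu0) / (s / math.sqrt(n))  # (49.2 - 40) / (6.9 / 3)
p_upper = upper_tail(t_stat, n - 1)         # chance of a sample mean this high if mu = 40

print(f"t = {t_stat:.3f}, upper-tail probability = {p_upper:.4f}")
```

With 8 degrees of freedom the upper-tail probability comes out at roughly 0.002, in line with the 0.2% quoted above.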
“But what’s the point of the hypothesis testing?” you would ask immediately.
The point is: the sample testifies that the population mean is higher than 40 MPa. It is very unlikely to be 40 MPa or below.
Of course, if the AH were the two-sided version, i.e., \mu != 40 MPa, then rejecting the NH would amount to accepting the AH.
The other scenario is that of not rejecting the NH. In this case, can we accept the NH instead?
Suppose that a second sample yields a sample mean of 40 MPa exactly. Then, whatever the standard deviation is, the t statistic is 0, and the p-value is 0.5 or 50%. Therefore, there is no evidence that the NH (\mu = 40 MPa) should be rejected.
But can we accept that the population mean is 40 MPa indeed?
Not really, again! Why? Because if the population mean were 38 MPa instead, there would still be a substantial chance, as long as the population standard deviation is not too small, that the sample mean turns out as high as 40 MPa, the observed value. The same holds in the other direction when the population mean is 42 MPa.
Logically, it makes sense. But practically, what can we do when the conclusion is not to reject? Usually, we act as if the NH represents the status quo, i.e., we take no action.
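To put a rough number on this, the sketch below asks how often a sample of 9 would give a mean of 40 MPa or more if the true mean were 38 MPa, and a mean of 40 MPa or less if the true mean were 42 MPa. It assumes, purely for illustration, that the strengths are normally distributed with a known population standard deviation of 6.9 MPa (borrowed from the first sample); the variable names are hypothetical.

```python
import math
from statistics import NormalDist

n = 9
sigma = 6.9                # ASSUMPTION: population SD taken equal to the earlier sample SD
se = sigma / math.sqrt(n)  # standard error of the sample mean

observed_mean = 40.0

# If the true mean were 38 MPa: chance that the sample mean reaches 40 MPa or more
p_if_38 = 1 - NormalDist(38.0, se).cdf(observed_mean)

# If the true mean were 42 MPa: chance that the sample mean falls to 40 MPa or less
p_if_42 = NormalDist(42.0, se).cdf(observed_mean)

print(f"P(sample mean >= 40 | mu = 38) = {p_if_38:.3f}")
print(f"P(sample mean <= 40 | mu = 42) = {p_if_42:.3f}")
```

Under these assumptions both probabilities are a little under 20%, far too large to rule out either alternative, which is why a sample mean of exactly 40 MPa does not establish that the population mean is 40 MPa.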