A test statistic is the output of a scalar function of all the observations. This statistic provides a single number, such as the average or the correlation coefficient, that summarizes the characteristics of the data, in a way relevant to a particular inquiry. As such, the test statistic follows a distribution determined by the function used to define that test statistic and the distribution of the input observational data.
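As a minimal sketch of the idea, the two statistics named above (the average and the correlation coefficient) can each be written as a function that reduces a whole sample to one number. The data values and the helper name `pearson_r` are illustrative assumptions, not part of the original text.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation coefficient: one scalar computed from all paired observations."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical observations, chosen only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

print(mean(xs))           # the average of xs
print(pearson_r(xs, ys))  # the correlation between xs and ys
```

Either function takes every observation as input and returns a single summary value, which is what allows it to serve as a test statistic.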

For the important case in which the data are hypothesized to follow the normal distribution, different null hypothesis tests have been developed, depending on the nature of the test statistic and thus the underlying hypothesis. Thus computing a p-value requires a null hypothesis, a test statistic (together with deciding whether the researcher is performing a one-tailed test or a two-tailed test), and data.

Even though computing the test statistic on given data may be easy, computing the sampling distribution under the null hypothesis, and then computing its cumulative distribution function (CDF), is often a difficult problem.
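When no convenient closed form for the null distribution is at hand, one numeric approach is to simulate it. The sketch below is an assumption-laden illustration rather than a standard procedure from the text: it uses a hypothetical test statistic (the longest run of heads in a sequence of coin flips) and estimates the one-tailed p-value by drawing many samples under a fair-coin null hypothesis. The function names, trial count, and seed are all illustrative choices.

```python
import random

def longest_head_run(flips):
    """Hypothetical test statistic: length of the longest run of heads (True values)."""
    best = current = 0
    for is_head in flips:
        current = current + 1 if is_head else 0
        best = max(best, current)
    return best

def simulated_p_value(observed, n_flips, n_trials=20_000, seed=0):
    """Estimate P(T >= observed) under the fair-coin null by Monte Carlo simulation."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    hits = 0
    for _ in range(n_trials):
        flips = [rng.random() < 0.5 for _ in range(n_flips)]
        if longest_head_run(flips) >= observed:
            hits += 1
    return hits / n_trials

# Estimated p-value for observing a run of at least 5 heads in 10 flips.
print(simulated_p_value(observed=5, n_flips=10))
```

The estimate carries sampling error that shrinks as `n_trials` grows; for this small case it can be compared against the exact value 112/1024 obtained by counting sequences.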

Today, this computation is done using statistical software, often via numeric methods (rather than exact formulae), but in the early and mid 20th century, this was instead done via tables of values, and one interpolated or extrapolated p-values from these discrete values.

Rather than using a table of p-values, Fisher instead inverted the CDF, publishing a list of values of the test statistic for given fixed p-values; this corresponds to computing the quantile function (inverse CDF).
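The relationship between the two kinds of table can be sketched with Python's standard library. The example assumes, purely for illustration, a test statistic that is standard normal under the null hypothesis and a one-tailed test. The CDF maps a statistic to a p-value, while the quantile function maps a fixed p-value back to the critical value of the statistic.

```python
from statistics import NormalDist

# Assumption: under the null hypothesis the statistic z is standard normal.
null = NormalDist(mu=0.0, sigma=1.0)

def one_tailed_p(z):
    """p-value from a statistic: upper-tail probability, computed via the CDF."""
    return 1.0 - null.cdf(z)

def critical_value(p):
    """Statistic from a fixed p-value: computed via the quantile function (inverse CDF)."""
    return null.inv_cdf(1.0 - p)

# A table of critical values for fixed p-values, in the spirit of the inverted tables.
for p in (0.10, 0.05, 0.01):
    print(p, round(critical_value(p), 3))
```

Applying `one_tailed_p` to the output of `critical_value` recovers the original p-value, which is the sense in which one table is the inverse of the other.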

Here a few simple examples follow, each illustrating a potential pitfall.

Suppose a researcher rolls a pair of dice once and assumes a null hypothesis that the dice are fair. The test statistic is "the sum of the rolled numbers" and is one-tailed. The researcher rolls the dice and observes that both dice show 6, yielding a test statistic of 12. If the researcher assumed a significance level of 0.

In this case, a single roll provides a very weak basis (that is, insufficient data) to draw a meaningful conclusion about the dice. This illustrates the danger of blindly applying p-values without considering the experiment design.
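The p-value in the dice example can be checked by enumerating every equally likely outcome of two fair six-sided dice and counting those whose sum is at least as large as the observed one. This is a small sketch; the function name and its parameters are illustrative.

```python
from fractions import Fraction
from itertools import product

def dice_sum_p_value(observed_sum, n_dice=2, sides=6):
    """One-tailed p-value: P(sum >= observed_sum) when all dice are fair."""
    outcomes = list(product(range(1, sides + 1), repeat=n_dice))
    extreme = sum(1 for roll in outcomes if sum(roll) >= observed_sum)
    return Fraction(extreme, len(outcomes))

p = dice_sum_p_value(12)
print(p, float(p))  # 1/36, roughly 0.028
```

Only one of the 36 outcomes (a double six) reaches a sum of 12, so the p-value is small even though a single roll says very little about the dice.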

Five heads in a row

Suppose a researcher flips a coin five times in a row and assumes a null hypothesis that the coin is fair. The test statistic of "total number of heads" can be one-tailed or two-tailed. The researcher flips the coin five times and observes heads each time (HHHHH), yielding a test statistic of 5.
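Both p-values for this observation can be computed exactly from the binomial distribution. The sketch assumes the one-tailed alternative is a coin biased toward heads, and that the two-tailed test counts every outcome at least as far from the expected number of heads as the observed one. The helper name `binom_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n):
    """P(exactly k heads in n flips of a fair coin)."""
    return Fraction(comb(n, k), 2 ** n)

n, observed = 5, 5

# One-tailed: outcomes with at least as many heads as observed.
one_tailed = sum(binom_pmf(k, n) for k in range(observed, n + 1))

# Two-tailed: outcomes at least as far from the expected count n/2, in either direction.
deviation = abs(observed - Fraction(n, 2))
two_tailed = sum(
    binom_pmf(k, n) for k in range(n + 1) if abs(k - Fraction(n, 2)) >= deviation
)

print(one_tailed, two_tailed)  # 1/32 1/16
```

The two-tailed value includes both HHHHH and TTTTT, which is why it is exactly twice the one-tailed value.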

This demonstrates that specifying a direction (on a symmetric test statistic) halves the p-value (increases the significance) and can mean the difference between data being considered significant or not.

Sample size dependence

Suppose a researcher flips a coin some arbitrary number of times (n) and assumes a null hypothesis that the coin is fair.

The test statistic is the total number of heads and is a two-tailed test. In both cases the data suggest that the null hypothesis is false (that is, the coin is not fair somehow), but changing the sample size changes the p-value.
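The effect of sample size can be sketched with an exact two-tailed binomial calculation. The two cases used here (7 heads in 10 flips and 70 heads in 100 flips) are hypothetical numbers chosen for illustration, not necessarily the cases the text refers to. They share the same proportion of heads but give very different p-values.

```python
from math import comb

def two_tailed_p(heads, n):
    """Exact two-tailed p-value for `heads` heads in `n` flips of a fair coin."""
    deviation = abs(heads - n / 2)
    # Count outcomes at least as far from the expected n/2 heads as the observation.
    extreme = sum(comb(n, k) for k in range(n + 1) if abs(k - n / 2) >= deviation)
    return extreme / 2 ** n

# Hypothetical cases with the same 70% proportion of heads.
for heads, n in ((7, 10), (70, 100)):
    print(n, two_tailed_p(heads, n))
```

With 10 flips the p-value is 0.34375, while with 100 flips it falls far below common significance levels, even though the observed proportion is identical.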

In the first case, the sample size is not large enough to allow the null hypothesis to be rejected at the 0. This demonstrates that in interpreting p-values, one must also know the sample size, which complicates the analysis.

Alternating coin flips

Suppose a researcher flips a coin ten times and assumes a null hypothesis that the coin is fair.

The test statistic is the total number of heads and is two-tailed. If the flips alternate strictly between heads and tails, this yields a test statistic of 5 and a p-value of 1 (completely unexceptional), as 5 is the expected number of heads.

Volume 17, No. 2, Art.

