For example, we might have data on the lengths of abstracts that different people wrote of the same document:
1249 1059 769 1893 1368 1633 1911 1614 5213 2076 1008 1362 1698 1349 2163 1209 698 4203 1326 2338
We might, however, not have any particular hypothesis that predicts a specific mean length.
In such cases, we can sometimes ask which of a family of possible hypotheses are supported by the data and which are not.
One way of forming a family of hypotheses about the population mean of a set of values on an interval or ratio scale is to assume a normal distribution and estimate the population's standard deviation from the sample. The formula for doing this is almost the same as that for the standard deviation of the sample, but gives slightly bigger values, especially for small samples. For the data above, for example, the sample standard deviation is about 931, and the estimated standard deviation for the population is about 955. These values can be obtained in Microsoft Excel with the STDEVP and STDEV functions respectively (not the other way round, as you might expect).
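The difference between the two formulas can be sketched in Python. This is only an illustration: the function names and the toy sample are hypothetical, and the standard library's statistics.pstdev and statistics.stdev are used here as stand-ins for Excel's STDEVP and STDEV.

```python
import math
import statistics


def sd_of_sample(values):
    """Standard deviation of the sample itself (divides by n), like STDEVP."""
    return statistics.pstdev(values)


def estimated_population_sd(values):
    """Estimated standard deviation of the population (divides by n - 1), like STDEV."""
    return statistics.stdev(values)


def correction_factor(n):
    """Ratio of the estimated population SD to the sample SD for sample size n."""
    return math.sqrt(n / (n - 1))


# Hypothetical toy sample, chosen only to keep the arithmetic easy to follow.
toy = [2, 4, 4, 4, 5, 5, 7, 9]
print(sd_of_sample(toy))             # divides by n
print(estimated_population_sd(toy))  # divides by n - 1, so slightly larger

# Applying the factor to the figures quoted in the text
# (sample SD of about 931, sample size 20) gives roughly 955.
print(round(931 * correction_factor(20)))
```

The correction factor shrinks towards 1 as the sample grows, which is why the two values differ most for small samples.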
Consider the family of populations that share this estimated standard deviation but have different means. For some of them, a random sample of the given size is very likely to have a mean at least as far from the population mean as the mean we observed; for others, this is very unlikely. The following chart, for instance, shows the probabilities of different sample means for two different population means, with one population mean the same as the observed sample mean and the other much lower.
The observed sample mean is so far away from some population means that, depending on the level of confidence we are looking for, we would reject the hypothesis that the sample could have come randomly from that population. We can describe which hypothetical population means are close enough for a particular confidence level by using a confidence interval.
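As a rough sketch of this reasoning, the probability that a sample mean falls at least as far from a hypothetical population mean as the observed one can be computed under the normal assumption. The function name is hypothetical, the population mean of 1000 is an arbitrary illustrative value, and the other figures are the rounded ones quoted in this section.

```python
import math
from statistics import NormalDist


def two_sided_probability(observed_mean, population_mean, sd, n):
    """Probability that a sample of size n has a mean at least as far from
    population_mean as observed_mean is, assuming a normal distribution
    with standard deviation sd."""
    standard_error = sd / math.sqrt(n)
    z = abs(observed_mean - population_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(z))


# Population mean equal to the observed mean: every sample is "at least as far".
print(two_sided_probability(1633, 1633, 955, 20))

# A much lower, purely illustrative population mean.
p = two_sided_probability(1633, 1000, 955, 20)
print(p)

# At the 95% confidence level we would reject this population mean if p < 0.05.
print(p < 0.05)
```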
In Microsoft Excel, we can use the CONFIDENCE function to determine how far this confidence interval goes above and below the observed mean, given the confidence level that we want, the population's estimated standard deviation, and the sample size. Note that, for this function, we give the confidence level as its complement; for example, if we want the 95% level (19 times out of 20), the first argument of the function would be 0.05 (i.e., 100% - 95% = 5%).
For example, for the estimated standard deviation of about 955 and a sample size of 20, at the 95% confidence level the CONFIDENCE function gives a value of about 419. The corresponding confidence interval is thus about 1633 plus or minus 419, or about 1215 to 2052.
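The same calculation can be sketched outside Excel. The function below is a hypothetical stand-in for CONFIDENCE, assuming it is based on the normal distribution: it multiplies the normal critical value by the standard deviation divided by the square root of the sample size. Because the inputs are the rounded figures from the text, the endpoints may differ from those above by a unit.

```python
import math
from statistics import NormalDist


def confidence_margin(alpha, sd, n):
    """Half-width of the confidence interval for the mean.

    alpha is the complement of the confidence level (0.05 for 95%),
    sd the estimated population standard deviation, n the sample size.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return z * sd / math.sqrt(n)


margin = confidence_margin(0.05, 955, 20)
low, high = 1633 - margin, 1633 + margin
print(round(margin))  # about 419, as in the text
print(low, high)
```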