# LIS 504 - Descriptive statistics

## Single data set

The same article was abstracted by 20 people. Here are the lengths in characters (bytes) of the abstracts that they produced:
1249 1059 769 1893 1368 1633 1911 1614 5213 2076 1008 1362 1698 1349 2163 1209 698 4203 1326 2338

If we distribute these numbers into bins, each covering a range of 500 bytes, we get the following distribution:
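| Length (bytes) | Number of abstracts |
| --- | --- |
| 1 - 500 | 0 |
| 501 - 1000 | 2 |
| 1001 - 1500 | 8 |
| 1501 - 2000 | 5 |
| 2001 - 2500 | 3 |
| 2501 - 3000 | 0 |
| 3001 - 3500 | 0 |
| 3501 - 4000 | 0 |
| 4001 - 4500 | 1 |
| 4501 - 5000 | 0 |
| 5001 - 5500 | 1 |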

This distribution can be displayed as a histogram:
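As a rough illustration, the binning and a simple text histogram can be sketched in Python. The helper name `bin_counts` and the `#`-bar rendering are assumptions made for this sketch, not part of the original course material.

```python
# Abstract lengths in bytes, copied from the data listed above.
lengths = [1249, 1059, 769, 1893, 1368, 1633, 1911, 1614, 5213, 2076,
           1008, 1362, 1698, 1349, 2163, 1209, 698, 4203, 1326, 2338]

BIN_WIDTH = 500  # each bin covers 500 bytes: 1-500, 501-1000, ...


def bin_counts(values, width):
    """Count how many values fall into each bin 1..width, width+1..2*width, ..."""
    n_bins = (max(values) - 1) // width + 1
    counts = [0] * n_bins
    for v in values:
        # Subtracting 1 puts 500 in the first bin and 501 in the second.
        counts[(v - 1) // width] += 1
    return counts


counts = bin_counts(lengths, BIN_WIDTH)

# Text histogram: one '#' per abstract in the bin, followed by the count.
for i, count in enumerate(counts):
    low, high = i * BIN_WIDTH + 1, (i + 1) * BIN_WIDTH
    print(f"{low:>4} - {high:<4} | {'#' * count} {count}")
```

Each row of the printout corresponds to one bin, so the long right tail produced by the two very long abstracts is visible as a gap followed by isolated bars.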

We can compute some measures of central tendency:
| Measure | Value |
| --- | --- |
| Mean | 1806.95 |
| Median | 1491 |
| Mode | None, since all the lengths are different |
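These measures can be reproduced with Python's standard library. The sketch below is only one possible way to do it; the helper `modes` is a hypothetical name introduced here to report "no mode" when every value occurs exactly once.

```python
from collections import Counter
from statistics import mean, median

# Abstract lengths in bytes, copied from the data listed above.
lengths = [1249, 1059, 769, 1893, 1368, 1633, 1911, 1614, 5213, 2076,
           1008, 1362, 1698, 1349, 2163, 1209, 698, 4203, 1326, 2338]


def modes(values):
    """Return the most frequent value(s), or [] if no value repeats."""
    tally = Counter(values)
    top = max(tally.values())
    if top == 1:
        return []  # every value is unique, so there is no mode
    return [v for v, c in tally.items() if c == top]


avg = mean(lengths)
mid = median(lengths)  # average of the 10th and 11th sorted values
mode_values = modes(lengths)

print("Mean:  ", avg)
print("Median:", mid)
print("Mode:  ", mode_values or "none")
```

With an even number of observations, the median is the average of the two middle values after sorting, here 1368 and 1614.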

We can also compute measures of dispersion:
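| Measure | Value |
| --- | --- |
| Range (max - min) | 4515 |
| Standard deviation | 1070.66 |
| Variance | 1146319.85 |

The standard deviation and variance shown here are the population versions: the sum of squared deviations from the mean is divided by the number of observations (20) rather than by 19.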
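A minimal sketch of the same calculations in Python follows. It assumes the population formulas (division by n), which is what reproduces the figures above; the sample versions (division by n - 1) are printed only for comparison.

```python
from statistics import pstdev, pvariance, stdev, variance

# Abstract lengths in bytes, copied from the data listed above.
lengths = [1249, 1059, 769, 1893, 1368, 1633, 1911, 1614, 5213, 2076,
           1008, 1362, 1698, 1349, 2163, 1209, 698, 4203, 1326, 2338]

value_range = max(lengths) - min(lengths)

# Population measures: divide the sum of squared deviations by n.
pop_variance = pvariance(lengths)
pop_std_dev = pstdev(lengths)

print("Range:                ", value_range)
print("Population variance:  ", round(pop_variance, 2))
print("Population std. dev.: ", round(pop_std_dev, 2))

# Sample measures: divide by n - 1 instead; these come out slightly larger.
print("Sample variance:      ", round(variance(lengths), 2))
print("Sample std. dev.:     ", round(stdev(lengths), 2))
```

The standard deviation is the square root of the variance, so it is expressed in the same unit (bytes) as the data.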

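We can compute the quartiles, including the minimum (0th quartile), median (2nd quartile), and maximum (4th quartile): 698, 1239, 1491, 1952.25, 5213. Likewise, we can determine various percentiles; for example, in steps of 10% from the 0th to the 100th percentile, we have 698, 984.1, 1179, 1302.9, 1356.8, 1491, 1659, 1898.4, 2093.4, 2524.5, 5213. When a quartile or percentile falls between two observations, its value is obtained by linear interpolation between the two adjacent sorted lengths.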
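The quartiles and percentiles above are consistent with linear interpolation between adjacent sorted values, where the p-th quantile sits at position p × (n - 1) in the sorted list. The sketch below assumes that convention; the function name `percentile` is illustrative, and other software may use slightly different conventions and give slightly different results.

```python
# Abstract lengths in bytes, copied from the data listed above.
lengths = [1249, 1059, 769, 1893, 1368, 1633, 1911, 1614, 5213, 2076,
           1008, 1362, 1698, 1349, 2163, 1209, 698, 4203, 1326, 2338]


def percentile(values, p):
    """Return the p-th quantile (0 <= p <= 1) using linear interpolation
    between the two nearest sorted values."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)          # fractional index into the sorted list
    lower = int(pos)                      # index of the value just below
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower                    # how far to move toward the upper value
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


quartiles = [percentile(lengths, k / 4) for k in range(5)]
deciles = [percentile(lengths, k / 10) for k in range(11)]

print("Quartiles:", [round(q, 2) for q in quartiles])
print("Deciles:  ", [round(d, 2) for d in deciles])
```

For example, the first quartile sits at position 0.25 × 19 = 4.75, that is, three quarters of the way from the 5th sorted value (1209) to the 6th (1249), which gives 1239.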

## Two data sets

Here are the times in seconds that the same people took to write their abstracts (they had at most 1 hour, or 3600 seconds):
2472 3535 3328 2610 3188 2379 3494 2662 3385 3440 2633 3389 3600 2730 3600 2399 3600 2792 2431 3600

We can graph these times against the lengths of the abstracts produced in a scatterplot:

The scatterplot shows little association between the two variables; that is, abstracts that took longer to write show little tendency to be longer. We can quantify the strength of the linear association with the (Pearson) correlation coefficient, which ranges from -1 to 1. In this case it equals 0.126773412, a low, though positive, value.
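The correlation coefficient can be computed directly from its definition: the sum of the products of the deviations from the two means, divided by the square root of the product of the two sums of squared deviations. The following is a minimal sketch; the function name `pearson_r` is an assumption of this example.

```python
from math import sqrt

# Abstract lengths (bytes) and writing times (seconds), in the same person order.
lengths = [1249, 1059, 769, 1893, 1368, 1633, 1911, 1614, 5213, 2076,
           1008, 1362, 1698, 1349, 2163, 1209, 698, 4203, 1326, 2338]
times = [2472, 3535, 3328, 2610, 3188, 2379, 3494, 2662, 3385, 3440,
         2633, 3389, 3600, 2730, 3600, 2399, 3600, 2792, 2431, 3600]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(times, lengths)
print(f"Correlation between time and length: {r:.4f}")
```

The coefficient is symmetric, so swapping the two arguments gives the same result.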

Last updated January 19, 2001.