Soc2205a-b Healey Chapter 1 Elementary Concepts in Statistics
Data: Observations gathered through sociological research. Make sense of data through statistical analysis.
Description vs. Inference:
Descriptive statistics are used to describe and explore patterns, trends, and relationships. They are a way of summarizing information about a sample. Example: the mean (average), median, standard deviation, etc. can each describe one variable. Tables and graphs are also a way of describing what the distribution of a sample variable “looks” like (we will look at these in Healey Ch. 2-5). To describe the relationship between two variables, we use what is called a measure of association (Healey Ch. 12-16). Measures of association can tell us 1. how strong the relationship between two variables is, and 2. what the direction of the relationship is (positive: the variables change in the same direction; negative: they change in opposite directions).
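As a rough illustration of the summaries named above, the following Python sketch computes the mean, median, and standard deviation of one variable, and then Pearson's r as one possible measure of association. The data values and variable names are invented for illustration, and Pearson's r is only one of several measures of association covered in the later chapters.

```python
import math
import statistics

# Hypothetical sample data (invented for illustration only):
# years of education and hours of TV watched per day for 7 respondents.
education = [10, 12, 12, 14, 16, 16, 18]
tv_hours = [5, 4, 4, 3, 2, 3, 1]

# Univariate description of one variable.
mean_edu = statistics.mean(education)
median_edu = statistics.median(education)
sd_edu = statistics.stdev(education)  # sample standard deviation (n - 1)


def pearson_r(x, y):
    """Pearson's r: the sign gives direction, the absolute size gives strength."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cross / math.sqrt(ssx * ssy)


r = pearson_r(education, tv_hours)
direction = "positive" if r > 0 else "negative"

print(f"mean={mean_edu}, median={median_edu}, sd={sd_edu:.2f}")
print(f"r={r:.2f} ({direction} relationship)")
```

In this made-up sample, respondents with more education tend to report fewer TV hours, so r comes out negative, and its closeness to -1 indicates a strong relationship.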
Inferential statistics (Healey Ch. 6-11) are used to generalize from the data: to take what is known about a sample and infer or make a prediction about a population. This requires a random (probability) sample (see Healey Ch. 6), so that we can calculate the probability that a sample statistic accurately estimates the population parameter.
Population = total set of subjects.
Sample = subset of population.
Statistic = Numerical summary of sample
Parameter = Numerical summary of population
Sample Statistics (Descriptive) → Population Parameters (Inferential)
population = all statistics students → mean = population parameter
sample = Soc2205a class → mean = sample statistic
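The difference between a parameter and a statistic can be sketched with a small simulation. Everything here is an assumption made for illustration: the population of 5,000 exam scores is randomly generated, and the "class" of 50 is drawn as a simple random sample, which a real class would not necessarily be.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: exam scores for 5,000 statistics students.
population = [random.gauss(70, 10) for _ in range(5000)]
parameter = statistics.mean(population)  # population parameter

# Simple random sample of 50 students (standing in for one class).
sample = random.sample(population, 50)
statistic = statistics.mean(sample)  # sample statistic

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two means will usually be close but not identical. Inferential statistics is about quantifying how far apart they are likely to be.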
Univariate, Bivariate, Multivariate = one, two or multiple variable analysis
What are variables? Concepts in numerical form that can vary in value. Variables are things that we measure, control, or manipulate in research. They differ in many respects, most notably in the role they are given in our research and in the type of measures that can be applied to them.
Dependent vs. independent variables. Independent variables (also “causal” or “explanatory”) are those that are manipulated. Designated as X. Dependent (or “outcome” or “response”) variables are only measured or registered. Designated as Y. This distinction appears terminologically confusing to many because, as some students say, "all variables depend on something." However, once you get used to this distinction, it becomes indispensable.
Control Variables - used in multivariate analysis. Usually nominal or ordinal.
Hypothesis: a statement that describes the relationship between two or more variables.
Qualitative vs. Quantitative Data:
Qualitative = categorical, categories are unordered. No mathematical operations except percentages or proportions, e.g. marital status.
Quantitative = variables have a numerical format. Various mathematical operations can be used, including < >, + -, etc.
Discrete (finite number of values - cannot subdivide basic unit of measurement) and Continuous (infinite number of values), e.g. number of children vs. weight or age. However, the distinction is blurry - it also depends on how the variable is measured.
Measurement scales. Variables differ in "how well" they can be measured, i.e., in how much measurable information their measurement scale can provide. There is obviously some measurement error involved in every measurement, which determines the "amount of information" that we can obtain. Another factor that determines the amount of information that can be provided by a variable is its "type of measurement scale." Specifically, variables are classified as (a) nominal, (b) ordinal, (c) interval, or (d) ratio. Your text treats interval and ratio as a single level of measurement (interval-ratio).
LEVELS OF MEASUREMENT:
Levels of measurement are very important - they determine which statistics can be used.
Nominal variables allow for only qualitative classification. That is, they can be measured only in terms of whether the individual items belong to some distinctively different categories, but we cannot quantify or even rank order those categories. For example, all we can say is that 2 individuals are different in terms of variable A (e.g., they are of different race), but we cannot say which one "has more" of the quality represented by the variable. Typical examples of nominal variables are gender, race, colour, city, etc.
Ordinal variables allow us to rank order the items we measure in terms of which has less and which has more of the quality represented by the variable, but still they do not allow us to say "how much more." A typical example of an ordinal variable is the socioeconomic status of families. For example, we know that upper-middle is higher than middle but we cannot say that it is, for example, 18% higher. Also this very distinction between nominal, ordinal, and interval scales itself represents a good example of an ordinal variable. For example, we can say that nominal measurement provides less information than ordinal measurement, but we cannot say "how much less" or how this difference compares to the difference between ordinal and interval scales.
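A minimal sketch of what an ordinal scale does and does not support, assuming a hypothetical five-category socioeconomic status variable (the category labels and the sample of families are invented). The rank codes only record order, so comparing and sorting is legitimate, but subtracting or averaging the codes would not be.

```python
# Hypothetical ordered categories, listed from lowest to highest.
SES_ORDER = ["lower", "lower-middle", "middle", "upper-middle", "upper"]
RANK = {label: position for position, label in enumerate(SES_ORDER)}


def is_higher(a, b):
    """True if category a ranks above category b."""
    return RANK[a] > RANK[b]


# Invented sample of five families.
families = ["middle", "upper-middle", "lower", "middle", "upper"]

# Ordering the cases is meaningful on an ordinal scale...
ranked = sorted(families, key=RANK.get)
# ...and so is picking the middle case (the median) of an odd-sized sample.
median_category = ranked[len(ranked) // 2]

print(is_higher("upper-middle", "middle"))
print(ranked)
print(median_category)
```

Note that nothing in the sketch computes a difference such as RANK["upper-middle"] - RANK["middle"]: the codes 0 to 4 are arbitrary labels for position, so that number would not tell us "how much more."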
Interval variables allow us not only to rank order the items that are measured, but also to quantify and compare the sizes of differences between them. For example, temperature, as measured in degrees Fahrenheit or Celsius, constitutes an interval scale. We can say that a temperature of 40 degrees is higher than a temperature of 30 degrees, and that an increase from 20 to 40 degrees is twice as much as an increase from 30 to 40 degrees. Healey uses the term Interval-Ratio* for these variables.
*Note: Ratio variables are very similar to interval variables; in addition to all the properties of interval variables, they feature an identifiable absolute zero point, thus they allow for statements such as x is twice as much as y. Typical examples of ratio scales are measures of time or space. Most statistical data analysis procedures do not distinguish between the interval and ratio properties of the measurement scales.
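The temperature example can be checked numerically. The sketch below (function and variable names are illustrative) shows that comparisons of differences give the same answer in Celsius and Fahrenheit, while ratios of the values themselves change with the scale, because neither scale has an absolute zero. Kelvin, which does have an absolute zero, is included for comparison.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def c_to_k(celsius):
    """Convert degrees Celsius to kelvins (a ratio scale with a true zero)."""
    return celsius + 273.15


# Interval property: the increase from 20 to 40 is twice the increase
# from 30 to 40, whichever interval scale we use.
diff_ratio_c = (40.0 - 20.0) / (40.0 - 30.0)
diff_ratio_f = (c_to_f(40.0) - c_to_f(20.0)) / (c_to_f(40.0) - c_to_f(30.0))

# Ratio statements about the values themselves depend on where zero sits.
value_ratio_c = 40.0 / 20.0                  # looks like "twice as hot"
value_ratio_f = c_to_f(40.0) / c_to_f(20.0)  # same temperatures in Fahrenheit
value_ratio_k = c_to_k(40.0) / c_to_k(20.0)  # ratio on the absolute scale

print(diff_ratio_c, diff_ratio_f)
print(value_ratio_c, round(value_ratio_f, 3), round(value_ratio_k, 3))
```

Because the value ratios disagree across Celsius and Fahrenheit, "40 degrees is twice as hot as 20 degrees" is not a meaningful statement on an interval scale. Only on a ratio scale such as Kelvin does the ratio have a fixed interpretation.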