__Soc2205a-b Healey Chapter 1 Elementary Concepts in Statistics__

**Data:** Observations gathered through sociological
research. We make sense of data through statistical analysis.

**Description vs. Inference**:

__Descriptive__ statistics are used to describe and
explore patterns, trends, and relationships. They are a way of
summarizing information about a sample.
Examples: the mean (average), median, and standard deviation can each describe one
variable. Tables and graphs are also a way of describing what the distribution
of a sample variable “looks” like (we will look at these in Healey Ch. 2-5). To
describe the relationship between two variables, we use what is called a
measure of association (Healey Ch. 12-16). Measures of association can tell us
(1) how strong a relationship between two variables is,
and (2) what the direction of the relationship is (positive: the two variables rise and fall
together; negative: one rises as the other falls).
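As a rough illustration of the ideas above, the sketch below summarizes one variable and then checks the direction of a relationship between two. The data (hours studied and exam scores for six students) are made up for this example, and Pearson's r is used only as one familiar measure of association; Healey covers several others.

```python
import statistics

# Hypothetical sample: hours studied per week and exam score for six students.
hours = [2, 4, 5, 7, 8, 10]
scores = [55, 60, 62, 70, 74, 81]

# Descriptive statistics for a single variable.
mean_hours = statistics.mean(hours)
median_hours = statistics.median(hours)
sd_hours = statistics.stdev(hours)  # sample standard deviation (divides by n - 1)


def pearson_r(x, y):
    """Pearson's r: one common measure of association for two numeric variables."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return cov / (ssx * ssy) ** 0.5


r = pearson_r(hours, scores)

# The sign of r gives the direction; its size (0 to 1) gives the strength.
direction = "positive" if r > 0 else "negative" if r < 0 else "none"

print("mean:", mean_hours, "median:", median_hours, "sd:", round(sd_hours, 2))
print("r:", round(r, 3), "direction:", direction)
```

Reversing the scores (so that more hours go with lower scores) would flip the sign of r and give a negative relationship.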

__Inferential__ statistics (Healey Ch. 6-11)
are used to make predictions based on the data: to take what is
known about a sample and infer, or make a prediction about, a population.
This requires a __random sample__ (see Healey Ch. 6) or __probability__
sample, so that the probability of accurately estimating the population
parameter can be calculated.

Population = total set of subjects.

Sample = subset of population.

Statistic = numerical summary of a sample.

Parameter = numerical summary of a population.

Sample Statistics (Descriptive) → Population Parameters (Inferential)

__Example:__

population = all statistics students → mean = population parameter

sample = soc205a class → mean = sample statistic
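The difference between a parameter and a statistic can be sketched with a simulated population. Everything here is an assumption made for illustration: the population of 1,000 student ages is randomly generated, and the sample size of 50 is arbitrary.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch gives the same numbers each run

# Hypothetical population: ages of 1,000 statistics students.
population = [random.randint(18, 30) for _ in range(1000)]

# Simple random sample of 50 students, drawn without replacement.
sample = random.sample(population, 50)

parameter = statistics.mean(population)  # population parameter
statistic = statistics.mean(sample)      # sample statistic

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two means will usually be close but not identical. Inferential statistics deals with how far a sample statistic is likely to fall from the population parameter.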

**Univariate, Bivariate, Multivariate = one, two or
multiple variable analysis**

**What are variables?** __Concepts in numerical form that can vary in value__.
Variables are things that we measure, control, or
manipulate in research. They differ in many respects, most notably in the role
they are given in our research and in the type of measures that can be applied
to them.

**Dependent vs. independent variables.** Independent
variables (also “causal” or “**explanatory**”)
are those that are manipulated. They are designated as **X**. Dependent (or “outcome” or “**response**”) variables are only measured
or registered. They are designated as **Y**. This distinction appears terminologically confusing to
many because, as some students say, "all variables depend on
something." However, once you get used to this distinction, it becomes
indispensable.

**Control Variables** - used in multivariate analysis. Usually nominal or ordinal.

**Hypothesis:** a statement that describes the relationship between two or more variables.

**Qualitative vs. Quantitative Data:**

Qualitative = categorical; categories are unordered. No mathematical operations except
percentages or proportions, e.g. marital status.

Quantitative = variables have a numerical format. Various mathematical operations can be
used, including < >, + -, etc.
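For a qualitative variable such as marital status, counting cases and converting the counts to proportions or percentages is about as far as the arithmetic goes. A minimal sketch, using ten invented responses:

```python
from collections import Counter

# Hypothetical responses to a marital-status question.
responses = ["married", "single", "single", "divorced", "married",
             "single", "widowed", "married", "single", "single"]

counts = Counter(responses)  # frequency of each category
n = len(responses)

# Proportion of cases in each category (percentage = proportion * 100).
proportions = {category: count / n for category, count in counts.items()}

for category, p in proportions.items():
    print(f"{category:<9} {counts[category]:>2}  {p:.2f}  {p * 100:.0f}%")
```

Note that no average is computed: there is no meaningful "mean marital status".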

**Discrete** (finite number of values; the basic unit of measurement cannot be subdivided)
and **Continuous** (infinite number of values), e.g. number of children vs. weight or age.
However, the distinction is blurry: it also depends on
how the variable is measured.

**Measurement scales.** Variables differ in "how well" they can be
measured, i.e., in how much measurable information their measurement scale can
provide. There is obviously some measurement error involved in every
measurement, which determines the "amount of information" that we can
obtain. Another factor that determines the amount of information that can be
provided by a variable is its "**type
of measurement scale**." Specifically, variables are classified as (a)
nominal, (b) ordinal, (c) interval or (d) ratio. Your text treats interval
and ratio as a single level of measurement (interval-ratio).

**LEVELS OF MEASUREMENT:**

**Levels of measurement are very important: they determine
what statistics will be used.**
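One way to see how the level of measurement constrains the choice of statistic is the usual pairing of each level with a measure of central tendency: mode for nominal, median for ordinal, mean for interval-ratio. The sketch below assumes that conventional pairing. The function name `central_tendency`, the data, and the numeric codes for socioeconomic status are all hypothetical.

```python
import statistics


def central_tendency(values, level):
    """Pick a measure of central tendency suited to the level of measurement.

    An illustrative simplification of the usual convention, not a full rulebook.
    """
    if level == "nominal":
        return statistics.mode(values)        # most common category
    if level == "ordinal":
        return statistics.median_low(values)  # middle case once values are ranked
    if level == "interval-ratio":
        return statistics.mean(values)        # arithmetic average
    raise ValueError(f"unknown level: {level}")


city = ["London", "Toronto", "London", "Windsor"]
# Ordinal categories coded so that a higher code means a higher rank.
ses_codes = [1, 2, 2, 3, 3]  # 1 = lower, 2 = middle, 3 = upper-middle
age = [19, 21, 22, 25, 33]

print(central_tendency(city, "nominal"))
print(central_tendency(ses_codes, "ordinal"))
print(central_tendency(age, "interval-ratio"))
```

`median_low` is used for the ordinal case so that the result is always one of the actual category codes rather than an average of two of them.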

**Nominal variables** allow for only qualitative classification.
That is, they can be measured only in terms of whether the individual items
belong to some distinctively different categories, but we cannot quantify or
even rank order those categories. For example, all we can say is that 2
individuals are different in terms of variable A (e.g., they are of different
race), but we cannot say which one "has more" of the quality
represented by the variable. Typical examples of nominal variables are gender,
race, colour, city, etc.

**Ordinal variables** allow us to rank order the items we
measure in terms of which has less and which has more of the quality
represented by the variable, but still they do not allow us to say "how
much more." A typical example of an ordinal variable is the socioeconomic
status of families. For example, we know that upper-middle is higher than
middle but we cannot say that it is, for example, 18% higher. Also this very
distinction between nominal, ordinal, and interval scales itself represents a
good example of an ordinal variable. For example, we can say that nominal
measurement provides less information than ordinal measurement, but we cannot
say "how much less" or how this difference compares to the difference
between ordinal and interval scales.

**Interval variables** allow us not only to rank order the items
that are measured, but also to quantify and compare the sizes of differences
between them. For example, temperature, as measured in degrees Fahrenheit or
Celsius, constitutes an interval scale. We can say that a temperature of 40
degrees is higher than a temperature of 30 degrees, and that an increase from
20 to 40 degrees is twice as much as an increase from 30 to 40 degrees. Healey
treats interval and ratio variables as a single level, **Interval-Ratio** (see the note on ratio variables below).

**Note: Ratio variables** are very similar to interval variables; in
addition to all the properties of interval variables, they feature an identifiable
absolute zero point, thus they allow for statements such as x is two times more
than y. Typical examples of ratio scales are measures
of time or space. Most statistical data analysis procedures do not distinguish
between the interval and ratio properties of the measurement scales.
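The practical difference between interval and ratio scales can be checked with a little arithmetic. In the sketch below (the helper `f_to_c` and the specific numbers are chosen only for illustration), comparisons of differences in temperature survive a change from Fahrenheit to Celsius, but "twice as much" statements about the temperatures themselves do not, because the zero point is arbitrary. For a ratio variable such as time, the zero point is absolute, so such statements hold in any unit.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


# Interval scale: the ratio of two differences is the same in either unit.
diff_ratio_f = (40 - 20) / (40 - 30)
diff_ratio_c = (f_to_c(40) - f_to_c(20)) / (f_to_c(40) - f_to_c(30))

# The ratio of the values themselves changes with the unit (zero is arbitrary).
value_ratio_f = 40 / 20
value_ratio_c = f_to_c(40) / f_to_c(20)

# Ratio scale (time): zero is absolute, so the ratio of values is unit-free.
minutes_ratio = 40 / 20
seconds_ratio = (40 * 60) / (20 * 60)

print("difference ratios:", diff_ratio_f, round(diff_ratio_c, 6))
print("value ratios (temperature):", value_ratio_f, round(value_ratio_c, 3))
print("value ratios (time):", minutes_ratio, seconds_ratio)
```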