Statistics 3 Flashcards
Association
We speak of an association (or: correlation) between two variables if certain values of one variable tend to go with certain values of the other.
Dependent variable
In association analysis, we are interested in whether one or a set of variables help to explain or predict another variable. We call the variable we seek to explain or predict the dependent variable (or also: outcome variable, response variable, Y). We use the term dependent variable because the values of the dependent variable are hypothesized to depend on values of other variables.
Independent Variable
We call the variable(s) that are hypothesized to explain or predict the dependent variable independent variable(s) (or: treatment, stimulus, explanatory variable, X).
Three components of association
- Nature (or: direction)
- Strength
- Statistical significance
To establish an association means establishing all three components.
Nature (or: direction)
Nature of the direction of the relationship between two variables in your sample.
- Respondents with high education are more likely to turn out in elections. Therefore, education increases electoral participation
Strength
Strength of the relationship between two variables in your sample
- Respondents with high education are twice as likely to participate in elections. Therefore, education strongly increases electoral participation
Statistical methods for establishing associations
- linear regression
- Pearson's r
Statistical significance
Statistical significance of the relationship in your sample
That is, how likely is it that the association you observe in a sample generalizes to the population (e.g., all UK voters)?
Linear Relationship
- The simplest way of describing a relationship between two quantitative variables on a scatter plot.
- A straight line: a linear relationship.
- Linear regression gives us the best-fitting linear association.
How to write a linear function
𝑦 = 𝛼 + 𝛽𝑥
whereby:
𝑦 is the dependent variable;
𝑥 is the independent variable;
𝛼 is the intercept or constant: the value of 𝑦 when 𝑥 = 0;
𝛽 is the slope or gradient: how much 𝑦 changes when 𝑥 increases by 1.
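The function above can be sketched in Python; the numeric values of the intercept and slope below are made-up illustrations, not from the source:

```python
def linear(x, alpha, beta):
    """y = alpha + beta * x: the value of y for a given x."""
    return alpha + beta * x

# Hypothetical intercept (alpha = 2.0) and slope (beta = 0.5):
# when x = 0, y equals the intercept; each +1 in x adds beta to y.
y_at_0 = linear(0.0, alpha=2.0, beta=0.5)  # 2.0 (the intercept)
y_at_4 = linear(4.0, alpha=2.0, beta=0.5)  # 2.0 + 0.5 * 4 = 4.0
```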
What tweak must we make to the function?
Linear functions are deterministic, but real-world data is messy. To account for this, we say that we explain the expected value of the dependent variable: E(y).
𝐸(𝑦) = 𝛼 + 𝛽𝑥
We account for variation around the regression line
Take-home point: linear regression does not make exact predictions; it predicts the average value of Y for a given X.
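A toy sketch of the take-home point (the numbers are invented): several respondents can share the same X yet have different Y values, and the regression prediction for that X is best read as their average.

```python
# Made-up example: three observations that share the same value of X
ys_at_same_x = [3.8, 4.1, 4.4]

# E(y) for this X is the average the line aims at,
# not any single observation
expected_y = sum(ys_at_same_x) / len(ys_at_same_x)  # ≈ 4.1
```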
Least squares estimation
- Establishes the line of best fit
- Least squares estimation finds the combination of intercept and regression slope that minimizes the sum of the squared residuals, often also called the sum of squared errors (SSE).
𝑆𝑆𝐸 = ∑(𝑦 − 𝑦̂)²
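Least squares can be sketched from scratch in Python. This is a minimal illustration using the standard closed-form solution for simple regression; the toy data are made up:

```python
def least_squares(xs, ys):
    """Intercept and slope that minimize the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope: covariance of x and y divided by variance of x
    beta = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    # the fitted line always passes through (mean_x, mean_y)
    alpha = mean_y - beta * mean_x
    return alpha, beta

def sse(xs, ys, alpha, beta):
    """Sum of squared errors: sum of (y - y_hat)^2 around the line."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))

# Toy data (invented): roughly y = 2x
xs = [1, 2, 3, 4]
ys = [2.1, 3.9, 6.2, 7.8]
alpha, beta = least_squares(xs, ys)
fit_error = sse(xs, ys, alpha, beta)
```

Any other intercept/slope pair would give a larger SSE on the same data; that is what "line of best fit" means here.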
Establishing the Nature (or: Direction) of Linear Associations
The sign of the slope coefficient (𝜷) tells us about the nature (direction) of linear associations
𝛽 > 0: positive relationship (as X increases, so does Y)
𝛽 < 0: negative relationship (as X increases, Y decreases)
𝛽 = 0: independence (as X increases, Y stays the same)
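The sign check can be written as a minimal Python sketch (the function name is my own, not from the source):

```python
def direction(beta):
    """Read the nature (direction) of a linear association off the slope sign."""
    if beta > 0:
        return "positive"    # as X increases, so does Y
    if beta < 0:
        return "negative"    # as X increases, Y decreases
    return "independent"     # Y stays the same as X changes
```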
How can we learn about the strength of linear associations by interpreting the size of the slope coefficient (𝛽)?
- 𝛽 gives us the change in Y if X increases by 1
- We can make sense of that by putting it in relation to the distributions of X and Y
- It is straightforward to calculate the change in Y for increases (or decreases) in X of more or less than 1
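Because the model is linear, changes of more or less than 1 in X scale in the same way; a small sketch with an assumed slope:

```python
def change_in_y(beta, delta_x):
    """Predicted change in Y when X changes by delta_x in a linear model."""
    return beta * delta_x

# Assumed slope beta = 1.5 (illustrative only)
change_in_y(1.5, 1)   # 1.5: the one-unit interpretation of beta
change_in_y(1.5, 10)  # 15.0: a ten-unit increase in X
change_in_y(1.5, -2)  # -3.0: a two-unit decrease in X
```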