L3 - Associations Flashcards

1
Q

What kind of analysis is used to assess association between continuous variables?

A

Correlation

2
Q

What kind of analysis is used to assess association between categorical variables?

A

Odds Ratios (and contingency tables)

3
Q

What is covariance?

A

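A measure of the degree to which people’s scores on two variables vary together (concurrent variation). Its sign gives the direction of the linear association between the two variables; its size reflects the strength of that association, but in units that depend on how the variables are measured. It can be a population parameter or a sample statistic.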
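As a rough illustration of the definition above, here is a minimal sketch of a sample covariance calculation. The function name and the `hours` and `marks` scores are made up for the example, and the n - 1 divisor assumes a sample statistic rather than a population parameter.

```python
def sample_covariance(x, y):
    """Sum of cross-products of deviation scores, divided by n - 1."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cross_products = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return cross_products / (n - 1)

# Hypothetical scores for five people on two variables
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]

cov_hours = sample_covariance(hours, marks)

# Same data with the first variable re-expressed in minutes:
# the covariance changes size even though the association has not changed
cov_minutes = sample_covariance([h * 60 for h in hours], marks)
```

The second call is there to show why the raw size of a covariance is hard to judge: changing the metric of one variable rescales the result.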

4
Q

What are similar features of variance and covariance?

A
  • Similar formulas. The size of each depends on the metric of the variables used to calculate it, so on its own we cannot tell whether a covariance is big or small. It needs to be standardised, which gives Pearson’s correlation coefficient.
5
Q

What is Pearson’s correlation coefficient?

A

This is the standardised version of covariance: the deviation scores are replaced with z scores when calculating the covariance. It is denoted by “r”.
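The idea of swapping deviation scores for z scores can be sketched as follows. This is only an illustration: the function name and the data are hypothetical, and the n - 1 divisors assume sample statistics throughout.

```python
import math

def pearson_r(x, y):
    """Covariance of z scores: mean cross-product of z scores using n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    z_x = [(xi - mean_x) / sd_x for xi in x]
    z_y = [(yi - mean_y) / sd_y for yi in y]
    return sum(a * b for a, b in zip(z_x, z_y)) / (n - 1)

# Hypothetical scores for five people on two variables
hours = [1, 2, 3, 4, 5]
marks = [2, 4, 5, 4, 5]

r_hours = pearson_r(hours, marks)

# Re-expressing the first variable in minutes leaves r unchanged,
# because z scores do not depend on the original metric
r_minutes = pearson_r([h * 60 for h in hours], marks)
```

Because both variables are standardised first, the result always falls between -1 and 1 regardless of the units used.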

6
Q

What are some properties of the t distribution?

A
  • Bell shaped
  • Its variability is defined by its degrees of freedom (df)
  • As df increases, the t distribution converges to the standard normal distribution
7
Q

Which distribution does the sampling distribution of a population correlation coefficient of zero correspond to?

A

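The t distribution with n - 2 degrees of freedom: when the population correlation is zero, the observed test statistic for r is compared against this distribution.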

8
Q

What are degrees of freedom?

A

The number of values that are free to vary given the constraints operating on the data: once that many values (the df) have been chosen, the rest of the solution is known.

9
Q

How do you calculate the degrees of freedom for a correlation?

A

df = n - 2, basically because there are 2 variables

10
Q

How do the regions of rejection for a normal distribution differ from those for a t distribution?

A

The region of rejection is further out from zero in the t distribution.

As df increases, the critical value moves closer to 1.96, the two-tailed critical value of the standard normal distribution at alpha = .05.

11
Q

How is the observed test statistic found for sample correlation?

A

Tobs = (r - null hypothesised value) / standard error

where the standard error is:

SE = sqrt((1 - r^2) / (n - 2))
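The formula above can be worked through in a few lines. This is a sketch only: the function name is made up, the values r = 0.5 and n = 27 are hypothetical, and the null hypothesised value is assumed to be zero.

```python
import math

def t_obs_for_correlation(r, n, null_value=0.0):
    """Observed t statistic and df for a sample correlation."""
    standard_error = math.sqrt((1 - r ** 2) / (n - 2))
    t_obs = (r - null_value) / standard_error
    df = n - 2
    return t_obs, df

# Hypothetical sample: r = 0.5 from n = 27 people
t_obs, df = t_obs_for_correlation(r=0.5, n=27)

# Same r with a larger hypothetical sample gives a smaller SE, so a larger Tobs
t_obs_big_n, df_big_n = t_obs_for_correlation(r=0.5, n=102)
```

Tobs would then be compared with the critical value from a t table for the returned df.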

12
Q

When will a larger value of Tobs occur?

A

When:

  • There is a bigger correlation (r), because then the numerator is larger and the standard error is smaller
  • There is a smaller standard error, for example because the sample size (n) is larger
13
Q

Why is Pearson’s correlation coefficient an attractive statistic for research?

A

Because it is a natural effect size measure: there is no need to transform the value to know the size of the effect.

14
Q

When would we use Fisher’s r to z transformation?

A

When there is a large sample, for confidence interval estimates on XECI
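For reference, a large-sample confidence interval based on Fisher’s r to z transformation can be sketched like this. It is not a description of what XECI does internally; it assumes the standard textbook form, with z = atanh(r), a standard error of 1 / sqrt(n - 3), and 1.96 as the critical value for a 95% interval. The function name and inputs are hypothetical.

```python
import math

def fisher_ci(r, n, z_crit=1.96):
    """Approximate CI for a correlation via Fisher's r to z transformation."""
    z = math.atanh(r)                 # transform r to z
    se = 1 / math.sqrt(n - 3)         # large-sample standard error of z
    lower_z = z - z_crit * se
    upper_z = z + z_crit * se
    # Transform the interval limits back to the r metric
    return math.tanh(lower_z), math.tanh(upper_z)

# Hypothetical sample: r = 0.5 from n = 103 people
lower, upper = fisher_ci(r=0.5, n=103)
```

The interval is built in the z metric, where the sampling distribution is approximately normal, and only then converted back to r, which is why the limits are not symmetric around r.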

15
Q

What are the underlying assumptions of Pearson Correlations?

A

That the scores on both variables are..

  • linearly related to each other.
  • continuous
  • independently observed of each other
  • normally distributed
  • measured without error
  • unrestricted in their range

The formula used to calculate the Pearson correlation is biased, but consistent.
Violating the assumptions causes further bias.

16
Q

What is a cross-classified contingency table?

A

A table in which each cell uniquely identifies the number of people (frequency) with joint membership in two variables.

17
Q

When is there a dependency between two categorical variables?

A

When the relative frequency of people in the categories of one variable covaries with the categories of another variable.

18
Q

What test do we use to determine whether two categorical variables are independent or dependent?

A

CHI SQUARED TEST!

19
Q

How do you perform a Chi Squared NHST?

A
  1. Calculate Tobs:
    = sum over all cells of ((observed cell freq - expected cell freq)^2 / expected cell freq)
  2. Calculate the df:
    df = (rows - 1) x (columns - 1)
  3. Find the critical value for the chosen alpha level and the corresponding df.
  4. Reject the null if Tobs is larger than the critical value.
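The four steps can be sketched in code as below, using the standard Pearson chi-square statistic, sum of (observed - expected)^2 / expected, with expected frequencies taken as (row total x column total) / n. The function name and the 2 x 2 table are hypothetical, and the critical value of 3.841 is the usual table value for alpha = .05 with df = 1.

```python
def chi_square_test(table, critical_value):
    """Return (Tobs, df, reject_null) for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Step 1: sum the squared discrepancies, each scaled by its expected frequency
    t_obs = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            t_obs += (observed - expected) ** 2 / expected

    # Step 2: degrees of freedom
    df = (len(table) - 1) * (len(table[0]) - 1)

    # Steps 3 and 4: compare with the critical value supplied by the caller
    return t_obs, df, t_obs > critical_value

# Hypothetical observed frequencies (rows and columns are two categorical variables)
observed = [[30, 10],
            [20, 40]]

t_obs, df, reject_null = chi_square_test(observed, critical_value=3.841)
```

The critical value is passed in rather than computed, mirroring the table lookup in step 3.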
20
Q

How do you calculate expected frequencies?

A

(row marginal freq x column marginal freq) / n

21
Q

What can analysis of expected frequencies show?

A

Greater discrepancy between obs and expected cell frequencies indicates greater degree of dependency.

greater dependency reflects stronger association between the two variables.

stronger association corresponds to larger observed chi square statistic.

22
Q

What does the chi square distribution look like?

A

It is positively skewed, because the statistic is built from squared values (all values lie on the positive side of the x axis).

So the critical region is only in the upper tail.
The shape of this distribution is affected only by df.

23
Q

What can a chi squared test tell you in the end?

A

Whether there is an association or not.

It gives no metric for how strong the association is.

24
Q

What is an odds ratio?

A

An odds ratio compares the odds of a category of one variable occurring within one category of a second variable to the odds of it occurring within the other category of that second variable.

25
Q

How is an odds ratio calculated?

A

For a 2 x 2 contingency table with cells A and B in the first row and C and D in the second row: (A x D) / (B x C)
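A short sketch of the calculation, assuming a 2 x 2 table laid out with cells A and B in the first row and C and D in the second row. The function name and the frequencies are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2 x 2 table laid out as [[a, b], [c, d]]."""
    return (a * d) / (b * c)

# Hypothetical observed frequencies
a, b = 30, 10
c, d = 20, 40

or_estimate = odds_ratio(a, b, c, d)

# The same value is the ratio of the two row odds: (a / b) / (c / d)
or_from_row_odds = (a / b) / (c / d)

# Swapping which variable sits in the rows (transposing the table)
# gives the same odds ratio
or_transposed = odds_ratio(a, c, b, d)
```

Note that the function divides by b x c, so it fails with a division error when either of those cells is zero.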

26
Q

How are odds calculated?

A

Probability of the event occurring / (1 - probability of the event occurring)
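As a quick worked example of the formula above (the function name and probabilities are made up for illustration):

```python
def odds(probability):
    """Odds of an event: p / (1 - p). Undefined when p equals 1."""
    return probability / (1 - probability)

# A hypothetical event with probability 0.75 has odds of 3 to 1
odds_likely = odds(0.75)

# An event with probability 0.5 has even odds
odds_even = odds(0.5)
```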

27
Q

What are the properties of an odds ratio?

A

If OR = 1 then the two variables are independent and have no association.

OR can never be negative.

Stronger association is indicated by an estimated value further from 1.

Odds ratio is undefined if any observed cell freq is zero.

OR is a symmetric relationship between variables, so it doesn’t matter which row or column has what variable.

It is both a sample statistic and a population parameter.

28
Q

What does it mean when we have a 95% CI for an odds ratio that captures 1?

A

It provides no evidence for rejecting the null hypothesis of no association (OR = 1).

29
Q

What happens when the odds ratio is below 1?

A

It is harder to interpret, so we take the reciprocal of the estimated value (1/x).

If it’s in a CI, we will swap the lower bound to become the upper bound and vice versa.

only done when BOTH values of a CI are