Exam 1 Flashcards

1
Q

What effect does sample size have on your sampling distribution? In other words, how does sample size affect sampling error?

A

As your sample size gets bigger, your sampling distribution gets narrower. Your estimate homes in on the actual effect, and you have less sampling error in your results.
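
As an illustrative sketch (not from the course notes, with made-up population values), the simulation below draws many sample means at two sample sizes and compares the spread of the two sampling distributions:

```python
import random
import statistics

random.seed(1)

def sd_of_sample_means(n, reps=2000, mu=100, sigma=15):
    """Spread of the sampling distribution of the mean for samples of size n."""
    means = [statistics.mean(random.gauss(mu, sigma) for _ in range(n))
             for _ in range(reps)]
    return statistics.stdev(means)

# Bigger samples -> narrower sampling distribution -> less sampling error.
small_n = sd_of_sample_means(10)
large_n = sd_of_sample_means(100)
```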

2
Q

How does the number of predictor variables affect Type I error?

A

More predictor variables can lead to more redundancy and therefore more Type I error.

3
Q

What are the limitations to null hypothesis significance testing?

A

1) flawed logic. We want to know the probability of the null hypothesis given our data, but we actually get the probability of our data given the null hypothesis. In order to calculate what we’re hoping for, we would need to use Bayesian statistics.
2) Correlations are rarely actually zero.
3) Impedes our ability to move forward as a field because we keep comparing to zero rather than to previous findings.
4) Turns a decision continuum into a dichotomous decision.
5) .05 is arbitrary
6) Problem with how we use significance testing–we often confuse statistical significance with practical significance.

4
Q

What are advantages to null hypothesis significance testing?

A

1) good at penalizing design weakness
2) objective measurement makes it easy to interpret your results
3) objectivity makes it hard for people to discount results they don’t like
4) Widely accepted measure of success

5
Q

What are alternative approaches to significance testing?

A

1) confidence intervals
2) effect sizes
3) meta-analysis

6
Q

What does a confidence interval give us that a significance test does not?

A

1) more of a continuum than a dichotomy, so you get more information than just yes/no
2) you see the variability in results. The confidence interval may not include zero so it may be significant, but if it’s very large then we might not be that confident in it.
3) confidence interval is focused on the effect size

7
Q

What are Cohen’s arguments for using NHST, as long as it’s used alongside other things?

A

Add info here

8
Q

Why does Cumming say we should give up on NHST altogether?

A

NHST is fatally flawed because it leads to non-publication of non-significant findings and overall makes research untrustworthy. We need “new statistics” (estimation based on effect sizes, confidence intervals, and meta-analysis). If our goal is building a cumulative quantitative discipline, there is no room for NHST.

9
Q

What are examples of effect sizes?

A

Raw mean differences
Cohen’s d
Pearson’s r
Partial eta squared (proportion of variance in y a given predictor accounts for)
R squared (proportion of variance accounted for in your regression equation)
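
A minimal sketch of two of these, using hypothetical data: the raw mean difference and Cohen’s d (the raw difference divided by the pooled standard deviation):

```python
import math
import statistics

# Hypothetical scores for two groups, for illustration only.
group_a = [5.1, 6.2, 5.8, 6.5, 5.9]
group_b = [4.2, 4.8, 5.0, 4.5, 4.9]

# Raw mean difference: stays in the original unit of measurement.
raw_diff = statistics.mean(group_a) - statistics.mean(group_b)

# Cohen's d: standardizes the raw difference by the pooled standard deviation.
n1, n2 = len(group_a), len(group_b)
pooled_sd = math.sqrt(((n1 - 1) * statistics.variance(group_a) +
                       (n2 - 1) * statistics.variance(group_b)) / (n1 + n2 - 2))
cohens_d = raw_diff / pooled_sd
```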

10
Q

What are Baguley’s two main arguments against standardized effect sizes?

A

1) what SD is used in the denominator can have implications for comparability.
2) standardization is bad at accounting for different versions of the same measurement instrument, individuals scoring particularly high or low on a variable, and different study designs. Since any of these changes would impact sampling variance, they would also impact standardized effect sizes.

11
Q

What are the advantages to unstandardized effect sizes?

A

1) puts things in the unit of measurement of your study (more interpretable to people outside the field if the original unit of measurement is meaningful)
2) easier to calculate, therefore less prone to error
3) less influenced by unreliability, range restriction, and study design

12
Q

What are the advantages to standardized effect sizes?

A

1) its own, different interpretability advantages (the effect is expressed on a common metric rather than raw units)

2) allows us to more easily compare across studies that use different measures

13
Q

Why do we want to know the confidence intervals around effect sizes?

A

Gives us some information about the VARIABILITY. If the confidence interval for a small effect size is large, there’s a possibility that the true effect is bigger than the point estimate makes it seem.

14
Q

The Facebook study’s d was .02 and the confidence interval was .012-.03. What does this tell us?

A

We can be very confident that the effect size is very small.

15
Q

If we have a d of .6 and the 95% confidence interval is .20-1.00, how do we interpret this?

A

The d is a fairly sizeable effect, but we can’t be very confident in it. The true effect size could be pretty small (.20) or very large (1.00).

16
Q

What are the criticisms of Cohen’s effect size guidelines (e.g., Meyer et al., 2001)?

A

His guidelines for correlations (.1 small, .3 medium, .5 large) set the bar too high and are not in line with what actually gets published. Look at a lot of medical correlations: important things often have “small” correlations!

Boscow et al. also argue that benchmarks should be field specific. This has some feasibility issues, though.

Chris suggests there may be a balance somewhere.

17
Q

What are the advantages to interpreting correlations with BESD?

A

It’s simple to calculate and can be really useful in communicating findings to community partners and the public.
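
A minimal sketch of the BESD arithmetic, assuming the 50% base rate it is built on: r is split evenly around .50 to give the two groups’ “success” rates.

```python
def besd(r):
    """Re-express a correlation as a difference in success rates (BESD)."""
    treatment_rate = 0.50 + r / 2
    control_rate = 0.50 - r / 2
    return treatment_rate, control_rate

# An r of .32 reads as a 66% vs. 34% success-rate split.
treat, control = besd(0.32)
```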

18
Q

What are disadvantages to interpreting correlations with BESD?

A

It makes a number of assumptions that may limit its scientific use, such as:

1) assuming variance in the two groups is similar
2) the formula assumes a 50% base success rate
3) assumes you have truly dichotomous variables, whereas in reality we sometimes take continuous variables and artificially dichotomize them.

When any of these things happen, BESD becomes less accurate.

19
Q

What factors matter for the standard error of a correlation?

A

The correlation and the sample size.
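
As a sketch, one common textbook approximation (possibly not the exact formula used in class) shows how both quantities enter the standard error of r:

```python
import math

def se_r(r, n):
    """Approximate standard error of a correlation coefficient."""
    return math.sqrt((1 - r**2) / (n - 2))

# Same correlation, different sample sizes: bigger n -> smaller standard error.
se_small_n = se_r(0.30, 30)
se_large_n = se_r(0.30, 300)
```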

20
Q

How do you interpret a 95% confidence interval for a correlation?

A

We can be 95% confident that a correlation from a population with an effect size (correlation) of zero would fall within this range.

So if the correlation we’ve found is outside of this confidence interval, we can be pretty sure that there is an effect.

It’s backward like this because correlations are centered around 0.

21
Q

What is a Fisher Z transformation for?

A

Allows us to compare a correlation to a value other than 0.
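
A minimal sketch, assuming the standard formulas: transform both correlations to z, then divide their difference by the standard error of z (1 over the square root of n - 3). The example values are hypothetical.

```python
import math

def fisher_z(r):
    """Fisher's z transformation: 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)

def z_test_r(r_obs, r_null, n):
    """z statistic comparing an observed r to a hypothesized nonzero rho."""
    se = 1 / math.sqrt(n - 3)
    return (fisher_z(r_obs) - fisher_z(r_null)) / se

z = z_test_r(r_obs=0.55, r_null=0.30, n=100)
```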

22
Q

Why shouldn’t you calculate a pearson’s r (or a point-biserial correlation) when you’re working with a dichotomous variable?

A

You’re violating the assumption that the distribution of the two variables will be the same. The Pearson r only has a maximum of 1 when this assumption is met, so if it’s not (as in the case of a dichotomous variable) the correlation will be attenuated.

23
Q

What type of correlation do you want to calculate if you have a dichotomous variable?

A

A biserial correlation (not Pearson’s r and not point-biserial)

24
Q

What kind of correlation do you calculate if both of your variables are dichotomous?

A

The best solution is a tetrachoric correlation, because it assumes underlying continuous variables.

Another option is the phi coefficient, but the limitation is that the correlation will be attenuated if the proportion of people in each group is not the same (aka the variation is not the same).
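
A minimal sketch of the phi coefficient from a 2x2 table of counts (the labels a through d are just the usual cell positions):

```python
import math

def phi(a, b, c, d):
    """Phi coefficient from 2x2 cell counts:
         Y=1  Y=0
    X=1   a    b
    X=0   c    d
    """
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# With equal marginal splits, phi can reach its full range.
equal_split = phi(40, 10, 10, 40)
```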

25
Q

What effect does unreliability have on correlations?

A

Attenuates them. For a regression, unreliability in X will lower the regression weight, but unreliability in Y has no effect on your regression weight.
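
The standard correction for attenuation (which I’m assuming matches the course’s version) divides the observed correlation by the square root of the product of the two reliabilities:

```python
import math

def disattenuate(r_xy, rxx, ryy):
    """Correct an observed correlation for unreliability in both measures."""
    return r_xy / math.sqrt(rxx * ryy)

# Unreliability attenuated the observed .30, so the corrected value is larger.
corrected = disattenuate(r_xy=0.30, rxx=0.80, ryy=0.70)
```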

26
Q

What is one problem with R squared?

A

It’s not an unbiased estimate. It’s affected by sampling error and also the number of predictors in the model.

You can account for this by reporting the adjusted R squared. The larger the sample size, the closer together R squared and the adjusted R squared will be.
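
A sketch of the usual adjusted R squared formula (n = sample size, k = number of predictors), showing the shrinkage fading as n grows:

```python
def adjusted_r2(r2, n, k):
    """Adjust R squared for sample size (n) and number of predictors (k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R squared and predictors; the adjustment matters less with a big sample.
small_sample = adjusted_r2(r2=0.30, n=30, k=5)
large_sample = adjusted_r2(r2=0.30, n=300, k=5)
```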

27
Q

Why can’t you use a linear regression when you have a dichotomous outcome?

A

We are assuming the relationship is linear, and it can’t be with a dichotomous outcome.

Also, linear regression assumes the error terms are normally distributed, which isn’t true with a dichotomous outcome.

28
Q

What kind of regression would you do if you have a dichotomous outcome?

A

Logistic regression

29
Q

How do you report an effect size of a logistic regression?

A

Cox and Snell came up with one formula; Nagelkerke’s is another. It’s typically called a pseudo R squared. You can’t interpret it in the same way as a regular R squared, but you can use it to compare two different models and determine which is better.
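
A sketch of both formulas, computed from the log-likelihoods of a null (intercept-only) and a fitted logistic model; the log-likelihood values below are hypothetical.

```python
import math

def cox_snell(ll_null, ll_model, n):
    """Cox & Snell pseudo R squared from two log-likelihoods."""
    return 1 - math.exp((2 / n) * (ll_null - ll_model))

def nagelkerke(ll_null, ll_model, n):
    """Nagelkerke rescales Cox & Snell so the maximum possible value is 1."""
    return cox_snell(ll_null, ll_model, n) / (1 - math.exp(2 * ll_null / n))

r2_cs = cox_snell(ll_null=-70.0, ll_model=-55.0, n=100)
r2_n = nagelkerke(ll_null=-70.0, ll_model=-55.0, n=100)
```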

30
Q

What do we correct for in a meta-analysis?

A

Unreliability
Sampling error
Range restriction

31
Q

What’s the difference between a theoretical and operational correction in a meta-analysis?

A

In a theoretical correction, you correct for unreliability in both the predictor and the criterion. In an operational correction, you only correct for unreliability in the criterion variable (y), not the predictor variable.

32
Q

What’s range restriction?

A

We often treat our sample as if it’s the full range of the population, but in many cases it’s not. For example, looking at intelligence among college students.

33
Q

What is an example of direct vs indirect range restriction?

A

In terms of intelligence among college students, GPA would be direct, family income would be indirect.

34
Q

What do U values tell us about range enhancement vs restriction?

A

A U value of less than one –> range enhancement, our original correlation would be inflated

A U value of more than one –> range restriction, our original correlation would be attenuated.
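
A sketch of the classic direct range restriction correction (Thorndike Case II), assuming U is defined as the unrestricted SD divided by the restricted SD, matching the card above:

```python
import math

def correct_range_restriction(r, U):
    """Correct an observed r for direct range restriction/enhancement."""
    return (U * r) / math.sqrt((U**2 - 1) * r**2 + 1)

# U > 1 (restriction): the observed r was attenuated, so correction raises it.
corrected_up = correct_range_restriction(r=0.30, U=1.5)
# U < 1 (enhancement): the observed r was inflated, so correction lowers it.
corrected_down = correct_range_restriction(r=0.30, U=0.8)
```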

35
Q

How do you decide the order of your corrections in a meta-analysis?

A

If direct range restriction, correct for range restriction first, then unreliability.

If indirect range restriction, correct for unreliability first, then range restriction.

36
Q

What is a bare bones meta-analysis?

A

Only corrects for sampling error, not range restriction or unreliability.

37
Q

What are the steps in a meta-analysis?

A
  1. Develop inclusion/ exclusion criteria and identify articles
  2. Code relevant information in identified articles (sample size, effect size, variance or SD in sample, reliability, info about moderators)
  3. Conduct corrections
38
Q

Why do we calculate the variance of rho?

A

Allows us to calculate confidence intervals

39
Q

In the context of meta-analysis, what’s the difference between a confidence interval and a credibility interval?

A

Credibility interval gives us an idea of whether the validity of the effect size can generalize across situations. Is there a lot of variability or just a little across these different contexts? If the credibility interval is large or includes zero, it likely means moderators are present.

Confidence intervals are to calculate how accurate our estimate of the population parameters is, aka how much sampling error is in our population parameter estimate.

40
Q

What order would you do things in terms of confidence and credibility intervals?

A
  1. Calculate credibility intervals to see if there are moderators.
  2. Break up your effect sizes by those moderators.
  3. Calculate confidence intervals to see how accurate your estimates are in each moderating condition.
41
Q

What are some recommendations for writing questions?

A
Be conscious of reading level
Avoid double barreled questions
Avoid leading questions
Avoid double negatives
Avoid idioms
Avoid extreme wording
42
Q

What is acquiescence responding?

A

The tendency to agree with items regardless of their content (yea-saying).

43
Q

How can you guard against socially desirable responding?

A

Can make anonymous, but that can limit what we can do with our data.

Social desirability scale

Multiple sources of info

Forced choice with equally desirable options

44
Q

What is item difficulty?

A

The proportion of individuals who get a dichotomous question right. (You get it by calculating the average of a dichotomously scored question.)
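
In code form (with made-up responses), the difficulty index is just the item mean:

```python
# Hypothetical 0/1 (wrong/right) scores on one dichotomous item.
responses = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

# Item difficulty p: the proportion answering correctly.
p = sum(responses) / len(responses)
```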

45
Q

Crocker and Algina say having all our items of a medium difficulty is desirable. Why? What are the advantages and disadvantages of that?

A

The advantage is that it will maximize variance.

The disadvantage is that you lose some accuracy, you can’t differentiate well at the top or bottom of the distribution.

46
Q

What is item discrimination?

A

It tells us how well the item distinguishes between individuals that score very low and very high. You can also look at the correlation between the score on the item and the score on the test as a whole (you want positive item total correlations).

47
Q

What do we ideally want in terms of item difficulty and discrimination?

A

We want a range of item difficulties and all high discrimination.

48
Q

What are we looking for when we evaluate a test as a whole?

A

Reliability and validity

49
Q

What are the two types of measurement error?

A

Systematic measurement error–errors that consistently affect someone’s score.

Random measurement error–error based on chance alone (not feeling well, miscalculation, etc.). You would NOT expect random measurement error to affect a participant at multiple administrations.

50
Q

When we are talking about reliability, what kind of measurement error are we talking about?

A

Random measurement error. We can account for the random piece, but we can’t differentiate someone’s true score from systematic error.

51
Q

What is classical test theory?

A

X = T + E

Our observed test scores are a function of our true ability plus some error in measurement.

52
Q

What are assumptions in classical test theory?

A

1) we assume the correlation between the true score and the random measurement error is zero
2) we assume there’s no correlation between random measurement error scores across multiple test administrations.
3) mean of error scores in population would work out to be zero.

53
Q

What are the properties of a parallel test?

A

1) true score of one test is equal to the true score of the other test
2) error variance of the tests has to be equal

We can’t ever have a truly parallel test because we can’t ever know the true score or the exact variance of the error.

54
Q

What are our types of errors that impact reliability of a test?

A

1) transient error–type of error that changes over time. It’s a form of random error. It would be lumped in with your true score if you only took the measure once, but it’s lumped in with random error if you take the test more than once.
2) content error
3) random response error
4) rater error

55
Q

Why do Schmidt and Hunter say we need to be more thoughtful about how we measure reliability?

A

We usually use Cronbach’s alpha, but this isn’t always appropriate if we are concerned about a type of error it doesn’t account for, like transient error.

56
Q

What are three forms of reliability?

A

1) alternate forms. Addresses content and random error.

2) test-retest.
Addresses transient error mostly, but also random error.

3) alternate forms test retest.
Addresses content error, transient error, and random error.

57
Q

What is internal consistency reliability?

A

It refers to how consistently people respond to items on a measure. It addresses content error and random response error because it basically treats different items or subsets of items as alternate forms.

58
Q

What are the two types of internal consistency reliability?

A

1) Split halves reliability: split the test in half in some way and calculate the correlation of the two halves.
2) Cronbach’s alpha: widely used because it’s easy to calculate and convenient because it can be used for both dichotomous and continuous items.

59
Q

What contributes to Cronbach’s alpha?

A

Number of items in the test and the average inter-item correlation.
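
A sketch of that relationship using the standardized form of alpha (built directly from the average inter-item correlation; the variance-based formula used in practice differs slightly):

```python
def standardized_alpha(k, mean_r):
    """Standardized Cronbach's alpha from test length and mean inter-item r."""
    return (k * mean_r) / (1 + (k - 1) * mean_r)

# Holding the average inter-item correlation fixed, more items -> higher alpha.
short_test = standardized_alpha(k=5, mean_r=0.30)
long_test = standardized_alpha(k=20, mean_r=0.30)
```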

60
Q

Why is split halves reliability an underestimate?

A

Because test length is associated with reliability. You can correct this using the Spearman-Brown correction.
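
The Spearman-Brown step-up for a split-half correlation can be sketched as:

```python
def spearman_brown(r_half):
    """Project a half-test correlation up to full-test reliability."""
    return (2 * r_half) / (1 + r_half)

# The corrected value is higher than the raw half-test correlation.
full_length = spearman_brown(0.70)
```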

61
Q

What are limitations to internal consistency reliabilities?

A

1) only appropriate when the scale is unidimensional. If your test has subtests, you should report the reliabilities for the subtests, too.
2) generally not appropriate for speeded tests.
3) problems with how we interpret it: it’s technically the lower bound of the theoretical reliability, because we can’t ever have two totally parallel tests.

62
Q

What were some of Schmidt’s criticisms of how we overuse Cronbach’s alpha?

A

1) people often use Cronbach’s alpha as an indicator of unidimensionality, which you can’t do.
2) the cutoff that an alpha above .70 is good is totally arbitrary and does not necessarily indicate meaningfulness.
3) when we use alpha to correct correlations, we need to remember that it’s already attenuated for the reasons discussed (multidimensionality and because it’s the lower bound estimate) so this means we are over-correcting our correlations and thus our meta-analytic estimate is inflated.

63
Q

What factors affect statistical power?

A

Effect size, alpha level, and sample size