Chapter 6: The Beast of Bias Flashcards

1
Q

Sources of bias

A
  • outliers
  • violations of assumptions (additivity/linearity, normality, homogeneity/homoscedasticity, independence)
2
Q

what can be affected by bias

A
  • parameter estimates (including effect sizes)
  • standard errors and CIs
  • test statistics and p-values
  • conclusions

there are methods of reducing bias

3
Q

linear model and parameters

A

we can use the linear model to test theories or to make predictions. in both cases, our interest is in estimating parameters

4
Q

estimators

A

- estimation is the process of estimating parameters from sample data
- an estimator is a procedure, rule, or criterion that is used to estimate the parameters
- the results of estimation are estimates of the parameters
- an estimate can be above or below the actual parameter value: above is an overestimate, below is an underestimate
- in practice, we never know whether our estimates are above or below the parameter

5
Q

qualities that make a good estimator

A
  • unbiasedness: on average, it gives you the population parameter. the sampling distribution of the estimates is centred on the parameter rather than leaning towards one side or the other
  • consistency: as the sample gets bigger, the estimates get closer to the population parameter
  • efficiency: the estimates are not too spread out from sample to sample (little error). the mean is the most efficient, the median is somewhat efficient, and the mode is inefficient

a biased estimator is sometimes the preferred option; its bias can often be reduced with a bigger sample size
bias does not mean bad. a biased estimator is a method whose estimates do not equal the parameter, on average
bias is a property of an estimator, not of an individual estimate
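
The point that bias belongs to the estimator, and can shrink as the sample grows, can be checked with a small simulation. The sketch below is an illustration that is not taken from the cards: it assumes a normal population with variance 1 and compares the variance estimator that divides by n (biased) with the one that divides by n - 1 (unbiased). The function name `simulate_variance_estimators` is made up for this example.

```python
import random
import statistics

def simulate_variance_estimators(n, reps, seed=1):
    """Average two variance estimators over many samples of size n."""
    rng = random.Random(seed)
    biased_total = 0.0
    unbiased_total = 0.0
    for _ in range(reps):
        # Assumed population: normal with mean 0 and variance 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        biased_total += statistics.pvariance(sample)   # divides by n
        unbiased_total += statistics.variance(sample)  # divides by n - 1
    return biased_total / reps, unbiased_total / reps

# Small samples: the divide-by-n estimator underestimates on average.
small_biased, small_unbiased = simulate_variance_estimators(n=5, reps=20000)
# Larger samples: the same estimator is still biased, but much less so.
large_biased, large_unbiased = simulate_variance_estimators(n=50, reps=5000)

print("n=5 :", round(small_biased, 2), round(small_unbiased, 2))
print("n=50:", round(large_biased, 2), round(large_unbiased, 2))
```

Any single sample can still give an estimate above or below 1 with either estimator; only the long-run average tells the two apart.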

6
Q

estimators and mean, median, mode

A
  • on average, the mean gives estimates that match the population parameter, so it is unbiased. the expected value of the sample means is the parameter
  • the median is unbiased as long as the population is normally distributed
  • the mode is unbiased as long as the population is normally distributed

the mean is the best estimator because it is unbiased, consistent, and efficient
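
A rough way to see the efficiency claim is to draw many samples from a normal population and compare how much the sample means and sample medians vary. This is a minimal sketch under assumed values (population mean 100, SD 15, samples of 25), none of which come from the cards.

```python
import random
import statistics

rng = random.Random(42)
means, medians = [], []
for _ in range(5000):
    # Assumed population: normal with mean 100 and SD 15.
    sample = [rng.gauss(100.0, 15.0) for _ in range(25)]
    means.append(statistics.fmean(sample))
    medians.append(statistics.median(sample))

# Both estimators are centred on the population mean (unbiased here) ...
mean_of_means = statistics.fmean(means)
mean_of_medians = statistics.fmean(medians)
# ... but the medians are more spread out, so the median is less efficient.
sd_of_means = statistics.stdev(means)
sd_of_medians = statistics.stdev(medians)

print(round(mean_of_means, 1), round(mean_of_medians, 1))
print(round(sd_of_means, 2), round(sd_of_medians, 2))
```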

7
Q

OLS method

A
  • gives estimates of the parameters that make the sum of squared errors as small as possible
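
For a model with one predictor, the OLS estimates have a closed form. The sketch below uses made-up scores and the hypothetical helper names `ols_fit` and `sum_of_squares`; it fits the line and then shows that moving the parameters away from the OLS values makes the sum of squared errors larger.

```python
def ols_fit(x, y):
    """Return (intercept, slope) that minimise the sum of squared errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sum_of_squares(x, y, intercept, slope):
    """Sum of squared differences between observed and predicted scores."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(x, y)                                   # intercept ≈ 0.05, slope ≈ 1.99
ss_best = sum_of_squares(x, y, b0, b1)
ss_nudged = sum_of_squares(x, y, b0 + 0.5, b1 - 0.2)     # any other line fits worse

print(round(b0, 2), round(b1, 2), ss_best < ss_nudged)
```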
8
Q

what is an outlier?

A
  • a score that is very different from the other scores
  • there are different kinds
  • outliers bias parameter estimates
  • they have an even bigger effect on the sum of squared errors (SS) than on the parameter estimates
  • the bias carries through from the SS to the SD, the SE, and the CI (CIs become much wider, which is an issue for significance testing)
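
The chain from SS to SD to SE to CI can be traced with a handful of made-up scores. In this sketch the helper `summarize` and both data sets are hypothetical, and the interval uses 1.96 as a normal approximation (an assumption; with samples this small a t critical value would be more appropriate).

```python
import math
import statistics

def summarize(scores):
    """Mean, sum of squares, SD, SE and an approximate 95% CI for the mean."""
    n = len(scores)
    mean = statistics.fmean(scores)
    ss = sum((s - mean) ** 2 for s in scores)
    sd = math.sqrt(ss / (n - 1))
    se = sd / math.sqrt(n)
    # 1.96 is a normal approximation, assumed here for simplicity.
    ci = (mean - 1.96 * se, mean + 1.96 * se)
    return {"mean": mean, "ss": ss, "sd": sd, "se": se, "ci": ci}

clean = summarize([2, 3, 4, 5, 6])          # mean 4, SS 10
with_outlier = summarize([2, 3, 4, 5, 26])  # mean 8, SS 410

for label, result in (("clean", clean), ("outlier", with_outlier)):
    print(label, result["mean"], result["ss"],
          round(result["ci"][1] - result["ci"][0], 2))
```

One extreme score doubles the mean but multiplies the SS by 41, and the interval widens accordingly.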
9
Q

overview of assumptions

A
  • assumptions are conditions about the characteristics of the data
  • if assumptions are violated, you can’t trust the test statistic
  • violations of assumptions vary by degree
  • some statistical tests are robust to violations of an assumption, meaning that the results are usually still valid even if the assumption is violated
  • parametric tests: statistical tests that make assumptions about the distribution of the data
  • nonparametric tests: don’t require assumptions about the distribution to be met
10
Q

additivity and linearity

assumption

A
  • the relationship between X and Y can be represented by a line
  • linear relationship between the predictors and the outcome
  • important that this is met because fitting a linear model to nonlinear data would be inappropriate
11
Q

normality

assumption

A
  • the residuals of the model / the sampling distribution of the parameters (b’s) must be normally distributed
  • for CIs around a parameter estimate to be accurate, the estimate must have a normal sampling distribution
  • for significance tests of models to be accurate, the sampling distribution of what’s being tested must be normal
  • normality of the residuals also matters when choosing an estimation method for a linear model: if the assumption is met, use the OLS method
12
Q

central limit theorem

A
  • describes the relationship between a population of individual scores and the sampling distribution of the means (estimates)
  • as the sample size increases, the shape of the sampling distribution approaches normality, no matter the shape of the distribution of individual scores (parent distribution)
  • rule of thumb: a sample of about 30 people is usually treated as large enough
  • the shape of the sampling distribution depends on sample size
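
The central limit theorem can be seen in a quick simulation. This sketch assumes a strongly skewed parent distribution (exponential, chosen only for illustration) and compares the skewness of the sampling distribution of the mean for samples of 2 and samples of 30. The helpers `skewness` and `sample_means` are made up for the example.

```python
import random
import statistics

def skewness(values):
    """Average cubed z-score: 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - m) / sd) ** 3 for v in values) / len(values)

def sample_means(n, reps=5000, seed=7):
    """Means of `reps` samples of size n from an exponential population."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(reps)
    ]

skew_small = skewness(sample_means(n=2))    # still clearly skewed
skew_large = skewness(sample_means(n=30))   # much closer to symmetric

print(round(skew_small, 2), round(skew_large, 2))
```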
13
Q

homoscedasticity/homogeneity of variance

assumption

A
  • homogeneity of variance: assumption that the different groups come from populations with the same variance
  • homoscedasticity is the same idea, but with a continuous predictor: the variance of the outcome is the same at every value of the predictor
  • if the assumption is violated, consider estimating the parameters using the weighted least squares (WLS) method
  • CIs and NHST are considerably biased if the assumption is not met

funnelling (e.g. in a plot of the residuals) indicates a violation of homogeneity of variance, i.e. heteroscedasticity
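
As a rough idea of what WLS does for a single predictor: each case gets a weight, and cases with noisier outcomes count for less. The sketch below is not from the cards; the function `wls_fit`, the data, and the weighting rule (error variance assumed proportional to x squared, so weights are 1 / x²) are all assumptions made for illustration.

```python
def wls_fit(x, y, weights):
    """Return (intercept, slope) minimising the weighted sum of squared errors."""
    total_w = sum(weights)
    mean_x = sum(w * xi for w, xi in zip(weights, x)) / total_w
    mean_y = sum(w * yi for w, yi in zip(weights, y)) / total_w
    sxy = sum(w * (xi - mean_x) * (yi - mean_y)
              for w, xi, yi in zip(weights, x, y))
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(weights, x))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical scores whose spread grows with x (a funnel shape).
x = [1, 2, 3, 4, 5, 6]
y = [2.0, 4.1, 5.8, 8.6, 9.0, 13.5]

# Assumed weighting rule: error variance proportional to x squared.
weights = [1 / xi ** 2 for xi in x]

b0_wls, b1_wls = wls_fit(x, y, weights)
b0_ols, b1_ols = wls_fit(x, y, [1.0] * len(x))  # equal weights reduce to OLS

print(round(b0_wls, 2), round(b1_wls, 2))
print(round(b0_ols, 2), round(b1_ols, 2))
```

In practice the weights have to come from somewhere (theory or an estimate of how the variance changes), which is the hard part that this sketch skips.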

14
Q

independence

assumption

A

- error terms in your model are unrelated to one another
- cannot trust CIs or NHST if violated
- use robust methods/HLM if violated
- if the errors aren’t independent, the SE is underestimated, which affects CIs/NHST
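
The underestimated SE can be demonstrated with a simulation, here under one assumed form of dependence: scores come in clusters (say, pupils within classrooms) and everyone in a cluster shares part of their error. The cluster sizes, variances, and the name `one_study` are hypothetical.

```python
import math
import random
import statistics

def one_study(rng, clusters=20, per_cluster=10):
    """Simulate one clustered sample; return its mean and its naive SE."""
    scores = []
    for _ in range(clusters):
        shared = rng.gauss(0.0, 1.0)  # error shared by the whole cluster
        scores.extend(shared + rng.gauss(0.0, 1.0) for _ in range(per_cluster))
    # Naive SE treats all 200 scores as independent.
    naive_se = statistics.stdev(scores) / math.sqrt(len(scores))
    return statistics.fmean(scores), naive_se

rng = random.Random(3)
results = [one_study(rng) for _ in range(500)]

# How much the sample mean really varies from study to study.
actual_se = statistics.stdev([mean for mean, _ in results])
# What the usual formula reports on average.
average_naive_se = statistics.fmean([se for _, se in results])

print(round(actual_se, 3), round(average_naive_se, 3))
```

The naive SE comes out well below the real sampling variability, so CIs would be too narrow and significance tests too liberal.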
