Quiz 4 Flashcards

Q

Three General Assumptions of Parametric Statistical Tests

A
  1. Normality of sampling distributions/population residuals
  2. When you compare more than one population, the variances of the populations are equal
  3. Observations drawn from the population are independent
Q

Parametric Tests

A
  • statistical tests used to estimate a specific parameter value (e.g. t-tests)
Q

Normality

A

Inferential statistics assume that our sample statistics are drawn from normal sampling distributions

Q

How can we know about normality?

A
  • If we have a large enough sample (typically more than 30), we have met the assumption
  • If we have a small sample, we examine our sample to infer normality of the sampling distribution
  • If you have multivariate data, examine each variable by itself to see if it is normally distributed; strictly, multivariate normality requires every linear combination of the variables (e.g. aX + bY) to be normal
  • If our sample data are normally distributed, it is likely that they come from a normally distributed population → the sampling distribution would be normal
Q

Skewness

A
  • When a distribution is perfectly normal, the values of skewness and kurtosis are zero
  • Positive skewness means that there is a pile-up of cases on the left and a long right tail (skewed to the right)
  • Negative skewness means that there is a pile-up of cases on the right and a long left tail (skewed to the left)
Q

When does the CLT not work/apply?

A
  • When distributions have thick tails
  • If your sample is small
Q

The Central Limit Theorem

A
  • The CLT is one of the most remarkable results of the theory of probability
  • In its simplest form, the theorem states that the mean of a large number of independent observations from the same distribution has, under certain general conditions, an approximate normal distribution
  • Note: exception of distributions with heavy tails
Q

Testing Normality in Single Variables

A

TB p. 183-191

  1. Is the sample size big enough to assume that the sampling distribution is normally distributed?
  2. Look at a histogram of each continuous variable → the starting point for visual inspection of normality
  3. Perform the Kolmogorov-Smirnov (K-S) test or the Shapiro-Wilk test (see the sketch below)
  • Significant results suggest that the data are NOT normally distributed
  • Caveat: the power of the test depends on the sample size, and it is often a moot point because in large samples without thick tails we would assume normality anyway
  • The Shapiro-Wilk test is highly sensitive to even small deviations from normality in large samples, so also look at the skewness and kurtosis statistics
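
A minimal sketch of these checks in Python (assuming NumPy and SciPy are available; the data are simulated purely for illustration):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=50, scale=10, size=25)  # a small illustrative sample

# Shapiro-Wilk test: the null hypothesis is that the data are normal
w_stat, w_p = stats.shapiro(x)

# K-S test against a normal with the sample's own mean and SD
# (strictly, estimating parameters from the data calls for the
# Lilliefors correction; this is the common shortcut)
ks_stat, ks_p = stats.kstest(x, 'norm', args=(x.mean(), x.std(ddof=1)))

# Significant results (p < .05) suggest the data are NOT normal
print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {w_p:.3f}")
print(f"K-S:          D = {ks_stat:.3f}, p = {ks_p:.3f}")
```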
Q

The Formula for Skewness

A
  • Convert raw scores into z scores
  • Skewness is the average of the z scores raised to the third power

–> Raising to the third power increases the influence of outliers

  • If skewed to the right, we will get a positive skewness score
  • If skewed to the left, we will get a negative value
  • No skewness → the formula results in zero

Cutoffs: ± 2
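
A minimal sketch (simulated data; assuming NumPy and SciPy) showing that averaging the cubed z scores reproduces SciPy's skewness statistic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=500)  # right-skewed data

# Convert raw scores to z scores (ddof=0 matches the biased moment
# formula SciPy uses by default), then average the cubed z scores
z = (x - x.mean()) / x.std(ddof=0)
skew_manual = np.mean(z ** 3)

print(skew_manual)               # positive → skewed to the right
print(stats.skew(x, bias=True))  # same value from SciPy
```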

Q

If skewed to the right, we will get a _______ skewness score

A

positive

Q

If skewed to the left, we will get a ______ value

A

negative

Q

Formula for Kurtosis K4

A
  • Kurtosis values above zero indicate a distribution that is too peaked, with thick, heavy tails
  • Kurtosis values below zero indicate a platykurtic distribution (too flat, with thin tails)

Leptokurtic (thicker tails) → positive kurtosis statistic

Platykurtic (thinner tails) → negative kurtosis statistic
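
A small simulated illustration (assuming NumPy and SciPy) of the sign of the kurtosis statistic; note that SciPy reports excess kurtosis, so a normal distribution scores 0:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
heavy = rng.standard_t(df=3, size=2000)  # leptokurtic (thicker tails)
flat = rng.uniform(-1, 1, size=2000)     # platykurtic (thinner tails)

print(stats.kurtosis(heavy))  # positive kurtosis statistic
print(stats.kurtosis(flat))   # negative kurtosis statistic
```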

Q

Kurtosis value below zero is _________

A

platykurtic

Q

Leptokurtic (thicker tails)→ ______ kurtosis statistic

A

positive

Q

Platykurtic (thinner tails) → ________ kurtosis statistic

A

negative

Q

Rule of thumb cutoffs for kurtosis

A

±7 → be concerned when kurtosis values exceed this

Q

Significance Tests for Skewness and Kurtosis

A

Step 1: convert the skewness and kurtosis scores into z scores by dividing each by its standard error

Step 2: compare the z scores to critical values of ±1.96 for small samples and ±2.58 for large samples. If greater than the critical value, there is significant skewness/kurtosis (see the sketch below)

  • More stringent for larger samples
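
A hedged sketch of the two steps (assuming NumPy and SciPy; sqrt(6/N) and sqrt(24/N) are common large-sample approximations to the standard errors of skewness and excess kurtosis, and your textbook or software may use exact formulas instead):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.exponential(scale=1.0, size=80)  # illustrative skewed sample
n = len(x)

# Step 1: divide each statistic by its (approximate) standard error
z_skew = stats.skew(x) / np.sqrt(6 / n)
z_kurt = stats.kurtosis(x) / np.sqrt(24 / n)  # excess kurtosis

# Step 2: compare to ±1.96 (small samples) or ±2.58 (large samples)
print(f"z_skew = {z_skew:.2f}, z_kurt = {z_kurt:.2f}")
```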
Q

What is the Big Deal if a Distribution Isn’t “Normal?”

A
  • We could get inaccurate results from our analysis
  • Non-normality can distort Type I and Type II error rates
  • Meaning that the null could be true when our stats tell us it isn’t, or vice versa
Q

What to do if normality assumption is NOT met

A
  1. Data transformation
    - Appropriate when there is skewness in the distribution
    - Replaces each score with the same function applied to all the data within that variable
  2. Non-parametric tests
  3. Modern methods (e.g. bootstrapping)
Q

Data Transformation

A

Most common → square root transformation

Most useful when data are skewed to the right

Pulls more extreme values closer to the middle

Bigger impact on bigger values

Q

Square Root Transformation is most useful when _______

A

data are skewed to the right

Q

When data are skewed left what transformation can be done?

A

When data are skewed to the left:

  • Reflect the scores and then do a square root transformation
  • To reflect, subtract each value from a large number (one larger than the maximum score)
Q

Log transformations

A

For extreme positive skew → reduces positive skew
Pulls in values to a greater degree than square root transformation

Q

Inverse transformation

A
  • Transforms data with extreme positive skew to normal
  • 1 / (value of data)
  • Need to add a constant so that all values are non-zero, but you CAN have negative values as long as there’s no zero
  • Table 6.1 in TB
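
A minimal sketch of the transformations from the last few cards (assuming NumPy and SciPy; the lognormal data are simulated so that every value is positive):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
x = rng.lognormal(mean=0.0, sigma=1.0, size=1000)  # right-skewed, all > 0

sqrt_x = np.sqrt(x)  # square root: pulls bigger values in more
log_x = np.log(x)    # log: pulls extreme values in to a greater degree
inv_x = 1.0 / x      # inverse: strongest pull; no zeros allowed

for name, t in [("raw", x), ("sqrt", sqrt_x), ("log", log_x)]:
    print(f"{name:5s} skewness = {stats.skew(t):+.2f}")

# For LEFT-skewed data: reflect first (subtract each score from a
# number larger than the maximum), then transform as above
left = x.max() + 1 - x             # an illustrative left-skewed variable
reflected = left.max() + 1 - left  # reflection restores positive skew
sqrt_reflected = np.sqrt(reflected)
```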
Q

To Transform … or Not

A
  • The CLT: sampling distribution will be normal in samples >30 (unless sample distribution has a substantial skew; heavy tails)
  • Transformations sometimes fail to restore normality and equality of variances
  • They make the interpretation of results difficult, as findings are based on transformed data
Q

Rank Order Tests (AKA Non-Parametric Tests)

A

Can be used in place of their parametric counterparts when it is questionable that the normality assumption holds

Sometimes more powerful in detecting population differences when certain assumptions are not satisfied

Nonparametric tests cannot assess interactions

Q

Modern Methods

A

Refers to approaches for dealing with non-normal data that require a lot of computing power

  • E.g. bootstrapping methods
Q

Bootstrapping

A

Goal is to observe the sampling distribution shape directly to allow us to do hypothesis testing

Uses sample data to estimate the sampling distribution itself, by drawing random samples with replacement from the sample data

Because sampling distribution is estimated directly, no need to assume normality

P-value can be calculated based on how rare it is to get the observed test-statistic value or more extreme values in the estimated sampling distribution (regardless of whether it is normal or not)

Q

Rule of thumb for Bootstrapping

A

5,000 - 10,000 bootstrap samples

Q

Can use bootstrapping to create a CI

A

Compute the central value (e.g. the mean) in each bootstrap sample, then take the values at the lower 2.5th percentile and the upper 97.5th percentile of those bootstrap statistics as the lower and upper bounds of the CI (see the sketch below)
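
A minimal percentile-bootstrap sketch (assuming NumPy; the sample is simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(5)
sample = rng.exponential(scale=10, size=40)  # a skewed sample

# Draw bootstrap samples WITH replacement, computing the mean of each
n_boot = 10_000  # rule of thumb: 5,000-10,000 bootstrap samples
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(n_boot)
])

# Percentile 95% CI: the 2.5th and 97.5th percentiles of the
# bootstrap distribution become the lower and upper bounds
lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean = {sample.mean():.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```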

Q

The Three Assumptions of Parametric Statistics

A
  1. The sampling distribution(s) is(are) normally distributed
  2. Homogeneity (equality) of Variance
  3. Data from your population are independent
Q

Homogeneity (equality) of Variance

A

The assumption that the dependent variable exhibits similar amounts of variance across the range of values for an independent variable

Q

Assessing Homogeneity of Variance

A

  • Visual inspection of graphs
    - Scatter plot, residual plot
  • Levene’s test
    - Can become overly sensitive in large samples
  • Variance ratio (Hartley’s Fmax)

Q

Variance Ratio (Hartley’s Fmax)

A

  • Used with 2 or more groups
  • VR = largest variance / smallest variance (see the sketch below)
  • If VR < 2, homogeneity of variance can be assumed
  • If the group sizes are roughly equal, hypothesis-testing results are robust to the violation of homogeneity, so you would still likely get valid results
  • This rule of thumb applies when the largest group size is smaller than 1.5 times the smallest group size
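
A minimal sketch of the variance ratio (assuming NumPy; the three groups are simulated):

```python
import numpy as np

rng = np.random.default_rng(6)
groups = [
    rng.normal(50, 8, size=30),
    rng.normal(52, 10, size=32),
    rng.normal(49, 9, size=28),
]

# VR = largest sample variance / smallest sample variance
variances = [g.var(ddof=1) for g in groups]
vr = max(variances) / min(variances)

print(f"Fmax = {vr:.2f}")  # VR < 2 → homogeneity can be assumed
```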

Q

Levene’s Test

A
  • Tests whether the variances in different groups are the same
  • The null hypothesis is that there is homogeneity of variance
    - Significant = variances not equal
    - Non-significant = variances are equal
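
A minimal sketch using SciPy's built-in Levene's test on two simulated groups with clearly unequal spread:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
g1 = rng.normal(50, 5, size=40)
g2 = rng.normal(50, 15, size=40)  # much larger variance

# Null hypothesis: the group variances are equal (homogeneity)
stat, p = stats.levene(g1, g2)
print(f"W = {stat:.2f}, p = {p:.4f}")  # significant → variances not equal
```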
Q

Visual inspection of graphs for assessing homogeneity of variance

A
  • Scatter plot, residual plot

*The space between the line of best fit and an actual point is the deviation (error, residual)

Q

Addressing Homogeneity of Variance

A
  1. Using robust methods
  2. Bootstrapping
  3. Transforming an outcome variable
Q

Why is independence of data important?

A
  • The general formula for a test statistic involves two types of variability, one in the numerator and one in the denominator
  • Test statistic = explained variability / unexplained variability
  • We want the test statistic to be bigger so that we have greater explanatory power
    - BUT with dependent data, unexplained variability becomes artificially smaller
    - So, in the case of dependent data → increased Type I error rate
Q

Measuring Relations Between Variables

A

We can see whether, as one variable deviates from its own mean, the other deviates in the same way from its own mean, the opposite way, or stays the same

This can be done by calculating the covariance

If there is a similar (or opposite) pattern, we say the two variables covary

Q

Variance

A

measure of how much a group of scores deviates from the mean of a single variable

Average squared deviation from the mean

Q

Covariance

A

Tells us by how much a group of scores on two variables differ from their respective means

Q

Covariance Steps

A

1. Calculate the deviation (error) between the mean and each subject’s score for the first variable (x)

2. Calculate the deviation (error) between the mean and their score for the second variable (y)

3. Multiply these deviation (error) values → the products are called the cross-product deviations

4. Add up these cross-product deviations

5. Divide by N - 1 → the result is the covariance (see the sketch below)
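
A minimal sketch of these steps (assuming NumPy; the scores are made up), checked against NumPy's own covariance estimate:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 6.0])

# Steps 1-2: deviation of each score from its variable's mean
dx = x - x.mean()
dy = y - y.mean()

# Step 3: multiply the deviations → cross-product deviations
cross_products = dx * dy

# Steps 4-5: sum the cross-products and divide by N - 1
cov_manual = cross_products.sum() / (len(x) - 1)

print(cov_manual)           # manual covariance
print(np.cov(x, y)[0, 1])   # matches NumPy (ddof=1 by default)
```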

Q

COVARIANCE IS THE AVERAGE _________

A

CROSS-PRODUCT DEVIATION

Q

Limitations of Covariance

A

Depends upon the units of measurement

E.g. the covariance of two variables measured in miles and dollars would be much smaller than if the variables were measured in feet and cents, even if the relationship were exactly the same

Q

Solution to the limitations of covariance

A

Solution → standardize it: divide by the product of the standard deviations of both variables

The standardized version of covariance is known as the correlation coefficient, which is relatively unaffected by units of measurement

Q

Correlation Coefficient

A

When x and y are both continuous, their correlation is called the Pearson Product Moment Correlation Coefficient

Correlation statistics are standardized and can range from -1 to 1

It can be used as a measure of effect size
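
A minimal sketch (assuming NumPy and SciPy; simulated, linearly related data) computing r, its p-value, and the squared value R2 discussed in a later card:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
x = rng.normal(size=100)
y = 0.5 * x + rng.normal(scale=0.8, size=100)  # linear relationship

r, p = stats.pearsonr(x, y)  # r ranges from -1 to 1; doubles as effect size
print(f"r = {r:.2f}, p = {p:.4f}")
print(f"R2 = {r ** 2:.2f}")  # proportion of shared variance
```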

Q

The equation for Correlation is similar to z scores….

A

Both are standardized statistics that can be compared across samples

Q

Convention for effect size in correlation

A

Convention
.1 = small effect
.3 = medium effect
.5 = large effect

Q

Correlation and Causality

A

Correlation is a necessary but not sufficient criterion for causality

Possible directions of causality:

X → Y

X ← Y

A third factor leads to changes in both X and Y

The correlation is by coincidence

Q

Things needed to determine causality

A

Temporal precedence

Demonstrating empirical association between variables

Control for confounds

Q

Types of Correlations

A

Pearson’s Correlation (r)

Spearman’s ρ (Greek rho) (rs)

Kendall’s Tau (τ)

Point-Biserial Correlation (rpb)

Biserial Correlation (rb)

Phi Coefficient (φ)
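
A minimal sketch of several of these coefficients (assuming SciPy; data simulated for illustration). The phi coefficient is not shown, as it is usually computed from a 2×2 contingency table:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
x = rng.normal(size=60)
y = x + rng.normal(size=60)
group = rng.integers(0, 2, size=60)  # a true dichotomy coded 0/1

rho, p_rho = stats.spearmanr(x, y)           # ordinal-friendly
tau, p_tau = stats.kendalltau(x, y)          # small samples, tied ranks
rpb, p_rpb = stats.pointbiserialr(group, y)  # continuous vs. dichotomous

print(f"rho = {rho:.2f}, tau = {tau:.2f}, r_pb = {rpb:.2f}")
```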

Q

Pearson’s Correlation (r)

A

For analyzing relationship between two continuous variables

Assumes normality, homogeneity of variance, independence of data, AND a linear relationship

Q

Spearman’s ρ (Greek rho) (rs)

A

When one or both variables are ordinal

E.g. SAT score and high school class standing

Nonparametric alternative to Pearson’s coefficient

Does not assume linear relationship

Q

Kendall’s Tau (τ)

A

Better for smaller samples

Useful when data are ranked and many scores share the same rank (tied ranks)

Q

Point Biserial Correlation (rpb)

A

One continuous and one dichotomous variable

Used when the binary variable is a true dichotomy (genuinely discrete categories)

Q

Biserial Correlation (rb)

A

One continuous and one dichotomous variable

Used when the variable is not truly dichotomous but is treated as such

E.g. pass/fail class grades (categories based on a continuum), or groups created by a median split

Q

Phi Coefficient (φ)

A

Two categorical variables

Q

Procedure in Experimental research to Rule Out Confounding Variables

A

Random assignment

Q

Coefficient of Determination, R2

A

By squaring the value of r, you get the proportion of variance in one variable shared by the other(s), R2

Can only take values from 0 to 1, because it is a squared value (must be positive)

Q

Spurious Relationship

A

The two variables have no causal connection, yet it may be inferred that they do, due to an unknown confounding factor

E.g. ice cream sales and deaths by drowning → the third, confounding variable is the outside temperature

Q

Caveat for biserial and point biserial correlations

A

When group sizes for the binary variable are unequal → correlation values can become smaller