Final Exam Flashcards

1
Q

ANOVA (what does it do?)

A

Analysis of variance: tests differences among the means of multiple groups

Compares the variation among subjects within groups (the error mean square, MSerror) to the variation among the group means (the group mean square, MSgroups)

2
Q

Test Stat for ANOVA

What is it under Ho? Ha?

A

Test stat: the F-ratio (F = MSgroups / MSerror)

Under Ho: the F-ratio should be about 1, apart from chance variation
- MSgroups = MSerror

Under Ha: F-ratio will exceed 1.
- MSgroups > MSerror

3
Q

Assumptions of ANOVA

A
  • Normal Distribution in each of the k populations
  • Random Sampling
  • Variance is the same in all k populations
4
Q

SStotal equation

A

Total sum of squares

SStotal = SSerror + SSgroups

5
Q

SSerror equation (check notes sheet)

A

Error sum of squares

SSerror = the sum over all groups i of (si^2) x (ni - 1)

(si = standard deviation of group i; ni = number of observations in group i)
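
For example (hypothetical numbers): with two groups where s1 = 2, n1 = 5 and s2 = 3, n2 = 4, SSerror = (2^2)(5 - 1) + (3^2)(4 - 1) = 16 + 27 = 43.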

6
Q

Grand mean

A

(Y-bar) The mean of all the data from all groups combined

Y-bar = (sum of all the data points from all groups) / (total number of data points, N)

7
Q

Finding F-Ratio

A
  1. Find the grand mean
  2. Calculate SSgroups and SSerror
  3. Calculate SStotal
  4. Calculate MSgroups and MSerror
  5. Calculate F-ratio
  6. Use F-distribution table to find our critical value with our numerator df, denominator df, and alpha level.
  7. Compare the F-ratio to the critical value and determine the p-value
  8. Reject / Fail to Reject Ho (see the sketch below)
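
A minimal sketch of these steps in Python (hypothetical data; numpy and scipy assumed available):

```python
import numpy as np
from scipy import stats

# Hypothetical data: three groups (k = 3)
groups = [np.array([4.1, 5.0, 5.5, 4.7]),
          np.array([6.2, 5.9, 6.8, 6.1]),
          np.array([5.0, 4.4, 5.2, 4.9])]

k = len(groups)
N = sum(len(g) for g in groups)
grand_mean = np.concatenate(groups).mean()                             # step 1

ss_groups = sum(len(g) * (g.mean() - grand_mean)**2 for g in groups)   # step 2
ss_error  = sum(g.var(ddof=1) * (len(g) - 1) for g in groups)
ss_total  = ss_groups + ss_error                                       # step 3

ms_groups = ss_groups / (k - 1)                                        # step 4
ms_error  = ss_error / (N - k)
F = ms_groups / ms_error                                               # step 5

p_value = stats.f.sf(F, k - 1, N - k)  # steps 6-7: p-value from the F distribution (instead of a table)
print(F, p_value)

# Cross-check with scipy's built-in one-way ANOVA
print(stats.f_oneway(*groups))
```
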
8
Q

MSgroups

A

Group mean square: the observed amount of variation among the group sample means (among-group variation)

MSgroups = SSgroups / dfgroups

dfgroups = k - 1

k=number of groups

9
Q

MSerror

A

error mean square: variance among subjects that belong to the same group (within)

MSerror = SSerror / dferror

dferror = N - k

N = total number of data points in all groups
k = number of groups
10
Q

Robustness of ANOVA

A

Robust to deviations from the normality assumption, especially when sample sizes are large.

Fairly robust to deviations from the equal-variance assumption, provided sample sizes are large and roughly equal.

11
Q

*Kruskal-Wallis test

A

A nonparametric method based on ranks; essentially an analysis of variance performed on the ranks of the data.
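
A minimal sketch of running it in Python (hypothetical data for three groups; scipy assumed):

```python
import numpy as np
from scipy import stats

# Hypothetical data for three groups
group1 = np.array([2.9, 3.0, 2.5, 2.6, 3.2])
group2 = np.array([3.8, 2.7, 4.0, 2.4])
group3 = np.array([2.8, 3.4, 3.7, 2.2, 2.0])

H, p_value = stats.kruskal(group1, group2, group3)  # Kruskal-Wallis H-test on ranks
print(H, p_value)
```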

12
Q

Planned Comparison vs. Unplanned Comparison

A

Planned: a comparison between means planned during the design of the study, identified before the data are examined

Unplanned: one of multiple comparisons, such as between all pairs of means, carried out to help determine where differences between means lie. (Data dredging)

13
Q

Tukey-Kramer Test

A
  • Used to test all pairs of means to find out which groups stand apart from the others
  • Type of unplanned comparison
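
A minimal sketch of a Tukey-Kramer comparison of all pairs of means in Python (hypothetical data; statsmodels assumed):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical data: one measurement column and a matching group-label column
values = np.array([4.1, 5.0, 5.5, 4.7, 6.2, 5.9, 6.8, 5.0, 4.4, 5.2])
labels = np.array(["A", "A", "A", "A", "B", "B", "B", "C", "C", "C"])

result = pairwise_tukeyhsd(endog=values, groups=labels, alpha=0.05)
print(result)  # one row per pair of groups: mean difference, adjusted p-value, reject yes/no
```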
14
Q

Assumptions of Tukey-Kramer Test

A
  • Normal Distribution
  • Random Sampling
  • Equal variance

*Not as robust as ANOVA

15
Q

fixed effects vs. random effects

A

Fixed: ANOVA applied to fixed groups - the groups are predetermined by the study design and are of direct interest themselves.

Random: ANOVA applied to random groups - the groups are randomly sampled from a population of possible groups.

16
Q

ANOVA on Random Groups

A
  • Planned and Unplanned comparisons are not used
  • Instead we use variance components: the amount of the variance in the data that is among random groups (sigma-A^squared) and the amount that is within groups (sigma^squared)
17
Q

Repeatability

A

The fraction of the summed variance that is present among groups

Repeatability = s-A^squared / (s-A^squared + MSerror)
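
For example (hypothetical numbers, assuming equal group sizes of n = 10): with MSgroups = 14 and MSerror = 4, the among-group variance component is s-A^squared = (MSgroups - MSerror) / n = (14 - 4) / 10 = 1, so repeatability = 1 / (1 + 4) = 0.2.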

18
Q

k (ANOVA)

A

number of groups

19
Q

In ANOVA, if Ho is false

A

We expect to see MSgroups be greater than MSerror, so the F-ratio is greater than 1.

20
Q

In ANOVA if Ho is true

A

Then the F-ratio will be about 1, except by chance.

MSgroups and MSerror should be about even

21
Q

What does the ANOVA table include?

A
  • Source of variation (groups, error, total)
  • Sum of squares (g, e, tot)
  • df (g, e, tot)
  • mean squares (g, e)
  • F ratio
  • p value
22
Q

Yij

A

The jth individual in the ith group

23
Q

Group mean

A

(Y-bar sub i) The mean of the observations in group i

24
Q

SSgroups equation (check notes sheet)

A

Group sum of squares:

SSgroups = the sum over all groups i of ni x (group i mean - grand mean)^2

(ni = number of observations in group i)

25
Q

N (ANOVA)

A

total number of data points in all groups

26
Q

ni (ANOVA)

A

number of observations in group i

27
Q

df (groups)

A

k-1

k=number of groups

28
Q

df (error)

A

N-k

N= total number of data points
k=number of groups

29
Q

F crit (ANOVA)

A

F(alpha)(1), (k-1), (N-k) - the critical value of the F distribution with numerator df = k - 1 and denominator df = N - k at significance level alpha (one-tailed: ANOVA uses only the upper tail of the F distribution)
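
A quick way to get this critical value in Python (a sketch assuming scipy; alpha, k, and N are hypothetical):

```python
from scipy import stats

alpha, k, N = 0.05, 3, 30  # hypothetical significance level, number of groups, total sample size
f_crit = stats.f.ppf(1 - alpha, dfn=k - 1, dfd=N - k)  # upper-tail (one-tailed) critical value
print(f_crit)
```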

30
Q

R^squared (ANOVA)

A

the group portion of variation expressed as a fraction of the total

SSgroups/SStotal = R^squared

When R^squared is close to 0, group means are all very similar, most of the variation is within groups.

When R^squared is close to 1, most of the variation is explained by the explanatory variable.
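
For example (hypothetical numbers): if SSgroups = 36 and SStotal = 90, then R^squared = 36/90 = 0.4, so 40% of the variation is explained by group membership.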

31
Q

p-value

A

the probability of obtaining a test statistic as extreme as or more extreme than the observed value, assuming Ho is true

32
Q

What quantity would you use to describe the fraction of the variation in expression levels explained by group differences?

A

R squared

33
Q

Regression

A

the method used to predict values of one numerical variable from values of another.

34
Q

Linear Regression

A

Draws a straight line through the data to predict the response variable from the explanatory variable (one type of regression)

35
Q

Slope of regression line

A

indicates the rate of change

36
Q

What does linear regression do?

A

Measures aspects of the linear relationship between two numerical variables

37
Q

Difference between regression and correlation

A

Regression - fits a line through the data to predict one variable from another and to measure how steeply one variable changes with changes in the other.

correlation - measures strength of association between two variables, reflecting the amount of scatter in the data.

38
Q

Assumptions of Linear Regression

A

- The relationship between the two variables is linear
- At each value of X, the Y-measurements are a random sample from a population with a normal distribution
- The variance of Y is the same at all values of X

39
Q

“Best Line”

A

Has the smallest deviations in Y (vertical axis, response var) between the data points and the regression line

40
Q

Least squares regression line

A

the line for which the sum of all the squared deviations in Y is the smallest.

41
Q

Regression Line Equation

A

Y = a + bX

Y - response variable
a - the Y-intercept
b - slope of the regression line
if b is (+), then larger values of X predict larger values of Y
if b is (-), then larger values of X predict smaller values of Y.
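
A minimal sketch of fitting this line in Python (hypothetical data; numpy and scipy assumed):

```python
import numpy as np
from scipy import stats

# Hypothetical data: X = explanatory variable, Y = response variable
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
Y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])

fit = stats.linregress(X, Y)
a, b = fit.intercept, fit.slope  # Y = a + bX
print(a, b)

# Prediction (Y-hat): plug an X value (within the range of the data) into the fitted equation
x_new = 4.5
y_hat = a + b * x_new
print(y_hat)
```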

42
Q

Slope of a linear regression

A

rate of change in Y per unit of X

43
Q

a and b
alpha and beta

(Linear regression)

A

a and b are sample estimates

alpha and beta are population parameters

44
Q

Predictions

A

points on the line that correspond to specific values of X.

the predicted value of Y from a regression line estimates the mean value of Y for all individuals having a given value of X.

~ Y-hat

45
Q

How to find predictions

A

Plug the X value into the equation to find the Y-hat

46
Q

Residuals

A

Measure the scatter of points above and below the least-squares regression line. Crucial for evaluating the fit of the line to the data.

47
Q

MSresiduals

A

Quantifies the spread of the scatter of points above and below the line.

48
Q

Confidence Bands

A

measure the precision of the predicted mean Y for each value of X

49
Q

Prediction intervals

A

measure the precision of the predicted single Y-value for each X.

50
Q

Extrapolation

A

Attempting to predict the Y value for X values beyond the range of the data

51
Q

Normal Quantile Plot

A

Compares each observation in the sample with its quantile expected from the standard normal distribution. Points should fall roughly along a straight line if the data come from a normal distribution.
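
A minimal sketch of drawing one in Python (hypothetical data; scipy and matplotlib assumed):

```python
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Hypothetical sample
data = np.array([3.2, 4.1, 4.5, 4.8, 5.0, 5.3, 5.9, 6.4, 7.1])

# Plot each observation against its expected quantile from the standard normal distribution
stats.probplot(data, dist="norm", plot=plt)
plt.show()
```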

52
Q

Alternatives when Assumptions are Violated

A
  1. Ignore the violation of assumptions
    - Works well for tests comparing means when the normality assumption is violated, especially when sample sizes are large and the violation is not too drastic
  2. Transform the data: often effective
  3. Use a non-parametric method
  4. Use a permutation test: uses a computer to generate a null distribution for the test stat (see the sketch below)
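
A minimal sketch of a permutation test for a difference between two group means (hypothetical data; numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical two-group data
group1 = np.array([5.1, 6.0, 5.8, 6.4, 5.5])
group2 = np.array([4.2, 4.9, 4.6, 5.0, 4.4])

observed = group1.mean() - group2.mean()
pooled = np.concatenate([group1, group2])
n1 = len(group1)

# Build the null distribution by repeatedly shuffling the group labels
null = []
for _ in range(10000):
    shuffled = rng.permutation(pooled)
    null.append(shuffled[:n1].mean() - shuffled[n1:].mean())
null = np.array(null)

# Two-sided p-value: fraction of permuted differences at least as extreme as the observed one
p_value = np.mean(np.abs(null) >= abs(observed))
print(observed, p_value)
```
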
53
Q

Shapiro-Wilk Test

A

evaluates the goodness of fit of a normal distribution to a set of data randomly sampled from a population
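
A minimal sketch of running it in Python (hypothetical data; scipy assumed):

```python
import numpy as np
from scipy import stats

# Hypothetical sample
data = np.array([2.3, 2.8, 3.1, 3.4, 3.9, 4.2, 4.6, 5.0])

stat, p_value = stats.shapiro(data)  # Ho: the data were sampled from a normal distribution
print(stat, p_value)                 # a small p-value is evidence of departure from normality
```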

54
Q

Robust

A

A statistical procedure is robust if the answer it gives is not sensitive to violations of the assumptions of the method

55
Q

Transformation

A

changes each measurement by the same mathematical formula.

56
Q

Log Transformation

A
  • Used for ratios or products of variables
  • used when freq dist is skewed to the right
  • used when the group with the larger mean also has the larger standard dev
  • used when the data span several orders of mag
57
Q

Arcsine Transformation

A

Used almost exclusively on data that are proportions

58
Q

Square root Transformation

A

Used on count data.
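
Minimal sketches of the log, arcsine, and square-root transformations in Python (hypothetical data; numpy assumed; the arcsine transformation is applied to the square root of each proportion):

```python
import numpy as np

# Hypothetical measurements
ratios      = np.array([0.5, 2.0, 8.0, 32.0])    # right-skewed, spans several orders of magnitude
proportions = np.array([0.10, 0.25, 0.60, 0.90])
counts      = np.array([0, 3, 7, 12])

log_transformed     = np.log(ratios + 1)               # log transform (the +1 guards against zeros)
arcsine_transformed = np.arcsin(np.sqrt(proportions))  # arcsine (angular) transform for proportions
sqrt_transformed    = np.sqrt(counts + 0.5)            # square-root transform for count data

print(log_transformed, arcsine_transformed, sqrt_transformed)
```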