Exam 3 Flashcards

1
Q

What is important to remember about the sampling distribution of means?

A

A population with a normal distribution has a distribution of sample means that is normal

2
Q

What statistic can you use to test a distribution of sample means? (first one)

A
  • Z standardization; however, you rarely know the population standard deviation, so substitute the sample standard deviation and use Student's t-distribution
3
Q

Describe the student’s t distribution.

A
  • similar to standard normal distribution (z) but with fatter tails
  • as the sample size increases, the t distribution becomes more like the standard normal distribution
4
Q

What can the t-distribution be used for?

A
  • It can be used to accurately calculate a confidence interval for the mean of a population with a normal distribution
  • (sample mean) - (t critical value x SE (standard error of mean)) < (population mean) < (sample mean) + (t critical value x SE (standard error of mean))
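A minimal sketch of the interval above, using only the Python standard library. The function name, the sample values, and the critical value are illustrative assumptions; the critical value is taken from a t table for n - 1 degrees of freedom rather than computed.

```python
import math
import statistics

def mean_confidence_interval(sample, t_critical):
    """Return (lower, upper) bounds for the population mean.

    t_critical is the two-sided critical value for n - 1 degrees of
    freedom, looked up from a t table (an assumed input, not computed here).
    """
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # SE = s / sqrt(n)
    return mean - t_critical * se, mean + t_critical * se

# Hypothetical measurements; 2.776 is the two-sided 95% value for df = 4.
low, high = mean_confidence_interval([4.0, 5.0, 6.0, 7.0, 8.0], 2.776)
print(round(low, 3), round(high, 3))
```

The interval is centered on the sample mean, and it narrows as n grows because the standard error shrinks.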
5
Q

What is the standard error of the mean?

A

SE (y) = s / sqrt(n)

6
Q

What is a one sample t-test?

A

compares the mean of a random sample from a normal population with the population mean proposed in a null hypothesis

7
Q

What are the hypotheses and test statistic in a one sample t test?

A

H0 - true mean equals u0
Ha - true mean does not equal u0
Test statistic: t = (sample mean - u0) / SE, where SE = s / sqrt(n)
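As a rough illustration, the one-sample t statistic can be computed as below. The function name and data are hypothetical; converting t to a p-value still requires a t table or a statistics package, which this sketch leaves out.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t, df) for H0: true mean equals mu0."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    t = (statistics.mean(sample) - mu0) / se
    return t, n - 1  # the statistic has n - 1 degrees of freedom

# Hypothetical sample tested against a null mean of 5.
t_stat, df = one_sample_t([4.0, 5.0, 6.0, 7.0, 8.0], 5.0)
print(round(t_stat, 3), df)
```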

8
Q

How do you interpret the t-statistic in a one sample t-test?

A
  • compute the p value: probability of this t-statistic or more extreme given the null hypothesis is true
  • if p value is >.05 then you fail to reject the null hypothesis
9
Q

How does increasing sample size affect a one sample t test?

A
  • increasing sample size reduces the standard error of the mean
  • increases the probability of rejecting a false null hypothesis (power)
10
Q

What are the assumptions of a one-sample t-test?

A
  • data are a random sample from the population
  • variable is normally distributed in the population (robust to departures)
11
Q

What are the confidence intervals for variance and standard deviation? And what are the assumptions of these statistics?

A

Assumptions: random sample from the population, variable must have a normal distribution (formulas are NOT robust to departures from normality)

12
Q

What are two different study designs when comparing two means?

A

two-sample and paired designs

13
Q

What is a two sample design?

A
  • two groups
  • each group is composed of independent sample of units
14
Q

What is a paired design?

A
  • two groups
  • each sampled unit receives both treatments
  • paired designs are usually more powerful because of control for variation among sampling units
15
Q

How are paired designs treated?

A
  • paired measurements are converted to a single measurement by taking the difference between them
16
Q

What is a paired t-test?

A
  • used to test the null hypothesis that the mean difference of paired measurements equals a specific value
  • null is often that the difference (change) is zero before and after treatment
17
Q

How does a paired t-test compare to a one sample t-test?

A
  • The same process except the calculation of the test statistic occurs on the difference value (d)
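A small sketch of that idea: reduce each pair to a difference d, then run the one-sample calculation on the differences. The function name, the optional `d0` parameter, and the before/after values are assumptions made for illustration.

```python
import math
import statistics

def paired_t(before, after, d0=0.0):
    """Return (t, df) for H0: mean paired difference equals d0."""
    diffs = [a - b for b, a in zip(before, after)]  # one value per sampling unit
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    t = (statistics.mean(diffs) - d0) / se
    return t, n - 1

# Hypothetical measurements on four units, before and after treatment.
t_stat, df = paired_t([10, 12, 9, 11], [12, 15, 10, 14])
print(round(t_stat, 2), df)
```

Note that n is the number of pairs, not the total number of measurements.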
18
Q

What does the p-value indicate in a paired t-test statistic?

A
  • P-value >0.05
  • Fail to reject the null hypothesis that the mean change is zero
  • P-value <0.05
  • reject the null hypothesis that the mean change is zero
19
Q

What are the assumptions of a paired t-test?

A
  • sampling units are randomly sampled from the population
  • paired differences have a normal distribution in the population
20
Q

What test is a formal test of normality?

A

the Shapiro-Wilk test

21
Q

What are the hypotheses of the Shapiro-Wilk test? Why should it be used with caution?

A
  • H0 = sample has normal distribution
  • Ha = sample does not have normal distribution
  • Should be used with caution:
  • small sample sizes lack power to reject a false null (Type 2 error)
  • large sample sizes can reject null when the departure from normality is minimal and would not affect methods that assume normality
22
Q

Under the null hypothesis, the sampling distribution of the one-sample t statistic follows a _________

A

t distribution with n-1 DOF

23
Q

Describe the t distribution.

A

-The area under the curve to the left (lower tail) of -t is the same as the area to the right (upper tail) of t
- t distribution is symmetrical around the mode of zero

24
Q

What does a Shapiro Wilk test evaluate?

A
  • evaluates the goodness of fit of a normal distribution to a set of data randomly sampled from a population
25
Q

What is something other than a t-test that you may be asked to do with paired measurements?

A
  • Can calculate the 95% CI for true mean difference
  • If the interval includes zero, the data are consistent with no difference
  • Otherwise, the interval may be consistent with a decrease or an increase
26
Q

How do you find the confidence interval for difference between two means (two sample t-test)?

A
  • statistic of interest (mean 1 - mean 2)
  • calculate pooled sample variance
  • then calculate confidence interval for difference between two means
27
Q

What is pooled sample variance and what is it used for?

A
  • the average of the sample variances, weighted by their degrees of freedom
  • used for calculating confidence interval for the difference between means in two sample test
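One way to sketch the pooled variance and the two-sample t statistic built on it. The function names and sample values are hypothetical, and the sketch assumes equal population variances, as the standard two-sample test does.

```python
import math
import statistics

def pooled_variance(sample1, sample2):
    """Average of the two sample variances, weighted by degrees of freedom."""
    df1, df2 = len(sample1) - 1, len(sample2) - 1
    return (df1 * statistics.variance(sample1)
            + df2 * statistics.variance(sample2)) / (df1 + df2)

def two_sample_t(sample1, sample2):
    """Return (t, df) for H0: mean 1 = mean 2, assuming equal variances."""
    sp2 = pooled_variance(sample1, sample2)
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sp2 * (1 / len(sample1) + 1 / len(sample2)))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    return t, len(sample1) + len(sample2) - 2

# Hypothetical groups of unequal size.
group_a, group_b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0]
print(pooled_variance(group_a, group_b), two_sample_t(group_a, group_b))
```

The same standard error, multiplied by a t critical value with n1 + n2 - 2 degrees of freedom, gives the confidence interval for the difference between the means.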
28
Q

How do you compare the means in a two sample test?

A

a two sample t-test

29
Q

What does a two sample t-test do? What are the null and alternative hypotheses?

A
  • compares the means of a numerical variable with two independent groups
  • H0: mean 1 = mean 2
  • Ha: mean 1 does not equal mean 2
30
Q

What are the assumptions of a two sample t-test?

A
  • each of the two samples is a random sample from its population
  • numerical variable is normally distributed in each population (robust to minor deviations)
  • standard deviation and variance of the numerical variable are the same in both populations (robust to some deviations if the sample sizes are approximately equal)
31
Q

What is the formal test of equal variance? What are the hypotheses?

A
  • Levene's test
    H0: variances are equal
    Ha: variances of the two groups are not equal
    Can be extended to more than two groups
32
Q

What do you do to compare the means in a two sample t-test if the variances in the two groups are not equal?

A
  • standard t-test works well if both sample sizes are greater than 30 and there is a less than 3 fold difference in standard deviations
  • Welch’s t-test can be used even when the variances of the two groups are not equal - slightly less power compared to the standard t-test
33
Q

What is Welch’s t-test?

A
  • Welch’s t-test compares the means of two groups and can be used even when the variances of the two groups are not equal
  • slightly less power compared to the standard t-test
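For comparison with the pooled approach, here is a hedged sketch of Welch's statistic, which keeps the two variances separate and uses the Welch-Satterthwaite approximation for the degrees of freedom. The function name and data are illustrative assumptions.

```python
import math
import statistics

def welch_t(sample1, sample2):
    """Return (t, approximate df) without assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = statistics.variance(sample1) / n1  # squared SE contribution, group 1
    v2 = statistics.variance(sample2) / n2  # squared SE contribution, group 2
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite degrees of freedom (usually not an integer).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

t_stat, df = welch_t([1.0, 2.0, 3.0], [2.0, 4.0, 6.0, 8.0])
print(round(t_stat, 3), round(df, 2))
```

The approximate degrees of freedom are never larger than n1 + n2 - 2, which is one way to see where the small loss of power comes from.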
34
Q

What is important with units when comparing means of two groups?

A
  • correct sampling units
  • when comparing means of two groups an assumption is that the samples being analyzed are random samples, but often repeated measurements are taken on each sampling unit
  • fish in streams example, proportion of fish surviving in each stream
35
Q

What is the fallacy of indirect comparison?

A
  • compare each group mean to hypothesized value rather than comparing group means to each other
  • the fallacious reasoning: since group 1 is significantly different from zero but group 2 is not, groups 1 and 2 must be significantly different from each other
  • comparisons between two groups should be made directly, not indirectly by comparing each group to the same hypothesized value
36
Q

What are the four potential options to address violations of assumptions?

A
  • Ignore the violations
  • Transform the data
  • Use a non-parametric method
  • Use a permutation test
37
Q

What is a plot used to detect deviations from normality?

A

normal quantile plot
- may also use frequency distribution histograms, box plots, and strip plots - but the sample distribution may look noisy due to sampling error

38
Q

What is a normal quantile plot?

A

compares each observation in the sample with its quantile expected from the standard normal distribution. Points fall roughly along a straight line if the data come from a normal distribution

39
Q

When can you ignore violations of normality for t-tests? When should you not?

A
  • When sample sizes are large the sampling distribution of means behaves roughly as assumed by t-distribution
  • Large sample size depends on the shape of the distribution:
    —If distribution of two groups being compared are skewed in different directions, then avoid t-test even for large samples
    —If distributions are similarly skewed then there is more leeway
40
Q

When can you ignore violations of equal variance/sd for two sample t-tests?

A

Two-sample t-tests assume equal standard deviations in the two populations
- If sample sizes are > 30 in each group and approximately equal between the two groups, then even up to a 3x difference in standard deviation can be ok
- Otherwise use Welch’s t-test

41
Q

What is important to know about data transformations?

A
  • A data transformation changes each measurement by the same mathematical formula
  • Can make standard deviations more similar and improve the fit of the normal distribution to the data
42
Q

What are common uses of log transformations?

A

o Measurements are ratios or products
o Frequency distribution skewed to the right
o Group having larger mean also has larger standard deviation
o Data span several orders of magnitude

43
Q

What are five other data transformations and their uses?

A
  • Arcsine (best use: proportions)
  • Square-root (counts)
  • Square (skewed left)
  • Antilog (skewed left)
  • Reciprocal (skewed right)
44
Q

What are non-parametric alternatives to t-tests?

A
  • A nonparametric method makes fewer assumptions than standard parametric methods do about the distributions of the variables
  • Can be used when deviations from normality should not be ignored, and sample remains non-normal even after transformation
  • Don’t rely on parametric statistics like mean, sd, s^2
  • Usually based on the ranks of the data points rather than the actual values
45
Q

What are three non-parametric tests?

A
  • sign test (paired/one sample)
  • Wilcoxon signed-rank test (paired)
  • Mann-Whitney U-Test (two-groups)
46
Q

What is a sign test? When is it used? How does it work? What are the drawbacks? What must be remembered?

A
  • The sign test compares the median of a sample to a constant specified in the null hypothesis. It makes no assumptions about the distribution of the measurement of the population
  • Each measurement is characterized as above (+) or below (-) the value specified in the null hypothesis
  • If the null is true, then you expect half the measurements to be + and half to be –
  • Uses binomial distribution to test if the proportion of measurements above the null hypothesis is p=0.5
  • Can be replacement of one-sample or paired t-test
  • Low power compared to t-tests: impossible to reject the null (two-sided, alpha = 0.05) when the sample size is 5 or fewer
  • Typically you remove data points that indicate that the value is exactly null (difference of zero), this reduces n
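The steps above can be sketched with an exact binomial calculation. The function name and data are hypothetical, and the two-sided p-value is computed by doubling the smaller tail, which is one common convention.

```python
from math import comb

def sign_test_p_value(values, null_median):
    """Two-sided sign test p-value for H0: median equals null_median."""
    above = sum(v > null_median for v in values)
    below = sum(v < null_median for v in values)
    n = above + below  # values exactly equal to the null are dropped
    k = min(above, below)
    # Probability of k or fewer of the rarer sign when p = 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Seven hypothetical differences; one is exactly zero and is dropped (n = 6).
print(sign_test_p_value([1.2, 0.8, 2.1, 1.5, 0.0, 1.9, 0.7], 0))
# Five values all on one side: the smallest p-value possible with n = 5.
print(sign_test_p_value([1, 2, 3, 4, 5], 0))
```

With n = 5 the most extreme outcome still gives p = 0.0625, which is why the null cannot be rejected at alpha = 0.05 for such small samples.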
47
Q

What is a Wilcoxon signed-rank test?

A
  • Non-parametric alternative to the paired/one-sample t-test
  • More power than the standard sign test because it uses information about the magnitude of each data point's distance from the null value
  • But test assumes that population is symmetric around the median (i.e., no skew)
  • Nearly as restrictive as normality assumption, thus NOT recommended
48
Q

What is the nonparametric test for two samples?

A

Mann-Whitney U-Test

49
Q

What is the Mann-Whitney U-test?

A
  • Nonparametric test for two samples
  • The Mann-Whitney U-Test compares the distribution of two groups. It does not require as many assumptions as the two sample t test
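    H0: the distributions of the two groups are the same
    HA: the distributions of the two groups are different
    Works by ranking all observations together and summing the ranks within each group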
  • Use Mann-Whitney U-distribution to calculate p-value
50
Q

What happens in Mann-Whitney U-test if you have tied ranks?

A

Assign all instances of the same measurement the average of the ranks that the tied points would have received
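A short sketch of that tie rule, written as a standalone ranking helper. The function name and example values are assumptions for illustration.

```python
def average_ranks(values):
    """Rank values from smallest (rank 1), giving ties their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# The two 3.1 values would have taken ranks 2 and 3, so each gets 2.5.
print(average_ranks([3.1, 2.0, 3.1, 5.4]))
```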

51
Q

What are the assumptions of nonparametric tests? Assumption for all? Wilcoxon? Mann-Whitney U?

A
  • Still assume that both samples are random samples from their populations
  • Wilcoxon signed-rank test assumes distributions are symmetrical (big limitation; not recommended)
  • Rejecting the null hypothesis of the Mann-Whitney U-test means the two groups have different distributions of ranks, but does not necessarily imply that the means or medians of the groups differ
  • To make this inference there is an assumption that the shapes of the distributions are similar
52
Q

What are the assumptions of a paired t-test?

A
  • Sampling units are randomly sampled from the population
  • Paired differences have a normal distribution in the population
53
Q

What is the problem with comparing multiple means at once?

A
  • It is tempting to do all possible pairwise comparisons
  • But the problem is that running multiple tests inflates the probability of getting at least one Type I error
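To put a number on the inflation, the sketch below counts the pairwise comparisons among k groups and computes the chance of at least one Type I error. It assumes the tests are independent, which is a simplification, and the function name is hypothetical.

```python
from math import comb

def familywise_error_rate(num_groups, alpha=0.05):
    """Return (number of pairwise tests, P(at least one Type I error)).

    Assumes every null hypothesis is true and the tests are independent.
    """
    num_tests = comb(num_groups, 2)  # all possible pairs of groups
    return num_tests, 1 - (1 - alpha) ** num_tests

tests, rate = familywise_error_rate(5)
print(tests, round(rate, 3))
```

Under these assumptions, five groups already require ten pairwise tests, and the chance of at least one false positive rises to roughly 40%.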
54
Q

What does ANOVA do?

A
  • Analysis of Variance (ANOVA) compares the means of multiple groups simultaneously in a single analysis
  • Tests for variation of means among groups
  • Anova determines if there is more variance among sample means than we would expect by sampling error alone
55
Q

What are the hypotheses of ANOVA?

A

o H0: mean 1 = mean 2 = mean 3
o Ha: mean of at least one group is different from at least one other group
- Null assumption that all groups have the same true mean is equivalent to saying that each group sample is drawn from the same population

56
Q

What are anova’s two measures of variation?

A

Group mean square and error mean square

57
Q

What is group mean square? MSgroups

A
  • proportional to the observed amount of variance among group sample means
  • variation among groups
58
Q

What is error mean square? MSerror

A
  • estimates the variance among subjects that belong to each group
  • variation within groups
59
Q

What is the specific null and false null for anova?

A
60
Q

What are the two sums of squares in anova?

A
  • SSgroups= calculates sources of variation among groups
  • SSerror= calculates sources of variation within groups
61
Q

What is the F-ratio test statistic in anova? DOF?

A
  • F = MSgroups / MSerror; the F-statistic increases as the differences among sample means for treatment groups increase
  • The F statistic has a pair of degrees of freedom
  • Numerator: k -1
  • Denominator: N – k
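A compact sketch of how the sums of squares, mean squares, F-ratio, and R^2 fit together. The function name, the dictionary keys, and the data are illustrative assumptions; looking up the p-value from the F distribution is left out.

```python
from statistics import mean

def anova_f(groups):
    """Single-factor ANOVA quantities for a list of groups of measurements."""
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    k, n_total = len(groups), len(all_values)
    # Variation among groups: group means around the grand mean.
    ss_groups = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation within groups: observations around their own group mean.
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)
    ms_groups = ss_groups / (k - 1)
    ms_error = ss_error / (n_total - k)
    return {
        "F": ms_groups / ms_error,
        "df": (k - 1, n_total - k),
        "R2": ss_groups / (ss_groups + ss_error),
    }

# Three hypothetical treatment groups of three observations each.
print(anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

Because SSgroups + SSerror equals the total sum of squares, R^2 here is simply the share of total variation attributed to the groups.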
62
Q

What is R^2 for anova?

A

measures the fraction of variation in Y that is explained by group differences

63
Q

What are the assumptions of Anova tests?

A
  • Measurements in every group represent a random sample from the corresponding population
  • Variable is normally distributed in each of the populations
    o Robust to deviations, particularly when sample size is large
  • Variance is the same in all k populations
    o Robust to departures if sample sizes are large and balanced, and there is no more than a 10-fold difference in variance among groups
64
Q

What is a nonparametric alternative to an ANOVA?

A

Kruskal-Wallis test

65
Q

What is a planned comparison?

A
  • A planned comparison is a comparison between means planned during the design of the study, identified before the data are examined
  • In circadian clock follow-up study, the planned (a priori) comparison was difference in means between knee and control group
  • Use SE and CI calculations for planned comparisons
66
Q

What is an unplanned comparison? What test is used?

A
  • Comparisons are unplanned if you test for differences among all means
  • Problem of multiple tests (increasing probability of Type I error) should be accounted for
  • With the Tukey-Kramer method the probability of making at least one Type I error throughout the course of testing all pairs of means is no greater than the significance level alpha
  • An unplanned comparison analysis applies to all mean comparisons among groups. The analysis is typically done with the Tukey-Kramer method, in which it is harder to reject the null hypothesis compared to a two-sample t-test.
67
Q

How does the Tukey-Kramer method function?

A
  • Works like a series of two-sample t-tests, but with a higher critical value to limit the Type 1 error rate
  • Because multiple tests are done, the adjustment makes it harder to reject the null
68
Q

What is the Kruskal-Wallis post-hoc test?

A

-Suppose your data:
o Fail normality even after transformation
o Generate a significant Kruskal-Wallis result
-So the interpretation is that the distribution of ranks differs for at least one group. But which one?
-Should not use Tukey-Kramer, which is a parametric test
- Dunn’s test is the appropriate analysis for a post-hoc analysis of groups following a significant Kruskal-Wallis result: Will compare all possible pairs of groups while controlling for multiple tests

69
Q

Which sums of squares ?

A

total

70
Q

Which sums of squares?

A

error

71
Q

Which sums of squares?

A

group

72
Q

What is correlation?

A

When two numerical variables are associated then they are correlated

73
Q

What is the correlation coefficient? What does it measure? Range? What does it depend on?

A
  • The correlation coefficient measures the strength and direction of the association between two numerical variables
  • Ranges from -1 to 1
  • Possible that two variables can be strongly associated but have no correlation (r=0) – example is a non-linear association
  • Correlation coefficient depends on range
74
Q

What are the two different correlation coefficients?

A

Correlation coefficient (statistic) r
Population correlation coefficient (parameter) ρ (rho)

75
Q

What is important about the standard error of the correlation coefficient?

A
  • Standard error of correlation coefficient
  • Can be calculated, but the sampling distribution of r is not normally distributed, so standard error of r is not used in calculating the 95% CI
76
Q

What is the process of hypothesis testing for correlation?

A
  • H0: ρ = 0
  • HA: ρ does not equal 0
  • Use the standard error of r to get a test statistic (t = r / SE), then use Student's t distribution with n - 2 degrees of freedom
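A minimal sketch of this test, assuming the usual formulas SE_r = sqrt((1 - r^2) / (n - 2)) and t = r / SE_r with n - 2 degrees of freedom. The function name and data are hypothetical, and the sketch does not handle a perfect correlation (r = 1 or r = -1), where the standard error is zero.

```python
import math
from statistics import mean

def correlation_t(x, y):
    """Return (r, SE of r, t, df) for H0: population correlation is zero."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of products
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    se = math.sqrt((1 - r ** 2) / (n - 2))
    return r, se, r / se, n - 2

# Five hypothetical paired measurements.
r, se, t_stat, df = correlation_t([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(r, 2), round(se, 4), round(t_stat, 3), df)
```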
77
Q

What are the assumptions of hypothesis testing for correlation?

A
  • random sample from the population
  • bivariate normal distribution:
    bell shaped in two dimensions rather than one
78
Q

What are three deviations from bivariate normality?

A
  • funnel
  • outlier
  • non linear
79
Q

What is regression?

A
  • Regression is a method that predicts values of one numerical variable from values of another numerical variable
  • Fits a line through the data
    o Used for prediction
    o Measures how steeply one variable changes with the other
80
Q

How does linear regression function?

A

Draws a straight line through the data to predict the response variable (y, vertical axis) from the explanatory variable (x, horizontal axis)

81
Q

What is the least-squares regression line?

A
  • line for which the sum of all the squared deviations in Y is smallest
  • Y = a + bX
  • a is the Y intercept; b is the slope
  • The slope of a linear regression is the rate of change in y per unit x
  • Once slope is calculated, getting intercept is straightforward because the least-squares regression always goes through the point (mean X, mean Y)
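The calculation can be sketched as follows: the slope is the sum of products divided by the sum of squares of X, and the intercept follows from the line passing through (mean X, mean Y). The function name and data are illustrative assumptions.

```python
from statistics import mean

def least_squares_line(x, y):
    """Return (a, b) for the least-squares line Y = a + bX."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx    # slope: change in Y per unit X
    a = my - b * mx  # intercept: forces the line through (mean X, mean Y)
    return a, b

a, b = least_squares_line([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(a, 3), round(b, 3))
```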
82
Q

Samples vs populations least-squares regression line. What does regression assume?

A
  • The slope (b) and intercept (a) are estimated from a sample of measurements, hence these are estimates/statistics
  • The true population slope (beta) and intercept (alpha) are parameters
  • Regression assumes that there is a population for every value of X, and the mean Y for each of these populations lies on the regression line
83
Q

When predicting with regression line what are you predicting?

A
  • Predictions are mean Y for all individuals with value X
  • Designated Y-hat
84
Q

What are residuals? What measures their scatter?

A
  • The residual of a point is the difference between its measured Y value and value of Y predicted by the regression line
  • Can be positive or negative
  • Variance in residuals (MSresidual) quantifies the spread of the scatter
    —Residual mean square
    —Analogous to error mean square in Anova
    —Used to quantify the uncertainty of the slope
85
Q

How do you calculate the error in slope of a regression line?

A
  • Standard error of slope – uncertainty (precision) with the sample estimate b of the population slope beta
  • Can be used to calculate confidence interval of slope
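A hedged sketch tying residuals to the standard error of the slope, assuming the usual formulas MSresidual = SSresidual / (n - 2) and SE_b = sqrt(MSresidual / sum of squares of X). The function name and data are hypothetical.

```python
import math
from statistics import mean

def slope_standard_error(x, y):
    """Return (residuals, MSresidual, SE of slope) for a least-squares fit."""
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    # Residual: measured Y minus the Y predicted by the line.
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    ms_residual = sum(e ** 2 for e in residuals) / (n - 2)
    return residuals, ms_residual, math.sqrt(ms_residual / sxx)

res, ms_res, se_b = slope_standard_error([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print([round(e, 2) for e in res], round(ms_res, 3), round(se_b, 4))
```

A confidence interval for the slope is then b plus or minus a t critical value (n - 2 degrees of freedom) times this standard error.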
86
Q

What are two types of predictions you can make with a linear regression line? Which is more precise?

A

Predict mean Y for a given X
o Confidence bands measure the precision of the predicted mean Y for each value of X
o What is the mean age of all male lions whose noses are 60% black?
Predict single Y for a given X
o Prediction intervals measure the precision of the predicted single Y-values for each X
o How old is that lion over there with a 60% black nose
Both predictions give the same value of Y, but they differ in precision
o Can predict the mean with more certainty than a single value

87
Q

What should regressions be used for?

A

interpolation, not extrapolation

88
Q

How do you perform hypothesis testing on a linear regression slope?

A
  • H0: age cannot be predicted by proportion of black on nose (beta = 0)
  • HA: age can be predicted by proportion of black on nose (beta does not equal 0)
89
Q

What is R^2 for linear regression?

A

the fraction of variation in Y that is explained by x

90
Q

What is regression towards the mean?

A

results when two variables measured on a sample of individuals have a correlation less than one. Individuals that are far from the mean for one of the measurements will, on average, lie closer to the mean for the other measurement

91
Q

What are the assumptions of linear regression?

A

At each value of X:
* there is a population of Y-values whose mean lies on the regression line
* the distribution of possible Y-values is normal (with same variance)
* The variance of Y-values is the same at all values of X
* the Y-measurements represent a random sample from the possible Y- values