Stat - Exam #3 Flashcards

1
Q

How do you make inferences on two DEPENDENT samples (paired samples)?

A
  • Need to convert the two-population situation to a one-population situation;
  • Take the difference between the values of the two individuals in each pair, and treat the mean of the differences (d-bar) as the stat of choice
2
Q

What is the null or status quo of dependent samples?

A

u_d = 0 (the population mean of the paired differences is zero; d-bar is the sample estimate of u_d)

3
Q

What are dependent samples?

A
  • When the individuals selected for one sample determine which individuals are in the second sample;
  • Also called “matched-pair samples”
4
Q

What are independent samples?

A

When the individuals selected for one sample do NOT influence which individuals are in the second sample

5
Q

What is the best stat for testing a paired sample?

A
  • the DIFFERENCE (d_i);
  • Subtract the value of one individual of the matched-pair from the value of the other individual in the matched-pair;
  • The mean (d-bar) and standard deviation (s_d) are calculated normally
6
Q

What are the Assumptions for DEPENDENT Samples?

A
  1. Random sample of matched pairs;
  2. Sample average of the difference data is normally distributed (the differences are roughly normal, or the number of pairs is large);
  3. Population standard deviation of the differences is NOT known (same as for the one-sample t-test)
7
Q

What is the null hypothesis in a paired-sample hypothesis test?

A

The mean difference between the paired samples is zero (H_0: u_d=0)

8
Q

What is the confidence interval for a paired-sample?

A

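Point estimate +/- (critical value)(standard error): d-bar +/- t*(s_d/sqrt(n));
-The point estimate is the sample average of the difference data (d-bar), the critical value is a t-value with (n-1) degrees of freedom (n = number of pairs), and the standard error of the mean has the usual form (s_d/sqrt(n))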
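
A minimal sketch of the paired-sample calculation from the cards above, assuming made-up scores for five matched pairs and a critical t-value looked up by hand from a t-table. The function names (`paired_summary`, `paired_ci`) and the data are hypothetical and only illustrate the arithmetic.

```python
import math
import statistics

def paired_summary(first, second):
    """Return (d_bar, s_d, se, t0) for matched pairs, testing H0: u_d = 0."""
    diffs = [b - a for a, b in zip(first, second)]  # d_i for each pair
    n = len(diffs)                                  # number of pairs
    d_bar = statistics.mean(diffs)                  # sample mean of the differences
    s_d = statistics.stdev(diffs)                   # sample SD of the differences (n-1 divisor)
    se = s_d / math.sqrt(n)                         # standard error of d-bar
    t0 = d_bar / se                                 # t-stat with n-1 degrees of freedom
    return d_bar, s_d, se, t0

def paired_ci(d_bar, se, t_crit):
    """Point estimate +/- (critical value)(standard error)."""
    margin = t_crit * se
    return d_bar - margin, d_bar + margin

# Hypothetical data: the same five individuals measured twice.
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 14, 14]

d_bar, s_d, se, t0 = paired_summary(before, after)
t_crit = 2.776  # assumed: two-sided 95% t-value for 4 degrees of freedom, from a table
low, high = paired_ci(d_bar, se, t_crit)
print(d_bar, s_d, se, t0, (low, high))
```

If the interval excludes zero, the data are inconsistent with H_0: u_d = 0 at that confidence level.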

9
Q

When can you make inferences about two population means?

A

-When the individuals in the two samples are UNRELATED to each other

10
Q

How do you make inferences on two INDEPENDENT samples?

A
  • Calculate the mean of each sample and treat the DIFFERENCE in the sample means (x-bar1 - x-bar2) as the stat of choice;
  • The status quo in this case is (u1-u2=0), meaning the means of the two populations are the SAME
11
Q

What is needed to conduct inferential stats on independent samples?

A

The sampling distribution of the difference in the sample means, specifically its:
1. Mean and
2. Standard deviation
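
A small sketch of those two quantities, assuming the standard result that x-bar1 - x-bar2 has mean u1 - u2 and standard deviation sqrt(sigma1^2/n1 + sigma2^2/n2), with the sample variances plugged in for the unknown population variances. The function name `diff_in_means` and the two data sets are made up for illustration.

```python
import math
import statistics

def diff_in_means(sample1, sample2):
    """Return the observed difference in sample means and its estimated standard error."""
    n1, n2 = len(sample1), len(sample2)
    diff = statistics.mean(sample1) - statistics.mean(sample2)
    # Unpooled standard error: sample variances stand in for sigma1^2 and sigma2^2.
    se = math.sqrt(statistics.variance(sample1) / n1 + statistics.variance(sample2) / n2)
    return diff, se

# Hypothetical independent samples.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

diff, se = diff_in_means(group_a, group_b)
print(diff, se)
```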

12
Q

What are the two possible situations when the population standard deviations are NOT known?

A

*Can ALWAYS calculate the sample standard deviations;

  1. Equal population standard deviations;
    or
  2. Unequal population standard deviations
13
Q

What test can be used to determine equality in variances, but is NOT recommended?

A

An “F-test” can be used, but it is not robust to even small deviations from normality

14
Q

What is the best estimate of the population standard deviation when the pop. standard deviations are EQUAL?

A
  • The POOLED standard deviation, which combines the two sample standard deviations;

- A t-stat using the pooled standard deviation exactly follows the t-distribution with (n1+n2-2) degrees of freedom
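
A minimal sketch of the pooled calculation, assuming the usual pooled-variance formula s_p^2 = ((n1-1)s1^2 + (n2-1)s2^2)/(n1+n2-2) and made-up data. The function name `pooled_t` is hypothetical.

```python
import math
import statistics

def pooled_t(sample1, sample2):
    """Return (s_pooled, t0, df) for two independent samples with equal population SDs."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)
    df = n1 + n2 - 2                                   # degrees of freedom of the t-stat
    s_pooled = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / df)
    se = s_pooled * math.sqrt(1 / n1 + 1 / n2)         # standard error of x-bar1 - x-bar2
    t0 = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    return s_pooled, t0, df

# Hypothetical independent samples.
group_a = [1, 2, 3, 4, 5]
group_b = [2, 4, 6, 8, 10]

s_pooled, t0, df = pooled_t(group_a, group_b)
print(s_pooled, t0, df)
```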

15
Q

What is the best estimate of the population standard deviation when the pop. standard deviations are UNEQUAL?

A
  • an exact method of inference does NOT exist because you CANNOT determine degrees of freedom;
  • There is NO formula for a t-stat that follows the t-distribution;
  • But WELCH’S APPROX and SATTERTHWAITE’S APPROX are close;
  • Both use the same formula with different degrees of freedom approximations
16
Q

What is Welch’s Approximation of Degrees of Freedom?

A
  • Take the SMALLER number of the observations (n1 or n2), then subtract 1 to determine the degrees of freedom for the t-test (t_0);
  • A conservative approximation;
  • Easy to use by hand
17
Q

What is Satterthwaite’s Approximation of Degrees of Freedom?

A
  • A more exact approximation that uses an extensive equation;
  • Best used when calculated by machine
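
A sketch of the two degrees-of-freedom approximations, using the labels from these cards. The "extensive equation" is assumed here to be the formula widely published as the Welch-Satterthwaite equation; the function names and the inputs are made up for illustration.

```python
def conservative_df(n1, n2):
    """By-hand approximation: the smaller sample size minus 1."""
    return min(n1, n2) - 1

def satterthwaite_df(s1, n1, s2, n2):
    """Machine approximation (assumed Welch-Satterthwaite form) from sample SDs and sizes."""
    a = s1 ** 2 / n1   # estimated variance of x-bar1
    b = s2 ** 2 / n2   # estimated variance of x-bar2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Hypothetical summary statistics: s1 = 2 with n1 = 8, s2 = 4 with n2 = 5.
df_hand = conservative_df(8, 5)
df_machine = satterthwaite_df(2.0, 8, 4.0, 5)
print(df_hand, df_machine)
```

The machine value always falls between the by-hand value and n1+n2-2, which is why the by-hand version is described as conservative.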
18
Q

How many columns of data can ANOVA be used for?

A

-As many columns of data as needed

19
Q

What are the three possible hypotheses with 3 columns of data for a t-test?

A
  1. H0: u1=u2;
  2. H0: u1=u3;
  3. H0: u2=u3
20
Q

What is the problem with running 3 independent t-tests?

A
  • Significance Level;
  • Becomes roughly additive each time a t-test is run, since each test carries its own chance of a false rejection;

alpha = .05+.05+.05 = .15 (an upper bound; for three independent tests the exact value is 1-(.95)^3 = .143);
*Chance of error is about 3x higher than preferred
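
A quick check of that arithmetic. The additive figure is an upper bound; assuming the tests are independent, the chance of at least one false rejection is 1-(1-alpha)^k. Both function names are hypothetical.

```python
def additive_bound(alpha, tests):
    """Simple sum of the per-test significance levels (upper bound)."""
    return alpha * tests

def familywise_error(alpha, tests):
    """Chance of at least one false rejection, assuming independent tests."""
    return 1 - (1 - alpha) ** tests

bound = additive_bound(0.05, 3)
exact = familywise_error(0.05, 3)
print(bound, exact)
```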

21
Q

What is a significance level?

A

The chance of making an incorrect conclusion by rejecting a null hypothesis that is actually true (alpha, the Type I error rate)

22
Q

How do you make inferences on 3 columns of data?

A
  • Use a stat test that compares two VARIANCES to each other, instead of two means to each other;
  • Called an F-TEST because an F-stat is calculated; the F-stat follows an F-distribution, which supplies the critical values and P-values
23
Q

What is ANOVA?

A

A statistical method used to test whether the population means of three or more columns of data are equal to each other;

  • Uses the F-test to make decisions;
  • Requires only ONE hypothesis test, so it holds the significance level at the chosen value no matter how many populations are compared
24
Q

What is One-Way ANOVA?

A

An ANOVA method with only one classification variable, called a FACTOR;
-The factor is DISCRETE data that can have as many levels as needed

25
Q

What is a FACTOR?

A

A classification variable used to separate data into several columns;
-EX: Age, gender, species, class, etc;

-LEVELS of a factor are the possible categories
-EX:
Factor = Classification
Levels= Freshman, Sophomore

26
Q

What is an ANOVA Hypothesis test?

A

Conducts ONE hypothesis test to find out whether all of the population means are EQUAL to each other

27
Q

What is the alternative hypothesis with ANOVA?

A
  • At least ONE population mean is DIFFERENT from the others;
  • Cannot tell if only one population mean is different or if all are different
28
Q

What stat test does ANOVA use to test the hypothesis?

A

-The F-test, to find out which hypothesis is closer to the truth

29
Q

What is an F-test?

A

A stat test to determine whether two VARIANCES are equal

30
Q

What are the hypotheses for an F-Test of ANOVA?

A

H0: sigma1 = sigma2;
H1: sigma1 > sigma2 (upper-tailed; the alternative cannot include equality)

31
Q

What is the F-stat?

A

-Determines which hypothesis is closer to the truth and follows the F-distribution;

F_0 = (s1)^2 / (s2)^2 (the ratio of the two sample variances, which estimate sigma1^2 and sigma2^2)

32
Q

What are the properties of an F-curve?

A
  1. The total area under the curve equals 1;
  2. The range starts at ZERO and goes to POSITIVE infinity;
  3. Is SKEWED to the RIGHT
33
Q

How many degrees of freedom are in the F-curves?

A

-There are infinitely many F-curves, but each F-curve is identified by TWO degrees of freedom, numerator and denominator (F_alpha,df_n,df_d)

34
Q

What are the degrees of freedom of the F-curves?

A
  • *From the F-statistic =
    1. The first degree of freedom is the degrees of freedom of the variance in the NUMERATOR;
    2. The second degree of freedom is the degrees of freedom of the variance in the DENOMINATOR
35
Q

How will we use the F-curves?

A

-The computer will provide the P-value, and we use the P-value to make a conclusion

36
Q

What are the assumptions of ANOVA?

A
  1. Simple random samples;
  2. INDEPENDENT samples;
  3. NORMAL populations;
  4. EQUAL standard deviations
37
Q

Why is ANOVA a good method of stats?

A
  • Robust to moderate violations of assumptions;
  • Most robust when number of observations in each column is EQUAL = BALANCED sample;
  • Data do not have to be distributed exactly normally;
  • Standard deviations do not have to be exactly equal as long as they are close by the “Rule of 2”
38
Q

What is the Rule of 2?

A

The ratio of the largest sample standard deviation to the smallest sample standard deviation is LESS than 2

**s_largest/s_smallest < 2
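
A short sketch of this check, using the strict "less than 2" wording of the card. The function name `rule_of_2` and the data are made up for illustration.

```python
import statistics

def rule_of_2(*samples):
    """Return (ratio, passes): largest sample SD over smallest, and whether it is below 2."""
    sds = [statistics.stdev(s) for s in samples]
    ratio = max(sds) / min(sds)
    return ratio, ratio < 2

# Hypothetical columns of data.
ratio_ok, passes_ok = rule_of_2([1, 2, 3], [2, 3, 4], [6, 7, 8])
ratio_bad, passes_bad = rule_of_2([1, 2, 3], [0, 5, 10])
print(ratio_ok, passes_ok, ratio_bad, passes_bad)
```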

39
Q

What are the testing requirements for ANOVA?

A
  1. Normal shape = use the KS (Kolmogorov-Smirnov) stat;
  2. Equality of the population variances = Levene’s stat

40
Q

How does ANOVA provide evidence of difference in sample means?

A

By dividing the variation between the sample means by the variation within each sample;
-The bigger the number, the easier it is to see that the means are different

**F_0 = (variation between means)/(variation within samples)

41
Q

What is the goal of the ANOVA method?

A
  1. an OVERALL test to see whether there is strong evidence of differences among the population means;
  2. a detailed follow-up analysis to decide which of the population means differ and to estimate how large the differences are (=POST HOC Tests)
42
Q

How do F-values determine the hypothesis test?

A
  • LITTLE F-values = DO NOT REJECT the null;
  • LARGE F-values = REJECT the null

43
Q

How do you calculate the variation BETWEEN samples?

A
  • Uses the variance among the sample means to estimate the population variance (sigma^2);
  • Will NEVER know the true value of the population parameter, so must estimate the population variance from the data by using the sample variance of the sample averages;
  • Calculate the variation between samples by treating the sample AVERAGES as data points, calculating their sample variance (similar to SUM of SQUARES), and multiplying by the number of observations in each sample
44
Q

What is the formula for variation BETWEEN samples with ANOVA?

A

Variation between samples = (n)(s^2_x-bar)

n = number of observations in each sample;
s^2_x-bar = sample variance of the sample averages

45
Q

How do you calculate the variation WITHIN samples?

A
  • One assumption of ANOVA is all populations have SAME pop. variance (sigma^2);
  • Combine the spread of the observations around their own population means into ONE value = POOLED population variance;
  • Estimate it from a sample by using the POOLED SAMPLE VARIANCE;

**s^2_pooled estimates sigma^2

46
Q

What is the final equation combining the variation BETWEEN and WITHIN samples?

A

the F-stat!;

**F_0 = (n*s^2_x-bar)/(s^2_pooled)
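
A minimal sketch of this F-stat for a BALANCED design (equal n in every column), where the pooled sample variance is just the average of the column variances. The function name `balanced_f` and the three columns are hypothetical.

```python
import statistics

def balanced_f(samples):
    """Return (between, within, F_0) for equal-sized samples."""
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this shortcut assumes equal sample sizes")
    means = [statistics.mean(s) for s in samples]
    between = n * statistics.variance(means)                        # n * s^2_x-bar
    within = statistics.mean([statistics.variance(s) for s in samples])  # s^2_pooled when balanced
    return between, within, between / within

# Hypothetical columns of data, three observations each.
columns = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
between, within, f0 = balanced_f(columns)
print(between, within, f0)
```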

47
Q

How does the computer calculate an ANOVA?

A
  • Calculates several sum of squares terms, then divides each by the appropriate degrees of freedom to get an average of the squared terms;
  • Called the “sums of squares” and the “mean squares”
48
Q

What is the “Mean Squares”?

A

-Used to calculate the F-stat

49
Q

What is the SST (total sum of squares)?

A
  • A measure of the variation of the combined data around its sample average, called the GRAND MEAN (x-double bar);
  • How the computer measures the entire variation in a sample;
  • Breaks this variation into two components, sum of squares for the MODEL and sum of squares for the ERROR
50
Q

What is the formula for SS(total)?

A

SS(total) = SS(model) + SS(error)
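
A sketch that verifies this identity on made-up data, assuming the usual definitions: SS(total) measures every observation against the grand mean, SS(model) measures each column mean against the grand mean (weighted by column size), and SS(error) measures every observation against its own column mean. The function name `sums_of_squares` is hypothetical.

```python
import statistics

def sums_of_squares(samples):
    """Return (SST, SSM, SSE) for a list of columns of data."""
    all_values = [x for s in samples for x in s]
    grand_mean = statistics.mean(all_values)
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    ssm = sum(len(s) * (statistics.mean(s) - grand_mean) ** 2 for s in samples)
    sse = sum((x - statistics.mean(s)) ** 2 for s in samples for x in s)
    return sst, ssm, sse

# Hypothetical columns of data.
columns = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
sst, ssm, sse = sums_of_squares(columns)
print(sst, ssm, sse)
```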

51
Q

What is the SSM (sum of squares for the model)?

A
  • Estimate of the “variation between means”;
  • Sum of squares for the model is also called the sum of squares for the TREATMENT (SSTr)

52
Q

What is the SSE (sum of squares of the error)?

A
  • Estimate of the “variation within the samples”;
  • Sum of squares of the error, once divided by its degrees of freedom (giving MSE), is the best measure of the population variance

54
Q

What does adding more observations do to the sum of squares?

A
  • Only increases the sum of squares;
  • *Need to normalize the variation in the observations by dividing by the degrees of freedom (DOF) of each variation, and this gives the TWO estimates of the population variance needed for the F-stat
55
Q

What is the total degrees of freedom?

A

(n-1), where n is the total number of observations;

the degrees of freedom for the MODEL and degrees of freedom for the ERROR add up to the TOTAL

56
Q

What is MSM?

A

*Mean Squares of Model;

  • DOF = (k-1), where k is the number of columns (levels of the factor);
  • Formula = SSM/(k-1)
57
Q

What is MSE?

A

*Mean Squares of Error;

  • DOF = (n-k);
  • Formula = SSE/(n-k)
58
Q

What is MST?

A

*Mean Squares of Total;

  • DOF = (n-1);
  • Formula = SST/(n-1)
59
Q

What do the mean squares give?

A

The info needed to calculate the F-STAT

**F_0 = (MSM/MSE)

MSM = variation between samples
MSE = variation within samples
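
A sketch of the mean-square step of the ANOVA table, assuming the sums of squares have already been computed. The inputs (SSM = 42, SSE = 6, k = 3 columns, n = 9 observations) are made up, and the function name `anova_table` is hypothetical. The P-value is left out because it requires the F-distribution, which a stats package would supply.

```python
def anova_table(ssm, sse, k, n):
    """Build the mean squares and F-stat from the sums of squares.

    k = number of columns (levels of the factor), n = total number of observations.
    """
    df_model = k - 1
    df_error = n - k
    msm = ssm / df_model   # variation between samples
    mse = sse / df_error   # variation within samples
    return {
        "df_model": df_model,
        "df_error": df_error,
        "df_total": n - 1,
        "MSM": msm,
        "MSE": mse,
        "F": msm / mse,
    }

table = anova_table(ssm=42, sse=6, k=3, n=9)
print(table)
```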
61
Q

What is used to determine if all means are EQUAL through an ANOVA table?

A

*P-Value!!

62
Q

What are the hypotheses of ANOVA?

A

H0: u1 = u2 = u3;
H1: At least ONE population mean is different from the others

63
Q

How is a conclusion made with ANOVA?

A

A conclusion is made based on the value of the TEST STAT

64
Q

What does an F-value near 1 indicate?

A

*Small = means that the data are consistent with all pop. means being EQUAL (do not reject the null)

65
Q

What does an F-value far from 1 indicate?

A

*Large = means that the sample means are farther apart than would be considered reasonable due to sampling variation alone

67
Q

What is a PREDICTOR Variable?

A

The variable that can be used to predict the value of the second variable;
—Can cause the second variable or be strongly related to the second variable;
— May come first temporally

68
Q

What is a RESPONSE Variable?

A

The variable whose value can be explained by a first variable;
— May come second temporally