Stats Exam 3 Flashcards

1
Q

Central Limit Theorem

A
  1. Means of random samples from a normally distributed population are normally distributed
  2. As n increases (> 30), means of random samples from skewed distributions become approximately normally distributed
  3. The mean of all sample means is the population mean. (also true for proportions)
  4. The standard deviation of normally distributed sample means and proportions is: ….
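
The card's own expressions are elided. As a hedged reference, these are the standard textbook forms of the standard error for sample means and sample proportions; the symbols (sigma for the population standard deviation, p for the population proportion, n for the sample size) are assumptions and not taken from the card:

```latex
\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}
\qquad
\sigma_{\hat{p}} = \sqrt{\frac{p\,(1 - p)}{n}}
```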
2
Q

What can Z scores tell you about sample means

A

Z scores can also tell us how far a sample mean is from the population mean, and therefore how likely or unlikely a given sample mean is
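
A minimal sketch of that idea, assuming the population mean and standard deviation are known. The function name and the numbers are hypothetical, chosen only for illustration:

```python
import math

def z_score_for_sample_mean(sample_mean, pop_mean, pop_sd, n):
    """Number of standard errors between a sample mean and the population mean."""
    standard_error = pop_sd / math.sqrt(n)  # standard deviation of sample means
    return (sample_mean - pop_mean) / standard_error

# Hypothetical example: population mean 100, sd 15, sample of 36 with mean 105
z = z_score_for_sample_mean(105, 100, 15, 36)
print(z)  # 2.0
```

A sample mean two standard errors above the population mean is fairly unlikely under a normal sampling distribution.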

3
Q

What happens to Z as n increases

A

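For a fixed difference between the sample mean and the population mean, the standard error shrinks as n increases, so Z moves farther from zero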

4
Q

Central Limit Theorem also applies to…

A

CLT also applies to proportions which are used for categorical data

5
Q

3 forms of inference

A
  1. Point estimation
  2. Confidence intervals
  3. Hypothesis Testing
6
Q

Point Estimation

A

using a single value from a sample to estimate a population parameter

7
Q

Confidence Intervals

A

(Interval Estimation)

  1. Using a range of values to estimate a parameter
  2. Stating our confidence that an interval captures a parameter
    * smaller interval/range, less confidence
8
Q

Hypothesis Testing

A

using samples and probability to support or reject assumptions about population parameters

9
Q

sampling error

A

random sampling produces samples that aren’t exactly like the population

10
Q

interval estimation

A

incorporates the likely size of the sampling error associated with the point estimate

11
Q

weakness of point estimation

A

without quantifying the likely amount of estimation error, point estimates are of limited use (sampling error)

12
Q

Confidence Interval (CI)

A

a range of plausible values for a parameter in addition to the level of confidence that the parameter is included within the interval

13
Q

2 components of CI

A
  1. Interval
  2. Confidence
    a range of values that is likely to include u
    principle of “confidence” is the same in both scenarios
14
Q

margin of error for means

A
-an absolute quantity 
Size of an interval (precision) 
partly chosen --> z score
partly natural --> sd 
partly experimentally determined --> n
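
The three ingredients above combine in the usual margin of error expression for a mean. This is the standard textbook form, written here as an assumption since the card does not state it; m is the margin of error, z* the chosen z score, sigma the standard deviation, and n the sample size:

```latex
m = z^{*} \cdot \frac{\sigma}{\sqrt{n}}
```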
15
Q

confidence intervals downside

A
  1. created when we do NOT know the population mean
  2. Establish a range of values that “probably” includes the population mean
  3. how probable depends entirely on the choice of a z score
16
Q

steps to build a confidence interval?

A
  1. Find Z score for a … CI
  2. Calculate Standard Error
  3. Calculate Margin of Error
  4. Apply Margin of Error to Point Estimate
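
The four steps can be sketched as follows, assuming a known population standard deviation and a two-sided interval. The function name, the default 95% confidence level, and the example numbers are hypothetical:

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, pop_sd, n, confidence=0.95):
    # Step 1: z score that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Step 2: standard error of the sample mean
    standard_error = pop_sd / math.sqrt(n)
    # Step 3: margin of error
    margin = z * standard_error
    # Step 4: apply the margin of error to the point estimate
    return sample_mean - margin, sample_mean + margin

# Hypothetical example: sample mean 50, population sd 10, n = 100
low, high = z_confidence_interval(50.0, 10.0, 100)
print(round(low, 2), round(high, 2))  # 48.04 51.96
```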
17
Q

hypotheses

A

claims or statements about population parameters (never about samples)

18
Q

null hypothesis

A

the no effect, no difference, nothing special hypothesis (Ho)

generally does not reflect the researcher’s belief

19
Q

alternative hypotheses

A

Ha
3 possible forms:
1. a parameter is greater than some value: Ha: u > #
2. a parameter is less than some value: Ha: u < #
3. a parameter is not equal to some value: Ha: u ≠ #

20
Q

one tailed Ha

A

can be supported by sample statistics from only one tail of a distribution (less/greater)

21
Q

two tailed Ha

A

can be supported by sample statistics from both tails (different)

22
Q

critical regions

A

tail regions of sampling distributions that contain unlikely values, that when observed lead us to reject Ho

23
Q

critical values

A

specific standardized scores (like Z scores) that separate critical regions from the rest of the curve

24
Q

alpha

A

=significance level

  1. a= the area under a normal curve with unlikely (extreme) observations, such that when observed, we reject the null hypothesis and support the Ha
  2. a= the acceptable rate of a type 1 error, mistakenly rejecting the null
  3. a & Ha determine the critical values
25
Q

One tailed Ha & alpha

A

C.V puts alpha in one tail

26
Q

Two tailed Ha & alpha

A

2 C.V.s that split alpha into 2 tails

27
Q

Hypothesis Testing Steps

A
  1. Write the Ha and Ho in statistical terms
  2. choose alpha and determine critical values
  3. calculate a test statistic. For tests with one sample, 3 choices
  4. Compare test statistic to a CV or calculate a p value
  5. State conclusions in context
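
A minimal sketch of steps 2 through 4 for one case, a one-sample z test on a mean with a known population standard deviation. The function name, the `tail` argument, and the example numbers are hypothetical, and the card's "3 choices" of test statistic are not spelled out in the source:

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, null_mean, pop_sd, n, alpha=0.05, tail="two"):
    std_normal = NormalDist()
    # Test statistic: distance from the null value in standard errors
    z = (sample_mean - null_mean) / (pop_sd / math.sqrt(n))
    if tail == "greater":    # Ha: u > null_mean, all of alpha in the right tail
        critical = std_normal.inv_cdf(1 - alpha)
        p_value = 1 - std_normal.cdf(z)
        reject = z > critical
    elif tail == "less":     # Ha: u < null_mean, all of alpha in the left tail
        critical = std_normal.inv_cdf(alpha)
        p_value = std_normal.cdf(z)
        reject = z < critical
    else:                    # Ha: u != null_mean, alpha split across both tails
        critical = std_normal.inv_cdf(1 - alpha / 2)
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        reject = abs(z) > critical
    return z, critical, p_value, reject

# Hypothetical example: Ho: u = 100, sample of 36 with mean 105, sd 15
z, critical, p_value, reject = one_sample_z_test(105, 100, 15, 36)
print(round(z, 2), round(critical, 2), round(p_value, 4), reject)
```

Comparing z to the critical value and comparing the p value to alpha lead to the same decision; step 5, stating the conclusion in context, is left to the analyst.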
28
Q

p value

A

probability of observing a sample statistic at least as extreme as the one observed, if the null hypothesis is true (chance alone)

29
Q

T scores for sample means

A
  • almost identical to Z scores (same assumptions)
  • used when we don’t know the population standard deviation
  • substitute the sample standard deviation into the standard error expression
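
The substitution can be sketched as below: the sample standard deviation replaces the population value in the standard error, and the result is referred to a t distribution with n - 1 degrees of freedom. The function name and the data are hypothetical:

```python
import math
from statistics import mean, stdev

def t_score(sample, null_mean):
    """Return (t, df) for a one-sample t score."""
    n = len(sample)
    s = stdev(sample)                 # sample standard deviation (n - 1 denominator)
    standard_error = s / math.sqrt(n)
    t = (mean(sample) - null_mean) / standard_error
    return t, n - 1                   # df = n - 1

# Hypothetical data and null value
t, df = t_score([4, 6, 8, 10, 12], 5)
print(round(t, 3), df)  # 2.121 4
```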
30
Q

degrees of freedom

A

df=n-1

31
Q

choose between t score and z score?

A

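use the z score when the population standard deviation is known (it will be more accurate); use the t score when only the sample standard deviation is available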

32
Q

Type 1 error

A

rejecting the null, when the null is actually true

  • occurs when we get an extreme test statistic by chance alone
  • p(type 1 error) = alpha
  • alpha is chosen in advance
33
Q

Type 2 error

A

failing to reject the null, when the null is false

  • must be calculated
  • p(type 2 error)= beta
34
Q

Power

A

the ability (probability) to reject a false null hypothesis; power = 1 - beta

35
Q

Power calculations

A

are used to determine the sample size needed to reveal the smallest difference that is actually interesting between two hypothesized values of a parameter

36
Q

ANOVA

A

Analysis of Variance

-used to compare >2 sample means

37
Q

ANOVA hypotheses

A

Ho: the means all come from the same population
Ha: the means do not all come from the same population

38
Q

why not multiple t-test?

A

the total risk of a type 1 error for a group of related tests is more important than the type 1 error risk for any one test
–multiple tests increase the total risk of a type 1 error

39
Q

familywise error rate

A

the probability of observing at least one type 1 error for a group of related tests
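
A quick sketch of how this rate grows, under the simplifying assumption that the tests are independent and each uses the same alpha. The function name is hypothetical, and real groups of related tests are often not independent:

```python
def familywise_error_rate(alpha, num_tests):
    """P(at least one type 1 error) across independent tests at the same alpha."""
    return 1 - (1 - alpha) ** num_tests

for k in (1, 3, 10):
    print(k, round(familywise_error_rate(0.05, k), 3))
# 1 0.05
# 3 0.143
# 10 0.401
```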

40
Q

ANOVA Ho

A

differences in sample means are explained by random sampling variation within groups

41
Q

ANOVA Ha

A

differences in sample means are due in part to real variation between groups, because at least one group comes from a different population

42
Q

F test

A

variation between groups/ variation within groups

43
Q

F test statistic

A
  • a ratio of any two variances
  • F=1 means the variances are no different
  • F is not =1 means the variances are different
44
Q

If ANOVA Ho is true we usually see…

A

F= 1 or F>1

45
Q

If ANOVA Ho is false we usually see..

A

F>1 or F>>1

46
Q

How much bigger must F be for an ANOVA?

A
  1. ANOVA is always a right tailed test
  2. Like for t statistic, different F distributions and critical values exist for different degrees of freedom

47
Q

F distribution degrees of freedom

A

numerator: k-1
denominator: N-k

48
Q

Assumptions for ANOVA to be valid

A
  1. Normal distributions
  2. Sample standard deviations are roughly equal: Largest sd/ smallest sd

49
Q

SS(total) total sum of squares

A

is a measure of the total variation (around the grand mean) in all the sample data combined

50
Q

SS(model)

A

=SS(groups) the variation between sample means, weighted by sample size

51
Q

SS(within groups)

A

=SS(error) =SS(residuals)

variability common to all the populations being considered

52
Q

SS(total) =

A

SS(model) + SS(residuals)

53
Q

MS (something) =

A

Average variation (mean square). It gives us a measure of relative variation, allowing us to compare variation in different parts of a model
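
The sums of squares, mean squares, and F ratio from the preceding cards can be sketched together. This assumes each mean square is its sum of squares divided by the matching degrees of freedom (k - 1 and N - k); the function name and the data are hypothetical:

```python
from statistics import mean

def one_way_anova(groups):
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k, N = len(groups), len(all_values)
    # SS(total): variation of every observation around the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)
    # SS(model): variation between group means, weighted by group size
    ss_model = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # SS(residuals): variation of observations around their own group mean
    ss_resid = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_model = ss_model / (k - 1)   # numerator df: k - 1
    ms_resid = ss_resid / (N - k)   # denominator df: N - k
    return {"ss_total": ss_total, "ss_model": ss_model,
            "ss_resid": ss_resid, "F": ms_model / ms_resid}

# Hypothetical data: three groups of three observations
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print({name: round(value, 6) for name, value in result.items()})
```

For this data SS(model) is 26 and SS(residuals) is 6, which add up to SS(total) of 32, and F comes out to 13 (up to floating-point rounding).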

54
Q

Post hoc tests (multiple comparisons)

A

are mostly modified t tests that control the type 1 (familywise) error rate

55
Q

sampling variability

A

sample results change from sample to sample

56
Q

parameter

A

a number that describes the population

57
Q

statistic

A

a number that is computed from a sample

58
Q

statistical inference

A

inferring something about the population based on what is measured in the sample

59
Q

unbiased estimator

A

x̄ is an unbiased estimator for u, and p^ is an unbiased estimator for p, when the sampling distribution of the statistic is centered exactly at the value of the population parameter

60
Q

census

A

sample the whole population

61
Q

margin of error (m)

A

represents the maximum estimation error for a given level of confidence

62
Q

statistical hypothesis testing

A

assessing evidence provided by the data in favor or against some claim about the population

63
Q

p value <0.05

A

reject Ho and accept Ha

results are statistically significant

64
Q

p value >0.05

A

can’t reject Ho and can’t support Ha

results are not statistically significant

65
Q

test statistic

A

a measure of how far the sample proportion is from the null value po, the value that the null hypothesis claims for p

66
Q

two independent samples

A

comparing two means

67
Q

matched pairs

A

paired t test

samples are dependent

68
Q

1 way ANOVA

A

comparing more than two means that are independent of each other

69
Q

repeated measures ANOVA

A

comparing more than two means that are dependent on each other

70
Q

three types of t-tests

A

1-sample
2-independent samples
2-dependent samples

71
Q

independent samples

A

the individuals of one sample are not meaningfully connected to those of another sample
-two randomly selected groups

72
Q

matched pairs (dependent) samples

A

the observations of one sample are somehow paired or related to those of another sample
-pre/post -twins -parents &children

73
Q

homoscedasticity

A

assumes equal variance in our 2 samples

74
Q

2 scenarios for dependent sample

A
  1. repeated measures
  2. matched pairs

75
Q

parameter

A

A characteristic of a population

76
Q

statistic

A

A characteristic of a sample

77
Q

sampling variability

A

Multiple samples taken from the same population will vary from each other due to chance events.

78
Q

sampling distribution

A

The shape, center and spread of the values taken by all of the samples of a certain size taken from a single population

79
Q

Proportion

A

The probability that a member of a population has a certain characteristic. Proportions are based on countable, categorical data: proportion = the number of individuals that display a characteristic / the total number of individuals.

80
Q

Standard Deviation of Sample Proportions (AKA standard error)

A

When sample proportions are normally distributed they will vary from the population proportion with a standard error

81
Q

Statistical Inference

A

The process of inferring something about a population based on something known about a sample

82
Q

Confidence

A

In the context of interval estimation, “confidence” is the probability that our interval actually captures the population parameter. Confidence is, unfortunately, inversely related to the precision of our interval: a narrower (more precise) interval carries less confidence.

83
Q

Margin of Error

A

The maximum amount of error we give to a point estimate, and which is used to build a confidence interval. The margin of error size is influenced by two things: 1) our confidence, which in turn determines a z score, and 2) the sampling variation, which is determined by a standard error

84
Q

Estimating the sample size needed to create a specific CI

A

This is something researchers do in the planning stages of research, and it can be used to justify a certain amount of money in a research proposal. Simply solve the margin of error equation for n. The researcher must also decide on a confidence level (see above), choose an acceptable margin of error, and plug in a best-guess standard deviation taken from previous research or, in the case of proportions, use .25 as a conservative estimate of p*(1-p). Note: whenever n is not a whole number, round up.
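
Solving the margin of error equation for n can be sketched as below, assuming the z-based margin of error and a two-sided interval. The function names, the 95% default, and the example inputs are hypothetical:

```python
import math
from statistics import NormalDist

def sample_size_for_mean(margin, sd_guess, confidence=0.95):
    """n such that z * sd / sqrt(n) <= margin, rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sd_guess / margin) ** 2)

def sample_size_for_proportion(margin, confidence=0.95, p_times_q=0.25):
    """n for a proportion; 0.25 is the conservative value of p * (1 - p)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p_times_q / margin ** 2)

# Hypothetical planning inputs
print(sample_size_for_mean(margin=2, sd_guess=10))  # 97
print(sample_size_for_proportion(margin=0.03))      # 1068
```

Rounding up with `math.ceil` matches the note above: a fractional n is always rounded to the next whole number so the margin of error is not exceeded.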

85
Q

statistical hypothesis testing

A

Assessing evidence provided by the data in favor of, or against some claim about the population.

86
Q

4 steps to hypothesis testing

A

1) Stating the claims
a. Claim 1 (the null hypothesis, Ho): The mean or proportion is equal to some value.
b. Claim 2 (alternate hypothesis, Ha): The mean or proportion is less than, or is greater than, or is not equal to some value.
2) Choosing a sample and collecting data
3) Assessing the evidence. Calculating the probability of observing a sample statistic at least as extreme as the one observed, if the null hypothesis (claim 1) is true and the alternate hypothesis is false.
4) Making conclusions. Choosing whether to reject, or fail to reject the null hypothesis (claim 1), or to support, or fail to support the alternate claim (claim 2)

87
Q

assumptions of the independent sample t-test

A
  1. The 2 samples are independent
  2. The distribution of the response Y in both populations is normal
  3. Both samples are random
  4. The two populations being compared should have similar variances. If the sample sizes of the 2 groups are equal, the t-test is robust to the presence of unequal variances.
88
Q

Assumptions of the matched pairs t-test

A
  1. The sample data consist of matched pairs
  2. Both samples are simple random samples
  3. The number of matched pairs is > 30 and/or the pairs of the values have differences that are normally distributed.