Exam 2 Flashcards

1
Q

Practices that lead to misleading graphs

A
  1. Truncated graphs
  2. Improper scaling

2
Q

What is a truncated graph? what precaution should be taken with them?

A

A graph whose vertical axis does not start at 0, which causes the bars to be out of proportion. The illustrator should include a special symbol to signal that the graph is truncated

3
Q

Where does improper scaling occur the most?

A

pictograms

4
Q

Guidelines for constructing effective graphs

A
  1. Title and axes labels
  2. Start vertical axis at 0 if possible
  3. Use caution with figures and pictograms
  4. If variables differ greatly, consider another graph or plotting relative sizes
  5. Use simplicity and clarity
5
Q

Parts of a graph analysis

A
  1. purpose of graph
  2. are results observational or experimentally obtained
  3. what variable is measured and is it quantitative or categorical
  4. what type of data display?
  5. Can SOCS be used to describe the data if it’s numerical
  6. Is data displayed correctly and is the graph misleading?
6
Q

explanatory variable

A

variable that is manipulated/experimented with

7
Q

response variable

A

variable that measures the outcome of interest

8
Q

lurking variable

A

unobserved variable that influences the association between explanatory and response variables and is associated with both of those variables

9
Q

Designed experiment

A

An experiment where researchers impose treatments and controls. These can help establish causation

10
Q

Observational study

A

A study where researchers observe characteristics and take measurements without imposing treatments; these can only reveal association or correlation

11
Q

Advantages of experiments

A
  1. Reduces chance of lurking variables affecting results
  2. Effect of an explanatory variable on a response variable is more accurately determined, it is easier to adjust for lurking variables
  3. best method for determining causality
12
Q

sampling frame

A

a list of all members of a population

13
Q

sampling design

A

method used to obtain a sample

14
Q

random sampling

A

employs a random device to select a sample, each member of a population has an equal chance of being selected for the sample

15
Q

Simple random sample

A

(SRS) each possible sample of a given size has the same chance of being selected, can be done with or without replacement.

16
Q

What is the difference when SRS is performed with replacement vs. without replacement?

A

With replacement: a member of a population can be chosen more than once

Without replacement: a member of the population can only be selected once
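
A minimal sketch of the two sampling modes, using Python's standard `random` module. The function name `srs` and the population of ID numbers are hypothetical, chosen only for illustration.

```python
import random

def srs(population, n, replace=False, seed=None):
    """Draw a simple random sample of size n from a list (illustrative helper)."""
    rng = random.Random(seed)
    if replace:
        # With replacement: the same member can be chosen more than once.
        return rng.choices(population, k=n)
    # Without replacement: each member can be selected at most once.
    return rng.sample(population, n)

students = list(range(1, 21))            # hypothetical population of 20 ID numbers
without_repl = srs(students, 5, replace=False, seed=1)
with_repl = srs(students, 5, replace=True, seed=1)
```

A sample drawn without replacement never contains duplicates; a sample drawn with replacement may.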

17
Q

Margin of error

A

Gives a range of plausible values for the population parameter, helps you determine how accurate results are, denoted by E, represents precision at a confidence level, half the width/length of a confidence interval

18
Q

How to find the range of plausible values using a margin of error

A

Add and subtract the margin of error from the middle value

19
Q

Approximate margin of error formula

A

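E ≈ 1/√n, where n is the sample size (a rough approximation for a proportion at about the 95% confidence level)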
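
As a rough illustration, the approximation and the "add and subtract" step from the previous card can be sketched as follows. The function names, the sample size of 400, and the estimate of 0.60 are made-up values for the example.

```python
import math

def approx_margin_of_error(n):
    """Approximate margin of error for a sample of size n: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)

def plausible_range(estimate, margin):
    """Subtract and add the margin of error around the middle value."""
    return estimate - margin, estimate + margin

n = 400                                  # hypothetical sample size
E = approx_margin_of_error(n)            # 1 / sqrt(400) = 0.05
low, high = plausible_range(0.60, E)     # hypothetical sample proportion of 0.60
```

With 400 respondents the approximate margin of error is 5 percentage points, so a sample proportion of 0.60 gives plausible values from about 0.55 to 0.65.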

20
Q

Potential sources of bias in surveys (just a list of the types, not definitions)

A
  1. sampling bias
  2. nonresponse bias
  3. response bias
21
Q

Sampling bias

A

Bias that occurs in surveying when the sampling method tends to obtain non-representative samples; includes undercoverage and overcoverage

22
Q

Undercoverage

A

occurs when the sampling frame does not represent parts of a population: some portion(s) of the population are not sampled or get smaller representation than they have in the population

23
Q

Overcoverage

A

Occurs when members that are not in the population of interest are included in the sample

24
Q

Nonresponse bias

A

Bias that occurs in surveying when sampled subjects can’t be reached or refuse to participate, including when those who respond do not respond to certain questions resulting in missing data.

25
Q

Response bias

A

Bias that occurs in surveying when the wording of a question is confusing, the question is asked in a misleading way, or subjects lie because they think their true response is socially unacceptable

26
Q

LIST of poor ways to sample

A
  1. Convenience sample
  2. Volunteer sample
  3. Large, non-representative sample
27
Q

Convenience sampling

A

a poor method of sampling, includes individuals who are easy to sample and therefore, may not represent the whole population

28
Q

Volunteer sample

A

a poor method of sampling, most common type of convenience sample, difficult to define sampling frame, may not represent the population because people who volunteer tend to have stronger opinions about the issue

29
Q

Large non-representative sample

A

a poor method of sampling, sample size doesn’t matter if it’s not representative of the population

30
Q

Questions to be asked when assessing the validity of surveys

A
  1. How was the sample selected?
  2. Sample size?
  3. Nonresponse rates?
  4. How are the questions worded: how many, confusing, misleading, controversial?
  5. Who sponsored the study?
31
Q

treatment group

A

group that receives the treatment or experimental condition

32
Q

placebo

A

a “fake” treatment that looks just like the treatment being tested, ensures that treatments appear the same to the subjects so that control subjects don’t know they are in the control group

33
Q

placebo effect

A

subjects treated with a placebo sometimes improve

34
Q

single blind

A

subjects don’t know which groups they’re in

35
Q

double blind

A

subjects and data collectors don’t know which group the subjects are in

36
Q

Perks of randomization

A

eliminates bias, balances the groups on variables that may affect the groups, both known and unknown by researchers

37
Q

statistically significant

A

when differences in an experiment are larger than the differences that result from randomization alone

38
Q

Four principles of good experimental design

A
  1. control
  2. randomization
  3. replication
  4. Blocking (optional)
39
Q

experimental units

A

the individuals or objects in the study to which treatments are applied (called subjects when they are people)

40
Q

Things that can go wrong in an experiment

A
  1. making generalizations out of convenience
  2. sample isn’t representative
  3. no volunteers
  4. carefully evaluate displays
41
Q

Systematic sampling characteristics

A
  1. Less expensive
  2. order of a list can not be associated in any way with the responses sought
  3. beware of confounding variables
42
Q

When is cluster random sampling preferred?

A

when a reliable sampling frame is not available or when the cost of an SRS is too high

43
Q

Cluster random sampling

A
  1. Split the population into representative, heterogeneous groups called clusters
  2. Use random sampling to select several clusters
  3. Perform a census of each selected cluster
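
The three steps above can be sketched as follows. This is only an illustration: the function name `cluster_sample` and the dormitory data are hypothetical.

```python
import random

def cluster_sample(clusters, k, seed=None):
    """clusters maps a cluster name to the list of its members (step 1 is done by the caller)."""
    rng = random.Random(seed)
    # Step 2: randomly select k whole clusters.
    chosen = rng.sample(sorted(clusters), k)
    # Step 3: census of each selected cluster, i.e. keep every member.
    return chosen, [member for name in chosen for member in clusters[name]]

dorms = {"North": [1, 2, 3], "South": [4, 5, 6], "East": [7, 8, 9], "West": [10, 11, 12]}
chosen_dorms, cluster_members = cluster_sample(dorms, 2, seed=0)
```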
44
Q

Stratified random sampling

A
  1. Stratify the population into homogeneous groups (strata)
  2. SRS is used to choose members from each stratum
  3. Combine the groups from each stratum to form your sample
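
A minimal sketch of these steps, assuming the strata are already known. The function name `stratified_sample`, the class-year strata, and the ID numbers are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """strata maps a stratum name to the list of its members."""
    rng = random.Random(seed)
    sample = []
    for name in sorted(strata):
        # SRS (without replacement) within each stratum.
        sample.extend(rng.sample(strata[name], n_per_stratum))
    # The combined draws form the final sample.
    return sample

class_years = {"freshman": list(range(0, 10)), "senior": list(range(100, 110))}
stratified = stratified_sample(class_years, 3, seed=0)
```

Unlike cluster sampling, every stratum contributes members to the sample.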
45
Q

Multistage sampling

A

sampling schemes combining several methods

46
Q

Types of observational studies

A

Retrospective observational studies: look into the past

Case-control study: a type of retrospective study, often used in medical research. Subjects who have the response outcome of interest are referred to as cases and subjects who have the other response outcome are referred to as controls

Prospective observational study: looks into future, aka cohort studies

Cross-sectional: sample survey of a cross section of a population in current time

47
Q

Experimental design diagrams

A

enables a quick comparison of results, can use only number of groups for the explanatory variable

48
Q

Purpose of matching and blocking

A

These are two ways researchers can balance the effects of potential lurking variables

49
Q

Matching

A

used in observational studies, attempts to achieve the balance that randomization achieves, subjects are paired based on similarities in variables not being studied, includes case-control studies

50
Q

Matched-pairs

A

used in experiments, subjects paired with themselves, each treatment is observed for each subject, pre test/ post test/ cross-over designs

51
Q

Blocking

A

used in experiments, groups similar experimental units together, randomized, reduce potential bias, treatments are usually randomly assigned within a block

52
Q

What does a statistic describe?

A

a sample

53
Q

Statistical inference

A

uses sample data to draw conclusions about a population, involves probability calculations on a sampling distribution of a statistic, requires random sampling or randomization

54
Q

point estimate

A

single number, representing our best guess for the parameter, for any particular parameter, there are several possible point estimates depending on the sample selected

55
Q

Interval estimate

A

a range of plausible values for the parameter, consists of a point estimate and margin of error

56
Q

Properties of point estimates

A
  1. unbiased
  2. small standard deviation
  3. likely precision
  4. high confidence level
57
Q

standard error

A

abbreviated SE; the estimated standard deviation of a sampling distribution, computed from sample statistics; the formula is different for means and proportions

58
Q

Z-scores for 0.90, 0.95 and 0.99 confidence levels

A
  1. 90: 1.645
  2. 95: 1.96
  3. 99: 2.576
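
These critical values can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is made up for this example.

```python
from statistics import NormalDist

def z_critical(confidence):
    """z-score that leaves (1 - confidence) / 2 in each tail of the standard normal curve."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

critical_values = {level: round(z_critical(level), 3) for level in (0.90, 0.95, 0.99)}
```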
59
Q

Steps for constructing a confidence interval for one population proportion

A
  1. Check assumptions
  2. Calculate confidence interval
  3. Interpret confidence interval
60
Q

Confidence interval assumptions

A
  1. Data is obtained by randomization
  2. Large enough sample size: at least 15 successes and 15 failures
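
A hedged sketch of the check-then-calculate steps for a one-proportion z-interval. Randomization cannot be checked in code, so only the sample-size assumption is tested. The function name and the counts (120 successes out of 200) are hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_ci(successes, n, confidence=0.95):
    """Confidence interval for one population proportion: p_hat +/- z * SE."""
    failures = n - successes
    # Check the sample-size assumption: at least 15 successes and 15 failures.
    if successes < 15 or failures < 15:
        raise ValueError("need at least 15 successes and 15 failures")
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)      # standard error of p_hat
    margin = z * se                              # margin of error E
    return p_hat - margin, p_hat + margin

ci_low, ci_high = one_proportion_ci(successes=120, n=200)
```

For these made-up counts the interval is roughly 0.53 to 0.67, which would then be interpreted in the context of the study.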

61
Q

What determines length of a confidence interval?

A

the precision of the estimate(wider=less precise)

62
Q

Relationship between confidence interval and precision

A

indirect (inverse): a higher confidence level gives a wider, less precise interval

63
Q

when do we use a t-distribution?

A

when we estimate the population standard deviation with the sample standard deviation

64
Q

Total area of z and t distributions

A

1 for both

65
Q

Curve shape of z and t distributions

A

extends indefinitely in both directions and approaches the horizontal axis asymptotically for both

66
Q

distribution of z and t curves

A

z is normally distributed and t is not normally distributed

67
Q

Mean and standard deviation of z and t distributions

A

Mean=0 for both. standard deviation=1 for z and is greater than 1 for t

68
Q

What causes variation in values on z and t curves?

A

Z: caused solely by variation of sample means

t: variation of sample means and sample standard deviations

69
Q

How does sample size affect distribution in z and t curves?

A

Z: same distribution regardless of size

t: different distribution for each sample size, identified by degrees of freedom or n-1

70
Q

One mean t-interval assumptions

A
  1. Data obtained randomly
  2. Normal population or large enough sample size

71
Q

What is useful about t-intervals in terms of assumptions?

A

it is robust to moderate violations of the normality assumption

72
Q

What may cause a t-interval to not work even though it is robust?

A

outliers since sample mean and sample standard deviation are both susceptible to outliers

73
Q

one mean t-interval procedure

A
  1. Check assumptions
  2. Calculate confidence interval
  3. Interpret CI in context
74
Q

How is an outlier handled in good statistical procedure

A

Inferential procedures are run with and without outliers

75
Q

What affects margin of error?

A

Margin of error is affected directly by standard error and indirectly by sample size, because sample size affects standard error

76
Q

sample size factors

A
  1. Margin of error
  2. Desired precision
  3. Confidence level
  4. Variability in the data
  5. Cost of obtaining a sample
77
Q

Sample size formula and how to use it when determining sample size for p

A

n = p(1-p)z^2 / E^2

Use the estimate of the proportion that you have. If you don’t have one, use 0.50 because it gives the largest possible sample size, so the sample will be large enough whatever the true proportion is
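
A small sketch of this formula, rounding up to the next whole subject. The function name and the 3-point margin of error are illustrative assumptions.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p_guess=0.5):
    """n = p(1 - p) z^2 / E^2, rounded up; p_guess defaults to the conservative 0.50."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = p_guess * (1 - p_guess) * z ** 2 / margin ** 2
    return math.ceil(n)

n_conservative = sample_size_for_proportion(0.03)             # no prior estimate: use 0.50
n_with_guess = sample_size_for_proportion(0.03, p_guess=0.2)  # hypothetical prior estimate
```

Using 0.50 yields the largest n because p(1 - p) is at its maximum there, so any other guess gives a smaller required sample.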

78
Q

relationship between confidence interval and sample size

A

direct when precision is held constant

79
Q

Sample size formula for μ

A

n = σ^2 z^2 / E^2, where σ is the standard deviation
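
The same calculation for a mean, as a sketch. The standard deviation of 15 and the margin of error of 2 are made-up inputs, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """n = (z * sigma / E)^2, rounded up to a whole number of observations."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)

n_mean = sample_size_for_mean(sigma=15, margin=2)
```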

80
Q

limitation of t-procedures for one mean

A

Valid for any n, use caution for small sample sizes, does not work with outliers or highly skewed data

81
Q

limitation of CI for one proportion

A

requires at least 15 successes and 15 failures; if the sample size is too small, the sampling distribution of the sample proportion will not be approximately normal

82
Q

1-proportion plus-4 z-interval

A

CI formula is still valid if we use it after adding 2 to the original number of successes and failures, adding 4 total to the sample size. This moves the sample proportion towards 1/2, used with confidence levels over 90% and sample sizes over 10
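
A sketch of the plus-four adjustment described above, using made-up counts (3 successes in 20 trials). The function name `plus_four_ci` is hypothetical.

```python
import math
from statistics import NormalDist

def plus_four_ci(successes, n, confidence=0.95):
    """Add 2 successes and 2 failures, then apply the usual one-proportion formula."""
    n_adj = n + 4
    p_tilde = (successes + 2) / n_adj            # adjusted proportion, pulled towards 1/2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p_tilde * (1 - p_tilde) / n_adj)
    return p_tilde - margin, p_tilde + margin

pf_low, pf_high = plus_four_ci(successes=3, n=20)
```

Here the raw proportion 3/20 = 0.15 becomes 5/24 (about 0.21) after the adjustment, which is closer to 1/2.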

83
Q

Things not to say about confidence intervals

A
  1. Suggest that a parameter varies
  2. Making claims about a sample proportion
  3. asserting that the population proportion cannot be outside your interval
  4. Overgeneralizing results
84
Q

CI flaws

A
  1. Not all confidence intervals will capture the true parameter value
  2. Whole interval is not treated equally
  3. Margin of error is not small enough to be useful
  4. assumptions are violated
85
Q

Bootstrap

A

Allows you to construct confidence intervals when it is difficult to find the SE or when the usual CI formula doesn’t work well. It is a simulation method in which the population is viewed as many, many copies of the original sample (data distribution)

86
Q

Bootstrap method

A
  1. Resample the original sample with replacement to produce a bootstrap sample, and compute the point estimate of the parameter
  2. Repeat for a very large number of resamples of n observations from the original data distribution (at least 10,000), computing the point estimate for each
  3. Create a distribution of the point estimates to produce the bootstrap distribution
87
Q

Methods of using a bootstrap method to estimate a CI

A
  1. Standard error method

2. Percentile method

88
Q

Standard error method

A

estimate SE by using standard deviation of bootstrap distribution

89
Q

Percentile method

A

use the percentiles that bound the middle area of the bootstrap distribution (usually the middle 95%) as the endpoints of the confidence interval
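
A minimal bootstrap sketch for a sample mean, covering both the percentile method and the standard error method. The data values and function names are hypothetical, and the percentile indexing is one simple convention among several.

```python
import random
from statistics import NormalDist, mean, stdev

def bootstrap_means(sample, reps=10_000, seed=0):
    """Resample with replacement reps times and record each resample's mean."""
    rng = random.Random(seed)
    n = len(sample)
    return [mean(rng.choices(sample, k=n)) for _ in range(reps)]

def percentile_ci(boot, confidence=0.95):
    """Percentile method: cut off (1 - confidence) / 2 of the bootstrap distribution in each tail."""
    ordered = sorted(boot)
    tail = (1 - confidence) / 2
    return ordered[int(tail * len(ordered))], ordered[int((1 - tail) * len(ordered)) - 1]

def standard_error_ci(sample, boot, confidence=0.95):
    """Standard error method: estimate SE by the standard deviation of the bootstrap distribution."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = stdev(boot)
    return mean(sample) - z * se, mean(sample) + z * se

data = [12, 15, 9, 20, 14, 11, 17, 13, 16, 10]   # hypothetical original sample
boot = bootstrap_means(data)
pct_low, pct_high = percentile_ci(boot)
se_low, se_high = standard_error_ci(data, boot)
```

For a roughly symmetric bootstrap distribution the two intervals should be similar; they can differ noticeably when the distribution is skewed.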

90
Q

Significance tests

A

aka hypothesis test, uses sample data to decide between two competing claims about a population characteristic, uses probability to determine the plausibility of a parameter value, considers evidence based on sample data

91
Q

Two possible conclusions of a significance test

A

Reject Ho or fail to reject Ho

92
Q

Null hypothesis

A

Ho, always specifies a single value for the parameter, it is a claim about a population parameter that is initially assumed to be true

93
Q

Alternative hypothesis

A

Ha, depends on the purpose of the hypothesis test; the number appearing in the alternative hypothesis is identical to the number appearing in the null hypothesis

94
Q

Two-Tailed test

A

Test to determine whether a population proportion is different from a specified value

95
Q

left tailed test

A

test to determine whether a population proportion is less than a specified value

96
Q

right tailed test

A

test to determine whether a population proportion is greater than specified value

97
Q

Assumptions/conditions of CLT

A
  1. representative sample
  2. independent sample values
  3. sample size is sufficiently large
  4. We have sampled less than 10% of the population
98
Q

steps of a significance test

A
  1. Assumptions
  2. Define parameter and hypotheses before gathering or looking at data
  3. Calculate a test statistic
  4. Obtain a p-value
  5. State a conclusion in context
99
Q

steps of a significance test

A
  1. check assumptions
  2. hypotheses/significance level
  3. compute test stat
  4. Find the p-value
  5. State the conclusions: report test stat and p-value, interpret results in context
100
Q

Other names for a significance test for a one population proportion

A

one-proportion z-test, one-sample z-test for a population proportion

101
Q

Assumptions for a one population proportion significance test

A
  1. Variable is categorical
  2. data is obtained randomly
  3. sample size is sufficiently large
102
Q

P-value definition

A

probability of obtaining a test statistic value at least as extreme as the one observed, if the null hypothesis is true
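
To make the definition concrete, here is a sketch of the p-value calculation for a one-proportion z-test, with the tail chosen by the alternative hypothesis. The function name, the `alternative` labels, and the counts (60 successes in 100 trials against a null value of 0.5) are assumptions for illustration.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return the z test statistic and p-value for H0: p = p0."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)   # standard error computed under H0
    z = (p_hat - p0) / se0
    cdf = NormalDist().cdf
    if alternative == "less":            # left-tailed test
        p_value = cdf(z)
    elif alternative == "greater":       # right-tailed test
        p_value = 1 - cdf(z)
    else:                                # two-tailed test: both tails count as extreme
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value

z_stat, p_two_sided = one_proportion_z_test(60, 100, 0.5)
_, p_right = one_proportion_z_test(60, 100, 0.5, alternative="greater")
```

In this example z is 2.0; the two-sided p-value is about 0.046 and the right-tailed p-value is half of that.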

103
Q

other names for a significance test for one population mean

A

one-mean t-test, one-sample t-test for a population mean

104
Q

Assumptions for a significance test for one population mean

A
  1. Variable is quantitative
  2. data is obtained randomly
  3. Population distribution is approximately normal, or the sample is large enough for the CLT to apply
105
Q

What does a p-value that is greater or less than the significance level indicate?

A

P-value less than significance level: unusual result, reject null hypothesis

P-value greater than significance level: sample data is not unusual, fail to reject null

106
Q

Relationships between conclusions of two-sided significance tests and CI’s

A

they are consistent

107
Q

What must be true if a two-sided test says you can reject the hypothesis that the parameter = 0?

A

then 0 is not in the corresponding confidence interval

108
Q

What must be true if the p-value is >0.05 in a two-sided test?

A

a 95% confidence interval will contain the H0 value

109
Q

What must be true if the p-value is <0.05 in a two-sided test?

A

a 95% confidence interval will not contain the H0 value

110
Q

In what situation will a confidence interval definitely not contain H0?

A

if the p-value is less than the significance level α in a two-sided test, a (1-α)x100% CI does not contain the H0 value

111
Q

with what samples will a one-mean t-test work best?

A

large samples

112
Q

What can cause a one-mean t-test to not work?

A

Skewed distributions and outliers

113
Q

relationship between sample size and p-value

A

indirect

114
Q

What causes decisions in significance tests to have uncertainty?

A

sampling variability

115
Q

4 possible results in the decision of a significance test?

A
  1. do not reject true H0
  2. Reject true H0
  3. Do not reject false H0
  4. Reject false H0
116
Q

Type 1 error

A

error when a true H0 is rejected

117
Q

Type 2 error

A

error when a false H0 is not rejected

118
Q

What is the only sure way to prevent Type 1 and 2 errors?

A

a census

119
Q

What is the biggest issue with type 1 and 2 errors?

A

you have no way of knowing you committed one before consequences are experienced

120
Q

What does the alpha symbol stand for? what rule comes with it?

A

α = the probability of committing a type 1 error, the significance level of a hypothesis test

the significance level must be set before running the test

121
Q

What does the Beta symbol stand for? What are some of its characteristics?

A

β = probability of committing a type 2 error

this value is usually not known, has an inverse relationship with alpha

122
Q

What determines the reasoning behind setting a significance level?

A

trying to balance the risk of committing a type 1 or type 2 error, which consequences are more serious is taken into account

123
Q

Is a CI or significance test more informative? why or why not?

A

A CI is more informative because it displays the entire set of plausible values, while a significance test just tells you whether a specific value for H0 is plausible

124
Q

What must you always do after a null hypothesis is rejected?

A

use a CI to estimate the value of the parameter, then compare the hypothesized value to the CI values to determine how far the parameter is from the hypothesized value

125
Q

What if the parameter value is very close to the hypothesized value of a significance test?

A

the result may not be practically significant

126
Q

Two-proportion z-test/z-interval assumptions

A
  1. Variables are categorical
  2. Independent random samples from both groups
  3. At least 10 successes and 10 failures in each group
127
Q

What if 0 is in the CI for proportion intervals?

A

it is plausible that P1=P2

128
Q

What if 0 is not in the CI for proportion intervals?

A

If all values in the CI are positive, P1>P2. If all values in the CI are negative, P1<P2
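
The sign-based reading of a two-proportion z-interval for p1 - p2 can be sketched like this. The function names and the counts are hypothetical, and the assumption checks are omitted for brevity.

```python
import math
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Confidence interval for the difference p1 - p2 between two population proportions."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

def interpret(low, high):
    """Read the interval by its signs."""
    if low > 0:
        return "p1 > p2"                 # every plausible difference is positive
    if high < 0:
        return "p1 < p2"                 # every plausible difference is negative
    return "p1 = p2 is plausible"        # 0 is inside the interval

clear_gap = interpret(*two_proportion_ci(60, 100, 40, 100))
no_clear_gap = interpret(*two_proportion_ci(52, 100, 48, 100))
```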

129
Q

What does the magnitude of a CI tell you? What do small magnitudes indicate?

A

how large the true difference is, small magnitude equals a small difference in practical terms

130
Q

Tests for comparing two proportions with a categorical response

A
  1. Two-proportion z-test
  2. Two-proportion z-interval

131
Q

Tests for comparing two means with a quantitative response

A
  1. Independent samples t-test
  2. Independent samples t-interval

132
Q

independent samples t-test/t-interval assumptions

A
  1. Quantitative response variable for both groups
  2. independent random samples
  3. Normal or large enough samples, no outliers
133
Q

Pooled t-test assumptions

A
  1. Quantitative response variable
  2. independent random samples
  3. normal population or large enough sample
  4. equal standard deviations
134
Q

Pooled t-interval assumptions

A
  1. quantitative response variable for two groups
  2. independent random samples
  3. normal or large enough distribution
  4. standard deviations are equal
135
Q

What purpose do BOTH pooled and non-pooled t-tests share?

A

both tests are used to compare the means of 2 populations based on independent random samples

136
Q

What is the purpose of using a pooled t-test? what is its drawback?

A

increases statistical power and decreases the chance of a type 2 error; its drawback is an increased chance of a type 1 error when it is used although the standard deviations are not equal

137
Q

simple random paired sample

A

each possible paired sample is equally likely to be selected

138
Q

Purpose of using dependent samples

A

used when members of 2 populations have natural pairings, removes extraneous sources of variation, more likely to detect a difference between population means when such a difference exists

139
Q

Paired t-test/t-interval + assumptions

A

d = difference within each pair
mean of d = μ1-μ2, the difference between the two population means

Assumptions

  1. Simple random paired sample
  2. Normal distribution of differences or large enough sample for what is believed to be the shape of the distribution of the differences
140
Q

assumptions for just a confidence interval

A
  1. Sample is randomly obtained
  2. Large enough sample: at least 15 successes and 15 failures

141
Q

F distribution

A

there are infinitely many F distributions, each identified by its degrees of freedom; each has a numerator and a denominator degrees of freedom, total area under curve=1, curve is right skewed

142
Q

What does ANOVA stand for?

A

Analysis of variance

143
Q

How does ANOVA work?

A

ANOVA compares means of a variable for populations from a classification by a categorical explanatory variable(factor) and level(possible values of a factor)

144
Q

What variation does ANOVA compare?

A

It compares variation between samples to variation within samples

145
Q

What information can you get from comparing “between” and “within” variation?

A

If “between” is much larger than “within”, there is evidence that not all population means are equal

146
Q

total sum of squares

A

measures total variation, can be partitioned into between(SSTR) and within (SSE) samples. SST=SSTR+SSE

147
Q

Assumptions for one way ANOVA

A
  1. data is randomly obtained
  2. independent samples
  3. normal or large enough samples(robust)
  4. Equal population standard deviations
    - Largest sample standard deviation / smallest sample standard deviation < 2
    - robust provided sample sizes are equal
148
Q

How to find F, df1 and df2 in ANOVA analysis

A

F=MSTR/MSE
df1=g-1
df2=N-g
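
As a worked sketch, the F statistic can be computed directly from the sums of squares, taking g as the number of groups and N as the total number of observations (a reading of the symbols that the card does not spell out). The function name and the three small groups are made up.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df1, df2) where F = MSTR / MSE."""
    g = len(groups)                                   # number of groups (levels of the factor)
    N = sum(len(grp) for grp in groups)               # total number of observations
    grand_mean = mean(x for grp in groups for x in grp)
    # Between-samples variation (treatment sum of squares).
    sstr = sum(len(grp) * (mean(grp) - grand_mean) ** 2 for grp in groups)
    # Within-samples variation (error sum of squares).
    sse = sum((x - mean(grp)) ** 2 for grp in groups for x in grp)
    df1, df2 = g - 1, N - g
    mstr, mse = sstr / df1, sse / df2
    return mstr / mse, df1, df2

samples = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]           # hypothetical data for three groups
F, df1, df2 = one_way_anova_f(samples)
```

For this data SSTR = 26 and SSE = 6, so MSTR = 13, MSE = 1, and F = 13 with df1 = 2 and df2 = 6.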

149
Q

post hoc analyses

A

Analyses conducted after the initial analysis to see which means actually differ, compare each mean as a pair using their confidence intervals

150
Q

Why is random sampling important?

A

it ensures samples are independent

151
Q

What sample size is large enough to ensure normality?

A

30

152
Q

how to determine which confidence interval to use for a one sided significance test?

A

1-2(significance level)

153
Q

How do we know which confidence interval to use for a two sided test?

A

confidence level = 1-significance level

154
Q

Difference between lurking and confounding variables

A

lurking variables are not accounted for in the study, confounding variables are