Exam 2 Flashcards

Question 1

Q

Practices that lead to misleading graphs

Answer

A

1. truncated graphs

2. improper scaling

Question 2

Q

What is a truncated graph? What precaution should be taken with one?

Answer

A

A graph where the vertical axis does not start at 0, which causes bars to be out of proportion. The illustrator should include a special symbol to signify that the graph is truncated

Question 3

Q

Where does improper scaling occur the most?

Answer

A

pictograms

Question 4

Q

Guidelines for constructing effective graphs

Answer

A

Title and axes labels
Start vertical axis at 0 if possible
Use caution with figures and pictograms
If variables differ greatly, consider another graph or plotting relative sizes
Use simplicity and clarity

Question 5

Q

Parts of a graph analysis

Answer

A

purpose of graph
are results observational or experimentally obtained
what variable is measured and is it quantitative or categorical
what type of data display?
Can SOCS be used to describe the data if it’s numerical
Is data displayed correctly and is the graph misleading?

Question 6

Q

explanatory variable

Answer

A

variable that is manipulated/experimented with

Question 7

Q

response variable

Answer

A

variable that measures the outcome of interest

Question 8

Q

lurking variable

Answer

A

unobserved variable that influences the association between explanatory and response variables and is associated with both of those variables

Question 9

Q

Designed experiment

Answer

A

An experiment where researchers impose treatments and controls. These can help establish causation

Question 10

Q

Observational study

Answer

A

A study where researchers observe characteristics and take measurements. These can only reveal association or correlation, not causation

Question 11

Q

Advantages of experiments

Answer

A

Reduces chance of lurking variables affecting results
Effect of an explanatory variable on a response variable is more accurately determined, it is easier to adjust for lurking variables
best method for determining causality

Question 12

Q

sampling frame

Answer

A

a list of all members of a population

Question 13

Q

sampling design

Answer

A

method used to obtain a sample

Question 14

Q

random sampling

Answer

A

employs a random device to select a sample, each member of a population has an equal chance of being selected for the sample

Question 15

Q

Simple random sample

Answer

A

(SRS) each possible sample of a given size has the same chance of being selected, can be done with or without replacement.

Question 16

Q

What is the difference when SRS is performed with replacement vs. without replacement?

Answer

A

With replacement: a member of a population can be chosen more than once

Without replacement: a member of the population can only be selected once
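
A minimal Python sketch of the two schemes, assuming a hypothetical population of ten numbered members. In the standard library, `random.sample` draws without replacement and `random.choices` draws with replacement.

```python
import random

random.seed(1)  # fixed seed so the illustration is repeatable

population = list(range(1, 11))  # hypothetical population: members 1..10

# Without replacement: each member can appear at most once.
without_replacement = random.sample(population, k=5)

# With replacement: the same member may be drawn more than once.
with_replacement = random.choices(population, k=5)

print(without_replacement)
print(with_replacement)
```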

Question 17

Q

Margin of error

Answer

A

Denoted by E; half the width/length of a confidence interval. Added to and subtracted from the point estimate, it gives a range of plausible values for the population parameter, so it helps you determine how accurate results are and represents precision at a given confidence level

Question 18

Q

How to find the range of plausible values using a margin of error

Answer

A

Add the margin of error to, and subtract it from, the middle value (the point estimate)

Question 19

Q

Approximate margin of error formula

Answer

A

1/(n)^(1/2)
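
The two cards above can be sketched together in Python. The sample size and sample proportion below are hypothetical values chosen for illustration, and the rule of thumb is assumed here to apply to a sample proportion at roughly 95% confidence.

```python
import math

def approx_margin_of_error(n):
    """Approximate margin of error 1/sqrt(n) for a sample of size n."""
    return 1 / math.sqrt(n)

def plausible_range(estimate, margin):
    """Return (lower, upper) by subtracting and adding the margin."""
    return estimate - margin, estimate + margin

n = 400        # hypothetical sample size
p_hat = 0.60   # hypothetical sample proportion (the middle value)

E = approx_margin_of_error(n)
low, high = plausible_range(p_hat, E)
print(E, low, high)
```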

Question 20

Q

Potential sources of bias in surveys (just a list of the types, not definitions)

Answer

A

sampling bias
nonresponse bias
response bias

Question 21

Q

Sampling bias

Answer

A

Bias that occurs in surveying when the sampling method tends to obtain non-representative samples, including undercoverage and overcoverage

Question 22

Q

Undercoverage

Answer

A

occurs when the sampling frame does not represent parts of a population; some portion(s) of the population are not sampled or get smaller representation than they have in the population

Question 23

Q

Overcoverage

Answer

A

Occurs when members that are not in the population of interest are included in the sample

Question 24

Q

Nonresponse bias

Answer

A

Bias that occurs in surveying when sampled subjects can’t be reached or refuse to participate, including when those who respond do not respond to certain questions resulting in missing data.

Question 25

Q

Response bias

Answer

A

Bias that occurs in surveying when the wording of a question is confusing, the question is asked in a misleading way, or subjects lie because they think their response is socially unacceptable

Question 26

Q

LIST of poor ways to sample

Answer

A

Convenience sample
Volunteer sample
Large, non-representative sample

Question 27

Q

Convenience sampling

Answer

A

a poor method of sampling; includes individuals who are easy to sample and therefore may not represent the whole population

Question 28

Q

Volunteer sample

Answer

A

a poor method of sampling, most common type of convenience sample, difficult to define sampling frame, may not represent the population because people who volunteer tend to have stronger opinions about the issue

Question 29

Q

Large non-representative sample

Answer

A

a poor method of sampling, sample size doesn’t matter if it’s not representative of the population

Question 30

Q

Questions to be asked when assessing the validity of surveys

Answer

A

How was the sample selected?
Sample size?
Nonresponse rates?
How are the questions worded-how many, confusing, misleading, controversial?
Who sponsored the study?

Question 31

Q

treatment group

Answer

A

group that receives the treatment or experimental condition

Question 32

Q

placebo

Answer

A

a “fake” treatment that looks just like the treatment being tested, ensures that treatments appear the same to the subjects so that control subjects don’t know they are in the control group

Question 33

Q

placebo effect

Answer

A

subjects treated with a placebo sometimes improve

Question 34

Q

single blind

Answer

A

subjects don’t know which groups they’re in

Question 35

Q

double blind

Answer

A

subjects and data collectors don’t know which group the subjects are in

Question 36

Q

Perks of randomization

Answer

A

eliminates bias, balances the groups on variables that may affect the groups, both known and unknown by researchers

Question 37

Q

statistically significant

Answer

A

when differences in an experiment are larger than the differences that result from randomization alone

Question 38

Q

Four principles of good experimental design

Answer

A

control
randomization
replication
Blocking(optional)

Question 39

Q

experimental units

Answer

A

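the individuals (people, animals, or objects) in the study to which treatments are applied; called subjects when they are people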

Question 40

Q

Things that can go wrong in an experiment

Answer

A

making generalizations out of convenience
sample isn’t representative
no volunteers
carefully evaluate displays

Question 41

Q

Systematic sampling characteristics

Answer

A

Less expensive
order of the list cannot be associated in any way with the responses sought
beware of confounding variables

Question 42

Q

When is cluster random sampling preferred?

Answer

A

when a reliable sampling frame is not available or when the cost of an SRS is too high

Question 43

Q

Cluster random sampling

Answer

A

Split the population into representative, heterogeneous groups called clusters
Use random sampling to select several clusters
Perform a census of each selected cluster
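
The three steps above can be sketched in Python. The cluster names and members are hypothetical, and the population is assumed to be already split into clusters (step 1).

```python
import random

random.seed(2)  # fixed seed so the illustration is repeatable

# Step 1 (assumed done): hypothetical population split into clusters.
clusters = {
    "block_A": ["a1", "a2", "a3"],
    "block_B": ["b1", "b2"],
    "block_C": ["c1", "c2", "c3", "c4"],
    "block_D": ["d1", "d2", "d3"],
}

# Step 2: randomly select several clusters (here, 2 of the 4).
chosen = random.sample(sorted(clusters), k=2)

# Step 3: census of each selected cluster, i.e. keep every member.
sample = [member for name in chosen for member in clusters[name]]
print(chosen, sample)
```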

Question 44

Q

Stratified random sampling

Answer

A

stratify the population into homogeneous groups (strata)
SRS is used to choose members from each stratum
Combine the groups from all strata to form your sample
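
A minimal Python sketch of these steps. The strata, their sizes, and the number drawn from each are hypothetical choices for illustration.

```python
import random

random.seed(3)  # fixed seed so the illustration is repeatable

# Hypothetical population already stratified by class year.
strata = {
    "freshman": [f"F{i}" for i in range(40)],
    "senior": [f"S{i}" for i in range(20)],
}

per_stratum = 5  # hypothetical number drawn from each stratum

# Take an SRS (without replacement) within each stratum, then combine.
sample = []
for name, members in strata.items():
    sample.extend(random.sample(members, k=per_stratum))

print(sample)
```

Unlike cluster sampling, every stratum contributes members to the final sample.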

Question 45

Q

Multistage sampling

Answer

A

sampling schemes combining several methods

Question 46

Q

Types of observational studies

Answer

A

Retrospective observational studies: look into the past

Case-control study: a type of retrospective study, often used in medical research. Subjects who have the response outcome of interest are referred to as cases and subjects who have the other response outcome are referred to as controls

Prospective observational study: looks into future, aka cohort studies

Cross-sectional: sample survey of a cross section of a population in current time

Question 47

Q

Experimental design diagrams

Answer

A

enables a quick comparison of results, can use only number of groups for the explanatory variable

Question 48

Q

Purpose of matching and blocking

Answer

A

These are two ways researchers can balance the effects of potential lurking variables

Question 49

Q

Matching

Answer

A

used in observational studies, attempts to achieve the balance that randomization achieves, subjects are paired based on similarities that are not being studied, includes case-control studies

Question 50

Q

Matched-pairs

Answer

A

used in experiments, subjects paired with themselves, each treatment is observed for each subject, pre test/ post test/ cross-over designs

Question 51

Q

Blocking

Answer

A

used in experiments, groups similar experimental units together, randomized, reduce potential bias, treatments are usually randomly assigned within a block

Question 52

Q

What does a statistic describe?

Question 53

Q

Statistical inference

Answer

A

uses sample data to draw conclusions about a population, involves probability calculations on a sampling distribution of a statistic, requires random sampling or randomization

Question 54

Q

point estimate

Answer

A

single number, representing our best guess for the parameter, for any particular parameter, there are several possible point estimates depending on the sample selected

Question 55

Q

Interval estimate

Answer

A

a range of plausible values for the parameter, consists of a point estimate and margin of error

Question 56

Q

Properties of point estimates

Answer

A

unbiased
small standard deviation
likely precision
high confidence level

Question 57

Q

standard error

Answer

A

abbreviated SE; an estimate of the standard deviation of the sampling distribution of a statistic, computed from sample data; the formula is different for means and proportions

Question 58

Q

Z-scores for 0.90, 0.95 and 0.99 confidence levels

Answer

A

90: 1.645
95: 1.96
99: 2.576
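
These critical values can be reproduced with the standard normal inverse CDF. A short sketch using `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is made up for this example.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z* for a given confidence level."""
    tail = (1 - confidence) / 2           # area in each tail
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_critical(level), 3))
```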

Question 59

Q

Steps for constructing a confidence interval for one population proportion

Answer

A

Check assumptions
Calculate confidence interval
Interpret confidence interval

Question 60

Q

Confidence interval assumptions

Answer

A

1. Data is obtained by randomization

2. Large enough sample size: at least 15 successes and 15 failures

Question 61

Q

What determines length of a confidence interval?

Answer

A

the precision of the estimate(wider=less precise)

Question 62

Q

Relationship between confidence interval and precision

Question 63

Q

when do we use a t-distribution?

Answer

A

when we estimate the population standard deviation with the sample standard deviation

Question 64

Q

Total area of z and t distributions

Answer

A

1 for both

Answer 63

A

extends indefinitely in both directions and approaches the horizontal axis asymptotically for both

Answer 64

A

z is normally distributed and t is not normally distributed

Answer 65

A

Mean=0 for both. standard deviation=1 for z and is greater than 1 for t

Answer 66

A

Z: caused solely by variation of sample means

t: variation of sample means and sample standard deviations

Answer 67

A

Z: same distribution regardless of size

t: different distribution for each sample size, identified by degrees of freedom or n-1

Answer 68

A

1. Data obtained randomly

2. Normal population or large enough sample size

Answer 69

A

it is robust to moderate violations of the normality assumption

Answer 70

A

outliers since sample mean and sample standard deviation are both susceptible to outliers

Answer 71

A

Check assumptions
Calculate confidence interval
Interpret CI in context

Answer 72

A

Inferential procedures are run with and without outliers

Answer 73

A

Margin of error is affected directly by standard error and indirectly by sample size, because sample size affects standard error

Answer 74

A

Margin of error
Desired precision
Confidence level
Variability in the data
Cost of obtaining a sample

Answer 75

A

n=P(1-P)Z^2 / E^2

For P, use the estimate of the proportion you have. If you don’t have one, use 0.50 because it will give you the largest possible sample size, so you will definitely have a large enough sample
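
A small Python sketch of the formula above. The margin of error, z value, and prior estimate are hypothetical inputs, and the result is rounded up because a sample size must be a whole number.

```python
import math

def sample_size_proportion(E, z, p=0.50):
    """n = p(1-p) z^2 / E^2, rounded up to a whole number."""
    return math.ceil(p * (1 - p) * z**2 / E**2)

# No prior estimate: fall back on p = 0.50 (largest possible n).
n_conservative = sample_size_proportion(E=0.03, z=1.96)

# Hypothetical prior estimate of 0.20 gives a smaller n.
n_with_prior = sample_size_proportion(E=0.03, z=1.96, p=0.20)

print(n_conservative, n_with_prior)
```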

Answer 76

A

direct when precision is held constant

Answer 77

A

n=(standard deviation)^2(z)^2 / E^2
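
The same idea for a mean, as a hedged Python sketch. The standard deviation, z value, and margin of error below are hypothetical inputs.

```python
import math

def sample_size_mean(sigma, z, E):
    """n = sigma^2 z^2 / E^2, rounded up to a whole number."""
    return math.ceil((sigma * z / E) ** 2)

# Hypothetical inputs: sigma = 15, 95% confidence, margin of error 2.
n_needed = sample_size_mean(sigma=15, z=1.96, E=2)
print(n_needed)
```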

Answer 78

A

Valid for any n, use caution for small sample sizes, does not work with outliers or highly skewed data

Answer 79

A

requires at least 15 successes and 15 failures; if the sample size is too small, the sampling distribution of the sample proportion will not be approximately normal

Answer 80

A

CI formula is still valid if we use it after adding 2 to the original number of successes and failures, adding 4 total to the sample size. This moves the sample proportion towards 1/2, used with confidence levels over 90% and sample sizes over 10
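
A minimal sketch of this adjustment in Python, assuming the usual large-sample interval (estimate plus or minus z times the standard error) is applied to the adjusted counts. The counts and the default z = 1.96 are hypothetical.

```python
import math

def plus_four_interval(successes, n, z=1.96):
    """Add 2 successes and 2 failures, then apply the usual CI formula."""
    n_adj = n + 4                         # 4 added to the sample size
    p_adj = (successes + 2) / n_adj       # adjusted sample proportion
    se = math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return p_adj - z * se, p_adj + z * se

# Hypothetical small sample: 3 successes out of 20.
low, high = plus_four_interval(successes=3, n=20)
print(round(low, 3), round(high, 3))
```

Here the adjusted proportion is 5/24 rather than 3/20, which is closer to 1/2.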

Answer 81

A

Suggest that a parameter varies
Making claims about a sample proportion
asserting that the population proportion cannot be outside your interval
Overgeneralizing results

Answer 82

A

Not all confidence intervals will capture the true parameter value
Whole interval is not treated equally
Margin of error is not small enough to be useful
assumptions are violated

Answer 83

A

Allows you to construct confidence intervals when it is difficult to find the SE or when the CI formula doesn’t work well. It is a simulation method where the population is viewed as many, many copies of the original sample (data distribution)

Answer 84

A

Resample the original sample with replacement to produce a bootstrap sample of n observations, then compute the point estimate of the parameter
Repeat for a very large number of bootstrap samples (at least 10,000), computing the point estimate for each one
Create a distribution of the point estimates to produce a bootstrap distribution
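
The steps above can be sketched in Python for a sample mean. The data values are hypothetical, and the mean is just one possible choice of point estimate.

```python
import random
import statistics

random.seed(4)  # fixed seed so the illustration is repeatable

sample = [12, 15, 9, 20, 14, 17, 11, 16]  # hypothetical original sample
n = len(sample)
B = 10_000  # number of bootstrap samples

boot_means = []
for _ in range(B):
    resample = random.choices(sample, k=n)  # n draws with replacement
    boot_means.append(sum(resample) / n)    # point estimate for this resample

# boot_means is the bootstrap distribution of the sample mean.
print(statistics.mean(boot_means), statistics.stdev(boot_means))
```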

Answer 85

A

1. Standard error method

2. Percentile method

Answer 86

A

estimate SE by using standard deviation of bootstrap distribution

Answer 87

A

use a percentile of the middle area of the distribution to create a confidence interval(usually 95%)
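
Both bootstrap interval methods can be sketched side by side in Python. The data are hypothetical, the statistic is assumed to be the sample mean, and z = 1.96 is assumed for the standard error method at 95% confidence.

```python
import random
import statistics

random.seed(5)  # fixed seed so the illustration is repeatable

sample = [12, 15, 9, 20, 14, 17, 11, 16]  # hypothetical original sample
n = len(sample)

# Sorted bootstrap distribution of the sample mean (10,000 resamples).
boot_means = sorted(
    sum(random.choices(sample, k=n)) / n for _ in range(10_000)
)

# Standard error method: estimate +/- z * (SD of bootstrap distribution).
point = sum(sample) / n
se_boot = statistics.stdev(boot_means)
se_interval = (point - 1.96 * se_boot, point + 1.96 * se_boot)

# Percentile method: endpoints of the middle 95% of the distribution.
lower = boot_means[int(0.025 * len(boot_means))]
upper = boot_means[int(0.975 * len(boot_means)) - 1]
percentile_interval = (lower, upper)

print(se_interval, percentile_interval)
```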

Answer 88

A

aka hypothesis test, uses sample data to decide between two competing claims about a population characteristic, uses probability to determine the plausibility of a parameter value, considers evidence based on sample data

Answer 89

A

Reject Ho or fail to reject Ho

Answer 90

A

Ho, always specifies a single value for the parameter, it is a claim about a population parameter that is initially assumed to be true

Answer 91

A

Ha, depends on the purpose of the hypothesis test; the number appearing in the alternative hypothesis is identical to the number appearing in the null hypothesis

Answer 92

A

Test to determine whether a population proportion is different from a specified value

Answer 93

A

test to determine whether a population proportion is less than a specified value

Answer 94

A

test to determine whether a population proportion is greater than specified value

Answer 95

A

representative sample
independent sample values
sample size is sufficiently large
We have sampled less than 10% of the population

Answer 96

A

Assumptions
Define parameter and hypotheses before gathering or looking at data
Calculate a test statistic
Obtain a p-value
State a conclusion in context

Answer 97

A

check assumptions
hypotheses/significance level
compute test stat
Find the p-value
State the conclusions: report test stat and p-value, interpret results in context

Answer 98

A

one-proportion z-test, one-sample z-test for a population proportion

Answer 99

A

Variable is categorical
data is obtained randomly
sample size is sufficiently large

Answer 100

A

probability of obtaining a test statistic value at least as extreme as the one observed, if the null hypothesis is true
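
A hedged Python sketch of a one-proportion z-test and its p-value. The test statistic used here is the standard form z = (p-hat - p0) / sqrt(p0(1 - p0)/n), which is an assumption since these cards do not state it, and the counts are hypothetical.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alternative="two-sided"):
    """Return (z, p_value) for H0: p = p0."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # standard error assuming H0 is true
    z = (p_hat - p0) / se0
    cdf = NormalDist().cdf
    if alternative == "less":           # Ha: p < p0
        p_value = cdf(z)
    elif alternative == "greater":      # Ha: p > p0
        p_value = 1 - cdf(z)
    else:                               # Ha: p != p0
        p_value = 2 * (1 - cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes in 100 trials, H0: p = 0.5.
z, p_value = one_proportion_z_test(successes=60, n=100, p0=0.5)
print(z, p_value)
```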

Answer 101

A

one-mean t-test, one-sample t-test for a population mean

Answer 102

A

Variable is quantitative
data is obtained randomly
Population distribution is approximately normal, or the sample is large enough for the CLT to apply

Answer 103

A

P-value less than significance level: unusual result, reject null hypothesis

P-value greater than significance level: sample data is not unusual, fail to reject null

Answer 104

A

they are consistent

Answer 105

A

then 0 is not in the corresponding confidence interval

Answer 106

A

a 95% confidence interval will contain the H0 value

Answer 107

A

a 95% confidence interval will not contain the H0 value

Answer 108

A

If the p-value is less than the significance level a in a two-sided test, a (1-a)x100% CI does not contain the H0 value

Answer 109

A

large samples

Answer 110

A

Skewed distributions and outliers

Answer 111

A

sampling variability

Answer 112

A

do not reject true H0
Reject true H0
Do not reject false H0
Reject false H0

Answer 113

A

error when a true H0 is rejected

Answer 114

A

error when a false H0 is not rejected

Answer 115

A

you have no way of knowing you committed one before consequences are experienced

Answer 116

A

a=the probability of committing a type 1 error, the significance level of a hypothesis test

the significance level must be set before running the test

Answer 117

A

B=probability of committing a type 2 error

this value is usually not known, has an inverse relationship with alpha

Answer 118

A

trying to balance the risk of committing a type 1 or type 2 error, which consequences are more serious is taken into account

Answer 119

A

A CI is more informative because it displays the entire set of plausible values, while a significance test just tells you whether a specific value for H0 is plausible

Answer 120

A

use a CI to estimate the value of the parameter, then compare the hypothesized value to the CI values to determine how far the parameter is from the hypothesized value

Answer 121

A

the result may not be practically significant

Answer 122

A

Variables are categorical
Independent random samples from both groups
10 successes and failures

Answer 123

A

it is plausible that P1=P2

Answer 124

A

If all values in the CI are positive, P1 > P2. If all values in the CI are negative, P1 < P2

Answer 125

A

how large the true difference is, small magnitude equals a small difference in practical terms

Answer 126

A

1. Two-proportion z-test

2. Two-proportion z-interval

Answer 127

A

1. Independent samples t-test

2. Independent samples t-interval

Answer 128

A

Quantitative response variable for both groups
independent random samples
Normal or large enough samples, no outliers

Answer 129

A

Quantitative response variable
independent random samples
normal population or large enough sample
equal standard deviations

Answer 130

A

quantitative response variable for two groups
independent random samples
normal or large enough distribution
standard deviations are equal

Answer 131

A

both tests are used to compare the means of 2 populations based on independent random samples

Answer 132

A

increases statistical power, decreases chance of type 2 error, increases chance of type 1 error if used when standard deviations are not equal

Answer 133

A

each possible paired sample is equally likely to be selected

Answer 134

A

used when members of 2 populations have natural pairings, removes extraneous sources of variation, more likely to detect a difference between population means when such a difference exists

Answer 135

A

d=difference
mean=u1-u2 difference between two population means

Assumptions

Simple random paired sample
Normal distribution of differences or large enough sample for what is believed to be the shape of the distribution of the differences

Answer 136

A

1. Sample is randomly obtained

2. Large enough sample: At least 15 successes and 15 failures

Answer 137

A

there are infinitely many F distributions, each identified by its degrees of freedom; each has a numerator and a denominator degrees of freedom, total area under curve=1, curve is right skewed

Answer 138

A

Analysis of variance

Answer 139

A

ANOVA compares the means of a variable across populations defined by a categorical explanatory variable (factor); the possible values of a factor are its levels

Answer 140

A

It compares variation between samples to variation within samples

Answer 141

A

If “between” > “within” , not all population means are equal

Answer 142

A

measures total variation, can be partitioned into between(SSTR) and within (SSE) samples. SST=SSTR+SSE

Answer 143

A

data is randomly obtained
independent samples
normal or large enough samples(robust)
Equal population standard deviations
- Largest/smallest<2
- robust provided sample sizes are equal

Answer 144

A

F=MSTR/MSE
df1=g-1
df2=N-g
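
A minimal Python sketch of the F statistic for one-way ANOVA, with g groups and N total observations. It assumes the standard definitions MSTR = SSTR/(g-1) and MSE = SSE/(N-g), which these cards do not spell out, and the data are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df1, df2) for a list of samples, one list per group."""
    g = len(groups)                           # number of groups
    N = sum(len(grp) for grp in groups)       # total number of observations
    grand_mean = sum(sum(grp) for grp in groups) / N
    means = [sum(grp) / len(grp) for grp in groups]

    # Between-samples variation (treatment sum of squares).
    sstr = sum(len(grp) * (m - grand_mean) ** 2
               for grp, m in zip(groups, means))
    # Within-samples variation (error sum of squares).
    sse = sum((x - m) ** 2 for grp, m in zip(groups, means) for x in grp)

    df1, df2 = g - 1, N - g
    mstr = sstr / df1
    mse = sse / df2
    return mstr / mse, df1, df2

# Hypothetical data: three groups of three observations each.
F, df1, df2 = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(F, df1, df2)
```

A large F means the variation between samples is large relative to the variation within samples.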

Answer 145

A

Analyses conducted after the initial analysis to see which means actually differ, compare each mean as a pair using their confidence intervals

Answer 146

A

it ensures samples are independent

Answer 147

A

1-2(significance level)

Answer 148

A

1-significance level=CI

Answer 149

A

lurking variables are not accounted for in the study, confounding variables are