Final Exam Flashcards
Just for stats final, notecards from summaries of chapters
what are marginal distributions in tables?
row totals and column totals, can be presented as percent of table total
what are conditional distributions?
distributions of row variable for each value of column variable, and column variable for each value of row variable
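A minimal Python sketch of marginal and conditional distributions, assuming a made-up 2x2 table of counts (the variable names and numbers are hypothetical, not from the course):

```python
# Hypothetical two-way table of counts: rows = gender, columns = answer.
table = {
    "female": {"yes": 30, "no": 20},
    "male": {"yes": 10, "no": 40},
}

total = sum(sum(cols.values()) for cols in table.values())

# Marginal distribution of the row variable: row totals as a share of the table total.
row_marginal = {r: sum(cols.values()) / total for r, cols in table.items()}

# Marginal distribution of the column variable: column totals as a share of the table total.
col_names = ["yes", "no"]
col_marginal = {c: sum(table[r][c] for r in table) / total for c in col_names}

# Conditional distribution of the column variable for each value of the row variable.
conditional_given_row = {
    r: {c: n / sum(cols.values()) for c, n in cols.items()}
    for r, cols in table.items()
}

print(row_marginal)
print(col_marginal)
print(conditional_given_row)
```

Each conditional distribution sums to 1 within its own row, which is what separates it from the marginal distributions (those are shares of the whole table).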
4 step process for statistical problems?
state, plan, do, conclude
SPDC
what is Simpson's paradox?
an association between 2 variables that holds for each value of a third variable can be changed or even reversed when the data for all values of the 3rd variable are combined. example: helicopter rescues have a higher death rate even though the care is more advanced. why? They are used for the more severe accidents (severity is the 3rd variable here)
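The reversal can be checked with a few lines of Python. This is only a sketch with illustrative counts chosen to show the effect; the numbers are assumptions, not data from these notes:

```python
# Illustrative (made-up) counts: (deaths, total victims) by transport and severity.
data = {
    "helicopter": {"serious": (48, 100), "less serious": (16, 100)},
    "road": {"serious": (60, 100), "less serious": (200, 1000)},
}

def death_rate(deaths, total):
    return deaths / total

def overall_rate(groups):
    # Combine both severity levels before computing the rate.
    deaths = sum(d for d, _ in groups.values())
    total = sum(t for _, t in groups.values())
    return deaths / total

# Within each severity level, compare helicopter and road death rates.
for severity in ("serious", "less serious"):
    heli = death_rate(*data["helicopter"][severity])
    road = death_rate(*data["road"][severity])
    print(severity, round(heli, 2), round(road, 2))

# After combining severity levels, the comparison flips.
heli_overall = overall_rate(data["helicopter"])
road_overall = overall_rate(data["road"])
print("overall", round(heli_overall, 2), round(road_overall, 2))
```

With these counts the helicopter has the lower death rate at each severity level but the higher rate overall, because most helicopter cases are serious ones.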
know what a dotplot, stemplot, and histogram are.
Show the distribution of a quantitative variable. A dotplot shows values on a number line. Stemplots separate each observation into a stem and a one digit leaf. Histograms plot the counts or percents of values in equal width classes.
other words for counts and percents?
frequency and relative frequency.
How to describe patterns of distributions?
SOCS:
s: shape
o:outliers
c: center
s:spread
simple shapes for distributions?
symmetric or skewed… # of modes can also be used to describe shape (unimodal, bimodal, multimodal)
Mean vs median?
mean is the average of the observations, median is midpoint of listed values in numerical order
Five Number summary?
minimum: lowest value
quartile 1 (Q1): median of the values below the median, about 25% of the data lie below it
median: middle of the values in numerical order
quartile 3 (Q3): median of the values above the median, about 75% of the data lie below it
maximum: highest value
Interquartile range?
Q3 - Q1
how to use IQR to find outliers?
it is an outlier if:
-smaller than Q1-(1.5IQR)
-larger than Q3+(1.5IQR)
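A short Python sketch of the five-number summary and the 1.5 x IQR rule. It assumes the textbook convention that Q1 and Q3 are the medians of the lower and upper halves (software packages often use slightly different quartile rules), and the sample data are made up:

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); assumes at least 2 values."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]              # values below the median's position
    upper = data[half + (n % 2):]    # values above the median's position
    return min(data), median(lower), median(data), median(upper), max(data)

def iqr_outliers(values):
    """Values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in values if x < low or x > high]

sample = [2, 4, 5, 7, 8, 9, 10, 12, 40]  # made-up data
print(five_number_summary(sample))
print(iqr_outliers(sample))
```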
Box plots
draw lines at Q1, median, and Q3 and make a divided box with them. Whiskers go to the smallest and largest values that are not outliers. Outliers are plotted as separate points so they do not stretch the whiskers.
What are variance and standard deviation?
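common measures of spread about the mean as center. standard deviation is roughly the typical distance of the observations from the mean, and variance is the standard deviation squared.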
Resistant vs nonresistant measures
resistant: not largely affected by extreme observations
example: median, IQR
nonresistant: affected by extreme observations
example: mean, standard deviation
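A quick way to see resistance in action, sketched in Python with made-up values: add one extreme observation and compare how far the mean and the median move.

```python
from statistics import mean, median

data = [10, 12, 13, 15, 16]     # made-up values
with_outlier = data + [200]     # same data plus one extreme observation

mean_shift = mean(with_outlier) - mean(data)
median_shift = median(with_outlier) - median(data)
print(mean_shift, median_shift)
```

The mean moves a long way toward the extreme value while the median barely changes, which is why the median is called resistant.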
transforming data? how do addition/subtraction compare to multiplication/division in how they affect measures of data?
adding a constant a to all the values in a data set: measures of center and location increase by a, measures of spread are unaffected.
multiplying all values in a data set by a positive constant b: measures of center and location are multiplied by b, and so are measures of spread.
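The two transformation rules can be verified numerically. A minimal Python sketch, assuming made-up data and arbitrary constants a and b:

```python
from statistics import mean, median, pstdev

data = [3, 5, 8, 10, 14]   # made-up values
a, b = 10, 2               # hypothetical constants

shifted = [x + a for x in data]   # add a to every value
scaled = [x * b for x in data]    # multiply every value by b

# Adding a: center moves by a, spread is unchanged.
print(mean(shifted) - mean(data), pstdev(shifted) - pstdev(data))

# Multiplying by b: center and spread are both multiplied by b.
print(mean(scaled) / mean(data), pstdev(scaled) / pstdev(data))
```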
density curve
total area 1 underneath, an area under it gives proportion of data in that region.
how to locate mean and median on a density curve?
mean is balance point, median is where area under it is .5
normal distributions
bell shaped, symmetric density curves.
mean: μ
standard dev: σ
mean is the center of the curve, and stddev can be used to divide graphs into sections with predictable area.
percent rule for normal distributions
-68% of values lie within one stddev of the mean
-95% of values lie within two
-99.7% lie within three
z score equation? for normal distributions
z = (x-μ)/σ
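A small Python sketch tying the z-score formula to the 68-95-99.7 rule, using `statistics.NormalDist` from the standard library. The mean of 100 and standard deviation of 15 are assumed values for illustration only:

```python
from statistics import NormalDist

mu, sigma = 100, 15  # hypothetical mean and standard deviation

def z_score(x, mu, sigma):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

dist = NormalDist(mu, sigma)

def area_within(k):
    """Proportion of the normal distribution within k standard deviations of the mean."""
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

print(z_score(130, mu, sigma))
for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
```

The computed areas are close to 68%, 95%, and 99.7%, which is where the rule's rounded percentages come from.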
What is a scatterplot?
displays relationship between TWO quantitative variables
explanatory and response variables on scatterplot
if we think that a variable x may help explain, predict, or even cause variable y we call x explanatory variable and y a response variable. Always plot explanatory on x.
how to explain a scatterplot? hint: DOFS
D: direction
O: outliers
F: form
S: strength
How to explain direction? of a scatterplot
positive association: high values of one variable tend to occur with high values of the other, positive slope on LOBF
negative association: high values of one variable tend to occur with low values of the other, negative slope
how to explain form of a scatterplot?
linear relationships, points show a straight line pattern
Curved and clustered are also good ways to describe form!
how to explain strength of a scatterplot?
determined by how close the points in the scatterplot lie to a simple form such as a line
what does correlation r value measure with two variables?
the strength and direction of the linear association between two quantitative variables x and y. r only measures straight-line relationships. it is always between -1 and 1, and indicates strength by how close it is to -1 or 1 (near -1 for a strong negative association, near 1 for a strong positive association, near 0 for a weak linear association). CORRELATION IS NOT RESISTANT
what is a regression line?
straight line that describes how a response variable y changes as explanatory variable x changes. You can use it to PREDICT value of y for value of x.
what is the form of a regression line?
y=a+bx
what is the least squares regression line? how does it work?
a straight line y = a+bx that minimizes sum of squares of vertical distances of observed points from line.
what is extrapolation? should you avoid it?
use of a regression line for prediction using values outside range of data from which the line was calculated. YES AVOID
what are residuals on a scatterplot?
differences between observed and predicted values of y: residual = observed y - predicted y.
what does the standard deviation of residuals (s) measure?
approximate size of a typical prediction error when using the regression line
what is the coefficient of determination?
r^2. fraction of variation in one variable that is accounted for by least squares regression on other variable.
Example:
(r^2*100)% of the variation in y is accounted for by the least-squares regression line of y on x!
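The regression cards (correlation r, the line y = a + bx, residuals, s, and r^2) can be tied together in one small calculation. A minimal Python sketch using the standard least-squares formulas on made-up data; the x and y values are assumptions for illustration:

```python
from math import sqrt
from statistics import mean

x = [1, 2, 3, 4, 5]   # made-up explanatory values
y = [2, 4, 5, 4, 5]   # made-up response values

x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)   # correlation
b = sxy / sxx               # slope of the least-squares line
a = y_bar - b * x_bar       # intercept, so the line passes through (x_bar, y_bar)

predicted = [a + b * xi for xi in x]
residuals = [yi - yhat for yi, yhat in zip(y, predicted)]   # observed - predicted
sse = sum(e ** 2 for e in residuals)

s = sqrt(sse / (len(x) - 2))   # standard deviation of the residuals
r_squared = 1 - sse / syy      # fraction of variation in y accounted for by the line

print(round(a, 3), round(b, 3), round(r, 3), round(s, 3), round(r_squared, 3))
```

Two checks worth noticing: the residuals of a least-squares line sum to zero, and r_squared equals the square of the correlation r.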
important stuff about correlation and regression:
always interpret with caution. look for outliers that could affect regression line. do not conclude cause and effect between two variables JUST because of a strong correlation.
What is a sample survey?
selects a sample from the population of all individuals about which we want information.
what is random sampling?
uses chance to select a sample
what is a simple random sample (SRS)?
gives every possible sample of a given size the same chance to be chosen (this is stronger than just giving every individual the same chance). Choose an SRS by labeling members with numbers and using random digits to select the sample.
what is a stratified random sample?
divide population into strata, groups of individuals that are similar in some way that might affect their responses. Choose a separate SRS from each stratum.
what is a cluster sample?
divide population into groups or clusters. randomly select some of these clusters. All individuals in the chosen clusters are included in the sample.
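The three sampling designs can be sketched with Python's `random` module. The population below (120 students with a grade and a homeroom) is entirely hypothetical, and the sample sizes are arbitrary choices:

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 120 students, 4 grades (strata), 12 homerooms (clusters).
population = [{"id": i, "grade": 9 + i % 4, "homeroom": i // 10} for i in range(120)]

# Simple random sample: every possible set of 12 students is equally likely.
srs = rng.sample(population, 12)

# Stratified random sample: a separate SRS of 3 students from each grade.
stratified = []
for grade in sorted({p["grade"] for p in population}):
    stratum = [p for p in population if p["grade"] == grade]
    stratified.extend(rng.sample(stratum, 3))

# Cluster sample: randomly choose 2 homerooms and include everyone in them.
rooms = sorted({p["homeroom"] for p in population})
chosen_rooms = rng.sample(rooms, 2)
cluster = [p for p in population if p["homeroom"] in chosen_rooms]

print(len(srs), len(stratified), len(cluster))
```

Note the difference in what gets randomized: individuals in the SRS, individuals within each stratum in the stratified sample, and whole groups in the cluster sample.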
when to use Simple random, stratified random, or clustered samples?
SRS: use when you want every member of the population to have an equal chance of being selected.
stratified: best when you want to ensure representation from different subgroups within the population.
cluster: ideal when you need to study large, geographically dispersed populations by randomly selecting groups (clusters) to sample from.
what is bias in sampling? two examples?
systematic errors in the way the sample represents the population.
voluntary response samples: respondents choose themselves, can cause bias
convenience samples: individuals are close by and included in sample, prone to large bias.
What is sampling error? two types?
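errors caused by the act of taking a sample, they make sample results differ from the truth about the population
random sampling error: chance variation between the sample result and the population value, because a random sample will not match the population exactly
undercoverage: some members of the population are left out of the sampling frame, the list from which the sample is chosen.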
what are two nonsampling errors?
errors that have nothing to do with choosing the sample.
nonresponse: people can't be contacted or choose not to answer.
response bias: respondents give incorrect answers. the wording of questions can also influence answers.
What is an observational study?
gathers data on individuals as they are
what is an experiment?
deliberately imposes a treatment on individuals in order to measure their response.
what are confounded variables?
variables are confounded when their effects on a response can't be distinguished from each other. observational studies and uncontrolled experiments often fail to show that changes in an explanatory variable actually cause changes in a response variable, because the explanatory variable is confounded with lurking variables.
what are treatments?
a combination of values of the explanatory variables.
what are experimental units?
the smallest unit a treatment of an experiment is applied to.
what is control, random assignment of treatments, and replication in experiment?
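control: keep other variables the same for all groups (often with a comparison group) so lurking variables are not confounded with the explanatory variable. random assignment: use chance to assign experimental units to treatments, so the groups are roughly equivalent before treatment. replication: use enough experimental units in each group that chance differences between groups average out.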
double blind and single blind treatments?
DB: neither the subjects nor the people who interact with them know who has what treatment in an experiment
SB: one party knows who has which treatment and the other does not.
what is blocking in an experiment?
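a block is a group of individuals that are similar in some way important to the response. blocking means forming these blocks first, then randomly assigning treatments separately within each block.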
what does making an inference about a population require?
the individuals taking part in the study be randomly selected from the larger population. Random selection allows inference about the population. Inference about cause and effect requires random assignment of treatments in an experiment.
law of large numbers in probability?
the proportion of times that a particular outcome occurs in many repetitions will approach a single number, the probability of that outcome.
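A simple simulation makes the law of large numbers visible. This Python sketch assumes a fair coin (probability of heads 0.5) and uses a fixed seed only so that runs are repeatable:

```python
import random

rng = random.Random(0)  # fixed seed for reproducibility

def proportion_of_heads(n_flips):
    """Simulate n_flips fair coin flips and return the proportion of heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_flips))
    return heads / n_flips

# The proportion bounces around for small n and settles near 0.5 for large n.
for n in (10, 100, 10_000, 100_000):
    print(n, proportion_of_heads(n))
```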
what is a simulation?
imitation of chance behavior. follows 4 step process SPDC
what is a probability model?
describes chance behavior by listing possible outcomes in the sample space S and giving the probability of each outcome.
what is an event?
a subset of possible outcomes.
complement rule?
P(A^c) = 1 - P(A), where A^c (the complement of A) is the event that A does not occur
mutually exclusive events?
events A and B are mutually exclusive if they have no outcomes in common.
addition rule for mutually exclusive events.
P(A or B) = P(A) + P(B)
what does P(A U B) mean? P(A ∩ B)?
P(A or B), P(A and B)
general addition rule can be used to find P(A U B) “P(A or B)”
P(A U B) = P(A) + P(B) - P(A ∩ B)
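The complement and general addition rules can be checked by counting outcomes. A minimal Python sketch, assuming one roll of a fair six-sided die and two events picked only for illustration:

```python
from fractions import Fraction

sample_space = set(range(1, 7))   # one roll of a fair six-sided die
A = {2, 4, 6}                     # event: roll is even
B = {4, 5, 6}                     # event: roll is greater than 3

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

p_union = prob(A | B)                       # P(A or B) by direct counting
p_rule = prob(A) + prob(B) - prob(A & B)    # general addition rule
p_complement = 1 - prob(A)                  # complement rule

print(p_union, p_rule, p_complement)
```

Subtracting P(A ∩ B) matters here because the outcomes 4 and 6 are in both events and would otherwise be counted twice.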
what is conditional probability? Notation?
if one event has happened, the chance another will happen is a conditional prob. Notation P(B|A) represents prob of B given A has happened
what are independent events?
the chance that event B occurs is not affected by whether or not A has occurred.
P(B|A) = P(B) and P(A|B) = P(A)
if events with nonzero probability are mutually exclusive, they cannot be independent (if A happens, B cannot).
general multiplication rule for probability (for independent events too)
P(A ∩ B) = P(A)P(B|A)
for independent:
P(A ∩ B) = P(A)P(B)
conditional probability formula?
divide both sides of general multiplication rule by P(A) and we get
P(B|A) = P(A ∩ B) / P(A)
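Conditional probability and independence can also be checked by counting. A Python sketch assuming two fair dice; the events A, B, and C are hypothetical choices for illustration:

```python
from fractions import Fraction

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
sample_space = {(i, j) for i in range(1, 7) for j in range(1, 7)}
A = {o for o in sample_space if o[0] == 6}        # first die shows 6
B = {o for o in sample_space if sum(o) >= 10}     # total is at least 10
C = {o for o in sample_space if o[1] % 2 == 0}    # second die is even

def prob(event):
    return Fraction(len(event), len(sample_space))

def cond_prob(b, a):
    """P(B|A) = P(A and B) / P(A)."""
    return prob(a & b) / prob(a)

# A and B are NOT independent: knowing A changes the chance of B.
print(prob(B), cond_prob(B, A))

# A and C ARE independent: P(C|A) equals P(C), so P(A and C) = P(A)P(C).
print(prob(C), cond_prob(C, A))
```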
what is a binomial setting? acronym? binomial random variable?
consists of n independent trials with the same chance process, each resulting in a success or failure, prob of success = p. The count X of successes is a binomial random variable. Its probability distribution is a binomial distribution.
BINS
B: Binary (2 outcomes, success or failure)
I: independent trials
N: number of trials fixed in advance
S: success? (same value of p for all trials)
binomial probability of observing k successes in n trials?
P(X=k) = C(n,k) p^k (1-p)^(n-k), where C(n,k) = n! / (k!(n-k)!) is the binomial coefficient "n choose k"
mean and stddev of binomial random variable?
μx= np
σx= sqrt(np(1-p))
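The binomial formulas can be checked numerically with `math.comb`. A minimal Python sketch; the values n = 10 and p = 0.3 are assumptions chosen only for illustration:

```python
from math import comb, sqrt

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3  # hypothetical number of trials and success probability

mean_x = n * p                    # mean of X
sd_x = sqrt(n * p * (1 - p))      # standard deviation of X

# The probabilities over k = 0..n form a full distribution, so they sum to 1.
total_prob = sum(binomial_pmf(k, n, p) for k in range(n + 1))

# The mean formula np agrees with the definition sum of k * P(X = k).
mean_from_pmf = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))

print(binomial_pmf(3, n, p), mean_x, sd_x, total_prob, mean_from_pmf)
```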