Final Exam Flashcards
Notecards for the stats final, made from the chapter summaries
what are marginal distributions in tables?
row totals and column totals, can be presented as percent of table total
what are conditional distributions?
distributions of row variable for each value of column variable, and column variable for each value of row variable
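A quick sketch of both ideas in Python. The two-way table below is made up for illustration (the class years and study methods are hypothetical, not from the course), and the function names are my own.

```python
# Hypothetical two-way table of counts:
# rows = class year, columns = preferred study method.
table = {
    "junior": {"flashcards": 30, "practice tests": 20},
    "senior": {"flashcards": 10, "practice tests": 40},
}

def table_total(table):
    """Total count over every cell in the table."""
    return sum(sum(cols.values()) for cols in table.values())

def marginal_rows(table):
    """Marginal distribution of the row variable: row totals / table total."""
    total = table_total(table)
    return {row: sum(cols.values()) / total for row, cols in table.items()}

def marginal_columns(table):
    """Marginal distribution of the column variable: column totals / table total."""
    total = table_total(table)
    col_totals = {}
    for cols in table.values():
        for col, count in cols.items():
            col_totals[col] = col_totals.get(col, 0) + count
    return {col: count / total for col, count in col_totals.items()}

def conditional_on_row(table, row):
    """Conditional distribution of the column variable for one row value."""
    row_total = sum(table[row].values())
    return {col: count / row_total for col, count in table[row].items()}

print(marginal_rows(table))
print(marginal_columns(table))
print(conditional_on_row(table, "senior"))
```

The marginal distributions use the table total as the denominator, while a conditional distribution uses only the total for the one row (or column) being conditioned on.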
4 step process for statistical problems?
state, plan, do, conclude
SPDC
what is Simpson's paradox?
an association between two variables that holds for each value of a third variable can change or even reverse when the data for all values of the third variable are combined. example: helicopter rescues have a higher death rate than road rescues even though the care is more advanced. Why? Helicopters are sent to the more severe accidents (severity of the accident is the third variable here)
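The reversal is easier to see with numbers. The counts below are made up for illustration only (they are not real rescue data), and the function names are my own.

```python
# Illustrative (deaths, total victims) counts by transport and accident severity.
counts = {
    "helicopter": {"serious": (48, 100), "less serious": (16, 100)},
    "road": {"serious": (60, 100), "less serious": (200, 1000)},
}

def rate_by_severity(transport, severity):
    """Death rate within one severity group."""
    deaths, total = counts[transport][severity]
    return deaths / total

def combined_rate(transport):
    """Death rate after lumping both severity groups together."""
    deaths = sum(d for d, _ in counts[transport].values())
    total = sum(t for _, t in counts[transport].values())
    return deaths / total

for severity in ("serious", "less serious"):
    print(severity,
          rate_by_severity("helicopter", severity),
          rate_by_severity("road", severity))
print("combined", combined_rate("helicopter"), combined_rate("road"))
```

With these counts the helicopter has the lower death rate inside each severity group, yet the higher death rate once the groups are combined, because half of its cases are serious accidents while most road cases are not.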
know what a dotplot, stemplot, and histogram are.
Show the distribution of a quantitative variable. A dotplot shows values on a number line. Stemplots separate each observation into a stem and a one digit leaf. Histograms plot the counts or percents of values in equal width classes.
other words for counts and percents?
frequency and relative frequency.
How to describe patterns of distributions?
SOCS:
s: shape
o: outliers
c: center
s: spread
simple shapes for distributions?
symmetric or skewed. The number of modes can also be used to describe shape (unimodal, bimodal, multimodal)
Mean vs median?
mean is the average of the observations, median is midpoint of listed values in numerical order
Five Number summary?
minimum: lowest value
quartile 1 (Q1): median of the values below the overall median
median: middle value when the data are in numerical order
quartile 3 (Q3): median of the values above the overall median
maximum: highest value
Interquartile range?
Q3 - Q1
how to use IQR to find outliers?
it is an outlier if:
-smaller than Q1 - 1.5 × IQR
-larger than Q3 + 1.5 × IQR
Box plots
draw lines at Q1, the median, and Q3 to make a box divided at the median. Whiskers extend to the smallest and largest values that are not outliers. Outliers are plotted as separate points so they do not distort the shape.
What are variance and standard deviation?
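common measures of spread about the mean. The variance is the average of the squared deviations from the mean (dividing by n - 1 for a sample), and the standard deviation is the square root of the variance.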
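A small sketch of the calculation, assuming the sample versions (divide by n - 1), which is what most intro stats courses use. The function names are my own.

```python
from math import sqrt

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std_dev(values):
    """Square root of the sample variance."""
    return sqrt(sample_variance(values))

data = [1, 2, 3, 4, 5]
print(sample_variance(data))  # squared deviations sum to 10, so 10 / 4 = 2.5
print(sample_std_dev(data))
```

Python's built-in `statistics.variance` and `statistics.stdev` compute the same sample quantities.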
Resistant vs nonresistant measures
resistant: not largely affected by extreme observations
example: median, IQR
nonresistant: affected by extreme observations
example: mean, standard deviation
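A quick way to see resistance in action, using made-up data where one value is swapped for an extreme one:

```python
from statistics import mean, median

data = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 50]  # same data, but the largest value is now extreme

# The mean gets pulled toward the extreme value; the median does not move.
print(mean(data), median(data))
print(mean(with_outlier), median(with_outlier))
```

The mean jumps from 3 to 12 while the median stays at 3, which is why the median is the better summary of center for strongly skewed data or data with outliers.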
transforming data? how do addition/subtraction compare to multiplication/division in how they affect measures of data?
adding a constant a to every value in a data set: measures of center and location increase by a; measures of spread are unaffected
multiplying every value in a data set by a constant b: measures of center and location are multiplied by b, and measures of spread (range, IQR, standard deviation) are multiplied by |b|
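A short check of both rules on made-up data. The helper names are my own, and the range stands in for the measures of spread.

```python
from statistics import mean, median

def summarize(values):
    """Center (mean, median) and one measure of spread (range)."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
    }

def add_constant(values, a):
    return [x + a for x in values]

def multiply_by(values, b):
    return [x * b for x in values]

data = [2, 4, 6, 8]
print(summarize(data))                    # mean 5, median 5, range 6
print(summarize(add_constant(data, 10)))  # center shifts by 10, range unchanged
print(summarize(multiply_by(data, 3)))    # center and range both tripled
```

Adding 10 moves the mean and median from 5 to 15 and leaves the range at 6. Multiplying by 3 also gives a mean and median of 15, but the range grows from 6 to 18.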
density curve
always on or above the horizontal axis, with total area 1 underneath. The area under the curve over an interval gives the proportion of data in that interval.
how to locate mean and median on a density curve?
mean is balance point, median is where area under it is .5
normal distributions
bell shaped, symmetric density curves.
mean: μ
standard dev: σ
mean is the center of the curve, and stddev can be used to divide graphs into sections with predictable area.
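percent rule for normal distributions (68-95-99.7 rule)
-about 68% of values lie within one standard deviation of the mean
-about 95% of values lie within two standard deviations of the mean
-about 99.7% of values lie within three standard deviations of the mean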
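These percents can be checked numerically. The sketch below uses the standard fact that, for a normal distribution, the proportion within k standard deviations of the mean equals erf(k / √2). The function name is my own.

```python
from math import erf, sqrt

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(k, round(proportion_within(k), 4))
```

The rule's 68, 95, and 99.7 are rounded versions of these values, so it is an approximation rather than an exact statement.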
z-score equation for normal distributions?
z = (x-μ)/σ
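A small sketch of the formula in use. The numbers (a value of 130 on a scale with mean 100 and standard deviation 15) are hypothetical, and the function names are my own. The second function converts a z-score into the proportion of a normal distribution below it, which is what a standard normal table gives.

```python
from math import erf, sqrt

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def proportion_below(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

z = z_score(130, mu=100, sigma=15)  # hypothetical values
print(z)                            # (130 - 100) / 15 = 2.0
print(proportion_below(z))
```

A z-score of 2 means the value sits two standard deviations above the mean, so roughly 97.7% of a normal distribution falls below it.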
What is a scatterplot?
displays relationship between TWO quantitative variables
explanatory and response variables on scatterplot
if we think a variable x may help explain, predict, or even cause variable y, we call x the explanatory variable and y the response variable. Always plot the explanatory variable on the x-axis.
how to explain a scatterplot? hint: DOFS
D: direction
O: outliers
F: form
S: strength
How to explain direction of a scatterplot?
positive association: high values of one variable tend to occur with high values of the other, positive slope on the line of best fit (LOBF)
negative association: high values of one variable tend to occur with low values of the other, negative slope
how to explain form of a scatterplot?
linear relationships, points show a straight line pattern
Curved and clustered are also good ways to describe form!
how to explain strength of a scatterplot?
determined by how close the points in the scatterplot lie to a simple form such as a line
what does correlation r value measure with two variables?
the strength and direction of the linear association between two quantitative variables x and y. r only measures straight-line relationships. It is always between -1 and 1, and the closer it is to -1 or 1, the stronger the association (near -1 for a negative association, near 1 for a positive association, near 0 for a weak linear association). CORRELATION IS NOT RESISTANT
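A sketch of how r can be computed by hand, assuming the usual textbook formula: standardize each x and y, multiply the pairs of z-scores, add them up, and divide by n - 1. The function name and the data are my own.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r: sum of products of standardized x and y, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    products = (((x - mean_x) / sx) * ((y - mean_y) / sy) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

xs = [1, 2, 3, 4, 5]
print(correlation(xs, [2, 4, 6, 8, 10]))  # perfect positive line, r is about 1
print(correlation(xs, [10, 8, 6, 4, 2]))  # perfect negative line, r is about -1

# One extreme point added to the perfect positive line drags r below zero.
print(correlation(xs + [6], [2, 4, 6, 8, 10, -20]))
```

The last line shows why correlation is not resistant: a single outlier turns a perfect positive association into a negative r.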