DECK 6: LIFELINE UNIT 1 AND 2 REVIEW Flashcards
How do you describe CENTER for bimodal or multimodal distributions?
talk about the modes (the lumps, the clusters)
Interpret r squared
r² × 100% of the variability in y can be explained by the linear model with x. The rest is left in the residuals.
How are mean, median and mode positioned in a skewed left histogram?
From left to right: mean, median, mode. The mean gets pulled farthest toward the long left tail.
When drawing a normal model, what are the PERCENTILES from left to right?
2.5, 16, 50, 84, 97.5
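The card's numbers come from the 68-95-99.7 rule. A small sketch with Python's `statistics.NormalDist` shows the exact normal-model percentiles at -2, -1, 0, +1, and +2 SD (about 2.3, 15.9, 50, 84.1, and 97.7), so the card's values are handy rounded approximations.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # the standard normal model

# cdf(k) = proportion of the model below k standard deviations from the mean
percentiles = {k: 100 * std_normal.cdf(k) for k in (-2, -1, 0, 1, 2)}

for k, pct in percentiles.items():
    print(f"{k:+d} SD -> {pct:.1f}%")
```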
Give an example of independent variables
If 80% prefer cheese and only 20% prefer pepperoni IN EACH GRADE AT BHS, then every grade has the same preference distribution, so grade doesn’t matter. We say “school year and pizza choice are independent.”
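What is the LSRL?
The “least squares regression line,” the line that makes the sum of the squared residuals as small as possible.
It can mean the line you plot
OR
its equation, ŷ = b0 + b1x.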
How can you match boxplots to histograms?
USE THE FISH TANK METHOD!
If the mean is above the median, the distribution may be
skewed right… the mean follows the tail
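A tiny made-up right-skewed data set shows the mean following the tail: one big value (20) drags the mean up, while the median barely notices it.

```python
from statistics import mean, median

# Hypothetical right-skewed data: mostly small values, one long tail to the right
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 20]
print(mean(data), median(data))  # 4.7 3.0
```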
Mean/SD/median/IQR: how do I know which ones to use?
When unimodal and symmetric, use the mean and SD. If skewed or there are outliers, use the median and IQR. If bimodal, talk about the MODES.
Interpret a residual: point below the line / negative residual
“The model overpredicted”
or
“The actual value was below the expected (or predicted) value”
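A residual is actual minus predicted, so its sign tells you which way the model missed. The helper `interpret_residual` and the numbers are hypothetical, just a sketch of the rule on the card.

```python
def interpret_residual(actual, predicted):
    """Residual = actual - predicted (observed minus expected)."""
    residual = actual - predicted
    if residual < 0:
        return "negative: the model overpredicted"
    if residual > 0:
        return "positive: the model underpredicted"
    return "zero: the point is on the line"

# Hypothetical point: the model predicted 5.2 but the actual value was 4.8
print(interpret_residual(4.8, 5.2))
```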
What is a CUMULATIVE FREQUENCY GRAPH?
An OGIVE. It shows the added up totals as you go left to right.
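The "added up totals" an ogive plots are running sums of the frequencies, usually turned into cumulative percents. A sketch with a made-up frequency table (the bins and counts are hypothetical); each percent would be plotted at the upper end of its bin.

```python
from itertools import accumulate

# Hypothetical frequency table, bins listed left to right
bins = ["0-9", "10-19", "20-29", "30-39", "40-49"]
counts = [4, 10, 16, 6, 4]

total = sum(counts)
running = list(accumulate(counts))                 # added-up totals: 4, 14, 30, 36, 40
cumulative_pct = [100 * c / total for c in running]

for label, pct in zip(bins, cumulative_pct):
    print(f"up to {label}: {pct:.1f}%")
```

The last value is always 100%, which is why every ogive ends at the top right.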
not independent is the same as
associated
What point is on every regression line?
The mean-mean point, (x̄, ȳ).
This point is generally not one of the points on the scatterplot.
Often few or none of the scatterplot points lie exactly on the regression line.
Compare population to sample
Populations are generally large, and samples are small subsets of these populations. We take samples to make inferences about populations. We use statistics to estimate parameters.
When there is no relationship between two variables, we say they are
independent (or not associated)
If something is correlated is it associated?
Yes.
If it is correlated then it must be associated.
However, if it is associated, it may not be correlated (for example, a curved relationship can have r near 0).
Does the IQR capture 68% of the data?
NO. It captures the middle 50%.
What do OGIVES look like?
They all start at the bottom left (0%) and go to top right (100%)
Which is the response variable?
The y variable,
on the vertical axis.
It “responds” to x.
where are the “outlier fences?”
1.5 × IQR above Q3 and 1.5 × IQR below Q1. Values beyond the fences are flagged as possible outliers. Just a rule of thumb.
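A sketch of the fence rule on made-up data. The helper names `quartiles` and `outlier_fences` are hypothetical; the quartiles use the "median of each half" method (leaving out the median when n is odd). Other software may compute quartiles slightly differently, which can move the fences a little.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded when n is odd)."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]
    return median(lower), median(upper)

def outlier_fences(data):
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [2, 4, 5, 5, 6, 7, 8, 9, 25]
low, high = outlier_fences(data)
outliers = [v for v in data if v < low or v > high]
print(low, high, outliers)  # -1.5 14.5 [25]
```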
If the distribution is skewed (or outliers/not symmetric) what would you use for center and spread statistics?
Median (center) and IQR (spread)
What symbols do we use for population standard deviation and sample standard deviation?
σ (sigma) for the population and s for the sample.
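Python's `statistics` module has both versions: `pstdev` treats the data as the whole population (divides by n, like σ), and `stdev` treats it as a sample (divides by n − 1, like s). The data below are made up for illustration.

```python
from statistics import pstdev, stdev

data = [2, 4, 4, 4, 5, 5, 7, 9]
sigma = pstdev(data)  # population SD: divides by n
s = stdev(data)       # sample SD: divides by n - 1
print(sigma, round(s, 4))  # 2.0 2.1381
```

The sample SD is always a bit larger, since dividing by n − 1 makes up for the sample usually being less spread out than the population.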
How do you describe a scatterplot’s strength?
Give the r value (if the form is straight),
or say…
“tightly packed… loosely packed”
Use the following words in one sentence: population, parameter, census, sample, data, statistics, inference, population of interest.
I was curious about a parameter of my population of interest, but a census was too costly, so I decided to choose a sample, collect some data, calculate a statistic, and use that statistic to make an inference about the population parameter.
Compare DATA-STATISTIC-PARAMETER using categorical example
Data are individual measures, like meal preference: “taco, taco, pasta, taco, burger, burger, taco.” Statistics and parameters are summaries. A statistic would be “42% of the sample preferred tacos,” and a parameter would be “42% of the population preferred tacos.”
how do you describe direction?
positive or negative
How do you describe SPREAD for skewed distributions (or distributions with outliers?)
Use the IQR
What is meant by relative frequency?
The PERCENT of time something comes up (frequency/total)
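A sketch of relative frequency as frequency/total, using a made-up list of meal preferences. `collections.Counter` does the counting.

```python
from collections import Counter

# Hypothetical categorical data
choices = ["taco", "taco", "pasta", "taco", "burger", "burger", "taco"]
counts = Counter(choices)
total = len(choices)

relative = {meal: n / total for meal, n in counts.items()}  # frequency / total
for meal, p in relative.items():
    print(f"{meal}: {p:.0%}")  # taco: 57%, pasta: 14%, burger: 29%
```

The relative frequencies always add up to 1 (100%).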
Give a simple example showing that adding a constant doesn’t change the spread, but changes the center. (this always happens)
Data set: 1, 2, 3, 4, 5. Spread (range): 4, center: 3. Add three to every value to get the new data set 4, 5, 6, 7, 8. Spread: 4, center: 6 (the center went up 3, the spread stayed the same). The IQR and SD also stay the same, but the median and mean go up 3. This is called shifting, or sliding, the data.
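The same example in a few lines of Python: adding 3 to every value moves the mean and median up by 3 but leaves the range and SD unchanged.

```python
from statistics import mean, median, stdev

data = [1, 2, 3, 4, 5]
shifted = [v + 3 for v in data]   # add the constant 3 to every value: 4, 5, 6, 7, 8

print(mean(data), mean(shifted))                            # center moves up by 3
print(median(data), median(shifted))                        # center moves up by 3
print(stdev(data), stdev(shifted))                          # spread unchanged
print(max(data) - min(data), max(shifted) - min(shifted))   # range unchanged
```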
How do you find the median from an OGIVE?
Go halfway up the y axis (the 50% mark), shoot across to the curve, then go straight down to the x axis. The median is the 50th percentile (halfway up).
What is the five number summary?
min, Q1, Q2 (median), Q3, and max
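A sketch of the five-number summary on made-up data. The function name is hypothetical, and the quartiles use the "median of each half" method; other quartile rules can give slightly different Q1 and Q3.

```python
from statistics import median

def five_number_summary(data):
    """Hypothetical helper: (min, Q1, median, Q3, max), quartiles as medians of the halves."""
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half + 1:] if len(s) % 2 else s[half:]  # skip the middle value when n is odd
    return min(s), median(lower), median(s), median(upper), max(s)

print(five_number_summary([7, 1, 3, 9, 5, 11, 2, 8]))  # (1, 2.5, 6.0, 8.5, 11)
```

From this summary, the IQR is Q3 − Q1 = 8.5 − 2.5 = 6.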
What is data?
Any collected information, generally each individual measurement. For example, if it is a survey about liking porridge, the data might be “yes, yes, no, yes, yes.” If it is the number of saltines someone can eat in 30 seconds, the data might be “3, 1, 2, 1, 4, 3, 3, 4.”
How do you describe the form of a scatterplot?
Straight or curved?
What is a standard deviation?
Roughly the average (typical) distance of the data values from the mean. It is how far you expect a random value to be from the middle.
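A sketch of the sample SD formula written out step by step, applied to the saltine data from the "What is data?" card. The function name `sample_sd` is hypothetical. It squares the distances from the mean, averages them using n − 1, and takes the square root, which is why the SD is only "about" the average distance.

```python
from math import sqrt

def sample_sd(data):
    n = len(data)
    x_bar = sum(data) / n
    squared_deviations = [(v - x_bar) ** 2 for v in data]  # squared distances to the mean
    return sqrt(sum(squared_deviations) / (n - 1))         # "average" them with n - 1, then undo the square

print(round(sample_sd([3, 1, 2, 1, 4, 3, 3, 4]), 3))  # 1.188
```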
How do you interpret the slope EQUATION?
b1 = r·sy/sx
For each increase of 1 SD in the x direction,
you go r SDs in the y direction.
2 SDs in x, you go 2r SDs in y.
3 SDs in x, you go 3r SDs in y.
How do you find outliers in regression?
They don’t follow the “flow.”
Pinky trick: cover the point with your pinky, then uncover it. Does it follow the flow?
What values can r be?
from -1 to +1
r near 0 is WEAK
What are the two types of observational studies?
Retrospective and prospective
What is variability?
Differences: how things differ. There is variability everywhere. We all look different, act different, and have different preferences. Statisticians study these differences.
What should we look for in a residual plot?
A curve or pattern means a linear model is NO GOOD.
Also, it should have roughly equal scatter from left to right.
It should look RANDOM
What is extrapolation?
Making predictions outside of the x values you have.
What is a Z score?
The number of standard deviations a value is from the mean
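The z-score formula is z = (value − mean) / SD. A tiny sketch with a hypothetical test that has mean 70 and SD 8; the function name `z_score` is just for illustration.

```python
def z_score(value, mean, sd):
    """How many SDs `value` is above (+) or below (-) the mean."""
    return (value - mean) / sd

# Hypothetical test: mean 70, SD 8
print(z_score(86, 70, 8))   # 2.0  (2 SDs above the mean)
print(z_score(66, 70, 8))   # -0.5 (half an SD below the mean)
```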
What does normcdf do?
It gives you the area under the normal curve between any two z scores
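If you don't have the calculator handy, Python's `statistics.NormalDist` can do the same area calculation: the area between two z-scores is cdf(upper) − cdf(lower). This is a stand-in sketch, not the calculator's own function; `area_between` is a hypothetical name.

```python
from statistics import NormalDist

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z-scores."""
    std_normal = NormalDist(mu=0, sigma=1)
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(area_between(-1, 1), 4))  # 0.6827
print(round(area_between(-2, 2), 4))  # 0.9545
```

These match the "68" and "95" of the 68-95-99.7 rule.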
If I take a random sample of 20 hamburgers from FIVE GUYS and count the number of pickles on each one, and one of them had 9 pickles, then the number 9 from that burger would be called
a datum, or a data value.
How do you describe CENTER for skewed or distributions with outliers?
use the MEDIAN
Association and Independence. How are they related?
Variables are either independent or associated. If the distribution of one changes depending on the other, we say there is an association. If not, they are independent.
Give a quick example of associated variables
A higher percentage of boys play video games than girls so we say “gender and video game playing are associated” or “gender and video game playing are not independent”
What is the difference between categorical VARIABLES and categorical DATA?
The Variable is the overall category. Like “EYE COLOR”. The data is the actual measurement from the subjects. Like “blue, brown, blue”
What is the IQR?
Interquartile range… a measure of spread. Q3-Q1. The distance from Q1 to Q3. The regular range is Hi-Lo, this is the inner range, the interquartile range.
what does influential mean?
It impacts the SLOPE.
It means that the point, when added to or removed from the data, noticeably changes the SLOPE.
Generally these are outliers in the x direction. Far left or right.
what is leverage?
Far right or far left from the middle.
leverage just means it is far away from x-bar
Some leverage points are not influential if they go along with the flow of the scatter.
Compare DATA-STATISTIC-PARAMETER using quantitative example
Data are individual measures, like how long a person can hold their breath: “45 sec, 64 sec, 32 sec, 68 sec.” That is the raw data. Statistics and parameters are summaries: a statistic would be “the average breath-holding time in the sample was 52.4 seconds,” and a parameter would be “the average breath-holding time in the population was 52.4 seconds.”