Final Exam Flashcards
Every statistical conclusion is stated in terms of ________
Probability
Statistical calculations do NOT yield DEFINITE _______
Conclusions
Central tendency
Middle of the data
How do you find the arithmetic mean/ what is the equation?
Add up all the values and divide by the number of observations / mean = sum(values)/n
The arithmetic mean is also known as the?
Average
Is the arithmetic mean tolerant to outliers?
NO!!!
How do you find the median if there is an even number of values?
Rank the values from lowest to highest and average the middle two
How do you find the median if there is an odd number of values?
Rank the values from lowest to highest and the center one is the median
Which is the better choice when quantifying the central tendency of a data set: Mean or median?
The median bc it is more tolerant to outliers
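A quick sketch of the mean and median cards above, using made-up numbers. The helper name `median_by_rank` is just for illustration; it follows the "rank, then take the center (or average the middle two)" rule and shows how one outlier moves the mean far more than the median.

```python
from statistics import mean, median

def median_by_rank(values):
    """Rank the values, then take the center one (odd n) or average the middle two (even n)."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2

data = [2, 3, 4, 5, 6]          # made-up values, odd n
with_outlier = data + [100]     # same values plus one outlier, even n

# Without the outlier the mean and median are both 4.
print(mean(data), median_by_rank(data))
# With the outlier the mean jumps to 20, while the median only moves to 4.5.
print(mean(with_outlier), median_by_rank(with_outlier))
```

The result of `median_by_rank` agrees with Python's built-in `statistics.median` on both lists.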
What are the three main steps to finding the geometric mean? -> Think logs
1) Transform all the values into their logarithms
2) Compute the mean of these logarithms
3) Take the antilog of that mean
Is the geometric mean tolerant to the presence of outliers?
YES
When calculating the geometric mean, all the values must be _______
Positive
What is the equation for the geometric mean?
Geometric mean = ((X1)(X2)(X3)...(XN))^(1/N)
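The three log steps and the product equation should give the same answer. A minimal Python sketch with made-up values (both function names are hypothetical, and natural logs are assumed; any log base works as long as the antilog matches):

```python
import math

def geometric_mean_via_logs(values):
    """Steps 1-3: take logs, average them, then take the antilog of that mean."""
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    logs = [math.log(v) for v in values]
    mean_of_logs = sum(logs) / len(logs)
    return math.exp(mean_of_logs)

def geometric_mean_via_product(values):
    """Equation form: the Nth root of the product of all N values."""
    return math.prod(values) ** (1 / len(values))

data = [1, 10, 100]  # made-up values
gm_logs = geometric_mean_via_logs(data)
gm_product = geometric_mean_via_product(data)
print(gm_logs, gm_product)  # both are approximately 10
```

The log route is usually preferred for long lists because the raw product can overflow.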
What are the three main steps to finding the harmonic mean? -> Think reciprocals
1) Transform each value to its reciprocal
2) Compute the arithmetic mean of those reciprocals
3) Take the reciprocal of this mean
The harmonic mean can’t be computed when the values are _______ or ______
Zero / negative
Is the harmonic mean more stable when outliers are present?
YES
What is the equation for the harmonic mean?
N/(1/X1 + 1/X2 + 1/X3 + ... + 1/XN)
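A small sketch showing that the three reciprocal steps and the equation agree. The speeds are a made-up example: driving the same distance at 30 km/h and then at 60 km/h gives an average speed of 40 km/h, not 45. Function names are hypothetical.

```python
import math

def harmonic_mean_by_steps(values):
    """Steps 1-3: reciprocals, arithmetic mean of the reciprocals, reciprocal of that mean."""
    if any(v <= 0 for v in values):
        raise ValueError("values must not be zero or negative")
    reciprocals = [1 / v for v in values]
    mean_of_reciprocals = sum(reciprocals) / len(reciprocals)
    return 1 / mean_of_reciprocals

def harmonic_mean_by_formula(values):
    """Equation form: N / (1/X1 + 1/X2 + ... + 1/XN)."""
    return len(values) / sum(1 / v for v in values)

speeds_kmh = [30, 60]  # made-up rates over equal distances
hm_steps = harmonic_mean_by_steps(speeds_kmh)
hm_formula = harmonic_mean_by_formula(speeds_kmh)
print(hm_steps, hm_formula)  # both are approximately 40
```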
How do you find the trimmed mean?
Ignore or “trim off” the highest and lowest values and take the arithmetic mean of the remaining values
What is the purpose of the trimmed mean?
It’s a simple/ primitive way of getting rid of outliers
Is the trimmed mean tolerant to outliers?
Kind of? It’s more tolerant than the arithmetic mean, but not as tolerant as the other methods
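A minimal sketch of the trimmed mean with made-up numbers. The parameter `k` (how many values to trim from each end) is an assumption added for illustration; the card only describes trimming the highest and lowest values, which corresponds to `k=1`.

```python
def trimmed_mean(values, k=1):
    """Drop the k lowest and k highest values, then average what is left."""
    if len(values) <= 2 * k:
        raise ValueError("not enough values left after trimming")
    ranked = sorted(values)
    kept = ranked[k:len(ranked) - k]
    return sum(kept) / len(kept)

data = [2, 3, 4, 5, 100]             # made-up values with one outlier
plain_mean = sum(data) / len(data)   # 22.8, pulled up by the outlier
trimmed = trimmed_mean(data)         # mean of [3, 4, 5]
print(plain_mean, trimmed)
```

If there are two outliers on the same side, trimming one value per end still leaves one of them in, which is one reason this method is only partly tolerant.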
What are the 5 ways of quantifying the central tendency of a data set?
1) Arithmetic mean
2) Median
3) Geometric mean
4) Harmonic mean
5) Trimmed mean
How do you find the mode?
It’s the value that occurs most commonly in the data set
Does the mode always assess the center of the distribution?
No, not always
Can you use the mode() function in R to find the mode?
NO: mode() tells you what kind of data it is (how it is stored). To find the mode you’d have to use other functions
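For illustration only, here is one way to count the most common value(s). This sketch is in Python rather than R, and the function name `modes` is hypothetical; it returns every value tied for the highest count.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often (there can be ties)."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]

skewed = [1, 2, 2, 3, 9, 9, 9]   # made-up values: the mode (9) sits at the edge, not the center
tied = [1, 1, 2, 2, 3]           # made-up values with two modes

print(modes(skewed))  # [9]
print(modes(tied))    # [1, 2]
```

The first list also shows why the mode does not always assess the center of the distribution.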
What is the difference between a continuous and discrete variable?
A continuous variable can take any value within a range, while a discrete variable can take only separate, countable values -> Think of a continuous variable as a straight line and a discrete variable as individual spots or data points
What is an interval variable? (3)
1) An interval variable is a type of continuous variable
2) You can calculate the difference between two values of this kind of variable, but the ratio doesn’t make sense bc it changes when the units change
3) On the scale the zero is defined arbitrarily
When should you use the harmonic mean?
When you are dealing with proportions, rates, and ratios
What are examples of an interval variable?
Temp in degrees C and F, pH, credit score
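A small sketch of why ratios of an interval variable don't hold up, using two made-up temperatures. The conversion helpers are written out here for illustration. The difference between the two temperatures describes the same gap in either unit, but the ratio changes with the unit, unlike a ratio variable such as length.

```python
import math

def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

cold_c, warm_c = 10.0, 20.0   # made-up temperatures in degrees C

# Differences: 10 degrees C and 18 degrees F describe the same gap.
diff_c = warm_c - cold_c
diff_f = c_to_f(warm_c) - c_to_f(cold_c)

# Ratios: "twice as hot" in C, but only 68/50 = 1.36 in F, so the ratio is not meaningful.
ratio_c = warm_c / cold_c
ratio_f = c_to_f(warm_c) / c_to_f(cold_c)

# A ratio variable (length) gives the same ratio in any units.
ratio_m = 2.0 / 1.0          # 2 m vs 1 m
ratio_cm = 200.0 / 100.0     # the same lengths in cm

print(diff_c, diff_f, ratio_c, ratio_f, ratio_m, ratio_cm)
```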
What is a ratio variable? (3)
1) A type of continuous variable
2) You can calculate both the difference and the ratio of the data and have it make sense/ be meaningful
3) Zero on the scale is not defined arbitrarily
What are examples of a ratio variable?
Temp in Kelvin, distance, length, height, weight (in any units, metric or English system)
What are the two types of continuous variables we’ve discussed?
Interval and ratio variables
What is an ordinal variable? (3)
1) It is not a type of continuous variable
2) It MUST express rank
3) The order of the values matters, but not the exact number
2 star vs 4 star hotel ranking, credit scores, and test scores are examples of what type of variable?
Ordinal variable
What is a nominal variable? (2)
1) It is not a type of continuous variable
2) It is used to describe the data with multiple categorical outcomes
Passing and failing classes is an example of what type of variable?
Nominal variable
What kind of variable is the color spectrum?
Some variables like the color spectrum can be quantified and treated as a ratio variable (in terms of the wavelength of each color) or as an ordinal variable (in terms of the normal order of the colors)
What does the point prevalence tell us?
It refers to the proportion of participants with a risk factor or disease at a particular point in time
What is the equation for point prevalence?
PP= number of people with the disease/ number of people examined (usually at baseline)
What does relative risk or risk ratio (RR) tell us?
It is a useful measure to compare the prevalence or incidence of disease between two groups
What is the risk ratio essentially?
Just a ratio of the point prevalences between a group and a reference group
What is the equation for relative risk (risk ratio, RR)?
Relative risk (risk ratio, RR) = Point prevalence of exposed or experimental arm / Point prevalence of unexposed or control AKA reference group
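A minimal sketch of point prevalence and relative risk. The counts are made up (they do not come from any study), and the function names are hypothetical.

```python
import math

def point_prevalence(cases, examined):
    """Number of people with the disease divided by the number of people examined."""
    return cases / examined

def relative_risk(pp_exposed, pp_reference):
    """Point prevalence of the exposed group divided by that of the reference group."""
    return pp_exposed / pp_reference

# Made-up counts: 30 of 100 exposed people and 10 of 100 unexposed people have the disease.
pp_exposed = point_prevalence(30, 100)
pp_unexposed = point_prevalence(10, 100)
rr = relative_risk(pp_exposed, pp_unexposed)
print(pp_exposed, pp_unexposed, rr)  # RR is about 3: prevalence is 3x higher in the exposed group

# If both groups had the same prevalence, RR would be 1.
rr_same = relative_risk(pp_unexposed, pp_unexposed)
```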
What does it mean if you get a relative risk or risk ratio of 1?
A relative risk of 1 means exposure to the risk factor is UNRELATED to the risk of the disease .: If a risk factor is related to the risk of disease, relative risk will not equal 1.
What are the 2 equations to find the odds?
Odds = number of events / number of nonevents OR Odds = point prevalence / (1 - point prevalence), calculated separately for each group (exposed or experimental arm, and unexposed or control)
How are the odds in statistics different from regular probability?
Odds in statistics = number of events / number of NONEVENTS, whereas a regular probability = number of events / TOTAL NUMBER OF OBSERVATIONS (events + nonevents)
What is the equation for the odds ratio?
Odds of exposed or experimental arm / Odds of unexposed or the control
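A short sketch of the odds and the odds ratio using made-up counts (30 events and 70 nonevents in the exposed group, 10 events and 90 nonevents in the unexposed group). Function names are hypothetical. It also checks that the two ways of computing odds agree.

```python
import math

def odds_from_counts(events, nonevents):
    """Odds = number of events / number of nonevents."""
    return events / nonevents

def odds_from_prevalence(prevalence):
    """Odds = point prevalence / (1 - point prevalence)."""
    return prevalence / (1 - prevalence)

# Made-up counts for two groups of 100 people each.
odds_exposed = odds_from_counts(30, 70)      # about 0.43
odds_unexposed = odds_from_counts(10, 90)    # about 0.11
odds_ratio = odds_exposed / odds_unexposed   # about 3.86

# The same odds computed from the point prevalence (30/100 = 0.3).
odds_exposed_alt = odds_from_prevalence(30 / 100)

# For comparison, the probability uses the total in the denominator.
probability_exposed = 30 / (30 + 70)

print(odds_exposed, odds_unexposed, odds_ratio, probability_exposed)
```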
What does it mean if the odds ratio is 1?
It means that exposure to the risk factor is unrelated to the risk of disease .: If a risk factor is related to the risk of disease, odds ratio will not equal 1.
What units does standard deviation have?
The same ones as the data
What is the rule of thumb for interpreting standard deviation?
The mean plus or minus 1 SD houses about 2/3 of the data (68.3%) and the mean plus or minus 2 SD houses about 95-95.4% of the data
What are the 6 main steps you use to calculate the standard deviation of a sample?
1) Calculate the arithmetic mean
2) Calculate the difference bt each value and the mean -> Called the deviation
3) Square those differences
4) Add up those squared differences
5) Divide that sum by (n-1), where n is the number of values -> Called the variance
6) Take the square root of the variance you calculated
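The six steps above can be sketched in Python as follows. The data are made up, and the result is compared against the standard library's `statistics.stdev`, which also uses n-1 in the denominator.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up sample

# 1) Arithmetic mean
mean_value = sum(data) / len(data)
# 2) Deviation of each value from the mean
deviations = [x - mean_value for x in data]
# 3) Square the deviations
squared = [d ** 2 for d in deviations]
# 4) Add up the squared deviations
sum_of_squares = sum(squared)
# 5) Divide by (n - 1) to get the sample variance
variance = sum_of_squares / (len(data) - 1)
# 6) Square root of the variance is the sample SD
sd = math.sqrt(variance)

print(mean_value, variance, sd)   # mean 5, variance 32/7, SD about 2.14
```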
When calculating standard deviation, why do we use n-1 instead of n?
n-1 is called the degrees of freedom and it’s used bc the sample variance (the square of the SD) computed using (n-1) is an unbiased estimate of the population variance
Is the SD computed with n-1 as the denominator, the most accurate estimate of the population SD?
No, it is a biased estimate of the population SD, which means, on average, it does not equal the population SD. It’s used bc the sample variance (the square of the SD) computed using (n-1) is an unbiased estimate of the population variance
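One way to see this is to list every possible sample instead of simulating. The sketch below assumes a tiny made-up population of three values and draws every ordered sample of size 2 with replacement. Averaged over all samples, the n-1 variance matches the population variance, while the average sample SD comes out smaller than the population SD.

```python
import math
from itertools import product
from statistics import pstdev, pvariance, stdev, variance

population = [1, 2, 3]   # made-up population

# Every ordered sample of size 2 drawn with replacement (9 samples in total).
samples = list(product(population, repeat=2))

# variance() and stdev() both use n-1 in the denominator.
mean_sample_variance = sum(variance(s) for s in samples) / len(samples)
mean_sample_sd = sum(stdev(s) for s in samples) / len(samples)

print(mean_sample_variance, pvariance(population))  # equal (both 2/3): unbiased
print(mean_sample_sd, pstdev(population))           # about 0.63 vs about 0.82: biased low
```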
Can the mean or median equal zero or negative?
YES
Can the standard deviation be zero?
Yes, when all the data values are the same
Can the standard deviation be negative?
No
Can the mean and median be computed when n=1? When n=2?
The mean and median can be computed when n=2, but it does not make sense to calculate them when n=1 bc the mean/ median would just be equal to that data value
Can SD be calculated when n=1? When n=2?
SD can be calculated when n=2, but not when n=1
What kind of variable is the coefficient of variation (CV) for?
Ratio variables
When would you prefer to use CV to display variabilities?
When you’re trying to compare 2 or more data sets
Does the coefficient of variation (CV) have units?
NO
What is the equation for coefficient of variation (CV)?
CV= SD/mean
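A brief sketch of the CV with made-up data. It shows that the CV has no units (the same heights in cm and in m give the same CV) and that it lets you compare the variability of data sets measured in different units. The function name is hypothetical.

```python
import math
from statistics import mean, stdev

def coefficient_of_variation(values):
    """CV = SD / mean (only sensible for ratio variables with a nonzero mean)."""
    return stdev(values) / mean(values)

heights_cm = [150, 160, 170, 180, 190]   # made-up heights
heights_m = [1.5, 1.6, 1.7, 1.8, 1.9]    # the same heights in meters
weights_kg = [50, 70, 90, 110, 130]      # made-up weights

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
cv_kg = coefficient_of_variation(weights_kg)

print(cv_cm, cv_m)   # the same value: the units cancel out
print(cv_kg)         # larger than the height CV, so the weights vary relatively more
```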
What does a larger CV indicate?
If the CV of one data set is larger than the CV of another data set, then the data set with a larger CV has greater variability, meaning the data points are relatively more different from each other
What is the equation for variance?
It is the square of SD
What are the units for variance?
The same units as the data but squared
What are quantiles?
The values or cuts made dividing the range of a data set into q continuous intervals/ subsets with equal size
If there are q subsets, how many quantiles?
q-1
What is a good way to think of quantiles?
Think of them as the cuts made to cut data into subsets w/ equal observations
What are quantiles used for?
To quantify scattered data
In order to make 3 subsets, how many quantiles or cuts do we have to make?
2
In order to make 4 subsets, how many quantiles do we have to make?
3
What is a quartile?
A quartile is just a special type of quantile that divides the data up 3 times so that we have 4 subsets (q=4), creating the 25th, 50th (where the median occurs), and 75th percentiles
What is a percentile?
A percentile is just a special type of quantile that divides the data up into 100 equal subsets (q=100), creating 99 percentiles
Is there such thing as a 100th percentile?
NO
At what percentile does the median occur?
The 50th
How do you find the interquartile range?
You subtract the first quartile (25th percentile) from the third quartile (75th percentile)
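A hedged sketch of the IQR using Python's `statistics.quantiles` on made-up data. Python offers two interpolation methods ("inclusive" and "exclusive"); these are Python's options and may not match any of the methods used in the course, but they show that different quartile methods can give different IQRs for the same data.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # made-up values

def interquartile_range(values, method):
    """IQR = third quartile (75th percentile) minus first quartile (25th percentile)."""
    q1, _median, q3 = quantiles(values, n=4, method=method)
    return q3 - q1

iqr_inclusive = interquartile_range(data, "inclusive")   # Q1 = 3, Q3 = 7
iqr_exclusive = interquartile_range(data, "exclusive")   # Q1 = 2.5, Q3 = 7.5

print(iqr_inclusive, iqr_exclusive)   # 4.0 and 5.0
```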
What units does the interquartile range have?
The same units as the data
What does Xth percentile tell us?
It is the value below which X% of the values in the data set lie
How many different ways are there to calculate the interquartile range and which one should you use to calculate it on the exams?
3 / The 1st