20. Statistics Flashcards
What are the mean, median, and mode?
mean - the average. Susceptible to extreme values (outliers).
median - the middle value of the sorted data set. Resistant to outliers, since extreme values barely move it.
mode - the most repeated value of a data set
How do you find the median of a data set with an even number of values?
You take the mean of the two middle numbers of the sorted data. Thus, like the mean, the median does not have to be a number in the data set. In contrast, the mode is always a number in the data set.
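A minimal sketch of these three measures using Python's standard statistics module. The two data sets are made up for illustration.

```python
import statistics

# Hypothetical data: 100 is an outlier.
odd_count = [2, 3, 3, 8, 100]
even_count = [1, 3, 5, 7]

mean_odd = statistics.mean(odd_count)      # pulled upward by the outlier
median_odd = statistics.median(odd_count)  # middle value of the sorted data
mode_odd = statistics.mode(odd_count)      # most repeated value

# Even number of values: the median is the mean of the two middle numbers
# (3 and 5), so it is not itself a member of the data set.
median_even = statistics.median(even_count)

print(mean_odd, median_odd, mode_odd, median_even)
```

The outlier drags the mean far above the median, while the median and mode stay at a typical value.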
What is a standard deviation? On a normal curve, explain percentage of occurrences.
SD describes how much the numbers deviate from the mean. On a normal curve, about 34% of numbers fall within 1 SD on each side of the mean (meaning about 68% of numbers fall within 1 SD of the mean). About 95% of numbers fall within 2 SDs of the mean, and about 99.7% within 3.
The lower the standard deviation, the closer all the numbers are to the mean. A high SD indicates the numbers are more varied.
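A rough sketch to check these figures. The fraction of a normal curve within k SDs of the mean is erf(k / sqrt(2)). The helper name fraction_within and the two small data sets are assumptions made for illustration.

```python
import math
import statistics

def fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.4f}")

# Hypothetical data sets with the same mean (10) but different spread.
tight = [9, 10, 11]
spread = [0, 10, 20]
tight_sd = statistics.pstdev(tight)    # population SD, numbers close to the mean
spread_sd = statistics.pstdev(spread)  # population SD, numbers more varied
print(tight_sd, spread_sd)
```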
True or false: the 50th percentile is the same thing as the mean.
False. The 50th percentile is always the median. It equals the mean only sometimes, e.g. on a normal curve, where both sit at the peak.
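A small sketch of the difference, using made-up numbers: in a symmetric data set the mean and the 50th percentile coincide, in a skewed one they do not.

```python
import statistics

# Hypothetical data sets.
symmetric = [10, 20, 30, 40, 50]
skewed = [20, 25, 30, 35, 400]  # one extreme value

# The median is the 50th percentile in both cases.
symmetric_median = statistics.median(symmetric)
skewed_median = statistics.median(skewed)

symmetric_mean = statistics.mean(symmetric)
skewed_mean = statistics.mean(skewed)

print(symmetric_mean, symmetric_median)  # equal for symmetric data
print(skewed_mean, skewed_median)        # mean pulled away from the median
```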
What is statistical power?
Statistical power is the probability of a true positive result, i.e. of detecting an effect that really exists. Power grows with sample size: a larger sample increases the likelihood of a true positive.
In other words, it is the probability that the test correctly rejects a false null hypothesis. In inferential statistics, the null hypothesis is always that there is no change (no effect).
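A minimal sketch of how power rises with sample size. It assumes one specific setup that the cards do not name: a one-sample, one-sided z-test with known SD, where effect_size is the true shift measured in SD units. The function name z_test_power and the chosen numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """Power of a one-sided, one-sample z-test (known SD).

    effect_size is the true shift from the null value, in SD units.
    """
    std_normal = NormalDist()
    # Reject the null when the z statistic exceeds this cutoff.
    critical_z = std_normal.inv_cdf(1 - alpha)
    # Under the true effect, the z statistic is centered at effect_size * sqrt(n).
    return 1 - std_normal.cdf(critical_z - effect_size * math.sqrt(n))

powers = {n: z_test_power(0.5, n) for n in (10, 20, 40)}
for n, power in powers.items():
    print(f"n={n}: power={power:.2f}")
```

With no true effect (effect_size = 0), the same formula returns alpha, the false positive rate.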
What is sampling bias?
To test for some variable, we must randomly select people from an eligible population. If this process is not truly random, the sample may not represent the population, and our results may be distorted by confounding variables. This is sampling bias.
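A rough simulation of the idea, with an invented population of ages. A truly random sample lands near the population mean, while a sample drawn from only part of the population does not.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical eligible population: ages 18 through 80, 100 people per age.
population = [age for age in range(18, 81) for _ in range(100)]

# Truly random selection from the whole population.
random_sample = random.sample(population, 500)

# Biased selection: recruiting only where people aged 60+ are found.
older_only = [age for age in population if age >= 60]
biased_sample = random.sample(older_only, 500)

population_mean = statistics.mean(population)
random_mean = statistics.mean(random_sample)
biased_mean = statistics.mean(biased_sample)

print(population_mean, random_mean, biased_mean)
```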
What are specific real area bias, self-selection bias, pre-screening (advertising) bias, and healthy user bias?
specific real area bias: the physical area people are selected from does not represent the population proportionately.
self-selection bias: occurs when the people studied have some control over their level of participation (e.g. surveys).
pre-screening (advertising) bias: occurs when recruitment narrows who can take part, e.g. advertising in specific targeted areas.
healthy user bias: occurs when the sample is likely healthier than the population (e.g. recruiting marine officers).
True or false: self-selection bias occurs when the level of participation of subjects skews the results.
True. For example, a survey may only get responses from highly opinionated people.
A t-test compares the means of two groups, typically for normally distributed data. What p-value is conventionally required for significance?
p < 0.05
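A minimal sketch of a two-sample t-test done by hand, assuming equal variances (the pooled form) and made-up scores. Instead of computing an exact p-value, it compares |t| with 2.101, the two-tailed critical value for p = 0.05 at 18 degrees of freedom taken from a standard t table. The function name pooled_t_statistic is hypothetical.

```python
import math
import statistics

def pooled_t_statistic(a, b):
    """Two-sample t statistic assuming both groups share one variance."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    std_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / std_error

# Hypothetical scores for two groups of 10 (degrees of freedom = 10 + 10 - 2 = 18).
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.5, 5.2, 5.9]
group_b = [4.2, 4.8, 4.5, 4.9, 4.4, 4.7, 4.1, 4.6, 4.3, 5.0]

T_CRITICAL_DF18 = 2.101  # two-tailed, p = 0.05, from a standard t table

t_stat = pooled_t_statistic(group_a, group_b)
significant = abs(t_stat) > T_CRITICAL_DF18  # equivalent to p < 0.05

print(f"t = {t_stat:.2f}, significant at p < 0.05: {significant}")
```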
What is reliability? Explain test-retest reliability and inter-rater reliability.
reliability is the degree to which an assessment tool produces consistent and replicable results.
test-retest reliability –> if the same person takes the same test 5 times, their result should be consistent (i.e. obtaining same results over time)
inter-rater reliability –> the degree to which two judges using the assessment tool agree on the results of one subject. If many researchers using the same assessment tool disagree wildly, the tool is not reliable.
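One rough way to put a number on inter-rater agreement is simple percent agreement: the share of subjects on which two judges give the same rating. This is only a sketch with invented ratings; the function name percent_agreement is hypothetical, and this measure does not correct for agreement that happens by chance.

```python
def percent_agreement(ratings_a, ratings_b):
    """Fraction of subjects on which two raters gave the same rating."""
    if len(ratings_a) != len(ratings_b):
        raise ValueError("both raters must rate the same subjects")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)

# Hypothetical ratings of the same 8 subjects by two judges.
judge_1 = ["pass", "pass", "fail", "pass", "fail", "pass", "fail", "pass"]
judge_2 = ["pass", "fail", "fail", "pass", "fail", "pass", "pass", "pass"]

agreement = percent_agreement(judge_1, judge_2)
print(f"agreement: {agreement:.0%}")
```

A value near 100% suggests the tool is used consistently across judges; wild disagreement would show up as a low value.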
Validity describes how well an experiment measures what it is trying to measure. Explain internal, external, and construct validity.
internal validity refers to whether the results of the study properly demonstrate a causal relationship between the 2 variables. Affected by confounding variables. Is Y truly dependent on X?
external validity asks whether the results can be generalized to other situations.
construct validity asks whether a tool is measuring what it is intended to measure. Has it been constructed properly?