Statistics (Brilliant) Flashcards
Mathematical definition of mean
Sum of the set divided by the number of elements in the set
Definition of median
Middle element of a set, with equal number of elements above and below
How is the median determined if the cardinality of the set is even?
Mean of the middle two elements
What is necessary regarding the arrangement of elements in a set in order to determine the median?
The data must be sorted
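The mean and median cards above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; the function names and the sample numbers are made up for the example.

```python
def mean(values):
    """Sum of the set divided by the number of elements in the set."""
    return sum(values) / len(values)


def median(values):
    """Middle element of the sorted data; mean of the middle two if the count is even."""
    ordered = sorted(values)          # the data must be sorted first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: single middle element
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: mean of middle two


odd_set = [7, 3, 10, 1, 4]            # hypothetical data, 5 elements
even_set = [7, 3, 10, 1]              # hypothetical data, 4 elements

print(mean(odd_set))                  # 25 / 5
print(median(odd_set))                # sorted: 1, 3, 4, 7, 10
print(median(even_set))               # sorted: 1, 3, 7, 10
```

Python's standard `statistics` module provides `mean` and `median` with the same behaviour for cases like these.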
Definition of mode
The value that appears most
Can a set have more than one mode?
Yes
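A short sketch of finding every mode, including the case of more than one. The helper name `modes` and the sample data are assumptions made for illustration; the same idea works for non-numerical values.

```python
from collections import Counter


def modes(values):
    """Return every value that ties for the highest count."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([1, 2, 2, 3, 3]))             # two values tie, so two modes
print(modes(["red", "blue", "red"]))      # works on non-numerical data too
```

The standard library's `statistics.multimode` (Python 3.8+) does the same job.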
What advantages does mode have over other measures of centrality?
It can evaluate non-numerical data and it can identify lurking variables in bimodal distributions
What disadvantage does mode have compared with other measures of centrality?
It’s the least useful for making inferences about the rest of the data
What advantage does median have over mean?
It’s less affected by outliers
How are the 1st, 2nd, and 3rd quartiles defined?
1st quartile: the median of the lower half of the data, with 25% of the data points below it; 2nd quartile: the median of the whole data set; 3rd quartile: the median of the upper half of the data, with 25% of the data points above it
What is the interquartile range?
The 3rd quartile minus the 1st quartile
What is the definition of an outlier?
A data point that is less than the 1st quartile - 1.5 x interquartile range, or greater than the 3rd quartile + 1.5 x interquartile range
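The quartile, interquartile range, and outlier cards can be combined into one sketch. It uses the "median of each half" definition from the card; leaving the middle element out of both halves when the count is odd is one common convention and is an assumption here, as are the function names and the sample data.

```python
def median(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3). Assumes at least two data points."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]            # lower half (middle element excluded if n is odd)
    upper = ordered[(n + 1) // 2:]       # upper half
    return median(lower), median(ordered), median(upper)


def outliers(values):
    """Values beyond 1.5 x IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


data = [1, 2, 3, 4, 5, 6, 7, 50]         # hypothetical data with one extreme value
print(quartiles(data))                   # Q1, Q2, Q3
print(outliers(data))                    # only the extreme value is flagged
```

Statistical packages use several slightly different quartile conventions, so results can differ a little on small data sets.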
Definition of expected value
Sum of (value x probability) over all possible values
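As a worked example of the expected value card, here is a sketch for one roll of a fair six-sided die. The die is an assumption chosen for illustration; `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction


def expected_value(distribution):
    """Sum of (value x probability) over a {value: probability} mapping."""
    return sum(value * prob for value, prob in distribution.items())


# Fair die: each face 1..6 has probability 1/6 (illustrative assumption).
fair_die = {face: Fraction(1, 6) for face in range(1, 7)}

print(expected_value(fair_die))   # (1 + 2 + 3 + 4 + 5 + 6) / 6 = 7/2
```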
Definition of complement
All of the outcomes that aren’t in the event in question; P(not A) = 1 - P(A)
Sum rule #1 (for mutually exclusive events): P(AvB) = ?
P(AvB) = P(A) + P(B)
Sum rule #2 (for non mutually exclusive events): P(AvB) = ?
P(AvB) = P(A) + P(B) - P(A&B)
Product rule (for independent events): P(A&B) = ?
P(A&B) = P(A) x P(B)
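The complement, sum, and product rules can be checked by counting outcomes. This sketch assumes one roll of a fair six-sided die; the event names are made up for the example. Note that the product check only holds because the two events chosen happen to be independent; the last line shows a pair of events where it fails.

```python
from fractions import Fraction

# Sample space: one roll of a fair die (illustrative assumption).
SAMPLE_SPACE = frozenset(range(1, 7))


def prob(event):
    """Probability of an event (a set of outcomes), all outcomes equally likely."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))


def complement(event):
    """Every outcome that is not in the event."""
    return SAMPLE_SPACE - event


even, high = {2, 4, 6}, {5, 6}     # overlap on 6, and happen to be independent
one, two = {1}, {2}                # mutually exclusive

complement_ok = prob(complement(even)) == 1 - prob(even)
sum_rule_1_ok = prob(one | two) == prob(one) + prob(two)
sum_rule_2_ok = prob(even | high) == prob(even) + prob(high) - prob(even & high)
product_ok = prob(even & high) == prob(even) * prob(high)

# "even" and "two" are not independent, so the simple product rule fails for them.
product_fails = prob(even & two) != prob(even) * prob(two)

print(complement_ok, sum_rule_1_ok, sum_rule_2_ok, product_ok, product_fails)
```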
What is Simpson’s Paradox?
A trend that appears in each individual group reverses (or disappears) when the groups are combined into an overall total.
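A small numeric sketch of Simpson's Paradox. The counts below are invented purely for illustration (they are not real data): option A has the higher success rate in each group, yet option B has the higher rate overall, because the two options are spread very unevenly across the groups.

```python
from fractions import Fraction

# (successes, trials) per option in each group; all numbers are hypothetical.
data = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes, trials):
    return Fraction(successes, trials)


def overall(option):
    """Success rate for one option after pooling every group."""
    successes = sum(group[option][0] for group in data.values())
    trials = sum(group[option][1] for group in data.values())
    return Fraction(successes, trials)


a_wins_each_group = all(
    rate(*group["A"]) > rate(*group["B"]) for group in data.values()
)
b_wins_overall = overall("B") > overall("A")

print(a_wins_each_group, b_wins_overall)   # both are true: the reversal
print(overall("A"), overall("B"))
```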
What is the definition and range of the correlation coefficient r?
-1 <= r <= 1, representing the strength and direction of a linear relationship
What is a residual?
Difference between actual y-value and predicted y-value
What is the residual sum of squares (SSR)?
Sum of the squares of the residuals
What is the total sum of squares (SST)?
Sum of the squared differences between each actual y-value and the mean value of y
What is the coefficient of determination (R^2)?
1 - (SSR/SST)
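The residual, SSR, SST, and coefficient of determination cards fit together as below. This is a sketch with made-up actual and predicted values; the function name is an assumption.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (SSR / SST)."""
    mean_y = sum(actual) / len(actual)
    # Residual: actual y-value minus predicted y-value.
    residuals = [y - p for y, p in zip(actual, predicted)]
    ssr = sum(r ** 2 for r in residuals)               # residual sum of squares
    sst = sum((y - mean_y) ** 2 for y in actual)       # total sum of squares
    return 1 - ssr / sst


actual = [2, 4, 6, 8]        # hypothetical observations
predicted = [3, 4, 6, 7]     # hypothetical model predictions

# Here SSR = 1 + 0 + 0 + 1 = 2 and SST = 9 + 1 + 1 + 9 = 20.
print(r_squared(actual, predicted))
```

A perfect set of predictions gives SSR = 0 and therefore a value of 1; predicting the mean of y for every point gives SSR = SST and a value of 0.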
What is a linear regression?
The line that minimizes the residual sum of squares (and therefore maximizes the coefficient of determination)
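A sketch of fitting a line by ordinary least squares, meaning the slope and intercept are chosen to make the sum of squared residuals as small as possible. It uses the standard closed-form solution for a single predictor; the function names and data points are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def ssr(xs, ys, slope, intercept):
    """Residual sum of squares for a candidate line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4]            # hypothetical data
ys = [2, 4, 5, 8]

slope, intercept = fit_line(xs, ys)
best = ssr(xs, ys, slope, intercept)

# Nudging the fitted line in any direction should only increase the SSR.
print(best < ssr(xs, ys, slope + 0.1, intercept))
print(best < ssr(xs, ys, slope, intercept + 0.5))
```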
What is regression to the mean?
If a variable is measured at an extreme value, the next measurement will likely be closer to the mean.
What is Kelley’s formula for estimated ability?
Estimated ability = (reliability)(score) + (1 - reliability)(average group score)
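Kelley's formula translates directly into a one-line function. The parameter names and the example numbers below are assumptions for illustration; reliability is taken to be a number between 0 and 1.

```python
def kelley_estimate(score, group_mean, reliability):
    """Estimated ability = reliability * score + (1 - reliability) * group mean."""
    return reliability * score + (1 - reliability) * group_mean


# Hypothetical case: a score of 90 in a group averaging 70, reliability 0.75.
print(kelley_estimate(90, 70, 0.75))   # 67.5 + 17.5 = 85.0
```

With a reliability of 1 the estimate is just the observed score, and with a reliability of 0 it falls back to the group average, which is the regression-to-the-mean idea expressed as a weighted average.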
Which show more variation, smaller or larger data sets?
Smaller; with fewer data points, each extreme value has more influence on the result
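One way to see this is a small simulation: draw many samples of a given size and measure how much the sample means vary. The uniform distribution, the sample sizes, the trial count, and the fixed seed are all assumptions chosen for illustration.

```python
import random
import statistics


def spread_of_sample_means(sample_size, trials=200, seed=0):
    """Standard deviation of the means of many random samples of one size."""
    rng = random.Random(seed)     # fixed seed so the run is repeatable
    means = []
    for _ in range(trials):
        sample = [rng.random() for _ in range(sample_size)]  # uniform on [0, 1)
        means.append(statistics.mean(sample))
    return statistics.pstdev(means)


small = spread_of_sample_means(5)      # means of small samples
large = spread_of_sample_means(200)    # means of large samples

print(small > large)   # small samples give more variable means
```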