Stats Vocab List Flashcards
Sensitivity
A test’s ability to correctly flag someone who has the disease as positive; the true positive rate. (Positive Given Disease Present)
Specificity
A test’s ability to correctly clear someone who does not have the disease, without being misled by alternative cases; the true negative rate. (Negative Given Disease Absent)
False Positive
A test’s chance of overcorrecting, detecting disease that is not present. (Positive Given Disease Absent)
False Negative
A test’s chance of missing disease, of undershooting. (Negative Given Disease Present)
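The four test rates above can be sketched from a 2x2 table of counts. The counts and the function name `diagnostic_rates` are made up for illustration; only the four definitions come from the cards.

```python
def diagnostic_rates(tp, fn, fp, tn):
    """Return the four conditional rates from the counts in a 2x2 table."""
    diseased = tp + fn   # everyone who truly has the disease
    healthy = fp + tn    # everyone who truly does not
    return {
        "sensitivity": tp / diseased,      # P(positive given disease present)
        "false_negative": fn / diseased,   # P(negative given disease present)
        "specificity": tn / healthy,       # P(negative given disease absent)
        "false_positive": fp / healthy,    # P(positive given disease absent)
    }

# Hypothetical counts: 100 people with the disease, 200 without.
rates = diagnostic_rates(tp=90, fn=10, fp=30, tn=170)
print(rates)
```

Sensitivity and false negative share a denominator (the diseased group), so they add to 1; the same goes for specificity and false positive.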
Disjoint
Mutually exclusive outcomes, which cannot happen together. Ex: Getting a Head and getting a Tail on the same coin flip.
OR
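In probability, A or B means at least one of the two happens: one, the other, or both. Contrast: "either but not both," which leaves out the overlap.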
Addition Rule
Chance of either event happening (not disjoint): P(A or B) = P(A) + P(B) - P(A & B).
Disjoint: P(A or B) = P(A) + P(B)
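A quick check of both forms of the rule on one roll of a fair die. The events chosen here are an illustrative assumption, not part of the card.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Probability of an event (a set of equally likely outcomes)."""
    return Fraction(len(event), len(sample_space))

even = {2, 4, 6}
above_three = {4, 5, 6}   # overlaps with even at 4 and 6

# Not disjoint: subtract the overlap so it is not counted twice.
p_or_direct = prob(even | above_three)
p_or_rule = prob(even) + prob(above_three) - prob(even & above_three)
print(p_or_direct, p_or_rule)

# Disjoint: no overlap, so the probabilities simply add.
one, two = {1}, {2}
p_disjoint = prob(one) + prob(two)
print(prob(one | two), p_disjoint)
```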
Dependent
Used in conditional probability to describe events that influence each other. Ex: Someone who plays basketball is more likely to be taller than the average American. P(A & B) = P(A) * P(B given A), or the chance that A and B both happen is equal to the chance that A happens times the chance that B happens when A has already happened.
Complete
Every possible outcome is in the sample space - you’ve represented everything.
Two-Way Table
A type of table where two categorical variables are represented in frequencies, i.e. gender against preferring Pokemon or Digimon.
Stem & Leaf Plot
Type of quantitative plot where each stem represents the leading digit(s), such as the ten’s place (possibly split into half-stems), and the last digit of each value is written beside it as a leaf. Like an enumerated dot plot. Best for a single quantitative variable with a small number of values, when individual values matter, with options for comparing groups and shapes.
Histogram
Type of graph for one quantitative variable, with values grouped into intervals on the x-axis and frequency on the y-axis. Big bars. Best for large amounts of data where we care most about the shape of the distribution.
Independent
Events that do not impact each other. P(A given B) = P(A), so P(A & B) = P(A) * P(B).
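One way to tell independent from dependent events is to compare P(A given B) with P(A). The sketch below does that for one roll of a fair die; the events are chosen for illustration only.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Probability of an event (a set of equally likely outcomes)."""
    return Fraction(len(event), len(sample_space))

def given(a, b):
    """P(a given b) = P(a & b) / P(b)."""
    return prob(a & b) / prob(b)

even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
at_least_four = {4, 5, 6}

# Independent: knowing the roll is at most 4 does not change P(even).
print(given(even, at_most_four) == prob(even))    # True

# Dependent: knowing the roll is at least 4 raises P(even) from 1/2 to 2/3.
print(given(even, at_least_four) == prob(even))   # False

# The multiplication rule P(A & B) = P(B) * P(A given B) holds either way.
print(prob(even & at_least_four) == prob(at_least_four) * given(even, at_least_four))
```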
Cases
Experimental units, what data is collected from.
Variable
The characteristic (quantity/quality) that changes from case to case and is measured through statistics.
Simulation
Running a theoretical model of the situation many times, then comparing the results to the real data to see how likely the real outcome is.
Skew
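Data trailing off in a long tail on one side of the median. Named for the side of the tail (right-skewed means the tail is on the right); often appears when values pile up against a wall, such as a minimum of zero.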
Normal Distribution
Describes a common distribution pattern that is symmetrical with a single peak, where the mean, median and mode are equal (skew would pull the mean away). Empirical Rule: about 68%, 95% and 99.7% of values fall within 1, 2 and 3 SDs of the mean.
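The Empirical Rule percentages can be checked with the standard library's `statistics.NormalDist`. The mean of 100 and SD of 15 are arbitrary assumptions; any normal curve gives the same shares.

```python
from statistics import NormalDist

curve = NormalDist(mu=100, sigma=15)  # hypothetical mean and SD

def share_within(k):
    """Share of the distribution within k SDs of the mean."""
    low = curve.mean - k * curve.stdev
    high = curve.mean + k * curve.stdev
    return curve.cdf(high) - curve.cdf(low)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```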
Uniform
Describes a distribution pattern of flatness, with no real mode or more likely distributions.
Standard Normal Distribution
Idealized normal curve with mean, median and mode all at 0, an SD of 1, and perfect symmetry.
Standard Deviation
Describes the typical distance of points from the center (mean), as a measure of spread. For a normal curve, this is the distance from the center to the inflection point.
Calculated as the square root of: the sum of squared differences from the mean (square each difference first, then add), all over n for a population or n-1 for a sample.
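A step-by-step sketch of the calculation, with the square root taken at the end, compared against Python's built-in `statistics.pstdev` (population, divides by n) and `statistics.stdev` (sample, divides by n-1). The data values are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]  # square first, then add

population_sd = math.sqrt(sum(squared_diffs) / len(data))       # over n
sample_sd = math.sqrt(sum(squared_diffs) / (len(data) - 1))     # over n-1

print(population_sd, statistics.pstdev(data))
print(sample_sd, statistics.stdev(data))
```

Dividing by n-1 always gives a slightly larger result than dividing by n, and the gap shrinks as n grows.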
Uni/Bimodal Distribution
Pattern of distribution with one peak or mode (unimodal) or two (bimodal). More than one peak is a common sign of several distinct groups smushed into one sample.
Quartiles
Separations of any distribution into quarters. The lower quartile (Q1) is the 25% mark, the median is the 50% mark, and the upper quartile (Q3) is the 75% mark, or Q1-Med-Q3.
Quantitative
Distinctions in variables that are quantifiable. Running speed of predators.
Categorical
Distinctions in variables that are by type or category. Species of predator.
Cumulative Frequency
Type of graph where cumulative or total frequency forms the y-axis. Easier to find quartiles, median and such.
Bar Graph
Histogram look-a-like using categorical data.
Rescaling
Multiplying every value in a data set by the same non-zero number d - center is old center * d, spread is old spread * |d| (shrink or stretch). Shape constant (mirrored if d is negative).
Recentering
Adding a constant to all values. Median and mean grow by c. Shape and spread constant.
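A small check of how rescaling and recentering move the summary statistics. The data and the constants d and c are assumptions chosen for illustration.

```python
import math
import statistics

data = [1, 2, 3, 4, 10]  # hypothetical values
d, c = 3, 5              # hypothetical scale factor and shift

rescaled = [d * x for x in data]     # multiply every value by d
recentered = [x + c for x in data]   # add c to every value

for name, values in [("original", data), ("rescaled", rescaled), ("recentered", recentered)]:
    print(name,
          statistics.mean(values),
          statistics.median(values),
          round(statistics.pstdev(values), 3))
```

Rescaling multiplies the center and the spread by d; recentering shifts the center by c and leaves the spread alone.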
Sensitivity to Outliers
Vulnerable summary statistics are highly affected by outliers - i.e. mean, standard deviation. Median, quartiles and IQR are resilient.
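A rough illustration of the difference, using made-up data with one extreme value added.

```python
import statistics

data = [10, 12, 13, 15, 16]     # hypothetical values
with_outlier = data + [100]     # same data plus one outlier

mean_shift = statistics.mean(with_outlier) - statistics.mean(data)
median_shift = statistics.median(with_outlier) - statistics.median(data)

print("mean moves by", round(mean_shift, 2))   # large jump
print("median moves by", median_shift)         # barely moves
```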
Trend, Strength, Linearity:
Used for two-variable quantitative plots like scatterplots. Trend describes the relationship as positive or negative, linearity what shape the relationship takes (line, curve), and strength how closely the data follow the relationship. Strength may vary across the plot - heteroscedasticity.
Lurking Variable
A hidden third variable muddling the relationship between the dependent and independent variables.
Midrange
Midpoint between minimum and maximum in a data set.
Range
Numerical distance between minimum and maximum
Binomial Distribution
X is the number of successes in n independent trials, each with probability p of success. P(X = k successes) = nCk * p^k * (1-p)^(n-k).
Binomial Mean (Expected Value)
=np.
Binomial Standard Deviation
= (np(1-p))^0.5
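The binomial formulas can be sketched with `math.comb` for nCk. The values n = 10 and p = 0.5 (ten fair coin flips) are assumptions for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k successes) in n independent trials with success chance p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5   # hypothetical: ten fair coin flips

mean = n * p                      # expected number of successes
sd = math.sqrt(n * p * (1 - p))   # binomial standard deviation

print(binomial_pmf(3, n, p))   # chance of exactly 3 heads
print(mean, round(sd, 3))
```

Summing k * P(X = k) over every k from 0 to n gives the same mean as the np shortcut.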
Geometric Distribution:
X is the number of trials until the first success. P(X = k trials) = (1-p)^(k-1) * p
Geometric Mean (Expected Value)
=1/p. Trials till nth success is mean * n.
Geometric Standard Deviation
= (1-p)^0.5 / p
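A matching sketch for the geometric formulas. The value p = 0.2 is an assumption for illustration.

```python
import math

def geometric_pmf(k, p):
    """P(first success on trial k): k-1 failures, then one success."""
    return (1 - p) ** (k - 1) * p

p = 0.2   # hypothetical success chance per trial

mean = 1 / p                 # expected trials until the first success
sd = math.sqrt(1 - p) / p    # geometric standard deviation

print(geometric_pmf(3, p))   # fail, fail, succeed
print(mean, round(sd, 3))
```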
Confidence Interval
A confidence interval at a given confidence level consists of those population proportions p for which the sample proportion p hat is reasonably likely. = p hat + or minus margin of error.
Rate of Capture
For 95%, about 95 out of 100 intervals calculated this way should contain the true proportion of the population.
Margin of Error:
= z-score for the desired confidence level * (pq/n)^0.5, where p is the sample proportion p hat and q = 1 - p.
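Putting the interval and the margin of error together, as a minimal sketch. The function name `proportion_interval` and the sample of 60 successes out of 100 are hypothetical; the z-score comes from the standard library's `NormalDist().inv_cdf`.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Return (low, high, margin) for a sample proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # z-score that leaves (1 - confidence) / 2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin, margin

# Hypothetical sample: 60 successes out of 100.
low, high, margin = proportion_interval(60, 100)
print(round(low, 3), round(high, 3), round(margin, 3))
```

Raising the confidence level raises the z-score and widens the interval; raising n shrinks it.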