Statistics Flashcards
Nominal Data
Data classified into mutually exclusive categories with no intrinsic order. E.g. phone numbers, colors, types of plants, etc.
Ordinal Data
Ordered categories that imply a ranking, without a fixed distance between ranks. E.g. letter grades, finishing positions in a race, restaurants ranked by votes, etc.
Interval Data
Ordered numerical data where the difference between adjacent values is equal, but there is no true 0. E.g. temperature in Celsius or Fahrenheit, time of day, mark grading (1-100).
Ratio Data
Numerical data where there is equal distance between adjacent values and there is a true 0, so ratios are meaningful. E.g. temperature in Kelvin, height, age in years.
Variable
A quantity that can take different values. E.g. X, which might equal 2 for one observation and 5 for another.
Quantitative variable
A variable in which the actual numerical value is meaningful. Represents an interval or ratio measurement.
Qualitative variable
A variable in which the numerical value is not meaningful. Represents a nominal or ordinal measurement.
Population
The total of some group. E.g. all people on Earth, all candles in a candelabra, all ducks in a pond, etc.
Sample
A subset of a population. E.g. a few ducks from a pond, the melted candles in a candelabra, etc.
Descriptive statistics
Statistics that summarize the characteristics of the values in a population or a sample of a population. E.g. a mean, median, or mode.
Inferential Statistics
Statistics that use probability to estimate population characteristics: taking a sample and making inferences about the population it came from.
Distribution
The overall shape of all observed data. How it looks when put into a histogram, density plot, scatter plot, etc.
Range
The difference between the largest and the smallest value in a data set.
Normal distribution / Gaussian distribution / Bell Curve
The distribution is symmetrical and bell-shaped: an equal proportion of observations fall above and below the mean, and the mean, median, and mode coincide.
Asymmetrical distribution / skewed distribution
More observations fall to one side of the mean than the other. The distribution skews right when the extreme values (the long tail) are above the mean, and left when they are below it.
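One way to see skew is to compare the mean with the median. The sketch below uses a small made-up data set (the values are an assumption for illustration only) with one large value above the rest, which drags the mean above the median.

```python
import statistics

# Hypothetical right-skewed data: one large value far above the rest.
skewed = [1, 2, 2, 3, 3, 3, 4, 50]

skew_mean = statistics.mean(skewed)      # pulled upward by the value 50
skew_median = statistics.median(skewed)  # stays near the bulk of the data

# In a right-skewed data set the mean usually sits above the median.
print(skew_mean, skew_median)
```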
Central Tendency
A single value that attempts to describe a data set by identifying the central position within that set of data. E.g. mean, median, and mode.
Mean
The arithmetic average of a distribution: the sum of the values divided by how many there are. E.g. (2 + 3 + 4 + 5)/4 = 3.5.
Median
The middle value of a ranked distribution. If there are two middle values, it would be the average of the two values.
Mode
The most frequent value in a distribution.
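The three measures of central tendency above can be checked with Python's standard `statistics` module. This is a minimal sketch: the first list is the example from the Mean card, and the second list is a made-up data set added only so that one value repeats.

```python
import statistics

card_values = [2, 3, 4, 5]  # the example from the Mean card
mean_value = statistics.mean(card_values)      # (2 + 3 + 4 + 5) / 4
median_value = statistics.median(card_values)  # average of the two middle values, 3 and 4

repeated = [2, 3, 3, 4, 5]  # hypothetical data with one repeated value
mode_value = statistics.mode(repeated)         # the most frequent value

print(mean_value, median_value, mode_value)
```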
Inter-quartile range (IQR)
The difference in values between the 75th and 25th percentiles in a distribution, i.e. the 3/4 and 1/4 cutoff points.
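As a rough illustration, the range and the IQR can be computed side by side with the standard library. The data set is an assumption for the example, and `statistics.quantiles` is used with its default method; other quartile conventions can give slightly different cutoff points.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical data

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# quantiles(n=4) returns the three cut points: Q1, the median, and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```

Unlike the range, the IQR ignores the lowest and highest quarters of the data, so a single extreme value affects it far less.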
Variance (the math kind)
A measure of how far data points spread out from the mean: the average of the squared differences between each data point and the mean.
Hypothesis test
A procedure for deciding whether sample data provide enough evidence to reject a null hypothesis. You estimate the odds of seeing results at least as extreme as yours if only chance were at work; if those odds are small, you reject the null hypothesis.
Null hypothesis
The hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.
Standard deviation
A measure of the amount of variation in a set of values; the square root of the variance. Low = more values closer to the mean. High = more values further from the mean.
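Variance and standard deviation can be sketched with the `statistics` module. The data set below is made up for illustration. Note the assumption about which version you want: the `p`-prefixed functions treat the data as a whole population (divide by n), while the others treat it as a sample (divide by n - 1).

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data with mean 5

pop_variance = statistics.pvariance(values)    # average squared distance from the mean
pop_std = statistics.pstdev(values)            # square root of the population variance

sample_variance = statistics.variance(values)  # divides by n - 1 instead of n
sample_std = statistics.stdev(values)          # square root of the sample variance

print(pop_variance, pop_std, sample_variance, sample_std)
```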
Test statistic
A number calculated from a statistical test of a hypothesis. It shows how closely your observed data match the distribution expected under the null hypothesis.
The test statistic is used to calculate the p-value of your results, helping to decide whether to reject your null hypothesis.
Confidence interval
The confidence interval is the range of values that you expect your estimate to fall between a certain percentage of the time if you run your experiment again or re-sample the population in the same way.
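A minimal sketch of a 95% confidence interval for a mean, assuming a normal approximation (mean plus or minus z times the standard error). The measurements are made up, and for a sample this small a t-based interval would usually be somewhat wider.

```python
import math
import statistics

# Hypothetical measurements.
sample = [4.8, 5.1, 5.0, 4.9, 5.3, 5.2, 4.7, 5.0, 5.1, 4.9]

n = len(sample)
sample_mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

# z value that leaves 2.5% in each tail of the standard normal distribution.
z = statistics.NormalDist().inv_cdf(0.975)

ci_lower = sample_mean - z * standard_error
ci_upper = sample_mean + z * standard_error

print(ci_lower, ci_upper)
```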
T-test
A t-test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another.
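To make the comparison of two means concrete, here is a sketch of the t statistic for two independent groups. It assumes Welch's version of the test (the groups are not required to have equal variances), which the card does not specify; the function name and the data are hypothetical.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    # Standard error of the difference between the two sample means.
    standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
    return (mean_a - mean_b) / standard_error

# Hypothetical groups, e.g. a control group and a treatment group.
t_stat = welch_t([1, 2, 3], [4, 5, 6])
print(t_stat)
```

A t statistic far from 0 means the group means differ by a lot relative to the noise in the data. Turning it into a p-value requires the t distribution, which is not in the standard library; a package such as SciPy (`scipy.stats.ttest_ind`) handles that step.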
ANOVA analysis
Analysis of variance test. Used to analyze the difference between the means of more than two groups.
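The idea behind a one-way ANOVA can be sketched as a ratio: variation between the group means divided by variation within the groups. The function name and the three groups below are hypothetical, and the sketch stops at the F statistic rather than a p-value.

```python
import statistics

def one_way_f(groups):
    """F statistic for a one-way ANOVA across several groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n_total = len(groups), len(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of each value around its own group mean.
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

f_stat = one_way_f([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f_stat)
```

A large F suggests the group means differ by more than the spread within groups would explain. SciPy's `scipy.stats.f_oneway` computes the same statistic along with a p-value.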
Chi-square test
A chi-square test is a statistical test used to compare observed results with expected results, typically counts of categorical data.
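The chi-square statistic itself is the sum of (observed - expected)^2 / expected over all categories. A minimal sketch, with a hypothetical function name and made-up counts (say, 50 coin flips where 25 heads and 25 tails were expected):

```python
def chi_square_statistic(observed, expected):
    """Sum of squared differences, each scaled by the expected count."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed_counts = [30, 20]  # hypothetical: 30 heads, 20 tails
expected_counts = [25, 25]  # what a fair coin would give on average

chi2 = chi_square_statistic(observed_counts, expected_counts)
print(chi2)
```

The larger the statistic, the worse the observed counts match the expected ones. Converting it to a p-value needs the chi-square distribution, which a library such as SciPy (`scipy.stats.chisquare`) provides.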