Stats Flashcards
Nominal data
are classified into mutually exclusive groups or categories and lack intrinsic order. A zoning classification, social security number, and sex are examples of nominal data. The label of the categories does not matter and should not imply any order. So, even if one category might be labeled as 1 and the other as 2, those labels can be switched.
Ordinal data
are ordered categories implying a ranking of the observations. Even though ordinal data may be given numerical values, such as 1, 2, 3, 4, the values themselves are meaningless; only the rank counts. So, even though one might be tempted to infer that 4 is twice 2, this is not correct. Examples of ordinal data are letter grades, suitability for development, and response scales on a survey (e.g., 1 through 5).
Interval data
are data with an ordered relationship where the difference between values has a meaningful interpretation, but ratios do not, because the zero point is arbitrary. The typical example of interval data is temperature, where the difference between 40 and 30 degrees is the same as between 30 and 20 degrees, but 40 degrees is not twice as warm as 20 degrees.
Ratio data
are the gold standard of measurement, where both absolute and relative differences have a meaning because the scale has a true zero. The classic example of ratio data is a distance measure, where the difference between 40 and 30 miles is the same as the difference between 30 and 20 miles, and in addition, 40 miles is twice as far as 20 miles.
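A minimal sketch of why ratios are meaningful for distance but not for temperature. It assumes the temperatures are in Fahrenheit (the card does not name a scale) and uses a hypothetical helper, fahrenheit_to_celsius. Changing units keeps equal differences equal on both scales, but only the ratio scale keeps "twice as much" intact.

```python
import math

def fahrenheit_to_celsius(f):
    """Convert a Fahrenheit temperature to Celsius."""
    return (f - 32) * 5 / 9

MILES_TO_KM = 1.609344

# Interval scale (temperature): equal differences stay equal after conversion.
c20, c30, c40 = (fahrenheit_to_celsius(f) for f in (20, 30, 40))
diffs_equal = math.isclose(c40 - c30, c30 - c20)

# ...but the ratio changes with the unit, so "twice as warm" has no meaning.
temp_ratio_f = 40 / 20      # 2.0 in Fahrenheit
temp_ratio_c = c40 / c20    # a different (here negative) number in Celsius

# Ratio scale (distance): the ratio survives a change of unit.
dist_ratio_miles = 40 / 20
dist_ratio_km = (40 * MILES_TO_KM) / (20 * MILES_TO_KM)

print(diffs_equal, temp_ratio_f, round(temp_ratio_c, 3))
print(dist_ratio_miles, dist_ratio_km)
```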
Continuous variables
can take an infinite number of values within a range, with as fine a degree of precision as desired. Depending on the variable, the values may be positive, negative, or both. Most measurements in the physical sciences yield continuous variables.
Discrete variables
can only take on a finite or countable number of distinct values. An example is the count of the number of events, such as the number of accidents per month. Such counts cannot be negative, and only take on integer values, such as 1, 28, or 211.
Binary or dichotomous variables
only take on two values, typically coded as 0 and 1.
Descriptive Statistics
describe the characteristics of the distribution of values in a population or in a sample. For example, a descriptive statistic such as the mean could be applied to the age distribution in the population of AICP exam takers, providing a summary measure of central tendency (e.g., “on average, AICP test takers in 2018 are 30 years old”).
Inferential Statistics
use probability theory to determine characteristics of a population based on observations made on a sample from that population. We infer things about the population based on what is observed in the sample. For example, we could take a sample of 25 test takers and use their average age to say something about the mean age of all the test takers.
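A rough sketch of the step from a sample to a statement about the population, using the 25-test-taker example. The ages below are made up for illustration, and the interval uses the two-standard-error rule of thumb rather than an exact t-based critical value.

```python
import math
import statistics

# Hypothetical ages of 25 sampled test takers (illustrative data, not real).
sample_ages = [24, 26, 27, 28, 28, 29, 29, 30, 30, 30, 31, 31, 32,
               32, 33, 34, 35, 27, 29, 30, 31, 28, 33, 26, 37]

n = len(sample_ages)
sample_mean = statistics.mean(sample_ages)    # descriptive: summarizes the sample
sample_sd = statistics.stdev(sample_ages)     # sample standard deviation (n - 1)
standard_error = sample_sd / math.sqrt(n)     # uncertainty of the sample mean

# Inferential: an approximate 95% interval for the population mean age.
low = sample_mean - 2 * standard_error
high = sample_mean + 2 * standard_error

print(f"sample mean = {sample_mean:.1f}")
print(f"approximate 95% interval: {low:.1f} to {high:.1f}")
```

The sample mean alone is a descriptive statistic; the interval around it is the inferential part, since it makes a claim about test takers who were not observed.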
Distribution
is the overall shape of all observed data. It can be listed as an ordered table, or graphically represented by a histogram or density plot. A histogram groups observations in bins represented as a bar chart. A density plot is a smooth curve.
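As an illustration of how a histogram groups observations into bins, here is a small sketch that counts values per bin and prints a text bar chart. The function name histogram_counts, the bin width of 5, and the ages are all assumptions made for the example.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count values per bin, keyed by each bin's left edge.

    A value v falls in the bin [left, left + bin_width),
    where left = (v // bin_width) * bin_width.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical integer ages (illustrative data).
ages = [22, 25, 27, 28, 30, 31, 31, 34, 38, 45]
bins = histogram_counts(ages, 5)

# One row per non-empty bin; each '#' is one observation.
for left_edge, count in bins.items():
    print(f"{left_edge}-{left_edge + 4}: {'#' * count}")
```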
Range
the difference between the largest and the smallest value.
Normal or Gaussian distribution
is also referred to as the bell curve. This distribution is symmetric and has the additional property that the spread around the mean can be related to the proportion of observations. More specifically, about 95% of the observations that follow a normal distribution are within two standard deviations of the mean.
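The link between spread and proportion can be checked numerically. This sketch uses the standard identity that, for a normal distribution, the share of observations within k standard deviations of the mean equals erf(k / sqrt(2)). The function name normal_coverage is hypothetical.

```python
import math

def normal_coverage(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {normal_coverage(k):.4f}")
# Roughly 0.683, 0.954, and 0.997 for k = 1, 2, 3.
```

Two standard deviations cover slightly more than 95%, which is why the figure is usually quoted as approximate.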
Symmetric distribution
is one whose shape to the left of its center mirrors its shape to the right, so an equal number of observations are below and above the mean (e.g., this is the case for the normal distribution).
An asymmetric distribution
where there are either more observations below the mean or more above the mean is also called skewed.
Skewed to the right
when the long tail of the distribution extends toward larger values, so the bulk of the values are below the mean. This tends to happen when the distribution is dominated by a few very large values (outliers), which pull the mean upward.
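A small sketch of what a few very large values do to the mean. The numbers are invented for illustration (think of incomes in thousands); one outlier pulls the mean far above the median, leaving most observations below the mean.

```python
import statistics

# Hypothetical values with one very large outlier (illustrative data).
values = [30, 32, 35, 38, 40, 41, 45, 250]

mean_value = statistics.mean(values)      # pulled upward by the outlier
median_value = statistics.median(values)  # barely affected by the outlier

below_mean = sum(1 for v in values if v < mean_value)
above_mean = sum(1 for v in values if v > mean_value)

print(f"mean = {mean_value}, median = {median_value}")
print(f"{below_mean} values below the mean, {above_mean} above")
```

Comparing the mean with the median is only a quick check; a formal skewness coefficient would be needed for anything more than a rough diagnosis.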