Statistical Analysis Flashcards
3 Steps in the Statistical Process
1) collect data (e.g., surveys); 2) describe and summarize the distribution of the values in the data set; and 3) interpret by means of inferential statistics and statistical modeling (i.e., draw general conclusions for the population on the basis of the sample)
Nominal Data
are classified into mutually exclusive groups or categories and lack intrinsic order. A zoning classification, social security number, and sex are examples of nominal data. The labels of the categories do not matter and should not imply any order. So, even if one category is labeled 1 and the other 2, those labels can be switched without changing the meaning.
Ordinal Data
Are ordered categories implying a ranking of the observations. Even though ordinal data may be given numerical values, such as 1, 2, 3, and 4, the values themselves are meaningless. Only the rank counts. It would be incorrect to infer, for example, that 4 is twice 2, despite the temptation. Examples of ordinal data include letter grades, suitability for development, and response scales on a survey (e.g., 1 through 5).
Interval Data
has an ordered relationship where the difference between values on the scale has a meaningful interpretation. The typical example of interval data is temperature, where the difference between 40 and 30 degrees is the same as between 30 and 20 degrees, but 40 degrees is not twice as warm as 20 degrees.
Ratio Data
the gold standard of measurement, where both absolute and relative differences have a meaning. The classic example of ratio data is a distance measure, where the difference between 40 and 30 miles is the same as the difference between 30 and 20 miles, and in addition 40 miles is twice as far as 20 miles.
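The contrast between interval and ratio data can be checked with a little arithmetic. The sketch below is only an illustration: it assumes the temperatures are in degrees Fahrenheit (the cards do not name a unit), and the helper names are made up for this example. It shows that equal differences stay equal after a change of unit on both scales, while "twice as much" survives only on the ratio scale.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a temperature from Fahrenheit to Celsius."""
    return (deg_f - 32) * 5 / 9


def miles_to_km(miles):
    """Convert a distance from miles to kilometers."""
    return miles * 1.609344


# Interval scale: equal differences remain equal after changing units...
diff_hi_c = fahrenheit_to_celsius(40) - fahrenheit_to_celsius(30)
diff_lo_c = fahrenheit_to_celsius(30) - fahrenheit_to_celsius(20)

# ...but the ratio depends on the unit, so "twice as warm" has no meaning.
ratio_f = 40 / 20
ratio_c = fahrenheit_to_celsius(40) / fahrenheit_to_celsius(20)

# Ratio scale: zero means "no distance", so the ratio is the same in any unit.
ratio_mi = 40 / 20
ratio_km = miles_to_km(40) / miles_to_km(20)

print("temperature differences in Celsius:", diff_hi_c, diff_lo_c)
print("temperature ratio in F vs C:", ratio_f, ratio_c)
print("distance ratio in miles vs km:", ratio_mi, ratio_km)
```

The temperature ratio changes (and even turns negative) because the zero point of the Fahrenheit and Celsius scales is arbitrary, whereas zero miles and zero kilometers both mean no distance at all.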
Variable
a mathematical representation of a concept, and thus also of the measurement of that concept.
Quantitative Variables
represent an interval or ratio measurement (e.g., household income, level of a pollutant in a river).
Qualitative Variables
correspond to nominal or ordinal measurement (e.g., a zoning classification).
Continuous Variables
can take an infinite number of values, both positive and negative, and with as fine a degree of precision as desired. Most measurements in the physical sciences yield continuous variables.
Discrete Variables
Can only take on a finite or countable number of distinct values. An example is the count of the number of events, such as the number of accidents per month. Such counts cannot be negative, and only take on integer values, such as 1, 28, or 211.
Population
the totality of some entity. For example, all planners preparing for the 2022 AICP exam would be a population.
Sample
a subset of the population. For example, 25 candidates selected at random out of the total number of planners preparing for the 2022 AICP exam.
Descriptive Statistics
describe the characteristics of the distribution of values in a population or in a sample. For example, a descriptive statistic such as the mean could be applied to the age distribution in the population of AICP exam takers, providing a summary measure of central tendency (e.g., "on average, AICP test takers in 2022 are 30 years old"). The context will make clear whether the statistic pertains to the population (all values known), or to a sample (only partial observations). The latter is the typical case encountered in practice.
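As a minimal sketch of descriptive statistics, the snippet below summarizes a small list of ages with Python's standard `statistics` module. The ages are invented for illustration and are not real AICP data; they are treated as a sample, so the sample standard deviation is used as the measure of dispersion.

```python
import statistics

# Hypothetical ages of eight exam takers (made-up values for illustration).
ages = [24, 27, 29, 30, 30, 31, 33, 36]

mean_age = statistics.mean(ages)      # central tendency: arithmetic average
median_age = statistics.median(ages)  # central tendency: middle value
stdev_age = statistics.stdev(ages)    # dispersion: sample standard deviation

print(f"mean = {mean_age}, median = {median_age}, stdev = {stdev_age:.2f}")
```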
Inferential Statistics
use probability theory to determine characteristics of a population based on observations made on a sample from that population. We infer things about the population based on what is observed in the sample. For example, we could take a sample of 25 test takers and use their average age to say something about the mean age of all the test takers.
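The 25-person example can be sketched as follows. Everything here is an assumption made for illustration: the population is simulated rather than real, its size (2,000) and spread are arbitrary, and the interval uses the standard t critical value for 24 degrees of freedom at the 95% level (about 2.064). The point is only to show a sample mean being used, with a margin of error, to say something about the population mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Simulated population of ages (hypothetical, not real exam-taker data).
population_ages = [random.gauss(30, 5) for _ in range(2000)]

# Draw a simple random sample of 25 without replacement.
sample_ages = random.sample(population_ages, 25)

sample_mean = statistics.mean(sample_ages)
standard_error = statistics.stdev(sample_ages) / len(sample_ages) ** 0.5

# Two-sided 95% critical value of the t distribution with 24 degrees of freedom.
T_CRITICAL_95_DF24 = 2.064
margin = T_CRITICAL_95_DF24 * standard_error
ci_lower, ci_upper = sample_mean - margin, sample_mean + margin

print(f"sample mean: {sample_mean:.2f}")
print(f"95% confidence interval for the population mean: "
      f"({ci_lower:.2f}, {ci_upper:.2f})")
```

In practice the population mean is unknown; the interval expresses how far the sample mean could plausibly be from it given the variability observed in the sample.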
Distribution
the overall shape of all observed data. It can be listed as an ordered table, or graphically represented by a histogram or density plot. A histogram groups observations into bins and draws each bin as a bar, much like a bar chart. A density plot shows a smooth curve. The full distribution of the data is typically too detailed to take in at once, so its characteristics are summarized by descriptive statistics. In addition to central tendency and dispersion, other characteristics are symmetry or lack thereof (skewness), and the presence of thick tails (kurtosis), i.e., a higher likelihood of extreme values.
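The ideas of binning, skewness, and kurtosis can be sketched with plain Python. The commute times below are invented for illustration, and the helper functions are hypothetical names written for this example; they use the common moment-based definitions (third and fourth standardized central moments, with 3 subtracted so that a normal distribution has excess kurtosis of 0).

```python
from collections import Counter


def central_moment(values, k):
    """k-th central moment: average of (value - mean) ** k."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** k for v in values) / len(values)


def skewness(values):
    """Positive for a long right tail, negative for a long left tail."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Positive when tails are thicker than those of a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


def histogram(values, bin_width):
    """Count observations per bin, keyed by the lower edge of each bin."""
    return Counter((v // bin_width) * bin_width for v in values)


# Hypothetical commute times in minutes, with one extreme value.
commute_minutes = [5, 8, 10, 12, 12, 15, 15, 18, 20, 25, 30, 45, 90]

# Text histogram: one row per non-empty 10-minute bin.
for bin_start, count in sorted(histogram(commute_minutes, 10).items()):
    print(f"{bin_start:>3}-{bin_start + 9:<3} {'#' * count}")

print(f"skewness: {skewness(commute_minutes):.2f}")
print(f"excess kurtosis: {excess_kurtosis(commute_minutes):.2f}")
```

The single 90-minute commute stretches the right tail, which is what a positive skewness value reflects.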