Qual/Quant Analysis Flashcards
3 Steps in the Statistical Process
1) Collect Data
2) Describe & Summarize the Distribution
3) Interpret: draw general conclusions about the population on the basis of the sample
Nominal Data
Mutually exclusive groups that lack intrinsic order.
Zoning classification, social security numbers, sex.
Ordinal Data
Ordered categories that imply a ranking of observations. The values themselves carry no numeric meaning; only the rank matters.
Letter grades, response scales on a survey 1-5, suitability for development
Interval data
Data with an ordered relationship where the difference between values has meaning, but there is no true zero.
Temperature. The difference between 40 and 30 degrees is the same as between 30 and 20 degrees, but 40 degrees is not twice as warm as 20 degrees.
Ratio Data
Gold standard of measurement. Has a true zero, so both absolute and relative differences have meaning.
Distance measurement. The difference between 40 and 30 miles is the same as between 30 and 20 miles, and 40 miles is twice as far as 20 miles.
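A quick numeric check of the interval versus ratio distinction, as a minimal sketch. The Celsius readings and the helper name `c_to_f` are illustrative assumptions, not part of the cards.

```python
def c_to_f(celsius):
    """Convert Celsius to Fahrenheit (an interval scale with an arbitrary zero)."""
    return celsius * 9 / 5 + 32

MILES_TO_KM = 1.609344  # distance is a ratio scale with a true zero

# Interval data: the ratio of two temperatures depends on the unit chosen.
ratio_c = 40 / 20
ratio_f = c_to_f(40) / c_to_f(20)

# Ratio data: the ratio of two distances is the same in any unit.
ratio_miles = 40 / 20
ratio_km = (40 * MILES_TO_KM) / (20 * MILES_TO_KM)

print(ratio_c, round(ratio_f, 3))
print(ratio_miles, round(ratio_km, 3))
```

The temperature ratio changes when the unit changes, so "twice as warm" has no fixed meaning. The distance ratio stays at 2 in miles or kilometers.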
Quantitative Variables
Variables where numerical value is meaningful.
Interval or ratio measurement.
Household (HH) income, level of pollution in a river
Qualitative Variables
Variables where numerical value is not meaningful.
Nominal/Ordinal measurement.
Zoning classification.
Continuous Variables
Can take any of an infinite number of values within a range.
Values can be positive or negative.
Most measurements in physical sciences yield continuous variables.
Discrete variables
Countable number of distinct values, often whole-number counts.
Accidents per month - can’t be negative.
Binary/dichotomous variables
Special case of discrete variables that can take only two values, typically 0 and 1.
Descriptive Statistics
Describe the characteristics of the distribution of values in a population or sample.
Ex: On average, AICP test takers in 2018 were 30 years old.
Inferential Statistics
Use probability to infer characteristics of a population based on a sample.
Distribution
The overall shape of observed data.
Can be shown as an ordered table, histogram, or density plot.
Normal or Gaussian Distribution
The bell curve.
Distribution is symmetric. The spread around the mean can be related to the proportion of observations.
More specifically, about 95% of observations that follow a normal distribution fall within two standard deviations of the mean.
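The two-standard-deviation rule can be checked with Python's standard library. This is a minimal sketch using `statistics.NormalDist` (Python 3.8+); the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Share of observations within two standard deviations of the mean.
within_two_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

# Share within 1.96 standard deviations, the multiplier usually
# paired with an exact 95% figure.
within_196_sd = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(round(within_two_sd, 4))
print(round(within_196_sd, 4))
```

Two standard deviations cover slightly more than 95% of a normal distribution, which is why 1.96 is the usual multiplier when an exact 95% is needed.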
Symmetric distribution
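The two halves of the distribution mirror each other around the center, so an equal number of observations fall below and above the mean (the mean equals the median).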
Central tendency
Typical or representative value for the distribution of observed values
Coefficient of Variation
Measures dispersion relative to the mean: the standard deviation divided by the mean.
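A minimal sketch of the calculation, using made-up values and the sample standard deviation from Python's `statistics` module. The data and variable names are illustrative assumptions.

```python
from statistics import mean, stdev

# Hypothetical observations, for illustration only.
values = [10, 20, 30, 40, 50]

# Coefficient of variation = standard deviation / mean.
# stdev() is the sample standard deviation (divides by n - 1).
cv = stdev(values) / mean(values)

print(round(cv, 3))
```

Because the result is unitless, it can be used to compare the spread of variables measured on different scales.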
z-score
This is a standardization of the original variable by subtracting the mean and dividing by the standard deviation.
The z-score in effect transforms the original measure into standard deviation units.
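The standardization can be sketched as follows, assuming the data are treated as a full population (so `pstdev` is used). The values are made up for illustration.

```python
from statistics import mean, pstdev

# Hypothetical observations, for illustration only.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mu = mean(values)       # population mean
sigma = pstdev(values)  # population standard deviation

# z-score: subtract the mean, then divide by the standard deviation.
z_scores = [(x - mu) / sigma for x in values]

print(z_scores)
```

A z-score of 2 means the observation sits two standard deviations above the mean; a z-score of 0 means it equals the mean.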
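Interquartile Range (IQR)
Alternative measure of dispersion.
Splits the ordered data into quartiles; the IQR is the distance between the first quartile (25th percentile) and the third quartile (75th percentile), covering the middle 50% of observations.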
This is visualized in a box plot (also called box and whiskers plot).
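A minimal sketch of the IQR using `statistics.quantiles` (Python 3.8+) with its default method. The values are made up, and other software may use a slightly different quartile convention, so results can differ on small samples.

```python
from statistics import quantiles

# Hypothetical observations, for illustration only.
values = [1, 2, 3, 4, 5, 6, 7]

# n=4 returns the three cut points that split the data into quartiles.
q1, q2, q3 = quantiles(values, n=4)

# IQR: spread of the middle half of the data.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

In a box plot, the box runs from `q1` to `q3` and the line inside the box marks `q2`, the median.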
Confidence Interval
A range around the sample statistic that is expected to contain the population parameter with a given level of confidence, typically 95% or 99%.
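A minimal sketch of a 95% confidence interval for a mean, assuming a normal approximation. The sample ages are made up for illustration; for small samples like this one, a t-distribution multiplier is generally more appropriate than the normal one used here.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical sample of ages, for illustration only.
ages = [28, 31, 27, 35, 30, 29, 33, 32, 26, 29]

confidence = 0.95
sample_mean = mean(ages)
standard_error = stdev(ages) / sqrt(len(ages))

# Two-sided critical value from the standard normal (about 1.96 for 95%).
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower = sample_mean - z_crit * standard_error
upper = sample_mean + z_crit * standard_error

print(round(lower, 2), round(upper, 2))
```

Raising `confidence` to 0.99 widens the interval, since more certainty requires a broader range.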