Vocab / Key Terminology Flashcards
Comparative Analysis
analyzing data from different settings or groups at the same point in time, OR from the same settings or groups over a period of time, to find similarities/differences
Discourse Analysis
this is “theory” stuff: semiotics, deconstruction, narrative analysis, etc. Studying the way versions of the world (society, events, psyche) are produced in language and discourse within various forms of knowledge/power
Ethnography
about observing/interviewing people in their “naturally occurring settings” (researcher is present in these settings with subjects of the research)
Grounded Theory
“inductive” form of qualitative research → data collection + analysis are conducted together. You don’t go in with any preconceived hypothesis about the outcome, and are not concerned with validation or description. Instead, you allow the data you collect to guide your analysis and theory creation.
Narrative Analysis
qualitative research approach whereby the researcher analyzes the stories people create, to understand the meaning of events in a person’s life. Respondents give detailed accounts of their experiences and stories, rather than answer a predetermined list of questions.
Statistical Process
(1) Collect data; (2) Describe and summarize; (3) Interpret
Types of Measurement: Nominal Data
mutually exclusive groups or categories that lack intrinsic order; for example, zoning classifications or social security numbers
Types of Measurement: Ordinal Data
ordered categories implying a ranking of the observations; the values themselves are meaningless, only the rank counts; for example, letter grades or response scales on a survey
Types of Measurement: Interval Data
an ordered relationship where the difference between the scales has a meaningful interpretation; for example, temperature
Types of Measurement: Ratio Data
the gold standard for measurement; both absolute and relative differences have meaning; for example, distance
Types of Variables: Quantitative
represents an interval or ratio measurement
Types of Variables: Qualitative
represents a nominal or ordinal measurement
Types of Variables: Continuous
can take an infinite number of values, positive or negative, and with as much precision as desired
Types of Variables: Discrete
can take a finite number of distinct values
Types of Variables: Binary/Dichotomous
a special case of discrete variables; can only take on two values typically coded as 0 and 1
Statistical Concepts: Descriptive Statistics
describe the characteristics of the distribution of values in a population or a sample
Statistical Concepts: Inferential Statistics
use probability theory to determine the characteristics of a population based on observations made on a sample of the population
Distribution: Range
the difference between the largest and smallest value
Distribution: Symmetric
where an equal number of observations are below and above the mean
Distribution: Skew
an asymmetrical distribution where there are more observations either above or below the mean
Distribution: Normal/Gaussian
the gold standard in statistical analysis, the bell curve; symmetric distribution where the spread around the mean can be related to the proportion of observations
Basic Descriptive Statistics: Central tendency
a typical or representative value for the distribution of observed values
Mean
the average of a distribution; appropriate for interval and ratio scaled data, not ordinal or nominal data
Weighted mean
a mean in which greater importance (weight) is placed on specific entries; also used when each value represents a group of observations
Population weighted mean
when computing a mean value across multiple countries, the value of each country is multiplied by its population, and the sum is divided by the total population
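A minimal sketch of a population weighted mean. The function name `weighted_mean` and the three-country figures are hypothetical, made up only to show how weighting shifts the result away from the simple average.

```python
def weighted_mean(values, weights):
    """Return the mean of values, with each value counted in proportion to its weight."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical data: median age in three countries and their populations.
median_age = [30.0, 40.0, 20.0]
population = [1_000_000, 3_000_000, 1_000_000]

simple_mean = sum(median_age) / len(median_age)          # treats every country equally
pop_weighted_mean = weighted_mean(median_age, population)  # pulled toward the largest country
print(simple_mean, pop_weighted_mean)
```

The largest country dominates the weighted result, which is the point of weighting by population.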
Median
the middle value of a ranked distribution
Mode
the most frequent number in a distribution; there can be more than one
Basic Descriptive Statistics: Central tendency: Symmetry
mean and median are affected by the symmetry of the distribution; very close if symmetric; different if skewed
Dispersion
characterizes how values are spread around the central tendency
Variance
the average squared difference from the mean; large variance means a greater spread or flatter distribution; small variance means a narrower spread or a spikier distribution
Function - (value - mean)^2 for each value, then average all of those values together
Standard deviation
the square root of the variance; in a normal distribution 95% of the values fall within 2 standard deviations of the mean; the symbol is a little o with a tail to the top right, σ
Degree of freedom correction
necessary for finding the variance and standard deviation of a sample group because a sample mean is estimated; when averaging the squared differences subtract one from the number of observations to divide the sum by
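The variance, standard deviation, and degree of freedom correction cards can be sketched together. The function names and the data set below are illustrative assumptions, not from the source; `sample=True` switches the divisor from n to n - 1.

```python
import math


def variance(values, sample=False):
    """Average squared difference from the mean.

    With sample=True, divide by n - 1 (degree of freedom correction)
    because the mean itself was estimated from the sample.
    """
    n = len(values)
    mean = sum(values) / n
    squared_diffs = [(v - mean) ** 2 for v in values]
    divisor = n - 1 if sample else n
    return sum(squared_diffs) / divisor


def standard_deviation(values, sample=False):
    """Square root of the variance."""
    return math.sqrt(variance(values, sample))


# Hypothetical observations with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(data), standard_deviation(data))   # population versions
print(variance(data, sample=True))                # sample version, slightly larger
```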
Outliers
in a normal distribution, values that fall outside of two standard deviations above or below the mean
Coefficient of variation
measures the relative dispersion from the mean by taking the standard deviation and dividing by the mean
Z-score
a standardization of the original value by subtracting the mean and dividing by the standard deviation; once all values are standardized, the mean of the group is 0 and the variance and standard deviation are 1; transforms all values into standard deviation units - example: a z-score of more than 2 would mean an observation is more than 2 standard deviations away from the mean, an outlier
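A short sketch of the z-score standardization described above, assuming the population standard deviation (dividing by n). The function name and data are hypothetical.

```python
import math


def z_scores(values):
    """Standardize values: subtract the mean, divide by the standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        raise ValueError("all values are identical; z-scores are undefined")
    return [(v - mean) / std for v in values]


# Hypothetical observations with mean 5 and standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(data)
print(z)

# After standardizing, the mean is 0 and the variance is 1.
z_mean = sum(z) / len(z)
z_variance = sum((v - z_mean) ** 2 for v in z) / len(z)
print(z_mean, z_variance)
```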
Inter-quartile range (IQR)
an alternate measure of dispersion; the difference in value between the 75th percentile and the 25th percentile in a set of ranked values; forms the basis of an alternate concept of outliers
Inter-quartile range (IQR): Fences
two fences are the 25th percentile value minus 1.5 times the IQR and the 75th percentile value plus 1.5 times the IQR
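The fences can be sketched with Python's standard `statistics.quantiles`. Percentile conventions differ between tools, so the `method="inclusive"` choice here is an assumption and other software may give slightly different quartiles. The data set is hypothetical.

```python
import statistics


def iqr_fences(values):
    """Return (lower_fence, upper_fence) using the 1.5 x IQR rule."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Hypothetical observations with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
lower_fence, upper_fence = iqr_fences(data)

# Values outside the fences are flagged as outliers under this concept.
outliers = [v for v in data if v < lower_fence or v > upper_fence]
print(lower_fence, upper_fence, outliers)
```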
Inter-quartile range (IQR): Box/Whisker plots
visualization summarizing a set of data; the shape of the boxplot shows how the data is distributed and any outliers; useful way to compare different sets of data as you can draw more than one boxplot per graph
Statistical Inference
the process of drawing conclusions about the characteristics of a distribution from a sample of data
Hypothesis test
finding evidence in the data to reject the null hypothesis statement in the direction of the alternative hypothesis; statistical evidence only provides support to reject the null hypothesis never to accept the alternative hypothesis
Null hypothesis
the point of departure or reference; typically consists of setting characteristics of the distribution, such as the mean, equal to a given value, often zero
Alternative hypothesis
the research hypothesis that the researcher hopes to support by rejecting the null hypothesis
Two-sided - differences in both directions are considered
One-sided - only differences in one direction are considered, i.e. only larger or smaller than, but not both
Test statistic
provides a way to operationalize a hypothesis test
Sampling error or Sampling distribution - the random variation caused because a sample does not contain all the information of the population therefore any statistic computed from the sample will not be identical to the population statistic
Systematic error
model misspecification which occurs because the model or assumptions are wrong
Standard error
essentially the same concept as standard deviation and computed in the same way, but pertains to the distribution of a statistic that is computed from a sample; for example, the sample average has a standard error which is the same as the standard deviation of its sampling distribution
Statistical decision
the rejection of a null hypothesis
Significance/P-value/Type I Error
the probability that the null hypothesis is rejected when in fact it is correct; ideally this probability is small, typically a significance of 5% or 1% is used as a benchmark
Confidence interval
a range around the sample statistic that contains the population statistic with a given level of confidence, typically 95% or 99%; an alternative to framing the result as rejecting the null hypothesis; the range of the confidence interval depends on the sampling error, i.e. large sampling error means there isn’t much information in the sample relative to the population, so the statements about the population will be vague (large confidence interval)
Common Statistical Tests: T-test
an inferential statistic used to determine if there is a significant difference between the means of two groups and how they are related; used when a data set follows a normal distribution and has unknown variances; commonly used to test the significance of a regression coefficient (see below)
One sample - compares the sample average to a hypothesized value for the mean
Two-sample - used to compare the means of two populations based on their sample averages
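A rough sketch of the one-sample case: the test statistic is the gap between the sample average and the hypothesized mean, measured in standard errors. The function name and data are hypothetical, and deciding significance would still require comparing the result against a t distribution with n - 1 degrees of freedom, which is not done here.

```python
import math


def one_sample_t(values, hypothesized_mean):
    """t statistic: (sample mean - hypothesized mean) / standard error of the mean."""
    n = len(values)
    mean = sum(values) / n
    # Sample variance uses the degree of freedom correction (n - 1).
    sample_variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    standard_error = math.sqrt(sample_variance / n)
    return (mean - hypothesized_mean) / standard_error


# Hypothetical sample with mean 5, tested against a hypothesized mean of 4.
data = [2, 4, 4, 4, 5, 5, 7, 9]
t_stat = one_sample_t(data, 4)
print(t_stat)
```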
Common Statistical Tests: Analysis of variance (ANOVA)
a more complex form of testing the equality of means between groups; typical application is in treatment effects analysis where the outcome of a variable is compared between a treatment group and a control group; for example, comparing the average speed of cars on a street before (control) and after (treatment) street calming infrastructure
Common Statistical Tests: F-test
a simple case of ANOVA; a statistical test used to compare the variances of two samples or the ratio of variances between multiple samples
Common Statistical Tests: Chi Square test
a measure of fit; a test that assesses the difference between a sample distribution and a hypothesized distribution; to determine if a difference between observed data and expected data is due to chance, or if it is due to a relationship between the variables
Chi Square distribution - a skewed distribution that is obtained by taking the square of a standard normal variable
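As a measure of fit, the chi-square statistic sums the squared gaps between observed and expected counts, each scaled by the expected count. A minimal sketch with hypothetical survey counts follows; the function name is an assumption.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected across categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical: 120 respondents from three neighborhoods, where an even
# split (40 each) is the hypothesized distribution.
observed = [50, 40, 30]
expected = [40, 40, 40]
chi2 = chi_square_statistic(observed, expected)
print(chi2)
```

The statistic is then compared with a critical value from the chi-square distribution for the relevant degrees of freedom (here, number of categories minus one).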
Bivariate Relationships: Correlation coefficient
measures the strength of a linear relationship between two variables; does not imply causation; computed by standardizing each of the variables and its value is between -1 and +1; the square of the correlation coefficient is often referred to as r-squared
Positive correlation - high values of one variable match high values of the other and low values match low values
Negative correlation - high values of one variable match low values of the other and vice versa
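The "standardize each variable" computation can be sketched as the average product of paired z-scores. The function name and both data sets are hypothetical.

```python
import math


def correlation(x, y):
    """Pearson correlation: the mean of the products of paired z-scores."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    z_x = [(a - mean_x) / std_x for a in x]
    z_y = [(b - mean_y) / std_y for b in y]
    return sum(a * b for a, b in zip(z_x, z_y)) / n


# Hypothetical perfectly linear data: high x matches high y.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
r = correlation(x, y)
r_squared = r ** 2
print(r, r_squared)

# Reversing y gives a perfect negative correlation.
r_negative = correlation(x, list(reversed(y)))
print(r_negative)
```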
Bivariate Relationships: Linear regression
hypothesizes a linear relationship between a dependent variable and one or more explanatory variables; coefficients are estimated using least squares and their significance is interpreted by a t-test
Dependent variable - the variable trying to be explained or predicted
Explanatory variable - the variable used to explain or predict the dependent variable
y = a + b1x1 + b2x2 + e - typical regression equation; y is the dependent variable; x1 and x2 are the explanatory variables; e is a random error term since the variables observed are a sample from the population; a is the intercept; b1 and b2 are the slope coefficients
Least squares - a form of regression analysis used to determine the line of best fit for a set of data
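For a single explanatory variable (y = a + b1x1 + e), the least squares line has a closed form. This sketch assumes that one-variable case only; the function name and data are hypothetical, and the t-test on the slope is not included.

```python
def fit_line(x, y):
    """Least squares estimates (a, b) for y = a + b * x."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope: co-variation of x and y divided by variation of x.
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
intercept, slope = fit_line(x, y)
print(intercept, slope)
```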
TIGER
Topologically Integrated Geographic Encoding and Referencing map. Made by the Census Bureau and includes streets, railroads, zip codes, and landmarks.
Light Detection and Ranging (LIDAR)
uses lasers, instead of radio waves, from an airplane to provide detailed topographic information.
Simulation Programs
UrbanSim
software that models planning and urban development; free and designed to be used by MPOs
Simulation Programs
CommunityViz
an extension for Esri's ArcGIS software used to analyze land use scenarios and create 3D images
Simulation Programs
Urban Footprint
uses a library of place types, block types, and building types to support interactive scenario building. Developed by Peter Calthorpe's firm, Calthorpe Associates
Survey
research method that allows one to collect data on a topic that cannot be directly observed, like opinions and characteristics!
Cross-Sectional Survey
gathers info on a population at a single point in time
Longitudinal Surveys
gathers info on a population over a period of time
Group-administered surveys
one of many ways of administering surveys (mail, phone, internet, etc.) this one is about having everyone together in a small group to complete the survey – like a survey at the end of a class, for example
Sampling Frame
the list or source of population members from which a survey sample is drawn
Sample Design
attention to how representative a population sample is of the whole you’re trying to study – there are statistical concepts and sample size calculators to help with this
Probability Sampling
direct mathematical relationship between sample and population to draw precise conclusions (like an error rate of +/- 2%)
Random Sampling
everyone has same chance of being selected
Non-Probability Sampling
no precise connection between sample and population, results must be interpreted with caution!
Systematic/Stratified/Cluster Sampling
where selection follows a rule or special groups are targeted. In a Systematic sample, every nth member of a list is selected. In a Stratified sample, the population is divided into groups/classes, and representative samples are drawn from each. A Cluster sample is where a specific target group is sampled from, such as the elderly or people in a specific neighborhood.
Convenience Sample
go for individuals that are readily available
Snowball Sample
one interviewed person suggests other potential interviewees
Volunteer Sample
self-selected respondents (ex. volunteered geographic information (VGI), when participants enter information on a web map)
Decennial Census Trends: 2020
24th US Census; first time administering the census online; population grew, with a significant increase in Hispanic and Asian populations; urbanization continues
Decennial Census Trends: 2010
discontinued the long form Census
US population grew around 10% to 308M people (a slower rate than in the 1990s); people moving to cities/suburbs; increase in Hispanic, Asian, and mixed-race populations; aging population impacts healthcare, housing, and social services; Sun Belt states experienced rapid population growth
Decennial Census Trends: 2000
overall population growth; increased diversity (primarily Hispanic and Asian populations); urbanization continues; aging population (boomer generation); rapid growth in the South and West, decline along the Rust Belt
Urban Area
new term in 2020 census; an area with at least 2,000 housing units or a population of at least 5,000.
Urban Cluster
previous term for “urban area”; had 2,500 - 50,000 people with 1,000 people per square mile density
Metropolitan Statistical Area (MSA)
city with 50,000 or more inhabitants, total metropolitan population of at least 100,000
Micropolitan Statistical Area
Population between 10,000 - 50,000 people.
Census Designated Place (CDP)
equivalent of an incorporated place; used for settled concentrations of populations that are not incorporated.
Consolidated MSA
several PMSAs; e.g. Dallas-Fort Worth CMSA (Dallas and Fort Worth are their own primary MSAs)
Core Based Statistical Area (CBSA)
defined by the Office of Management and Budget to provide data description for areas where there is a core area with at least 10,000 people
Megalopolis
any many-centered, multi-city, urban area with more than 10 million inhabitants, generally low-density settlement and complex networks of economic specialization; term from the 1961 book by Jean Gottmann about the 300 miles between Boston and Washington DC
Census Tract
smallest area where all information is released; typ population between 2,000 - 8,000
Census Block
smallest level of data collected for Census; typ 400 housing units/block
Census Block Group
group of census blocks; generally contains 600-3,000 people; used to present data and control block numbering.
Minor Civil Division
unit only used in 29 states, usually corresponds to a municipality
Census County Divisions
used in the 21 states that do not have Minor Civil Division
Tribal Designated Statistical Area
unit drawn by tribes that do not have recognized land area; defined independently of the standard county-based census delineation
Threshold Population
government term to help determine program eligibility (i.e. threshold pop to qualify to receive Block Grant funds)
Population Trends – 2010 - 2020
Texas experienced the largest numeric increase, followed by Florida, California, Georgia, and Washington
American Community Survey (ACS)
a smaller sample of the population (vs. the decennial census) whose findings are projected to the whole population. Began nationwide in 2005 and reaches 2.5% of the population each year (1 in 40 addresses). Respondents' individual records are kept confidential and released only after 72 years.
Population Groups
Generation Z
1997 - 2012
Population Groups
Millennials
1981 - 1996
Population Groups
Generation X
1965 - 1980 – period of low birthrates
Population Groups
Baby Boomers
1946 - 1964
Population Groups
Silent Generation
1928 - 1945
Population Groups
Greatest Generation
1901 - 1927
Population Groups
Lost Generation
1883 - 1900
Population Estimation: Linear Method
uses the change in population over time to extrapolate that change into the future in a linear fashion
Population Estimation: Exponential and Modified Exponential Method
uses the rate of growth or decline in a population over time to estimate the current or future population; the result is a curved line; a modified projection assumes there is a cap to the change and that growth will slow or stop at some point
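The linear and exponential methods can be compared side by side. This is a sketch under simple assumptions: two known population counts, a constant annual change (linear) or a constant annual rate (exponential). Function names and figures are hypothetical, and the modified exponential cap is not modeled.

```python
def linear_projection(pop_start, pop_end, years_observed, years_ahead):
    """Extend the average annual change in people as a straight line."""
    annual_change = (pop_end - pop_start) / years_observed
    return pop_end + annual_change * years_ahead


def exponential_projection(pop_start, pop_end, years_observed, years_ahead):
    """Extend the average annual growth rate, which compounds into a curve."""
    annual_rate = (pop_end / pop_start) ** (1 / years_observed) - 1
    return pop_end * (1 + annual_rate) ** years_ahead


# Hypothetical town: 10,000 people in 2010 and 12,100 in 2020, projected to 2030.
linear_2030 = linear_projection(10_000, 12_100, 10, 10)
exponential_2030 = exponential_projection(10_000, 12_100, 10, 10)
print(linear_2030, round(exponential_2030))
```

The exponential estimate comes out higher because each year's growth is applied to an already larger population.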
Population Estimation: Symptomatic Method
uses any available data indirectly related to population size to estimate the population using a ratio; for example, average household size at 2.5 and data on 100 new single-family building permits issued that year, would yield an estimate of 250 new people added to the population
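The building permit example works out as a single multiplication. The function name is a hypothetical label for this sketch; the inputs are the ones given in the card.

```python
def symptomatic_estimate(new_housing_units, persons_per_household):
    """Estimate added population from an indirect indicator (new housing units)."""
    return new_housing_units * persons_per_household


# From the card: 100 single-family permits at 2.5 persons per household.
added_population = symptomatic_estimate(100, 2.5)
print(added_population)
```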
Population Estimation: Step-Down Ratio Method
uses the ratio of the population in a city and a county or a larger geographical unit at a known point in time; for example, the population of a city is 20% of the county population in 2000, and if we know the county population is 20,000 in 2005 then we estimate the city population to be 4,000 (20% of 20,000)
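A sketch of the step-down calculation. The card gives only the 20% share and the 2005 county total, so the base-year counts below (3,600 of 18,000) are hypothetical numbers chosen to produce that 20% share; the function name is also an assumption.

```python
def step_down_estimate(city_pop_base, county_pop_base, county_pop_target):
    """Apply the city's base-year share of the county to a later county total."""
    share = city_pop_base / county_pop_base
    return share * county_pop_target


# Hypothetical base year (2000): city 3,600 of county 18,000, a 20% share.
# Known county population in 2005: 20,000 (from the card).
city_2005 = step_down_estimate(3_600, 18_000, 20_000)
print(round(city_2005))
```

The method assumes the city's share of the county stays constant between the two dates.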
Population Estimation: Distributed Housing Unit Method
multiplies Census Bureau data for the number of housing units by the occupancy rate and persons per household; reliable for slow growth or stable communities
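The housing unit method is a three-way product. The function name and all three input values below are hypothetical, chosen only to illustrate the arithmetic.

```python
def housing_unit_estimate(housing_units, occupancy_rate, persons_per_household):
    """Population = housing units x share occupied x average household size."""
    return housing_units * occupancy_rate * persons_per_household


# Hypothetical community: 4,000 units, 95% occupied, 2.5 persons per household.
estimated_population = housing_unit_estimate(4_000, 0.95, 2.5)
print(round(estimated_population))
```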
Population Estimation: Cohort Survival Method
uses the current population plus natural increase (birth/death rates) and net migration (in-migration vs. out-migration) to calculate the future population; calculated for men and women in specific age groups
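A heavily simplified sketch of one projection step, assuming age groups as wide as the projection period, a single sex (the method would be run separately for men and women), and an open-ended oldest group. The function name and every rate and count below are hypothetical.

```python
def project_cohorts(cohorts, survival_rates, births, net_migration):
    """Advance each age cohort by one period.

    cohorts:        current population by age group, youngest first
    survival_rates: share of each group surviving the period
    births:         births during the period (they form the new youngest group)
    net_migration:  in-migrants minus out-migrants for each resulting group
    """
    projected = [births]
    # Survivors of each group age into the next group.
    for population, rate in zip(cohorts[:-1], survival_rates[:-1]):
        projected.append(population * rate)
    # Survivors already in the oldest (open-ended) group stay in it.
    projected[-1] += cohorts[-1] * survival_rates[-1]
    return [p + m for p, m in zip(projected, net_migration)]


# Hypothetical three-group population: young, middle, old.
current = [1_000, 800, 600]
survival = [0.99, 0.95, 0.50]
future = project_cohorts(current, survival, births=900, net_migration=[10, -20, 0])
print([round(p) for p in future])
```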
Population Pyramid
a graphic representation with male age cohorts on one side and female age cohorts on the other; the bottom is the “birth cohort” or youngest and the number of people in each group typically declines with age
Natural Increase
the difference between the number of children born and the number of people who die in a given time interval
Death Rate - number of deaths per 1,000 people
Crude Birth Rate - number of births per 1,000 people
General Fertility Rate - the number of babies born per 1,000 females of child bearing age
Age-Specific Fertility Rate - the number of babies born per 1,000 females in a given age group
Net Migration
the difference between the number of people moving in and the number of people moving out
Economic Base Analysis
Separate the Economy into Basic (export, brings in money from the outside) and Non-Basic (local/service, recirculates the outside money)
Total = basic + non-basic
Economic Base Multiplier
Multiplier = total / basic
The indirect effect of $1 additional basic (direct) activity on the economy = Multiplier - 1
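The multiplier formulas above in a minimal sketch. The employment figures and the function name are hypothetical.

```python
def base_multiplier(total_activity, basic_activity):
    """Economic base multiplier = total / basic."""
    return total_activity / basic_activity


# Hypothetical region: 10,000 total jobs, of which 4,000 are basic (export) jobs.
total_jobs = 10_000
basic_jobs = 4_000
non_basic_jobs = total_jobs - basic_jobs   # total = basic + non-basic

multiplier = base_multiplier(total_jobs, basic_jobs)
indirect_effect = multiplier - 1           # non-basic activity supported per unit of basic activity
print(multiplier, indirect_effect)
```

Here each additional basic job is associated with 1.5 non-basic jobs, for 2.5 jobs in total.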
Location Quotient
relative share of sector in region compared to a relative share of sector across the nation, based on employment figures, identifies the “export” activities or activities where the region has more jobs in the sector than would be expected
LQi = (Locali/Local)/(Nationali/National)
LQi > 1, i is an export/basic sector (“strong”)
LQi < 1, i is a local/non-basic sector (“weak”)
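The LQ formula can be sketched directly from the definition. The employment counts and the function name are hypothetical.

```python
def location_quotient(local_sector, local_total, national_sector, national_total):
    """LQ = (local sector share of local jobs) / (national sector share of national jobs)."""
    local_share = local_sector / local_total
    national_share = national_sector / national_total
    return local_share / national_share


# Hypothetical sector: 2,000 of 20,000 local jobs (10%)
# versus 5,000,000 of 100,000,000 national jobs (5%).
lq = location_quotient(2_000, 20_000, 5_000_000, 100_000_000)
sector_type = "export/basic" if lq > 1 else "local/non-basic"
print(lq, sector_type)
```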
Shift-share Analysis