Research and assessment methods Flashcards
Mean (average)
sum values, then divide by count
median
middle number in ranked data
mode
most frequent number or value
variance
average squared deviation from the mean
- calculate mean
- calculate the squared deviation for each observation (observation - mean)^2
- sum squared deviations
- divide by count of observations
note - if the observations are from a sample, rather than the whole population, divide by one less than the count of observations in the last step
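The steps on this card can be sketched in Python. The function name `variance`, the `sample` flag, and the data values are illustrative choices, not part of the original card.

```python
def variance(values, sample=False):
    """Average squared deviation from the mean."""
    n = len(values)
    mean = sum(values) / n                                   # calculate mean
    squared_deviations = [(x - mean) ** 2 for x in values]   # (observation - mean)^2
    divisor = n - 1 if sample else n                         # n - 1 for a sample
    return sum(squared_deviations) / divisor


data = [2, 4, 4, 4, 5, 5, 7, 9]       # made-up observations
print(variance(data))                 # population variance
print(variance(data, sample=True))    # sample variance
```

The square root of either result is the matching standard deviation.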
squared deviation
(observation - mean)^2
standard deviation
square root of variance
sqrt(variance)
coefficient of variation
standard deviation divided by the mean
standard deviation/mean
z-score
- standardization of original variable
- subtract mean and divide by standard deviation
- mean of z-score is 0 and variance is 1
- z-score greater than 2 indicates the observation is more than 2 standard deviations above the mean
z = (observation - mean)/standard deviation
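A minimal sketch of the standardization, assuming the population standard deviation (divide by the count). The function name `z_scores` and the data are illustrative.

```python
import math


def z_scores(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    n = len(values)
    mean = sum(values) / n
    std_dev = math.sqrt(sum((x - mean) ** 2 for x in values) / n)
    return [(x - mean) / std_dev for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up observations
scores = z_scores(data)
print(scores)
print(sum(scores) / len(scores))  # mean of the z-scores
```

With this data the z-scores average to 0 and have variance 1, as the card states.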
interquartile range and fences
- difference in value of 75th percentile and 25th percentile
- fences = 1st quartile minus 1.5x the interquartile range and 3rd quartile plus 1.5x the interquartile range
- outliers are outside the fences
for example, in a ranked set of 20 observations, subtract the 5th value from the 15th value to get the approximate interquartile range
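A short sketch of the fences, using `statistics.quantiles` from the Python standard library. Quartile conventions differ between tools, so exact quartile values may not match other software; the data is made up.

```python
import statistics


def fences(values):
    """Return (lower fence, upper fence) from the quartiles and IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1                      # 75th percentile minus 25th percentile
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]   # made-up observations
low, high = fences(data)
outliers = [x for x in data if x < low or x > high]
print(low, high, outliers)
```

Only the value 100 falls outside the fences here.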
P-value, type 1 error
- false positive
- probability we reject the null hypothesis when it is actually correct
- significance level is usually set at 5% or 1% or smaller (0.05 or 0.01)
t-test
compare means of two populations based on their sample averages
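As a rough illustration, the test statistic can be computed by hand. This sketch assumes the unequal-variance (Welch) form, which the card does not specify; the function name and samples are made up, and converting the statistic to a p-value is left out.

```python
import math
import statistics


def t_statistic(sample_a, sample_b):
    """Difference in sample means divided by its standard error (Welch form)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


group_a = [5, 6, 7, 8, 9]   # made-up sample
group_b = [1, 2, 3, 4, 5]   # made-up sample
t = t_statistic(group_a, group_b)
print(t)
```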
ANOVA
- analysis of variance
- more complex form of testing equality of means between groups
- more than 2 groups
- compare means of different groups
chi squared test
- measures fit
- tests relationship between 2 variables
- observed proportions compared to what is expected if variables are independent
- chi squared distribution: skewed, distribution of a sum of squared standard normal variables
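The comparison of observed and expected counts can be sketched as below. The function name and the 2x2 table are hypothetical, and looking up the p-value for the resulting statistic is not shown.

```python
def chi_squared_statistic(observed):
    """Sum of (observed - expected)^2 / expected over a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # count expected in this cell if the two variables are independent
            expected = row_totals[i] * col_totals[j] / total
            statistic += (count - expected) ** 2 / expected
    return statistic


table = [[30, 10],
         [20, 40]]   # made-up observed counts
chi2 = chi_squared_statistic(table)
print(chi2)
```

A larger statistic means the observed counts sit further from what independence would predict.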
correlation coefficient
- measures strength of linear relationship of 2 variables
- between -1 and 1
- r-squared is square of correlation coefficient
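A minimal sketch of the Pearson correlation coefficient computed from deviations about the means. The function name `pearson_r` and the data are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Strength of the linear relationship between xs and ys, from -1 to 1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]   # made-up values
y = [2, 4, 5, 4, 5]   # made-up values
r = pearson_r(x, y)
print(r, r ** 2)      # correlation coefficient and r-squared
```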
linear regression
hypothesizes relationship between a dependent variable and one or more explanatory variables
y = a + bx + e
y = dependent variable
x = independent variable
e = random error
a = intercept
b = slope coefficient
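For one explanatory variable, a and b can be estimated by ordinary least squares. The card does not name an estimation method, so least squares is an assumption here, as are the function name `fit_line` and the data.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b in y = a + bx + e."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # slope coefficient
    a = mean_y - b * mean_x     # intercept
    return a, b


x = [1, 2, 3, 4, 5]   # made-up independent variable
y = [2, 4, 5, 4, 5]   # made-up dependent variable
a, b = fit_line(x, y)
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]   # estimates of e
print(a, b)
```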
what are the 3 measures of central tendency?
mean
median
mode
what are the 3 measures of dispersion?
range
variance
standard deviation
Linear Method
Population Estimation
- uses change in population over a period of time to determine change into the future in a linear fashion
- example: population growth historically 1,000 people per year; assume future growth to be 1,000 people per year
- results in a straight line
Exponential Method
Population Estimation
- uses rate of population change to estimate current or future population
- for example: growth historically at 2% per year; growth in the future will be 2% per year
- results in a curved line
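The linear and exponential methods can be compared side by side. This sketch uses the growth figures from the two cards; the base population of 50,000 and the function names are assumptions for illustration.

```python
def project_linear(base, annual_change, years):
    """Add the same number of people every year (straight line)."""
    return base + annual_change * years


def project_exponential(base, annual_rate, years):
    """Grow by the same percentage every year (curved line)."""
    return base * (1 + annual_rate) ** years


base_population = 50_000   # hypothetical starting population
linear = project_linear(base_population, 1_000, 10)
exponential = project_exponential(base_population, 0.02, 10)
print(linear, round(exponential))
```

Both start at the same growth of 1,000 people in the first year, but the exponential projection pulls ahead because each year's growth is computed on a larger base.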
Modified Exponential Method
Population Estimation
- like exponential method, it uses rate of change in population historically to predict future population
- assumes there is a cap to the change and at some point growth will slow or stop
- results in an S-shaped curve
Gompertz Projection
Population Estimation
- variation of exponential and modified exponential methods of estimating population
- growth is slowest at the beginning and end and fastest in the middle, giving an S-shaped curve
Symptomatic Method
Population Estimation
- uses available data indirectly related to population size, such as housing starts, new drivers licenses, water taps, phone lines, voter registration, utility connections, etc.
- population estimate based on data and the average household size (or other relevant ratio)
- for example: if 100 new single family building permits are issued in a year, and average household size is 2.5, estimate 250 new people in community.
Step-Down Ratio Method
Population Estimation
- uses the ratio of population of a smaller geography to a larger geography, such as city to county, at a known time to estimate current or future population
- example: city makes up 20% of population of county in 2000. If county population in 2005 is 20,000, then 20% of that is the estimated city population (4,000)
Distributed Housing Unit Method
Population Estimation
- multiplies number of housing units by occupancy rate and persons per household
- reliable for slow growth or stable communities, less so for quickly changing communities
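A minimal sketch of the multiplication. The function name and all input values are hypothetical.

```python
def housing_unit_estimate(housing_units, occupancy_rate, persons_per_household):
    """Population = housing units x occupancy rate x persons per household."""
    return housing_units * occupancy_rate * persons_per_household


# hypothetical community: 4,000 units, 95% occupied, 2.5 persons per household
estimate = housing_unit_estimate(4_000, 0.95, 2.5)
print(round(estimate))
```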
Cohort Survival Method
Population Estimation
- uses current population plus net natural increase (births minus deaths) plus net migration (in-migration minus out-migration) to calculate future population
- calculated for men and women in specific age groups
- uses specific time intervals - smallest interval is based on the time it takes for all members of a cohort to age to the next cohort (typically 5 years)
- natural increase = children born minus deaths during the time interval
- death rate = number of deaths per 1,000 people
- crude birth rate = total number of births per 1,000 people
- general fertility rate = number of births per 1,000 females of childbearing age
- age-specific fertility rate = number of births per 1,000 females in a given age group
- net migration = difference between number of people moving in and moving out
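One projection interval can be sketched as below. This is a heavily simplified illustration, not a full cohort model: it tracks a single sex, places all births in the youngest cohort, lets the oldest cohort age out, and uses survival rates in place of death rates. Every name and number is hypothetical.

```python
def project_one_interval(cohorts, survival_rates, fertility_rates, net_migration):
    """Age each cohort forward one interval, then add births and net migration.

    cohorts         -- people in each age group, youngest first
    survival_rates  -- share of each group surviving the interval
    fertility_rates -- births per 1,000 people in each group over the interval
    net_migration   -- in-migration minus out-migration for each resulting group
    """
    births = sum(p * f / 1000 for p, f in zip(cohorts, fertility_rates))
    # survivors of group i become group i + 1; the oldest group ages out
    aged = [p * s for p, s in zip(cohorts[:-1], survival_rates[:-1])]
    projected = [births] + aged
    return [p + m for p, m in zip(projected, net_migration)]


current = [1000, 1200, 900]   # hypothetical cohort sizes
future = project_one_interval(
    current,
    survival_rates=[0.99, 0.98, 0.90],
    fertility_rates=[0, 400, 100],
    net_migration=[10, -20, 5],
)
print([round(p) for p in future])
```

Repeating the call with `future` as the new input projects further intervals.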
Discrete variable
- a numerical variable that can be counted, and comes in distinct values with nothing in between (i.e., no fractions, only certain increments, etc.)
- example: the number of accidents (come in increments of one)