Research and assessment methods Flashcards

1
Q

Mean (average)

A

sum values, then divide by count
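
A minimal sketch of the calculation above in Python. The function name `mean` and the sample data are illustrative assumptions, not part of the original card.

```python
def mean(values):
    """Arithmetic mean: sum the values, then divide by the count."""
    if not values:
        raise ValueError("mean of an empty list is undefined")
    return sum(values) / len(values)


# Hypothetical sample data (commute times in minutes).
commute_minutes = [10, 20, 30, 40]
print(mean(commute_minutes))  # 100 / 4 -> 25.0
```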

2
Q

median

A

middle number in ranked data

3
Q

mode

A

most frequent number or value

4
Q

variance

A

average squared deviation from the mean

  1. calculate mean
  2. calculate the squared deviation for each observation (observation - mean)^2
  3. sum squared deviations
  4. divide by count of observations
    note - if the observations are from a sample, rather than the whole population, in step 4, divide by one less than the count of observations
5
Q

squared deviation

A

(observation - mean)^2

6
Q

standard deviation

A

square root of variance

sqrt(variance)

7
Q

coefficient of variation

A

standard deviation divided by the mean

standard deviation/mean

8
Q

z-score

A
  • standardization of the original variable
  • subtract the mean and divide by the standard deviation
  • the mean of the z-scores is 0 and the variance is 1
  • a z-score greater than 2 indicates the observation is more than 2 standard deviations above the mean

z = (observation - mean)/standard deviation

9
Q

interquartile range and fences

A
  • difference between the values of the 75th percentile (3rd quartile) and the 25th percentile (1st quartile)
  • fences = 1st quartile minus 1.5x the interquartile range (lower fence) and 3rd quartile plus 1.5x the interquartile range (upper fence)
  • outliers are outside the fences

for example, in a ranked set of 20 observations, subtract the 5th value from the 15th value to get the approximate interquartile range

10
Q

P-value, type 1 error

A
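  • type 1 error = false positive: rejecting the null hypothesis when it is actually correct
  • significance level (alpha) = the probability of a type 1 error we are willing to accept
  • p-value = probability of seeing results at least as extreme as those observed if the null hypothesis is correct; reject the null when the p-value is below alpha
  • want alpha of 5% or 1% or smaller (0.05 or 0.01)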
11
Q

t-test

A

compare means of two populations based on their sample averages

12
Q

ANOVA

A
  • analysis of variance
  • more complex form of testing equality of means between groups
  • more than 2 groups
  • compare means of different groups
13
Q

chi squared test

A
  • measures fit
  • tests relationship between 2 variables
  • observed proportions compared to what is expected if variables are independent
  • chi squared distribution: skewed, sum of squared standard normal variables
14
Q

correlation coefficient

A
  • measures strength of linear relationship of 2 variables
  • between -1 and 1
  • r-squared is square of correlation coefficient
15
Q

linear regression

A

hypothesizes relationship between a dependent variable and one or more explanatory variables

y = a + bx + e
y = dependent variable
x = independent variable
e = random error
a = intercept
b = slope coefficient

16
Q

what are the 3 measures of central tendency?

A

mean
median
mode

17
Q

what are the 3 measures of dispersion?

A

range
variance
standard deviation

18
Q

Linear Method

Population Estimation

A
  • uses change in population over a period of time to determine change into the future in a linear fashion
  • example: population growth historically 1,000 people per year; assume future growth to be 1,000 people per year
  • results in a straight line
19
Q

Exponential Method

Population Estimation

A
  • uses rate of population change to estimate current or future population
  • for example: growth historically at 2% per year; growth in the future will be 2% per year
  • results in a curved line
20
Q

Modified Exponential Method

Population Estimation

A
  • like exponential method, it uses rate of change in population historically to predict future population
  • assumes there is a cap to the change and at some point growth will slow or stop
  • results in an S-shaped curve
21
Q

Gompertz Projection

Population Estimation

A
  • variation of exponential and modified exponential methods of estimating population
  • growth is slowest at the beginning, speeds up over time, and then slows again as the population approaches its upper limit
22
Q

Symptomatic Method

Population Estimation

A
  • uses available data indirectly related to population size, such as housing starts, new driver's licenses, water taps, phone lines, voter registration, utility connections, etc.
  • population estimate based on data and the average household size (or other relevant ratio)
  • for example: if 100 new single family building permits are issued in a year, and average household size is 2.5, estimate 250 new people in community.
23
Q

Step-Down Ratio Method

Population Estimation

A
  • uses the ratio of population of a smaller geography to a larger geography, such as city to county, at a known time to estimate current or future population
  • example: city makes up 20% of population of county in 2000. If county population in 2005 is 20,000, then 20% of that is the estimated city population (4,000)
24
Q

Distributed Housing Unit Method

Population Estimation

A
  • multiplies number of housing units by occupancy rate and persons per household
  • reliable for slow growth or stable communities, less so for quickly changing communities
25
Q

Cohort Survival Method

Population Estimation

A
  • uses current population plus net natural increase (births minus deaths) plus net migration (in-migration minus out-migration) to calculate future population
  • calculated for men and women in specific age groups
  • uses specific time intervals - smallest interval is based on the time it takes for all members of a cohort to age to the next cohort (typically 5 years)

  • natural increase = children born minus deaths during the time interval
  • death rate = number of deaths per 1,000 people
  • crude birth rate = total number of births per 1,000 people
  • general fertility rate = number of births per 1,000 females of childbearing age
  • age-specific fertility rate = number of births per 1,000 females in a given age group
  • net migration = difference between number of people moving in and moving out
26
Q

Discrete variable

A
  • a numerical variable that can be counted, and comes in distinct values with nothing in between (i.e., no fractions, only certain increments)
  • example: the number of accidents (come in increments of one)
27
Q

Binary variable

dichotomous variable

A
  • only offers two choices
  • example: 1 or 0
28
Q

Continuous Variable

A
  • a variable that can take any value within a range, in any increment or fraction
  • example: temperature can be 51 degrees or 51.23 degrees
29
Q

Nominal data

A
  • mutually exclusive groups or categories
  • lack intrinsic order
  • examples: zoning districts, social security number, gender

the labels are arbitrary and do not imply an order or a specific numerical value

30
Q

Ordinal Data

A
  • ordered categories implying a ranking of observations
  • may be given numerical values, but the values themselves are meaningless, only the rank matters
  • examples: letter grades, suitability for development, response scales on a survey

for example, a rank of 2 versus 4 only implies that 2 is better/before 4, but not that 2 is half as much as 4.

31
Q

Interval Data

A
  • has an ordered relationship where the difference between values has a meaningful interpretation, but there is no true zero, so ratios are not meaningful
  • example: temperature - the difference between 30 and 40 degrees is the same as the difference between 20 and 30 degrees, but 40 degrees is not twice as hot as 20 degrees
32
Q

Ratio Data

A
  • both absolute and relative differences have a meaning
  • for example: distance - the difference between 30 and 40 miles is the same as 20 to 30 miles AND 40 miles is twice as far as 20 miles
33
Q

Population versus sample

A
  • population = the entire group you want to draw conclusions about
  • sample = the specific group you will collect data from to inform conclusions about the entire group
34
Q

Do the American Community Survey (ACS) and the decennial census measure the entire population or a sample?

A
  • decennial census measures data about the entire population
  • ACS only measures a sample, a small percentage of the entire population
35
Q

Descriptive statistics

A
  • summarize and describe data that has been observed
  • can be for a sample or a population
  • organized and presented as purely factual
  • examples: mean, median, mode, standard deviation, quantiles etc.
36
Q

Inferential Statistics

A
  • describes or predicts what has not been observed
  • when using a sample to generalize about the full population, or when you are trying to describe/predict behavior of a new population
  • present results in form of probabilities
  • draw conclusions beyond available data
  • examples: hypothesis testing, confidence intervals, regression, correlation
37
Q

Normal or Gaussian Distribution

bell curve

A
  • symmetric
  • spread around the mean is described by the standard deviation; for sample averages, the spread shrinks as sample size grows
  • about 95% of observations are within 2 standard deviations of the mean
38
Q

95% confidence interval

A

a range around the sample result (roughly plus or minus two standard errors) built so that, if the sampling were repeated many times, about 95% of such ranges would contain the actual population value

39
Q

Margin of error

A
  • expresses the amount of random sampling error in the results of a survey
  • larger margin of error means a less precise estimate
  • roughly 2x the standard error of the estimate at the 95% confidence level
40
Q

Hypothesis test

A
  • null hypothesis and alternative hypothesis
  • goal is to reject the null hypothesis
41
Q

Economic Base Analysis

A
  • looks at basic and non-basic activities
  • exporting industries make up economic base of a region
  • calculate location quotient for each industry - less than 1 indicates importing economy, greater than 1 indicates exporting economy

  • basic industry = can be exported - makes up economic base of a region
  • non-basic industry = locally oriented, cannot be exported
  • location quotient = an industry’s share of local employment divided by its share of national employment (or employment in another reference geography)
42
Q

Basic activity/industry

A
  • can be exported
  • make up economic base of a region
43
Q

Non-basic activity/industry

A
  • locally oriented
  • cannot be exported
44
Q

location quotient

A
  • an industry’s share of local employment divided by its share of national employment (or employment in another reference geography)
  • less than 1 indicates importing economy
  • greater than 1 indicates exporting economy
45
Q

Shift-Share analysis

A
  • analyzes regional economy in comparison with national economy
  • determines what portion of local economic growth or decline can be attributed to national, industry-specific, or regional factors
  • components: national growth effect, industrial mix effect, expected change (national growth effect plus industrial mix effect), and regional competitive effect
  • actual change - expected change = competitive effect
46
Q

industrial mix effect

Shift Share Analysis

A
  • the number of jobs expected to be added or lost within an industry in the region based on the industry’s national growth or decline
  • (industry growth rate - national economy growth rate) x number of regional industry jobs
47
Q

National Growth Effect

Shift Share Analysis

A
  • the number of jobs an industry is expected to gain or lose according to the nation’s job growth
  • national growth rate X number of regional industry jobs
48
Q

Input-Output analysis

A
  • determine the employment effect that a particular project has on a local economy
  • uses a series of multipliers to estimate direct, indirect, and induced employment effects
  • identify primary suppliers, intermediate suppliers, intermediate purchasers, and final purchasers
  • economy’s total output is equal to total production plus intermediate sales
  • three tables: transactions, direct requirements, and total requirements
  • requires a lot of data, costly

  • primary suppliers = purchase inputs for the production of final goods
  • intermediate suppliers = purchase inputs for the production of intermediate goods
  • intermediate purchaser = buy intermediate goods and use them for the production of final goods
  • final purchasers = purchase final goods for their own use, not production
49
Q

North American Industry Classification System

NAICS

A
  • standard used by Federal statistical agencies in classifying business establishments for the purpose of collecting, analyzing, and publishing statistical data about the U.S. economy
  • developed by the Office of Management and Budget and in 1997 it replaced the Standard Industrial Classification (SIC) system.
  • developed in partnership with Canada and Mexico
  • The first two digits designate the largest business sector, the third digit designates the subsector, the fourth digit designates the industry group, the fifth digit designates the NAICS industries, and the sixth digit designates the national industries.
50
Q

Survey

A
  • research method that allows one to collect data on a topic that cannot be directly observed, such as opinions on downtown retailing opportunities
  • typically taken of a sample of a population
51
Q

cross-sectional survey

A
  • gathers information about a population at a single point in time
52
Q

longitudinal survey

A
  • conducted over a period of time at specific time intervals
53
Q

Written surveys

A
  • mailed, printed, or administered in group setting
  • large/broad sample size
  • low-cost
  • low response rate - around 20%
  • requires literacy of respondents
54
Q

Group administered surveys

A
  • small sample size
  • high and quick response rate
  • challenge getting everyone together to complete

example: survey at the end of a workout class

55
Q

drop-off survey

A
  • survey is hand-delivered or dropped off at respondent’s residence or business
  • personal contact increases response rates (compared to typical mail surveys)
  • expensive - time and people to deliver surveys
  • smaller sample size than mail survey
56
Q

phone survey

A
  • best for yes/no questions; longer questions or multiple answers harder to administer
  • allow follow-up or further explanation on answers
  • response rate varies greatly, and is declining as fewer households have landline phones
  • more expensive than mail or online
  • can be biased from interaction with interviewer
57
Q

online survey

A
  • inexpensive and quick responses
  • higher response rate than written and interview surveys
  • will not reach people without internet access
58
Q

Probability sampling

A
  • there is a direct mathematical relation between the sample and the population so that precise conclusions can be drawn
  • examples: random, systematic, stratified, cluster samples
59
Q

Non-probability sampling

A
  • no precise connection between sample and population
  • results must be interpreted with caution since they are not necessarily representative of the population
  • examples: convenience, snowball, or volunteer samples
60
Q

Random sample

A
  • everyone has the same chance of being selected to participate
  • best when little is known about the population, or when there are too many differences to divide it into subsets
61
Q

systematic sampling

A
  • members of the sample are selected from a larger population at a fixed periodic interval, beginning from a chosen starting point

steps:
1. define your population
2. settle on a sample size
3. assign every member of the population a number
4. divide the population size by the desired sample size to determine the sampling interval
5. choose a starting point
6. identify every nth member of the population (n being sampling interval) to be members of the sample

62
Q

Stratified Random Sampling

A
  • population is divided into separate groups or classes, from which a sample is drawn such that the classes in the population are represented by the classes in the sample.

  1. divide population into homogeneous groups called strata (age, income, etc)
  2. select random samples from each stratum in a number proportional to the stratum’s size compared to the population
63
Q

Cluster Sample

A
  • a form of stratified sampling where a specific target group out of the general population is sampled from, such as the elderly, or residents of a specific neighborhood
64
Q

Convenience Sample

A
  • sampling individuals readily available
  • non-probability sample, not necessarily representative of population
65
Q

Snowball sample

A
  • when one interviewed person suggests other potential interviewees
  • non-probability sample, not necessarily representative of population
66
Q

volunteer sample

A
  • sample consists of self-selected respondents

one specific example is volunteered geographic information (VGI) - when participants enter information on a web map