Research and Assessment Methods Flashcards

1
Q

Qualitative research

A
  • An approach for understanding the meaning individuals and groups ascribe to a human or social problem
  • Emerging questions
  • Flexible written report
  • Analysis building from particular data to general themes (inductive)
2
Q

Quantitative research

A
  • An approach for testing objective theories by examining the relationships among variables (deductive)
  • Numerical data that can be analyzed using statistical procedures
  • Structured written report
3
Q

Mixed methods research

A
  • Collection of both qualitative and quantitative data
  • Integrating the two forms of data
  • May involve both philosophical assumptions and theoretical frameworks
  • Assumes that combining the two approaches provides a more complete understanding of a research problem than either approach alone
4
Q

Case Study Method

A

A research method focusing on the study of a single case. Usually it is not designed to compare one individual or group to another, although sometimes a case study may be included in comparative analysis as a key or illustrative example.

5
Q

Comparative analysis

A

Analysis where data from different settings or groups at the same point in time or from the same settings or groups over a period of time are analyzed to identify similarities and differences.

6
Q

Discourse Analysis

A

A study of the way versions of the world, society, events and psyche are produced in the use of language and discourse. It is often concerned with the construction of subjects within various forms of knowledge/power. Semiotics, deconstruction and narrative analysis are forms of discourse analysis.

7
Q

e-Research

A

Also known as e-Science or e-Social Science, it is the harnessing of any digital technology to undertake and promote social research. This includes treating the digital sphere as a site of research by examining social interaction in the e-infrastructure.

8
Q

Ethnography

A

A multi-method qualitative (participant observation, interviewing, discourse analyses of natural language, and personal documents) approach that studies people in their “…naturally occurring settings or ‘fields’ by means of methods which capture their social meanings and ordinary activities, involving the researcher participating directly in the setting…”

9
Q

Field Research

A

Field research is when a researcher goes to observe an everyday event in the environment where it occurs.

10
Q

Grounded theory

A

An inductive form of qualitative research where data collection and analysis are conducted together. Theories remain grounded in the observations rather than generated in the abstract. Grounded theory is an approach that develops the theory from the data collected, rather than applying a theory to the data.

11
Q

Narrative analysis

A

Narrative analysis is a form of discourse analysis that seeks to study the textual devices at work in the constructions of process or sequence within a text.

In narrative research the respondent gives a detailed account of themselves and is encouraged to tell their story rather than answer a predetermined list of questions. This method tends to be most effective when people are discussing a life-changing event.

Analysis of the narrative tells the researcher about the person’s understanding of the meaning of events in their lives.

12
Q

What are the three important steps in the statistical process?

A

(1) collect data (e.g., surveys), covered in Lesson 2; (2) describe and summarize the distribution of the values in the data set; (3) interpret by means of inferential statistics and statistical modeling, i.e., draw general conclusions for the population on the basis of the sample.

13
Q

What are the 4 different types of measurement?

A

Nominal data
Ordinal data
Interval data
Ratio data

14
Q

Nominal data

A

are classified into mutually exclusive groups or categories and lack intrinsic order. A zoning classification, social security number, and sex are examples of nominal data. The labels of the categories do not matter and should not imply any order. So, even if one category is labeled 1 and the other 2, those labels can be switched.

15
Q

Ordinal data

A

are ordered categories implying a ranking of the observations. Even though ordinal data may be given numerical values, such as 1, 2, 3, 4, the values themselves are not meaningful; only the rank counts. So, even though one might be tempted to infer that 4 is twice 2, this is not correct. Examples of ordinal data are letter grades, suitability for development, and response scales on a survey (e.g., 1 through 5).

16
Q

Interval data

A

is data with an ordered relationship where the differences between values have a meaningful interpretation, but there is no true zero point, so ratios are not meaningful. The typical example of interval data is temperature, where the difference between 40 and 30 degrees is the same as between 30 and 20 degrees, but 40 degrees is not twice as warm as 20 degrees.

17
Q

Ratio data

A

is the gold standard of measurement, where both absolute and relative differences have a meaning. The classic example of ratio data is a distance measure, where the difference between 40 and 30 miles is the same as the difference between 30 and 20 miles, and in addition, 40 miles is twice as far as 20 miles.

18
Q

Quantitative variables

A

the actual numerical value is meaningful

represent an interval or ratio measurement

(e.g., household income, level of a pollutant in a river)

19
Q

Qualitative variables

A

numerical value is not meaningful

correspond to nominal or ordinal measurement

(e.g., a zoning classification)

20
Q

Continuous variables

A

can take an infinite number of values within a range (which may include negative values), with as fine a degree of precision as desired. Most measurements in the physical sciences yield continuous variables.

21
Q

Discrete variables

A

can only take on a finite number of distinct values. An example is the count of the number of events, such as the number of accidents per month. Such counts cannot be negative and only take on integer values, such as 1, 28, or 211. Binary or dichotomous variables are a special case of discrete variables: they can only take on two values, typically coded as 0 and 1.

22
Q

Binary variables

A

also called dichotomous variables; they can only take on two values, typically coded as 0 and 1.

23
Q

population

A

is the totality of some entity. For example, all planners preparing for the 2018 AICP exam would constitute a population.

24
Q

sample

A

is a subset of the population. For example, 25 candidates selected at random from all planners preparing for the 2018 AICP exam.

25
Q

Descriptive Statistics

A

describe the characteristics of the distribution of values in a population or in a sample. For example, a descriptive statistic such as the mean could be applied to the age distribution in the population of AICP exam takers, providing a summary measure of central tendency (e.g., “on average, AICP test takers in 2018 are 30 years old”). The context will make clear whether the statistic pertains to the population (all values known), or to a sample (only partial observations). The latter is the typical case encountered in practice.

26
Q

Inferential Statistics

A

use probability theory to determine characteristics of a population based on observations made on a sample from that population. We infer things about the population based on what is observed in the sample. For example, we could take a sample of 25 test takers and use their average age to say something about the mean age of all the test takers.

27
Q

Distribution

A

is the overall shape of all observed data. It can be listed as an ordered table, or graphically represented by a histogram or density plot. A histogram groups observations in bins represented as a bar chart; a density plot is a smooth curve. The full distribution is typically too detailed to take in at once, so its characteristics are summarized by descriptive statistics.

In addition to central tendency and dispersion, other characteristics are symmetry or lack thereof (skewness), and the presence of thick tails (kurtosis), i.e., a higher likelihood of extreme values.

28
Q

skewness

A

lack of symmetry in the distribution of the data.

29
Q

kurtosis

A

the presence of thick tails in the distribution of the data, i.e., a higher likelihood of extreme values.

30
Q

range

A

An important characteristic of the distribution is the range of the data, i.e., the difference between the largest and the smallest value.

31
Q

Gaussian distribution

A

Normal or Gaussian distribution, also referred to as the bell curve. This distribution is symmetric and has the additional property that the spread around the mean can be related to the proportion of observations. More specifically, about 95% of the observations that follow a normal distribution are within two standard deviations of the mean (see below for further discussion). The normal distribution is often used as the reference distribution for statistical inference (see below).

32
Q

Normal distribution

A

Normal or Gaussian distribution, also referred to as the bell curve. This distribution is symmetric and has the additional property that the spread around the mean can be related to the proportion of observations. More specifically, about 95% of the observations that follow a normal distribution are within two standard deviations of the mean. The normal distribution is often used as the reference distribution for statistical inference.

33
Q

Symmetric distribution

A

is one whose shape is a mirror image on either side of its center, so that an equal number of observations lie below and above the mean (e.g., this is the case for the normal distribution).

34
Q

Asymmetric distribution

A

is one where there are either more observations below the mean or more above it; such a distribution is also called skewed.

35
Q

skewed to the right

A

when a few very large values (outliers) stretch the distribution into a long right tail, so that the bulk of the values lie below the mean. For example, this is often the case for housing values in a community, where a few multimillion-dollar homes can pull the distribution to the right.

36
Q

skewed to the left

A

is the opposite phenomenon, where a few very small values (such as zero) pull the distribution to the left, so that the bulk of the values lie above the mean.

37
Q

Central tendency

A

is a typical or representative value for the distribution of observed values. There are several ways to measure central tendency, including mean, median, and mode. The central tendency can be applied to the population as a whole, or to a sample from the population. In a descriptive sense, it can be applied to any collection of data. Typically, the terminology will make clear what the context is, i.e., a population mean or a sample average (mean).

38
Q

mean

A

is the average of a distribution. It is computed by adding up the values and dividing by the number of observations. For example, the mean of [2, 3, 4, 5] is (2 + 3 + 4 + 5)/4, or 14/4 = 3.5. A weighted mean is used when greater importance is placed on specific entries or when representative values are used for groups of observations. For example, when computing the mean income across a number of counties, the value for each county could be multiplied by the county's population and the sum of these products divided by the total population, yielding a population-weighted mean. The mean is appropriate for interval and ratio scaled data, but not for ordinal or nominal data.
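
A minimal Python sketch of the plain and weighted mean described above. The two county incomes and populations are hypothetical numbers, chosen only to show how weighting pulls the result toward the more populous county.

```python
def mean(values):
    """Arithmetic mean: add up the values and divide by the number of observations."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: multiply each value by its weight, sum the products,
    and divide by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


print(mean([2, 3, 4, 5]))  # 3.5

# Hypothetical data: mean income and population for two counties.
incomes = [50_000, 70_000]
populations = [10_000, 30_000]
print(mean(incomes))                        # 60000.0 (unweighted)
print(weighted_mean(incomes, populations))  # 65000.0 (population-weighted)
```

The weighted result sits closer to the larger county's income because that county contributes three times as many people.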

39
Q

median

A

is the middle value of a ranked distribution. The median of [2, 3, 4, 6, 7] is 4. When the number of observations is even, there is no exact middle, and typically the average of the two values just below and just above the middle is used. So, for [2, 3, 4, 5] the median would be (3 + 4)/2, or 3.5. The median is the most suitable measure of central tendency for ordinal data, but it can also be applied to interval and ratio scale data by ranking the values.
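
The ranking rule above, including the even-count case, can be sketched in Python as follows; the hand-written `median` function is illustrative and is checked against the standard library's `statistics.median`.

```python
import statistics


def median(values):
    """Middle value of the ranked data; average of the two middle values if n is even."""
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


print(median([2, 3, 4, 6, 7]))  # 4
print(median([2, 3, 4, 5]))     # 3.5
# The standard library gives the same result.
print(statistics.median([2, 3, 4, 5]))  # 3.5
```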

40
Q

mode

A

is the most frequent value in a distribution. A distribution can have more than one mode. For example, the modes of [1, 2, 3, 3, 5, 6, 7, 7] are 3 and 7. The mode is the only measure of central tendency that can be used for nominal data, but it can also be applied to ordinal, interval, and ratio scale data.
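
A short sketch using Python's `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency. The zoning codes are hypothetical labels used only to show that the mode also works for nominal data.

```python
from statistics import multimode

# multimode returns every value that ties for the highest frequency.
print(multimode([1, 2, 3, 3, 5, 6, 7, 7]))  # [3, 7]

# The mode also works for nominal data, e.g. hypothetical zoning codes.
zoning = ["R-1", "C-2", "R-1", "M-1"]
print(multimode(zoning))  # ['R-1']
```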

41
Q

symmetry

A

The mean and median are affected by the symmetry of a distribution. For a symmetric distribution, they tend to be very close, but for skewed distributions, they tend to be different. Specifically, for a distribution that is skewed to the right (more large values), the mean will tend to be larger than the median, and in a distribution that is skewed to the left (more small values), the mean will tend to be smaller than the median. In both these cases, the median is typically the preferred measure of central tendency.
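
To see the effect of skewness on the two measures, here is a small sketch with hypothetical home values in which one very expensive home creates a long right tail.

```python
import statistics

# Hypothetical home values in a small community (right-skewed by one large value).
home_values = [200_000, 220_000, 250_000, 260_000, 3_000_000]

mean_value = statistics.mean(home_values)
median_value = statistics.median(home_values)
print(mean_value)    # 786000
print(median_value)  # 250000
# The single expensive home pulls the mean far above the median.
print(mean_value > median_value)  # True
```

Here the median is the better summary of a "typical" home, which matches the recommendation above for skewed distributions.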

42
Q

Basic Descriptive Statistics - Dispersion

A

An important characteristic of a distribution is how its values are spread around the central tendency.

The two most commonly used measures to assess this are the variance and the standard deviation.

43
Q

standard deviation

A

Based on the squared differences from the mean.

The standard deviation is the square root of the variance. As a result, it is in the same units as the original variable and is therefore often preferred.

44
Q

variance

A

Based on the squared differences from the mean.

The variance is the average squared deviation from the mean. A larger variance means a greater spread around the mean (a flatter distribution); a smaller variance means a narrower spread (a spikier distribution).

45
Q

calculating variance and standard deviation for [1, 2, 3, 4, 5]

A
  1. First calculate the mean: (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3.
  2. The squared deviation for each observation is (1 - 3)^2 = 4, (2 - 3)^2 = 1, (3 - 3)^2 = 0, (4 - 3)^2 = 1, and (5 - 3)^2 = 4.
  3. The sum of these squared deviations is 4 + 1 + 0 + 1 + 4 = 10.
  4. The variance is this sum divided by the number of observations: 10/5 = 2.
  5. The standard deviation is the square root of the variance: √2 ≈ 1.41.
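
The five steps above translate directly into a few lines of Python; each line is commented with the step it reproduces. This divides by n, matching the worked example.

```python
import math

data = [1, 2, 3, 4, 5]
n = len(data)

mean = sum(data) / n                                   # step 1: 15/5 = 3.0
squared_deviations = [(x - mean) ** 2 for x in data]   # step 2: [4.0, 1.0, 0.0, 1.0, 4.0]
total = sum(squared_deviations)                        # step 3: 10.0
variance = total / n                                   # step 4: 10/5 = 2.0
std_dev = math.sqrt(variance)                          # step 5: about 1.414

print(variance, round(std_dev, 2))  # 2.0 1.41
```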
46
Q

n

A

the number of observations in statistics.

Dividing the sum of squared deviations by n is the correct expression for the population variance (or standard deviation), where the mean is assumed to be known.

47
Q

degree of freedom correction

A

in practice, we work with samples, where the mean is estimated and not known. Because we have to compute the mean first, we subtract 1 from the number of observations and divide by n - 1.

In essence, because we already used the data once to compute the mean, we have to correct for that when we compute an estimate for the variance. As a result, the variance calculated with a degree of freedom correction n - 1 will be slightly larger than the one that uses n.
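
Python's standard library exposes both versions, which makes the difference easy to check on the same five values used earlier: `statistics.pvariance` divides by n, while `statistics.variance` applies the n - 1 correction.

```python
import statistics

data = [1, 2, 3, 4, 5]

# Population variance: divide the sum of squared deviations (10) by n = 5.
population_variance = statistics.pvariance(data)
# Sample variance: divide by n - 1 = 4 (degrees-of-freedom correction).
sample_variance = statistics.variance(data)

print(population_variance, sample_variance)
```

The sample version (2.5) is slightly larger than the population version (2), as stated above. The matching standard deviations are `statistics.pstdev` and `statistics.stdev`.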

48
Q

outliers

A

in a normal or Gaussian distribution, about 95% of the distribution is within two standard deviations below and above the mean. In practice, therefore, observations that lie outside this range are often referred to as outliers.

49
Q

Coefficient of Variation

A

measures the relative dispersion from the mean by taking the standard deviation and dividing by the mean.
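
A minimal sketch of the coefficient of variation, assuming the population standard deviation (some texts use the sample version instead) and a non-zero mean. Rescaling the data, for example a change of units, leaves the result unchanged, which is why it is useful for comparing relative dispersion.

```python
import statistics


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population standard deviation here)."""
    return statistics.pstdev(values) / statistics.mean(values)


small = [1, 2, 3, 4, 5]
scaled = [10, 20, 30, 40, 50]   # same data multiplied by 10, e.g. different units
print(round(coefficient_of_variation(small), 3))   # 0.471
print(round(coefficient_of_variation(scaled), 3))  # 0.471
```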

50
Q

Which type of variables can be used with variance, standard deviation, and coefficient of variation calculations?

A

interval and ratio scaled variables

NOT ordinal or nominal variables

51
Q

z-score

A

This is a standardization of the original variable by subtracting the mean and dividing by the standard deviation. As a result, the mean of the z-score is 0 and the variance (and standard deviation) is 1. The z-score in effect transforms the original measure into standard deviation units. For example, a z-score greater than 2 in absolute value means the observation is more than two standard deviations away from the mean, i.e., it is an outlier in the sense just defined.
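
The standardization and the two-standard-deviation outlier rule can be sketched as follows. The data are hypothetical, with one deliberately extreme value, and the population standard deviation is assumed.

```python
import statistics


def z_scores(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(x - mu) / sigma for x in values]


# Hypothetical observations with one unusually large value.
data = [10, 12, 11, 13, 12, 11, 10, 12, 11, 40]
scores = z_scores(data)

# Flag observations more than two standard deviations from the mean.
outliers = [x for x, z in zip(data, scores) if abs(z) > 2]
print(outliers)  # [40]
```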

52
Q

inter-quartile range (IQR)

A

the difference in value between the 75th percentile and the 25th percentile, i.e., the 3/4 cut-off value and the 1/4 cut-off value in a set of ranked values.

For example, if we have 20 observations ranked in increasing order, we take (approximately) the fifth and fifteenth observations and compute the difference between those values; the exact cut-offs depend on the quartile convention used. This is the inter-quartile range.

The IQR forms the basis for an alternative concept of outliers. Two fences are computed as the first quartile less 1.5 times the IQR and the third quartile plus 1.5 times the IQR. Observations that are outside these fences are termed outliers. This is visualized in a box plot (also called box and whiskers plot).
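
A sketch of the 1.5 × IQR fences using `statistics.quantiles` (Python 3.8+). The choice of `method="inclusive"` is an assumption: it is one of several quartile conventions, and others give slightly different cut-offs for small samples. The data are hypothetical.

```python
import statistics


def iqr_outliers(values):
    """Return (q1, q3, lower_fence, upper_fence, outliers) using the 1.5 x IQR rule."""
    # "inclusive" is one quartile convention among several.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [x for x in values if x < lower_fence or x > upper_fence]
    return q1, q3, lower_fence, upper_fence, outliers


# Hypothetical data with one extreme value.
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 30]
print(iqr_outliers(data))  # (4.25, 8.75, -2.5, 15.5, [30])
```

A box plot draws the box from q1 to q3 and marks points beyond the fences individually.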

53
Q

box plot

A

(also called box and whiskers plot) is used to visualize the inter-quartile range (IQR).

The IQR is the difference in value between the 75th percentile and the 25th percentile, i.e., the 3/4 cut-off value and the 1/4 cut-off value in a set of ranked values.

Two fences are computed as the first quartile less 1.5 times the IQR and the third quartile plus 1.5 times the IQR. Observations that are outside these fences are termed outliers.

54
Q

Statistical inference

A

the process of drawing conclusions about the characteristics of a distribution from a sample of data.

For example, we estimate the mean from a sample of data and make a statement about the value of the population mean.

55
Q

hypothesis test

A

a statement about a particular characteristic of a population (or several populations). We distinguish between the null hypothesis (H0), i.e., the point of departure or reference, and the alternative hypothesis (H1), or the research hypothesis one wants to find support for by rejecting the null hypothesis.

  1. Set up a condition (the null hypothesis) that is used as a reference but is not that useful in and of itself. Typically, this consists of setting a characteristic of the distribution (such as the mean) equal to a given value (often zero).
  2. A hypothesis test then consists of finding evidence in the data that rejects this statement in the direction of the alternative (typically, an inequality).

The statistical evidence only provides support to reject the null hypothesis, never to accept the alternative hypothesis (the latter is just used as a means to help in rejecting the null). An alternative hypothesis can be two-sided (differences in both directions are considered) or one-sided (only differences in one direction are considered, i.e., only larger than or smaller than, but not both).

For example, a common research question is whether the means of two populations are the same, e.g., the mean wage of male and female workers in the same occupation. We may suspect wage inequality, in that the male wage would be higher than the female wage. One population is the male workers, the other the female workers. We take a sample with an equal number of each category and compute the average wage. The null hypothesis would be that the mean wages are equal. A one-sided alternative hypothesis would be that the male wage is higher. In contrast, a two-sided alternative hypothesis would simply state that the wages are different, but say nothing about the direction of the difference.

56
Q

null hypothesis

A

(H0), i.e., the point of departure or reference

57
Q

alternative hypothesis (H1)

A

the research hypothesis one wants to find support for by rejecting the null hypothesis.

An alternative hypothesis can be two-sided (differences in both directions are considered), or one-sided (only differences in one direction are considered, i.e., only larger than or smaller than, but not both).

58
Q

test statistic

A

provides a way to operationalize a hypothesis test. A key concept related to this is the sampling error, which provides the connection between the sample and the population. Because a sample does not contain all the information in the population, any statistic computed from the sample will not be identical to the population statistic, but will vary from sample to sample. That random variation is the sampling error; its distribution is the sampling distribution. The sampling error, which is random, should be distinguished from a systematic error or model misspecification, which occurs because our model (or our assumptions) is wrong. It is unrelated to the sample as such.

59
Q

sampling error

A

the random variation that provides the connection between the sample and the population. Because a sample does not contain all the information in the population, any statistic computed from the sample will not be identical to the population statistic, but will vary from sample to sample. That random variation is the sampling error; its distribution is the sampling distribution.

60
Q

systematic error

A

also called model misspecification; it occurs because our model (or our assumptions) is wrong. It is unrelated to the sample as such.

61
Q

standard error

A

essentially the same concept as the standard deviation, but the standard error pertains to the distribution of a statistic that is computed from a sample. For example, the sample average has a standard error, which is the standard deviation of its sampling distribution; it equals the standard deviation of the data divided by the square root of the sample size.

62
Q

statistical decision

A

The rejection of a null hypothesis is a statistical decision. Because the values we use are not fixed, but random (they have a sampling distribution), this decision is not exact but done with uncertainty. In other words, we could be wrong.

63
Q

p-value

A

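the probability, computed assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. It is compared to the significance level of the test, which is the probability of a Type I Error.

A Type I Error means rejecting the null hypothesis when in fact it is correct. So, in the example of wages and genders, this would be concluding that the average wages in the two groups are unequal when in fact they are equal. We want this probability to be small, so typically a significance level of 5% or 1% is chosen as a benchmark, and the null hypothesis is rejected when the p-value falls below it.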
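
A small sketch of the decision rule, assuming the test statistic is approximately standard normal under the null hypothesis (a z-test); a t-test would use the t distribution instead. `NormalDist` is in Python's `statistics` module (3.8+), and the z values are arbitrary illustrations.

```python
from statistics import NormalDist


def two_sided_p_value(z):
    """Probability, under H0, of a standard normal statistic at least as extreme as |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


alpha = 0.05  # chosen significance level (probability of a Type I Error)
for z in (1.0, 2.5):
    p = two_sided_p_value(z)
    decision = "reject H0" if p < alpha else "do not reject H0"
    print(f"z = {z}: p = {p:.3f} -> {decision}")
```

The significance level is fixed before looking at the data; the p-value is computed from the data and compared against it.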

64
Q

confidence interval

A

this constitutes a range around the sample statistic that contains the population parameter with a given level of confidence, typically 95% or 99%.

Instead of rejecting the null hypothesis with a given probability, we establish a range around the sample statistic, such as a sample average, that contains the population mean with a given probability.

The width of the confidence interval depends critically on the sampling error. If the sampling error is large, there isn’t much information in the sample relative to the population, so our statements about the latter will be vague (a wide confidence interval).

On the other hand, with a smaller sampling error, we can make more precise statements. The sampling error is related to the sample size, with a larger sample resulting in a smaller error (as the sample grows larger, it approximates the actual population more closely).
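
A sketch of a 95% confidence interval for a mean, using a normal approximation (for small samples a t-based critical value would normally replace the ≈1.96 used here). The ages are hypothetical. The second call shows how a larger sample shrinks the interval through a smaller standard error.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(sample)
    average = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return average - z * standard_error, average + z * standard_error


# Hypothetical ages of 10 sampled test takers.
ages = [28, 31, 35, 29, 33, 30, 27, 36, 32, 29]
low, high = mean_confidence_interval(ages)
print(f"95% CI for the mean age: {low:.1f} to {high:.1f}")

# A larger sample (here, the same values repeated four times) gives a narrower interval.
low4, high4 = mean_confidence_interval(ages * 4)
print(high4 - low4 < high - low)  # True
```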

65
Q

t-test

A

(also known as Student’s t-test) is typically used to compare the means of two populations based on their sample averages.

This is a so-called two-sample t-test (a one-sample t-test compares the sample average to a hypothesized value for the mean).

The null hypothesis is that the two population means are equal. However, since we do not observe the actual means, but only the sample averages, we can only make a probabilistic statement about the equality.

Each of the sample averages has its own sampling distribution. By comparing the two sampling distributions, we can make statements about the null hypothesis. Under the null hypothesis, the test statistic follows the Student’s t distribution (similar to the normal distribution, but with thicker tails).

The implementation of the t-test differs slightly between the one-sample and the two-sample case and, for the two-sample case, between the assumptions of equal and unequal variances.

A common application of the t-test is to test the significance of a regression coefficient (see below). The null hypothesis is that the population regression coefficient is zero and the alternative hypothesis is that it is non-zero. Rejecting the null hypothesis is interpreted as designating the coefficient as significant (at a given p-value). To compute the t-test in this case, we take the estimate and divide it by its standard error.
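
A sketch of the two-sample t statistic under the unequal-variance assumption (Welch's version). The wage figures are hypothetical. The standard library has no t distribution, so this stops at the statistic; a library such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) would also return the p-value.

```python
import math
import statistics


def welch_t_statistic(sample_a, sample_b):
    """Two-sample t statistic without assuming equal variances (Welch's version)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    standard_error = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / standard_error


# Hypothetical hourly wages for two groups of workers in the same occupation.
group_a = [31.0, 29.5, 33.0, 30.5, 32.0, 34.5]
group_b = [28.0, 29.0, 27.5, 30.0, 28.5, 29.5]
print(round(welch_t_statistic(group_a, group_b), 2))
```

The statistic has the same form as the regression case: a difference (or estimate) divided by its standard error.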

66
Q

test the significance of a regression coefficient

A

A common application of the t-test is to test the significance of a regression coefficient. The null hypothesis is that the population regression coefficient is zero and the alternative hypothesis is that it is non-zero. Rejecting the null hypothesis is interpreted as designating the coefficient as significant (at a given p-value). To compute the t-test in this case, we take the estimate and divide it by its standard error.

67
Q

ANOVA

A

analysis of variance is a more complex form of testing the equality of means between groups. The typical application is in a so-called treatment effects analysis, where the outcome of a variable is compared between a treatment group and a control group (in medical experiments, this would be the placebo group). For example, we would compare the average speed of cars on a street before (control) and after (treatment) traffic-calming infrastructure was put in place. It is thus similar to the case considered in a t-test, but it allows more complex categorization of the groups. Typically, one classifies the sample into several groups according to categorical variables and compares the mean of a continuous outcome variable across them. ANOVA relies on an F-test, which compares the variation between group means to the variation within groups; with only two groups, this F-test is equivalent to the two-sample t-test that assumes equal variances (F = t²).
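
A minimal one-way ANOVA sketch: the F statistic is the between-group mean square divided by the within-group mean square. The car speeds are hypothetical. Converting F into a p-value needs the F distribution, which Python's standard library does not provide.

```python
import statistics


def one_way_anova_f(groups):
    """F statistic: between-group mean square divided by within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.mean(all_values)
    k, n = len(groups), len(all_values)
    group_means = [statistics.mean(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical car speeds (mph) before and after traffic calming.
before = [32, 35, 30, 34, 33]
after = [27, 29, 26, 28, 30]
print(round(one_way_anova_f([before, after]), 2))
```

The function accepts any number of groups, which is what lets ANOVA go beyond the two-group t-test.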

68
Q

F-test

A

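the test used in ANOVA: the ratio of the variation between group means to the variation within groups. With two groups, it is equivalent to the two-sample t-test that assumes equal variances (F = t²). An F-test can also be used to test whether two groups have equal variances.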

69
Q

Chi Square test

A

is a measure of fit. It is a test that assesses the difference between a sample distribution and a hypothesized distribution. A Chi Square test is often used to test the null hypothesis of independence in a contingency table, i.e., when the observations are grouped according to two categorical variables. The observed counts are compared to the counts we would expect if the two classifications were independent.
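
A sketch of the Pearson chi-square statistic for a contingency table: each cell's expected count under independence is (row total × column total) / grand total. The neighborhood counts are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows of counts)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two classifications were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows = two neighborhoods, columns = supports / opposes a plan.
observed = [[20, 30],
            [30, 20]]
print(chi_square_statistic(observed))  # 4.0
```

A 2 × 2 table has (2 - 1)(2 - 1) = 1 degree of freedom. The 5% critical value for one degree of freedom is about 3.84, so a statistic of 4.0 would lead to rejecting independence at that level.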

70
Q

Chi Square distribution

A

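a skewed distribution obtained as the sum of the squares of k independent standard normal variables, where k is the number of degrees of freedom (with one degree of freedom, it is simply the square of a standard normal variable), so it only takes non-negative values. Under the null hypothesis, the Chi Square test statistic (approximately) follows a Chi Square distribution.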

71
Q

correlation coefficient

A

measures the strength of a linear relationship between two variables. Very importantly, this does not imply anything about causation, i.e., whether one variable influences the other. Also, the correlation coefficient only pertains to a linear relationship and can be misleading when the relationship is nonlinear. The correlation is computed from the standardized values of each variable, and its value lies between -1 and +1. The square of a correlation coefficient is often referred to as r² (or R²), i.e., r-squared.
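
A sketch of the Pearson correlation as the average product of z-scores (population standard deviations). The third case shows the nonlinearity caveat: y depends exactly on x, yet r is essentially zero because the relationship is not linear.

```python
import statistics


def pearson_r(x, y):
    """Average product of the z-scores (population standard deviations) of x and y."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sd_x, sd_y = statistics.pstdev(x), statistics.pstdev(y)
    return sum((a - mean_x) / sd_x * (b - mean_y) / sd_y for a, b in zip(x, y)) / len(x)


x = [-2, -1, 0, 1, 2]
print(pearson_r(x, [2 * v + 1 for v in x]))  # close to +1: perfect positive linear relation
print(pearson_r(x, [-3 * v for v in x]))     # close to -1: perfect negative linear relation
print(pearson_r(x, [v ** 2 for v in x]))     # close to 0 despite y being a function of x
```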

The correlation coefficient is typically used as a descriptive statistic, but it can also be construed as a hypothesis test. The null hypothesis of no correlation corresponds to a value of 0. A significant difference from 0 would suggest a linear relationship. A typical one-sided alternative hypothesis would be positive correlation (high values of one variable match high values of the other, and low values match low values), or negative correlation (high values of one variable match low values of the other, and vice versa). In practice, using the correlation as a hypothesis test is not that useful, since, especially in large samples, it almost always rejects the null hypothesis.

72
Q

Bivariate Relationships

A

In practice, we are often interested in assessing whether two variables are related to each other, for example, whether health outcomes and environmental indicators are related.

73
Q

linear regression

A

A slightly more general way to consider the linear relationship between two or more variables is linear regression. This hypothesizes a linear relationship between a dependent variable (on the left-hand side of the equal sign) and one or more explanatory variables (on the right-hand side of the equal sign). For example, a typical regression equation would be expressed as y = a + b1·x1 + b2·x2 + e. In this expression, y is the dependent variable, say the outcome on the AICP test, and x1 and x2 are explanatory variables, such as the number of hours studied and the years of experience. The e stands for a random error term, since the variables observed are a sample from the population. The coefficient a is the intercept, and b1 and b2 are the slope coefficients. The coefficients of the linear regression are estimated by means of least squares, and their significance is interpreted by means of a t-test.
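
A least-squares sketch with a single explanatory variable (y = a + b·x + e), kept to the standard library; the two-variable equation above would normally be fitted with a statistics package. The hours and scores are hypothetical. The slope's t statistic is the estimate divided by its standard error, as described above.

```python
import math
import statistics


def simple_ols(x, y):
    """Least-squares fit of y = a + b*x + e; returns (a, b, se_b, t_b)."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope
    a = y_bar - b * x_bar              # intercept
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r ** 2 for r in residuals) / (n - 2)  # error variance, n - 2 df
    se_b = math.sqrt(sigma2 / sxx)     # standard error of the slope
    return a, b, se_b, b / se_b        # t statistic = estimate / standard error


# Hypothetical data: hours studied (x) and test score (y, arbitrary units).
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
a, b, se_b, t_b = simple_ols(hours, score)
print(f"a = {a:.2f}, b = {b:.2f}, se(b) = {se_b:.3f}, t = {t_b:.2f}")
```

Python 3.10+ also offers `statistics.linear_regression(x, y)` for the slope and intercept, though it does not report standard errors.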

74
Q

dependent variable

A

(on the left-hand side of the equal sign)

75
Q

explanatory variables

A

(on the right-hand side of the equal sign)

76
Q

the four major population estimation and projection methods

A

Linear
Symptomatic
Step-Down Ratio
Cohort Survival

77
Q

Linear Method

A

The linear method uses the change in population (increase or decline) over a period of time and extrapolates this change to the future, in a linear fashion. For example, if the population of Plannersville has grown an average of 1000 people per year over the last 20 years, it would be assumed to grow by 1000 people annually in the future.
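
The linear method reduces to two small calculations: the average absolute change per year, then the same change added every year. The Plannersville starting and ending populations below are hypothetical, chosen to give the 1,000-per-year change in the example.

```python
def average_annual_change(start_population, end_population, years):
    """Average absolute change per year over the observed period."""
    return (end_population - start_population) / years


def linear_projection(current_population, annual_change, years_ahead):
    """Extrapolate the same absolute change every year (a straight line)."""
    return current_population + annual_change * years_ahead


# Hypothetical Plannersville figures: 30,000 people 20 years ago, 50,000 today.
change = average_annual_change(30_000, 50_000, 20)   # 1000.0 people per year
print(linear_projection(50_000, change, 10))          # 60000.0
```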

78
Q

Exponential and Modified Exponential Method

A

The exponential method uses the rate of growth (or decline), i.e., the percentage change in population over a period of time, to estimate the current or future population. In our same Plannersville example, say the population has been increasing by 2% per year for the last 20 years. This percentage change is extrapolated into the future. Because the same rate is applied to an ever-larger base, the absolute increase grows each year (2% of 2,000 people is larger than 2% of 1,000 people), so the result is a curved line rather than a straight one.
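A minimal sketch of the exponential method as compound growth, P(t) = P0 * (1 + r)^t, assuming a hypothetical Plannersville population of 50,000 growing at 2% per year; the function names are illustrative.

```python
def annual_growth_rate(pop_start, pop_end, years):
    """Average compound annual growth rate over the base period."""
    return (pop_end / pop_start) ** (1 / years) - 1

def exponential_projection(pop_now, rate, years_ahead):
    """Exponential method: apply the same percentage change every year."""
    return pop_now * (1 + rate) ** years_ahead

# Hypothetical Plannersville figures: 50,000 people growing 2% per year
path = [exponential_projection(50_000, 0.02, t) for t in range(11)]
increments = [b - a for a, b in zip(path, path[1:])]
# Each year's increase is larger than the last, which bends the line upward
print(round(path[-1]))  # 60950
```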

A modified exponential projection assumes there is a cap (upper limit) to the change and that at some point the growth will slow or stop, so the curve levels off as it approaches the cap. The Gompertz projection is a further modification of the modified exponential, in which growth is slowest at the beginning, speeds up over time, and then slows again as the population nears the cap, resulting in an S-shaped curve.
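For comparison, one common way to write the two capped curves is sketched below. The cap K and the shape parameters a and b are illustrative assumptions; in practice they would be fitted to the community's historical counts. In this form of the modified exponential, the yearly increase shrinks every year as the population approaches K; in the Gompertz curve, the yearly increase first rises and then falls.

```python
def modified_exponential(t, K, P0, b):
    """Modified exponential: P(t) = K - (K - P0) * b**t, with 0 < b < 1."""
    return K - (K - P0) * b ** t

def gompertz(t, K, a, b):
    """Gompertz: P(t) = K * a**(b**t), with 0 < a < 1 and 0 < b < 1; P(0) = K * a."""
    return K * a ** (b ** t)

def yearly_increase(path):
    """Change in population from each year to the next."""
    return [nxt - prev for prev, nxt in zip(path, path[1:])]

K = 100_000  # hypothetical cap on the population
mod = [modified_exponential(t, K, 5_000, 0.8) for t in range(31)]
gom = [gompertz(t, K, 0.05, 0.8) for t in range(31)]

print([round(d) for d in yearly_increase(mod)[:5]])  # shrinking increases
print([round(d) for d in yearly_increase(gom)[:8]])  # rising, then falling
```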

79
Q

Symptomatic Method

A

The symptomatic method uses any available data indirectly related to population size, such as housing starts or new driver's licenses. It then estimates the population using a ratio, such as the average household size (from the U.S. Census). For instance, with an average household size of 2.5, data on 100 new single-family building permits issued this year would yield an estimate of 250 new people added to the community (100 × 2.5).

Other sources of data for estimating population can include water taps, phone lines, voter registration, and utility connections.
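The permit example is a single ratio calculation. As a sketch (the function name is hypothetical), the same multiplication applies to any of the indicators listed, provided a suitable persons-per-unit ratio is available.

```python
def symptomatic_estimate(indicator_count, persons_per_unit):
    """Convert a population 'symptom' (e.g., new housing permits) into people."""
    return indicator_count * persons_per_unit

# 100 new single-family permits x 2.5 persons per household
print(symptomatic_estimate(100, 2.5))  # 250.0
```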

80
Q

Step-Down Ratio Method

A

The step-down ratio method is a relatively simple way to estimate or project population. This method uses the ratio of a city's population to the population of its county (or a larger geographical unit) at a known point in time, such as the decennial Census.

This ratio is then used to estimate the current population or project the future population. For example, suppose the population of Plannersville was 20% of the county population in 2000. If we know that the county population is 20,000 in 2005, we can estimate the population of Plannersville as 4,000 (20% of 20,000).
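A minimal sketch of the step-down calculation, assuming hypothetical 2000 Census counts of 3,000 for the city and 15,000 for the county (a 20% share); the function name is illustrative.

```python
def step_down_estimate(city_base, county_base, county_current):
    """Hold the city's share of the county constant and apply it to a newer county figure."""
    share = city_base / county_base
    return share * county_current

# Hypothetical 2000 Census counts: city 3,000, county 15,000 (a 20% share);
# county estimate for 2005: 20,000
print(step_down_estimate(3_000, 15_000, 20_000))
```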

81
Q

Distributed Housing Unit Method

A

This method uses Census Bureau data on the number of housing units, which is then multiplied by the occupancy rate and the average number of persons per household. This method is reliable for slow-growing or stable communities but less reliable in communities that are changing more quickly.
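The method is a product of three inputs. A minimal sketch with hypothetical values (10,000 housing units, 95% occupied, 2.5 persons per household); the function name is illustrative.

```python
def housing_unit_estimate(housing_units, occupancy_rate, persons_per_household):
    """Housing units x share occupied x average household size."""
    return housing_units * occupancy_rate * persons_per_household

# Hypothetical inputs: 10,000 units, 95% occupied, 2.5 persons per household
print(housing_unit_estimate(10_000, 0.95, 2.5))
```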

82
Q

Cohort Survival Method

A

The cohort survival method uses the current population plus natural increase (births minus deaths) and net migration (in-migration minus out-migration) to calculate a future population. The population is calculated separately for men and women in specific age groups.

Specific time intervals are used, such as one, five, or ten years. The smallest time interval for which an estimate can be made is the length of time it takes for all members of an age cohort (e.g., ages 10-14) to age into the next age group (e.g., ages 15-19). All of the cohorts must have the same interval, since each group must pass from one cohort to the next with nobody left behind over the course of the analysis. So, if data are available for each single year of age, the method can be used to project the population year by year. Typically, five-year intervals are used; in that case, the shortest time for which a projection can be made is five years.

The cohort survival method provides the most accurate population projection but requires a large amount of data.
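The core step of the method can be sketched for one sex (females) and five-year cohorts. This is a simplified illustration under stated assumptions, not a full demographic model: the function name and inputs are hypothetical, births are computed from the number of women at the start of the interval, the female share of births (about 0.488) is an assumed parameter, and infant mortality within the interval is ignored.

```python
def project_one_interval(pop, survival, fertility, net_migration,
                         interval=5, female_share=0.488):
    """Advance a female population by one interval (hypothetical helper).

    pop[i]           -- women in age cohort i (0-4, 5-9, ...); the last cohort is open-ended
    survival[i]      -- share of cohort i still alive at the end of the interval
    fertility[i]     -- births per 1,000 women in cohort i per year (0 outside childbearing ages)
    net_migration[i] -- net migrants (in minus out) added to cohort i during the interval
    """
    n = len(pop)
    projected = [0.0] * n

    # Survivors of each cohort move up into the next age group
    for i in range(n - 1):
        projected[i + 1] += pop[i] * survival[i]
    # Survivors of the open-ended oldest cohort remain in it
    projected[-1] += pop[-1] * survival[-1]

    # Births occur only to women of childbearing age; only the
    # female share of births enters the first (birth) cohort
    annual_births = sum(p * f / 1000 for p, f in zip(pop, fertility))
    projected[0] = annual_births * interval * female_share

    # Net migration is added to every cohort
    return [p + m for p, m in zip(projected, net_migration)]

# Toy example with three cohorts; numbers are illustrative only
print(project_one_interval([100, 100, 100], [0.99, 0.98, 0.5], [0, 60, 0], [0, 0, 0],
                           female_share=0.5))
```

Running the function repeatedly, once per interval, carries the projection forward; a male population would be projected the same way using the male share of births.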

83
Q

population pyramid

A

The results of cohort survival analysis can be presented in both numerical and graphic form. A graphic presentation with male cohorts on one side and female cohorts on the other will typically look like a pyramid, with the most people at the bottom (the "birth cohort") and fewer people in each older group, since the number of people in each group declines with age. This is called a population pyramid.

84
Q

Natural increase

A

is the difference between the number of children born and the number of people who die during the time interval. The analysis, however, is done in terms of age cohorts for each sex. Children can only be born into the first cohort, but people die in all of the cohorts, including the birth cohort. Children are born only to women of childbearing age, which means young girls and post-menopausal women have no direct effect on the number of children born.

85
Q

Vital Statistics of the United States through the U.S. National Center for Health Statistics

A

Birth and death rates are published by state. This information can also be found at state offices and is available by age cohort.

86
Q

Death rate

A

the number of deaths per 1,000 people in a given year.

87
Q

Crude Birth Rate

A

is the total number of births per 1,000 people in a given year.

88
Q

General Fertility Rate

A

is the number of babies born per 1,000 females of childbearing age (commonly defined as ages 15 to 44).

89
Q

Age-Specific Fertility Rate

A

is the number of babies born per 1,000 females in a given age group.
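The rates above share the same "per 1,000" arithmetic and differ only in the denominator. A minimal sketch with hypothetical one-year figures for a community of 50,000; the 15-44 childbearing range is a common convention assumed here, and natural increase is simply births minus deaths.

```python
def per_1000(events, population):
    """Express a count of events as a rate per 1,000 people in the base population."""
    return 1000 * events / population

# Hypothetical one-year figures
population, births, deaths = 50_000, 600, 400
women_15_44 = 10_000                       # base for the general fertility rate
births_25_29, women_25_29 = 200, 2_000     # base for one age-specific rate

crude_birth_rate = per_1000(births, population)           # 12.0
crude_death_rate = per_1000(deaths, population)           # 8.0
general_fertility_rate = per_1000(births, women_15_44)    # 60.0
asfr_25_29 = per_1000(births_25_29, women_25_29)          # 100.0
natural_increase = births - deaths                        # 200 people
```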

90
Q

Net Migration

A

is the difference between the number of people moving in and the number of people moving out.

There are a variety of ways to calculate net migration. It is possible to construct complex linear models to predict migration patterns for each cohort. However, a simpler approach is to apply the migration rate from the previous period to the present projection. Migration rates can be obtained from the state demographic office.

In-Migration is the total number of people moving into a location. Out-Migration is the number of people leaving an area. The Net Migration Rate is simply in-migration minus out-migration, divided by the population of the area (often expressed per 1,000 people).
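A minimal sketch of the net migration rate and the simple "carry the previous rate forward" approach, assuming hypothetical counts (3,000 arrivals, 2,000 departures, 50,000 residents) and a hypothetical projected population of 55,000; the function names are illustrative.

```python
def net_migration(in_migrants, out_migrants):
    """Net migration: people moving in minus people moving out."""
    return in_migrants - out_migrants

def net_migration_rate(in_migrants, out_migrants, population, per=1000):
    """(In-migration - out-migration) / population, scaled to 'per' people."""
    return per * net_migration(in_migrants, out_migrants) / population

# Hypothetical previous period
rate = net_migration_rate(3_000, 2_000, 50_000)   # 20.0 per 1,000

# Simple approach: apply the previous period's rate to the projected population
projected_population = 55_000
projected_net_migrants = rate * projected_population / 1000   # 1100.0
```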

91
Q

Geographic Information Systems (GIS)

A

is the field of computerized mapping. Data to be used in a GIS can be captured through digitizing or GPS.

92
Q

Digitizing

A

is tracing points, lines and areas from a paper map, photograph, or raster image, resulting in a digital line graphic or vector file.

93
Q

spatial data

A

are organized in the form of themes, layers, or coverages. Spatial data can be displayed accurately because of georeferencing, which ties each feature to an exact location, for example in latitude and longitude. Themes could be waterways, forest land, school districts, or any other thematic features.

94
Q

Attributes

A

are the information about an object or feature. Taking a census tract as an example, the attribute data would include the tract number, population, number of households, etc. Attribute data are typically stored in a database or spreadsheet format. Ian McHarg, in Design with Nature, illustrated these concepts of layering thematic data with various attributes. His pioneering work led to the type of environmental planning work we can visualize in GIS today.

95
Q

Topographic map

A

is a two-dimensional representation of a portion of the three-dimensional surface of the earth.

96
Q

Global Positioning Systems (GPS)

A

have improved the spatial accuracy of planning information. GPS allows the location of features and facilities to be incorporated into databases. It is used frequently in smartphones and their apps to show your location or provide directions. It is also used by transportation departments to alert drivers to traffic delays.

97
Q

TIGER

A

is the acronym for the Topologically Integrated Geographic Encoding and Referencing system, which the U.S. Census Bureau uses to map Census data. TIGER maps include streets, railroads, ZIP codes, and landmarks. They can be downloaded into a GIS, where they are often used as base layers to which local information is added.

98
Q

Digital Aerial Photography

A

is frequently used by planners. Digital aerial photography has increased accuracy to 0.5-foot resolution, and these photographs can be incorporated into GIS.

99
Q

Digital Elevation Models (DEMs)

A

provide digital data about the elevation of the earth’s surface, allowing planners to analyze and map how elevation varies across communities. DEMs can be used for stormwater management, flood control, land use decisions, and other purposes.

100
Q

Light Detection and Ranging (LIDAR)

A

is a remote-sensing technology that uses a laser, instead of radio waves, mounted in an airplane to provide detailed topographic information. It can provide a dense pattern of data points to create one-foot contours for DEMs for use in watershed mapping and hydrologic modeling for flood control. It can also be used to survey the built environment for code violations, such as signs that were not built to comply with code.

101
Q

UrbanSim

A

is a simulation software program that models planning and urban development. This free software program is designed to be used by Metropolitan Planning Organizations (MPOs).

102
Q

CommunityViz

A

is a software extension for Esri's ArcGIS that allows agencies to analyze land use scenarios and create 3D images. This allows citizens to visualize the potential for development and redevelopment.

103
Q

Urban Footprint

A

was developed by Calthorpe Associates (Peter Calthorpe's firm) and is a more recent addition to the simulation program options for planners. It uses a library of place types, block types, and building types to support interactive scenario building.

104
Q

primary source

A

data collected firsthand through a survey, observation, or other methods.

105
Q

secondary sources

A

data collected by others, such as the U.S. Census, regional planning agencies, etc.

106
Q

survey

A

is a research method that allows one to collect data on a topic that cannot be directly observed, such as opinions on downtown retailing opportunities. Surveys are used extensively in planning to assess attitudes and characteristics of the public on a wide range of topics.

107
Q

sampling frame

A

Surveys typically take a sample of a population. For example, 500 out of 5,000 households in a community might be mailed a survey. The sampling frame is the list of members of the population of interest from which the sample is drawn (here, the 5,000 households).

108
Q

cross-sectional survey

A

Planners typically use a cross-sectional survey. A cross-sectional survey gathers information about a population at a single point in time. For example, planners might conduct a survey on how parents feel about the quality of recreation facilities as of today.

109
Q

longitudinal surveys

A

gather information over a period of time and are an alternative to cross-sectional surveys.

For example, some cities conduct a citizen survey of service satisfaction every couple of years. These data can be combined to compare the differences in satisfaction between 1995 and 2005.

110
Q

Written surveys

A

can be mailed, printed in a newspaper, or administered in a group setting. Written surveys are very popular when a planner is trying to obtain information from a broad audience, such as general opinions about the community. This is a low-cost survey method that is convenient for participants because they can complete the survey at their leisure. However, mail surveys have a low response rate, averaging around 20 percent. A written survey also requires the participant to be able to read and write. For this reason, it may be inappropriate when targeting seniors, people who do not read English, or people who cannot read.

111
Q

Group-administered surveys

A

are appropriate when there is a specific population that a planner is trying to target. This form of surveying yields a high response rate and quick results. The difficulty with administering this survey is getting everyone together to complete it. One example would be to survey participants in recreation programming by asking each person to complete a survey at the end of class. This survey method typically involves a small sample.

112
Q

Drop-off survey

A

allows the survey to be dropped off at someone’s residence or business. Respondents are free to complete the survey at their convenience. Response rates are higher than with a mail survey because the person dropping off the survey may have personal contact with the respondent. This method can be expensive because of the time required to distribute the surveys. The sample is generally smaller than with a mail survey.

113
Q

Oral surveys

A

can be administered on the phone or in person.

114
Q

Phone surveys

A

are useful when you need yes/no answers. Surveys on the phone or in person allow the interviewer to follow up and gain further explanation on answers. The response rate varies greatly, depending on the ability to reach potential respondents. Response rates for phone surveys are declining. Phone surveys are usually more expensive than mail or internet-based surveys. Phone and in-person interviews can be biased due to interaction with the interviewer. Long questions and those with multiple answers are difficult to administer using this method.

115
Q

Online surveys

A

are popular. These can be administered on a website, e-mail, or text message. This is an inexpensive method of surveying that can generate quick responses. Electronic surveys have a higher response rate than written or interview surveys. The downside is that you will not reach people without Internet access, which can introduce significant bias.

116
Q

In designing a survey the following points are important:

A
  • Make all questions clear (don’t use technical jargon).
  • Make sure each question only asks about one issue.
  • Make questions as short as possible.
  • Avoid negative items as they can confuse respondents.
  • Avoid biased items and terms.
  • Use a consistent response method, such as a scale of 1 to 7 or yes/no.
  • Sequence questions from general to specific.
  • Make the questions as easy to answer as possible.
  • Define any unique or unusual terms. For example, when you are conducting a survey about open space zoning, be sure to define what the term means.

117
Q

sample design

A

The sample should represent the population about which information is being gathered. The extent to which it does determines how generalizable the findings will be.

118
Q

probability sampling

A

there is a direct mathematical relation between the sample and the population, so that precise conclusions can be drawn. An example of such a conclusion would be that 46% of the city’s homeowners favor additional playgrounds for the city, with a margin of error of +/- 2 percentage points.

Examples of probability sampling are random samples, where everyone has the same chance of being selected to participate in the survey, as well as systematic, stratified, and cluster samples. For example, in stratified sampling, the population is divided into separate groups or classes, from which a sample is drawn such that the classes in the population are represented by the classes in the sample. Most electoral surveys are based on highly stratified samples. In a cluster sample, the population is divided into groups such as neighborhoods or city blocks, a random selection of these clusters is made, and members of the selected clusters are surveyed.
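The "+/- 2%" in the playground example is a margin of error. As a sketch, assuming a simple random sample from a large population and 95% confidence (z = 1.96), with no finite population correction, the margin of error for a proportion and the sample size needed to reach a target margin can be computed as follows; the function names are hypothetical.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Half-width of the ~95% confidence interval for a proportion from a simple random sample."""
    return z * math.sqrt(p * (1 - p) / n)

def required_sample_size(p, moe, z=1.96):
    """Smallest n whose margin of error for proportion p is at most moe."""
    return math.ceil(z ** 2 * p * (1 - p) / moe ** 2)

# 46% of homeowners favor more playgrounds, +/- 2 percentage points
print(required_sample_size(0.46, 0.02))  # 2386
```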

119
Q

random samples

A

an example of probability sampling, where everyone has the same chance of being selected to participate in the survey
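A simple random sample can be drawn with Python's standard `random` module. A minimal sketch, assuming a hypothetical sampling frame of 5,000 household IDs from which 500 are selected; the ID format is illustrative.

```python
import random

# Hypothetical sampling frame: IDs for the 5,000 households in the community
frame = [f"HH-{i:04d}" for i in range(1, 5001)]

rng = random.Random(42)           # fixed seed so the draw can be reproduced
sample = rng.sample(frame, 500)   # each household has the same chance of selection
```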

120
Q

stratified sampling

A

the population is divided into separate groups or classes, from which a sample is drawn such that the classes in the population are represented by the classes in the sample. Most electoral surveys are based on highly stratified samples.
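A minimal sketch of stratified sampling with proportional allocation, so that each class's share of the sample matches its share of the population. The strata, their sizes, and the function names are hypothetical, and the largest remainder rule for rounding is one common choice assumed here.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    exact = {h: n * size / total for h, size in strata_sizes.items()}
    alloc = {h: int(share) for h, share in exact.items()}
    # Rounding down can leave a few units unassigned; give them to the
    # strata with the largest fractional remainders
    leftover = n - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

def stratified_sample(strata, n, seed=None):
    """Draw a simple random sample within each stratum using proportional allocation."""
    rng = random.Random(seed)
    alloc = proportional_allocation({h: len(m) for h, m in strata.items()}, n)
    return {h: rng.sample(members, alloc[h]) for h, members in strata.items()}

# Hypothetical strata: 3,000 owner and 2,000 renter households
strata = {
    "owners": [f"O-{i}" for i in range(3_000)],
    "renters": [f"R-{i}" for i in range(2_000)],
}
sample = stratified_sample(strata, 500, seed=1)
print({h: len(s) for h, s in sample.items()})  # {'owners': 300, 'renters': 200}
```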

121
Q

cluster sample

A

is a form of probability sampling in which the population is divided into groups (clusters), such as neighborhoods or city blocks; a random selection of clusters is made, and members of the selected clusters are surveyed.

122
Q

non-probability sampling

A

there is no precise connection between the sample and the population, so the results have to be interpreted with caution since they are not necessarily representative of the population. On the other hand, they are often much easier to obtain than probability samples. Examples are a convenience sample (individuals who are readily available) and a snowball sample (where one interviewed person suggests other potential interviewees). A volunteer sample consists of self-selected respondents. A special case of a volunteer sample is volunteered geographic information (VGI), for example, when participants enter information on a web map (e.g., volunteered street maps).

123
Q

convenience sample

A

individuals that are readily available

124
Q

snowball sample

A

(where one interviewed person suggests other potential interviewees)

126
Q

volunteer sample

A

consists of self-selected respondents.