8: Research Methods / Statistics Flashcards

1
Q

What is ratio data?

A

Ratio data is the gold standard of measurement: it has a true zero, so both absolute and relative differences are meaningful. An example is a distance measurement.

The difference between 40 and 30 miles is the same as the difference between 30 and 20 miles, AND 40 miles is twice as far as 20 miles.

2
Q

Nominal Data

A

This type of measurement classifies observations into mutually exclusive groups or categories that lack intrinsic order.
(examples: zoning classification, social security number).

The labels of the categories do not imply any order.

3
Q

Hypothesis test

A

This type of test is designed to reject a null hypothesis, but never to accept the alternative hypothesis.

4
Q

Symptomatic Method

A

Uses data sets such as building permits that reflect population change and can be used to estimate the current population.

5
Q

Systematic random sample

A

Every Xth person on a list is surveyed, usually starting from a randomly chosen position, so each person has an equal chance of being selected.
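
A minimal Python sketch of a systematic sample, assuming the population is available as an ordered list. The function name `systematic_sample`, the step of 10, and the list of resident IDs are illustrative assumptions, not part of the card.

```python
import random

def systematic_sample(population, step, seed=None):
    """Pick a random start within the first `step` items, then take every `step`-th item."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start so every item has the same chance
    return population[start::step]

residents = list(range(1, 101))  # hypothetical list of 100 resident IDs
sample = systematic_sample(residents, step=10, seed=42)
```

With 100 residents and a step of 10, the sample always holds 10 people spaced 10 apart; only the starting position is random.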

6
Q

What is the probability of an event that is certain to happen?

A

1. Probabilities range from 0 (impossible) to 1 (certain).

7
Q

Total acreage of federal Indian reservations in the U.S.

A

56.2 million acres

8
Q

What is a positive correlation?

A

When high scores on one variable are associated with high scores on a second variable.

9
Q

Analysis of the relationship between two variables

A

Regression analysis

10
Q

Total acreage of national forest land in the U.S.

A

192 million acres

11
Q

Difference between the lowest and highest score on an exam

A

Range

12
Q

Coefficient of Correlation

A

Measures the degree to which two variables are related

13
Q

Stratified Sample

A

Subdivide the population into at least two different subgroups that share the same characteristics, then draw a sample from each subgroup.

A SYSTEMATIC stratified sample is the most effective way to get an accurate cross-section of the local population.
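
A rough Python sketch of stratified sampling, assuming each record carries the characteristic that defines its subgroup. For brevity it draws a simple random sample inside each subgroup rather than a systematic one; `stratified_sample`, the `tenure` field, and the household data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, key, per_group, seed=None):
    """Split records into subgroups by `key`, then draw `per_group` from each."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec)
    sample = []
    for name in sorted(groups):  # fixed order keeps runs reproducible
        members = groups[name]
        sample.extend(rng.sample(members, min(per_group, len(members))))
    return sample

# Hypothetical population: 10 renter and 20 owner households
households = [{"id": i, "tenure": "owner" if i % 3 else "renter"} for i in range(30)]
picked = stratified_sample(households, key=lambda h: h["tenure"], per_group=5, seed=1)
```

Every subgroup is guaranteed a place in the sample, which a single random draw from the whole population cannot promise.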

14
Q

Qualitative V Quantitative V Mixed Methods

A

Qualitative: Approach for understanding the meaning individuals and groups ascribe to a human or social problem. Emerging questions.

Quantitative: Approach for testing objective theories by examining the relationships among variables (deductive). Numbered data.

Mixed methods: Collection of both qualitative & quantitative data. Integrating the 2 forms of data.

15
Q

Discourse Analysis

A

Study of the way versions of the world, society, events, and psyche are produced in the use of language and discourse. It is often concerned with the construction of subjects within various forms of knowledge / power.

EXAMPLES: Semiotics, deconstruction, narrative analysis

16
Q

Ethnography

A

Multi-method qualitative approach - studies people in their “naturally occurring settings” or “fields” by means of methods which capture their social meanings and ordinary activities.

17
Q

Grounded Theory

A

Inductive form of qualitative research where data collection & analysis are conducted together. Theories remain grounded in the observations rather than generated in the abstract.

Approach that develops the theory from the data collected rather than applying a theory to the data.

18
Q

Narrative Analysis

A

Form of discourse analysis that seeks to study the textual devices at work in the constructions of process or sequence within a text. Tells the researcher about the meaning that participants give to events in their lives.

19
Q

3 steps to a statistical process:

A

1- Collect Data
2- Describe & Summarize the distribution of values in the data set.
3- Interpret by means of inferential stats & stat modeling.

20
Q

Ordinal Data

A

Ordered categories, which imply a ranking of the observations. Even though ordinal data may be given numeric values (for example 1, 2, 3, 4), the values are meaningless beyond their order.

Example: Letter grades, suitability for development, response scales on a survey.

21
Q

Interval data

A

Ordered relationship where the difference between scale values has a meaningful interpretation. Typical example = temperature.

The difference between 40 & 30 degrees is the same as the difference between 30 & 20 degrees, but 20 degrees is NOT twice as cold as 40 degrees.
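
A small Python sketch of why ratios are not meaningful for interval data. It assumes the card's degrees are Fahrenheit (the card does not say) and converts them to Celsius; `f_to_c` is an illustrative helper.

```python
def f_to_c(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9

# Equal differences stay equal after a change of units...
diff_high = f_to_c(40) - f_to_c(30)
diff_low = f_to_c(30) - f_to_c(20)

# ...but the ratio does not survive, because zero degrees is an arbitrary point.
ratio_f = 40 / 20                  # 2 on the Fahrenheit scale
ratio_c = f_to_c(40) / f_to_c(20)  # negative on the Celsius scale
```

A distance behaves differently: 40 miles is twice 20 miles in miles, kilometers, or feet, because zero distance means the same thing in every unit.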

22
Q

Continuous variables

A

Can take an infinite number of values, both positive and negative, & with as fine a degree of precision as desired.

23
Q

Discrete variables

A

Can only take a finite number of distinct values. Example = count of the number of events, such as the number of accidents per month. Counts cannot be negative and can only take on integer values.

Binary or dichotomous variables = can only take on 2 values coded as 0 and as 1.

24
Q

Population

A

Totality of some entity

25
Q

Sample

A

Subset of the population.

26
Q

Descriptive Statistics

A

Describes the characteristics of the distribution of values in a population or in a sample.

For example - descriptive stat such as the mean could be applied to the age distribution in the population of AICP exam takers. On average, test takers are 30 years old.

27
Q

Inferential statistics

A

Use probability theory to determine characteristics of a population based on observations made on a sample from that population.

We infer things about the population based on what is observed in the sample.

Example: take a sample of 25 test takers and use their average age to say something about the mean age of all the test takers.

28
Q

Distribution

A

Overall shape of all observed data.
Can be listed as an ordered table or graphically represented by a histogram or density plot.

HISTOGRAM: groups observations in bins represented in bar chart.

DENSITY PLOT: Shows a smooth curve.

Characteristics are summarized by descriptive statistics: like central tendency, dispersion, symmetry or lack thereof (skewness) & presence of thick tails aka higher likelihood of extreme values (kurtosis).

29
Q

Range

A

Difference between largest & smallest value

30
Q

Normal / Gaussian distribution

A

Bell curve.

Distribution is symmetric & has the additional property that the spread around the mean can be related to the proportion of observations (about 68% fall within one standard deviation of the mean and about 95% within two).

Often used as the reference distribution for statistical inference.

31
Q

Symmetric distribution

A

One where an equal number of observations are below the mean & above the mean.

An ASYMMETRIC distribution, where there are either more observations below the mean or more above the mean, is skewed.

32
Q

Central Tendency

A

Typical or representative value for the distribution of observed values.

Ways to measure this = mean, median, mode.

Can be applied to the population as a whole, or to a sample from the population.

33
Q

Mean, Weighted Mean

A

Average of a distribution. Computed by adding up the values and dividing by the number of observations.

Weighted mean - when there is a greater importance placed on specific entries or when representative values are used for groups of observations.

Mean appropriate for interval & ratio data, not for ordinal or nominal.
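
A short Python sketch of the two calculations. The ages, bracket midpoints, and counts are made-up illustration data, and `weighted_mean` is a hypothetical helper.

```python
from statistics import mean

ages = [25, 28, 30, 32, 35]
simple_mean = mean(ages)  # (25 + 28 + 30 + 32 + 35) / 5

def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Representative values for groups: age-bracket midpoints weighted by head count
midpoints = [25, 35, 45]
counts = [10, 5, 5]
grouped_mean = weighted_mean(midpoints, counts)  # 650 / 20
```

Here `simple_mean` is 30 and `grouped_mean` is 32.5; the youngest bracket pulls the weighted mean down because it holds half of the people.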

34
Q

Median

A

Middle value of a ranked distribution

35
Q

Mode

A

Most frequent number in a distribution
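
As a quick illustration of median and mode together, here is a sketch using Python's standard `statistics` module. The exam scores are invented for the example.

```python
from statistics import median, multimode

exam_scores = [70, 85, 85, 90, 100]

middle = median(exam_scores)     # odd count: the single middle value of the ranked list
modes = multimode(exam_scores)   # returned as a list, because ties are possible

# Even count: the median is the average of the two middle values
even_middle = median([70, 80, 90, 100])
```

In this data the median and the mode are both 85, while the even-length list also has a median of 85 even though 85 never appears in it.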

36
Q

Symmetry

A

Symmetric distribution: Mean and median tend to be very close.

Skewed distributions: Mean and median tend to be different.

37
Q

Variance & Standard Deviation

A

Both based on the squared difference from the mean

Standard deviation is the square root of the variance.

Standard deviation is in the same units as the original variable and is preferred to describe how values are spread out around the central tendency.

LARGER VARIANCE = greater spread around the mean.

Used for interval and ratio data.
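
A minimal Python sketch of both measures, using invented values. It computes the population variance (dividing by n); a sample variance would divide by n - 1 instead.

```python
import math

values = [2, 4, 4, 4, 5, 5, 7, 9]
mu = sum(values) / len(values)  # mean = 5

# Variance: average of the squared differences from the mean
variance = sum((x - mu) ** 2 for x in values) / len(values)

# Standard deviation: square root of the variance, in the original units
std_dev = math.sqrt(variance)
```

For these values the variance is 4 and the standard deviation is 2.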

38
Q

Coefficient of Variation

A

Measures relative dispersion around the mean, computed by dividing the standard deviation by the mean.

Used for interval and ratio data.
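
A brief Python sketch of the calculation, with made-up trip lengths. Both data sets have the same standard deviation, so the example isolates the effect of the mean; `coefficient_of_variation` is an illustrative name.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population version)."""
    return pstdev(values) / mean(values)

short_trips = [2, 4, 4, 4, 5, 5, 7, 9]       # mean 5, standard deviation 2
long_trips = [x + 95 for x in short_trips]   # mean 100, same standard deviation

cv_short = coefficient_of_variation(short_trips)  # about 0.4
cv_long = coefficient_of_variation(long_trips)    # about 0.02
```

The same spread of 2 is large relative to a mean of 5 but small relative to a mean of 100, which is what the coefficient captures.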

39
Q

Z-Score

A

Standardization of the original variable by subtracting the mean and dividing by the standard deviation.

As a result the mean of the z-scores is 0 and the variance (and therefore the standard deviation) is 1.
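
A small Python sketch of standardization, using invented values and the population standard deviation.

```python
from statistics import mean, pstdev

values = [2, 4, 4, 4, 5, 5, 7, 9]
mu, sigma = mean(values), pstdev(values)  # 5 and 2

# z-score: subtract the mean, then divide by the standard deviation
z_scores = [(x - mu) / sigma for x in values]

z_mean = mean(z_scores)     # approximately 0
z_spread = pstdev(z_scores) # approximately 1
```

The largest value, 9, becomes a z-score of 2.0: it sits two standard deviations above the mean.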

40
Q

Inter-quartile range (IQR)

A

Difference in value between the 75th percentile and the 25th percentile.

Example: if we have 20 observations ranked in increasing order, take the 5th and the 15th observations and compute the difference between those two values. This is the IQR.
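
A Python sketch of the rank-based rule from the example. It is a simplification that assumes at least four observations; statistical packages usually interpolate between ranks, so their IQR can differ slightly. `iqr_by_rank` is a hypothetical helper.

```python
def iqr_by_rank(values):
    """IQR using the observations at the 25% and 75% rank positions."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[n // 4 - 1]      # 5th observation when n = 20
    upper = ordered[3 * n // 4 - 1]  # 15th observation when n = 20
    return upper - lower

observations = list(range(1, 21))  # 20 ranked observations: 1, 2, ..., 20
iqr = iqr_by_rank(observations)    # 15 - 5
```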

41
Q

Hypothesis Test

A

Start by setting up a null hypothesis (used as a reference).

Then find evidence in the data to REJECT the null hypothesis statement in the direction of the alternative hypothesis.

Statistical evidence only provides support to reject the null hypothesis, never to accept the alternative hypothesis.

42
Q

Statistical decision

A

Based on the significance / p-value of a test. The significance level is the probability of a Type I Error: the probability that we reject the null hypothesis when it is in fact true.

43
Q

Confidence interval

A

The range of a confidence interval depends on the sampling error.

A large sampling error means there isn’t much information in the sample relative to the population.

SMALLER sampling error = more precise statements.
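
A rough Python sketch of how sampling error drives the width of an interval. It assumes a 95% interval based on the normal approximation (the 1.96 multiplier); for small samples a t-based multiplier would be more appropriate. The ages and the name `confidence_interval_95` are illustrative.

```python
import math
from statistics import mean, stdev

def confidence_interval_95(sample):
    """Approximate 95% interval for the mean: mean +/- 1.96 * standard error."""
    m = mean(sample)
    standard_error = stdev(sample) / math.sqrt(len(sample))
    margin = 1.96 * standard_error
    return m - margin, m + margin

small = [28, 30, 32, 29, 31]  # 5 hypothetical test-taker ages
large = small * 5             # same values, 25 observations

low_s, high_s = confidence_interval_95(small)
low_l, high_l = confidence_interval_95(large)
width_small = high_s - low_s
width_large = high_l - low_l
```

Both intervals are centered on 30, but the larger sample has a smaller standard error and therefore a narrower, more precise interval.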

44
Q

T-test (Student's t-test) & how to test

A

Typically used to compare the means of 2 populations based on their sample averages.

Also used to test the significance of a regression coefficient.
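
A minimal Python sketch of the statistic behind a two-sample comparison of means. It uses Welch's form of the t statistic (unequal variances allowed), which is an assumption rather than something the card specifies, and it stops short of computing a p-value. The two groups of ages are invented.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Difference in sample means divided by its estimated standard error."""
    se = math.sqrt(variance(sample_a) / len(sample_a)
                   + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se

group_a = [30, 32, 34, 36, 38]  # mean 34, sample variance 10
group_b = [24, 26, 28, 30, 32]  # mean 28, sample variance 10

t_stat = welch_t(group_a, group_b)  # 6 / sqrt(2 + 2)
```

The larger the absolute value of the statistic, the stronger the evidence against the null hypothesis that the two population means are equal.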

45
Q

ANOVA or Analysis of variance

A

More complex form of testing the equality of means between groups.

Use a treatment group & a control group.

F-test is a simple case of ANOVA.

46
Q

Chi Square Test

A

Measure of Fit - Test that assesses the difference between a sample distribution and a hypothesized distribution.
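
A short Python sketch of the goodness-of-fit statistic, assuming the hypothesized distribution spreads 100 survey responses evenly over four neighborhoods. The counts and the helper name are made up for illustration.

```python
def chi_square_statistic(observed, expected):
    """Sum over categories of (observed - expected)^2 / expected."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [18, 22, 20, 40]  # hypothetical responses per neighborhood
expected = [25, 25, 25, 25]  # hypothesized: evenly spread

stat = chi_square_statistic(observed, expected)  # (49 + 9 + 25 + 225) / 25
```

The statistic is then compared with a chi-square critical value for the appropriate degrees of freedom (here 4 - 1 = 3) to decide whether the sample departs from the hypothesized distribution.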

47
Q

Correlation Coefficient

A

Measures the strength of a linear relationship between two variables.

Does NOT imply anything about causation (ex- whether one variable influences the other).
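
A Python sketch of the Pearson correlation coefficient, the usual measure of linear association. The study and free-time hours are invented data, and `pearson_r` is an illustrative helper.

```python
import math

def pearson_r(xs, ys):
    """Covariation of x and y, scaled by the spread of each variable."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

study_hours = [1, 2, 3, 4, 5]
free_hours = [9, 8, 6, 5, 2]

r = pearson_r(study_hours, free_hours)  # close to -1: strong negative correlation
```

The result always falls between -1 and +1, and its sign shows whether the relationship is positive or negative.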

48
Q

Positive correlation vs negative correlation

A

Positive = high values of one variable match high values of the other.

Negative = high values of one variable match low values of the other and vice versa.

49
Q

Linear regression

A

Hypothesizes a linear relationship between a dependent variable & one or more explanatory variables.
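
A minimal Python sketch of a linear regression with one explanatory variable, fitted by ordinary least squares. The variable names and data are hypothetical, chosen so the points fall exactly on a line.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return slope, intercept

household_size = [1, 2, 3, 4, 5]  # explanatory variable
daily_trips = [3, 5, 7, 9, 11]    # dependent variable

slope, intercept = fit_line(household_size, daily_trips)
predicted = intercept + slope * 6  # predicted trips for a household of 6
```

Here the fitted line is trips = 1 + 2 * household size, so each additional household member is associated with two more daily trips.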

50
Q

TIGER

A

Acronym for Topologically Integrated Geographic Encoding and Referencing map, which is used for Census data. A TIGER map includes streets, railroads, zip codes, and landmarks.

Used by the Census Bureau and can be downloaded into a GIS system.

51
Q

Digital Aerial Photography

A

Allowed for increased accuracy, can be incorporated into GIS.

52
Q

Digital Elevation Models (DEM)

A

Show digital data about the elevation of the earth’s surface as it varies across communities, which allows planners to analyze and map it. DEMs can be used for stormwater management, flood control, land use decisions, and other purposes.

53
Q

Light Detection and Ranging (LIDAR)

A

New technology using a laser, instead of radio waves, that is mounted in an airplane to provide detailed topo information.

LIDAR can provide a dense pattern of data points to create 1-foot contours for digital elevation models (DEMs)

Used in watershed mapping and hydrologic modeling for flood control.

54
Q

UrbanSim

A

Simulation software program that models planning and urban development. FREE. Designed to be used by MPOs

55
Q

CommunityViz

A

Software extension for ESRI's ArcGIS that allows agencies to analyze land use scenarios and create 3D images. Allows citizens to visualize potential for development and redevelopment.

56
Q

Urban Footprint

A

Developed by Peter Calthorpe & Associates; a more recent addition to the simulation program options for planners.

Uses a library of place types, block types, and building types to support interactive scenario building.

57
Q

Sampling frame

A

Population of interest for a survey.

58
Q

Cross-sectional survey vs Longitudinal surveys

A

CS Survey- Gathers information about a population at a single point in time.
Longitudinal Survey- Conducted over a period of time.

59
Q

Written surveys- Pros & Cons

A

Pros: low cost, convenient for survey takers.

Cons: low response rate (AVERAGE 20%), requires literacy.

Can be a poor fit for seniors, groups who don’t speak English, and groups with low rates of literacy.

60
Q

Drop-Off Survey - Pros & Cons

A

Pros: convenient for respondents. Response rates higher than mail survey because person dropping off the survey may have personal contact with the respondent.

Cons: Can be expensive because of time required to distribute the surveys. Sample generally smaller than mail survey.

61
Q

Phone survey- pros and cons

A

Pros: Allows the interviewer to follow up and get further explanations.

Response rates vary greatly.

More expensive than mail or internet surveys.
Can be biased due to interaction with the interviewer.

Long questions & those with multiple answers are difficult to administer.

62
Q

Online Surveys

A

Popular - administered on a website, e-mail or text.

INEXPENSIVE. Higher response rate than interview or written surveys.

Won’t reach people without internet access

Keep these things in mind:
Make all questions clear (don’t use technical jargon).
Make sure each question only asks about one issue.
Make questions as short as possible.
Avoid negative items as they can confuse respondents.
Avoid biased items and terms.
Use a consistent response method, such as a scale of 1 to 7 or yes/no.
Sequence questions from general to specific.
Make the questions as easy to answer as possible.
Define any unique or unusual terms. For example, when you are conducting a survey about open space zoning be sure to define what the term means.

63
Q

Cluster sample

A

A special form of stratified sampling where the sample is drawn from a specific target group within the general population, such as the elderly or residents of a specific neighborhood.

64
Q

Non-probability sampling (Convenience, snowball, volunteer)

A

No precise connection between the sample and the population.

Convenience: individuals that are readily available

Snowball: one interviewed person suggests other potential interviewees.

Volunteer: Self-selected respondents.

65
Q

Choropleth maps

A

Best way to link statistical data to discrete geographic areas using different colors or shades of the same color.

66
Q

Correlation Values

A

STRONG correlation would be a value close to either +1 or -1

WEAK correlation would be a value close to 0.

NEGATIVE correlation implies an inverse relationship between the two variables. If you study a lot, you don’t have a lot of free time.

67
Q

Two variables, studying for the AICP Exam and Amount of Free Time, were examined to determine if there is a correlation. The analysis resulted in a correlation value of -0.85. What does this mean?

A

There is a high (strong negative) correlation between studying for the AICP Exam and the amount of free time people have: more studying goes with less free time.

68
Q

Best reason for selecting a random sample from a large population?

A

To provide an approximation of the characteristics of the population.

69
Q

Sample Selection Bias can occur when:

A

The availability of data is influenced by the selection process

70
Q

Future value equation

A

FV = (1 + r)^y PV, with the interest rate r expressed as a fraction (so 5% = 5/100) and y the number of years. The present value is thus PV = FV / (1 + r)^y.
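
A minimal Python sketch of the equation, assuming interest compounds once per year. Dividing instead of multiplying inverts it to recover the present value. The function names and the $1,000 example are illustrative.

```python
def future_value(pv, r, years):
    """FV = (1 + r)^y * PV, with r as a fraction (5% -> 0.05)."""
    return pv * (1 + r) ** years

def present_value(fv, r, years):
    """PV = FV / (1 + r)^y, the same equation solved for PV."""
    return fv / (1 + r) ** years

fv = future_value(1000, 0.05, 10)    # about 1628.89 after 10 years at 5%
pv_back = present_value(fv, 0.05, 10)  # recovers the original 1000
```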

71
Q

What can be described as a measure of dispersion around the mean that is calculated as the average of the sum of the squared deviations from the mean?

A

Variance