Statistics Flashcards

1
Q

Variance (or Sample Variance)

A

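The average of the squared differences from the mean. For a sample variance, the sum of squared differences is divided by n - 1 rather than n.

A measure of how far a set of numbers is spread out. Because it is expressed in squared units, the value is hard to interpret on its own.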

2
Q

Standard Deviation

A

Square root of the variance.

Tells you how much the data is spread out, in the same units as the data itself.
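
The two definitions above can be checked by hand. This is a minimal sketch using made-up numbers; the variable names are illustrative and not part of the original cards.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

mean = sum(data) / len(data)
squared_diffs = [(x - mean) ** 2 for x in data]

# Population variance divides by n; sample variance divides by n - 1.
population_variance = sum(squared_diffs) / len(data)
sample_variance = sum(squared_diffs) / (len(data) - 1)

# Standard deviation is the square root of the variance.
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print(population_variance, population_sd)
print(sample_variance, sample_sd)
```

The standard library functions `statistics.pvariance`, `statistics.variance`, `statistics.pstdev`, and `statistics.stdev` compute the same quantities.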

3
Q

Regression Analysis

A

A way to find trends in data.
Provides an equation for a line or curve so you can make predictions about your data.
Fits a line or curve to a set of points.
In simple linear regression, provides an estimate of one variable as a linear function of another.
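
As an illustration of the simple linear case, the sketch below fits a least-squares line to five made-up points and uses the resulting equation to predict a new value. The data and the `predict` helper are assumptions for the example only.

```python
# Made-up data: x could be hours studied, y a score.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares with one predictor:
# slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
s_xx = sum((x - mean_x) ** 2 for x in xs)
slope = s_xy / s_xx
intercept = mean_y - slope * mean_x


def predict(x):
    """Estimate y from x using the fitted line."""
    return intercept + slope * x


print(f"y = {intercept:.2f} + {slope:.2f} * x")
print(predict(6))
```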

4
Q

Sampling frame

A

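The list of units in the population of interest from which the sample is actually drawn (for example, a membership roster or address list). Ideally it covers the whole population, but in practice it may miss some members.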

5
Q

T-test

A

Tests whether the difference between two means is statistically significant.

Do the 2 groups come from populations with the same mean?
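
A minimal sketch of the test statistic, assuming two independent groups and the unequal-variance (Welch) form of the t statistic. The groups are made up. Converting the statistic to a p value requires the t distribution, which is not covered here.

```python
import math
import statistics

# Made-up measurements for two independent groups.
group_a = [5, 6, 7, 8, 9]
group_b = [1, 2, 3, 4, 5]

mean_a = statistics.mean(group_a)
mean_b = statistics.mean(group_b)
var_a = statistics.variance(group_a)  # sample variance (n - 1)
var_b = statistics.variance(group_b)

# Standard error of the difference between the two means.
standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))

# A large absolute value suggests the group means differ.
t_statistic = (mean_a - mean_b) / standard_error
print(t_statistic)
```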

6
Q

Chi-square test

A

Goodness of fit.

How well expected/predicted values match observations.
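
The goodness-of-fit statistic sums (observed - expected)^2 / expected over all categories. The sketch below uses made-up counts for four categories that are expected to be equally common.

```python
# Made-up counts: 100 observations across four categories.
observed = [18, 22, 20, 40]
expected = [25, 25, 25, 25]  # equal shares under the null hypothesis

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
degrees_of_freedom = len(observed) - 1

print(chi_square, degrees_of_freedom)
```

The statistic is then compared with a critical value from a chi-square table for the same degrees of freedom (7.815 at the 0.05 level for 3 degrees of freedom). A larger statistic indicates a poor fit between expected and observed values.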

7
Q

Ordinal

A

Position in a list, ranking.
Even though the categories may be given numerical values, such as 1, 2, 3, 4, the values themselves are meaningless; only the rank counts. So, even though one might be tempted to infer that 4 is twice 2, this is not correct. Examples: letter grades, suitability for development, and response scales on a survey (e.g., 1 through 5).

8
Q

Interval

A

Data that has an ordered relationship where the difference between values has a meaningful interpretation, but there is no true zero point.
Example: temperature. The difference between 40 and 30 degrees is the same as between 30 and 20 degrees, but 40 degrees is not twice as hot as 20 degrees.

9
Q

Ratio

A

Both absolute and relative differences have a meaning. Example: distance measure, where the difference between 40 and 30 miles is the same as the difference between 30 and 20 miles, and in addition, 40 miles is twice as far as 20 miles.

10
Q

p value

A

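The number you get by running a hypothesis test on your data: the probability of seeing results at least as extreme as yours if the null hypothesis were true. A p value of 0.05 (5%) or less is conventionally treated as statistically significant. It does not by itself show that the results are repeatable.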
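
To make the definition concrete, here is a sketch of an exact two-sided p value for a made-up experiment: a coin is flipped 20 times and lands heads 15 times, and the null hypothesis is that the coin is fair. The scenario and names are illustrative assumptions.

```python
from math import comb

n_flips = 20
observed_heads = 15  # made-up result


def prob_heads(k):
    """Probability of exactly k heads in n_flips of a fair coin."""
    return comb(n_flips, k) / 2 ** n_flips


# "At least as extreme" = at least as far from the expected 10 heads,
# in either direction (two-sided test).
distance = abs(observed_heads - n_flips / 2)
p_value = sum(
    prob_heads(k)
    for k in range(n_flips + 1)
    if abs(k - n_flips / 2) >= distance
)

print(round(p_value, 4))
```

Here the p value falls just under 0.05, so the result would conventionally be called statistically significant.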

11
Q

R squared value

A

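In regression, tells you how well your model fits the data: the proportion of the variation in the dependent variable that the model explains.

The values range from 0 to 1, with 0 meaning the model explains none of the variation and 1 meaning it fits the data perfectly.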
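
A minimal sketch of the calculation, assuming the standard definition R squared = 1 - (residual sum of squares / total sum of squares). The data and the fitted line y = 2.2 + 0.6x are made up for the example.

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]  # made-up observations

# Predictions from a hypothetical fitted line y = 2.2 + 0.6x.
fitted = [2.2 + 0.6 * x for x in xs]

mean_y = sum(ys) / len(ys)

# Variation left unexplained by the model.
ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
# Total variation of y around its mean.
ss_total = sum((y - mean_y) ** 2 for y in ys)

r_squared = 1 - ss_residual / ss_total
print(round(r_squared, 2))
```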

12
Q

Nominal data

A

Mutually exclusive groups or categories that lack intrinsic order.
Examples: zoning classification, social security number.
The labels of the categories do not matter and should not imply any order. So, even if one category is labeled 1 and another 2, those labels can be switched.

13
Q

Population

A

The totality of some entity.

Example: the total number of planners preparing for the 2018 AICP exam.

14
Q

Sample

A

Subset of the population.

Example: 25 candidates selected at random out of the total number of planners preparing for the 2018 AICP exam.

15
Q

Descriptive Statistics

A

Describe the characteristics of the distribution of values in a population or in a sample.
For example, the mean could be applied to the age distribution in the population of AICP exam takers, providing a summary measure of central tendency (e.g., “on average, AICP test takers in 2018 are 30 years old”). The context will make clear whether the statistic pertains to the population (all values known) or to a sample (only partial observations). The latter is the typical case encountered in practice.

16
Q

Inferential Statistics

A

Use probability theory to determine characteristics of a population based on observations made on a sample from that population. We infer things about the population based on what is observed in the sample. For example, we could take a sample of 25 test takers and use their average age to say something about the mean age of all the test takers.

17
Q

Hypothesis test

A

A procedure for evaluating a statement (hypothesis) about a particular characteristic of a population (or several populations). We distinguish between the null hypothesis (H0), i.e., the point of departure or reference, and the alternative hypothesis (H1), the research hypothesis one wants to find support for by rejecting the null hypothesis.

18
Q

Sampling error

A

The link between the sample and the population. Because a sample does not contain all the information in the population, a statistic computed from the sample will generally differ from the corresponding population value, and it will vary from one sample to the next. That random variation is the sampling error; the distribution of the statistic across repeated samples is the sampling distribution. The sampling error, which is random, should be distinguished from systematic error or model misspecification, which occurs because the model (or its assumptions) is wrong. Systematic error is unrelated to the sample as such.

19
Q

Confidence interval

A

A range around the sample statistic that contains the population value with a given level of confidence, typically 95% or 99%. So, instead of rejecting the null hypothesis with a given probability, we establish a range around the sample statistic, such as a sample average, that is expected to contain the population mean at the stated level of confidence. The width of the confidence interval depends critically on the sampling error. If the sampling error is large, there isn’t much information in the sample relative to the population, so our statements about the latter will by necessity be vague (wide confidence interval). On the other hand, with a smaller sampling error, we can make more precise statements. The sampling error is related to the sample size, with a larger sample resulting in a smaller error (as the sample grows larger, it approximates the actual population more closely).
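
The sketch below builds a 95% confidence interval for a mean from 25 made-up ages. It assumes a normal approximation for the critical value (about 1.96). With a sample this small, a t-distribution critical value would give a slightly wider interval.

```python
import math
import statistics

# 25 made-up ages of sampled test takers.
ages = [24, 26, 27, 28, 28, 29, 29, 30, 30, 30,
        30, 31, 31, 31, 32, 32, 33, 33, 34, 34,
        35, 36, 27, 25, 25]

sample_mean = statistics.mean(ages)
sample_sd = statistics.stdev(ages)

# Standard error shrinks as the sample size grows.
standard_error = sample_sd / math.sqrt(len(ages))

# Critical value for 95% confidence (normal approximation, an assumption).
z = statistics.NormalDist().inv_cdf(0.975)

lower = sample_mean - z * standard_error
upper = sample_mean + z * standard_error
print(f"{sample_mean:.2f} ({lower:.2f} to {upper:.2f})")
```

Because the standard error divides by the square root of the sample size, quadrupling the sample size roughly halves the width of the interval.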

20
Q

Null hypothesis

A

Neutral statement that doesn’t suggest a direction of result; it typically states that there is no effect or no difference.

21
Q

Stratified random sample

A

In a proportionate stratified sample, your sample has the same proportion of each group as the overall population.
Examples of groups (strata): male/female or education level.
Stratifying requires data about the individuals, so that each one can be assigned to a group.
The population is divided into strata by variables thought to be related to the variables of interest, and a random sample is drawn within each stratum.
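
A minimal sketch of proportionate stratified sampling, using a made-up population of 100 people split 60/40 by education level. The names and the fixed seed are assumptions for the example.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Made-up population: (person_id, education level).
population = (
    [(i, "college") for i in range(60)]
    + [(i, "high school") for i in range(60, 100)]
)
sample_size = 10

# Group individuals into strata by education level.
strata = {}
for person in population:
    strata.setdefault(person[1], []).append(person)

# Draw from each stratum in proportion to its share of the population.
sample = []
for level, members in strata.items():
    k = round(sample_size * len(members) / len(population))
    sample.extend(random.sample(members, k))

print(len(sample), sum(1 for _, level in sample if level == "college"))
```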

22
Q

Systematic random sample

A

Choose every nth person from a list, beginning at a randomly chosen starting point.
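
As a sketch, picking every 10th name from a made-up roster of 100, with a random starting position inside the first interval. The roster and seed are illustrative assumptions.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

roster = [f"person_{i}" for i in range(100)]  # made-up list
step = 10  # the "n" in "every nth person"

# A random start within the first interval, then every nth person.
start = random.randrange(step)
sample = roster[start::step]

print(start, sample)
```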

23
Q

Cluster sample

A

Used when natural groups are present in a population. The whole population is subdivided into clusters, or groups; a random sample of clusters is then selected, and all individuals (or a random sample of them) within the selected clusters are surveyed.
Example: each city is a cluster.
If you only have data about the groups themselves and not about the individuals in them (for example, you only know where the individuals are located), a cluster sample is the practical choice.
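
A minimal sketch of one-stage cluster sampling, assuming made-up cities as clusters: two cities are chosen at random and everyone in them is included. City names, residents, and the seed are hypothetical.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

# Made-up clusters: each city with its residents.
clusters = {
    "city_a": ["a1", "a2", "a3"],
    "city_b": ["b1", "b2"],
    "city_c": ["c1", "c2", "c3", "c4"],
    "city_d": ["d1", "d2", "d3"],
}

# Randomly select whole clusters, not individuals.
chosen = random.sample(sorted(clusters), 2)

# Everyone in a chosen cluster is part of the sample.
sample = [person for city in chosen for person in clusters[city]]

print(chosen, sample)
```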