Qualitative and Quantitative Flashcards
3 Steps in the Statistical Process
1) Collect Data
2) Describe & Summarize the Distribution
3) Interpret - draw general conclusions about the population on the basis of the sample
Nominal Data
Mutually exclusive groups that lack intrinsic order.
Zoning classification, social security numbers, sex.
Ordinal Data
Ordered values implying a ranking of observations. The numeric values themselves are meaningless - only the rank is important.
Letter grades, response scales on a survey 1-5, suitability for development
Interval data
Data with ordered relationship where the difference between scales has meaning.
Temperature. The difference between 40 and 30 degrees is the same as between 30 and 20 degrees, but 20 degrees is not twice as cold as 40 degrees.
Ratio Data
Gold standard of measurement. Absolute and relative difference have meaning.
Distance measurement. The difference between 40 and 30 miles is the same as between 30 and 20 miles, and 40 miles is twice as far as 20 miles.
Quantitative Variables
Variables where numerical value is meaningful.
Interval or ratio measurement.
HH income, level of pollution in river
Qualitative Variables
Variables where numerical value is not meaningful.
Nominal/Ordinal measurement.
Zoning classification
Continuous Variables
Infinite number of values.
Positive & negative.
Most measurements in physical sciences yield continuous variables.
Discrete variables
A countable number of distinct values.
Accidents per month - can’t be negative.
Binary/dichotomous variables
Special case of discrete variables which can only take on two values - 0/1 typically.
Descriptive Statistics
Describe the characteristics of the distribution of values in a population or sample.
Ex: on average, AICP test takers in 2018 are 30 years old
Inferential Statistics
Use probability to determine characteristics of a population based on a sample.
Distribution
the overall shape of observed data.
Ordered table, or histogram, or density plot
Normal or Gaussian Distribution
the bell curve.
Distribution is symmetric. The spread around the mean can be related to the proportion of observations.
More specifically, approximately 95% of the observations that follow a normal distribution are within two standard deviations of the mean
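The two-standard-deviation rule can be checked with a quick simulation (a sketch assuming NumPy; the mean, standard deviation, and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(seed=42)
# Draw 100,000 values from a normal distribution with mean 50 and sd 10.
sample = rng.normal(loc=50, scale=10, size=100_000)

# Share of observations within two standard deviations of the mean.
within_2sd = np.mean(np.abs(sample - 50) <= 2 * 10)
print(round(within_2sd, 3))  # roughly 0.95
```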
Symmetric distribution
equal number of observations are below and above the mean
Central tendency
Typical or representative value for the distribution of observed values
Coefficient of Variation
the relative dispersion from the mean by taking the standard deviation and dividing by the mean.
z-score
This is a standardization of the original variable by subtracting the mean and dividing by the standard deviation.
The z-score in effect transforms the original measure into standard deviation units.
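The transformation can be sketched in a few lines (assuming NumPy; the scores are made up for illustration):

```python
import numpy as np

scores = np.array([62.0, 70.0, 74.0, 80.0, 94.0])
mean = scores.mean()        # 76.0
sd = scores.std(ddof=0)     # population standard deviation

# Each z-score expresses the original value in standard-deviation units.
z = (scores - mean) / sd
print(z.round(2))
```

By construction, the z-scores have mean 0 and standard deviation 1.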
Inter-Quartile Range (IQR)
Alternative measure of dispersion.
The difference between the third quartile (75th percentile) and the first quartile (25th percentile), covering the middle 50% of observations.
This is visualized in a box plot (also called box and whiskers plot).
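A minimal sketch of the IQR and the common 1.5 × IQR outlier rule used in box plots (assuming NumPy; the data are hypothetical):

```python
import numpy as np

data = np.array([3, 5, 7, 8, 9, 11, 13, 15, 40])  # 40 looks like an outlier

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1  # spread of the middle 50% of the observations

# Box-plot convention: points beyond 1.5 * IQR from the quartiles are outliers.
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = data[(data < lower) | (data > upper)]
print(iqr, outliers)  # 6.0 [40]
```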
Confidence Interval
This constitutes a range around the sample statistic that contains the population parameter with a given level of confidence, typically 95% or 99%.
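A sketch of a 95% confidence interval for a sample mean using the t distribution (assuming NumPy and SciPy; the sample is simulated):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=7)
sample = rng.normal(loc=100, scale=15, size=50)  # simulated sample

mean = sample.mean()
sem = stats.sem(sample)  # standard error of the mean

# 95% confidence interval using the t distribution, n - 1 degrees of freedom.
low, high = stats.t.interval(0.95, df=len(sample) - 1, loc=mean, scale=sem)
print(round(low, 1), round(high, 1))
```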
Standard Deviation
a measure of how much the data in a collection are scattered around the mean. A low standard deviation means that the data are tightly clustered; a high standard deviation means that they are widely scattered. There are two common formulas: the population formula divides the sum of squared deviations by n, while the sample formula divides by n - 1, so they give slightly different results.
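A sketch contrasting the population formula (divide by n) and the sample formula (divide by n - 1), which NumPy exposes via the `ddof` argument (the data are illustrative):

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

pop_sd = np.std(data, ddof=0)     # population formula: divide by n
sample_sd = np.std(data, ddof=1)  # sample formula: divide by n - 1
print(pop_sd, round(sample_sd, 3))  # 2.0 2.138
```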
Variance
the square of the standard deviation. It is the expectation of the squared deviations from the mean. The formula is the same as that for the standard deviation except that the "s" symbol is squared and no square root is taken.
Coefficient of Variation
unlike the other three measures of dispersion, it captures relative dispersion from the mean rather than absolute dispersion. It is simply the standard deviation divided by the mean (CV = s / x̄).
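A sketch comparing two hypothetical income samples with the same mean but very different relative dispersion (assuming NumPy):

```python
import numpy as np

# Two hypothetical income samples (in $1,000s) with the same mean.
city_a = np.array([40.0, 45.0, 50.0, 55.0, 60.0])
city_b = np.array([10.0, 30.0, 50.0, 70.0, 90.0])

def cv(x):
    """Coefficient of variation: standard deviation divided by the mean."""
    return x.std(ddof=0) / x.mean()

# Same mean (50), but city_b's incomes are far more dispersed relative to it.
print(round(cv(city_a), 3), round(cv(city_b), 3))  # 0.141 0.566
```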
Hypothesis Testing
is conducted to determine outcomes based on the scientific method. First, the statistician must declare the predicted (desired) outcome, then must also identify and describe all possible outcomes.
• The Research Hypothesis (designated H1) is a statement that describes the interrelationships between different characteristics. It is what the researcher is seeking to prove through the analysis.
• The Null Hypothesis (designated H0) is the opposite of the research hypothesis. It is what the researcher is seeking to prove wrong so that the research hypothesis can be assumed to be correct by implication.
• Remember that it is easier to prove something wrong than to prove it correct (statistically speaking), so the null hypothesis is used.
• There are two kinds of error a researcher can make in hypothesis testing. The first is a Type 1 Error, where H0 is rejected even though it is true. The second is a Type 2 Error, where H0 is accepted even though it is false.
t-test
allows us to compare the means of two groups and determine how likely it is that the difference between the two means occurred by chance.
correlated t-test
concerned with the difference between the average scores of a single sample of individuals who are assessed at two different times ("before" vs. "after") or on two different measures. The measures must be correlated (co-related), so it can also compare average scores of samples of individuals who are paired in some way (e.g., parent-child).
independent t-test
compares the averages of two samples that are selected independently of each other. Independent t-tests come in “equal variance” and “unequal variance” flavors, but these go beyond the scope of this work.
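Both flavors can be sketched with SciPy (the data are simulated for illustration; `ttest_rel` handles the correlated/paired case, `ttest_ind` the independent case):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)

# Correlated (paired) t-test: the same 30 individuals measured twice.
before = rng.normal(loc=70, scale=8, size=30)
after = before + rng.normal(loc=3, scale=2, size=30)  # built-in average gain of 3
t_paired, p_paired = stats.ttest_rel(before, after)

# Independent t-test: two samples drawn separately from each other.
group_a = rng.normal(loc=70, scale=8, size=30)
group_b = rng.normal(loc=75, scale=8, size=30)
t_ind, p_ind = stats.ttest_ind(group_a, group_b)

print(round(p_paired, 4), round(p_ind, 4))
```

A small p-value means the observed difference in means is unlikely to have occurred by chance.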
ANOVA
an extension of the t-test. It stands for Analysis of Variance. It allows a composite view of the data: by placing variable x into groups, a better understanding of variable y can be gained.
o ANOVA identifies the relationship between two variables.
o The x variable is always nominal
o The y variable is always interval
• Mathematically, a line is expressed as y = mx + b
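A one-way ANOVA sketch with SciPy's `f_oneway` (the zones and commute times are hypothetical; x is the nominal grouping, y the interval measurement):

```python
import numpy as np
from scipy import stats

# Hypothetical commute times (minutes), grouped by a nominal x variable (zone).
zone_a = np.array([22.0, 25.0, 27.0, 24.0, 26.0])
zone_b = np.array([30.0, 33.0, 31.0, 35.0, 32.0])
zone_c = np.array([41.0, 38.0, 40.0, 43.0, 39.0])

# One-way ANOVA: does the interval y (commute time) differ across the groups?
f_stat, p_value = stats.f_oneway(zone_a, zone_b, zone_c)
print(round(f_stat, 1), p_value < 0.05)
```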
Correlation
measures the strength of the relationship between variables or the degree to which two variables are correlated (co-related). It is used to demonstrate relationships between situations and/or actors, even disparate ones (think apples and oranges). The test is linear.
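A sketch of the Pearson correlation coefficient with NumPy (the tract data are hypothetical):

```python
import numpy as np

# Hypothetical tract data: median household income vs. median home value.
income = np.array([40.0, 48.0, 55.0, 61.0, 70.0, 82.0])           # $1,000s
home_value = np.array([150.0, 180.0, 200.0, 230.0, 260.0, 310.0])  # $1,000s

# Pearson's r ranges from -1 to +1; values near +1 mean a strong linear link.
r = np.corrcoef(income, home_value)[0, 1]
print(round(r, 3))
```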
Regression
a statistical test of the effect one variable (condition/actor) has on another while holding all other conditions constant. This test is also linear. If there is no correlation, there is no need to run a regression. Regression allows us to predict the value of one variable given the value of the other, or to explore the relationships between variables.
o There is always one dependent variable (y) in regression.
o In simple regression, there is only one independent variable. The formula for simple regression is y = b0 + b1x1.
o In multiple regression, there are two or more independent variables. Multiple regression simply extends simple regression: y = b0 + b1x1 + b2x2 + … + bnxn.
o Regression answers one or more of these questions:
. What is the association between x and y?
. How can changes in y be explained by changes in x?
. What are the functional relationships between y and x?
o Beware of false relationships! Correlation and regression can be used to “prove” that fire trucks cause house fires (if there is a house fire, there are likely fire trucks).
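A simple-regression sketch fitting y = b0 + b1x1 by least squares (assuming NumPy; the data are hypothetical):

```python
import numpy as np

# Hypothetical simple regression: predict home value (y) from income (x).
x = np.array([40.0, 48.0, 55.0, 61.0, 70.0, 82.0])        # $1,000s
y = np.array([150.0, 180.0, 200.0, 230.0, 260.0, 310.0])  # $1,000s

# Least-squares fit of y = b0 + b1*x; np.polyfit returns [b1, b0].
b1, b0 = np.polyfit(x, y, deg=1)
predicted = b0 + b1 * 90.0  # predicted home value at income = 90
print(round(b1, 2), round(predicted, 1))
```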