STAT Definitions (EDUS 608) Flashcards
Variable
a characteristic that can vary in value among subjects in a sample or a population.
Categorical Variable (Qualitative)
The measurement scale is a set of categories.
Examples:
Racial-ethnic group (white, black, Hispanic)
Political party identification (Dem., Repub., Indep.)
Vegetarian? (yes, no)
Happiness (very happy, pretty happy, not too happy)
Gender
Religious affiliation
Major
Quantitative Variable
possible values differ in magnitude.
Examples:
Age, height, weight, BMI
Annual income
GPA
Time spent on Internet yesterday
Reaction time to a stimulus (e.g., cell phone while driving in experiment)
Number of “life events” in past year
Nominal Scale
Used to measure CATEGORICAL VARIABLES by using unordered categories.
Example:
Preference for President, Race, Gender,
Religious affiliation, Major
Opinion items (favor vs. oppose, yes vs. no)
Ordinal Scale
Used to measure CATEGORICAL VARIABLES by using ordered categories.
Political ideology (very liberal, liberal,
moderate, conservative, very conservative)
Anxiety, stress, self esteem (high, medium, low)
Mental impairment (none, mild, moderate, severe)
Government spending on environment (up, same,
down)
Interval Scale
Used to measure QUANTITATIVE VARIABLES by using numerical values.
The differences between values are consistent:
-Moving from $20,000 to $21,000 is the same
magnitude as moving from $50,000 to $51,000
-Moving from 90 degrees F to 95 degrees F is the
same as moving from 70 to 75
Note: In practice, ordinal categorical variables are often
treated as interval by assigning scores
(e.g., grades A, B, C, D, E form an ordinal scale, but are
treated as interval when scores 4, 3, 2, 1, 0 are assigned to
construct a GPA)
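A minimal sketch of the scoring idea in the note above, assuming the A=4 through E=0 mapping from the card. The function name `gpa` and the sample grades are made up for illustration.

```python
# Score mapping from the card's example: A=4, B=3, C=2, D=1, E=0.
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}

def gpa(grades):
    """Average the numeric scores assigned to ordinal letter grades."""
    return sum(GRADE_POINTS[g] for g in grades) / len(grades)

# Hypothetical transcript: (4 + 3 + 3 + 2) / 4 = 3.0
example_gpa = gpa(["A", "B", "B", "C"])
```

Averaging only makes sense once scores are assigned; the letters alone give an order, not a distance.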
What is Descriptive Statistics?
- Describing data with tables and graphs (quantitative or categorical variables)
- Numerical descriptions of center (mean/median) and variability (standard deviation/variance) (quantitative variables)
Histogram
Bar graph of the frequencies or percentages of a quantitative variable's values (or intervals of values).
Skewed right
Long tail on the right. Mean is to the RIGHT of the Median
Skewed left
Long tail on the left. Mean is LEFT of the Median.
Bimodal
The distribution has two distinct peaks (two modes). The mean and median are the same only when the two peaks are symmetric about the center.
Bell-shaped
Symmetric with a single peak. Mean, median, and mode are (approximately) the same.
Median
Middle measurement of ordered sample.
Mean
The average, used as a measure of the central tendency of the data. It is found by adding all the data points and then dividing the total by the number of points.
Mean vs. Median (Distribution)
The mean is sensitive to “outliers” (the median is often preferred for highly skewed distributions).
When the distribution is symmetric, mildly skewed, or discrete with few values, the mean is preferred because it uses the numerical values of the observations.
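A small illustration of the outlier point above, using Python's standard `statistics` module. The income values are made up for the example.

```python
import statistics

incomes = [30, 32, 35, 38, 40]   # made-up values, in $1000s
with_outlier = incomes + [500]   # add one extreme observation

mean_before = statistics.mean(incomes)          # 35
median_before = statistics.median(incomes)      # 35
mean_after = statistics.mean(with_outlier)      # 112.5
median_after = statistics.median(with_outlier)  # 36.5
```

One extreme value pulls the mean far to the right while the median barely moves, which is also why the mean sits to the right of the median in a right-skewed distribution.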
Range
Difference between largest and smallest observations (highly sensitive to outliers).
Standard Deviation
A “typical” distance from the mean. It is the square root of the variance.
Variance
Measures how far a data set is spread out. It comes from calculating the average of the squared differences from the mean.
Deviation
The difference of an observation’s value from the mean.
Properties of Standard Deviation
- s ≥ 0, and s = 0 only if all observations are equal
- s increases with the amount of variation around the mean
- like mean, affected by outliers
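The deviation, variance, and standard deviation cards can be worked through by hand in a few lines. This is a sketch on made-up data. Note one assumption: the variance card describes a plain average of squared deviations (divide by n), whereas the sample standard deviation s is usually computed with n - 1, so both are shown.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]            # made-up sample
mean = statistics.mean(data)                # 5
deviations = [x - mean for x in data]       # each value minus the mean
sum_sq = sum(d ** 2 for d in deviations)    # 32

pop_variance = sum_sq / len(data)           # divide by n
sample_variance = sum_sq / (len(data) - 1)  # divide by n - 1
pop_sd = math.sqrt(pop_variance)            # square root of the variance
s = math.sqrt(sample_variance)

# If every observation is equal, all deviations are 0, so the SD is 0.
constant_sd = statistics.pstdev([3, 3, 3, 3])
```

The deviations always sum to 0, which is why they are squared before averaging.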
Empirical Rule
If distribution is approximately bell-shaped:
• about 68% of data within 1 standard dev. of mean
• about 95% of data within 2 standard dev. of mean
• all or nearly all data within 3 standard dev. of mean
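The percentages in the Empirical Rule can be checked against an exact normal distribution. A minimal sketch using `statistics.NormalDist` (Python 3.8+); the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

within_1 = share_within(1)  # about 0.683
within_2 = share_within(2)  # about 0.954
within_3 = share_within(3)  # about 0.997
```

Real data that are only approximately bell-shaped will match these figures only roughly.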
Point Estimation
Estimating parameters (mean, median, standard dev.)
Inference
Testing theories about parameters.
Hypothesis Testing
Creating models based on hypotheses and testing them with data to see if they are consistent with the data.
Null Hypothesis H0
– There is no effect.
– E.g. contestants on “Survivor” and members of the public will not differ in their scores on personality disorder questionnaires
It is called the “null” because it is frequently,
though not always, used to say that something
is 0.
• Examples of null hypotheses:
– H0: μ=0
– H0: There are no differences in math achievement
by SES level.
– H0: “I don’t have the flu”
The alternative hypothesis, HA (or H1)
– There is an effect.
– E.g. contestants on “Survivor” will score higher on personality disorder questionnaires than members of the public
• Typically suggests that an effect exists, or
(in this class) is statistically significant
• Examples of alternative hypotheses
corresponding to the previous examples:
– Ha: μ≠0
– Ha: There are differences in math achievement by SES level.
– Ha: “I have the flu”
Null versus Alternative
- The null and the alternative are mutually exclusive: they can’t both be true
- Using statistics, we have strong tools to assess the probability that one is correct and the other isn’t…
- Based on the results you obtain, you will either reject the null hypothesis (you have evidence an effect exists), or you will fail to reject the null hypothesis (you don’t have enough evidence that an effect exists)
- Instead of saying “fail to reject” the null hypothesis, some disciplines use “retain” the null hypothesis
Type I Error
Aka. False Positive.
Reject the null when it is true.
• My test says that μ≠0, but actually μ=0
• My test says that achievement differs by SES, but it actually
doesn’t
• My swab results say “I have the flu”, but I actually don’t
– i.e., I got a false positive
– In medical tests, testing “positive” means rejecting the null
Type II Error
Aka. False Negative.
Fail to reject the null when it’s false.
• My test says that I can’t reject the assertion that μ=0, but in reality μ≠0
• My test says achievement doesn’t differ by SES, but it actually does
• My swab results say “I don’t have the flu”, but I actually do
– i.e., I got a false negative
– In medical tests, testing “negative” means you do not reject the null
Alpha (α)
- the proportion of the times I can expect to reject the null when it’s true in repeated randomly drawn samples of the same sample size from the population
- aka the probability that I will make a type I error (with repeated sampling)
- This is also called the “significance level”
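The "repeated randomly drawn samples" idea can be simulated. This is a sketch under stated assumptions, not a prescribed method: it uses a two-sided z-test with a known standard deviation of 1, a sample size of 25, and a null hypothesis (μ = 0) that is actually true, so every rejection is a Type I error.

```python
import random
from statistics import NormalDist

random.seed(0)      # fixed seed so the run is repeatable
alpha = 0.05
n = 25              # assumed sample size
reps = 4000         # number of repeated samples
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

rejections = 0
for _ in range(reps):
    # The null is true here: every sample comes from a population with mu = 0.
    sample = [random.gauss(0, 1) for _ in range(n)]
    sample_mean = sum(sample) / n
    z_stat = sample_mean / (1 / n ** 0.5)  # known sigma = 1
    if abs(z_stat) > z_crit:
        rejections += 1

# Share of samples in which the true null was rejected.
type_i_rate = rejections / reps
```

With enough repetitions, `type_i_rate` should land close to the chosen alpha.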
How Do We Choose Alpha?
• There are conventional choices for α
– The most common choice for α is .05
– Also common are .1 and .01
• All these choices are arbitrary, but they are attempting to be conservative
The smaller the alpha level, the smaller the area where you would reject the null hypothesis. So if you have a tiny area, there’s more of a chance that you will NOT reject the null, when in fact you should. This is a Type II error.
In other words, the more you try to avoid a Type I error, the more likely a Type II error becomes. By convention, an alpha level of 5% is treated as a reasonable balance between these two risks.
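The trade-off between alpha and Type II error can be made concrete for one simple case. A sketch assuming a two-sided z-test where the true mean lies 2.5 standard errors away from the null value; the value 2.5 and the names `shift` and `type_ii_rate` are hypothetical choices for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()
shift = 2.5  # hypothetical true effect, in standard-error units

def type_ii_rate(alpha):
    """Probability of failing to reject a false null in a two-sided z-test."""
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # The test statistic is centered at `shift`; we fail to reject
    # when it falls between -z_crit and +z_crit.
    return std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)

# Type II error rate for the three conventional alpha levels.
betas = {a: type_ii_rate(a) for a in (0.10, 0.05, 0.01)}
```

As alpha shrinks from .10 to .01, the Type II error rate in `betas` grows, holding the effect size and sample size fixed.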