Text Ch.14+15+16 Flashcards
Linking Statistics to Arguments in Political Science Research
Common arguments/claims, Examples, and Some Questions to Ask
- Descriptive claims
  - E.g., percentages, frequencies, averages
  - Are the sample data representative of the population?
- Claims of differences between groups
  - E.g., young versus old; experimental group versus control group
  - How large are the differences?
  - Are the differences due to chance?
- Claims of relationships between variables
  - E.g., correlations
  - How strong is the relationship?
  - Is the relationship due to chance?
  - Is it a causal relationship?
Descriptive Statistics
Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures.
• Measures of central tendency
• Measures of dispersion/variation
Central tendency measures for each level of measurement (nominal, ordinal, interval)
Nominal: Mode
Ordinal: Mode, Median
Interval: Mode, Median, Mean
Mode
The mode is the value that occurs most frequently.
• Not every sample has a distinct mode. Sometimes a sample is bimodal (two modes) or multimodal (three or more modes), and sometimes there is no mode at all.
• The mode is the only measure of central tendency we can use for nominal data.
Median
The sample median is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude. If there is no single middle value (an even number of cases), we take the average of the two middle values. The median is not affected by extreme values.
Mean
The sample mean is the arithmetic average of the data (the sum of the values divided by the number of values) and is the measure of central tendency we use most often.
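As a rough illustration of the three measures, the sketch below uses Python's standard `statistics` module. The grades and party labels are invented for the example and are not taken from the course material.

```python
from statistics import mean, median, multimode

# Hypothetical exam grades (interval-level data), invented for illustration.
grades = [50, 60, 60, 70, 94]

grade_modes = multimode(grades)   # most frequent value(s), returned as a list
grade_median = median(grades)     # middle value of the sorted data
grade_mean = mean(grades)         # sum of the values divided by their count

# Replacing the top grade with an extreme value moves the mean, not the median.
extreme = [50, 60, 60, 70, 940]
extreme_median = median(extreme)
extreme_mean = mean(extreme)

# Hypothetical party preferences (nominal-level data): only the mode applies.
parties = ["A", "B", "A", "C", "B"]
party_modes = multimode(parties)  # two values tie, so this sample is bimodal

print(grade_modes, grade_median, grade_mean)
print(extreme_median, extreme_mean)
print(party_modes)
```

With the extreme grade the median stays at 60 while the mean jumps from 66.8 to 236, which is the sense in which the median is not affected by extreme values.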
Standard Deviation
• Deviation: how far an individual score is from the mean
• Standard deviation: on average, how far scores are from the mean
• Sensitive to extreme scores
- Never negative; zero only when all scores are identical.
- Always in the same units as the observations in the sample.
- Affected by outliers.
- Affected by sample size.
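The properties above can be checked with a small sketch. It assumes the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes; the cards do not say which denominator the course uses, and the scores are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical scores, invented for illustration.
scores = [50, 60, 60, 70, 94]

m = mean(scores)
deviations = [x - m for x in scores]  # distance of each score from the mean

# Sample standard deviation by hand (assumption: n - 1 in the denominator).
manual_sd = sqrt(sum(d * d for d in deviations) / (len(scores) - 1))
library_sd = stdev(scores)            # same quantity from the standard library

# Adding one extreme score inflates the standard deviation.
sd_with_outlier = stdev(scores + [200])

# Identical scores have no spread, so the standard deviation is zero.
sd_identical = stdev([70, 70, 70])

print(manual_sd, library_sd, sd_with_outlier, sd_identical)
```

The deviations themselves sum to zero, which is why they are squared before averaging.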
Variation Ratio
• Only option available for nominal level data; can also be used for ordinal and interval level data
• Reflects number of cases that are NOT in the modal category
• Higher ratio: suggests data are more dispersed → mode may be less representative of the data
• Lower ratio: suggests data are less dispersed → mode may be more representative of the data
• Calculated as:
Variation ratio = 1 - (number of cases in modal category / total number of cases)
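A minimal sketch of the calculation, taking the variation ratio as one minus the share of cases in the modal category. The function name `variation_ratio` and the vote data are hypothetical.

```python
from collections import Counter

def variation_ratio(values):
    """Share of cases that are NOT in the modal category."""
    counts = Counter(values)
    modal_count = max(counts.values())  # number of cases in the modal category
    return 1 - modal_count / len(values)

# Hypothetical vote choices (nominal-level data), invented for illustration.
concentrated = ["A", "A", "A", "A", "B"]  # 4 of 5 cases in the mode
dispersed = ["A", "A", "B", "C", "D"]     # 2 of 5 cases in the mode

vr_concentrated = variation_ratio(concentrated)  # 1 - 4/5
vr_dispersed = variation_ratio(dispersed)        # 1 - 2/5

print(vr_concentrated, vr_dispersed)
```

The second sample has the higher ratio, so its mode says less about a typical case.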
Range
• Can be used for ordinal and interval level data
• Represents the difference between the highest and lowest scores (two extreme values)
Hypothetical example A:
Highest grade: 94%
Lowest grade: 10%
Range = 94 - 10 = 84
Hypothetical example B:
Highest grade: 70%
Lowest grade: 50%
Range = 70 - 50 = 20
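The two hypothetical examples can be reproduced in a few lines. Only the highest and lowest grades come from the cards; the middle grades and the helper name `score_range` are invented for illustration.

```python
def score_range(scores):
    """Difference between the highest and lowest scores."""
    return max(scores) - min(scores)

# Highest and lowest grades follow examples A and B; the middle grades are made up.
class_a = [10, 55, 70, 94]
class_b = [50, 58, 65, 70]

range_a = score_range(class_a)  # 94 - 10
range_b = score_range(class_b)  # 70 - 50

print(range_a, range_b)
```

Because only the two extreme values enter the calculation, changing any middle grade leaves the range unchanged.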
Null and Alternative Hypothesis
Null hypothesis (H0)
• Relationship observed in data is due to chance
• “Nothing going on”
Alternative hypothesis (Ha)
• Relationship observed in data is not due to chance
• “Something going on”
• Burden of proof is on Ha: must gather evidence
• Reject/fail to reject H0
Describe Type I and Type II errors
Type I: false positive
- Researcher claims a relationship (“reject H0”), but the relationship does not exist
• Lower confidence levels (e.g., 90%, 95%) make it easier to reject the null, thus making Type I errors more likely
Type II: false negative
- Researcher claims no relationship (“fail to reject H0”), but a relationship exists
• Higher confidence levels (e.g., 99%, 99.5%) make it harder to reject the null, thus making Type II errors more likely
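One way to see why lower confidence levels produce more Type I errors is a small simulation in which the null hypothesis is true by construction. This is only a sketch under stated assumptions: it uses a two-sided z-test with a known standard deviation of 1, and the function name, sample size, trial count, and seed are all arbitrary choices, not part of the course material.

```python
import random
from math import sqrt
from statistics import NormalDist

def false_positive_rate(confidence_level, trials=2000, n=30, seed=1):
    """Share of trials that reject a TRUE null hypothesis (Type I errors)."""
    rng = random.Random(seed)
    alpha = 1 - confidence_level
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # The null is true by construction: the population mean really is 0.
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = (sum(sample) / n) * sqrt(n)  # z statistic, known sd of 1 assumed
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1              # a false positive
    return rejections / trials

rate_90 = false_positive_rate(0.90)
rate_99 = false_positive_rate(0.99)

print(rate_90, rate_99)
```

The false positive rate should land near 10% at the 90% confidence level and near 1% at the 99% level, since every rejection here is a Type I error.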
SPSS
Computer software for calculating statistics and testing their significance
- The researcher, not the software, chooses the confidence level
Measures of Central Tendency (3) vs. Measures of Dispersion (3)
Central tendency: Mode, Median, Mean
Dispersion: Variation Ratio, Range, Standard Deviation
Confidence level (equal to 1 minus the alpha level)
• Probability that the sample statistic is an accurate estimate of the population parameter, i.e., that the population parameter lies within an estimated range of values (known as the confidence interval)
• E.g., if the sample statistic is 45% and the margin of error is +/- 3 percentage points, then the confidence interval is 42% to 48%
- The confidence interval is the range within which the population parameter should lie if the sample statistic accurately represents it
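The arithmetic in the example can be sketched as follows, treating the +/- 3% as the margin on either side of the sample statistic. The function name `confidence_interval` is hypothetical, and values are in percentage points.

```python
def confidence_interval(statistic, margin):
    """Lower and upper bounds: the sample statistic minus and plus the margin."""
    return statistic - margin, statistic + margin

# Values from the example above, in percentage points.
lower, upper = confidence_interval(45, 3)

print(lower, upper)
```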
higher confidence level = _____ range
lower confidence level = _____ range
wider
narrower
Higher confidence level (99%)
- wider confidence interval
- more accuracy (more likely to be correct), less precision
Lower confidence level (95%)
- narrower confidence interval
- less accuracy (less likely to be correct), more precision
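The trade-off between accuracy and precision can be made concrete with a sketch that computes the margin on each side of a sample proportion at two confidence levels. It assumes the usual normal approximation for a proportion; the sample proportion of 45%, the sample size of 1,000, and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(p, n, confidence_level):
    """Margin for a sample proportion p (normal approximation assumed)."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return z * sqrt(p * (1 - p) / n)

# Hypothetical survey: 45% support in a sample of 1,000 respondents.
moe_95 = margin_of_error(0.45, 1000, 0.95)
moe_99 = margin_of_error(0.45, 1000, 0.99)

print(round(moe_95, 4), round(moe_99, 4))
```

With the same data, the 99% interval is roughly 4 percentage points on each side versus roughly 3 at 95%: more likely to contain the population parameter, but less precise.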
Match lower and higher confidence levels with Type I or Type II error
Lower confidence levels = Type I error
- Lower confidence levels (e.g., 90%, 95%) make it easier to reject the null, so a false positive is more likely
Higher confidence levels = Type II error
- Higher confidence levels (e.g., 99%, 99.5%) make it harder to reject the null, so a false negative is more likely
The choice of confidence level should reflect _________
sample size
Low confidence level and large sample size = greater risk of Type __ error
I
High confidence level and small sample size = greater risk of Type __ error
II
Statistical Significance
- Statistical significance is a yes/no question; a result cannot be ‘more’ or ‘less’ significant
- Selection of confidence level affects likelihood of something being found statistically significant
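Both points can be seen in a short sketch: the decision is a plain yes or no, and the same result can flip depending on the confidence level chosen. The sketch assumes a two-sided z-test, which the cards do not specify; the test statistic of 2.1 and the function name `is_significant` are hypothetical.

```python
from statistics import NormalDist

def is_significant(z_statistic, confidence_level):
    """Yes/no decision: does |z| exceed the two-sided critical value?"""
    alpha = 1 - confidence_level
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return abs(z_statistic) > critical

# Hypothetical test statistic, invented for illustration.
z = 2.1
results = {level: is_significant(z, level) for level in (0.90, 0.95, 0.99)}

print(results)
```

The same statistic counts as significant at the 90% and 95% levels but not at the 99% level, so the researcher's choice of confidence level determines the answer.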