Text Ch.14+15+16 Flashcards
Linking Statistics to Arguments in Political Science Research
Common arguments/claims, Examples, and Some Questions to Ask
- Descriptive claims
- E.g., percentages, frequencies, averages
- Are the sample data representative of the population?
- Claims of differences between groups
- E.g., young versus old
- E.g., experimental group versus control group
- How large are the differences?
- Are the differences due to chance?
- Claims of relationships between variables
- E.g., correlations
- How strong is the relationship?
- Is the relationship due to chance?
- Is it a causal relationship?
Descriptive Statistics
Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample and the measures.
• Measures of central tendency
• Measures of dispersion/variation
Measures of central tendency available at each level of measurement
Nominal
Ordinal
Interval
Nominal: Mode
Ordinal: Mode, Median
Interval: Mode, Median, Mean
Mode
The mode is the value that occurs most frequently.
• Not every sample has a single distinct mode. A sample can be bimodal (two modes), multimodal (three or more modes), or have no mode at all.
• The mode is the only measure of central tendency we can use for nominal data.
Median
The sample median is the middle value when the original data values are arranged in order of increasing (or decreasing) magnitude. If there isn’t one value in the middle we take the average of the two middle values. The median is not affected by extreme values.
Mean
The sample mean is the arithmetic average of the data and is the measure of central tendency we use most often.
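A minimal sketch of the three measures, using Python's standard `statistics` module. The grades are made up for illustration; the second list swaps in one extreme score to show that the median stays put while the mean moves.

```python
from statistics import mean, median, multimode

# Hypothetical exam grades (illustrative values only).
grades = [50, 60, 60, 70, 94]

modes = multimode(grades)   # list of the most frequent value(s)
middle = median(grades)     # middle value of the sorted data
average = mean(grades)      # sum of the values divided by their count

# Same data with the top grade replaced by an extreme score.
with_outlier = [50, 60, 60, 70, 940]
median_shift = median(with_outlier) - middle   # how far the median moved
mean_shift = mean(with_outlier) - average      # how far the mean moved

print("mode(s):", modes, "median:", middle, "mean:", average)
print("median shift:", median_shift, "mean shift:", mean_shift)
```

`multimode` returns a list, so a bimodal sample simply comes back with two entries.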
Standard Deviation
• Deviation: how far an individual score is from the mean
• Standard deviation: on average, how far scores are from the mean
• Sensitivity to extreme scores
- Never negative (zero only when every score is identical).
- Always in the same units as the observations in the sample.
- Affected by outliers.
- Affected by sample size.
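A small sketch of deviations and the standard deviation, assuming a made-up set of scores. It uses `pstdev` (divide by n) and `stdev` (divide by n - 1) from the standard library; which one a course or software package reports is an assumption to check.

```python
from statistics import mean, pstdev, stdev

# Hypothetical scores (illustrative values only).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(scores)
deviations = [x - m for x in scores]   # distance of each score from the mean

population_sd = pstdev(scores)   # square root of the mean squared deviation
sample_sd = stdev(scores)        # same idea, but divides by n - 1

print("mean:", m)
print("deviations:", deviations)
print("population SD:", population_sd, "sample SD:", round(sample_sd, 3))
```

The deviations always sum to zero, which is why they are squared before averaging.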
Variation Ratio
• Only option available for nominal level data; can also be used for ordinal and interval level data
• Reflects number of cases that are NOT in the modal category
• Higher ratio: suggests data are more dispersed → mode may be less representative of the data
• Lower ratio: suggests data are less dispersed → mode may be more representative of the data
• Calculated as:
Variation ratio = 1 - (number of cases in modal category / number of total cases)
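As a sketch, the variation ratio is one minus the share of cases in the modal category. The function name and the vote data below are hypothetical.

```python
from collections import Counter

def variation_ratio(values):
    """Share of cases that are NOT in the modal category."""
    counts = Counter(values)
    modal_count = max(counts.values())
    return 1 - modal_count / len(values)

# Hypothetical nominal data: party choice of 10 respondents.
votes = ["Liberal"] * 5 + ["Conservative"] * 3 + ["NDP"] * 2
vr = variation_ratio(votes)

print("variation ratio:", vr)
```

Here 5 of 10 cases fall in the modal category, so half of the cases lie outside it.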
Range
• Can be used for ordinal and interval level data
• Represents the difference between the highest and lowest scores (two extreme values)
Hypothetical example A:
Highest grade: 94%
Lowest grade: 10%
Range = 94 - 10 = 84
Hypothetical example B:
Highest grade: 70%
Lowest grade: 50%
Range = 70 - 50 = 20
Null and Alternative Hypothesis
Null hypothesis (H0)
• Relationship observed in data is due to chance
• “Nothing going on”
Alternative hypothesis (Ha)
• Relationship observed in data is not due to chance
• “Something going on”
• Burden of proof is on Ha: must gather evidence
• Reject/fail to reject H0
Describe Type I and Type II errors
Type I: false positive
- Relationship claimed by researcher (“reject H0”) but the relationship does not exist
• Lower confidence levels (e.g., 90%, 95%) make it easier to reject the null, thus making Type I errors more likely
Type II: false negative
- No relationship claimed by researcher (“fail to reject H0”) but the relationship exists
• Higher confidence levels (e.g., 99%, 99.5%) make it harder to reject the null, thus making Type II errors more likely
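A hedged sketch of how the chosen confidence level changes the decision. It assumes the usual rule of rejecting H0 when the p-value is below alpha = 1 - confidence level; the function name and the p-value of 0.03 are made up.

```python
def decide(p_value, confidence_level):
    """Assumed decision rule: reject H0 when p < alpha = 1 - confidence level."""
    alpha = 1 - confidence_level
    return "reject H0" if p_value < alpha else "fail to reject H0"

p = 0.03                  # hypothetical p-value from some test
at_95 = decide(p, 0.95)   # alpha = 0.05
at_99 = decide(p, 0.99)   # alpha = 0.01

print("95% confidence:", at_95)
print("99% confidence:", at_99)
```

The same result is rejected at 95% but not at 99%: the lower level risks a false positive, the higher level risks a false negative.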
SPSS
Statistical software used to compute statistics and test their statistical significance
- but the researcher chooses the confidence level
Measures of Central Tendency (3) vs. Measures of Dispersion (3)
Central tendency: Mode, Median, Mean
Dispersion: Variation Ratio, Range, Standard Deviation
Confidence level (equal to 1 - alpha, where alpha is the significance level)
• Probability that the sample statistic is an accurate estimate of the population parameter, and that the population parameter lies within an estimated range of values (known as the confidence interval)
• E.g., if the sample statistic is 45% and the margin of error is +/- 3 percentage points, then the confidence interval is 42% to 48%
- The confidence interval is the range around the sample statistic within which the population parameter is expected to lie
higher confidence level = _____ range
lower confidence level = _____ range
wider
narrower
Higher confidence level (99%)
- wider confidence interval
- more accuracy (more likely to be correct), less precision
Lower confidence level (95%)
- narrower confidence interval
- less accuracy (less likely to be correct), more precision
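A rough sketch of why the interval widens at higher confidence levels. It assumes the normal-approximation interval for a proportion, a sample statistic of 45%, and a hypothetical sample of 1,000 respondents; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(p_hat, n, confidence_level):
    """Normal-approximation confidence interval for a sample proportion."""
    # Critical z value for a two-sided interval at the given confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low95, high95 = proportion_ci(0.45, 1000, 0.95)
low99, high99 = proportion_ci(0.45, 1000, 0.99)

print(f"95%: {low95:.3f} to {high95:.3f}")
print(f"99%: {low99:.3f} to {high99:.3f}")
```

With these assumed numbers the 95% margin is roughly 3 percentage points, and the 99% interval is wider on both sides.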
Match lower and higher confidence levels with either Type I or Type II error
Lower confidence levels = Type I error
• Lower confidence levels (e.g., 90%, 95%) make it easier to reject the null, so there is a greater possibility of a false positive
Higher confidence levels = Type II error
• Higher confidence levels (e.g., 99%, 99.5%) make it harder to reject the null, so there is a greater possibility of a false negative
Choice of confidence level should take into account _________
sample size
Low confidence level and large sample size = higher risk of Type __ error
I
High confidence level and small sample size = higher risk of Type __ error
II
Statistical Significance
- Statistical significance is a yes/no question; a result cannot be ‘more’ or ‘less’ significant
- Selection of confidence level affects likelihood of something being found statistically significant
P-hacking
P-hacking: playing with models until statistically significant findings appear (e.g., dropping variables to obtain significant results)
HARKing
HARKing: hypothesizing after the results are known
p-values = ______ values
significance
Substantive significance
- A relationship or a statistic is substantively significant if it is theoretically important, that is, if it plays a role in elaborating, modifying, or rejecting your theory
- Findings need to be meaningful
- Does it have a real impact on theory? It should
- If statistics that do not meet the confidence criterion are included anyway, it is because the researcher thinks the finding is substantively significant despite not being statistically significant
T or F
Some substantively significant findings are not statistically significant
T
confounding variable
Confounding variable: extraneous variable that affects both of the correlated variables and makes it seem like there is a relationship between them.
Scatterplot (independent and dependent variables) for interval-level data
x-axis is _____
y-axis is _____
independent
dependent
Positive correlation:
IV and DV change in the same direction (e.g., an increase on the IV corresponds to an increase on the DV)
Negative correlation:
IV and DV change in opposite directions (e.g., an increase on the IV corresponds to a decrease on the DV)
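A minimal sketch of how the sign of a correlation coefficient reflects direction. Pearson's r is written out by hand here so the block is self-contained; the study-hours data are invented.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: covariation of x and y scaled by their spreads."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)

# Hypothetical data: hours studied (IV) against two different DVs.
hours = [1, 2, 3, 4, 5]
grade = [52, 60, 61, 70, 75]   # rises as hours rise
absences = [9, 7, 6, 4, 1]     # falls as hours rise

r_grade = pearson_r(hours, grade)
r_absences = pearson_r(hours, absences)

print("hours vs grade:", round(r_grade, 2))
print("hours vs absences:", round(r_absences, 2))
```

The first coefficient comes out positive and the second negative, matching the two definitions.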
Contingency Table: Independent and Dependent Variables for non-interval data
Row: _____
Column: _____
DV in row
IV in column
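A small sketch of a contingency table with the IV in the columns and the DV in the rows, using column percentages so the IV categories can be compared. The age and opinion data are hypothetical.

```python
from collections import Counter

# Hypothetical cases as (IV value, DV value) pairs.
cases = (
    [("young", "support")] * 6 + [("young", "oppose")] * 4
    + [("old", "support")] * 3 + [("old", "oppose")] * 7
)

cell_counts = Counter(cases)                     # count per (IV, DV) cell
column_totals = Counter(iv for iv, _ in cases)   # count per IV category

def column_percent(iv_value, dv_value):
    """Percentage of one IV category (column) that falls in one DV row."""
    return 100 * cell_counts[(iv_value, dv_value)] / column_totals[iv_value]

# One printed line per DV row; one percentage per IV column.
for dv_value in ("support", "oppose"):
    row = [f"{column_percent(iv, dv_value):.0f}%" for iv in ("young", "old")]
    print(dv_value, row)
```

Reading across a row compares the IV categories on the same DV outcome.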
Considerations on Bivariate Relationships (4)
- Is there a relationship?
- What is the direction of the relationship? (ordinal and interval level variables only)
- What is the strength of the relationship?
- Is the relationship statistically significant? (inferential statistics)
Perfect correlation
Perfect correlation: knowing the value on one variable always lets us know the value on the other
Interpreting Measures of Association
Nominal:
Ordinal and Interval:
Nominal Level:
Range 0 to 1
• 0 = no relationship
• 1 = perfect relationship
Ordinal and Interval Level:
Range -1 to +1
• -1 = perfect negative relationship
• 0 = no relationship
• +1 = perfect positive relationship
• -/+ indicates direction, not strength
___ or higher is usually a ‘strong’ relationship
0.5 or higher is usually a ‘strong’ relationship
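A sketch of reading a coefficient: the sign gives direction and the absolute value gives strength. The 0.5 cutoff is the rule of thumb from these notes; the function name and the "weaker" label are hypothetical.

```python
def interpret(coefficient, strong_cutoff=0.5):
    """Return (direction, strength) for a measure of association."""
    if coefficient > 0:
        direction = "positive"
    elif coefficient < 0:
        direction = "negative"
    else:
        direction = "none"
    # Strength ignores the sign: -0.7 is as strong as +0.7.
    strength = "strong" if abs(coefficient) >= strong_cutoff else "weaker"
    return direction, strength

print(interpret(-0.7))
print(interpret(0.2))
```

For nominal-level measures (range 0 to 1) only the strength part applies, since there is no direction to report.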
Researcher must select the appropriate measure of association for nominal, ordinal, and interval data based on the lowest level of measurement involved
Nominal:
Ordinal:
Interval:
1. Nominal level
- nominal-nominal
- nominal-ordinal
- nominal-interval
Measures: Cramer’s V or Lambda
- Cramer’s V tends to overestimate strength
- Lambda can underestimate strength
2. Ordinal level
- ordinal-ordinal
- ordinal-interval
Measures: Gamma or Tau-b or Tau-c
- Gamma can overestimate strength
- Tau-b only for square tables (e.g., 3x3)
- Tau-c only for rectangular tables (e.g., 3x4)
3. Interval level
- interval-interval
Measures: Pearson’s r or Spearman’s rho
- Pearson’s r used for linear relationships
- Spearman’s rho used for non-linear (but still monotonic) relationships
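A hedged sketch of the Pearson versus Spearman distinction. Spearman's rho is computed here as Pearson's r on the ranks, with a simple ranking helper that assumes no tied values. The data (y equal to x cubed) are made up to give a relationship that always rises but is not a straight line.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r, written out so the block is self-contained."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mx) ** 2 for x in xs))
    spread_y = sqrt(sum((y - my) ** 2 for y in ys))
    return covariation / (spread_x * spread_y)

def ranks(values):
    """Rank from 1 (smallest) upward. Assumes there are no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's rho as Pearson's r applied to the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]   # always increasing, but curved

print("Pearson's r:", round(pearson_r(x, y), 3))
print("Spearman's rho:", round(spearman_rho(x, y), 3))
```

Spearman's rho treats the curved relationship as perfect because the rank order matches exactly, while Pearson's r falls short of 1.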
Pearson’s r used for..
Pearson’s r used for linear relationships (interval)
Spearman’s rho used for..
Spearman’s rho used for non-linear relationships (interval)
Gamma can be used for..
Gamma can overestimate strength (ordinal)
Tau-b can be used for..
Tau-b only for square tables (e.g., 3x3) (ordinal)
Tau-c can be used for..
Tau-c only for rectangular tables (e.g., 3x4) (ordinal)
Cramer’s V can be used for..
Cramer’s V tends to overestimate strength (nominal)
Lambda can be used for..
Lambda can underestimate strength (nominal)
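A rough sketch of the two nominal-level measures, computed by hand on a hypothetical 2x2 table with DV categories as rows and IV categories as columns. The formulas are the standard chi-square based Cramer's V and Goodman and Kruskal's lambda; the function names are illustrative, and the code assumes no row or column is empty.

```python
from math import sqrt

def cramers_v(table):
    """Cramer's V from a table of counts (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (observed - expected) ** 2 / expected
    smaller_dim = min(len(table), len(table[0]))
    return sqrt(chi_square / (n * (smaller_dim - 1)))

def lambda_measure(table):
    """Lambda: proportional reduction in error when predicting the DV (rows)."""
    n = sum(sum(row) for row in table)
    errors_without_iv = n - max(sum(row) for row in table)
    errors_with_iv = sum(sum(col) - max(col) for col in zip(*table))
    return (errors_without_iv - errors_with_iv) / errors_without_iv

# Hypothetical counts: rows = support / oppose, columns = young / old.
table = [[6, 3],
         [4, 7]]

v = cramers_v(table)
lam = lambda_measure(table)
print("Cramer's V:", round(v, 3), "Lambda:", round(lam, 3))
```

On this made-up table lambda comes out lower than Cramer's V, which is one illustration of why the two can lead to different readings of strength.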
Practice at determining lowest level of measurement
1. Gender and feelings about party leader (0-100 feeling thermometer)
2. Age (in years) and feelings about party leader
3. Age (in categories) and feelings about party leader
4. Partisanship and attitudes about oil sands policy (scale)
5. Age (in years) and attitudes about oil sands policy
6. Ideology (left-centre-right) and attitudes about oil sands policy
- Gender is nominal (can't rank the categories), feelings are interval → lowest is nominal
- Age is interval, feelings are interval or ordinal → ordinal or interval
- Age is ordinal, feelings are interval or ordinal → ordinal
- Partisanship is nominal (yes or no), attitudes are interval → nominal
- Age is interval, attitudes are ordinal or interval → ordinal or interval
- Ideology (left-centre-right) is ordinal because the categories can be ranked, attitudes are ordinal or interval → ordinal
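The "lowest level involved" rule can be sketched as a small lookup. The dictionaries and function names are hypothetical; the measures listed per level are the ones named in these notes.

```python
# Levels of measurement ordered from lowest to highest.
LEVELS = {"nominal": 0, "ordinal": 1, "interval": 2}

# Measures of association listed in the notes for each level.
MEASURES = {
    "nominal": ["Cramer's V", "Lambda"],
    "ordinal": ["Gamma", "Tau-b", "Tau-c"],
    "interval": ["Pearson's r", "Spearman's rho"],
}

def lowest_level(level_a, level_b):
    """Return whichever of the two levels ranks lower."""
    return min(level_a, level_b, key=LEVELS.__getitem__)

def candidate_measures(level_a, level_b):
    """Measures that fit the lowest level among the two variables."""
    return MEASURES[lowest_level(level_a, level_b)]

# Gender (nominal) and a 0-100 feeling thermometer (interval).
print(lowest_level("nominal", "interval"), candidate_measures("nominal", "interval"))
```

Deciding each variable's level is still a judgment call; the lookup only applies the rule once the levels are settled.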
For each of these questions what would you use to get answers?
- Is there a relationship?
- What is the direction of the relationship? (ordinal, interval)
- What is the strength of the relationship?
- Is the relationship statistically significant?
- Is there a relationship? → Contingency table, scatterplot
- What is the direction of the relationship? (ordinal, interval) → Contingency table, scatterplot
- What is the strength of the relationship? → Measures of association
- Is the relationship statistically significant? → Inferential statistics
How to examine changes with control variables
1. Consider crosstab tables, measures of association, and inferential statistics for each value of the control variable.
2. Look for changes in the measure of association and in statistical significance.
Relationship may:
- hold constant
- get stronger
- get weaker/disappear
- vary across categories
reinforcing variable
Reinforcing variable: strengthens the relationship between the independent and dependent variables
If the relationship weakens after controlling for a variable..
Suspect either a confounding variable (spurious relationship) OR an intervening variable
If the relationship strengthens after controlling for a variable..
Suspect a reinforcing variable
Spurious or Intervening?
• Answer lies in theory and logic, not in statistical tests
• Consider temporal order.
• If control precedes IV, suspect spurious.
• If control follows IV, suspect intervening.
To figure that out you either need a temporal order or you have to look through the theory behind it
Multivariate Analysis
When you bring in and test all control variables at once
Ask yourself:
• Which variables in the model are statistically significant?
• What is the nature of the relationship between an IV and the DV when all other variables are controlled?
• What percentage of variation is explained by the model as a whole?
When we control for a variable, the result may be.. (4)
Relationship may:
- hold constant → control is not relevant to the relationship
- get stronger → reinforcing variable
- get weaker/disappear → spurious or intervening
- vary across categories → control conditions the relationship
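The four outcomes can be sketched as a comparison between the original measure of association and the measures found within each category of the control variable. The function name, the 0.05 tolerance, and the example coefficients are all assumptions for illustration; in practice the call rests on theory and on statistical significance as well.

```python
def control_effect(original, by_category, tolerance=0.05):
    """Classify what happened to a relationship after adding a control.

    original: measure of association before controlling
    by_category: measures within each category of the control variable
    tolerance: hypothetical cutoff for calling a change meaningful
    """
    base = abs(original)
    strengths = [abs(value) for value in by_category]
    if max(strengths) - min(strengths) > tolerance:
        return "varies across categories: control conditions the relationship"
    average = sum(strengths) / len(strengths)
    if average > base + tolerance:
        return "stronger: reinforcing variable"
    if average < base - tolerance:
        return "weaker or gone: spurious or intervening"
    return "holds constant: control not relevant"

print(control_effect(0.40, [0.41, 0.39]))
print(control_effect(0.40, [0.05, 0.02]))
print(control_effect(0.40, [0.60, 0.62]))
print(control_effect(0.40, [0.70, 0.05]))
```

Telling spurious from intervening still depends on temporal order and theory, which the numbers alone cannot settle.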