Chapters 14-16 Quantitative Data Analysis Flashcards
Three Common Arguments and Claims in Quantitative Political Science
Descriptive claims - %
Claims of group differences
Claims of relationships between variables
Descriptive data helps us understand ____ variable(s).
One
Two Types of Basic Descriptive Statistics
Measures of central tendency
Measures of dispersion/variation
Selection of Univariate Stats/Levels of Measurement
Nominal
- central tendency is found through modes
- dispersion is found through variation ratio
Ordinal
- central tendency is found through mode and median
- dispersion is found through variation ratio and range
Interval
- central tendency is found through mode, median and mean
- dispersion is found through variation ratio, range, and standard deviation
Three Possible Measures of Central Tendency
Mode - that which occurs most frequently
Median - the middle value when the cases are arranged in order of increasing magnitude
Mean - average
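A minimal sketch of the three measures, using Python's standard `statistics` module. The `scores` list is hypothetical data invented for illustration.

```python
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration only.
scores = [2, 3, 3, 5, 7, 10]

modes = multimode(scores)   # list of the most frequent value(s)
middle = median(scores)     # even number of cases: mean of the two middle values
average = mean(scores)      # sum of the values divided by the number of cases

print(modes, middle, average)
```

Because there are six cases, the median here falls halfway between the two middle values (3 and 5).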
Pros and Cons of the three Measures of Central Tendency
Mode cons:
- susceptible to how categories are constructed (green and not green vs. green, ndp, lib, etc.)
- doesn't use all data
Mode pro:
- can use with nominal measures
Median con:
- does not use precise values
Median pro:
- stable, not affected by extreme values
Mean pro:
- uses precise values
Mean con:
- skewed by extreme values
Statistical Distribution and Measures of Central Tendency
If the data are normally distributed (in a nice symmetric curve), then all three will be the same
If the data are skewed, the three measures pull apart, with the mean dragged furthest toward the extreme values
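A small sketch of how one extreme value separates the mean from the median. Both lists are hypothetical incomes (in thousands) made up for this example.

```python
from statistics import mean, median

# Hypothetical incomes in thousands; only the last value differs.
symmetric = [30, 40, 50, 60, 70]
skewed = [30, 40, 50, 60, 400]   # one extreme high score (positive skew)

symmetric_mean, symmetric_median = mean(symmetric), median(symmetric)
skewed_mean, skewed_median = mean(skewed), median(skewed)

print(symmetric_mean, symmetric_median)  # the two measures agree
print(skewed_mean, skewed_median)        # mean is pulled up, median stays put
```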
3 Measures of Dispersion, and why we need it
Standard Deviation
Variation Ratio
Range
Because central tendency doesn’t give us all the information!
Deviation and Standard Deviation
- how far an individual score is from the mean
- standard deviation is roughly the average deviation from the mean (the square root of the average squared deviation)
- affected significantly by outliers (like all means) and sample size
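A worked sketch of deviation and standard deviation on hypothetical scores. It assumes the population formula (divide by n); the sample formula divides by n - 1 instead.

```python
from math import sqrt

# Hypothetical scores, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

score_mean = sum(scores) / len(scores)
deviations = [x - score_mean for x in scores]   # distance of each score from the mean

# Raw deviations always sum to zero, so they are squared before averaging.
variance = sum(d ** 2 for d in deviations) / len(scores)  # population variance
standard_deviation = sqrt(variance)

print(score_mean, standard_deviation)
```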
Mean is appropriate to use when…
The standard deviation is minimal.
Variation Ratio
The proportion of cases that aren’t in the modal category.
High ratio means data are more dispersed.
Range
- Difference between highest and lowest score
- Can’t be used for nominal obvi
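As a quick sketch, the range is a single subtraction. The scores are hypothetical.

```python
# Hypothetical interval-level scores.
scores = [12, 7, 3, 15, 9]

# Range = highest score minus lowest score.
score_range = max(scores) - min(scores)
print(score_range)
```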
If there is an even number of cases and the two middle values are different, the median becomes…
The mean of the two middle numbers.
Positive and Negative Skew
Negative skew (low extremes)
Positive skew (high extremes)
Too many extreme values make the mean a bad central tendency to use.
If 80/100 cases are in the modal category, the variation ratio is…
0.2
So small.
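The 80/100 card can be checked with a short sketch. The `variation_ratio` helper and the party labels in `votes` are hypothetical, chosen only so that 80 of 100 cases fall in the modal category.

```python
from collections import Counter

def variation_ratio(values):
    """Proportion of cases that are NOT in the modal category."""
    counts = Counter(values)
    modal_count = max(counts.values())
    return (len(values) - modal_count) / len(values)

# Hypothetical nominal data: 80 of 100 cases share the modal category.
votes = ["Liberal"] * 80 + ["NDP"] * 12 + ["Green"] * 8
vr = variation_ratio(votes)
print(vr)
```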
Null Hypothesis
Mean of the control = mean of the treatment group
Alternative Hypothesis
Mean of the control group isn’t equal to that of the treatment group
Type 1 Error
False positive
Type 2 Error
False negative
Inferential Statistics
Stats which test the probability that sample statistics are reasonable estimates of population parameters.
5 Steps of Hypothesis Testing
- Formulate Null and Alternative
- Select a confidence level
- Calculate the appropriate inferential statistic
- Using the table for the test statistic, find the critical value (expected value) at the selected confidence level.
- If the calculated statistic equals or exceeds the critical value, reject the null.
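The five steps can be sketched in code. The cards don't say which test statistic to use, so this assumes a two-tailed, large-sample z-test for a difference in means, and it looks up the critical value from the standard normal distribution instead of a printed table. The function name and both data lists are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def two_sample_z(control, treatment, confidence=0.95):
    # Step 1: null = the group means are equal; alternative = they differ.
    # Step 2: the confidence level is chosen by the caller (default 95%).
    # Step 3: calculate the test statistic (assumed here: large-sample z).
    standard_error = sqrt(variance(control) / len(control)
                          + variance(treatment) / len(treatment))
    z = (mean(treatment) - mean(control)) / standard_error
    # Step 4: find the two-tailed critical value at that confidence level.
    alpha = 1 - confidence
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 5: reject the null if the statistic equals or exceeds the critical value.
    reject_null = abs(z) >= critical
    return z, critical, reject_null

# Hypothetical scores, 40 cases per group.
control = [50, 52, 48, 51, 49] * 8
treatment = [55, 57, 53, 56, 54] * 8

z, critical, reject_null = two_sample_z(control, treatment)
print(round(z, 2), round(critical, 2), reject_null)
```

With small samples a t-test would be the more usual choice; the logic of the five steps stays the same.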
Confidence Levels are determined by…
Probability.
Use Inferential Statistics to Reject the Null Hypothesis and find a relationship
…
If you find false positives to be more objectionable than false negatives, you will likely want a ____ confidence level.
Higher.
Lower confidence levels (I feel like it should be worded lower confidence criteria) make it ____ to reject the null hypothesis.
Easier.
Higher confidence levels make it ____ to reject the null hypothesis.
Harder.
I find the terms high or low confidence levels confusing as fuck because…
When they say high confidence levels they don’t mean how confident one is that there is a causal relationship; they mean that there’s a higher confidence threshold/criterion to meet before one can assert that there is a relationship.
Question to Ask of Descriptive Stats
Are the sample data representative of the population?
Some Questions to Ask of Claims of Differences Between Groups
How large are the differences?
Are they due to chance?
Some Questions to Ask of Claims of Relationships Between Variables
How strong is the relationship?
Is it due to chance?
Is it a causal relationship?
Alpha Level
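(Aka confidence level) Probability that the sample statistic is an accurate estimate of the population parameter, and the population parameter lies within an estimated range of values (known as the confidence interval). Strictly, alpha is the significance level, which is 1 minus the confidence level: alpha = 0.05 goes with a 95% confidence level.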
Confidence Interval
The Estimated Range of Values for the Population Parameter
If the sample statistic is 45% and the margin of error is +/- 3%, then the confidence interval would be…
42%-48%
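A sketch of where a +/- 3% margin might come from. It assumes the usual normal-approximation formula for a proportion, a 95% confidence level, and a hypothetical sample size of 1,067 picked so the margin comes out near 3 points; none of those details are on the card.

```python
from math import sqrt
from statistics import NormalDist

p_hat = 0.45        # sample statistic from the card
n = 1067            # hypothetical sample size (assumption)
confidence = 0.95   # assumed confidence level

# Critical z value for a two-tailed interval at the chosen confidence level.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error under the normal approximation for a proportion.
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(f"{lower:.0%} to {upper:.0%}")
```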
P > 0.10
90% of confidence intervals would contain the population parameter – 10% would not
A higher confidence level (criteria) means that the sample statistic will reflect the population parameter ____ accurately, but ____ precisely.
More accurately (because you have a wider confidence interval), and less precisely (because the interval covers so many possible values)
A lower confidence level (criteria) leads to a _____ confidence interval.
Narrower / smaller
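One way to see why: the interval's half-width is the critical value times the standard error, and the critical value grows with the confidence level. A sketch assuming two-tailed z critical values (the helper name is hypothetical):

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical z value for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Higher confidence level -> larger multiplier -> wider interval.
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```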
A lower confidence level makes it _____ to reject the null hypothesis, and has a better chance of leading to a type _ error.
Makes it easier to reject the null hypothesis.
Likely to lead to a type 1 error.
(Do vice versa in yo head)
Greater sample sizes will result in ___ sampling errors.
Fewer.
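A sketch of the idea using the standard error of the mean (standard deviation divided by the square root of n) as the measure of sampling error. The standard deviation of 10 is a hypothetical value.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n."""
    return sd / sqrt(n)

sd = 10  # hypothetical standard deviation
for n in (25, 100, 400):
    print(n, standard_error(sd, n))
```

Quadrupling the sample size only halves the standard error, so the gains shrink as samples get larger.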
Three Considerations Prior to Deciding Confidence Levels
What’s your sample size going to be? The larger the sample, the higher the confidence criteria should be.
Does it have tight controls? Then it should be higher.
Is it exploratory? If so, it can be lower.
Effect Size
The difference between a control and treatment group regardless of sample size.
Should be considered in addition to p value.
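The cards don't name a specific effect-size measure. One common standardized option is Cohen's d (the difference in means divided by the pooled standard deviation), sketched here as an assumption with hypothetical data.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(control, treatment):
    """Difference in group means, in units of the pooled standard deviation."""
    n1, n2 = len(control), len(treatment)
    pooled_variance = ((n1 - 1) * variance(control)
                       + (n2 - 1) * variance(treatment)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)

# Hypothetical scores for illustration.
control = [48, 50, 52, 54, 56]
treatment = [50, 52, 54, 56, 58]

d = cohens_d(control, treatment)
print(round(d, 3))
```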
Confidence levels must be understood in relation to ____
Sample size.
Substantively Significant
The extent to which something actually matters. (Ex.: Does it modify, build upon, or reject your theory?)
4 Questions when Asking Whether to Reject Null Hypothesis or Not
Sample size
Confidence level
Confidence level appropriateness
5 Criteria for Causality
We can use measures of association to…
measure the strength of a bivariate relationship.
5 Considerations when looking at Bivariate Relationships
Is there a relationship?
What is the direction of the relationship? (not nominal)
What is the strength of the relationship?
Is the relationship statistically significant?
Does the relationship continue to exist when other measures are controlled?
Independent variable goes on _ axis, while the dependent variable goes on the _ axis.
IV = X DV = Y
Perfect Correlation
Knowing the value on one variable always lets us know the value on the other
Weak, Moderate, Strong Correlation
Knowledge of the IV allows us to better predict the value of the DV
Two Things Measures of Association Do
- condense the patterns in a contingency table or scatter plot into a single numerical value
- provide a standardized and compact way to convey relationship information
Measures of Association for Nominal v Ordinal and Interval
For nominal measures the range is 0 to 1. For ordinal and interval measures it is -1 to +1.
Interpreting Associations
0.00 No relationship
+/- 0.01-0.09 Very weak
+/- 0.10-0.20 Weak
+/- 0.21-0.30 Moderate
+/- 0.31-0.49 Moderately strong
+/- 0.50-0.99 Strong, very strong
+/- 1.00 Perfect relationship
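A sketch that computes one measure of association and labels it with the cut-offs above. It assumes Pearson's r, which suits interval-level data; the cards don't single out a particular measure. The variable names and data are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)

def strength_label(coefficient):
    """Map a coefficient to the card's verbal labels (sign is ignored)."""
    size = round(abs(coefficient), 2)
    if size == 0:
        return "no relationship"
    if size <= 0.09:
        return "very weak"
    if size <= 0.20:
        return "weak"
    if size <= 0.30:
        return "moderate"
    if size <= 0.49:
        return "moderately strong"
    if size < 1.00:
        return "strong, very strong"
    return "perfect relationship"

# Hypothetical data: IV on the x axis, DV on the y axis.
years_education = [8, 10, 12, 14, 16]
political_interest = [2, 3, 3, 5, 4]

r = pearson_r(years_education, political_interest)
print(round(r, 3), strength_label(r))
```

The sign of `r` gives the direction of the relationship; the label only describes its strength.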
How to determine:
- Is there a relationship? Contingency table, scatterplot
- What is its direction? Contingency table, scatterplot
- What is its strength? Measures of association
- Is it statistically significant? Inferential statistics
- Does the relationship still exist when other variables are controlled? Contingency tables with controls, regression analysis
2 Types of Descriptive Statistics
Measures of central tendency
Measures of variation
Three Possible Outcomes after New Variable is Added to Bivariate
The bivariate:
holds constant - that would increase confidence in original relationship
relationship gets stronger - reinforcing variable
relationship is gone - the new variable is a confounding or intervening variable that accounts for the original relationship