Final Exam Flashcards
Theory
to create explanations for what we see
Hypothesis
create statements derived from theory that are testable and observable
Evidence
test hypotheses with data, cases, and experiments
Causal theory (applicable to political scientists)
- Views the world in terms of variables and causal explanations; explores the relationships between them
Independent variable
Presumed cause
Dependent variable
presumed effect/outcome
- Value of DV depends on value of IV
Causal theory breakdown
- Theory
- IV
- DV
- hypothesize
operationalization
process by which abstract concepts are turned into ‘real world’ observations
- lets you measure/collect data on phenomena that are not directly observable
Hypothesis
theory-based statement about a relationship we expect to observe; a statement derived from theory that is testable and observable
Bivariate theory
most causal theories look at the relationship between two variables (does X cause Y; does the independent variable cause the dependent variable?)
most real-world problems are multivariate, so how do we account for that in a bivariate theory?
control for other factors (Z) and other possible causes of Y
Spurious relationships
two or more events or variables are associated but not causally related, due to either coincidence or the presence of a certain third, unseen factor.
how to control for spurious relationship
introduce Z as a control: if the relationship bw X and Y disappears when Z is held constant, the relationship is spurious
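A minimal sketch of this check, using a small made-up data set (the values and the helper name `pearson_r` are hypothetical, chosen only for illustration). Overall, X and Y look strongly correlated, but the correlation vanishes once Z is held constant:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical cases as (z, x, y): Z pushes both X and Y up.
cases = [
    (0, 1, 1), (0, 2, 1), (0, 1, 2), (0, 2, 2),
    (1, 4, 4), (1, 5, 4), (1, 4, 5), (1, 5, 5),
]

# Correlation bw X and Y when Z is ignored
overall_r = pearson_r([x for _, x, _ in cases], [y for _, _, y in cases])

# Hold Z constant: compute the X-Y correlation separately within each Z group
within_r = {}
for z_value in (0, 1):
    group = [(x, y) for z, x, y in cases if z == z_value]
    within_r[z_value] = pearson_r([x for x, _ in group], [y for _, y in group])

print(f"overall r = {overall_r:.2f}")
print(f"r within Z=0: {within_r[0]:.2f}, r within Z=1: {within_r[1]:.2f}")
```

In this toy data the overall correlation is 0.90, while the correlation inside each Z group is 0, which is the spurious pattern.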
Additive relationships
Third variable (Z) → DV (Y) ← IV (X): X causes Y, and Z also causes Y, but the two are independent of each other (X does not affect Z, and Z does not influence the relationship bw X and Y)
Interactive Relationships
Ex/ Relationship between education and abortion rights depends on gender
- Relationship bw X and Y depends on the value of Z
How to control for interactive relationships
- Correlation bw X and Y disappears in both groups = spurious relationship
- Correlation still exists and does not differ bw females and males = additive relationship
- Correlation still exists in females but not in males (education does affect opinions on abortion rights, but the effect is conditional on gender) = interactive relationship
- Correlation still exists in both groups but is stronger in females = interactive relationship (effect depends on gender; education has a greater impact on opinion among females)
Types of variables
Categorical
Ordinal
Continuous
Label
description of variable
label: “gender of survey respondent”
values
denominations in which variable occurs
Values: male, female, other
categorical variables
- Limited categories
- Cannot make universally-held ranking distinctions
- Impossible to rank order the categories from least to greatest
- Label: religious ID ;
- values: christian, jewish, muslim
- EX/ race, gender, sex
ordinal variables
- Universal ranking distinctions
- Rank order the categories from least to greatest
- Assign scores 1,2,3,4,5 to categories; 5 is assigned for best situation
- Size of difference between categories is inconsistent
EX/ socioeconomic status: low income, middle income, high income
continuous variables
- Have equal unit differences
- Universal ranking distinction
- Size of difference between categories is consistent
- If you can measure it, it is continuous; any value within a range
- Discrete and continuous variables are numeric/scale variables as opposed to categorical variables
- The term continuous variables is used here to cover discrete variables as well
measures of central tendency (typical or average of a variable)
Median
Mode
Mean
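A short sketch of the three measures using Python's standard `statistics` module. The data are hypothetical (number of political events attended by nine respondents), with one extreme value included to show how the mean gets pulled away from the median and mode:

```python
import statistics

# Hypothetical data: events attended by nine survey respondents
events_attended = [1, 2, 2, 3, 3, 3, 4, 5, 13]

mean_value = statistics.mean(events_attended)      # sum of values / number of values
median_value = statistics.median(events_attended)  # middle value once sorted
mode_value = statistics.mode(events_attended)      # most frequent value

print(f"mean = {mean_value}, median = {median_value}, mode = {mode_value}")
```

Here the median and mode are both 3, while the single respondent with 13 events raises the mean to 4.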
Research design
- quantitative
- qualitative
Quantitative research (3 forms of research)
- uses numerical data for statistical analysis
1. surveys
2. observational studies (to determine whether an existing condition is related to a characteristic of interest, e.g., whether smoking is related to lung cancer)
3. experiments (condition is created by imposing a treatment on the sample) → researcher both controls and randomly assigns values of the IV to subjects
qualitative research
- does not use numerical data for statistical analysis
- collects/analyzes non-numerical data (text, video, or audio) to understand concepts, opinions, experiences
- EX/ how does social media shape the image of female politicians?
observational studies
taking the world as it already is; no controlled setting
why is randomization important
- if x is determined by pure randomness then it should not be correlated with any variable including Z
- takes systematic differences out of play
- subjects will not be systematically different from one another
- helps show that the observed Y is caused only by X
To control for possible causes of Y to overcome spurious relationship between X and Y in observational studies
use statistical controls
every causal relationship is….
potentially spurious; bc we can never know for sure until all possible third variables are tested
the most common approach in political science is
observational designs
investigator can make sure to control for factors he or she has not thought of in:
Experimental Designs
Which approach is stronger, experimental or observational design?
Experimental Designs
Investigator must identify and statistically control for the possible effects of rival explanations in:
observational designs
internal validity
degree of confidence in causal relationship bw X and Y and ability to draw these conclusions
external validity
extent to which results can be applied to and across different environments/the outside world
- are the findings generalizable?
data sets (2)
- population
- sample
Population
data for every possible relevant case
sample
a subset of cases that is drawn from underlying population
types of samples
- random sample
- non-random
random sample
each member of the population has an equal probability of being selected into the sample
non-random
sample of convenience
populations vs. samples
Not interested in properties of the sample per se
Interested in underlying population as a whole
Statistical inference
Use what we know to be true about one thing (sample) to infer what is likely to be true about another thing (population)
Process of making probabilistic statements about population characteristics based on our knowledge of sample characteristics
How can we learn about population from sample? Use stat inference
How to infer
Things are known w certainty - like the mean of some variable in our sample
- central limit theorem
central limit theorem
used when we have only a sample of data and want to learn about the vast majority of individuals for whom we don’t have data; relies on a kind of distribution called the normal distribution
Probability Theory
- Outcome: result of a random observation
- Independent outcomes: two or more outcomes are independent if the realization of one outcome does not affect the realization of the others
Properties of probability
- All outcomes have a probability ranging from 0 to 1
- If we roll a die, what is the probability of outcome 2? ⅙
- Sum of probabilities of all possible outcomes must be exactly 1: all possible outcomes are 1, 2, 3, 4, 5, 6 → probability of each outcome is ⅙; add ⅙ six times and you get 1
- If two outcomes are independent, then the probability of both occurring is equal to the product of their individual probabilities
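The three properties can be checked on the die example with exact fractions. This is only an illustrative sketch; the variable names are made up, and the "two independent rolls" case is an added example of the product rule:

```python
from fractions import Fraction

# A fair six-sided die: each outcome has probability 1/6
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Property 1: every probability lies between 0 and 1
all_in_range = all(0 <= p <= 1 for p in die.values())

# Property 2: probabilities of all possible outcomes sum to exactly 1
total = sum(die.values())

# Property 3: for two independent rolls,
# P(first roll is 2 AND second roll is 2) = P(2) * P(2)
p_two = die[2]
p_two_twice = p_two * p_two

print(f"P(2) = {p_two}, total = {total}, P(2 then 2) = {p_two_twice}")
```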
Normal distribution
- is most important probability distribution in statistics
- Called a bell-shaped distribution with some unique features
- Distribution is symmetrical around the mean
- a fixed % of cases falls within a given distance of the mean
68% of values are within 1 SD
95% of values are within 2 SD
99.7% of values are within 3 SD
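These percentages can be recovered from the normal curve itself. A minimal sketch using `statistics.NormalDist` from the Python standard library (the helper name `share_within` is made up for illustration):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Share of cases within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The printed shares come out at roughly 68%, 95%, and 99.7%; the flashcard figures are rounded versions of these.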
frequency distribution
- distribution of scores
- not necessarily normally shaped
How to know about vast majority of data with only a sample of data?
Central limit theorem
Central limit theorem
- take the mean of each sample
- plot those means
- the hypothetical distribution of sample means (a sampling distribution) will be normal
* no matter the shape of the frequency distribution, the sampling distribution will be approximately normal when samples are random and large enough
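A small simulation can make this concrete. The sketch below is only an illustration under stated assumptions: the population is made up (a right-skewed exponential distribution), the sample size of 50 and the 2,000 repetitions are arbitrary choices, and the comparison value "population SD divided by the square root of the sample size" is the standard formula for the standard error of the mean rather than something stated in these cards.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# A deliberately non-normal (right-skewed) hypothetical population
population = [random.expovariate(1.0) for _ in range(100_000)]

sample_size = 50
# Draw many random samples and record the mean of each one
sample_means = [
    statistics.fmean(random.sample(population, sample_size))
    for _ in range(2_000)
]

pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
predicted_se = pop_sd / sample_size ** 0.5

print(f"population mean:            {pop_mean:.3f}")
print(f"mean of sample means:       {mean_of_means:.3f}")
print(f"SD of sample means:         {sd_of_means:.3f}")
print(f"pop SD / sqrt(sample size): {predicted_se:.3f}")
```

The mean of the sample means should land close to the population mean, and their spread close to the predicted standard error, even though the population itself is skewed.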
mean of the sampling distribution=
true population mean
Standard deviation of the sampling distribution=
standard error of the mean
if we know the standard deviation of the sampling distribution (the standard error), we can…
estimate the confidence interval for the population mean
which is the most common confidence level to use?
95% = the interval within which 95% of all possible sample estimates will fall by chance
sample mean - (2 x standard error)
at the lower end
sample mean + (2 x standard error)
at the upper end
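A minimal sketch of the interval calculation. The ratings are hypothetical, the function name is made up, and the standard error is estimated as the sample SD divided by the square root of the sample size, which is the usual formula but is an assumption here since the cards do not spell it out:

```python
import statistics
from math import sqrt

def confidence_interval_95(sample):
    """Rough 95% CI: sample mean minus/plus 2 standard errors."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    # Assumed formula: standard error = sample SD / sqrt(n)
    standard_error = statistics.stdev(sample) / sqrt(n)
    return sample_mean - 2 * standard_error, sample_mean + 2 * standard_error

# Hypothetical 0-100 ratings from a random sample of 36 respondents
ratings = [40, 45, 50, 55, 60, 65] * 6

lower, upper = confidence_interval_95(ratings)
print(f"95% CI: {lower:.1f} to {upper:.1f}")
```

For these made-up ratings the sample mean is 52.5 and the interval runs from about 49.6 to about 55.4.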
As sample size increases, distribution of the sample means tends toward…
the normal distribution
the sample mean can be considered approximately normally distributed if the sample size is at least?
30
central limit theorem only applies to samples that are selected…
randomly
*does NOT apply to sample of convenience
what do non-random “surveys” say about the population as a whole under the central limit theorem?
Because such samples are clearly not random samples of the underlying population, the answer is “nothing”
the smaller the standard errors
the tighter our resulting confidence intervals will be
if interested in estimating population values, based on samples, with as much precision as possible, then we want
tighter confidence intervals
probability distribution around mean is…
symmetrical
if population is normally distributed, then sample mean…
is also normally distributed
When sample size is small, how do you make inferences about population mean?
use t-distribution
As n grows, what happens to t-distribution
begins to resemble a normal curve
t-distribution curve for a small sample
- fatter tails than normal
- wider boundaries on SE
- less confidence in accuracy of sample statistic
- boundaries of 95% CI are not fixed (they depend on sample size)
Forms of Hypothesis Testing
- confidence interval
- p-value approach
what is included in the p-value approach?
- z-score
- standard normal distribution
- P-value
Hypothesis test
- method that uses sample data to decide whether the null hypothesis should be rejected
null hypothesis
H0: hypothesis of no change from current opinion/rejection of current theory
two conclusions from hypothesis-testing analysis
1. Reject H0 (in favor of the alternative hypothesis)
2. Fail to reject H0 (continue to believe null hypothesis)
p-value
number describing how likely it is that the data would have occurred by random chance (i.e., if the null hypothesis is true)
- the smaller the p-value, the more confidently we can reject the null hypothesis
common p-value mistakes
- it is not the probability that null hypothesis is correct (that can’t be proven)
- it is not the probability of making an error
Z-score
- number of standard deviations a specified data point lies from the mean
- estimate of how far a value is from the pop. mean, relative to the spread of our data set
standard normal distribution
mean=0; SD=1
- total area under curve is 1
- 95% of cases fall within 1.96 SD of the mean
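A short sketch tying the z-score to the standard normal curve. The numbers in the example (a value of 130 with mean 100 and SD 15) are hypothetical, and `z_score` is an illustrative helper name:

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

# Hypothetical example: a value of 130 where mean = 100 and SD = 15
z = z_score(130, mean=100, sd=15)  # (130 - 100) / 15 = 2.0

# Standard normal distribution: mean = 0, SD = 1
standard_normal = NormalDist(mu=0, sigma=1)

# Share of cases within 1.96 SD of the mean
share_within_1_96 = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(f"z = {z}")
print(f"share within 1.96 SD: {share_within_1_96:.3f}")
```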
What does p-value mean? (3)
- probability of an observed relationship given that the null hypothesis (no relationship bw X and Y) is true
- prob. that we see relationship bw two variables if there were truly no relationship bw them in unobserved pop.
- prob. that the relationship we are finding is due to random chance
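One way to see these definitions in action is to turn a z-score into a p-value using the standard normal curve. This is a sketch under an assumption: it uses a two-sided test (extreme values in either direction count), which the cards do not specify, and `two_sided_p_value` is a made-up helper name.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a z-score at least this extreme if H0 is true."""
    standard_normal = NormalDist(mu=0, sigma=1)
    # Area in both tails beyond |z| (two-sided test is an assumption here)
    return 2 * (1 - standard_normal.cdf(abs(z)))

for z in (0.5, 1.96, 3.0):
    print(f"z = {z}: p = {two_sided_p_value(z):.3f}")
```

A z-score near 0 gives a high p-value, z = 1.96 gives a p-value of about 0.05, and larger z-scores give smaller p-values.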
high p-value
high prob. that the relationship happened by random chance, so it does not support the theory
low p-value
low prob. that the relationship occurred by random chance, so the relationship likely does exist
- the more contradictory the data is to H0
- more confidence that there is a systematic relationship
- relationship is statistically significant
limitations of p-value
- says nothing about the strength of the relationship; it only speaks to our confidence that there is a relationship bw the two variables in society
- does not say whether relationship is causal