POLS285 Exam (REVIEW OTHER DECKS) Flashcards

1
Q

What is a confounding variable and what are its implications for causal analysis?

A

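Confounding variable: A variable that is correlated with both the independent and the dependent variable and that alters the observed relationship between them.

Causal analysis: A confounder creates a third-variable problem. Some or all of the observed association between X and Y may come from the confounder rather than from X causing Y, so leaving it out biases the estimated effect.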
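
A minimal simulation sketch of the third-variable problem, under made-up assumptions: the true effect of X on Y is 1.0, and a confounder Z raises both X and Y. All names and numbers here are hypothetical and are not taken from the course material. It compares the bivariate slope when X depends on Z with the slope when X is randomly assigned.

```python
import random

def ols_slope(x, y):
    """Bivariate OLS slope: covariation of x and y divided by variation in x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(0)   # fixed seed so the sketch is reproducible
n = 5000
true_effect = 1.0        # assumed causal effect of X on Y

z = [rng.gauss(0, 1) for _ in range(n)]        # confounder
x_obs = [zi + rng.gauss(0, 1) for zi in z]     # observational X: depends on Z
x_rand = [rng.gauss(0, 1) for _ in range(n)]   # experimental X: unrelated to Z

# Y depends on X and on Z in both settings.
y_obs = [true_effect * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x_obs, z)]
y_rand = [true_effect * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x_rand, z)]

naive_slope = ols_slope(x_obs, y_obs)            # omits Z while X and Z are correlated
experimental_slope = ols_slope(x_rand, y_rand)   # omits Z, but X was randomized

print(round(naive_slope, 2), round(experimental_slope, 2))
```

In this setup the naive slope should land well above the assumed effect, while the randomized slope should land near it, because random assignment breaks the link between X and Z.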

2
Q

How do confounding variables affect the size and direction of omitted variable bias, and how do experiments (or random assignment of our independent variable) make omitted variable bias go away?

A

Size: (KW 230)

Direction of omitted variable bias:

Experiments:

3
Q

What are systematic and non-systematic sampling error, and how do they relate to coverage bias, non-response bias, sample size, population variance and different sampling techniques (probability vs. non-probability samples)?

A

Systematic sampling error: Occurs when a sampling method consistently selects samples that are not representative of the population being studied, so estimates are pushed in the same direction from sample to sample.

Non-systematic sampling error: Occurs due to chance variation, which can result in a sample that is not perfectly representative of the population.

Relationship to bias: Coverage bias and non-response bias are sources of systematic error, because the people who are missed or who do not respond differ consistently from those who are included.

Relationship to sample size: A larger sample size reduces random (non-systematic) sampling error, but it does not remove systematic error.

Relationship to variance: Higher population variance increases random sampling error, so estimates are less precise at a given sample size.

Relationship to sampling technique: Probability samples rely on random selection, which leaves mainly random error; non-probability samples (e.g. convenience samples) are more prone to systematic error.
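
The contrast between random and systematic error can be sketched with a small simulation. Everything here is a hypothetical illustration: the population is made up, and the "coverage problem" is an artificial sampling frame that only reaches units above the population mean.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed for reproducibility
population = [rng.gauss(50, 10) for _ in range(100_000)]   # made-up population
pop_mean = statistics.fmean(population)

def spread_of_sample_means(sample_size, draws=500):
    """Standard deviation of the sample mean across repeated random samples."""
    means = [statistics.fmean(rng.sample(population, sample_size))
             for _ in range(draws)]
    return statistics.stdev(means)

# Random error: shrinks as the sample size grows.
spread_small = spread_of_sample_means(25)
spread_large = spread_of_sample_means(400)

# Systematic error: a frame that only covers above-average units stays
# off target even with the larger sample size.
covered = [v for v in population if v > pop_mean]
biased_mean = statistics.fmean(rng.sample(covered, 400))

print(round(spread_small, 2), round(spread_large, 2))
print(round(biased_mean - pop_mean, 2))
```

The spread of the sample means should fall as the sample size rises, while the gap produced by the restricted frame does not go away with more observations.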

4
Q

Describe tools that social scientists use for visualizing statistics.

A

Bar chart:
Plots a numeric value (such as a count) for each category so that categories can be compared (e.g. number of birthdays in each month of a year). One axis is a numeric scale and the other is categorical (nominal); the chart is not intended to display the relationship between two numeric variables.

Histogram:
Plots the distribution of a numeric variable's values as a series of bars (height indicates the frequency of data points within a bin). Records the frequency of a single variable; does not record the relationship between two variables (e.g. test results).

Box and Whisker plot: Used to show the distribution of numeric data values, especially across multiple groups. Useful for comparing distributions of different variables or of the same variable across different groups. Conveys information about a single numeric variable or about how it differs between groups.

Scatter plot:
Shows the relationship between two numeric variables measured on the same sampling unit, one point per unit. Identifies patterns and trends in the relationship between two continuous variables (positive correlation, negative correlation, or none), e.g. the relationship between height and weight.

5
Q

What are the differences between substantive and statistical significance? How do we measure or assess them, why is one easier to measure and assess than the other, and how is it possible to have a statistically significant relationship even if the effect is small?

A

Substantive significance: a judgment call about whether or not statistically significant relationships are “large” or “small” in terms of their real-world impact. Assessed with effect size and expertise of a field.

Statistical significance: a conclusion, based on the observed data, that the relationship between two variables is unlikely to be due to random chance, and therefore likely exists in the broader population. Assessed through formulation of null/alternative hypotheses, t-tests/correlation coefficients, p-value calculation and confidence interval estimation.

Differences: Substantive significance refers to the practical relevance of research findings in a broader context, while statistical significance refers to the likelihood that an observed relationship is not due to chance.

Measure and assess: T-tests and regression analysis. Statistical significance is easier to assess because it follows a mechanical rule (compare the p-value to a threshold), while substantive significance is a judgment call.

Possibility: If the sample size is large enough, even small effects can be statistically significant. As the sample size increases, the standard error decreases, which increases the likelihood of finding a statistically significant effect even if the effect size is small.
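
A rough sketch of the sample-size point, using simulated data with a deliberately tiny assumed effect (0.05 standard deviations). The effect size, group sizes and function names are illustrative assumptions only.

```python
import math
import random
import statistics

def t_statistic(group_a, group_b):
    """Difference in means divided by its standard error (unequal-variance form)."""
    diff = statistics.fmean(group_b) - statistics.fmean(group_a)
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    return diff / se

rng = random.Random(2)   # fixed seed for reproducibility
effect = 0.05            # assumed small true difference between groups

def simulate(n):
    """Draw n control and n treated units, then return the t-statistic."""
    control = [rng.gauss(0, 1) for _ in range(n)]
    treated = [rng.gauss(effect, 1) for _ in range(n)]
    return t_statistic(control, treated)

t_small = simulate(100)      # small sample: large standard error
t_large = simulate(50_000)   # large sample: small standard error

print(round(t_small, 2), round(t_large, 2))
```

With the large sample the t-statistic should clear the conventional 1.96 cutoff comfortably even though the effect is the same tiny 0.05; with the small sample it usually will not. Whether an effect of 0.05 matters in the real world remains a separate, substantive question.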

6
Q

What is the difference between a probabilistic and a deterministic understanding of causality, what is KW’s answer to why the probabilistic view is better suited to the social sciences, and how is this reflected in average treatment effects, the stochastic component of the regression equation, R2 and root MSE?

A

-Probabilistic relationship: Increases in X are associated with increases (or decreases) in the probability of Y occurring, but Y is not certain to occur.

-Deterministic: If some cause occurs, then the effect will occur with certainty.

-Average treatment effect: The estimation of the ATE involves modeling the probability distribution of the response variable given the treatment variable, and estimating the difference in the mean response between the treatment and control groups.

-Stochastic component: This component reflects the probabilistic nature of the relationship between the predictor variable(s) and the response variable, and the fact that there are other factors that affect the response variable that are not included in the model.

-Root MSE: The RMSE measures the typical distance between the observed values of the response variable and the values predicted by the model. It reflects the size of the stochastic component, that is, the variability in the response variable that the model does not explain.

-KW statement: Human beings are not deterministic robots whose behaviors always conform to lawlike statements.

-R2: The proportion of the variation in the response variable that the model explains. It reflects the goodness of fit of the model, and it falls short of 1 whenever the stochastic component is not zero.
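
As a small worked example, R2 and root MSE can be computed by hand for a bivariate regression. The four data points are invented for illustration, and dividing by n - 2 in the root MSE is an assumption about the convention (degrees of freedom after estimating an intercept and a slope).

```python
import math

x = [0.0, 1.0, 2.0, 3.0]   # made-up predictor values
y = [1.0, 3.0, 2.0, 6.0]   # made-up response values

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Bivariate OLS estimates.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
intercept = y_bar - slope * x_bar

# Residuals are the stochastic part the line does not capture.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
ss_res = sum(r ** 2 for r in residuals)       # unexplained variation
ss_tot = sum((yi - y_bar) ** 2 for yi in y)   # total variation in y

r_squared = 1 - ss_res / ss_tot               # share of variation explained
root_mse = math.sqrt(ss_res / (len(x) - 2))   # typical size of a residual

print(round(r_squared, 3), round(root_mse, 3))   # 0.7 1.449
```

Here the line explains 70% of the variation in y, and a typical prediction misses by about 1.45 units of y.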

7
Q

What is a null hypothesis test, how is it conducted and what is its relationship with statistical inference, the central limit theorem and the concept of a sampling distribution?

A

-Null hypothesis:
What we expect to see if our theory is false; usually, it’s that the parameter is 0. In statistics, hypothesis tests are always performed on the null; that’s why we call it null hypothesis testing.

-Statistical inference:
By comparing the observed data to the null hypothesis, we can determine whether the sample provides evidence for or against the null, and make inferences about the population based on that evidence.

-Central limit theorem:
Allows us to treat the sampling distribution of the t-statistic as approximately normal in large samples, which is necessary for calculating the p-value.

-Sampling distribution:
Also important in null hypothesis testing, as it refers to the distribution of sample statistics that would be obtained if we were to repeatedly sample from the same population. By understanding its properties, we can make inferences about the population based on the sample data, and determine the likelihood of obtaining the observed data under the null.
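
The steps of a test on a mean can be sketched as follows: compute the estimate, divide its distance from the null value by the standard error, and convert that to a p-value using the normal approximation that the central limit theorem justifies in large samples. The data are simulated from a skewed distribution with a true mean of 1, and the function name and null values are hypothetical.

```python
import math
import random
import statistics

def mean_test(sample, null_value):
    """Two-sided large-sample test of H0: population mean == null_value."""
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))   # standard error
    t_stat = (mean - null_value) / se
    # Normal approximation to the sampling distribution of the statistic.
    p_value = 2 * (1 - statistics.NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value

rng = random.Random(3)   # fixed seed for reproducibility
# Skewed (exponential) data: individual values are far from normal,
# but the sample mean is approximately normal at this sample size.
sample = [rng.expovariate(1.0) for _ in range(200)]

t_true, p_true = mean_test(sample, null_value=1.0)     # this null is true here
t_false, p_false = mean_test(sample, null_value=0.0)   # this null is false here

print(round(p_true, 3), round(p_false, 6))
```

The false null (mean of 0) should be rejected with a p-value near zero. The true null will usually not be rejected, although roughly 5% of samples would reject it at the 0.05 level by chance alone.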

8
Q

What is OLS, what is it trying to accomplish, how do we end up with the bivariate formula for b*, can you explain it with reference to a scatter plot of data, and discussion of the relationship between the residual values and the regression line?

A

OLS: The goal of OLS regression is to find the line of best fit that minimizes the sum of the squared differences between the predicted values and the actual values of the dependent variable.

Formula: The bivariate slope formula comes from choosing the intercept and slope that minimize the sum of squared residuals. The resulting line passes through the means of the two variables in the scatter plot, and its slope equals the covariation of X and Y divided by the variation in X.

Relationship: Residuals are the vertical distances between each observed point in the scatter plot and the regression line. The OLS line is the one for which the sum of the squared residuals is smallest, and it passes through the means of the two variables. Any other line drawn through the scatter plot would produce a larger sum of squared residuals, indicating a poorer fit to the data.
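
The minimizing property can be checked numerically. This sketch uses five invented data points, computes the slope and intercept from the standard bivariate formulas, and compares the sum of squared residuals against two nearby alternative lines (a hypothetical steeper line and a hypothetical shifted line).

```python
def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared vertical distances between the points and a line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up data
y = [2.0, 4.0, 5.0, 4.0, 5.0]

x_bar = sum(x) / len(x)
y_bar = sum(y) / len(y)

# Slope: covariation of X and Y divided by variation in X.
slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
# Intercept chosen so the line passes through the point of means.
intercept = y_bar - slope * x_bar

ssr_ols = sum_squared_residuals(x, y, intercept, slope)
ssr_steeper = sum_squared_residuals(x, y, intercept, slope + 0.1)
ssr_shifted = sum_squared_residuals(x, y, intercept + 0.5, slope)

print(round(slope, 2), round(intercept, 2))   # 0.6 2.2
print(round(ssr_ols, 2), round(ssr_steeper, 2), round(ssr_shifted, 2))
```

Both alternative lines give a larger sum of squared residuals than the OLS line, which is what "best fit" means in this setting.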

9
Q

What is confidence interval, its relationship with statistical inference, between its width and our confidence level, KW’s and Purist interpretations of it and why they differ?

A

-Confidence interval: A range of values likely to include the true parameter value.

-Three common levels of confidence: 90%, 95% and 99%. They refer to the share of intervals, over many repeated samples, that would contain the true value (parameter).

-Prime example of statistical inference: using facts we know (sample statistics) to make probabilistic statements about facts we don’t know (population parameters).

-Width and confidence level: The width of a confidence interval will be smaller when you have a larger sample size (because larger samples make sample statistics more reliable). The width of the confidence interval will be larger when the confidence level is higher (because you can have greater confidence when you are less precise).

-Purist Objection: Purists object to interpreting a single confidence interval as a probability statement about the true parameter, because they believe it can lead to misunderstandings about the meaning of the interval. Specifically, they argue that the confidence level does not measure the probability that the true parameter falls within this particular interval, but rather the probability that a random interval calculated from the same population would contain the true parameter.

-Purist Interpretation: Acknowledges that the true parameter is unknown and that the confidence interval provides a range of plausible values, but it does not make a probability statement about the true parameter being within that range.

Example of purist: If we randomly sampled from the population over and over, the estimated confidence intervals would contain the true mean 95% of the time; they would exclude it 5% of the time.
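
The repeated-sampling reading can be illustrated with a simulation. This is a sketch under stated assumptions: a made-up normal population with a known mean, samples of 50, and a normal-approximation interval (estimate plus or minus z times the standard error). All names are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(4)            # fixed seed for reproducibility
TRUE_MEAN, TRUE_SD = 10.0, 3.0    # hypothetical population values

def confidence_interval(sample, level=0.95):
    """Normal-approximation interval for the mean of a sample."""
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * se, mean + z * se

def draw_sample(n=50):
    return [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]

# Repeated sampling: count how often the 95% interval contains the true mean.
reps = 2000
hits = 0
for _ in range(reps):
    low, high = confidence_interval(draw_sample())
    if low <= TRUE_MEAN <= high:
        hits += 1
coverage = hits / reps

# One sample, three confidence levels: higher confidence means a wider interval.
one_sample = draw_sample()
widths = {}
for level in (0.90, 0.95, 0.99):
    low, high = confidence_interval(one_sample, level)
    widths[level] = high - low

print(round(coverage, 3))
print({level: round(w, 2) for level, w in widths.items()})
```

The coverage figure should come out close to 0.95, which is a statement about the procedure across many samples rather than about any single interval. The widths should grow as the confidence level rises.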

Correct interpretation:

10
Q

What is average treatment effect and stochastic component of regression?

A

-Average treatment effect: The average treatment effect (ATE) is a concept in statistics that measures the difference in the mean response between two groups, one of which has received a treatment and the other has not.

-Stochastic component of regression: The stochastic component of the regression equation is the error term or residual, which represents the variability in the response variable that cannot be explained by the predictor variable(s).
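
A brief sketch of estimating an ATE as a difference in group means under random assignment. The numbers are invented: each unit has its own treatment effect that varies around an assumed average of 2.0, so the treatment shifts outcomes on average without fixing any one unit's outcome.

```python
import random
import statistics

rng = random.Random(5)   # fixed seed for reproducibility
n = 10_000
assumed_average_effect = 2.0   # hypothetical true ATE

treated_outcomes, control_outcomes = [], []
for _ in range(n):
    baseline = rng.gauss(0, 1)   # stochastic part of the outcome
    individual_effect = rng.gauss(assumed_average_effect, 1.0)   # varies by unit
    if rng.random() < 0.5:       # random assignment to treatment
        treated_outcomes.append(baseline + individual_effect)
    else:
        control_outcomes.append(baseline)

# ATE estimate: mean outcome of the treated minus mean outcome of the controls.
ate_estimate = (statistics.fmean(treated_outcomes)
                - statistics.fmean(control_outcomes))

print(round(ate_estimate, 2))
```

The estimate should land near the assumed average of 2.0 even though individual effects differ, which is the sense in which the ATE is a probabilistic rather than a deterministic summary.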

11
Q

Broockman and Kalla study summary and report

A

Summary: Broockman and Kalla’s study on transphobia sought to investigate the effectiveness of canvassing as a means of reducing transphobia among voters. The study’s research question was whether a face-to-face conversation with a canvasser who shared a personal story about transgender discrimination would reduce prejudice towards transgender individuals.

The dependent variable in this study was the participant’s level of transphobia, which was measured using a scale of questions that assessed their attitudes towards transgender people. The experimental treatment was a 10-minute canvassing conversation with a canvasser who shared their personal story about transgender discrimination.

Causal mechanism: the contact hypothesis, which suggests that increasing interpersonal contact with stigmatized groups can reduce prejudice towards them.

The study used a randomized field experiment design, with participants randomly assigned to either the treatment or control group. The main findings of the study were that participants in the treatment group exhibited a significant reduction in transphobia compared to those in the control group. This effect was observed both immediately after the canvassing conversation and up to three months later.

In terms of internal validity, the study had several strengths. The use of random assignment ensured that any differences between the treatment and control groups were not due to pre-existing differences in their characteristics. The study also used a well-established scale to measure transphobia and conducted pre- and post-treatment measurements to assess changes in attitudes.

However, there were also some weaknesses in the study’s internal validity. The study relied on self-reported measures, which could be subject to social desirability bias or other response biases. Additionally, the study did not measure potential spillover effects or the extent to which participants may have shared their canvassing experiences with others.

The study’s field experiment design enhances its external validity by testing the intervention in a real-world setting. However, the study’s sample was drawn from a single city and may not be representative of other populations.

Based on these findings, it seems that canvassing may be an effective strategy for reducing transphobia. However, more research is needed to determine the generalizability of these findings to other populations and contexts. Given the potential benefits of this strategy, I would recommend further testing of this approach in other settings and with other groups, with attention paid to potential modifications to the intervention to enhance its effectiveness.
