11 Quantitative Methods - Different Regressions Flashcards
Questions on Wenzelburger, Georg et al. (2014): Gepoolte Zeitreihenanalyse. In: Weiterführende statistische Methoden für Politikwissenschaftler. De Gruyter Oldenbourg. (pages to read: 125-145).
What is binary logistic regression?
Binary logistic regression is often employed when the dependent variable has two outcomes (e.g., whether you pass the exam or not). In essence, the technique is about understanding how the increase in X influences the probability of Y.
Why may it sometimes make sense to employ time series cross sectional analysis?
If there are regression analysis to be done for different e.g. countries with the year and the variable, you can pool it together instead of creating X amount of different graphs for each country. You can pool different e.g. different countries with its own variables and outcomes in a large graph and compare the results and the correlation which allows to analyse the countries and the variance. The pooled time series analysis is also possible for non-metric dependent variables. It is very useful for comparative political researchers.
(2) What are the challenges and the solutions to the time series cross sectional analysis?
The data are often (e.g. data from year X and X+1) correlated, thus the outcome does not necessarily represent a causation.
Heterogeneity: The pooling of countries and years leads to the view that each of the observations are not independent from each other. There are three types of heterogeneity: heterogeneity of units (e.g. very different y-axis of different countries), heterogeneity of the parameter (the correlation of
some variables have different effects); heterogeneity of dynamics and structure (time structure of
effects is different in different countries e.g. due to different governance system).
However, with heterogeneity of units, there are issues that error terms are present in the regression
due to unobserved country-specific variables which distort the pooled OLS when they correlate with
independent variables and have also an effect on the dependent variables. Thus it is important to
split the error term by creating an isolated error term e.
To deal with the error term u one can use pooled OLS regression with corrected standard error or a
random-effects “guesser” which can create efficient guessed results by creating dummy variables
that absorbs the unit-specific heterogeneity.
LSDV-method can be used to homogenise the data, which is very extreme as it excludes some
countries.
What are examples for different regression techniques?
The dependent variable (e.g., continuous, discrete etc.)
Shape of the regression line
Independent variables that can be introduced to the regression
How to select a specific regression technique?
In social sciences, some regression techniques are more common than others. To find out (a) if regression is suitable for your research question and (b) whatregression technique is appropriate, take a look at previous research on the topic. In any case, it is important to understand that OLS cannot be used for everything.
What is multilevel modeling? For what is it used?
Often we assume that our dependent variable is influenced by independent variables
at ”different levels”.
For example the attitude towards the war in UA can be influenced by age, education, part of ex-USSR etc. And these factors also are in a relationship towards each other (e.g. older person whose country was part of USSR)
So there are different levels of data. And there are independent variables at any level and dependent variables at the lowest level.
What is Survival analysis? For what is it used?
Often we are interested not only in if something happens, but also how long time it takes before it happens. E.g. school closure during the pandemic in different countries
E.g. which variable led to school closures? Which kind of gov. (left/right) were likelier to close schools first?
Because OLS cannot deal with independent variables which change during the observation period (of closure), Survival analysis fits here
What is Binary Logistic regression? For what is it used?
When there are two (in binary) outcomes possible to a question. E.g. Is a woman pregnant or not?
But there can be more e.g. multinominal logistic regression (3+ outcomes) or ordinal logistic regression (the dependent variables ore ordered, as in the person has one child, two children, three children and so on)
Example: Did you vote -> Yes or No
What is your gender: male or female
The interaction between variables leads to the question: How does the increase in X (e.g. gender) influence the probability of Y (e.g. voting)
We then can measure the odds and probabilities of voting based on gender.
What is an example for the Time series analysis?
Der Einfluss von Parteien auf die Generosität in der Arbeitslosenversicherung