Exam preparations Flashcards
What are field notes?
Field notes are a qualitative research tool used to record observations, thoughts, and reflections during or after fieldwork.
Often used in ethnographic studies, where things other than verbal communication are also of interest.
Be careful with observer bias.
What is archival data?
A structured source of information that exists independently of the researcher (one type of secondary data)
Not an objective mirroring of reality
Fragments and lost pieces
Pay attention to the time (when it was written) and purpose (why it was written) of a given document
What is grounded theory?
Grounded theory is a qualitative research methodology focused on developing theories directly from data rather than testing pre-existing theories.
Important features:
Data-driven
Constant comparison
Open-ended and open-minded
Often inductive, can be deductive.
Conceptualisation of underlying patterns
What is trustworthiness and how can we make the research trustworthy?
Key components:
credibility
transferability
dependability
What is ontology?
How we view the nature of reality. Assumption about the nature of reality.
“Does gravity exist?” Yes! Ontology is our view of the world: how we look upon reality.
What is epistemology?
How knowledge is acquired.
“How do we know gravity exists?” Through evidence!
Abductive reasoning can best be described in the following way:
An iterative process between data and theory
Can regression analysis detect causation?
Yes, but only when combined with a causal research design. Regression on its own shows association, not causation.
What is a population regression function?
It describes the relationship between a dependent variable (outcome) and one or more independent variables (factors) for the entire population.
Example: imagine you want to know how study hours (independent variable) affect exam scores (dependent variable) for every student in the world.
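The population regression function cannot normally be observed, so in practice it is estimated from a sample. A minimal sketch in plain Python, assuming a small made-up sample of study hours and exam scores (the numbers and the function name `fit_simple_ols` are illustrative and not part of the course material):

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope = co-variation of x and y divided by the variation in x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical sample: study hours and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

b0, b1 = fit_simple_ols(hours, scores)
print(f"estimated score = {b0:.1f} + {b1:.1f} * hours")
```

The estimated line is the sample counterpart of the population regression function; a different sample would give somewhat different estimates.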
How can you check for linearity and homoskedasticity?
Scatterplots
How can you check for multicollinearity?
Correlation matrices
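A correlation matrix can be built by hand to see what the check involves. The sketch below uses plain Python and made-up data; the variable names and the helper functions `pearson` and `correlation_matrix` are assumptions for illustration only.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equally long lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / sqrt(var_a * var_b)


def correlation_matrix(columns):
    """Pairwise correlations for a dict of {name: values}."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


# Hypothetical independent variables.
data = {
    "study_hours": [1, 2, 3, 4, 5],
    "library_visits": [2, 4, 6, 8, 11],
    "sleep_hours": [7, 6, 8, 7, 7],
}

corr = correlation_matrix(data)
for name, row in corr.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

In this made-up data, study_hours and library_visits are almost perfectly correlated, which is the kind of pair that signals a multicollinearity problem.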
How can you check for autocorrelation?
Durbin-Watson
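The Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values towards 0 suggest positive autocorrelation, and values towards 4 suggest negative autocorrelation. A minimal sketch, assuming two made-up residual series:

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for a list of residuals in time order."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that drift slowly (positive autocorrelation).
trending = [1.0, 0.8, 0.6, 0.3, -0.2, -0.5, -0.9, -1.1]
# Hypothetical residuals that flip sign every period (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)
print(round(dw_trending, 2), round(dw_alternating, 2))
```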
How can you check for normality?
Shapiro-Wilk and Kolmogorov-Smirnov tests
What are the consequences when the Linearity assumption is violated and what is the solution?
When the linearity assumption is violated, the relationship between X and Y is not linear.
Consequence: biased estimates
Fix: Use a nonlinear regression model
What are the consequences when the Homoskedasticity assumption is violated and what is the solution?
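When it is violated, we have heteroskedasticity: the variance of the errors is not constant across observations.
Consequence: coefficient estimates remain unbiased, but the usual standard errors are wrong, so t-tests and F-tests become unreliable.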
Fix:
- Transform the covariates prior to regression (log transformation)
- Robust Standard Errors such as White Standard Errors for larger samples
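To make the second fix concrete, the sketch below compares the classical standard error of the slope with a White-type (HC0) robust standard error in a simple regression. It is an illustration under stated assumptions: the data are made up so that the error spread grows with x, the function name `slope_standard_errors` is hypothetical, and HC0 is only one of several robust variants.

```python
from math import sqrt


def slope_standard_errors(x, y):
    """Return (slope, classical SE, White HC0 SE) for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

    # Classical SE: assumes one common error variance for all observations.
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    classical = sqrt(sigma2 / sxx)

    # HC0 SE: each observation contributes its own squared residual.
    robust = sqrt(sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid))) / sxx
    return slope, classical, robust


# Hypothetical data: y is roughly 2 * x, with noise that widens as x grows.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.3, 7.6, 10.8, 11.0, 15.5, 14.2]

slope, se_classical, se_robust = slope_standard_errors(x, y)
print(round(slope, 3), round(se_classical, 3), round(se_robust, 3))
```

The slope estimate is the same in both cases; only the standard error, and therefore the t-statistic, changes. The robust version is justified for larger samples, which is why the example data set is too small for real inference.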
What are the consequences when the No Perfect Multicollinearity assumption is violated and what is the solution?
In that case we have multicollinearity: independent variables are highly correlated. This does not violate the assumption of “no linear dependence” because multicollinearity is not perfect collinearity.
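Fix: Remove or merge correlated variables. Increase the sample size, as it increases SST (the total variation in each independent variable) and so lowers the variance of the estimates.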
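Why both fixes work can be read off the standard textbook expression for the sampling variance of an OLS slope under homoskedasticity. The notation is assumed here rather than taken from the flashcards: \sigma^2 is the error variance, SST_j is the total sample variation in x_j, and R_j^2 is the R squared from regressing x_j on the other independent variables.

```latex
\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j \,(1 - R_j^2)},
\qquad
SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2
```

High correlation among the independent variables pushes R_j^2 towards 1 and inflates the variance, while a larger sample raises SST_j and brings it back down.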
What are the consequences when the Normality of Errors assumption is violated and what is the solution?
The residuals (errors) are not normally distributed. Coefficient estimates remain unbiased, but hypothesis tests (e.g., t-tests, F-tests) may be invalid, especially in small samples.
Fix: Use non-parametric methods (which do not depend on the normality assumption), or use large samples, where the Central Limit Theorem mitigates the issue.
What are the consequences when the “No Autocorrelation” assumption is violated and what is the solution?
No autocorrelation = independence of the errors.
Autocorrelation is the term for when the independence assumption in OLS regression is violated.
So, when it is violated, errors are correlated across observations.
Fix:
- Transform the covariates prior to regression. Re-specify model to incorporate path-dependency.
- Use Robust Standard Errors (like Newey-West) to correct for the issues in residuals.
Is it necessary for the independent and dependent variables to be normally distributed?
No. The independent and dependent variables do not need to be normally distributed. Regression can handle variables of any distribution, including skewed ones.
The errors (residuals), however, should be normally distributed. That is an assumption in OLS regression.
How can we detect non-normal distributions of the residuals? (3)
- Histogram. Non-normality appears as skewed distributions or outliers.
- Q-Q Plot. Compares the distribution of residuals to a normal distribution. Points should align along a straight diagonal line if residuals are normal. Deviations from the line indicate non-normality. A curved, smile-shaped pattern typically signals skewed residuals.
- Shapiro-Wilk/Kolmogorov-Smirnov tests
What is a good fit of a regression and how can we measure it?
A good fit = The model explains a large proportion of the variability in the dependent variable. Residuals (differences between observed and predicted values) are small and randomly distributed.
The model meets assumptions (e.g., linearity, independence, homoscedasticity).
Predictions are accurate for the data.
Measured by R squared. It ranges from 0 to 1. 1 = Perfect fit.
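R squared is one minus the ratio of the unexplained variation (sum of squared residuals) to the total variation in the dependent variable. A minimal sketch, assuming made-up observed exam scores and hypothetical predictions from some fitted model:

```python
def r_squared(observed, predicted):
    """Share of the variation in `observed` that the predictions explain."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))  # unexplained
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)              # total
    return 1 - ss_res / ss_tot


# Hypothetical observed scores and model predictions.
observed = [52, 55, 61, 64, 68]
predicted = [51.8, 55.9, 60.0, 64.1, 68.2]

r2 = r_squared(observed, predicted)
print(round(r2, 3))
```

If the predictions matched the observations exactly, the residual sum of squares would be zero and R squared would equal 1.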
What are outliers and how can we mitigate their impact on a regression?
Outliers are extreme values. We can detect them via scatterplots or boxplots, for example.
Three ways to handle them:
Transforming (reduce impact)
Trimming (remove them)
Winsorizing (replace them)
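The three options above can be sketched in a few lines of plain Python. The data and the cut-off values are made up for illustration; in practice the cut-offs are usually chosen from percentiles of the data.

```python
from math import log


def trim(values, lower, upper):
    """Trimming: drop every value outside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]


def winsorize(values, lower, upper):
    """Winsorizing: replace values outside [lower, upper] with the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


# Hypothetical exam scores with one extreme value (140).
scores = [48, 52, 55, 57, 60, 61, 63, 66, 70, 140]
lower, upper = 50, 70  # assumed cut-offs

transformed = [log(v) for v in scores]        # transforming: shrinks the gap to 140
trimmed = trim(scores, lower, upper)          # 8 observations remain
winsorized = winsorize(scores, lower, upper)  # all 10 observations remain

print(trimmed)
print(winsorized)
```

Trimming reduces the sample size, whereas winsorizing and transforming keep every observation but change its value.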
What is the benefit of moving from simple regression to multiple regression?
A simple regression uses only one independent variable to explain the dependent variable. Multiple regression adds further independent variables, which reduces omitted variable bias.
What is the zero mean of the residuals?
It refers to the overall average of the residuals across all observations, which should be zero.
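For an OLS regression that includes an intercept, the sample residuals average to zero by construction. A short check on made-up data (the numbers and the helper name `ols_residuals` are assumptions for illustration):

```python
def ols_residuals(x, y):
    """Residuals from the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical sample: study hours and exam scores.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

residuals = ols_residuals(hours, scores)
mean_residual = sum(residuals) / len(residuals)
print(abs(mean_residual) < 1e-9)
```

Individual residuals are positive or negative, but they cancel out on average up to floating-point rounding.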