Exam Preparation Flashcards
What are field notes?
Field notes are a qualitative research tool used to record observations, thoughts, and reflections during or after fieldwork.
Often used in ethnographic studies, where more than just the verbal communication is of interest.
What is archival data?
Archival data refers to information that has been collected and stored in a systematic way, typically for non-research purposes, but that researchers can later use to address specific questions or hypotheses.
The data is NOT collected by the researcher but sourced from, e.g., historical records. Because it already exists, researchers can use it as a cost-effective and efficient way to investigate patterns, trends, and relationships without conducting primary data collection.
Types:
Records, photographs, audio recordings, statistics.
What is grounded theory?
Grounded theory is a qualitative research methodology focused on developing theories directly from data rather than testing pre-existing theories.
Important features:
Data-driven
Constant comparison
Open-ended
What is trustworthiness and how can we make the research trustworthy?
Key components:
Credibility
Transferability
Dependability
Confirmability
What is ontology?
“Does gravity exist?” Yes! Our view of the world: how we look upon reality and what it is.
What is epistemology?
“How do we know gravity exists?” Through evidence!
What can we know about reality, and how do we gain knowledge about something?
What is abduction?
Going back and forth between inductive and deductive reasoning.
Can regression analysis detect causation?
Yes, but only with a causal research design (e.g., a randomized experiment). On its own, regression shows association, not causation.
What is a population regression function?
Describes the relationship between a dependent variable (outcome) and one or more independent variables (factors) for the entire population.
Example: imagine you want to know how study hours (independent variable) affect exam scores (dependent variable) for every student in the world.
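With one independent variable, the population regression function is usually written as below. The symbols β0, β1 and the error term u are standard textbook notation assumed here, not taken from the notes. In the example, Y is the exam score, X is study hours, and β1 is the average change in exam score per extra study hour across the whole population.

```latex
\begin{aligned}
E(Y \mid X) &= \beta_0 + \beta_1 X \\
Y &= \beta_0 + \beta_1 X + u, \qquad E(u \mid X) = 0
\end{aligned}
```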
How can you check for linearity and homoskedasticity?
Scatterplots
How can you check for multicollinearity?
Correlation matrices
How can you check for autocorrelation?
Durbin-Watson
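The Durbin-Watson statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. Values near 2 suggest no first-order autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation. A stdlib-only sketch, assuming the residuals are already ordered in time (the residual series are made up):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals ordered in time."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Made-up residual series.
positively_correlated = [1.0, 0.9, 0.8, 0.6, -0.5, -0.7, -0.8, -0.9]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(round(durbin_watson(positively_correlated), 2))  # well below 2
print(round(durbin_watson(alternating), 2))  # well above 2
```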
How can you check for normality?
Shapiro-Wilk and Kolmogorov-Smirnov tests
What are the consequences when the Linearity assumption is violated and what is the solution?
When the linearity assumption is violated, the relationship between X and Y is not linear.
Consequence: biased estimates
Fix: Use a nonlinear regression model, or transform the variables (e.g., add a squared term or take logs)
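A small illustration with made-up data: a straight line fitted to a curved relationship leaves a systematic U-shaped pattern in the residuals, and one common remedy, regressing on a transformed regressor (here x squared), removes it. The helper `ols_line` is a hypothetical name for plain simple OLS.

```python
def ols_line(x, y):
    """Simple OLS with an intercept: return (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]  # hypothetical curved (quadratic) relationship

# A straight line: residuals are positive at both ends, negative in the middle.
b0, b1 = ols_line(x, y)
linear_residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Regressing y on x**2 instead: the pattern disappears (exact fit for this data).
x_sq = [xi ** 2 for xi in x]
c0, c1 = ols_line(x_sq, y)
quad_residuals = [yi - (c0 + c1 * xs) for xs, yi in zip(x_sq, y)]

print([round(r, 2) for r in linear_residuals])
print([round(r, 2) for r in quad_residuals])
```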
What are the consequences when the Homoskedasticity assumption is violated and what is the solution?
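When it's violated, we have heteroskedasticity: the variance of the errors differs across observations.
Consequence: coefficient estimates remain unbiased, but the usual standard errors are biased, so t-tests and confidence intervals are unreliable.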
Fix: Use Robust Standard Errors or Weighted Least Squares
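For simple regression, White's heteroskedasticity-robust standard error (the HC0 variant) can be written by hand, which shows what "robust" means: each squared residual enters with its own weight instead of one pooled error variance. A stdlib sketch under that assumption; the data are made up so that the spread of y grows with x.

```python
import math

def slope_standard_errors(x, y):
    """Return (slope, conventional SE, HC0 robust SE) for simple OLS."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Conventional SE: assumes one common error variance (homoskedasticity).
    sigma2 = sum(e ** 2 for e in resid) / (n - 2)
    se_conventional = math.sqrt(sigma2 / sxx)

    # HC0 robust SE: lets each observation have its own error variance.
    se_robust = math.sqrt(
        sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    )
    return slope, se_conventional, se_robust

# Made-up data where the spread of y grows with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.3, 7.2, 11.5, 10.0, 16.8, 12.9]
print(slope_standard_errors(x, y))
```

The slope itself is identical either way; only the standard error (and hence the t-test) changes.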
What are the consequences when the No Perfect Multicollinearity assumption is violated and what is the solution?
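In that case we have multicollinearity: independent variables are highly or perfectly correlated.
Consequence: with perfect multicollinearity, OLS cannot estimate the separate coefficients; with high multicollinearity, standard errors are inflated and the estimates become unstable.
Fix: Remove or merge correlated variables, or use principal component analysis.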
What are the consequences when the Normality of Errors assumption is violated and what is the solution?
The residuals (errors) are not normally distributed. Coefficient estimates remain unbiased, but hypothesis tests (e.g., t-tests, F-tests) may be invalid, especially in small samples.
Fix: Use non-parametric methods (which do not depend on the normality assumption), or use a large sample, where the Central Limit Theorem makes the tests approximately valid.
What are the consequences when the “No Autocorrelation” assumption is violated and what is the solution?
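"No autocorrelation" is the independence assumption for the errors.
Autocorrelation is the term for when this independence assumption in OLS regression is violated.
So, when violated, errors are correlated across observations.
Consequence: coefficient estimates generally remain unbiased, but standard errors are biased, so significance tests are misleading.
Fix: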
- Adjust your model to directly address the source of autocorrelation (e.g., include lagged terms).
- Use robust standard errors (like Newey-West) to correct for the issues in residuals.
Is it necessary for independent and dependent variables to be normally distributed?
No, the independent and dependent variables do not need to be normally distributed. Regression can handle variables of any distribution, such as skewed ones.
For the errors, yes: normality of the errors is an assumption in OLS regression (mainly needed for valid hypothesis tests in small samples).
How can we detect non-normal distributions of the residuals? (3)
- Histogram. Non-normality appears as skewed distributions or outliers.
- Q-Q Plot. Compares the distribution of residuals to a normal distribution. Points should align along a straight diagonal line if residuals are normal. Deviations from the line indicate non-normality.
- Shapiro-Wilk/Kolmogorov-Smirnov tests
What is a good fit of a regression and how can we measure it?
A good fit = The model explains a large proportion of the variability in the dependent variable. Residuals (differences between observed and predicted values) are small and randomly distributed.
The model meets assumptions (e.g., linearity, independence, homoscedasticity).
Predictions are accurate for the data.
Measured by R squared, the share of the variation in Y explained by the model. It ranges from 0 to 1; 1 = perfect fit.
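R squared is 1 minus the ratio of unexplained (residual) variation to total variation around the mean. A stdlib sketch with made-up numbers showing the two extremes and one case in between:

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot: share of the variation in y explained by the fit."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)  # total variation around the mean
    return 1 - ss_res / ss_tot

y = [3, 5, 7, 9]
print(r_squared(y, [3, 5, 7, 9]))          # perfect predictions: 1.0
print(r_squared(y, [6, 6, 6, 6]))          # always predicting the mean: 0.0
print(r_squared(y, [3.5, 4.5, 7.5, 8.5]))  # close but not perfect
```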
What are outliers and how can we mitigate their impact on a regression?
Outliers are extreme values. We can detect them, e.g., via scatterplots or boxplots.
Three ways to handle them:
Transforming (e.g., taking logs)
Trimming (removing them)
Winsorizing (capping them at a less extreme value to reduce their impact)
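A minimal sketch of trimming versus winsorizing. As a simplifying assumption it cuts a fixed number k of observations on each side; real analyses usually cut at percentiles (e.g., the 1st and 99th). The income values are hypothetical.

```python
def trim(values, k):
    """Trimming: drop the k smallest and k largest observations."""
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]

def winsorize(values, k):
    """Winsorizing: cap the k most extreme values on each side, keep every observation."""
    ordered = sorted(values)
    low, high = ordered[k], ordered[-k - 1]
    return [min(max(v, low), high) for v in values]

incomes = [21, 23, 25, 26, 28, 30, 31, 400]  # hypothetical, one extreme outlier
print(trim(incomes, 1))       # [23, 25, 26, 28, 30, 31]
print(winsorize(incomes, 1))  # [23, 23, 25, 26, 28, 30, 31, 31]
```

Trimming shrinks the sample; winsorizing keeps the sample size but pulls the outlier (400) down to the next largest value (31).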
What is the benefit from simple regression → multiple regression?
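A simple regression only accounts for one independent variable to explain the dependent variable. A multiple regression includes several, so it can control for other factors, reduce omitted variable bias, and explain more of the variation in Y.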
What is Zero mean of the residuals?
The residuals average to zero across all observations (OLS guarantees this when the model includes an intercept).
What is Zero conditional mean?
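Exogeneity: E(u | X) = 0, i.e., the error term u has mean zero whatever the values of the independent variables.
This implies that the independent variables are uncorrelated with the error term u: the covariance between the independent variables and the errors is zero.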
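A caution worth checking numerically: OLS residuals with an intercept always have mean zero and zero sample covariance with X, by construction. So these two numbers cannot test exogeneity, which is an assumption about the unobservable error u. A stdlib sketch with made-up data (`ols_fit` is a hypothetical helper name):

```python
def ols_fit(x, y):
    """Simple OLS with an intercept: return (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - slope * x_bar, slope

x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 4, 6]  # made-up data
b0, b1 = ols_fit(x, y)
resid = [yi - b0 - b1 * xi for xi, yi in zip(x, y)]

x_bar = sum(x) / len(x)
mean_resid = sum(resid) / len(resid)
cov_x_resid = sum((xi - x_bar) * e for xi, e in zip(x, resid)) / len(x)
print(mean_resid, cov_x_resid)  # both zero up to floating-point rounding
```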
What is constant variance of the residuals?
Homoskedasticity
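What are omitted variables and their consequences?
Omitted variables are important factors that influence the dependent variable Y but are not included as independent variables X in the model. This creates omitted variable bias: the coefficients on the included X are biased when the omitted variable both affects Y and is correlated with the included X.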
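The size and sign of the bias follow from the standard textbook omitted-variable formula (the notation below is assumed, not taken from the notes): if the true model contains X2 but you regress Y on X1 only, the estimated slope picks up part of X2's effect.

```latex
\begin{aligned}
\text{True model:}\quad & Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + u \\
\text{Short regression:}\quad & Y \text{ on } X_1 \text{ only, giving } \tilde\beta_1 \\
\text{Link between regressors:}\quad & X_2 = \delta_0 + \delta_1 X_1 + v \\
\text{Result:}\quad & E(\tilde\beta_1) = \beta_1 + \beta_2 \delta_1
\end{aligned}
```

The bias term β2·δ1 is zero only if the omitted variable has no effect on Y (β2 = 0) or is uncorrelated with X1 (δ1 = 0).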
How can you address omitted variables?
- For panel or longitudinal data, fixed effects can control for omitted variables that are constant within individuals or groups.
- Randomized controlled trials
What is overspecification and its consequences?
Overspecification occurs in regression analysis when the model includes too many independent variables, some of which are irrelevant or redundant. These extra variables do not improve the model’s ability to explain the dependent variable Y and can even harm its performance.
Consequences: multicollinearity and less precise estimates (larger standard errors), although the coefficients remain unbiased.
Do dependent and independent variables need to be normally distributed?
No.
We can still have a good model despite, e.g., negative skewness.
What is a Q-Q plot?
A Q-Q plot (Quantile-Quantile plot) is used to compare the distribution of a dataset to a theoretical distribution (e.g., normal distribution) by plotting their quantiles.
Used to assess whether the residuals from a regression model are normally distributed.
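The points behind a Q-Q plot can be computed with the standard library: sort the standardized residuals and pair each one with the matching quantile of a standard normal distribution. If the residuals are normal, the pairs lie close to the line y = x. The plotting positions (i - 0.5)/n are one common convention (an assumption here), and the residuals are made up, with 5.0 acting as an outlier.

```python
from statistics import NormalDist, mean, stdev

def qq_points(residuals):
    """Pair theoretical standard-normal quantiles with sorted standardized residuals."""
    n = len(residuals)
    m, s = mean(residuals), stdev(residuals)
    observed = sorted((r - m) / s for r in residuals)
    # (i - 0.5) / n is one common choice of plotting positions.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))

residuals = [-1.2, -0.4, 0.1, 0.3, 0.9, 5.0]  # made-up; 5.0 is an outlier
for t, o in qq_points(residuals):
    print(f"{t:6.2f} {o:6.2f}")  # normal residuals would give t roughly equal to o
```

Here the last point sits clearly above the diagonal, which is how the outlier shows up on the plot.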
What is a scatterplot?
It’s used to show the relationship between two variables by plotting their values as coordinate points.
Example: explore whether study hours X and test scores Y have a linear relationship.
What is sampling error and how can you solve it?
Sampling error is the difference between a sample statistic (e.g., sample mean, sample proportion) and the corresponding population parameter (e.g., population mean).
For example:
You survey 1,000 people from a city to estimate the average income. The sample mean might differ from the actual population mean due to sampling error.
Reduce it by increasing the sample size (sampling error shrinks as the sample grows, but it never disappears entirely).
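A small simulation makes the "increase the sample size" answer concrete: draw many random samples of size n from a simulated population and measure how far the sample mean lands from the population mean on average. The population is randomly generated for illustration, not real income data, and the function name is hypothetical.

```python
import random

def average_sampling_error(population, n, repetitions=500, seed=1):
    """Average absolute gap between a sample mean and the population mean."""
    rng = random.Random(seed)
    mu = sum(population) / len(population)
    total = 0.0
    for _ in range(repetitions):
        sample = rng.sample(population, n)  # simple random sample without replacement
        total += abs(sum(sample) / n - mu)
    return total / repetitions

# Hypothetical population of 20,000 incomes (in thousands), simulated.
pop_rng = random.Random(0)
population = [pop_rng.gauss(30, 10) for _ in range(20_000)]

for n in (10, 100, 1000):
    print(n, round(average_sampling_error(population, n), 2))
```

The average error shrinks roughly in proportion to 1/√n, so a tenfold larger sample cuts it to about a third.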
What is random sampling?
Ensure the sample is chosen randomly so every member of the population has an equal chance of being selected. This reduces the likelihood of systematic bias.
What is stratified sampling?
Divide the population into subgroups (strata) based on characteristics like age, income, or region, and take random samples from each.
Ensure that each subgroup is proportionally represented in the sample.
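A minimal sketch of proportional stratified sampling: group the population by stratum, then draw a simple random sample inside each stratum with a size proportional to the stratum's share. The regions, IDs, and the helper name are hypothetical; with awkward stratum sizes, rounding can make the total differ slightly from the target.

```python
import random
from collections import Counter

def stratified_sample(population, stratum_of, total_n, seed=0):
    """Proportional stratified sampling: a simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = round(total_n * len(members) / len(population))  # proportional allocation
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: (person_id, region) pairs.
population = (
    [(i, "North") for i in range(600)]
    + [(i, "South") for i in range(600, 900)]
    + [(i, "West") for i in range(900, 1000)]
)
sample = stratified_sample(population, stratum_of=lambda unit: unit[1], total_n=100)
print(Counter(region for _, region in sample))
```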
What is cluster sampling?
Divide the population into clusters (e.g., geographic areas or naturally occurring groups) and randomly select clusters for sampling. Makes data collection more efficient by sampling whole clusters instead of individuals.
What is a quasi-experiment and how is it different from RCT?
A quasi-experiment lacks the random assignment of participants to treatment and control groups, which RCT has.
What is difference-in-difference?
Compares outcomes between a treatment group and a control group before and after an intervention.
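With group averages, the difference-in-differences estimate is the change in the treatment group minus the change in the control group; it is only credible under the parallel-trends assumption (without the intervention, both groups would have changed by the same amount). A sketch with hypothetical exam-score averages:

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """DiD estimate: change in the treatment group minus change in the control group."""
    return (treat_after - treat_before) - (control_after - control_before)

# Hypothetical mean exam scores before and after a tutoring program.
effect = diff_in_diff(treat_before=60, treat_after=72, control_before=61, control_after=65)
print(effect)  # 8: the 12-point gain minus the 4-point gain the control group also saw
```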
What is panel data?
Panel data (also known as longitudinal data) is a type of dataset that contains observations of multiple entities (such as individuals, firms, countries, etc.) over multiple time periods. It combines elements of cross-sectional data (data collected at one point in time) and time-series data (data collected over time for a single entity).
What is fixed effect?
Fixed effects control for factors unique to each entity that do not change over time (time-invariant factors, e.g., culture).
It is used with panel data: the same cross-sectional units followed over time.
Time-invariant factors are automatically controlled for in FE because these models focus only on within-entity variation over time.
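One standard way to compute a one-way fixed-effects slope is the "within" transformation: subtract each entity's own mean from x and y, then run OLS on the demeaned data. Anything constant within an entity is removed by the demeaning. A stdlib sketch with a made-up panel (`within_slope` is a hypothetical name); putting every observation in one group reduces it to pooled OLS, which shows what FE removes.

```python
from collections import defaultdict

def within_slope(entity, x, y):
    """Slope after demeaning x and y within each entity (fixed-effects 'within' estimator)."""
    groups = defaultdict(list)
    for i, e in enumerate(entity):
        groups[e].append(i)
    x_dm, y_dm = [0.0] * len(x), [0.0] * len(y)
    for idx in groups.values():
        x_mean = sum(x[i] for i in idx) / len(idx)
        y_mean = sum(y[i] for i in idx) / len(idx)
        for i in idx:
            x_dm[i] = x[i] - x_mean
            y_dm[i] = y[i] - y_mean
    return sum(a * b for a, b in zip(x_dm, y_dm)) / sum(a * a for a in x_dm)

# Made-up panel: two entities over three periods, both with slope 2,
# but entity "B" has a much higher time-invariant level (its fixed effect).
entity = ["A", "A", "A", "B", "B", "B"]
x = [1, 2, 3, 4, 5, 6]
y = [12, 14, 16, 58, 60, 62]

print(within_slope(entity, x, y))       # fixed effects: recovers the within-entity slope
print(within_slope(["all"] * 6, x, y))  # one group = pooled OLS: mixes in the level gap
```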
What is random effect?
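Random effects treat the unobserved entity-specific effects as random and assume they are uncorrelated with the independent variables. Because of this assumption, coefficients on time-invariant variables can be estimated (unlike in fixed effects).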
Is one way panel regression model the same thing as fixed effect?
Not exactly. A one-way panel regression model can be a fixed effects model, but it can also be a random effects model depending on how the unobserved effects are treated.
What is One-Way Panel Regression Model?
A one-way panel regression model accounts for unobserved effects that vary only across entities or only across time, but not both.