ARMS Week 1 Flashcards
Frequentist approach vs Bayesian framework
Frequentist:
- Test how well data fit H0
- p-values, confidence intervals, effect sizes, power analysis
Bayesian:
- Probability of the hypothesis given the observed data; incorporates prior information
- BFs, prior, posterior, credible intervals
Both can be used for estimation and hypothesis testing.
Frequentist estimation
- Empirical research learns from collected data
- likelihood function
- probability of an event = frequency it occurs
Bayesian estimation
- prior information about the parameter (e.g. a mean μ)
- the prior is updated with the data and yields the posterior distribution of μ
- conditional probabilities -> P(A given B)
Pro: accumulating knowledge
Con: results depend on the choice of prior
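The updating step above can be sketched numerically. A minimal sketch, assuming a conjugate normal prior for a mean μ with known data variance; all numbers are made up for illustration:

```python
import math

# Hedged sketch: Bayesian updating of a mean (mu) with a conjugate normal
# prior and known data variance. All numbers are illustrative assumptions.
prior_mean, prior_sd = 0.0, 1.0        # prior belief about mu
data_mean, data_sd, n = 0.5, 2.0, 25   # observed sample

# Precision (1/variance) of the prior and of the sample mean
prior_prec = 1 / prior_sd**2
data_prec = n / data_sd**2

# Posterior is normal: a precision-weighted combination of prior and data
post_prec = prior_prec + data_prec
post_mean = (prior_prec * prior_mean + data_prec * data_mean) / post_prec
post_sd = math.sqrt(1 / post_prec)

# 95% credible interval: central 95% of the posterior mass
ci = (post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd)
print(post_mean, post_sd, ci)
```

Note how the posterior SD is smaller than the prior SD: the data have sharpened the belief about μ.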
Posterior mean or mode
The mean or mode of the posterior distribution; used as the Bayesian point estimate of the parameter. Note: it is the posterior model probabilities, not these point estimates, that must add up to 1 (just as the prior model probabilities do).
Posterior SD
Standard deviation of posterior distribution (comparable to frequentist standard error)
Posterior 95% credible interval
Bounds of the interval that contains 95% of the posterior mass
Posterior Model Probability (PMP)
The probability of the hypothesis after observing the data
Depends on two criteria:
1. How sensible it is, based on the prior
2. How well it fits the new data
- PMPs are relative probabilities: updates of the prior model probabilities with the BF.
Bayesian testing is comparative. What does this mean?
Hypotheses are tested against one another, not in isolation
What does the Bayes factor say?
How much support there is for H1 compared to H0.
BF10 = 10 -> support for H1 is 10 times stronger than for H0.
BF10 = 1 -> support for H1 is as strong as for H0.
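The update from prior model probabilities to PMPs via the Bayes factor can be sketched as simple arithmetic (the hypotheses, equal priors, and the BF10 = 10 value are illustrative):

```python
# Hedged sketch: posterior model probabilities (PMPs) are prior model
# probabilities updated with Bayes factors. Numbers are made up.
prior = {"H1": 0.5, "H0": 0.5}   # prior model probabilities (sum to 1)
bf = {"H1": 10.0, "H0": 1.0}     # BF10 = 10: support for H1 is 10x stronger

# Multiply each prior by its BF, then renormalize so the PMPs sum to 1
weights = {h: prior[h] * bf[h] for h in prior}
total = sum(weights.values())
pmp = {h: w / total for h, w in weights.items()}
print(pmp)  # H1 gets 10/11 ~ 0.91, H0 gets 1/11 ~ 0.09
```

This also shows why Bayesian testing is comparative: each PMP only has meaning relative to the other hypotheses in the set.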
Probability theory: frequentist vs bayesian
Frequentist:
- probability is the relative frequency of events
Bayesian:
- probability is the degree of belief
95% Confidence interval vs 95% credible interval
Confidence interval 95% (frequentist)
- if we were to repeat this experiment many times and calculate a CI each time, 95% of the intervals will include the true parameter value.
Credible interval 95% (Bayesian)
- there is a 95% probability that the true value is in the credible interval
- if zero lies inside the interval, a zero effect is credible (comparable to a non-significant result)
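The difference in interpretation can be illustrated with a small simulation of the frequentist guarantee, assuming normally distributed data and an arbitrary seed:

```python
import numpy as np

# Hedged sketch of what "95% confidence" means: across many repeated
# experiments, about 95% of the computed intervals contain the true
# value. All data are simulated; the seed and parameters are arbitrary.
rng = np.random.default_rng(42)
true_mu, sigma, n, reps = 10.0, 2.0, 30, 10_000

covered = 0
for _ in range(reps):
    sample = rng.normal(true_mu, sigma, n)
    se = sample.std(ddof=1) / np.sqrt(n)
    m = sample.mean()
    covered += (m - 1.96 * se) <= true_mu <= (m + 1.96 * se)

coverage = covered / reps
print(coverage)  # close to 0.95 (slightly below, since 1.96 is a z- not t-value)
```

No single interval has a 95% probability of containing true_mu; the 95% refers to the long-run proportion of intervals that do.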
Linear regression
Linear association between a dependent and an independent variable
Residual
Difference between an observed value and the value predicted by the regression line (the part the model cannot explain).
- we want to minimize the (squared) residuals
Multiple Linear Regression (MLR)
Linear regression with multiple predictors (independent variables)
Assumptions:
- Dependent variable is continuous (interval/ratio)
- Independent variables are continuous or dichotomous (nominal with two categories)
- Linearity of relations (the L in MLR) -> checked with scatterplots
- No outliers
Dummy variables
Takes the values 0 and 1. Used in MLR to recode nominal data so group membership can be entered as a predictor.
The number of dummy variables equals the number of groups minus 1.
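The groups-minus-1 rule can be sketched in plain Python (the group names and the choice of reference category are made up):

```python
# Hedged sketch: a nominal variable with k groups becomes k - 1 dummy
# variables; the dropped group is the reference category. Toy example.
groups = ["low", "medium", "high"]               # k = 3 categories
reference = groups[0]                            # "low" is the reference
dummies = [g for g in groups if g != reference]  # k - 1 = 2 dummies

def dummy_code(value):
    # 1 for the matching dummy, 0 otherwise; the reference is all zeros
    return {f"d_{g}": int(value == g) for g in dummies}

print(dummy_code("medium"))  # {'d_medium': 1, 'd_high': 0}
print(dummy_code("low"))     # {'d_medium': 0, 'd_high': 0}
```

The reference group is the pattern of all zeros, so each dummy's B is interpreted as the difference from that group.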
R^2 vs adjusted R^2
R^2 -> proportion of explained variance in the sample
Adjusted R^2 -> estimate of the proportion of explained variance in the population (corrects R^2 for the number of predictors)
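The usual correction formula can be sketched as follows (the R^2, n, and p values are illustrative):

```python
# Hedged sketch of the standard adjusted R^2 formula:
# adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
def adjusted_r2(r2, n, p):
    # n = sample size, p = number of predictors
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(adjusted_r2(0.30, n=100, p=5))  # ~0.263, lower than R^2 = 0.30
```

Adding predictors can only push R^2 up; the adjustment penalizes that, which is why adjusted R^2 is the better population estimate.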
R
Correlation coefficient; in MLR, the multiple correlation between the observed outcome Y and the predicted outcome Ŷ
B value
Unstandardized effect
B0
Intercept
Hierarchical MLR
Comparing two nested models -> one model is a smaller (restricted) version of the other
R^2 change
How much more variance does the second model explain compared to the first?
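A common way to test the R^2 change between nested models is an F statistic; a minimal sketch with made-up numbers, assuming the standard formula (ΔR^2 / k) over ((1 − R^2_full) / (n − p_full − 1)):

```python
# Hedged sketch: F test for the R^2 change between two nested models.
# All inputs are illustrative numbers, not real study results.
def f_change(r2_small, r2_full, n, p_small, p_full):
    k = p_full - p_small                    # number of predictors added
    num = (r2_full - r2_small) / k          # R^2 change per added predictor
    den = (1 - r2_full) / (n - p_full - 1)  # unexplained variance per df
    return num / den

print(f_change(r2_small=0.20, r2_full=0.28, n=120, p_small=2, p_full=4))
```

A large F (relative to an F distribution with k and n − p_full − 1 degrees of freedom) means the extra predictors explain a worthwhile amount of additional variance.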
Prior
Existing knowledge before looking at your own data
Prior model probabilities
How likely is each hypothesis before seeing the data.
- add up to 1 -> relative probabilities divided over the hypotheses of interest
For example: H1 = 0.2 and H2 = 0.8; or H1 = H2 = H3 ≈ 0.33
Standardized residuals
Check whether there are outliers in the Y-space.
Rule of thumb -> values should fall between −3.3 and +3.3
Cook’s Distance
With this you can check whether there are outliers within the XY-space.
Rule of thumb -> must be lower than 1
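Both outlier checks can be computed by hand from the hat matrix; a minimal numpy sketch with toy data and one deliberately injected outlier:

```python
import numpy as np

# Hedged sketch: standardized residuals (Y-space outliers) and Cook's
# distance (XY-space outliers), computed by hand. Toy simulated data.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2 + 0.5 * x + rng.normal(0, 0.5, 20)
y[-1] += 8  # inject an outlier at the last observation

X = np.column_stack([np.ones_like(x), x])  # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
p = X.shape[1]                             # number of estimated parameters
H = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix
h = np.diag(H)                             # leverages
mse = resid @ resid / (len(y) - p)

std_resid = resid / np.sqrt(mse * (1 - h))  # standardized residuals
cooks_d = std_resid**2 * h / ((1 - h) * p)  # Cook's distance

print(std_resid[-1], cooks_d[-1])  # the injected point stands out
```

In practice software reports these directly; the sketch just shows that the injected point violates both rules of thumb (|standardized residual| > 3.3, Cook's D > 1).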
Multicollinearity
Indicates whether the relationship between two or more independent variables is too strong.
Including overly related variables in your model has three consequences:
1. B is unreliable
2. Limits magnitude of R (correlation between Y and Ŷ)
3. Importance of individual independent variables can hardly be determined (if at all)
Possible solutions:
- remove variables
- combine variables in scales
Variance Inflation Factor (VIF)
Used to determine whether multicollinearity is an issue.
Rule of thumb -> VIF > 5 = potential problem, VIF > 10 = problem.
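The VIF for predictor j is 1 / (1 − R²_j), where R²_j comes from regressing predictor j on all other predictors; a numpy sketch with made-up data in which x2 is deliberately collinear with x1:

```python
import numpy as np

# Hedged sketch: VIF via the auxiliary-regression definition
# VIF_j = 1 / (1 - R^2_j). All data are simulated toy values.
rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.3, size=n)  # strongly related to x1
x3 = rng.normal(size=n)                  # unrelated predictor
X = np.column_stack([x1, x2, x3])

def vif(X, j):
    # Regress column j on the remaining columns (plus an intercept)
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(X)), others])
    coef, *_ = np.linalg.lstsq(A, X[:, j], rcond=None)
    resid = X[:, j] - A @ coef
    r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
    return 1 / (1 - r2)

print([round(vif(X, j), 1) for j in range(3)])  # x1 and x2 high, x3 near 1
```

The collinear pair x1/x2 exceeds the rule-of-thumb cutoffs while the unrelated x3 stays near the minimum possible value of 1.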
Homoscedasticity
Spread of the residuals must be approximately the same across all predicted values. When it is not: heteroscedasticity.
Beta (β)
Standardized coefficients; can be used to determine the most important predictor in a model. The independent variable with the largest absolute beta is the most important predictor (the sign does not matter).
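A standardized coefficient can be obtained from the unstandardized B by rescaling with the predictor and outcome standard deviations; a one-line sketch with illustrative numbers:

```python
# Hedged sketch of the standard conversion beta = B * (SD_x / SD_y).
# All numbers are made up for illustration.
def beta(b, sd_x, sd_y):
    return b * sd_x / sd_y

# Example: B = 2.0, predictor SD = 3.0, outcome SD = 12.0
print(beta(2.0, sd_x=3.0, sd_y=12.0))  # 0.5
```

Because beta is unit-free, betas of different predictors can be compared directly, which unstandardized Bs cannot.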
Construct validity
Extent to which a conceptual variable is accurately measured or manipulated
Internal validity
Extent to which the research method can eliminate alternative explanations for an effect/relationship (trying to find causal relationship)
External validity
Extent to which the research results generalize to people, settings, and times beyond those in the original study
Statistical validity
Extent to which the results of a statistical analysis are accurate and well founded (checking assumptions and reporting significance)