Research skills 4 Flashcards
what is correlational (observational) design?
- Quantitative description of trends, attitudes, or opinions of a population
- Testing association of X and Y
what is an experimental design?
- Systematic manipulation of one or more variables (X) to evaluate an outcome (Y)
- Holds other variables constant to isolate effects
- Allows testing of causality
correlation coefficients all boil down to a ratio of…
How much two variables vary together: How much two variables vary on their own
which correlation coefficient is used for continuous (numerical) data?
Pearson’s r
By squaring the value of r we get the….
proportion of variance in one variable shared by the other (the overlap)
For example a coefficient of r = 0.6 indicates that 36% of the variance of X and Y is shared -> 0.6 * 0.6 = 0.36
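A minimal sketch of this, assuming SciPy is available and using made-up hours-studied / exam-score numbers:

```python
from scipy.stats import pearsonr

# Hypothetical data: hours studied vs. exam score
hours = [1, 2, 3, 4, 5, 6]
score = [2, 4, 5, 4, 6, 7]

r, p = pearsonr(hours, score)
r_squared = r ** 2  # proportion of variance the two variables share
```

Here r comes out around .92, so roughly 84% of the variance is shared.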
which correlation coefficients are used for ‘ranked’ data
- Spearman’s rho: when there are few tied ranks
- Kendall’s tau: when there are many tied ranks; also better for small samples
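Both are available in SciPy; a sketch with hypothetical rankings from two judges:

```python
from scipy.stats import spearmanr, kendalltau

# Hypothetical: two judges rank the same six contestants
judge_a = [1, 2, 3, 4, 5, 6]
judge_b = [2, 1, 4, 3, 6, 5]

rho, p_rho = spearmanr(judge_a, judge_b)
tau, p_tau = kendalltau(judge_a, judge_b)
```

Note that tau is typically smaller in magnitude than rho for the same data.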
what is the phi correlation and when should it be used?
The phi coefficient is used when you have a 2x2 contingency table (two binary variables); it quantifies the strength of association (degree of dependency) between the two binary variables.
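From a 2x2 table with cells a, b, c, d, phi is (ad − bc) / √((a+b)(c+d)(a+c)(b+d)). A sketch with hypothetical counts:

```python
from math import sqrt

# Hypothetical 2x2 table: rows = treated / untreated, cols = recovered / not
a, b = 20, 10   # treated:   recovered, not recovered
c, d = 8, 22    # untreated: recovered, not recovered

phi = (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))
```

Here phi is about .40, a moderate association between treatment and recovery.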
what is the point-biserial correlation and when should it be used
A measure of the strength and direction of the relationship between two variables when one is dichotomous (two categories, often coded 0 and 1) and the other is continuous. It is essentially a special case of Pearson’s r adapted for one dichotomous variable.
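SciPy has this directly; a sketch with a hypothetical two-group variable and a continuous score:

```python
from scipy.stats import pointbiserialr

group = [0, 0, 0, 1, 1, 1]               # dichotomous variable (e.g., control vs. treatment)
score = [3.1, 2.8, 3.4, 5.0, 5.5, 4.9]   # continuous variable

r_pb, p = pointbiserialr(group, score)
```

Because it is a special case of Pearson’s r, `pearsonr(group, score)` would return the same value.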
what is a partial correlation
Measures the relationship between two variables, controlling for the effect that a third variable has on them both
what is semi-partial correlation
Measures the relationship between two variables controlling for the effect that a third variable has on only one of the variables in the correlation.
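One standard way to compute both is via residuals: regress out the third variable, then correlate. A sketch with simulated data (assumes NumPy and SciPy), where x and y are related only through z:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
z = rng.normal(size=200)
x = z + rng.normal(size=200)   # x depends on z
y = z + rng.normal(size=200)   # y depends on z; x and y share only z

def residuals(a, b):
    """Residuals of a after regressing it on b (simple linear fit)."""
    slope, intercept = np.polyfit(b, a, 1)
    return a - (intercept + slope * b)

r_xy, _ = pearsonr(x, y)                                    # zero-order: inflated by z
r_partial, _ = pearsonr(residuals(x, z), residuals(y, z))   # z removed from both
r_semipartial, _ = pearsonr(x, residuals(y, z))             # z removed from y only
```

The zero-order correlation is clearly positive, while the partial correlation falls to near zero once z is controlled.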
what is a random variable?
What we measure in psychological research: probabilistic (not deterministic) quantities, typically described by their mean and standard deviation
go through the simple regression pipeline
- 2 variables (DV, IV)
- Overall fit (R^2)
- Test of overall fit (F)
- Only if the F statistic is significant (p < .05): Coefficients (b₀, b₁)
what does the F statistics tell us?
The F-statistic is a measure of overall significance or goodness-of-fit of the regression model.
It assesses whether the regression model explains a significant amount of variability in the dependent variable compared to a model with no independent variables (i.e., a null model).
what does the R² statistic tell us?
The R-squared statistic measures the proportion of variance in the dependent variable that is explained by the independent variable(s) in the regression model.
what do coefficients b₀ and b₁ tell us?
- The coefficient b₀ (also known as the intercept) represents the predicted value of the dependent variable when the independent variable is zero.
- The coefficient b₁ (also known as the slope) represents the change in the dependent variable for a one-unit change in the independent variable.
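The whole simple-regression pipeline (R², the F-test’s p-value, b₀ and b₁) can be sketched with SciPy’s `linregress` on hypothetical sleep / reaction-time data:

```python
from scipy.stats import linregress

# Hypothetical: hours of sleep (IV) vs. reaction time in ms/10 (DV)
x = [4, 5, 6, 7, 8, 9]
y = [60, 55, 52, 48, 44, 41]

fit = linregress(x, y)
b0, b1 = fit.intercept, fit.slope   # b0: predicted y at x = 0; b1: change in y per unit x
r_squared = fit.rvalue ** 2         # proportion of variance explained
p_overall = fit.pvalue              # for simple regression, equivalent to the F-test

predicted_at_7 = b0 + b1 * 7        # using the fitted line for prediction
```

The negative b₁ reads as: each extra hour of sleep predicts a drop of about 3.8 in the outcome.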
what is multicollinearity?
Multicollinearity occurs when two or more IVs are highly correlated with each other; the more two IVs correlate, the less sense it makes to keep both in the model
go through the multiple regression pipeline
- At least 3 variables (1 DV, N ≥ 2 IVs)
1. Entry method
2. Overall fit (R²)
3. Test of overall fit (F)
- only if p(F) < α:
4. a. Coefficients (b₀, bX1, …, bXN)
   b. Zero-order & partial correlations
R² is essentially the combination of..
- each IV’s unique contribution to the DV (unique variance)
- the variance the IVs share with the DV (shared variance)
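A sketch of a multiple-regression fit and its R² using plain NumPy least squares, on simulated data where the two IVs deliberately overlap (all coefficients hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)            # x2 partly overlaps with x1 (shared variance)
y = 2.0 + 1.5 * x1 + 0.8 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])     # design matrix; first column = intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)     # b = [b0, b1, b2]

y_hat = X @ b
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot               # unique + shared variance explained
```

Because x1 and x2 overlap, R² is less than the sum of what each IV would explain on its own.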
what is ‘forced entry’?
when all the predictors are entered at once
what is hierarchical regression?
Hierarchical regression involves entering blocks of variables into the model in a predetermined order based on theoretical or conceptual considerations.
what are stepwise, forward and backward regressions?
- Stepwise regression is a combination of forward selection and backward elimination: it iteratively adds and removes variables from the model based on predetermined criteria (e.g., significance level, change in R²).
- Forward selection starts with an empty model and iteratively adds one independent variable at a time; at each step, the variable that contributes most to the model’s explanatory power (e.g., based on significance level, change in R²) is added.
- Backward elimination starts with a model that includes all independent variables and iteratively removes one variable at a time; at each step, the variable contributing least to the model’s explanatory power is removed.
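Forward selection can be sketched in a few lines of NumPy; this toy version uses gain in R² as its entry criterion (a real package would use p-values or AIC), with a hypothetical `min_gain` threshold:

```python
import numpy as np

def r2(X, y):
    """R² of an OLS fit of y on X (X must include an intercept column)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    dev = y - y.mean()
    return 1 - (resid @ resid) / (dev @ dev)

def forward_select(predictors, y, min_gain=0.02):
    """Greedily add the predictor that most improves R²;
    stop when no candidate improves it by at least min_gain."""
    n = len(y)
    chosen, current = [], 0.0
    remaining = list(predictors)
    while remaining:
        gains = []
        for name in remaining:
            cols = [predictors[k] for k in chosen + [name]]
            X = np.column_stack([np.ones(n)] + cols)
            gains.append((r2(X, y) - current, name))
        best_gain, best = max(gains)
        if best_gain < min_gain:
            break
        chosen.append(best)
        current += best_gain
        remaining.remove(best)
    return chosen, current

# Simulated data: x1 matters most, x2 a little, "noise" not at all
rng = np.random.default_rng(2)
n = 200
preds = {"x1": rng.normal(size=n), "x2": rng.normal(size=n), "noise": rng.normal(size=n)}
y = 1.0 * preds["x1"] + 0.5 * preds["x2"] + rng.normal(size=n)

selected, final_r2 = forward_select(preds, y)
```

The strongest predictor (x1) enters first, then x2; backward elimination would run the same loop in reverse, starting from the full model.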
what are the 7 assumptions to check for a multiple regression?
- Independence
- Variable Type
- Sample Size
- Linearity
- Outliers and Influential Cases
- Normality
- Multicollinearity (Tolerance, VIF)
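VIF for a predictor is 1 / (1 − R²) from regressing that predictor on all the others, and Tolerance is its reciprocal; a common rule of thumb flags VIF above 10. A sketch with NumPy on simulated data where one predictor nearly duplicates another:

```python
import numpy as np

def vif(X):
    """VIF for each column of X: regress the column on all other
    columns (plus an intercept), then return 1 / (1 - R²)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ b
        dev = X[:, j] - X[:, j].mean()
        r2 = 1 - (resid @ resid) / (dev @ dev)
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(3)
n = 200
a = rng.normal(size=n)
b_ = rng.normal(size=n)
c = a + 0.1 * rng.normal(size=n)   # c is nearly a copy of a -> multicollinearity

vifs = vif(np.column_stack([a, b_, c]))
tolerances = [1 / v for v in vifs]
```

Here the VIFs for the two near-duplicate columns are very large, while the independent column stays near 1.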
explain the independence assumption
All values of the outcome should come from different people:
- Each observation (row in the dataset) comes from a unique individual
- Each individual is in one group and one group only
- Each group is made up of different people
what is the ‘variable type’ assumption?
Dependent Variable or DV: outcome must be continuous
Independent Variable(s) or IV(s): predictors can be continuous or categorical