Research skills 4 Flashcards

1
Q

what is a correlational (observational) design?

A
  • Quantitative description of trends, attitudes, or opinions of a population
  • Testing the association between X and Y
2
Q

what is an experimental design?

A
  • Systematic manipulation of one or more variables (X) to evaluate an outcome (Y)
  • Holds other variables constant to isolate effects
  • Allows causality to be tested (causal inference)
3
Q

correlation coefficients all boil down to a ratio of…

A

How much two variables vary together : how much each variable varies on its own
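
For continuous data this ratio is captured by Pearson’s r; as a reference formula (not on the original card), it is the covariance of the two variables divided by the product of their standard deviations:

    r = cov(X, Y) / (s_X × s_Y)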

4
Q

which correlation coefficient is used for continuous (numerical) data?

A

Pearson’s r

5
Q

By squaring the value of r we get the….

A

proportion of variance in one variable shared by the other (the overlap)

For example, a coefficient of r = 0.6 indicates that 36% of the variance of X and Y is shared: 0.6 * 0.6 = 0.36

6
Q

which correlation coefficients are used for ‘ranked’ data?

A
  • Spearman’s rho: for ranked data with relatively few tied ranks
  • Kendall’s tau: preferred when there are many tied ranks, and better for small samples (see the sketch below)
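
As an illustrative sketch (not part of the original card), both coefficients can be computed with scipy.stats; the rank data below are made up:

    # Rank correlations with SciPy (illustrative sketch; toy data)
    from scipy import stats

    rater_a = [1, 2, 3, 4, 5, 6]   # hypothetical ranks from one rater
    rater_b = [2, 1, 4, 3, 6, 5]   # hypothetical ranks from a second rater

    rho, p_rho = stats.spearmanr(rater_a, rater_b)   # Spearman's rho
    tau, p_tau = stats.kendalltau(rater_a, rater_b)  # Kendall's tau (preferred with ties / small samples)

    print(f"Spearman's rho = {rho:.2f} (p = {p_rho:.3f})")
    print(f"Kendall's tau  = {tau:.2f} (p = {p_tau:.3f})")
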
7
Q

what is the phi correlation and when should it be used?

A

The phi coefficient is used when you have a 2x2 contingency table (two binary variables); it quantifies the strength of the association, i.e. the degree of dependency, between the two binary variables.
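
For a 2x2 table with cell counts a and b in the first row and c and d in the second (labels added here for illustration), the standard formula is:

    phi = (a×d − b×c) / √((a+b)(c+d)(a+c)(b+d))

Phi is numerically the same as Pearson’s r computed on the two variables coded 0/1.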

8
Q

what is the point-biserial correlation and when should it be used?

A

a statistical measure used to assess the strength and direction of the relationship between two variables when one of the variables is dichotomous (having two categories, often represented as 0 and 1) and the other is continuous. It is essentially a special case of the Pearson correlation coefficient (r) that is adapted for situations with one dichotomous variable.

9
Q

what is a partial correlation?

A

Measures the relationship between two variables, controlling for the effect that a third variable has on them both
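
In terms of the pairwise Pearson correlations, the standard formula (added for reference) is:

    r_xy.z = (r_xy − r_xz × r_yz) / √((1 − r_xz²)(1 − r_yz²))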

10
Q

what is semi-partial correlation?

A

Measures the relationship between two variables controlling for the effect that a third variable has on only one of the variables in the correlation.
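
The corresponding standard formula (added for reference), here with the third variable Z removed from X only, is:

    r_y(x.z) = (r_xy − r_yz × r_xz) / √(1 − r_xz²)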

11
Q

what is a random variable?

A

What we measure in psychological research; they are probabilistic quantities (not deterministic).

Summarised using the mean and standard deviation

12
Q

go through the simple regression pipeline

A
  1. 2 variables (1 DV, 1 IV)
  2. Overall fit (R²)
  3. Test of overall fit (F)
    - interpret the coefficients only if the F statistic is significant (p < .05)
  4. Coefficients (b₀, b₁) (see the sketch below)
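
A minimal sketch of this pipeline in Python with statsmodels (illustrative only; the variables and data are made up):

    # Simple regression pipeline sketch (illustrative; toy data)
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(42)
    stress = rng.normal(50, 10, 100)                     # IV (hypothetical)
    anxiety = 5 + 0.4 * stress + rng.normal(0, 5, 100)   # DV (hypothetical)

    X = sm.add_constant(stress)        # adds the intercept term b0
    model = sm.OLS(anxiety, X).fit()   # step 1: fit the 2-variable model

    print("R^2 =", round(model.rsquared, 3))                               # step 2: overall fit
    print("F =", round(model.fvalue, 2), "p =", round(model.f_pvalue, 4))  # step 3: test of overall fit
    if model.f_pvalue < .05:           # step 4: interpret coefficients only if F is significant
        print("b0, b1 =", model.params.round(3))
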
13
Q

what does the F statistic tell us?

A

The F-statistic is a measure of overall significance or goodness-of-fit of the regression model.
It assesses whether the regression model explains a significant amount of variability in the dependent variable compared to a model with no independent variables (i.e., a null model).
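
Equivalently, with k predictors and n cases, the standard formula (added for reference) is:

    F = MS_model / MS_residual = (R² / k) / ((1 − R²) / (n − k − 1))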

14
Q

what does the R² statistic tell us?

A

The R-squared statistic measures the proportion of variance in the dependent variable that is explained by the independent variable(s) in the regression model.
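
In sums-of-squares terms (standard formula, added for reference):

    R² = SS_model / SS_total = 1 − SS_residual / SS_total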

15
Q

what do coefficients b₀ and b₁ tell us?

A
  • The coefficient b₀ (also known as the intercept) represents the predicted value of the dependent variable when the independent variable is zero.
  • The coefficient b₁ (also known as the slope) represents the change in the dependent variable for a one-unit change in the independent variable.
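
Together they define the regression line used for prediction (standard form, added for reference):

    Ŷ = b₀ + b₁X

For example, with b₀ = 2 and b₁ = 0.5, a case scoring X = 10 is predicted to score Ŷ = 2 + 0.5 × 10 = 7.
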
16
Q

what is multicollinearity?

A

Multicollinearity is when two or more IVs are highly correlated with each other; the more strongly two IVs are correlated, the less sense it makes to keep both in the model.

17
Q

go through the multiple regression pipeline

A
  1. N (at least 3) variables: 1 DV and 2 or more IVs
  2. Entry method
  3. Overall fit (R²)
  4. Test of overall fit (F)
    - continue only if p(F) < α (alpha)
  5. a. Coefficients (b₀, bX1, …, bXN)
     b. Zero-order & partial correlations

18
Q

R² is essentially the combination of..

A
  • each IV’s unique contribution to the DV (unique variance)
    &
  • shared variance
19
Q

what is ‘forced entry’?

A

when all the predictors are entered at once

20
Q

what is hierarchical regression?

A

Hierarchical regression involves entering blocks of variables into the model in a predetermined order based on theoretical or conceptual considerations.

21
Q

what are stepwise, forward and backward regressions?

A
  • Stepwise regression is a combination of forward selection and backward elimination.
    It iteratively adds and removes variables from the model based on predetermined criteria (e.g., significance level, change in R-squared).
  • Forward selection starts with an empty model and iteratively adds one independent variable at a time.
    At each step, the variable that contributes the most to the model’s explanatory power (e.g., based on significance level, change in R-squared) is added to the model (see the sketch below).
  • Backward elimination starts with a model that includes all independent variables, and iteratively removes one variable at a time.
    At each step, the variable with the least contribution to the model’s explanatory power (e.g., based on significance level, change in R-squared) is removed from the model.
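
As an illustrative sketch only (this is not a procedure taken from the cards), a forward-selection loop based on p-values could look like this in Python with statsmodels; the data, column names and alpha threshold are all made up:

    # Forward selection sketch (illustrative only; toy data)
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    data = pd.DataFrame(rng.normal(size=(200, 4)), columns=["x1", "x2", "x3", "x4"])
    data["y"] = 2 * data["x1"] - 1.5 * data["x3"] + rng.normal(size=200)   # hypothetical DV

    def forward_selection(df, dv, alpha=0.05):
        remaining = [c for c in df.columns if c != dv]
        selected = []
        while remaining:
            # p-value of each candidate predictor when added to the current model
            pvals = {}
            for cand in remaining:
                X = sm.add_constant(df[selected + [cand]])
                pvals[cand] = sm.OLS(df[dv], X).fit().pvalues[cand]
            best = min(pvals, key=pvals.get)
            if pvals[best] >= alpha:     # stop when the best remaining predictor is not significant
                break
            selected.append(best)
            remaining.remove(best)
        return selected

    print(forward_selection(data, "y"))   # expected to pick x1 and x3 in this toy example

Backward elimination would instead start from the full model and drop the weakest predictor at each step; stepwise regression combines both moves.
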
22
Q

what are the 7 assumptions to check for a multiple regression?

A
  1. Independence
  2. Variable Type
  3. Sample Size
  4. Linearity
  5. Outliers and Influential Cases
  6. Normality
  7. Multicollinearity (Tolerance, VIF)
23
Q

explain the independence assumption

A

All values of the outcome should come from a different person:

  • Each observation (row in the dataset) comes from a unique individual
  • Each individual is in one group and one group only
  • Each group is made up of different people
24
Q

what is the ‘variable type’ assumption?

A

Dependent Variable or DV: outcome must be continuous
Independent Variable(s) or IV(s): predictors can be continuous or categorical

25
Q

what is the ‘sample size’ assumption? and what are the 2 approaches?

A

The sample size assumption for multiple regression analysis is typically related to the number of observations (cases or data points) relative to the number of predictor variables included in the model.

  1. Liberal (Stevens, 1996): 15 participants per predictor
    You need at least 10 for every variable you enter, and some would argue for as many as 50.
  2. Conservative (Green, 1991):
    a) N = 50 + 8m (where m is the number of IVs) for testing the multiple correlation
    b) N = 104 + m for testing individual predictors (partial correlation)
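
For example, with m = 5 predictors these rules give N = 50 + 8 × 5 = 90 for testing the overall model and N = 104 + 5 = 109 for testing individual predictors (worked example added for illustration).
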
26
Q

what is the ‘linearity’ assumption?

A

The linearity assumption in multiple regression refers to the assumption that there is a linear relationship between the independent variables (predictors) and the dependent variable (outcome). This assumption means that changes in the dependent variable are assumed to be proportional to changes in the independent variables, with a constant rate of change across all levels of the independent variables.

27
Q

what are the 2 kinds of outliers in the ‘outliers and influential cases’ assumption?

A

There are two kinds of outliers:

Univariate: only present in one variable
E.g. one participant has a very different score from the rest of the sample

Multivariate: they result from the combination of two or more variables together
E.g. a participant’s scores are in the same range as the rest of the sample on each variable (i.e. not univariate outliers), but the overall pattern of their scores is off from the group (e.g. one participant has the exact same score on every variable)

28
Q

what is the ‘normality’ assumption?

A

The normality assumption in multiple regression pertains to the distribution of the residuals (the differences between observed and predicted values) and not directly to the distributions of the independent or dependent variables themselves. The assumption states that the residuals should be normally distributed.

29
Q

what is the ‘Multicollinearity (Tolerance, VIF)’ assumption?

A

Multicollinearity exists if predictors are highly correlated. In SPSS this can be checked via the Statistics button when setting up a linear regression: select Collinearity diagnostics, which reports Tolerance and VIF for each predictor.
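
Outside SPSS, a minimal sketch of the same check in Python with statsmodels (illustrative; the variables are made up). Tolerance is simply 1/VIF; as a common rule of thumb, VIF above about 10 (Tolerance below about 0.1) signals a problem.

    # Collinearity diagnostics sketch (illustrative; toy data)
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.stats.outliers_influence import variance_inflation_factor

    rng = np.random.default_rng(0)
    X = pd.DataFrame({"x1": rng.normal(size=100)})
    X["x2"] = 0.9 * X["x1"] + rng.normal(scale=0.3, size=100)   # deliberately correlated with x1
    X["x3"] = rng.normal(size=100)

    Xc = sm.add_constant(X)                          # VIF is computed on the design matrix
    for i, name in enumerate(X.columns, start=1):    # position 0 is the constant, so skip it
        vif = variance_inflation_factor(Xc.values, i)
        print(f"{name}: VIF = {vif:.2f}, Tolerance = {1 / vif:.2f}")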

30
Q

what are the 3 tests for multivariate outliers?

A

a. Residual statistics: standardized residuals (ZRE_1)
- a case whose ZRE_1 is greater than +3 or smaller than -3 is likely to be an outlier and a cause for concern

b. Mahalanobis distance (MAH_1)

c. Influential cases: Cook’s distance (COO_1)
- Cook’s values greater than 1 (or close to it) are a cause for concern
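
A rough equivalent of these diagnostics outside SPSS, sketched in Python with statsmodels (illustrative; it assumes a fitted OLS result called model and a predictor DataFrame X, both hypothetical names):

    # Outlier and influence diagnostics sketch (illustrative; assumes `model` and `X` exist)
    import numpy as np

    influence = model.get_influence()             # model: a fitted statsmodels OLS result
    zre = influence.resid_studentized_internal    # standardized residuals (ZRE_1)
    cooks = influence.cooks_distance[0]           # Cook's distance (COO_1)

    # Squared Mahalanobis distance of each case from the centroid of the predictors (cf. MAH_1)
    diffs = X.values - X.values.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X.values, rowvar=False))
    mah = np.einsum("ij,jk,ik->i", diffs, inv_cov, diffs)

    print("Cases with |ZRE_1| > 3:", np.where(np.abs(zre) > 3)[0])
    print("Cases with Cook's distance > 1:", np.where(cooks > 1)[0])
    print("Largest squared Mahalanobis distances:", np.round(np.sort(mah)[-3:], 2))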

31
Q

define z-score

A

A z-score, also known as a standard score, is a statistical measurement that describes a data point’s position relative to the mean of a group of data points. It is expressed in terms of standard deviations from the mean.
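
As a formula (added for reference): z = (x − μ) / σ, i.e. the raw score minus the mean, divided by the standard deviation.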

32
Q

Define Inferential statistics

A

Inferential statistics is a branch of statistics that focuses on making predictions or inferences about a population based on a sample of data drawn from that population

33
Q

define ‘central limit theorem’

A

The Central Limit Theorem (CLT) is a fundamental principle in statistics that states that the distribution of the sample mean (or sum) of a sufficiently large number of independent, identically distributed (i.i.d.) random variables approaches a normal (Gaussian) distribution, regardless of the original distribution of the variables

34
Q

define ‘standard error of the mean’

A

The standard error of the mean (SEM) is a statistical measure that quantifies the amount of variability or dispersion in the sample mean estimates of a population mean
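
It is estimated as SEM = s / √n (standard formula, added for reference), where s is the sample standard deviation and n the sample size, so it shrinks as the sample gets larger.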

35
Q

define t-score

A

A t-score is a type of standardized score used in statistics to compare the difference between an observed sample mean and the population mean when the population standard deviation is unknown and the sample size is relatively small
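
As a formula (added for reference): t = (x̄ − μ) / (s / √n), the same form as a z-score but using the sample standard deviation s in place of the population σ.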

36
Q

define ‘confidence interval’

A

A confidence interval (CI) is a range of values, derived from sample data, that is likely to contain the true population parameter (such as the mean or proportion) with a specified level of confidence
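
For a mean it typically takes the form x̄ ± t_critical × SEM (standard form, added for reference), e.g. roughly x̄ ± 1.96 × SEM for a 95% interval with a large sample.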

37
Q

open questions:

ADVANTAGES vs DISADVANTAGES

A

The respondent writes their answer in any form they feel is useful (e.g., ‘What reasons are there for recycling?’)

Advantages
  • Gets all the information
  • Does not lead the respondent
  • Is more naturalistic

Disadvantages
  • Can be difficult to complete (requires listing)
  • Difficult to code and analyse
  • Poor when a numeric result is required
38
Q

closed questions:

ADVANTAGES vs DISADVANTAGES

A

Require the researcher to have an idea of the likely response options (e.g., ‘Which of the following do you recycle? Glass / paper / clothing / none of the above’)

Advantages
  • Easy to code and analyse
  • Good when a numerical result is required
  • Quick for respondents to answer

Disadvantages
  • Can encourage bias
  • Can miss possible answers
  • Can create opinions where none exist
39
Q

define Split-half reliability

A

Split-half reliability measures the consistency of a test by dividing it into two equal halves and correlating the scores on each half. It assesses whether both halves produce similar results.

40
Q

define Internal reliability

A

Internal reliability, also known as internal consistency, refers to the extent to which all items or components of a test, survey, or measurement instrument measure the same underlying construct consistently

41
Q

define Cronbach’s Alpha

A

Cronbach’s Alpha is a measure of internal consistency, indicating how well a set of items in a test or survey measure a single unidimensional latent construct.
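
Its usual formula (added for reference), for k items with item variances s_i² and total-score variance s_total², is:

    α = (k / (k − 1)) × (1 − Σ s_i² / s_total²)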

42
Q

what is Factor analysis?

A

Factor analysis is a statistical method used to identify underlying relationships between variables by grouping them into factors. It reduces data dimensionality by detecting patterns of correlations.

43
Q

what are the two types of factor analysis?

A

Exploratory Factor Analysis (EFA): Used when the underlying structure is not known. It explores the potential factors without predetermined ideas.
Confirmatory Factor Analysis (CFA): Used to test hypotheses or theories about the structure of factors, confirming whether the data fits a pre-specified factor model.

44
Q

what are the 2 types of rotation?

A
  1. orthogonal – this type of rotation assumes that each factor is unique and has no shared associations. This tends to be used when testing a theoretical model that specifies independent factors (e.g. varimax).
  2. oblique – this is more often used and determines the relationship of factors to one another rather than assuming independence (e.g. oblimin).
45
Q

what is ‘test-retest reliability’?

A

Correlation between the scores at two different times of testing. There are a number of factors that can affect this: the internal reliability of the test, external factors in the test sample (mood, fatigue etc.), carry over effects from the first testing period

46
Q

define validity

A

The extent to which a test measures what it is intended to measure

47
Q

define face validity

A

Face validity is the extent to which a test, measurement, or instrument appears to measure what it is intended to measure, based on subjective judgment. It refers to the degree to which the items on a test look like they are assessing the intended construct, at face value.