Lecture 1 Flashcards

1
Q

Why Explore Data?

A
  • Greater effectiveness and efficiency in conducting and interpreting analysis
  • Reduces ambiguity in interpretation
  • Identify spurious relationships
2
Q

What’s the difference between raw data and data input?

A

Raw Data:
- Measures should be valid and relevant
- Appropriate response scale and adequate range of responses
- Hard to ‘fix’ retrospectively

Data Input:
- Tedious task when done manually
- Complete variable and value label tabs in the SPSS data input screen
- Data input errors can have a catastrophic effect on any analysis

3
Q

What is MCAR, and what are some examples and criteria of it?

A

MCAR = Missing completely at random

Examples:
- Missed question on a questionnaire
- Equipment failure when collecting a data point.

MCAR data points don’t favor any variable:
- Missingness occurs infrequently
- Missingness has no relationship with the IVs or DV in the study
- Missingness is completely unsystematic and random

4
Q

How is the MCAR assessed?

A

Using Little’s MCAR chi-square test

The null hypothesis is that the missing data are “missing completely at random”. A non-significant result (p > 0.05) means this hypothesis is retained. THIS IS GOOD!

It’s bad if p < 0.05, which suggests the missing data are not “missing completely at random” and require further investigation (MAR or MNAR)

5
Q

What is MAR, and what are some examples and criteria of it?

A

MAR = Missing at Random

MAR data points:
- Are not missing at random
- Are specific to sub-groups
- Occur in individuals with specific characteristics (IV)
- DO NOT bias measurement of the DV, so are ‘ignorable’ (Tabachnick & Fidell, 2018)
- Might influence external validity via population inference

6
Q

What is an example of MAR?

A

In a study of attitudes to bullying in children, children with poor reading skills may not complete the study due to literacy problems

7
Q

In real life, can MAR assumptions be verified?

A

MAR assumptions cannot be verified because the information about the missing values is not available

8
Q

What will happen if you exclude MAR data?

A

It will lead to biased estimates

9
Q

What is MNAR, and what are some examples and criteria of it?

A

MNAR = Missing not at random. It is the most problematic “missingness”.

MNAR data:
- Directly influences scores on the DV
- Produces bias in measurement and estimates
- An association exists between participant characteristics and DV; ‘non-ignorable’ (Tabachnick & Fidell, 2018)
- Points cannot be easily estimated/imputed from sample data

10
Q

What are some examples of MNAR?

A

Examples:
- Reading ability test where poor readers fail to respond to certain test items because they do not understand the text.
- Matched groups used to investigate effectiveness of intervention (face to face vs internet counselling). High dropout in one group would influence post-test scores on DV.

11
Q

How can you tell whether missing data are MAR or MNAR?

A

A t-test in SPSS determines whether participants with missing data on one variable (e.g. study load) differ significantly from participants without missing data on that variable in their group mean on another variable (e.g. age)

If there are significant differences on pairs of variables (scale items), but these don’t affect the key DV, the data can be assumed to be MAR (but the results lack external validity)

If the difference is found to affect the DV too, then data are MNAR (invalid results)
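
As a rough sketch of the comparison described above, the snippet below splits a small made-up sample by whether study load is missing and computes a separate-variance (Welch) t statistic on age. The data, the variable names and the helper `welch_t` are illustrative assumptions, not part of the lecture. SPSS would also report a p value, which is left out here.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical records: (age, study_load); None marks a missing study load.
records = [
    (19, 4), (20, 3), (21, 4), (22, 3), (20, 4),
    (34, None), (38, None), (41, None), (36, None),
]

def welch_t(a, b):
    """Welch's t statistic for two independent groups (unequal variances)."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

# Split ages by whether the OTHER variable (study load) is missing.
age_missing = [age for age, load in records if load is None]
age_present = [age for age, load in records if load is not None]

t = welch_t(age_missing, age_present)
print(f"mean age, study load missing: {mean(age_missing):.2f}")
print(f"mean age, study load present: {mean(age_present):.2f}")
print(f"Welch t = {t:.2f}")
```

A large t here would suggest that missingness on study load is related to age, i.e. the data are not MCAR. Whether they are MAR or MNAR then depends on whether the key DV is affected too.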

12
Q

What is Listwise Deletion?
When would it be used?
What are some things to consider?

A

Listwise Deletion is a procedure that eliminates all cases with one or more missing values on the analysis variables.

It is used if the data are MCAR and missingness is <5%.

Some things to consider:
- It reduces power, but interpretation is not affected if the sample size is large

Questions to ask:
- What is the minimum sample size needed to run the planned analysis?
- Will deletion of cases with missing values result in an underpowered study?
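
A minimal sketch of listwise deletion, assuming a tiny made-up data set in which `None` marks a missing value. The case records and the helper `listwise_delete` are hypothetical and only illustrate the idea.

```python
# Hypothetical cases; None marks a missing value.
cases = [
    {"id": 1, "age": 21, "stress": 14, "wellbeing": 30},
    {"id": 2, "age": 25, "stress": None, "wellbeing": 27},
    {"id": 3, "age": 19, "stress": 18, "wellbeing": None},
    {"id": 4, "age": 30, "stress": 11, "wellbeing": 33},
    {"id": 5, "age": 22, "stress": 16, "wellbeing": 29},
]
analysis_vars = ["age", "stress", "wellbeing"]

def listwise_delete(rows, variables):
    """Keep only the cases with no missing value on any analysis variable."""
    return [row for row in rows if all(row[v] is not None for v in variables)]

complete = listwise_delete(cases, analysis_vars)
print("retained ids:", [row["id"] for row in complete])
print(f"retained {len(complete)} of {len(cases)} cases")
```

Two of the five cases are dropped even though each is missing only one value, which is why the remaining sample size needs checking against the minimum required for the planned analysis.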

13
Q

What is Pairwise Deletion?
When would it be used?
What are some things to consider?

A

Pairwise Deletion is a statistical procedure that uses all available data in the data set (the case is kept, and each analysis is run on only the data that remain for it)

It increases power, but influences the error term because the sample size varies from one analysis to the next (less recommended)

Some things to consider:
- If there is a large amount of “missingness” on several variables you plan to use, but not on the variables of most interest, consider EXCLUDING these problematic variables from the analysis.
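
To illustrate why the sample size shifts under pairwise deletion, the sketch below correlates each pair of variables using every case that has both values. The data and the helper names (`pearson`, `pairwise_n_and_r`) are made up for illustration.

```python
from itertools import combinations
from math import sqrt

# Hypothetical data set; None marks a missing value.
data = {
    "age":       [21, 25, 19, 30, 22, 27],
    "stress":    [14, None, 18, 11, 16, 12],
    "wellbeing": [30, 27, None, 33, 29, None],
}

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def pairwise_n_and_r(table, v1, v2):
    """Use every case that has a value on BOTH variables of this pair."""
    pairs = [(a, b) for a, b in zip(table[v1], table[v2])
             if a is not None and b is not None]
    x, y = zip(*pairs)
    return len(pairs), pearson(x, y)

results = {}
for v1, v2 in combinations(data, 2):
    results[(v1, v2)] = pairwise_n_and_r(data, v1, v2)
    n, r = results[(v1, v2)]
    print(f"{v1} x {v2}: n = {n}, r = {r:.2f}")
```

Each correlation rests on a different number of cases, so the statistics in one analysis are not all based on the same sample.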

14
Q

What is Data Imputation?
When would it be used?
What are some things to consider?

A

Data imputation replaces missing data points with estimated values, using one of several techniques.

This applies to the actual individual values (item) for a variable NOT THE VARIABLE TOTAL.

Consider replacing missing values if:
- Larger amounts of missing values (up to 20-25% - pie chart) when data are MCAR
- Insufficient sample size leading to underpowered results
- Case deletion would introduce bias to estimates (Brick & Kalton, 1996)

15
Q

Which is better: modern missing data imputation techniques or deletion?
Why?

A

Modern missing data imputation techniques are preferable to deletion because they can reduce the impact of MNAR according to Graham (2009)

16
Q

What is group mean imputation?
Is it recommended?

A

Group mean imputation replaces each missing data point with the group mean.

This is not recommended because it:
- Produces bias
- Distorts correlations
- Reduces group variance around the mean
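
The variance point can be checked with a small sketch. The scores below are made up, and `None` marks a missing value.

```python
from statistics import mean, variance

# Hypothetical scores for one group; None marks a missing value.
scores = [12, 15, None, 9, 18, None, 14]

observed = [s for s in scores if s is not None]
group_mean = mean(observed)

# Replace every missing score with the group mean.
imputed = [group_mean if s is None else s for s in scores]

print(f"group mean:                {group_mean:.2f}")
print(f"variance, observed only:   {variance(observed):.2f}")
print(f"variance, after imputing:  {variance(imputed):.2f}")
```

The imputed values sit exactly on the mean, so they add cases without adding any spread, and the variance shrinks.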

17
Q

What is Individual mean imputation?
When should you use it?
What is an example of it?

A

Individual mean imputation uses the mean of the items a participant did complete to replace that participant’s missing data point for an item.

It’s a solution for an occasional missing point (multi-item scales)

An example is:
On a 12-item scale, the data point for 1 item is missing for a participant. Calculate the mean of the remaining 11 items and use that value to replace the missing point.
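
The 12-item example can be sketched as follows, assuming made-up responses on a 1-5 scale with item 7 missing.

```python
from statistics import mean

# Hypothetical responses of ONE participant to a 12-item scale (1-5).
# None marks the single missing item.
responses = [4, 3, 5, 4, 4, 3, None, 4, 5, 3, 4, 5]

answered = [r for r in responses if r is not None]
person_mean = mean(answered)  # mean of the 11 answered items

# Fill the gap with this participant's own mean.
completed = [person_mean if r is None else r for r in responses]

print(f"answered items: {len(answered)}")
print(f"participant mean: {person_mean}")
print(f"completed responses: {completed}")
```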

18
Q

What is multiple imputation?
What are some aspects to using this technique?

A

Multiple imputation is a modern missing data technique that creates multiple copies of the data set, each with the missing values filled in by different plausible estimates, then runs the analysis on each data set, and finally pools the results into one set of results.

Some aspects to consider are that it:
- Is robust to violation of normality, even with a small sample size
- Produces unbiased parameter estimates, even with high amounts of missing data
- Cannot be used for all analyses in SPSS
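
A sketch of the final pooling step only. It assumes the analysis has already been run on five imputed data sets, and it pools with Rubin's rules, which is a common choice but an assumption here, since the card does not name a pooling method. The estimates and standard errors are invented.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical results of the SAME analysis on m = 5 imputed data sets.
estimates = [2.10, 2.35, 2.22, 2.05, 2.28]    # e.g. a regression coefficient
std_errors = [0.40, 0.42, 0.39, 0.41, 0.40]   # its standard error in each set

m = len(estimates)
pooled_estimate = mean(estimates)                  # average of the estimates
within_var = mean([se ** 2 for se in std_errors])  # average sampling variance
between_var = variance(estimates)                  # spread across imputations
total_var = within_var + (1 + 1 / m) * between_var
pooled_se = sqrt(total_var)

print(f"pooled estimate = {pooled_estimate:.3f}")
print(f"pooled SE       = {pooled_se:.3f}")
```

The pooled standard error is larger than the average within-imputation standard error because it also carries the uncertainty about the missing values themselves.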

19
Q

What are three things that you could do when designing your study to avoid missing data? (by having good methodology)

A

Obtain sufficient sample size plus some extras:
- Run an a priori G*Power analysis to estimate the required sample size for your analytical approach
- Add at least 5-10% to account for missingness

Consider your method to avoid problems:
- Items that are poorly worded, ambiguous, complex, or sensitive
- Poor layout of questionnaire - consider mobile vs computer views
- Poor response scales

Consider the study length:
- Fatigue
- Boredom
- Clinical sample vs control group - MNAR

You can also pilot test the study on a small representative sample to avoid unforeseen problems.
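
The sample-size step above is simple arithmetic. The sketch below assumes a hypothetical required sample of 128 from a power analysis, and the helper `inflate_sample_size` is illustrative only.

```python
def inflate_sample_size(required_n, extra_percent):
    """Required N plus extra_percent %, rounded up (integer arithmetic)."""
    # Negating before and after floor division gives ceiling division.
    return -(-required_n * (100 + extra_percent) // 100)

required_n = 128  # hypothetical result of an a priori power analysis

for extra in (5, 10):
    target = inflate_sample_size(required_n, extra)
    print(f"+{extra}% -> recruit at least {target} participants")
```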

20
Q

What are latent variables?
List some examples.

A

Latent variables are psychological constructs that cannot be directly observed or assessed with a single item.

Some examples are:
- Wellbeing
- Self-esteem
- Attitudes
- Stress
- Intelligence
- Personality

21
Q

What can latent variables be assessed with?

A

Scales, which are multi-item psychological measures

22
Q

Can you use specific items in a multi-item scale to test for a unidimensional construct?

Why?

A

No. Multi-item scales cannot be used in hypothesis testing in their raw form, as they need to be ‘prepared’ for data analysis.

The complete multi-item scale generates a composite score that reflects the construct as a whole.

23
Q

What are unidimensional scales?

A

Unidimensional scales have items that assess only a single psychological construct (first-order)

They are typically used for less complex constructs, or when the aim is a brief measure of a psychological construct.

24
Q

What is a multidimensional scale?

A

A multidimensional scale is one in which the items represent distinct domains of an overarching psychological construct (second-order)

This is typically used for more complex constructs with distinct dimensions, each assessed with a series of items.

25
Q

Composite scores are used in any subsequent analysis.
What are two commonly used composite scores in psychology?

A
  • Total/summated scores
  • Mean scores
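
Both composite scores can be sketched in a few lines, assuming a hypothetical 4-item scale and three made-up participants.

```python
from statistics import mean

# Hypothetical item responses on a 4-item scale, keyed by participant id.
item_scores = {
    "p01": [4, 3, 5, 4],
    "p02": [2, 2, 3, 1],
    "p03": [5, 4, 4, 5],
}

# One total (summated) score and one mean score per participant.
composites = {
    pid: {"total": sum(items), "mean": mean(items)}
    for pid, items in item_scores.items()
}

for pid, score in composites.items():
    print(f"{pid}: total = {score['total']}, mean = {score['mean']}")
```

A mean score stays on the original response scale, which can make it easier to interpret than a total.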
26
Q

What is the Central Limit Theorem?
How is that used statistically in relation to the collection of data?

A

The central limit theorem states that the distribution of sample means will approximate a normal distribution as the sample size gets larger.

As a rule of thumb, a sample size of 30 or more is considered sufficient for the CLT to hold.

It’s important to identify patterns and scores that can have undue effects on the results and, subsequently, data interpretation.
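
The central limit theorem described above can be seen in a small simulation. This sketch assumes a strongly skewed (exponential) population with mean 1 and uses skewness as a rough indicator of how far the sample means are from a symmetric, normal-looking shape. The sample sizes, number of repetitions and helper names are arbitrary choices for illustration.

```python
import random
from statistics import mean, pstdev

random.seed(1)  # fixed seed so the sketch is repeatable

def sample_means(n, reps=2000):
    """Means of `reps` samples of size n from a skewed population (mean 1)."""
    return [mean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

def skewness(xs):
    """Moment-based skewness; close to 0 for a symmetric distribution."""
    m, s = mean(xs), pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

results = {n: sample_means(n) for n in (2, 30, 100)}
for n, means in results.items():
    print(f"n = {n:>3}: mean of sample means = {mean(means):.3f}, "
          f"skewness = {skewness(means):.2f}")
```

The skewness of the sample means should shrink toward 0 as n grows, even though the population itself stays skewed.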

27
Q

What can extreme outliers impact?

A

Extreme outliers inflate the error and confidence intervals around a parameter estimate (mean value) and can impact null hypothesis significance testing.

28
Q

What should you do with extreme scores?

A

Extreme scores need to be investigated and dealt with (don’t simply delete the outliers)

  1. Use graphical and statistical procedures to identify outliers
  2. Check each outlier to understand the pattern of its responses (there may or may not be a reason)
  3. Nevertheless, delete outliers if they influence the results and subsequent interpretations.
29
Q

How do you statistically identify outliers?

A

In SPSS, tick “Save standardized values as variables” in the Descriptives dialog, then check the resulting z-scores against the criteria.

95% of valid cases should fall within z = ±1.96.
<5% should exceed z = ±1.96 (α = .05)
<1% should exceed z = ±2.58 (α = .01)
0% should exceed z = ±3.29 (α = .001)
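
The same screening can be sketched outside SPSS. The scores below are invented, and the z-scores use the sample standard deviation (N-1).

```python
from statistics import mean, stdev

# Hypothetical scores with one extreme value.
scores = [10, 11, 12, 13, 14, 10, 11, 12, 13, 14,
          11, 12, 13, 12, 11, 13, 12, 12, 11, 60]

m, sd = mean(scores), stdev(scores)
z_scores = [(x - m) / sd for x in scores]

# Share of cases beyond each cut-off, to compare with the criteria.
for cutoff in (1.96, 2.58, 3.29):
    share = sum(abs(z) > cutoff for z in z_scores) / len(scores)
    print(f"|z| > {cutoff}: {share:.0%} of cases")

# Scores beyond the strictest cut-off are candidate extreme outliers.
extreme = [x for x, z in zip(scores, z_scores) if abs(z) > 3.29]
print("scores with |z| > 3.29:", extreme)
```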

30
Q

What can you do if you find an outlier?

A

One common method is to change the outlier (score) to one “unit” more than the next highest non-outlier score in the group (e.g. Tabachnick & Fidell, 2013)

Instead of 1,2,3,4,5,6,7,49
Change it to
1,2,3,4,5,6,7,8
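
The recoding in this example can be written as a short sketch. The helper `replace_high_outlier` is hypothetical and handles only a single high outlier, as in the example above.

```python
def replace_high_outlier(scores, outlier, unit=1):
    """Recode `outlier` to one unit above the highest remaining score."""
    others = [s for s in scores if s != outlier]
    new_value = max(others) + unit
    return [new_value if s == outlier else s for s in scores]

scores = [1, 2, 3, 4, 5, 6, 7, 49]
adjusted = replace_high_outlier(scores, outlier=49)
print(adjusted)
```

The recoded case is still the highest score in the group, but it no longer pulls the mean and variance as strongly.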

31
Q

What is sample variance?
What is the formula associated with it?

A

Sample variance measures variability by calculating the deviation of each score from the mean, squaring the deviations, and averaging them.
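
The card does not state the formula, so here is the standard one, with N scores x_i and sample mean x̄ (notation assumed for illustration):

```latex
s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1},
\qquad
\bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i
```

For example, for the scores 2, 4, 6 the mean is 4, the deviations are -2, 0, 2, the squared deviations sum to 8, and the sample variance is 8 / (3 - 1) = 4.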

32
Q

Why do we square each deviation in Sample variance?

A

We square each deviation because the deviations around the mean always sum to zero, so the average deviation would always be zero.
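
A one-line derivation shows why, using the same assumed notation (N scores x_i with mean x̄):

```latex
\sum_{i=1}^{N} (x_i - \bar{x})
= \sum_{i=1}^{N} x_i - N\bar{x}
= N\bar{x} - N\bar{x}
= 0
```

Positive and negative deviations cancel exactly. Squaring makes every term non-negative, so the spread no longer cancels out.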

33
Q

In getting the average of the squared deviations (the variance), why do we divide the sum of squared deviations by (N-1) rather than by N?

A

Mathematically, using (N-1) makes our sample variance a better (unbiased) estimator of the population variance (which is what we’re interested in)
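
A small simulation can make this concrete. The sketch assumes a normal population with variance 4 and repeatedly draws samples of 5. The sample size, repetitions and seed are arbitrary choices for illustration.

```python
import random
from statistics import mean, pvariance, variance

random.seed(7)  # fixed seed so the sketch is repeatable

POP_SD = 2.0                 # population variance is POP_SD ** 2 = 4
n, reps = 5, 5000

divide_by_n, divide_by_n_minus_1 = [], []
for _ in range(reps):
    sample = [random.gauss(0, POP_SD) for _ in range(n)]
    divide_by_n.append(pvariance(sample))          # sum of squares / N
    divide_by_n_minus_1.append(variance(sample))   # sum of squares / (N - 1)

print(f"population variance:    {POP_SD ** 2:.2f}")
print(f"average using N:        {mean(divide_by_n):.2f}")
print(f"average using (N - 1):  {mean(divide_by_n_minus_1):.2f}")
```

Dividing by N underestimates the population variance on average, while dividing by (N-1) lands close to it.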
