Quiz 3 Flashcards

Question

With less variability, the effect size becomes _____, because the SD is in the denominator, even when the mean difference stays the same
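
A minimal Python sketch of this idea. It assumes Cohen's d is computed as the mean difference divided by a pooled SD, which is one common choice of standardizer. The group scores and the `cohens_d` helper are hypothetical.

```python
import statistics


def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    var1 = statistics.variance(group1)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(group2)
    pooled_sd = (((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)) ** 0.5
    return (statistics.mean(group1) - statistics.mean(group2)) / pooled_sd


# Both pairs of groups differ by exactly 5 points on average...
wide_a, wide_b = [40, 50, 60, 70], [35, 45, 55, 65]
narrow_a, narrow_b = [53, 54, 56, 57], [48, 49, 51, 52]

# ...but the less variable pair yields a much larger d.
print(round(cohens_d(wide_a, wide_b), 2))  # 0.39
print(round(cohens_d(narrow_a, narrow_b), 2))  # 2.74
```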

Answer 1

- Effect sizes based on variance explained: these estimate the amount of variance in an outcome variable that is explained or “accounted for” by the model/predictor variables
- Examples include r, the correlation coefficient, and r^2, the coefficient of determination
- r ranges from -1 to +1 (negative to positive correlation); r^2 ranges from 0 to 1
- r tells us both the size/strength and the direction of the association between the two variables

Answer 2

r = .1, d = .2 (small effect): the effect explains 1% of the total variance
r = .3, d = .5 (medium effect): the effect accounts for 9% of the total variance
r = .5, d = .8 (large effect): the effect accounts for 25% of the total variance
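
The percentages follow from squaring r. Here is a small Python sketch of that arithmetic. The `variance_explained` helper is a hypothetical name used only for illustration.

```python
def variance_explained(r):
    """Proportion of variance accounted for: the square of the correlation."""
    return r ** 2


# Cohen's conventional benchmarks for r.
for label, r in [("small", 0.1), ("medium", 0.3), ("large", 0.5)]:
    print(f"{label}: r = {r} -> r^2 = {variance_explained(r):.0%} of the variance")
```

Squaring also discards the sign, so r = -.3 explains the same 9% as r = .3.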

Answer 3

- Estimates of the anticipated ES can be used to project the sample size needed to detect statistically significant results → power analysis
- They inform researchers' judgments about the practical significance of a study
- Because effect sizes are standardized measures of the size of mean differences or the strength of relations, they can be used to compare the results of different studies with one another, including in meta-analysis
- A qualitative approach would be a systematic review

Answer 4

Estimates of anticipated ES can be used to project the sample size that would be adequate for detecting statistically significant results
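
One way to sketch this projection in Python is shown below. It assumes a two-sided comparison of two group means and uses the normal-approximation formula n = 2 × ((z(1 - alpha/2) + z(power)) / d)^2 per group. Exact t-test-based power software gives slightly larger numbers. `n_per_group` is a hypothetical helper.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample test of means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = .05
    z_power = NormalDist().inv_cdf(power)  # about 0.84 for power = .80
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} per group")
```

The smaller the anticipated effect, the larger the sample needed to detect it.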

Answer 5

effect size

Answer 6

effect sizes, meta-analyses

Answer 7

whether each individual subject has made a clinically significant improvement

Answer 8

- Clinically significant change is conceptualized as a return to normal functioning
- Clinical significance → the extent to which therapy moves someone outside the range of the dysfunctional population or within the range of the functional population

Answer 9

effect size

Answer 10

a. 2 SDs above the dysfunctional mean
b. 2 SDs below the functional mean
c. Midpoint between the means of the functional and dysfunctional groups
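
A sketch of the three cutoffs in Python follows. It assumes higher scores mean better functioning, as the wording of a and b implies, and that the means and SDs come from dysfunctional and functional reference samples. For c it uses Jacobson and Truax's SD-weighted midpoint, which equals the simple midpoint when the two SDs are equal. All numbers are hypothetical.

```python
def clinical_cutoffs(dys_mean, dys_sd, func_mean, func_sd):
    """Cutoffs a, b, c for clinically significant change (higher = healthier)."""
    a = dys_mean + 2 * dys_sd  # 2 SDs above the dysfunctional mean
    b = func_mean - 2 * func_sd  # 2 SDs below the functional mean
    # Midpoint between the two means, weighted by the SDs; with equal SDs
    # this is simply (dys_mean + func_mean) / 2.
    c = (func_sd * dys_mean + dys_sd * func_mean) / (dys_sd + func_sd)
    return a, b, c


a, b, c = clinical_cutoffs(dys_mean=40, dys_sd=10, func_mean=70, func_sd=10)
print(a, b, c)  # 60 50 55.0

post_treatment_score = 58  # hypothetical client
print("beyond cutoff c" if post_treatment_score > c else "not beyond cutoff c")
```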

Answer 11

- Normative data from the general population
- In the absence of normative data, we cannot use cutoff b or c
- When there are normative data but the two distributions do not overlap, it is more appropriate to use cutoff b than c

Answer 12

b (2 SDs below the functional mean) than c (the midpoint between the means of the functional and dysfunctional groups)

Answer 13

RCI = (score 2 - score 1) / SEdiff

Answer 14

- The formula captures whether the amount of change the individual has achieved is greater than what we would expect from random chance (measurement error) alone

Answer 15

RCI ≥ 1.96: reliable change (not due to measurement error)
RCI < 1.96: unreliable change (could be due to measurement error)
Because a dichotomous cutoff is sometimes hard to use, researchers can also treat the RCI as a continuous measure
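
The card gives RCI = (score 2 - score 1) / SEdiff without defining SEdiff. A minimal sketch follows, assuming the Jacobson-Truax definitions: SE = SD_pre × sqrt(1 - r_xx), where r_xx is the measure's reliability, and SEdiff = sqrt(2 × SE^2). It also assumes the absolute value is compared with 1.96, so that reliable deterioration counts too. The scores, SD and reliability are hypothetical.

```python
import math


def reliable_change_index(score1, score2, sd_pre, reliability):
    """RCI = (score 2 - score 1) / SEdiff."""
    se_measurement = sd_pre * math.sqrt(1 - reliability)  # standard error of measurement
    se_diff = math.sqrt(2 * se_measurement ** 2)  # SE of the difference between two scores
    return (score2 - score1) / se_diff


def classify(rci, critical=1.96):
    """Dichotomous reading of the RCI."""
    return "reliable change" if abs(rci) >= critical else "unreliable change"


rci = reliable_change_index(score1=30, score2=45, sd_pre=10, reliability=0.80)
print(round(rci, 2), classify(rci))  # 2.37 reliable change
```

Keeping `rci` as a number, rather than only the label, supports the continuous use mentioned above.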

Answer 16

- Organizing the data and making sure they are entered correctly
- Some say this should take up the majority of our time
- One method is double entry: entering the data twice, then comparing the two entries to check the accuracy of entry
- Another method is running descriptives
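
The comparison step of double entry can be done mechanically. Here is a small sketch, assuming both entries hold the same cases in the same row order, stored as lists of rows. `compare_entries` and the data are hypothetical.

```python
def compare_entries(entry1, entry2):
    """List (row, column, first value, second value) for every cell where the entries disagree."""
    if len(entry1) != len(entry2):
        raise ValueError("the two entries have different numbers of rows")
    mismatches = []
    for row, (rec1, rec2) in enumerate(zip(entry1, entry2)):
        for col, (v1, v2) in enumerate(zip(rec1, rec2)):
            if v1 != v2:
                mismatches.append((row, col, v1, v2))
    return mismatches


first_pass = [[1, 24, 3], [2, 31, 5], [3, 29, 4]]  # id, age, rating
second_pass = [[1, 24, 3], [2, 13, 5], [3, 29, 4]]
print(compare_entries(first_pass, second_pass))  # [(1, 1, 31, 13)]
```

Each flagged cell is then checked against the original paper record.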

Answer 17

- Begins with the question of why the data are missing
- Knowing the source of the missing data is essential for knowing what to do about it and for obtaining good statistical estimates
- Missingness that occurs at random just happens and is less problematic

Answer 18

- MCAR
- MAR
- MNAR

Answer 19

- Missing Completely At Random
- The probability that an observation (Yi) is missing is unrelated to the value of Yi or to the value of any other variable (e.g. X)
- Missingness is NOT related to the characteristics of the participant/case (e.g. a questionnaire is lost or there is a data entry error)
- The best-case scenario of missingness: because it is random, it will not lead to bias in the estimated parameters
- Can analyze the data and still have valid results

Answer 20

can artificially decrease sample size → loss in statistical power

Answer 21

- Missing At Random
- The probability that an observation (Yi) is missing is unrelated to the value of Yi but is related to the value of other variables (e.g. X)
- As long as another variable in the data set can explain the missingness → MAR
- E.g. younger respondents are more likely to skip the income question on a survey

Answer 22

- Missing Not At Random
- The probability that an observation (Yi) is missing is related to the value of Yi itself
- MOST problematic
- When we have data that are MNAR, the missingness is nonignorable
- Most difficult to address
- Run two sets of analyses
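
A small simulation can make the three mechanisms concrete. The sketch below is a hypothetical setup: income (in $1000s) rises with age, and values go missing under each mechanism. MCAR uses a flat 30% chance. For MAR, younger respondents skip the item more often, mirroring the example above. For MNAR, high incomes are the ones withheld. All rates and variable names are assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the example is reproducible
n = 10_000
age = [rng.uniform(18, 70) for _ in range(n)]
income = [20 + 0.8 * a + rng.gauss(0, 10) for a in age]  # hypothetical outcome Y

# MCAR: every value has the same 30% chance of being lost.
mcar = [y for y in income if rng.random() >= 0.30]
# MAR: missingness depends on another observed variable (age), not on Y.
mar = [y for y, a in zip(income, age) if rng.random() >= (0.6 if a < 30 else 0.1)]
# MNAR: missingness depends on Y itself (high incomes are withheld).
mnar = [y for y in income if rng.random() >= (0.8 if y > 60 else 0.1)]

full_mean = statistics.mean(income)
for label, observed in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    print(f"{label}: observed mean {statistics.mean(observed):.1f} vs full {full_mean:.1f}")
```

Only the MCAR mean stays on target. The MAR bias can in principle be corrected using age, because age is in the data set. The MNAR bias cannot be corrected from the observed data alone.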

Answer 23

- One for the primary analyses that answer the research question (comparing means, correlations, etc.)
- The other for modeling possible scenarios of the missingness → what-if analysis
- Then compare, and check whether the results of the primary analysis are consistent across the different MNAR scenarios
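
One simple way to run the what-if analyses is a delta adjustment. This is a swapped-in technique, not one named on the card. Each scenario assumes the missing cases score `delta` points away from the observed mean, and the estimate is recomputed. The data, the benchmark and `what_if_means` are hypothetical.

```python
import statistics


def what_if_means(observed, n_missing, deltas):
    """Full-sample mean under several assumptions about the missing cases."""
    base = statistics.mean(observed)
    return {
        delta: statistics.mean(observed + [base + delta] * n_missing)
        for delta in deltas
    }


observed = [12, 15, 11, 14, 13, 16, 12, 15]  # hypothetical observed scores
benchmark = 12  # hypothetical value the research question compares against

scenarios = what_if_means(observed, n_missing=4, deltas=(0, -2, -4, -6))
for delta, estimate in scenarios.items():
    verdict = "above" if estimate > benchmark else "not above"
    print(f"delta {delta:+}: mean {estimate:.2f} ({verdict} the benchmark)")
```

Here the conclusion holds unless the missing cases score about 6 points lower than the observed ones. That is the kind of consistency check the card describes.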

Answer 24

sample size

Answer 25

If NOT significant → consistent with MCAR, but this does not tell us exactly what we want to know about the missing data
If significant → MAR or MNAR, but the test does not tell us which variables have the missingness / violate MCAR; we need to do more work to identify them

Answer 26

Create dummy variables for missingness: a binary variable where 1 = missing and 0 = not missing
If the other variable is continuous → use a t-test
If the other variable is categorical → use chi-squared
When there are significant relations → the data are likely MAR rather than MCAR, because other variables in the data set can be used to predict the missingness in the variable
This cannot necessarily rule out MNAR
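
A sketch of the dummy-variable check for a continuous variable follows. It uses Welch's t statistic, an unequal-variances t-test, computed by hand. The p-value would come from a t distribution in a statistics package. The survey data are hypothetical.

```python
import math
import statistics


def welch_t(group1, group2):
    """Welch's t statistic for two independent groups (unequal variances)."""
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    return (m1 - m2) / math.sqrt(v1 / len(group1) + v2 / len(group2))


# Hypothetical survey: income (None = skipped) and age for each respondent.
income = [None, 52, None, 61, 48, None, 70, None, 55]
age = [22, 40, 25, 45, 38, 24, 50, 23, 42]

# Dummy-code missingness: 1 = missing, 0 = not missing.
missing = [1 if y is None else 0 for y in income]

# Compare age (continuous) between the missing and non-missing groups.
age_missing = [a for a, m in zip(age, missing) if m == 1]
age_present = [a for a, m in zip(age, missing) if m == 0]
print(missing)
print(round(welch_t(age_missing, age_present), 2))
```

A large negative t here says those who skipped income are younger, which points toward MAR rather than MCAR.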

Answer 27

- Measure some of the missing data (e.g. follow up on non-respondents)
- Then compare initial respondents and non-respondents on the variables with missing data
- Use scientific knowledge and knowledge of situations where MNAR may occur
- i.e. the more sensitive the nature of the variable you’re assessing, the less likely participants are to provide a response

Answer 28

Options to delete data or estimate the missing values through single imputation

Answer 29

Listwise and Pairwise

Answer 30

- aka casewise deletion
- Subjects who have any missing data points are deleted entirely
- Con → loss of power

Answer 31

- Subjects with missing data points on the variables in the particular analysis you are running are excluded from that analysis, but are included in analyses for which they do have data

Answer 32

loss of power

Answer 33

- Parameter estimates will be based on slightly different sets of data
- Degree of inconsistency across different analyses

Answer 34

- Makes use of all available data points
- Maintains the largest possible sample size
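
The difference between the two deletion strategies shows up clearly in the sample sizes they leave. A minimal sketch, using a hypothetical three-variable data set where None marks a missing value:

```python
from itertools import combinations

# Hypothetical data set: None marks a missing value.
data = {
    "anxiety": [10, 12, None, 15, 11, 14],
    "sleep": [7, None, 6, 5, 8, 6],
    "mood": [3, 4, 5, None, 2, 4],
}
n_cases = len(data["anxiety"])

# Listwise: keep only the cases with no missing value on any variable.
listwise_cases = [
    i for i in range(n_cases) if all(values[i] is not None for values in data.values())
]

# Pairwise: for each pair of variables, keep the cases observed on both.
pairwise_n = {
    (v1, v2): sum(
        1 for i in range(n_cases) if data[v1][i] is not None and data[v2][i] is not None
    )
    for v1, v2 in combinations(data, 2)
}

print("listwise n:", len(listwise_cases))  # listwise n: 3
print("pairwise n:", pairwise_n)
```

Every pairwise analysis keeps 4 cases, but each uses a different set of 4, which is the inconsistency noted above. Listwise deletion drops to 3 cases throughout.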

Answer 35

5% or less

Answer 36

- Mean of the items on a scale that are not missing
- Mean of the missing variable (using available data)
- Mean for a subsample (e.g. men)
- Multiple regression (with or without random error): use the regression estimate to fill in the missing value

Answer 37

Filling in the missing values with some single number

Answer 38

- While we increase the sample size, we artificially reduce the variability in the data
- This matters because the data might be missing precisely because they differ from the rest of the data set
- Plugging in typical values from the existing data points may therefore not be the best or most accurate approach
- This can artificially inflate the strength of relations between variables and bias hypothesis testing → spurious results
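
A quick numeric check of the first point, using hypothetical scores and mean substitution. The filled-in values add cases but no spread, so the mean is unchanged while the SD shrinks.

```python
import statistics

observed = [4, 7, 9, 5, 10, 6, 8, 3]  # hypothetical observed scores
n_missing = 4

# Mean substitution: every missing value is replaced by the observed mean.
filled = observed + [statistics.mean(observed)] * n_missing

print(f"mean: {statistics.mean(observed):.2f} -> {statistics.mean(filled):.2f}")
print(f"SD:   {statistics.stdev(observed):.2f} -> {statistics.stdev(filled):.2f}")
```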

Answer 39

Add random errors intentionally to add variability and avoid data that is too homogeneous

Answer 40

added random error

Answer 41

Mean substitution should be avoided because it is not as good as regression substitution

Answer 42

1. Maximum likelihood procedures
2. Multiple imputation

Answer 43

some data points that have really unusual or unlikely values on one variable

Answer 44

combination of unusual scores on at least two variables

Answer 45

a box plot

Answer 46

- A 2-stage flagging process for outliers is built into SPSS box plots
- Values 1.5 to 3 box lengths from either end of the box are marked with a circle, meaning a potential outlier
- Values more than 3 box lengths away are in the “red zone” / “red range” and are marked with an asterisk
- An asterisk indicates a more extreme outlier
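
A sketch of the same two-stage rule in Python, with the box length taken as the interquartile range. It assumes the quartiles from `statistics.quantiles` (default "exclusive" method), which may differ slightly from the hinges SPSS uses. The scores are hypothetical.

```python
import statistics


def flag_outliers(data):
    """Label values 1.5-3 box lengths beyond the box 'mild' and more than 3 'extreme'."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    box_length = q3 - q1  # the interquartile range
    flags = {}
    for value in data:
        distance = max(q1 - value, value - q3, 0)  # how far outside the box, if at all
        if distance > 3 * box_length:
            flags[value] = "extreme"  # SPSS marks these with an asterisk
        elif distance > 1.5 * box_length:
            flags[value] = "mild"  # SPSS marks these with a circle
    return flags


scores = [10, 12, 12, 13, 13, 14, 15, 15, 16, 25, 80]
print(flag_outliers(scores))  # {25: 'mild', 80: 'extreme'}
```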

Answer 47

- Easy to see when you line the box plot up against a normal distribution
- Note: box lengths are not the same as SDs
- For a normal distribution, the range of the box plot (out to 1.5 box lengths from the box) includes close to 99% of the data points, which is why 1.5 box lengths is the first cutoff for outliers
- So a value outside the box plot range is unlikely

Answer 48

- Run the analysis with and without the outliers
- If the results do not change much, you can potentially keep the outliers and report results based on the full data
- If the results do change → trimming, Winsorizing, or transformation

Answer 49

Trimming, Winsorizing, Transformation

Answer 50

- Remove outliers
- Makes sense when outliers occur for random reasons
- Con → reduces power

Answer 51

reduces power

Answer 52

- Bring outliers back toward the mean by replacing outlier values with another value
- e.g. the 90th percentile, the 95th percentile, or sometimes a 3 SD cutoff
- Replacement at the 90th percentile would be described as 10% Winsorized
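
A minimal sketch contrasting Winsorizing with trimming at the 90th percentile. It assumes only high outliers are a concern and computes the percentile by linear interpolation (`method="inclusive"`). Other percentile definitions give slightly different cutoffs. The scores and `upper_cutoff` are hypothetical.

```python
import statistics


def upper_cutoff(data, percentile=90):
    """Value at the given percentile, by linear interpolation."""
    return statistics.quantiles(data, n=100, method="inclusive")[percentile - 1]


scores = [3, 4, 5, 5, 6, 6, 7, 8, 9, 40]  # hypothetical; 40 is the outlier
cutoff = upper_cutoff(scores, 90)

winsorized = [min(v, cutoff) for v in scores]  # pull high values back to the cutoff
trimmed = [v for v in scores if v <= cutoff]  # drop high values entirely

print(cutoff)  # 12.1
print(winsorized)
print(len(trimmed))  # 9
```

Winsorizing keeps all 10 cases. Trimming loses one, which is why trimming costs power.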

Answer 53

10% Winsorized

Answer 54

- Can help pull unusually high values toward the other values
- E.g. square root and log transformations
- Can help statistical assumptions (e.g. normality) hold better
- Can reduce the impact of a single point/outlier
- Need to be extra careful with interpretation
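
A sketch of how a transformation pulls in a high value, measured with moment-based skewness. The scores are hypothetical. The log transformation assumes all values are positive.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by SD cubed."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)


scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 50]  # hypothetical, with one extreme value
sqrt_scores = [math.sqrt(v) for v in scores]
log_scores = [math.log(v) for v in scores]  # requires every value > 0

for label, values in [("raw", scores), ("sqrt", sqrt_scores), ("log", log_scores)]:
    print(f"{label}: skewness {skewness(values):.2f}")
```

Results on the transformed scale are in log (or square-root) units, which is where the extra care with interpretation comes in.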

Answer 55

5%
Note: if less than 5% of the data are affected, there should be little effect on the p-value
In the case of trimming, removing more than 5% can greatly reduce power

Answer 56

A case that has an unusual combination of scores on 2 or more variables

Answer 57

- A statistical procedure to discern whether a case is an outlier
- Defined as the distance of a case from the centroid

Answer 58

point created by the means of all the variables

Quiz 3 for Stats I (85 cards)