Reliability and Validity Flashcards
Reliability
- refers to the replicable nature of research studies/tools
- high reliability does not guarantee scientific validity
Testing reliability
- test-retest correlation
- the instrument is administered twice to the same population
- with 2-14 days between administrations
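A minimal Python sketch of a test-retest correlation (hypothetical scores, not from the source); the Pearson correlation between the two administrations serves as the reliability estimate:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores from the same 8 subjects, administered 2-14 days apart
time1 = np.array([12, 18, 9, 22, 15, 30, 11, 25])
time2 = np.array([14, 17, 10, 20, 16, 28, 12, 24])

r, p = pearsonr(time1, time2)  # test-retest (Pearson) correlation
print(f"test-retest r = {r:.2f}, p = {p:.3f}")
```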
Cronbach’s alpha
- measures the internal consistency of a test, i.e. how strongly its items correlate with one another (it reflects the average inter-item correlation, taking the number of items into account)
- it can take values from negative infinity up to a maximum of 1, but only positive values are meaningful
- an arbitrary cut-off of 0.70 is used to call the evaluated test internally consistent (see the sketch below)
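A minimal sketch of the usual computational form of alpha, k/(k-1) x (1 - sum of item variances / variance of the total score), using made-up questionnaire data:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: subjects x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the total score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5 subjects x 4 items
data = np.array([[3, 4, 3, 4],
                 [2, 2, 3, 2],
                 [4, 5, 4, 5],
                 [1, 2, 1, 2],
                 [3, 3, 4, 3]])
print(round(cronbach_alpha(data), 2))  # compare against the 0.70 cut-off
```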
Split-half reliability
- refers to splitting a scale into two halves and examining the correlation between them
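A minimal sketch of a split-half estimate on made-up data, splitting into odd- and even-numbered items; the Spearman-Brown step is a commonly used full-length correction not mentioned in the card above:

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical subjects x items matrix
items = np.array([[3, 4, 3, 4, 2, 3],
                  [2, 2, 3, 2, 1, 2],
                  [4, 5, 4, 5, 4, 4],
                  [1, 2, 1, 2, 2, 1],
                  [3, 3, 4, 3, 3, 4]])
half1 = items[:, ::2].sum(axis=1)   # odd-numbered items
half2 = items[:, 1::2].sum(axis=1)  # even-numbered items

r, _ = pearsonr(half1, half2)       # correlation between the two halves
spearman_brown = 2 * r / (1 + r)    # correction to full scale length
print(round(r, 2), round(spearman_brown, 2))
```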
Interrater reliability
- assessed by having two or more raters rate the same population with the same scale
Intraclass correlation coefficient
- used for continuous variables
- the proportion of total variance of the measurement that reflects true between subject variability
- ranges between 0 (unreliable) and 1 (perfect reliability)
- relative ICC is always higher than absolute ICC
ANOVA intraclass correlation coefficient
- used for quantitative data with more than two raters/groups
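A minimal sketch of a one-way ANOVA-based ICC (the ICC(1,1) form) on made-up ratings; note that several ICC variants exist (e.g. absolute vs relative agreement), so this is only one of them:

```python
import numpy as np

def icc_oneway(scores: np.ndarray) -> float:
    """One-way random-effects ICC(1,1); scores: subjects x raters."""
    n, k = scores.shape
    subject_means = scores.mean(axis=1)
    ms_between = k * ((subject_means - scores.mean()) ** 2).sum() / (n - 1)
    ms_within = ((scores - subject_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical ratings: 5 subjects rated by 3 raters
ratings = np.array([[9, 10, 8],
                    [5,  6, 5],
                    [7,  7, 8],
                    [3,  4, 3],
                    [8,  9, 9]])
print(round(icc_oneway(ratings), 2))  # 0 = unreliable, 1 = perfect reliability
```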
Nominal data reliability
- assessed with kappa; when there are more than two categories (especially ordered ones), a weighted kappa can be used
Validity of an instrument
-the extent to which an instrument measures what it proposes to measure
Face validity
-refers to a subjective measure of deciding whether the test measures the construct of interest on its face value (what it was designed for)
Construct validity
- measures whether a test really measures the theoretical construct of interest or something else
Content validity
- refers to whether the contents, i.e. the individual subscales, items or elements of the test, are in line with the general objectives or specifications the test was originally designed to measure
Criterion validity
-refers to performance against an external criterion such as another instrument (concurrent) or future diagnostic possibility (predictive)
Concurrent validity
- refers to the ability of a test to distinguish between subjects who differ concurrently on other measures, e.g. those who score high on an insomnia scale may also score high on a fatigue rating scale
Predictive validity
-the ability of a test to predict future group differences according to current group differences in score
Incremental validity
-refers to the ability of a measure to predict or explain variance over and above other measures
Convergent validity
- refers to the agreement between instruments that measure the same construct
- form of construct validity
Discriminant validity
- refers to the degree of disagreement between two scales measuring different constructs
- form of construct validity
Experimental validity
- refers to the sensitivity to change
- an instrument must show a difference in results when an intervention is carried out to modify the measured domain
- form of construct validity
Factorial validity
-form of construct validity established via factor analysis of items in a scale
Precision
- the degree to which a calculated central value (e.g. a mean) varies with repeated sampling
- the narrower the variation, the more precise the value
- random error leads to imprecision
Factors reducing precision
- wider limits of the confidence interval
- requiring a higher confidence level (e.g. 99% rather than 95%), which widens the interval (see the sketch below)
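An illustrative sketch (made-up data, normal-approximation intervals) of how demanding a higher confidence level widens the interval and so reduces precision:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=40)   # hypothetical measurements
mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(len(sample))   # standard error of the mean

for level in (0.95, 0.99):
    z = norm.ppf(1 - (1 - level) / 2)            # critical value
    print(f"{level:.0%} CI: {mean - z * se:.1f} to {mean + z * se:.1f}")
# The 99% interval is wider than the 95% interval: more confidence, less precision.
```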
Accuracy
-the correctness of the mean value
Precision
-comparable to reliability while accuracy is comparable to validity
Bias
-compromises validity/accuracy
Face
-does the scale appear to be fit for the purpose of measuring the variable of interest?
Content
-does the scale appear to include all the important domains of the measured attribute?
Criterion
-is the scale consistent with what we already know (concurrent) and what we expect (predictive)?
Convergent
-does this new scale associate with a different scale that measures a similar construct?
Discriminant
-does the new scale disagree with scales that measure unrelated constructs?
Kappa
kappa = observed agreement beyond chance / maximum possible agreement beyond chance
OR
kappa = (observed agreement - agreement expected by chance) / (100% - agreement expected by chance)
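A minimal sketch applying this formula to a hypothetical 2x2 agreement table (counts are invented for illustration):

```python
import numpy as np

# Hypothetical counts: rows = rater A (present/absent), columns = rater B
table = np.array([[40, 10],
                  [ 5, 45]])

total = table.sum()
observed = np.trace(table) / total                                   # observed agreement
expected = (table.sum(axis=1) * table.sum(axis=0)).sum() / total**2  # agreement by chance
kappa = (observed - expected) / (1 - expected)
print(round(observed, 2), round(expected, 2), round(kappa, 2))       # 0.85, 0.5, 0.7
```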
Beyond chance agreement
- kappa indicates the level of agreement achieved beyond what would be expected by chance
When is Kappa calculated?
-only for agreement on categorical variables such as presence or absence of a diagnosis
Weighted Kappa
-for ordinal variables
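If scikit-learn is available, its cohen_kappa_score supports linear or quadratic weights; a minimal sketch with invented ordinal severity ratings:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical ordinal ratings (0 = none, 1 = mild, 2 = moderate, 3 = severe)
rater_a = [0, 1, 2, 2, 3, 1, 0, 3, 2, 1]
rater_b = [0, 1, 1, 2, 3, 2, 0, 2, 2, 1]

plain = cohen_kappa_score(rater_a, rater_b)                       # unweighted kappa
weighted = cohen_kappa_score(rater_a, rater_b, weights="linear")  # penalises near-misses less
print(round(plain, 2), round(weighted, 2))
```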
Bland-Altman
-used for continuous variables where pairs of score differences are plotted against the mean
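A minimal matplotlib sketch of a Bland-Altman plot on made-up paired measurements; the mean difference (bias) and 1.96 SD limits of agreement are the usual additions to the plot:

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical paired scores from two measurement methods
a = np.array([10.2, 12.5, 9.8, 14.1, 11.0, 13.3, 10.9, 12.0])
b = np.array([10.5, 12.1, 10.2, 13.8, 11.4, 13.0, 11.2, 12.4])

means = (a + b) / 2
diffs = a - b
bias = diffs.mean()
loa = 1.96 * diffs.std(ddof=1)          # 95% limits of agreement

plt.scatter(means, diffs)
plt.axhline(bias, linestyle="--")
plt.axhline(bias + loa, linestyle=":")
plt.axhline(bias - loa, linestyle=":")
plt.xlabel("Mean of the two measurements")
plt.ylabel("Difference between measurements")
plt.title("Bland-Altman plot (hypothetical data)")
plt.show()
```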
Kappa values and degree of agreement
0 = no agreement; 0-0.2 = slight; 0.2-0.4 = fair; 0.4-0.6 = moderate; 0.6-0.8 = substantial; 0.8-1.0 = almost perfect
What is the kappa statistic?
- if two investigators independently assess the same group, there will be an extent to which their results ‘agree’
- simple percent agreement overestimates the degree of agreement, because some agreement occurs by chance alone; this is why kappa statistics are used
- kappa indicates the level of agreement achieved beyond what would be expected by chance
Interpreting kappa
- affected by the prevalence of the outcome studied: the higher the proportion of positive assessments, the higher the kappa
- statistical significance cannot be tested directly from kappa; what matters is the actual degree of agreement it indicates
Observed agreement calculation
- draw two by two table
- take the two values agreed and add them together
- then divide by 100
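- worked example (hypothetical counts, matching the kappa sketch above): if both raters record 'present' for 40 subjects and 'absent' for 45 out of 100 rated, observed agreement = (40 + 45) / 100 = 85%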
Kappa coefficient
- dependent on the prevalence of the measured condition
- common disorders tend to have a high kappa, whereas rare disorders tend to have a low kappa despite high percentage agreement
- one must therefore also look at the actual percentage agreement
Phi
- similar to kappa
- all cells are utilised and statistical significance is possible
- can be used with small sample sizes
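A minimal sketch of the phi coefficient for a hypothetical 2x2 table (same invented counts as the kappa sketch), with a chi-square test for significance:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table of counts for two raters
table = np.array([[40, 10],
                  [ 5, 45]])
a, b = table[0]
c, d = table[1]

phi = (a * d - b * c) / np.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Unlike kappa (as described above), a significance test is straightforward here
chi2, p, _, _ = chi2_contingency(table, correction=False)
print(round(phi, 2), round(p, 4))
```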