PSYC332 mid sem Flashcards
Reliability?
Consistency – a person will get the same score on two occasions
Rxx?
The reliability coefficient: scores from one administration of a test are correlated with scores from another administration of the exact same test (the same variable correlated with itself – hence the xx)
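A minimal Python sketch (not course material; the numbers are made up) of computing rxx by correlating two administrations of the same test:

    import numpy as np

    # Hypothetical scores for the same six people on two administrations of one test
    admin_1 = np.array([12, 15, 9, 20, 17, 14])
    admin_2 = np.array([13, 14, 10, 19, 18, 13])

    # rxx: correlate scores on the same test across the two administrations
    r_xx = np.corrcoef(admin_1, admin_2)[0, 1]
    print(round(r_xx, 2))  # values near 1 = highly reliable; .8+ is the usual benchmark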
Reliability coefficient for a good test?
At least .8
Test-retest?
Identical test given to the same group of people on two occasions
Coefficient of stability
Test-retest error variance?
Time sampling
Alternative forms?
Two versions of the test constructed in the same way but with slightly different content
Can be given immediately after one another, or on different days (this is the delayed form)
Coefficient of equivalence
Alternate forms error variance?
Content sampling
Delayed alternate forms – content sampling plus time sampling
Internal consistency reliability?
Split-half
KR20
Coefficient alpha
Split-half reliability?
Easy to calculate and doesn't require administering two different forms or testing on two different occasions.
A single test is administered and then split into two halves; each person's scores on the two halves are correlated
Coefficient of internal consistency
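A rough sketch of the split-half procedure (simulated item data, not from the course): one administration, items split odd/even, half scores correlated:

    import numpy as np

    # Simulate 0/1 item responses for 30 people on a 10-item test
    rng = np.random.default_rng(0)
    ability = rng.normal(size=30)                                   # latent trait
    items = (ability[:, None] + rng.normal(size=(30, 10)) > 0).astype(int)

    # Split into odd- and even-numbered items and total each half per person
    half_a = items[:, 0::2].sum(axis=1)
    half_b = items[:, 1::2].sum(axis=1)

    # Split-half reliability = correlation between the two half scores
    r_half = np.corrcoef(half_a, half_b)[0, 1]
    print(round(r_half, 2))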
Split-half reliability error variance?
Content sampling
Underestimates reliability: each half is only half as long as the full test, and the longer a test is, the higher its reliability coefficient will be.
Spearman-Brown formula?
Corrects for this underestimate – estimates the reliability coefficient of the full-length test from the split-half correlation
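The standard correction for a doubled test is r_full = 2*r_half / (1 + r_half); a small sketch with an assumed split-half value:

    def spearman_brown(r_half):
        """Estimate full-length reliability from a split-half correlation."""
        return 2 * r_half / (1 + r_half)

    # e.g. a split-half correlation of .70 steps up to about .82 for the full test
    print(round(spearman_brown(0.70), 2))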
KR20?
Single test administration
Used for dichotomously scored items
Its value is equal to the mean of all possible split-half coefficients for the test.
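A rough Python sketch of the KR-20 formula, KR20 = (k/(k-1)) x (1 - sum(p*q) / variance of total scores), assuming a people x items matrix of 0/1 scores (made-up data):

    import numpy as np

    def kr20(items):
        """KR-20 for a people x items array of dichotomous (0/1) item scores."""
        items = np.asarray(items)
        k = items.shape[1]                          # number of items
        p = items.mean(axis=0)                      # proportion passing each item
        q = 1 - p
        total_var = items.sum(axis=1).var(ddof=1)   # variance of total test scores
        return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

    # Tiny hypothetical data set: 5 people x 4 items
    print(round(kr20([[1, 1, 1, 0],
                      [1, 0, 1, 0],
                      [1, 1, 1, 1],
                      [0, 0, 1, 0],
                      [0, 0, 0, 0]]), 2))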
Coefficient alpha?
Single test administration
Used for multiple-choice tests or Likert-scale items
Its value is equal to the mean of all possible split-half coefficients for the test.
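A similar sketch for coefficient alpha, alpha = (k/(k-1)) x (1 - sum of item variances / variance of total scores); with 0/1 items it gives the same value as KR-20 (data made up):

    import numpy as np

    def cronbach_alpha(items):
        """Coefficient alpha for a people x items array (e.g. 1-5 Likert ratings)."""
        items = np.asarray(items, dtype=float)
        k = items.shape[1]
        item_vars = items.var(axis=0, ddof=1)       # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical Likert responses: 5 people x 3 items
    print(round(cronbach_alpha([[4, 5, 4],
                                [2, 2, 3],
                                [5, 5, 5],
                                [3, 3, 2],
                                [1, 2, 1]]), 2))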
Error variance KR20 and alpha?
Content sampling
heterogeneity of the behaviour domain being sampled
Maths test example: a maths test with only addition problems is homogeneous (the same content throughout); a test covering different mathematical concepts, e.g. addition, subtraction, algebra and division, is heterogeneous. The more heterogeneous a test is (the greater the differences in what it measures), the lower its internal consistency will be.
Pearson's r and homogeneity?
r decreases as the homogeneity of the sample increases – with Pearson's r, the more similar the people in a sample are, the lower the correlation coefficient.
If the standardisation sample is varied, i.e. heterogeneous (people are not as similar to one another), the correlation will be higher.
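A minimal simulation (assumed numbers, not from the course) showing r dropping when the sample is made more homogeneous by keeping only the top scorers:

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(100, 15, size=1000)            # e.g. test scores
    y = 0.6 * x + rng.normal(0, 12, size=1000)    # a related second measure

    r_full = np.corrcoef(x, y)[0, 1]

    # Keep only the top quarter of scorers: a more homogeneous, range-restricted sample
    top = x > np.percentile(x, 75)
    r_restricted = np.corrcoef(x[top], y[top])[0, 1]

    print(round(r_full, 2), round(r_restricted, 2))  # restricted r is clearly lower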
Validity?
Accuracy - is the test measuring what we think it’s measuring
Rxy?
x = the test being validated and y = some other measure; we check whether scores on the test correlate with that other measure
Face validity?
The test appears to be testing what it sets out to test. It has no rxy, but it is important for most tests because it helps with test-taker compliance and makes test takers less suspicious. However, we don't want a test to be so face valid that test takers can manipulate their answers because they know what is being assessed.
Criterion validity?
The test scores "predict" performance
Assesses future performance statistically
Test scores are compared with performance on a criterion that is a direct and independent measure of what the test is designed to measure
Criterion, predictive validity?
The test is given to a group of people at one point in time; at a later point the same group is measured on the criterion. Tells us whether test scores accurately predict the criterion and whether they are valid indicators of criterion performance at some later point in time.
Good level of validity?
Looking for a statistically significant correlation coefficient of approximately 0.2 to 0.5.
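A small sketch of checking a predictive-validity coefficient and its significance (made-up scores for a hypothetical selection example; scipy assumed available):

    import numpy as np
    from scipy.stats import pearsonr

    # Hypothetical test scores at selection and performance ratings collected later
    test_scores = np.array([55, 62, 47, 70, 66, 58, 73, 49, 61, 68])
    performance = np.array([ 3,  2,  2,  4,  5,  3,  4,  3,  2,  5])

    r_xy, p_value = pearsonr(test_scores, performance)
    # Interpret against the rule of thumb above: a significant rxy of roughly .2 to .5
    print(round(r_xy, 2), round(p_value, 3))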
Criterion, concurrent validity?
Concurrent = most appropriate for tests that diagnose a current state. The test is administered and the criterion scores are obtained at the same time.
Factors that influence rxy?
A low sample size will reduce statistical significance.
Restriction of range in either the test or the criterion scores – the correlation will be reduced, e.g. looking only at high scorers makes the scatterplot more circular than cigar-shaped (the selectivity problem: selected groups are especially affected). Restricting the range lowers rxy.
Non-linear relationship between test scores and criterion scores – Pearson's r assumes the relationship between x and y is linear, so a non-linear relationship will reduce the correlation (see the sketch after this list).
Problems with the criterion – the biggest issue: the test itself may be good, but the way the criterion is operationalised may not be.
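As flagged in the non-linearity point above, a quick simulation (assumed data) of a strong but U-shaped relationship that Pearson's r almost completely misses:

    import numpy as np

    rng = np.random.default_rng(2)
    test = rng.uniform(-3, 3, size=500)
    # Criterion depends strongly on the test score, but the relationship is curved
    criterion = test ** 2 + rng.normal(0, 0.5, size=500)

    r_xy = np.corrcoef(test, criterion)[0, 1]
    print(round(r_xy, 2))  # near 0 even though the relationship is strong, just not linear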
Content validity?
Not empirical – no rxy
Making sure the items on the test tap into the range of behaviours we want to measure
Most relevant for achievement tests
Attempts to sample the full behaviour domain
Built into the test from the outset
Construct validity?
The degree to which test scores reflect individual differences on a psychological construct i.e. Something that is not directly observable like anxiety
Needs many studies to test – a gradual accumulation of evidence about the test.
Ask whether test scores behave the way we would expect, given our knowledge of the construct