Lecture Two Flashcards
Remember correlations?
- An important statistic in test creation and in measures of test worthiness (e.g. reliability and validity)
- Ranges from -1 to +1
- linear
- line of best fit
- curvilinear
- assumptions underpinning correlation – why?
- How does variance impact on the coefficient?
- What about the shape of the line?
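The basic idea of a correlation coefficient can be sketched in a few lines of Python. This is a minimal illustration that assumes Pearson's r (the lecture does not name the coefficient) and uses made-up scores. `pearson_r` is a hypothetical helper, not a library function. The last example shows why the shape of the relationship matters: a perfect U-shaped (curvilinear) relationship still gives r = 0, because r only captures linear association.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their SDs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists of at least two scores")
    mx, my = mean(x), mean(y)
    # Population covariance (divide by n) to match pstdev below; the n cancels out
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


# Hypothetical data: perfect positive and negative linear relationships
x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 3))   # 1.0
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 3))   # -1.0

# Curvilinear (U-shaped) relationship: y depends completely on x, yet r = 0
u = [-2, -1, 0, 1, 2]
print(round(pearson_r(u, [a * a for a in u]), 3))  # 0.0
```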
Shared variance (squared coefficient)
- What are the factors that contribute to the shared variance?
- What is not shared?
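Shared variance is the squared correlation (the coefficient of determination). A small sketch with an invented r = 0.6 and a hypothetical `shared_variance` helper:

```python
def shared_variance(r):
    """Proportion of variance two measures share: the squared correlation."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation must lie between -1 and 1")
    return r ** 2


# Hypothetical correlation between two tests
r = 0.6
shared = shared_variance(r)
print(f"shared: {shared:.0%}, not shared: {1 - shared:.0%}")  # shared: 36%, not shared: 64%
```

Squaring removes the sign, so r = -0.6 also gives 36% shared variance; the remaining 64% is not shared and reflects other factors and error.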
Reliability
- Refers to how free the test is from measurement error: are you going to get the same, or a very close, score if you sit the test again? It’s about consistency and dependability
- Depends on how the test is constructed and the environment it is administered in
- There’s no perfect test or environment, so there will always be some error, but we want to minimise it
- Reliability usually reported as a correlation coefficient
- Different types of tests tend to have different levels of reliability, e.g. well-constructed achievement tests may have reliability coefficients of .90, but personality tests are often much lower (.70) because the concept is abstract and potentially fluctuates
- There are many ways of measuring reliability, including test-retest, alternative forms and internal consistency
Test-retest reliability
- Give test twice to same group, usually a few weeks apart
- Correlate results
- The higher the correlation, the more reliable the test
- Results can fluctuate depending on things such as the time between administrations, forgetting information, learning more about the test content by studying in the interim, and being more familiar with the test format the second time around
- Is less likely to fluctuate if testing a stable construct (e.g., IQ)
Alternative, parallel or equivalent forms reliability
- Making two or more versions of the same test
- Stops issues like people remembering or studying particular answers between test-retest
- It is hard to make the versions equal in content and level of difficulty, or to ensure administration is exactly the same
- Test developer must demonstrate the versions are truly parallel
Reliability as Internal consistency
- Measures how test items relate to each other and the test as a whole
- Looks within the test to measure reliability
- E.g. in a test measuring anxiety, respondents should answer items that tap aspects of anxiety in a similar way
- Most common forms of internal consistency (i.e. reliability) are split half and Cronbach’s alpha
Split-half reliability
- Use one form of the test, administered in a single sitting. Split the test in two and correlate the scores on the two halves
- Issues: the test may get harder as you go along, so the first half is not equal to the second
- You may compare odd- and even-numbered questions, but the halves may still not be equal, and each half is shorter than the full test, which can decrease reliability
- Can use the Spearman-Brown equation to compensate for the shorter test: $r_{SB} = \frac{2r}{1 + r}$, where $r$ is the correlation between the two halves
Cronbach’s coefficient alpha and Kuder-Richardson
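- Estimate internal consistency as, in effect, the average of all possible split-half reliabilities, calculated from the variances of the individual items relative to the variance of the total score
- Kuder-Richardson is used with forced-choice tests whose items are scored dichotomously (e.g. right/wrong, true/false)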
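The usual computational formula for alpha is k / (k - 1) multiplied by (1 - sum of item variances / variance of total scores), where k is the number of items; KR-20 replaces each item variance with p(1 - p), where p is the proportion of people scoring 1 on that item. Below is a minimal sketch with invented data and hypothetical helper names. It uses population variances throughout (an assumption; using sample variances consistently gives the same alpha because the n/(n - 1) factor cancels).

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """responses: rows = people, columns = items."""
    k = len(responses[0])
    items = list(zip(*responses))                  # regroup scores by item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


def kr20(responses):
    """KR-20 for items scored 0 (wrong/no) or 1 (right/yes)."""
    k = len(responses[0])
    n = len(responses)
    pq_sum = 0.0
    for item in zip(*responses):
        p = sum(item) / n                          # proportion scoring 1 on this item
        pq_sum += p * (1 - p)
    total_var = pvariance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - pq_sum / total_var)


# Hypothetical data: 5 people x 4 items
likert = [[4, 5, 4, 5], [2, 1, 2, 2], [3, 3, 4, 3], [5, 5, 5, 4], [1, 2, 1, 1]]
binary = [[1, 1, 1, 0], [0, 0, 1, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 0, 1, 1]]

print(f"alpha (rating-scale items) = {cronbach_alpha(likert):.2f}")
print(f"KR-20 (right/wrong items) = {kr20(binary):.2f}")
print(f"alpha on the same 0/1 items = {cronbach_alpha(binary):.2f}")
```

For 0/1 items computed this way, KR-20 and alpha give the same value, which is why KR-20 is often described as a special case of alpha.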
- Validity – the extent to which all the available evidence supports that the test is actually measuring what it is intended to measure
- It is a central requirement for a test, without which the test items/tasks would not have meaning
o Content validity
• Face validity
o Construct validity
• Criterion-related validity
• Predictive validity
Content validity
- The content of a test reflects what the test is aiming to measure
- Sometimes content validity is enough validity for a test
- Not enough on its own to establish validity beyond achievement-type tests, e.g. for tests of more abstract constructs
Content Validity Example
- Your results on the end of semester exam should reflect what you know about what has been covered in this unit
- So, there should be questions that equally represent the concepts introduced
o E.g. avoid construct underrepresentation (a failure to capture an important aspect) or overrepresentation
- The questions should be worded in a way that is consistent with the language in which the concepts are taught
Face validity
- It refers to how the test looks, but this may be superficial
o E.g. the items in the test look as though they ought to measure what you are aiming to measure
- Some tests may look valid and not be, and others may not look valid but are
Construct validity
- Constructs are theoretically driven ways of talking about certain features in the world
- Construct validity asks how well a test can give a construct meaning
o E.g. anxiety – only exists in so much as the construct represents a set of behaviours, thoughts and feelings
o Construct irrelevance: scores are influenced by something other than what the test is supposed to measure e.g. anxiety or illness impacting exam score
Construct validity
- Scientific evidence demonstrating that the construct (model, concept, idea, notion) is actually being measured by the test
- Most important when developing tests to measure abstract constructs like depression, anxiety, happiness, love, empathy
- Measured with statistical tools and methods
Criterion-Related validity
- What is the relationship between the criterion (another standard) for the test and the test scores?
- Concurrent validity
o What is the evidence that my test compares well (e.g. highly correlated) with the results of some other way of knowing (e.g. corroboration)
o Relates to what is known at this point in time
Convergent validity
- If you think your test measures post-traumatic growth, you would expect it to be related to other instruments designed to measure positive post-trauma perception (e.g. SRG & Thriving)
- You don’t want a perfect correlation with other tests as that would make yours redundant
- Depending on how close the theoretical connection is between the tests, the coefficient will vary
Discriminant validity
- As with convergent validity, discriminant validity uses other established tests to test construct validity
- This time, you are looking for little or no correlation between your measure and a measure of a different construct (e.g. IES-R & PTGI)
- Knowing what you are dealing with may have significant clinical influence
Predictive validity
- If a standard to compare against is not available now, interest turns to predictive validity
o The relationship between the test scores now and a standard in the future
o For example, OP scores and first year academic success – depends on age…
o For school leavers, OP is a better predictor than learning strategies, which are more predictive for mature-age students
o Combining evidence often offers more predictive validity, e.g. OP, introversion and learning strategies in school leavers
Cross cultural fairness
- The idea that ethnicity, gender, class, background etc impact on results
- Many laws have been passed in the US requiring tests to be culturally fair, so as not to repeat past errors that disadvantaged minority groups, e.g. employment tests must be able to demonstrate that the test is relevant to the job sought
- E.g. what number comes next in the sequence, one, two, three, _____?
- Mong or yuur mong
- As wallaby is to animal so cigarette is to ___
- Kuuk thaayorre of Edward River
Practicality 1
- Choosing the right test to administer; time, format, cost etc
- Time
o Time taken to do the test needs to suit the person targeted, e.g. attention span, age, time available to do the test
- Cost
o Many tests cost money and some cost a lot. Balance the cost of a test against its reliability and the level of need to take it
- Format
o Types of questions, font size, layout; multiple-choice (MC) questions may lower anxiety but are not always culturally fair (e.g. MC formats may favour white males)
Practicality 2
- Readability
o Test items should be reviewed for readability – our school
- Ease of administration, scoring and interpretation
o Understanding the test manuals
o How many people are taking the test and does this impact on ease of administration?
o Level of training needed to administer, score and interpret the results
o How long will scoring take and how long will report take?
o Time needed to explain results to test taker
o Other materials like publisher’s preformatted sheets
Selecting and administering a good test
- There are thousands of tests, so how do you choose?
- What are the goals of the client or researcher?
- Which tests can achieve that goal?
- Sourcing tests through articles, books, publisher’s catalogues, test library, online
- Major test categories include:
o IQ, aptitude, achievement, behaviour, development, personality, neuropsychological, science, sensory, perception, speech, hearing…
- Examine research about the test, such as its validity and reliability data
o Different forms of reliability testing
o Different forms of validity testing
- Based on all the information and steps outlined, make a balanced and informed choice
Making meaning out of raw scores
- Raw scores have not been manipulated in any way
- They mean nothing until they are put in context, so we might compare an individual's score with the ‘normed score’
o How did the client fare next to others from the same type of group who have taken the test before?
o Compare people from 2 groups e.g. using percentiles
o Compare results for one person on 2 or more tests – sometimes discrepancies between these scores indicate an impairment of sorts
Frequency distributions
- What was the score and how often did it occur? By listing scores in numerical order, you can easily see whether this person scored higher or lower than most on the distribution
- Might list scores or groups of scores
- Histograms & frequency polygons etc assist in getting an overview of the data. We can learn a lot about the data from its shape
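A frequency distribution can be built by counting how often each score occurs. A sketch with invented scores, using Python's `collections.Counter` and a crude text histogram:

```python
from collections import Counter

# Hypothetical raw scores from ten test takers
scores = [12, 15, 12, 18, 15, 12, 20, 15, 9, 12]

freq = Counter(scores)
for score in sorted(freq):
    # Text "histogram": one # per person who obtained that score
    print(f"{score:>3} | {'#' * freq[score]} ({freq[score]})")
```

Listing the scores in numerical order like this makes it easy to see where one person's score falls relative to the rest, and the bar lengths give a rough sense of the distribution's shape.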
Standard scores
- Percentiles are commonly used (e.g. 75th percentile)
- Z-scores are standard scores
- The mean always = 0 and the SD = 1
- Z = (X - M) / SD: the score, less the mean, divided by the standard deviation, so it is sensitive to all components of the variance calculation, including sample size
The z-score distribution
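- We can convert any score from any distribution into a z-score, so the converted scores have mean = 0 and SD = 1; the conversion does not change the shape of the distribution, so z-scores only follow the normal curve if the raw scores were roughly normal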
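Converting raw scores to z-scores can be sketched in Python with invented scores. It assumes the population SD (`pstdev`), since the lecture does not specify sample or population. The final line uses `statistics.NormalDist` to turn a z-score into an approximate percentile rank, which is only meaningful if the scores are roughly normally distributed.

```python
from statistics import NormalDist, mean, pstdev


def z_scores(raw):
    """Convert each raw score to z = (score - mean) / SD."""
    m, sd = mean(raw), pstdev(raw)  # population SD (an assumption)
    return [(x - m) / sd for x in raw]


# Hypothetical raw scores from ten test takers
raw = [12, 15, 12, 18, 15, 12, 20, 15, 9, 12]
z = z_scores(raw)
print([round(v, 2) for v in z])

# Check: the z-scores have mean 0 and SD 1 (up to floating-point error)
assert abs(mean(z)) < 1e-9 and abs(pstdev(z) - 1) < 1e-9

# Only if scores are roughly normal: z -> approximate percentile rank
print(f"z = 1.0 -> about the {NormalDist().cdf(1.0) * 100:.0f}th percentile")  # 84th
```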
T scores
- Another standardised score
- T-scores are often used in personality testing
- M = 50 and SD = 10
- T = z(SD) + M, so with M = 50 and SD = 10, T = 10z + 50
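T-scores are a linear rescaling of z-scores. A hypothetical `t_score` helper, with the T-score mean and SD as defaults so the same formula can rescale a z-score onto any chosen mean and SD:

```python
def t_score(z, mean=50, sd=10):
    """Rescale a z-score: T = z * SD + M (defaults give T-scores)."""
    return z * sd + mean


print(t_score(0))     # 50
print(t_score(1.5))   # 65.0
print(t_score(-2))    # 30
```

A z of 0 (an average score) always maps to the new mean, and each SD above or below the mean adds or subtracts 10 T-score points.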
Quantitative Approaches
o Deductive
o Positivist
o Realist
o Objective
o Reductionist
o Generalisation
o Numbers
Qualitative Approaches
o Inductive
o Interpretive
o Constructivist
o Subjective
o Holistic
o Uniqueness
o Words
A word about qualitative rigor
- Qualitative research has its own equivalents of quantitative reliability and validity, usually discussed as trustworthiness, e.g. triangulation
- Qualitative methods can be used to support quantitative measures
- They can also be used to inform quant studies
- BUT – qual and quant are underpinned by very different philosophical assumptions
o E.g. explore construct meaning/measure constructs
o Idiographic/nomothetic