Lecture Two Flashcards by Stephanie Arce

Remember correlations?

An important stat in test creation and measures of worthiness
-1 to 1
linear
line of best fit
curvilinear
assumptions underpinning correlation – why?
How does variance impact on the coefficient?
What about the shape of the line?
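To make these prompts concrete, here is a minimal Python sketch of Pearson's r computed from its definition: the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations. The function name `pearson_r` and all scores are hypothetical. The last pair shows a perfect but curvilinear (U-shaped) relationship that still gives r = 0, because r only captures linear association.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (n >= 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Covariation: sum of cross-products of deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations (each proportional to that variable's variance)
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))                 # perfect positive linear: 1.0
print(pearson_r(x, [10, 8, 6, 4, 2]))                 # perfect negative linear: -1.0
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # perfect U-shape, yet r = 0.0
```

Note that variance matters too: if one set of scores had no spread at all (sxx = 0), r would be undefined, and a restricted range generally shrinks the coefficient.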

Shared variance (squared coefficient)

What are the factors that contribute to the shared variance?
What is not shared?
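A worked example, assuming a hypothetical correlation of r = .70 between two tests: squaring the coefficient gives the proportion of variance the two sets of scores share, and the remainder is not shared.

```latex
\begin{aligned}
r &= .70 \quad\Rightarrow\quad r^2 = (.70)^2 = .49 \\
\text{shared} &= r^2 = 49\%, \qquad \text{not shared} = 1 - r^2 = .51 = 51\%
\end{aligned}
```

So even a fairly strong correlation of .70 leaves about half of the variance in each test unaccounted for by the other.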

Reliability

Refers to how free the test is from measurement error: are you going to get the same, or a very close, score if you sit the test again? It's about consistency and dependability
Depends on construction of test and environment administered in
There’s no perfect test or environment so will always be some error but we want to minimise it

Reliability

Reliability usually reported as a correlation coefficient
Different types of tests tend to have different levels of reliability, e.g. well-constructed achievement tests may have reliability coefficients of .90, but personality tests are often much lower (around .70) because the concept is abstract and potentially fluctuates
Many ways of measuring reliability, including test-retest, alternative forms and internal consistency

Test-retest reliability

Give test twice to same group, usually a few weeks apart
Correlate results
Higher the correlation, more reliable the test
Results can fluctuate depending on factors such as the time between the two sittings, forgetting information, learning more about the test content by studying in the interim, or being more familiar with the test format the second time around
Is less likely to fluctuate if testing a stable construct (e.g., IQ)

Alternative, parallel or equivalent forms reliability

Making two or more versions of the same test
Stops issues like people remembering or studying particular answers between test-retest
Hard to make the versions equal in content and level of difficulty, and to ensure administration is exactly the same
Test developer must demonstrate the versions are truly parallel

Reliability as Internal consistency

Measures how test items relate to each other and the test as a whole
Looks within the test to measure reliability
E.g., on a test measuring anxiety, respondents should answer items that tap aspects of anxiety in a similar way
Most common forms of internal consistency (i.e. reliability) are split half and Cronbach’s alpha

Split-half reliability

Use one form of the test, administered on one occasion. Split the test in two and correlate the scores on the two halves
Issues: the test may get harder as you go along, so the first half is not equal to the second
Can compare odd- and even-numbered questions instead, but the halves may still not be equal, and each half is shorter than the full test, which lowers reliability
Can use the Spearman-Brown formula to correct for the shorter test: r_full = 2r / (1 + r), where r is the correlation between the two halves

Cronbach’s coefficient alpha and Kuder-Richardson

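Rate internal consistency using all the items at once: alpha is calculated from the item variances and the variance of total scores, and is conceptually the average of all possible split-half reliability estimates
Kuder-Richardson (KR-20) is used with tests whose items are scored dichotomously, e.g. right/wrong or forced-choice yes/no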
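A minimal sketch of both coefficients using the standard formulas, alpha = k/(k - 1) × (1 − Σ item variances / total-score variance) and KR-20, which replaces each item variance with p × q for 0/1 items. The function names and data are hypothetical. Population variances are used throughout (an assumption); the ratio is the same with sample variances as long as the choice is consistent.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).

    item_scores: one list of item scores per person (hypothetical data layout).
    """
    k = len(item_scores[0])                              # number of items
    item_var_sum = sum(pvariance(item) for item in zip(*item_scores))
    total_var = pvariance([sum(person) for person in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


def kr20(item_scores):
    """KR-20 for items scored 0/1: alpha's formula with each item variance written as p*q."""
    k = len(item_scores[0])
    n = len(item_scores)
    pq_sum = 0.0
    for item in zip(*item_scores):
        p = sum(item) / n          # proportion scoring 1 (e.g. correct / 'yes')
        pq_sum += p * (1 - p)      # q = 1 - p
    total_var = pvariance([sum(person) for person in item_scores])
    return k / (k - 1) * (1 - pq_sum / total_var)


# Hypothetical right/wrong (1/0) answers: 5 people x 4 items
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(f"alpha = {cronbach_alpha(answers):.2f}, KR-20 = {kr20(answers):.2f}")
```

On 0/1 items the two functions agree, since the population variance of a dichotomous item is exactly p × q; KR-20 is simply the special case of alpha for that format.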

Validity

Validity – the extent to which all the available evidence supports that the test is actually measuring what it is intended to measure
It is a central requirement for a test, without which the test items/tasks would not have meaning
o Content validity
• Face validity
o Construct validity
• Criterion-related validity
• Predictive validity

Content validity

The content of a test reflects what the test is aiming to measure
Sometimes content validity is enough validity for a test
Not enough to establish validity beyond achievement-type tests, e.g. for more abstract constructs

Content Validity Example

Your results on the end of semester exam should reflect what you know about what has been covered in this unit
So, there should be questions that equally represent the concepts introduced
o E.g. avoid construct underrepresentation (a failure to capture an important aspect) or overrepresentation
The questions should be worded in a way that is consistent with the language in which the concepts were taught

Face validity

Refers to how the test looks, which may be superficial
o E.g. the items in the test look as though they ought to measure what you are aiming to measure
Some tests may look valid and not be and others not look valid but are

Construct validity

Constructs are theoretically driven ways of talking about certain features in the world
Construct validity asks how well a test can give a construct meaning
o E.g. anxiety – exists only insofar as the construct represents a set of behaviours, thoughts and feelings
o Construct irrelevance: scores are influenced by something other than what the test is supposed to measure e.g. anxiety or illness impacting exam score

Construct validity

Scientific evidence demonstrating that the construct (model, concept, idea, notion) is actually being measured by the test
Most important when developing tests to measure abstract constructs like depression, anxiety, happiness, love, empathy
Measured with statistical tools and methods

Criterion-Related validity

What is the relationship between the criterion (another standard) for the test and the test scores?
Concurrent validity
o What is the evidence that my test compares well (e.g. is highly correlated) with the results of some other way of knowing (e.g. corroboration)?
o Relates to what is known at this point in time

Convergent validity

If you think your test measures post-traumatic growth, you would expect it to be related to other instruments designed to measure positive post-trauma perception (e.g. SRG & Thriving)
You don’t want a perfect correlation with other tests as that would make yours redundant
Depending on how close the theoretical connection is between the tests, the coefficient will vary

Discriminant validity

As with convergent validity, discriminant validity uses other established tests to test construct validity
This time, you are looking for little or no correlation between your measure and measures of theoretically distinct constructs (e.g. IES-R & PTGI)
Knowing what you are dealing with may have significant clinical implications

Predictive validity

If a standard to compare to is not available, interest turns to predictive validity
o The relationship between the test scores now and a standard in the future
o For example, OP scores and first-year academic success – depends on age…
o For school leavers, OP is a better predictor than learning strategies, which are more predictive for mature-age students
o Combining evidence often offers more predictive validity, e.g. OP, introversion and learning strategies in school leavers

Cross cultural fairness

The concern that ethnicity, gender, class, background, etc. can affect results
Many laws have been passed in the US requiring tests to be culturally fair, so as not to repeat past errors that disadvantaged minority groups, e.g. employment tests must be able to demonstrate that the test is relevant to the job sought
E.g. what number comes next in the sequence, one, two, three, _____?
Mong or yuur mong
As wallaby is to animal so cigarette is to ___
Kuuk Thaayorre of Edward River

Practicality 1

Choosing the right test to administer: time, format, cost, etc.
Time
o Time taken to do test needs to reflect person targeted e.g. attention span, age, time available to do the test
Cost
o Many tests cost money, and some cost a lot. Balance the cost of a test against its reliability and how much it is needed
Format
o Types of questions, font size, layout; multiple-choice (MC) items may lower anxiety but are not always culturally fair (e.g. the MC format may advantage white males)

Practicality 2

Readability
o Test items should be reviewed for readability – our school
Ease of administration, scoring and interpretation
o Understanding the test manuals
o How many people are taking the test and does this impact on ease of administration?
o Level of training needed to administer, score and interpret the results
o How long will scoring take and how long will report take?
o Time needed to explain results to test taker
o Other materials like publisher’s preformatted sheets

Selecting and administering a good test

There are thousands of tests – how to choose?
What are the goals of the client or researcher?
Which tests can achieve that goal?
Sourcing tests through articles, books, publisher’s catalogues, test library, online
Major test categories include:
o IQ, aptitude, achievement, behaviour, development, personality, neuropsychological, science, sensory, perception, speech, hearing…
Examine research about the test such as its validity and reliability data
o Principal components analysis / exploratory factor analysis (PCA/EFA) and confirmatory factor analysis (CFA)
o Different forms of reliability testing
o Different forms of validity testing
Based on all the information and steps outlined, make a balanced and informed choice

Making meaning out of raw scores

- Raw scores have not been manipulated in any way
- They mean nothing without context, so we might compare the score an individual gets with the ‘normed score’
o How did the client fare next to others from the same type of group who have taken the test before?
o Compare people from 2 groups, e.g. using percentiles
o Compare results for one person on 2 or more tests – sometimes discrepancies between these scores indicate an impairment of sorts
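A minimal sketch of placing a raw score in the context of a norm group as a percentile rank. The function name and norm scores are hypothetical, and counting tied scores as half is one common convention (an assumption here); individual test manuals may define percentile ranks differently.

```python
def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring below `score`, counting ties as half.

    This is one common convention; test manuals may define percentile ranks differently.
    """
    below = sum(1 for s in norm_scores if s < score)
    ties = sum(1 for s in norm_scores if s == score)
    return 100 * (below + 0.5 * ties) / len(norm_scores)


# Hypothetical norm group of ten previous test takers
norm_group = [12, 15, 18, 20, 21, 22, 24, 25, 27, 30]
print(percentile_rank(24, norm_group))  # 65.0
```

Here six people scored below 24 and one scored exactly 24, so the client sits at about the 65th percentile of this norm group.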

Frequency distributions

- What was the score and how often did it occur? Listing scores in numerical order lets you easily see whether this person scored higher or lower than most on the distribution
- Might list individual scores or groups of scores
- Histograms, frequency polygons etc. help give an overview of the data; we can learn a lot about the data from its shape
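A minimal sketch of a frequency distribution for hypothetical raw scores, using `collections.Counter` to count how often each score occurred and printing the scores in numerical order with a row of asterisks as a sideways text histogram:

```python
from collections import Counter

# Hypothetical raw scores for twelve test takers
scores = [7, 5, 8, 6, 7, 9, 7, 6, 8, 7, 4, 6]

freq = Counter(scores)            # score -> how often it occurred
for score in sorted(freq):        # list scores in numerical order
    count = freq[score]
    print(f"{score:>3} | {'*' * count} ({count})")
```

Reading down the rows shows the shape of the distribution at a glance (here, scores cluster around 6 and 7).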

Standard scores

- Percentiles are commonly used (e.g. 75th percentile)
- Z-scores are standard scores
- The mean of a set of z-scores always = 0 and the SD = 1
- Z = the score, less the mean, divided by the standard deviation, and is therefore sensitive to all components of the variance equation, including sample size
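A minimal sketch of converting hypothetical raw scores to z-scores. It assumes the population SD (dividing by n); many texts use the sample SD (dividing by n − 1), which is one place where sample size enters the calculation.

```python
from statistics import mean, pstdev


def z_scores(raw):
    """z = (score - mean) / SD for each score.

    Uses the population SD (divide by n). Using the sample SD (divide by n - 1)
    instead gives slightly smaller |z| values, which matters most for small samples.
    """
    m = mean(raw)
    sd = pstdev(raw)
    return [(x - m) / sd for x in raw]


raw = [2, 4, 4, 4, 5, 5, 7, 9]    # hypothetical raw scores: mean 5, population SD 2
z = z_scores(raw)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

The resulting z-scores have a mean of 0 and an SD of 1, whatever the original scale.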

The z-score distribution

- We can convert any score from any distribution of scores to a z-score, making the mean = 0 and the SD = 1; if the original scores are normally distributed, the z-scores follow this standard normal distribution
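A minimal sketch, using the standard library's `statistics.NormalDist`, of how a z-score maps to the proportion of scores falling below it. This assumes the underlying scores are normally distributed; for skewed distributions these percentages would not hold.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)   # mean 0, SD 1

# Proportion of a normal distribution falling below each z-score
for z in (-1.0, 0.0, 1.0, 2.0):
    pct = 100 * standard_normal.cdf(z)
    print(f"z = {z:+.1f}: about {pct:.0f}% of scores fall below")
```

This is how a z-score of +1 comes to correspond to roughly the 84th percentile on a normal distribution.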

T scores

- Another standardised score
- T-scores are often used in personality testing
- M = 50 and SD = 10
- T = z(SD) + M
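A minimal sketch of going from a raw score to a T-score via z, using T = z(SD) + M with M = 50 and SD = 10. The raw score and the norm group's mean and SD are hypothetical.

```python
def z_from_raw(raw, norm_mean, norm_sd):
    """z = (score - mean) / SD, using the norm group's mean and SD."""
    return (raw - norm_mean) / norm_sd


def t_score(z, t_mean=50, t_sd=10):
    """T = z(SD) + M, with the conventional T-score M = 50 and SD = 10."""
    return z * t_sd + t_mean


# Hypothetical: raw score of 34 on a scale whose norm group has mean 28 and SD 4
z = z_from_raw(34, 28, 4)
print(z, t_score(z))  # 1.5 65.0
```

T-scores avoid the negative values and decimals that z-scores produce, which makes them easier to report to test takers.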

Quantitative Approaches

o Deductive
o Positivist
o Realist
o Objective
o Reductionist
o Generalisation
o Numbers

Qualitative Approaches

o Inductive
o Interpretive
o Constructivist
o Subjective
o Holistic
o Uniqueness
o Words

A word about qualitative rigor

- Qualitative research has its own ways of addressing the equivalents of quantitative reliability and validity, i.e. trustworthiness, e.g. triangulation
- Qual. methods can be used to support quant. measures
- They can also be used to inform quant. studies
- BUT – qual and quant are underpinned by very different philosophical assumptions
o E.g. explore construct meaning vs. measure constructs
o Idiographic vs. nomothetic
