Lecture 2 Flashcards

1
Q

Remember Correlations

A

  • An important statistic in test creation and in measures of test worthiness
  • Coefficients range from -1 to 1
  • Assumes a linear relationship, summarised by the line of best fit; a curvilinear relationship is not captured well by the coefficient
  • Assumptions underpinning correlation – why do they matter?
  • How does variance impact on the coefficient?
  • What about the shape of the line? (see the sketch below)
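A minimal sketch, assuming hypothetical scores on two measures, of how Pearson's r (and the shared variance, r², covered in the next card) might be computed:

    import numpy as np

    # Hypothetical scores for ten test takers on two measures
    test_a = np.array([12, 15, 11, 18, 20, 14, 16, 13, 19, 17])
    test_b = np.array([30, 34, 28, 40, 43, 33, 36, 31, 41, 38])

    # Pearson's r is the off-diagonal entry of the 2x2 correlation matrix
    r = np.corrcoef(test_a, test_b)[0, 1]

    # Squaring the coefficient gives the proportion of shared variance
    print(f"r = {r:.2f}, r^2 = {r ** 2:.2f}")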

2
Q

Shared Variance (Squared coefficient)

A

  • Squaring the correlation coefficient (r²) gives the proportion of variance the two variables share; e.g., r = .70 means r² = .49, so 49% of the variance is shared
  • What are the factors that contribute to the shared variance?
  • What is not shared (the remaining 1 − r²)?

3
Q

Reliability

A
  • Refers to how free the test is from measurement error; are you going to get the same, or a very close, score if you sit the test again? It’s about consistency & dependability
  • Depends on the construction of the test & the environment it is administered in
  • There’s no perfect test or environment, so there will always be some error, but we want to minimise it
  • Reliability is usually reported as a correlation coefficient
  • Different types of tests tend to have different levels of reliability e.g., well-constructed achievement tests may have reliability coefficients of .90, but personality tests are often much lower (.70) because the concept is abstract & potentially fluctuates
  • Many ways of measuring reliability, including test-retest, alternative forms & internal consistency of measurement
4
Q

Test-retest reliability

A
  • Give the test twice to the same group, usually a couple of weeks apart
  • Correlate the results (see the sketch below)
  • The higher the correlation, the more reliable the test
  • Results can fluctuate depending on factors such as the time between administrations, forgetting information, learning more about the test content by studying in the interim, or being more familiar with the test format the second time around
  • Results are less likely to fluctuate if testing a stable construct (e.g., IQ)
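A minimal sketch, assuming hypothetical scores from two sittings of the same test, of the test-retest correlation:

    from scipy.stats import pearsonr

    # Hypothetical scores for the same eight people, two weeks apart
    time1 = [24, 30, 27, 35, 22, 31, 28, 33]
    time2 = [25, 29, 28, 34, 21, 33, 27, 32]

    # Reliability is estimated as the correlation between the two sittings
    r, _ = pearsonr(time1, time2)
    print(f"test-retest r = {r:.2f}")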
5
Q

Alternative, parallel or equivalent forms reliability

A
  • Making two or more versions of the same test
  • Avoids issues like people remembering or studying particular answers between test-retest administrations
  • It is hard to make the versions equal in content & level of difficulty, or to ensure administration is exactly the same
  • The test developer must demonstrate the versions are truly parallel
6
Q

Reliability as Internal consistency

A
  • Measures how test items relate to each other & the test as a whole
  • Looks within the test to measure reliability
  • E.g., in a test to measure anxiety, respondents should answer items that tap aspects of anxiety in a similar way
  • Most common forms of internal consistency (i.e., reliability) are split-half and Cronbach’s alpha
7
Q

Split-half reliability

A
  • Use one form of the test administered at one sitting. Split the test in two and correlate the scores from the two halves
  • Issues: the test may get harder as you go along, so the first half is not equal to the second
  • May compare odd- and even-numbered questions, but the halves may still not be equal & each half is a shorter test, which can decrease reliability
  • Can use the Spearman-Brown formula to compensate for the shorter test: corrected r = 2r / (1 + r) (see the sketch below)
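A minimal sketch, assuming hypothetical odd- and even-half subtotals, of a split-half estimate with the Spearman-Brown correction:

    import numpy as np

    # Hypothetical subtotals for eight test takers on the odd and even halves
    odd_half = np.array([10, 14, 9, 16, 12, 15, 11, 13])
    even_half = np.array([11, 13, 10, 15, 11, 16, 10, 14])

    # Correlation between the two halves (reliability of a half-length test)
    r_half = np.corrcoef(odd_half, even_half)[0, 1]

    # Spearman-Brown: estimated reliability of the full-length test
    r_full = (2 * r_half) / (1 + r_half)
    print(f"half-test r = {r_half:.2f}, corrected r = {r_full:.2f}")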
8
Q

Cronbach’s coefficient alpha & Kuder-Richardson

A
  • Try to rate internal consistency by estimating the reliability of all possible split-half combinations, correlating each item with the total and averaging (see the sketch below)
  • Kuder-Richardson is the version used with forced-choice format tests
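A minimal sketch, assuming a hypothetical respondents-by-items score matrix, of Cronbach's alpha via the standard formula alpha = k/(k−1) × (1 − sum of item variances / variance of total scores):

    import numpy as np

    def cronbach_alpha(items):
        """items: rows = respondents, columns = test items."""
        k = items.shape[1]                         # number of items
        item_vars = items.var(axis=0, ddof=1)      # variance of each item
        total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
        return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

    # Hypothetical 5-point ratings: six respondents x four anxiety items
    scores = np.array([
        [4, 5, 4, 4],
        [2, 2, 3, 2],
        [3, 3, 3, 4],
        [5, 4, 5, 5],
        [1, 2, 1, 2],
        [3, 4, 3, 3],
    ])
    print(f"alpha = {cronbach_alpha(scores):.2f}")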
9
Q

Validity

A
  • Validity: The extent to which all the available evidence supports that the test is actually measuring what it is intended to measure.
  • It is a central requirement for a test, without which the test items/tasks would not have meaning
  • Content validity
    • Face validity
  • Construct validity
    • Criterion-related validity
    • Predictive validity
10
Q

Content Validity

A
  • The content of a test reflects what the test is aiming to measure
  • Sometimes content validity is enough validity for a test
  • Not enough to ascertain test validity beyond achievement-type tests, e.g., for more abstract constructs
11
Q

Example of content validity

A
  • Your results on the end of semester exam should reflect what you know about what has been covered in this unit.
  • So, there should be questions that equally represent the concepts introduced
    • Threats include construct underrepresentation (a failure to capture an important aspect) and overrepresentation
  • The questions should be worded in a way that is consistent with the language the concepts were taught in
12
Q

Face validity

A
  • It refers to the look of the test, which may be superficial
    • E.g., the items in the test look as though they ought to measure what you are aiming to measure
  • Some tests may look valid and not be, and others may not look valid but are
13
Q

Construct validity

A
  • Constructs are theoretically driven ways of talking about certain features in the world
  • Construct validity asks how well a test can give a construct meaning
    • e.g., anxiety – it only exists insofar as the construct represents a set of behaviours, thoughts and feelings
    • Construct irrelevance: Scores are influenced by something other than what the test is supposed to measure e.g., anxiety or illness impacting exam score
  • Scientific evidence demonstrating that the construct (model, concept, idea, notion) is actually being measured by the test.
  • Most important when developing tests to measure abstract constructs like depression, anxiety, happiness, love, empathy.
  • Measured with statistical tools and methods…
14
Q

Criterion-Related validity

A
  • What is the relationship between the criterion (another standard) for the test and the test scores?
  • Concurrent validity
    • What is the evidence that my test compares well (e.g., is highly correlated) with the results of some other way of knowing (i.e., corroboration)?
  • Relates to what is known at this point in time
15
Q

Convergent validity

A
  • If you think your test measures post-traumatic growth, you would expect it to be related to other instruments designed to measure positive post-trauma perception (e.g., SRG & Thriving)
  • You don’t want a perfect correlation with other tests as that would make yours redundant.
  • Depending on how close the theoretical connection is between the tests, the coefficient will vary.
16
Q

Discriminant validity

A
  • As with convergent validity, discriminant validity uses other established tests to test construct validity.
  • This time, you are looking for little or no correlation between your measure and a measure of a theoretically distinct construct (e.g., the IES-R & the PTGI).
  • Knowing what you are dealing with may have significant clinical influence (e.g., Shakespeare-Finch & de Dasell, 2009).
17
Q

Predictive validity

A
  • If a standard to compare to is not currently available, interest turns to predictive validity
  • The relationship between the test scores now and a standard in the future
  • For example, OP scores and first-year academic success – depends on age…
  • OP is a better predictor for school leavers, while learning strategies are more predictive for mature-age students
  • Combining evidence often offers more predictive validity e.g., OP, introversion and learning strategies in school leavers
18
Q

Cross-cultural fairness

A
  • The idea that ethnicity, gender, class, background etc. impact on results
  • Lots of laws have been passed in the US about tests being culturally fair, so as not to repeat errors of the past that disadvantaged minority groups e.g., employment tests must be able to demonstrate the test is relevant to the job sought
  • E.g., What number comes next in the sequence one, two, three, __________?
    • Mong or yuur mong
  • As wallaby is to animal, so cigarette is to _______
    • Kuuk Thaayorre of Edward River
19
Q

Practicality

A
  • Choosing the right test to administer: time, format, cost etc.
  • Time
    • Time taken to do the test needs to reflect the person targeted e.g., attention span, age, time available to do the test
  • Cost
    • Many tests cost money & some cost a lot. Balance the cost of the test against its reliability and the level of need to take it
  • Format
    • Types of questions, font size, layout; multiple choice (MC) can lower anxiety but is not always culturally fair (e.g., MC formats may favour white males)
  • Readability
    • Test items should be reviewed for readability – our school
  • Ease of administration, scoring & interpretation
    • Understanding the test manuals
    • How many people are taking the test & does this impact on ease of administration?
    • Level of training needed to administer, score & interpret the results
    • How long will scoring take and how long will the report take?
    • Time needed to explain results to the test taker
    • Other materials like publisher’s preformatted sheets
20
Q

Selecting & administering a good test

A
  • There are thousands of tests – how to choose?
  • What are the goals of the client or researcher?
  • Which tests can achieve that goal?
  • Source tests through articles, books, publishers’ catalogues, test libraries, online
  • Major test categories include:
    • IQ, aptitude, achievement, behaviour, development, personality, neuropsychological, science, sensory perception, speech, hearing…
  • Examine research about the test, such as its validity & reliability data
    • PCA/EFA & CFA
    • Different forms of reliability testing
    • Different forms of validity testing
  • Based on all the information and steps outlined, make a balanced & informed choice
21
Q

Making meaning out of raw scores

A
  • Raw scores have not been manipulated in any way
  • They mean nothing without being put in context, so we might compare the score an individual gets with the ‘normed score’
    • How did the client fare next to others from the same type of group who have taken the test before? (see the sketch below)
    • Compare people from 2 groups e.g., using percentiles
    • Compare results for one person on 2 or more tests – sometimes discrepancies between these scores indicate an impairment of sorts
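A minimal sketch, assuming hypothetical norm-group scores, of placing a client's raw score in context as a percentile:

    from scipy.stats import percentileofscore

    # Hypothetical norm group: previous test takers from the same group
    norm_group = [55, 62, 48, 71, 66, 59, 74, 52, 68, 60]

    # Where does a client's raw score of 66 sit relative to the norm group?
    pct = percentileofscore(norm_group, 66)
    print(f"A raw score of 66 falls at the {pct:.0f}th percentile")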
22
Q

Frequency distributions

A
  • What was the score and how often did it occur? By listing scores in numerical order you can easily see whether a person scored higher or lower than most on the distribution
  • Might list individual scores or groups of scores
  • Histograms & frequency polygons etc. assist in getting an overview of the data; we can learn a lot about the data from its shape (see the sketch below)
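A minimal sketch, assuming a hypothetical list of scores, of tallying a frequency distribution and printing a rough text histogram:

    from collections import Counter

    # Hypothetical test scores
    scores = [3, 5, 4, 5, 2, 4, 4, 3, 5, 4, 1, 4]

    # Frequency distribution: each score and how often it occurred
    freq = Counter(scores)
    for score in sorted(freq):
        print(f"{score}: {'#' * freq[score]} ({freq[score]})")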
23
Q

Standard Scores

A
  • Percentiles are commonly used (e.g., 75th percentile)
  • z-scores are standard scores
  • The mean always = 0 and the SD = 1
  • z = (X − M) / SD, i.e., the score, less the mean, divided by the standard deviation; it is therefore sensitive to all components of the variance equation, including sample size (see the sketch below)
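A minimal sketch, assuming hypothetical raw scores, of converting them to z-scores:

    import numpy as np

    # Hypothetical raw scores
    raw = np.array([42, 50, 38, 61, 55, 47])

    # z = (score - mean) / standard deviation
    z = (raw - raw.mean()) / raw.std(ddof=1)

    print(np.round(z, 2))
    print(f"mean = {z.mean():.2f}, SD = {z.std(ddof=1):.2f}")  # ~0 and 1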
24
Q

T scores

A

Another standardised score
T-scores are often used in personality testing
M = 50 and SD = 10
T = z(SD) + M, i.e., T = 10z + 50 (see the sketch below)
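A minimal sketch, assuming hypothetical z-scores, of rescaling them to T-scores:

    import numpy as np

    # Hypothetical z-scores for five test takers
    z = np.array([-1.2, -0.3, 0.0, 0.8, 1.5])

    # T = z * SD + M, with M = 50 and SD = 10
    t = z * 10 + 50
    print(t)  # [38. 47. 50. 58. 65.]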

25
Q

Quantitative

A
Deductive
Positivist 
Realist
Objective
Reductionist
Generalisation
Numbers
26
Q

Qualitative

A
Inductive
Interpretive
Constructivist
Subjective
Holistic
Uniqueness
Words
27
Q

A word about qualitative rigor

A

Qualitative research has its own ways of investigating the equivalent of quantitative reliability and validity – trustworthiness, e.g., triangulation.
Qual. methods can be used to support quant. measures.
They can also be used to inform quant. studies,
e.g., Shakespeare-Finch, J., Wehr, T., Kaiplinger, I., & Daley, E. (2014). Caring for emergency service personnel: Does what we do work? Proceedings of the Australia & New Zealand Disaster & Emergency Conference, Gold Coast (QLD), 5th–7th May 2014.
BUT – qual. and quant. are underpinned by very different philosophical assumptions,
e.g., exploring construct meaning vs. measuring constructs;
idiographic vs. nomothetic.