2. Test worthiness and Statistics Flashcards
What is an important statistic in test creation and in measures of test worthiness?
Correlations
what is the range for correlations?
-1 to +1
what are the ways to observe correlation?
Linear - line of best fit
Curvilinear - what is the shape of the line?
What does shared variance question?
What are the factors that contribute to shared variance?
what is another term for shared variance?
Squared correlation coefficient (r², also called the coefficient of determination)
what is the only way to determine shared variance when you have a correlation coefficient?
Square the correlation coefficient; the result tells you how much variance is shared between the two variables
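The squaring step can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function names and the sample scores are hypothetical, and the correlation is computed with the standard Pearson formula.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around each mean
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def shared_variance(r):
    """Proportion of variance two variables share: the squared correlation."""
    return r ** 2

# Hypothetical data: scores on two measures for five people
measure_a = [12, 15, 11, 18, 20]
measure_b = [30, 34, 29, 41, 40]

r = pearson_r(measure_a, measure_b)
print(f"r = {r:.2f}, shared variance = {shared_variance(r):.2f}")
```

Note that r always falls between -1 and +1, so the shared variance always falls between 0 and 1.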
what does reliability refer to?
Refers to how free the test is from measurement error; are you going to get the same, or a very close score if you sit the test again? It’s about internal consistency, reliability & dependability
what does reliability depend on?
construction of test & environment administered in
why will there always be error?
There’s no perfect test or environment so will always be some error but we want to minimise it.
how is reliability usually reported as?
correlation coefficient
why do different types of tests have different levels of reliability?
Reliability varies with what is being measured, e.g., well-constructed achievement tests may have reliability coefficients of .90, but personality tests are often much lower (.70) because the concept is abstract & potentially fluctuates
there are many ways to test reliability. What are some measures of reliability?
test-retest, alternative forms, internal consistency
what is test-retest reliability?
Give test twice to same group, usually a couple of weeks apart. then correlate results
what does a higher correlation suggest in test-retest reliability?
Higher the correlation, more reliable the test
why do fluctuations in the results of test-retest reliability occur?
Results can fluctuate depending on things such as the time between tests being taken, forgetting information, learning more about the test contents by studying in the interim, and being more familiar with the test format the second time around
When is test-retest reliability least likely to fluctuate?
when testing a stable construct (e.g. IQ)
what is ALTERNATIVE, PARALLEL OR EQUIVALENT FORMS RELIABILITY?
Making two or more versions of the same test
what issues does ALTERNATIVE, PARALLEL OR EQUIVALENT FORMS RELIABILITY prevent?
Stops issues like people remembering or studying particular answers between test and retest
what is the difficulty of ALTERNATIVE, PARALLEL OR EQUIVALENT FORMS RELIABILITY?
Hard to make tests equal in terms of content & level of difficulty, and to ensure administration was exactly the same
what must test developers demonstrate in order to have good ALTERNATIVE, PARALLEL OR EQUIVALENT FORMS RELIABILITY?
Test developer must demonstrate the versions are truly parallel
what is reliability as internal consistency measuring?
Measures how test items relate to each other & the test as a whole
where does reliability as internal consistency look?
Looks within the test to measure reliability
• E.g., in a test measuring anxiety, respondents should answer items that tap aspects of anxiety in a similar way
what are the most common forms of internal consistency measure?
Most common forms of internal consistency (i.e., reliability) are split-half and Cronbach’s alpha
What is split-half reliability?
Use one form of the test administered at the same time. Split the test in two and correlate the scores
what are the issues of the split-half reliability?
test may get harder as you go along so first half is not equal to second. May compare odd and even numbered questions but still the halves may not be equal & the test is shorter which can decrease reliability
what equation can be used to compensate for shorter tests in split-half reliability?
Can use Spearman-Brown equation to compensate for shorter test: 2r / (1 + r), where r is the correlation between the two halves
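An odd/even split followed by the Spearman-Brown correction might look like the sketch below. The function names and the response matrix are hypothetical; each row is one respondent and each column one item.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

def spearman_brown(r_half):
    """Estimate full-length reliability from the half-test correlation."""
    return 2 * r_half / (1 + r_half)

def split_half_reliability(item_scores):
    """Correlate odd-item and even-item totals, then apply Spearman-Brown."""
    odd_totals = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even_totals = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    r_half = pearson_r(odd_totals, even_totals)
    return r_half, spearman_brown(r_half)

# Hypothetical responses: 5 respondents x 6 items, each scored 1 to 5
responses = [
    [4, 5, 4, 4, 5, 4],
    [2, 1, 2, 3, 2, 2],
    [3, 3, 4, 3, 3, 4],
    [5, 4, 5, 5, 4, 5],
    [1, 2, 1, 2, 1, 1],
]
r_half, r_full = split_half_reliability(responses)
print(f"half-test r = {r_half:.2f}, corrected = {r_full:.2f}")
```

For any positive half-test correlation below 1, the corrected value is higher than the raw one, which is the compensation for the shorter halves.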
what do CRONBACH’S COEFFICIENT ALPHA AND KUDER-RICHARDSON attempt to rate?
Try to rate internal consistency by estimating reliability of all possible split-half combinations by correlating each item with the total and averaging
what is Kuder-Richardson used with?
Kuder-Richardson used with forced-choice format tests
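A minimal sketch of coefficient alpha follows. It uses the usual variance-based formula, alpha = k/(k-1) × (1 - sum of item variances / variance of total scores), rather than literally averaging every split-half combination. The function name and data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents x items matrix (list of rows)."""
    k = len(item_scores[0])  # number of items
    # Variance of each item across respondents
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    # Variance of respondents' total scores
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 respondents x 4 items
responses = [
    [4, 5, 4, 4],
    [2, 1, 2, 3],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(f"alpha = {cronbach_alpha(responses):.2f}")
```

When every item is scored 0 or 1, the same formula corresponds to the Kuder-Richardson (KR-20) approach, provided the same variance convention is used throughout.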
what is validity?
The extent to which all the available evidence supports that the test is actually measuring what it is intended to measure.
what is validity essential for?
It is a central requirement for a test, without which the test items/tasks would not have meaning
what are the categories of content validity?
face validity
what are the categories of construct validity?
- Criterion-related validity
* Predictive validity
what is content validity testing?
Whether the content of a test reflects what the test is aiming to measure. Sometimes this is enough to establish validity
What is content validity not enough to ascertain test validity for?
Not enough to ascertain test validity beyond achievement type tests e.g., for more abstract constructs
what is face validity?
It refers to the look of the test, but this may be superficial.
o E.g., the items in the test look as though they ought to measure what you are aiming to measure
Some tests may look valid and not be, while others don't look valid but are
what are constructs?
Constructs are theoretically driven ways of talking about certain features in the world
what does construct validity ask?
Construct validity asks how well a test can give a construct meaning
o e.g., anxiety – only exists insofar as the construct represents a set of behaviours, thoughts and feelings
what is construct irrelevance?
Scores are influenced by something other than what the test is supposed to measure e.g., anxiety or illness impacting exam score.
what does construct validity provide?
Scientific evidence demonstrating that the construct (model, concept, idea, notion) is actually being measured by the test.
when is construct validity most important?
Most important when developing tests to measure abstract constructs like depression, anxiety, happiness, love, empathy.
what is construct validity measured with?
Measured with statistical tools and methods…
what does criterion-related validity question?
What is the relationship between the criterion (another standard) for the test and the test scores?
what does concurrent validity question?
It refers to the extent to which the results of a particular test, or measurement, correspond to those of a previously established measurement for the same construct.
what does concurrent validity relate to?
Relates to what is known at this point in time
what is convergent validity?
e.g. If you think your test measures post-traumatic growth, you would expect it to be related to other instruments designed to measure positive post-trauma perception (e.g., SRG & Thriving)
why don't you want a perfect correlation with other tests when measuring convergent validity?
You don’t want a perfect correlation with other tests as that would make yours redundant.
how does convergent validity vary?
Depending on how close the theoretical connection is between the tests, the coefficient will vary.
what does discriminant validity use to determine validity?
As with convergent validity, discriminant validity uses other established tests to test construct validity.
when using discriminant validity, what are you looking for?
This time, you are looking for little or no correlation between the measures (e.g., IES-R & PTGI).
what can significantly influence discriminant validity in a clinical context?
Knowing what you are dealing with may have significant clinical influence (e.g., Shakespeare-Finch & de Dasell, 2009).
when does one turn to predictive validity?
If a standard to compare to is not available, interest turns to Predictive validity
what does predictive validity test?
The relationship between the test scores now and a standard in the future.
o For example, OP scores and first year academic success – depends on age…
o OP for school leavers better predictor than learning strategies which is more predictive for mature age
o Combining evidence often offers more predictive validity e.g., OP, introversion and learning strategies in school leavers.
what is cross-cultural fairness?
The idea that ethnicity, gender, class, background etc impact on results
why were there lots of laws passed in the US about tests being culturally fair?
Lots of laws passed in the US about tests being culturally fair so as not to repeat errors of the past which disadvantaged minority groups e.g., employment tests must be able to demonstrate the test is relevant to the job sought
what is practicality?
Choosing the right test to administer; time, format, cost etc
what are the elements of practicality?
time, cost, format, readability, ease of administration, scoring and interpretation
what is the time aspect of practicality?
Time taken to do test needs to reflect person targeted e.g., attention span, age, time available to do the test
what is the cost aspect of practicality?
Many tests cost & some cost a lot. Balance cost of test with reliability of it and level of need to take it.
what is the format aspect of practicality?
Types of questions, font size, layout. Multiple choice (MC) lowers anxiety but is not always culturally fair (e.g., white males & MC)
what is the readability aspect of practicality?
Test items should be reviewed for readability – our school
what is the Ease of administration, scoring & interpretation aspect of practicality?
o Understanding the test manuals
o How many people are taking the test & does this impact on ease of administration?
o Level of training needed to administer, score & interpret the results
o How long will scoring take and how long will report take?
o Time needed to explain results to test taker
o Other materials like publisher’s preformatted sheets
how does one select and administer a good test?
Sourcing tests through articles, books, publisher’s catalogues, test library, online and examining the research about the test such as reliability data and validity
what are questions to consider when selecting and administering a good test?
- There are 1000s of tests – how to choose?
- What are the goals of the client or researcher?
- Which tests can achieve that goal?
what do the major test categories include?
IQ, aptitude, achievement, behaviour, development, personality, neuropsychological, science, sensory perception, speech, hearing…
what are raw scores?
Raw scores have not been manipulated in any way
what is the meaning of raw scores?
They mean nothing without putting them in context so might compare the score an individual gets with the ‘normed score’ (Norm data usually generated from hundreds and hundreds of samples)
what are ways to interpret raw scores?
o How did the client fare next to others from the same type of group who have taken the test before?
o Compare people from 2 groups e.g., using percentiles
o Compare results for one person on 2 or more tests – sometimes discrepancies between these scores indicate an impairment of sorts
what does frequency distribution question?
What was the score and how often did it occur?
How does one determine frequency distribution?
Listing scores in numerical order you can easily see if this person scored higher or lower than most on the distribution. Might list scores or groups of scores. Histograms & frequency polygons etc assist in getting an overview of the data. We can learn a lot about the data from its shape
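As a rough illustration of listing scores in order with their frequencies, the sketch below builds a frequency table and a simple text histogram. The function name and the scores are hypothetical.

```python
from collections import Counter

def frequency_distribution(scores):
    """Return (score, frequency) pairs sorted by score."""
    counts = Counter(scores)
    return sorted(counts.items())

# Hypothetical raw scores from a small group
scores = [7, 5, 6, 7, 8, 6, 7, 9, 5, 7, 6, 8]

# Print each score with a bar whose length is its frequency
for score, freq in frequency_distribution(scores):
    print(f"{score:>3} | {'#' * freq} ({freq})")
```

Reading down the bars gives a quick view of the shape of the distribution and of where an individual's score sits relative to the rest.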
what is leptokurtic?
tall distribution
what is platykurtic?
flat distribution
what is a positively skewed distribution?
tail to the right
what is a negatively skewed distribution?
tail to the left
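These shape terms can be checked numerically. The sketch below uses the common moment-based definitions of skewness and excess kurtosis, which is one convention among several; the function names and data are hypothetical. Positive skewness indicates a tail to the right, negative a tail to the left; positive excess kurtosis indicates a taller (leptokurtic) shape and negative a flatter (platykurtic) one, relative to the normal curve.

```python
def central_moment(scores, order):
    """Average of deviations from the mean raised to the given power."""
    mean = sum(scores) / len(scores)
    return sum((s - mean) ** order for s in scores) / len(scores)

def skewness(scores):
    """Third standardised moment: sign shows the direction of the tail."""
    return central_moment(scores, 3) / central_moment(scores, 2) ** 1.5

def excess_kurtosis(scores):
    """Fourth standardised moment minus 3 (the value for a normal curve)."""
    return central_moment(scores, 4) / central_moment(scores, 2) ** 2 - 3

# Hypothetical scores with a long tail to the right
right_tailed = [1, 1, 1, 2, 10]
print(f"skewness = {skewness(right_tailed):.2f}")
print(f"excess kurtosis = {excess_kurtosis(right_tailed):.2f}")
```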
what is commonly used to express standard scores?
percentiles
what are standard scores?
z scores
what is the mean of a standard score?
0
what is the standard deviation of a standard score?
1
What is the Z score?
The score, less the mean, divided by the standard deviation: z = (X - M) / SD. It is therefore sensitive to all components of the variance equations, including sample size.
when are T scores normally used?
personality testing
what is the mean in t scores?
50
what is the sd in t scores?
10
what is the equation for t scores?
T = z(SD)+M
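The two conversions can be worked through as a short sketch. The function names and the norm values (mean 28, SD 4) are hypothetical; the T-score defaults of mean 50 and SD 10 come from the cards above.

```python
def z_score(raw, mean, sd):
    """Standard score: distance from the mean in standard deviation units."""
    return (raw - mean) / sd

def t_score(z, t_mean=50, t_sd=10):
    """Convert a z score to a T score using T = z(SD) + M."""
    return z * t_sd + t_mean

# Hypothetical example: raw score of 34 on a test normed at mean 28, SD 4
z = z_score(34, mean=28, sd=4)
print(z)           # 1.5
print(t_score(z))  # 65.0
```

A raw score at the norm mean gives z = 0 and T = 50, which makes both scales easy to read at a glance.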
what are quantitative approaches?
o Deductive o Positivist o Realist o Objective o Reductionist o Generalisation o Numbers
what are the qualitative approaches?
o Inductive o Interpretive o Constructivist o Subjective o Holistic o Uniqueness o Words
what can qualitative methods be used to support?
quantitative measures and inform quantitative studies