WEEK 10 - Testing Flashcards
What are the steps in test/questionnaire construction?
- Defining the test
- Selecting a scaling method
- Constructing the items
- Testing the items
- Revising the test
- Publishing the test
How is a test defined?
- Test/questionnaire
- Item
- Measure
- Has a test already been developed?
What is an item in relation to a test?
- Generic word for various forms of content in a psychological test or questionnaire
- Measurement of attribute
- Carefully selected
In defining a test, how do you establish what you are seeking to measure?
- Develop a clear idea or specification of the attribute
- Use existing theory as a guide
- Write a document containing specifications for the development of items that includes:
* Clear definition of the attribute
* Outcome of a literature review
* If more than one attribute is to be measured, a separate specification for each
In defining a test, it is costly and time consuming to develop a new test/questionnaire. Where would you go to find existing mental tests?
Mental Measurements Yearbook:
- Commercial product released every 5 years
- Contains info about each test's purpose, publisher, pricing, population and scoring
- Includes only commercially available tests and those in English
What is the Kaufman and Kaufman model of the test definition process?
- Measure attribute/construct from a strong theoretical and research basis
- Must have capacity to distinguish between different attributes
- Yield scores that are translatable to an intervention
- Include novel tasks or questions
- Be easy to administer and objective to score
- Be sensitive to the diverse needs of the groups being assessed
What are the types of data?
Categorical
- Gender
- Age band/group
- Political party
Numerical
- Discrete
* Number of children
* Assignment mark
* Coffees in one day
- Continuous
* Weight
* Voltage
* Length
What is nominal measurement?
- Categorical: each person is placed in a group
- Assign number based on the group the person belongs to but the numerical value is meaningless
What is ordinal measurement?
- Still categorical, but in ranking order
- Tells order but not the distance between each point
What is interval measurement?
- Where continuous data is obtained
- e.g. temperature
- Equal distance between points
- No true zero
- People can provide responses according to an ordered response option scale
- Also referred to as a Likert-type scale
What is a ratio measurement?
- Continuous
- Has a true zero as its starting point
- Differences between points are meaningful
- Ratio scales are rare in psychological measurement
What is included when constructing the items of a test?
- Item format
* Related to scaling method of choice
* Dozens of choices available
- Types of formats
* MCQ
* T/F
* Forced-choice
* Likert
What are some limitations with MCQs?
- Difficult to construct items
- Provides cues for correct response (does not assess free recall)
What are some limitations of true/false questions?
- Answers may reflect social desirability more than personality traits
- Not much variability
What are the strengths for forced-choice methodology?
- Often used in personality tests
- Overcomes the social desirability problem of true/false questions
What are the problems with forced-choice methodology?
- People don’t always fit in either category
What are the strengths of using Likert-type scales?
- One of the most used
- Can better account for individual differences
- Good for assessing attitudes and perceptions
- Reduces social desirability bias
What are some problems with Likert-type scales?
- Is it consistently measuring the construct in question?
- Are all of the items appropriate and contributing to the overall interpretation of the test?
- Assumes the strength or intensity of an attitude is linear
- People don’t always fit into a specified option
- Social desirability can still occur
How may you test the items in a questionnaire to make sure they are reliable?
- Conduct a pilot study to ensure the items are clear and easily understood
- Administer questionnaire to large participant sample
- Analyse the responses in statistical software
- Investigate psychometric properties for individual items
* Item characteristics
* Create statistically sound subscales
* Throw out non-performing items
- Determine reliability and validity for the subscales/overall test
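One way to spot a non-performing item is a corrected item-total correlation: how well each item tracks the total of the remaining items. The notes do not name this statistic, so treat the sketch below as one possible approach. The response data and the 0.30 cutoff are made up for illustration.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def corrected_item_total(responses):
    """Correlate each item with the summed score of all the other items.

    responses: one list per participant, one score per item.
    """
    n_items = len(responses[0])
    results = []
    for i in range(n_items):
        item = [row[i] for row in responses]
        rest = [sum(row) - row[i] for row in responses]
        results.append(pearson_r(item, rest))
    return results

# Hypothetical data: 6 participants x 4 items on a 1-5 scale.
# Item 4 was invented to run against the other three.
responses = [
    [5, 4, 5, 2],
    [4, 4, 4, 3],
    [2, 1, 2, 4],
    [3, 3, 3, 1],
    [1, 2, 1, 5],
    [4, 5, 4, 2],
]

CUTOFF = 0.30  # assumed rule-of-thumb threshold, not taken from the notes

for number, r in enumerate(corrected_item_total(responses), start=1):
    verdict = "keep" if r >= CUTOFF else "review"
    print(f"Item {number}: r = {r:.2f} ({verdict})")
```

In this invented sample, item 4 correlates negatively with the rest, so it would be flagged for review (it may need rewording, reverse-scoring, or removal).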
How is a test revised?
- Using the newly developed test/questionnaire, collect data in a new sample
- Repeat previous steps
- Make necessary refinements
- Cross-validate: does the test perform just as well in a new sample?
- Obtain feedback from examinees or participants
What is involved with publishing a test?
- Produce testing materials
- Develop a technical and user's manual that includes:
* Background info
* Development history
* Administration instructions
* Reliability
* Validity
* Normative info
- Publish a scientific paper
What are the 3 main concepts of testing?
Standardisation and norms
Validity
Reliability
What is standardisation and norming?
The process of administering a test to a representative sample for the purpose of establishing norms is referred to as standardising a test.
What are standardisation groups?
- Once we have an individual’s score, we want to know where that score fits in comparison with the individual’s peers
- Large groups of people are tested and their scores are used to work out test norms
- We can use the mean and SD of this to work out where an individual sits in comparison to others
- Depending on the purpose of the test, the standardisation group might be quite specific or general
- Norms might also change over time (e.g. Flynn effect)
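The mean-and-SD comparison can be sketched as a z-score: the number of standard deviations a score sits above or below the standardisation group's mean. The norm scores below are invented, and using the population SD (`pstdev`) is an assumption. A real test manual would state how its norms were computed.

```python
from statistics import mean, pstdev

def z_score(score, norm_scores):
    """SDs above (+) or below (-) the mean of the standardisation group."""
    return (score - mean(norm_scores)) / pstdev(norm_scores)

# Hypothetical standardisation group scores
norm_scores = [85, 90, 95, 100, 100, 105, 110, 115]

for score in (90, 100, 115):
    print(f"score {score}: z = {z_score(score, norm_scores):+.2f}")
```

A z of 0 means the person scored exactly at the group mean. Positive values are above the mean and negative values are below it.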
What are percentiles?
- When we have a group of scores, we can also work out where a score fits in a distribution
* Can be done for specific sub-categories as well
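A small sketch of a percentile rank, for the whole group and for one sub-category. Several definitions of percentile rank exist. This one (percentage of scores strictly below the given score) is an assumption, and the records are invented.

```python
def percentile_rank(score, scores):
    """Percentage of scores in the group that fall below the given score."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical (age band, score) records
records = [
    ("18-25", 85), ("18-25", 100), ("18-25", 110), ("18-25", 115),
    ("26-40", 90), ("26-40", 95), ("26-40", 100), ("26-40", 105),
]

all_scores = [score for _, score in records]
band_scores = [score for band, score in records if band == "18-25"]

print(percentile_rank(110, all_scores))   # rank against everyone
print(percentile_rank(110, band_scores))  # rank against the 18-25 band only
```

The same raw score can land at a different percentile depending on which comparison group is used.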
What is test validity?
- Reflects a test’s ability to assess the construct it was designed to measure
What are the types of test validity?
- Content validity
- Construct validity
- Criterion-related validity
What is content validity?
Determined by the degree to which items on the test are representative of the domain of behaviour the test purports to measure
Describe construct validity
The appropriateness of the inference about the underlying construct
What is a construct?
A theoretical, intangible quality or trait in which individuals differ
What are the psychometric approaches to understanding the construct validity of tests?
- Identifies groups of items that intercorrelate highly
- Correlation
- Factor analysis
What is factor analysis in relation to construct validity?
Statistical technique to determine the pattern of correlations or variability amongst the items; correlated items or items that share variance form factors or dimensions
- Factors represent underlying abilities
- Factors in a test can be correlated with factors in other alternative tests
What is correlation?
Statistical measure to indicate the extent to which 2 variables are related
- Measured on a scale from -1 to 1
- 0 = no correlation, 1 = perfect positive, -1 = perfect negative
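A minimal sketch of the Pearson correlation coefficient, written as the average product of paired z-scores. The notes do not say which correlation coefficient is meant, so Pearson's r is an assumption, and all data are invented to show the two ends and the middle of the scale.

```python
from statistics import mean, pstdev

def correlation(x, y):
    """Pearson r: the mean of the products of paired z-scores."""
    mx, my = mean(x), mean(y)
    sx, sy = pstdev(x), pstdev(y)
    products = [((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)]
    return mean(products)

# Hypothetical data for five students
hours_studied = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 70, 74]    # rises with hours: strong positive
errors = [10, 8, 6, 4, 2]       # falls in a straight line: perfect negative
shoe_rank = [1, 2, 3, 2, 1]     # no linear trend with hours

print(f"{correlation(hours_studied, marks):.2f}")
print(f"{correlation(hours_studied, errors):.2f}")
print(f"{correlation(hours_studied, shoe_rank):.2f}")
```

Note that r only captures linear relationships: the last pair has r of 0 even though the values follow a clear up-then-down pattern.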
What is criterion validity?
The extent to which the test predicts or is related to an outcome
e.g. does performance on an IQ test predict academic success?
What is reliability in relation to tests?
Concerns measurement consistency or the ability of a test to produce consistent results
- Is it consistently measuring the construct in question?
- Are all of the items appropriate and contributing to the overall interpretation of the test?
What are the two types of reliability?
Internal
External
What is internal reliability?
Concerns the extent to which a measure is consistent within itself
- Also referred to as internal consistency
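Internal consistency is often summarised with Cronbach's alpha. The notes do not name a statistic, so this choice is an assumption, and the responses are invented. The sketch uses the standard formula: alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), where k is the number of items.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a participants x items table of scores.

    responses: one list per participant, one score per item.
    """
    k = len(responses[0])
    item_variances = [pvariance([row[i] for row in responses]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical data: 5 participants x 3 items on a 1-5 scale
responses = [
    [4, 5, 4],
    [2, 2, 3],
    [5, 5, 5],
    [1, 2, 1],
    [3, 3, 3],
]

print(f"alpha = {cronbach_alpha(responses):.2f}")
```

Values closer to 1 suggest the items hang together. What counts as acceptable depends on the purpose of the test.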
What is external reliability?
Concerns the extent to which a measure varies from one use to another
- Test-retest reliability: stability over time
- Inter-rater reliability: the degree to which different raters give consistent estimates of the same behaviour
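For inter-rater reliability, a rough sketch using simple percent agreement and Cohen's kappa, which corrects agreement for chance. Kappa is one common choice for two raters and is not named in the notes. The ratings below are invented.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters gave the same code."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's rate of using a category
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical codes from two raters watching the same 10 observations
rater_a = ["on", "on", "off", "off", "on", "off", "on", "on", "off", "on"]
rater_b = ["on", "on", "off", "on", "on", "off", "on", "off", "off", "on"]

print(f"agreement = {percent_agreement(rater_a, rater_b):.2f}")
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")
```

Kappa comes out lower than raw agreement because some matches would be expected by chance alone.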
What are some sources of error in tests?
Test construction
Test administration
Test scoring and interpretation