WEEK 10 - Testing Flashcards by teag Lloyd

What are the steps in test/questionnaire construction?

Define the test
Selecting a scaling method
Constructing the items
Testing the items
Revising the test
Publishing the test

How is a test defined?

Test/questionnaire
Item
Measure
Has a test already been developed?

What is an item in relation to a test?

Generic word for various forms of content in a psychological test or questionnaire
Measurement of attribute
Carefully selected

In defining a test, how do you establish what you are seeking to measure?

Develop clear idea or specification of the attribute
Existing theory as a guide
Write a document containing specifications for the development of items that includes:
- Clear definition of the attribute
- Outcome of a literature review
If more than one attribute is to be measured, a separate specification is needed for each

In defining a test, it is costly and time consuming to develop a new test/questionnaire. Where would you go to find existing mental tests?

Mental Measurements Yearbook:

Commercial product released every 5 years
Contains info about each test's purpose, publisher, pricing, population and scoring
Includes only commercially available tests and those in English

What is the Kaufman and Kaufman model of the test definition process?

Measure attribute/construct from a strong theoretical and research basis
Must have capacity to distinguish between different attributes
Yield scores that are translatable to an intervention
Include novel tasks or questions
Be easy to administer and objective to score
Be sensitive to the diverse needs of the groups being assessed

What are the types of data?

Categorical

Gender
Age band/group
Political party

Numerical

Discrete
- Number of children
- Assignment mark
- Coffees in one day
Continuous
- Weight
- Voltage
- Length

What is nominal measurement?

Categorical: each person is placed in a group

- Assign number based on the group the person belongs to but the numerical value is meaningless

What is ordinal measurement?

Still categorical, but in ranking order

- Tells order but not the distance between each point

What is interval measurement?

Where continuous data is obtained
E.g. temperature
Equal distance between points
No true zero
People can provide responses according to an ordered response option scale
Also referred to as a Likert-type scale

What is a ratio measurement?

Continuous
True zero as the starting point
Differences between points are meaningful
Ratio scales are rare in psychological measurement

What is included when constructing the items of a test?

- Item format 
Related to scaling method of choice 
Dozens of choices available 
- Types of formats
Multiple-choice questions (MCQ)
True/false (T/F)
Forced-choice
Likert

What are some limitations with MCQs?

Difficult to construct items

- Provides cues for correct response (does not assess free recall)

What are some limitations of true/false questions?

Answers may reflect social desirability more than personality traits
Not much variability

What are the strengths for forced-choice methodology?

Often used in personality tests

- Overcomes the social desirability problem of true/false questions

What are the problems with forced-choice methodology?

People don’t always fit in either category

What are the strengths of using Likert-type scales?

One of the most used
Can better account for individual differences
Good for assessing attitudes and perceptions
Reduces desirability bias

What are some problems with Likert-type scales?

Is it consistently measuring the construct in question?
Are all of the items appropriate and contributing to the overall interpretation of the test?
Assumes the strength or intensity of an attitude is linear
People don't always fit into a specified option
Social desirability can still occur

How may you test the items in a questionnaire to make sure they are reliable?

Conduct a pilot study to ensure the items are clear and easily understood
Administer the questionnaire to a large participant sample
Analyse the responses in statistical software
Investigate psychometric properties for individual items
* Examine item characteristics
* Create statistically sound subscales
* Throw out non-performing items
Determine reliability and validity for the subscales/overall test
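
One way to spot "non-performing" items is the corrected item-total correlation: each item is correlated with the sum of the remaining items, and items with low or negative values become candidates for removal. This is a minimal sketch only; the card does not name this statistic, and the function names and response data are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # Note: a constant item (zero variance) would cause a division by zero here.
    return cov / (sx * sy)

def corrected_item_total(responses):
    """For each item, correlate it with the sum of the OTHER items.

    responses: one row per respondent, one column per item.
    """
    k = len(responses[0])
    result = []
    for j in range(k):
        item = [row[j] for row in responses]
        rest = [sum(row) - row[j] for row in responses]
        result.append(pearson_r(item, rest))
    return result

# Hypothetical pilot data: 5 respondents, 3 items scored 1-5.
pilot = [[1, 2, 5], [2, 2, 1], [3, 4, 4], [4, 4, 2], [5, 5, 3]]
for index, r in enumerate(corrected_item_total(pilot), start=1):
    print(f"item {index}: r = {r:.2f}")
```

In this made-up data the third item correlates negatively with the rest, so it would be flagged for review.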

How is a test revised?

Using the newly developed test/questionnaire, collect data in a new sample
Repeat previous steps
Make necessary refinements
Cross-validate
- Does the test perform just as well in the new sample?
Obtain feedback from examinees or participants

What is involved with publishing a test?

Produce testing materials
Develop a technical and user's manual that includes:
Background info
Development history
Administration instructions
Reliability
Validity
Normative info
Publish a scientific paper

What are the 3 main concepts of testing?

Standardisation and norms
Validity
Reliability

What is standardisation and norming?

The process of administering a test to a representative sample for the purpose of establishing norms is referred to as standardising a test.

What are standardisation groups?

Once we have an individual's score, we want to know where that score fits in comparison with the individual's peers
Large groups of people are tested and their scores are used to work out test norms
We can use the mean and SD of this group to work out where an individual sits in comparison to others
Depending on the purpose of the test, the standardisation group might be quite specific or general
Norms also might change over time (eg. Flynn effect)
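
Using the mean and SD of a standardisation group to place an individual can be sketched as a z-score (the number of SDs a score lies above or below the group mean). The scores are hypothetical, and using the sample SD rather than the population SD is an assumption.

```python
from statistics import mean, stdev

def z_score(raw_score, norm_scores):
    """Number of SDs that raw_score lies from the norm-group mean."""
    group_mean = mean(norm_scores)
    group_sd = stdev(norm_scores)  # sample SD (assumption)
    return (raw_score - group_mean) / group_sd

# Hypothetical standardisation group scores.
norm_group = [85, 90, 100, 110, 115]
print(round(z_score(115, norm_group), 2))
```

A positive z-score means the person scored above the group mean, and a negative one means below it.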

What are percentiles?

When we have a group of scores, we can also work out where a score fits in the distribution (the percentage of scores falling at or below it)
- Can be done for specific sub-categories as well
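
A percentile rank can be sketched as below. Definitions vary slightly between textbooks; this version counts scores at or below the target, which is an assumption, and the scores are hypothetical.

```python
def percentile_rank(score, group_scores):
    """Percentage of the group scoring at or below `score`."""
    at_or_below = sum(1 for s in group_scores if s <= score)
    return 100.0 * at_or_below / len(group_scores)

# Hypothetical group of ten test scores.
group = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]
print(percentile_rank(18, group))
```

To get percentiles for a specific sub-category, the same function can be applied to only that sub-group's scores.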

What is test validity?

- Reflects a test's ability to assess the construct it was designed to measure

What are the types of test validity?

- Content validity
- Construct validity
- Criterion-related validity

What is content validity?

Determined by the degree to which items on the test are representative of the domain of behaviour the test purports to measure

Describe construct validity

The appropriateness of the inferences made about the underlying construct

What is a construct?

A theoretical, intangible quality or trait in which individuals differ

What are the psychometric approaches to understanding the construct validity of tests?

- Identify groups of items that intercorrelate highly
- Correlation
- Factor analysis

What is factor analysis in relation to construct validity?

Statistical technique to determine the pattern of correlations or shared variability among the items; correlated items (items that share variance) form factors or dimensions
- Factors represent underlying abilities
- Factors in a test can be correlated with factors in alternative tests

What is correlation?

Statistical measure indicating the extent to which two variables are related
- Measured on a scale from -1 to +1
- 0 = no correlation, +1 = perfect positive, -1 = perfect negative
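
The card does not say which correlation coefficient is meant. As a minimal sketch, assuming the common Pearson coefficient, it can be computed as follows (the variable names and data are hypothetical):

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical data: hours studied and test score for five people.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 70, 72]
print(round(pearson_r(hours, score), 2))
```

The same calculation underlies test-retest reliability (scores at time 1 against scores at time 2) and criterion validity (test scores against the outcome).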

What is criterion validity?

The extent to which the test predicts or is related to an outcome, e.g. does performance on an IQ test predict academic success?

What is reliability in relation to tests?

Concerns measurement consistency or the ability of a test to produce consistent results
- Is it consistently measuring the construct in question?
- Are all of the items appropriate and contributing to the overall interpretation of the test?

What are the two types of reliability?

Internal | External

What is internal reliability?

Concerns the extent to which a measure is consistent within itself
- Also referred to as internal consistency
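
The card does not name a statistic for internal consistency. One widely used index is Cronbach's alpha, sketched here as an illustration; the function name and response data are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of items.

    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])                # number of items
    item_columns = list(zip(*responses)) # transpose rows into item columns
    sum_item_vars = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_item_vars / total_var)

# Hypothetical data: 5 respondents answering 3 Likert-type items.
answers = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [5, 4, 4], [3, 3, 4]]
print(round(cronbach_alpha(answers), 2))
```

Values closer to 1 indicate that the items vary together, meaning respondents who score high on one item tend to score high on the others.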

What is external reliability?

Concerns the extent to which a measure varies from one use to another
- Test-retest reliability: stability over time
- Inter-rater reliability: the degree to which different raters give consistent estimates of the same behaviour

What are some sources of error in tests?

- Test construction
- Test administration
- Test scoring and interpretation

(39 cards)