week 3 - selection of items Flashcards
item analysis table
each column represents an item (a,b,c,d)
each row represents a respondent (1,2,3,4)
knowledge based: insert a 1 in each cell for which the respondent answered correctly and a 0 for each incorrect answer
- then add up all the scores to give total score for each row
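A minimal Python sketch of such a table. The responses are made up for illustration (four respondents, four items), not taken from any real test:

```python
# Hypothetical responses: rows are respondents 1-4, columns are items a-d.
# 1 = answered correctly, 0 = answered incorrectly.
items = ["a", "b", "c", "d"]
table = [
    [1, 1, 0, 1],  # respondent 1
    [1, 0, 0, 1],  # respondent 2
    [0, 1, 1, 1],  # respondent 3
    [1, 1, 1, 1],  # respondent 4
]

# Total score for each respondent = sum across that respondent's row.
totals = [sum(row) for row in table]

for number, (row, total) in enumerate(zip(table, totals), start=1):
    print(f"respondent {number}: {row} total = {total}")
```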
facility index
indication of the extent to which all respondents answer an item in the same way
- items that everyone answers in the same way are redundant and should be removed
item difficulty
the term carries an implicit notion of right or wrong, which suggests this estimate would only be appropriate for knowledge based tests
- however, an equivalent statistic exists for all tests
facility index for knowledge based questionnaires
calculated by dividing the number of respondents who obtain the correct response for an item by the total number of respondents
ex. let's say we have 100 people write a test and, for a specific item, 70 obtain the correct response.
Then the facility index would be calculated as 70/100 = 0.70
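The same division as a short Python sketch. The function name is hypothetical, and the item scores are assumed to be coded 1 for correct and 0 for incorrect:

```python
def facility_index(item_scores):
    """Proportion of respondents who answered the item correctly (scores are 0 or 1)."""
    return sum(item_scores) / len(item_scores)

# The example above: 100 respondents, 70 of whom answered this item correctly.
item_scores = [1] * 70 + [0] * 30
print(facility_index(item_scores))  # 0.7
```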
meanings of facility indexes
facility index for each item should lie between 0.25 and 0.75, averaging 0.5 for the entire questionnaire
less than 0.25 indicates the item is too difficult
more than 0.75 shows the item is too easy
facility index should be higher than chance
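One way to apply these cut-offs is sketched below. The function name and labels are hypothetical; the 0.25 and 0.75 limits are the ones given above:

```python
def judge_facility(index, low=0.25, high=0.75):
    """Label an item by comparing its facility index with the recommended range."""
    if index < low:
        return "too difficult"
    if index > high:
        return "too easy"
    return "acceptable"

for index in (0.10, 0.70, 0.90):
    print(index, judge_facility(index))
```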
Calculating optimal difficulty
= (1 + chance) / 2, where chance = 1 / number of options (the midpoint between chance and a perfect score)
ex. true/false item (2 options, chance 0.5)
optimal difficulty = (1 + 0.5) / 2 = 1.5/2 = 0.75
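A small sketch of this calculation, assuming the optimal difficulty is taken as the midpoint between the chance score and a perfect score of 1, and that chance is 1 divided by the number of options. The function name is hypothetical, and the four-option case is an added illustration:

```python
def optimal_difficulty(n_options):
    """Midpoint between the chance score and a perfect score of 1 (an assumption)."""
    chance = 1 / n_options
    return (1 + chance) / 2

print(optimal_difficulty(2))  # true/false item: 0.75
print(optimal_difficulty(4))  # four-option item: 0.625
```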
facility index for person based questionnaires
sum the scores for the item across all respondents, then divide this total by the number of respondents (the mean item score)
- If the facility index is 2 on a four option item because everyone selected option 2, the item is not useful: it does not distinguish between respondents
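A hedged sketch of the mean item score, with made-up responses to a single four-option item. Counting distinct responses is an added check, not part of the facility index itself:

```python
def facility_index(item_scores):
    """Mean item score: sum of the scores divided by the number of respondents."""
    return sum(item_scores) / len(item_scores)

# Hypothetical responses to one four-option item (options scored 1 to 4).
varied = [1, 2, 3, 2, 4, 3]
uniform = [2, 2, 2, 2, 2, 2]  # everyone selected option 2

print(facility_index(varied))   # 2.5
print(facility_index(uniform))  # 2.0

# Counting distinct responses flags the item that everyone answered the same way.
print(len(set(varied)))   # 4
print(len(set(uniform)))  # 1
```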
Discrimination
ability of each item to discriminate among respondents according to whatever the questionnaire is measuring
- Items should be selected if they measure the same knowledge or characteristic as the other items in the questionnaire
how is discrimination measured
correlating each item with the total score from summing all the other items in the questionnaire
- a minimum correlation of 0.20 is considered acceptable
- Items with negative or zero correlations are always excluded
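The item-total correlation described above can be sketched as follows. The data are hypothetical, the helper names are made up, and treating an item with no variance as a zero correlation is an assumption of this sketch:

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: no variance is treated as zero correlation
    return cov / math.sqrt(var_x * var_y)

def discrimination(table, item):
    """Correlate one item with the total of all the OTHER items."""
    item_scores = [row[item] for row in table]
    rest_scores = [sum(row) - row[item] for row in table]
    return pearson(item_scores, rest_scores)

# Hypothetical 0/1 responses: six respondents (rows), four items (columns).
table = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]

# Keep items whose correlation reaches the 0.20 minimum.
kept = [i for i in range(len(table[0])) if discrimination(table, i) >= 0.20]
print(kept)
```

In this made-up table the last item correlates negatively with the rest, so it is dropped.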
item characteristic curve
- visual approach to examining discrimination
- break the total score into a number of bins
- identify the proportion of correct responses on a given item in each bin
- connect the dots; for a good item, the proportion correct should increase as we move up the bins
- a flat curve would indicate a correlation of 0
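A rough sketch of the binning steps, without the plotting. It assumes respondents are ranked by total score and split into equally sized bins, which is only one possible way to form bins; the data and function name are hypothetical:

```python
def icc_points(table, item, n_bins=3):
    """Proportion correct on `item` in each total-score bin, lowest bin first.

    Assumes at least as many respondents as bins.
    """
    ranked = sorted(table, key=sum)  # lowest total score first
    n = len(ranked)
    points = []
    for b in range(n_bins):
        group = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        points.append(sum(row[item] for row in group) / len(group))
    return points

# Hypothetical 0/1 responses: six respondents (rows), four items (columns).
table = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]

print(icc_points(table, 0))  # [0.0, 1.0, 1.0]  rises with total score
print(icc_points(table, 3))  # [0.5, 0.5, 0.5]  flat
```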
ICC Matrix
- a tool to plot both difficulty and discrimination with many items
- can place a line at the recommended cut offs to identify which items are deemed good vs bad
use of distractors
each distractor should have an equal probability of being selected
- Items for which the distractor options are not chosen in roughly equal proportions are considered not to be functioning properly
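A sketch of how distractor proportions might be tabulated for one item. The option labels, the responses and the function name are all hypothetical:

```python
from collections import Counter

def distractor_proportions(choices, options, correct):
    """Share of the wrong answers that went to each distractor."""
    wrong = [c for c in choices if c != correct]
    if not wrong:
        return {}  # nobody chose a distractor
    counts = Counter(wrong)
    return {o: counts[o] / len(wrong) for o in options if o != correct}

# Hypothetical item: option A is correct; B, C and D are distractors.
choices = ["A"] * 10 + ["B"] * 5 + ["C"] * 4 + ["D"] * 1
print(distractor_proportions(choices, ["A", "B", "C", "D"], "A"))
# {'B': 0.5, 'C': 0.4, 'D': 0.1}
```

Here distractor D attracts far fewer responses than B or C, so it would be a candidate for rewriting.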
modification
plan to write at least 50% more items than you will need for the final survey, as many items will be thrown out
considerations for total amount of items
- facility
- discrimination
- distractors
- number of items you require for the final version
- how well items fit the blueprint
reliability
is an estimate of the accuracy of a questionnaire
- Generally want a minimum reliability of 0.70 for person-based and 0.80 for knowledge-based questionnaires
Cronbach's alpha
is a measure of the internal consistency of the questionnaire
- most widely used and accepted estimate of reliability
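The notes do not give the formula, so this sketch uses the standard one: alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The data and function name are hypothetical:

```python
from statistics import pvariance

def cronbach_alpha(table):
    """Cronbach's alpha for a table with one row per respondent, one column per item.

    Assumes at least two items and some variation in the total scores.
    """
    n_items = len(table[0])
    item_variances = [pvariance([row[i] for row in table]) for i in range(n_items)]
    total_variance = pvariance([sum(row) for row in table])
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical 0/1 responses: six respondents (rows), four items (columns).
table = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 0, 0, 1],
]

print(round(cronbach_alpha(table), 2))  # 0.58
```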
split half reliability
- questionnaire is divided into 2 halves
- typical way is to split into even and odd numbered items, then calculate the correlation between the two halves
if the two halves are strongly positively correlated then it is evidence towards reliability
ex. if the correlation between the two halves was 0.80 (fairly high)
- Split-half reliability = 2*0.80 / (1+0.80) = 1.60/1.80 = 0.89 (the Spearman-Brown correction)
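A short sketch of the two steps: splitting a respondent's item scores into odd and even halves, then stepping the half correlation up to full length. The function names are hypothetical:

```python
def half_totals(row):
    """Return (odd-item total, even-item total) for one respondent's item scores."""
    odd = sum(row[0::2])   # items 1, 3, 5, ...
    even = sum(row[1::2])  # items 2, 4, 6, ...
    return odd, even

def split_half_reliability(half_correlation):
    """Step the correlation between the two halves up to full test length."""
    return 2 * half_correlation / (1 + half_correlation)

print(half_totals([1, 0, 1, 1]))               # (2, 1)
print(round(split_half_reliability(0.80), 2))  # 0.89
```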
face validity
describes the appearance of the questionnaire to respondents
- asks whether or not the questionnaire looks as if it is measuring what it claims to measure
- if not, people may not take it seriously and refuse to participate
content validity
relationship between the content and the purpose of the questionnaire
- purely subjective judgement
ex. a blueprint for a questionnaire used in job selection should match the job description.
standardization
involves obtaining scores on the final version of your questionnaire from appropriate groups of respondents
norms = the scores obtained
Standardization and Norms
- With good norms, it is possible to interpret the score of an individual and make statements on whether their score was typical or atypical
ex. wish to determine how a person with a suspected clinical disorder compares with people who have been diagnosed as having that disorder
reporting
after obtaining norm samples, provide info about the norms and sample characteristics as evidence of representativeness
- provide the mean test score
- provide the standard deviation of the test scores
standard score / z score
how many standard deviations a person's score differs from the mean: z = (score - mean) / standard deviation
- typically ranges between -3.00 and 3.00
z score = 0, they are right at the average.
z score = 1.00, they are one standard deviation above the mean
z score = -1.50, they are one and a half standard deviations below the mean
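A minimal sketch of the z score calculation. The norm mean of 100 and standard deviation of 15 are made-up values chosen only to reproduce the three cases above:

```python
def z_score(raw, mean, sd):
    """Number of standard deviations the raw score lies from the norm mean."""
    return (raw - mean) / sd

# Hypothetical norms: mean 100, standard deviation 15.
print(z_score(100, 100, 15))   # 0.0   right at the average
print(z_score(115, 100, 15))   # 1.0   one SD above the mean
print(z_score(77.5, 100, 15))  # -1.5  one and a half SDs below the mean
```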
standardized scores
transformations of the z score, used on the assumption that individuals do not know about standard scores or what a standard deviation is
T score
multiply the standard score by 10 and add 50, then round to the nearest whole number
- any z score larger than -5.0 gives a positive T score
- very rare to obtain someone with a z score lower than -5.0 or higher than 5.0
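The conversion can be sketched as below. Rounding halves upward is an assumption of this sketch (Python's built-in `round` rounds halves to the nearest even number instead), and the function name is hypothetical:

```python
import math

def t_score(z):
    """T score: multiply z by 10, add 50, round to the nearest whole number."""
    return math.floor(z * 10 + 50 + 0.5)  # assumption: halves round upward

print(t_score(0))     # 50
print(t_score(1.0))   # 60
print(t_score(-1.5))  # 35
print(t_score(-5.0))  # 0, the point below which T scores stop being positive
```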
Stanine
commonly used standardized score for person based tests
- multiply the z score by 2, add 5, then round to the nearest integer
- produces an integer between 1 and 9
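A sketch of the stanine conversion. Two details are assumptions of this sketch rather than something stated above: halves are rounded upward, and results outside 1 to 9 are clipped to the nearest end of the scale so that the output always stays in that range:

```python
import math

def stanine(z):
    """Stanine: multiply z by 2, add 5, round, then keep the result within 1-9."""
    value = math.floor(z * 2 + 5 + 0.5)  # assumption: halves round upward
    return max(1, min(9, value))         # assumption: clip to the 1-9 scale

print(stanine(0))     # 5
print(stanine(1.0))   # 7
print(stanine(-1.5))  # 2
print(stanine(3.0))   # 9 (clipped from 11)
```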
scaling
process of measuring objects in a way that maximizes precision, objectivity and communication
- provides a way to mathematically understand a stimulus response relationship
response-centered scaling
responses are scaled to place a subject along a psychological continuum based on strength of psychological trait they possess
subject centered scaling
summing or averaging all items
- most common
dimensionality
reflects the number and nature of variables assessed by its items
- can be unidimensional or multidimensional
- directly impacts scoring of a questionnaire
- dictates the number of meaningful scores
readability
how difficult your test is to understand in English
- When the value of a readability index is greater than 10, it is often taken as the number of years of education required.
interview as a test
ask questions and gather info, create categories and assign numbers to summarize the individual
ex. employment
good interview - social facilitation
mood of an interviewee can influence the mood of the interviewer and vice versa
good interview - attitude
- interpersonal influence (the degree to which one person can influence another) is related to interpersonal attraction (the degree to which people share a feeling)
- As such it is important to be warm, accepting, open, and understanding when conducting an interview
good interview - judgement
avoid statements that are judgemental or make the interviewee uncomfortable
- terms such as good, bad, excellent, terrible, and such tend to make interviewees feel judged
good interview - limit probing
A probing question asks “why” or follows up on the respondent's last response
- probing tends to make interviewees defensive
good interview - open ended questions
mostly open ended
- open dialogue to discuss the subject
good interview - flow
keep flow with transitions, repetition of words, paraphrasing or summarizing
- collect as much info as possible