Assessment and Testing Flashcards
What is appraisal?
Appraisal refers to the process of assessing or estimating attributes. It can include surveys, observations, or even clinical interviews. A test is simply an instrument that measures a given sample of behavior. The term measure connotes that a number or score has been assigned to the person’s attribute or performance.
What is the study of psychometrics?
Psychometrics is the study of psychological measurement. Someone who primarily administers and interprets tests is called a psychometrician. It is critical that counselors inform clients about the limitations of any tests they administer.
What is a test and what is test format?
A test is a systematic method of measuring a sample of behavior. Test format refers to the manner in which test items are presented. Test formats can be subjective, relying mainly on the scorer’s opinion and therefore open to personal bias. In an objective test, the rater’s judgement plays little or no part in the scoring process.
What is a free-choice test?
In a free-choice or free-response test or question, the person taking the test can respond in any manner he or she chooses. Although free choice responses can yield more information, they often take more time to score and increase subjectivity.
What is a forced-choice test?
Forced-choice items are also known as recognition items (e.g., multiple choice). On some tests, this format is used to control for the social desirability phenomenon, in which test takers choose the answers they think are socially desirable. The MMPI uses forced choices to create a “lie scale” composed of human frailties we all possess; the scale ferrets out people who try to make themselves look good instead of answering honestly.
What is the difficulty index?
This is the percentage of individuals who answered a given item correctly. The more people who answer a question correctly, the easier it is, and vice versa. A difficulty index of .5 means that 50% of those tested answered the question correctly. Most theorists agree that a “good measure” will provide a wide range of items, including some that even a poor performer can answer correctly.
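Computing the difficulty index for a single item is just counting. This is a minimal sketch, assuming each item’s responses are coded 1 (correct) or 0 (incorrect); the function name and the response data are hypothetical.

```python
def difficulty_index(responses):
    """Proportion of examinees who answered one item correctly.

    `responses` holds one value per examinee: 1 = correct, 0 = incorrect.
    """
    if not responses:
        raise ValueError("need at least one response")
    return sum(responses) / len(responses)


# Hypothetical answers from ten examinees to two items.
item_easy = [1, 1, 1, 1, 1, 1, 1, 1, 0, 1]  # 9 of 10 correct
item_mid = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]   # 5 of 10 correct

print(difficulty_index(item_easy))  # 0.9 -> an easy item
print(difficulty_index(item_mid))   # 0.5 -> half answered correctly
```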
What are recognition items?
Recognition items are response types, like multiple choice, that give the examinee two or more alternatives. A true/false test has dichotomous recognition items. If an item has three or more forced choices, psychometricians call it a multipoint item.
What is a normative test format?
Normative tests compare a person to other people who took the same test and can be used to assess a quality, trait, etc.
A normative interpretation is one in which the individual’s score is evaluated by comparing it to the scores of others who took the same test; a percentile rank is an excellent example.
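A percentile rank can be sketched as the share of norm-group scores that fall below the client’s score. This assumes the simple “strictly below” definition (some test manuals also count half of the tied scores); the norm group and function name are hypothetical.

```python
def percentile_rank(score, norm_group):
    """Percentage of norm-group scores strictly below `score`."""
    if not norm_group:
        raise ValueError("norm group is empty")
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)


# Hypothetical raw scores from a norm group of ten people.
norms = [42, 47, 50, 53, 55, 58, 60, 63, 67, 71]

# A raw score of 60 is higher than 6 of the 10 norm-group scores.
print(percentile_rank(60, norms))  # 60.0
```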
What is an Ipsative test format?
Ipsative measures compare traits within the same individual; they do not compare a person to other people who took the instrument (the MMPI, for example, is not ipsative).
You cannot legitimately compare two or more people who have taken an ipsative test, because an ipsative measure does not reveal absolute strengths. The person is measured against his or her own standard of behavior (e.g., PHQ-9, GAD-7).
So when someone says, “Mr. Johnson’s anxiety is improving,” she has given an ipsative description, which has nothing to do with comparing Mr. Johnson’s anxiety to another person’s. The ipsative approach yields a within-person analysis.
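One common way to express within-person (ipsative) scores is to describe each of a person’s scale scores relative to that person’s own average. The sketch below uses hypothetical scale names and scores to show why absolute strengths drop out: two people with very different raw levels end up with identical ipsative profiles.

```python
def ipsatize(profile):
    """Express each scale score relative to the person's own mean.

    The result shows which traits are relatively stronger or weaker
    within this one person; the deviations always sum to zero.
    """
    own_mean = sum(profile.values()) / len(profile)
    return {scale: score - own_mean for scale, score in profile.items()}


# Hypothetical raw scores on three scales for two people.
person_a = {"social": 30, "artistic": 20, "realistic": 10}
person_b = {"social": 60, "artistic": 50, "realistic": 40}

print(ipsatize(person_a))  # {'social': 10.0, 'artistic': 0.0, 'realistic': -10.0}
print(ipsatize(person_b))  # identical within-person profile
```

Because both profiles come out the same, the ipsative scores say nothing about whether person B is higher than person A on any trait.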
What is a speed test?
A speed test is a timed test whose items are intended to be fairly easy; the difficulty comes from the time limit, not from the tasks or questions themselves. A good speed test is purposely set up so that no one finishes it. A timed test is really a type of speed test, but a high percentage of test takers complete it; its items are usually more difficult, and it still has a time limit (e.g., the NCE).
What is a power test?
A power test is designed to evaluate the level of mastery without a time limit. Like a speed test, this is ideally designed so that no one receives a perfect score.
How does an achievement test (also called an attainment test) differ from a personality test or interest inventory?
Achievement/attainment tests measure maximum levels of skill or present performance of skill. A personality test or interest inventory measures typical performance. Interest inventories are popular with career counselors because they measure what the client likes or dislikes.
What is the Q-Sort?
This is a technique often used to investigate personality traits. An individual is given cards containing statements and asked to sort them into piles ranging from “most like me” to “least like me.” The individual then sorts the cards again to describe his or her “ideal self.” The ideal self can then be compared with his or her current self-perception in order to assess self-esteem.
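The comparison between the two sorts is often summarized as a correlation: the closer the current-self and ideal-self placements, the smaller the self/ideal gap. This is a minimal sketch, assuming each card’s pile is coded 1 (“least like me”) to 5 (“most like me”); the six cards and their placements are hypothetical, and the sketch does not enforce any particular number of cards per pile.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical pile placements for the same six statement cards,
# sorted twice by one client: once as "me now", once as "ideal me".
current_self = [5, 4, 3, 3, 2, 1]
ideal_self = [5, 5, 4, 2, 1, 1]

r = pearson_r(current_self, ideal_self)
print(round(r, 2))  # 0.89 -> current and ideal self are fairly close
```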
What is a spiral test?
In a spiral test, the items get progressively more difficult.
What is a cyclical test?
A cyclical test has several sections that are spiral in nature (items get progressively more difficult). So within each section, the questions go from easy to more difficult.
What is a test battery?
In a test battery, several measures are used to produce results that could be more accurate than those derived from a single source. This is considered a horizontal test: a horizontal test measures various factors (e.g., math and science) during the same testing procedure.
What are parallel forms of a test?
When a test has two versions or forms that are interchangeable, they are termed parallel forms or equivalent forms of the same test. From a statistical/psychometric standpoint, each form must have the same mean, standard error, and other statistical components.
What are the most critical factors in test selection?
Validity and reliability. Validity refers to whether the test measures what it says it measures. Reliability tells how consistently a test measures an attribute.
Which is more important - validity or reliability?
Experts nearly always consider validity to be the number one factor in the construction of a test. A test must measure what it purports to measure. Reliability is the second most important concern. A scale, for example, needs to actually measure body weight to be valid, and to be reliable it needs to give the same reading each time the same person steps on it.
What are the 5 types of validity?
Validity is a measure of whether a test really measures what it purports to measure – though note that a test that is valid for one population is not necessarily valid for another one. There are 5 basic types of validity:
- Content validity - does the test sample the behavior under scrutiny comprehensively? (e.g., an IQ test that looks only at memory cannot claim to have examined the entire range of intelligence).
- Construct validity - this refers to a test’s ability to measure a theoretical concept like intelligence, self-esteem, artistic talent, etc. Any trait you cannot “directly” measure or observe can be considered a construct.
- Concurrent validity - this deals with how well the test compares to other instruments that are intended for the same purpose.
- Predictive validity - also known as empirical validity, this reflects the test’s ability to predict future behavior based on established criteria. Concurrent validity and predictive validity are sometimes grouped together under the title of “criterion validity.”
- Consequential validity - tries to ascertain the social implications of using tests.
Can you have reliability without validity?
Yes. A test can be reliable but not valid, e.g., a scale that consistently reads 109 lbs for someone who weighs 140 lbs. So a test can have a high reliability coefficient but still have a low validity coefficient. Reliability places a ceiling on validity, but validity does not set limits on reliability.
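The scale example can be sketched numerically: consistency shows up as zero spread across repeated readings, while invalidity shows up as a constant gap from the true value. The readings below are hypothetical and simply reuse the 109/140 lbs figures.

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 140  # the person's actual weight in lbs (hypothetical)

# Five repeated readings from a hypothetical miscalibrated scale.
readings = [109, 109, 109, 109, 109]

spread = pstdev(readings)            # 0.0 -> perfectly consistent (reliable)
bias = mean(readings) - TRUE_WEIGHT  # -31 -> consistently wrong (not valid)
print(spread, bias)
```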
What is face validity?
Face validity merely tells you whether your test appears to measure what it is intended to measure. E.g., does the Wechsler look like an IQ test?
What is incremental validity?
This has been used to describe a number of testing phenomena: it has been used to describe the process by which a test is refined and becomes more valid as contradictory items are dropped. Incremental validity also refers to a test’s ability to improve predictions when compared to existing measures that purport to facilitate selection in business or educational settings. When a test has incremental validity, it provides you with additional valid information that was not attainable via other procedures.
What is synthetic validity?
The term synthetic validity is derived from the word “synthesized.”
This is a technique for inferring the validity of a selection test or other predictor of job performance from a job analysis. It involves systematically analyzing a job into its elements, estimating the validity of the test or predictor in predicting performance on each of these elements, and then combining the validities for each element to form an estimate of the validity of the test or predictor for the job as a whole.
What is convergent validity?
This is a method used to assess a test’s construct/criterion validity by correlating test scores with an outside source (e.g., seeing whether someone with a known phobia has test results that indicate the phobia).
What is discriminant validity?
This means that the test will not reflect unrelated variables. So if phobias are unrelated to IQ, there should be no correlation between scores on a phobia test and scores on an IQ test. When a researcher is engaged in test validation, both convergent and discriminant validity should be examined.
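Convergent and discriminant validity can both be checked with correlations: the new test should correlate highly with an outside measure of the same construct and near zero with an unrelated variable. This is a minimal sketch; the clinician ratings used as the outside source and all of the scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical data for six clients.
phobia_test = [10, 14, 18, 22, 26, 30]     # scores on the new phobia test
clinician_rating = [1, 2, 2, 3, 4, 5]      # outside source: rated phobia severity
iq_scores = [100, 110, 90, 90, 110, 100]   # a variable assumed to be unrelated

convergent = pearson_r(phobia_test, clinician_rating)  # should be high
discriminant = pearson_r(phobia_test, iq_scores)       # should be near zero
print(f"convergent r = {convergent:.2f}, discriminant r = {discriminant:.2f}")
```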
Is a valid test always reliable?
Yes, a valid test is always reliable. BUT a reliable test is not always valid.
What is test-retest reliability?
This is a method for testing reliability in which you give the same test to the same group of people two times and then correlate the scores. This method tests for stability, i.e., the extent to which a test score stays the same rather than fluctuating when the client takes the test again. This method is generally appropriate only for traits like IQ that remain stable over time.
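The test-retest coefficient is the correlation between the two administrations. This is a minimal sketch with hypothetical scores for five clients; a value near 1.0 means scores stayed stable between sessions.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical scores for the same five clients on two occasions.
time_1 = [95, 100, 105, 110, 120]
time_2 = [97, 99, 107, 108, 121]

r_test_retest = pearson_r(time_1, time_2)
print(round(r_test_retest, 2))  # high value -> stable scores
```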
What is equivalent or alternate forms reliability?
This is when a single group of examinees takes parallel forms of a test and the researcher calculates a reliability coefficient from the two sets of scores (i.e., one group takes two tests that are designed to be equivalent, and the scores are compared). Doing this well requires counterbalancing, which means splitting the group so that one half gets Test A first and the other half gets Test B first; this controls for things like fatigue, practice, and motivation.
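The counterbalancing step can be sketched as a random split into two order groups. The function name, examinee names, and seed are hypothetical; the sketch only assigns test orders and does not compute the reliability coefficient itself.

```python
import random


def counterbalance(examinees, seed=None):
    """Randomly split examinees into two order groups of (nearly) equal size."""
    shuffled = list(examinees)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        ("Form A", "Form B"): shuffled[:half],  # this half takes Form A first
        ("Form B", "Form A"): shuffled[half:],  # this half takes Form B first
    }


names = ["Ana", "Ben", "Cho", "Dee", "Eli", "Fay"]
groups = counterbalance(names, seed=1)
for order, people in groups.items():
    print(" then ".join(order), "->", people)
```

Fixing the seed makes the assignment reproducible; leaving it as `None` gives a fresh random split each time.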
What is the split-half correlation method?
In this situation, the individual takes the entire test as a whole, and then the test is divided in half. The correlation between the half scores yields a reliability coefficient. This works only if the researcher splits the items randomly or by odd/even numbers (rather than into the first and second halves of the test), because the split must account for practice and fatigue.
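An odd/even split can be sketched as follows, assuming items are scored 0/1 and stored one row per examinee (the data are hypothetical). The last step applies the Spearman-Brown formula, a standard correction not mentioned above, which estimates full-length reliability from the half-test correlation because each half is only half as long as the real test.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def split_half_reliability(item_matrix):
    """Odd/even split-half correlation plus its Spearman-Brown estimate."""
    odd = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    r_half = pearson_r(odd, even)
    r_full = 2 * r_half / (1 + r_half)  # Spearman-Brown correction
    return r_half, r_full


# Hypothetical 0/1 item scores: five examinees, six items each.
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 1],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
]
r_half, r_full = split_half_reliability(scores)
print(round(r_half, 2), round(r_full, 2))
```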
What is inter-rater/inter-observer reliability testing?
This is when several raters assess the same performance. This method is also called scorer reliability and is used with subjective tests such as projective tests to ascertain whether the scoring criteria are such that two people who grade or assess the responses will get roughly the same score.
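A simple index of scorer reliability is the percentage of responses on which two raters agree. This is a minimal sketch; the 0-4 scoring scale, the scores, and the `tolerance` parameter (how far apart two scores may be and still count as “roughly the same”) are hypothetical choices.

```python
def percent_agreement(rater1, rater2, tolerance=0):
    """Percentage of responses on which two raters' scores differ by <= tolerance."""
    if len(rater1) != len(rater2):
        raise ValueError("both raters must score the same responses")
    agree = sum(1 for a, b in zip(rater1, rater2) if abs(a - b) <= tolerance)
    return 100 * agree / len(rater1)


# Hypothetical scores (0-4 scale) from two raters on the same eight responses.
rater_a = [3, 2, 4, 1, 0, 2, 3, 4]
rater_b = [3, 3, 4, 1, 1, 2, 2, 4]

print(percent_agreement(rater_a, rater_b))               # 62.5 (exact agreement)
print(percent_agreement(rater_a, rater_b, tolerance=1))  # 100.0 (within 1 point)
```

Raw agreement does not correct for agreement expected by chance; chance-corrected indexes such as Cohen’s kappa are often reported for that reason.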
What does a reliability coefficient of 1.00 indicate?
It indicates perfect reliability (no measurement error) and generally occurs only in physical measurement. An excellent psychological or counseling test would have a reliability coefficient of about .90, which indicates that 90% of the variance in scores reflects the attribute in question and 10% reflects error. A personality test typically has a reliability coefficient of around .70 (70% of the score variance reflects the attribute and 30% reflects error).
Although .70 is generally acceptable for psychological attributes, tests used for admission or selection decisions (jobs, schools, etc.) should have a reliability of at least .80, and some experts will not settle for less than .90.
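The arithmetic behind these interpretations can be sketched as a small helper that splits a coefficient into attribute and error shares and checks it against a cutoff. The function name is hypothetical; the default cutoff of .70 and the stricter .80 for selection decisions come from the rule of thumb above.

```python
def describe_reliability(r, min_acceptable=0.70):
    """Return (% attribute variance, % error variance, meets cutoff?)."""
    attribute_pct = round(r * 100, 1)
    error_pct = round((1 - r) * 100, 1)
    return attribute_pct, error_pct, r >= min_acceptable


print(describe_reliability(0.90))                       # (90.0, 10.0, True)
print(describe_reliability(0.70, min_acceptable=0.80))  # (70.0, 30.0, False)
```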
What is the coefficient of determination?
The coefficient of determination tells you how much of the variance in one factor is accounted for by another. To find it, you merely square the correlation coefficient (e.g., a reliability coefficient). So if the correlation between two administrations of a test to the same group (test-retest) is .70, you would square it to get .49, which is the coefficient of determination.
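The calculation is a single squaring step; this sketch reuses the .70 test-retest figure from the card (the function name is hypothetical). Rounding is applied only for display, since floating-point squaring of .70 is not exactly .49.

```python
def coefficient_of_determination(r):
    """Proportion of variance in one measure accounted for by the other."""
    return r ** 2


r_test_retest = 0.70
r_squared = coefficient_of_determination(r_test_retest)
print(round(r_squared, 2))  # 0.49 -> 49% of the variance is shared
```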
What does IQ mean?
Intelligence quotient. The early ratio formula for the Binet IQ score was (mental age / chronological age) × 100. The score indicated how you compared with others in your age group. IQ testing has been the subject of heated debate.
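The early ratio formula can be sketched directly. The ages below are hypothetical, and this reflects only the historical ratio IQ described on this card.

```python
def ratio_iq(mental_age, chronological_age):
    """Early ratio IQ: (mental age / chronological age) x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


print(ratio_iq(10, 8))  # 125.0 -> an 8-year-old performing like a typical 10-year-old
print(ratio_iq(8, 8))   # 100.0 -> mental age equals chronological age
```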
What did Francis Galton conclude about IQ?
Sir Francis Galton of England has been recognized as one of the major pioneers in the study of individual differences. He believed that exceptional mental abilities were genetic and ran in families. He did research and concluded that intelligence was normally distributed like height and weight and that it was primarily genetic. He felt that intelligence was a single or so-called unitary factor.
Who is Charles Spearman?
In 1904, Charles Spearman postulated that performance on any mental task depends on two factors: a general ability (g) and a specific ability (s).