PSYC549 – Applied Measurement Techniques Flashcards
Achievement Test
This is a test designed to measure previous learning. In contrast to an aptitude test, an achievement test assesses knowledge and skills that one already possesses. Achievement tests rely heavily on content validation procedures.
In conjunction with aptitude and intelligence testing, achievement testing is included in the category of ability tests (as opposed to personality tests)
The test can be group or individual
Achievement tests are employed for a variety of vocational, professional, and diagnostic purposes
Ex—The school psychologist administered norm-referenced achievement tests to 6th-grade students who were having difficulty passing their classes. He discovered that the students scored at the 49th percentile, placing them just below the median of the population norm. By reviewing and analyzing the tests, he was able to pinpoint the areas of reading and math in which the students needed the most help.
Aptitude test
Aptitude refers to the potential for learning or acquiring a specific skill. In contrast to an achievement test, an aptitude test assesses an individual’s ability to acquire certain knowledge and skills. Aptitude tests rely heavily on predictive criterion validation procedures.
In conjunction with achievement and intelligence testing, aptitude testing is included in the category of ability tests (as opposed to personality tests)
Example:
A college requires applicants to complete the SAT, an aptitude test. The scores on this exam are used to predict future college performance.
Assessment interview
An assessment interview is an initial interview in which the counselor gathers information about the patient and begins to form a conceptualization of the case and the patient's particular problems. Interviews may be structured, in which case a set order and format of questions is strictly adhered to, or unstructured, in which case the interviewer is free to follow their own course of questioning. Structured interviews are generally more reliable and valid, but they lack the freedom of unstructured interviews to pursue a topic of interest or an instinct the interviewer has.
Ex: The therapist preferred to use a mix of structured and unstructured techniques in his assessment interviews. Structured questioning ensured that he would get a broad picture of the problem, e.g., "What brings you in today?", an assessment for suicide risk, and an orientation assessment (person, place, and time). He then used an unstructured format, which gave him the freedom to explore areas of interest (e.g., the therapist notices the client's toxic relationship with her mother and wants to focus on this).
Clinical vs. Statistical significance
a result is statistically significant if it is unlikely to have occurred by chance; a result is clinically significant if it results in a noticeable and appreciable difference in the client’s everyday functioning. Statistical significance is determined by a mathematical/statistical procedure; clinical significance is determined by the amount of change the client sees in their life. Statistical significance does not necessarily equate to clinical significance.
EX—The new antidepressant drug showed a statistically significant effect (p < .045). However, the psychiatrist realized that the drug company had used such rigid requirements for the sample population, in order to raise internal validity, that the external validity of the trial was compromised.
In the psychiatrist's opinion, the drug did not improve patients' depressive symptoms enough, produced some harsh side effects, and was expensive. The clinical significance therefore did not warrant changing the medications he currently prescribed for his depressed patients.
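A minimal numerical sketch of the distinction (all numbers below are invented for illustration): with a large enough sample, even a tiny average improvement can reach statistical significance while the effect size, a rough proxy for clinical importance, remains trivially small.

import numpy as np
from scipy import stats

# Invented depression-scale scores: the drug group averages only
# half a point lower than placebo, but the samples are large.
rng = np.random.default_rng(0)
placebo = rng.normal(loc=20.0, scale=5.0, size=2000)
drug = rng.normal(loc=19.5, scale=5.0, size=2000)

t, p = stats.ttest_ind(drug, placebo)
cohens_d = (placebo.mean() - drug.mean()) / np.sqrt(
    (placebo.var(ddof=1) + drug.var(ddof=1)) / 2)

print(f"p = {p:.4f}")         # likely below .05: statistically significant
print(f"d = {cohens_d:.2f}")  # around 0.1: a clinically trivial effect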
Criterion-related validity
in the context of testing, this is a type of validity in which a test or measure is assessed according to the extent to which it corresponds with a particular criterion or standard.
There are two types of criterion-related validity, concurrent and predictive; the distinction is a function of the time interval between the test and the criterion being assessed.
Concurrent validity is a measure of agreement between the results obtained by the given survey instrument and the results obtained for the same population by another instrument acknowledged as the “gold standard”.
Predictive validity is how well the test predicts future performance on some criterion or measure. Ex—SAT scores predict success in college.
EX—A business psychologist has developed a new IQ test that requires only 5 minutes per subject, compared to 90 minutes for the test acknowledged as the gold standard. He administers both tests to a sample population and then compares the scores. The correlation between the two sets of scores is high and represents the concurrent validity of the new IQ test.
He then began using the new IQ test in place of the old test when assessing applicants, alongside a job performance test he developed for a company to assess individuals' ability to perform particular job tasks. That test has already demonstrated high predictive validity and is used in recruiting new employees.
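A brief sketch of how the concurrent-validity check above could be computed (the scores are hypothetical): each examinee takes both the new 5-minute test and the gold-standard test, and the two sets of scores are correlated.

from scipy.stats import pearsonr

# Hypothetical paired scores for ten examinees.
new_test = [98, 104, 110, 87, 121, 95, 102, 115, 90, 108]
gold_standard = [100, 106, 112, 85, 118, 97, 101, 117, 92, 105]

r, p = pearsonr(new_test, gold_standard)
print(f"concurrent validity coefficient: r = {r:.2f} (p = {p:.4f})")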
Correlation vs. Causation
In the context of research, correlation refers to the existence of a relationship between two or more variables (a correlation coefficient measures the strength and direction of the relationship), while causation refers to a cause-and-effect relationship between two or more variables. Correlation is not the same thing as causation; however, a correlation between two variables is necessary before it can be established that one causes a change in the other. Only experimental studies can establish causation, i.e., that one variable causes a change in another variable.
EX—In one study, researchers found correlations indicating that children who play violent video games are more likely to engage in physical fights and delinquent behavior; this was especially true for children who played for longer amounts of time.
However, correlation does not imply causation: these results were found only in the small percentage of children who already exhibited aggressive traits and high stress levels.
The researchers found that the traits of aggression and stress were predictive of delinquent behavior and physical fights, and that playing violent video games did not itself cause a child to be violent. They also found that parental involvement and parent/peer support were important factors in the study.
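A small simulation (toy numbers, not data from the study above) showing how a third variable can produce a correlation without causation: a single "aggression" variable drives both gaming hours and fights, so the two correlate even though neither causes the other in this model.

import numpy as np

rng = np.random.default_rng(1)
aggression = rng.normal(size=500)                    # confounding trait
game_hours = 2 * aggression + rng.normal(size=500)
fights = 3 * aggression + rng.normal(size=500)

r = np.corrcoef(game_hours, fights)[0, 1]
print(f"r(game_hours, fights) = {r:.2f}")  # substantial, yet entirely due to the confound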
Criterion-referenced scoring/tests
In the context of testing, these are tests in which the test-taker is asked to demonstrate a specific skill or ability. In contrast to norm-referenced tests, the results are compared to a well-defined mastery criterion and are not compared to norms (i.e., to other individuals who have taken the test).
Most tests and quizzes written by school teachers can be considered criterion-referenced tests. The objective is simply to see whether the student has learned/mastered the material.
EX—Driving tests are criterion-referenced tests, because their goal is to see whether the test-taker is skilled enough to be granted a driver's license. A client was anxious about taking this test and believed she would never be able to drive and be independent.
Cross-validation
Cross-validation is a validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It is a process by which a method that works for one sample of a population is checked for validity by applying the method to another sample from the same population.
EX—The new test to measure anxiety in children was administered to two new samples of children to check the validity of the initial validation.
This cross-validation was necessary because chance and other factors such as cultural differences and SES may have influenced the original validation.
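A minimal sketch of k-fold cross-validation using scikit-learn on made-up data: the model is repeatedly fit on part of the sample and scored on the held-out part, which estimates how well it would predict for an independent sample.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Made-up data: three predictor scales and a noisy criterion.
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 3))
y = X @ [0.5, 0.3, 0.2] + rng.normal(size=200)

# 5-fold cross-validation: five different train/test splits.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("R^2 across the 5 folds:", np.round(scores, 2))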
Normal Curve
the normal curve is a bell-shaped graph that represents a hypothetical frequency distribution in which the frequency of scores is greatest near the mean and progressively decreases toward the extremes.
The mean, median, and mode of a normal curve all have the same value. This value falls at the center of the symmetric distribution and splits it in half, such that 50% of the observations lie above it and 50% lie below it. Many physical and psychological characteristics, such as height, weight, and scores on many standardized intelligence tests, are approximately normally distributed.
Example:
The child psychologist tested the adolescent's IQ and discovered that the child's IQ was 165, which placed him well above the 99th percentile, more than 4 SDs above the mean, according to the normal distribution of IQ scores.
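A short calculation, assuming the conventional IQ scale (mean 100, SD 15), converting the score in the example into a z-score and percentile on the normal curve.

from scipy.stats import norm

score, mean, sd = 165, 100, 15     # assumed IQ scale parameters
z = (score - mean) / sd            # about 4.3 SDs above the mean
percentile = norm.cdf(z) * 100
print(f"z = {z:.2f}, percentile = {percentile:.4f}")  # well above the 99th percentile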
Projective tests
- Projective tests originated in psychoanalytic psychology, which argues that, in addition to conscious attitudes and motivations, humans have unconscious ones that are hidden from awareness.
Testing is based on the projective hypothesis which holds that an individual puts structure on an ambiguous situation in a way that is consistent with their own conscious & unconscious needs.
- Projective tests, in contrast to objective tests, are designed to let a person respond to ambiguous stimuli which may reveal hidden emotions and internal conflicts.
The responses are content analyzed for meaning.
Most projective tests do not withstand a rigorous examination of their psychometric properties, and thus can be controversial.
Examples: Rorschach Inkblot Test, Draw-A-Person Test.
Another popular projective test is the Thematic Apperception Test (TAT), in which an individual views ambiguous scenes of people and is asked to describe various aspects of each scene; for example, the subject may be asked to describe what led up to the scene, the emotions of the characters, and what might happen afterwards. The examiner then evaluates these descriptions, attempting to discover the conflicts, motivations, and attitudes of the respondent. In the answers, the respondent "projects" their unconscious attitudes and motivations onto the picture, which is why these are referred to as "projective" tests. If a client sees the images as threatening and frightening, the tester might infer that the client suffers from paranoia.
Objective tests
These are structured tests in which each item has unambiguous stimuli and answers are scored quantitatively; these tests have clearly stated questions and answers. Objective tests do not have a subjective element and, therefore, are not influenced by rater variables (such as bias).
Objective tests are often contrasted with projective tests, which are sensitive to rater variables.
Objective tests tend to have more validity than projective tests; however, they are still subject to the willingness of the test-taker to be open about his or her personality and as such can sometimes poorly represent the test-taker's true personality.
EX—The psychologist found that when assessing clients with Borderline Personality Disorder, objective tests of personality, such as the MMPI, were more valid in providing personality information than projective tests, such as the Rorschach Test, in which his own personal bias or judgment could affect the results, reducing the reliability and validity of the measure. The objective tests were also easy to score.
Reliability (types of)
in research design, reliability is the extent to which a test or measure yields consistent results, or the extent to which a measurement is free of measurement error. It is the first characteristic of psychometric soundness.
Types:
- Test-retest reliability: used to evaluate the error associated with a test taken by the same individuals at two different times.
- Parallel-forms reliability: two separate but equivalent forms of a test are developed and the scores between them are correlated.
- Split-half (also known as internal-consistency) reliability: one test is split in half and the two halves are correlated with each other.
- Inter-rater reliability: the correlation between different raters' scorings; evaluates observer differences.
Example:
When developing a new version of an IQ test, the developers administered it to the same group of examinees on separate occasions and correlated the scores to evaluate the instrument's test-retest reliability.
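A sketch of how split-half reliability could be computed on simulated item data (all values invented), including the Spearman-Brown correction that estimates reliability for the full-length test.

import numpy as np

# Simulated responses of 100 examinees to 20 items driven by a common ability.
rng = np.random.default_rng(3)
true_ability = rng.normal(size=100)
items = true_ability[:, None] + rng.normal(scale=1.5, size=(100, 20))

odd_half = items[:, ::2].sum(axis=1)    # score on odd-numbered items
even_half = items[:, 1::2].sum(axis=1)  # score on even-numbered items
r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = 2 * r_half / (1 + r_half)      # Spearman-Brown correction

print(f"half-test r = {r_half:.2f}, corrected full-test reliability = {r_full:.2f}")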
Standard deviation
this is the average amount that scores differ from the mean score of a distribution. The standard deviation is a highly useful measure of the variability of a set of scores. The standard deviation is found by taking the square root of the variance, which is the average squared deviation around the mean.
A psychologist administered a criterion-referenced test (scored 0-100) to assess student knowledge of sexual harassment. In the first group, the mean score was 70 and the standard deviation was 4. This means that, overall, the students performed moderately well (indicated by the mean of 70) and that most of them scored close to 70 (indicated by the standard deviation of 4).
In the second group, the mean score was also 70 but the standard deviation was 20. Although one still might say that overall the students did OK (indicated by the mean of 70), the high standard deviation indicates that there was a lot of variability. Some students must have scored quite low and others must have scored quite high. So even though the means were the same, the standard deviations paint very different pictures of student performance on these exams.
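The same contrast, recomputed on invented raw scores: both groups average 70, but their standard deviations differ sharply.

import numpy as np

group1 = np.array([66, 68, 69, 70, 70, 71, 72, 74])   # tightly clustered around 70
group2 = np.array([40, 55, 62, 70, 70, 78, 85, 100])  # widely spread around 70

for name, scores in (("group 1", group1), ("group 2", group2)):
    print(f"{name}: mean = {scores.mean():.1f}, SD = {scores.std(ddof=1):.1f}")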
Standard error of measurement
- The standard error of measurement (SEM) estimates how repeated measures of a person on the same instrument tend to be distributed around his or her "true" score. The true score is always unknown because no measure can be constructed that provides a perfect reflection of the true score.
- The SEM can be used to estimate the interval in which an individual's true score would be expected to fall. It also reminds the evaluator that the test score is just an estimate and is not exact.
- The SEM has an inverse relationship with the reliability coefficient; the higher the reliability of a test, the lower the SEM.
Using the 68% confidence level, if a child receives an IQ test score of 100 with an SEM of 3 points, there is a 68% probability that the child's true score falls within the range of 97 to 103. It would not be appropriate to select the highest or lowest number within that range as the best estimate of the child's true score. In fact, the best estimate of any child's true score on a given test is the obtained score, provided that appropriate test administration procedures are followed, the examinee gives good effort and motivation, and there are no conditions within the testing situation that would significantly influence test scores.
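A worked sketch of the SEM arithmetic (the test SD and reliability coefficient below are assumed values, chosen so that the SEM comes out near the 3 points used in the example): SEM = SD x sqrt(1 - reliability), and the 68% band is the obtained score plus or minus one SEM.

import math

sd, reliability = 15, 0.96   # assumed test SD and reliability coefficient
sem = sd * math.sqrt(1 - reliability)
obtained = 100

print(f"SEM = {sem:.1f}")                                          # 3.0
print(f"68% band: {obtained - sem:.0f} to {obtained + sem:.0f}")   # 97 to 103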