PSYC549 – Applied Measurement Techniques Flashcards

1
Q

Achievement Test

A

This is a test designed to measure previous learning. In contrast to an aptitude test, an achievement test assesses knowledge and skills that one already possesses. Achievement tests rely heavily on content validation procedures.

In conjunction with aptitude and intelligence testing, achievement testing is included in the category of ability tests (as opposed to personality tests)
The test can be group or individual
Achievement tests are employed for a variety of vocational, professional, and diagnostic purposes

Ex—the school psychologist administered norm-referenced achievement tests to the 6th grade students who were having difficulty passing their classes. He discovered that the students scored at the 49th percentile, meaning they performed just below the median of the norm group. By reviewing and analyzing the tests he was able to pinpoint the areas of reading and math in which the students needed the most help.

2
Q

Aptitude test

A

Aptitude refers to the potential for learning or acquiring a specific skill. In contrast to an achievement test, an aptitude test assesses an individual’s ability to acquire certain knowledge and skills. Aptitude tests rely heavily on predictive criterion validation procedures.

In conjunction with achievement and intelligence testing, aptitude testing is included in the category of ability tests (as opposed to personality tests)
Example:
A college requires applicants to complete the SAT, an aptitude test. The scores on this exam are used to predict future college performance.

3
Q

Assessment interview

A

An assessment interview is an initial interview in which the counselor gathers information about the patient and begins to form a conceptualization of the case and the patient's particular problems. Interviews may be structured, in which case a set order and format of questions is strictly followed, or unstructured, in which the interviewer is free to follow his or her own course of questioning. Structured interviews are generally more reliable and valid, but they lack the freedom of unstructured interviews to pursue a topic of interest or a hunch the interviewer has.

Ex: A therapist preferred to use a mix of structured and unstructured techniques in his assessment interviews. Structured questioning (e.g., "What brings you in today?", a suicide-risk assessment, and an orientation check for person, place, and time) ensured that he obtained a broad picture of the problem. He then switched to an unstructured format, which gave him the freedom to explore areas of interest (for example, noticing the client's toxic relationship with her mother and choosing to focus on it).

4
Q

Clinical vs. Statistical significance

A

A result is statistically significant if it is unlikely to have occurred by chance; a result is clinically significant if it results in a noticeable and appreciable difference in the client’s everyday functioning. Statistical significance is determined by a mathematical/statistical procedure; clinical significance is determined by the amount of change the client sees in their life. Statistical significance does not necessarily equate to clinical significance.

EX—A new antidepressant drug showed a statistically significant effect (p = .045). However, the psychiatrist realized that the drug company had used such rigid requirements for the sample population, in order to raise internal validity, that the external validity of the trial had been compromised.

In his professional opinion, the drug did not improve patients' depressive symptoms enough, produced some harsh side effects, and was expensive. Therefore, the clinical significance did not warrant changing the current medications he prescribed for his depressed patients.
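A rough sketch of the distinction in code (all numbers hypothetical; the group means, SD, and sample sizes are made up for illustration). With very large samples, a tiny mean difference can reach p < .05 even though the effect size is far too small to matter clinically:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(42)
    # Hypothetical symptom-improvement scores: the true difference is only 1 point
    # (about 0.1 SD), but each group has 5,000 patients.
    drug = rng.normal(loc=11.0, scale=10.0, size=5000)
    placebo = rng.normal(loc=10.0, scale=10.0, size=5000)

    t_stat, p_value = stats.ttest_ind(drug, placebo)            # statistical significance
    pooled_sd = np.sqrt((drug.var(ddof=1) + placebo.var(ddof=1)) / 2)
    cohens_d = (drug.mean() - placebo.mean()) / pooled_sd       # size of the effect

    print(f"p = {p_value:.4f}")           # very likely below .05 with n = 5,000 per group
    print(f"Cohen's d = {cohens_d:.2f}")  # around 0.1, a clinically trivial difference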

5
Q

Criterion-related validity

A

In the context of testing, this is a type of validity in which a test or measure is assessed according to the extent to which it corresponds with a particular criterion or standard.

There are two types of criterion-related validity: concurrent and predictive (function of the time-interval being assessed).

Concurrent validity is a measure of agreement between the results obtained by the given instrument and the results obtained, at about the same time, for the same population by another instrument acknowledged as the "gold standard".

Predictive validity is how well the test predicts future performance in relation to some criterion or measure. Ex—SAT scores predict success in college.

EX—A business psychologist has developed a new IQ test that requires only 5 minutes per subject, as compared to 90 minutes for the test acknowledged as the gold standard. He administered both tests to a sample population and then compared the scores. The correlation between the two sets of scores was high and represented the concurrent validity of the new IQ test.

He then began using the new IQ test in place of the old one when assessing applicants, alongside a job-performance test he had developed for a company to assess individuals' ability to perform particular job tasks. The test had already proved to have high predictive validity and is used in recruiting new employees.
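A minimal sketch of how the concurrent-validity coefficient in an example like this could be computed: correlate scores from the new short test with scores from the gold-standard test in the same sample (the scores below are hypothetical):

    import numpy as np

    # Hypothetical scores for the same 10 examinees on both instruments.
    gold_standard  = np.array([95, 102, 88, 115, 130, 99, 107, 121, 84, 110])
    new_short_test = np.array([93, 105, 90, 112, 127, 101, 104, 118, 88, 108])

    # Pearson correlation between the two sets of scores = concurrent validity coefficient.
    r = np.corrcoef(gold_standard, new_short_test)[0, 1]
    print(f"concurrent validity (r) = {r:.2f}")   # a high r supports using the short test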

6
Q

Correlation vs. Causation

A

In the context of research, correlation refers to the existence of a relationship between two or more variables (a correlation measures the strength and direction of the relationship), while causation refers to a cause-effect relationship between two or more variables. Correlation is not the same thing as causation; however, a correlation between two variables is necessary before it can be established that one causes a change in the other. Only experimental studies can establish causation, i.e., that one variable causes a change in another variable.

EX—In one study, researchers found a correlation indicating that children who play violent video games are more likely to engage in physical fights and delinquent behavior; this was especially true for children who played for longer amounts of time.

However, correlation does not imply causation: these results were found only in a small percentage of children who already exhibited aggressive traits and high stress levels.

The researchers found that the traits of aggression and stress were predictive of delinquent behavior and physical fights, and that playing violent video games did not itself cause a child to be violent. They also found that parent involvement and parent/peer support were important factors in the study.

7
Q

Criterion-referenced scoring/tests

A

In the context of testing, these are tests in which the test-taker is asked to demonstrate a specific skill or ability. In contrast to norm-referenced tests, the results are compared to a well-defined mastery criterion and are not compared to norms (i.e., to other individuals who have taken the test).

Most tests and quizzes written by school teachers can be considered criterion-referenced tests. The objective is simply to see whether the student has learned/mastered the material.

EX—Driving tests are criterion-referenced tests, because their goal is to see whether the test-taker is skilled enough to be granted a driver's license. A client was anxious about taking this test and believed she would never be able to drive and be independent.

7
Q

Cross-validation

A

Cross-validation is a validation technique for assessing how the results of a statistical analysis will generalize to an independent data set. It is mainly used in settings where the goal is prediction, and one wants to estimate how accurately a predictive model will perform in practice. It is a process by which a method that works for one sample of a population is checked for validity by applying the method to another sample from the same population.

EX—The new test to measure anxiety in children was administered to two new samples of children to check the validity of the initial validation.
This cross-validation was necessary because chance and other factors such as cultural differences and SES may have influenced the original validation.
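A small sketch of the underlying idea (all data hypothetical): a prediction rule derived in one sample is applied, unchanged, to a second sample from the same population to see whether its accuracy holds up:

    import numpy as np

    rng = np.random.default_rng(7)

    def make_sample(n):
        # Hypothetical anxiety-test scores and a criterion (e.g., clinician ratings).
        test = rng.normal(50, 10, n)
        criterion = 0.6 * test + rng.normal(0, 8, n)
        return test, criterion

    test_a, crit_a = make_sample(200)    # original (derivation) sample
    test_b, crit_b = make_sample(200)    # new (cross-validation) sample

    slope, intercept = np.polyfit(test_a, crit_a, 1)   # prediction rule derived in sample A

    # Apply the sample-A rule to sample B and check how well it still predicts.
    predicted_b = slope * test_b + intercept
    r_cross = np.corrcoef(predicted_b, crit_b)[0, 1]
    print(f"cross-validated r = {r_cross:.2f}")   # some shrinkage from sample A is expected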

7
Q

Normal Curve

A

the normal curve is a bell-shaped graph that represents a hypothetical frequency distribution in which the frequency of scores is greatest near the mean and progressively decreases toward the extremes.
The mean, median, and mode of a normal curve all have the same value. This falls at the center of this symmetric distribution and splits it in half, such that 50% of the observations are above and 50% of the observations are below it. Many physical or psychological characteristics, such as height, weight, and scores on many standardized intelligence tests fall on a normal curve.

Example: A child psychologist tested an adolescent's IQ and found a score of 165, more than 3 SDs above the mean on the normal curve, placing him above the 99th percentile of the normal distribution of IQ scores.
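A worked check of an example like this, assuming the conventional IQ metric (mean 100, SD 15): the percentile rank of a score is the area under the normal curve below it:

    from scipy.stats import norm

    mean, sd = 100, 15            # conventional IQ scaling (an assumption here)
    score = 165

    z = (score - mean) / sd                     # about 4.3 SDs above the mean
    percentile = norm.cdf(z) * 100              # proportion of the curve below the score
    print(f"z = {z:.2f}, percentile = {percentile:.3f}")   # well above the 99th percentile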

9
Q

Projective tests

A
  • Projective tests originated in psychoanalytic psychology, which argues that humans have both conscious and unconscious attitudes and motivations, the latter being beyond or hidden from conscious awareness.

Testing is based on the projective hypothesis, which holds that an individual puts structure on an ambiguous situation in a way that is consistent with their own conscious and unconscious needs.

  • Projective tests, in contrast to objective tests, are designed to let a person respond to ambiguous stimuli which may reveal hidden emotions and internal conflicts.
    The responses are content analyzed for meaning.
    Most projective tests do not withstand a rigorous examination of their psychometric properties, and thus can be controversial.

Examples: Rorschach Inkblot Test, Draw-A-Person Test.
Another popular projective test is the Thematic Apperception Test (TAT), in which an individual views ambiguous scenes of people and is asked to describe various aspects of each scene; for example, the subject may be asked to describe what led up to the scene, the emotions of the characters, and what might happen afterwards. The examiner then evaluates these descriptions, attempting to discover the conflicts, motivations, and attitudes of the respondent. In the answers, the respondent "projects" their unconscious attitudes and motivations into the picture, which is why these are referred to as "projective" tests. If a client sees the images as threatening and frightening, the tester might infer that the subject may suffer from paranoia.

10
Q

Objective tests

A

These are structured tests in which each item has unambiguous stimuli and answers are scored quantitatively; these tests have clearly stated questions and answers. Objective tests do not have a subjective element and, therefore, are not influenced by rater variables (such as bias).

Objective tests are often contrasted with projective tests, which are sensitive to rater variables.

Objective tests tend to have more validity than projective tests; however, they are still subject to the willingness of the test-taker to be open about his/her personality and, as such, can sometimes misrepresent the test-taker's true personality.

EX—The psychologist found that when assessing clients with Borderline Personality Disorder, objective tests of personality–such as the MMPI–were more valid in providing personality information than projective tests–such as the Rorschach Test–in which his own personal bias or judgment could hinder the test results, thus affecting the reliability and validity of the measure. The objective tests were also easier to score.

12
Q

Reliability (types of)

A

in research design, reliability is the extent to which a test or measure yields consistent results, or the extent to which a measurement is free of measurement error. It is the first characteristic of psychometric soundness.

Types:
1. Test-retest reliability – used to evaluate the error associated with a test taken by the same individual at two different times.
2. Parallel-forms reliability – two separate but equivalent forms of a test are developed and the scores on them are correlated.
3. Split-half (internal-consistency) reliability – one test is split in half and the two halves are correlated with each other.
4. Inter-rater reliability – the correlation between different raters' scores; evaluates observer differences.

Example:
When developing a new version of an IQ test, the developers administered it to the same group of examinees on two occasions and correlated the two sets of scores to evaluate the instrument's test-retest reliability.
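A brief sketch of how two of these coefficients might be computed (all scores hypothetical): test-retest reliability is the correlation between two administrations, and a split-half correlation is commonly stepped up with the Spearman-Brown formula to estimate full-test reliability:

    import numpy as np

    # Hypothetical IQ scores for the same 8 examinees at time 1 and time 2.
    time1 = np.array([100, 112, 95, 120, 88, 105, 131, 99])
    time2 = np.array([98, 115, 97, 118, 90, 103, 128, 102])
    r_test_retest = np.corrcoef(time1, time2)[0, 1]

    # Hypothetical correlation between the odd-item and even-item halves of one test.
    r_half = 0.78
    r_split_half = (2 * r_half) / (1 + r_half)   # Spearman-Brown corrected estimate

    print(f"test-retest r = {r_test_retest:.2f}")
    print(f"split-half reliability (Spearman-Brown) = {r_split_half:.2f}")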

12
Q

Standard deviation

A

This is, roughly, the average amount by which scores differ from the mean of a distribution. The standard deviation is a highly useful measure of the variability of a set of scores. It is found by taking the square root of the variance, which is the average squared deviation around the mean.

A psychologist administered a criterion-referenced test (scored 0–100) to assess student knowledge of sexual harassment. In the first group, the mean score was 70 and the standard deviation was 4. This means that, on average, the students scored around 70 (the mean) and that most of them scored quite close to 70 (indicated by the small standard deviation of 4).

In the second group, the mean score was also 70 but the standard deviation was 20. Although one still might say that overall the students did OK (indicated by the mean of 70), the high standard deviation indicates that there was a lot of variability. Some students must have scored quite low and others must have scored quite high. So even though the means were the same, the standard deviations paint very different pictures of student performance on these exams.
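A short sketch reproducing the flavor of this example with made-up scores: two groups with the same mean but very different standard deviations:

    import numpy as np

    # Hypothetical scores on the 0-100 knowledge test.
    group1 = np.array([66, 68, 70, 70, 72, 74])     # scores clustered near 70
    group2 = np.array([40, 55, 70, 70, 85, 100])    # same mean of 70, far more spread

    for name, scores in [("group 1", group1), ("group 2", group2)]:
        mean = scores.mean()
        sd = scores.std(ddof=1)      # standard deviation = square root of the variance
        print(f"{name}: mean = {mean:.1f}, SD = {sd:.1f}")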

13
Q

Standard error of measurement

A
  • The standard error of measurement (SEM) estimates how repeated measures of a person on the same instrument tend to be distributed around his or her "true" score. The true score is always unknown because no measure can be constructed that provides a perfect reflection of the true score.
  • The SEM can be used to estimate the interval in which an individual's true score would be expected to fall. It also reminds the evaluator that the test score is just an estimate and is not exact.
  • The SEM has an inverse relationship with the reliability coefficient; the higher the reliability of a test, the lower the SEM.

Using the 68% confidence level, if a child receives an IQ test score of 100 with an SEM of three (3) points, there is a 68% probability that the child's true score falls within the range of 97 to 103. It would not be appropriate to select the highest or lowest number within that range as the best estimate of the child's true score.

In fact, the best estimate of any child's true score on a given test is the obtained score, given that appropriate test administration procedures are followed, there is good effort and motivation on the part of the examinee, and there are no conditions within the testing situation that would significantly influence test scores.
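A sketch of the standard formula behind an example like this: SEM = SD × √(1 − reliability). The SD of 15 and reliability of .96 below are assumptions chosen so that the SEM works out to the 3 points used above:

    import math

    sd = 15              # assumed standard deviation of the test (typical IQ scaling)
    reliability = 0.96   # assumed reliability coefficient
    obtained = 100       # the child's obtained score

    sem = sd * math.sqrt(1 - reliability)                      # = 3.0
    band_68 = (obtained - sem, obtained + sem)                 # roughly 68% band (+/- 1 SEM)
    band_95 = (obtained - 1.96 * sem, obtained + 1.96 * sem)   # roughly 95% band

    print(f"SEM = {sem:.1f}")
    print(f"68% band: {band_68[0]:.0f}-{band_68[1]:.0f}")      # 97-103, as in the example
    print(f"95% band: {band_95[0]:.2f}-{band_95[1]:.2f}")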

15
Q

Test bias

A

The tendency of a test to systematically over- or underestimate the true scores of individuals who take the test, particularly those who are members of particular groups (e.g., ethnic minorities, one sex, etc.).
-It is an important source of error to be aware of and to recognize.

EX—Returning veterans were being tested for mood disorders. The administrators realized that the measure being used was underestimating the true scores of the Hispanic soldiers. They hypothesized that a factor contributing to the score differences was these soldiers' collectivistic culture. The test bias was causing the psychologist to miss problems of depression and PTSD in the male Hispanic soldiers, due to their misinterpretation of the questions and a cultural reluctance to share fears and feelings.

15
Q

Construct

A

A construct is any complex, abstract psychological concept or attribute. Examples include love, anger, happiness, introversion, and anxiety. A construct is not a concrete material found in the visible world; we may know what love and anger look like, but they cannot be measured in inches or pounds. Constructs can be more subjective than objective; they often arise from theories or one's ideas and can have multiple definitions. Psychological tests have been created with the goal of measuring constructs in a more scientific and objective manner.

Ex: Psychologist administers a test designed to measure the construct of anxiety. Items deal with fidgeting, worrying, and ability to concentrate– all ideas which deal with the construct of anxiety.

16
Q

Norm-referenced Tests

A

In contrast to criterion-referenced tests, these are tests in which each test-taker’s results are compared to norms.
Norms are not standards. In order to establish norms, the test is administered to a large sample that is carefully selected to represent the population the test is intended to serve.
Norms should be derived from a sample population that is current, relevant and representative of the population/individual being evaluated.

EX: A child has behavior problems in class – excessive fidgeting and frequent irritability. The school psychologist administered a standardized intelligence test (a norm-referenced test) to determine the student's abilities in comparison to other children his age. He scored more than 3 SDs above the mean of the normal distribution of IQ scores, above the 99th percentile. It was discovered that he was acting out because the material was too easy.

17
Q

Standardization sample

A

This is a cross-section of the population that will be taking a given test, administered the test under standard conditions. The sample should be representative, relevant, and current. Its results are used to establish the normal score range on norm-referenced tests.
Without such norms, the meaning of scores would be difficult if not impossible to evaluate.

Example:
After Kelly completed an IQ test, Dr. Tom compared her score with those of the standardization sample in order to determine whether it was above or below average.
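A minimal sketch of what "comparing to the standardization sample" can look like in practice (all scores hypothetical): under one common definition, the percentile rank is the percentage of the norm sample scoring below the examinee's score:

    import numpy as np

    rng = np.random.default_rng(3)
    # Hypothetical standardization (norm) sample of 1,000 IQ scores.
    norm_sample = rng.normal(100, 15, 1000)

    kelly_score = 112   # hypothetical obtained score
    percentile_rank = (norm_sample < kelly_score).mean() * 100   # % of norms scoring below her
    print(f"percentile rank = {percentile_rank:.0f}")            # above average if > 50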

19
Q

Variance

A

In statistics, this is a measure of variability, defined as the average squared deviation around the mean. It measures the spread of scores within a distribution, allowing the information to be interpreted in relation to central tendency (the mean). The deviations are squared because the sum of the raw deviations around the mean always equals zero. Variance is a widely referenced and useful quantity for statistical analysis (it is needed for ANOVA, for example), but because it is expressed in squared units it is less useful as a descriptive statistic.

Dr. Tom examined the results from three data sets in a study. At first glance they seemed similar because they had the same mean; however, the variance was different for each set – none in the first, small in the second, and large in the third. This led him to study the differences among the experimental groups more closely.
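A short sketch of the two points above (hypothetical scores): the raw deviations around the mean always sum to zero, so they are squared before averaging to obtain the variance:

    import numpy as np

    scores = np.array([4, 6, 8, 10, 12])    # hypothetical data; mean = 8
    deviations = scores - scores.mean()

    print(deviations.sum())           # 0.0 -- raw deviations cancel out
    print((deviations ** 2).mean())   # 8.0 -- variance as the average squared deviation
    print(scores.var())               # same value computed directly by NumPy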

21
Q

Validity (Types of)

A

In research design, this is a measure of how well a particular test or instrument fulfills the function for which it is being used.

You can have reliability without validity but not validity without reliability.

There are three primary types of validity: content, criterion-related, and construct.

Content validity refers to how well a measure encompasses the full domain of what it is trying to measure.

Criterion-related validity refers to the extent to which a test corresponds with a particular criterion against which it is compared. There are two types of criterion-related validity: predictive and concurrent.

Construct validity is the degree to which the test actually measures the construct or trait that it claims it measures. There are two forms of construct validity: divergent and convergent.

Ex: A business psychologist has developed a new IQ test that requires only 5 minutes per subject, as compared to 90 minutes for the test acknowledged as the gold standard. He administered both tests to a sample population and then compared the scores. The correlation between the two sets of scores was high and represented the concurrent validity of the new IQ test.

He then began using the new IQ test in place of the old one when assessing applicants, alongside a job-performance test he had developed for a company to assess individuals' ability to perform particular job tasks. The test had already proved to have high predictive validity and is used in recruiting new employees.