Psych testing Flashcards
A. What are the two parts of test content evidence of validity?
B. For each of the two parts of test content evidence for validity, give an example of how a test might fail to meet these forms of evidence.
C. How would this test content evidence for validity be identified (i.e., what would a psychologist do to evaluate the evidence of test content validity?)
A. Relevance and Representativeness
B. Test items lack relevance when their content is not related to the construct the test is intended to measure. For example, if an employment test uses vocabulary beyond the English level needed on the job, some groups are disadvantaged because the test assesses their understanding of words rather than their job capabilities.
A test lacks representativeness when it does not cover critical elements of the construct, or is biased towards one aspect. For example, an English test that assesses only spelling, and not reading and writing, is not an accurate representation of overall English ability.
C. Have expert judges rate the relevance and representativeness of the test content.
Name at least four sources of evidence for test validity
Give a brief description of each of these.
- Content
- Item response processes - the items should engage the cognitive processes the test is intended to measure; evidence can come from eye tracking or think-aloud protocols
- Internal structure - items should be related to one another theoretically and empirically
- Relationship to other variables - scores should correlate with measures of similar constructs (convergent evidence) and diverge from unrelated constructs (discriminant evidence). Scores should also predict a criterion, e.g. intelligence test scores correlate with grades.
- Consequences of testing - tests should be used for their intended purpose and should not have negative consequences such as discrimination, e.g. test results feeding league tables that lead students to avoid lower-scoring schools
Consider a test measuring the personality trait of suggestibility, assessed with self-report items such as:
When I hear an unfamiliar statement, I think it is true: 1) strongly disagree; 2) disagree; 3) agree; 4) strongly agree
Name the five sources of validity evidence that could be obtained for this test.
Imagine you are a test developer who needs to design studies to collect validity evidence for this test. Describe a study or series of studies you could run to collect AT LEAST THREE forms of validity evidence.
Content, item response process, internal structure, relationship to other variables, consequences of testing
Suggestibility: how readily people accept and act on the suggestions of others - related to self-esteem, assertiveness
Content validity: have expert judges rate whether the items are relevant to suggestibility and whether, taken together, they represent the whole construct.
Internal structure: administer the test to a sample and check whether the items correlate with one another (e.g. via internal consistency or factor analysis).
Relationship to other variables: correlate test scores with measures of related constructs (e.g. assertiveness, self-esteem) and with unrelated constructs; scores should relate to the former but not the latter.
Consequences of testing: check that the test is used only to assess the extent of a person’s suggestibility, and is not taken to show that highly suggestible people lack confidence in all areas of their lives.
A. Briefly describe computerized adaptive testing (CAT), outlining how it differs from non-adaptive testing.
B. What are the advantages of CAT?
C. What are the disadvantages of CAT?
D. Critically evaluate the CAT for a hypothetical application: nation-wide selection of students for selective high schools.
In adaptive testing, each item is chosen based on the test-taker’s previous responses. This matches item difficulty to the test-taker’s ability level, so they neither get all answers correct nor all incorrect. Computerised adaptive testing (CAT) uses an algorithm to adjust item difficulty for each individual.
Advantages: it reduces the cost of booking test venues and of test proctors, as the test is taken on a computer. Matching difficulty to ability avoids ceiling effects (items too easy, so everyone scores high) and floor effects (items too difficult, so everyone scores low). It reduces cheating, since test-takers receive different items, and better-targeted tests reduce fatigue and boredom.
Disadvantages: it requires a large bank of items, each of which must be calibrated before it can be matched to test-takers; this is time-consuming and expensive. Because the test is completed on a computer, technological problems can occur.
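The adaptive logic can be sketched as a simple staircase rule (a hypothetical simplification: real CATs select items using item response theory and a statistical ability estimate, not a fixed step size):

```python
# Minimal sketch of adaptive item selection (hypothetical simplification;
# real CATs use IRT-based ability estimation, not a fixed step size).

def run_cat(item_bank, answers_correct, start=0.0, step=0.5, n_items=5):
    """item_bank: list of item difficulties; answers_correct: function
    taking a difficulty and returning True if the test-taker gets it right."""
    ability = start
    administered = []
    for _ in range(n_items):
        # give the unused item whose difficulty is closest to the estimate
        item = min((d for d in item_bank if d not in administered),
                   key=lambda d: abs(d - ability))
        administered.append(item)
        if answers_correct(item):
            ability += step   # correct: probe harder items
        else:
            ability -= step   # incorrect: probe easier items
    return ability, administered
```

A test-taker who can solve every item easier than some threshold is quickly steered to items around that threshold, which is how CAT avoids the floor and ceiling effects mentioned above.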
CAT for nation-wide selection for selective high schools: with so many test-takers, computer administration would be more cost-effective. However, this is high-stakes testing that determines whether a student enters a selective school, so technical failures and unequal access to technology are serious risks; CAT may therefore not be appropriate for this purpose.
A test-developer wants to collect data to test whether her rating scale of maturity (designed for sixth-graders) is reliable and valid. Critically evaluate EACH of the following scenarios for collecting such evidence:
A. She tests 50 students from the nearest primary school to her university (the Tiliopoulos School for the Gifted Child), getting them to complete the test 2 months apart, and also gets peer-ratings of maturity.
B. She tests 100 students drawn from 5 schools (1 Catholic girls school, 1 Catholic boys school, 2 public mixed-gender schools from high and low SES areas, and 1 private mixed-gender school). They complete the assessment 2 years apart so she can monitor the growth of maturity. She obtains teacher and peer ratings of maturity.
A. Small sample size (n = 50) - estimates of reliability and validity will be imprecise.
The nearest school is for gifted children - scores may show ceiling effects if gifted students are more mature, so evidence from this sample may not generalise to typical sixth-graders.
2-month test-retest interval - students may remember the items, which can inflate the test-retest reliability estimate.
Peer ratings only - peers may be biased raters; the agreement evidence would be stronger if teachers also provided ratings.
B. Larger sample size (n = 100) - more precise estimates.
Sampling from different types of schools - a more representative sample, so findings generalise better.
2-year test-retest interval - real growth in maturity is expected over 2 years, so a low correlation would reflect development as much as unreliability; this interval suits monitoring change more than estimating reliability.
Teacher and peer ratings - these can be correlated to estimate inter-rater reliability.
What are the six domains of occupational interests in Holland’s interest inventory?
Provide a brief definition of each of these domains.
What is the framework for organizing these (i.e., how do the six interests relate to each other)?
The six types are arranged in a hexagon in RIASEC order: adjacent types are most similar, and opposite types least similar (e.g. Realistic correlates only .06 with Social).
Realistic - practical, hands-on jobs e.g. chef, plumber, firefighter, florist, driver, surgeon (close to C, I)
Investigative - scientific pursuits, a thinker e.g. professor, pharmacist, psychologist, dietician (close to R, A)
Artistic - creative e.g. artist, designer, writer, musician (close to I, S)
Social - helping others e.g. counsellor, teacher, customer service (close to A, E)
Enterprising - business, leading, persuaders e.g. PR, marketer, manager, entrepreneur, HR (close to S, C)
Conventional - organising e.g. accountant, actuary, maths teacher (close to E, R)
Name and briefly describe FIVE major uses of psychological tests. An example use would be “certification for employment: when tests are used to provide a credential required to practice a particular occupation” (you will not receive marks for re-stating this example use).
For at least three of these major uses, name one or more example tests.
Classification: selecting or placing students in educational programs e.g. OC and selective-school tests, GAMSAT, UMAT
Diagnosis: identifying clinical conditions e.g. Depression Anxiety Stress Scales (DASS), McGill Pain Questionnaire
Coaching: guiding self-development and career choice e.g. Career Assessment Inventory
Forensic: assessing, for example, an individual’s fitness to stand trial, or impairment for compensation purposes
Research: e.g. personality research uses the NEO-PI-R and the Satisfaction with Life Scale
Based on research and theory in response distortion, what recommendations would you make to psychologists using psychometric tests for high-stakes applications?
Use items that are less transparent (whose desirable answer is not obvious), so that they are harder to fake.
A. Name and briefly describe each of the three research methods used to study faking on personality assessments
B. Critically evaluate each of these methods
Group Comparison - compare scores from a group with an incentive to distort (e.g. job applicants) with a comparable group without one (e.g. incumbents); the difference estimates how much responses are distorted in practice. However, the groups may also differ in real ways, so not all score differences reflect faking.
Instructed Faking - compare answers given under honest instructions with answers given under instructions to maximise one’s score. However, the “honest” condition may itself be distorted: self-deceptive bias means people are not completely aware of their own personality.
Incentive Manipulation - compare scores when there is an incentive to do well (e.g. money) with scores when there is not. If the answers differ, faking is likely under incentive, and the no-incentive scores are taken as the more accurate baseline.
Test-takers are known to fake their responses on personality assessments.
A. Name and briefly describe the biases that cause response distortion in psychological testing.
B. Give an example item for each of these.
C. What is the likely effect size for changes to big five personality traits due to response distortion?
Moralistic bias - people want to appear good and deny negative behaviours, e.g. claiming never to lie, steal, or lose their temper.
Egotistic bias - people value independence and try to appear brave, competent, and talented, e.g. claiming they can achieve something when they are only somewhat competent.
Instructed faking produces a large change on neuroticism (d ≈ 0.93), as people fake lower N. Effects are also large for O and C, and moderate for E and A.
Group comparisons show moderate changes on C and N.
A. Name the two ways that mental speed is measured.
B. Describe the procedure that is used in each case.
C. How do each of these paradigms for measuring mental speed relate to intelligence?
Hick’s paradigm: tests the time it takes to react to a single stimulus (simple RT) or to choose among many stimuli (choice RT). The test-taker rests a finger on a home button; when a light comes on, they must press the corresponding response button, and RT is the time taken to release the home button. Someone whose RT stays fast when there are multiple stimuli to choose from has greater mental speed. Faster choice RT is associated with higher IQ, because the task grows more complex as the number of alternatives to process increases.
Inspection time: The time it takes to judge a stimulus accurately based on its features after perception. E.g. line length test - two lines of different lengths are covered with a backwards mask then shown briefly. A lower IT is associated with higher IQ.
In both procedures, test-takers can adopt strategies that artificially improve their RT or IT scores, so neither is a pure measure of intelligence.
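The choice-RT result can be summarised by Hick’s law: mean RT rises linearly with the information content log2(n) of n response alternatives. A toy model (the intercept and slope values here are made up for illustration):

```python
import math

def hicks_law_rt(n_alternatives, a=0.2, b=0.15):
    """Hick's law: predicted mean reaction time in seconds,
    RT = a + b * log2(n). Intercept a and slope b are hypothetical."""
    return a + b * math.log2(n_alternatives)

# Simple RT (one alternative) equals the intercept a; each doubling of
# the number of alternatives adds one slope unit b to the predicted RT.
```

A steeper slope b means each additional alternative slows the person more, which is the complexity argument linking choice RT to IQ above.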
What is a “lagged panel model”?
Describe the lagged panel models that were used to test the likely causal direction underlying the inspection time/intelligence association.
What is the causal direction (and is it the same for Gf and Gc)?
A (cross-)lagged panel model measures two variables at two or more time points and compares the cross-lagged correlations to infer the likely causal direction.
The correlation between inspection time (IT) at T1 and intelligence at T2 was greater than the correlation between intelligence at T1 and IT at T2, suggesting that earlier IT drives later (fluid) intelligence.
Likewise, the correlation between auditory IT at T1 and Mill Hill Vocabulary Scale scores at T2 was larger than the correlation between vocabulary scores at T1 and auditory IT at T2, suggesting that earlier IT also drives later crystallised ability.
The causal direction is the same for Gf and Gc: faster mental speed leads to higher intelligence, not the reverse.
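The cross-lagged comparison can be illustrated on simulated panel data (all numbers below are fabricated for illustration; `pearson` is a hypothetical helper, not any study’s actual method):

```python
import random
import statistics

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (statistics.pstdev(x) * statistics.pstdev(y))

# Simulate a panel in which early inspection time drives later intelligence
random.seed(1)
n = 500
it_t1 = [random.gauss(0, 1) for _ in range(n)]
iq_t1 = [random.gauss(0, 1) for _ in range(n)]
iq_t2 = [0.6 * it + 0.5 * random.gauss(0, 1) for it in it_t1]  # strong IT1 -> IQ2 path
it_t2 = [0.1 * iq + 1.0 * random.gauss(0, 1) for iq in iq_t1]  # weak IQ1 -> IT2 path

r_forward = pearson(it_t1, iq_t2)   # IT at T1 with intelligence at T2
r_backward = pearson(iq_t1, it_t2)  # intelligence at T1 with IT at T2
# |r_forward| > |r_backward| is the pattern taken to favour IT -> intelligence
```

Under these simulated parameters the forward cross-lag comes out clearly larger than the backward one, mirroring the pattern reported for IT and intelligence.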
There are several tests that may be used as alternatives to the MSCEIT (Mayer-Salovey-Caruso Emotional Intelligence Test)
- Name and describe an assessment of emotion recognition
- Critically evaluate this assessment, outlining both the positives and negatives
The Reading the Mind in the Eyes Test assesses emotion recognition from photographs of the eye region alone. Strengths: it is brief and widely used. Weaknesses: the stimuli are static, so the test does not capture dynamic, real-life emotion recognition such as reading emotions during conversation (limited external validity).
There are several tests that may be used as alternatives to the MSCEIT (Mayer-Salovey-Caruso Emotional Intelligence Test)
- Name and describe an assessment of strategic EI
- Critically evaluate this assessment, outlining both the positives and negatives
STEM/STEU, MEIS - stream 1
The MEIS (Multifactor Emotional Intelligence Scale) is an ability-based EI test: rather than self-report, it requires the individual to solve problems involving emotion understanding and management.
Strengths: the Blends and Changes subtests are short, and most items have clear answers. Because performance depends on actual judgement rather than self-description, it is harder to fake a better score.
Weaknesses: the correct answer can be subjective, e.g. which emotion results from blending several simpler emotions. And because the items are verbal, someone who understands an emotion but not the wording may still miss the item, so the test may partly measure verbal intelligence.
Training programs targeting emotional intelligence are stronger for some types of EI than others. Which types of EI show the strongest effects of training?
How large are the improvements in EI?
Training effects are strongest for strategic EI (emotion understanding, EU, and emotion management, EM), which requires analysis of emotion. The effect sizes were d = 0.83-1.3, an increase of roughly 12-20 points on an IQ-style metric; EU and EM improved by 5-6.5 such points.
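The translation from a standardized effect size to “IQ points” assumes an IQ-style scale with a standard deviation of 15; a minimal sketch:

```python
def d_to_iq_points(d, sd=15):
    """Convert Cohen's d (change in standard-deviation units) to points
    on an IQ-style scale, which by convention has a standard deviation of 15."""
    return d * sd

# d = 0.83 to 1.3 corresponds to roughly 12.5 to 19.5 IQ-style points
```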