Psychometrics Final Flashcards

1
Q

What is reliability a measure of?

A
  • The consistency and stability of measurement over time
  • Time is not always a factor (consistency can also be across raters, forms, or items)
2
Q

What is the basic idea behind true score theory?

A
  • Observed score = true score + random error
    • Postulated idea on how reality works
    • Foundation of assessment
    • Must hold true for measurement to work
    • We never observe the construct itself
    • We are assuming observable behavior relates to underlying psychological constructs
    • For a given observed score, the greater the error component, the less that score reflects the true score (see the sketch below)
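
A minimal simulation sketch of the idea (hypothetical numbers in Python/NumPy, not something shown in class): repeated observations of one person scatter around a fixed true score because of random error.

```python
import numpy as np

rng = np.random.default_rng(0)

true_score = 80                       # the unobservable true score, assumed constant
errors = rng.normal(0, 5, size=1000)  # random error: mean 0, SD 5
observed = true_score + errors        # observed score = true score + random error

print(np.mean(observed))  # ~80: errors cancel out on average
print(np.std(observed))   # ~5: error shows up as spread around the true score
```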
3
Q

What are the two different types of error discussed in class, and how do they differ?

A

Random error

  • Will impact everyone, but not in the same direction.
  • Does not affect the mean, but muddies the water (because the errors go in different directions)
  • Increases the variability around the average
  • “Noise”
  • Will always occur in measurement
  • Affects each case in a sample individually
  • Will impact everyone differently
  • You will always have random error

Systematic error

  • Impacts everybody in a similar or singular way: the same direction, though at different levels of impact.
  • Changes the average
  • Called “bias”
  • There is not always systematic bias because it can be controlled for (see the simulation sketch below)
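
A small sketch with made-up data of how the two error types behave: random error leaves the mean roughly unchanged but inflates variability, while systematic error shifts the mean.

```python
import numpy as np

rng = np.random.default_rng(1)
true_scores = rng.normal(100, 10, size=5000)           # hypothetical true scores

random_err = rng.normal(0, 8, size=true_scores.size)   # noise: a different direction for each person
systematic_err = 5                                      # bias: the same direction for everyone

noisy = true_scores + random_err
biased = true_scores + systematic_err

print(np.mean(true_scores), np.mean(noisy), np.mean(biased))  # mean unchanged by noise, shifted by bias
print(np.var(true_scores), np.var(noisy), np.var(biased))     # variance inflated only by random error
```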
4
Q

Which form of error causes a change in the mean (average) score observed?

A
  • Systematic error causes a change in the mean score
5
Q

Why is one form of error called bias and the other called noise?

A
  • Systematic is called bias
    • Affects the average
  • Random error is called noise
    • This makes the picture murky, harder to pick up true ability
6
Q

What are some ways to reduce measurement error?

A

Mnemonic: “Pilots Thorough(ly) Double Checking Saves Multiple Lives” (Pilot testing, Thorough training, Double check the data, Statistical correction, Multiple measures)

Pilot testing

  • A way to gain foresight we would not normally have
  • We can often identify potential systematic errors ahead of time
  • Addresses systematic error and random error

Thorough Training

  • Especially when you have multiple individuals collecting data or multiple raters
  • Data is ambiguous, subjective, or open to interpretation
  • Addresses systematic error and random error

Double Check the Data

  • Not just plausibility, but possibility as well.
  • Possibility is easy to check in SPSS
  • Plausibility is more difficult and arguably more important.
    • Ex: incredible, unexpected effects appeared when reverse-scored items had not been recoded
    • You need to really know the construct and the expectations; do not let the numbers dictate your thinking, use your knowledge and judgment as well
  • Addresses systematic error

Statistical Correction

  • Can be simple (mean) to complex (statistical adjustment)
  • Adjust mean
    • Ex: mean score adjustment when afternoon class scored lower on exam because of construction
  • Statistical modeling of error
  • Addresses systematic error

Multiple Measures

  • Administer multiple measures of the same construct; you can triangulate between them to look for systematic error (method bias).
    • Ex: assessing intelligence in children. You can ask parents, ask teachers, and run a formal assessment. If two correspond and one is significantly different from the others, the different one could reflect systematic bias.
  • Addresses systematic error and random error
7
Q

Know the differences between the following:

  • Inter-rater reliability
  • Test-retest reliability
  • Alternate-form reliability
  • Internal consistency reliability
A

Inter-rater reliability

  • 2 independent raters
  • 1 time
  • 1 assessment
  • Assesses the raters' amount of agreement
  • Categorical
    • Percent of shared agreement between raters
  • Continuous
    • Correlation between the observers

Test-retest reliability

  • 1 rater
  • 2 points in time
  • 1 assessment
  • Used to assess consistency of a measure from one time to another
  • Timing is critical

Alternate-form reliability / Parallel Forms

  • 1 rater
  • 1 point in time
  • 2 assessments
  • Used to assess consistency of the same knowledge base across the 2 assessments

Internal consistency reliability

  • 1 rater
  • 1 point in time
  • 1 assessment
  • Looks at consistency across items within the measure at the same time
  • 3 types
    • Average inter-item correlation
    • Split-half reliability
    • Cronbach's alpha
8
Q

Inter-rater Reliability

A

Inter-rater reliability

  • 2 independent raters
  • 1 time
  • 1 assessment
  • Assesses the raters' amount of agreement
  • Categorical
    • Percent of shared agreement between raters
  • Continuous
    • Correlation between the observers (sketch below)
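
A minimal sketch of the two computations named above, using made-up ratings: percent agreement for categorical codes, a correlation for continuous scores.

```python
import numpy as np

# Categorical codes from two independent raters (hypothetical data)
rater1 = np.array(["yes", "no", "yes", "yes", "no", "yes"])
rater2 = np.array(["yes", "no", "no",  "yes", "no", "yes"])
percent_agreement = np.mean(rater1 == rater2)   # share of cases where the raters match

# Continuous ratings from the same two raters (hypothetical data)
scores1 = np.array([4.0, 7.5, 6.0, 8.0, 5.5])
scores2 = np.array([4.5, 7.0, 6.5, 7.5, 5.0])
correlation = np.corrcoef(scores1, scores2)[0, 1]

print(percent_agreement, correlation)
```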
9
Q

Test-retest Reliability

A
  • 1 rater
  • 2 points in time
  • 1 assessment
  • Used to assess consistency of a measure from one time to another
  • Timing is critical
10
Q

Alternate Form Reliability

A
  • 1 rater
  • 1 point in time
  • 2 assessments
  • Used to assess consistency of the same knowledge base across the 2 assessments
11
Q

Internal consistency reliability

A

Internal consistency reliability

  • 1 rater

  • 1 point in time
  • 1 assessment
  • Looks at consistency across items within the measure at the same time
  • 3 types
    • Average inter-item correlation
      • How well each item compares to other items
      • Correlate all items to the other items
      • Should not see a lot of variation
      • Assume unidimensionality
      • Tells you which items are problematic
    • Split-half reliability
      • Randomly select half of the scores, sum them, run a correlation
    • Cronbach's alpha
      • Do every split half, and average them
      • Most stable
      • Tells you how much of a problematic effect an item has
12
Q

Average inter-item correlation

A
  • Part of internal consistency reliability
  • How well each item compares to the other items
  • Correlate all items with the other items (see the sketch below)
  • Should not see a lot of variation
  • Assume unidimensionality
  • Tells you which items are problematic
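
A short sketch of the computation on a small made-up item matrix: correlate every item with every other item, then average the unique pairwise correlations.

```python
import numpy as np

# Hypothetical responses: 6 people x 4 items on one (assumed unidimensional) scale
items = np.array([
    [4, 5, 4, 5],
    [2, 1, 2, 1],
    [3, 3, 4, 3],
    [5, 5, 4, 5],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
])

r = np.corrcoef(items, rowvar=False)             # item-by-item correlation matrix
off_diagonal = r[np.triu_indices_from(r, k=1)]   # each item pair counted once

print(off_diagonal)          # little variation expected if no item is problematic
print(off_diagonal.mean())   # the average inter-item correlation
```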
13
Q

Split-half reliability

A
  • Part of internal consistency reliability
  • Randomly select half of the items, sum them, run a correlation with the other half (see the sketch below)
  • If you have a lot of poor items, it will show a low correlation
  • With short measures that have few items, one bad item can throw it off
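
A minimal sketch of the procedure on simulated responses: randomly split the items, sum each half per person, and correlate the two half scores.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data: each person's 10 item scores share a common "true" component
true_part = rng.normal(0, 1, size=(200, 1))
items = true_part + rng.normal(0, 1, size=(200, 10))   # 200 people x 10 items

cols = rng.permutation(items.shape[1])      # randomly split the items into two halves
half_a = items[:, cols[:5]].sum(axis=1)     # sum of one half per person
half_b = items[:, cols[5:]].sum(axis=1)     # sum of the other half per person

print(np.corrcoef(half_a, half_b)[0, 1])    # the split-half reliability for this one split
```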
14
Q

Cronbach’s Alpha

A
  • Part of internal consistency reliability
  • Do every split half, and average them
  • Most stable, preferred method
  • Tells you how much of a problematic effect an item has
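
The card above describes alpha as the average of every possible split half; the sketch below instead uses the standard variance-based formula (alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)), which is what statistical software typically reports, applied to simulated data.

```python
import numpy as np

def cronbach_alpha(items):
    """Variance-based formula for Cronbach's alpha; items is a people x items array."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

rng = np.random.default_rng(3)
true_part = rng.normal(0, 1, size=(300, 1))
items = true_part + rng.normal(0, 1, size=(300, 8))   # hypothetical 8-item scale

print(cronbach_alpha(items))   # closer to 1.0 = more internally consistent
```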
15
Q

systematic error

A

  • No shift in variability
  • Affects the average
  • This is called bias
  • Random error can occur without systematic error
  • When we have systematic error, we will always have random error

16
Q

Distinction between Validity and Reliability

A
  • Reliability
    • Consistent measurement over time, raters, or forms.
  • Validity
    • You are measuring what you think you are measuring
    • Much more important than reliability
      • You could reliably measure the wrong thing
  • A measurement is considered valid when the test overlaps with the construct of interest
  • You can have something that is reliable and not valid, but not the other way around
    • Ex: a scale that consistently reads five pounds too heavy is reliable but not valid
17
Q

Why is it imprecise to say that a test is valid?

A
  • Validity is a matter of degree.
  • No test is simply valid or invalid.
  • “If a test is not valid then it does not exist”
  • Measurement being valid is not a yes or a no, but to what degree
18
Q
  • construct underrepresentation
  • construct irrelevant variance
  • Valid measurement
  • How does increased or decreased validity play into the relationship between these?
A

construct underrepresentation

  • The aspects of the construct that our test does not tap into
  • In other words, what we do not get to know

construct irrelevant variance

  • factors that influence responses on the test that go beyond the actual construct itself
  • includes random and systematic error
  • Pulls the test away from a perfect reflection of the true score

Valid measurement

  • The overlap between the test and the construct of interest (what remains once construct underrepresentation and construct-irrelevant variance are minimized)

Increased validity

  • The test and the construct overlap more
  • Comes from minimizing construct underrepresentation and construct-irrelevant variance

Decreased validity

  • The test and the construct overlap less
  • Construct underrepresentation and construct-irrelevant variance pull away from valid measurement
19
Q

Content validity

A
  • Examination of aspects of the test itself to ensure that we have as much overlap as possible with the construct
  • How well our measurement is tapping into the actual construct
  • Good match between test content and the domain = High content validity
  • Conceptual as opposed to statistical
20
Q

How to assess content validity

A

Describe the content domain

  • In terms of boundaries and structure
    • Boundaries refer to all of the aspects that are operative within the construct (what falls inside it).
    • Structure has to do with the relative weight or importance of each of these aspects within the construct (how important each one is).

Determine the areas of the content domain that are measured by each test item

  • Do not want a single item to tap into multiple constructs

Compare the structure of the test with the structure of the content domain

  • Is our measure representing the construct appropriately?
    • First, we can have a relative number of items within the measure that reflects the weight within the construct
    • Or, we can create our scoring rubric to reflect the structure of the construct.

Additional info:

  • You can adjust scoring to address structural issues
  • For a boundary issue, the weights will change
  • For a large boundary issue, the test may need to change
  • This is not a statistical process

21
Q

Construct validity

A
  • Examination of the relationship between test scores and those of other measures.
  • This is a statistical process where we look to see the amount of overlap.
22
Q

Differences between content and construct validity

A
23
Q

Why is an assessment of content validity seen less in psychological measurement than in educational testing?

A
  • Educational content is much less ambiguous than constructs we deal with in psychology
  • It is a confirmatory process rather than statistical
24
Q

Methods of determining construct validity

A

Correlational study

  • Correlations between measures of certain behaviors (that are either related to or unrelated to our construct of interest) and our test
  • Generally done; tends to be the preferred approach
  • Convergent validity
  • Discriminant validity

Factor analysis

  • Analyzing which groups of items “hang” together
  • Works best when dealing with constructs with multiple aspects

Experimental manipulation

  • Manipulate the construct of interest (i.e. induce fear) and see if it relates to different scores on our test
  • Why don’t we do that all the time? We can’t ethically assign people to suicide or trauma conditions
25
Q

Correlational study

A
  • Way to assess construct validity
  • Correlations between measures of certain behaviors (that are either related to or unrelated to our construct of interest) and our test
  • Generally done, tends to be the preferred approach
  • Assessment of convergent and discriminant validity.
26
Q

Factor Analysis

A
  • Way to assess construct validity
  • Analyzing which groups of items “hang” together
  • Work best when dealing with constructs with multiple aspects
    • Ex: Athletic self-appraisal scale. When putting the measure together and examining construct validity, they looked at thousands of participants’ data with fingers crossed, hoping the items would map onto the boundaries that match the theory or construct behind it
27
Q

Experimental Manipulation

A
  • Way to assess construct validity
  • Manipulate the construct of interest (i.e. induce fear) and see if it relates to different scores on our test
  • Not done often, but it is a strong approach when you can do it
  • Say you give someone a measure of fear and they score a 7. Right when they put down their last response, you whip out a machete and chase them around the room. As you chase them, they fill out the measure again and score a 99. You can be fairly sure the measure taps fear because you manipulated the situation
  • Why don’t we do that all the time? We can’t assign to suicide and trauma conditions
28
Q

Convergent validity

A
  • Type of construct validity
  • we expect to find significant correlations between measures of behaviors related to our construct and our test
  • You would expect that the scores you obtain will correlate with other measures that tap into the same construct.
  • Conceptually, you can have a negative correlation and still have convergent validity if it is significant and the construct the other test taps into is the conceptual antithesis of the one you are tapping into.
  • Generally, you will correlate and look at a significant correlation.
29
Q

discriminant validity

A
  • A type of construct validity
  • it is correlational
  • Behaviors not associated with our construct of interest should NOT BE CORRELATED with our test
    • Our measure of depression and STAI (anxiety inventory)
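
A small made-up illustration of the pattern: our test should correlate with another measure of the same construct (convergent) and show little correlation with a variable unrelated to the construct (discriminant). The variable names below are hypothetical, not from class.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

trait = rng.normal(0, 1, n)                    # the underlying construct, e.g. depression
our_test = trait + rng.normal(0, 0.5, n)       # our new measure of the construct
other_test = trait + rng.normal(0, 0.5, n)     # an established measure of the same construct
unrelated = rng.normal(0, 1, n)                # a behavior unrelated to the construct

print(np.corrcoef(our_test, other_test)[0, 1])  # convergent: expect a strong correlation
print(np.corrcoef(our_test, unrelated)[0, 1])   # discriminant: expect a correlation near zero
```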
30
Q

Face Validity

A
  • If a test looks like it measures its target construct
  • Don’t confuse this with the empirical approach
  • We don’t always want high face validity
    • If knowing what is being measured will change how people respond we need to have low face validity
    • If you know the construct is safe, we can have high face validity because we don’t want to waste their time
31
Q

What is heterotrait monomethod?

A
  • Multiple traits, one method.
  • Also known as method bias.
  • If only one correlation is found to be significant, there could be an observer issue
  • If they are all significant in one direction, that is a true bias.
32
Q

heterotrait, heteromethod

A

discriminant validity (MTMM)

  • Different trait, different method
  • we don’t want a significant correlation
33
Q

monotrait, heteromethod

A

Convergent Validity (MTMM)

  • Same trait, different method
  • Look for significant correlations
  • These form the validity diagonals on the MTMM matrix
34
Q

Criterion

A

A measure that could be used to determine the accuracy of a decision

35
Q

Criterion-Related Validity

A
  • Validity for decision making
  • Association of test scores that are obtained on our measure with some sort of quantifiable outcome.
  • There are predictive and concurrent validation strategies within this.
36
Q

Predictive validation strategies

A
  • Involve correlations
  • Test related to criterion
  • Differs from concurrent in when the criterion is obtained relative to the test
  • This is the gold standard
  1. Obtain test scores
  2. Hire everyone
  3. Obtain performance measures and correlate with these test scores
  • cons
    • time
    • place
    • cost
    • don’t turn anyone away (morally iffy)
    • maintain peak for the “trial”
    • might not have enough positions
    • may have unproductive people just standing around
  • pro
    • range restriction does not exist (positive thing)
    • matches population
37
Q

Concurrent Validation Strategies

A
  • Involve correlation
  • Test related to criterion
  • Practical alternative
  • Test scores and criterion scores from a preselected population are obtained at the same time
  • Pros
    • More practical
    • Easier
    • Cost effective
    • Coefficients similar to those of predictive
  • Con
    • Range restriction exists
      • Would know this ahead of time
      • Reduces correlation
      • Direct (predictor), indirect (criterion)
        • Direct: through selection on the predictor (maximize the number of positives and negatives)
        • Indirect: through the criterion
      • Cannot screen out failures (the sample was already selected)
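
A short simulation sketch (made-up numbers) of why range restriction shrinks the validity coefficient in a concurrent design: correlate test and criterion in the full applicant pool, then only among those above the selection cut.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 2000

test = rng.normal(0, 1, n)
criterion = 0.5 * test + rng.normal(0, 1, n)   # job performance partly related to the test

full_r = np.corrcoef(test, criterion)[0, 1]    # predictive-style: everyone kept

hired = test > np.median(test)                 # concurrent-style: only the already-selected half
restricted_r = np.corrcoef(test[hired], criterion[hired])[0, 1]

print(full_r, restricted_r)   # the range-restricted correlation comes out noticeably smaller
```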
38
Q

Base and Selection Rate Essay Question

A

If you have high validity, it will raise your percentages of true positives and true negatives. Even if validity were not as high but still significant, those percentages would still be maximized, just not as high. This describes the outcomes when a test is used as the predictor.

39
Q

Why is it good to have a measure that appears to be valid, but not at the expense of empirical validity?

A
  • If you have a construct like social desirability, you don’t want it to be blatantly obvious, or people will give answers that make them look like a good person
    • If knowing what it is that is being measured is going to influence the way that people respond, we want to shy away from something with very high face validity.
    • However, if it does not matter then it is better to have a measure with high face validity. Go with a measure with high face validity because people hate their time being wasted.
    • Face validity is important unless it is going to change the response behavior
  • Empirical validity matters more because people are not simply giving desirable responses
    • Face validity can clue people in, making it less valid
40
Q

What is Base Rate, and how is it calculated?

A
  • Level of performance on the criterion in the general population.
  • How many people would be successful from the general population
  • e.g.: There is a lower base rate for brain surgery than for baking cookies.
  • Calculation example: If 75% of the population would be successful, the base rate would be .75.
41
Q

What is selection ratio, and how is it calculated?

A
  • Ratio of positions to applicants
  • The number of positions you have open compared to the number of people applying
  • Calculation example: If you have 30 people apply for 3 jobs, the selection ratio = 10% or .10.
42
Q

4 Possible Outcomes for Decisions

A
  1. Accept someone that will be successful = True Positive
  2. Reject someone that will be successful = False Negative
  3. Accept someone that would fail = False Positive
  4. Reject someone that would fail = True Negative
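
A small simulation sketch of these four outcomes under an assumed validity, base rate, and selection ratio (all numbers hypothetical):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 10_000
validity = 0.5          # assumed correlation between test and criterion
base_rate = 0.5         # half of the applicant pool would succeed
selection_ratio = 0.2   # only 20% of applicants can be accepted

test = rng.normal(0, 1, n)
criterion = validity * test + np.sqrt(1 - validity**2) * rng.normal(0, 1, n)

would_succeed = criterion > np.quantile(criterion, 1 - base_rate)
accepted = test > np.quantile(test, 1 - selection_ratio)

true_pos = np.sum(accepted & would_succeed)      # accept someone who would be successful
false_neg = np.sum(~accepted & would_succeed)    # reject someone who would be successful
false_pos = np.sum(accepted & ~would_succeed)    # accept someone who would fail
true_neg = np.sum(~accepted & ~would_succeed)    # reject someone who would fail

print(true_pos, false_neg, false_pos, true_neg)
```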
43
Q

What is the impact of a large base rate? What is the impact of a small base rate?

A

Base Rate is Large

  • Most people would be successful
  • High number of true positives and a high number of false negatives
  • False negatives are high because people who really are successful get turned away due to the limited number of positions

Base Rate is Small

  • Hardly anybody would be successful
  • High number of true negatives, high number of false positives
44
Q

What level of base rate is ideal in terms of accurate decision making?

A
  • Looking for a base rate of .5
  • This keeps the influence of the base rate minimal
  • You can estimate the base rate and use that estimate to see how it is impacting decision making
45
Q

What level of selection ratio has the biggest impact of correct decision making?

A

If the selection ratio is high (close to 1), the number of positions and applicants are nearly equal.

  • There is no point in doing the assessment
  • A high selection ratio keeps the test from having an impact on your decisions
  • When the selection ratio is low, the test has its biggest impact on correct decision making
46
Q

Response Bias

A
  • A cognitive influence in which the respondent feels compelled to respond in a certain way rather than reflect their actual beliefs
  • it is a method of responding that goes beyond the construct
    • construct irrelevant variance
  • Affects reliability and validity
    • Decreases both of these
47
Q

What is acquiescence bias?

A
  • When an individual agrees with statements without regard for the statement’s meaning (Yea-Saying).
48
Q

Why is it difficult to distinguish between possible acquiescent responses and valid responses for an individual?

A
  • Difficult because it is hard to tell what a genuine response is; you’ll need to use multiple measures or reverse-coded items (see the sketch below)
    • An easy way to spot if a person does this is to have both positively and negatively worded items in the measure (ex: I love kittens, I hate kittens)
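
A minimal sketch of that reverse-wording check using the kittens example with made-up ratings: a respondent who agrees with both contradictory items gets flagged for review.

```python
import numpy as np

# 1-5 agreement ratings on a contradictory item pair (hypothetical data)
love_kittens = np.array([5, 4, 2, 5, 1])   # "I love kittens"
hate_kittens = np.array([1, 2, 4, 5, 5])   # "I hate kittens"

# Agreeing with both contradictory statements suggests yea-saying rather than a genuine response
possible_acquiescence = (love_kittens >= 4) & (hate_kittens >= 4)
print(possible_acquiescence)   # flags the fourth respondent for a closer look
```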
49
Q

Extreme reporting

A
  • response bias
    • Overuse of extreme options
    • Hard to identify on a single item
      • Look at a pattern of items
50
Q

What is selection bias and when does this occur?

A
  • bias occurs when the survey sample does not accurately represent the population
    • underrepresentation
51
Q

Moderate reporting

A
  • response bias
    • Avoidance of extreme options
    • What do you do?
      • If it is obvious, toss it, but it could be their genuine belief
      • With 1 or 2 instances in a large data set, leave them in
      • Keep track of how many and state it as a limitation
    • When most likely to occur
      • Continuous scale of measurement
      • Respondent is asked to pick a side on an issue
      • Respondent has doubt about the issue
52
Q

What is social desirability response bias? What are some possible sources that can increase the likelihood of an individual slipping into a social desirability response bias?

A
  • It is the tendency for a person to respond in a way that seems socially appealing, regardless of their true characteristics.
  • People can slip into this based on their region or if there is a consequence for not answering in a socially desirable way.
53
Q

Malingering

A
  • response bias
    • An individual fakes bad for secondary gain
    • Occurs in 7.3-27% of general psychology evaluations
    • Example
      • VA, where vets who appeared worse got services faster
    • Tests for malingering
      • Want to identify secondary gain
54
Q

In regard to social desirability: What is the difference between impression management and self-deception?

A
  1. Impression Management - when someone is trying to impress someone else
  2. Self-deception - when someone is fooling themselves to believe that they are better than they are (a pervasive trait)
55
Q

undercoverage

A
  • Undercoverage occurs when some members of the population are inadequately represented in the sample
    • Example: the Literary Digest poll
      • Used telephone directories and car registrations
      • These favored the wealthy and underrepresented low-income voters
56
Q
  • What are non-response bias and voluntary response bias? When/how might these happen?
  • what impact might they have on sampling?
A

Non-response bias:

  • Sometimes, individuals chosen for the sample are unwilling or unable to participate in the survey.
  • Bias that results when respondents differ in meaningful ways from non-respondents.

Voluntary Response Bias:

  • Voluntary response bias occurs when sample members are self-selected volunteers, as in voluntary samples.
  • Ex: Call in radio show
57
Q

What is sampling error? Under what circumstances might sampling error increase?

A
  • The variability among statistics from different samples. (deviation from one sample to another)
  • Even though one sample differs from another, the average of the sample statistics will be representative of the population parameter
  • a small sample size will increase sampling error
58
Q

What impact does sample size have on sampling error?

A

Increasing the sample size tends to reduce the sampling error; that is, it makes the sample statistic less variable.
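
A quick simulation sketch of this point with a hypothetical population: the spread of sample means (the sampling error) shrinks as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(7)
population = rng.normal(100, 15, size=100_000)   # hypothetical population with mean 100

def spread_of_sample_means(n, reps=2000):
    """Draw many samples of size n and return how much their means vary."""
    means = [rng.choice(population, size=n).mean() for _ in range(reps)]
    return np.std(means)

print(spread_of_sample_means(10))    # small samples: large sample-to-sample variability
print(spread_of_sample_means(200))   # larger samples: much smaller sampling error
```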

59
Q

Voluntary Response Bias

A
  • Sample members are self-selected volunteers, as in voluntary samples
  • Examples: call-in radio shows
    • Overrepresents people with strong opinions
60
Q

Bias Due to Unrepresentative Samples

A
  • A good sample is representative. This means that each sample point represents the attributes of a known number of population elements.
  • Bias often occurs when the survey sample does not accurately represent the population. The bias that results from an unrepresentative sample is called selection bias.
61
Q

What impact does increasing sample size have on survey bias?

A

Increasing sample size does not affect survey bias. A large sample size cannot correct for the methodological problems (undercoverage, non-response bias, etc.) that produce survey bias.