First Exam Flashcards

1
Q

Reliability

A

How consistent the entire instrument is; the closer the reliability coefficient is to 1, the more reliable the instrument

2
Q

Psychometric theory looks at 2 things

A

The entire test (reliability) and item quality (non-dichotomous & dichotomous items)

3
Q

How do you construct an instrument?

A
By looking at the entire test (reliability) and at item quality (non-dichotomous & dichotomous items)
4
Q

The entire test has 4 different types of reliability

A

Inter-rater, test-retest, internal consistency, and parallel forms

5
Q

Non-dichotomous and how it relates to variance

A

You want higher variance to get a better normal curve; the more items you add, the more you increase the variance

6
Q

Validity

A

Accuracy; how well the instrument measures what it is intended to measure. All of probability is based on infinity

7
Q

Reliability and error

A

Error can affect the consistency of scores

8
Q

2 types of error

A

Systematic error & random error

9
Q

Systematic error

A

Errors that occur consistently because of a particular characteristic of the person being tested (e.g., reading proficiency)

10
Q

Random error

A

Errors that occur by chance (e.g., blacking out, distraction); more common than systematic error

11
Q

Different types of random error

A

Content differences, subjective scoring and temporal instability

12
Q

Content differences (content based)

A

Non-standardized administrations (the administrator may inadvertently speak differently when administering the test); ex: court-ordered testing or a child using the restroom during the test

13
Q

Subjective scoring

A

Rater differences; raters' subjective views of the client may differ

14
Q

Temporal instability

A

Things change from day to day; ex: one day the test taker had the flu, or the first day of testing went well but on the second day there was an earthquake and performance went down

15
Q

What are some ways to decrease measurement error

A

Writing clear items, making test instructions easily understood, adhering closely to the prescribed conditions for administering an instrument, training raters, and making subjective scoring rules as explicit as possible

16
Q

Where does most measurement error come from?

A

Most comes from the person administering the test, but it decreases as the administrator becomes more experienced

17
Q

Test-retest reliability (coefficient of stability)

A

Taking a single group of subjects, repeatedly testing them on the same instrument at different times, and correlating the scores
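The coefficient of stability is just the correlation between the two administrations. A minimal Python sketch, with made-up scores for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    # Pearson correlation between two lists of scores
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for five examinees tested two weeks apart
time1 = [10, 12, 14, 16, 18]
time2 = [11, 13, 13, 17, 19]
coefficient_of_stability = pearson_r(time1, time2)
```

A value near 1 means the scores held stable across the two testings.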

18
Q

What is the gold standard for test-retest reliability

A

2 weeks between the first test and the second test; this is where you get optimal test-retest reliability

19
Q

In test-retest reliability what is the difference between the shorter and longer gap?

A

The longer the time gap, the lower the correlation; the shorter the gap, the more similar the factors that contribute to the error

20
Q

Artificial inflation

A

When researchers use the shorter gap to get a better correlation

21
Q

Parallel forms reliability

A

Assessing if two forms of the same instrument produce similar results when testing the same person (sometimes hard to achieve)

22
Q

What is form A & form B (parallel forms reliability)

A

How reliable the two forms are with one another; having two forms eliminates practice effects

23
Q

What is a key problem with parallel forms reliability?

A

Difficult to randomly divide and hard to create large number of items

25
Q

What is a key part of parallel forms reliability?

A

Developing a large number of items and then randomly dividing them into two tests

26
Q

Coefficient of equivalence

A

How correlated the scores are when a person takes two different forms of the same test

27
Q

When should the two forms for parallel forms reliability be sent out?

A

They should be administered at least 2 weeks apart

28
Q

What happens if the correlation between the two administrations is lower than .2?

A

There is significant measurement error

29
Q

What happens if you administer the forms on the same day for parallel forms reliability?

A

The test may reflect state rather than trait, and you will not have a statistically significant difference

30
Q

Internal consistency reliability

A

How related items are within the entire scale and within the subscales

31
Q

What do we want with internal consistency reliability?

A

The content should be similar for reliability to be high; you need an adequate number of items, and each item should appropriately reflect the underlying construct

32
Q

Different types of internal consistency reliability

A

Split-half reliability, Kuder-Richardson #20 (KR-20), Cronbach's alpha

33
Q

Split half reliability

A

Split the examinees' item scores into two halves and then correlate the scores on the two halves
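A minimal Python sketch of an odd-even split with the Spearman-Brown correction (which steps the half-test correlation up to full test length); the response data are made up for illustration:

```python
from math import sqrt

def _pearson(x, y):
    # Pearson correlation between two lists of scores
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def split_half_reliability(item_scores):
    # item_scores: one row of per-item scores per examinee.
    # An odd-even split avoids the natural-order and timing problems
    # of a first-half vs. second-half split.
    odd = [sum(row[0::2]) for row in item_scores]
    even = [sum(row[1::2]) for row in item_scores]
    r_half = _pearson(odd, even)
    # Spearman-Brown correction for full test length
    return 2 * r_half / (1 + r_half)

# Hypothetical dichotomous responses (1 = correct): four examinees, six items
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
reliability = split_half_reliability(responses)
```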

34
Q

What does split-half reliability look like in speeded tests?

A

An odd-even split may produce artificially high internal consistency if the examinee runs out of time

35
Q

How to get a good idea of split-half reliability

A

Split the test into odd-numbered and even-numbered questions and correlate those halves; this gives a better idea of split-half reliability

36
Q

What are some problems with split half reliability?

A

Natural order of test taking (content is not the same with the first half as the second half) & Issue of a timed test (some people don’t get to the second half)

37
Q

Kuder-Richardson #20 (KR-20)

A

A formula that produces split-half-style reliability under the assumption that the questions are scrambled
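The standard KR-20 formula is (k/(k-1)) * (1 - Σpq/σ²), where p is each item's proportion correct, q = 1 - p, and σ² is the variance of the total scores. A minimal Python sketch with hypothetical right/wrong data:

```python
def kr20(responses):
    # responses: one row of 0/1 item scores per examinee (dichotomous only)
    n = len(responses)
    k = len(responses[0])
    # p*q per item: p = proportion answering the item correctly
    pq = []
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        pq.append(p * (1 - p))
    totals = [sum(row) for row in responses]
    mean_t = sum(totals) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n  # population variance
    return (k / (k - 1)) * (1 - sum(pq) / var_t)

# Hypothetical dichotomous data: four examinees, six right/wrong items
responses = [
    [1, 1, 1, 1, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
reliability = kr20(responses)
```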

38
Q

How does Kuder-Richardson stop a confound in your test?

A

By removing the natural order of the items

39
Q

The drawbacks of KR-20

A

Only works with dichotomous scaling systems (only allows for right or wrong question responses)

40
Q

Cronbach's Alpha

A

Can be used to assess internal consistency for those tests that have different scoring systems

41
Q

When and how can Cronbach's alpha be used?

A

Can be used on any scoring system and allows for scrambling of the questions; used more than any other measure of internal consistency; equivalent to the average of all possible split-half correlations
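Alpha generalizes KR-20 to any scoring system: (k/(k-1)) * (1 - Σ item variances / total-score variance). A minimal Python sketch with hypothetical Likert-scale data:

```python
def cronbach_alpha(item_scores):
    # item_scores: one row of item scores per examinee; any scoring system
    n = len(item_scores)
    k = len(item_scores[0])

    def pvar(xs):
        # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_vars = [pvar([row[j] for row in item_scores]) for j in range(k)]
    total_var = pvar([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point Likert responses: four examinees, three items
likert = [
    [5, 4, 5],
    [4, 4, 4],
    [2, 3, 2],
    [1, 2, 1],
]
alpha = cronbach_alpha(likert)
```

With 0/1 data this reduces to the same value KR-20 gives.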

42
Q

Internal consistency & Cronbach's alpha

A

High coefficient alpha does not always mean that you are measuring only one factor or latent construct (unidimensionality)

43
Q

What do we assume in internal consistency?

A

We assume unidimensionality, but many tests are inadvertently multidimensional

44
Q

What do dimensional or multidimensional tests look like?

A

It can mean more than one factor is being measured (ex. AP history test measures knowledge, but also writing ability)

45
Q

How can Cronbach's alpha be increased or artificially inflated?

A

If test takers are a homogeneous group; you need heterogeneity in the group (it will be more accurate with a general group of people)

46
Q

Interrater reliability

A

Assessing the degree of consistency between multiple raters

47
Q

2 kinds of interrater reliability

A

Kendall's coefficient of concordance & Cohen's Kappa

48
Q

Kendall’s coefficient of concordance

A

Degree of consistency amongst raters that rank order people/objects

Rank-order consistency: e.g., Miss Universe, where different judges rank people in an order of 1, 2, 3, 4, 5 to see if the judges' rankings correlate with one another
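A minimal Python sketch of Kendall's W for untied rankings (W = 12S / (m²(n³ - n)), where S is the squared deviation of each object's rank sum from the mean rank sum); the judges' ranks are invented for illustration:

```python
def kendalls_w(rankings):
    # rankings: one list of ranks per rater over the same n objects (no ties)
    m = len(rankings)        # number of raters
    n = len(rankings[0])     # number of objects ranked
    rank_sums = [sum(r[j] for r in rankings) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((rs - mean_sum) ** 2 for rs in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical pageant example: three judges each rank four contestants
judges = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
w = kendalls_w(judges)
```

W is 1 when every judge produces the identical ranking and falls toward 0 as the rankings diverge.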

49
Q

Cohen's Kappa

A

Degree of consistency amongst raters that classify items into discrete categories

50
Q

Example of Cohen's kappa

A

Two different raters assess the same group of 30 people; Cohen's kappa identifies how consistently the raters classify which patients are depressed or not depressed
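Kappa corrects observed agreement for the agreement expected by chance: κ = (p_obs - p_exp) / (1 - p_exp). A minimal Python sketch with hypothetical ratings (ten patients rather than 30, for brevity):

```python
def cohen_kappa(r1, r2):
    # r1, r2: category labels assigned by two raters to the same cases
    n = len(r1)
    cats = set(r1) | set(r2)
    p_obs = sum(a == b for a, b in zip(r1, r2)) / n
    # chance agreement expected from each rater's marginal proportions
    p_exp = sum((r1.count(c) / n) * (r2.count(c) / n) for c in cats)
    return (p_obs - p_exp) / (1 - p_exp)

# Hypothetical: two raters classify ten patients as depressed (D) or not (N)
rater1 = ["D", "D", "D", "N", "N", "N", "N", "N", "D", "N"]
rater2 = ["D", "D", "N", "N", "N", "N", "N", "D", "D", "N"]
kappa = cohen_kappa(rater1, rater2)
```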

51
Q

Normal Curve

A

The probability that an observation under the normal curve lies within 1 SD of the mean is approx 0.68 & 2 SD of the mean is approx 0.95 & 3 SD of the mean is approx. 0.99
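These probabilities can be checked directly with the error function, since P(|Z| < k) = erf(k/√2) for a standard normal variable. A minimal Python sketch:

```python
from math import erf, sqrt

def prob_within(k_sd):
    # P(observation falls within k standard deviations of the mean)
    # for a normal curve
    return erf(k_sd / sqrt(2))

# The 68 / 95 / 99.7 rule falls out directly
within_1 = prob_within(1)  # ~0.68
within_2 = prob_within(2)  # ~0.95
within_3 = prob_within(3)  # ~0.997
```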

52
Q

Why is SEM important for testing?

A

SEM is based on the idea that you cannot test an individual an infinite number of times; standard error is always present

53
Q

Why is JND difficult to apply to psychological constructs?

A

It is used to determine a level of sensory difference (like hearing or sight), but there is variability in the expression of disorders in humans

54
Q

What is item analysis and how is it related to test construction?

A

Examining item quality to map the construct we have defined. We then look at dichotomous and non-dichotomous measures to determine item quality (variance, covariance, etc.)

55
Q

How does one construct a test?

A

Need to determine what area or domain you want to examine; homogenous content; tests made for repeated use require validation

56
Q

Scaling models

A

Unidimensional scaling: subject-centered methods, stimulus-centered methods, and response-centered approaches

57
Q

Subject centered methods

A

The test developer's primary interest is locating the individual at different points on the continuum (Likert scale)

58
Q

Stimulus centered methods

A

Psychophysics & JND: tones are presented to determine the absolute threshold for experiencing a sensation, but it is not always clear where the difference lies, and not all of us agree on what the difference is. Subject competency is needed to report the JND

59
Q

Response centered approaches

A

Each respondent is asked to rank order his or her preference for a set of stimuli or to rank order a set of statements in terms of their proximity to his or her own personal beliefs. Allows to scale psychological distance between separated categories

60
Q

Heterogeneity

A

Difference in character or content

62
Q

Homogenous

A

Same character or content

63
Q

Meta-analysis

A

Analysis combining multiple studies with the same research question

64
Q

Bivariate

A

Involving two variables (examining the relationship between two variables)

65
Q

Inferential statistics

A

Take sample data and make inferences on the population

66
Q

Descriptive statistics

A

Look at trends in the sample and understand them based on the sample itself

67
Q

Assessment

A

An overall testing score interpreted in the context of history (holistic)

68
Q

Testing

A

A quantitative score with no larger context

69
Q

Niche building

A

Creating, seeking out, and ending up in environments that reinforce your traits; done both consciously and unconsciously

70
Q

Reliability and standard error of measurement

A

As the reliability of the instrument increases, the standard error of measurement goes down; if your test yields consistent scores, your error will of course go down
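The standard formula, SEM = SD × √(1 − reliability), shows the relationship directly. A minimal Python sketch on a hypothetical IQ-style scale (SD = 15):

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    # SEM shrinks as the reliability coefficient approaches 1
    return sd * sqrt(1 - reliability)

# Example with an IQ-style scale (SD = 15)
sem_low  = standard_error_of_measurement(15, 0.75)  # less reliable test
sem_high = standard_error_of_measurement(15, 0.96)  # more reliable test
```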

71
Q

Achievement tests

A

They try to determine if a specific skill set or knowledge base has been acquired

72
Q

Popham & Husek (1969)

A

Learned that you cannot use traditional reliability since you are not interested in how someone does in comparison to a group of others; you are interested in how someone performs in regard to a specific criterion

73
Q

Criterion

A

Anything that has real-world implications
Ex: if a lawyer fails the bar exam they cannot become a lawyer; these tests affect your real life because they affect your moving forward in a profession

74
Q

2 objectives achievement tests scores can give you

A

The relative position of the examinee's score in a distribution of scores (z-score) & the degree to which the person has attained the goal of a specific instruction (ex: comp exam)

75
Q

Z score

A

Measured in terms of standard deviations from the mean; the relative position of the examinee's score in a distribution of scores
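A minimal Python sketch of the formula z = (x − mean) / SD, using the standard Wechsler IQ (mean 100, SD 15) and MMPI T-score (mean 50, SD 10) calibrations:

```python
def z_score(x, mean, sd):
    # standard deviations above (or below) the mean
    return (x - mean) / sd

# Scores from differently calibrated scales become directly comparable:
iq_z = z_score(115, 100, 15)  # Wechsler IQ: mean 100, SD 15
t_z  = z_score(65, 50, 10)    # MMPI T-score: mean 50, SD 10
```

Here the T-score of 65 (z = 1.5) sits further above its mean than the IQ of 115 (z = 1.0), even though the raw numbers are not comparable.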

76
Q

Proportion correct score

A

Percentage of correct answers from a randomly determined number of test items (you don’t need to know how others performed if you know the percentage of correct answers obtained)

77
Q

Criterion referenced tests

A

They look at development (all of these tests are arbitrary); a test that measures a student's performance against a set of predetermined standards or criteria

78
Q

Domain score

A

The proportion of items in the domain that the examinee answers correctly

79
Q

Mastery allocation

A

A cutoff score that classifies examinees into two categories, master vs. non-master (ex: EPPP)

80
Q

What does a z score allow for

A

Allows for comparison across variables that are calibrated or scaled differently; it is independent of scaling and calibration

81
Q

What do z-scores do for the WAIS/WISC (IQ tests) & MMPI?

A

These have different scoring systems, which makes their raw scores not comparable, but the z-score is able to compare them

82
Q

Absolute error

A

Using an examinee's mean score as a representation of his or her true universe score

83
Q

How is absolute error calculated

A

By summing all the error variance

84
Q

How is criterion-referenced reliability examined?

A

The lower the error, the better the examinee's score represents his or her domain-referenced true knowledge

85
Q

Reliability of classification

A

Does the observed match what we predict? We want to know who passes and fails as well as how they are classified

86
Q

Predicted

A

What we expect to happen

87
Q

Observed

A

What has actually happened

88
Q

The percentage of items people are getting correct will affect

A

The reliability of achievement tests

89
Q

Where is the true reliability

A

It is in the middle of the score distribution, not in the tails; we want the middle to be lower to show better reliability

90
Q

Should reliability be high?

A

Yes, and the different reliability estimates should all be similar

91
Q

Homogenous samples and reliability

A

They have lower reliability compared to heterogeneous samples

92
Q

Self report and reliability problems (2 major components)

A

Literal meaning and pragmatic meaning

93
Q

Literal meaning

A

Semantic understanding of sentence structure

94
Q

Pragmatic meaning

A

Inferences about the question's intent

95
Q

Issues with reliability & self report

A

Ex: "How are you doing?" leads to interpretation by the participant in the conversation; this can cause issues with reliability because the client may interpret the question differently

96
Q

Self reports and reference periods

A

When asked to respond to something that occurred last week vs. last year, you find differential responding

97
Q

Differential responding

A

Respondents interpret a shorter reference period as implying frequency and a longer one as implying greater intensity of the event

98
Q

Self reports and question context

A

Respondents change their answers based on the researcher's affiliation, or the response categories themselves can change the way a patient responds

99
Q

Self report and context

A

Preceding questions in a survey or questionnaire influences the ways in which respondents evaluate items

100
Q

Internet & psychological testing

A

Internet provides a cheaper and faster way to update tests, translate tests, interpret scores quickly, can get more respondents quickly, can provide access to test materials quite cheaply, allows those in rural areas to be tested

101
Q

Internet and ethical considerations

A

Test security, keeping the testing items secured, test may discriminate, language barriers, minors taking tests, not giving informed consent accurately, how do you give feedback to individual, how do you deal with emotional trauma from results

102
Q

Psychologists should use what type of tests?

A

Tests whose validity and reliability have been established for the population being tested

103
Q

How do you evaluate whether an item or test question is good?

A

Done through statistical analysis of the test questions

104
Q

Intrinsic traits

A

qualities that are inherent to something or someone, and are not dependent on external circumstances

105
Q

Difference between multidimensional and unidimensional tests in Cronbach's alpha

A

Multidimensional tests have a lower Cronbach's alpha; unidimensional tests have a higher one