Test 2 Flashcards

1
Q

Define Reliability

A

the degree to which test scores for an individual test taker or group of test takers are consistent over repeated applications

2
Q

reliability coefficient

A

the results obtained from the statistical evaluation of reliability

3
Q

define systematic error

A

when a single source of error always increases or decreases the true score by the same amount

4
Q

define true score

A

the amount of the observed score that truly represents what you are intending to measure

5
Q

define error component

A

the portion of the observed score attributable to other variables that can impact the observed score

6
Q

what is internal consistency

A

measures the reliability of test scores based on the number of items on the test and the intercorrelations among those items; in effect, it compares each item to every other item
- how related the items (or groups of items) on the test are to one another, i.e., whether knowing how a person answered one item on the test would help you correctly predict how he or she answered another item

7
Q

what is the benchmark number for internal consistency

A

.70 and above (i.e., at least 70% true score and no more than 30% error)

8
Q

what are item-total correlations

A

the correlation of each item with the total of the remaining items
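
A minimal sketch of the calculation in Python (assumed, not part of the original deck): each item is correlated with the sum of the remaining items, using invented rating data.

```python
import numpy as np

def item_total_correlations(items: np.ndarray) -> np.ndarray:
    """Correlate each item with the total of the remaining items."""
    total = items.sum(axis=1)
    return np.array([
        np.corrcoef(items[:, j], total - items[:, j])[0, 1]
        for j in range(items.shape[1])
    ])

# Five respondents x four rating-scale items (invented data)
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 3, 4],
])
print(np.round(item_total_correlations(scores), 2))
```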

9
Q

define average intercorrelation

A

the extent to which each item represents an observation of the same thing (the connection between the items)

10
Q

what is a split half

A

refers to determining the correlation between the first half of the measurement and the second half of the measurement
- divide the test into two halves and then compare the set of individual test scores on the first half with the set of individual test scores on the second half

11
Q

what is the odd-even method

A

refers to the correlation between even items and odd items of a measurement tool

12
Q

advantages and disadvantages of the split half/odd-even method

A

Advantages:

  • simplest method, easy to perform
  • time and cost effective, because only one administration is needed

Disadvantages:

  • many ways of splitting (odd-even, 1st vs 2nd half, random)
  • each split yields a somewhat different reliability estimate
  • which one is the real reliability of the test?
13
Q

what is test-retest reliability

A

measured by computing the correlation coefficient between the scores of two administrations
the same test is administered to the same group of people, with a certain amount of time between the two administrations
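
A minimal sketch of the computation, assuming invented scores for six people tested twice; the Pearson correlation between the two administrations is the test-retest reliability coefficient.

```python
import numpy as np

# Same six people, two administrations of the same test (invented scores)
time1 = np.array([88, 72, 95, 60, 81, 77])
time2 = np.array([90, 70, 93, 65, 78, 80])

r = np.corrcoef(time1, time2)[0, 1]   # test-retest reliability coefficient
print(f"test-retest r = {r:.2f}")
```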

14
Q

what is the benchmark number for test-retest reliability

A

.50 and above

15
Q

define practice effects

A

occurs when test takers benefit from taking the test the first time (practice), which enables them to solve problems more quickly and correctly the second time they take the test

16
Q

define memory effects

A

a respondent may recall their answers from the original test, thereby inflating the reliability estimate

17
Q

what is interrater reliability

A
  • Interrater reliability means that if two different raters score the scale using the scoring rules, they should attain the same result
18
Q

how is interrater reliability measured?

A

measured by the percentage of agreement between raters, or by computing the correlation coefficient between the scores of two raters for the same set of respondents (the raters' scoring is the source of error)
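
A minimal sketch of both approaches on invented ratings: percent agreement counts exact matches, while the correlation treats each rater's scores as a variable.

```python
import numpy as np

# Two raters scoring the same ten respondents (invented ratings)
rater1 = np.array([3, 1, 4, 2, 5, 3, 2, 4, 1, 5])
rater2 = np.array([3, 2, 4, 2, 5, 3, 2, 3, 1, 5])

agreement = np.mean(rater1 == rater2) * 100   # % of exact agreement
r = np.corrcoef(rater1, rater2)[0, 1]         # correlation between the raters
print(f"agreement = {agreement:.0f}%, r = {r:.2f}")
```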

19
Q

intrascorer reliability

A

whether each clinician was consistent in the way he or she assigned scores from test to test

20
Q

what is the benchmark score for interrater reliability

A
  • Here the criterion of acceptability is pretty high (ex. a correlation of at least .80 or agreement above 75%), but what is considered acceptable will vary from situation to situation

.80 and above

21
Q

define parallel/alternative forms method

A

refers to the administration of two alternate forms of the same measurement device and then comparing the scores.
- Both forms of the tests are given to the same person and then you compare the scores

22
Q

advantages and disadvantages of parallel/alternative forms method

A

Advantages
- eliminates the problem of memory effects
- reactivity effects (ie. the experience of taking the test) are also partially controlled
- can sample a wider array of the entire domain than the test-retest method
Possible disadvantages
- are the two forms of the test actually measuring the same thing (the same construct)?
- more expensive and requires additional work, because two measurement tools have to be developed

23
Q

what is generalizability theory

A
  • a theory of measurement that attempts to determine the multiple sources of consistency and inconsistency, known as factors or facets
  • identifies both systematic and random sources of inconsistency, allowing for the evaluation of interactions between different types of error sources
  • looks at all possible sources of error, then separates each source of error and evaluates its impact on reliability
24
Q

what are the limitations of generalizability theory

A
  • you cannot measure every single source of error
  • tougher to complete because a lot of the work has to be done upfront: what data to collect, how much data to collect, and which measures to use. All of these sources of error have to be thought about upfront. With CTT (classical test theory) you can administer the test first and then look at the factors that affect reliability.
25
Q

what is standard error of measurement (SEM)

A

an estimate of how much the observed test score might differ from the true test score

a statistic used to construct confidence intervals around obtained scores. It represents the hypothetical distribution of scores we would have if someone took the test an infinite number of times

26
Q

how to calculate SEM

A

SEM = SD × √(1 − reliability coefficient)
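
A minimal sketch of the formula, using illustrative numbers (SD = 15, reliability = .91) rather than anything from the deck; the 95% confidence interval follows the usual observed score ± 1.96 × SEM convention from the next card.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1.0 - reliability)

s = sem(15.0, 0.91)   # 15 * sqrt(0.09) = 4.5
observed = 110.0      # a hypothetical observed score
lo, hi = observed - 1.96 * s, observed + 1.96 * s
print(f"SEM = {s:.2f}, 95% CI = [{lo:.1f}, {hi:.1f}]")
```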

27
Q

define confidence interval

A

Gives an estimate of how much error is likely to exist in an individual’s observed score, that is, how big the difference between the individual’s observed score and his or her true score is likely to be

28
Q

what is Cronbach's alpha

A

a commonly used coefficient of internal consistency. Works with interval-scale items and determines which questions on the scale are interrelated. Used for test questions, such as rating scales, that have more than one possible answer
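
A minimal sketch of the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), on invented rating-scale data.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scores."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Five respondents x four rating-scale items (invented data)
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 3],
    [5, 5, 4, 4],
    [1, 2, 1, 2],
    [3, 3, 3, 4],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")
```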

29
Q

what is Kuder Richardson (KR-20)

A

used for dichotomous items (ex. 0 or 1, true or false). Dichotomous scale, ordinal in nature. Used when there is either a right or wrong answer and only one correct answer
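
A minimal sketch of the KR-20 formula, k/(k-1) × (1 − sum(p×q) / variance of total scores), where p is the proportion answering each item correctly; the data are invented and sample variance (ddof=1) is one common convention.

```python
import numpy as np

def kr20(items: np.ndarray) -> float:
    """items: respondents x items matrix of 0/1 scores."""
    k = items.shape[1]
    p = items.mean(axis=0)                      # proportion correct per item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1.0 - (p * (1 - p)).sum() / total_var)

# Six respondents x five true/false items (invented data)
answers = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0],
])
print(f"KR-20 = {kr20(answers):.2f}")
```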

30
Q

what is spearman brown

A

used in split-half analysis to adjust the reliability coefficient. It is designed to estimate what the reliability would be if the test had not been cut in half
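
A minimal sketch combining an odd-even split with the Spearman-Brown correction, 2r / (1 + r), on invented item scores.

```python
import numpy as np

def spearman_brown(r_half: float) -> float:
    """Corrects a split-half correlation to full-test length: 2r / (1 + r)."""
    return 2.0 * r_half / (1.0 + r_half)

# Six test takers x six items (invented 0/1 scores)
scores = np.array([
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0, 0],
    [1, 0, 1, 1, 0, 1],
    [0, 0, 1, 0, 1, 0],
])
odd  = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5
even = scores[:, 1::2].sum(axis=1)   # items 2, 4, 6
r_half = np.corrcoef(odd, even)[0, 1]
print(f"split-half r = {r_half:.2f}, corrected = {spearman_brown(r_half):.2f}")
```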

31
Q

what is Cohen's kappa

A

a measure of interrater reliability that corrects the observed agreement between raters for agreement expected by chance
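
A minimal sketch of the standard formula, kappa = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each rater's marginal proportions; the ratings are invented.

```python
import numpy as np

def cohens_kappa(r1, r2) -> float:
    """kappa = (p_o - p_e) / (1 - p_e)."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    p_o = np.mean(r1 == r2)   # observed agreement
    p_e = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in np.union1d(r1, r2))
    return (p_o - p_e) / (1.0 - p_e)

# Two raters classifying the same ten responses (invented ratings)
rater1 = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater2 = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]
print(f"kappa = {cohens_kappa(rater1, rater2):.2f}")
```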

32
Q

what is the benchmark for split-half

A

.70 and above

33
Q

what is the benchmark for parallel or alternative form

A

.70 and above

34
Q

define heterogeneity of items

A

the greater the heterogeneity (differences in the kind of questions or the difficulty of the questions) of the items, the greater the chance for low reliability correlation coefficients. ex. a test contains multiple choice, true and false, fill in the blank, etc

35
Q

define homogeneity of items

A

the greater the homogeneity (similarity in the kind of questions or the difficulty of the questions) of the items, the greater the chance for high reliability correlation coefficients. ex. a test contains only multiple choice questions

36
Q

define validity

A

refers to whether we are measuring what we intended to measure, and whether we can do it accurately

37
Q

what does the validity coefficient represent

A

the amount or strength of evidence of validity based on the relationship of the test and criterion

38
Q

define construct validity

A

gradual accumulation of evidence that the scores on the test relate to observable behaviours in the way predicted by the underlying theory

involves comparing a new measure to an existing, valid measure
Often, existing valid measures don’t exist; that is often why the new scale is being created in the first place

39
Q

what is evidence based on test content

A

Involves logically examining and evaluating the content of a test (including the test questions, format, wording, and tasks required of test takers) to determine the extent to which the content is representative of the concepts that the test is designed to measure

40
Q

what is evidence based on relations to other variables

A

Involves correlating test scores with other measures to determine whether the scores relate to measures we would expect them to relate to. We would also like to know whether the test scores are unrelated to measures we would not expect them to relate to

41
Q

what is evidence based on internal structure

A

Focuses on whether the conceptual framework used in test development could be demonstrated using appropriate analytical techniques

42
Q

what is evidence based on response processes

A

Involves observing test takers as they respond to the test or interviewing them when they complete the test

43
Q

what is evidence based on consequences of testing

A

Differentiating between intended and unintended consequences of testing

44
Q

define content validity

A

is when we evaluate the test and look at things such as the test questions, the format, the scoring, and the wording

45
Q

define psychological construct

A

traits or characteristics that tests are designed to measure (usually not observable)

46
Q

define concrete construct

A

an attribute or characteristic that is easier to define and to create items for. These are easily observable compared to abstract characteristics or traits. ex. playing a piano

47
Q

define abstract construct

A

characteristics or attributes that are harder to observe, for instance intelligence

48
Q

what is construct explication

A

the process of providing a detailed description of the relationship between specific behaviours and abstract constructs; the process of trying to figure out what items fall inside or outside the test construct/content

49
Q

the 3 steps of construct explication

A
  1. identify behaviours related to the construct
  2. identify other constructs and decide whether they are related or unrelated to the construct being measured
  3. identify behaviours that are related to the additional constructs and determine if these are related or unrelated to the construct being measured
50
Q

define nomological network

A

a method for defining a construct by illustrating its relation to as many other constructs and behaviours as possible

51
Q

define content validity ratio (CVR)

A

provides a measure of agreement among the judges/experts
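
A minimal sketch of Lawshe's CVR formula, (n_e − N/2) / (N/2), where n_e is the number of judges rating an item "essential" and N is the total number of judges; the panel size here is invented.

```python
def cvr(n_essential: int, n_judges: int) -> float:
    """Lawshe's content validity ratio: (n_e - N/2) / (N/2)."""
    half = n_judges / 2.0
    return (n_essential - half) / half

# Ten judges rate an item; eight call it "essential" (invented numbers)
print(f"CVR = {cvr(8, 10):.2f}")   # (8 - 5) / 5 = 0.60
```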

52
Q

define face validity

A

Face validity answers the question “does it appear to the test taker that the questions on the test are related to the purpose for which the test is given?”
Face validity is only concerned with how test takers perceive the appropriateness of the test

53
Q

advantages of face validity

A
  • If the respondent knows what information we are looking for, they can use “context” to help interpret the questions and provide more useful, accurate answers
  • The respondent can make an educated decision
54
Q

disadvantages of face validity

A
  • If the respondent knows what information we are looking for, they might try to bend & shape their answers to what they think we want
  • Ie. Faking good or faking bad
55
Q

define convergent validity

A

the extent to which the scale correlates with measures of the same or related concepts

56
Q

define divergent/discriminant validity

A

the extent to which the measure does not correlate with measures of unrelated or distinct concepts

57
Q

what is the multitrait-multimethod (MTMM) matrix method

A

The researcher chooses two or more constructs that are unrelated in theory and two or more types of test to measure each of the constructs
used to assess a test’s construct validity

58
Q

define heterotrait heteromethod

A

multiple traits and multiple ways of assessing those traits

59
Q

define heterotrait monomethod

A

more than one trait across the same method of assessment

60
Q

define monotrait-heteromethod correlations

A

same trait measured by two different methods

61
Q

define monotrait-monomethod correlation

A

same trait using the same method

62
Q

list the multitrait-multimethod matrix pairs from highest to lowest correlation

A
Highest:
1. monotrait-monomethod
2. monotrait-heteromethod
3. heterotrait-monomethod
4. heterotrait-heteromethod
Lowest
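
A toy illustration of that ordering, with two traits measured by two methods; every correlation in the matrix is invented purely to show the expected pattern.

```python
import numpy as np

# Rows/columns are (trait, method) pairs: T1-M1, T2-M1, T1-M2, T2-M2
mtmm = np.array([
    [0.90, 0.35, 0.60, 0.20],
    [0.35, 0.88, 0.22, 0.58],
    [0.60, 0.22, 0.86, 0.30],
    [0.20, 0.58, 0.30, 0.91],
])

mono_mono   = mtmm[0, 0]   # same trait, same method (reliability diagonal)
mono_hetero = mtmm[0, 2]   # same trait, different methods (convergent validity)
hetero_mono = mtmm[0, 1]   # different traits, same method
hetero_het  = mtmm[0, 3]   # different traits, different methods
assert mono_mono > mono_hetero > hetero_mono > hetero_het
```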
63
Q

define factor

A

a combination of variables that are intercorrelated and thus measure the same characteristics

64
Q

define factor analysis

A

statistical techniques used to analyze patterns of correlations among different variables and measures
- factor analysis looks at the relationships among all the variables and creates groups (factors) based on the relationships between the variables

65
Q

what is the goal of factor analysis

A

to reduce the number of dimensions needed to describe data derived from a large number of variables

66
Q

how is factor analysis done

A

a series of mathematical calculations designed to extract patterns of intercorrelations among a set of variables (ex. division questions correlate with division questions and multiplication questions with multiplication questions)

67
Q

what is the subjective element to factor analysis

A

There is a subjective element to factor analysis because once the statistical results have been computed, the researcher must review the groupings to see if they make sense based on the construct the test items were designed to measure

68
Q

define exploratory factor analysis

A

Researchers do not propose a formal hypothesis about the factors that underlie a set of test scores, but instead use the procedure broadly to help identify underlying components

69
Q

define confirmatory factor analysis

A

The researcher specifies in advance what they believe the factor structure of their data should look like and then statistically tests how well that model actually fits the data
The researcher relies on existing theoretical or empirical knowledge to design the model that is being tested
Evidence for construct validity would be provided if the results from the factor analysis fit the model created by the researcher. If not the model should be revised and retested

70
Q

define the Kaiser-Guttman criterion

A

retains factors with eigenvalues greater than 1.0

to be considered a factor, it must have an eigenvalue greater than 1.0

71
Q

define eigenvalue

A

the amount of variance in the set of variables that is accounted for by a factor

72
Q

define scree plot

A

plots factors on the horizontal axis and eigenvalues on the vertical axis; look for an “elbow” where the eigenvalues level off
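
A minimal sketch tying the last three cards together: eigenvalues are taken from the item correlation matrix (a principal components view), the Kaiser-Guttman rule keeps those above 1.0, and plotting them in order would give the scree plot. The data are simulated with two underlying factors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: 200 respondents x 6 items; items 0-2 and 3-5 share a factor each
factor1 = rng.normal(size=(200, 1))
factor2 = rng.normal(size=(200, 1))
items = np.hstack([
    factor1 + 0.5 * rng.normal(size=(200, 3)),
    factor2 + 0.5 * rng.normal(size=(200, 3)),
])

corr = np.corrcoef(items, rowvar=False)        # item correlation matrix
eigenvalues = np.linalg.eigvalsh(corr)[::-1]   # sorted high to low
print("eigenvalues:", np.round(eigenvalues, 2))
print("factors retained (Kaiser-Guttman):", int((eigenvalues > 1.0).sum()))
```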

73
Q

advantages of factor analysis

A
  • simplifies interpretation
  • can learn more about the composition of variables

74
Q

disadvantages of factor analysis

A
  • does the combining of variables into factors capture the essential aspects of what is being measured?
  • are the factors generalizable to other populations (ex. different cultures, genders, individuals with disabilities)?
75
Q

define criterion related validity

A

measures the relationship between the predictor and the criterion, and the accuracy with which the predictor is able to predict performance on the criterion

76
Q

define concurrent criterion related validity

A

criterion data are collected before or at the same time that the predictor is administered

77
Q

define predictive criterion related validity

A

criterion data are collected after the predictor is administered

78
Q

define subjective criteria

A

based upon an individual’s judgement ex. peer ratings

79
Q

define objective criteria

A

based upon specific measurements (ex. how fast someone is, how many absences from class)