Test 2 Flashcards
Define Reliability
the degree to which test scores for an individual test taker or group of test takers are consistent over repeated applications
reliability coefficient
the results obtained from the statistical evaluation of reliability
define systematic error
when a single source of error always increases or decreases the true score by the same amount
define true score
the amount of the observed score that truly represents what you are intending to measure
define error component
the portion of the observed score produced by other variables that can impact the observed score
what is internal consistency
it bases the reliability of test scores on the number of items on the test and the intercorrelation among the items; therefore it compares each item to every other item
- How related the items (or groups of items) on the test are to one another. This is whether knowing how a person answered one item on the test would give you information that would help you correctly predict how he or she answered another item on the test
what is the bench mark number for internal consistency
.70 and above (roughly 70% true score and 30% error)
what is item-total correlations
the correlation of each item with the total of the remaining items
define average intercorrelation
the extent to which each item represents an observation of the same underlying thing (the connection between the items)
what is a split half
refers to determining a correlation between the first half of the measurement and the second half of the measurement
- divide the test into two halves and then compare the set of individual test scores on the first half with the set of individual test scores on the second half
what is the odd even method
refers to the correlation between even items and odd items of a measurement tool
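As a sketch (using made-up item scores, not data from these notes), the odd-even method can be computed by summing each person's odd items and even items, then correlating the two half-scores:

```python
import numpy as np

# hypothetical 0/1 item scores for 5 test takers on a 6-item test
# (rows = people, columns = items)
scores = np.array([
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 0],
])

odd_half = scores[:, 0::2].sum(axis=1)   # items 1, 3, 5
even_half = scores[:, 1::2].sum(axis=1)  # items 2, 4, 6

# Pearson correlation between the two half-test scores
r_half = np.corrcoef(odd_half, even_half)[0, 1]
```

Note this r is the half-test reliability; the Spearman-Brown formula (later in these notes) is usually applied afterwards to project it to the full-length test.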
advantages and disadvantages of the split half/odd-even method
Advantages:
- simplest method- easy to perform
- time and cost effective
- because you only need one administration
Disadvantages
- many ways of splitting (odd-even, 1st vs 2nd half, random)
- each split yields a somewhat different reliability estimate
- which one is the real reliability of the test?
what is test-retest reliability
measured by computing the correlation coefficient between the scores of two administrations
the same test is administered to the same group of people but there is a certain amount of time in between each test administration
what is the benchmark number for test - retest reliability
.50 and above
define practice effects
occurs when test takers benefit from taking the test the first time (practice), which enables them to solve problems more quickly and correctly the second time they take the test
define memory effects
a respondent may recall the answers from the original test, thereby inflating the reliability estimate
what is interrater reliability
- Interrater reliability means that if two different raters scored the scale using the scoring rules, they should attain the same result
how is interrater reliability measured?
measured by the % of agreement between raters, or by computing the correlation coefficient between the scores of two raters for the same set of respondents (the raters’ scoring is the source of error)
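A minimal sketch of the percent-agreement calculation, using hypothetical pass/fail ratings from two raters:

```python
# hypothetical 0/1 ratings from two raters on 10 respondents
rater_a = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 1, 1, 1, 1]

# count cases where the two raters gave the same rating
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = 100 * agreements / len(rater_a)  # 8 of 10 -> 80.0
```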
intrascorer reliability
whether each clinician was consistent in the way he or she assigned scores from test to test
what is the benchmark score for interrater reliability
- Here the criterion of acceptability is pretty high (ex. a correlation of at least .80 or agreement above 75%), but what is considered acceptable will vary from situation to situation
.80 and above
define parallel/alternative forms method
refers to the administration of two alternate forms of the same measurement device and then comparing the scores.
- Both forms of the tests are given to the same person and then you compare the scores
advantages and disadvantages of parallel/alternative forms method
Advantages
- eliminates the problem of memory effect
- reactivity effects (i.e. the experience of taking the test) are also partially controlled
- can address a wider array of sampling of the entire domain than the test-retest method
possible disadvantages
- are the two forms of the test actually measuring the same thing (same construct)
- more expensive and requires additional work, because two measurement tools have to be developed
what is generalizability theory
- theory of measurement that attempts to determine the multiple sources of consistency and inconsistency- known as factors or facets
- Identifies both systematic and random sources of inconsistency, allowing for the evaluation of interactions among different types of error sources
- Looks at all possible sources of errors and then separates each source of error and evaluates its impact on reliability
what are the limitations of generalizability theory
- you cannot measure every single source of error
- tougher to carry out because a lot of the work has to be done upfront: deciding what data to collect, how much data to collect, and which measures to use. All these sources of error have to be thought about in advance. With CTT you can administer the test first and then look at the factors with regard to reliability.
what is standard error of measurement (SEM)
an estimate of how much the observed test score might differ from the true test score
a statistic used to build a confidence interval around an obtained score. It represents the hypothetical distribution of scores we would have if someone took a test an infinite # of times
how to calculation SEM
SEM = SD x sqrt(1 - reliability coefficient)
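The formula can be sketched directly in Python (the SD of 15 and reliability of .91 below are just illustrative numbers):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

# e.g. a test with SD = 15 and a reliability coefficient of .91:
# sem(15, 0.91) = 15 * sqrt(0.09) = 4.5
```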
define confidence interval
Give an estimate of how much error is likely to exist in an individual’s observed score, that is, how big the difference between the individual’s observed score and his or her true score is likely to be
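A sketch of how the SEM is used to build such an interval (the observed score of 100, SD of 15, and reliability of .91 are hypothetical; 1.96 is the z value for a 95% interval):

```python
import math

def confidence_interval(observed, sd, reliability, z=1.96):
    """Return a (low, high) confidence interval around an observed score."""
    sem = sd * math.sqrt(1 - reliability)
    return observed - z * sem, observed + z * sem

# observed score 100, SD = 15, reliability = .91 -> SEM = 4.5,
# so the 95% interval is roughly 91.2 to 108.8
```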
what is cronbachs alpha
coefficient of internal consistency- commonly used. Used with interval-scale items. Determines which questions on the scale are interrelated. Used for test questions, such as rating scales, that have more than one possible answer
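A minimal sketch of the alpha formula, k/(k-1) * (1 - sum of item variances / variance of total scores):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha; items: rows = respondents, columns = items."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# perfectly correlated items yield alpha = 1.0
```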
what is Kuder Richardson (KR-20)
used for dichotomous items (ex. 0 or 1, true or false). Dichotomous scale, ordinal in nature. Used when there is either a right or wrong answer, i.e. only one correct answer
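A sketch of KR-20 for 0/1 items. Sources differ on whether the total-score variance uses n or n-1; population variance (n) is used here so that it matches the p*q item variances:

```python
import numpy as np

def kr20(items):
    """KR-20 for dichotomous (0/1) items; rows = respondents."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    p = items.mean(axis=0)               # proportion answering 1 per item
    pq = (p * (1 - p)).sum()             # sum of item variances (p*q)
    total_var = items.sum(axis=1).var()  # population variance of totals
    return (k / (k - 1)) * (1 - pq / total_var)
```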
what is spearman brown
used in split-half analysis to adjust the reliability coefficient. It is designed to estimate what the reliability would be if the test had not been cut in half
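For two halves the formula is 2r/(1+r), where r is the half-test correlation; a sketch:

```python
def spearman_brown(r_half):
    """Project full-test reliability from a split-half correlation."""
    return 2 * r_half / (1 + r_half)

# a split-half correlation of .60 projects to 2(.6)/1.6 = .75
```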
what is cohens kappa
a measure of interrater reliability for categorical ratings that corrects for chance agreement
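Kappa is (po - pe)/(1 - pe): observed agreement corrected by the agreement expected by chance. A sketch with hypothetical categorical ratings:

```python
def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical ratings."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # observed proportion of agreement
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # agreement expected by chance from each rater's category proportions
    p_expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
                     for c in categories)
    return (p_observed - p_expected) / (1 - p_expected)
```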
what is the benchmark for split-half
.70 and above
what is the benchmark for parallel or alternative form
.70 and above
define heterogeneity of items
the greater the heterogeneity (differences in the kind of questions or difficulty of the questions) of the items, the greater the chance for low reliability correlation coefficients. Ex. a test contains multiple choice, true and false, fill in the blank, etc
define homogeneity of items
the greater the homogeneity (similarity in the kind of questions or difficulty of the question) of the items, the greater the chance for high reliability correlation coefficients. The similarity of the questions ex. test contains only multiple choice
define validity
refers to whether we are measuring what we intended to measure, and whether we can do it accurately
what does the validity coefficient represent
the amount or strength of evidence of validity based on the relationship of the test and criterion
define construct validity
gradual accumulation of evidence that the scores on the test relate to observable behaviours in the way predicted by the underlying theory
involves comparing a new measure to an existing, valid measure
Usually an existing valid measure doesn’t exist. That is often why the new scale is being created in the first place
what is evidence based on test content
Involves logically examining and evaluating the content of a test (including the test questions, format, wording, and tasks required of test takers) to determine the extent to which the content is representative of the concepts that the test is designed to measure
what is evidence based on relations to other variables
Involves correlating test scores with other measures to determine whether those scores are related to other measures to which we would expect them to relate. We would also like to know if the test measures are not related to other measures to which we would not expect them to relate to
what is evidence based on internal structure
Focuses on whether the conceptual framework used in test development could be demonstrated using appropriate analytical techniques
what is evidence based on response processes
Involves observing test takers as they respond to the test or interviewing them when they complete the test
what is evidenced based on consequences of testing
Differentiating between intended and unintended consequences of testing
define content validity
is when we evaluate the test and we look at things such as test questions, the format, the scoring and the wording
define psychological construct
traits or characteristics that tests are designed to measure (usually not observable)
define concrete construct
attributes or characteristics that are easier to define and create items for; these are easily observable when compared to abstract characteristics or traits. ex. playing a piano
define abstract construct
characteristics or attributes that are harder to observe, for instance intelligence
what is construct explication
process of providing a detailed description of the relationship between specific behaviours and abstract constructs. the process of trying to figure out which items are inside or outside the test construct/content
the 3 steps of construct explication
- identify behaviours related to the construct
- identify other constructs and decide whether they are related or unrelated to the construct being measured
- identify behaviours that are related to the additional constructs and determine if these are related or unrelated to the construct being measured
define nomological network
a method for defining a construct by illustrating its relation to as many other constructs and behaviours as possible
define content validity ratio (CVR)
provides a measure of agreement among the judges/experts
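Lawshe's CVR formula is (ne - N/2)/(N/2), where ne is the number of judges rating an item "essential" and N is the total number of judges; a sketch:

```python
def content_validity_ratio(n_essential, n_judges):
    """Lawshe's CVR: (ne - N/2) / (N/2); ranges from -1 to +1."""
    half = n_judges / 2
    return (n_essential - half) / half

# 8 of 10 judges call an item essential: (8 - 5) / 5 = 0.6
```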
define face validity
Face validity answers the question “does it appear to the test taker that the question on the tests are related to the purpose for which the test is given”
Face validity is only concerned with how test takers perceive the appropriateness of the test
advantages of face validity
- If the respondent knows what information we are looking for, they can use “context” to help interpret the questions and provide more useful, accurate answers
- The respondent can make an educated decision
disadvantages of face validity
- If the respondent knows what information we are looking for, they might try to bend & shape their answers to what they think we want
- Ie. Faking good or faking bad
define convergent validity
the extent to which the scale correlates with measures of the same or related concepts
define divergent/discriminant validity
the extent to which the measure does not correlated with measures of unrelated or distinct concepts
what is the multitrait-multimethod (MTMM) matrix method
The researcher chooses two or more constructs that are unrelated in theory and two or more types of test to measure each of the constructs
used to assess a test’s construct validity
define heterotrait heteromethod
multiple traits and multiple ways of assessing those traits
define heterotrait monomethod
more than one trait measured using the same method of assessment
define monotrait-heteromethod correlations
same trait measured by two different methods
define monotrait-monomethod correlation
same trait using the same method
list of the multitrait- multi method matrix pairs from highest to lowest correlation
Highest- monotrait monomethod > monotrait heteromethod > heterotrait monomethod > heterotrait heteromethod -Lowest
define factor
a combination of variables that are intercorrelated and thus measure the same characteristics
define factor analysis
statistical techniques used to analyze patterns of correlations among different variables and measures
- Factor analysis looks at the relationship between all the factors and creates groups of factors based on the relationships between the factors
what is the goal of factor analysis
to reduce the number of dimensions needed to describe data derived from a large number of variables
how is factor analysis done
a series of mathematical calculations, designed to extract patterns of intercorrelations among a set of variables (ex. division questions are correlated with division question and multiplication questions with multiplication)
what is the subjective element to factor analysis
There is a subjective element to factor analysis because once the statistical results have been computed the researcher must review the grouping to see if they make sense based on the construct the test items were designed to measure
define exploratory factor analysis
Researchers do not propose a formal hypothesis about the factors that underlie a set of test scores, but instead use the procedure broadly to help identify underlying components
define confirmatory factor analysis
The researcher specifies in advance what they believe the factor structure of their data should look like and then statistically tests how well that model actually fits the data
The researcher relies on existing theoretical or empirical knowledge to design the model that is being tested
Evidence for construct validity would be provided if the results from the factor analysis fit the model created by the researcher. If not the model should be revised and retested
define kaiser guttman criteria
retains factors with eigenvalues greater than 1.0
to be retained, a factor must have an eigenvalue greater than 1.0
define eigenvalue
the amount of variance in the set of variables that is accounted for by a factor
define scree plot
factors are plotted on the horizontal axis and eigenvalues on the vertical axis; look for an elbow (the point where the curve levels off) and retain the factors before it
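A sketch tying the eigenvalue and Kaiser-Guttman cards together: compute the eigenvalues of a (hypothetical) correlation matrix, then retain factors with eigenvalues above 1.0.

```python
import numpy as np

# hypothetical correlation matrix for 3 variables: the first two are
# strongly related, the third is largely independent
R = np.array([
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.1],
    [0.1, 0.1, 1.0],
])

eigenvalues = np.sort(np.linalg.eigvalsh(R))[::-1]  # largest first

# Kaiser-Guttman rule: retain factors with eigenvalue > 1.0
retained = eigenvalues[eigenvalues > 1.0]  # one factor here (~1.82)
```

The eigenvalues always sum to the number of variables (the trace of the correlation matrix), which is why a value above 1.0 means the factor explains more variance than a single variable would on its own.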
advantages of factor analysis
- Simplifies interpretation
- Can learn more about the composition of variables
disadvantages of factor analysis
- Does the combination of factors capture the essential aspects of what is being measured?
- Are the factors generalizable to other populations (ex. different cultures, gender, individuals with disabilities)
define criterion related validity
measures the relationship between the predictor and the criterion, and the accuracy with which the predictor is able to predict performance on the criterion
define concurrent criterion related validity
criterion data are collected before or at the same time that the predictor is administered
define predictive criterion related validity
criterion data are collected after the predictor is administered
define subjective criteria
based upon an individual’s judgement, ex. peer ratings
define objective criteria
based upon specific measurements (how fast someone is, how many absences from class)