Measurement Flashcards
What is validity?
The degree to which a test measures what it is supposed to measure. Unlike reliability, validity does not require that scores be consistent from one occasion to the next. However, just because a measure is reliable, it is not necessarily valid.
Criterion Validity- to measure this, researchers must calibrate it against a known standard or against itself.
Consequential Validity- the positive or negative social consequences of a test. For an assessment to have consequential validity, it must not have unexpected negative social consequences.
Internal consistency- how consistently the items on a test measure a single construct or concept; it requires only a single administration of the test to one group of people, so no time interval is needed.
Define content validity.
Content Validity
Also known as logical validity, content validity is a verification that the method of measurement actually measures what it is expected to measure. It is a type of validity that focuses on how well each question taps into the specific construct in question. Establishing content validity requires recognized subject matter experts to evaluate whether test items assess the defined content, along with more rigorous statistical tests than are used for face validity. The content validity ratio (CVR) and content validity index (CVI) are used for this purpose.
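A minimal sketch in Python of how Lawshe's content validity ratio could be computed for one item, assuming each expert simply rates the item as essential or not; the panel size, votes, and item values below are hypothetical, and treating the scale-level CVI as the mean of item CVRs is only one common convention.

# Lawshe's CVR = (n_e - N/2) / (N/2), where n_e = experts rating the
# item "essential" and N = total number of experts on the panel.
def content_validity_ratio(essential_votes, total_experts):
    half = total_experts / 2
    return (essential_votes - half) / half

print(content_validity_ratio(8, 10))    # hypothetical: 8 of 10 experts say essential -> 0.6

# One common convention for a scale-level CVI: the mean of the item CVRs.
item_cvrs = [0.6, 0.8, 0.4, 1.0]        # hypothetical item-level CVR values
print(sum(item_cvrs) / len(item_cvrs))  # 0.7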
Define Construct Validity.
Construct validity refers to the extent to which a higher-order construct, such as help seeking, teacher stress, or dyslexia, is accurately represented in the particular study. Construct validity is fostered by having a good definition and explanation of the meaning of the construct of interest. Construct validity evidence involves the empirical and theoretical support for the interpretation of the construct.
Define criterion validity.
To measure the criterion validity of a test, researchers must calibrate it against a known standard or against itself. Comparing the test with an established measure is known as concurrent validity; testing it over a period of time is known as predictive validity. In psychometrics, criterion (or concrete) validity is the extent to which a measure is related to an outcome. Criterion validity is often divided into concurrent and predictive validity.
Define concurrent validity.
Concurrent validity refers to a comparison between the measure in question and an outcome assessed at the same time, whereas predictive validity is the degree to which scores on the test are consistent with a criterion measured later. Concurrent validity is established when the scores from a new measurement procedure are directly related to the scores from a well-established measurement procedure for the same construct; that is, there is a consistent relationship between the scores from the two measurement procedures. Criterion validity is a good test of whether newly applied measurement procedures reflect the criterion on which they are based. When they do not, this suggests that new measurement procedures need to be created that are more appropriate for the new context, location, and/or culture of interest.
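A minimal sketch of how concurrent validity evidence is often quantified: correlate scores on the new measure with scores on a well-established measure of the same construct collected at the same time. The data below are hypothetical, and SciPy is assumed to be available.

from scipy.stats import pearsonr

new_measure = [12, 15, 9, 20, 17, 11, 14, 18]   # hypothetical scores on the new test
established = [30, 35, 22, 44, 40, 27, 33, 41]  # same people, established test

# A strong positive correlation is evidence of concurrent validity.
r, p = pearsonr(new_measure, established)
print(f"concurrent validity coefficient r = {r:.2f} (p = {p:.3f})")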
Define consequential validity.
Consequential validity refers to the positive or negative social consequences of a particular test. For example, the consequential validity of standardized tests includes positive attributes such as improved student learning and motivation and ensuring that all students have access to equal classroom content. Consequential validity in testing describes the aftereffects and possible social and societal results of a particular assessment or measure. For an assessment to have consequential validity, it must not have unexpected negative social consequences.
What is reliability?
The consistency or stability of test scores. If scores are reliable, they will be similar on every occasion.
Identify ways of computing reliability.
Test-retest- the reliability of test scores over time
Give the test once, give it a second time, and correlate the two sets of scores. A high positive correlation indicates score reliability.
Parallel Forms-consistency of a group of individuals' scores on alternative forms of a test designed to measure the same characteristic. The forms are identical except for the specific items: same number of items, same difficulty level, items measuring the same construct; each form is administered, scored, and interpreted the same way.
The two sets of scores are then correlated; a high, positive correlation is desired.
Interrater reliability-the degree of agreement or consistency between 2 or more scorers, judges, or raters.
(Percentage agreement, Cohen's Kappa, Generalizability; see the sketch below)
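A minimal sketch of percentage agreement and Cohen's kappa for two raters coding the same cases; the codes below are hypothetical.

from collections import Counter

rater_a = ["yes", "no", "yes", "yes", "no", "yes", "no", "no", "yes", "yes"]
rater_b = ["yes", "no", "no",  "yes", "no", "yes", "yes", "no", "yes", "yes"]
n = len(rater_a)

# Percentage agreement: proportion of cases the raters coded identically.
p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Cohen's kappa corrects observed agreement for agreement expected by chance.
pa, pb = Counter(rater_a), Counter(rater_b)
p_expected = sum((pa[c] / n) * (pb[c] / n) for c in set(rater_a) | set(rater_b))
kappa = (p_observed - p_expected) / (1 - p_expected)

print(f"percentage agreement = {p_observed:.2f}, Cohen's kappa = {kappa:.2f}")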
What are the two types of criterion validity?
Concurrent-comparison between the measure in question and an outcome assessed at the same time.
This is established when the scores from a new measurement procedure are directly related to the scores from a well-established measurement procedure for the same construct.
Predictive- tells you how well a certain measure can predict future behavior; scores on one test are consistent with a criterion measured at a later time.
Which kinds of tests have more inter-item consistency, and why?
Homogeneous tests, because the items focus on one construct and sample a narrower content area.
What are the two indexes of internal consistency?
Cronbach's Alpha- aka coefficient alpha, this coefficient tells you the degree to which the items are interrelated. It should be greater than or equal to .70 for research purposes and somewhat greater in value for clinical testing purposes (decisions about single people). It provides an estimate of the average of all possible split-half correlations.
Split-Half Reliability-splitting the test into 2 equivalent halves and then assessing the consistency of scores across the 2 halves of the test, specifically by correlating the scores from the 2 halves. There are two ways to split: down the middle (not recommended) or use the odd-numbered items for one half of the test and the even-numbered items for the other (items can also be randomly assigned to one half or the other). Then score each half, compute the correlation between the scores of the two halves, and adjust the computed correlation using the Spearman-Brown formula (see the sketch below).
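A minimal sketch of both internal-consistency indices on a small hypothetical item-response matrix (rows are people, columns are items); NumPy is assumed to be available.

import numpy as np

scores = np.array([
    [4, 5, 4, 5, 3, 4],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 5, 5, 4],
    [3, 2, 3, 3, 2, 3],
    [4, 4, 5, 4, 4, 5],
    [1, 2, 1, 2, 1, 2],
])
n_people, k = scores.shape

# Cronbach's alpha: (k / (k - 1)) * (1 - sum of item variances / variance of totals).
item_vars = scores.var(axis=0, ddof=1)
total_var = scores.sum(axis=1).var(ddof=1)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Split-half reliability: odd-numbered items vs. even-numbered items, then the
# Spearman-Brown correction r_full = 2r / (1 + r) to estimate full-length reliability.
odd_half = scores[:, 0::2].sum(axis=1)
even_half = scores[:, 1::2].sum(axis=1)
r_halves = np.corrcoef(odd_half, even_half)[0, 1]
split_half = 2 * r_halves / (1 + r_halves)

print(f"Cronbach's alpha = {alpha:.2f}")
print(f"split-half reliability (Spearman-Brown corrected) = {split_half:.2f}")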
What are the differences between Classical Test Theory and IRT?
CTT-a theory about test scores that introduces 3 concepts: the test score (often called the observed score), the true score, and the error score. It uses a common estimate of measurement precision that is assumed to be equal for all individuals regardless of their attribute levels.
The longer the test, the more reliable it is. (A small sketch of the observed = true + error decomposition appears after this answer.)
IRT-item response theory, a general statistical theory about examinee item and test performance and how performance relates to the abilities that are measured by the items in the test.
A shorter yet more reliable test can be designed.
In IRT, item difficulty and examinee performance are expressed on a common ability (trait) level scale.
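A minimal sketch of the classical test theory decomposition (observed score = true score + error), using simulated hypothetical data to show that reliability can be viewed as the ratio of true-score variance to observed-score variance.

import random
random.seed(0)

true_scores = [random.gauss(50, 10) for _ in range(1000)]   # hypothetical true scores
observed = [t + random.gauss(0, 5) for t in true_scores]    # observed = true + error

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# CTT reliability = true-score variance / observed-score variance.
reliability = variance(true_scores) / variance(observed)
print(f"estimated reliability = {reliability:.2f}")   # expected near 100 / (100 + 25) = 0.80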
Describe IRT and discuss its parameters and the differences among the 1-, 2-, and 3-parameter models.
Item difficulty is expressed in terms of trait level.
1 parameter-the item's difficulty only
2 parameter-within this model, item discrimination and difficulty level
3 parameter-difficulty, discrimination, and guessing (the likelihood of correctly answering the item purely by guessing); see the sketch below
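A minimal sketch of the three logistic IRT models, assuming the standard item response function P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b))); the parameter values below are hypothetical.

import math

def p_correct(theta, b, a=1.0, c=0.0):
    # b = difficulty, a = discrimination, c = guessing (lower asymptote)
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

theta = 0.5                                    # examinee ability (trait) level
print(p_correct(theta, b=0.0))                 # 1-parameter: difficulty only
print(p_correct(theta, b=0.0, a=1.7))          # 2-parameter: + discrimination
print(p_correct(theta, b=0.0, a=1.7, c=0.2))   # 3-parameter: + guessing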
What is a cut score?
A selected point on the score scale of a test. The points are used to determine whether a particular test score is sufficient for some purpose. Cut scores should be based on a generally accepted methodology and reflect the judgment of qualified people.
Angoff Method
Ask experts to rate the probability that a minimally competent person will get each item correct. This can be used with tests that are not multiple choice. The cut score is computed from the expected scores for the individual questions.
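A minimal sketch of an Angoff-style cut score: each expert estimates, for every item, the probability that a minimally competent examinee would answer it correctly, and the cut score is taken as the average across experts of the summed item expectations. The ratings below are hypothetical.

expert_ratings = [
    [0.6, 0.7, 0.5, 0.8, 0.4],   # expert 1, items 1-5
    [0.5, 0.8, 0.6, 0.7, 0.5],   # expert 2
    [0.7, 0.6, 0.5, 0.9, 0.4],   # expert 3
]

# Each expert's expected score for a minimally competent examinee.
expert_expected_scores = [sum(ratings) for ratings in expert_ratings]

# Cut score: average of the experts' expected scores.
cut_score = sum(expert_expected_scores) / len(expert_expected_scores)
print(f"Angoff cut score = {cut_score:.1f} out of {len(expert_ratings[0])} points")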