Chapter 5 Flashcards
Measurement
is the assignment of numbers to indicate different values of a variable.
Measures
are specific techniques or instruments used for measurement.
Evaluation
is the set of procedures for collecting information and using it to make decisions. Evaluation involves measurement, but may also involve sampling, design, and literature in the process of coming to a decision.
The purpose of measurement is _________________________________.
to obtain information about the variables that are being studied. It provides a systematic procedure for recording observations, performance, or other responses of subjects
Operational Definitions
Definitions of variables that specify how they were measured
Assessment
1. To assess is to measure
2. An assessment is synonymous with an evaluation
Nominal (Classificatory) Scale
A set of mutually exclusive categories with no order implied.
Ordinal Scale
A set of rank-ordered categories.
Interval Scale
Equal intervals between numbers, but no true zero point
Ratio Scale
Equal intervals between numbers plus a true zero point, so ratios between numbers are meaningful
What is the following an example of?
The operators equal and not equal are allowed.
example of nominal scale
What is the following an example of?
The operators greater than, less than, equal, and not equal (and all their combinations) are allowed.
example of ordinal scale
What is the following an example of?
Fahrenheit and Celsius
example of interval scale
What is the following an example of?
Length and Kelvin temperature
example of ratio scale
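A small sketch of why Celsius counts as interval while Kelvin counts as ratio: differences survive the change of scale, but ratios do not. The two temperatures are invented for illustration.

```python
# Two hypothetical temperatures in degrees Celsius (an interval scale).
warm_c, cool_c = 20.0, 10.0

# The same temperatures in kelvins (a ratio scale with a true zero).
warm_k, cool_k = warm_c + 273.15, cool_c + 273.15

# Differences are meaningful on both scales and agree with each other.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios are only meaningful in kelvins: 20 °C is not "twice as hot" as 10 °C.
ratio_c = warm_c / cool_c   # an artifact of where Celsius places its zero
ratio_k = warm_k / cool_k   # the physically meaningful ratio

print(diff_c, round(diff_k, 2), ratio_c, round(ratio_k, 3))
```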
Statistics
Mathematical procedures used to summarize and analyze data.
Descriptive Statistics
Indices that summarize characteristics of sample data.
Statistics
describe sample data
Parameters
describe population data
Frequency Distribution
Data organized into scores and the frequency with which each score occurred
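A minimal sketch of building a frequency distribution, assuming a short list of hypothetical quiz scores (the data are invented for illustration):

```python
from collections import Counter

# Hypothetical quiz scores, invented for illustration.
scores = [3, 5, 4, 3, 5, 5, 2, 4, 5, 3]

# Map each distinct score to the number of times it occurred.
frequency = Counter(scores)

# Print the distribution from the lowest score to the highest.
for score in sorted(frequency):
    print(score, frequency[score])
```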
Frequency Polygon
A line graph with frequency along the y-axis and score along the x-axis.
Histogram
A bar graph with the same axes as a frequency polygon: frequency along the y-axis and score along the x-axis
Normal Distribution
A frequency distribution with a characteristic bell shape. It is symmetrical.
Skewed Distribution
A non-symmetric distribution.
positively skewed
Occurs when most of the scores are at the low end of the distribution
negatively skewed
Occurs when most of the scores are at the high end of the distribution
Outliers
Atypical scores that are either extremely high or low
Histogram
Histograms use bars to represent frequency. The bars touch, and histograms are used for interval and ratio independent variables.
Measures of Central Tendency
Statistics that indicate the average or typical score in a distribution.
Mode
The most frequently occurring score or scores.
Median
The score (or potential score) at the 50th percentile. It splits the distribution in half with respect to frequency.
Mean
The sum of the scores divided by their number.
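The three measures of central tendency can be sketched with Python's standard `statistics` module. The scores are hypothetical and include one extreme value to show that the mean is pulled toward it while the median and mode are not.

```python
import statistics

# Hypothetical scores; 21 is an unusually high value.
scores = [2, 3, 3, 4, 5, 5, 5, 6, 21]

modes = statistics.multimode(scores)   # most frequently occurring score(s), as a list
median = statistics.median(scores)     # middle score of the sorted data
mean = statistics.mean(scores)         # sum of the scores divided by their number

print(modes, median, mean)
```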
Measures of Variability
These measures tell us how spread out a distribution is.
Range
The largest score minus the smallest score.
Standard Deviation
The square root of the mean squared deviation from the mean: sum the squared deviations of scores from the mean, divide by the number of scores, then take the square root. Also called the root mean squared deviation.
Variance
The standard deviation squared
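A worked sketch of the range, variance, and standard deviation as defined on these cards, dividing by the number of scores (the population form). The scores are hypothetical.

```python
import math
import statistics

# Hypothetical scores, invented for illustration.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Range: largest score minus smallest score.
score_range = max(scores) - min(scores)

# Variance: mean of the squared deviations from the mean.
mean = sum(scores) / len(scores)
variance = sum((s - mean) ** 2 for s in scores) / len(scores)

# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(score_range, variance, std_dev)
```

The standard library's `statistics.pstdev` uses the same divide-by-N definition; `statistics.stdev` divides by N - 1 instead and gives a slightly larger value.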
Percentile Rank
The percentage of scores falling at or below a given score.
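A minimal sketch of the percentile rank as defined here (at or below the given score). The function name `percentile_rank` and the scores are illustrative assumptions.

```python
def percentile_rank(scores, value):
    """Return the percentage of scores falling at or below `value`."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)

# Hypothetical exam scores, invented for illustration.
scores = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]

print(percentile_rank(scores, 70))
```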
Correlation
A measure of the relationship between two variables
Scatterplot
A two-dimensional graphic representation of the relationship between two variables. Each variable represents one dimension, and each point represents an individual’s scores on the X and Y variables.
Pearson Product Moment Correlation
The most common correlation coefficient. It detects linear relationships between variables.
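A sketch of the Pearson coefficient using the standard deviation-score formula (covariation divided by the square root of the product of the two sums of squares). The helper name `pearson_r` and all data are illustrative assumptions. The last example shows a perfect curved relationship that the coefficient does not pick up.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical data: hours studied and quiz grade.
hours = [1, 2, 3, 4, 5]
grades = [2, 4, 5, 4, 5]
r_linear = pearson_r(hours, grades)

# A perfect but curved (U-shaped) relationship: y is x squared.
x = [-2, -1, 0, 1, 2]
y = [4, 1, 0, 1, 4]
r_curved = pearson_r(x, y)

print(round(r_linear, 3), r_curved)
```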
validity
The overall evaluation of the extent to which theory and empirical evidence support interpretations that are implied in given uses of the scores
Older Definition: the degree to which an instrument measures what it says it measures or purports to measure
bar charts
Bar charts also use bars to represent frequency, but the bars do not touch. Bar charts are used for nominal and ordinal independent variables
Evidence Based on Test Content
The specified domain should be sampled completely, but the test should not sample outside the specified domain.
Determined by expert judges.
Evidence Based on Internal Structure
Items’ relations to each other, subtest scores, and total test scores should reflect the relations posited by theory and intended uses.
Determined by statistical analysis.
Evidence Based on Relations to Other Variables
Relationships with other variables reflect the relationships predicted by theory and intended uses
Construct Validity
The extent to which relationships predicted by the psychological theory of the construct are reflected in relationships between the measure and other measures
Convergent Validity
The measure is significantly correlated with other measures that it theoretically should be correlated with (e.g., another measure of the same construct)
Divergent Validity
The measure is not significantly correlated with other measures that it theoretically should not be correlated with (e.g., IQ and big toe size)
Criterion Validity
The measure is correlated with another measure (the criterion).
Concurrent Validity
The two measures are collected at the same point in time.
Predictive Validity
The criterion measure is collected at a later point in time than the predictor measure.
Sources of Validity Evidence
Evidence Based on Response Processes
Data are collected from participants, often using think-aloud protocols, to determine participants’ mental processes while answering items on the test
Sources of Validity Evidence
Evidence Based on Consequences of Testing
Data are provided showing that the measure results in consequences for the test-taker that can be supported by empirical evidence
___________ can be performed to help establish validity of measures to be used.
Pilot studies
Reliability
is the extent to which participant and/or rater scores are free from error.
Reliability coefficients (and validity coefficients, for that matter) vary between ______ and ______
0.00 and 1.00
Equivalence (a form of reliability)
Two alternate forms of the same test (constructed by randomly choosing items from the same universe of items) are administered to the same group of individuals at approximately the same time.
Equivalence and Stability
Two alternative forms of the same test are administered to the same group of individuals at two different times.
Internal Consistency
A measure of the consistency of items on a test in measuring a single construct.
Split Half
Most commonly the test is split into odd and even items. These subtest scores are correlated and corrected using the Spearman-Brown Prophecy formula
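A sketch of the odd/even split described above, using the two-halves form of the Spearman-Brown correction, 2r / (1 + r). The response matrix and the helper name `pearson_r` are assumptions made up for illustration.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical right (1) / wrong (0) responses:
# one row per participant, one column per item.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

# Score the odd-numbered items (1st, 3rd, 5th) and the even-numbered items separately.
odd_scores = [sum(row[0::2]) for row in responses]
even_scores = [sum(row[1::2]) for row in responses]

# Correlate the two half-test scores.
half_r = pearson_r(odd_scores, even_scores)

# Spearman-Brown correction: estimate the reliability of the full-length test.
corrected = 2 * half_r / (1 + half_r)

print(round(half_r, 3), round(corrected, 3))
```

The correction is needed because each half is only half as long as the real test, so the raw half-test correlation understates the reliability of the full test.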
Kuder-Richardson
Used with right vs. wrong items. Two formulas exist, KR-20 and KR-21. It is equivalent to the average of all possible split half reliabilities. KR-21 assumes equal item difficulty, while KR-20 does not assume equal item difficulty.
Cronbach’s Alpha
Also equal to the average of all possible split half reliabilities, but does not require dichotomous scoring (e.g., can be used with Likert scale items)
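A sketch of Cronbach's alpha using the commonly cited variance formula, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The formula is not given on the card itself, and the function name `cronbach_alpha` and the Likert-style responses are invented for illustration.

```python
import statistics

def cronbach_alpha(responses):
    """Cronbach's alpha for a matrix with one row per participant, one column per item."""
    k = len(responses[0])
    # Transpose so that each tuple holds all participants' answers to one item.
    items = list(zip(*responses))
    # Sum of the variances of the individual items.
    item_variance_sum = sum(statistics.pvariance(item) for item in items)
    # Variance of participants' total scores.
    total_variance = statistics.pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)

# Hypothetical responses to three 5-point Likert items from five participants.
responses = [
    [4, 5, 4],
    [3, 4, 3],
    [2, 2, 3],
    [5, 5, 4],
    [1, 2, 2],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```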
Percent Agreement
Percentage of cases on which the two observers or measures agree.
Cohen’s Kappa
Percentage agreement corrected for what would be expected by chance
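A sketch contrasting percent agreement with Cohen's kappa, using the usual formula kappa = (p_observed - p_chance) / (1 - p_chance). The function names and the two raters' codes are assumptions invented for illustration.

```python
from collections import Counter

def percent_agreement(a, b):
    """Percentage of cases on which two raters gave the same code."""
    agree = sum(1 for x, y in zip(a, b) if x == y)
    return 100 * agree / len(a)

def cohens_kappa(a, b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(a)
    p_observed = sum(1 for x, y in zip(a, b) if x == y) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement: product of the raters' marginal proportions, summed over categories.
    p_chance = sum((counts_a[c] / n) * (counts_b[c] / n) for c in set(a) | set(b))
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical codes from two observers rating the same ten intervals.
rater_a = ["on", "on", "off", "on", "off", "off", "on", "on", "off", "on"]
rater_b = ["on", "off", "off", "on", "off", "on", "on", "on", "off", "on"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(agreement, round(kappa, 3))
```

Kappa comes out lower than the raw agreement because two raters who each code "on" 60% of the time would agree fairly often by chance alone.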
A measure cannot be valid if it is not _______.
reliable
Test Length
Longer measures are more reliable than shorter measures, all other things being equal
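The effect of test length can be sketched with the general Spearman-Brown prophecy formula, r_new = n * r / (1 + (n - 1) * r), where n is the factor by which the test is lengthened. The formula assumes the added items are comparable to the existing ones. The function name and the starting reliability of 0.70 are illustrative assumptions.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by `length_factor`."""
    return length_factor * reliability / (1 + (length_factor - 1) * reliability)

# Hypothetical test with a reliability of 0.70.
original = 0.70

doubled = spearman_brown(original, 2)     # twice as many comparable items
halved = spearman_brown(original, 0.5)    # half as many items

print(round(doubled, 3), round(halved, 3))
```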
Participant Heterogeneity
More heterogeneous scores will have higher reliability. The reverse case, in which a homogeneous group yields lower reliability, is sometimes referred to as the problem of restriction of range.
The Nature of the Domain
Some things are more difficult to measure reliably than others (e.g., academic achievement measures usually have higher reliability than personality measures)
Standardization of Data Collection
The more standardized the data collection methods, the higher the reliability will be