Ch 5 Flashcards
Alternate forms reliability
Assessment of reliability by administering two different forms of the same measure to the same individuals at two points in time
Construct validity
The degree to which a measurement device accurately measures the theoretical construct it is designed to measure
Content validity
The extent to which a measure represents all facets of a given construct. For example, a depression scale may lack content validity if it assesses only the affective dimension of depression and fails to take the behavioral dimension into account
Convergent validity
The construct validity of a measure is assessed by examining the extent to which scores on the measure are related to scores on other measures of the same construct or similar constructs
Say you were researching depression in college students. To measure depression (the construct), you use two measurements: a survey and participant observation. If the scores from the two measurements are close enough (i.e., they converge), this demonstrates that they are measuring the same construct.
Cronbach's alpha
An indicator of internal consistency reliability assessed by examining the average correlation of each item (question) in a measure with every other question
Discriminant validity
The construct validity of a measure is assessed by examining the extent to which scores on the measure are not related to scores on conceptually unrelated measures
Face validity
The degree to which a measurement device appears to accurately measure a variable
Internal consistency reliability
Reliability assessed from data collected at one point in time using multiple measures of a psychological construct
A measure is reliable when the multiple measures provide similar results
Interrater reliability
An indicator of reliability that examines the agreement of observations made by two or more raters (judges)
Interval scale
A scale of measurement in which the intervals between numbers in the scale are all equal in size
Item-total correlation
The correlation between scores on individual items with the total score on all items of a measure
Measurement error
The degree to which a measurement deviates from the true score value
Nominal scale
A scale of measurement with two or more categories that have no numerical (less than, greater than) properties
Ordinal scale
A scale of measurement in which the measurement categories form a rank order along a continuum
Pearson product moment correlation coefficient
A type of correlation coefficient used with interval and ratio scale data. In addition to providing information on the strength of the relationship between two variables, it indicates the direction (positive or negative) of the relationship
Predictive validity
The construct validity of a measure is assessed by examining the ability of the measure to predict a future behavior
Ratio scale
A scale of measurement in which there is an absolute zero point, indicating an absence of the variable being measured. An implication is that ratios of numbers on the scale can be formed (generally, these are physical measures such as weight, or time measures such as duration or reaction time)
Reactivity
A problem of measurement in which the measure changes the behavior being observed
Reliability
The degree to which a measure is consistent
Split-half reliability
A reliability coefficient determined by the correlation between scores on half of the items on a measure with scores on the other half of the measure
Test-retest reliability
A reliability coefficient determined by the correlation between scores on a measure given at one time with scores on the same measure given at a later time
True score
An individual's actual score on a variable being measured, as opposed to the score the individual obtained on the measure itself
How do you measure reliability
Through true score and measurement error: an observed score is the sum of a person's true score and measurement error, so the less measurement error, the more reliable the measure
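In classical test theory, observed score = true score + measurement error, and reliability is the proportion of observed-score variance that comes from true scores. A minimal simulation sketch of that idea in Python (all numbers invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                  # hypothetical number of test takers
true_scores = rng.normal(100, 15, n)      # assumed true scores
error = rng.normal(0, 5, n)               # random measurement error
observed = true_scores + error            # observed = true score + error

# Reliability = proportion of observed-score variance due to true scores
reliability = true_scores.var() / observed.var()
print(round(reliability, 2))              # near 15**2 / (15**2 + 5**2) = 0.90
```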
When is reliability most likely achieved
When researchers use careful measurement procedures (for example, through training)
Making multiple measures (ex: a personality test will have 10 or more questions designed to assess a trait)
Reliability increases as the number of items increases
How to assess reliability
Use Pearson product moment correlation coefficient to calculate correlation coefficients
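A minimal sketch of computing such a correlation with NumPy, using invented scores from two administrations of the same measure:

```python
import numpy as np

# Hypothetical scores from the same eight people at two points in time
time1 = np.array([12, 18, 9, 22, 15, 11, 20, 17])
time2 = np.array([13, 17, 10, 21, 14, 12, 19, 18])

# np.corrcoef returns a 2x2 correlation matrix; entry [0, 1] is r
r = np.corrcoef(time1, time2)[0, 1]
print(round(r, 2))  # close to +1, suggesting high reliability
```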
The closer a correlation is to +1 or -1, the ______
Stronger the relationship.
Using the Pearson correlation coefficient, a measure is reliable when…
Two scores are very similar
What is the reliability correlation called when using the Pearson product moment correlation coefficient?
Reliability coefficient
What would it be an example of if a test of intelligence was administered to a group of people one day and again a week later
Test-retest reliability
We can use correlation coefficients to show that the two sets of scores are similar
What should the reliability coefficient be for a measure to be considered reliable
At least .80
What is a drawback of test-retest reliability
The correlation might be artificially high because individuals remember how they responded the first time
How to avoid the problem of the test-retest correlation being artificially high
Alternate forms reliability
Drawbacks of alternate forms reliability
Creating a second equivalent measure may require considerable time and effort
Psychological measures are made up of a number of different questions called…
Items
An indicator of internal consistency
Split-half reliability
Coefficient used to express split-half reliability
Spearman-Brown split-half reliability coefficient
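A minimal sketch of split-half reliability with the Spearman-Brown correction, assuming an odd/even split of the items (the data matrix is invented):

```python
import numpy as np

def spearman_brown_split_half(items):
    """Correlate odd-item totals with even-item totals, then apply
    the Spearman-Brown correction to estimate full-length reliability."""
    items = np.asarray(items)             # shape: (respondents, items)
    odd = items[:, 0::2].sum(axis=1)      # total of one half of the items
    even = items[:, 1::2].sum(axis=1)     # total of the other half
    r_half = np.corrcoef(odd, even)[0, 1]
    return (2 * r_half) / (1 + r_half)    # Spearman-Brown formula

# Hypothetical 5 respondents x 6 items, each scored 1-5
data = [[4, 5, 4, 4, 5, 5],
        [2, 2, 3, 2, 2, 3],
        [5, 4, 5, 5, 4, 5],
        [3, 3, 2, 3, 3, 2],
        [1, 2, 1, 1, 2, 1]]
print(round(spearman_brown_split_half(data), 2))
```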
Two ways to measure internal consistency reliability
Split-half reliability
Cronbach's alpha
How to perform Cronbach's alpha
Scores on each item are correlated with scores on every other item.
A large number of correlation coefficients are produced
Average all of these correlation coefficients
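A minimal sketch of that procedure in Python; averaging the inter-item correlations and then adjusting for the number of items gives the "standardized" form of Cronbach's alpha (the data are invented):

```python
import numpy as np

def cronbach_alpha_standardized(items):
    items = np.asarray(items, dtype=float)    # shape: (respondents, items)
    k = items.shape[1]
    corr = np.corrcoef(items, rowvar=False)   # k x k inter-item correlations
    mean_r = corr[np.triu_indices(k, k=1)].mean()  # average, each pair once
    # Standardized alpha: average inter-item r adjusted for number of items
    return (k * mean_r) / (1 + (k - 1) * mean_r)

data = [[4, 5, 4, 4],
        [2, 2, 3, 2],
        [5, 4, 5, 5],
        [3, 3, 2, 3],
        [1, 2, 1, 1]]
print(round(cronbach_alpha_standardized(data), 2))
```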
Why are item-total correlations informative
They provide information about each individual item
Items that do not correlate with the total score on the measure are actually measuring a different variable; they can be eliminated to increase internal consistency reliability
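A minimal sketch of item-total correlations, here in the common "corrected" form where each item is correlated with the total of the remaining items rather than a total that includes it (the data are invented; the last column simulates an item measuring something else):

```python
import numpy as np

def item_total_correlations(items):
    items = np.asarray(items, dtype=float)    # shape: (respondents, items)
    total = items.sum(axis=1)
    # Correlate each item with the total of the *other* items
    return [np.corrcoef(items[:, j], total - items[:, j])[0, 1]
            for j in range(items.shape[1])]

data = [[4, 5, 4, 1],
        [2, 2, 3, 5],
        [5, 4, 5, 2],
        [3, 3, 2, 4],
        [1, 2, 1, 5]]
# A low or negative value flags an item to consider eliminating
print([round(r, 2) for r in item_total_correlations(data)])
```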
When is it useful to use item-total correlations
When it’s necessary to construct a brief version of a measure.
Even though reliability increases with longer measures, a shorter version can be more convenient to administer and still retain acceptable reliability
Commonly used indicator of interrater reliability
Cohen's kappa
Interrater reliability is used when
Making observations of people's behavior to see whether the raters agree
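A minimal sketch of Cohen's kappa computed from scratch with NumPy (scikit-learn's cohen_kappa_score does the same job); the two judges' codings below are invented:

```python
import numpy as np

def cohens_kappa(rater_a, rater_b):
    """kappa = (observed agreement - chance agreement) / (1 - chance)."""
    a, b = np.asarray(rater_a), np.asarray(rater_b)
    p_o = np.mean(a == b)                     # observed agreement
    # Chance agreement: product of each rater's marginal proportions
    p_e = sum(np.mean(a == c) * np.mean(b == c) for c in np.union1d(a, b))
    return (p_o - p_e) / (1 - p_e)

judge1 = ["hit", "push", "hit", "none", "push", "hit", "none", "hit", "push", "none"]
judge2 = ["hit", "push", "hit", "none", "hit", "hit", "none", "hit", "push", "push"]
print(round(cohens_kappa(judge1, judge2), 2))  # agreement beyond chance
```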
Problem with reliability
Although it tells us about measurement error, it does not tell us about whether we have a good measure of the variable of interest
What refers to the adequacy of the operational definition of variables
(To what extent does the operational definition of a variable actually reflect the true theoretical meaning of the variable?)
Construct validity
Concurrent validity
The extent to which the results of a particular test or measurement correspond to those of a previously established measurement for the same construct.
The Fahrenheit scale would be an example of
Interval measurement
Likert scale
A rating scale often found in survey items that measures how someone feels about something
Strongly agree to strongly disagree
Likert scale is an example of an
Interval scale
Internal and external validity reflect…
Whether or not the results of a study are trustworthy or meaningful
Ratio scale
There's an absolute zero
Ex: $100 is twice as much as $50 because $0 is flat broke
Scores on a test (when one can miss answers)
Reaction times
Physical measurements
Gender and undergraduate major are examples of …
Nominal (categorical)
Grades and level of education are examples of
Ordinal scales
On the disgust scale, comparing level of disgust with other personality characteristics would be an example of
Convergent validity
Filler items
Items put on tests and surveys that are not calculated into the results because they do not deal with what is actually being measured
The disgust scale predicting differences in fears is an example of
Predictive validity
What Belmont principle may be an issue with naturalistic observation
Informed consent
How to help lessen demand characteristics
Filler items
Advantages of repeated measures/within-subjects design
Greater statistical sensitivity by reducing random error
Fewer participants needed
Disadvantages of repeated measures design
Order effects
How to alleviate order effects
Counterbalancing
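A minimal sketch of complete counterbalancing with Python's itertools, generating every possible order of three hypothetical conditions:

```python
from itertools import permutations

# Complete counterbalancing: every possible order of the conditions is
# used, so order effects average out across participants
conditions = ["A", "B", "C"]
for i, order in enumerate(permutations(conditions), start=1):
    print(f"Group {i}: {' -> '.join(order)}")
```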