Module 4 - Measurement Flashcards
How do we test Hypotheses?
- Empirically measure the variables
- quantify the variables; put numbers to them so we can measure them
Variable
- Any characteristic that can take on more than one value
- research looks at the relationships between variables
Measurement
- quantification of the amount of some variable that is present
- assign numbers to indicate the amount of the variable that is present
- can be arbitrary numbers
4 levels of measurement
- Nominal
- Ordinal
- Interval
- Ratio
Nominal Level of Measurement
- Numbers assigned are arbitrary and do not represent any underlying quantitative aspect of the variable
- mathematical property = identity
ex. groupings by gender, colour, or whether someone did something or not
Ordinal Level of Measurement
- ranking of the data but with no actual amount
- mathematical property= magnitude
- no assumption of equal intervals between values
ex. class scores ranked from 1 to 100; we know who has more of the variable but not the actual amount/score
Interval Level of Measurement
- shows the actual amount of variable present
- assumes equal intervals between scores/values
- mathematical property= magnitude and equal intervals
- DOES NOT HAVE A TRUE ZERO POINT
- ex. IQ: a score of zero would not mean a complete absence of intelligence
Ratio Level of Measurement
- mathematical properties = magnitude, equal intervals, and a true zero
- true zero = absolute absence of the variable
ex. a quiz is a ratio measurement because a score of zero is possible and meaningful (no questions answered correctly)
level of measurement we choose for our operational definition will determine…
- how precise our measurement is
- what statistical analysis we conduct
Most precise levels of measurement are
Interval and Ratio
- allow for better statistical analysis
- most informative and precise measurement of the variable
at which level do we want to operationally define our variables?
- the highest level possible because we want the most precise measurement (interval or ratio scales)
- we can take interval and ratio data and collapse it down to lower levels, BUT we cannot take nominal (low-level) data and expand it up (see the sketch below)
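A minimal sketch in plain Python (hypothetical quiz scores, not from the module) of why this collapse only works one way:

```python
# Hypothetical quiz scores: ratio-level data (true zero, equal intervals).
quiz_scores = [0, 35, 62, 78, 91]

# Collapse to ordinal: keep only rank order, discard the actual amounts.
ranks = [sorted(quiz_scores, reverse=True).index(s) + 1 for s in quiz_scores]

# Collapse to nominal: pass/fail categories.
pass_fail = ["pass" if s >= 50 else "fail" for s in quiz_scores]

print(ranks)      # [5, 4, 3, 2, 1]
print(pass_fail)  # ['fail', 'fail', 'pass', 'pass', 'pass']
# The reverse is impossible: from 'pass'/'fail' alone, the original
# scores cannot be recovered.
```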
2 dimensions of quality used to assess whether we are measuring our variables accurately/how to evaluate measures
- Reliability
- Validity
Reliability
- concerned with measurement errors
- concerned with reproducibility
- gives consistent and reproducible results
Validity
- concerned with the extent to which scores represent the variable we intend to measure
- does the observed score reflect the intended construct?
- are you measuring what you intended to measure?
connection between reliability and validity
- measure can be reliable and not valid
- reliability is a pre-req for validity BUT does not guarantee validity
Psychometrics
- focuses on judging and improving the reliability and validity of the psychological measures
no measure is perfect
- yes
- every measure contains elements of the construct and error
True Score (T)
- the error-free score an individual would receive on a test if there was no measurement error
- an individual's score on a measure with no error
Data point or observed score can be represented by the following formula
X = T + E
- X: Observed Score
- T: True Score
- E: Error
conceptual formula; don't plug in numbers
- shows that the observed score is composed of the true score of the construct plus error
Random Error (Er)
- unpredictable and unsystematic
- does not impact all data points in the same way
- cancels out over a large number of observations (see the sketch below)
- causes unreliability in measurements; unable to reproduce the observed score
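A minimal sketch (made-up true score and noise level) showing random error cancelling out as observations accumulate:

```python
import random

random.seed(0)
TRUE_SCORE = 100                      # T: the error-free value

def mean_observed(n):
    """Mean of n observed scores X = T + Er, with Er drawn as random noise."""
    return sum(TRUE_SCORE + random.gauss(0, 15) for _ in range(n)) / n

for n in (5, 100, 10_000):
    print(n, round(mean_observed(n), 2))
# As n grows, the mean drifts toward T = 100: the random errors cancel out.
```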
Systematic Error
- non-random and predictable
- does not cancel out over a large number of observations
- impacts the scores of the entire group
- contributes to invalidity because we are no longer measuring the construct of interest
- skews data in a predictable way
2 types of systematic error
- Bias
- Error associated with measuring the wrong construct
Bias (Eb)
- measurements can be biased because of situations or devices (equipment, judges, raters…)
- ex. differences in test difficulty or differences in testing conditions; bias because the data will be skewed in a predictable way
Error
- difference between the true score (actual amount) and the observed score (data)
How to deal with bias?
- calibrate devices and measure in the same way
- make operational definitions precise
- standardize testing procedures
Error associated with measuring the wrong construct (Ew)
- measuring a theoretical construct other than the one we are interested in
- ex. when taking an exam, we might be measuring exam anxiety as opposed to your understanding of the material
conceptual formula including Ew and Er
X = T + Er + Ew
- note: it is assumed the researcher has taken care of bias, so Eb is not included in the conceptual formula
Unreliability
- random error contributes to unreliability
- X = Er: the measurement is made up of only random error; we are tapping into neither the true construct nor a wrong one
- random error is like meaningless noise that can't be reproduced (like static)
why can’t you reproduce random error?
- because it is meaningless, unsystematic noise; there is no pattern to reproduce
Reliability conceptual formula
- X = T + Ew
- no random error in reliable measures
- Ew remains because we could be measuring the construct of interest, another construct, or a combination of both
- can reproduce the measure because it is reliable
- ex. tuning into a radio station and hearing music: even though we are not sure it is the station we want, we can reproduce the music again and again
Invalidity
- X = Ew
- measuring the wrong construct reliably
- the measure is reliable (can reproduce) but not valid because we are measuring the wrong construct
- ex. a test of research methods written in Latin
- it measures the wrong construct (Latin proficiency) but is reliable because you can get the same grade again
Validity conceptual formula
- X = T
- measuring the desired theoretical construct/the true construct
- because we have high validity, we have high reliability
- because we are confident we are measuring the true construct, we can test the theory and hypothesis
Real world reliability and validity
- no measure is going to be perfectly reliable; it will have some Er
- no measure is going to be perfectly valid; it will have some Ew
- problem: we don't know the actual amounts of T, Er, and Ew in our observations, therefore we have to estimate reliability and validity
Quantitative Index of Reliability (R)
- used to estimate reliability of measures
- largest when there is no random error; most reliable
- smallest when measure is entirely random error; no reliability
- reflects what proportion of X is meaningful and reproducible
Conceptual formula of R
R = (T + Ew) / X = (T + Ew) / (T + Ew + Er)
- T and Ew are the reproducible and meaningful parts of the observation (see the worked numbers below)
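A quick worked example (made-up amounts; in practice T, Ew, and Er can never be observed directly):

```python
# Hypothetical decomposition of one observed score (illustrative only).
T, Ew, Er = 80, 10, 10    # true score, wrong-construct error, random error
X = T + Ew + Er           # observed score = 100

R = (T + Ew) / X
print(R)                  # 0.9: 90% of X is meaningful and reproducible
# If Er were 0, R = 1 (highest); if X were all random error, R = 0 (lowest).
```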
Highest index of reliability
- Er = 0
- R = 1
- the numerator equals the denominator
Lowest Index of reliability
- X = Er
- R = 0
- the numerator is zero
- not reliable because the measure is made only of random error
a good index of reliability ranges from
- 0 to 1
that is also the range of (positive) correlation coefficients; therefore, correlation coefficients are used to estimate the index of reliability
Internal Consistency
- extent to which individual items on a multi-item scale tap into the same construct and produce similar scores
ex. if all items are meant to measure depression, responses to each item should correlate highly if they measure the same construct
Cronbach’s Alpha
- also called inter-item correlation
- extent to which items correlate with each other
- a measure of internal consistency used to assess reliability
- how closely related a set of items are
- reflects that participants are answering the items consistently and coherently
- ranges from 0 to 1
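A minimal sketch (hypothetical item responses; assumes numpy) of computing Cronbach's alpha from the standard variance formula:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """items: respondents x items matrix of scores."""
    k = items.shape[1]                           # number of items
    item_vars = items.var(axis=0, ddof=1).sum()  # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Hypothetical 5-point responses from 4 participants on 3 depression items.
responses = np.array([
    [4, 5, 4],
    [2, 2, 3],
    [5, 4, 5],
    [1, 2, 1],
])
print(round(cronbach_alpha(responses), 2))  # ≈ 0.95: the items hang together
```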
Cronbach's alpha = 1
- highest internal consistency
- error-free
- maximal reliability; confident we are measuring one construct
Cronbach's alpha = 0
- lowest internal consistency
- measure is entirely error
- no reliability
Test-Retest Reliability
- administering the same test to the same individuals at 2 different points in time and correlating the scores
- if measuring the same underlying construct, we expect the 2 scores to correlate highly
- if they do not correlate, we have to question whether we are measuring a consistent construct
when can test-retest reliability be used?
- only when measuring something that you would expect to stay consistent in the sample (see the sketch below)
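A minimal sketch (made-up scores for 5 people tested twice; assumes numpy) of estimating test-retest reliability with a correlation:

```python
import numpy as np

# Hypothetical scores for the same 5 people, tested two weeks apart.
time1 = np.array([12, 18, 25, 9, 30])
time2 = np.array([14, 17, 27, 10, 28])

r = np.corrcoef(time1, time2)[0, 1]  # Pearson r between the two administrations
print(round(r, 2))                   # ≈ 0.98: high test-retest reliability
```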
Inter-Rater Reliability
- consistency across researchers
- useful when data is subjective and involves multiple raters
- measures the consistency between 2 or more raters in their assessments, judgements, or ratings of a particular phenomenon or behaviour
higher agreement between raters means
- higher correlation and higher reliability
lower agreement between raters means
- lower correlation and lower reliability
if data is quantitative and continuous, use…
the Pearson r correlation coefficient (see the example below)
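A minimal sketch (hypothetical ratings from two raters) of estimating inter-rater reliability with Pearson r:

```python
import numpy as np

# Hypothetical aggression ratings of 6 recorded interactions by two raters.
rater_a = np.array([3.5, 7.0, 5.5, 2.0, 8.0, 4.5])
rater_b = np.array([4.0, 6.5, 5.0, 2.5, 7.5, 5.0])

r = np.corrcoef(rater_a, rater_b)[0, 1]
print(round(r, 2))  # ≈ 0.99: the raters agree, so inter-rater reliability is high
```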
Face Validity
- subjective way of gauging validity
- determining if it makes sense/if the measure appears to tap into the construct
ex. measuring sleep plausibly taps into the construct of depression, but measuring how much someone likes peanut butter does not
Content Validity
- how well a test measures all relevant aspects of the construct it is designed to measure
- verifying that all major domains or aspects of the construct are being targeted in the measure
ex. depression has emotional, behavioural and cognitive elements. high content validity would tap into all of these. low content validity would only measure one of them
- done by consulting experts and the literature
Criterion Validity
- how well a measure correlates with a specific outcome (criterion)
- more objective and involves a quantitative index
- extent to which observations from this measure relate to an outcome or criterion
ex. the correlation between scores on the GRE and success in graduate school would be an index of criterion validity
2 types of Criterion Validity
- Predictive Validity
- Concurrent Validity
Predictive Validity
- relation between a measure of a construct and a future criterion
- how well a measure taken at one time predicts a criterion in the future
ex. how well GRE scores predict 1st-year graduate school marks
Concurrent Validity
- relation between the measure of a construct and an outcome measured at the same time
- predicting a current outcome
ex. administering the GRE to current graduate students and seeing how scores correlate with their current grades
Construct Validity
- Extent to which constructs we claim to be studying are the actual constructs we are measuring
- are we truly assessing the construct of interest?
Face validity, content validity, and criterion validity together establish
- Construct Validity
2 types of construct validity
- convergent
- divergent/ discriminant
Convergent Validity
- scores of one measure correlate/ converge highly with scores of another measure of the same construct
ex. the GRE and SAT are 2 different measures of the same construct (scholastic aptitude), so their scores should correlate
ex. my depression scale scores should converge highly with old depression scales
Divergent Validity
- scores on measures do not correlate highly with scores of measures of other constructs
ex. measure of depression should not correlate highly with political views
want to ensure
- scores of different measures of the same construct should correlate highly
- scores of different measures of unrelated constructs should not correlate
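A minimal sketch (entirely made-up scores) of checking convergent and divergent validity together:

```python
import numpy as np

# Hypothetical scores: a new depression scale, an established depression
# scale, and an unrelated political-views scale.
dep_new  = np.array([10, 25, 18, 30, 7, 22])
dep_old  = np.array([12, 24, 20, 28, 9, 21])
politics = np.array([2, 2, 4, 4, 4, 2])

print(round(np.corrcoef(dep_new, dep_old)[0, 1], 2))   # ≈ 0.99: convergent
print(round(np.corrcoef(dep_new, politics)[0, 1], 2))  # ≈ -0.04: divergent
```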
Do we use multiple estimates of validity?
- yes
- use multiple estimates of validity to gain as much confidence as possible we are measuring the construct of interest