Study Cards Flashcards
What is the APGAR test? What does it measure?
Evaluates health of baby based on appearance, pulse, grimace, activity, respiration
Who is Alfred Binet? Why is he important?
French psychologist. Introduced the idea of intelligence testing
What is an operational definition?
The exact way a construct is measured, and what qualifies something as being in/out of a given category
What is an operational measure?
The exact way in which something is tested, and how it should always be tested (think procedure)
What is a normative group?
Aka reference group. The sample of the population used to attain a base/average score
What is a normal distribution? What is it used for?
A distribution that forms a bell curve when plotted, with the mean, median, and mode all equal.
Used as the assumption of the layout of datasets in a group
What are deviations?
The difference between the observed values and the mean
What was the first version of the Binet-Simon intelligence test? What did results show?
A group of children were asked to perform a series of tasks to assess the knowledge they had acquired
What were Binet’s original concerns with his intelligence test?
That it would be misused, and that children who were behind would be labeled “idiots” and unteachable.
What are some of Binet’s contributions to the natural and social sciences?
- The development of scales of measurement
- The formal operationalization of constructs
- The development of non-verbal intelligence tests
- The proposal that intelligence is both acquired and innate
- The operationalization of terms and concepts
- The development of mental age
- The idea and use of normative groups
- Established the dominance of psychology in the field of testing
Who is Francis Galton? What did he contribute to psychology?
He was a psychologist with a fascination for data collection and variability. He pioneered large-scale data collection
What is the law of error? Is it 100% true?
In any group or set of measurements, the errors (deviations) tend to cancel each other out, forming a normal distribution. It is not always true, but it is used as a working assumption
What are distributions of error (deviations) and how do you calculate them?
A deviation shows how far scores fall from the mean, commonly expressed on a scale from -3 to +3 standard deviations.
Observed score - mean = deviation
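The deviation calculation can be sketched in Python (the scores are hypothetical):

```python
# Deviation = observed score - mean (hypothetical scores for illustration).
scores = [4, 7, 10, 7, 12]
mean = sum(scores) / len(scores)          # 8.0
deviations = [x - mean for x in scores]   # one deviation per observed score
# Deviations always sum to zero around the mean.
print(deviations)       # [-4.0, -1.0, 2.0, -1.0, 4.0]
print(sum(deviations))  # 0.0
```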
What are the first 3 principles of psychometrics?
- Defining and operationalizing is central to understanding if a claim is justifiable - always ask how a construct is measured and defined
- Variability exists everywhere - this is the essence of the law of error
- There is always a normative group - ask who is the sample and who created the sample
How does Anne Anastasi define a psychological test? Define the different aspects
An objective and standardized measure of a sample of behaviour
Objective: free of bias, clearly defined, little to no interpretation
Standardized: everyone gets the same test and is measured the same way
Sample of behaviour: This should be how they would act regularly, but the sample may not be representative
How does Lee Cronbach define psychological tests? How does it compare to the aspects of Anastasi’s definition?
Psychological tests are a systematic procedure for comparing the behaviour of two people.
Systematic vs standardized and objective: Cronbach recognized that tests cannot be 100% objective
What is psychometrics according to Thurstone? (2 parts)
A construction of instruments and procedures for measurement
The development and refinement of theoretical approaches to measurement
What is a construct? And how do they relate to the definition of psychometrics?
A construct is any idea or concept we’d like to measure
A. Constructing tests to measure these constructs
B. The methods and approaches must be refined when measuring these constructs
What are the 4th and 5th principles of psychometrics?
- Most (if not all) test questions, in any format, are imperfect indicators of the construct being measured
- Assigning numbers to data imposes a relationship among indicators that may not be justifiable
What does it mean to measure something? What are the 4 main scales of measurement?
The assigning of numbers to individual scores in a systematic way, according to one or another rule or convention
1. Ratio
2. Interval
3. Nominal
4. Ordinal
Explain the 4 main scales of measurement
Ratio: Equal intervals with a true zero
Interval: Equal intervals with NO true zero
Nominal: a categorical form of organizing data
Ordinal: Determined rank or order, numbers have no value, intervals may be unequal
What is the 5th principle of psychometrics?
The leap of faith principle. By assigning numbers to data, you impose a relationship among indicators that might not be justifiable
What does a distribution measure in psychometrics?
The performance of the entire test
What are the 3 factors that ALWAYS affect variability?
Systematic effect, systematic bias, random effect
What is systematic effect?
It is the primary cause of the score. How much of the construct you have
What is systematic bias? Give an example
An effect that affects a subgroup. EX: a delayed train affects commuters
What is random effect? Give an example
Random factors that affect the score of an individual, but have no relationship to the construct. EX: poor sleep
What is the difference between a formal and an operational definition?
A formal definition defines the construct for what it is while an operational definition defines how it is measured
How does Plato’s allegory of the cave help us understand constructs?
It captures the challenges we face when measuring constructs that cannot be directly seen. The shadows, like symptoms, are all that is observed, and interpretations must be made
What is Novick’s classical test theory?
A person’s true score is different from their observed score (due to error)
How do you calculate true score (Novick)?
T = X +/- E (the observed score corrected for error)
How does Galton’s law of error play in classical test theory?
Error is just as likely to be positive as negative
What is item response theory?
An attempt to directly estimate an individual’s ‘true score’ by examining how individuals respond to questions - as a function of their ability
What does an item response graph show?
Shows the minimum required ability to get an answer correct
What is scientific model testing?
The evaluation of different approaches to find which one best explains the data in that case
How does Ockham’s razor fit with scientific model testing?
When two theories explain the data equally well, the simpler explanation is usually preferred
What is the definition of criterion validity?
Criterion validity is the correlating of scores with some external criterion that is relevant to the purpose of the test
What is scale validation?
The methods used to test validity
What are the features of scale validation according to Rulon?
- A test cannot be labeled as valid or invalid without respect to a given purpose
- Assessments of validity must include an assessment of the content of the instrument and its relation to the purpose
- Different forms of validity evidence are required for different types of instruments
- Some measures are obviously valid (face validity) and require no further study
What are the 4 main domains of validity?
- Content validity
- Structural validity
- External validity
- Item validity
What are the 3 types of content validity?
1. Domain representativeness
2. Domain relevance
3. Face validity
What is content validity?
The degree to which a test’s content represents the construct
The degree to which a test measures all aspects of a criterion
What is domain representativeness?
The extent to which the questions/tasks/etc. measure the entire domain
What is domain relevance?
The extent to which the questions are relevant to assessing the construct
What is inclusionary criteria?
The signs and symptoms that MUST be present to have the construct
What is exclusionary criteria?
The signs and symptoms that CANNOT be present to have the construct
What type of validity includes inclusionary and exclusionary criteria? What is the interaction?
Domain relevance, these criteria are considered more important or more relevant
What is face validity?
Whether the test APPEARS to measure a given construct
What is structural validity?
The components that a test measures
What are the 2 components of structural validity?
- Dimensionality
- Order
What is dimensionality?
The number of factors the questions can be attributed to (pieces of the cake)
What is order?
The number of tiers that are needed to explain how the different factors are interrelated (layers of the cake)
What are the 4 factors of external validity?
- Criterion validity
- Convergent and divergent validity
- Predictive validity
- Incremental validity
What is external validity?
The degree to which test scores are related to other constructs
What is criterion validity?
The extent to which test scores on questionnaire are related to some other outcome or condition
What is convergent validity?
The degree to which a measure is correlated with other measures of the same or related constructs
What is divergent validity?
The degree to which a measure does not correlate with measures of unrelated constructs
Explain the relationship chart of convergent and divergent validity?
Should converge: r>0.70 - good convergent validity, r<0.30 - poor convergent validity
Should diverge: r>0.70 - poor divergent validity, r<0.30 - good divergent validity
Anything in between is mild, and depends on the theory.
What does a multi-trait multi-method matrix show?
It shows the correlates of different traits and how well they converge to measure the same construct
How do you read the multi-trait multi-method matrix table?
The traits are listed down the side and along the top, grouped by test (method), and shows the correlation coefficient in the cross section of each individual trait
What are the factors of predictive validity? Define them
Concurrent (predicts a criterion measured at the same time) and prospective (predicts a criterion observed in the future) validity
What is incremental validity?
The degree to which a new (additional) measure adds to the prediction of a criterion - beyond what can be predicted by some other measure
What are closed format tests?
Tests that have preset answers that cannot be changed or elaborated
What does it mean to have a dichotomous response?
The answer can only be yes or no
What is a likert scale response style?
A range of replies (typically from strongly agree to strongly disagree) in which a person rates how much they agree with a statement
What does it mean if a test response is rank-ordered?
The subject must rank each statement (example: most important - least)
What are open format tests?
The questions do not have predetermined responses, allowing for elaboration
What are open ended questions?
Questions that allow the participants to come up with their own responses
What is a visual-analogue response style?
When the respondents rate their level of a construct on a continuous scale
What are anchors? - give an example
They are statements that help specify what each number refers to in the real world
EX: 1 = rarely or never
What is standard deviation?
The variability within a group - differences in individual scores
What is standard error?
Variability across distributions - differences between groups
What is estimated true score?
How ability and probability of correctness correlate
What is the mean - and the equation for it?
Mean: the average
Mean = the sum of the population scores / the number of scores
μ = Σx / N
What is the equation for standard deviation?
Stand. Dev = the square root of [the sum of (score - mean)² / the number of scores]
σ = √( Σ(x-μ)² / N )
What is variance - and the equation for it?
The differences in scores
Variance = the sum of (score - mean)² / the total number of scores
σ² = Σ(x-μ)² / N (the standard deviation squared)
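The mean, variance, and standard deviation formulas above can be checked with a short Python sketch (hypothetical scores):

```python
import math

# Population formulas from the cards: mu = sum(x)/N,
# variance = sum((x - mu)^2)/N, sigma = sqrt(variance).
scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
N = len(scores)
mu = sum(scores) / N                               # 5.0
variance = sum((x - mu) ** 2 for x in scores) / N  # 4.0
sigma = math.sqrt(variance)                        # 2.0
print(mu, variance, sigma)  # 5.0 4.0 2.0
```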
What is the line of best fit?
A line through a scatter plot that minimizes discrepancy between observed and predicted scores
Measures the degree of mis-fit between scores
What is a predicted score? How do you calculate it?
An estimated score for future tests
Predicted score = y-intercept + slope × X
Ŷ = b0 + b1X (equivalently Y = aX + b)
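A minimal Python sketch of fitting a line of best fit and producing a predicted score (the data are hypothetical; b0 and b1 follow the standard least-squares formulas):

```python
# Least-squares line of best fit: Y_hat = b0 + b1 * X (hypothetical data).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
# Slope: covariance of X and Y divided by variance of X.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) \
     / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar   # intercept passes through the means

def predict(x):
    return b0 + b1 * x

print(round(b1, 2), round(b0, 2))  # 0.6 2.2
print(round(predict(6), 1))        # predicted score for X = 6 -> 5.8
```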
What is effect size? How do you calculate it?
The magnitude of differences between groups
Effect size = (mean of group 1 - mean of group 2) / standard deviation
d = (x̄1 - x̄2) / s
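A short Python sketch of the effect size formula, assuming the pooled standard deviation is used for s (a common choice; the groups are hypothetical):

```python
import math

# Cohen's d as on the card: d = (mean1 - mean2) / s,
# with s taken as the pooled SD of two equal-sized hypothetical groups.
g1 = [10, 12, 14, 16, 18]
g2 = [8, 10, 12, 14, 16]
m1 = sum(g1) / len(g1)  # 14.0
m2 = sum(g2) / len(g2)  # 12.0

def sample_var(xs):
    # Sample variance (divide by n - 1).
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

s_pooled = math.sqrt((sample_var(g1) + sample_var(g2)) / 2)  # equal n
d = (m1 - m2) / s_pooled
print(round(d, 2))  # 0.63
```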
What is sensitivity? How do you calculate it?
Of the people who actually have the condition, how many were designated to have it
Sensitivity = A / (A+C)
What is specificity? How do you calculate it?
Of the people who don’t actually have the condition, how many were designated not to have it
Specificity = D / (B+D)
What is positive predictive value? How do you calculate it?
Of the positive results, how many actually have the condition
PPV = A / (A+B)
What is negative predictive value? How do you calculate it?
Of the negative results, how many really don’t have the condition
NPV = D / (C+D)
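All four values come from a single 2x2 table, as a Python sketch (the counts are hypothetical; A = true positives, B = false positives, C = false negatives, D = true negatives):

```python
# 2x2 table: A = true positives, B = false positives,
#            C = false negatives, D = true negatives (hypothetical counts).
A, B, C, D = 80, 10, 20, 90

sensitivity = A / (A + C)  # of those WITH the condition, fraction flagged
specificity = D / (B + D)  # of those WITHOUT it, fraction cleared
ppv = A / (A + B)          # of positive results, fraction truly positive
npv = D / (C + D)          # of negative results, fraction truly negative
print(sensitivity, specificity)      # 0.8 0.9
print(round(ppv, 2), round(npv, 2))  # 0.89 0.82
```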
What is base rate?
The guaranteed rate of prevalence in a population
What is a self report test?
A test completed by someone who reports their own experiences
What kind of test is the BDI? Key features
The Beck Depression Inventory is a self report test that measures depression
A unidimensional test; cutoff scores indicate a discrete condition; any combination of items can designate the presence of depression
What is an informant based test?
A test completed on behalf of someone else
What are projective tests?
Tests that measure SUBCONSCIOUS impulses, emotions, difficulties, etc
What are objective tests? Why were they created?
Tests that use standardized measures that allow little to no interpretation
Created to account for the limits of projective tests
What is the RORS?
A projective test in which the patient interprets inkblots (the Rorschach)
What is an aptitude test?
A test designed to measure individual aptitudes, attitudes, preferences, etc
What is the MBTI? Key features?
The Myers-Briggs Type Indicator is a self report measure of psychological preferences in how people see the world and make decisions
Measures innate aptitudes that are either mental or physical
What are structured tests?
Tests in which the questions and structure are predetermined, no changes or follow up can be made
What are semi structured tests?
Tests in which the procedure and questions are predetermined, but the doctor is able to add or remove questions at their discretion
What is the SCID? Key features?
The structured clinical interview for DSM is a semi structured test that helps clinicians assess the presence or absence of psychiatric symptoms to render formal diagnoses
It is semi structured, allowing for follow up and the adding/removing of questions
What is information variance?
The way in which questions are asked and how tests are presented changes the amount of information that comes out of a test
What is criterion variance?
How a doctor interprets the information to reach conclusions, which can produce differences between scores
What are personality tests?
Tests designed to assess personality characteristics
What is the NEO PI-R? Key features?
A test that measures the degree of OCEAN
- openness, conscientiousness, extraversion, agreeableness, neuroticism
Uses a likert scale for questions, multidimensional- assesses each personality characteristic based on multiple smaller factors
What is OCEAN in the NEO PI-R?
Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism
What is the MMPI? Key features?
Minnesota Multiphasic Personality Inventory. Designed to address concerns with existing self-report measures; assesses psychopathology and personality in a clinical setting, prioritizing criterion validity over face validity
What is the act frequency approach?
A measure of how behaviour and personality traits correlate
What is the BAI? Key features?
The Behavioural Acts Inventory. Designed to measure actions and behaviours to identify the correlates with personality
What are normative tests?
Tests designed to measure quantitative personality characteristics, comparing them to patterns of normality
What are the WAIS and WISC?
Intelligence tests for adults (WAIS) and children (WISC) which evaluate intelligence and cognitive ability
What are achievement tests?
Tests that measure developed skills or knowledge
What is the GRE? Key features?
Graduate Record Examination that measures the acquired knowledge of students
Evaluates verbal reasoning, quantitative reasoning, analytical writing, critical thinking, and knowledge
What makes a test reliable?
When it produces the same score continuously over time
What does reliability measure?
How close our observed score approaches the true score
What is the expected score?
An estimate of true score
How do you calculate true score?
T = E(X) - the true score is the expected (long-run average) value of the observed score X
What is the ‘fast move’ of classical test theory?
If error is uncorrelated with test scores, then error from two different tests is also uncorrelated, meaning errors from one test will be uncorrelated with the True Score of another test
What are the 5 types of reliability?
Test-retest, Inter-rater, Parallel forms, Split half, Internal consistency
What is test-retest reliability?
The ability for a test to produce consistent scores from one time to another
What is inter-observer reliability?
The degree to which different observers give consistent estimates of the same construct
What is parallel forms reliability?
The consistency of two separate but similar tests
What is split half reliability?
The consistency between two halves of the same test
What is internal consistency (reliability)?
The consistency of the results across items of a test
How do you estimate reliability?
By comparing two different groups of items
What are the ways you can estimate reliability? Explain
Within a single test - one part vs another part
Across multiple test - test 1 vs test 2
What is used to measure internal consistency?
Cronbach’s alpha (a) and Cohen’s Kappa (k)
How do you calculate Cohen’s kappa (k)?
(Observed agreement - chance agreement)/(1-chance agreement)
How do you calculate chance agreement?
[probability of ‘yes’ from Dr. A × probability of ‘yes’ from Dr. B] + [probability of ‘no’ from Dr. A × probability of ‘no’ from Dr. B]
How do you calculate observed agreement?
(‘Yes’ from both + ‘No’ from both) / N
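The kappa cards above can be sketched in Python (hypothetical counts; chance agreement is the sum of the two raters' matched 'yes' and 'no' probabilities multiplied together):

```python
# Cohen's kappa from a 2x2 agreement table between two raters
# (hypothetical counts): a = both yes, b = A-yes/B-no,
#                        c = A-no/B-yes, d = both no.
a, b, c, d = 20, 5, 10, 15
n = a + b + c + d

observed = (a + d) / n  # observed agreement: both yes + both no, over N
# Chance agreement: P(yes, A) * P(yes, B) + P(no, A) * P(no, B)
p_yes_A = (a + b) / n
p_yes_B = (a + c) / n
p_no_A = (c + d) / n
p_no_B = (b + d) / n
chance = p_yes_A * p_yes_B + p_no_A * p_no_B

kappa = (observed - chance) / (1 - chance)
print(round(kappa, 2))  # 0.4
```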
What is item analysis?
The analysis of how each individual item on a test performs
What assumptions are made when calculating true score?
That T = the average score on a test if taken repeatedly, that error is random and independent
What score would be ‘excellent’ for reliability?
a > 0.9
What score would be ‘good’ for reliability?
0.9 > a > 0.8
What score would be ‘acceptable’ for reliability?
0.8 > a > 0.7
What score would be ‘questionable’ for reliability?
0.7 > a > 0.6
What score would be ‘poor’ for reliability?
0.6 > a > 0.5
What score would be ‘unacceptable’ for reliability?
0.5 > a
What is item analysis?
The analysis of how each individual item performs and the correlation of individual items with the total score
What is item analysis used for?
To determine which items are the best measurement of a construct
What is item total correlation?
The correlation between scores on an individual item and the total test score
How do we calculate item total correlation?
Each individual’s score is averaged (across a ‘group’) for an item total. The average item totals are summed and averaged for a total score. This average item agreement is plotted against the total score to find r (total, item)
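A minimal Python sketch of an item-total correlation, using a plain Pearson r between one item's scores and the total scores (the respondents and items are hypothetical):

```python
import math

# Rows = respondents, columns = items (hypothetical 5 x 3 response matrix).
responses = [
    [1, 2, 1],
    [3, 3, 2],
    [4, 5, 4],
    [2, 2, 2],
    [5, 4, 5],
]
totals = [sum(row) for row in responses]  # each respondent's total score

def pearson(xs, ys):
    # Pearson correlation coefficient r.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

item1 = [row[0] for row in responses]          # scores on item 1
print(round(pearson(item1, totals), 2))        # item-total correlation -> 0.98
```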
What is distinctiveness in item analysis?
When items are more highly correlated with one factor than with the others
What is the item response model?
A model of how the probability of choosing a given option depends on the level of the construct required to choose that option
How can the item response model be explained?
The amount of knowledge you need to get an answer right.
What are the features of item response curves?
Discriminability, difficulty, precision
What is discriminability in item response?
The slope - the region where changes in ability are most easily observed
When is discriminability better and worse?
Better: steep slopes
Worse: flattened regions
What is difficulty in item response?
How much of the construct is needed before you choose that option (answer the question correctly)
How do you observe difficulty?
Using the 0.5 threshold. The point on the x-axis at which the curve is at 0.5
What is more difficult and less difficult in item response?
More: when the slope is very shallow for a while, or it begins further down the x-axis
Less: when the slope begins early on the x-axis and/or is very steep right away
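A sketch of an item response curve, assuming a standard two-parameter logistic form (an illustration, not a specific model from the cards): the difficulty b is where the curve crosses the 0.5 threshold, and the discriminability a sets the slope:

```python
import math

# Two-parameter logistic item response curve (illustrative assumption):
#   P(correct) = 1 / (1 + exp(-a * (theta - b)))
# a = discriminability (slope), b = difficulty (ability where P = 0.5).
def p_correct(theta, a=1.5, b=0.0):
    return 1 / (1 + math.exp(-a * (theta - b)))

# At theta == b the curve is exactly at the 0.5 threshold,
# so b is the item's difficulty.
print(p_correct(0.0))            # 0.5
print(round(p_correct(2.0), 2))  # higher ability -> higher P of success
```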
What is precision in item response?
An estimate of your level of ability
How do you determine precision in item response?
Using the area under the curve, between -2 and +2 (95%)
What does the 95% precision tell us?
Based on the option picked, we can infer with 95% certainty that the person’s level of the construct falls within that region under the curve
In what ways can you describe a curve in item analysis?
Is it flat? Sharp?
Where is the peak (most common area)
Does one curve override another?
Is a curve high for too long?
What is principal components analysis (PCA)?
The examination of the degree to which individual items are related to one or more underlying dimensions of variation (factors)
What are the goals of PCA?
Variable reduction
Structural analysis
Why do we use PCA?
To reduce the redundancy in tests and see if the same construct can be better explained by a short form test
What is a factor pattern matrix?
A visual representation of the relation of items to the factor(s) on a test
What is the example of a factor pattern matrix that we have seen in class?
The red and blue squares of the NEO PI-R
Using a factor pattern matrix, how do we know if the items are good indicators of the factor(s)?
Strong blue squares
Using eigenvalues
What are Eigenvalues?
Numbers that show the proportion of variance that each factor contributes
What is a good eigenvalue?
Any above 1
When creating a short form, how do we know what eigenvalues to get rid of?
- Eigenvalues under 1, or past the point where the curve goes flat
- Items with the smallest correlations
- Items that correlate with multiple factors
How do we observe incremental validity?
By comparing two measures - an existing one and a new one - to a gold standard
How can incremental validity be represented?
Graphically, through models
What are the sections of a graphical representation of incremental validity?
Just the gold standard, measure 1, or measure 2
The single overlap: GS-M1, GS-M2, M1-M2
The total overlap
What is model testing in terms of incremental validity?
The ability to create a predicted score on the gold standard, based on observations on the other two+ measures
Based on model testing, how do we know if a test has incremental validity?
If adding this scale to the calculation of predicted score on the GS closes the gap between the predicted and observed score, there is incremental validity
What is the theory about adding measures when model testing for incremental validity?
The more tests you add, the closer you SHOULD be to the observed score on the GS
How do you evaluate predicted scores when model testing for incremental validity?
SSE = Σ(y - ŷ)²
Sum of squared errors = the sum of (observed - predicted score)²
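SSE can be computed directly in Python (the observed and predicted scores are hypothetical):

```python
# Sum of squared errors between observed gold-standard scores and a
# model's predicted scores (hypothetical values).
observed = [10, 12, 9, 14]
predicted = [11, 11, 10, 13]
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
print(sse)  # 1 + 1 + 1 + 1 = 4
```

A smaller SSE means the model's predictions sit closer to the observed gold-standard scores, which is how adding a measure is judged for incremental validity.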
What are the two prediction models when model testing for incremental validity? Explain
- Benchmark - the existing tests vs the GS
- (Existing test + new test) vs GS- does adding your test contribute anything
How can model testing be represented by the line of best fit?
Data points are the observed scores on the GS
Each measure has its line of best fit
The space between a point and the line shows the discrepancy between observed and predicted scores
How can you make your measure look better in model testing? Why?
Compare it to a poor benchmark
If the benchmark does a poor job when compared to the GS, it will make your scale look better
What are the outcomes of model testing?
- Both measures have incremental utility, one is not better than the other - retain both
- One measure has more incremental utility than the other - keep the better measure
- The measures do not contribute uniquely - choose one
- The measures have completely unique proportions of variation - retain both
How would you write a comparison of two tests?
EX: GS = HRSD, M1 = CESD, M2 = BDI
The CESD accounts for variance in the HRSD above and beyond the variance accounted for by the BDI
What is confirmatory factor analysis?
Examining the structure of questionnaires and deciding which model best fits the data
What is used for confirmatory factor analysis?
Structural equation models
What is a structural equation model?
The imposition of a model on the data to evaluate fit
What is a latent variable?
The factors of a construct that cannot be directly observed, they are inferred using related questions
What are the observed indicators?
The questions
What are factor loadings in structural equation models?
Values that show how the latent variables relate to each other, and how the questions relate to the variables
In a visual SEM, what are the different parts?
Latent variables - circles - factors
Factor loadings - top r score - correlations
Error - bottom r scores
What is the saturated model of SEM?
Explanatory model in which EVERYTHING is related
The benchmark
What is the independence model in SEM?
A model in which none of the variables are correlated
For the saturated and independence models of SEM, r=what?
Saturated: r = 1
Independence (null): r = 0
Other: 1>r>0
How does dimensionality factor into SEM?
Models can be uni-factorial or multi-factorial
What is a uni-factorial model in SEM?
Only one latent variable (circle)
What is a multi-factorial model in SEM?
Multiple latent variables (circles)
What is a nested model in SEM?
A model within another
How do you calculate the fit of a model in SEM?
By comparing the discrepancy between predicted and observed values to find which pattern of correlations is actually close to what has been observed
What are the 3 big psychometric wrongdoings?
- Creating a test that does not account for the behaviours of the target population
- Not having enough items
- Not using a test how it was intended
What is an example of not accounting for the behaviour of the target population?
TeenScreen - used to identify teens at risk of suicide, but the at-risk teens typically don’t show up
Why are 1 item tests not a good measure?
A single response might be wrong, and there is nothing else to verify it against
What is an example of not using a test how it was intended?
Using the WISC to identify children that are gifted