Exam 1 Flashcards
Standard Assessment
Compares a child's performance to others of the same age
Administration and scoring procedures are the same for everyone
3 key features of normative groups
- Random
- Representative
- Size
** make sure they are representative
What is the minimum size for a standardization sample?
Minimum 1,000
(100 per age group)
What are 3 things to consider when planning a Diagnostic Evaluation?
- Context: community-based? or a school with a targeted population?
- Purpose: screening so that children who fail seek further evaluation? or using results to guide instruction levels?
- Quality: consider the quality of your instrument; some are NOT very sensitive, and you don't want to give false info
What are 2 things every [diagnostic] Evaluation Plan includes?
- Hearing screening
- Integration of multiple sources of data
-Historical: case history
-Standardized: tests
-Performance
How do you plan a Diagnostic Evaluation? (6 things)
Formulate: clinical question
Select: instruments
Collect: data
Integrate: all of data
Answer: clinical questions
Plan: what comes after (refer, intervention plan, etc.)
Normal Curve
“Bell Curve” probability distribution symmetric about mean
Standard Deviation
Measures the spread of scores around the mean (how far scores typically fall from the mean)
Standard Error of Measurement
SEM = SD × √(1 − r)
(r = reliability coefficient, e.g., split-half reliability)
Used to determine the range around the observed score in which the True Score falls
Smaller SEM = low measurement error (SEM < 5 is generally accepted)
Larger SEM = higher measurement error :(
SEM reflects point-range error (ex. an SEM of 3.1 means the score is interpreted as +/- 3 points)
Confidence Intervals are based on SEM
Tell you how confident you can be a score falls within a certain range
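A minimal Python sketch of the SEM formula; the SD and reliability values below are made-up for illustration:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SEM = SD * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical: a standard-score test (SD = 15) with split-half r = .90.
print(round(sem(sd=15, reliability=0.90), 2))  # 4.74 -> under 5, acceptable
```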
Raw Score
Total # of items correct (includes items below the basal)
Standard Score
Mean = 100
SD = 15
Average range = 85-115 (within +/- 1 SD)
Child's performance compared to a defined peer group
(subtract the norm-group mean from the raw score and divide by the standard deviation to get a z-score, then place it on the mean-100/SD-15 scale)
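A quick sketch of that conversion; the norm-group mean and SD here are hypothetical:

```python
def standard_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """z = (raw - mean) / SD, then placed on the mean-100 / SD-15 scale."""
    z = (raw - norm_mean) / norm_sd
    return 100 + 15 * z

# Hypothetical norms: the peer group's mean raw score is 38 with SD 7.
print(standard_score(raw=45, norm_mean=38, norm_sd=7))  # 115.0 (1 SD above the mean)
```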
Composite Scores
Combination of multiple subtests measuring the same thing
A single number that combines multiple data points or variables into a single representation
Example: IQ tests - a composite measure that gives a single score based on a series of responses to various questions
Scaled Scores
Mean=10
SD= 3
+/-1 SD=7-13
A raw score (total number of correct answers) converted onto a consistent, standardized scale. Scaled scores indicate the same level of performance regardless of which form of the test a candidate received.
T-scores
A type of standard score
A T-score is a scaled score that’s used to describe a person’s performance on a test or assessment. T-scores are often used in ability assessments and behavior rating scales. A T-score of 50 is considered average, and standard deviations are typically 10 points.
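Because standard, scaled, and T-scores are all linear transformations of the same z-score, one sketch can show all three (values follow the means/SDs given above):

```python
def from_z(z: float) -> dict:
    """Express the same z-score on the three common scales."""
    return {
        "standard_score": 100 + 15 * z,  # mean 100, SD 15
        "scaled_score": 10 + 3 * z,      # mean 10, SD 3
        "t_score": 50 + 10 * z,          # mean 50, SD 10
    }

print(from_z(-1.0))  # {'standard_score': 85.0, 'scaled_score': 7.0, 't_score': 40.0}
```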
Age-equivalent
The average age at which children earn that raw score
Issues:
Qualitative- language complexity may not be accounted for
Clients should be compared to their peers rather than an age group
Equivalency- a delay of the same size means different things at different ages (for example, a 1-year delay changes in significance as a child gets older)
NOT a recommended measure
Grade-equivalent
Used for academic tests. Median raw score for a specific grade level
Percentile Rank
% of norm group that earned a raw score less than or equal to the test taker’s.
A percentile rank is the percentage of people who scored below a given T-score. For example, a T-score of 40 is roughly the 16th percentile, while a T-score of 70 is roughly the 98th percentile.
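A sketch of how a T-score maps to a percentile rank under the normal curve, reproducing the two examples above:

```python
import math

def t_to_percentile(t: float) -> float:
    """Percentile rank of a T-score (mean 50, SD 10) under the normal curve."""
    z = (t - 50) / 10
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))  # normal CDF

print(round(t_to_percentile(40)))  # ~16th percentile
print(round(t_to_percentile(70)))  # ~98th percentile
```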
Confidence Interval
Range in which the child's true score is likely to fall.
A good speech pathology report will include confidence intervals, which are a range of scores that are likely to contain the child's true score. For example, a 90% confidence interval means that there's a 90% chance that the child's true score is within that range.
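A sketch of building a confidence interval from the SEM; the observed score of 92 and SEM of 3.1 are made-up illustration values (z = 1.645 for 90%, 1.96 for 95%):

```python
def confidence_interval(observed: float, sem: float, z: float = 1.645) -> tuple:
    """Range likely to contain the true score: observed +/- z * SEM."""
    return observed - z * sem, observed + z * sem

# Hypothetical: observed standard score of 92, SEM of 3.1 points.
print(confidence_interval(92, 3.1))          # 90% CI: ~(86.9, 97.1)
print(confidence_interval(92, 3.1, z=1.96))  # 95% CI: ~(85.9, 98.1)
```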
Normative Group
Should be random and broad; not focused on one group
Features of Representativeness:
Age
Gender
Ethnicity
SES
Geographic location
Disability
Reliability
Refers to how much error is in the measure; looking for consistency across time
Reliability coefficient: r
Perfect relationship (r = 1)
No relationship (r = 0)
Good range (r = 0.80 or higher)
What are the 3 types of reliability
- Test-retest reliability
- Internal consistency
- Interexaminer reliability (inter-rater)
Test-retest Reliability
Looking to make sure scores stay relatively consistent between testing
Retesting should occur within a 2-week period (2 weeks being the maximum)
Reflected in SEM
Internal Consistency
Consistency of test content, how homogenous test items are
Assessed using split-half reliability coefficient
Represented by Cronbach’s alpha (low alpha= not consistent; high alpha= consistent)
Interexaminer Reliability (Inter-rater)
Consistency of the measure across examiners (results should be similar even if different examiners are administering the test)
Higher numbers= more related or similar scoring
0.90 is ideal; 0.80 is accepted (0.8-0.9 good) (1 is a perfect relationship)
Longer tests are more reliable
Meanings of Alpha
Low Alpha= not consistent
High Alpha= Consistent
Validity
Is the test measuring what it says it's measuring?
Diagnostic accuracy
How well a test identifies the presence or absence of a condition. Four things to consider when looking at diagnostic accuracy:
-Sensitivity
-Specificity
-Positive Predictive Value
-Negative Predictive Value
What are the 3 types of validity?
- Construct validity
- Content validity
- Criterion validity
Construct validity
Does the test measure what it’s supposed to measure?
The construct is the concept/theory that the test is trying to measure.
Ex 1: receptive & expressive language
Ex 2: vocabulary or pragmatics
*Convergent: has a high correlation to similar constructs
Relationship between two tests testing for similar things (ex. relationship between two vocabulary tests)
*Divergent: has a low correlation to unrelated constructs
Example: Vocabulary test compared to a math test would have a low relationship/correlation
Types of Construct validity
Developmental studies
Ex. Older children should do better than younger children.
Contrasting groups
Ex. Those with a language disorder should do worse than those without one on an assessment.
Factor analysis
Identifies which aspects of the construct are being measured.
Content validity
Does the test test for what it’s supposed to?
Does a vocab test include vocab words?
Measured through item analysis
What are 2 commonly used statistics for Content validity
- Item Difficulty:
How many children responded correctly to a test item
**0.3-0.7 (goal is 0.5)
- Item Discrimination:
How a child's response to a specific item relates to their performance on the rest of the test (point-biserial correlation)
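A sketch of both statistics on a made-up response matrix (1 = correct, 0 = incorrect); the discrimination function uses a point-biserial correlation against the rest-of-test score:

```python
# Made-up data: rows are children, columns are items.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]

def item_difficulty(item: int) -> float:
    """Proportion of children who answered the item correctly (target 0.3-0.7)."""
    return sum(row[item] for row in responses) / len(responses)

def item_discrimination(item: int) -> float:
    """Point-biserial correlation between an item and the rest-of-test score."""
    x = [row[item] for row in responses]
    y = [sum(row) - row[item] for row in responses]  # total excluding this item
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(item_difficulty(0), round(item_discrimination(0), 2))  # 0.8, 0.38
```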
Criterion validity
Predictive:
How well does a particular test measure a trait in the future
How well does it predict (high correlation = highly predictable)
No specific cutoff, but correlations of roughly .40-.70 are expected
Concurrent:
How well is this test correlated to another test of the same trait or construct
High correlation (with a well-established test) = high concurrent validity
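Criterion validity boils down to a correlation between two measures; a sketch with hypothetical scores:

```python
# Hypothetical scores for the same children on two vocabulary tests.
test_a = [85, 92, 78, 110, 95, 102]
test_b = [88, 95, 80, 105, 97, 99]

def pearson_r(x: list, y: list) -> float:
    """Correlation between two measures of the same construct."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

print(round(pearson_r(test_a, test_b), 2))  # high r = high concurrent validity
```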
What are 3 factors that affect validity?
Selection of appropriate measure:
Pick a good test that measures what you need to measure
Administration of the measure:
If you don't follow the standardized administration procedures, the test is no longer valid
Child:
A child who is tired, hungry, or upset may not perform accurately, which will lead to invalid results
Sensitivity and Specificity
These are important to an SLP when deciding what test measure to use.
Sensitivity: positive result when the child has the disorder
Specificity: negative result when the child does not have the disorder
A test that under-identifies has poor sensitivity & a test that over-identifies has poor specificity
Consider: is the test sensitive to who has the disorder? Does the test specify who does not have the disorder?
Interpretations: 90% = good; 80-89% = fair; Below 80% = unacceptable
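A sketch of the two formulas on a hypothetical validation sample (the counts are made-up):

```python
def sensitivity(tp: int, fn: int) -> float:
    """Of the children who HAVE the disorder, the proportion the test flags."""
    return tp / (tp + fn)

def specificity(tn: int, fp: int) -> float:
    """Of the children who do NOT have the disorder, the proportion the test clears."""
    return tn / (tn + fp)

# Hypothetical sample: 45 of 50 disordered children flagged,
# 92 of 100 typically developing children cleared.
print(sensitivity(tp=45, fn=5))  # 0.9  -> good (90%)
print(specificity(tn=92, fp=8))  # 0.92 -> good (92%)
```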
Positive & Negative Predictive Values
Positive: True positives / (True positives + False positives)
Ex: If the test shows that the client has a language disorder, how likely is it that they actually have it?
Negative: True negatives / (True negatives + False negatives)
Ex: If the test is negative, how likely is it that the client does not have a language disorder?
Both positive & negative predictive values have clinical relevance. They are useful in helping families interpret the information because the scores give clinicians a measured prediction. The values also tell us (the clinicians) how confident we can be in our predictions.
Ex: If the test is positive and its positive predictive value is 0.95, you can be 95% sure the child actually has a language disorder.
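A sketch of both predictive values, reusing the hypothetical counts from the sensitivity/specificity example:

```python
def ppv(tp: int, fp: int) -> float:
    """If the test is positive, the probability the child truly has the disorder."""
    return tp / (tp + fp)

def npv(tn: int, fn: int) -> float:
    """If the test is negative, the probability the child truly does not."""
    return tn / (tn + fn)

# Same hypothetical counts as above: 45 TP, 8 FP, 92 TN, 5 FN.
print(round(ppv(tp=45, fp=8), 2))  # 0.85
print(round(npv(tn=92, fn=5), 2))  # 0.95
```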
Internal consistency
Consistency of the test content; how homogeneous the items are; assessed using split-half reliability coefficient
α (Cronbach's alpha) - for continuous item scores (1, 2, 3, 4, etc.)
KR (Kuder-Richardson) - for dichotomous item scores (0 or 1, right or wrong)
Alpha value is similar to a correlation coefficient
A coefficient alpha below .80 indicates lower reliability.
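A sketch of Cronbach's alpha from its standard formula, on made-up item scores (with dichotomous 0/1 data the same computation gives the KR-20 value):

```python
# Made-up item scores: rows are children, columns are test items.
items = [
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 3],
    [1, 2, 1, 2],
    [3, 3, 4, 4],
]

def variance(xs: list) -> float:
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(rows: list) -> float:
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

print(round(cronbach_alpha(items), 2))  # 0.92 -> consistent (above .80)
```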
Cognitive Referencing
Compares “ability” to “potential”
Ability: performance on an achievement test or language test
Potential: Performance on an IQ test
NOT accurate/adequate