Experimental Validity & Measurement Flashcards
Review of last class:
Internal Validity
External Validity
Probabilistic Knowledge
Maturation
Testing Effects
Statistical Regression to the mean
Selection of participants
Hawthorne Effect
What is criterion referenced? (2)
Individual’s performance compared to some absolute level of performance set by a researcher
E.g. set a minimum score, set performance expectations, have age of acquisition expectation
What is the difference between Validity and Reliability?
Reliability refers to the consistency of a measure (whether the results can be reproduced under the same conditions).
Validity refers to the accuracy of a measure (whether the results really do represent what they are supposed to measure)
What is norm referenced? (2)
Individual performance compared to group norms
Standardized: raw scores are transformed in some way and smoothed to fit a ‘normal curve’ (e.g., z-scores, T-scores, percentiles)
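A minimal sketch of norm-referenced score conversion (the raw score, group mean, and SD below are invented values, not from the course):

```python
from scipy.stats import norm

# Hypothetical norms: group mean = 100, SD = 15 (assumed values)
raw, mean, sd = 112, 100, 15

z = (raw - mean) / sd            # z-score: SDs above/below the group mean
T = 50 + 10 * z                  # T-score: mean 50, SD 10
percentile = norm.cdf(z) * 100   # percentile rank under the normal curve

print(f"z = {z:.2f}, T = {T:.1f}, percentile = {percentile:.0f}")
# z = 0.80, T = 58.0, percentile = 79
```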
What is Measurement Reliability?
Stability, consistency of the measurement
What are the four types of test reliability?
Intrasubject reliability
Intrarater reliability
Interrater reliability
Test-retest reliability
What is Intrarater reliability?
Consistency of the data recorded by one rater over several trials
What is Intrasubject reliability?
The reproducibility of the identical responses (answers) to a variety of stimuli (items) by a single subject in two or more trials.
What is interrater reliability?
The degree of agreement among independent observers who rate, code, or assess the same phenomenon.
Reliability is reported as (3)
correlation, standard error of measurement (SEM) or % agreement
What is validity? (2)
The degree to which an instrument measures what it is intended to measure
How appropriate, meaningful and useful the inferences drawn from the measure are
What does high validity imply? (3)
A measurement is relatively free from error.
A valid test is also reliable.
A reliable measure is not necessarily valid.
What are the 3 types of Validity?
Construct validity
Content validity
Criterion-related validity
What is construct validity? (3)
Construct validity refers to whether a test adequately measures an abstract construct, i.e., an attribute that cannot be observed directly, such as intelligence, level of emotion, proficiency or ability.
What are the methods for establishing construct validity? (2)
Known Groups Method
Factor analysis
Define known groups method: (2)
- One of the methods of establishing construct validity
- Degree to which an instrument can demonstrate different scores for groups known to vary on the variable being measured.
e.g., An intelligence test should differentiate individuals with Down syndrome (DS) from typically developing (TD) individuals
e.g., A measure of functional independence
Should decrease with increasing age in seniors
Should be related to the level of care needed
Should be related to severity of impairment
e.g., A measure of functional hearing
Should decrease with hearing level (dB)
Should decrease in older adults
Define Factor Analysis:
A construct is made up of a variety of dimensions
- Each dimension can be assessed using a variety of tasks
- If we measure a series of variables/items associated with a construct
Some variables would be highly correlated
Some would have little correlation
Performance on variables that are similar would cluster together
The array of clusters define the construct
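A minimal simulation (hypothetical tasks and data, not from the course) of the clustering idea: variables driven by the same underlying dimension correlate highly with each other and weakly with variables from a different dimension.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two hypothetical latent dimensions of one construct
dim_a = rng.normal(size=n)   # e.g., vocabulary knowledge
dim_b = rng.normal(size=n)   # e.g., speech-motor skill

# Four observed tasks: two load on each dimension (plus measurement noise)
task1 = dim_a + 0.3 * rng.normal(size=n)
task2 = dim_a + 0.3 * rng.normal(size=n)
task3 = dim_b + 0.3 * rng.normal(size=n)
task4 = dim_b + 0.3 * rng.normal(size=n)

r = np.corrcoef([task1, task2, task3, task4])
print(np.round(r, 2))
# task1-task2 and task3-task4 correlate strongly (~.9), while cross-dimension
# pairs correlate near 0 -- the two correlation clusters map onto the two dimensions.
```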
What is Content Validity?
Refers to the extent to which a measure represents all facets of a given construct.
What do we mean that content validity ‘‘Indicates how adequately/fully an instrument samples the variable being measured.’’? (2)
Samples all aspects of the construct
Reflects the relative importance of each part
What do we mean by ‘‘Establishing content validity is essentially a subjective process’’? (3)
- No statistic measures content validity
- Content validity is established through expert opinion, review of literature, operational definitions of the test variables
- Specific to the stated objectives
Explain Face Validity: (3)
a weak type of content validity
A subjective assessment that an instrument appears to test what it is supposed to test and usually not quantified
Attempt to quantify: # of raters who assess the test as having face validity
What is Criterion-Related validity?
Extent to which one measure is related to other measures or outcomes.
Explain Criterion-Related Validity: (4)
- Most objective measure of validity
- Often assessed by correlating performance on the measure of interest and the criterion measure
- Only useful if the criterion measure is stable and valid
- The target and the criterion measurements must be measured independently and without bias (e.g. blinded)
What are the two types of criterion-related validity?
Concurrent Validity
Predictive Validity
Describe Concurrent Validity: (2)
- One of the two types of Criterion-Related Validity
- Two measures are collected at relatively the same time and the performance on each is related
e.g., # words produced in a spontaneous speech sample and PPVT-V scores
Describe Predictive Validity: (2)
- One of the two types of Criterion-Related Validity
- The measure of interest is collected earlier in time and related to a criterion measure collected later
e.g., does a parent report measure of vocabulary at 2 years predict PPVT-V scores at 4 years
A test with good validity must be able to show two things; what are they?
Convergent validity
Divergent (discriminant) validity
What is Divergent validity?
Low correlation with tests that measure different constructs
What is Convergent validity?
High correlations with tests that measure the same construct
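A small hypothetical illustration of both properties: a new vocabulary measure should correlate strongly with an established vocabulary measure (convergent) and weakly with a measure of an unrelated construct (divergent). The tests and data below are invented.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

ability = rng.normal(size=n)                       # underlying vocabulary construct
new_test = ability + 0.4 * rng.normal(size=n)      # new vocabulary measure
established = ability + 0.4 * rng.normal(size=n)   # established vocabulary measure
motor_test = rng.normal(size=n)                    # unrelated construct

r_convergent = np.corrcoef(new_test, established)[0, 1]  # expect a high correlation
r_divergent = np.corrcoef(new_test, motor_test)[0, 1]    # expect a correlation near zero
print(round(r_convergent, 2), round(r_divergent, 2))
```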
What are threats to Measurement Reliability and Validity?
Ambiguous, unclear, inconsistent instructions
Observer Bias
Reactivity
Floor and ceiling effects (Validity only)
How can observer bias affect reliability and validity? (2)
Confirmatory bias e.g., Identification of hypernasality before/after pharyngeal flap surgery in cleft-palate patient
Carryover effects e.g., first scoring affects the second scoring
How can reactivity affect validity and reliability?
Influences that distort the measurement e.g., participant’s awareness of measurement.
How can floor and ceiling effects affect validity? (2)
Decreases the variability of a measure
Particularly difficult when measuring change over time
What is sampling? (4)
A sample is drawn from a target population
Practical difficulties in accessing a target population
Therefore, we select from an accessible population
Voluntary Participation
In order to generalize to a target population, the sample must be: (2)
Representative: same relevant characteristics
In the same proportions
What is sampling bias?
Introduced when certain characteristics in the sample are over- or under-represented relative to the target population
Explain conscious sampling bias: (2)
- Purposeful selection
- Strategic limiting of population of interest
e.g., Election poll: Voters
e.g., High-functioning autism - Limits generalization in predictable ways
- Essentially saying population of interest is a subset of total population - only generalize to the subset
Explain unconscious sampling bias: (3)
- It is a problem
- It is Unplanned and unpredictable
e.g., Election poll: how respondents are reached - land lines, cell phones, internet
e.g., People who respond to polls - Limits generalization in unpredictable ways
How can we limit unconscious bias?
Probability Sampling
Explain Probability Sampling: (2)
Better control of unconscious bias than non-probability sampling
Randomized selection procedures used
Why do we use randomized selection procedures? (4)
limit unconscious sampling bias
Assures that every member of the population has an equal chance of being chosen
Outcomes are more generalizable
Controls selection bias and therefore sampling error
BUT, no sampling is error free, therefore does not guarantee representativeness
Any sampling error assumed to be due to chance
Equal chance of selection is theoretical only, because for ethical reasons participation is always a choice
When do we use Non-Probability sampling?
- When probability sampling is not possible, often because of population access difficulties (e.g., clinical populations)
- Randomized selection procedures are NOT used
- Sampling bias is not controlled, therefore we cannot assume that the sample represents the characteristics of the larger population
- Limits generalizability
What are 4 types of probability sampling techniques?
Simple random
Systematic
Stratified
Cluster
Explain Simple random sampling: (2)
- Each member of population has equal chance of being selected and selection of each is independent
- Simple random selection, without replacement
e.g., all clinicians in a membership list
Use a random number generator
www.random.org
Randomly select a start point and a direction of movement, sample consecutively up to required number (e.g., n = 15)
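A minimal sketch of simple random sampling without replacement from a hypothetical membership list (the names, list size, and seed are invented for illustration):

```python
import random

# Hypothetical accessible population: a membership list of clinicians
population = [f"clinician_{i:03d}" for i in range(1, 301)]

random.seed(42)                            # seed only so the example is reproducible
sample = random.sample(population, k=15)   # without replacement; each member equally likely
print(sample)
```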
Explain Systematic probability sampling:
Less laborious, more convenient
Applies to ordered lists e.g., alphabetized list of Kindergarteners in a school board
Randomly select a start point
Sample using a predetermined sampling interval e.g., every 8th one
A problem arises if the list is ordered in some significant way – ensure the list is randomly ordered or ordered on an irrelevant factor
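A minimal sketch of systematic sampling from an ordered list (the roster and interval are hypothetical):

```python
import random

# Hypothetical ordered list, e.g., an alphabetized roster of kindergarteners
roster = [f"child_{i:03d}" for i in range(1, 241)]

k = 8                          # predetermined sampling interval (every 8th)
random.seed(7)
start = random.randrange(k)    # random start point within the first interval
sample = roster[start::k]      # take every 8th entry from the start point onward
print(len(sample), sample[:3])
```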
Explain Stratified random probability sampling:
Used to ensure the sample has the same proportion of subgroups as the population, or to get adequate numbers in subgroups
Improves the representativeness of the sample and precision of outcomes
Based on knowledge of variations of a characteristic in the population
e.g., ASD population: 4:1 male to female ratio
Partition the population into non-overlapping strata (levels, subsets)
e.g., males and females
Randomly sample in proportion to the distribution in the population
Note: Choose the stratification variable carefully: relevance to the study
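A minimal sketch of proportional stratified sampling using the 4:1 male:female ratio from the card (participant IDs, frame size, and seed are invented):

```python
import random

# Hypothetical sampling frame partitioned into non-overlapping strata
males = [f"M{i}" for i in range(400)]     # 4:1 male:female ratio in the population
females = [f"F{i}" for i in range(100)]

n_total = 50
prop_male = len(males) / (len(males) + len(females))   # 0.8

random.seed(3)
sample = (random.sample(males, round(n_total * prop_male)) +             # 40 males
          random.sample(females, n_total - round(n_total * prop_male)))  # 10 females
print(len(sample))
```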
Explain Disproportional probability sampling:
- A type of stratified sampling
- If a population subset of interest occurs infrequently enough to threaten statistical power, can sample disproportionately
e.g., Population = 6000 SLPs in Canada (CIHI)
Male to female ratio = 1:30; but may be interested in having male responses represented
If use simple random sampling, few males would be selected; n = 100, 3.3 males
Instead, select 50 males and 50 females
Then statistically weight the male scores to represent their proportional distribution in the larger population
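A minimal sketch of the weighting step: equal numbers of males and females are surveyed, then the male scores are weighted back to their share of the population (1 part in 31 under the card's 1:30 ratio) when computing an overall estimate. The scores below are simulated.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical survey scores from a disproportional sample of 50 males and 50 females
male_scores = rng.normal(70, 10, size=50)
female_scores = rng.normal(75, 10, size=50)

# Population weights: males are 1 part in 31 under a 1:30 male:female ratio
w_male, w_female = 1 / 31, 30 / 31

weighted_mean = w_male * male_scores.mean() + w_female * female_scores.mean()
unweighted_mean = np.concatenate([male_scores, female_scores]).mean()
print(round(weighted_mean, 1), round(unweighted_mean, 1))
# The weighted estimate reflects the (mostly female) population,
# while male responses can still be examined on their own.
```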
Explain Cluster random sampling:
Using naturally occurring groups as sampling units
Often a population is too large or dispersed to obtain a complete listing of possible participants
e.g., Population: children in elementary schools in NS
Cluster or multi-stage sampling method
Randomly sample ‘families of school’ (clusters) e.g., schools
Randomly sample students within schools
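A minimal sketch of two-stage cluster sampling (school and student lists are invented):

```python
import random

# Hypothetical population: 50 schools with 200 students each
schools = {f"school_{s}": [f"s{s}_student_{i}" for i in range(200)] for s in range(1, 51)}

random.seed(11)
chosen_schools = random.sample(list(schools), k=5)              # stage 1: sample clusters (schools)
sample = [student
          for school in chosen_schools
          for student in random.sample(schools[school], k=20)]  # stage 2: sample students within each cluster
print(len(sample))   # 5 schools x 20 students = 100
```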
What are three types of Nonprobability sampling?
Convenience
Snowball
Purposive
Explain Convenience sampling:
- Chosen as they become available
Volunteers
Self-selection introduces bias i.e., why did they volunteer?
Explain Quota sampling: (2)
- Convenience sample with restrictions
- Controls for potential confounds from known population characteristics
e.g., male: female ratio in ASD
Explain Purposive nonprobability sampling:
- Hand pick subjects on the basis of certain characteristics
e.g., Chart review, participation in an intervention
- Used in qualitative research
Explain Snowball non-probability sampling:
- Existing participants recruit or refer further participants
e.g., word of mouth
How do we assign subjects to groups? (4)
- Random
By individual
By block (e.g., different blocks for severity level)
- Systematic
- Consecutive
- Matched
What are the two main types of statistics?
Descriptive
Inferential
What are the levels of measurements?
Nominal (Names with no order)
Ordinal ( Ranked order)
Interval ( Equal Intervals with no true 0)
Ratio (Intervals with a true 0)
Discrete data refer to (2)
Nominal and Ordinal
Continuous data refer to (2)
Interval & Ratio
What is a frequency distribution?
Number of times each value occurs in the data set
What are the three measures of central tendency?
Mean
Median
Mode
Which type of scales would the mean use?
Interval or ratio
Which type of scales would the median use? (2)
Ordinal data
Interval or ratio if distribution not normal
Which type of scales would the mode use? (2)
Nominal data
Ordinal data
Explain Skewness and measures of central tendency
In a skewed distribution the mean is pulled toward the tail, so the mean, median and mode separate (positive skew: mean > median > mode; negative skew: mean < median < mode); the median is the preferred measure of central tendency when the distribution is not normal
What are the 3 measures of variability?
Range
Variance
Standard Deviation
What is the normal curve?
A bell-shaped probability curve that describes the probability distribution of a data set
What is the standard error of Measurement?
Estimate of how far an individual’s score is from the ‘true’ score or how it would vary with repeated measurement
Determined from how much ‘error’ there is in a measure, i.e., its reliability
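A minimal sketch of the usual formula linking the standard error of measurement to reliability, SEM = SD × sqrt(1 − r); the SD and reliability values below are invented.

```python
import math

sd = 15              # hypothetical standard deviation of test scores
reliability = 0.90   # hypothetical reliability coefficient (e.g., test-retest r)

sem = sd * math.sqrt(1 - reliability)   # standard error of measurement
print(round(sem, 1))   # 4.7 -- an observed score of 100 suggests a true score near 100 +/- ~4.7
```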
What is the Standard Error of the Means?
Estimate of how far from the population mean a given sample’s mean is.
Based on notion of ‘mean of means’ (the average of a number of samples from a population if you collected multiple samples from 1 population)
Varies with sample size – larger sample, smaller SEM
Estimated from sample standard deviation and sample size
Used in calculation of statistical tests
What are confidence intervals?
- Calculated from Standard Error of the Means (SEM)
- Range in which you are confident, at a specific level, that the true population mean lies
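A minimal sketch tying the last two cards together: the standard error of the mean is estimated from the sample SD and n, and a 95% confidence interval is built from it (the data are simulated).

```python
import numpy as np

rng = np.random.default_rng(8)
scores = rng.normal(100, 15, size=64)            # hypothetical sample, n = 64

mean = scores.mean()
se = scores.std(ddof=1) / np.sqrt(len(scores))   # standard error of the mean: s / sqrt(n)

ci_low, ci_high = mean - 1.96 * se, mean + 1.96 * se   # ~95% CI (normal approximation)
print(f"mean = {mean:.1f}, 95% CI = [{ci_low:.1f}, {ci_high:.1f}]")
# Larger n -> smaller standard error -> narrower interval.
```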
Why do we use inferential statistics?
- Used to ‘infer’ sample results to population
- Determine if results (e.g., difference between groups) are ‘significant’
- Probability
No finding is absolute
How likely the results are to be consistent with there being a real difference; the level of confidence that the finding was ‘real’ and not due to chance
Across studies, replication is important to confirm findings
What are the steps in hypothesis testing? (6)
State null and alternative hypotheses
Set alpha level
Gather data
Perform statistical test
Compare calculated to critical value
Make decision
Explain what occurs in the first step of Hypothesis testing:
- H0 = Null Hypothesis
What you’re trying to reject/disprove
Expresses no difference or no relationship between the independent and dependent variables
H0: μ1 = μ2
Also called the statistical hypothesis
- H1 = Alternative Hypothesis (the alternative to the null)
States that there is a difference or relationship
Can be directional or non-directional
H1: μ1 ≠ μ2 (non-directional)
H1: μ1 > μ2 or H1: μ1 < μ2 (directional)
Also called the substantive hypothesis
What the researcher is predicting
Tested against the null hypothesis
Explain what occurs in the Second step of Hypothesis testing:
Also called significance level, probability level or confidence level
Conventionally set at p = .05
Preliminary/exploratory research may set p = .10
Should make adjustments to keep study-wide error at p = .05
How do we make correction/adjustments of alpha level for multiple tests? (4)
- Adjustment to alpha level to compensate for running multiple tests
- Controlling for Type 1 error (saying there is a difference when there isn’t)
- Keep experiment (family)-wide error rate at .05 by adjusting alpha level for each analysis
What is the simplest and most conservative way to make correction/adjustments of alpha level for multiple tests? (4)
- Bonferroni correction is the simplest and very conservative
.05/number of tests run
E.g., if you run 3 comparisons, p = .05/3 ≈ .0167
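A minimal arithmetic sketch of the Bonferroni adjustment from the card:

```python
family_alpha = 0.05
n_tests = 3

per_test_alpha = family_alpha / n_tests   # Bonferroni-corrected alpha per comparison
print(round(per_test_alpha, 4))           # 0.0167 -- each test must reach p <= .0167
```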
What are the 3 determinants of probability of reaching significance?
- Sample size
- Between group differences
- Within-group differences (variance)
Explain what occurs in the Third step of Hypothesis testing:
Gather the data; sampling, measurement, etc. were discussed earlier
Explain what occurs in the Fourth step of Hypothesis testing:
- Use sample data to get calculated value
- Critical you use an appropriate statistical test
- Different tests give you different statistics (e.g., t, F, U, r, etc.)
Explain what occurs in the Fifth step of Hypothesis testing:
- Computer software will give you exact p value
- There are set critical values separating significant from non-significant results for each test statistic (e.g., t, F) at a given alpha level and degrees of freedom (based on n and the number of groups)
- Look for test statistic to be greater than or equal to critical value
Larger t, F etc. gives smaller p
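A minimal worked example (simulated data) of steps 4–5 using an independent-samples t-test; the software returns both the test statistic and the exact p value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
group1 = rng.normal(100, 15, size=30)   # hypothetical scores, group 1
group2 = rng.normal(110, 15, size=30)   # hypothetical scores, group 2

t_stat, p_value = stats.ttest_ind(group1, group2)   # independent-samples t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# If p <= .05, reject the null hypothesis and inspect the group means for direction.
```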
Explain what occurs in the Last step of Hypothesis testing:
If p ≤ ‘alpha level’ (usually .05), you reject the Null Hypothesis
So accept there is a difference
Look at group means to see direction
2-tailed vs. 1-tailed tests
What are the two types of errors?
Type 1
Type 2
What are type one errors?
False positives: rejecting H0 (accepting the alternative hypothesis) when H0 is actually true
What are type two errors?
False negatives: failing to reject H0 when H1 is true
What is statistical power?
Statistical test’s ability in a particular study to detect a difference
What occurs if power is too low?
Greater chance of a Type II error
What is Power affected by? (4)
- Sample size (bigger sample, more power)
- Between-group differences: size of the difference between groups, i.e., effect size (larger difference, more power)
- Within group differences, amount of variance (smaller variance, more power)
- Alpha level (more liberal alpha, more power)
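A hedged sketch of an a priori power calculation using statsmodels (assuming its TTestIndPower API is available); the effect size, alpha, and power targets below are illustrative, not from the course.

```python
from statsmodels.stats.power import TTestIndPower

# How many participants per group are needed to detect a medium effect (d = 0.5)
# with alpha = .05 and 80% power in a two-group design?
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05, power=0.80)
print(round(n_per_group))   # roughly 64 per group under these assumptions
```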
What is statistical significance?
a measure of reliability or stability of the difference
What is Practical significance?
The size of the difference
What is Effect size?
-Measure of size of difference
- A priori – researcher determines what would be an important difference
used in power estimates for planning study
- Calculated – computed from study data
- Unaffected by N
What is Cohen’s d?
- Standard deviations separating group means, overlap of groups
- Small =.2, medium = .5, large = .8
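A minimal sketch computing Cohen’s d from two groups with a pooled standard deviation (data simulated):

```python
import numpy as np

rng = np.random.default_rng(4)
group1 = rng.normal(100, 15, size=40)   # hypothetical scores, group 1
group2 = rng.normal(108, 15, size=40)   # hypothetical scores, group 2

# Pooled standard deviation, then d = mean difference in pooled-SD units
n1, n2 = len(group1), len(group2)
pooled_sd = np.sqrt(((n1 - 1) * group1.var(ddof=1) + (n2 - 1) * group2.var(ddof=1)) / (n1 + n2 - 2))
d = (group2.mean() - group1.mean()) / pooled_sd
print(round(d, 2))   # ~0.5 would count as a 'medium' effect by the card's benchmarks
```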
What are Eta squared (η²) & omega squared (ω²)?
- Measures of percentage of the variance accounted for by the independent variable
- Small < .06, medium .06–.15, large > .15
What are two other types of significance?
- Clinical Significance
What the difference between the groups means
Importance/value of the difference
Some use it interchangeably with practical significance
Done at the group level
- Personal Significance
Distinction suggested by Bothe & Richardson, 2011
An individual client’s sense of the value of the change for him/her
When is the term ‘statistical trend’ used?
- Used when p value approaches significance
- People vary on acceptance of this type of reporting
- p = .05 is arbitrary, a convention
- Is a 6% probability of getting a difference at least as big really unimportant/uninteresting?
- Typically call a ‘trend’ or ‘approaching significance’ when p between .06 – .10