Final Flashcards
What are the things that exist in the center of a normal curve?
Mean, median and mode
What does an inflection point on a normal curve mark?
A standard deviation from the mean
The distributions of most continuous random variables will follow the shape of the ____
The distributions of most continuous random variables will follow the shape of the normal curve
What does the empirical rule state?
- 68% of all values fall within 1 standard deviation of the mean
- 95% of all values fall within 2 standard deviations of the mean
- 99.7% of all values fall within 3 standard deviations of the mean
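The three percentages in the empirical rule can be checked against the standard normal curve. This is a minimal sketch using Python's standard library; the helper name `share_within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k: float) -> float:
    """Proportion of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

The printed proportions come out close to 68%, 95%, and 99.7%.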
What are the 3 major types of central tendency?
Mean, median, and mode
____ refers to the measure used to determine the center of a distribution of data.
Central tendency refers to the measure used to determine the center of a distribution of data.
What is central tendency used for?
It is used to find a single score that is most representative of an entire data set
What is a data set with 2 modes called?
Bi-modal
A data set with more than one mode can be described as ___
Multi-modal
____ is most often used to represent central tendency, but outliers can sometimes interfere with its use
The mean is most often used to represent central tendency, but outliers can sometimes interfere with its use
What is an outlier?
A value that is very different from the other data in the data set
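A small sketch of how a single outlier pulls the mean while the median barely moves. The scores are made-up illustration data.

```python
from statistics import mean, median

scores = [10, 11, 12, 13, 14]          # roughly symmetric data
scores_with_outlier = scores + [100]   # one value very different from the rest

# Without the outlier, mean and median agree (both 12).
print(mean(scores), median(scores))

# With the outlier, the mean jumps to about 26.7 while the median is 12.5.
print(mean(scores_with_outlier), median(scores_with_outlier))
```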
What is a variable?
A property that can take on many values
What are the two kinds of variables?
Quantitative variables and qualitative/categorical variables
What is a quantitative variable, and what can you do with it?
A variable measured numerically. With quantitative variables, you can add, subtract, multiply, and divide and get a meaningful result
_____ allow for classification based on some characteristic
Qualitative/categorical variables allow for classification based on some characteristic
What is a discrete variable?
A quantitative variable with a finite number of values. Ex: the number of even numbers on a die
What is a continuous variable?
A quantitative variable with an infinite number of possible values. Ex: temperature
What is an independent variable?
Any variable that is being manipulated
What is a dependent variable?
Any variable that is being measured
What are the four data types of measured variables?
- Nominal
- Ordinal
- Interval
- Ratio
_____ data (also known as qualitative/categorical data) is data that is split into categories (e.g. dichotomous)
Nominal data (also known as qualitative/categorical data) is data that is split into categories (e.g. dichotomous)
____ data is data where order matters, but distance between values does not
Ordinal data is data where order matters, but distance between values does not
____ data is data where order matters, and distances between values are equal and meaningful, but there is no natural zero present
Interval data is data where order matters, and distances between values are equal and meaningful, but there is no natural zero present
____ data is data where order matters, distances between values are equal and meaningful, and a natural zero is present
Ratio data is data where order matters, distances between values are equal and meaningful, and a natural zero is present
___ is best for numeric symmetrically distributed data
Mean is best for numeric symmetrically distributed data
___ is best for numeric non-symmetrically distributed data
Median is best for numeric non-symmetrically distributed data
What level of measurement is dichotomous?
Nominal
Gender is a ___ level of measurement
Gender is a nominal level of measurement
Time is a ___ level of measurement
Time is a ratio level of measurement
Age is a ___ level of measurement
Age is a ratio level of measurement
What is the simple confidence interval?
A range of values that we are confident contains the population parameter
What is point estimate?
A single value that represents the best estimate of the population value
In a confidence interval, the width concerns the ___ of the estimate
In a confidence interval, the width concerns the precision of the estimate
The point estimate is always in the ___ of the confidence interval
The point estimate is always in the middle of the confidence interval
What is the formal definition of a confidence interval?
If we repeated sampling an infinite number of times, 95% of the intervals would contain the true population mean
Not every value in a CI is equally ___
Not every value in a CI is equally probable
A more narrow confidence interval means that it is ____ precise
A more narrow confidence interval means that it is more precise
What are the factors that can narrow a confidence interval?
- Larger sample size
- Less variance
- Lower selected level of confidence (90% vs. 95%)
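The three factors above can be seen in the usual formula for the half-width of a confidence interval around a mean. This is a simplified sketch that assumes a z-based interval (z × SD / √n) rather than a t-based one; the function name and numbers are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def ci_half_width(sd: float, n: int, confidence: float = 0.95) -> float:
    """Half-width of a z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return z * sd / sqrt(n)

baseline = ci_half_width(sd=10, n=25)                           # about 3.92
larger_sample = ci_half_width(sd=10, n=100)                     # larger sample size
less_variance = ci_half_width(sd=5, n=25)                       # less variance
lower_confidence = ci_half_width(sd=10, n=25, confidence=0.90)  # 90% vs. 95%

for label, width in [("baseline", baseline), ("larger sample", larger_sample),
                     ("less variance", less_variance), ("lower confidence", lower_confidence)]:
    print(f"{label}: +/- {width:.2f}")
```

Each of the three changes gives a narrower (more precise) interval than the baseline.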
The null hypothesis is ___, and it states that _____
The null hypothesis is sampling error, and it states that the population means (not the sample means) are equal, so the difference seen is not real
The alternative hypothesis states that the difference seen, represents __.
The alternative hypothesis states that the difference seen, represents a real difference.
What is a type 1 error in hypothesis testing? What is its symbol? This is considered a liar
When the null hypothesis is true, and we choose to reject it.
Symbol: “Alpha”
What is a type 2 error in hypothesis testing? What is its symbol? This is considered to be blind
When the null hypothesis is false, and we do not reject it. (accept it)
Symbol: Beta
___ is the maximum probability of type 1 error that a researcher is willing to accept
Alpha is the maximum probability of type 1 error that a researcher is willing to accept
When does the researcher set the alpha?
Set before running statistics
What is alpha usually set to?
0.05 (5%)
What is the simple definition of a p-value?
The probability of type 1 error if the null hypothesis is true
True or false.
You can have a probability of type 1 error when the null hypothesis is false
False
You can NOT have a probability of type 1 error when the null hypothesis is false
When is the p-value calculated?
After research
What is the formal definition of a p-value?
Probability of observing a value at least as extreme as the value actually observed, if the null hypothesis is true
If the p-value is less than or equal to alpha, we ___ the null hypothesis
If the p-value is less than or equal to alpha, we REJECT the null hypothesis
If the p-value is greater than alpha, we ___ the null hypothesis
If the p-value is greater than alpha, we FAIL TO REJECT (accept) the null hypothesis
If we “fail to reject” (accept) H0, we attribute any observed difference to ____ only
If we “fail to reject” (accept) H0, we attribute any observed difference to sampling error only
We don’t interpret non-significant differences as “__” (maybe not even as “trends”)
We don’t interpret non-significant differences as “real” (maybe not even as “trends”)
We understand that a non-significant difference is attributable only to __.
We understand that a non-significant difference is attributable only to chance.
How do you use confidence intervals for hypothesis testing?
Look at the 95% CI of the mean difference, and evaluate whether or not it includes zero
If the confidence interval includes 0, it is ____ in hypothesis testing
If the confidence interval includes 0, it is nonsignificant in hypothesis testing
If the confidence interval excludes 0, it is ____ in hypothesis testing
If the confidence interval excludes 0, it is significant in hypothesis testing
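The zero-inclusion rule is simple enough to write as a one-line check. This is an illustrative sketch; the function name `ci_decision` and the example intervals are made up.

```python
def ci_decision(lower: float, upper: float) -> str:
    """Classify a 95% CI of the mean difference by whether it includes zero."""
    return "nonsignificant" if lower <= 0 <= upper else "significant"

print(ci_decision(-1.2, 4.5))  # interval includes 0
print(ci_decision(0.8, 4.5))   # interval excludes 0
```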
What is the benefit of a CI over a p-value when hypothesis testing?
CIs give an estimate of effect size
P-values and CIs tell us about ___ not ____
P-values and CIs tell us about statistical significance not clinical significance
What is statistical power?
The probability of finding a statistically significant difference if such a difference exists in the real world
What are the main things that affect the statistical power of a study?
- Alpha
- Effect size
- Variance
- Sample size
Increasing alpha will ___ power
Increasing alpha will increase power
An effect size is known as the ____
An effect size is known as the mean difference
What is standardized effect size?
The mean difference divided by the standard deviation (the square root of the variance)
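One common standardized effect size is Cohen's d, which divides the mean difference by the pooled standard deviation of the two groups. The sketch below assumes that definition; the data are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b) -> float:
    """Mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

treatment = [12, 14, 16, 18, 20]
control = [10, 12, 14, 16, 18]
print(f"d = {cohens_d(treatment, control):.2f}")  # mean difference 2, pooled SD about 3.16
```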
__ is the spread of scores
Variance is the spread of scores
Increasing the effect size will ___ the power
Increasing the effect size will increase the power
Increasing the sample size will ___ the power
Increasing the sample size will increase the power
___ is the best way to increase statistical power
Sample size is the best way to increase statistical power
Increasing variance will ___ power
Increasing variance will decrease power
What are the things that will decrease power?
- Decreased alpha
- Decreased effect size
- Increased variance
- Decreased sample size
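The four factors above can be explored with a rough power calculation. This sketch assumes a two-tailed comparison of two equal-sized groups and uses a normal approximation (not an exact t-based power calculation); the function name and inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(mean_diff: float, sd: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a two-tailed, two-group comparison.

    Normal approximation; ignores the negligible chance of rejecting
    in the wrong direction.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # critical value for the chosen alpha
    se_diff = sd * sqrt(2 / n_per_group)   # standard error of the mean difference
    return 1 - z.cdf(z_crit - mean_diff / se_diff)

baseline = approx_power(mean_diff=5, sd=10, n_per_group=30)
decreased_alpha = approx_power(5, 10, 30, alpha=0.01)
decreased_effect = approx_power(3, 10, 30)
increased_variance = approx_power(5, 15, 30)
decreased_sample = approx_power(5, 10, 15)

for label, power in [("baseline", baseline), ("decreased alpha", decreased_alpha),
                     ("decreased effect size", decreased_effect),
                     ("increased variance", increased_variance),
                     ("decreased sample size", decreased_sample)]:
    print(f"{label}: {power:.2f}")
```

Every variant prints a lower power than the baseline, matching the list above.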
What are the two types of power analysis?
- Power a priori
- Power post-hoc
What is power a priori?
A power analysis done before we collect data, to determine if the design is powerful enough
What is power post-hoc?
A power analysis done after the research is complete, by the consumers of the research, to find out whether there was enough power when the study failed to reject the null hypothesis
If a difference is found post-hoc/the null hypothesis was rejected, then the power issue is ___
If a difference is found post-hoc/the null hypothesis was rejected, then the power issue is moot/not a problem
If a difference is not found post-hoc/the null hypothesis was accepted (fail to reject), then the power issue is ___ and you have to do a ___
If a difference is not found post-hoc/the null hypothesis was accepted (fail to reject), then the power issue is huge and you have to do a post-hoc power analysis
A priori is used to figure out how many subjects to use ___
A priori is used to figure out how many subjects to use before a study is started
What is the minimal accepted power during power a priori?
0.8
What are the 2 ways to perform a post-hoc power analysis?
- Compute with the traditional Cohen approach
- Determine with confidence interval analysis of effect size
What is involved in computing the post-hoc power analysis with the traditional approach?
- Continuous scale result: 0.0 to 1.0 (> 0.8 is default)
- Based on:
  - Sample size
  - Alpha
  - Variance (observed)
  - Effect size (use MCID, not observed)
____ is the better way to determine post-hoc power, while with ____, the answer will probably be the same as a priori
Confidence interval analysis of effect size is the better way to determine post-hoc power, while with the traditional Cohen approach, the answer will probably be the same as a priori
If the MCID is excluded from the CI, then it is definitively negative and ___ powered
If the MCID is excluded from the CI, then it is definitively negative and adequately powered
If the MCID is included in the CI, then it is not definitive and ___ powered
If the MCID is included in the CI, then it is not definitive and inadequately powered/underpowered
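The MCID rule for a non-significant result can be sketched as a simple check of whether the MCID falls inside the confidence interval of the effect size. The function name and the example numbers are hypothetical.

```python
def post_hoc_power_verdict(ci_lower: float, ci_upper: float, mcid: float) -> str:
    """Judge a non-significant result by whether the CI of the effect includes the MCID."""
    if ci_lower <= mcid <= ci_upper:
        return "not definitive: underpowered"
    return "definitively negative: adequately powered"

# Assumed example: MCID of 5 points on some outcome scale.
print(post_hoc_power_verdict(-2.0, 3.0, mcid=5.0))  # MCID excluded from the CI
print(post_hoc_power_verdict(-2.0, 7.0, mcid=5.0))  # MCID included in the CI
```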
A two tailed test is testing to see ____
A two tailed test is testing to see if your calculated value is either above or below where it is expected to be
A one tailed test is testing to see if ____ or ___
A one tailed test is testing to see if your calculated value is above where it’s expected to be or below where it is expected to be
___ is the assumption you’re beginning with and is opposite of what you’re testing
Null hypothesis (H0) is the assumption you’re beginning with and is opposite of what you’re testing
___ is the claim you’re testing
Alternative hypothesis (H1) is the claim you’re testing
What is a t-statistical test?
Statistical method to decide whether an observed difference in sample scores represents a “real” difference in the population…. vs. just sampling error
How many groups are in a t-test?
2 groups
2 groups is another way of saying…?
2 levels of 1 IV
What does a t-test do?
Finds the difference between group means divided by the variability within the groups (standard error of the mean difference)
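As a worked sketch of that ratio, the code below computes an independent-groups t statistic by hand, assuming equal variances (pooled variance). The data and function name are illustrative only.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b) -> float:
    """Difference between group means divided by the standard error of that difference."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled within-group variance (assumes homogeneity of variance).
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    se_diff = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / se_diff

group_a = [12, 14, 16, 18, 20]
group_b = [10, 12, 14, 16, 18]
print(independent_t(group_a, group_b))  # mean difference 2, standard error 2
```

Here the mean difference (2) equals the standard error (2), so t is 1. More within-group variability would enlarge the standard error and shrink t.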
The error in a standard error refers to…?
All sources of variability within a set of data that cannot be explained by the independent variable.
Groups with no within-group variability are known as being ___
Groups with no within-group variability are known as being definitely different
Groups with a little bit of within-group variability are known as being ___
Groups with a little bit of within-group variability are known as being probably different
Groups with larger amounts of within-group variability are known as being ___
Groups with larger amounts of within-group variability are known as being maybe not different
When the variability in each group is not necessarily the same, it is called…?
When the variability in each group is not necessarily the same, it is called differing variance (heterogeneity of variance)
What are parametric statistics?
A branch of statistics which assumes that sample data comes from a population that follows a probability distribution based on a fixed set of parameters.
What are the basic assumptions for all parametric tests?
- Samples are randomly drawn from populations
- Population is normally distributed
- Homogeneity of variance (roughly)
- Data from ratio or interval (i.e. continuous) scales
Truly random sampling from populations rarely happens. What does one need to be careful with as a result?
Generalization
What are the ways to test if the population is normally distributed?
- Statistically
- Graphically
- Common sense
When is the homogeneity of variance especially important?
With unequal group sizes
How is the homogeneity of variance tested?
Statistically
What statistical test is used to check homogeneity of variance for the t-test?
Levene’s test
What are the statistical hypotheses for a two-level design?
- Null hypothesis: the two population means are equal
- Alternative hypothesis in a nondirectional format: the means are not equal
- Alternative hypothesis in a directional format: one mean is greater than the other
A two-tailed test uses a ___ hypothesis
A two-tailed test uses a nondirectional hypothesis
A one-tailed test uses a ___ hypothesis
A one-tailed test uses a directional hypothesis
A two tailed test has ___ statistical power compared to the one tailed test
A two tailed test has less statistical power compared to the one tailed test
What are the two types of t-test?
- Independent/unpaired t-test
- Paired t-test
What happens in an unpaired (independent) t-test?
Testing to see if there is a difference between 2 groups
What kind of design is found in an unpaired t-test?
- Pretest-posttest design (compare change scores)
- Posttest only design
What happens in a paired (dependent) t-test?
Testing to see if there is a difference between conditions in the same person
What kind of design is found in a paired t-test?
- Difference scores or pretest-posttest
- Repeated measures design
A repeated measures factor is an example of a ___
A repeated measures factor is an example of a within-subjects factor
A non-repeated measures factor is an example of a ____ factor
A non-repeated measures factor is an example of a between-subjects factor
What is an ANOVA?
Statistical method to decide whether an observed difference in sample scores represents a “real” difference in the population vs. just sampling error, but with 3 or more groups/levels of 1 IV and/or 2 or more IVs
What is the question asked in an ANOVA?
Are observed differences in whole set of means greater than would be expected by chance alone?
What statistic is looked at for ANOVA?
An F-statistic
What is an F-statistic?
The between group variability divided by the within group variability
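A minimal by-hand sketch of that ratio for a one-way, independent-samples design, using mean squares (sum of squares divided by degrees of freedom). The three groups are made-up data.

```python
from statistics import mean

def f_statistic(groups) -> float:
    """Between-group mean square divided by within-group mean square."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n_total = len(groups), len(all_scores)

    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the scores around their own group mean (error).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(f"F = {f_statistic(groups):.2f}")
```

For these data the between-group mean square is 13 and the within-group mean square is 1, so F is about 13.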
What is the null hypothesis in the ANOVA?
All of the population means are equal
What is the alternative hypothesis in the ANOVA?
At least one pair of population means is different, but we don’t know which one
What are the basic assumptions for ANOVA?
- Samples are randomly drawn from populations
- Population is normally distributed
- Homogeneity of variance (roughly)
- Data from ratio or interval (i.e. continuous) scales
What does one need to be careful with when randomly drawing samples from the population?
Generalization
How can the normal distribution of a population be tested?
- Statistically
- Graphically
- Common sense
When is the homogeneity of variance especially important?
When there is an unequal group size
How is the homogeneity of variance usually tested?
Statistically
The types of ANOVA concern what…?
- Whether they are one way (1 IV) or multiple ways
- Whether the IVs are between subjects (independent groups), within subjects (repeated measures), or a mixed model
What is a mixed model?
Where there is 1 IV that is between subject and 1 IV that is within subjects
What are the types of ANOVA?
- One way ANOVA: independent samples
- Two way ANOVA: independent samples
- One way ANOVA: Repeated measures samples
- Two way ANOVA: Repeated measures samples
What is the characteristic of a one way ANOVA: independent samples?
1 IV with 3 or more levels
What does the result of an ANOVA show?
Whether or not there is a difference overall, but not where the difference is
What is the characteristic of a two way ANOVA: independent samples?
2 IVs
What are the things you’re interested in when performing a two way ANOVA: independent samples?
- Main effect of IV A
- Main effect of IV B
- Interaction of IV A & B (interaction effect)
What is the interaction effect?
Saying that the scores across the levels of one IV depend on the level of the other IV
It is really helpful to look at ____ when talking about interaction effects
It is really helpful to look at graphs when talking about interaction effects
What does it mean when the lines of an interaction effect graph are parallel?
There is no interaction
What does it mean when the lines of an interaction effect graph are not parallel?
There is an interaction
What is a disordinal interaction?
When the lines cross and significant main effects cannot be interpreted
What is an ordinal interaction?
When the lines don’t cross and significant main effects can be interpreted
The one way ANOVA: Repeated measures samples is more powerful than the independent ANOVA because ___
The one way ANOVA: Repeated measures samples is more powerful than the independent ANOVA because it has less error variance
What is the homogeneity of variance in the one way ANOVA: Repeated measures samples?
Sphericity
What is sphericity?
The homogeneity of variance of differences
How is sphericity tested?
Test with Mauchly’s Test of Sphericity
What does a non-significant finding on the test of sphericity mean?
No significant difference in the variances of the differences (the assumption is met)
If sphericity assumption is failed, what happens?
Use correction/adjusted p-value
What is a multiple comparison test used for?
To determine where the difference is
The multiple comparison test is also called ____
The multiple comparison test is also called pairwise comparisons
What are the different strategies of performing a multiple comparison test?
- Post-hoc
- Planned comparison
When is a post-hoc performed?
Performed after ANOVA
___ multiple comparison strategy is the most common
Post-hoc multiple comparison strategy is the most common
Post hoc tests ___ and therefore are exploratory
Post hoc tests test every difference and therefore are exploratory
When is a planned comparison performed?
Performed instead of ANOVA (a priori)
What does a planned comparison focus on?
Focused only on specific comparisons
How do you calculate the family wise type 1 error rate that is used for the one way ANOVA?
Add up all the alpha values
When the family wise type 1 error rate is too high, what do you do?
A Bonferroni Correction can be done
How is a Bonferroni Correction done?
Divide alpha by the number of statistical tests to be performed and use that for each post hoc test
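A short sketch of both ideas. Adding up the alpha values gives a simple upper bound on the familywise rate; for independent tests the exact value is 1 − (1 − α)^k, which is slightly smaller. The function names are made up for illustration.

```python
def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one Type 1 error across n independent tests."""
    return 1 - (1 - alpha) ** n_tests

def bonferroni_alpha(alpha: float, n_tests: int) -> float:
    """Per-test alpha after a Bonferroni correction."""
    return alpha / n_tests

alpha, n_tests = 0.05, 3   # e.g. 3 pairwise comparisons among 3 groups
print(f"sum of alphas:        {alpha * n_tests:.3f}")
print(f"exact (independent):  {familywise_error(alpha, n_tests):.3f}")
print(f"Bonferroni per-test:  {bonferroni_alpha(alpha, n_tests):.4f}")
```

With three tests at 0.05, the summed rate is 0.15, the exact independent-test rate is about 0.143, and the corrected per-test alpha is about 0.0167.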
What is the downside to the Bonferroni Correction?
It has less power and a higher chance of a type 2 error, so you must balance the risk of Type 1 and Type 2 error
What are the types of post hoc tests, in order from least conservative (most likely to find a significant difference) to most conservative?
- Fisher’s least significant difference
- Duncan multiple range test
- Newman-Keuls method
- Tukey’s honestly significant difference
- Bonferroni t-test
- Scheffe’s comparison
What are the post-hoc tests that are performed the most?
- Fisher’s least significant difference
- Tukey’s honestly significant difference
- Bonferroni t-test
What is the Fisher’s least significant difference test?
Essentially an unadjusted t-test (LSD)
Why is the Tukey’s honestly significant difference important?
“Middle of the road” in terms of risk and most commonly used
What does the Bonferroni t-test do?
Simply divides α by # of comparisons
When are the Fisher’s least significant difference test, Tukey’s honestly significant difference, and Bonferroni t-test suitable for use?
When an independent groups type test is being performed
When an independent groups type test is being performed
What are the multiple comparison tests to be used for repeated measures?
- LSD
- Sidak
- Bonferroni correction
LSD is an _____
LSD is an unadjusted paired t-test
Sidak is ___
Sidak is adjusted, but a good balance of type 1 & type 2 error protection
The LSD test has a ___ risk of type 1 error, meaning it is less conservative
The LSD test has a high risk of type 1 error, meaning it is less conservative
The Bonferroni correction test has a high risk of ___ error and is more conservative
The Bonferroni correction test has a high risk of type 2 error and is more conservative
What is an ANCOVA?
Analysis of covariance (ANCOVA) is a statistical technique that is used when you cannot control a variable through research design and sampling
What does the ANCOVA do?
It statistically adjusts the dependent variable based on the covariate
ANCOVA produces ____
ANCOVA produces adjusted means
ANCOVA is a combination of ___ and _____
ANCOVA is a combination of ANOVA and linear regression
What are the assumptions of ANCOVA?
- Usual parametric assumptions
- Linear relationship between CoV and DV (with r>.6)
- Homogeneity of slopes
You can also use ANCOVA to adjust for ____ scores
You can also use ANCOVA to adjust for baseline scores
When do you do a non-parametric test?
When the basic assumptions for a parametric test are not met
Non-parametric statistics are based on…?
- Comparisons of ranks of scores
- Comparisons of counts (yes/no) or “signs” of scores
Non-parametric statistics are ___ compared to parametric statistics
Non-parametric statistics are less powerful compared to parametric statistics
What kind of parametric test do you perform when you have 2 independent groups?
Unpaired t-test
What kind of parametric test do you perform when you have 2 related scores?
Paired t-test
What kind of parametric test do you perform when you have 3 or more independent groups?
One-way analysis of variance (ANOVA) (F)
What kind of parametric test do you perform when you have 3 or more related scores?
One-way repeated measures analysis of variance (ANOVA) (F)
What kind of non-parametric test do you perform when you have 2 independent groups?
Mann-Whitney U test
What kind of non-parametric test do you perform when you have 2 related scores?
- Sign test
- Wilcoxon signed ranks test (T)
What kind of non-parametric test do you perform when you have 3 or more independent groups?
Kruskal-Wallis analysis of variance by ranks (H or χ²)
What kind of non-parametric test do you perform when you have 3 or more related scores?
Friedman two way analysis of variance by ranks
True or False
You’re able to perform a non-parametric test on complex designs like a 2 x 3
FALSE
Unable to perform on more complex designs (e.g. 2x3)
What question is being asked in the comparison based on ranks in a non-parametric test?
Is the difference in ranks larger than would be expected by chance alone?
What question is being asked in the comparison based on signs in a non-parametric test?
Is the difference in sign frequencies larger than would be expected by chance alone?
What type of test do we use when the IV and DV are both on the nominal level?
Chi-square
What are you looking at in a chi-square?
Are observed frequencies different from expected frequencies?
What are the 2 types of chi square?
- Goodness of fit
- Tests of independence (association)
What do you do in the goodness of fit chi square test?
Compare observed frequencies of 1 variable to expected (e.g. uniform) frequencies
What is an example of the goodness of fit chi square test?
Eg: flip a coin 50 times. Get 15 heads & 35 tails. Is this difference due to chance or a “real” bias?
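The coin example can be worked through by hand. This sketch assumes a fair coin (expected 25 heads and 25 tails) and uses the fact that, with 1 degree of freedom, the chi-square p-value can be computed from the complementary error function.

```python
from math import erfc, sqrt

observed = {"heads": 15, "tails": 35}
n_flips = sum(observed.values())
expected = n_flips / len(observed)   # 25 of each for a fair coin

# Chi-square: sum of (observed - expected)^2 / expected over the categories.
chi_square = sum((count - expected) ** 2 / expected for count in observed.values())

# With 1 degree of freedom, the upper-tail p-value equals erfc(sqrt(chi2 / 2)).
p_value = erfc(sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.1f}, p = {p_value:.4f}")
```

The statistic is 8.0 with p of about 0.005, which is below an alpha of 0.05, so this result would be read as a “real” bias rather than chance.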
The ____ chi square test is much more common
Tests of independence (association)
What do you do in the tests of independence (association) chi square test?
Compare observed frequencies from 1 variable to observed frequencies of another variable
What is an example of the tests of independence (association) chi square test?
Eg: Is owning a mac laptop related to gender?
What is the McNemar test?
A form of the chi-square test for correlated samples. It is needed because a requirement of chi-square is that variable levels must be independent (e.g. can’t be “healed” and “unhealed”)
___ is the form of a chi square test that is used for a 2x2 with correlated samples
McNemar test is the form of a chi square test that is used for a 2x2 with correlated samples
What is a phi coefficient?
A correlation coefficient for 2 nominal variables/ degrees of association for 2x2
The phi coefficient is based off the ___
The phi coefficient is based off the chi-square test
What is the IV level of measurement for a t-test?
Nominal
What is the IV level of measurement for an ANOVA?
Nominal
What is the IV level of measurement for a non-parametric test?
Nominal
What is the DV level of measurement for a t-test?
Continuous
What is the DV level of measurement for an ANOVA?
Continuous
What is the DV level of measurement for a non-parametric test?
Ordinal
What is the question asked with a t-test?
Difference between means?
What is the question asked with an ANOVA?
Difference between means?
What is the question asked with a non-parametric test?
Ranks different?
What is the IV level of measurement for a correlation?
Continuous
What is the IV level of measurement for a regression?
Continuous
What is the DV level of measurement for a correlation?
Continuous
What is the DV level of measurement for a regression?
Continuous
What is the question asked with a correlation?
Strength of association?
What is the question asked with a regression?
Strength of prediction?
What does a correlation have to do with?
A pair of scores and how much they co-vary
What does it mean for something to co-vary?
Directly or inversely proportional. When one is high, so is the other and vice versa
What are the things that a correlation looks at?
- Do they vary together (covary)?
- How strong is their linear relationship?
- What is the nature of the relationship?
A correlation has to be ___
A correlation has to be linear
What is a correlation coefficient?
A number that quantifies the strength of a linear relationship that can range from -1 to 1
What does it mean when a correlation coefficient is closer to 1, whether positive or negative?
Closer to |1.00|, higher strength of relationship
What does the sign of the correlation coefficient indicate?
The direction
The tighter the grouping of the linear relationship, the ___ the correlation coefficient
The tighter the grouping of the linear relationship, the higher the correlation coefficient
What does a 0.00-0.25 correlation coefficient mean?
Little or no relationship
What does a 0.26-0.50 correlation coefficient mean?
Fair relationship
What does a 0.51-0.75 correlation coefficient mean?
Moderate to good
What does a 0.76-1.00 correlation coefficient mean?
Good to excellent
What is the coefficient of determination?
The square of the correlation coefficient
What is the coefficient of determination equal to?
The percent of variance in one variable that is explained (or accounted for) by the other variable
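A by-hand sketch of Pearson's r and the coefficient of determination (r²). The paired scores are made up, and the helper name `pearson_r` is illustrative.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys) -> float:
    """Pearson correlation: how much paired scores co-vary, scaled to -1..1."""
    mean_x, mean_y = mean(xs), mean(ys)
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
r_squared = r ** 2   # coefficient of determination
print(f"r = {r:.3f}, r^2 = {r_squared:.2f}")
```

Here r is about 0.775, so r² is 0.60: roughly 60% of the variance in one variable is accounted for by the other.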
What is the significance of the correlation coefficient?
To test the null hypothesis
What is the null hypothesis as it relates to the correlation coefficient?
The correlation between variable x and variable y is not significantly different from zero.
The significance of the correlation coefficient is very sensitive to ___
The significance of the correlation coefficient is very sensitive to sample size
What is the most common type of correlation coefficient?
Pearson Product-Moment Correlation Coefficient (r)
When is the Pearson Product-Moment Correlation Coefficient applicable?
When both variables are continuous (interval or ratio scale)
What is the Spearman Rank (rho) Correlation Coefficient (rs)?
Non-parametric analog of Pearson r
When is the Spearman Rank (rho) Correlation Coefficient (rs) applicable?
When there is 1 continuous and 1 ordinal variable, or 2 ordinal variables
When do you use a Point Biserial Correlation (rpb)?
When one variable is dichotomous, and the other variable continuous (interval or ratio)
When does a Point Biserial Correlation (rpb) not work?
With a non-dichotomous nominal variable (e.g. Age & Race)
Computationally, a Point Biserial Correlation (rpb) is the same as a ___
Computationally, a Point Biserial Correlation (rpb) is the same as a Pearson’s r
The results of a Point Biserial Correlation (rpb) are the same as ___
The results of a Point Biserial Correlation (rpb) are the same as a t-test
When do you use a Rank Biserial Correlation (rrb)?
When one variable is dichotomous (nominal), and the other variable is ordinal
A Rank Biserial Correlation (rrb) is computationally about the same as ___
A Rank Biserial Correlation (rrb) is computationally about the same as Spearman Rank
When do you use a Phi coefficient (Φ)?
When both variables are dichotomous
A Phi coefficient (Φ) is computationally same as ___ (special case)
A Phi coefficient (Φ) is computationally same as Pearson’s r (special case)
A scatterplot is ___ with a Phi coefficient (Φ)
A scatterplot is worthless with a Phi coefficient (Φ)
Can a Phi coefficient (Φ) work with a non-dichotomous nominal variable?
NO
A Phi coefficient (Φ) is similar to a ____, but unlike it, a Phi coefficient (Φ) gives strength of relationship, while the ___ only gives statistical significance
A Phi coefficient (Φ) is similar to a chi square test, but unlike it, a Phi coefficient (Φ) gives strength of relationship, while the chi-square test only gives statistical significance
A correlation does not tell you ___
A correlation does NOT assess differences or agreement
How can an extreme outlier affect the interpretation of a correlation?
Can create inflated correlation with only a few extreme data points
Can correlation results be generalized beyond the range of scores in the sample?
No, you can’t generalize beyond the range of scores in the sample
Low correlation may be due to ___ range
Low correlation may be due to limited range
What is reliability?
Extent to which a measurement is consistent and free from error
What can a reliable measurement be expected to do?
A reliable measure can be expected to repeat the same score on two different occasions provided that the characteristic of interest does not change
Reliability is closely tied to the concept of ___
Reliability is closely tied to the concept of measurement error
What are the continuous data reliability coefficients?
- Pearson correlation (r)
- Intraclass correlation coefficient (ICC) (best)
What are the discrete/categorical data reliability coefficients?
- Percent agreement
- Kappa (best)
What are the problems with using a Pearson correlation (r) to quantify reliability?
- Assesses relationship, not agreement
- Only two raters or occasions could be compared
Why do we prefer to use ICCs and Kappa for quantifying reliability?
Both ICCs and kappa give single indicators of reliability that capture strength of relationship plus agreement in a single value
____ is stated in terms of variance
The reliability coefficient is stated in terms of variance
What is the range of a reliability coefficient and what does it mean?
Range 0-1
0 = no reliability, 1 = perfect reliability
The more error variability you have, the ____ your reliability coefficient will be
The more error variability you have, the lower your reliability coefficient will be
Reliability coefficient will be bigger, when ___ is larger
Reliability coefficient will be bigger, when true variance is larger
What is the equation for the reliability/ correlation coefficient?
True score variability divided by true score variability plus error variability
What does a high error variability do to correlation coefficient?
It will reduce it
What will not having enough true score variability do to correlation coefficient?
It will reduce it
What happens to the correlation coefficient with a large true variance?
It will be bigger
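As a rough illustration of the ratio described in the cards above, this Python sketch computes a reliability coefficient from a true score variance and an error variance. The function name and the numbers are made up for the example.

```python
def reliability_coefficient(true_variance, error_variance):
    """True score variance divided by (true score variance + error variance)."""
    total = true_variance + error_variance
    if total <= 0:
        raise ValueError("total variance must be positive")
    return true_variance / total

# Same true variance, different amounts of error (illustrative values only).
low_error = reliability_coefficient(true_variance=9.0, error_variance=1.0)
high_error = reliability_coefficient(true_variance=9.0, error_variance=9.0)

# Same error, less true score variability.
small_true = reliability_coefficient(true_variance=1.0, error_variance=1.0)
```

More error variance or less true score variance both pull the coefficient down toward 0.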
What are the things that an ICC measures?
Measures degree of relationship (association) and agreement simultaneously
ICCs give ____ estimate of reliability (can compare different things)
ICCs give standardized estimate of reliability
ICC is often reported in conjunction with ____
ICC is often reported in conjunction with standard error of the measurement (SEM)
ICC is designed for ____ data but can be used with ___ data
ICC is designed for interval/ratio data but can be used with ordinal data
When can ICC be used with ordinal data?
If the intervals are “assumed” to be equivalent
SEM gives ____ estimate of reliability (i.e. in units of measurement)
SEM gives “unstandardized” estimate of reliability (i.e. in units of measurement)
What do the 6 types of ICC depend on?
- Purpose of study
- Design of study
- Type of measurements taken
ICC type defined by ___
ICC type defined by two numbers in parentheses
What does each number in the parenthesis of an ICC type mean?
The first number is the model and the second number is the form. For example, in ICC(2, 6), 2 = model and 6 = form
How many models of ICC are there?
3
What is model 1 of an ICC?
- Each subject measured by a different set of raters; raters “randomly” chosen
- Rarely used in clinical research
What is model 2 of an ICC?
Each subject measured by same raters; raters “randomly” chosen & representative of rater population; results generalizable
What is ICC model 2 commonly used for?
Most common for inter-rater reliability or test-retest reliability
What is model 3 of an ICC?
Each subject measured by same rater(s); raters are only ones of interest; results not generalizable
What is ICC model 3 commonly used for?
Most common for intra-rater reliability
Rank the models of ICC in order from most conservative to least conservative
- Model 1 (most conservative, lowest number)
- Model 2 (neutral)
- Model 3 (least conservative, highest number)
When can a model 3 ICC be used for inter-rater reliability?
It can be used for inter-rater reliability if the study raters are the only ones of interest
What does the form/ 2nd number in parenthesis of an ICC represent?
Second number in parentheses represents number of observations used to obtain reliability estimate
When is form = 1?
If only one observation per subject per rater (or rating)
When is form a number more than 1?
If multiple observations averaged to get single number for analysis, form = number of observations averaged
What ICC is best for clinical measures?
ICC > 0.90
What ICC has good reliability?
ICC > 0.75
What ICC has poor to moderate reliability?
ICC < 0.75
The interpretation of an ICC depends on ____
The interpretation of an ICC depends on intended use
ICC estimate based on ____ will always be substantially higher than estimate based on ____
ICC estimate based on average measures will always be substantially higher than estimate based on single measure
What are the characteristics of reliability for categorical scales?
- Based on frequency table
- Agreements are on the diagonal
- Disagreements are all others
What is percent agreement?
How often the raters agree
How do you calculate percent agreement?
Divide number of agreements by total of all possible agreements
What is the problem with a percent agreement?
- Does not account for agreement due to chance
- Tends to overestimate reliability
What is the kappa coefficient?
Proportion of agreement between raters after chance agreement has been removed
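The two agreement measures above can be sketched from a frequency table. This Python example assumes the usual unweighted Cohen's kappa, (observed agreement minus chance agreement) divided by (1 minus chance agreement), which the cards describe but do not spell out. The table values and function name are hypothetical.

```python
def agreement_stats(table):
    """table[i][j] = number of subjects rater A placed in category i
    and rater B placed in category j (a square frequency table)."""
    k = len(table)
    n = sum(sum(row) for row in table)

    # Percent agreement: the diagonal cells are the agreements.
    observed = sum(table[i][i] for i in range(k)) / n

    # Agreement expected by chance, from the row and column totals.
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / (n * n)

    # Kappa: agreement left over after chance agreement is removed.
    kappa = (observed - expected) / (1 - expected)
    return observed, expected, kappa

# Hypothetical ratings of 50 subjects by two raters (2 categories).
table = [[20, 5],
         [10, 15]]
percent_agreement, chance_agreement, kappa = agreement_stats(table)
```

Here the raters agree on 70% of subjects, but half of all ratings would agree by chance alone, so kappa comes out much lower than the percent agreement.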
On what kind of data is a kappa coefficient used?
Can be used on both nominal and ordinal data
What does a weighted kappa do?
Can choose to make “penalty” worse for larger disagreements
What can the weight of a weighted kappa be?
Weights can be arbitrary, and symmetric or asymmetric
A weighted kappa is best for what kind of data?
Best for ordinal data
The kappa interpretation depends on ____
The kappa interpretation depends on the weights used
What does a kappa value of <0.4 mean?
Poor to Fair agreement beyond chance
What does a kappa value of 0.4–0.6 mean?
Moderate agreement beyond chance
What does a kappa value of 0.6–0.8 mean?
Substantial agreement beyond chance
What does a kappa value of 0.8–1.0 mean?
Excellent agreement beyond chance
Internal consistency is often used to do what?
Often used to construct and evaluate scales/questionnaires
What does internal consistency estimate?
Estimate how well the items that reflect the same construct yield similar results. So, do different questions measure same concept or indicator?
What does Cronbach’s alpha (α) do?
Represents correlation among items and correlation of each individual item with the total score
What is it recommended that Cronbach’s alpha be between?
Recommended that Cronbach’s alpha be between 0.70 and 0.90
Cronbach’s alpha can have ___ or ____ on test/questionnaire
Cronbach’s alpha can have dichotomous or multiple-choice responses on test/questionnaire
What can Cronbach’s alpha (α) help eliminate?
Can help eliminate items from test/questionnaire that are not homogeneous to the set or are not contributing unique information
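For a concrete feel of Cronbach's alpha, here is a minimal Python sketch. It assumes the standard formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), which is not given in the cards. The item scores are invented.

```python
import statistics

def cronbach_alpha(item_scores):
    """item_scores: one list per item, each holding that item's score
    for every respondent (same respondent order in every list)."""
    k = len(item_scores)
    # Sample variance of each item across respondents.
    item_variances = [statistics.variance(item) for item in item_scores]
    # Total score per respondent, then the variance of those totals.
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Hypothetical 3-item questionnaire answered by 5 respondents.
items = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 2.0, 3.0, 5.0, 5.0],
    [1.0, 3.0, 3.0, 4.0, 4.0],
]
alpha = cronbach_alpha(items)
```

Because these made-up items rise and fall together, alpha lands above the 0.90 end of the recommended range, which can suggest that some items are redundant.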
What is response stability?
A way to quantify stability of repeated measures over time
Response stability is basically the same as ___
Response stability is basically the same as test-retest reliability
What are the different ways to test response stability?
- SEM: standard error of the measurement
- MDC: minimal detectable difference/change
- CV: coefficient of variation
Standard error of measurement is an ___ measure of reliability, while ICC and kappa are ____ measures of reliability
Standard error of measurement is an absolute measure of reliability, while ICC and kappa are relative measures of reliability
SEM is in units of _____
SEM is in the same units of measurement as the variable
What is SEM theoretically?
Standard deviation of the distribution of theoretical multiple measurements
An SEM can be used to create a ____
An SEM can be used to create a 95% CI around a measurement
What is the MDC?
Amount of change in a variable that must be achieved to reflect a true change/difference
___ is a mathematical multiple of SEM
MDC is a mathematical multiple of SEM
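The cards do not give the formulas, so the Python sketch below relies on commonly used ones as assumptions: SEM = SD × √(1 − ICC), a 95% CI of measurement ± 1.96 × SEM, and MDC95 = 1.96 × √2 × SEM. The SD, ICC, and score values are invented.

```python
import math

def sem(sd, icc):
    """Standard error of the measurement (assumed: SD * sqrt(1 - ICC))."""
    return sd * math.sqrt(1.0 - icc)

def ci95(score, sem_value):
    """95% confidence interval around a single measurement."""
    half_width = 1.96 * sem_value
    return score - half_width, score + half_width

def mdc95(sem_value):
    """Minimal detectable change at 95% confidence (assumed: 1.96 * sqrt(2) * SEM)."""
    return 1.96 * math.sqrt(2.0) * sem_value

# Hypothetical test with SD = 10 points and ICC = 0.91.
sem_value = sem(sd=10.0, icc=0.91)
lower, upper = ci95(score=50.0, sem_value=sem_value)
mdc_value = mdc95(sem_value)
```

All three results stay in the units of the original measurement, which is what makes SEM and MDC absolute rather than relative measures.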
What is the coefficient of variation (CV)?
A standardized way to measure variability. (SD divided by the mean times 100)
What is the coefficient of variation helpful in comparing and why?
Unit-less, so is helpful comparing variability between two distributions on different scales
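A small Python sketch of the CV calculation, using the sample standard deviation (an assumption, since the cards only say SD). The heights are made-up values expressed in two different units.

```python
import statistics

def coefficient_of_variation(values):
    """SD divided by the mean, times 100 (sample SD assumed)."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0

# The same hypothetical heights in centimetres and in metres.
heights_cm = [160.0, 170.0, 180.0]
heights_m = [1.6, 1.7, 1.8]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
```

The SD changes with the unit (10 cm vs 0.1 m), but the CV is the same in both cases, which is why it works for comparing distributions on different scales.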
What is an alternate form reliability?
Comparing different methods of testing same phenomenon with different instruments (goniometer vs inclinometer)
What analysis or agreement is seen with an alternate form reliability?
- Limits of agreement
- Bland-Altman analysis
What is a Bland-Altman plot?
A plot with the mean of the two measures on the x-axis and the difference between the two measures on the y-axis; the center line of the plot is the bias
What does a tighter range on the Bland-Altman plot mean?
There is more agreement between the two measures
When is there no bias on a Bland-Altman plot?
When the line of bias is at 0
When is there a consistent bias on a Bland-Altman plot?
When the points on the plot are on one side of the bias line
When is there an asymmetrical bias on a Bland-Altman plot?
When the points are split between the two sides of the bias line
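The quantities behind a Bland-Altman plot can be sketched in Python. This assumes the usual convention that the bias is the mean of the differences and the limits of agreement are bias ± 1.96 × SD of the differences; the readings are hypothetical.

```python
import statistics

def bland_altman(method_a, method_b):
    """Return the x values (pair means), y values (pair differences),
    the bias, and the 95% limits of agreement."""
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)      # center line of the plot
    sd = statistics.stdev(diffs)       # spread of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)
    return means, diffs, bias, limits

# Hypothetical joint angles (degrees) from two instruments.
goniometer = [90.0, 100.0, 110.0, 120.0]
inclinometer = [88.0, 99.0, 107.0, 118.0]

means, diffs, bias, limits = bland_altman(goniometer, inclinometer)
```

In this made-up data the goniometer reads about 2 degrees higher on average, so the bias line sits at 2 rather than 0, and narrow limits would indicate close agreement.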
What is epidemiology?
The study of the determinants of disease, injury, or dysfunction in populations
Epidemiology is another way of saying ____
Epidemiology is another way of saying risk
Risk in PT can be expressed in terms of _____
• Experiencing an adverse outcome
• Patients not improving with treatment
• Requiring more invasive or expensive subsequent interventions in spite of treatment
Epidemiology generally uses observational designs with ___ variables
Epidemiology generally uses observational designs with dichotomous variables
What studies are intended to study risk factors?
Case-Control & Cohort Studies
Case-Control & Cohort Studies look at the ____ between disease & exposure
Case-Control & Cohort Studies look at the association (“cause”) between disease & exposure
The IV and DV in case-control & cohort studies are what kind of variables?
Dichotomous
In case-control & cohort studies, there is ___ strength in thinking something is causal of the other
In case-control & cohort studies, there is less strength in thinking something is causal of the other
How are subjects in a cohort study selected?
Subjects are selected based on exposure or not
Is a cohort study usually prospective or retrospective?
Usually prospective, but can be retrospective
Does a cohort study work for rare conditions?
Doesn’t work well for very rare conditions
What does a cohort study examine?
Examines if there is a different incidence of disease
How are subjects in a case control study selected?
Subjects are selected based on whether or not they have the disorder
Where should the controls of a case control be selected from?
Controls should be selected from the same population as the cases
What does a case-control study examine?
Examines if exposure is different between cases and controls
What condition does a case control work especially well for?
Works especially well for very rare conditions
What are the primary ways to quantify risk?
- Relative Risk (RR)
- Odds Ratio (OR)
What do the primary ways to quantify risk actually quantify?
Both quantify strength of association between “exposure” and “disease”
In what study is RR used and in what study is OR used?
- RR in Cohort studies
- OR in Case-control studies
What does it mean when an RR or OR = 1 ?
- = “null value”
- No association between an exposure and a disease
What does it mean when an RR or OR > 1?
- A positive association between an exposure and a disease
- The exposure is considered to be harmful
What does it mean when an RR or OR < 1?
- A negative association between an exposure and a disease
- The exposure is protective
RR is the ratio of ___ compared to ____
Incidence of disease among exposed individuals compared to incidence of disease among unexposed individuals
Since subjects in a case-control study are selected based on whether they have the disease or not, an OR can’t determine the rate of ___
Since subjects in a case-control study are selected based on whether they have the disease or not, an OR can’t determine the rate of “incidence”
OR is the ratio of ___ compared to ____
Odds of exposure among cases (with disease) compared to Odds of exposure among controls (w/o disease)
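The two ratios defined in the cards above can be written out directly. The Python sketch below uses hypothetical counts; the function and argument names are chosen for illustration only.

```python
def relative_risk(exposed_cases, exposed_noncases,
                  unexposed_cases, unexposed_noncases):
    """Incidence among the exposed divided by incidence among the unexposed."""
    incidence_exposed = exposed_cases / (exposed_cases + exposed_noncases)
    incidence_unexposed = unexposed_cases / (unexposed_cases + unexposed_noncases)
    return incidence_exposed / incidence_unexposed

def odds_ratio(cases_exposed, cases_unexposed,
               controls_exposed, controls_unexposed):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = cases_exposed / cases_unexposed
    odds_controls = controls_exposed / controls_unexposed
    return odds_cases / odds_controls

# Hypothetical cohort study: 100 exposed and 100 unexposed subjects.
rr = relative_risk(exposed_cases=30, exposed_noncases=70,
                   unexposed_cases=10, unexposed_noncases=90)

# Hypothetical case-control study: 100 cases and 100 controls.
odds = odds_ratio(cases_exposed=40, cases_unexposed=60,
                  controls_exposed=20, controls_unexposed=80)
```

Both values are above 1 here, so under the interpretation in the cards the made-up exposure would be read as harmful.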
The computation of OR is kinda like ___
The computation of OR is kinda like kappa
____ uses relationships (correlation) as a basis for prediction
Regression uses relationships (correlation) as a basis for prediction
What are the characteristics of a linear regression?
- X and Y are correlated
- X = independent variable (= predictor variable)
- Y = dependent (or criterion) variable
- We use X to predict Y
- The value of Y depends on X (that's why Y is called the dependent variable)
What is the error from line/ residual in a regression line?
The distance between each data point and the line of best fit
Residuals are squared to eliminate ___ and penalize for ___
Residuals are squared to eliminate sign and penalize for worse errors
What is the line of best fit?
The line with the smallest sum of squared errors
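To make residuals and the line of best fit concrete, this Python sketch fits a line by ordinary least squares (the standard closed-form slope and intercept) and compares its squared error with another candidate line. The data points are invented.

```python
def fit_line(xs, ys):
    """Ordinary least squares slope and intercept for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_residuals(xs, ys, slope, intercept):
    """Residual = actual Y minus predicted Y; squaring removes the sign
    and penalizes larger errors more heavily."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

slope, intercept = fit_line(xs, ys)
best_sse = sum_squared_residuals(xs, ys, slope, intercept)
other_sse = sum_squared_residuals(xs, ys, slope=2.0, intercept=0.0)
```

The alternative line passes exactly through three of the four points, yet its sum of squared residuals is still larger than that of the fitted line.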
Is regression a parametric or non parametric statistic?
Parametric
What are the assumptions of a linear regression analysis?
- Linear relationship = approximation of true line in population
- For every X there is a normal distribution of Y
- Sample data include random samplings from these distributions on Y
- Homogeneity of variance
What is a way to test the assumptions of a linear regression?
Analysis of residuals: plot the residuals on the y-axis against the predicted values on the x-axis
What assumption of linear regression does the analysis of residuals test the most?
Homogeneity of variance
What are you looking for in the analysis of residuals to test linear regression assumptions (assumptions are met)?
Looking for the residuals (the distances between the predicted values and the actual values) to be symmetric and consistent throughout
What does the analysis of residuals graph look like when the assumptions of linear regression are not met?
- The graph starts to get wider the further it goes (the data is further away from the line, the higher you go)
- Data is not symmetric
What happens if the linear regression assumptions are not met?
Use a non linear regression
What helps a researcher determine whether to retain or discard an outlier?
• Due to peculiar circumstances?
• Can discard if error identified
• Generally not justified on statistical grounds alone
What are the peculiar circumstances that have to be taken into consideration when determining whether to retain or discard data?
- Measurement error
- Recording error
- Equipment malfunction
- Miscalculation
- Aberrant subject (should have been excluded)
What are the things that look at the accuracy of prediction of the regression equation?
• Correlation coefficient (R)
• Coefficient of determination (R²)
• ANOVA of Regression
What are the characteristics of a correlation coefficient as it relates to the accuracy of prediction?
- Rough indicator of goodness of fit for regression line
- Same as correlation coefficient (r)
What does the coefficient of determination represent?
Proportion of variance in Y scores that can be explained by X scores
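One way to check the "proportion of variance explained" idea is to compute it two ways for a simple linear regression: as 1 − (residual sum of squares / total sum of squares), and as the square of Pearson's r. This Python sketch uses invented data and standard formulas that the cards do not state.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def explained_variance(xs, ys):
    """1 - SSE/SST for the least squares line through (xs, ys)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst

# Hypothetical data.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

r = pearson_r(xs, ys)
r_squared = explained_variance(xs, ys)
```

For a single predictor the two routes agree: squaring r gives the same number as the explained proportion of variance in Y.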
What does the ANOVA of regression test?
Tests hypothesis that predictive relationship occurred by chance (Ho: b = 0)
What does it mean when b=0 in an ANOVA of regression?
If b (slope) = 0, line is horizontal = no relationship
What happens when p < alpha in an ANOVA of regression?
If p < alpha, reject the null and conclude the predictive relationship is significant
How many predictors are in a simple linear regression model and how many are in a multiple linear regression model?
There is only 1 predictor in a simple model and there are multiple predictors in a multiple linear regression model
What are the assumptions of a multiple linear regression analysis?
- Linear relationship = approximation of true line in population
- For every X there is a normal distribution of Y
- Sample data include random samplings from these distributions on Y
- Homogeneity of variance
- DV = continuous measure
Coefficient of determination is the square of ____
Coefficient of determination is the square of the correlation coefficient
What is an adjusted R squared and what do you get punished for?
Chance-corrected R²; you get punished for having more predictor variables
What is the goal of a linear regression?
The more you can predict with fewer variables, the better
What is a regression coefficient?
- The value/slope in the linear equation
- The rate of change in Y for each unit change of X
What is a standardized beta weight helpful for?
Helpful to know relative contribution of each predictor variable
Which will always be higher or the same: R squared or adjusted R squared?
R squared will always be higher than or equal to the adjusted R squared
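The penalty for extra predictors can be seen numerically. This Python sketch assumes the commonly used adjustment, adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of subjects and p the number of predictors; the formula is not given in the cards and the inputs are invented.

```python
def adjusted_r_squared(r_squared, n_subjects, n_predictors):
    """Adjusted R squared (assumed formula: 1 - (1 - R2)(n - 1)/(n - p - 1))."""
    denominator = n_subjects - n_predictors - 1
    if denominator <= 0:
        raise ValueError("need more subjects than predictors + 1")
    return 1.0 - (1.0 - r_squared) * (n_subjects - 1) / denominator

# Same R squared and sample size, different numbers of predictors.
adj_few = adjusted_r_squared(r_squared=0.50, n_subjects=50, n_predictors=2)
adj_many = adjusted_r_squared(r_squared=0.50, n_subjects=50, n_predictors=10)
```

With the same R squared of 0.50, the model with 10 predictors is adjusted down further than the model with 2, and neither adjusted value exceeds 0.50.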
What is multicollinearity?
When the Xs in the model are substantially correlated with each other
What does multicollinearity create a problem with?
Creates problems with interpretations of b weights
What is the risk of the forced entry of all possible predictors in a multiple regression method?
- Risk of multicollinearity (correlation between predictors)
- Risk of retaining non-contributing predictors
- Risk of more predictors than justified by sample size
How are the criteria in a stepwise procedure set?
Criteria set to retain or reject predictors
Which predictor is entered first in a stepwise procedure?
Predictor with highest partial correlation entered first
What does a stepwise procedure result in?
Should result in model with greatest parsimony and least multicollinearity
What is a parsimonious model?
A model that is the most predictive, with the fewest variables
What is a simple correlation?
The overlap between 2 variables
What is a partial correlation?
The unique correlation between 2 variables
What is a forward stepwise regression method?
A method that starts with no predictors, then adds them, starting with the strongest
What is a backward stepwise regression method?
A method that starts with all predictors, then removes them, starting with the weakest
What is a stepwise stepwise regression method?
A method that starts with no predictors, then adds them, but can also remove them
What is the level of measurement for predictors/ IV in a stepwise multiple linear regression model?
- Most predictors are continuous scales
- Can also use dichotomous or ordinal scale predictors
- But not multicategory nominal (e.g. race)
A large number of predictors in a stepwise multiple linear regression requires ___
A large number of predictors in a regression requires a very large sample size
What is the rule of thumb for the predictors of a stepwise multiple linear regression model?
At least 10-15 subjects per predictor in model
What happens if there are too many predictors (too few subjects per predictor) in a stepwise multiple linear regression model?
The model becomes susceptible to “model overfit” (chance associations, i.e. Type I error).
What is a logistic regression?
When you are trying to predict a dichotomous variable
What is the DV level of measurement of a logistic regression?
Dichotomous
What is the predictor/ IV level of measurement of a logistic regression?
Continuous, ordinal, or dichotomous
What are the pros of MANOVA?
• MANOVA gets around the multiplicity problem (familywise alpha: increased Type I error risk)
• MANOVA can be more powerful if the DVs are related
What are the cons of MANOVA?
• “Combo DV” is not directly interpretable
• If statistically significant, then must follow up with post-hoc ANOVAs
What is a factor analysis?
Method of simplifying & organizing large sets of variables into fewer abstract components
What is a path analysis?
Visual modeling of both direct & indirect relationships
Path analysis is an extension of ____
Path analysis is an extension of multiple regression
Compared to a multiple regression, a path analysis is more __ and ____
Compared to a multiple regression, a path analysis is more flexible and comprehensive
What can a path analysis analyze?
Can analyze both direct and indirect relationships between 1 or more exogenous variables (IVs) and 1 or more endogenous variables (DVs)
What is hierarchical linear modeling also known as?
- Multilevel linear modeling
- Linear mixed modeling
Hierarchical linear modeling comes from what type of analysis?
The type of analysis where you have some variables nested within other variables (students nested in a classroom when studying schools)
Hierarchical linear modeling has far fewer __ and is highly ___
Hierarchical linear modeling has far fewer assumptions and is highly flexible
What is the Number Needed to Treat (NNT)?
How many patients you have to provide treatment to in order to prevent one bad outcome
What is Control Event Rate (CER)?
Percent of patients in control group with bad outcome
What is Experimental Event Rate (EER)?
Percent of patients in experimental group with bad outcome
What is the equation for RR?
EER/CER
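The last few cards fit together in a short calculation. The Python sketch below uses RR = EER/CER as stated above and assumes the usual NNT formula, 1 / (CER − EER), which the cards do not give. The trial counts are hypothetical.

```python
def event_rate(bad_outcomes, group_size):
    """Proportion of a group with the bad outcome."""
    return bad_outcomes / group_size

# Hypothetical trial: 100 patients per group.
cer = event_rate(bad_outcomes=20, group_size=100)   # control event rate
eer = event_rate(bad_outcomes=10, group_size=100)   # experimental event rate

rr = eer / cer                          # relative risk, as in the card above
absolute_risk_reduction = cer - eer     # difference in event rates
nnt = 1.0 / absolute_risk_reduction     # assumed formula: 1 / (CER - EER)
```

With these made-up numbers the treatment halves the risk (RR = 0.5), and about 10 patients would need to be treated to prevent one bad outcome. In practice NNT is usually rounded up to the next whole patient.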