523 - Stats Flashcards
achievement test
WHO:
WHERE: used in schools and education settings
WHAT: A test designed to measure how much someone knows about a particular topic. Measures previous learning!!! Not the ability to learn something new.
WHY: Achievement tests offer a standardized measure to compare individuals or groups. The scoring is objective and reliable. They may also help highlight academic strengths and weaknesses.
EXAMPLE: Comps is an achievement test designed to measure how much students have learned in the ten core classes of the program, and whether they have learned enough to continue in the program.
ANOVA
WHO:
WHERE: Applied statistics and psychometrics
WHAT: An analysis of variance. A statistical technique used to compare the means of three or more groups at a time. Different than t-tests because a t-test compares only two means, while a single ANOVA can test all of the groups at once.
WHY: ANOVAs determine whether there is a significant difference among group means. Running one ANOVA instead of many separate t-tests also avoids inflating the chance of a Type I error (false positive). A significant result shows that at least one group differs; post hoc tests identify which ones.
EXAMPLE: An experiment compares test scores across three study techniques: flashcards, note reading, and practice tests. An ANOVA is run to see whether there are any significant differences among the groups.
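The core of an ANOVA is comparing how much the group means differ from each other with how much scores vary inside each group. The sketch below computes the one-way ANOVA F statistic by hand for the study-technique example. The scores are hypothetical, made up for illustration, and it stops at F rather than a p-value because Python's standard library has no F distribution (a library such as SciPy provides `scipy.stats.f_oneway` for that).

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on two or more groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]

    # Between-groups sum of squares: how far each group mean is from the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: how far each score is from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    # F = mean square between / mean square within
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical test scores for the three study techniques
flashcards = [78, 82, 75, 80]
note_reading = [70, 68, 74, 72]
practice_tests = [88, 85, 90, 87]

f_stat, df_b, df_w = one_way_anova(flashcards, note_reading, practice_tests)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # F(2, 9) = 41.06
```

A large F means the group means differ far more than the within-group noise would predict; the p-value comes from comparing F with the F distribution for (2, 9) degrees of freedom.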
aptitude test
WHO:
WHERE: Applied statistics and psychometrics
WHAT: Measures a person’s potential to learn specific skills/gain knowledge on a topic. They rely heavily on predictive criterion validation procedures.
Prone to bias (cultural, racial, language).
WHY: Aptitude tests are important to help understand a person’s innate potential. They can help predict future performance in specific areas and help ensure that students are enrolled in programs that match their capabilities.
EXAMPLE: The ACT is an aptitude test designed to predict a student’s potential success in college. However, there is reason to doubt the ACT’s predictive validity because of possible racial and gender bias.
clinical vs statistical significance
WHO:
WHERE: Applied statistics and psychometrics
WHAT: Clinical = meaningfulness of change in a client’s life
How meaningful/important are the changes to the patient? Does an individual still have quality of life? Do they still meet criteria for a diagnosis? What percentage of patients are benefitting?
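Statistical = whether an outcome is unlikely to be due to chance alone; calculated mathematically; considered statistically significant if p-value is < .05 (if there were truly no effect, there would be less than a 5% chance of getting results at least this extreme)
larger sample = more statistical power, so even small effects are more likely to be statistically significant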
WHY: Findings can be clinically significant without being statistically significant, or vice versa.
This is important to remember when interpreting research and when deciding whether a treatment may help with a disorder.
EXAMPLE: A therapist is trying to decide between two different treatments for a client. One treatment is both clinically and statistically significant. The other has high statistical significance but low clinical significance. The therapist chooses the first treatment, because patients in that study had a higher quality of life and fewer of them met diagnostic criteria after treatment.
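The point about sample size can be made concrete. The sketch below, using made-up numbers, holds a small effect constant (a hypothetical 1-point improvement on a symptom scale with SD 10) and changes only the sample size. For simplicity it assumes a known, shared SD and uses a z-test instead of a t-test. Cohen's d is included as a rough effect-size index; it is not the same thing as clinical significance, which depends on what the change means for the patient.

```python
from math import sqrt
from statistics import NormalDist


def compare_groups(mean_diff, sd, n_per_group):
    """Two-group z-test with a known, shared SD (a simplification of the usual t-test).

    Returns (z, two-sided p-value, Cohen's d).
    """
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    cohens_d = mean_diff / sd  # effect size does not depend on sample size
    return z, p_value, cohens_d


# Hypothetical: treatment lowers a symptom score by 1 point; SD is 10 points
for n in (20, 20_000):
    z, p, d = compare_groups(mean_diff=1.0, sd=10.0, n_per_group=n)
    print(f"n per group = {n:>6}: p = {p:.4f}, Cohen's d = {d:.2f}")
```

The p-value drops from far above .05 to essentially zero while the effect size stays at d = 0.10, which is why a statistically significant result is not automatically clinically meaningful.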
construct validity
WHO:
WHERE: applied stats and psychometrics
WHAT: The degree to which a test actually measures the theoretical construct it claims/aims to measure.
- Focuses on the attributes, features, nature, and ability of the measurement instrument being tested
- Does it actually measure what it aims to measure?
Divergent (discriminant) validity = the extent to which the test does NOT correlate with tests that measure different constructs
Convergent validity = the extent to which the test correlates with other tests that measure the same construct
WHY: It is important to keep construct validity in mind to ensure you are measuring what you intend to study. Steps can also be taken to avoid threats to construct validity, such as a mismatch between the construct and its operational definition, bias, and experimenter and participant effects.
EXAMPLE: A group of researchers create a new test to measure depression. They want to ensure that the test has construct validity (that it actually measures the construct of depression). To do this, they check how strongly the test correlates with the BDI (Beck Depression Inventory) and how weakly it correlates with measures of a different construct, such as anxiety.
content validity
WHO:
WHERE: applied stats and psychometrics
WHAT: The degree to which a measure represents all aspects of a given construct
- how well a measure encompasses the full domain of what it is trying to measure
- Is the test representative of what it aims to measure?
Can’t be captured by a single empirical statistic; usually assessed through expert judgment of whether the items are relevant and cover the whole domain (e.g., Lawshe’s content validity ratio)
WHY: Considering content validity is important in research to ensure a measure covers the entire range of what it aims to test. It is also useful for assessing whether individual items are relevant.
EXAMPLE: A test is designed to assess arithmetic skills at a fourth-grade level. The test’s content validity indicates how well it represents the full range of arithmetic skills expected at that level.
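One widely used expert-judgment approach is Lawshe's content validity ratio (CVR): each expert on a panel rates whether an item is essential, and items with a low or negative CVR are candidates for removal. The sketch below assumes a hypothetical panel of 10 fourth-grade teachers and made-up vote counts for the arithmetic example.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR = (n_e - N/2) / (N/2), ranging from -1 to +1."""
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical: 10 teachers rate each item "essential" or not
essential_votes = {
    "two-digit multiplication": 9,
    "long division with remainders": 7,
    "limits in calculus": 1,
}

for item, votes in essential_votes.items():
    cvr = content_validity_ratio(votes, n_panelists=10)
    print(f"{item}: CVR = {cvr:+.2f}")
```

The out-of-domain calculus item gets a strongly negative CVR. The minimum acceptable CVR depends on panel size (Lawshe published a table of critical values), so there is no single fixed cutoff.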
correlation vs causation
WHO:
WHERE: applied stats and psychometrics
WHAT: correlation = relationship between two variables (correlation coefficient ranges from -1 to +1)
causation = when a change in one variable produces a change in the other variable; established through controlled experiments
WHY: Correlation ≠ causation!! Important to consider when creating and consuming research, to know how/why two variables are related and to draw accurate conclusions
EXAMPLE: Ice cream sales and drowning rates are positively correlated. This is not because one causes the other, but because both are more common during the summer months (a third variable).
Annie is examining the relationship between social media and her body image. She abstained from social media for one month and noticed her body image became more positive. She now has reason to suspect a causal relationship between the two variables, although a single uncontrolled observation cannot establish causation on its own.
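Pearson's r, the most common correlation coefficient, can be computed directly from its definition. In the hypothetical numbers below (invented to mirror the ice cream example), ice cream sales and drownings correlate strongly only because both rise with temperature; nothing in the coefficient itself reveals that.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient, which always falls between -1 and +1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-products of deviations from each mean
    cross_products = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross_products / sqrt(ss_x * ss_y)


# Hypothetical monthly data; temperature is the third variable driving both
temperature_c = [5, 8, 14, 20, 26, 30]
ice_cream_sales = [20, 28, 45, 60, 80, 95]
drownings = [1, 2, 3, 5, 7, 8]

print(f"ice cream vs drownings:   r = {pearson_r(ice_cream_sales, drownings):.2f}")
print(f"temperature vs ice cream: r = {pearson_r(temperature_c, ice_cream_sales):.2f}")
print(f"temperature vs drownings: r = {pearson_r(temperature_c, drownings):.2f}")
```

A high r is equally compatible with a causal link, a reversed link, or a shared cause, which is why causal claims need controlled designs.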
dependent t-test
WHO:
WHERE: applied stats and psychometrics
WHAT: A statistical analysis that compares the means of two RELATED groups to determine whether there is a statistically significant difference between them
- used when the design involves matched pairs or repeated measures and the IV has 2 levels
WHY: Called ‘dependent’ because the two sets of scores are paired (the same people measured twice, or matched people), so scores in one condition are related to scores in the other rather than independent
Because each person (or pair) serves as their own comparison, it allows researchers to control for individual differences
EXAMPLE: A researcher wants to test how effective a relaxation technique is at reducing stress levels in college students. Stress levels are recorded before and after use of the relaxation technique. A dependent t-test is conducted to compare the mean stress levels before and after the intervention to determine whether the relaxation technique made a statistically significant difference.
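The dependent t-test works on each person's difference score: t is the mean difference divided by its standard error. A minimal sketch with hypothetical before/after stress scores (the numbers are made up); if SciPy is available, `scipy.stats.ttest_rel` also returns the p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_test(before, after):
    """Return (t, df) for a dependent (paired-samples) t-test."""
    if len(before) != len(after):
        raise ValueError("each participant needs a before and an after score")
    # Each person's own change score; stable individual differences cancel out
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical stress scores for six students
stress_before = [30, 28, 35, 32, 27, 31]
stress_after = [25, 26, 30, 29, 24, 27]

t, df = paired_t_test(stress_before, stress_after)
print(f"t({df}) = {t:.2f}")  # positive t here means stress was lower afterward
```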
internal consistency
WHO:
WHERE: applied stats and psychometrics
WHAT: Type of reliability
Measures the extent to which items on a test measure the same ability or trait.
Do items that are intended to measure the same construct produce similar scores?
Measured with Cronbach’s alpha, which typically ranges from 0 to 1 (higher = more consistent)
WHY: Internal consistency shows the degree of interrelationship/homogeneity of the items on a test. It helps ensure the items are all measuring the same thing, which is necessary (but not sufficient) for a test to measure what it’s supposed to be measuring.
EXAMPLE: Molly is creating a test to measure the Big 5 personality traits. She checks the test’s internal consistency to make sure its items reliably measure what she intended. Cronbach’s alpha comes out to 0.91, indicating good internal consistency, so Molly’s test appears suitable for use.
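Cronbach's alpha compares the variance of the individual items with the variance of the total score: when items move together, the total varies much more than the items do on their own. A minimal sketch of the standard formula on hypothetical ratings (invented for illustration). For a multi-trait test like the Big 5, alpha is usually computed separately for each trait's items.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])          # number of items
    items = list(zip(*responses))  # regroup scores by item
    sum_item_variances = sum(variance(item) for item in items)
    total_scores = [sum(person) for person in responses]
    # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(total_scores))


# Hypothetical 1-5 ratings: 5 respondents x 4 items from one subscale
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [1, 2, 1, 2],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```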
internal validity
WHO:
WHERE: applied statistics and psychometrics
WHAT: The extent to which the observed relationship between variables in a study reflects their actual relationship
Internal validity is how sure you can be that the intervention was the only reason for change in the DVs
To increase internal validity = control for confounding variables, randomly assign participants to conditions (random selection mainly improves external validity)
WHY: A study with a high internal validity may indicate causation. Internal validity indicates whether one can draw reasonable conclusions about the cause-and-effect relationships among variables in a study.
EXAMPLE: A group of researchers were testing a new treatment for depression. They tightly controlled who could participate, including excluding anyone with a comorbid disorder. This reduced potential confounding variables, increased the study’s internal validity, and therefore increased the likelihood that their treatment was the sole reason for change in participants.
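Random assignment is what lets group differences be attributed to the intervention: every participant has the same chance of landing in each condition, so confounding variables tend to be spread evenly across groups. A minimal sketch, where the participant IDs, condition names, and seed are all hypothetical:

```python
import random


def randomly_assign(participants, conditions, seed=None):
    """Shuffle participants, then deal them into conditions like cards."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        # Round-robin dealing keeps group sizes within one of each other
        groups[conditions[i % len(conditions)]].append(person)
    return groups


participants = [f"P{i:02d}" for i in range(1, 11)]  # hypothetical IDs P01-P10
groups = randomly_assign(participants, ["new treatment", "control"], seed=42)
for condition, members in groups.items():
    print(condition, members)
```

The seed only makes the assignment reproducible for record keeping; leaving it out gives a fresh random assignment each run.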
interrater reliability
WHERE: applied statistics and psychometrics
WHAT: Type of reliability
Measures the agreement level between independent raters
- the extent to which independent evaluators produce similar ratings in judging the same thing in the same person/object
- useful with measures that are less objective and more subjective
- often expressed as a correlation coefficient
WHY: Interrater reliability is used to check for/account for human error in individual raters (distractibility, misinterpretation, differences in ability)
EXAMPLE: A naturalistic observation study is being conducted to look at the effect of violent video games on the behavior of 10-year-old boys. Three independent observers rate how aggressive the boys’ behavior is. Their ratings are consistent and yield a high correlation coefficient, indicating good interrater reliability.
measures of central tendency
WHO:
WHERE: applied statistics and psychometrics
WHAT: Statistical descriptions of the center of the distribution
Mean = average
Median = point that separates distribution into two halves
Mode = most frequently occurring
**median and mode are more resistant to outliers than the mean
WHY: Describes a data set/distribution. Allows for a better understanding of the data, as well as for inferences to be made about trends and the shape of the distribution.
EXAMPLE: A researcher is studying how often patients with BPD intentionally skip their medication each month. To better understand the data, the researcher calculates the mode (the most frequently occurring number of missed days), the mean (the average number of missed days), and the median (the number of missed days in the center of the data set).
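A quick way to see why the median and mode resist outliers: compute all three measures on hypothetical data for the BPD example, with and without one extreme value (the numbers are invented).

```python
from statistics import mean, median, mode

# Hypothetical missed-medication days per month for eight patients
missed_days = [2, 3, 3, 4, 3, 5, 2, 28]  # 28 is an outlier
without_outlier = missed_days[:-1]

for label, data in (("with outlier", missed_days), ("without outlier", without_outlier)):
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}, mode = {mode(data)}")
```

The single extreme value roughly doubles the mean, while the median and mode stay at 3.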
nominal/ordinal/interval/ratio measurements
WHO:
WHERE:
WHAT:
WHY:
EXAMPLE:
measures of variability
WHO:
WHERE: applied statistics and psychometrics
WHAT: How spread out a distribution is around its central tendency
SD = square root of variance
Range = difference between the highest and lowest value
Variance = the average of each value’s SQUARED difference from the mean (for a sample, divide by n - 1 instead of n)
WHY: It is important to look at the spread of the data to spot outliers and assess whether they need to be dropped to get accurate results when running tests. Variability also helps determine which statistical analyses you can run on a data set.
EXAMPLE:
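The range, variance, and SD can be computed directly from their definitions. A minimal sketch on hypothetical scores, showing both the population variance (divide by n, matching "the average squared difference from the mean") and the sample variance (divide by n - 1), checked against the standard library:

```python
from math import sqrt
from statistics import pvariance, variance

# Hypothetical scores
scores = [4, 8, 6, 5, 3, 7, 9, 6]
n = len(scores)
m = sum(scores) / n

data_range = max(scores) - min(scores)
squared_diffs = [(x - m) ** 2 for x in scores]
population_variance = sum(squared_diffs) / n    # average squared difference from the mean
sample_variance = sum(squared_diffs) / (n - 1)  # used when estimating from a sample
sd = sqrt(sample_variance)                      # SD = square root of variance

print(f"range = {data_range}, population variance = {population_variance}, "
      f"sample variance = {sample_variance:.2f}, SD = {sd:.2f}")

# The standard library's versions agree with the hand calculations
assert abs(population_variance - pvariance(scores)) < 1e-9
assert abs(sample_variance - variance(scores)) < 1e-9
```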
norm-referenced scoring/tests
WHERE: Taught in applied stats and psychometrics
WHAT: A norm-referenced test evaluates a test taker’s performance against a standardized sample; typically used for the purpose of making comparisons with a larger group. Norms should be current, relevant, and representative of the group to which the individual is being compared.
WHY: Norm-referenced scoring/tests can be problematic when tests are not normed with a culturally diverse population. Many norming samples attempt to be representative of the population, which can result in some groups being represented by very few people. This can lead to inappropriate scoring or poor test acceptability with some populations, and has resulted in within-group norming.
EXAMPLE: IQ testing is an example of norm-referenced scoring/testing because an individual’s score is always interpreted relative to the typical performance of the norming sample.
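Norm-referenced interpretation amounts to locating a score within the norm group's distribution. A minimal sketch, assuming the norm group's scores are normally distributed with mean 100 and SD 15 (the scale many IQ tests use). Real tests convert scores with the publisher's norm tables rather than this idealized curve.

```python
from statistics import NormalDist

# Assumption: norm group scores are normal with mean 100 and SD 15
iq_norms = NormalDist(mu=100, sigma=15)


def norm_referenced(score, norms=iq_norms):
    """Express a score relative to the norm group: (z-score, percentile rank)."""
    z = (score - norms.mean) / norms.stdev
    percentile = norms.cdf(score) * 100
    return z, percentile


for score in (85, 100, 115, 130):
    z, pct = norm_referenced(score)
    print(f"IQ {score}: z = {z:+.1f}, higher than about {pct:.0f}% of the norm group")
```

The same raw score would mean something different against a different norm group, which is why current, representative norms matter.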