523 - Stats DEENA'S VERSION Flashcards
achievement test
WHAT: A test designed to measure how much someone knows about a particular topic.
- measures previous learning, NOT one’s ability to learn something
- used in schools and education settings
WHY: Achievement tests offer a standardized measure to compare individuals or groups. The scoring is objective and reliable. They may also help to highlight academic strengths and weaknesses.
EXAMPLE: Comps is an achievement test designed to measure how much students have learned in the ten core classes of the program, and whether they’ve learned enough to continue in the program.
ANOVA
WHAT: A statistical technique used to compare three or more experimental groups at a time.
- analysis of variance.
- different from t-tests because they can compare more than two groups in a single test.
WHY: ANOVAs determine whether there is a significant difference between groups. Running one ANOVA instead of several t-tests also reduces the chance of type I errors (false positives).
EXAMPLE: An experiment compares test scores across three different study techniques: flashcards, note reading, and practice tests. An ANOVA is run to see if there are any significant differences between the groups.
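A minimal sketch of the arithmetic behind a one-way ANOVA, using made-up scores for the three study techniques. The function name `one_way_anova_f` and all of the numbers are assumptions for illustration only. The F statistic is the between-group variance divided by the within-group variance.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)              # number of groups
    n_total = len(all_scores)    # total number of scores

    # Between-group sum of squares: distance of each group mean from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: distance of each score from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical test scores, three students per study technique
flashcards = [80, 85, 90]
note_reading = [70, 75, 80]
practice_tests = [85, 90, 95]

f_stat, df_b, df_w = one_way_anova_f([flashcards, note_reading, practice_tests])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A larger F means the group means differ more than would be expected from the spread inside each group. Whether it is significant still has to be checked against an F table or statistics software.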
aptitude test
WHAT: Measures a person’s potential to learn specific skills/gain knowledge on a topic
- rely heavily on predictive criterion validation procedures.
- prone to bias (cultural, racial, language).
WHY: Aptitude tests are important to help understand a person’s innate potential. They can help predict future performance in specific areas and help ensure that students are enrolled in programs that match their capabilities.
EXAMPLE: The ACT is an aptitude test designed to predict a student’s potential success in college. There is reason to doubt the predictive validity of the ACT (racial, gender bias).
clinical vs statistical significance
WHAT: Clinical = meaningfulness of change in a client’s life
How meaningful/important are the changes to the patient? What percentage of patients are benefitting?
Statistical = reliability of an outcome; calculated mathematically
- considered statistically significant if p-value is < .05 (less than a 5% chance of seeing results this extreme if there were no real effect)
- larger sample = less likely results due to chance
WHY: Findings can be clinically significant without being statistically significant, or vice versa.
This is important to remember while understanding research, and understanding if a treatment may be helpful for a disorder.
EXAMPLE: A therapist is trying to decide between two different treatments for a client. One treatment has both high clinical significance and statistical significance. The other has high statistical significance but low clinical significance. The therapist chooses the first treatment, as the patients in that study had a higher quality of life, and fewer of them met diagnostic criteria post-treatment.
construct validity
WHAT: The degree to which a test is capable of measuring all aspects of what it claims/aims to be measuring.
- Focuses on the attributes, features, and ability of a measurement instrument being tested
Divergent/Discriminant validity = how well the test does NOT correlate with other tests that measure different constructs
Convergent validity = how well the test correlates with other tests that measure the same constructs
WHY: It is important to keep construct validity in mind to ensure you are measuring what you intend to research. Additionally, steps can be taken to avoid things that threaten construct validity, such as a mismatch between the construct and its operational definition, bias, and experimenter and participant effects.
EXAMPLE: A group of researchers create a new test to measure depression. They want to ensure that the test has construct validity (that it is actually measuring the construct of depression). To do this, they measure how much the test correlates with the BDI (convergent validity) and how much it does not measure another construct like anxiety (divergent validity).
content validity
WHAT: The degree to which a measure represents all aspects of a given construct
- how well a measure encompasses the full domain of what it is trying to measure
- Is the test/items on test representative of what it aims to measure?
Usually can’t be measured empirically; typically assessed through expert review of the items (factor analysis can supplement this)
WHY: Considering content validity is important in research to ensure a measure covers the entire range of what it aims to test. It is also useful for assessing whether items are relevant.
EXAMPLE: A test is designed to survey arithmetic skills at a fourth-grade level. The test’s level of content validity indicates how well it represents the range of arithmetic skills possible at that level.
correlation vs causation
WHAT:
Correlation = relationship between two variables (correlation coefficient ranges from -1 to +1)
Causation = when a change in one variable brings about a change in the other variable; determined via controlled studies
WHY: Correlation ≠ causation!! Important to consider when creating + consuming research to know how/why two variables are related, and to be able to draw accurate conclusions
EXAMPLE: Ice cream sales and drowning rates are positively correlated. This is not because one causes the other, but rather because both are more common during summer months.
Annie is examining the relationship between social media and her body image. She abstained from social media for one month and noticed her body image became more positive. She now has reason to suspect a causal relationship between the two variables, though a controlled study would be needed to confirm it.
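A minimal sketch of how a correlation coefficient is calculated, using invented monthly figures for the ice cream example. The function name `pearson_r` and the data are assumptions for illustration. The result only describes how strongly the two variables move together and says nothing about why.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    # Sum of cross-products of deviations from each mean
    cross = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Square roots of the summed squared deviations for each variable
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cross / (sx * sy)

# Hypothetical monthly data (not real figures)
ice_cream_sales = [10, 20, 30, 40, 50]
drownings = [1, 3, 2, 5, 4]

r = pearson_r(ice_cream_sales, drownings)
print(f"r = {r:.2f}")  # strong positive correlation, but no causal claim
```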
dependent t-test
WHAT: A statistical analysis that compares the means of two RELATED groups
- to determine whether there is a statistically significant difference between their means
- used when the design involves matched pairs or repeated measures (ex: pretest & posttest), and has 2 levels of the IV
TEST BEFORE AND AFTERS/WITHIN GROUP
WHY: Called ‘dependent’ because the two sets of scores are related: they come from the same (or matched) participants, so each score in one set is paired with a score in the other.
Because each participant serves as their own comparison, researchers can control for individual characteristics.
EXAMPLE: A researcher wants to test how effective a relaxation technique is at reducing stress levels in college students. Stress levels are recorded before and after the use of the relaxation technique. A dependent t-test is conducted to compare the mean stress levels before and after intervention to determine if the relaxation technique made a statistically significant difference.
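A minimal sketch of the dependent (paired) t statistic for a before/after design like the one above. The stress scores and the function name `paired_t` are assumptions for illustration. The test works on each person's difference score, which is how it controls for individual characteristics.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df) for a dependent t-test on paired scores."""
    # One difference score per participant (after minus before)
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # Mean difference divided by its standard error
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical stress scores for the same five students
stress_before = [30, 28, 35, 32, 29]
stress_after = [25, 26, 30, 28, 27]

t, df = paired_t(stress_before, stress_after)
print(f"t({df}) = {t:.2f}")  # negative t: stress was lower after the technique
```

The t value would then be compared against a t table (or software) at the chosen alpha level to decide significance.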
independent t-test
WHAT: Used to determine if there are significant differences between two UNRELATED group means
- used with two conditions of IV
TEST DIFF IVS/BETWEEN GROUPS
WHY: Independent t-tests allow you to see the effects of two different interventions. Significant differences indicate the interventions produced different results.
EXAMPLE: Researchers are comparing the effectiveness of CBT vs DBT in treating depression. After treatment, they may run an independent t-test to see if there were any significant differences in symptoms between the groups. This may indicate if one treatment was better at reducing symptoms than the other.
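A minimal sketch of the independent t statistic for two separate groups. It assumes the classic pooled-variance version (equal variances in both groups). The symptom scores and the function name `independent_t` are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Return (t, df) for a pooled-variance independent t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    # Standard error of the difference between the two means
    se = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (mean(group_a) - mean(group_b)) / se
    return t, df

# Hypothetical post-treatment depression scores (different clients in each group)
cbt_scores = [10, 12, 14, 16, 18]
dbt_scores = [14, 16, 18, 20, 22]

t, df = independent_t(cbt_scores, dbt_scores)
print(f"t({df}) = {t:.2f}")
```

Unlike the dependent t-test, the scores here are never paired up: each group is summarized by its own mean and variance.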
internal consistency
WHAT: Measures the extent to which items on a test measure the same specific ability or trait.
- type of reliability
- measured with Cronbach’s alpha, ranges 0 - 1
Do items that are intended to measure the same construct produce similar scores?
WHY: Internal consistency shows the degree of interrelationship/homogeneity of items on a test. It is important to ensure a test truly measures what it’s supposed to be measuring.
EXAMPLE: Molly is creating a test to measure the Big 5 personality traits. She checks the test’s internal consistency to ensure that the items within each scale hang together as intended. The Cronbach’s alpha comes out to 0.91, indicating good internal consistency. Molly’s test is suitable for use.
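A minimal sketch of how Cronbach’s alpha can be calculated, assuming the standard formula: alpha = (k / (k - 1)) × (1 - sum of item variances / variance of total scores), where k is the number of items. The item responses and the function name `cronbach_alpha` are made up for illustration.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, with respondents in the same order."""
    k = len(items)
    # Sum of the variances of each individual item
    item_var_sum = sum(variance(item) for item in items)
    # Each respondent's total score across all items
    totals = [sum(scores) for scores in zip(*items)]
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Hypothetical responses from five people to three items on one scale
item1 = [1, 2, 3, 4, 5]
item2 = [2, 2, 3, 4, 5]
item3 = [1, 3, 3, 4, 4]

alpha = cronbach_alpha([item1, item2, item3])
print(f"alpha = {alpha:.2f}")
```

When people who score high on one item also score high on the others, the total-score variance is large relative to the item variances and alpha moves toward 1.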
internal validity
WHAT: The extent to which the observed relationship between variables (IV & DV) in a study reflects their actual relationship
- how sure you can be that the intervention was the only reason for change in the DVs
To increase internal validity = control for confounding variables, randomly assign participants to conditions
WHY: A study with a high internal validity may indicate causation. Internal validity indicates whether one can draw reasonable conclusions about the cause-and-effect relationships among variables in a study.
EXAMPLE: A group of researchers were testing a new treatment for depression. They highly controlled who could be a participant, including not allowing anyone with a comorbid disorder. This reduced potential confounding variables, increased the study’s internal validity, and therefore increased the likelihood that their treatment was the sole reason for change in participants.
interrater reliability
WHERE: applied statistics and psychometrics
WHAT: Measures the agreement level between independent raters
- the extent to which independent evaluators produce similar ratings in judging the same thing in the same person/object
- useful with measures that are less objective and more subjective
- expressed with correlation coefficient
WHY: Interrater reliability is used to compensate/account for human error in an independent rater (distractibility, misinterpretation, differences in ability)
EXAMPLE: A naturalistic observation study is being conducted to look at the effect of violent video games on the behavior of 10-year-old boys. Three independent observers rated the level of aggressiveness of the boys’ behavior. The ratings were consistent and yielded a high correlation coefficient, indicating good interrater reliability.
measures of central tendency
WHERE: applied statistics and psychometrics
WHAT: Statistical descriptions of the center of the distribution
Mean = average
Median = point that separates distribution into two halves
Mode = most frequently occurring
**median and mode most resistant to outliers
WHY: Describes a data set/distribution. Allows for a better understanding of the data, as well as for inferences to be made about trends and the shape of the distribution.
EXAMPLE: A researcher is studying how often BPD patients intentionally skip their medications each month. To better understand the gathered data, the researcher calculates the most frequently occurring number of missed doses (mode), the average number of missed doses (mean), and the number of missed doses in the center of the data set (median).
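A minimal sketch of the three measures using Python’s built-in `statistics` module. The missed-dose counts are invented, with one extreme value included to show why the median and mode resist outliers while the mean does not.

```python
from statistics import mean, median, mode

# Hypothetical missed doses per month for seven patients (20 is an outlier)
missed_doses = [2, 3, 3, 4, 5, 3, 20]

avg = mean(missed_doses)       # pulled upward by the outlier
middle = median(missed_doses)  # middle value once the scores are sorted
most_common = mode(missed_doses)

print(f"mean = {avg:.2f}, median = {middle}, mode = {most_common}")
```

Here the mean lands well above what most patients actually reported, while the median and mode stay at the typical value.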
measures of variability
WHAT: Statistical description of the variability of the distribution around the central tendency
- Range
- Variance (the average of each value’s SQUARED difference from the mean)
- SD (square root of variance)
WHY: Describes a data set/distribution. Allows for a better understanding of the data, as well as for inferences to be made about trends and the shape of the distribution. Also allows you to see outliers and determine if they should be dropped.
EXAMPLE: A school counselor is assessing math test scores from a class. After finding the standard deviation of the scores, she was able to see the outliers and determine who was really struggling and who was excelling in this subject.
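A minimal sketch of range, variance, and SD using Python’s built-in `statistics` module. The math scores are invented. The card defines variance as the average squared difference from the mean, which corresponds to `pvariance` (divide by N). The sample version, `variance`, divides by N - 1 instead and is shown for comparison.

```python
from statistics import pstdev, pvariance, variance

# Hypothetical math test scores
scores = [70, 75, 80, 85, 90]

score_range = max(scores) - min(scores)  # highest minus lowest
pop_variance = pvariance(scores)         # average squared difference from the mean
pop_sd = pstdev(scores)                  # square root of the variance
sample_variance = variance(scores)       # same sum of squares, divided by N - 1

print(f"range = {score_range}, variance = {pop_variance}, SD = {pop_sd:.2f}")
print(f"sample variance = {sample_variance}")
```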
nominal/ordinal/interval/ratio measurements
WHAT: Levels of measurement of variables.
Nominal = categorical (gender, political parties)
Ordinal = indicate order (birth order, Likert scale, stages/steps)
Interval = equal distances between values, no true zero– zero does not indicate none/an absence (temp, test score, IQ)
Ratio = interval data, with a true zero– zero indicates none/an absence (height, weight, speed, frequency of behaviors)
WHY: Nominal and ordinal data are non-continuous, while interval and ratio are continuous. Important to know the difference when gathering and organizing data.
EXAMPLE: A researcher is giving out surveys for a study they are conducting. The survey asks for the participants’ gender and height, and asks them to rate their moods on a Likert scale. These are examples of nominal, ratio, and ordinal measurements, respectively.