523- Applied Statistics and Psychometrics Flashcards
Achievement test
A test that is designed to measure an individual’s level of knowledge in a particular area; generally used in schools and educational settings. Unlike an aptitude test, which measures a person’s ability to learn something, an achievement test focuses specifically on how much a person knows about a specific topic. It measures an individual’s previous learning. Example: The comps exam is an achievement test, as it is designed to measure how thoroughly clinical counseling students have learned the information in the ten core classes of the program.
ANOVA
Analysis of variance A parametric statistical technique used to compare more than two experimental groups at a time. Determines whether there is a significant difference between the groups, but does not reveal where that difference lies. Clinical example: A group of psychiatric patients are trying three different therapies: counseling, medication and biofeedback. You want to see if one therapy is better than the others. You will gather data and run an ANOVA on the three groups- counseling, medication, and biofeedback- to see if there is a significant difference between any of them.
Aptitude test
Measures a person’s potential to learn or acquire specific skills. Often used for measuring high school students’ potential for college. Aptitude tests are prone to bias. Example: The SAT is an aptitude test designed to predict a student’s potential success in college. There is reason to doubt the predictive validity of the SAT.
Clinical vs. statistical significance
Clinical significance refers to the meaningfulness of change in a client’s life. Statistical significance is calculated when p < .05, meaning the likelihood that your results are due to chance is less than 5%. Statistical significance indicates that it is unlikely you have made a Type I error. Must calculate effect size to truly evaluate the meaningfulness of a result.
Construct validity
Part of: research design Construct validity is the degree to which a test or study measures the qualities or the constructs that it is claiming to measure.* Convergent validity: does test correlate highly with other tests that measure the concept* Divergent validity: does test correlate lowly with tests that measure different constructs Clinical example: A group of researchers create a new test to measure depression. They want to ensure that the test has construct validity, in that it is actually measures the construct of depression. To do this, they measure how much the test correlates with the Beck Depression Inventory and how much it does not measure another concept like anxiety.
Content validity
Part of: research design Content validity is the degree to which a measure or study includes all of the facets/aspects of the construct that it is attempting to measure. Content validity cannot be measured empirically but is rather assessed through logical analysis. Validity=accuracy Clinical example: A depression scale may lack content validity if it only assesses the affective dimension of depression (emotion related- decrease in happiness, apathy, hopelessness) but fails to take into account the behavioral dimension (sleeping more or less, eating more or less, energy changes, etc.).
Correlation vs. causation
Part of: research design and statistical analysis Correlation means that a relationship exists between two variables.* Can be positive or negative; coefficient will fall between -1.00 and +1.00.* Correlation does not indicate causation. Causation means that a change in one variable affects a change in the other variable.* Determined via controlled experiments, when dependent variables can be isolated and extraneous variables controlled. Clinical example: A study found that minutes spent exercising correlated with lower depression levels. This study was able to show that depression levels and exercise were correlated, but could not go so far as to claim that one causes the other.
Dependent t-test
Statistical analysis that compares the means of two related groups to determine whether there is a statistically significant difference between these means.* Sometimes called a correlated t-test because the data are correlated.* Used when the design involves matched pairs or repeated measures, and only two conditions of the independent variable* It is called “dependent” because the subjects carry across the manipulation–they take with them personal characteristics that impact the measurement at both points—thus measurements are “dependent” on those characteristics. Clinical example: A researcher wants to determine the effects of caffeine on memory. They administer a memory test to a group of subjects have the subjects consume caffeine then administer another memory test. Because they used the same subjects, this is a repeated measures experiment that requires a dependent t-test during statistical analysis.
Descriptive vs. inferential
Descriptive statistics are those which are used to describe and summarize the sample or population.* includes measures of central tendency and variance* can be used with any type of data (experimental and non-experimental)Inferential statistics allow inferences to be made from the sample to the population.* Sample must accurately reflect the population (importance of random sampling)* Infer causality* Limited to experimental data* Techniques include hypothesis testing, regression analysis.* The statistical results incorporate the uncertainty that is inherent in using a sample to understand an entire population. EXAMPLE: A researcher conducts a study examining the rates of test anxiety in Ivy League students. This is a descriptive study because it is concerned with a specific population. However, this study cannot be generalized to represent all college students, so it is not an inferential study.
Effect size
Part of: statistical analysis A measure of the strength of a significant relationship; the proportion of variance accounted for. Indicates if findings are weak, moderate, or strong. Also called shared variance or the coefficient of determination. Why: Quantifies the effectiveness of a particular intervention, relative to some comparison; commonly used in meta-analyses. Example: A researcher conducts a correlational research study on the relationship between caffeine and anxiety ratings. The study produces a correlation coefficient of 0.8 which is considered a large effect size. The effect size reflects a strong relationship between the caffeine and anxiety.
Independent t-test
Statistical analysis that compares the means of two independent groups, typically taken from the same population (although they could be taken from separate populations).* Determines if there is a statistical difference between the two groups’ means* We make the assumption that if randomly selected from the same population, the groups will mimic each other; the null hypothesis is no difference between the two groups EXAMPLE: Fred is analyzing the best treatment options for his patient Harold. He reads a study comparing two different types of therapies. After utilizing an independent t-test, the researchers found that there was not a statistically significant difference between the treatment options. Harold decides that both are good options for his patient and he decides to think about his client’s person variables that might make one better than the other.
Internal consistency
Part of: research design What: a type of reliability that measures whether several items that propose to measure the same general construct produce similar scores and are free from error.* usually measured with Cronbach’s alpha. EXAMPLE: Patient comes in with symptoms of PTSD. You decide to search for a psychological test that is designed to help you to detect and diagnose PTSD. You come across the Posttraumatic Stress Diagnostic Scale (PDS). The test manual indicates that the PDS is a valid measure of PTSD. You look in the test manual of the PDS and find that Cronbach’s alpha is 0.91. This indicates that the PDS has strong internal consistency.
Internal validity
Part of: research design What: The extent to which the observed relationship between variables in a study reflects the actual relationship between the variables. Control for confounding variables can increase internal validity, as well as a random selection of participants. EXAMPLE: Researchers investigated a new tx for depressing using tight controls in terms of who could be a participant. For instance, they did not allow anyone with comorbidity to participate. This increased the study’s internal validity. It did, however, jeopardize the ecological validity of the research.
Interrater reliability
Part of: research design What: a type of reliability that measures the agreement level between independent raters.* useful with measures that are less objective and more subjective.* used to account for human error in the form of distractibility, misinterpretation or simply differences in opinion. EXAMPLE: Three graduate students are performing a natural observation study for a class that examines violent video games and behavior in a group of 9 year old boys. The students rated the behavior on a scale of 1 (not aggressive) to 5 (very aggressive). However, the responses were not consistent between the observers. The study lacked inter-rater reliability.
Measures of central tendency
Part of: statistical analysis What: Tendency of the data to lump somewhere around the middle across the values on X; provides a statistical description of the center of the distribution.* Three main measures are used: the mean, mode and median.* Mean is the arithmetic average of all scores within a data set.* Mode is the most frequently occurring score.* Median is the point that separates the distribution into two equal halves.* Median and mode are the most resilient to outliers. EXAMPLE: A researcher is studying the frequency of binge eating in a group of girls suffering from binge eating disorder. To better understand the data that was gathered, they start by calculating the measures of central tendency: the most frequently occurring number of episodes in the group, the average number of episodes, and the number of episodes in the middle of the set. In other words, the mode median and mean.