PSYC 523- Statistics Flashcards
ANOVA
Analysis of variance: a statistical technique used to compare the means of three or more groups to determine whether they are statistically different from each other. It determines whether there is a significant difference somewhere among the groups but does not reveal where that difference lies; follow-up (post hoc) tests are needed to determine that.
Clinical example: A group of psychiatric patients are trying three different therapies: counseling, medication, and biofeedback. You want to see if one therapy is more efficacious than the others. You will gather data and run an ANOVA on the three groups- counseling, medication, and biofeedback- to see if there is a significant difference between any of them.
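Worked sketch (illustrative, not from the original card): the F ratio behind a one-way ANOVA is the between-group variance divided by the within-group variance. The scores and the function name `one_way_anova_f` are hypothetical. The sketch stops at the F statistic; judging significance would still require an F table or a statistics package.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of individual scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical symptom-improvement scores for the three therapies
counseling = [4, 5, 6]
medication = [6, 7, 8]
biofeedback = [8, 9, 10]

f_stat, df_b, df_w = one_way_anova_f(counseling, medication, biofeedback)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # F(2, 6) = 12.00
```

A large F says only that the group means differ somewhere; it does not say which therapies differ from which.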
Clinical v. statistical significance
Clinical significance refers to the meaningfulness of change in a client’s life due to the treatment. Do the patient’s symptoms reduce in a meaningful or noticeable way? Does the quality of life improve for the patient?
Statistical significance refers to whether the effect of a treatment on some outcome variable of interest is unlikely to be due to chance alone. A treatment can be statistically significant in research but not clinically significant.
Clinical example: If a randomized controlled trial does not show that a treatment is more effective than no treatment or a placebo, but that treatment produces a meaningful difference in a client’s life, it could be said to have clinical but not statistical significance.
You are trying to decide between two treatments for your client with treatment-resistant depression. One has demonstrated high clinical significance and high statistical significance in RCTs. The other shows high statistical significance but low clinical significance. You choose the one with high clinical significance because it assesses treatment efficacy from the patient perspective.
Construct validity
In research design, construct validity is the degree to which a test or study measures the qualities or the constructs that it is claiming to measure.
There are two ways of collecting evidence for construct validity, both of which are statistical procedures: convergent validity is how well a certain measure of a construct correlates with other well-established measures of that construct and divergent validity is how well the measure of a construct does not correlate with measures of other constructs.
In order to have high construct validity, a test should correlate highly with measures of the same construct (convergent validity) and not correlate highly with measures of other constructs (divergent validity).
Clinical example: If people score significantly differently on a new test designed to measure intelligence compared to a recognized test of intelligence, the new test may be lacking construct validity.
A group of researchers creates a new test to measure depression. They want to ensure that the test has construct validity, in that it actually measures the construct of depression. To do this, they check that scores on the test correlate highly with the Beck Depression Inventory (convergent validity) and do not correlate highly with measures of a different construct, such as anxiety (divergent validity).
Content validity
In research design, content validity is the degree to which a measure or study includes all of the facets/aspects of the construct that it is attempting to measure. Content validity cannot be measured empirically but is rather assessed through logical analysis.
Clinical example: A depression scale may lack content validity if it only assesses the affective dimension of depression (emotion-related: decrease in happiness, apathy, hopelessness) but fails to take into account the behavioral dimension (sleeping more or less, eating more or less, energy changes, etc.). Because of this, therapists end up using other scales.
Correlation v. causation
In the context of research, correlation means that a relationship exists between two variables. This relationship can be positive or negative; the correlation coefficient will fall between -1.00 and +1.00. Causation means that a change in one variable produces a change in the other variable. Causality is usually determined via controlled studies, in which you can isolate the variables you want to examine and control for extraneous variables. Correlation does not indicate causation.
Clinical example: A study found that minutes spent exercising correlated with lower depression levels. This study was able to show that depression levels and exercise were correlated, but could not go so far as to claim that one causes the other.
Correlational research
Research method that examines the potential for relationships between variables that might logically seem to be related. The technique identifies a mathematical relationship and does not establish causal factors.
- Produces correlation coefficient; ranges from 1.0 to -1.0 depending on strength/direction of the relationship between the two variables
- Very common in psychological research; usually cost-effective
- PROS - inexpensive, produces wealth of data, encourages future research; precursor to experiment determining causation
- CONS - cannot establish causation or control for confounds
- Statistical tests include Pearson, Spearman, & point-biserial
Clinical example: Shelia’s patient Donna suffers from an anxiety disorder. She brings Shelia an article claiming that eating out of plastic containers causes cancer. After reading the article, Shelia explains that the study referenced in the article is a correlational study, which only shows that there is a relationship between eating out of plastic containers and cancer, but it does not prove that eating out of plastic containers causes cancer.
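Worked sketch (illustrative, not from the original card): the Pearson correlation coefficient can be computed directly from deviations around each mean. The data and the function name `pearson_r` are hypothetical. The result describes only the strength and direction of a linear relationship; it says nothing about cause.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    mx, my = mean(x), mean(y)
    # Sum of cross-products of deviations (numerator)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    # Sums of squared deviations for each variable (denominator)
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / sqrt(ssx * ssy)

# Hypothetical data: minutes of exercise and depression scores
exercise_minutes = [10, 20, 30, 40, 50]
depression_score = [9, 8, 6, 5, 2]

r = pearson_r(exercise_minutes, depression_score)
print(f"r = {r:.2f}")  # r = -0.98, a strong negative relationship
```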
Cross-sectional design
A type of research that simultaneously compares individuals of different ages at one specific point in time. This type of design is very common and used in online surveys.
- Groups can be compared across a variety of dependent variables
- Advantages include a collection of large amounts of data in a short amount of time & low cost
- Drawbacks include the inability to infer causation (because it is just a snapshot)
- Considered quasi-experimental design (participants are not selected randomly - selected based on age)
EXAMPLE: George was looking to study the difference in peer relations and self-esteem in various age groups. He decided to use a cross-sectional design comparing 6-year-olds, 12-year-olds, 18-year-olds, and 25-year-olds.
EXAMPLE: A therapist is treating a client with depression who is having a hard time finding the energy to carry out daily activities. The therapist shows him a cross-sectional study looking at depression levels and the utilization of behavioral activation, specifically the effectiveness of taking daily walks to increase energy level. The therapist explains that those who walk daily have been shown to have lower depression and higher energy levels, especially in his age group.
Dependent t-test
In psychological research, a type of statistical analysis that compares the means of two sets of scores where the values in one sample are related to the values in the other sample. Because the same (or matched) subjects provide both sets of scores (AKA matched pairs or repeated measures), the two sets of scores are dependent on one another.
- Used when the design involves matched pairs or repeated measures, and only two conditions of the independent variable
- It is called “dependent” because the subjects carry across the manipulation: they take with them personal characteristics that impact the measurement at both points, so the measurements are “dependent” on those characteristics.
Clinical example: A researcher wants to determine the effects of caffeine on memory. They administer a memory test to a group of subjects, have the subjects consume caffeine, and then administer another memory test. Because they used the same subjects, this is a repeated-measures experiment that requires a dependent t-test during statistical analysis.
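Worked sketch (illustrative, not from the original card): a dependent t-test works on the difference score for each subject. The memory scores and the function name `paired_t` are hypothetical, and the sketch returns only t and its degrees of freedom rather than a p value.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df) for a dependent (paired-samples) t-test."""
    # One difference score per subject removes stable personal characteristics
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    std_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / std_error, n - 1

# Hypothetical memory scores for the same four subjects
before_caffeine = [10, 12, 9, 11]
after_caffeine = [12, 15, 10, 15]

t, df = paired_t(before_caffeine, after_caffeine)
print(f"t({df}) = {t:.2f}")  # t(3) = 3.87
```

With 3 degrees of freedom, the two-tailed critical value at the .05 level is 3.182, so a t of this size would be judged significant in this made-up data set.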
Descriptive v. inferential
Descriptive statistics are those which are used to describe and summarize a data set.
- Can only be used to describe the sample they are conducted on.
- Common tools include measures of central tendency, variance, and skew.
- We choose a group that we want to describe and then measure all subjects in that group
Inferential statistics take data from a sample and make inferences about the larger population from which the sample was drawn.
- Need to have confidence that sample accurately reflects the population (population must be defined) → importance of random sampling
- Common techniques include hypothesis testing, regression analysis, etc.
- The statistical results incorporate the uncertainty that is inherent in using a sample to understand an entire population.
EXAMPLE: A researcher conducts a study examining the rates of test anxiety in Ivy League students. This is a descriptive study because it is concerned with a specific population. However, this study cannot be generalized to represent all college students, so it is not an inferential study.
Double-blind study
A type of experimental design in which both the participants and the researchers are unaware of who is in the experimental condition and who is in the placebo condition.
- In contrast to a single-blind study, where only the participants are unaware of who is in the experimental condition.
- Double-blind studies eliminate the possibility that the researcher may somehow communicate (knowingly or unknowingly) to a participant which condition they are in, thereby contaminating the results.
Example: A study testing the efficacy of a new SSRI for anxiety uses a double-blind design. Neither the experimenters nor the participants are aware of who is in the treatment group and who is receiving a placebo. This setup ensures that the experimenters do not make subtle gestures accidentally signaling who is receiving the drug and who is not, and that experimenter expectations cannot affect the study’s outcome.
Ecological validity
The extent to which an experimental situation approximates the real-life situation which is being studied.
- Researchers call for studies high in ecological validity in hopes that the findings will better generalize to the real world
- Different from external validity
- Experiments high in ecological validity tend to be low in reliability because there is less control of the variables in real-world settings
EXAMPLE: A researcher wants to study the effects of alcohol on sociability, so he administers beer to a group of subjects and has them interact with each other. To increase the study’s ecological validity, he decides to carry out the study in an actual bar.
Effect size
Part of: research methods and statistical analysis
What: A quantitative measure of the strength of a relationship between two variables; refers to magnitude of an effect.
- It is also valuable for quantifying the effectiveness of a particular intervention, relative to some comparison - commonly used in meta-analyses
- Effect size can be used with the correlation between two variables, regression coefficients or the mean difference.
Example: A researcher conducts a correlational research study on the relationship between caffeine and anxiety ratings. The study produces a correlation coefficient of 0.8, which is considered a large effect size. The effect size reflects a strong relationship between caffeine and anxiety.
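Worked sketch (illustrative, not from the original card): when the effect size is based on a mean difference, one common choice is Cohen's d, the difference between two group means divided by their pooled standard deviation. Cohen's d is not named on the card and is used here as an assumption; the scores and the function name `cohens_d` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical anxiety ratings
caffeine_group = [6, 7, 8]
no_caffeine_group = [4, 5, 6]

d = cohens_d(caffeine_group, no_caffeine_group)
print(f"d = {d:.2f}")  # d = 2.00: the means differ by two pooled standard deviations
```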
Experimental research
A form of research in which one variable (the independent variable) is manipulated in order to see what effect it will have on another variable (the dependent variable). Researchers try to control any other variables (confounds) that may affect the dependent variable(s). Experimental research is the only way to establish causation.
Example: A researcher conducts an experimental research study to examine the relationship between caffeine intake and anxiety ratings. The researcher assigns participants to no-caffeine, low-caffeine, and high-caffeine groups (caffeine level is the independent variable). The participants are then asked to report their anxiety levels (the dependent variable). The researcher finds that those who had more caffeine reported feeling more anxious.
Hypothesis
In the field of research, a hypothesis is a formally stated prediction that can be tested for its accuracy.
- Essential to the scientific method and testing in research
- Hypotheses help to focus the research and bring it to a meaningful conclusion.
- Without hypotheses, it is impossible to test theories.
- Specifically, a hypothesis is a statement or proposition about the characteristics or appearance of variables, or the relationship between variables, that acts as a working template for a particular research study.
EXAMPLE: A famous hypothesis in social psychology was generated from a news story, when a woman in New York City was murdered in full view of dozens of onlookers. Psychologists John Darley and Bibb Latané developed a hypothesis about the relationship between helping behavior and the number of bystanders present, and that hypothesis was subsequently supported by research. It is now known as the bystander effect.
Independent t-test
Statistical analysis that compares the means of two independent groups, typically taken from the same population (although they could be taken from separate populations).
- Determines if there is a statistical difference between the two groups’ means
- We make the assumption that if randomly selected from the same population, the groups will mimic each other; the null hypothesis is no difference between the two groups
EXAMPLE: Fred is analyzing the best treatment options for his patient Harold. He reads a study comparing two different types of therapies. After utilizing an independent t-test, the researchers found that there was not a statistically significant difference between the treatment options. Fred decides that both are good options for Harold, and he decides to think about Harold’s person variables that might make one better than the other.
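Worked sketch (illustrative, not from the original card): an independent t-test divides the difference between two group means by a standard error built from the pooled variance. The outcome scores and the function name `independent_t` are hypothetical, and the sketch assumes equal variances in the two groups.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_a, group_b):
    """Return (t, df) for an independent-samples t-test (pooled variance)."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, df

# Hypothetical outcome scores for two different groups of clients
therapy_a = [5, 6, 7, 8]
therapy_b = [6, 7, 8, 9]

t, df = independent_t(therapy_a, therapy_b)
print(f"t({df}) = {t:.2f}")  # t(6) = -1.10
```

With 6 degrees of freedom, the two-tailed critical value at the .05 level is 2.447, so this made-up result would not be statistically significant, and the null hypothesis of no difference would be retained.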
Internal consistency
Part of: psychological research
What: the extent to which different items on a test measure the same ability or trait; intercorrelations among items on the same test
- Usually measured with Cronbach’s alpha, but split-half reliability or KR-20 can also be used
EXAMPLE: A patient comes in with symptoms of PTSD. You decide to search for a psychological test that is designed to help you detect and diagnose PTSD. You come across the Posttraumatic Stress Diagnostic Scale (PDS). The test manual indicates that the PDS is a valid measure of PTSD and reports a Cronbach’s alpha of 0.91. This indicates that the PDS has strong internal consistency.
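Worked sketch (illustrative, not from the original card): Cronbach's alpha compares the sum of the individual item variances with the variance of the total scores. The three-item questionnaire data and the function name `cronbach_alpha` are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """responses: one row per respondent, one column per test item."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # regroup scores by item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical answers from four respondents on a three-item scale
responses = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 4, 3],
    [4, 4, 5],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")  # alpha = 0.93
```

Alpha rises toward 1 when respondents who score high on one item also score high on the others, which is what it means for items to measure the same trait.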
Internal validity
The extent to which the observed relationship between variables in a study reflects the actual relationship between the variables. Internal validity is a measure of the integrity of a study: how sure we can be that the experimental treatment was the only cause of change in the dependent variable(s). Controlling for confounding variables can increase internal validity, as can random assignment of participants to conditions.
EXAMPLE: Researchers investigated a new treatment for depression using tight controls in terms of who could be a participant. For instance, they did not allow anyone with comorbidity to participate. This increased the study’s internal validity. It did, however, jeopardize the ecological validity of the research.
Interrater reliability
In research design, interrater reliability is a type of reliability that measures the agreement level between independent raters. It is useful with measures that are less objective and more subjective. This type of reliability is used to account for human error in the form of distractibility, misinterpretation or simply differences in opinion.
EXAMPLE: Three graduate students are performing a naturalistic observation study for a class that examines violent video games and behavior in a group of 9-year-old boys. The students rated the behavior on a scale of 1 (not aggressive) to 5 (very aggressive). However, the ratings were not consistent between the observers. The study lacked interrater reliability.
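Worked sketch (illustrative, not from the original card): for two raters, agreement can be summarized as simple percent agreement or as Cohen's kappa, which corrects for agreement expected by chance. Kappa is not named on the card and is an assumption here. The ratings and function names are hypothetical, and the sketch covers two raters only.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of cases on which the two raters gave the same rating."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    p_observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's proportion for every category
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical aggression ratings (1 to 5) of eight boys by two observers
rater_a = [1, 1, 2, 2, 3, 3, 4, 4]
rater_b = [1, 1, 2, 3, 3, 3, 4, 5]

print(f"agreement = {percent_agreement(rater_a, rater_b):.2f}")  # agreement = 0.75
print(f"kappa = {cohens_kappa(rater_a, rater_b):.2f}")           # kappa = 0.68
```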
Measures of central tendency
Provide statistical descriptions of the center of the distribution; describes a data set.
- Three main measures are used: the mean, mode and median.
- These help to summarize the main features of a data set and identify the score around which most scores fall.
- The mean is the arithmetic average of all scores within a data set.
- The mode is the most frequently occurring score.
- The median is the point that separates the distribution into two equal halves.
- The median and mode are the most resilient to outliers.
EXAMPLE: A researcher is studying the frequency of binge eating in a group of girls suffering from binge eating disorder. To better understand the data that was gathered, they start by calculating the measures of central tendency: the most frequently occurring number of episodes in the group, the average number of episodes, and the number of episodes in the middle of the set. In other words, the mode, mean, and median.
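Worked sketch (illustrative, not from the original card): Python's standard `statistics` module computes all three measures. The episode counts are hypothetical. The second data set adds one extreme score to show why the median and mode are more resilient to outliers than the mean.

```python
from statistics import mean, median, mode

# Hypothetical number of binge episodes per week for seven clients
episodes = [2, 3, 3, 4, 5, 3, 8]
print(mean(episodes), median(episodes), mode(episodes))  # 4 3 3

# Same data with one extreme score added
with_outlier = episodes + [40]
print(mean(with_outlier), median(with_outlier), mode(with_outlier))  # 8.5 3.5 3
```

The single outlier more than doubles the mean, while the median moves only from 3 to 3.5 and the mode does not move at all.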
Measures of variability
In statistics, measures of variability are measures of how scores in a distribution vary around the central tendency. There are three primary measures: range, variance, and standard deviation. The range is obtained by taking the two most extreme scores and subtracting the lowest from the highest. The variance is the average squared deviation around the mean; the deviations must be squared because the sum of the unsquared deviations around the mean always equals zero. The standard deviation is the square root of the variance and is highly useful in describing variability because it is expressed in the same units as the original scores.
EXAMPLE: A researcher is studying the frequency of binge eating in a group of girls suffering from binge eating disorder. After calculating the measures of central tendency, they decide that they want to know more about the distribution of the number of episodes. They decide to calculate the measures of variability. This includes the range, variance, and standard deviation.
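Worked sketch (illustrative, not from the original card): the three measures of variability computed on hypothetical episode counts. The sketch uses the population formulas (`pvariance`, `pstdev`), which divide by n and match the "average squared deviation" definition; the sample versions (`variance`, `stdev`) divide by n - 1 instead.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical number of binge episodes per month for eight clients
episodes = [2, 4, 4, 4, 5, 5, 7, 9]

score_range = max(episodes) - min(episodes)
deviations = [x - mean(episodes) for x in episodes]

print(score_range)          # 7
print(sum(deviations))      # 0, which is why deviations are squared first
print(pvariance(episodes))  # 4
print(pstdev(episodes))     # 2.0
```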
Nominal/ordinal/interval/ratio measurements
These are 4 types of measurements seen in statistics.
-Nominal scales assign labels to categories for identification only; the labels have no numerical meaning, and there may be two or more categories. They have none of the 3 properties that distinguish scales (no magnitude, equal intervals, or absolute zero). Ex: male/female; Republican/Democrat
-Ordinal scales have data (numbers) that indicate order only, but may not indicate what measurement was used to determine the order or the magnitude of the differences within the order. Ordinal scales are used for rankings of individuals or variables and have the property of magnitude.
-Interval scales have true score data where you know the score a person made and you can tell the actual distance between individuals based on their respective scores, but the measure used to generate the score has no true zero. Thus, they have magnitude and equal intervals between any two observations, but do not have the property of absolute zero.
Ex: most psychological measures, IQ, SAT, GRE
-Ratio scales have interval data with an absolute zero. They have all three properties of scales including magnitude, equal intervals and absolute zero.
The type of scales used dictates what statistical procedures may be run on a data set.
EXAMPLE: A researcher is creating a questionnaire to measure depression. They include nominal scale questions (“what is your gender?”), ordinal scale questions (“rank your mood today from 1-very unhappy to 5-very happy”), and ratio scale questions (“how many hours of sleep do you get on average?”).
Normal curve
A normal curve is a normal distribution, graphically represented by a bell-shaped curve.
- A frequency distribution where most occurrences take place in the middle of the distribution and taper off on either side
- All measures of central tendency are at the highest point of the curve
- A theoretical distribution based on an infinite number of observations (n = infinity); real data sets can only approximate it
- Symmetrical; extremes are at the tails; the further from the center, the lower the frequency; divisible into standard deviation units
EXAMPLE: A researcher is developing a new intelligence test. After obtaining the results, they found that the scores fell along a normal curve: most participants scored in the middle range with very few obtaining either the highest or lowest scores (scores were normally distributed).
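Worked sketch (illustrative, not from the original card): Python's standard `statistics.NormalDist` can show how a normal curve divides into standard deviation units. The mean of 100 and standard deviation of 15 are assumed values for a hypothetical intelligence test.

```python
from statistics import NormalDist

# Hypothetical intelligence test scaled to mean 100, standard deviation 15
iq = NormalDist(mu=100, sigma=15)

# Proportion of scores within 1, 2, and 3 standard deviations of the mean
within_1sd = iq.cdf(115) - iq.cdf(85)
within_2sd = iq.cdf(130) - iq.cdf(70)
within_3sd = iq.cdf(145) - iq.cdf(55)
print(f"{within_1sd:.3f} {within_2sd:.3f} {within_3sd:.3f}")  # 0.683 0.954 0.997

# Mean, median, and mode all sit at the peak of the curve
print(iq.mean, iq.median, iq.mode)
```

About 68% of scores fall within one standard deviation of the mean, about 95% within two, and nearly all within three, which is why very high and very low scores are rare.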
Probability
A mathematical statement indicating the likelihood that something will happen or that a particular event will occur when a particular population is randomly sampled, symbolized by p. In hypothesis testing, the higher the p value, the more consistent the observed result is with chance alone; a low p value means the result would be unlikely if only chance were operating. Probability is based on hard data (unlike chance); p is between 0 and 1.
EXAMPLE: Researchers are conducting a study on the heritability of bipolar disorder. They find that there is a strong genetic link, meaning there is a greater probability of an individual having the disorder if one of their parents also has it.
Parametric v. nonparametric statistical analyses
Part of psychological research
What: Parametric statistical analyses are inferential procedures that require certain assumptions about the distribution of scores. They are usually used with scores most appropriately described by the mean, they are based on symmetrical distributions or distributions that come close to symmetry, they focus on 1 variable or relationship, and are robust procedures with negligible amounts of error.
Nonparametric statistical analyses involve inferential procedures that do not require stringent assumptions about the parameters of the raw score population represented by the sample data and are usually used with scores most appropriately described by the median or the mode. They are often chosen when the data have skewed distributions.
Parametric analyses are preferred when their assumptions are met because they have greater statistical power and are more likely to detect statistical significance.
EXAMPLE: Researchers set up a study to determine if there is a correlation between hours of sleep per night and ratings of happiness. Because they used a very small sample, they cannot assume the data are symmetrically distributed and therefore must use a nonparametric test.
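Worked sketch (illustrative, not from the original card): one nonparametric option for a correlation is Spearman's rank-order correlation, which converts scores to ranks and then correlates the ranks. The choice of Spearman here is an assumption; the data and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean

def average_ranks(values):
    """Rank scores from 1 (lowest); tied scores share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied scores
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sp / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical small sample: hours of sleep and happiness ratings
sleep_hours = [5, 6, 7, 8, 9]
happiness = [1, 3, 2, 5, 4]
print(f"rho = {spearman_rho(sleep_hours, happiness):.2f}")  # rho = 0.80
```

Because only the rank order matters, an extreme or skewed score has no more influence than any other score at the same rank.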