PSYC 523 Statistics Flashcards
ANOVA
analysis of variance; a parametric statistical technique used to compare more than two experimental groups at a time; determines whether there is a significant difference between the groups but does not reveal where that difference lies - must do further tests to determine.
EXAMPLE: A group of psychiatric patients are trying three different therapies: counseling, medication, and biofeedback. You want to see if one therapy is better than the others. You will gather data and run an ANOVA on the three groups (counseling, medication, and biofeedback) to see if there is a significant difference between any of them.
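A minimal sketch of the F ratio behind a one-way ANOVA, in plain Python. The scores and the function name `one_way_anova_f` are made up for illustration. The sketch stops at F and does not compute a p value, which would require the F distribution.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    # Between-groups sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: how far each score is from its own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

# Made-up improvement scores, for illustration only.
counseling = [4, 5, 6]
medication = [6, 7, 8]
biofeedback = [8, 9, 10]

f_ratio, df_between, df_within = one_way_anova_f(counseling, medication, biofeedback)
print(f_ratio, df_between, df_within)  # 12.0 2 6
```

A large F only says that at least one group mean differs from the others. It does not say which one, which is why follow-up tests are needed.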
Clinical vs statistical significance
Clinical significance: whether a treatment effect is large enough to make a meaningful difference in the patient’s life.
How important are the changes to the patient? Looks at the patient’s quality of life - symptoms, remission, etc. Do they still meet criteria for diagnosis? What percentage of patients benefit? Bigger picture
Statistical significance: is the obtained result likely to be attributable to chance factors? Looks from an experimental standpoint, data driven
EXAMPLE: A study found that a new drug produced a statistically significant reduction in depressive symptoms. However, a clinical significance analysis revealed that the drug did not improve patients’ depressive symptoms enough, considering some of the harsh side effects and its expensive price tag. Therefore, the professional decided that statistical significance alone did not warrant changing the current meds he prescribed for his depressed patients.
Construct validity
In research design, construct validity is the degree to which a test or study measures the qualities or the constructs that it is claiming to measure.
Comes in two forms: convergent validity is how well a certain measure of a construct correlates with other well-established measures of that construct.
Divergent validity is how well the measure of a construct does not correlate with measures of other constructs.
In order to have high construct validity, a test should correlate highly with measures of the same construct (convergent validity) and not correlate highly with measures of other constructs (divergent validity).
EXAMPLE: A group of researchers create a new test to measure depression. They want to ensure that the test has construct validity, in that it actually measures the construct of depression. To do this, they measure how much the test correlates with the Beck Depression Inventory and how little it correlates with measures of another construct, like anxiety.
Content validity
In research design, content validity is the degree to which a measure or study includes all of the facets/aspects of the construct that it is attempting to measure. Content validity cannot be measured empirically but is rather assessed through logical analysis; related to face validity but it is not the same thing.
EXAMPLE: A depression scale may lack content validity if it only assesses the affective dimension of depression (emotion related - decrease in happiness, apathy, hopelessness) but fails to take into account the behavioral dimension (sleeping more or less, eating more or less, energy changes, etc.). Because of this, therapists end up using other scales.
Correlation vs causation
In the context of research, correlation examines relationships between 2 variables, and this relationship can be positive or negative; coefficient will fall between -1.00 and +1.00; correlation uses data/variables that currently exist and are not manipulated. Correlation examines the extent to which one variable correlates or responds with a change in the other variable.
Causation indicates that one event is the result of the occurrence of the other event, i.e., there is a cause and effect relationship between the two events. Causality is usually determined via controlled studies, when you can isolate variables you want to examine and control for extraneous variables.
Correlation is not causation!
EXAMPLE: Shelia’s patient Donna suffers from illness anxiety disorder. She brings Shelia an article claiming that eating out of plastic containers causes cancer. After reading the article, Shelia explains that the study referenced in the article is a correlational study, which only shows that there is a relationship between eating out of plastic containers and cancer, but it does not prove that eating out of plastic containers causes cancer.
Correlational research
research method that examines the relationship between two variables;
the technique simply identifies a mathematical relationship and does not in any way establish causal factors;
Produces correlation coefficient; ranges from 1.0 to -1.0 depending on strength/direction of relationship between the two variables
PROS - inexpensive, produces wealth of data, encourages future research, precursor to experiment determining causation
CONS - cannot establish causation or control for confounds
Statistical tests include Pearson, Spearman, & point-biserial
EXAMPLE: Shelia’s patient Donna suffers from illness anxiety disorder. She brings Shelia an article claiming that eating out of plastic containers causes cancer. After reading the article, Shelia explains that the study referenced in the article is a correlational study, which only shows that there is a relationship between eating out of plastic containers and cancer, but it does not prove that eating out of plastic containers causes cancer.
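A minimal sketch of how a Pearson correlation coefficient is computed, showing why it always falls between -1.0 and +1.0. The data and variable names are made up for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x, mean_y = mean(x), mean(y)
    # Sum of cross-products of deviations: do the variables move together?
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)

# Made-up data, for illustration only.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [2, 4, 6, 8, 10]   # rises with hours: perfect positive relationship
errors_made = [10, 8, 6, 4, 2]  # falls with hours: perfect negative relationship

r_positive = pearson_r(hours_studied, exam_score)
r_negative = pearson_r(hours_studied, errors_made)
print(r_positive, r_negative)  # 1.0 -1.0
```

The coefficient describes only the strength and direction of the relationship. Nothing in the calculation says which variable, if either, is the cause.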
Cross sectional design
a type of research that simultaneously compares individuals of different ages at one specific point in time
Groups can be compared across a variety of dependent variables
Very common - online surveys
Advantages include collection of large amounts of data in a short amount of time & low cost
Drawbacks include not providing info about the aging process, and inability to infer causation (because it is just a snapshot);
Considered quasi-experimental design (participants are not selected randomly - selected based on age)
EXAMPLE: A therapist is treating a client with depression who is having a hard time finding energy to carry out daily activities. The therapist shows him a cross-sectional study looking at depression levels and the use of behavioral activation, specifically the effectiveness of taking daily walks to increase energy level. The therapist explains that those who walk daily have been shown to have lower depression and higher energy levels, especially in his age group.
Dependent T-test
sometimes called a correlated t-test because the data are correlated; statistical analysis that compares the means of two related groups to determine whether there is a statistically significant difference between these means
Used when the design involves matched pairs or repeated measures, and only two conditions of the independent variable
It is called “dependent” because the subjects carry across the manipulation: they take with them personal characteristics that impact the measurement at both points, so the measurements are “dependent” on those characteristics
EXAMPLE: A researcher wants to determine the effects of caffeine on memory. They administer a memory test to a group of subjects, have the subjects consume caffeine, then administer another memory test. Because they used the same subjects, this is a repeated measures experiment that requires a dependent t-test during statistical analysis.
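A minimal sketch of the dependent (paired) t statistic: the test works on each subject’s difference score rather than on the two sets of scores separately. The memory scores and the function name `dependent_t` are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def dependent_t(before, after):
    """Return (t, df) for a dependent t-test on paired scores."""
    # One difference score per subject; this is what makes the test "dependent".
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Made-up memory test scores for the same four subjects, for illustration only.
before_caffeine = [10, 12, 9, 11]
after_caffeine = [11, 14, 12, 13]

t_paired, df_paired = dependent_t(before_caffeine, after_caffeine)
print(round(t_paired, 2), df_paired)  # 4.9 3
```

The obtained t would then be compared against a critical value for n - 1 degrees of freedom.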
Descriptive vs Inferential
Descriptive statistics are those which are used to describe and summarize a data set.
Can only be used to describe the sample they are conducted on.
Common tools include measures of central tendency, skewness, etc.
We choose a group that we want to describe and then measure all subjects in that group
Inferential statistics takes data from a sample and makes inferences about the larger population from which the sample was drawn – generalizations
Need to have confidence that sample accurately reflects the population (population must be defined) → importance of random sampling
Common techniques include hypothesis testing, regression analysis, etc.
The statistical results incorporate the uncertainty that is inherent in using a sample to understand an entire population.
EXAMPLE: A researcher conducts a study examining the rates of test anxiety in Ivy League students. This is a descriptive study because it is concerned with a specific population. However, this study cannot be generalized to represent all college students, so it is not an inferential study.
Double blind study
A type of experimental design in which both the participants and the researchers are unaware of who is in the experimental condition and who is in the placebo condition.
This contrasts with a single-blind study, in which only the participants are unaware of who is in the experimental condition.
Double-blind studies eliminate the possibility that the researcher may somehow communicate (knowingly or unknowingly) to a participant which condition they are in, thereby contaminating the results.
Neither the subjects nor the persons administering the experiment know the critical aspects of the experiment, which guards against experimenter bias and placebo effects.
EXAMPLE: A study testing the efficacy of a new SSRI for anxiety uses a double-blind design. Neither the experimenter nor the participants are aware of who is in the treatment group and who is receiving a placebo. This setup ensures that the experimenters do not make subtle gestures accidentally signaling who is receiving the drug and who is not, and that experimenter expectations cannot affect the study’s outcome.
Ecological validity
part of research; the extent to which an experimental situation approximates the real-life situation which is being studied
Researchers call for ecologically valid studies in hopes that the findings will better generalize to the real world
Different from external validity
Experiments high in ecological validity tend to be low in reliability because there is less control of the variables in real-world like settings
EXAMPLE: A researcher wants to study the effects of alcohol on sociability, so they administer beer to a group of subjects and have them interact with each other. To increase their ecological validity they decide to carry out the study in an actual bar.
Effect size
relates to statistical analysis and research; a quantitative measure of the strength of a relationship between two variables; refers to magnitude of an effect
It is also valuable for quantifying the effectiveness of a particular intervention, relative to some comparison - commonly used in meta-analyses
Effect size can be used with the correlation between two variables, regression coefficients or the mean difference.
EXAMPLE: A researcher conducts a correlational research study on the relationship between caffeine and anxiety ratings. The study produces a correlation coefficient of 0.8, which is considered a large effect size. In other words, the effect size reflects a strong relationship between caffeine and anxiety.
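For the mean-difference case, one common effect size is Cohen’s d, the difference between two group means divided by their pooled standard deviation. Cohen’s d is not named on this card, so treat it as an assumed choice of measure. The anxiety ratings below are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group1, group2):
    """Standardized mean difference between two independent groups."""
    n1, n2 = len(group1), len(group2)
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)

# Made-up anxiety ratings, for illustration only.
high_caffeine = [6, 7, 8]
no_caffeine = [4, 5, 6]

d = cohens_d(high_caffeine, no_caffeine)
print(d)  # 2.0
```

Here the two group means are two pooled standard deviations apart. Unlike a significance test, d describes how big the difference is, not whether it is likely due to chance.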
Experimental research
a form of research in which one variable (the independent variable) is manipulated in order to see what effect it will have on another variable (the dependent variable)
Researchers try to control any other variables (confounds) that may affect the dependent variable(s)
Experimental research is the only way to establish causation
EXAMPLE: A researcher conducts an experimental research study to examine the relationship between caffeine and anxiety ratings. The study administers various levels of caffeine (the independent variable) to the low, high, and no caffeine groups. The participants are then asked to report their anxiety levels (the dependent variable). The researcher found that those who had more caffeine reported feeling more anxious.
Hypothesis
In the field of research, a hypothesis is a formally stated prediction that can be tested for its accuracy.
Essential to the scientific method and testing in research
Hypotheses help to focus the research and bring it to a meaningful conclusion.
Many hypotheses go into the making of a psychological theory.
Without hypotheses, it is impossible to test theories.
Specifically, a hypothesis is a statement or proposition about the characteristics or appearance of variables, or the relationship between variables, that acts as a working template for a particular research study.
EXAMPLE: A famous hypothesis in social psychology was generated from a news story, when a woman in New York City was murdered in full view of dozens of onlookers. Instead of simply shaking their heads in sadness, psychologists John Darley and Bibb Latané developed a hypothesis about the relationship between helping behavior and the number of bystanders present, and that hypothesis was subsequently supported by research. It is now known as the bystander effect.
Independent T-test
statistical analysis that compares the means of two independent groups, typically taken from the same population (although they could be taken from separate populations)
Determines if there is a statistical difference between the two groups’ means
We make the assumption that if randomly selected from the same population, the groups will mimic each other; the null hypothesis is no difference between the two groups
EXAMPLE: Fred is analyzing the best treatment options for his patient Harold. He reads a study comparing two different types of therapies. After utilizing an independent t-test, the researchers found that there was not a statistically significant difference between the treatment options. Fred decides that both are good options for his patient and he decides to think about Harold’s personal variables that might make one better than the other.
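A minimal sketch of the independent t statistic, assuming equal variances in the two groups (the pooled-variance version). The outcome scores and the function name `independent_t` are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """Return (t, df) for an independent t-test with pooled variance."""
    n1, n2 = len(group1), len(group2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / df
    # Standard error of the difference between the two group means.
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / standard_error, df

# Made-up outcome scores for two separate groups of clients, for illustration only.
therapy_a = [12, 14, 16, 18]
therapy_b = [11, 13, 15, 17]

t_indep, df_indep = independent_t(therapy_a, therapy_b)
print(round(t_indep, 2), df_indep)  # 0.55 6
```

A t this close to zero means the difference between the group means is small relative to the variability within the groups, which is the pattern behind a non-significant result.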
Internal Consistency
In the context of research, this type of reliability refers to the extent to which different items on a test measure the same ability or trait.
In other words, internal consistency measures whether several items that propose to measure the same general construct produce similar scores and are free from error.
Internal consistency is usually measured with Cronbach’s alpha; it can also be estimated with the split-half method, in which scores on the two halves of the test are correlated. The resulting reliability coefficient ranges from 0 to 1
EXAMPLE: Patient comes in with symptoms of PTSD. You decide to search for a psychological test that is designed to help you to detect and diagnose PTSD. You come across the Posttraumatic Stress Diagnostic Scale (PDS). The test manual indicates that the PDS is a valid measure of PTSD. You look in the test manual of the PDS and find that Cronbach’s alpha is 0.91. This indicates that the PDS has strong internal consistency.
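A minimal sketch of how Cronbach’s alpha is computed from item scores. The item responses are made up for illustration and are not taken from the PDS or any real test.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per test item, respondents in the same order."""
    k = len(items)
    # Each respondent's total score across all items.
    totals = [sum(scores) for scores in zip(*items)]
    sum_item_variances = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))

# Made-up responses from four people to three items, for illustration only.
item_1 = [1, 2, 3, 4]
item_2 = [2, 3, 4, 5]
item_3 = [1, 3, 3, 5]

alpha = cronbach_alpha([item_1, item_2, item_3])
print(round(alpha, 3))  # 0.981
```

Alpha is high here because people who score high on one item also score high on the others, so the items appear to measure the same thing.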
Internal validity
the extent to which the observed relationship between variables in a study reflects the actual relationship between the variables.
A study that is internally valid is free from flaws in its internal structure and therefore may establish a causal relationship.
Thus, internal validity is how sure we can be that the experimental treatment was the only cause of change in a dependent variable(s)
Controlling for confounding variables can increase internal validity, as can random assignment of participants to conditions.
EXAMPLE: Researchers investigated a new treatment for depression using tight controls in terms of who could be a participant. For instance, they did not allow anyone with comorbidity to participate. This increased the study’s internal validity. It did, however, jeopardize the ecological validity of the research.
Interrater reliability
In research design, interrater reliability is a type of reliability that measures the agreement level between independent raters
It is useful with measures that are less objective and more subjective
This type of reliability is used to account for human error in the form of distractibility, misinterpretation or simply differences in opinion.
EXAMPLE: Three graduate students are performing a naturalistic observation study for a class that examines violent video games and behavior in a group of 9-year-old boys. The students rated the behavior on a scale of 1 (not aggressive) to 5 (very aggressive). However, the ratings were not consistent between the observers. The study lacked interrater reliability.
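One way to put a number on agreement between two raters is Cohen’s kappa, which corrects the observed agreement for the agreement expected by chance. Kappa is not named on this card, so treat it as an assumed choice of statistic. The ratings below are made up for illustration (1 = aggressive, 0 = not aggressive).

```python
from collections import Counter

def cohen_kappa(rater_1, rater_2):
    """Chance-corrected agreement between two raters on the same cases."""
    n = len(rater_1)
    # Proportion of cases on which the raters gave the same rating.
    p_observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    categories = set(rater_1) | set(rater_2)
    # Agreement expected if each rater assigned categories at their own base rates.
    p_expected = sum(counts_1[c] * counts_2[c] for c in categories) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)

# Made-up ratings of eight observed behaviors, for illustration only.
rater_a = [1, 1, 0, 0, 1, 0, 1, 0]
rater_b = [1, 1, 0, 0, 1, 0, 0, 1]

kappa = cohen_kappa(rater_a, rater_b)
print(kappa)  # 0.5
```

The raters agree on 75% of cases, but half of that agreement would be expected by chance alone, so kappa is only 0.5.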
Measures of central tendency
provides statistical descriptions of the center of the distribution; describes a data set.
These help to summarize the main features of a data set and identify the score around which most scores fall.
Three main measures are used: the mean, mode and median.
The mean is the arithmetic average of all scores within a data set.
The mode is the most frequently occurring score.
The median is the point that separates the distribution into two equal halves.
The median and mode are the most resilient to outliers
EXAMPLE: A researcher is studying the frequency of binge eating in a group of girls suffering from binge eating disorder. To better understand the data that was gathered, they start by calculating the measures of central tendency: the most frequently occurring number of episodes in the group, the average number of episodes, and the number of episodes in the middle of the set. In other words, the mode, mean, and median.
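A short sketch using Python’s built-in `statistics` module to compute all three measures. The episode counts are made up for illustration and include one extreme score to show how an outlier affects each measure.

```python
from statistics import mean, median, mode

# Made-up weekly binge episode counts, for illustration only; 20 is an outlier.
episodes = [2, 3, 3, 4, 5, 3, 20]

average = mean(episodes)   # arithmetic average of all scores
middle = median(episodes)  # middle score once the scores are sorted
most_common = mode(episodes)  # most frequently occurring score

print(round(average, 2), middle, most_common)  # 5.71 3 3
```

The single outlier pulls the mean up to about 5.7, while the median and mode both stay at 3.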
Measures of variability
In statistics, measures of variability are measures of how scores in a distribution vary around the central tendency. Three primary measures: range, variance and standard deviation. The range is obtained by taking the two most extreme scores and subtracting the lowest from the highest. The variance is the average squared deviation around the mean; the deviations must be squared because the raw deviations from the mean always sum to zero. The standard deviation is the square root of the variance and is highly useful in describing variability because it is expressed in the same units as the original scores.
EXAMPLE: A researcher is studying the frequency of binge eating in a group of girls suffering from binge eating disorder. After calculating the measures of central tendency, they decide that they want to know more about the distribution of number of episodes. They decide to calculate the measures of variability. This includes the range, variance, and standard deviation.
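A minimal sketch of the three measures computed step by step, following the definition of variance as the average squared deviation (dividing by N; many textbooks divide by N - 1 when estimating from a sample). The episode counts are made up for illustration.

```python
from math import sqrt
from statistics import mean

# Made-up weekly binge episode counts, for illustration only.
episodes = [2, 4, 4, 4, 5, 5, 7, 9]

score_range = max(episodes) - min(episodes)

deviations = [x - mean(episodes) for x in episodes]
sum_of_deviations = sum(deviations)  # raw deviations cancel out, so this is 0

# Squaring removes the signs, so the deviations no longer cancel.
variance_n = sum(dev ** 2 for dev in deviations) / len(episodes)
standard_deviation = sqrt(variance_n)

print(score_range, sum_of_deviations, variance_n, standard_deviation)  # 7 0 4.0 2.0
```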
Nominal/ordinal/interval/ratio measurement
These are 4 types of measurement scales seen in statistics. The type of scale used dictates what statistical procedures may be run on a data set.
Nominal scales are scales in which labels are assigned for identification only; the data are categorical, there may be more than two categories, and the labels do not represent quantities. They have none of the 3 properties that distinguish scales (no magnitude, equal intervals, or absolute zero). Ex: male/female; Republican/Democrat
Ordinal scales have data (numbers) that indicate order only, but may not indicate what measurement was used to determine the order or the magnitude of the differences within the order. Ordinal scales are used for rankings of individuals or variables and have the property of magnitude.
Interval scales have true score data where you know the score a person made and you can tell the actual distance between individuals based on their respective scores, but the measure used to generate the score has no true zero. Thus, they have magnitude and equal intervals between any two observations, but do not have the property of absolute zero. Ex: most psychological measures, IQ, SAT, GRE
Ratio scales have interval data with an absolute zero. They have all three properties of scales: magnitude, equal intervals, and absolute zero.
EXAMPLE: A researcher is creating a questionnaire to measure depression. They include nominal scale questions (“what is your gender?”), ordinal scale questions (“rank your mood today from 1-very unhappy to 5-very happy”), and ratio scale questions (“how many hours of sleep do you get on average?”).
Normal curve
aka normal distribution; a bell-shaped curve
A frequency distribution where most occurrences take place in the middle of the distribution and taper off on either side
All measures of central tendency are at the highest point of the curve
A theoretical distribution based on an infinite number of observations
Symmetrical; extremes are at the tails; the further a score is from the center, the lower its frequency; can be divided into standard deviation units
EXAMPLE: A researcher is developing a new intelligence test. After obtaining the results, they found that the scores fell along a normal curve: most participants scored in the middle range with very few obtaining either the highest or lowest scores (scores were normally distributed).
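A short sketch using `statistics.NormalDist` (Python 3.8+) to show how a normal curve is divided into standard deviation units. The mean of 100 and standard deviation of 15 are assumed values for illustration.

```python
from statistics import NormalDist

# Assumed scale for illustration: mean 100, standard deviation 15.
scores = NormalDist(mu=100, sigma=15)

# Proportion of scores within 1 and within 2 standard deviations of the mean.
within_1_sd = scores.cdf(115) - scores.cdf(85)
within_2_sd = scores.cdf(130) - scores.cdf(70)

print(round(within_1_sd, 4), round(within_2_sd, 4))  # 0.6827 0.9545
```

About 68% of scores fall within one standard deviation of the mean and about 95% within two, which is why so few participants obtain the highest or lowest scores.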
Probability
A mathematical statement indicating the likelihood that something will happen or that a particular event will occur when a particular population is randomly sampled, symbolized by (p). The higher the p value, the more likely that the phenomenon or event happened by chance. Probability is based on hard data (unlike chance); p is between 0 and 1
EXAMPLE: Researchers are conducting a study on the heritability of bipolar disorder. They find that there is a strong genetic link, meaning there is a greater probability of an individual having the disorder if one of their parents also has it.
Parametric vs Nonparametric Statistical Analyses
Parametric statistical analyses are inferential procedures that require certain assumptions about the distribution of scores. They are usually used with scores most appropriately described by the mean, they are based on symmetrical distributions or distributions that come close to symmetry, they focus on 1 variable or relationship, and are robust procedures with negligible amounts of error.
Nonparametric statistical analyses involve inferential procedures that do not require stringent assumptions about the parameters of the raw score population represented by the sample data and are usually used with scores most appropriately described by the median or the mode. Nonparametric data have skewed distributions.
Parametric analyses are preferred because they have greater statistical power and are more likely to detect statistical significance.
EXAMPLE: Researchers set up a study to determine if there is a correlation between hours of sleep per night and ratings of happiness. Because they used a very small sample, they cannot assume the data are symmetrically distributed and therefore must use a nonparametric test.
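A minimal sketch of one nonparametric option for a study like this, the Spearman rank correlation, which works on the rank order of the scores rather than on the raw scores. The shortcut formula used here assumes there are no tied scores. The data are made up for illustration.

```python
def ranks(values):
    """Rank scores from 1 (lowest) upward; assumes no tied scores."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rank correlation via the no-ties shortcut formula."""
    n = len(x)
    # Squared differences between each participant's two ranks.
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Made-up data for five participants, for illustration only.
hours_of_sleep = [5, 6, 7, 8, 9]
happiness_rating = [2, 1, 4, 3, 5]

rho = spearman_rho(hours_of_sleep, happiness_rating)
print(rho)  # 0.8
```

Because only the ranks are used, the result does not depend on the scores following a symmetrical distribution.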