PSYC523 – Statistics Flashcards
ANOVA
an abbreviation for analysis of variance, a parametric procedure for determining whether significant differences exist in an experiment that contains two or more conditions. ANOVA is an inferential statistical test.
A medical researcher wants to determine whether there is a difference in the mean length of time it takes 3 types of pain relievers to provide relief from headache pain. Headache sufferers are randomly selected, randomly assigned to 3 groups, and given one of the 3 medications. Each headache sufferer records the time in minutes it takes the medication to begin working. A one-way ANOVA is used to determine whether the mean relief times differ. The null hypothesis states that the mean relief times for the 3 medications are equal. The F-value, computed as the ratio of between-group variance to within-group variance, does not fall in the rejection region, so the null hypothesis is not rejected: there is not enough evidence at the 5% level of significance to conclude that the 3 pain relievers differ in the mean length of time they take to provide relief from headache pain.
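The F-value in this example can be computed by hand. The sketch below is a minimal illustration using only the Python standard library; the relief times and the function name one_way_anova_f are made up for illustration, and the critical value 4.26 is the tabled F for (2, 9) degrees of freedom at the 5% level.

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Hypothetical minutes until relief for each pain reliever (not real data).
drug_a = [12, 15, 17, 12]
drug_b = [16, 14, 21, 15]
drug_c = [14, 17, 20, 15]

f_value, df_b, df_w = one_way_anova_f(drug_a, drug_b, drug_c)
F_CRITICAL = 4.26  # tabled value for df = (2, 9), alpha = .05
reject_null = f_value > F_CRITICAL
print(f"F({df_b}, {df_w}) = {f_value:.2f}, reject null: {reject_null}")
```

With these invented numbers the F-value is well below the critical value, which mirrors the conclusion in the example: the null hypothesis is not rejected.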
Construct validity
in research design, construct validity is the degree to which a test or study measures the construct (theoretical trait) that it claims to measure. There are two further aspects to construct validity: convergent and divergent validity. Convergent validity is how well a test agrees with other previously validated tests that measure the same construct. Divergent validity is the extent to which a test measures what it is supposed to and not some theoretically unrelated construct. In order to have high construct validity, a test should correlate highly with other measures of the same construct, and not correlate highly with measures of other constructs.
Ex: A client comes to therapy complaining of fatigue, loss of appetite, and feeling hopeless. The therapist uses the Beck Depression Inventory (BDI) to measure her current symptoms of depression because of the test’s high construct validity: the BDI is a psychological assessment that accurately measures depression (the construct).
Content validity
in research design, content validity is the degree to which a test or study includes all of the facets of the construct it is attempting to measure; items should cover the entire range of relevant behaviors, thoughts, and feelings that define the construct being measured. An element of subjectivity exists in determining content validity, which requires a degree of agreement about what a particular construct (such as extraversion) represents. Content validity is related to face validity, but is not the same thing.
Ex: The newly developed depression scale lacked content validity because it assessed only the affective dimension of depression and failed to take into account the behavioral dimension.
Correlational research
this is a form of research design that determines whether there is a relationship between two variables and, if so, what the strength of that relationship is. A correlation is a measure of a LINEAR relationship between two variables. Correlational studies yield a correlation coefficient (a number between -1.00 and 1.00) which represents the strength and direction of the relationship between the two variables.
- Correlational research cannot establish causation. It is often conducted as exploratory or preliminary research; once variables have been identified and defined, experiments can be conducted.
Ex: A psychologist is interested in testing the claim that people with more friends tend to be healthier. She surveys 500 people in her community, asking them how many friends they have and obtaining a measure of their overall health. She then makes a scatterplot and finds that r = +.3, concluding that there is a positive correlation between number of friends and health.
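The correlation coefficient in this example is Pearson's r. A minimal sketch of the calculation follows, assuming five hypothetical respondents; the data and the function name pearson_r are illustrative only and do not reproduce the r = +.3 from the example.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical survey answers (not real data).
friends = [1, 2, 3, 4, 5]   # number of friends
health = [2, 1, 4, 3, 5]    # overall health rating

r = pearson_r(friends, health)
print(f"r = {r:+.2f}")  # positive sign: more friends goes with better health
```

The sign of r gives the direction of the relationship and its absolute value gives the strength; it says nothing about which variable, if either, causes the other.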
Cross-sectional design
This type of study utilizes different groups of people who differ in the variable of interest, but share other characteristics such as socioeconomic status, educational background, and ethnicity.
Typically, several dependent variables are measured, and the study itself rarely takes more than a few months to complete. Cross-sectional designs are advantageous because they take less time and therefore less money, but are disadvantageous because they give little information about the stability of the dependent variables and the change in them over time. Cross-sectional studies are observational in nature and are known as descriptive research, not causal or relational.
For example, researchers studying developmental psychology might select groups of people who are remarkably similar in most areas, but differ only in age. By doing this, any differences between groups can presumably be attributed to age differences rather than to other variables.
Dependent t-test
a statistical procedure that is appropriate for significance testing when the scores meet the requirements of a parametric test, the design involves matched groups or repeated measures, and there are only two conditions of the independent variable.
Ex: A dependent t-test is performed to evaluate whether an intervention to improve life satisfaction among grad students produced a significant change in scores. Program “Happy” is implemented with a sample of 30 participants. Life satisfaction is measured by an index score at pretest and posttest phases. The life satisfaction index is operationally defined as a continuous and ratio measure ranging from 0 to 100 with lower scores indicating lower life satisfaction and higher scores indicating higher life satisfaction.
The intervention test phase is the independent variable (posttest versus pretest) and the life satisfaction index score is the dependent variable. Test phase is categorical and nominal with two subcategories. The life satisfaction score is continuous and ratio.
The dependent-samples t test compares the average values of a characteristic measured on a continuous scale (life satisfaction) between two conditions of the same group (e.g., assessment pretest vs posttest).
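A dependent t-test works on the difference score of each participant. The sketch below assumes five hypothetical participants rather than the 30 in the example; the scores and the function name paired_t are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Return (t, df) for a dependent (paired-samples) t-test."""
    # Each participant contributes one difference score: posttest minus pretest.
    diffs = [after - before for before, after in zip(pre, post)]
    n = len(diffs)
    standard_error = stdev(diffs) / sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical life satisfaction index scores (0 to 100), not real data.
pretest = [50, 60, 55, 65, 70]
posttest = [55, 62, 60, 70, 72]

t_value, df = paired_t(pretest, posttest)
print(f"t({df}) = {t_value:.2f}")
```

The resulting t is compared with the critical value for n - 1 degrees of freedom to decide whether the pretest to posttest change is significant.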
Descriptive vs. Inferential
descriptive statistics are those which are used to concisely describe a data set. Inferential statistics are those which use a smaller representative sample to draw conclusions about a larger group. Descriptive statistics can only be used to describe the sample that they are conducted on; inferential statistics can be used to make generalizations about a larger population from a small sample.
Ex: Frequency distributions, measures of central tendency (mean, median, and mode), and graphs like pie charts and bar charts that describe the data are all examples of descriptive statistics. For example: 15% of students at Made-up High have been victims of bullying.
Examples of inferential statistics include linear regression analyses, ANOVA, and correlation analyses. For example: based on a sample, 15% of children in high school are estimated to have been victims of bullying.
Double-blind study
a type of experimental design in which both the participants and the researchers are unaware of who is in the experimental condition and who is in the placebo condition. This is in contrast to a single-blind study, where only the participants are unaware. Double-blind studies eliminate the possibility that the researcher may somehow communicate (knowingly or unknowingly) to a participant which condition they are in, thereby contaminating the results.
Ex: A study to test the efficacy of a new SSRI targeted to alleviate anxiety symptoms in returning vets suffering from PTSD used a double-blind design in order to increase the internal validity and reduce experimenter bias. Neither the experimenter nor the participants were aware of who was in the treatment group and who was receiving a placebo until the results were calculated. This setup ensured that the experimenter could not make subtle gestures signaling who was receiving the drug and who was not, and that experimenter expectations could not affect the study’s outcome. With this double-blind design the drug proved to be extremely efficacious in treating anxiety in PTSD.
Ecological validity
in the context of research, ecological validity is present to the degree that a result generalizes across different settings. It is the extent to which an experimental situation approximates the real-life situation which is being studied.
Reactivity, a threat to ecological validity, is defined as an alteration in performance that occurs as a result of being aware of participating in a study.
Reactivity is a problem of ecological validity because the results might only generalize to other people who are also being observed.
Researchers have called for making experiments more ecologically valid in hopes that they would generalize better to the real world.
Ex: Research carried out at the local university looked at the effects of getting 4 hours of sleep or less on cognitive performance. The subjects were primarily adults over the age of 25, and the experiment was monitored in a laboratory setting.
The study has low ecological validity when applied to the population as a whole, since college students are mainly young and rather accustomed to functioning on low amounts of sleep and then going to class and performing cognitive tasks. The setting also had some artificial features lacking in the real world (e.g., the research participant was aware of the goals of the study).
Experimental research
a form of research in which one variable (the independent variable) is manipulated in order to see what effect it will have on another variable (the dependent variable). Researchers will try to control any other variables (confounds) that may affect the dependent variable, in order to establish that if a change occurred it was caused by the independent variable. Experimental research is the only kind of research which can establish causation.
Ex: Using experimental research, the psychologist randomly assigned the participants who fit the criteria for depressive symptoms into an experimental and a control group. He then administered an antidepressant drug (the IV) daily to the experimental group and a placebo to the control group to determine what effects the drug would have on the participants’ depressive symptoms. When compared to the control group, the symptoms (DV) of the experimental group improved significantly.
Independent t-test
a statistical procedure used for significance testing that is appropriate when the scores meet the requirements of a parametric test, the design involves independent samples, and there are only two conditions of the independent variable. The independent t-test determines whether the difference between the two group means is statistically significant.
Ex: In examining the life satisfaction of grad students, we are interested in the relationship between gender and life satisfaction levels. We use a sample of 30 participants and examine whether life satisfaction (measured by an index score) varies between male and female grad students.
The life satisfaction index is operationally defined as a continuous and ratio measure ranging from 0 to 100 with lower scores indicating lower life satisfaction and higher scores indicating higher life satisfaction.
Gender is the independent variable and the life satisfaction index score is the dependent variable. Gender is categorical and nominal with two subcategories (male and female). The life satisfaction index score is continuous and ratio.
The independent-samples t test compares the average values of a characteristic measured on a continuous scale (life satisfaction) between two subgroups of a categorical variable (gender).
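An independent t-test compares two separate group means against a pooled estimate of variability. The sketch below assumes equal variances in the two groups (the classic pooled-variance form); the scores, group names, and the function name independent_t are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group_1, group_2):
    """Return (t, df) for an independent-samples t-test with pooled variance."""
    n1, n2 = len(group_1), len(group_2)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / df
    standard_error = sqrt(pooled * (1 / n1 + 1 / n2))
    return (mean(group_1) - mean(group_2)) / standard_error, df

# Hypothetical life satisfaction index scores (0 to 100), not real data.
group_a = [70, 75, 80, 85, 90]
group_b = [60, 65, 70, 75, 80]

t_value, df = independent_t(group_a, group_b)
print(f"t({df}) = {t_value:.2f}")
```

Unlike the dependent t-test, the two score lists here come from different people, so they do not need to be the same length or paired in any order.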
Internal consistency
In statistics and research, internal consistency is a measure of reliability used to evaluate the degree to which different test items that probe the same construct produce similar results.
There are three methods used to measure internal consistency reliability:
Cronbach’s alpha: The most commonly used measurement of internal consistency.
Split-halves test: Involves splitting the test items in half (e.g., forming a group of all even items and another group with all of the odd items) and correlating the two halves.
Kuder-Richardson test: Similar to the split-halves test, and used when items are scored dichotomously (e.g., right/wrong). You find the average correlation for all of the possible split-half combinations.
Internal consistency coefficients typically range between zero and one.
No matter which method you use, the closer your measurement is to 1, the higher your internal consistency is.
Ex: A patient comes in with symptoms of PTSD after surviving a car accident. You decide to search for a psychological test that is designed to help you detect and diagnose PTSD. You come across the Posttraumatic Stress Diagnostic Scale (PDS). The test manual indicates that the PDS is a valid measure of PTSD. You look in the test manual of the PDS and find that Cronbach’s alpha is 0.91. This indicates that the PDS has strong internal consistency.
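Cronbach's alpha compares the sum of the individual item variances with the variance of the total score. The sketch below uses the standard formula with three hypothetical items answered by five respondents; the item scores and the function name cronbach_alpha are illustrative and unrelated to the PDS.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per test item, aligned by respondent."""
    k = len(items)
    # Total score per respondent is the sum across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical item responses (rows are items, columns are respondents).
item_1 = [1, 2, 3, 4, 5]
item_2 = [2, 2, 3, 5, 5]
item_3 = [1, 3, 3, 4, 4]

alpha = cronbach_alpha([item_1, item_2, item_3])
print(f"Cronbach's alpha = {alpha:.2f}")
```

When items rise and fall together across respondents, the total-score variance is large relative to the item variances and alpha approaches 1.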
Internal validity
Internal validity can be defined as the degree to which the independent variable causes the changes seen in the dependent variable being examined within the study. The internal validity of a study is related to the researcher’s control of extraneous variables. Therefore, an experiment conducted in a laboratory with high control can eliminate extraneous variables more easily and establish internal validity. Also, construct validity must be established before internal validity can be attained.
Ex: The drug company used tight controls over which participants were allowed in the study to test a new drug for depression. They did not allow anyone with a comorbidity to participate. This increased the internal validity of the research and showed high statistical and clinical significance for the efficacy of the drug, as there were no extraneous variables from other disorders to interfere with the treatment. It did, however, jeopardize the external validity of the research.
Interrater reliability
in research design, this is a type of reliability that measures the agreement level between independent raters. It is used with measures that are less objective and more subjective. This type of reliability is used to account for human error in the form of distractibility or misinterpretation.
Ex: Three grad students are performing a naturalistic observation study to examine violent video games and the behavior of a group of 9-year-old boys. The students rated the behavior on a scale of 1 (not aggressive) to 5 (very aggressive). However, the ratings were not consistent between the students, so the study lacked inter-rater reliability. It was decided that the raters needed more training to properly define the construct of aggressive behavior in order to increase inter-rater reliability.
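One simple way to quantify agreement among raters is the proportion of cases on which each pair of raters gave the same score. The sketch below averages that proportion over all rater pairs; it is only one of several possible indices (it does not correct for chance agreement, as statistics such as Cohen's kappa do), and the ratings and the function name mean_pairwise_agreement are hypothetical.

```python
from itertools import combinations

def mean_pairwise_agreement(ratings_by_rater):
    """Average proportion of exact matches over every pair of raters."""
    pair_scores = []
    for rater_a, rater_b in combinations(ratings_by_rater, 2):
        matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
        pair_scores.append(matches / len(rater_a))
    return sum(pair_scores) / len(pair_scores)

# Hypothetical aggression ratings (1 to 5) for five boys, not real data.
rater_1 = [1, 3, 5, 2, 4]
rater_2 = [1, 3, 4, 2, 4]
rater_3 = [2, 3, 5, 2, 5]

agreement = mean_pairwise_agreement([rater_1, rater_2, rater_3])
print(f"mean pairwise agreement = {agreement:.0%}")
```

A low value like this one would suggest, as in the example, that the raters need a shared definition of the behavior before their ratings can be trusted.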
Measures of central tendency
in statistics, a measure of central tendency is a single value that describes the way in which a group of data cluster around a central value. They help to summarize the main features of a data set and identify the score around which most scores fall.
The mean, median, and mode are the three measures of central tendency. The mean requires numerical (interval or ratio) data and the median requires data that can at least be rank-ordered; the mode can be used with both numerical and nominal data. The mean is the arithmetic average of all scores within a data set; the mode is the most frequently occurring score; the median is the point that separates the distribution into two equal halves. The median and mode are not as affected by outliers as the mean.
Central tendency is useful because it lets us know what is normal or ‘average’ for a set of data and allows you to compare one data set to another. It is also useful when you want to compare one piece of data to the entire data set.
Ex: According to the bell curve that represents IQ, the mean is 100 with a standard deviation of 15. Many psychologists use this measure of central tendency when evaluating the mental stability and capacity of children and mentally ill patients. A child was given the Stanford-Binet IQ measure and scored one SD above the mean of 100, for a score of 115. His measured intelligence was inconsistent with his failing grades, and he was referred for counseling.
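The three measures, and the mean's sensitivity to outliers, can be checked with Python's standard statistics module. The IQ scores below are hypothetical; the mean of 100 and SD of 15 used for the z-score come from the example.

```python
from statistics import mean, median, mode

# Hypothetical IQ scores for a small group (not real data).
scores = [85, 100, 100, 115, 100, 90, 110]
print(mean(scores), median(scores), mode(scores))

# Adding one extreme score pulls the mean upward but leaves
# the median and mode where they were.
with_outlier = scores + [200]
mean_with_outlier = mean(with_outlier)
median_with_outlier = median(with_outlier)
mode_with_outlier = mode(with_outlier)
print(mean_with_outlier, median_with_outlier, mode_with_outlier)

# Comparing one score to the distribution: a z-score expresses the
# child's 115 in standard deviation units from the population mean.
POPULATION_MEAN, POPULATION_SD = 100, 15
z_score = (115 - POPULATION_MEAN) / POPULATION_SD
print(z_score)
```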