Research Methods and Test Construction Flashcards
In the context of psychological assessment, the terms “floor” and “ceiling” refer to:
A. the lowest and highest true scores an examinee is likely to have, given his or her obtained predictor score.
B. the lowest and highest scores an examinee is likely to obtain on a criterion, given his or her predictor score.
C. the degree to which a test can discriminate among examinees who have very low levels or very high levels of the characteristic measured by the test.
D. the degree to which a test accurately predicts the criterion scores of examinees who obtain very low scores or very high scores on the test.
A test has limited floor when it cannot discriminate well among examinees who have a low level of the characteristic measured by the test because the test does not include a sufficient number of easy items. In contrast, a test has limited ceiling when it cannot discriminate well among examinees who have a high level of the characteristic measured by the test because it does not include a sufficient number of difficult items.
A psychologist is planning a research study to evaluate the effects of a two-hour online lecture on statistics for improving the statistics knowledge of 35 psychologists who have just started studying for the EPPP. All participants will (1) take a pre-test consisting of 50 multiple-choice statistics questions on Monday, (2) attend the online lecture on Wednesday evening, and (3) take a post-test consisting of 50 multiple-choice statistics questions that are equivalent to the pre-test questions on Friday. To analyze the data she obtains in her study, the researcher will use which of the following?
A. t-test for a single sample
B. t-test for correlated samples
C. two-way ANOVA
D. single-sample chi-square test
The first steps in identifying the appropriate statistical test are to identify the independent and dependent variables and the scale of measurement of the dependent variable. This study’s independent variable is the lecture on statistics and the dependent variable is statistics test score. The dependent variable is measured on a ratio scale, which means that the statistical test will be used to compare the mean scores obtained by the psychologists on the pre- and post-tests. The t-test and ANOVA are both used to compare mean scores, but because there are only two means, the t-test is the appropriate test. To determine which t-test to use, you determine how the means will be obtained: In this study they will be obtained from a single group of subjects, and the t-test for correlated samples is used when two means are obtained from the same group or from two groups that are related in some way.
Which of the following is a culture-reduced measure of fluid intelligence?
A. Kuhlmann-Anderson
B. Raven’s Progressive Matrices
C. Woodcock-Johnson
D. Slosson Intelligence Test
The Raven’s Progressive Matrices (RPM) tests are measures of fluid intelligence and are considered to be culture-reduced because they do not use language and performance does not depend on specific cultural or academic learning. There are three RPM tests: Standard Progressive Matrices, Colored Progressive Matrices, and Advanced Progressive Matrices.
Which of the following is a type of counterbalanced design?
A. Solomon four-group
B. Latin square
C. factorial
D. multiple-baseline
Counterbalancing is used to control order effects that may occur when a within-subjects design is used – i.e., when subjects in each group will receive or participate in all levels of the independent variable. The Latin square is a type of counterbalanced design that ensures that the different levels of the independent variable are assigned to the groups of subjects so that each level appears an equal number of times in each ordinal position.
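As a minimal sketch of the idea, the following builds a cyclic Latin square for three conditions. The function name `latin_square` and the condition labels are hypothetical, and a cyclic rotation is only one of several ways to construct such a square.

```python
def latin_square(conditions):
    """Return a cyclic Latin square: row i is the condition order for group i."""
    n = len(conditions)
    return [[conditions[(row + pos) % n] for pos in range(n)] for row in range(n)]


orders = latin_square(["A", "B", "C"])
for group, order in enumerate(orders, start=1):
    print(f"Group {group}: {' -> '.join(order)}")
# Group 1: A -> B -> C
# Group 2: B -> C -> A
# Group 3: C -> A -> B
```

Reading down any column shows each condition exactly once, which is the "equal number of times in each ordinal position" property.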
Which of the following best describes ethical requirements regarding sexual relationships with students?
A. Psychologists are prohibited from becoming sexually involved with current and former students in their department.
B. Psychologists are prohibited from becoming sexually involved with students who are “vulnerable to undue influence.”
C. Psychologists are prohibited from becoming sexually involved with students when doing so creates an unacceptable multiple relationship.
D. Psychologists are prohibited from becoming sexually involved with students over whom they have or are likely to have evaluative authority.
Correct answer
D. Psychologists are prohibited from becoming sexually involved with students over whom they have or are likely to have evaluative authority.
This is the best answer because it’s closest to the language of Standard 7.07 of the APA Ethics Code and Standard II.29 of the Canadian Code of Ethics, which prohibit psychologists from becoming sexually involved with students who are in their departments or over whom they have evaluative authority.
The Generations Study (Bishop et al., 2020) compared the timing and pacing of sexual identity development for lesbian, gay, and bisexual individuals ages 18 to 26, 32 to 43, and 50 to 60. The results of the study found that:
A. members of the three age cohorts reported similar ages of onset for all milestones and similar lengths of time between milestones.
B. members of the youngest age cohort reported the earliest ages of onset for all milestones, but members of all three age cohorts reported similar lengths of time between milestones.
C. members of the youngest age cohort reported the earliest ages of onset for all milestones, but members of the oldest age cohort reported the shortest lengths of time between milestones.
D. members of the youngest age cohort reported the earliest ages of onset for all milestones and the shortest lengths of time between milestones.
Correct answer
D. members of the youngest age cohort reported the earliest ages of onset for all milestones and the shortest lengths of time between milestones.
Participants in the Generations Study were asked about the ages when they first experienced five sexual identity milestones:
1. awareness of same-sex attraction;
2. self-identification as lesbian, gay, or bisexual;
3. same-sex sexual behavior;
4. disclosure as a sexual minority to a straight friend; and
5. disclosure as a sexual minority to a family member.
Consistent with previous research, the study found that members of the youngest age cohort (ages 18 to 26) reported the earliest ages of onset for all milestones and the shortest lengths of time between milestones.
Older adults with mild neurocognitive disorder due to Alzheimer’s disease would most likely obtain the highest score on the WAIS-IV __________ Index and lowest score on the __________ Index.
A. Verbal Comprehension; Working Memory
B. Working Memory; Perceptual Reasoning
C. Perceptual Reasoning; Processing Speed
D. Verbal Comprehension; Processing Speed
Correct answer
D. Verbal Comprehension; Processing Speed
Explanation
The WAIS-IV Technical and Interpretive Manual (Psychological Corporation, 2008) reports the following mean Index scores for individuals with mild Alzheimer’s disease: Verbal Comprehension 86.2, Perceptual Reasoning 85.8, Working Memory 84.3, and Processing Speed 76.6.
A division of labor and a hierarchy of authority are defining characteristics of which of the following organizational theories?
A. Weber’s bureaucracy
B. McGregor’s Theory Y
C. Fiedler’s contingency theory
D. Katz and Kahn’s open-system theory
Correct answer
A. Weber’s bureaucracy
Explanation
As described by Weber (1947), an ideal bureaucracy is characterized by a division of labor, a hierarchy of authority, clearly defined rules and procedures, impersonal relationships based on position, and selection and promotion decisions based on an applicant’s or employee’s technical competence.
Cataplexy, hypocretin deficiency, and short rapid eye movement sleep latency are symptoms of:
A. narcolepsy.
B. sleep apnea.
C. temporal lobe seizures.
D. Tourette’s syndrome.
Correct answer
A. narcolepsy.
Explanation
The DSM-5 diagnosis of narcolepsy requires irrepressible episodes of sleep (daytime naps) with at least one of the following: cataplexy, hypocretin deficiency, or REM sleep latency less than or equal to 15 minutes.
In a normal distribution, a percentile rank of ____ is one standard deviation above the mean of the distribution.
A. 50
B. 68
C. 84
D. 97
Answer C is correct. A percentile rank is a transformed score that indicates the percentage of scores that fall at or below a given score. In a normal distribution, when an examinee’s percentile rank is 84, this means that his/her score is one standard deviation above the mean.
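The 84 can be checked with the standard normal cumulative distribution. This is a small sketch using Python's standard library; the helper name `percentile_rank` is made up for illustration.

```python
from statistics import NormalDist


def percentile_rank(z):
    """Percentage of a normal distribution falling at or below a z-score."""
    return 100 * NormalDist().cdf(z)


print(round(percentile_rank(0.0)))  # 50 (the mean)
print(round(percentile_rank(1.0)))  # 84 (one standard deviation above the mean)
```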
Dr. Oz conducts a study to compare the effects of three treatments (drug, relaxation, and drug plus relaxation) on the systolic blood pressure of patients who have secondary hypertension as the result of three different conditions (tobacco use, chronic alcohol use, or obesity). To analyze the main and interaction effects of treatment and condition on systolic blood pressure, Dr. Oz will use which of the following statistical tests?
A. chi-square test for multiple samples
B. t-test for independent samples
C. two-way ANOVA
D. MANOVA
Answer C is correct. The first two things to do when choosing an inferential statistical test are to identify (1) the independent and dependent variables and (2) the scale of measurement of the dependent variable. This study has two independent variables – treatment and condition – and one dependent variable – systolic blood pressure. You may have had trouble with this question because you were uncertain about the scale of measurement of systolic blood pressure. However, a general rule is that most physical measurements (including systolic blood pressure) represent an interval or ratio scale, which means that the appropriate statistical test will be a t-test or analysis of variance. To choose between the t-test, the two-way ANOVA, and the MANOVA, you have to consider how many independent and dependent variables there are. There are two independent variables, which eliminates the t-test because it’s used when there’s only one independent variable; and there is one dependent variable, which eliminates the MANOVA because it’s used when there are two or more dependent variables. That leaves the two-way ANOVA, which is the correct answer. It’s used when a study includes two independent variables and one dependent variable that’s measured on an interval or ratio scale.
When test scores represent an interval or ratio scale and the distribution of scores is skewed, the best measure of central tendency for the distribution is usually which of the following?
A. mode
B. mean
C. median
D. minuend
Answer C is correct. The choice of the appropriate measure of central tendency depends not only on the scale of measurement of the data but also on several other factors, including the shape of the data distribution. When the data represent an interval or ratio scale, the mean is ordinarily the appropriate measure of central tendency. However, when the distribution is skewed, the mean may provide misleading information because its magnitude is affected by the extreme outliers. Consequently, for a skewed distribution, the median is a better measure of central tendency because it’s not affected by the extreme outliers and is more representative of the typical score in the distribution. (The minuend is the first term in a subtraction problem – e.g., 30 in the problem 30 – 10. It’s NOT something you need to be familiar with for the exam.)
The size of the standard error of the mean increases as:
A. the population standard deviation increases and the sample size decreases.
B. the population standard deviation decreases and the sample size increases.
C. the population standard deviation and the sample mean both decrease.
D. the population standard deviation and the sample mean both increase.
Answer A is correct. The standard error of the mean is the standard deviation of the sampling distribution of the mean and is calculated by dividing the population standard deviation by the square root of the sample size. Consequently, as the population standard deviation (the numerator) increases and the sample size (whose square root is the denominator) decreases, the standard error of the mean increases.
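A quick numerical check of the formula, with made-up values for the population standard deviation and sample size (the function name `standard_error` is hypothetical):

```python
import math


def standard_error(population_sd, n):
    """Standard error of the mean: population SD divided by sqrt(sample size)."""
    return population_sd / math.sqrt(n)


print(standard_error(10, 25))   # 2.0
print(standard_error(20, 25))   # 4.0 (larger SD, larger standard error)
print(standard_error(10, 100))  # 1.0 (larger sample, smaller standard error)
```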
A research study involves comparing the short- and long-term effects of three brief behavioral treatments for social anxiety disorder by randomly assigning clinic patients who have just received the diagnosis to one of the three treatments and assessing their symptoms before treatment and then one week, four weeks, 12 weeks, and 24 weeks following treatment. Which of the following research designs is being used in this study?
A. counterbalanced
B. mixed
C. between groups
D. within subjects
Answer B is correct. A mixed design is being used when a study has at least two independent variables and at least one is a between-subjects variable and at least one is a within-subjects variable. The study described in this question has two independent variables – treatment and time: Treatment is a between-subjects variable because each subject will receive only one of the three treatments, while time is a within-subjects variable because all subjects will be assessed at five different times.
To measure the degree of association between high school diploma (yes or no) and yearly income in dollars, you would use which of the following correlation coefficients?
A. Spearman rho
B. point biserial
C. contingency coefficient
D. eta
Answer B is correct. The point biserial correlation coefficient is the appropriate correlation coefficient when one variable is a true dichotomy and the other is measured on a continuous (interval or ratio) scale. High school diploma (yes or no) is a true dichotomy and yearly income in dollars is measured on a continuous scale.
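One way to see what the point biserial coefficient is: it equals the Pearson r computed with the dichotomy coded 0/1. The sketch below uses invented income figures and hypothetical function names, and compares the two calculations.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(groups, scores):
    """(M1 - M0) / SD * sqrt(p * q), using the population SD of all scores."""
    n = len(scores)
    ones = [s for g, s in zip(groups, scores) if g == 1]
    zeros = [s for g, s in zip(groups, scores) if g == 0]
    p = len(ones) / n
    mean_all = sum(scores) / n
    sd = math.sqrt(sum((s - mean_all) ** 2 for s in scores) / n)
    mean_diff = sum(ones) / len(ones) - sum(zeros) / len(zeros)
    return mean_diff / sd * math.sqrt(p * (1 - p))


# Hypothetical data: 1 = has a diploma, 0 = does not; income in dollars.
diploma = [1, 1, 1, 0, 0, 0]
income = [52000, 48000, 50000, 41000, 40000, 42000]

print(round(pearson_r(diploma, income), 4))
print(round(point_biserial(diploma, income), 4))  # same value as the line above
```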
The internal validity of a research study is threatened by statistical regression when:
A. more participants with average scores on the pretest dropped out of the study than did participants with high or low scores.
B. more participants with average scores responded favorably to the independent variable than did other participants.
C. participants were chosen for inclusion in the study because they obtained extremely low scores on a pretest.
D. participants were chosen for inclusion in the study because they obtained average scores on a pretest.
Answer C is correct. Statistical regression refers to the tendency of extremely high and low scores to “regress to the mean” on retesting. It can threaten a study’s internal validity because, when extremely high and low scores become less extreme on retesting, this may be due to statistical regression rather than the effects of the independent variable.
You would use the multivariate analysis of variance (MANOVA) to analyze the data you obtain in a research study when:
A. you want to control the effects of an extraneous variable by statistically removing its effects on the dependent variable.
B. you want to measure the main and interaction effects of an extraneous variable on the dependent variable.
C. your study includes at least one between-subjects variable and one within-subjects variable.
D. your study includes two or more dependent variables that are measured on an interval or ratio scale.
Answer D is correct. The MANOVA is used when a study includes one or more independent variables and two or more dependent variables that are each measured on an interval or ratio scale.
To determine the correlation between college graduate (yes or no) and yearly income in dollars, you would use which of the following correlation coefficients?
A. phi coefficient
B. Pearson r
C. point biserial coefficient
D. Spearman rho
Answer C is correct. The point biserial correlation coefficient is used when one variable is a true dichotomy (e.g., college graduate or nongraduate) and the other variable is continuous (e.g., yearly income in dollars).
For a test that consists of 50 true/false questions, the optimal average item difficulty level (p) is:
A. 1.0
B. .75
C. .50
D. .25
Answer B is correct. The optimal difficulty level for test questions depends on several factors including the chance that examinees can choose the correct answers just by guessing. With regard to this factor, the optimal difficulty level falls halfway between 100% and the probability of choosing the correct answer by guessing: For true/false questions, the probability of guessing correctly is 50%, so the optimal difficulty level is halfway between 1.0 and .50, which is .75.
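The halfway rule is simple arithmetic. A short sketch (the function name is hypothetical, and it considers only the guessing factor described above):

```python
def optimal_difficulty(num_choices):
    """Halfway between 1.0 and the probability of guessing correctly."""
    chance = 1 / num_choices
    return (1.0 + chance) / 2


print(optimal_difficulty(2))  # 0.75  (true/false)
print(optimal_difficulty(4))  # 0.625 (four-option multiple choice)
```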
A test has a mean of 60 and standard deviation of 5, and test scores are normally distributed. Based on this information, you can conclude that about 95% of scores fall between scores of:
A. 55 and 65.
B. 50 and 70.
C. 45 and 75.
D. 40 and 80.
Answer B is correct. In a normal distribution about 95% of scores fall between the scores that are plus and minus two standard deviations from the mean. This test has a mean of 60 and standard deviation of 5, so about 95% of scores fall between 60 plus and minus 10 (5 x 2), which is between 50 and 70.
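The arithmetic, plus a check of the "about 95%" figure against the normal distribution, can be sketched as follows (standard library only):

```python
from statistics import NormalDist

mean, sd = 60, 5
low, high = mean - 2 * sd, mean + 2 * sd

# Share of a normal distribution lying between the two cutoffs.
dist = NormalDist(mu=mean, sigma=sd)
share = dist.cdf(high) - dist.cdf(low)

print(low, high)        # 50 70
print(round(share, 3))  # 0.954
```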
To compare the effectiveness of two brief treatments for social anxiety disorder, you obtain a sample of individuals who have received this diagnosis, determine the severity of each subject’s social anxiety, match the subjects in pairs based on the severity of their symptoms, and randomly assign one member of each pair to one of the treatments and the other member to the other treatment. To compare the scores subjects in the two groups receive on a measure of symptom severity after they receive treatment, you will use which of the following?
A. t-test for correlated samples
B. t-test for uncorrelated samples
C. two-way ANOVA
D. single-sample chi-square test
Answer A is correct. The first steps in identifying the appropriate statistical test are to identify the independent and dependent variables and the scale of measurement of the dependent variable. This study’s independent variable is type of treatment and the dependent variable is score on a measure of severity of anxiety following treatment. More specifically, the dependent variable is score on a measure of symptom severity and, for the exam, you can assume that scores represent an interval or ratio scale. This means that a statistical test will be used to compare the mean scores obtained by two groups. The t-test and ANOVA are both used to compare mean scores but, because there are only two means, the t-test is the appropriate test. To determine which t-test to use, you determine how the means were obtained: In this study, they were obtained from related groups (from groups that consist of subjects who were matched in terms of initial symptom severity). The t-test for correlated samples is the appropriate test when two means are obtained from the same group or from two groups that are related in some way.
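For illustration, the t statistic for correlated samples is computed from the difference score within each matched pair. The scores below are invented and the function name `paired_t` is hypothetical. The sketch stops at t and its degrees of freedom, since looking up the p-value requires a t distribution table or a statistics package.

```python
import math
from statistics import mean, stdev


def paired_t(scores_a, scores_b):
    """t statistic and degrees of freedom for two correlated (paired) samples."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical post-treatment severity scores, one row position per matched pair.
treatment_1 = [12, 15, 11, 18, 14]
treatment_2 = [10, 14, 8, 15, 13]

t, df = paired_t(treatment_1, treatment_2)
print(round(t, 3), df)  # 4.472 4
```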
To evaluate the effectiveness of a stress reduction technique for alleviating test anxiety, a psychologist administers a measure of test anxiety to Psychology 101 undergraduates, chooses the 50 students with the highest scores on the test, administers the intervention to the students, and then readministers the measure of test anxiety. The biggest threat to this study’s internal validity is:
A. differential selection
B. reactivity
C. statistical regression
D. instrumentation
Answer C is correct. Whenever subjects are chosen for inclusion in a study because they have extreme scores on the pretest, their scores on the posttest are likely to “regress toward the mean” regardless of the effects of the independent variable. This is referred to as statistical regression, and it threatens a study’s internal validity whenever it’s not possible to ascertain to what extent a change in posttest scores is due to statistical regression or the effects of the independent variable.
You would use which of the following statistical tests to compare the number of adults living in a rural, urban, or suburban community who have received a diagnosis of a bipolar disorder, depressive disorder, or anxiety disorder?
A. single-sample chi-square test
B. multiple-sample chi-square test
C. one-way ANOVA
D. factorial ANOVA
A psychologist finds that the relationship between physiological arousal and motor performance for a sample of athletes is .40. This means that ___% of variability in motor performance is explained by variability in physiological arousal.
A. 60
B. 40
C. 36
D. 16
Answer D is correct. A correlation coefficient between two different variables can be squared to calculate the coefficient of determination, which indicates the amount of variability in one variable that’s explained by variability in the other variable. The psychologist obtained a correlation of .40 between arousal and motor performance, and .40 squared is .16. This means that 16% of variability in motor performance is explained by variability in arousal.
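The squaring step as a one-line calculation (the function name is hypothetical):

```python
def variance_explained(r):
    """Coefficient of determination, expressed as a percentage."""
    return r ** 2 * 100


print(round(variance_explained(0.40)))  # 16
```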
Which of the following best describes the variables included in a structural equation model?
A. Manifest variables cannot be observed directly, and their influence is inferred from indicator variables.
B. Latent variables cannot be observed directly, and their influence is inferred from indicator variables.
C. Manifest and latent variables cannot be observed directly, and their influence is inferred from indicator variables.
D. Manifest variables cannot be observed directly, and their influence is inferred from latent variables.
Answer B is correct. In structural equation modeling, observed variables are also known as manifest variables and indicators and are directly observed and measured. Latent variables are also known as factors and constructs and cannot be directly observed or measured but are inferred from observed variables.
You would use stepwise multiple regression when you want to:
A. identify the fewest number of predictors needed to make accurate predictions about scores on a criterion.
B. identify the fewest number of predictors needed to accurately categorize people into two or more mutually exclusive criterion groups.
C. identify the predictors that have a causal relationship with the criterion.
D. identify the optimal number of criterion groups.
Answer A is correct. Stepwise multiple regression involves adding one predictor at a time to, or removing one predictor at a time from, the multiple regression equation in order to identify the fewest predictors needed to make accurate predictions about scores on the criterion.
Which of the following is the appropriate technique for using measures of severity of depression, anxiety, drug/alcohol use, and cognitive impairment to classify individuals with major depressive disorder as being at risk or not at risk for suicide?
A. regression analysis
B. multiple regression
C. canonical correlation
D. discriminant function analysis
Answer D is correct. Discriminant function analysis is the appropriate multivariate technique when two or more predictors will be used to estimate status on one nominal (grouping) variable.
For a sample of middle school students with high IQs, the correlation between IQ scores and achievement test scores is .35. If the correlation between IQ scores and achievement test scores is calculated for a sample of middle school students whose IQs represent the full range of IQ scores, the correlation coefficient is likely to be:
A. .35
B. larger than .35.
C. smaller than .35.
D. smaller or larger than .35
Answer B is correct. The scores used to obtain the original correlation coefficient had a restriction in range because the sample included only students with high IQs. A restricted range of scores tends to lower the correlation coefficient, so recalculating the coefficient for students with an unrestricted range is likely to produce a larger correlation coefficient.
To determine the relationship between cigarette smoking and absence from work, Dr. Nunez obtains a sample of employees who are either smokers or nonsmokers and determines the number of days each employee was absent from work the previous year. Dr. Nunez will use which of the following to calculate the correlation between these two variables?
A. Pearson r
B. Spearman rho
C. point biserial coefficient
D. contingency coefficient
Answer C is correct. The point biserial coefficient is used when one variable is a true dichotomy (smokers versus nonsmokers) and the other is continuous (number of days absent from work). (A useful mnemonic for distinguishing between the point biserial and biserial coefficients is to use the “t” in point as a reminder that the point biserial coefficient is used when one variable is a true dichotomy.)
Bayes’ theorem allows researchers to update prior knowledge about a parameter using:
A. current (observed) data.
B. previous knowledge and current (observed) data.
C. qualitative information.
D. a revised theoretical framework.
Answer B is correct. Bayes’ theorem uses previous knowledge (the prior) and current data (the likelihood function) to derive updated knowledge about a parameter (the posterior).
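A minimal sketch of the prior-times-likelihood update for two competing hypotheses. The hypothesis labels and probabilities are invented for illustration, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood):
    """Bayes' theorem for discrete hypotheses: normalize prior * likelihood."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    return {h: value / total for h, value in unnormalized.items()}


# Hypothetical values: prior belief, and probability of the observed data under each hypothesis.
prior = {"effect": 0.2, "no effect": 0.8}
likelihood = {"effect": 0.9, "no effect": 0.3}

updated = posterior(prior, likelihood)
for hypothesis, probability in updated.items():
    print(hypothesis, round(probability, 3))
```

With these numbers, the data favor "effect", so its probability rises from the prior of .20 even though "no effect" remains more probable overall.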
The probability of making a Type II error equals which of the following?
A. alpha
B. beta
C. one minus alpha
D. one minus beta
Answer B is correct. The probability of making a Type II error is equal to beta, which is not set by the researcher but can be reduced by increasing statistical power.
Statistical power refers to the ability to:
A. retain a null hypothesis.
B. retain a true null hypothesis.
C. reject a null hypothesis.
D. reject a false null hypothesis.
Answer D is correct. Statistical power refers to the ability to reject a false null hypothesis, which is ordinarily what a researcher wants to do.
To calculate the standard error of means you need which of the following?
A. sample mean and standard deviation
B. sample mean and sample size
C. population standard deviation and sample size
D. population mean and standard deviation
Answer C is correct. The central limit theorem predicts that the standard deviation of the sampling distribution (also known as the standard error of means) is equal to the population standard deviation divided by the square root of the sample size.
Changing alpha from .05 to .01:
A. increases the probability of making a Type I error and decreases the probability of making a Type II error.
B. decreases the probability of making a Type I error and increases the probability of making a Type II error.
C. increases the probability of making a Type I error and Type II error.
D. decreases the probability of making a Type I error and Type II error.
Answer B is correct. Knowing that changing the size of alpha has opposite effects on the chance of making a Type I and Type II error would have allowed you to eliminate answers C and D. Then knowing that a Type I error occurs when a true null hypothesis is rejected and that decreasing alpha makes it harder to reject a null hypothesis whether it is true or false would have helped you identify answer B as the correct answer: Decreasing alpha decreases the probability of making a Type I error (rejecting a true null hypothesis) and, conversely, increases the probability of making a Type II error (retaining a false null hypothesis).
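The trade-off can be illustrated numerically. The sketch below assumes a one-tailed, one-sample z-test with a known population standard deviation, and uses an invented effect size and sample size. The function name `type_ii_error` is hypothetical.

```python
from statistics import NormalDist


def type_ii_error(alpha, effect_size, n):
    """Beta for a one-tailed, one-sample z-test (assumes a known population SD)."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)  # cutoff for rejecting the null hypothesis
    return z.cdf(critical - effect_size * n ** 0.5)


for alpha in (0.05, 0.01):
    beta = type_ii_error(alpha, effect_size=0.5, n=25)
    print(f"alpha={alpha}: beta={beta:.3f}, power={1 - beta:.3f}")
```

The second line of output shows a larger beta (and lower power) than the first, since everything except alpha is held constant.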
The central limit theorem predicts that the sampling distribution of means increasingly approaches normal as the:
A. number of samples increases regardless of the shape of the population distribution.
B. size of the sample increases regardless of the shape of the population distribution.
C. number of samples increases only when the population distribution is normal.
D. size of the sample increases only when the population distribution is normal.
Answer B is correct. The central limit theorem predicts that the sampling distribution of means increasingly approaches a normal shape as the size of the sample increases, regardless of the shape of the population distribution of scores.
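A small simulation can make this concrete. The sketch below draws samples from a strongly skewed (exponential) population, chosen here purely as an example, and reports the skewness of the resulting sample means for three sample sizes. The function names, seed, and number of samples are arbitrary choices.

```python
import random


def skewness(values):
    """Population skewness: mean cubed z-score (0 for a symmetric distribution)."""
    n = len(values)
    m = sum(values) / n
    sd = (sum((v - m) ** 2 for v in values) / n) ** 0.5
    return sum(((v - m) / sd) ** 3 for v in values) / n


def sample_means(sample_size, num_samples=5000, seed=0):
    """Means of repeated samples drawn from a skewed (exponential) population."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(sample_size)) / sample_size
        for _ in range(num_samples)
    ]


for size in (2, 10, 50):
    print(size, round(skewness(sample_means(size)), 2))
```

The skewness of the sample means should shrink toward 0 as the sample size grows, even though the population itself stays skewed.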
A Type I error occurs when a researcher:
A. retains a true null hypothesis.
B. retains a false null hypothesis.
C. rejects a true null hypothesis.
D. rejects a false null hypothesis.
Answer C is correct. When a researcher rejects a true null hypothesis, the researcher has concluded that the independent variable has had a significant effect on the dependent variable, but the observed effect is actually due to sampling error or other factors. This type of incorrect decision is known as a Type I error.
When a predictor included in a multiple regression equation has a negative beta weight, this means that:
A. The predictor has a negative correlation with the other predictors.
B. The predictor has a negative correlation with the criterion.
C. The predictor has a statistically significant relationship with the criterion.
D. The predictor does not have a statistically significant relationship with the criterion.
Answer B is correct. Beta weights are standardized regression coefficients, and a predictor’s beta weight indicates the strength of the relationship between the predictor and the criterion. When a predictor’s beta weight is positive, this means there’s a positive relationship between the predictor and criterion (i.e., as scores on the predictor increase, scores on the criterion increase). Conversely, when the beta weight is negative, this means there’s a negative relationship between the predictor and criterion (i.e., as scores on the predictor increase, scores on the criterion decrease).
A psychologist conducts a study to evaluate the effects of three different work shifts (day, swing, and graveyard) on the average number of errors, absences, and accidents of assembly line workers during a six-month period. To minimize the probability of making a Type I error, the psychologist will use which of the following to analyze the data she collects?
A. one-way ANOVA
B. three-way ANOVA
C. ANCOVA
D. MANOVA
Answer D is correct. The MANOVA (multivariate analysis of variance) is used when a study includes one or more independent variables and two or more dependent variables that are each measured on an interval or ratio scale. An advantage of conducting a single MANOVA rather than separate ANOVAs for each dependent variable is that doing so reduces the probability of making a Type I error – i.e., it reduces the experimentwise error rate. The study described in this question has one independent variable (work shift) and three dependent variables that are all measured on a ratio scale (number of errors, absences, and accidents). Therefore, the MANOVA can be used to analyze the data collected in this study.
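To see why running separate tests inflates the experimentwise error rate, the usual approximation is 1 - (1 - alpha)^c for c tests. This formula assumes the tests are independent, which is a simplification for dependent variables measured on the same workers. The function name is hypothetical.

```python
def experimentwise_error(alpha, num_tests):
    """Probability of at least one Type I error across independent tests."""
    return 1 - (1 - alpha) ** num_tests


print(round(experimentwise_error(0.05, 1), 3))  # 0.05  (a single test)
print(round(experimentwise_error(0.05, 3), 3))  # 0.143 (three separate ANOVAs)
```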
A researcher conducts a study to determine if there are gender differences in acceptance as a graduate student into the six largest departments at a university. To analyze the data she collects in this study, the researcher will use which of the following?
A. one-way ANOVA
B. two-way ANOVA
C. single-sample chi-square test
D. multiple-sample chi-square test
Answer D is correct. The first and second steps in identifying the appropriate statistical test are identifying the study’s independent and dependent variables and the scale of measurement of the dependent variable. This study has two variables – gender and department – and gender can be viewed as the independent variable and department as the dependent variable. The dependent variable – department – is a nominal variable. The chi-square test is the appropriate test for analyzing nominal data and, when a study has two or more variables, the multiple-sample chi-square test is used. Keep in mind that, for the chi-square test, you count the total number of variables regardless of whether they’re independent or dependent variables: The multiple-sample chi-square test is used when the study has two or more variables, and the single-sample chi-square test is used when a study is a descriptive study and has only one variable.
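As a sketch of what the multiple-sample chi-square test computes, the code below compares observed frequencies with the frequencies expected if the two variables were unrelated. The counts are invented, and only three departments are shown for brevity. The function name `chi_square` is hypothetical.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts: rows are two gender groups, columns are three departments.
observed = [
    [30, 20, 10],
    [20, 20, 20],
]
statistic, df = chi_square(observed)
print(round(statistic, 3), df)
```

The statistic would then be compared with the critical chi-square value for the given degrees of freedom.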
The scores obtained by 35 students on a math exam and a physics exam are converted to ranks. To determine the degree of association between the two sets of ranks, the appropriate correlation coefficient is which of the following?
A. biserial
B. point biserial
C. Spearman
D. Pearson
Answer C is correct. Knowing that “Spearman” refers to the Spearman rank-order correlation coefficient would have helped you identify the correct answer to this question. As its name suggests, it’s used to determine the correlation between rank-ordered data.
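As a rough illustration of the rank-order idea, the sketch below computes Spearman's coefficient with the usual no-ties formula, 1 minus 6 times the sum of squared rank differences divided by n(n² - 1). The five pairs of ranks are made-up values, not data from the question, and the function name is only illustrative.

```python
def spearman_rho(ranks_x, ranks_y):
    """Spearman rank-order correlation for two lists of ranks with no ties."""
    n = len(ranks_x)
    # Sum of squared differences between each pair of ranks
    d_squared = sum((x - y) ** 2 for x, y in zip(ranks_x, ranks_y))
    return 1 - (6 * d_squared) / (n * (n ** 2 - 1))

# Hypothetical ranks for five students on the two exams
math_ranks = [1, 2, 3, 4, 5]
physics_ranks = [2, 1, 3, 5, 4]
print(round(spearman_rho(math_ranks, physics_ranks), 2))
```

When ranks are tied, this shortcut formula is no longer exact, and the coefficient is usually obtained by computing a Pearson correlation on the ranks instead.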
__________ validity refers to the extent to which research results are generalizable to other people, settings, and times.
A. External
B. Ecological
C. Exogenous
D. Endogenous
Answer A is correct. External validity refers to the generalizability of research results and includes population validity (generalizability to other people in the population the sample was drawn from), ecological validity (generalizability to other settings or environments), and temporal validity (generalizability to other times).
The correction for attenuation formula is used to estimate the effects of increasing:
A. the reliability of a predictor and/or criterion on the criterion-related validity coefficient.
B. the reliability of a predictor and/or criterion on the predictor’s incremental validity.
C. the number of items included in the predictor on its criterion-related validity coefficient.
D. the base rate on the predictor’s incremental validity.
Answer A is correct. The correction for attenuation formula is used to estimate what the maximum criterion-related validity coefficient would be if the predictor and/or criterion had a reliability coefficient of 1.0.
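The card does not give the formula itself. A minimal sketch, assuming the standard form (the observed validity coefficient divided by the square root of the product of the predictor and criterion reliability coefficients), is shown below. The coefficient values are hypothetical.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Estimated validity coefficient if predictor and criterion were perfectly reliable.

    r_xy: observed criterion-related validity coefficient
    r_xx: reliability coefficient of the predictor
    r_yy: reliability coefficient of the criterion
    """
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical values: observed validity .36, reliabilities .81 and .64
print(round(correct_for_attenuation(0.36, 0.81, 0.64), 2))
```

To correct for unreliability in only one of the measures, pass 1.0 as the reliability of the other.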
The reliability index is an estimate of the correlation between actual observed scores and theoretical true scores and is calculated by:
A. squaring the reliability coefficient.
B. taking the square root of the reliability coefficient.
C. subtracting the reliability coefficient from 1.0.
D. taking the square root of the standard error of measurement.
Answer B is correct. The reliability index is calculated by taking the square root of the reliability coefficient. For example, when a test’s reliability coefficient is .81, the reliability index is .90.
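The arithmetic in this example can be checked with a short sketch. The function name is only illustrative.

```python
import math

def reliability_index(reliability_coefficient):
    """Reliability index: the square root of the reliability coefficient."""
    return math.sqrt(reliability_coefficient)

print(round(reliability_index(0.81), 2))  # 0.9
```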
How do you calculate the optimal difficulty level of a test or questionnaire item?
The optimal difficulty level is the midpoint between 1.0 and the probability of answering the item correctly by guessing. For instance, the chance of choosing the correct answer to a four-answer multiple-choice question by guessing is .25, and the optimal difficulty level for this type of item is calculated by adding 1.0 to .25 and dividing the result by 2: (1.0 + .25)/2 = 1.25/2 = .625.
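The same calculation works for other item formats. The sketch below repeats the four-answer example and adds a true/false item, where the chance of guessing correctly is .50 (that second case is an added illustration, not part of the card).

```python
def optimal_difficulty(chance_of_guessing):
    """Midpoint between 1.0 and the probability of guessing the item correctly."""
    return (1.0 + chance_of_guessing) / 2

print(optimal_difficulty(0.25))  # four-answer multiple choice: 0.625
print(optimal_difficulty(0.50))  # true/false item: 0.75
```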
How do you calculate the standard error of measurement?
The standard error of measurement is used to construct a confidence interval, and it’s calculated by multiplying the test’s standard deviation times the square root of 1 minus the reliability coefficient. For instance, if a test has a standard deviation of 5 and a reliability coefficient of .84, its standard error of measurement equals 5 times the square root of 1 minus .84: 1 minus .84 is .16, the square root of .16 is .4, and 5 times .4 is 2. In other words, when a test’s standard deviation is 5 and its reliability coefficient is .84, its standard error of measurement is 2.
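A minimal sketch of this calculation, using the values from the example above, might look like this. The function name is only illustrative.

```python
import math

def standard_error_of_measurement(standard_deviation, reliability_coefficient):
    """SEM = SD times the square root of (1 - reliability coefficient)."""
    return standard_deviation * math.sqrt(1 - reliability_coefficient)

sem = standard_error_of_measurement(5, 0.84)
print(round(sem, 2))  # 2.0
```

Because the reliability coefficient ranges from 0 to 1.0, the result always falls between 0 and the test's standard deviation.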
How do you construct a confidence interval around an obtained test score?
For a 68% confidence interval, you add and subtract one standard error of measurement to and from the obtained score; for a 95% confidence interval, you add and subtract two standard errors of measurement; and for a 99% confidence interval, you add and subtract three standard errors of measurement. On the exam, a test question might state that an examinee obtained a score of 90 on a test that has a standard error of measurement of 5 and ask you to identify the 95% confidence interval for this score. To do so, you add and subtract 10 (two standard errors) to and from 90, which gives you a 95% confidence interval of 80 to 100.
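The rule of thumb above (one, two, or three standard errors) can be sketched as a small lookup. The function and dictionary names are illustrative, and the multipliers are the rounded values used on this card rather than exact z-scores.

```python
# Number of standard errors to add and subtract for each confidence level
STANDARD_ERRORS_BY_LEVEL = {68: 1, 95: 2, 99: 3}

def confidence_interval(obtained_score, standard_error, level=95):
    """Return (lower, upper) bounds around an obtained score."""
    if level not in STANDARD_ERRORS_BY_LEVEL:
        raise ValueError("level must be 68, 95, or 99")
    margin = STANDARD_ERRORS_BY_LEVEL[level] * standard_error
    return obtained_score - margin, obtained_score + margin

print(confidence_interval(90, 5, level=95))  # (80, 100)
```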
How do you calculate the item discrimination index?
The item discrimination index (D) ranges from -1.0 to +1.0 and indicates the difference between the percentage of examinees with high total test scores (often the top 27%) who answered the item correctly and the percentage of examinees with low total test scores (the bottom 27%) who answered the item correctly. As an example, when 90% of examinees in the high-scoring group and 20% of examinees in the low-scoring group answered an item correctly, the item’s D value is .90 minus .20, which is .70.
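As a sketch of this calculation, the function below takes each group's responses to a single item, scored 1 for correct and 0 for incorrect. The two groups of ten examinees are hypothetical and are chosen to reproduce the 90% and 20% figures above.

```python
def item_discrimination(high_group_responses, low_group_responses):
    """D = proportion correct in the high-scoring group minus proportion correct in the low-scoring group."""
    p_high = sum(high_group_responses) / len(high_group_responses)
    p_low = sum(low_group_responses) / len(low_group_responses)
    return p_high - p_low

# Hypothetical responses to one item (1 = correct, 0 = incorrect)
high_group = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]  # 90% correct
low_group = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # 20% correct
print(round(item_discrimination(high_group, low_group), 2))
```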
When conducting a factor analysis, a researcher would rotate the initial factor matrix to:
A. reduce measurement error.
B. increase the size of the communality.
C. obtain a factor matrix that is easier to interpret.
D. minimize the effects of missing data.
Answer C is correct. Rotation of the initial factor matrix simplifies the factor structure, thereby creating a matrix that is easier to interpret. In a rotated factor matrix, each test included in the factor analysis will have a high correlation (factor loading) with one of the factors and low correlations with the remaining factors. Consequently, the interpretation of and name given to each factor involves considering the tests that correlate highly with each factor. For example, if Tests A, B, and C all have high correlations with Factor 1 and low correlations with Factor 2, the content of these three tests will be considered to determine what they have in common, and that information will be used to name Factor 1. If the opposite pattern is true for Tests D, E, and F, the same procedure will be used to name Factor 2.
Note that answer B is not the correct answer because the communality (the amount of variability in each test that is explained by all of the factors) is not affected by rotation.
A __________ can be used to visually summarize the nominal data collected in a research study.
A. line graph
B. histogram
C. bar graph
D. histogram or bar graph
Answer C is correct. Bar graphs are used to depict the number of observations in each nominal category (e.g., gender, eye color). Interval and ratio data are depicted with histograms and with line graphs, which are also known as frequency polygons.
Which of the following scales of measurement allows you to conclude that the difference between the scores of 50 and 51 on a test is equal to the difference between the scores of 90 and 91 on the same test?
A. ordinal, interval, and ratio
B. interval and ratio
C. interval only
D. ratio only
Answer B is correct. Interval and ratio scales both have the property of equal intervals between adjacent points on the scale. Equal intervals allow you to draw the conclusion that the one-point difference between the scores of 50 and 51 on a test is equal to the one-point difference between the scores of 90 and 91 on the same test.
A psychologist is conducting clinical trials to evaluate the effectiveness of a treatment for children, ages 9 to 12 years, who have received a diagnosis of bipolar disorder with comorbid symptoms of acute stress disorder. Before administering the intervention to each child, the psychologist must obtain consent from the child’s legal guardian(s) and:
A. assent from the child.
B. assent from the child when the child is under 12 years of age or consent from the child when the child is 12 years of age or older.
C. assent from the child unless assent is waived because an appropriate mechanism for protecting the child’s safety will be provided.
D. assent from the child unless assent is waived on capacity grounds or grounds of direct benefit to the child.
Answer D is correct. This situation is addressed by Fisher (2017) in her discussion of the application of Standard 8.02 of the APA’s Ethics Code to pediatric clinical trials. She notes that U.S. federal regulations [45 CFR 46.408(a)] permit waiver of a child’s assent to participate in research that involves a therapeutic intervention when the child does not have the capacity to consent or when the research offers benefits to the health of the child that cannot be obtained from interventions available outside the research study.
Your one-way ANOVA produces a statistically significant F-ratio. In this situation, you would consider conducting a post-hoc test if:
A. the independent variable has only two levels.
B. the independent variable has three or more levels.
C. there is more than one dependent variable.
D. the interaction between independent variables is also statistically significant.
Answer B is correct. A one-way ANOVA is used when a study has one independent variable and one dependent variable (which is why answers C and D can be eliminated). When it produces a statistically significant F-ratio, this indicates that at least one group mean is significantly different from another group mean. If there are only two groups (two levels of the independent variable), a post-hoc test is not necessary because the significant F-ratio indicates that the means of those two groups are significantly different. However, when there are three or more groups (three or more levels of the independent variable), a post-hoc test is useful for determining which group means are significantly different. (Comparing the magnitude of the means obtained by the groups indicates which group means differ, but a post-hoc test is needed to determine which differences are statistically significant.)
Which of the following is likely to produce the largest reliability coefficient for a newly developed achievement test?
A. unrestricted range of scores and homogeneous content of test items
B. unrestricted range of scores and heterogeneous content of test items
C. restricted range of scores and homogeneous content of test items
D. restricted range of scores and heterogeneous content of test items
Answer A is correct. Two factors that affect the size of a test’s reliability coefficient are the range of test scores and the homogeneity of the test’s content: All other things being equal, a test with an unrestricted range of test scores and homogeneous items will produce a larger reliability coefficient than will a test with a restricted range of scores and heterogeneous items. For example, a 50-item test that measures knowledge of neuropsychology and contains items that range from easy to very difficult can be expected to have a higher reliability coefficient than will a 50-item test that measures knowledge of neuropsychology, psychopathology, and clinical psychology and contains items that range only from difficult to very difficult.
When using Bayes’ theorem:
A. the likelihood function is derived from a synthesis of the prior and posterior.
B. the prior is derived from a synthesis of the posterior and likelihood function.
C. the posterior is derived from a synthesis of the prior and likelihood function.
D. the prevailing is derived from a synthesis of the prior, likelihood function, and posterior.
Answer C is correct. Bayes’ theorem combines the prior probability distribution for the target parameter and the probability distribution for the parameter derived from current data (the likelihood function) to obtain a posterior (updated) probability distribution for the parameter. [There is no “prevailing” (answer D) in Bayes’ theorem.]
The central limit theorem predicts that, regardless of the shape of the population distribution of scores, a sampling distribution of means increasingly approaches the shape of:
A. the population distribution as the number of samples increases.
B. the population distribution as the sample size increases.
C. a normal distribution as the number of samples increases.
D. a normal distribution as the sample size increases.
Answer D is correct. According to the central limit theorem, the shape of the sampling distribution of means increasingly approaches a normal shape as the sample size increases, regardless of the shape of the population distribution of scores. (The central limit theorem assumes an infinite number of equal-sized samples, and its prediction about the shape of the sampling distribution is based on the size of the samples.)
A researcher would use the split-plot ANOVA to analyze the data she collected in her research study when:
A. she wants to statistically remove the effects of an extraneous variable on the dependent variable.
B. she wants to assess the effects of one independent variable on three dependent variables that are all measured on an interval or ratio scale.
C. her study included one between-subjects independent variable and one within-subjects independent variable.
D. her study included an extraneous variable that was treated like an independent variable.
Answer C is correct. Knowing that the split-plot ANOVA is also known as the mixed ANOVA may have helped you identify the correct answer to this question: It is used when data are collected from a study that used a mixed design – i.e., that had at least one between-subjects variable and one within-subjects variable.
When conducting a one-way ANOVA, an F-ratio is calculated by dividing the mean square between (MSB) by the mean square within (MSW). The mean square between provides an estimate of variability in dependent variable scores due to:
A. treatment effects only.
B. error only.
C. treatment effects plus error.
D. treatment effects minus error.
Answer C is correct. MSB is a measure of variability due to a combination of treatment effects plus error, while MSW is a measure of variability due to error only. Dividing MSB by MSW produces the F-ratio, which provides an estimate of treatment effects.
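To make the two mean squares concrete, here is a small sketch that computes MSB, MSW, and the F-ratio by hand for a one-way design. The three groups of scores are made-up values and the function name is illustrative.

```python
def one_way_f_ratio(groups):
    """Return (MSB, MSW, F) for a list of groups of scores."""
    all_scores = [score for group in groups for score in group]
    grand_mean = sum(all_scores) / len(all_scores)
    group_means = [sum(group) / len(group) for group in groups]

    # Between-groups sum of squares: group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-groups sum of squares: scores around their own group mean
    ss_within = sum((score - m) ** 2 for g, m in zip(groups, group_means) for score in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return ms_between, ms_within, ms_between / ms_within

# Hypothetical scores for three treatment groups
msb, msw, f_ratio = one_way_f_ratio([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(msb, 2), round(msw, 2), round(f_ratio, 2))
```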
An advantage of conducting a single one-way ANOVA rather than separate t-tests when a study includes one independent variable with three or more levels is that the ANOVA:
A. provides information on both main and interaction effects.
B. controls the experimentwise error rate.
C. reduces the effects of measurement error.
D. controls the effects of an extraneous variable.
Answer B is correct. The experimentwise error rate is the probability of making a Type I error when multiple statistical comparisons are made within a single research study. If an independent variable has three or more levels, several t-tests would have to be conducted because the t-test compares only two means at a time, and this would increase the experimentwise error rate. When using the one-way ANOVA, all possible comparisons between means are made in a way that maintains the experimentwise error rate at the alpha level set by the researcher.
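The card does not give a formula for how quickly the error rate grows. A common approximation, which treats the comparisons as independent (an assumption that pairwise t-tests on the same groups do not strictly meet), is 1 minus (1 minus alpha) raised to the number of comparisons. A sketch under that assumption:

```python
def pairwise_comparisons(levels):
    """Number of two-mean comparisons among the levels of one independent variable."""
    return levels * (levels - 1) // 2

def experimentwise_error_rate(alpha, comparisons):
    """Approximate probability of at least one Type I error, assuming independent comparisons."""
    return 1 - (1 - alpha) ** comparisons

for levels in (3, 4, 5):
    c = pairwise_comparisons(levels)
    print(levels, c, round(experimentwise_error_rate(0.05, c), 3))
```

With three levels there are three pairwise comparisons, and the approximate error rate is already well above the .05 set for each individual test.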
Dr. Haar is concerned that the statistics tests she uses for her introductory statistics class are too difficult since so few students pass them. To make her tests a little easier, she will want to remove some items that have an item difficulty index (p) of ________ and add some items that have an item difficulty index of ________.
A. +1.0 and higher; -1.0 and lower
B. +.50 and higher; -.50 and lower
C. .85 and higher; .15 and lower
D. .15 and lower; .85 and higher
Answer D is correct. The item difficulty index (p) ranges from 0 to 1.0, with 0 indicating a very difficult item (none of the examinees answered it correctly) and 1.0 indicating a very easy item (all examinees answered it correctly). Therefore, to make the statistics tests easier, Dr. Haar will want to remove some of the very difficult items (e.g., those with a p value of .15 and lower) and add some easy items (e.g., those with a p value of .85 and higher).
The analysis of covariance (ANCOVA) is used to:
A. statistically remove the effects of an extraneous variable on the dependent variable.
B. measure the effects of an extraneous variable on the dependent variable by treating it as an independent variable.
C. simultaneously assess the effects of the independent variable on two or more dependent variables.
D. simultaneously assess the effects of two or more independent variables on a single dependent variable.
Answer A is correct. The ANCOVA is used to statistically remove the effects of an extraneous variable from scores on the dependent variable so that it’s easier to detect the effects of the independent variable on the dependent variable. When using the ANCOVA, the extraneous variable is the “covariate.”
The item discrimination index (D) ranges from:
A. 0 to +1.0.
B. 0 to 50.
C. -1.00 to +1.00.
D. -50 to +50.
Answer C is correct. The value of D ranges from -1.0 to +1.0. When D is +1.0, this indicates that all examinees in the high-scoring group answered the item correctly and all examinees in the low-scoring group answered it incorrectly. Conversely, when D is -1.0, this indicates that all examinees in the low-scoring group answered the item correctly and all examinees in the high-scoring group answered it incorrectly.
A job applicant’s score on a selection test is used to predict what her future score on a measure of job performance will be if she’s hired. If the applicant’s predicted job performance score is 80 and the measure of job performance has a standard deviation of 7 and standard error of estimate of 3, the 99% confidence interval for the applicant’s predicted score of 80 is:
A. 73 to 87.
B. 66 to 94.
C. 74 to 86.
D. 71 to 89.
Answer D is correct. The 99% confidence interval for a predicted score is calculated by adding and subtracting three standard errors of estimate to and from the predicted score. In this situation, the applicant’s predicted score is 80 and the standard error of estimate is 3, so the 99% confidence interval is 80 minus and plus 9 (three standard errors), which is 71 to 89.
To help ensure that differential selection doesn’t threaten the internal validity of a research study, an investigator will:
A. include more than one group in the research study.
B. use the single- or double-blind technique.
C. randomly select participants from the population.
D. randomly assign participants to different levels of the independent variable.
Answer D is correct. It should have been easy to identify the correct answer to this question as long as you remembered that the name of this threat is misleading since it is the result of the way subjects are assigned to treatment groups rather than how they are selected from the population. It threatens a study’s internal validity when subjects in different groups differ in an important way at the beginning of the study, and the best way to control it is to randomly assign subjects to the treatment groups.
A researcher wants to evaluate the effects of virtual reality exposure for treating the storm, height, and spider phobias of a 34-year-old woman. The best single-subject research design for evaluating this treatment is which of the following?
A. multiple baseline
B. reversal
C. discrete trials
D. time series
Answer A is correct. Of the three single-subject designs listed in the answers (multiple baseline, reversal, and discrete trials), the multiple baseline design would be the most appropriate because it would allow the researcher to determine if the treatment is effective for any of the woman’s phobias by sequentially applying the treatment to them. In addition, the multiple baseline design doesn’t require a treatment to be withdrawn once it’s been applied to a behavior. Consequently, if the treatment has a beneficial effect on any of the woman’s phobias, the researcher would not have to withdraw the treatment during the course of the study just for the sake of assessing its effects.
When a test has a standard deviation of 10, the test’s standard error of measurement will fall between:
A. 0 and 10
B. 10 and 1.0
C. 0 and 1.0
D. -1.0 and +1.0
Answer A is correct. A test’s standard error of measurement equals its standard deviation times the square root of 1 minus the reliability coefficient. A test’s reliability coefficient can range from 0 to 1.0, so the standard error of measurement for a test that has a standard deviation of 10 ranges from 0 when the reliability coefficient is 1.0 (10 times the square root of 1 minus 1 equals 0) to 10 when the reliability coefficient is 0 (10 times the square root of 1 minus 0 equals 10).
Dr. Osprey conducted a study to evaluate the effects of an anti-drug program on attitudes toward drug use for middle school students from low-income families. Subjects were 662 7th and 8th graders attending an inner-city school. Attitudes toward drugs were assessed in the second week of September, the five-hour anti-drug program was administered in five one-hour sessions during the second week of October, and attitudes toward drugs were then re-assessed in the second week of November. Results indicated that the program significantly increased negative attitudes toward drug use. The biggest threat to the internal validity of this study is which of the following?
A. statistical regression
B. selection
C. reactivity
D. history
Answer D is correct. History threatens a study’s internal validity whenever an external event that occurs during the course of the study has a systematic effect on subjects’ scores or status on the dependent variable(s). Given that this study includes only one group of subjects and takes place over a two-month period of time, the biggest threat to its internal validity is history. For example, many of the students may have changed their attitudes not because of the anti-drug program but because they were exposed to information about drug use during the two months in a biology or health class or a TV or billboard ad campaign.
Which of the following is a type of counterbalanced design?
A. Solomon four-group
B. Latin square
C. factorial
D. multiple-baseline
Answer B is correct. Counterbalancing is used to control order effects that may occur when a within-subjects design is used – i.e., when subjects in each group will receive or participate in all levels of the independent variable. The Latin square is a type of counterbalanced design that ensures that the different levels of the independent variable are assigned to the groups of subjects so that each level appears an equal number of times in each ordinal position.
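One simple way to build such a design is to rotate the list of levels by one position for each group. This cyclic construction is only one of several ways to generate a Latin square, and the condition labels below are hypothetical.

```python
def latin_square(levels):
    """Cyclic Latin square: row i is the list of levels rotated left by i positions."""
    n = len(levels)
    return [[levels[(row + col) % n] for col in range(n)] for row in range(n)]

# Each row is the order in which one group receives the conditions
for order in latin_square(["A", "B", "C"]):
    print(order)
```

Each level appears exactly once in each row (group) and once in each column (ordinal position).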
The test manual for an academic achievement test indicates that it has an alternate forms reliability coefficient of .80. This means that _____ of variability in test scores is true score variability.
A. 80%
B. 64%
C. 36%
D. 20%
Answer A is correct. Reliability coefficients are interpreted directly as the percent of variability in test scores that is due to true score variability. When the reliability coefficient is .80, this means that 80% of variability in scores is due to true score variability and 20% is due to measurement error.
A company’s current selection procedure for computer programmers consists of seven predictors that are used to predict the job performance score that a job applicant will receive six months after being hired. The owner of the company wants to reduce the costs and time required to make selection decisions. Which of the following would be most useful for determining the fewest number of predictors needed to make accurate predictions about applicants’ job performance scores?
A. linear regression analysis
B. discriminant function analysis
C. stepwise multiple regression
D. factor analysis
Answer C is correct. Multiple regression is used to predict a person’s score on a single criterion (e.g., job performance measure) using two or more predictors.
Stepwise multiple regression is a type of multiple regression that’s used to identify the fewest number of predictors needed to make an accurate prediction.
Which of the following describes the relationship between a test’s reliability coefficient and its criterion-related validity coefficient?
A. A test’s criterion-related validity coefficient can be no greater than its reliability coefficient.
B. A test’s criterion-related validity coefficient can be no greater than the square root of its reliability coefficient.
C. A test’s criterion-related validity coefficient can be no greater than the square root of one minus its reliability coefficient.
D. A test’s criterion-related validity coefficient can be no greater than the square of its reliability coefficient.
Answer B is correct. A test’s criterion-related validity coefficient can be no greater than the square root of its reliability coefficient. For example, if a test has a reliability coefficient of .81, its criterion-related validity coefficient can be no greater than the square root of .81, which is .90.
Which of the following is a culture-reduced measure of fluid intelligence?
A. Kuhlmann-Anderson
B. Raven’s Progressive Matrices
C. Woodcock-Johnson
D. Slosson Intelligence Test
Answer B is correct. The Raven’s Progressive Matrices (RPM) tests are measures of fluid intelligence and are considered to be culture-reduced because they do not use language and performance does not depend on specific cultural or academic learning. There are three RPM tests: Standard Progressive Matrices, Colored Progressive Matrices, and Advanced Progressive Matrices.
Which of the following is the appropriate bivariate correlation coefficient to use when the scores to be correlated are both reported as ranks?
A. Spearman
B. Pearson
C. biserial
D. point biserial
Answer A is correct. The full name of the Spearman correlation coefficient is the Spearman rank-order correlation coefficient. As its name suggests, it’s used to correlate scores on two variables that are reported as ranks.
You obtain the data you need from a sample of licensed psychologists to calculate a correlation coefficient for their EPPP score and yearly salary five years after taking the exam. If you square the correlation coefficient, you will obtain a(n) _______________, which indicates the amount of variability in yearly salary that’s accounted for by EPPP score.
A. coefficient of concordance
B. coefficient of determination
C. kappa coefficient
D. eta coefficient
Answer B is correct. Squaring a correlation coefficient produces a coefficient of determination, which is a measure of shared variability – or, put another way, a measure of the amount of variability in one variable that is accounted for by variability in another variable.
A psychologist is planning a research study to evaluate the effects of a two-hour online lecture on statistics for improving the statistics knowledge of 35 psychologists who have just started studying for the EPPP. All participants will (1) take a pre-test consisting of 50 multiple-choice statistics questions on Monday, (2) attend the online lecture on Wednesday evening, and (3) take a post-test consisting of 50 multiple-choice statistics questions that are equivalent to the pre-test questions on Friday. To analyze the data she obtains in her study, the researcher will use which of the following?
A. t-test for a single sample
B. t-test for correlated samples
C. two-way ANOVA
D. single-sample chi-square test
Answer B is correct. The first steps in identifying the appropriate statistical test are to identify the independent and dependent variables and the scale of measurement of the dependent variable. This study’s independent variable is the lecture on statistics and the dependent variable is statistics test score. The dependent variable is measured on a ratio scale, which means that the statistical test will be used to compare the mean scores obtained by the psychologists on the pre- and post-tests. The t-test and ANOVA are both used to compare mean scores, but because there are only two means, the t-test is the appropriate test. To determine which t-test to use, you determine how the means will be obtained: In this study they will be obtained from a single group of subjects, and the t-test for correlated samples is used when two means are obtained from the same group or from two groups that are related in some way.
A test developer would use the multitrait-multimethod matrix to evaluate a test’s:
A. incremental validity.
B. criterion-related validity.
C. construct validity.
D. differential validity.
Answer C is correct. The multitrait-multimethod matrix is one method for evaluating a test’s construct validity and is important for tests that are designed to assess a hypothetical trait (construct). When using the multitrait-multimethod matrix, the test being validated is administered to a sample of examinees along with tests known to measure the same or a related trait and tests known to measure unrelated traits. When scores on the test being validated have high correlations with scores on tests that measure the same or a related trait, this provides evidence of the test’s convergent validity. And, when scores on the test have low correlations with scores on tests that measure unrelated traits, this provides evidence of the test’s divergent validity. Adequate convergent and divergent validity provide evidence of the test’s construct validity.
A problem with using percent agreement as a measure of inter-rater reliability is that it may:
A. underestimate reliability because it’s susceptible to rater biases.
B. overestimate reliability because it’s susceptible to rater biases.
C. underestimate reliability because it’s affected by chance agreement.
D. overestimate reliability because it’s affected by chance agreement.
Answer D is correct. A certain amount of chance agreement between two or more raters is possible, especially for behavior observation scales when the behavior occurs frequently. Percent agreement is easy to calculate but, because it’s affected by chance agreement, it may overestimate a measure’s inter-rater reliability.
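To see how chance agreement can inflate percent agreement, the sketch below compares it with Cohen's kappa, one commonly used chance-corrected statistic. Kappa is brought in here for illustration and is not named in the explanation above. The ratings are made-up values for a behavior that occurs frequently.

```python
def percent_agreement(rater_a, rater_b):
    """Proportion of observations on which the two raters gave the same rating."""
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: product of the raters' marginal proportions, summed over categories
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings (1 = behavior observed, 0 = not observed)
rater_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
rater_b = [1, 1, 1, 1, 1, 1, 1, 0, 1, 0]
print(round(percent_agreement(rater_a, rater_b), 3))
print(round(cohens_kappa(rater_a, rater_b), 3))
```

In this example the raters agree on 8 of 10 observations, but most of that agreement would be expected by chance because both raters mark the behavior as present 80% of the time.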
The Kuder-Richardson Formula 20 (KR-20) can be used to estimate a test’s ____________ reliability when test items are scored dichotomously.
A. alternate forms
B. internal consistency
C. test-retest
D. inter-rater
Answer B is correct. KR-20 is a variation of coefficient alpha that can be used to evaluate a test’s internal consistency reliability when test items are scored dichotomously (e.g., as correct or incorrect).
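The card does not state the formula. A minimal sketch, assuming the usual form of KR-20 (k/(k - 1) times 1 minus the sum of each item's p times q divided by the variance of total scores), is given below. It uses the population variance of total scores, which is one convention, and the response matrix is hypothetical.

```python
def kr20(responses):
    """KR-20 for a list of examinees, each a list of item scores (1 = correct, 0 = incorrect)."""
    n_examinees = len(responses)
    k = len(responses[0])

    # Sum of p * q across items, where p is the proportion answering the item correctly
    sum_pq = 0.0
    for item in range(k):
        p = sum(person[item] for person in responses) / n_examinees
        sum_pq += p * (1 - p)

    # Population variance of total scores (assumed convention)
    totals = [sum(person) for person in responses]
    mean_total = sum(totals) / n_examinees
    variance = sum((t - mean_total) ** 2 for t in totals) / n_examinees

    return (k / (k - 1)) * (1 - sum_pq / variance)

# Hypothetical responses of five examinees to four items
data = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(data), 2))
```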
In the context of psychological assessment, the terms “floor” and “ceiling” refer to:
A. the lowest and highest true scores an examinee is likely to have, given his or her obtained predictor score.
B. the lowest and highest scores an examinee is likely to obtain on a criterion, given his or her predictor score.
C. the degree to which a test can discriminate among examinees who have very low levels or very high levels of the characteristic measured by the test.
D. the degree to which a test accurately predicts the criterion scores of examinees who obtain very low scores or very high scores on the test.
Answer C is correct. A test has limited floor when it cannot discriminate well among examinees who have a low level of the characteristic measured by the test because the test does not include a sufficient number of easy items. In contrast, a test has limited ceiling when it cannot discriminate well among examinees who have a high level of the characteristic measured by the test because it does not include a sufficient number of difficult items.
In the context of factor analysis, “oblique” means:
A. statistically significant.
B. statistically insignificant.
C. uncorrelated.
D. correlated.
Answer D is correct. The factors extracted (identified) in a factor analysis can be either orthogonal or oblique. Orthogonal factors are uncorrelated, while oblique factors are correlated.
The standard error of the mean increases in size as the:
A. population standard deviation and sample size decrease.
B. population standard deviation and sample size increase.
C. population standard deviation increases and sample size decreases.
D. population standard deviation decreases and sample size increases.
Answer C is correct. The standard error of the mean is the standard deviation of the sampling distribution of the mean and is used to determine how well a sample mean estimates a population mean.
It’s calculated by dividing the population standard deviation by the square root of the sample size, and it increases as the population standard deviation increases and the sample size decreases, and vice versa.
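A quick sketch shows both effects. The standard deviations and sample sizes are made-up values.

```python
import math

def standard_error_of_mean(population_sd, sample_size):
    """Population standard deviation divided by the square root of the sample size."""
    return population_sd / math.sqrt(sample_size)

print(standard_error_of_mean(10, 25))   # baseline
print(standard_error_of_mean(20, 25))   # larger standard deviation, larger standard error
print(standard_error_of_mean(10, 100))  # larger sample, smaller standard error
```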
When two variables are measured on an interval or ratio scale and their relationship is nonlinear, you would use which of the following correlation coefficients to assess their degree of association?
A. Spearman rho
B. contingency
C. eta
D. biserial
Answer C is correct.
An assumption that must be met for most bivariate correlation coefficients is that there’s a linear relationship between the two variables that will be correlated. An exception is eta, which is used when both variables are measured on a continuous (interval or ratio) scale and the relationship between the variables is nonlinear.
Which of the following is considered the most effective way to control extraneous variables?
A. random selection of subjects from the population
B. random assignment of subjects to the different treatment groups
C. statistically removing their effects from the independent variable
D. statistically removing their effects from the dependent variable
Answer B is correct. An extraneous variable is a variable that has not been designated as an independent or dependent variable in a research study but affects the study’s results. A researcher always wants to control extraneous variables so that the effects of an independent variable on the dependent variable can be detected. Methods used to control extraneous variables include random assignment of subjects to treatment groups, treating the extraneous variable as an independent variable, and statistically removing the effects of the extraneous variable.
Of these, random assignment of subjects is considered the most effective method because it helps ensure that groups are initially equivalent in terms of all extraneous variables, even those that are unknown.
You would use which of the following to construct a confidence interval around an examinee’s predicted criterion score?
A. regression equation
B. multiple regression equation
C. standard error of measurement
D. standard error of estimate
Answer D is correct. The standard error of estimate indicates the amount of error that can be expected when an examinee’s predictor score is used to predict his or her score on a criterion, and it is used to construct a confidence interval around the predicted criterion score.
The standard error of measurement (answer C) indicates the amount of error that can be expected in an examinee’s obtained (rather than predicted) score and is used to construct a confidence interval around the obtained score.
To estimate the effect of shortening or lengthening a test on the test’s reliability coefficient, you would use which of the following?
A. coefficient of determination
B. coefficient alpha
C. Spearman-Brown formula
D. Kuder-Richardson formula 20
Answer C is correct. The Spearman-Brown formula is also known as the Spearman-Brown prophecy formula and is used to estimate the effect of adding or subtracting items to a test on the test’s reliability coefficient.
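The card does not spell out the formula, so the sketch below uses the commonly cited form, in which k is the factor by which the test length changes. The function name and the example reliability of .60 are illustrative assumptions.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Estimated reliability after changing test length by length_factor (k).

    Commonly cited form: k * r / (1 + (k - 1) * r).
    """
    k = length_factor
    return (k * reliability) / (1 + (k - 1) * reliability)

# Hypothetical test with a reliability of .60.
print(round(spearman_brown(0.60, 2.0), 2))   # doubling the test length -> 0.75
print(round(spearman_brown(0.60, 0.5), 2))   # halving the test length -> 0.43
```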
When using the A-B-A-B single-subject design:
A. there are two no-treatment phases and two treatment phases with the same treatment being applied in both treatment phases.
B. there are two no-treatment phases and two treatment phases with a different treatment being applied in each treatment phase.
C. there are four treatment phases with two different treatments each being applied twice.
D. there are four treatment phases with four different treatments each being applied once.
Answer A is correct. In the context of single-subject designs, the letter “A” always designates a no-treatment (baseline) phase and the other letters designate a treatment phase. When the other letter is the same – as in the A-B-A-B design – this indicates that the same treatment (B) is applied twice. In contrast, when the other letters are different – as in the A-B-A-C design – this indicates that two different treatments (B and C) are each applied once.
You have developed a battery of tests to determine which of five vocational training programs is most appropriate for unemployed young adults who dropped out of high school. Which of the following multivariate techniques will be useful in this situation?
A. multiple regression
B. multivariate analysis of variance
C. canonical correlation
D. discriminant function analysis
Answer D is correct. Discriminant function analysis is also known as discriminant analysis and is the appropriate technique when two or more predictors will be used to categorize people into one of two or more criterion groups – e.g., to use a person’s scores on two or more tests to determine which of five vocational training programs is the best one for him or her.
According to classical test theory, variability in test scores is due to a combination of:
A. true score variability and random error.
B. true score variability and systematic error.
C. observed variability and divergent error.
D. observed variability and convergent error.
Answer A is correct. Classical test theory describes observed variability in test scores as being the result of a combination of true score variability (variability in what the test is measuring) and measurement error (variability due to random error).
Parametric statistical tests are more “powerful” than nonparametric tests which means that, when using a parametric test, you’re more likely to:
A. retain a true null hypothesis.
B. reject a true null hypothesis.
C. retain a false null hypothesis.
D. reject a false null hypothesis.
Answer D is correct. In the context of inferential statistics, power is also known as statistical power and refers to the ability to detect (and reject) a false null hypothesis. Several factors affect power including the type of statistical test that’s used: Parametric tests (e.g., t-test and analysis of variance) are more powerful than nonparametric tests (e.g., chi-square test) because of the type of data that are analyzed by parametric tests and because of the assumptions that must be met to use them.
To compare an obtained sample mean to a known population mean, you would use which of the following?
A. ANCOVA
B. one-way ANOVA
C. t-test for a single sample
D. single sample chi-square test
Answer C is correct. The t-test is always used to compare two means and the appropriate t-test depends on how the means were obtained. When an obtained sample mean will be compared to a known population mean, the t-test for a single sample is the appropriate t-test. In this situation, the sample is the treatment group and the population is being used as the control (no treatment) group.
To evaluate the test-retest reliability of a newly developed measure of intelligence, a test developer administers the test to the same sample of examinees on two separate occasions. When he correlates the two sets of scores, he obtains a reliability coefficient of .60. To increase this reliability coefficient, the test developer should:
A. increase the number of test items and make sure the new sample of examinees is heterogeneous with regard to level of intelligence.
B. increase the number of test items and make sure the new sample of examinees is homogeneous with regard to level of intelligence.
C. decrease the number of test items and make sure the new sample of examinees is heterogeneous with regard to level of intelligence.
D. decrease the number of test items and make sure the new sample of examinees is homogeneous with regard to level of intelligence.
Answer A is correct. A test’s reliability coefficient is affected by several factors including the length of the test and the degree of similarity of examinees with regard to the attribute(s) measured by the test: In general, longer tests are more reliable than shorter tests and reliability coefficients are larger when they’re derived from a sample that has an unrestricted range of scores – i.e., when examinees in the sample are heterogeneous with regard to the attribute(s) measured by the test.
Community-based participatory research (CBPR) is best described as a type of:
A. historical research.
B. action research.
C. ethnographic research.
D. developmental research.
Answer B is correct. CBPR is a type of community-engaged research that “encourages engagement and full participation of community partners in every aspect of the research process from question identification to analysis and dissemination” [K. Hacker, Community-based participatory research, Los Angeles, SAGE Publications, Inc., 2013, p. 2]. It’s categorized as a type of action research, which was originally developed by Lewin in the 1940s as a way of using research to facilitate planned social change. Action research subsequently became the basis for a number of participatory action research approaches including CBPR. Like other forms of action research, CBPR involves formulating a research question, planning the study, collecting and analyzing the data, developing action plans from the data, carrying out the action plan, evaluating the results, and disseminating the results.
The item difficulty index ranges from __________, with 0 indicating a __________.
A. 0 to +1.0; very difficult item
B. 0 to +1.0; very easy item
C. -10 to +10; moderately difficult item
D. -1.0 to +1.0; moderately difficult item
Answer A is correct. The item difficulty index (p) ranges in value from 0 to +1.0, with 0 indicating that none of the examinees answered the item correctly (i.e., that the item is very difficult) and +1.0 indicating that all of the examinees answered the item correctly (i.e., that the item is very easy).
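As a small illustration, the index can be computed as the proportion of examinees who answered the item correctly. The function name and the response data are made up for this sketch.

```python
def item_difficulty(responses):
    """Proportion of examinees answering correctly (1 = correct, 0 = incorrect)."""
    return sum(responses) / len(responses)

# Hypothetical responses from ten examinees to a single item.
responses = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
print(item_difficulty(responses))  # 0.6
```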
In a factor matrix, a communality indicates the proportion of variability:
A. in multiple tests that’s accounted for by a single identified factor.
B. in multiple tests that’s accounted for by all of the identified factors.
C. in a single test that’s accounted for by a single identified factor.
D. in a single test that’s accounted for by all of the identified factors.
Answer D is correct. To identify the correct answer to this question, you have to know that a communality indicates the amount of variability in each test included in the factor analysis that’s explained by all of the identified factors. Alternatively, knowing that each test in a factor matrix has its own communality would have helped you eliminate answers A and B. Then, knowing that it’s a factor loading (not a communality) that indicates the proportion of variability in a single test that’s accounted for by a single factor would have helped you eliminate answer C.
To evaluate the inter-rater reliability of a test when scores or ratings on the test represent a nominal scale of measurement, you would use which of the following?
A. coefficient alpha
B. kappa coefficient
C. KR-20
D. Spearman-Brown
Answer B is correct. The kappa coefficient is also known as Cohen’s kappa statistic and is used to measure inter-rater reliability when scores or ratings represent a nominal scale of measurement. An advantage of the kappa coefficient as a measure of inter-rater reliability is that, unlike percent agreement, the kappa coefficient corrects for chance agreement between the raters.
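To show what "corrects for chance agreement" means in practice, here is a minimal sketch of the usual kappa calculation: observed agreement minus chance agreement, divided by one minus chance agreement. The function name and the ratings are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning nominal categories to the same cases."""
    n = len(rater_a)
    # Proportion of cases on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Agreement expected by chance, from each rater's marginal proportions.
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical diagnoses from two raters for ten clients.
rater_a = ["yes"] * 5 + ["no"] * 5
rater_b = ["yes", "yes", "yes", "yes", "no", "no", "no", "no", "no", "yes"]
print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.6
```

In this example percent agreement is 80%, but kappa is lower (.60) because half of that agreement would be expected by chance.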
The results of a factor analysis indicate that a test has a correlation coefficient of .20 with Factor I, .35 with Factor II, and .60 with Factor III. The correlation of .60 indicates that ____% of variability in test scores is explained by Factor III.
A. 60
B. 40
C. 36
D. 64
Answer C is correct. Factor loadings are interpreted like other correlation coefficients for two different measures and are squared to obtain a measure of shared variability. When the correlation between a test and a factor is .60, this means that 36% (.60 squared) of variability in test scores is explained by variability in the factor.
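The arithmetic for all three loadings in this question can be sketched as follows. The sum of the squared loadings is shown as the test's communality, which holds only under the assumption that the factors are orthogonal; that assumption and the variable names are not part of the original card.

```python
# Factor loadings from the question (correlations between the test and each factor).
loadings = {"Factor I": 0.20, "Factor II": 0.35, "Factor III": 0.60}

# Squaring a loading gives the proportion of test variability explained by that factor.
explained = {factor: loading ** 2 for factor, loading in loadings.items()}
for factor, proportion in explained.items():
    print(factor, f"{proportion:.0%}")

# Assumption: orthogonal factors, so squared loadings can be summed into a communality.
communality = sum(explained.values())
print("communality", round(communality, 4))
```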
When an occupational interest test provides ipsative scores, this means that an examinee’s scores indicate:
A. the degree of consistency of his/her interests.
B. his/her likelihood of success in different occupations.
C. the relative strength of each occupational interest assessed by the test.
D. the absolute strength of each occupational interest assessed by the test.
Answer C is correct. Ipsative scores are also known as intraindividual scores and provide information on the examinee’s relative (rather than absolute) strengths with regard to the interests or other characteristic measured by the test.
To calculate the standard error of measurement for a newly developed test, you need which of the following?
A. the test’s mean and reliability coefficient
B. the test’s standard deviation and the sample size
C. the test’s standard deviation and validity coefficient
D. the test’s standard deviation and reliability coefficient
Answer D is correct. A test’s standard error of measurement is used to construct a confidence interval around an examinee’s obtained score. It’s calculated by multiplying the test’s standard deviation times the square root of 1 minus its reliability coefficient.
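The calculation described in this answer can be sketched in a few lines of Python. The function name and the example values (a standard deviation of 15 and a reliability coefficient of .91) are illustrative assumptions.

```python
import math

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Test standard deviation times the square root of (1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

# Hypothetical test: SD = 15, reliability coefficient = .91.
print(round(standard_error_of_measurement(15, 0.91), 2))  # 4.5
```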
A test’s __________ is calculated by dividing the number of true positives by the number of true positives plus false negatives.
A. positive predictive value
B. negative predictive value
C. specificity
D. sensitivity
Answer D is correct. A test’s sensitivity refers to its ability to accurately identify the people who have the disorder or other attribute the test was designed to measure, and it’s calculated using the method described in this question – i.e., sensitivity = TP/(TP + FN) where TP is the number of true positives and FN is the number of false negatives.
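For comparison with the other answer options, the sketch below computes all four indices from the cells of a 2x2 classification table. The sensitivity formula comes from the card; the other three are the standard definitions, and the counts are made up.

```python
def diagnostic_indices(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Sensitivity, specificity, and predictive values from a 2x2 classification table."""
    return {
        "sensitivity": tp / (tp + fn),                # true positives among those with the disorder
        "specificity": tn / (tn + fp),                # true negatives among those without it
        "positive_predictive_value": tp / (tp + fp),  # true positives among positive test results
        "negative_predictive_value": tn / (tn + fn),  # true negatives among negative test results
    }

# Hypothetical counts: 40 true positives, 20 false positives, 80 true negatives, 10 false negatives.
for name, value in diagnostic_indices(tp=40, fp=20, tn=80, fn=10).items():
    print(name, round(value, 3))
```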
The main characteristic that distinguishes true experimental research from quasi-experimental research is that, when conducting a true experimental research study:
A. the results of the study can be analyzed using a parametric statistical test.
B. the results of the study can be analyzed using a parametric or nonparametric statistical test.
C. subjects are randomly assigned to treatment groups.
D. subjects are randomly selected from the population.
Answer C is correct. True experimental research is distinguished from quasi-experimental research by two main characteristics. The researcher can (a) manipulate the independent variable(s) – i.e., decide which groups receive different levels of the variable(s) and (b) randomly assign subjects to the different groups.
The numerator of the F-ratio produced by a one-way ANOVA is a measure of variability in dependent variable scores that’s due to:
A. treatment effects only.
B. error only.
C. treatment effects plus error.
D. treatment effects minus error.
Answer C is correct. The F-ratio is calculated by dividing the mean square between (MSB) by the mean square within (MSW). MSB is a measure of variability due to a combination of treatment effects plus error, while MSW is a measure of variability due to error only. When MSB is divided by MSW, this produces the F-ratio which provides an estimate of treatment effects.
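A minimal sketch of how MSB and MSW combine into the F-ratio is shown below. The function name and the three small groups of scores are hypothetical, and the code covers only the one-way, between-subjects case.

```python
from statistics import mean

def one_way_f_ratio(groups):
    """F-ratio for a one-way ANOVA: mean square between divided by mean square within."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n_total = len(groups), len(all_scores)
    # Between-groups variability reflects treatment effects plus error.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups variability reflects error only.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within

# Hypothetical scores for three treatment groups.
print(round(one_way_f_ratio([[1, 2, 3], [2, 3, 4], [5, 6, 7]]), 2))  # 13.0
```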
In a normal distribution, which of the following represents the lowest score?
A. T score = 40
B. z score = 1.0
C. percentile rank = 84
D. stanine = 5
Answer A is correct. In a normal distribution, a stanine score of 5 is equivalent to raw scores that equal the mean or are slightly above or below the mean, a z-score of 1.0 and a percentile rank of 84 are equivalent to the raw score that’s one standard deviation above the mean, and a T-score of 40 is equivalent to the raw score that’s one standard deviation below the mean.
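One way to compare the four options is to put them all on the z-score scale, as in the sketch below. It assumes the conventional T-score scale (mean 50, SD 10) and treats stanines as having a mean of 5 and an SD of about 2, which is an approximation. The function names are hypothetical.

```python
from statistics import NormalDist

def z_from_t(t_score: float) -> float:
    """T scores have a mean of 50 and a standard deviation of 10."""
    return (t_score - 50) / 10

def z_from_percentile(percentile_rank: float) -> float:
    """z score with the given percentage of a normal distribution below it."""
    return NormalDist().inv_cdf(percentile_rank / 100)

def z_from_stanine(stanine: float) -> float:
    """Approximation: stanines have a mean of 5 and an SD of about 2."""
    return (stanine - 5) / 2

options = {
    "T score = 40": z_from_t(40),
    "z score = 1.0": 1.0,
    "percentile rank = 84": z_from_percentile(84),
    "stanine = 5": z_from_stanine(5),
}
lowest = min(options, key=options.get)
print(lowest)  # T score = 40
```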
A researcher conducts a study to determine if there are gender differences in acceptance as a graduate student into the six largest departments at a university. To analyze the data she collects in this study, the researcher will use which of the following?
A. one-way ANOVA
B. two-way ANOVA
C. single-sample chi-square test
D. multiple-sample chi-square test
Answer D is correct. The first and second steps in identifying the appropriate statistical test are identifying the study’s independent and dependent variables and the scale of measurement of the dependent variable. This study has two variables – gender and department – and gender can be viewed as the independent variable and department as the dependent variable. The dependent variable – department – is a nominal variable. The chi-square test is the appropriate test for analyzing nominal data and, when a study has two or more variables, the multiple-sample chi-square test is used. Keep in mind that, for the chi-square test, you count the total number of variables regardless of whether they’re independent or dependent variables: The multiple-sample chi-square test is used when the study has two or more variables, and the single-sample chi-square test is used when a study is a descriptive study and has only one variable.
When using the multitrait-multimethod matrix, a small monotrait-heteromethod coefficient indicates that the test:
A. lacks adequate convergent validity.
B. lacks adequate divergent validity.
C. has adequate incremental validity.
D. has adequate divergent validity.
Answer A is correct. The monotrait-heteromethod coefficient indicates the correlation between the test being evaluated for construct validity and a measure of the same trait (monotrait) using a different method of measurement (heteromethod). For example, if the multitrait-multimethod matrix is being used to assess the construct validity of a self-report measure of assertiveness, a monotrait-heteromethod coefficient might be the correlation between the self-report measure of assertiveness and a teacher report measure of assertiveness. If this coefficient is small, this indicates that the assertiveness test may lack adequate convergent validity because it doesn’t have a high correlation with a test it should correlate with. (A measure’s construct validity is demonstrated when it has adequate levels of both convergent and divergent validity.)
The Taylor-Russell tables are used to obtain an estimate of a predictor’s:
A. criterion-related validity.
B. incremental validity.
C. likelihood of causing adverse impact.
D. susceptibility to the effects of measurement error.
Answer B is correct. The Taylor-Russell tables are used to obtain an estimate of a predictor’s incremental validity for various combinations of criterion-related validity coefficients, base rates, and selection ratios.
When a predictor included in a multiple regression equation has a negative beta weight, this means that:
A. The predictor has a negative correlation with the other predictors.
B. The predictor has a negative correlation with the criterion.
C. The predictor has a statistically significant relationship with the criterion.
D. The predictor does not have a statistically significant relationship with the criterion.
Answer B is correct. Beta weights are standardized regression coefficients, and a predictor’s beta weight indicates the strength of the relationship between the predictor and the criterion. When a predictor’s beta weight is positive, this means there’s a positive relationship between the predictor and criterion (i.e., as scores on the predictor increase, scores on the criterion increase). Conversely, when the beta weight is negative, this means there’s a negative relationship between the predictor and criterion (i.e., as scores on the predictor increase, scores on the criterion decrease).
A job applicant’s score on a job knowledge test is used to predict what her future score on a measure of job performance will be if she’s hired. If the applicant’s predicted job performance score is 75, the measure of job performance has a standard deviation of 6, and the standard error of estimate is 4, the 95% confidence interval for the applicant’s predicted score of 75 is:
A. 69 to 81.
B. 63 to 87.
C. 71 to 79.
D. 67 to 83.
Answer D is correct. The 95% confidence interval for a predicted score is calculated by adding and subtracting two standard errors of estimate to and from the predicted score. In this situation, the applicant’s predicted score is 75 and the standard error of estimate is 4, so the 95% confidence interval is 75 minus and plus 8 (two standard errors), which is 67 to 83.
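The calculation in this answer can be sketched as follows. The function name is hypothetical, and the default multiplier of 2 follows the card's rule of thumb of two standard errors for a 95% interval. The criterion's standard deviation of 6 is not used.

```python
def predicted_score_interval(predicted, standard_error_of_estimate, multiplier=2):
    """Confidence interval around a predicted criterion score."""
    margin = multiplier * standard_error_of_estimate
    return predicted - margin, predicted + margin

# Values from the question: predicted score of 75, standard error of estimate of 4.
print(predicted_score_interval(75, 4))  # (67, 83)
```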
While developing a new test of job knowledge, a test developer administers the test items to a sample of 50 recently hired employees. Based on the results, he eliminates a few items and changes several other items to increase the test’s reliability and validity. When he readministers the test along with a measure of job performance to the same sample, he obtains a criterion-related validity coefficient of .65. If he then administers the test and measure of job performance to a different sample of 50 recently hired employees, he’s most likely to obtain a validity coefficient that is:
A. greater than .65.
B. less than .65.
C. greater or less than .65.
D. equal to .65.
Answer B is correct. To identify the correct answer to this question, you have to recognize that the test developer cross-validated the test when he administered it to a new sample and know that validity coefficients tend to be lower for the cross-validation sample than for the original sample. This is referred to as “shrinkage,” and it occurs because all the chance factors that contributed to the correlation between the job knowledge test (predictor) and measure of job performance (criterion) for the sample that was used to develop the test are not likely to be present in the cross-validation sample.
To determine if “test unfairness” is the reason why a selection test is having an adverse impact on older job applicants, you would:
A. compare the validity coefficients of the test for older and younger workers in the validation sample.
B. compare the selection test and job performance scores obtained by older and younger workers in the validation sample.
C. determine if the job performance scores obtained by older workers in the validation sample were affected by criterion contamination.
D. determine if the selection test scores obtained by older and younger workers in the validation sample were affected by rater biases.
Answer B is correct. As defined by the EEOC, “test unfairness” occurs when members of one group consistently obtain lower scores on a selection test or other employment procedure but the score difference is not reflected in differences in scores obtained by different groups on a measure of job performance.
In a positively skewed distribution of scores, the __________ is the lowest score and the __________ is the highest score.
A. mode; mean
B. mean; mode
C. median; mean
D. mean; median
Answer A is correct. Skewed distributions are asymmetrical with most scores “piled up” in one tail of the distribution and a few scores in the other tail: In a positively skewed distribution, the few scores are in the positive tail (the high end of the distribution); in a negatively skewed distribution, the few scores are in the negative tail (the low end of the distribution) – i.e., the “tail tells the tale.” In both distributions, the mean, median, and mode do not equal the same value: Instead, the mean is in the tail with the few scores, the median is in the middle, and the mode is in the tail containing most of the scores. Consequently, in a positively skewed distribution, the mean is the highest score, the median is the middle score, and the mode is the lowest score. Conversely, in a negatively skewed distribution, the mean is the lowest score, the median is the middle score, and the mode is the highest score.
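The ordering of the three measures can be seen in a small, made-up data set in which one high score forms the positive tail. The scores below are hypothetical.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores: most are low, one is far out in the high tail.
scores = [1, 2, 2, 2, 3, 3, 4, 10]

print("mode", mode(scores))      # 2
print("median", median(scores))  # 2.5
print("mean", mean(scores))      # 3.375
```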
A study conducted by Kaczynski, Lindahl, Malik, and Laurenceau (2006) confirmed their hypothesis that the relationship between marital conflict and child adjustment was due to the impact of marital conflict on parenting style which, in turn, impacted child adjustment. In this situation, parenting style is a(n) ________ variable.
A. moderator
B. mediator
C. extraneous
D. independent
Answer B is correct. Mediator variables are also known as intervening variables and explain the relationship between two other variables. Kaczynski et al. found that the relationship between marital conflict and child adjustment was explained (mediated) by parenting style. In other words, they found that parenting style was a mediating variable. [K. J. Kaczynski, K. M. Lindahl, N. M. Malik, and J. Laurenceau, Marital conflict, maternal and paternal parenting, and child adjustment: A test of mediation and moderation, Journal of Family Psychology, 20, 199-208, 2006.]
An essential feature of schizoid personality disorder is:
A. feeling disconnected from others.
B. distrust and suspiciousness of others.
C. feeling uncomfortable in social situations.
D. fear of disapproval and rejection in interpersonal situations.
Answer A is correct. This is the best answer because it’s most consistent with the DSM-5 description of schizoid personality disorder, which states that it involves “a pervasive pattern of detachment from social relationships and a restricted range of expression of emotions in interpersonal settings” (2013, p. 652).
Which of the following is used to determine a test’s internal consistency reliability?
A. kappa statistic
B. coefficient alpha
C. coefficient of concordance
D. eta
Answer B is correct. A test’s internal consistency reliability can be evaluated in several ways including with the use of coefficient alpha, which is also known as Cronbach’s alpha and indicates the average of the correlations between responses to all possible pairs of test items. The kappa statistic and coefficient of concordance are used to assess interrater reliability, and eta is a correlation coefficient that’s used to measure the degree of association between two continuous variables that have a nonlinear relationship.
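As an illustration only, the sketch below computes coefficient alpha with the common variance-based formula, which compares the sum of the item variances with the variance of total scores. That formula, the function name, and the data are assumptions that go beyond what the card states.

```python
from statistics import variance

def cronbach_alpha(scores):
    """Coefficient alpha from a list of examinees, each a list of item scores."""
    k = len(scores[0])  # number of items
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings from five examinees on a four-item scale.
scores = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(round(cronbach_alpha(scores), 2))
```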
A scatterplot is constructed from the scores obtained by a sample of employees on a newly developed selection test (X) and a measure of job performance (Y). The scatterplot indicates that the variability of Y scores is about the same for all scores on X. Which of the following terms describes this situation?
A. homoscedasticity
B. heteroscedasticity
C. unrestricted range
D. restricted range
Answer A is correct. The terms homoscedasticity and heteroscedasticity are used to describe the relationship between two variables in terms of the amount of variability in one variable for different values of the other variable. Homoscedasticity occurs when the variability of scores on one variable is about the same at different values of the other variable; heteroscedasticity occurs when the variability of scores on one variable differs at different values of the other variable. Homoscedasticity is one of the conditions that tends to increase the correlation coefficient. Also, when data are homoscedastic, the use of a regression equation to predict people’s scores on Y from their scores on X will produce the same accuracy of prediction for all scores on X.
When designing a research study, you would use the double-blind technique to reduce which of the following?
A. experimenter expectancy
B. carryover effects
C. pretest sensitization
D. false consensus effect
Answer A is correct. When using the single-blind technique, subjects do not know which groups they are in (e.g., drug or placebo); when using the double-blind technique, subjects and experimenters do not know what groups subjects are in. An advantage of the double-blind technique is that it reduces experimenter expectancy, which is also known as experimenter bias and refers to the effects of the experimenter’s knowledge about the purpose of the study on the study’s outcomes. Neither the single-blind nor the double-blind technique is useful for controlling carryover effects or pretest sensitization, which are threats to a study’s external and internal validity, respectively. The false consensus effect is not relevant to internal or external validity and is the tendency to overestimate the extent to which other people share our opinions, values, and beliefs.
Conducting a utility analysis for a new selection test would be useful for:
A. estimating the positive hit rate when the test is added to the current selection procedure.
B. determining the likelihood that use of the test will have an adverse impact on members of racial/ethnic minority groups.
C. obtaining information on the return on investment that can be expected when the test is used to hire job applicants.
D. estimating the degree to which an actual criterion adequately assesses the ultimate criterion.
Answer C is correct. Utility analysis is used to obtain information on the monetary value (return on investment) of a selection test, training program, or other employment practice.
You would use which of the following statistical tests to compare the number of adults living in a rural, urban, or suburban community who have received a diagnosis of a bipolar disorder, depressive disorder, or anxiety disorder?
A. single-sample chi-square test
B. multiple-sample chi-square test
C. one-way ANOVA
D. factorial ANOVA
Answer B is correct. The first and second steps in identifying the appropriate statistical test are identifying the study’s independent and dependent variables and the scale of measurement of the dependent variable. This study has two variables (diagnosis and community type); however, it’s a descriptive study rather than an experimental study, so it’s difficult to identify one of the variables as the independent variable and the other as the dependent variable. In this situation, you identify the scale of measurement of the data to be analyzed. The data are the frequency (number) of individuals in each category – e.g., the number of people who live in an urban area and have received a diagnosis of a bipolar disorder. In other words, the scale of measurement is nominal: Subjects will not receive a score but will belong to a category.
The chi-square test is used to analyze nominal data and, when there’s more than one variable, the multiple-sample chi-square test is the appropriate test. Note that the multiple-sample chi-square test is also known as the chi-square test for contingency tables.
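To make the contingency-table version concrete, here is a minimal sketch that computes the chi-square statistic and its degrees of freedom from a table of frequencies. The function name and the counts are hypothetical, and the sketch stops short of looking up a p-value.

```python
def chi_square_contingency(table):
    """Chi-square statistic and degrees of freedom for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = rural, urban, suburban; columns = bipolar, depressive, anxiety.
counts = [
    [12, 30, 28],
    [20, 45, 40],
    [15, 35, 50],
]
chi2, df = chi_square_contingency(counts)
print(round(chi2, 2), df)
```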
When designing and conducting a research study, you can increase statistical power by doing which of the following?
A. reducing the size of alpha
B. decreasing the effect size
C. randomly selecting subjects from the population
D. using a parametric test when it’s appropriate to do so
Answer D is correct. Increasing the size of alpha, increasing the effect size (the magnitude of the effects of the independent variable), and using a parametric test when it’s appropriate to do so are methods for increasing statistical power, which is the ability to reject a false null hypothesis. Randomly selecting subjects from the population increases a study’s external validity but does not affect statistical power.
To compare the effectiveness of two brief treatments for social anxiety disorder, you obtain a sample of individuals who have received this diagnosis, determine the severity of each subject’s social anxiety, match the subjects in pairs based on the severity of their symptoms, and randomly assign one member of each pair to one of the treatments and other member to the other treatment. To compare the scores subjects in the two groups receive on a measure of symptom severity after they receive treatment, you will use which of the following?
A. t-test for correlated samples
B. t-test for uncorrelated samples
C. two-way ANOVA
D. single-sample chi-square test
Answer A is correct. The first steps in identifying the appropriate statistical test are to identify the independent and dependent variables and the scale of measurement of the dependent variable. This study’s independent variable is type of treatment and the dependent variable is score on a measure of severity of anxiety following treatment. More specifically, the dependent variable is score on a measure of symptom severity and, for the exam, you can assume that scores represent an interval or ratio scale. This means that a statistical test will be used to compare the mean scores obtained by two groups. The t-test and two-way ANOVA are both used to compare mean scores but, because there are only two means, the t-test is the appropriate test. To determine which t-test to use, you determine how the means were obtained: In this study, they were obtained from related groups (from groups that consist of subjects who were matched in terms of initial symptom severity). The t-test for correlated samples is the appropriate test when two means are obtained from the same group or from two groups that are related in some way.
An assumption of classical test theory is that measurement error:
A. is random.
B. is systematic.
C. is random and systematic.
D. cannot be estimated.
Answer A is correct. Classical test theory is based on the assumption that an examinee’s obtained test score is due to a combination of “truth” and measurement error, with truth referring to the “true” amount of the characteristic measured by the test that the examinee has and measurement error being random (unsystematic).
Which of the following is used to estimate the effect of increasing a predictor’s reliability on its criterion-related validity coefficient?
A. Spearman-Brown formula
B. correction for attenuation formula
C. coefficient of determination
D. Cronbach’s alpha
Answer B is correct. Inadequate reliability is one of the factors that reduces a predictor’s criterion-related validity coefficient, and the correction for attenuation formula is used to estimate the effect of increasing the reliability of the predictor and/or criterion on the predictor’s criterion-related validity coefficient.
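The card does not give the formula, so the sketch below uses the commonly cited form, which divides the obtained validity coefficient by the square root of the product of the predictor and criterion reliabilities. The function name and the example values are assumptions.

```python
import math

def correct_for_attenuation(validity: float, predictor_reliability: float,
                            criterion_reliability: float = 1.0) -> float:
    """Estimated validity coefficient if the given reliabilities were raised to 1.0.

    Leave criterion_reliability at 1.0 to correct for the predictor only.
    """
    return validity / math.sqrt(predictor_reliability * criterion_reliability)

# Hypothetical predictor: validity of .40 and reliability of .64.
print(round(correct_for_attenuation(0.40, 0.64), 2))  # 0.5
```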