523- Applied Statistics and Psychometrics Flashcards

Question 1

Q

Achievement test

Answer

A

A test that is designed to measure an individual’s level of knowledge in a particular area; generally used in schools and educational settings. Unlike an aptitude test, which measures a person’s ability to learn something, an achievement test focuses specifically on how much a person knows about a specific topic. It measures an individual’s previous learning. Example: The comps exam is an achievement test, as it is designed to measure how thoroughly clinical counseling students have learned the information in the ten core classes of the program.

Question 2

Q

ANOVA

Answer

A

Analysis of variance. A parametric statistical technique typically used to compare the means of three or more groups at once. It determines whether there is a significant difference somewhere among the groups but does not reveal where that difference lies; follow-up (post hoc) comparisons are needed to locate it.
Clinical example: A group of psychiatric patients is trying three different therapies: counseling, medication, and biofeedback. You want to see whether one therapy is better than the others. You gather data and run an ANOVA on the three groups (counseling, medication, and biofeedback) to see whether there is a significant difference among them.
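To make the between-groups versus within-groups logic concrete, here is a minimal Python sketch of the F ratio for a one-way ANOVA. The scores and the function name `one_way_anova_f` are invented for illustration; a real analysis would also look up the p value for F and check the test's assumptions.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    # Between-groups SS: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups SS: how far each score sits from its own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within

# Hypothetical symptom-reduction scores for the three therapies
counseling = [4, 5, 6]
medication = [6, 7, 8]
biofeedback = [8, 9, 10]

f_stat, df_b, df_w = one_way_anova_f([counseling, medication, biofeedback])
print(f_stat, df_b, df_w)
```

A large F means the group means differ more than the spread within groups would lead you to expect; it still does not say which groups differ.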

Question 3

Q

Aptitude test

Answer

A

Measures a person’s potential to learn or acquire specific skills. Often used for measuring high school students’ potential for college. Aptitude tests are prone to bias. Example: The SAT is an aptitude test designed to predict a student’s potential success in college. There is reason to doubt the predictive validity of the SAT.

Question 4

Q

Clinical vs. statistical significance

Answer

A

Clinical significance refers to the meaningfulness of change in a client’s life. Statistical significance is conventionally declared when p < .05, meaning that if the null hypothesis were true, results at least this extreme would be expected less than 5% of the time. A statistically significant result indicates that the risk of a Type I error has been held to the chosen alpha level; it does not by itself show that the effect is large or meaningful. An effect size must be calculated to evaluate the meaningfulness of a result.

Question 5

Q

Construct validity

Answer

A

Part of: research design
Construct validity is the degree to which a test or study measures the qualities or constructs that it claims to measure.
* Convergent validity: does the test correlate highly with other tests that measure the same construct?
* Divergent validity: does the test correlate weakly with tests that measure different constructs?
Clinical example: A group of researchers creates a new test to measure depression. They want to ensure that the test has construct validity, in that it actually measures the construct of depression. To do this, they check that the test correlates highly with the Beck Depression Inventory and only weakly with measures of a different construct, such as anxiety.

Question 6

Q

Content validity

Answer

A

Part of: research design
Content validity is the degree to which a measure or study includes all of the facets of the construct that it is attempting to measure. Content validity cannot be measured empirically; it is assessed through logical analysis. (Validity = accuracy.)
Clinical example: A depression scale may lack content validity if it assesses only the affective dimension of depression (decreased happiness, apathy, hopelessness) but fails to take into account the behavioral dimension (sleeping more or less, eating more or less, energy changes, etc.).

Question 7

Q

Correlation vs. causation

Answer

A

Part of: research design and statistical analysis
Correlation means that a relationship exists between two variables.
* Can be positive or negative; the coefficient falls between -1.00 and +1.00.
* Correlation does not indicate causation.
Causation means that a change in one variable produces a change in the other variable.
* Determined via controlled experiments, in which the independent variable is manipulated and extraneous variables are controlled.
Clinical example: A study found that minutes spent exercising correlated with lower depression levels. This study was able to show that depression levels and exercise were correlated, but could not go so far as to claim that one causes the other.

Question 8

Q

Dependent t-test

Answer

A

Statistical analysis that compares the means of two related groups to determine whether there is a statistically significant difference between these means.
* Sometimes called a correlated t-test because the data are correlated.
* Used when the design involves matched pairs or repeated measures, and only two conditions of the independent variable.
* It is called “dependent” because the subjects carry across the manipulation: they take with them personal characteristics that affect the measurement at both points, so the measurements are “dependent” on those characteristics.
Clinical example: A researcher wants to determine the effects of caffeine on memory. They administer a memory test to a group of subjects, have the subjects consume caffeine, and then administer another memory test. Because they used the same subjects, this is a repeated-measures experiment that requires a dependent t-test during statistical analysis.
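As a rough illustration of the repeated-measures logic, the sketch below computes a dependent (paired) t statistic from difference scores. The memory scores and the name `paired_t` are hypothetical; in practice the resulting t would be compared against a critical value for n - 1 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t, df) for a dependent t-test on paired scores."""
    # Work with one difference score per subject
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # Standard error of the mean difference
    se = stdev(diffs) / sqrt(n)
    return mean(diffs) / se, n - 1

# Hypothetical memory scores for the same five subjects
before_caffeine = [10, 12, 9, 11, 13]
after_caffeine = [12, 13, 12, 12, 16]

t_stat, df = paired_t(before_caffeine, after_caffeine)
print(round(t_stat, 3), df)
```

Because each subject serves as their own comparison, stable personal characteristics cancel out of the difference scores.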

Question 9

Q

Descriptive vs. inferential

Answer

A

Descriptive statistics are used to describe and summarize a sample or population.
* Include measures of central tendency and variability
* Can be used with any type of data (experimental and non-experimental)
Inferential statistics allow inferences to be made from the sample to the population.
* The sample must accurately reflect the population (importance of random sampling)
* Can be applied to experimental and non-experimental data; whether a causal conclusion is justified depends on the research design, not on the statistic
* Techniques include hypothesis testing and regression analysis
* The statistical results incorporate the uncertainty that is inherent in using a sample to understand an entire population
EXAMPLE: A researcher surveys a sample of Ivy League students and reports the rate of test anxiety in that sample. Summarizing the sample in this way is descriptive. Using the sample to estimate the rate among all Ivy League students would require inferential statistics, and the results could not be generalized to all college students because the sample was not drawn from that population.

Question 10

Q

Effect size

Answer

A

Part of: statistical analysis
A measure of the strength of a relationship or the size of a difference. Indicates whether findings are weak, moderate, or strong. For a correlation, the squared coefficient (r²) gives the proportion of variance accounted for, also called shared variance or the coefficient of determination.
Why: Quantifies the effectiveness of a particular intervention, relative to some comparison; commonly used in meta-analyses.
Example: A researcher conducts a correlational research study on the relationship between caffeine and anxiety ratings. The study produces a correlation coefficient of 0.8, which is considered a large effect size. Squaring it gives r² = 0.64, so the two variables share about 64% of their variance. The effect size reflects a strong relationship between caffeine and anxiety.
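The link between a correlation and the proportion of variance accounted for can be sketched in a few lines of Python. The data and the helper name `pearson_r` are made up for illustration; this is one way to compute the values, not a prescribed procedure.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient computed from deviation scores."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: cups of coffee per day and anxiety rating
caffeine = [1, 2, 3, 4, 5]
anxiety = [2, 4, 5, 4, 5]

r = pearson_r(caffeine, anxiety)
r_squared = r ** 2  # coefficient of determination (shared variance)
print(round(r, 3), round(r_squared, 3))
```

Squaring r always shrinks it (unless r is 0 or 1 in absolute value), which is why a correlation that sounds strong can still leave much of the variance unexplained.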

Question 11

Q

Independent t-test

Answer

A

Statistical analysis that compares the means of two independent groups, typically taken from the same population (although they could be taken from separate populations).
* Determines whether there is a statistically significant difference between the two groups’ means
* We assume that if the groups are randomly selected from the same population, they will mimic each other; the null hypothesis is no difference between the two groups
EXAMPLE: Fred is analyzing the best treatment options for his patient Harold. He reads a study comparing two different types of therapies. Using an independent t-test, the researchers found no statistically significant difference between the treatment options. Fred decides that both are good options for Harold and considers which of his client’s personal variables might make one better than the other.

Question 12

Q

Internal consistency

Answer

A

Part of: research design
What: a type of reliability that measures whether several items that purport to measure the same general construct produce similar scores and are free from error.
* Usually measured with Cronbach’s alpha.
EXAMPLE: A patient comes in with symptoms of PTSD. You decide to search for a psychological test that is designed to help you detect and diagnose PTSD. You come across the Posttraumatic Stress Diagnostic Scale (PDS). The test manual indicates that the PDS is a valid measure of PTSD. You also find in the manual that Cronbach’s alpha is 0.91. This indicates that the PDS has strong internal consistency.
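For readers who want to see where a value like 0.91 comes from, below is a small sketch of the standard Cronbach's alpha formula, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores). The item responses are invented and `cronbach_alpha` is an illustrative name, not part of any test manual.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order."""
    k = len(items)
    # Total score for each respondent across all items
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))

# Hypothetical responses of four people to three items
item1 = [1, 2, 3, 4]
item2 = [2, 3, 4, 5]
item3 = [1, 3, 3, 5]

alpha = cronbach_alpha([item1, item2, item3])
print(round(alpha, 3))
```

Alpha rises when items vary together, so that the variance of the total score is large relative to the summed item variances.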

Question 13

Q

Internal validity

Answer

A

Part of: research design
What: The extent to which the observed relationship between variables in a study reflects the actual relationship between the variables. Controlling for confounding variables increases internal validity, as does random assignment of participants to conditions.
EXAMPLE: Researchers investigated a new tx for depression using tight controls on who could be a participant. For instance, they did not allow anyone with comorbidity to participate. This increased the study’s internal validity. It did, however, jeopardize the ecological validity of the research, because the results may not generalize to typical clients.

Question 14

Q

Interrater reliability

Answer

A

Part of: research design
What: a type of reliability that measures the level of agreement between independent raters.
* Useful with measures that are less objective and more subjective.
* Used to account for human error in the form of distractibility, misinterpretation, or simply differences in opinion.
EXAMPLE: Three graduate students are performing a naturalistic observation study for a class that examines violent video games and behavior in a group of 9-year-old boys. The students rated the behavior on a scale of 1 (not aggressive) to 5 (very aggressive). However, the ratings were not consistent between the observers. The study lacked interrater reliability.

Question 15

Q

Measures of central tendency

Answer

A

Part of: statistical analysis
What: The tendency of the data to cluster around the middle of the values on X; provides a statistical description of the center of the distribution.
* Three main measures are used: the mean, median, and mode.
* Mean is the arithmetic average of all scores within a data set.
* Mode is the most frequently occurring score.
* Median is the point that separates the distribution into two equal halves.
* Median and mode are the most resilient to outliers.
EXAMPLE: A researcher is studying the frequency of binge eating in a group of girls with binge eating disorder. To better understand the data that were gathered, they start by calculating the measures of central tendency: the most frequently occurring number of episodes in the group, the average number of episodes, and the number of episodes in the middle of the set. In other words, the mode, mean, and median.
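The three measures, and their different sensitivity to outliers, can be seen with Python's standard `statistics` module. The episode counts below are hypothetical and include one deliberately extreme score.

```python
from statistics import mean, median, mode

# Hypothetical weekly binge episodes for seven participants (20 is an outlier)
episodes = [2, 3, 3, 4, 5, 3, 20]

avg = mean(episodes)       # pulled upward by the outlier
middle = median(episodes)  # middle score once the data are sorted
most_common = mode(episodes)

print(round(avg, 2), middle, most_common)
```

Here the single extreme score drags the mean well above the median and mode, which both stay at the typical value.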

Question 16

Q

Measures of variability

Answer

A

In statistics, measures of variability describe how widely the scores in a distribution are spread around the central tendency. Three primary measures: range, variance, and standard deviation.
* Range is obtained by taking the two most extreme scores and subtracting the lowest from the highest.
* Variance is the average squared deviation around the mean.
* Standard deviation is the square root of the variance and is highly useful in describing variability.
Why: Helps determine which statistical analyses you can run on a data set.
EXAMPLE: A researcher is studying the frequency of binge eating in a group of girls with binge eating disorder. After calculating the measures of central tendency, they decide that they want to know more about the distribution of the number of episodes. They decide to calculate the measures of variability: the range, variance, and standard deviation.
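A short sketch of the three measures follows, using invented scores. It uses the population formulas (`pvariance`, `pstdev`) to match the "average squared deviation" definition above; the sample versions (`variance`, `stdev`) divide by n - 1 instead and would give slightly larger values.

```python
from statistics import pstdev, pvariance

# Hypothetical number of episodes for eight participants
scores = [2, 4, 4, 4, 5, 5, 7, 9]

score_range = max(scores) - min(scores)  # highest minus lowest
var = pvariance(scores)                  # average squared deviation from the mean
sd = pstdev(scores)                      # square root of the variance

print(score_range, var, sd)
```

The standard deviation is in the same units as the original scores, which is why it is usually reported in preference to the variance.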

Question 17

Q

Nominal/ordinal/interval/ratio measurements

Answer

A

These are four types of measurements seen in statistics.
* Nominal data: categories with no inherent order; may be dichotomous (only two levels, such as male and female) or have several categories (such as Republican, Democrat, Independent).
* Ordinal data: numbers indicate order only (1st born, 2nd born).
* Interval data: true score data where you know the score a person made and can tell the actual distance between individuals based on their respective scores, but the measure used to generate the score has no true zero (temperature in F or C, SAT scores).
* Ratio data: interval data with a true zero (age, height, weight, speed).
EXAMPLE: A researcher is creating a questionnaire to measure depression. They include nominal scale questions (“what is your gender?”), ordinal scale questions (“rank your mood today from 1-very unhappy to 5-very happy”), and ratio scale questions (“how many hours of sleep do you get on average?”).

Question 18

Q

Norm-referenced scoring/tests

Answer

A

Part of: psychometrics and assessment
What: A norm-referenced test evaluates a test taker’s performance against a standardization sample; typically used for the purpose of making comparisons with a larger group. Norms should be current, relevant, and representative of the group to which the individual is being compared. Can be problematic when tests are not normed with a culturally diverse population.
Example: The child psychologist tested the adolescent’s IQ and found that it was 165, placing him above the 99th percentile, more than 3 SDs above the mean, given that IQ scores are approximately normally distributed. IQ testing is an example of norm-referenced scoring/testing because an individual’s score is always interpreted in terms of typical performance/results.
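To show how a raw score is interpreted against norms, the sketch below converts the example IQ into a z score and a percentile rank. It assumes norms with a mean of 100 and an SD of 15, a common convention for IQ scales that the card itself does not state, and it treats scores as exactly normal.

```python
from statistics import NormalDist

# Assumed norms (not given on the card): mean 100, SD 15
NORM_MEAN = 100
NORM_SD = 15

def z_score(raw, mean=NORM_MEAN, sd=NORM_SD):
    """Distance of a raw score from the norm mean, in SD units."""
    return (raw - mean) / sd

def percentile_rank(raw, mean=NORM_MEAN, sd=NORM_SD):
    """Percentage of the norm group expected to score at or below raw."""
    return NormalDist(mean, sd).cdf(raw) * 100

z = z_score(165)
pct = percentile_rank(165)
print(round(z, 2), round(pct, 4))
```

Under these assumptions the score sits more than 4 SDs above the mean, so "above the 99th percentile" is a conservative description.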

Question 19

Q

Normal curve

Answer

A

Part of: statistics
A normal curve is a normal distribution, graphically represented by a bell-shaped curve.
* A frequency distribution in which most occurrences take place in the middle and taper off on either side
* All measures of central tendency are at the highest point of the curve
* Symmetrical; extremes are at the tails
* Divisible into standard deviation units
* A theoretical distribution (defined for infinite n) that real data sets only approximate
EXAMPLE: A researcher is developing a new intelligence test. After obtaining the results, they found that the scores fell along a normal curve: most participants scored in the middle range, with very few obtaining either the highest or lowest scores (scores were normally distributed).

Question 20

Q

Objective tests

Answer

A

Part of: psychometrics, psychological testing/assessment
What: Objective tests are more structured than projective tests. They use multiple choice, true/false, or Likert scale formats and are usually self-report. Questions and answer options are clearly stated, and answers are scored quantitatively. Scoring involves little subjective judgment and is therefore not influenced by rater variables.
EXAMPLE: The psychologist found that when assessing clients with Borderline Personality Disorder, objective tests of personality, such as the MMPI, provided more valid personality information than projective tests, such as the Rorschach Test, in which his own personal bias or judgment could affect the results and thus the reliability and validity of the measure. The objective tests were also easier to score.

Question 21

Q

Probability

Answer

A

A mathematical statement indicating the likelihood that something will happen when a particular population is randomly sampled, symbolized by p. In hypothesis testing, the p value is the probability of obtaining results at least as extreme as those observed if the null hypothesis were true; the higher the p value, the more consistent the results are with chance alone. Probability is based on hard data (unlike chance); p is between 0 and 1.
EXAMPLE: Researchers are conducting a study on the heritability of bipolar disorder. They find that there is a strong genetic link, meaning there is a greater probability of an individual having the disorder if one of their parents also has it.

Question 22

Q

Projective tests

Answer

A

Part of: psychometrics, psychological testing
What: Test in which the stimulus, the required response, or both are ambiguous. The general idea behind projective tests is that a person’s interpretation of an ambiguous stimulus reflects their unique characteristics.
- Most often personality tests
- Have fallen out of favor in recent years
- Tests include the Rorschach inkblot test and the Thematic Apperception Test, among others
- Usually require extensive training to administer and score, and agreement between evaluators tends to be low
- Most fall flat when psychometric properties are examined, i.e. low reliability and low validity
EXAMPLE: You are seeing a client and, using the Rorschach inkblot test, ask her to interpret a black ‘blob’. She says that she sees a crab in the image. Because this is a projective test, her response is taken as possibly indicative of her characteristics or her mood at the time of testing.

Question 23

Q

Parametric vs. nonparametric statistical analyses

Answer

A

Parametric statistical analyses: inferential procedures that require certain assumptions about the distribution of scores.
* Usually used with scores most appropriately described by the mean
* Based on symmetrical distributions
* Robust procedures: minor violations of the assumptions introduce negligible amounts of error
* Greater statistical power, so more likely to detect statistical significance than nonparametric analyses
Nonparametric statistical analyses: inferential procedures that do not require stringent assumptions about the parameters of the raw score population represented by the sample data.
* Usually used with scores most appropriately described by the median or the mode
* Often used when the data have skewed distributions
EXAMPLE: Researchers set up a study to determine whether there is a correlation between hours of sleep per night and ratings of happiness. Because they used a very small sample, they cannot assume the data are symmetrically distributed and therefore choose a nonparametric test.

Question 24

Q

Regression

Answer

A

Regression = prediction (based on significantly correlated data). Correlation tells us whether a relationship exists. Regression allows us to predict based on that relationship by identifying the line of best fit. A statistical technique whose origins trace to Sir Francis Galton.
* If two variables are significantly correlated, then we should be able to predict one from the other.
* Linear regression is predicting one variable from another, or vice versa.
* Multiple regression is the same idea but utilizes multiple predictor variables.
* Strength of the relationship determines the amount of error in making predictions; stronger correlation = better prediction.
* Produces a line of best fit: a straight line that best matches the data and can be used to predict Y given a known X; comes in the form Y’ = a + bX.
EXAMPLE: A developmental psychologist performed a study on aggressive behavior in boys and hormone levels. Researchers performed a regression analysis on the data. Their results showed that the severity and frequency of the boys’ aggression could be accurately predicted based on their levels of testosterone.
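The line of best fit in the form Y’ = a + bX can be obtained with the usual least-squares formulas, b = Sxy / Sxx and a = mean(Y) - b × mean(X). The following is a minimal sketch with invented hormone and aggression values; `fit_line` and `predict` are illustrative names.

```python
from statistics import mean

def fit_line(x, y):
    """Return intercept a and slope b of the least-squares line Y' = a + bX."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def predict(a, b, x_new):
    """Predicted Y for a known X."""
    return a + b * x_new

# Hypothetical data: hormone level (X) and aggression rating (Y)
hormone = [1, 2, 3, 4, 5]
aggression = [2, 4, 5, 4, 5]

a, b = fit_line(hormone, aggression)
print(round(a, 2), round(b, 2), round(predict(a, b, 6), 2))
```

The slope b says how much the predicted Y changes for a one-unit change in X; the intercept a is the predicted Y when X is zero.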

Question 25

Q

Types of reliability

Answer

A

Part of: psychometrics and research design
What: Reliability refers to the dependability, consistency, or repeatability of test results. Reliability is a foundational characteristic of “psychometric soundness.”
Types of reliability:
Inter-Rater Reliability examines the degree of consistency between different raters’ scorings, particularly in behavioral observation studies. An ethogram helps researchers create operational definitions to increase inter-rater reliability.
- Measured as the correlation or agreement between the raters’ scores (Kappa statistic)
Test-Retest Reliability refers to the consistency of a measure when the same test is administered to the same person at different points in time.
- Only works for stable traits
- The interval between measurements must be considered: shorter intervals lead to greater carryover effects
- Be careful of developmental milestones
Parallel Forms Reliability compares scores on two equivalent forms of a test that measure the same construct.
Internal Consistency Reliability examines the consistency of items within a test.
- Done via split-half, KR20, and Cronbach’s Alpha
- Split-half is when the test is split in half and the correlation between the two halves is examined
EXAMPLE: While developing a new version of an IQ test, researchers gave the test to the same group of subjects at several different times to evaluate the instrument’s test-retest reliability.
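As one concrete case, the Kappa statistic mentioned under inter-rater reliability can be sketched as Cohen's kappa for two raters: observed agreement corrected for the agreement expected by chance. The ratings and the name `cohens_kappa` are made up for illustration.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categories to the same cases."""
    n = len(rater_a)
    # Proportion of cases on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    # Agreement expected if the raters assigned categories independently
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical codes for ten observations: 1 = aggressive, 0 = not aggressive
rater_a = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
rater_b = [1, 1, 0, 0, 1, 0, 0, 0, 1, 1]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 2))
```

In this example the raters agree on 80% of cases, but half of that agreement would be expected by chance, so kappa is noticeably lower than the raw agreement rate.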

Question 26

Q

Sample vs. population

Answer

A

Part of: research design
A sample is a relatively small subset of the population that is selected to represent the population in a study; the sample must be representative of the population being studied. A population is all members of a group; the larger group of individuals from which a sample is selected. When random sampling is used, all members of the population have an equal chance of being selected for the sample.
EXAMPLE: Researchers want to conduct a study investigating how opioid addiction affects depression. As it would be nearly impossible to study every single individual with an opioid addiction, they select a sample of individuals that closely represents the whole population. To check that the sample is representative, they compare the sample and population means.

Question 27

Q

Standard error of estimate

Answer

A

Part of: statistical analysis
What: In regression analysis, this is a measure of the accuracy of predictions made:
* How much the data points are spread around the regression line
* Also referred to as the standard error of the residuals; a residual is the difference between the observed value of the dependent variable (Y) and the predicted value (Y’)
* The standard deviation of the residuals is the standard error of estimate
EXAMPLE: A researcher wants to find out if there is a relationship between social media usage and depression. They collect data and find that there is a positive relationship between the two variables. Next, they calculate the regression line. Finally, they want to know how accurate the predictions made using the regression equation are, so they calculate the standard error of estimate.
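The steps in the example (fit the line, then measure the spread of the residuals) can be sketched as follows. The data are invented, and the sketch uses the common convention of dividing the squared residuals by n - 2, which is an assumption beyond what the card specifies.

```python
from math import sqrt
from statistics import mean

def standard_error_of_estimate(x, y):
    """Spread of observed Y values around the least-squares line."""
    x_bar, y_bar = mean(x), mean(y)
    b = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
         / sum((xi - x_bar) ** 2 for xi in x))
    a = y_bar - b * x_bar
    # Residual = observed Y minus predicted Y'
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # n - 2 degrees of freedom: two parameters (a and b) were estimated
    return sqrt(sum(e ** 2 for e in residuals) / (len(x) - 2))

# Hypothetical data: hours of social media use (X), depression score (Y)
hours = [1, 2, 3, 4, 5]
depression = [2, 4, 5, 4, 5]

see = standard_error_of_estimate(hours, depression)
print(round(see, 3))
```

A smaller value means the points hug the regression line more closely, so predictions from the equation are more accurate.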

Question 28

Q

Standard error of measurement

Answer

A

Part of: statistical analysis
What: A common tool in psychological research and standardized testing that provides an estimate of how much an individual’s score would be expected to change on re-testing with the same or an equivalent form of the test.
* If a person could take the test an infinite number of times, the average of those scores would be considered an estimate of their true ability/knowledge (T, the true score). The standard deviation of all those scores is the SEM.
* The smaller the SEM, the more precise the measurement capacity of the instrument
* Creates a confidence band within which a person’s true score would be expected to fall
* Has an inverse relationship with the reliability coefficient
EXAMPLE: A researcher develops a test to measure depression, then administers it to a sample. They want to analyze the data that they gathered using statistics. They calculate the SEM and it turns out to be low, which indicates that the measurement is fairly precise. They then decide to carry out further statistical analysis.
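The inverse relationship with reliability and the confidence band can be illustrated with the classical formula SEM = SD × sqrt(1 - reliability). The SD of 15, the reliability of 0.91, and the observed score of 100 are assumed values chosen for illustration only.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """Classical test theory SEM from the test SD and reliability coefficient."""
    return sd * sqrt(1 - reliability)

def confidence_band(observed, sem, z=1.96):
    """Approximate 95% band around an observed score (z = 1.96)."""
    return observed - z * sem, observed + z * sem

# Assumed values: test SD of 15 and reliability of 0.91
sem = standard_error_of_measurement(15, 0.91)
low, high = confidence_band(100, sem)
print(round(sem, 2), round(low, 2), round(high, 2))
```

As reliability approaches 1, the SEM shrinks toward 0 and the band around the observed score narrows.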

Question 29

Q

Standard error of the difference (2 sample t-test)

Answer

A

Part of: statistical analysis
What: the estimated standard deviation of the sampling distribution of the difference between the means of two independent samples; that is, an estimate of how much the difference between two group means would vary by chance. A two-sample t-test compares the means of two samples to see whether they came from the same population, for example when comparing anxiety between teenagers in the US and Germany.
Example: A researcher conducts a study on how caffeine affects test scores. They take the mean of scores from each group (with or without caffeine) and calculate the difference between the means. They then use the standard error of the difference to gauge how far the observed difference could be expected to stray from the actual difference by chance.
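A minimal sketch of the calculation follows, assuming equal population variances so that a pooled variance can be used (an assumption not stated on the card). The test scores and function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def standard_error_of_difference(group1, group2):
    """Pooled-variance standard error of the difference between two means."""
    n1, n2 = len(group1), len(group2)
    pooled = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return sqrt(pooled * (1 / n1 + 1 / n2))

def independent_t(group1, group2):
    """Return (t, df) for a two-sample t-test with pooled variance."""
    se = standard_error_of_difference(group1, group2)
    return (mean(group1) - mean(group2)) / se, len(group1) + len(group2) - 2

# Hypothetical test scores with and without caffeine
caffeine = [82, 85, 88, 90, 85]
no_caffeine = [78, 80, 83, 79, 80]

se_diff = standard_error_of_difference(caffeine, no_caffeine)
t_stat, df = independent_t(caffeine, no_caffeine)
print(round(se_diff, 3), round(t_stat, 3), df)
```

The t statistic is simply the observed difference between means expressed in units of its standard error.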

Question 30

Q

Test bias

Answer

A

A systematic difference in test scores that can be attributed to demographic variables such as age, sex, and race rather than to the construct being measured. Tests are considered biased if the test design systematically disadvantages certain groups of people relative to others. Example: A therapist develops a test for measuring disordered eating; however, it is found to be biased because it assesses societal influences according to norms set by cisgender females.

Question 31

Q

Type I and Type II error

Answer

A

Two types of errors seen in hypothesis testing. The Type I error rate is set by the significance level (alpha); larger samples and greater statistical power reduce Type II errors.
Type I error: occurs when a researcher rejects a null hypothesis that is actually true; detecting an effect that is not present.
- Aka “false positive”
- How to minimize/avoid Type I error: use a more stringent significance level, e.g. lower alpha from 0.05 to 0.01
Type II error: occurs when a researcher does not reject a null hypothesis that is false; failure to observe a difference when in truth there is one.
- Researchers incorrectly conclude that the independent variable(s) had no effect on the dependent variable(s)
- Aka “false negative”
- How to minimize/avoid Type II error: increase sample size or otherwise increase statistical power. Note that making alpha more stringent (e.g. from 0.05 to 0.01) raises the risk of a Type II error.
EXAMPLE: A researcher is testing a new drug for PTSD. After reviewing the results, they concluded that the drug effectively reduced symptoms; however, the conclusion was wrong and the drug had no impact. This is an example of Type I error because the researchers rejected the null hypothesis of no difference between the tx and control groups when in fact there was no difference.

Question 32

Q

Types of validity

Answer

A

What: A psychometric property. Validity is the extent to which an assessment or study measures the construct it is intended to measure; a validity coefficient of 0.3-0.4 is considered adequate.
Types of validity:
Face validity: based on logical analysis rather than statistical analysis, face validity is the appearance that a test measures what it purports to at a surface level.
Content validity: evidence that the content of a test adequately represents the conceptual domain it is designed to cover; test items are a fair sample of the total potential content and relevant to the construct being tested. Based on logical analysis rather than statistical analysis.
Criterion validity: extent to which a test corresponds with a particular criterion against which it is compared; indicated by high correlations between a test and a well-defined measure. For example, you might compare a suicide risk scale with suicide rates/attempts, or a driver skill test with the number of infractions on the road.
Construct validity: the degree to which the test measures the construct or trait it intends to measure. This may be demonstrated with convergent evidence (when there is a high correlation between two or more tests that purport to assess the same construct) or discriminant evidence (when two tests of unrelated constructs have low correlations; that is, they discriminate between two qualities that are not related to each other).
Example: A business psychologist has developed a new IQ test that requires only 5 minutes per subject, as compared to 90 minutes for the test acknowledged as the gold standard. He administers both tests to a sample and then compares the scores. The correlation between the two sets of scores is high, providing convergent evidence that the new test has construct validity.

Question 33

Q

Variance

Answer

A

Part of: statistics and data analysis
What: variance is a measure of the variation of or differences among subjects in a distribution across the measure X. Variance arises from natural, random differences among subjects or from environmental variations, measurement error, or researcher error. To calculate variance, square each score’s deviation from the mean and average the squared deviations. The deviations must be squared because the sum of the raw deviations around the mean always equals zero.
EXAMPLE: A clinical psychologist is doing research on a new tx for substance use disorders. She conducts an experiment in which she compares the tx group to a group that received the gold standard tx and to a control group. At first glance it looks like the level of symptomatic reduction is the same in the new tx and gold standard tx groups, but upon further inspection the psychologist notes that the new tx group has a large amount of variance. That is, some people saw significant sx reduction and others saw very minimal change. She needs to investigate this further. What is it that makes the new tx beneficial for some?