523 - Stats DEENA'S VERSION Flashcards

1
Q

achievement test

A

WHAT: A test designed to measure how much someone knows about a particular topic.
- measures previous learning, NOT one’s ability to learn something
- used in schools and education settings

WHY: Achievement tests offer a standardized measure to compare individuals or groups. The scoring is objective and reliable. They may also help to highlight academic strengths and weaknesses.

EXAMPLE: Comps (comprehensive exams) is an achievement test designed to measure how much students have learned in the ten core classes of the program, and whether they have learned enough to continue in the program.

2
Q

ANOVA

A

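WHAT: A statistical technique used to compare the means of three or more groups at a time.
- analysis of variance
- different from t-tests because a single ANOVA can compare more than two group means at once (a t-test compares only two)

WHY: ANOVAs determine whether there is a significant difference among group means. Running one ANOVA instead of several separate t-tests also keeps the chance of a type I error (false positive) from inflating. A significant result shows that at least one group mean differs; post hoc tests identify which ones.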

EXAMPLE: There is an experiment done to compare test scores using three different study techniques: flashcards, note reading, and practice tests. An ANOVA test is run to see if there are any significant differences between the groups.
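A minimal Python sketch of the F ratio an ANOVA computes: variability between the group means divided by variability within the groups. The scores and the function name `one_way_anova_f` are hypothetical illustrations. It returns F and its degrees of freedom only; a p-value would come from the F distribution (for example, `scipy.stats.f_oneway` reports both).

```python
from statistics import mean

def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way between-groups ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)            # number of groups
    n_total = len(all_scores)  # total number of scores

    # Between-groups sum of squares: how far each group mean sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of scores around their own group mean
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical test scores for the three study techniques
flashcards = [1, 2, 3]
note_reading = [4, 5, 6]
practice_tests = [7, 8, 9]
print(one_way_anova_f(flashcards, note_reading, practice_tests))
```

A large F means the group means differ more than you would expect from the spread inside each group.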

3
Q

aptitude test

A

WHAT: Measures a person’s potential to learn specific skills/gain knowledge on a topic
- rely heavily on predictive criterion validation procedures.
- prone to bias (cultural, racial, language).

WHY: Aptitude tests are important to help understand a person’s innate potential. They can help predict future performance in specific areas and help ensure that students are enrolled in programs that match their capabilities.

EXAMPLE: The ACT is used as an aptitude test to predict a student’s potential success in college. However, there is reason to doubt its predictive validity (racial and gender bias).

4
Q

clinical vs statistical significance

A

WHAT: Clinical = meaningfulness of change in a client’s life
How meaningful/important are the changes to the patient? What percentage of patients are benefitting?

Statistical = how unlikely an outcome would be if there were no real effect; calculated mathematically
- considered statistically significant if p-value is < .05 (if there were truly no effect, results at least this extreme would occur less than 5% of the time)
- larger sample = even small effects are more likely to reach statistical significance

WHY: Findings can be clinically significant without being statistically significant, or vice versa.
This is important to remember when interpreting research and judging whether a treatment may be helpful for a disorder.

EXAMPLE: A therapist is trying to decide between two different treatments for a client. One treatment has high clinical significance and is also statistically significant. The other has high statistical significance but low clinical significance. The therapist chooses the first treatment, because patients in its study had a higher quality of life and fewer of them met diagnostic criteria after treatment.
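A rough Python sketch of why a clinically trivial effect can still be statistically significant once samples get large. It assumes a large-sample z approximation with a known, equal SD in both groups; the 0.5-point difference, the SD of 10, and the helper names are hypothetical.

```python
import math

def z_for_mean_difference(diff, sd, n_per_group):
    """z for a difference between two independent means (known, equal SD and n)."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    return diff / standard_error

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical: treatment group improves by only 0.5 points on a scale with SD = 10
# (an effect of 0.05 SD, too small to matter clinically)
for n in (50, 500, 50_000):
    z = z_for_mean_difference(diff=0.5, sd=10, n_per_group=n)
    print(f"n per group = {n:>6}: z = {z:5.2f}, p = {two_sided_p(z):.4f}")
```

The effect stays the same in every row; only the sample size changes, yet p falls below .05 at the largest n.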

5
Q

construct validity

A

WHAT: The degree to which a test actually measures the theoretical construct it claims/aims to measure.
- Focuses on the attributes, features, and ability of a measurement instrument being tested

Divergent/Discriminant validity = the degree to which the test does NOT correlate strongly with tests that measure different constructs
Convergent validity = the degree to which the test correlates with other tests that measure the same construct

WHY: It is important to keep construct validity in mind to ensure you are measuring what you intend to research. Steps can also be taken to avoid threats to construct validity, such as a mismatch between the construct and its operational definition, bias, and experimenter or participant effects.

EXAMPLE: A group of researchers creates a new test to measure depression. They want to ensure that the test has construct validity (that it actually measures the construct of depression). To do this, they check that it correlates strongly with the BDI (convergent validity) and correlates less strongly with a measure of a different construct, such as anxiety (divergent validity).

6
Q

content validity

A

WHAT: The degree to which a measure represents all aspects of a given construct
- how well a measure encompasses the full domain of what it is trying to measure
- Is the test/items on test representative of what it aims to measure?

Not established with a single statistic; usually assessed through expert review of whether the items cover the full domain

WHY: Considering content validity is important in research to ensure a measure covers the entire range of what it aims to test. It is also useful for judging whether individual items are relevant.

EXAMPLE: A test is designed to survey arithmetic skills at a fourth-grade level. The test’s content validity indicates how well it represents the range of arithmetic skills expected at that level.

7
Q

correlation vs causation

A

WHAT:
Correlation = relationship between two variables (correlation coefficient ranges from -1 to +1)

Causation = when a change in one variable brings about a change in the other variable; established through controlled experiments

WHY: Correlation ≠ causation!! Important to consider when creating + consuming research, to know how/why two variables are related and to draw accurate conclusions.

EXAMPLE: Ice cream sales and drowning rates are positively correlated. This is not because one causes the other, but because both are more common during summer months.
Annie is examining the relationship between social media and her body image. She abstained from social media for one month and noticed her body image became more positive. She now has some reason to suspect a causal relationship between the two variables, although a controlled study with more participants would be needed to establish it.
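A minimal Python sketch of the correlation coefficient (Pearson’s r) using made-up monthly figures for the ice cream example. A strong r here says nothing about why the two variables move together; summer weather is the third variable. The data and the `pearson_r` helper are hypothetical (Python 3.10+ also provides `statistics.correlation`).

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations (numerator of r)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of each variable's sum of squared deviations
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cross / (norm_x * norm_y)

# Hypothetical monthly figures; both rise in summer
ice_cream_sales = [20, 35, 50, 80, 95]
drownings = [1, 2, 4, 6, 7]
print(round(pearson_r(ice_cream_sales, drownings), 3))
```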

8
Q

dependent t-test

A

WHAT: A statistical analysis that compares the means of two RELATED groups
- to determine whether there is a statistically significant difference between their means
- used when the design involves matched pairs or repeated measures (e.g., pretest & posttest) and the IV has 2 levels
TEST BEFORE AND AFTER/WITHIN GROUPS

WHY: Called ‘dependent’ because the scores are paired: each score in one condition is linked to a specific score in the other (same person or matched partner), so the two sets of scores are not independent.
Because each person serves as their own comparison, these tests allow researchers to control for individual differences.

EXAMPLE: A researcher wants to test how effective a relaxation technique is at reducing stress levels in college students. Stress levels are recorded before and after the use of the relaxation technique. A dependent t-test is conducted to compare the mean stress levels before and after the intervention to determine if the relaxation technique made a statistically significant difference.
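A minimal Python sketch of the dependent t statistic: compute each person’s difference score, then divide the mean difference by its standard error. The stress scores and the `paired_t` helper are hypothetical; for a p-value, `scipy.stats.ttest_rel` runs the same test.

```python
import math
from statistics import mean, stdev

def paired_t(before, after):
    """Dependent (paired-samples) t statistic and its degrees of freedom."""
    # One difference score per participant (after minus before)
    diffs = [post - pre for pre, post in zip(before, after)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / standard_error, n - 1

# Hypothetical stress ratings for the same five students
stress_before = [30, 28, 35, 40, 33]
stress_after = [25, 27, 30, 34, 31]
t_stat, df = paired_t(stress_before, stress_after)
print(f"t({df}) = {t_stat:.2f}")
```

A negative t here means stress went down after the technique.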

9
Q

independent t-test

A

WHAT: Used to determine whether there are significant differences between the means of two UNRELATED groups
- used when the IV has two conditions, with different participants in each
TEST DIFFERENT GROUPS/BETWEEN GROUPS

WHY: Independent t-tests allow you to compare the effects of two different interventions. A significant difference suggests the interventions produced different results.

EXAMPLE: Researchers are comparing the effectiveness of CBT vs DBT in treating depression. After treatment, they may run an independent t-test to see if there were any significant differences in symptoms between the groups. This may indicate whether one treatment was better at reducing symptoms than the other.
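A minimal Python sketch of Student’s independent-samples t with pooled variance, which assumes the two groups have similar variances (Welch’s version, e.g. `scipy.stats.ttest_ind(..., equal_var=False)`, drops that assumption). The post-treatment scores and the `independent_t` helper are hypothetical.

```python
import math
from statistics import mean, variance

def independent_t(group1, group2):
    """Student's independent-samples t (pooled variance) and its degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two means
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se_diff, n1 + n2 - 2

# Hypothetical post-treatment depression scores (lower = fewer symptoms)
cbt_scores = [12, 10, 14, 11, 13]
dbt_scores = [15, 17, 14, 16, 18]
t_stat, df = independent_t(cbt_scores, dbt_scores)
print(f"t({df}) = {t_stat:.2f}")
```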

10
Q

internal consistency

A

WHAT: The extent to which items on a test (or subscale) measure the same ability or trait.
- type of reliability
- commonly measured with Cronbach’s alpha, which typically ranges from 0 to 1

Do items that are intended to measure the same construct produce similar scores?

WHY: Internal consistency shows the degree of interrelationship/homogeneity of items on a test. High internal consistency is necessary, but not sufficient, for a test to measure what it is supposed to measure.

EXAMPLE: Molly is creating a test to measure the Big 5 personality traits. Because each trait is a separate construct, she checks the internal consistency of each trait’s subscale to ensure its items hang together. Cronbach’s alpha comes out to 0.91, indicating good internal consistency, so the items are reliable enough for use.
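A minimal Python sketch of Cronbach’s alpha for one subscale, alpha = k/(k - 1) × (1 - sum of item variances / variance of total scores), where k is the number of items. The 5-person, 3-item response matrix and the `cronbach_alpha` helper are hypothetical.

```python
from statistics import variance

def cronbach_alpha(responses):
    """Cronbach's alpha; each row is one respondent, each column one item."""
    k = len(responses[0])                     # number of items
    items = list(zip(*responses))             # regroup scores by item
    sum_item_variances = sum(variance(item) for item in items)
    total_scores = [sum(row) for row in responses]
    return (k / (k - 1)) * (1 - sum_item_variances / variance(total_scores))

# Hypothetical answers from 5 people to a 3-item subscale (1-5 ratings)
responses = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 1],
]
print(round(cronbach_alpha(responses), 2))
```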

11
Q

internal validity

A

WHAT: The extent to which the observed relationship between variables (IV & DV) in a study reflects their actual relationship
- how sure you can be that the intervention was the only reason for change in the DVs

To increase internal validity = control for confounding variables, randomly assign participants to conditions

WHY: A study with a high internal validity may indicate causation. Internal validity indicates whether one can draw reasonable conclusions about the cause-and-effect relationships among variables in a study.

EXAMPLE: A group of researchers were testing a new treatment for depression. They highly controlled who could be a participant, including not allowing anyone with a comorbid disorder. This reduced potential confounding variables, increased the study’s internal validity, and therefore increased the likelihood that their treatment was the sole reason for change in participants.

12
Q

interrater reliability

A

WHERE: applied statistics and psychometrics

WHAT: Measures the agreement level between independent raters
- the extent to which independent evaluators produce similar ratings in judging the same thing in the same person/object
- useful with measures that are less objective and more subjective
- often expressed with a correlation coefficient (agreement indices such as Cohen’s kappa are also common)

WHY: Interrater reliability is used to check for human error in individual raters (distractibility, misinterpretation, differences in ability). High agreement suggests the ratings reflect what is being observed rather than who is rating it.

EXAMPLE: A naturalistic observation study is being conducted to look at the effect of violent video games on the behavior of 10-year-old boys. Three independent observers rated the aggressiveness of the boys’ behavior. Their ratings were consistent and yielded a high correlation coefficient, indicating good interrater reliability.

13
Q

measures of central tendency

A

WHERE: applied statistics and psychometrics

WHAT: Statistical descriptions of the center of the distribution
Mean = average
Median = point that separates distribution into two halves
Mode = most frequently occurring
**median and mode are more resistant to outliers than the mean

WHY: Describes a data set/distribution. Allows for a better understanding of the data, as well as for inferences to be made about trends and the shape of the distribution.

EXAMPLE: A researcher is studying how many days per month BPD patients intentionally skip their medications. To better understand the data, the researcher calculates the most frequently occurring number of missed days (mode), the average number of missed days (mean), and the number of missed days in the center of the data set (median).
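A quick Python illustration with the standard-library `statistics` module, using made-up counts of missed medication days (one extreme value included) to show how an outlier pulls the mean but not the median or mode.

```python
from statistics import mean, median, mode

# Hypothetical missed-medication days per patient in one month; 28 is an outlier
missed_days = [2, 3, 3, 4, 5, 3, 28]

print("mean:", round(mean(missed_days), 2))  # 6.86, pulled upward by the outlier
print("median:", median(missed_days))        # 3
print("mode:", mode(missed_days))            # 3
```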

14
Q

measures of variability

A

WHAT: Statistical description of the variability of the distribution around the central tendency
- Range (highest score minus lowest score)
- Variance (the average of each value’s SQUARED difference from the mean; a sample variance divides by n - 1 instead of n)
- SD (square root of variance)

WHY: Describes a data set/distribution. Allows for a better understanding of the data, as well as for inferences to be made about trends and the shape of the distribution. Also allows you to see outliers and determine if they should be dropped.

EXAMPLE: A school counselor is assessing math test scores from a class. After finding the standard deviation of the scores, she could identify scores far from the mean (outliers) and determine who was really struggling and who was excelling in this subject.
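A small Python sketch of the three measures with made-up test scores. The standard library offers both `pvariance` (divides by N, for a whole population) and `variance` (divides by n - 1, for a sample); the SD is the square root of whichever one fits.

```python
import math
from statistics import pvariance, variance

# Hypothetical math test scores
scores = [70, 75, 80, 85, 90]

score_range = max(scores) - min(scores)  # 20
population_variance = pvariance(scores)  # 50: mean squared deviation (divides by N)
sample_variance = variance(scores)       # 62.5: divides by n - 1
sample_sd = math.sqrt(sample_variance)   # about 7.91, back in the original score units

print(score_range, population_variance, sample_variance, round(sample_sd, 2))
```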

15
Q

nominal/ordinal/interval/ratio measurements

A

WHAT: Levels of measurement of variables.
Nominal = categorical (gender, political parties)

Ordinal = indicate order (birth order, Likert scale, stages/steps)

Interval = equal intervals between values, but no true zero; zero does not indicate none/an absence (temp, test score, IQ)

Ratio = interval data, with a true zero– zero indicates none/an absence (height, weight, speed, frequency of behaviors)

WHY: Nominal and ordinal data are categorical/discrete, while interval and ratio data are numeric and often continuous. Important to know the difference when gathering and organizing data, because the level of measurement determines which statistics are appropriate.

EXAMPLE: A researcher is giving out surveys for a study they are conducting. The survey asks for the participants’ gender and height, and asks them to rate their moods on a Likert scale. These are examples of nominal, ratio, and ordinal measurements, respectively.

16
Q

norm-referenced scoring/tests

A

WHERE: Taught in applied stats and psychometrics

WHAT: Evaluates a test taker’s performance against a representative sample of the population
- used for the purpose of making comparisons with a larger group
- norms should be current, relevant, and representative of the group to which the individual is being compared

WHY: Allows comparisons/rankings to be made.
Helps to interpret individual scores in a meaningful way by placing them in the context of a larger group.
It is also important to know that norm-referenced scoring/tests can be problematic when tests are not normed with a culturally diverse population, which can lead to inappropriate interpretation of scores.

EXAMPLE: IQ testing is an example of norm-referenced scoring/testing because an individual’s score is always interpreted in comparison to a representative sample of the population.
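A minimal Python sketch of how a norm-referenced score is placed in the norm group: convert the raw score to a z-score, then to a percentile rank. It assumes IQ-style norms (mean 100, SD 15) and normally distributed norm-group scores; the helper names are hypothetical.

```python
import math

def z_score(raw_score, norm_mean, norm_sd):
    """How many SDs a raw score sits above (+) or below (-) the norm-group mean."""
    return (raw_score - norm_mean) / norm_sd

def percentile_from_z(z):
    """Percentile rank, assuming scores in the norm group are normally distributed."""
    return 100 * 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Assumed IQ-style norms: mean 100, SD 15
z = z_score(130, norm_mean=100, norm_sd=15)
print(z, round(percentile_from_z(z), 1))  # 2.0 and about the 97.7th percentile
```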

17
Q

normal curve

A

WHERE: Taught in applied stats and psychometrics

WHAT: The bell-shaped curve which is created by a normal distribution of a population.
- describes the shape of a frequency distribution in which most occurrences take place in the middle and taper off to either side of the mean
- symmetrical
- mean, median and mode are the same value

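*by the central limit theorem, the distribution of sample means from random samples tends toward a normal curve as sample size grows, even when the raw scores are not normal.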

WHY: It is important to understand what the normal curve is as many statistical models are based on the assumption that data follow a normal distribution.

EXAMPLE: A researcher is developing a new intelligence test. After obtaining the results, they found that the scores fell along a normal curve: most participants scored in the middle range with very few obtaining either the highest or lowest scores (scores were normally distributed).
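A minimal Python check of the familiar 68-95-99.7 rule for the normal curve, using the standard library’s error function (`math.erf`); the helper name `proportion_within` is hypothetical.

```python
import math

def proportion_within(k):
    """Share of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
```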

18
Q

objective tests

A

WHERE: Taught in applied stats and psychometrics

WHAT: Type of structured psychological assessment instrument with standardized scoring
- consists of items with fixed response options (yes/no, true/false) that are scored against a predetermined key
- the items are direct/not open to interpretation
- answers are scored quantitatively (+/- points for each question)

WHY: Important because scoring involves little interpretation, judgment, or personal impression, so it is much less vulnerable to scorer bias.

EXAMPLE: A researcher found that when assessing clients with Borderline Personality Disorder, objective tests of personality (such as the MMPI) were more valid in providing personality information than projective tests (such as the Rorschach Test). The scorer’s personal bias or judgment can affect the scoring of a projective test, thus affecting the reliability and validity of the measure.

19
Q

probability

A

WHERE: Taught in applied stats and psychometrics

WHAT: A mathematical statement indicating the likelihood that something will occur based on data
- symbolized by (p)
- a p-value below .05 is the generally accepted cutoff for significance: if the null hypothesis were true, results at least this extreme would occur less than 5% of the time (it is NOT a 95% chance the IV influenced the DV)

WHY: The higher the p-value, the more consistent the results are with chance alone (the null hypothesis). A p-value below .05 leads researchers to reject the null hypothesis, which supports, but does not prove, an effect of the manipulation.

EXAMPLE: A psychologist wants to know whether growing up in poverty is related to later drug addiction. She collects data, tests the relationship, and uses the p-value to judge significance. A p-value < .05 would lead her to reject the null hypothesis and conclude that poverty predicts drug addiction, while a p-value > .05 would mean she fails to reject the null (not proof that poverty has no effect).
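A minimal Python sketch of what a p-value means, using an exact permutation test (a different technique from the t-tests on these cards, chosen because it makes the logic visible). If the null hypothesis is true, the group labels are arbitrary, so the p-value is the share of all possible relabelings that produce a difference at least as large as the one observed. The scores and the `permutation_p_value` helper are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test p-value for a difference in means."""
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    as_extreme = 0
    total = 0
    # Under the null hypothesis the group labels are arbitrary,
    # so try every way of assigning len(group_a) scores to group A
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Count relabelings at least as extreme (tiny tolerance for float ties)
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            as_extreme += 1
        total += 1
    return as_extreme / total

# Hypothetical scores: three participants per group
treatment = [5, 6, 7]
control = [1, 2, 3]
print(permutation_p_value(treatment, control))
```

With only three scores per group there are 20 possible splits, so the smallest possible two-sided p-value is 2/20 = .10; even perfectly separated groups cannot reach .05 with samples this small.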

20
Q

projective tests

A

WHERE: Taught in applied stats and psychometrics

WHAT: A type of unstructured assessment in which the test-taker is asked to provide a spontaneous response to ambiguous stimuli
- no right or wrong answer
- often used to assess personality

Based on a projective hypothesis = when people attempt to understand an ambiguous/vague stimulus, their interpretation reflects their deeper feelings/thoughts/experiences/needs/etc.

WHY: Important to know that subjective test scoring is prone to bias and error. The TAT and Rorschach are widely used projective tests and require extensive training. Because responses must be interpreted, these tests are more at risk for error.

EXAMPLE: The therapist administers the Rorschach inkblot test, a projective test. The purpose is for the client to respond based on their subjective interpretation of the inkblot. However, the client’s response could reflect their mood at the time of testing rather than stable personality characteristics.

21
Q

parametric vs nonparametric statistical analyses

A

WHERE: Taught in applied stats and psychometrics

WHAT:
Parametric statistical analyses
- assume the data follow an (approximately) normal distribution
- require certain other assumptions about the data set (e.g., similar variances across groups)
- *used with interval/ratio data*

Nonparametric statistical analyses
- used when data are skewed/not normally distributed
- often used with small sample sizes
- do not require strict assumptions
- used with nominal/ordinal data

WHY: Important to understand the differences between them to know which one to use. Parametric analyses are preferred when their assumptions are met, because they have greater statistical power and are more likely to detect a true effect. Nonparametric analyses are used when necessary.

EXAMPLE: A researcher sets up a study asking students enrolled in the program to rate how much they prefer exams over papers on a Likert scale. Because Likert ratings are ordinal data, this study would call for a nonparametric analysis rather than a parametric one.

22
Q

regression

A

WHERE: Taught in applied stats and psychometrics

WHAT: Predictions made using correlated data
- helps to predict one variable based on another correlated one (predict DV based on IV)

Simple linear regression = one predictor variable predicts another variable
Multiple regression = uses multiple predictor variables to predict another variable

WHY: It can be used to describe, explain, or predict an outcome (DV) using one or more predictors (IVs). Can be used to understand the relationship between variables.

EXAMPLE: A researcher finishes a study in which they find a positive correlation between caffeine and test scores. They want to use these data to estimate an individual’s test score from the amount of caffeine they consume, so they calculate a regression equation and use it to make predictions.
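A minimal Python sketch of simple linear regression by ordinary least squares: find the intercept and slope of the line that best predicts test scores from caffeine, then plug in a new value. The data (deliberately perfectly linear so the numbers are easy to check) and the `fit_simple_regression` helper are hypothetical; Python 3.10+ also provides `statistics.linear_regression`.

```python
def fit_simple_regression(x, y):
    """Least-squares intercept and slope for predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = co-variation of x and y divided by the variation in x
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    # The fitted line always passes through (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data
caffeine_mg = [0, 100, 200, 300]
test_scores = [70, 74, 78, 82]
intercept, slope = fit_simple_regression(caffeine_mg, test_scores)
predicted = intercept + slope * 150  # predicted score for someone who had 150 mg
print(intercept, slope, predicted)
```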

23
Q

types of reliability:

A

WHERE: Taught in applied stats and psychometrics

WHAT: Degree to which a measure is free of random error, yielding the same results across multiple applications.

Internal reliability
- extent to which a measure is consistent within itself

External reliability
- consistency of a measure from one use to another

Test-Retest reliability
- consistency of a measure from one administration to another
- same test given at two points in time

Interrater reliability
- degree of consistency/correlation between independent raters

Parallel Forms reliability
- consistency/correlation of results of two similar tests measuring the same thing
- related to test-retest, but uses two equivalent forms instead of the same test

WHY: Important because they are used to assess how consistently a testing method produces the same results across different conditions, free of measurement error.

EXAMPLE: While developing a new version of an IQ test, researchers gave the test to the same group of subjects at two different times to evaluate the instrument’s test-retest reliability. If the scores from the two dates are highly correlated, the test has high test-retest reliability.

24
Q

sample vs population

A

WHERE: Taught in applied stats and psychometrics

WHAT: Sample = small subset of the population that is selected to represent the general population in a study
Population = all members of a group; the larger group of individuals from which a sample is selected

WHY: It’s important to ensure that the sample is representative of the population in research because it increases the potential that any findings of importance can be generalized back to the whole population

EXAMPLE: A researcher wants to conduct a study to examine how opioid addiction affects depression. As it would be nearly impossible to study every individual with an opioid addiction, they recruit a sample of individuals that represents the whole population as closely as possible.

25
Q

standard error of estimate

A

WHERE: Taught in applied stats and psychometrics

WHAT: Measures the accuracy of predicted outcomes
- roughly the typical size of the residuals (actual score - predicted score) across the entire data set: the square root of the sum of squared residuals divided by n - 2
- indicates the accuracy of the predictions made by a regression analysis

Lower SEE = more accurate prediction

WHY: Important to know the SEE, as it indicates how accurate your predictions are.

EXAMPLE: A researcher has found a positive correlation between caffeine (IV) and test scores (DV). They use a regression to make predictions of a person’s test scores based on caffeine intake. Now, they want to know how accurate their predictions are, so they calculate the standard error of estimate.
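A minimal Python sketch of the SEE for a simple regression, SEE = sqrt(sum of squared residuals / (n - 2)). The caffeine and test-score data and the `standard_error_of_estimate` helper are hypothetical.

```python
import math

def standard_error_of_estimate(x, y):
    """SEE for a least-squares line predicting y from x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    # Residual = actual score minus predicted score
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    # Divide by n - 2 because two parameters (slope and intercept) were estimated
    return math.sqrt(sum(r ** 2 for r in residuals) / (n - 2))

# Hypothetical data: cups of coffee before the exam and test scores
cups_of_coffee = [0, 1, 2, 3]
test_scores = [71, 73, 72, 74]
print(round(standard_error_of_estimate(cups_of_coffee, test_scores), 3))
```

The result is in test-score points, so it reads as "predictions are typically off by about this many points".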

26
Q

standard error of measurement

A

WHERE: Taught in applied stats and psychometrics

WHAT: An estimate of how much an individual’s score on a test is expected to change upon retesting with the same/equivalent form of test
- differences due to measurement error
Smaller SEM = more precise measurement
Larger SEM = more error in test

WHY: Important because the SEM is derived from a test’s reliability (SEM = SD × √(1 - reliability)), so it expresses reliability in score units. It also indicates how confident you can be that an individual’s obtained score on a test represents their true score. Not being aware of a high SEM may lead to misinterpretation of a test score.

EXAMPLE: A researcher develops a test to measure depression, then administers it to a sample. They want to know how closely participants’ obtained scores reflect their true scores, so they calculate the SEM. It turns out to be low, meaning the measurement is precise and participants’ obtained test scores are a good representation of their true scores.
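A minimal Python sketch of the classical test theory formula SEM = SD × sqrt(1 - reliability), plus an approximate 95% band around one observed score (observed ± 1.96 SEM, one common convention). The SD of 10, reliability of .91, observed score of 25, and helper names are hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def score_band(observed_score, sem, z=1.96):
    """Approximate 95% band (plus or minus 1.96 SEM) around an observed score."""
    return observed_score - z * sem, observed_score + z * sem

# Hypothetical depression scale: SD = 10, reliability = .91
sem = standard_error_of_measurement(sd=10, reliability=0.91)
low, high = score_band(25, sem)
print(round(sem, 2), round(low, 2), round(high, 2))
```

A perfectly reliable test (reliability = 1) would have an SEM of 0, so the band would shrink to the observed score itself.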

27
Q

standard error of the difference (2 sample t-test)

A

WHERE: Taught in applied stats and psychometrics

WHAT: Statistical calculation used to estimate how much the difference between two group means would vary due to chance/error alone
- tells how much the difference between the group means would be expected to vary if the study were repeated

Low SED = the difference is estimated precisely, so an observed difference is less likely to be due to chance alone
High SED = more sampling error, so an observed difference could more easily be due to chance/error

WHY: Important because the SED helps tell whether an observed difference is due to random error or to a true difference. It is also used to calculate the t-value in a 2-sample t-test, which compares the means of two samples to see whether they are likely from populations with the same mean.

EXAMPLE: A researcher conducts a study on how caffeine affects test scores. They take the mean of scores from each group (with and without caffeine) and calculate the difference between the means. They then calculate the SED to see how much that difference would be expected to vary by chance, and divide the difference by the SED to obtain the t-value.
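A minimal Python sketch, assuming two independent groups: SED = sqrt(s1²/n1 + s2²/n2), and t is the difference between the means divided by the SED. With equal group sizes, as here, this gives the same value as the pooled-variance SED. The caffeine and no-caffeine scores and the helper name are hypothetical.

```python
import math
from statistics import mean, variance

def standard_error_of_difference(group1, group2):
    """SED for the difference between two independent sample means."""
    return math.sqrt(variance(group1) / len(group1) + variance(group2) / len(group2))

# Hypothetical test scores
caffeine = [78, 82, 80, 84, 76]
no_caffeine = [72, 75, 70, 74, 69]

sed = standard_error_of_difference(caffeine, no_caffeine)
t_value = (mean(caffeine) - mean(no_caffeine)) / sed
print(round(sed, 2), round(t_value, 2))
```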

28
Q

test bias

A

WHERE: Taught in applied stats and psychometrics

WHAT: An error in the measurement process that influences scores/disadvantages certain groups
OR
Tendency for scores of a test to over- or underestimate the true scores
- often disadvantages members of specific (e.g., minority) groups

WHY: Important to keep in mind when dealing with clients in minority groups, or while developing tests. Certain tests may favor certain groups while disfavoring others, leading to inaccurate results.

EXAMPLE: Researchers are developing a test to assess depression levels. The researchers are not careful when writing the questions and use language that is not easily understood by many non-white Americans. This test is now biased, and scores from non-white Americans will not reflect their true scores.

29
Q

type I and type II error

A

WHERE: Taught in applied stats and psychometrics

WHAT:
Type I Error = (liar, false positive) researchers reject the null when they should have failed to reject
- wrongly thought IV affected DV

Type II Error = (stupid, false negative) researchers fail to reject the null when they should have rejected it
- wrongly thought IV did not affect DV

WHY: Important to keep in mind when consuming research data. The awareness may also help to reduce error when conducting research.

EXAMPLE: A researcher is testing a new drug to treat depressive symptoms. After reviewing the results, they concluded that the drug reduced symptoms. In reality, the drug had no effect, so rejecting the null hypothesis was wrong. This is a Type I error.
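A short Python sketch of why running many separate tests inflates the Type I error rate (the reason the ANOVA card favors one ANOVA over several t-tests), assuming the tests are independent, plus the Bonferroni correction as one common fix. The function names are hypothetical.

```python
def familywise_error_rate(alpha, num_tests):
    """Chance of at least one Type I error across independent tests, each run at alpha."""
    return 1 - (1 - alpha) ** num_tests

def bonferroni_alpha(alpha, num_tests):
    """Per-test alpha that keeps the familywise error rate at or below alpha."""
    return alpha / num_tests

# Comparing three study techniques pairwise takes three t-tests
for k in (1, 3, 10):
    print(k, round(familywise_error_rate(0.05, k), 3))
print(round(bonferroni_alpha(0.05, 3), 4))
```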

30
Q

types of validity:

A

WHERE: Taught in applied stats and psychometrics

WHAT: How well a measure is measuring the intended construct.

Types:

  • Content Validity: the degree to which a measure represents all aspects of, and encompasses the full domain of, a given construct
  • Criterion Validity: how well scores on a test correlate with an external criterion (an outcome or an established measure); includes how well one measure predicts the outcome of another measure (predictive validity)
  • Concurrent Validity: a type of criterion validity; how much a NEW test correlates with older, previously established/validated tests given at about the same time
  • Construct Validity: how much a test measures the construct it was intended to measure
  • Internal Validity: whether changes observed in the DV are due to the intervention/manipulation of the IV and not due to other factors
  • External Validity: whether the findings of a study can be generalized to other populations/situations

WHY: Measures of validity are important as it’s vital to know if a test is adequately measuring the intended construct. When developing tests and consuming research, it is important to know if your test is valid/effective.

EXAMPLE: A researcher is developing a test to measure depression symptoms. They want to ensure their test has concurrent validity, so they calculate the correlation between their test and the Beck Depression Inventory. They found their test had a high correlation, indicating good concurrent validity.

31
Q

variance

A

WHERE: Taught in applied stats and psychometrics

WHAT: The measure of the spread of scores within a sample or population around the mean. Useful for statistical analyses (e.g., ANOVA), but less useful as a descriptive statistic because it is in squared units (the SD is usually reported instead)

Small variance = similar scores
Large variance = scores farther from the mean/larger range

WHY: This is important as the variance of scores can be converted into a number that can be used to compare between or across samples within a population, to see which has the most or least variance. Changes in variance can also show how an intervention or treatment applied to a sample group affects the spread of scores.

EXAMPLE: A researcher is studying the effects of an SSRI on depression symptoms. The variance within each group (placebo and treatment) shows how much individual symptom scores differ. If the difference between the group means is large relative to that within-group variance (tested with a t-test or ANOVA), this suggests the SSRI reduced depressive symptoms.