Experimental Design Flashcards

Question

What term describes an experiment that can be repeated and **yields the same results**?

Answer 1

reliable ## Footnote Reliability can apply to the same researchers repeating their experiment on the same subjects, to the same researchers repeating the experiment on different subjects, or to entirely different researchers attempting to repeat the experiment.

Answer 2

replicable, reliable ## Footnote Since the experiment could be repeated at all, it was replicable. However, since its results were found to be inconsistent, it was not reliable.

Answer 3

In the context of research, **validity** is the extent to which a study truly measures what it intends to measure and has results that are applicable outside the exact circumstances of the study. ## Footnote Put simply, validity can be thought of as the extent to which a study's results are *genuine* and *generalizable*.

Answer 4

internal validity ## Footnote If a study is internally valid, that means that if a causal claim is determined (as in, "thing A causes thing B"), that causal claim is likely to be accurate/sound.

Answer 5

**Control** for confounding variables and **rule out** sources of bias. ## Footnote Since internal validity refers to the ability to draw accurate causal conclusions from a study, it is important to reduce or eliminate the impact of confounding variables or bias, both of which can lead to inaccurate interpretations of causality.

Answer 6

external validity ## Footnote Unlike internal validity (which deals with the soundness of causal conclusions), external validity refers to the extent to which a study's results can be generalized to contexts outside the specific circumstances of the study.

Answer 7

1. criterion 2. construct 3. content validity ## Footnote As one might predict from its definition, test validity is very broad. As such, it is sometimes divided into these three slightly more specific subtypes.

Answer 8

**External validity** is not part of test validity. Rather, external validity measures **something separate** (the generalizability of the study). ## Footnote However, both **construct validity** and **criterion validity** are types of test validity.

Answer 9

It refers to the extent to which the results of a given test correspond to those of another well-respected, established, and/or relevant measure. ## Footnote For instance, imagine that job applicants to a large tech firm are typically given a six-hour exam. If the leaders of that firm decide they want to replace it with a 30-minute exam, they will likely first confirm that the results on the shorter exam correspond with the criterion of the longer test.

Answer 10

It refers to whether a given test **accurately evaluates** the construct it was developed to evaluate. ## Footnote In this context, a "construct" is a variable that is being assessed but cannot be directly measured or observed. For example, a test may measure the construct of stress, but to do so, the experimenters must evaluate proxies of stress such as heart rate or cortisol levels.

Answer 11

It refers to how well a given test actually **evaluates the full scope** of what it was designed to test. ## Footnote As such, a test with low content validity might test only a tiny part of the larger phenomenon it was developed to evaluate.

Answer 12

1. **Content validity** evaluates whether a test covers the full scope of a construct. 2. **Construct validity** determines if the test measures the correct construct.

Answer 13

It refers to the extent to which the results of a given test **correspond to results** on some future measure. ## Footnote For instance, if high MCAT scores typically correlate with low medical school dropout rates, then MCAT score has high predictive validity with regard to dropout rate.

Answer 14

* **Criterion validity** assesses how well a test correlates with an established measure. * **Predictive validity** evaluates how well a test predicts future outcomes.

Answer 15

False ## Footnote Whether an experiment produces the same results over multiple identical administrations is reliability, not external validity. External validity refers to the extent to which a study's findings can be generalized to *different* situations.

Answer 16

content ## Footnote Professor Jones' final exam fails to test the entire scope of content covered by the class; therefore, it exhibits low content validity.

Answer 17

external ## Footnote External validity refers to the extent to which the results of a study can be generalized to other situations, often real-life ones. Here, the study lacks externally valid results regarding heart rate.

Answer 18

external ## Footnote External validity refers to the extent to which a study's results can be generalized to other situations or the real world. Unfortunately, the real world is full of confounding variables, which must be minimized to increase internal validity. Therefore, a common struggle in research design is that increases in internal validity bring with them decreases in external validity.

Answer 19

internal ## Footnote Internal validity refers to the ability to draw accurate causal conclusions from a study. Importantly, this is facilitated by properly controlling for potential confounding variables.

Answer 20

construct ## Footnote Here, the teacher is assessing the entirely wrong construct: spelling ability instead of math ability.

Answer 21

predictive ## Footnote Here, college success is a *future* measure, meaning that predictive validity is the most accurate.

Answer 22

criterion validity ## Footnote Here, the results of the new IQ test correspond well to the existing criterion of the gold-standard IQ test.

Answer 23

criterion validity ## Footnote Essentially, a positive control is an existing criterion to which researchers can compare their new treatment. This therefore relates to criterion validity.

Answer 24

They can **normalize the data** against total protein. ## Footnote Often, the results in MCAT experimental passages will contain data that has been "normalized," or scaled in comparison to a given standard. In the example given here, normalization against total protein will allow the researchers to see whether the decrease in the protein of interest has happened proportionally to a decrease in total protein or whether this protein in particular was impacted.
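
A minimal sketch of this kind of normalization, assuming hypothetical signal intensities in arbitrary units. The function name `normalize` and all numbers below are invented for illustration and do not come from any passage.

```python
def normalize(target, total):
    """Return the target protein signal as a fraction of total protein."""
    if total <= 0:
        raise ValueError("total protein must be positive")
    return target / total

# Hypothetical measurements: (protein of interest, total protein)
control = (40.0, 200.0)
treated = (20.0, 100.0)

control_ratio = normalize(*control)
treated_ratio = normalize(*treated)

# Equal ratios mean the drop in the protein of interest simply tracks the
# drop in total protein; unequal ratios would point to a specific effect.
proportional_drop = abs(control_ratio - treated_ratio) < 1e-9
```

In this made-up case the raw signal halves, but so does total protein, so the normalized values match and the decrease is not specific to the protein of interest.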

Answer 25

objective ## Footnote Objective measures are (at least under typical circumstances) unbiased, meaning that they cannot be interpreted differently based on the opinions of the interpreter. A person's weight is an example of an objective measure.

Answer 26

subjective ## Footnote Subjective measures are those that are "subject" to opinion, such as open-ended questions and ratings of feelings or perceptions.

Answer 27

* **Quantitative** methods produce numbers as results. * **Qualitative** methods produce results that are not numbers. ## Footnote As such, numerical data can be termed *quantitative data*, while non-numerical data is known as *qualitative data*.

Answer 28

mixed methods ## Footnote If you answered "quantitative and qualitative methods," you're also correct! However, it is important to know that "mixed methods" refers to the use of both quantitative and qualitative methods in a study.

Answer 29

qualitative data ## Footnote All non-numerical data is qualitative. This includes observations, descriptions, words, or phrases, such as the verbal description of the subject's preferred candidate in this example.

Answer 30

mixed methods ## Footnote Since this study is collecting both numerical (ratings from 1 to 10) and non-numerical (short descriptions consisting of words) data, it is a mixed-methods study.

Answer 31

quantitative, subjective ## Footnote Importantly, just because a study collects quantitative data does not mean that it is an objective study! Here, the numbers being collected are subject to opinion, making them subjective.

Answer 32

qualitative, subjective ## Footnote Here, since these descriptions are words (and, as far as we know, do not correspond to any numerical scale), they are qualitative. They are also subjective, since different people experience and describe pain very differently.

Answer 33

objective ## Footnote Here, there is only one right answer: the correct number of dots. This makes this information objective, not subjective. Of course, the subject could always count incorrectly, but that does not make this subjective; rather, to be subjective, different answers would need to be valid depending on the subject's opinions.

Answer 34

It refers to the degree to which multiple measurements are similar to each other. ## Footnote For example, if a person measures the volume of a sample and obtains results of 1.01, 1.02, and 1.00 mL, these results are precise, even if they are nowhere near the actual volume of 0.50 mL.

Answer 35

It refers to the degree to which measurements are similar to the correct value. ## Footnote For example, if a person measures the volume of a sample and obtains a result of 0.50 mL, and the actual volume is 0.50 mL, then this result is extremely accurate.
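
The distinction between precision and accuracy can be sketched numerically using the volume readings from the precision footnote. The helper names `bias` and `spread` and the 0.05 mL tolerance are assumptions made for this illustration, not standard definitions.

```python
from statistics import mean

def bias(measurements, true_value):
    """Accuracy check: distance of the average measurement from the true value."""
    return abs(mean(measurements) - true_value)

def spread(measurements):
    """Precision check: distance between the largest and smallest measurement."""
    return max(measurements) - min(measurements)

readings_ml = [1.01, 1.02, 1.00]  # readings from the precision footnote
true_volume_ml = 0.50

tolerance_ml = 0.05  # assumed cutoff, chosen only for this example
is_precise = spread(readings_ml) < tolerance_ml
is_accurate = bias(readings_ml, true_volume_ml) < tolerance_ml
```

Here the readings sit close together (precise) but far from 0.50 mL (not accurate).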

Answer 36

both accurate and precise ## Footnote Since these measurements are both very close to each other and very close to the actual value, they are both accurate and precise.

Answer 37

precision ## Footnote Reliability refers to whether repeating a study yields similar results to the original study, while precision refers to whether repeating a measurement yields similar results to the original measurement.

Answer 38

accuracy ## Footnote Validity refers to whether the results of a study were genuine and actually reflect what they were designed to measure. This is analogous to accuracy, which refers to whether a measurement actually reflects the true value of that measurement.

Answer 39

* **Reliability and validity** refer to the methods and design of an entire study. * **Precision and accuracy** refer to a given set of measurements or data.

Answer 40

both valid and reliable ## Footnote This means that the study results are both reflective of the accurate/real-life results and that the results will be similar if the study is repeated over time. Both are desirable characteristics of a study.

Answer 41

valid but not reliable ## Footnote The description makes it clear that this study was valid, at least to a degree (note the implied reference to criterion validity!), but since its results were dissimilar upon repetition, it was unreliable.

Answer 42

neither valid nor reliable ## Footnote The first portion of this description indicates a lack of validity (specifically, both criterion and content validity), while the second indicates a lack of reliability.

Answer 43

reliable but not valid ## Footnote Since the methods are described extremely carefully, it is likely that when other researchers attempt the experiment, they will get similar results (constituting reliability). However, since those methods are highly flawed, the experiment is not valid.

Answer 44

scientific method ## Footnote You don't need to memorize the steps of this method for the MCAT, as it is meant to be more of a cohesive system of practice than a rigid step-by-step manual.

Answer 45

testable ## Footnote This is a critical fact about the scientific method! If a hypothesis is not testable, it effectively doesn't matter that there even *is* a hypothesis, because the subsequent steps of the method cannot be followed and the hypothesis cannot be validated or undermined.

Answer 46

Type I and Type II errors ## Footnote That's easy enough to remember, but you should also understand what these errors are. Type I errors are often termed "false positives," while type II errors are known as "false negatives."

Answer 47

type II error ## Footnote A type II error, also known as a false negative, occurs when a phenomenon or relationship actually is present, but testing fails to detect that phenomenon or relationship (here, cancer).

Answer 48

type I error ## Footnote A type I error, also known as a false positive, occurs when a phenomenon or relationship is absent, but testing produces an inaccurate result indicating that it is present.
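
As a rough sketch, the two error types can be expressed as a comparison between what is actually true and what the test reports. The function name `classify_result` is hypothetical.

```python
def classify_result(actually_present, test_positive):
    """Label a single test outcome against the true state of affairs."""
    if test_positive and not actually_present:
        return "type I error (false positive)"
    if actually_present and not test_positive:
        return "type II error (false negative)"
    return "correct result"

# A patient with cancer whose screening comes back negative
missed_cancer = classify_result(actually_present=True, test_positive=False)

# A healthy patient whose screening comes back positive
false_alarm = classify_result(actually_present=False, test_positive=True)
```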

Answer 49

True ## Footnote This is absolutely accurate! The occurrence of type II errors, or false negatives, means that we mistakenly believe that a relationship or medical phenomenon (for example, cancer) is absent. This can have catastrophic results in any field.

Answer 50

neither ## Footnote Researchers and medical professionals should not strive to maximize *any* type of error! It's easy to think that type I errors (false positives) are less catastrophic than type II errors, at least in a medical context. However, these "false positives" typically create distress and require additional testing that can be expensive, painful, or inconvenient.

Answer 51

random and systematic errors ## Footnote These differ from Type I and Type II errors in that they refer to errors in the method of the experiment or its measurements, rather than to the relationship between the results and what exists in reality.

Answer 52

Random errors are fluctuations in a measurement, often due to the inherent lack of precision of the measurement apparatus. ## Footnote If measurements are taken many times, random errors will produce some results that are below and some that are above the "actual" value.

Answer 53

These are *regular*, consistent errors in a measurement, often resulting from miscalibration or from other mistakes that are made for all trials. ## Footnote If measurements are taken many times, systematic errors will produce results that are always above or always below the "actual" value.
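
The contrast between random and systematic errors can be sketched with a deliberately simplified heuristic: check whether every reading misses the actual value in the same direction. Real data usually contain both kinds of error, and the function name and scale readings below are hypothetical.

```python
def error_pattern(measurements, actual):
    """Describe the direction of the errors relative to the actual value."""
    errors = [m - actual for m in measurements]
    if all(e > 0 for e in errors) or all(e < 0 for e in errors):
        return "systematic"  # every reading is off in the same direction
    return "random"          # readings scatter on both sides of the value

actual_mass_g = 10.0                            # hypothetical true mass
noisy_scale = [9.8, 10.1, 10.3, 9.9]            # scatter above and below
miscalibrated_scale = [10.4, 10.5, 10.4, 10.6]  # always too high

noisy_pattern = error_pattern(noisy_scale, actual_mass_g)
miscalibrated_pattern = error_pattern(miscalibrated_scale, actual_mass_g)
```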

Answer 54

null hypothesis ## Footnote Rejection of the null hypothesis therefore indicates that the data support the presence of a relationship.

Answer 55

within-subjects design ## Footnote In such a design, participants are exposed to both the experimental condition (or conditions) and the control condition. This reduces the impact of confounding variables, as it does not introduce the undesirable variations that come with comparing different individuals.

Answer 56

between-subjects design ## Footnote While such a design may be necessary in certain situations, it does carry the potential of more confounding variables than a within-subjects design.

Answer 57

Only the **subjects (participants)** are blinded. ## Footnote In other words, the subjects do not know whether they are in the experimental or the control group. However, the researchers *do* know which subjects are in which group.

Answer 58

Both the **subjects (participants)** and the **researchers** are blinded. ## Footnote In other words, neither of these groups is informed regarding which subjects are in the experimental vs. the control group.

Answer 59

outlier ## Footnote For instance, if a dataset contains values of 5.5, 5.7, 5.2, and 14, then 14 exemplifies an outlier. Outliers can be indicative of experimental error.

Answer 60

1. mean 2. median 3. mode ## Footnote According to the AAMC, these are also the measures of central tendency that you must understand for the MCAT.

Answer 61

median ## Footnote You may remember using median as early as elementary school, but it is still relevant on the MCAT, particularly with regard to Reasoning Skill 4 (Data-based and Statistical Reasoning).

Answer 62

mean ## Footnote "Mean" is another term for "average."

Answer 63

mode ## Footnote For instance, if ten children are sampled about their favorite number between 1 and 10, and six children say 7, two children say 3, and two children say 5, then 7 is the mode.

Answer 64

median ## Footnote Importantly, the median is far less susceptible to the influence of outliers than the mean. Analyses of salary data often include such outliers. For instance, the mean of a sample that included Jeff Bezos' salary would be skewed incredibly high, while the median would be much more representative of the sample overall.

Answer 65

False ## Footnote An easy way to tell whether this is false is to imagine a fake set of data, such as 3 mg, 5 mg, and 7 mg. Here, the mean and median of the dataset are both 5 mg. Of course, most datasets in research will be much larger, but the same principle holds true.
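
The three measures of central tendency can be checked with Python's standard `statistics` module. This sketch reuses the small datasets from the cards above; the variable names are invented for illustration.

```python
from statistics import mean, median, mode

doses_mg = [3, 5, 7]                            # dataset from this card
favorite_numbers = [7] * 6 + [3] * 2 + [5] * 2  # the ten children from the mode card

dose_mean = mean(doses_mg)
dose_median = median(doses_mg)
favorite_mode = mode(favorite_numbers)

# Outlier sensitivity, using the values from the outlier card
with_outlier = [5.5, 5.7, 5.2, 14]
outlier_mean = mean(with_outlier)      # pulled upward by the value 14
outlier_median = median(with_outlier)  # stays near the bulk of the data
```

For the outlier dataset, the mean lands well above three of the four values, while the median stays between 5.5 and 5.7.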

Answer 66

1. range 2. interquartile range 3. standard deviation ## Footnote According to the AAMC, these are also the measures of dispersion that you must understand for the MCAT.

Answer 67

It is the **numerical difference** between the highest and lowest values in the dataset. ## Footnote For instance, if the highest value was 95 ppm and the lowest was 17 ppm, the range would be (95 − 17) = 78 ppm.

Answer 68

quartiles ## Footnote For instance, a dataset that included every number from 1 to 100 would have quartiles of 1-25 (lowest), 26-50, 51-75, and 76-100 (highest).

Answer 69

interquartile range (IQR) ## Footnote Essentially, the IQR describes the numerical spread covered by the middle two quartiles.

Answer 70

The interquartile range (IQR) is less sensitive to the presence of outliers. ## Footnote Since the IQR excludes the values in the highest and lowest quartiles, it would not be affected by, say, one extremely high or one extremely low value. In contrast, the range would be impacted.
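
A short sketch of why the IQR resists outliers while the range does not. Several conventions exist for computing quartiles; the one used here (medians of the lower and upper halves) is an assumption chosen for simplicity, and the ppm readings are hypothetical apart from the minimum of 17 and maximum of 95 borrowed from the range card.

```python
from statistics import median

def value_range(data):
    """Difference between the highest and lowest values."""
    return max(data) - min(data)

def iqr(data):
    """IQR as the gap between the medians of the upper and lower halves.

    Assumes at least two values; the middle value is excluded from both
    halves when the dataset has an odd length.
    """
    s = sorted(data)
    half = len(s) // 2
    lower = s[:half]
    upper = s[half:] if len(s) % 2 == 0 else s[half + 1:]
    return median(upper) - median(lower)

ppm = [17, 30, 42, 55, 61, 70, 95]  # hypothetical readings
ppm_with_outlier = ppm + [500]      # one extreme value added

range_change = value_range(ppm_with_outlier) - value_range(ppm)
iqr_change = iqr(ppm_with_outlier) - iqr(ppm)
```

Adding the single extreme value shifts the range far more than it shifts the IQR.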

Answer 71

standard deviation ## Footnote Note that you don't need to understand the ins and outs of the mathematical calculation of standard deviation for the MCAT, but you should know that it exists and how to interpret it in a figure/table.

Answer 72

True ## Footnote Standard deviation is a measure of the variation in a data set with respect to the mean. A high standard deviation means that values are spread out (they vary) rather than being clustered near the mean.
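
To illustrate, the sketch below compares two hypothetical datasets that share a mean but differ in spread. It uses `statistics.pstdev` (population standard deviation); whether a population or sample formula is appropriate depends on the study, so that choice is an assumption here.

```python
from statistics import mean, pstdev

clustered = [9, 10, 11]   # hypothetical values close to the mean
spread_out = [1, 10, 19]  # hypothetical values far from the mean

same_mean = mean(clustered) == mean(spread_out)
sd_clustered = pstdev(clustered)
sd_spread_out = pstdev(spread_out)
```

Both datasets have a mean of 10, yet the second has a much larger standard deviation because its values lie farther from that mean.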

Answer 73

statistical significance ## Footnote A statistically significant result is one that is extremely unlikely to have arisen if the null hypothesis were true. As such, statistical significance is tied to the rejection of the null hypothesis.

Answer 74

alpha level ## Footnote In most studies, the alpha level is set at 0.05, meaning that *p*-values < 0.05 indicate statistical significance.

Answer 75

no ## Footnote While we were not given a threshold for statistical significance (or alpha level), the typical alpha level is 0.05 (or occasionally 0.10). 0.49, therefore, is much too high to indicate significance. Be sure to read numbers carefully; otherwise, it's easy to confuse 0.05 with 0.5!
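
The comparison between a *p*-value and the alpha level amounts to a single inequality, sketched below. The function name `is_significant` is hypothetical, and the default of 0.05 reflects the typical alpha level mentioned above.

```python
def is_significant(p_value, alpha=0.05):
    """Return True when the p-value falls below the chosen alpha level."""
    return p_value < alpha

result_for_this_card = is_significant(0.49)           # far above 0.05
borderline_strict = is_significant(0.07)              # not significant at 0.05
borderline_lenient = is_significant(0.07, alpha=0.10) # significant at 0.10
```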

Answer 76

False ## Footnote Just because no statistically significant relationship was found does *not* mean that the experimental group showed no difference at all from the control group! The virally-infected cells may have displayed higher osteoblast activation, just not to a significant extent.

Answer 77

It is a range of values within which a parameter has a certain probability of falling. ## Footnote This sounds confusing, so let's give an example. If the 95% confidence interval for the mean IQ of a population is 88 to 112, then we can be 95% confident that the true mean IQ falls within that range. Note that this describes the population parameter, not the score of any individual subject.
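
One common way to compute a confidence interval for a mean is the normal approximation, sketched below under stated assumptions: the IQ scores are hypothetical, the function name is invented, and for a sample this small a *t*-distribution would normally be preferred. The resulting interval describes plausible values for the population mean rather than the range of individual scores.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * stdev(sample) / sqrt(len(sample))
    center = mean(sample)
    return center - margin, center + margin

iq_scores = [92, 105, 98, 110, 101, 95, 99, 100]  # hypothetical sample
low_95, high_95 = mean_confidence_interval(iq_scores)
low_99, high_99 = mean_confidence_interval(iq_scores, confidence=0.99)
```

Raising the confidence level from 95% to 99% widens the interval, since a wider range is needed to capture the parameter more often.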

Answer 78

Hawthorne effect ## Footnote The Hawthorne effect, also termed subject reactivity, is any change in behavior resulting from the attention study participants perceive they are getting from researchers, rather than resulting from differences in the independent variable(s).

Experimental design is perhaps the single most important topic in MCAT P/S. Use this deck to master types of variables, reliability and validity, relevant statistical concepts, and more. (102 cards)