Midterm Flashcards
What type of variable is temperature (degrees F)?
Interval
What type of variable is femur length (cm)?
Ratio
What type of variable is metastases occurrence (yes or no)?
Nominal
What type of variable is pain intensity (1-10)?
Ordinal
Which measure best describes the variation within your sample?
Standard deviation
Which of the following is (are) true of a normal distribution?
- It has a central tendency
- About 68% of its variation is found within one standard deviation from the mean
You have twenty mice in your lab: 10 male (8 with white fur, 2 with brown fur) and 10 female mice (3 with white fur, 7 with brown fur). You suspect fur color might be an X-linked trait (i.e. a relationship between sex and fur color).
State the null hypothesis.
The null hypothesis is that fur color is not an X-linked trait (i.e. there is no relationship between sex and fur color).
I believe amphetamine will increase anxiety in a certain transgenic mouse model. I measure anxiety as the time (in sec) it takes the mouse to move into the central portion of an open field arena with lower movement times interpreted as less anxiety. I measure movement time in one group of mice following amphetamine injection and a second group of mice following injection of a control substance.
What is the null hypothesis?
The null hypothesis is that amphetamine does not increase anxiety in this transgenic mouse model, i.e. the movement times of amphetamine-injected mice will not differ from those of control-injected mice.
I test the null hypothesis that a ginseng pre-treatment does not affect alcohol-induced loss of balance in comparison to a control group. I conclude that ginseng has no effect, when it actually does.
Identify which type of error (Type I or Type II) or non-error being made here.
Type II, false negative
I test the null hypothesis that there is no association between eye color and genetic sex (XX, XY, etc.). I conclude that there is no relationship between eye color and sex when in fact there is none.
Identify which type of error (Type I or Type II) or non-error being made here.
No error, fail to reject true null hypothesis
During a drug screening for professional athletes, the null hypothesis is that there is no drug use. A drug test detects the presence of cannabinoids in the urine, and the athlete admits to using marijuana.
Identify which type of error (Type I or Type II) or non-error being made here.
No error, reject false null hypothesis
As a juror, my null hypothesis is that the defendant is innocent until proven guilty. I conclude that the individual on trial is guilty when in fact she is not.
Identify which type of error (Type I or Type II) or non-error being made here.
Type I, false positive
When is it appropriate to use a one-tailed test vs. a two-tailed test?
Apply a one-tailed test when your hypothesis specifies a single direction (e.g. an increase only); apply a two-tailed test when a difference in either direction matters. A two-tailed test is appropriate when the null states a parameter equals a value and the alternative states it does not equal that value.
In relation to statistical hypothesis testing, define alpha (α).
Alpha is the level of significance: the probability of a Type I error (a false positive, i.e. rejecting a true null hypothesis). It equals 1 minus the confidence level.
In relation to statistical hypothesis testing, define beta (β).
Beta is the probability of a Type II error (a false negative, i.e. failing to reject a false null hypothesis). 1 − β is the power of the test.
In relation to statistical hypothesis testing, what is a Bonferroni correction? How is it related to alpha and beta?
A Bonferroni correction is used when multiple statistical tests are run simultaneously: the per-test alpha is divided by the number of tests. The correction decreases alpha and increases beta.
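As a minimal sketch (the p-values and cutoff below are made up, not from any card), the correction simply divides alpha by the number of tests:

```python
# Hypothetical p-values from m = 3 simultaneous tests (made-up numbers)
alpha = 0.05
p_values = [0.004, 0.030, 0.020]
m = len(p_values)

alpha_adj = alpha / m                            # Bonferroni-corrected threshold
significant = [p < alpha_adj for p in p_values]  # only the first survives (~0.0167)
```

Note the trade-off: a p-value of 0.030 would be significant at the uncorrected alpha but not at the corrected one, which is exactly how beta (missed effects) goes up as alpha comes down.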
Define probability
Probability is a mathematical tool used to study randomness. It deals with the chance (the likelihood) of an event occurring.
Define descriptive studies
Descriptive statistics are used to describe or summarize the characteristics of a sample or data set, such as a variable’s mean, standard deviation, or frequency. Also known as “what is” or correlations.
Define experimental studies
Experimental studies are ones where researchers introduce an intervention and study the effects. Experimental studies are usually randomized, meaning the subjects are grouped by chance. Also known as “what if” or causation. Does the treatment affect the observation?
Define correlation vs causation
A correlation is a statistical indicator of the relationship between variables. Causation means that changes in one variable brings about changes in the other; there is a cause-and-effect relationship between variables.
Define experimental validity
Experimental validity refers to the degree to which a study's design controls the variables that influence both the results of the research and their generalizability to the population at large.
Define internal validity
Internal validity is defined as the extent to which the observed results represent the truth in the population we are studying and, thus, are not due to methodological errors. Is the effect actually due to the manipulation?
Define external validity
External validity is the extent to which you can generalize the findings of a study to other situations, people, settings and measures. Is the effect unique to your experiment or generalizable?
List the threats to internal validity
- History (unrelated event occurring between 2 measures ex: power outage in the animal’s enclosure, stresses them out)
- Maturation (processes within subjects which act as a function of the passage of time ex: First measurement occurs just after birth, second taken months later)
- Testing (effects measuring study outcomes in participants ex: Post-test scores improves due to exposure to pre-test, not treatment)
- Instrumentation (changes in the instrument, observers, or scorers which may produce changes in outcomes ex: PI measures respiration rate immediately after putting a mouse in the chamber, research assistant waits 5 minutes then records respiration rate)
- Statistical regression (regression to the mean ex: Measuring pain levels before and after treatment)
- Selection of subjects (the biases which may result in selection of comparison groups ex: Control group consists of older males, treatment is younger females)
- Experimental mortality (the loss of subjects ex: Human subjects leave study, mice die)
Define regression to the mean
RTM refers to the simple fact that if one sample of a random variable is extreme, the next sampling of the same random variable is likely to be closer to its mean. RTM is thus a useful concept to consider when designing any scientific experiment, data analysis, or test, which intentionally selects the “most extreme” events - it indicates that follow-up checks may be useful in order to avoid jumping to false conclusions about these events; they may be “genuine” extreme events, a completely meaningless selection due to statistical noise, or a mix of the two cases.
How do you achieve experimental validity?
- Formulate a specific question in advance.
- Have a control group (internal).
- Randomized (block) design (internal).
- Replication (external).
One-Shot Case Study Design
A single group studied only once, often pre-experimental. Lacks a control group and has virtually no internal validity.
One Group Pre-Posttest Design
Pretest, treatment, posttest. No selection or mortality issues but does not control for history, maturation, testing, instrumentation or RTM.
Static Group Comparison Design
One control group, one treatment group with uneven selection. Not randomized.
Pre-Test Post-Test Control Group Design
Randomized control and treatment groups with pre- and posttests. Does not control for interaction of testing.
Four Group Design
Randomized control and treatment groups, one pair given a pretest and one pair not; all four groups receive a posttest.
Post-Test Control Group Design
Randomly assign subjects to control and treatment groups. Controls for internal validity issues and effect of testing.
Time-Series Design
Many observations over time.
Factorial Design
Studying the effects of two or more factors and their interactions simultaneously.
Nested Designs
Two factors (A and B), B is nested within A.
1 Standard Deviation
Fits 68.3% of the sample distribution
2 Standard Deviations
Fits 95.4% of the sample distribution
Central Limit Theorem
If we repeatedly take independent random samples of size n from any population, then when n is large (>30) the distribution of the sample means will approach a normal distribution even if the population itself is not normally distributed. This allows us to make probability statements about the possible range of values around the sample mean.
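A quick simulation sketch of the theorem (the population, seed, and sample counts are arbitrary): sample means drawn from a skewed population still cluster tightly around the population mean.

```python
import random
import statistics

random.seed(0)
# A deliberately non-normal (right-skewed, exponential) population
population = [random.expovariate(1.0) for _ in range(10_000)]

# Repeatedly draw independent samples of size n = 40 (> 30), keep each mean
sample_means = [statistics.mean(random.sample(population, 40))
                for _ in range(1_000)]

# The sample means spread far less than the raw data and center on the
# population mean, approaching a normal distribution as the CLT predicts
```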
Define Nominal Variable
Categorical variable, its value cannot be ranked.
ex: sex (male, female)
Define Ordinal Variable
Qualitative variable, its values can be ranked.
ex: aggression (weak, moderate, strong)
Define Interval Variable
The values of the variable can be ranked, and the differences of the values show the distances between the values. This scale does not have a true zero point.
ex: temperature
Define Ratio Variable
The differences of the values show the distances between the values and also the ratio of values is defined, as the variable has a true zero point.
ex: height
Define Binomial Test
A binomial test uses sample data to determine if the population proportion of one level in a binary (or dichotomous) variable equals a specific claimed value.
- Observed vs expected
- Has 2 possible outcomes
- Requires expected probability, number of ‘successful’ trials, total trials
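A pure-Python sketch of an exact two-sided binomial test (the counts — 9 "successes" in 12 trials against an expected probability of 0.5 — are hypothetical):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_test_two_sided(k, n, p=0.5):
    """Sum the probabilities of every outcome no more likely than k."""
    p_obs = binom_pmf(k, n, p)
    return sum(binom_pmf(i, n, p) for i in range(n + 1)
               if binom_pmf(i, n, p) <= p_obs + 1e-12)

# Observed vs. expected: 9 successes in 12 trials, expected probability 0.5
pval = binomial_test_two_sided(9, 12, 0.5)   # ~0.146, fail to reject at 0.05
```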
Define Fisher’s Exact Test
Statistical test used to determine if there are nonrandom associations between two categorical variables. Used in the analysis of contingency tables.
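A sketch of Fisher's exact test on a 2x2 contingency table, applied to the fur-color example from the mouse card earlier in this deck (males: 8 white, 2 brown; females: 3 white, 7 brown). The implementation sums hypergeometric probabilities over all tables no more likely than the observed one:

```python
from math import comb

def hypergeom_p(a, row1, row2, col1):
    """Probability of a 2x2 table with fixed margins; a is the top-left cell."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)

def fisher_exact_two_sided(a, b, c, d):
    row1, row2, col1 = a + b, c + d, a + c
    p_obs = hypergeom_p(a, row1, row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(hypergeom_p(x, row1, row2, col1) for x in range(lo, hi + 1)
               if hypergeom_p(x, row1, row2, col1) <= p_obs + 1e-12)

# Males: 8 white, 2 brown; females: 3 white, 7 brown
pval = fisher_exact_two_sided(8, 2, 3, 7)    # ~0.070, fail to reject at 0.05
```

With these counts the association between sex and fur color is suggestive but does not reach the conventional 0.05 threshold.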
Measures of central tendency
Mean, median and mode
Measures of variation
Interquartile range, range (diff between min & max), variance, standard deviation, standard error of the mean, confidence intervals
Define Inferential Statistics
Statistical method that deduces from a small but representative sample the characteristics of a bigger population.
Define Confidence Interval
A range of values so defined that there is a specified probability that the value of a parameter lies within it. Measured as a percentage and expressed as a range of values.
Define Bootstrapping
A resampling technique used to estimate statistics on a population by sampling a dataset with replacement. It can be used to estimate summary statistics such as the mean or standard deviation.
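A small sketch of a percentile bootstrap for the mean (the data values and seed are made up):

```python
import random
import statistics

random.seed(1)
data = [4.1, 5.0, 3.8, 6.2, 5.5, 4.9, 5.1, 4.4]   # made-up sample

# Resample the data with replacement; each resample yields one bootstrap mean
boot_means = sorted(statistics.mean(random.choices(data, k=len(data)))
                    for _ in range(2_000))

# Percentile 95% confidence interval for the mean (2.5th / 97.5th percentiles)
ci = (boot_means[49], boot_means[1949])
```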
Define standard deviation
Variation around the sample mean.
Define standard error of the mean
How accurately your data estimates the population mean.
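A small sketch (made-up sample) showing that the SEM is just the sample standard deviation divided by the square root of the sample size:

```python
import math
import statistics

sample = [2.0, 3.0, 5.0, 6.0]                 # made-up data, n = 4
sd = statistics.stdev(sample)                  # sample SD (n - 1 denominator)
sem = sd / math.sqrt(len(sample))              # standard error of the mean
# Unlike the SD, the SEM shrinks as n grows: a larger sample estimates the
# population mean more accurately
```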
95% Confidence Interval
Sampling error around population mean. Decreases with sample size. Provides information about the magnitude of an effect.
Type I Error
Alpha: a false positive, i.e. rejecting a true null hypothesis. Increases with multiple comparisons.
Type II Error
Beta, a false negative, failing to reject a false null hypothesis.
P-value
Measures the strength of evidence against the null hypothesis. Does not measure size of an effect.
Define T-Test
Statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another.
Define One Sample T-Test
The null hypothesis compares sample mean to theoretical mean. Assumes normal distribution, random sample, independent observations. Calculates p-value using t-statistic.
Cons: Single group studied only once, often pre-experimental, lack of true control, no internal validity
Define T-Value
Calculated as the difference between the sample mean and the theoretical population mean, divided by the standard error of the mean.
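That definition can be sketched directly for a one-sample test (the measurements and theoretical mean are hypothetical):

```python
import math
import statistics

sample = [5.1, 4.8, 5.5, 5.0, 4.9, 5.3]   # made-up measurements
mu0 = 4.5                                   # theoretical population mean

sem = statistics.stdev(sample) / math.sqrt(len(sample))
t = (statistics.mean(sample) - mu0) / sem   # compare to t with df = n - 1
```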
Define Degrees of Freedom
Maximum number of independent values. Sample size minus 1.
Define Independent T-Test
Null hypothesis compares sample means. Assumes normal distribution, random sample, independent observations, and equal variance among populations. Calculates p-value with t-statistic.
Define Dependent (Paired) T-Test
The null hypothesis compares paired sample means. Each datum from one group is a priori matched in some way with one datum from the other group. Assumes the pairwise differences are normally distributed, random sampling, and dependent (paired) observations. Calculates p-value with t-statistic.
Define Power
The probability of rejecting a false null hypothesis (1-beta)
Define Effect Size
Represents the strength of the relationship between population variables. It indicates the practical significance of a research outcome.
Define Cohen’s d
The effect size based on difference between means relative to their deviations.
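A sketch of Cohen's d for two made-up groups, using the pooled standard deviation in the denominator:

```python
import math
import statistics

group_a = [10.2, 11.1, 9.8, 10.5, 10.9]    # made-up control values
group_b = [12.0, 12.8, 11.5, 12.3, 12.9]   # made-up treatment values

def cohens_d(x, y):
    """Difference between means divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x) +
                  (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(y) - statistics.mean(x)) / math.sqrt(pooled_var)

d = cohens_d(group_a, group_b)   # d > 0.8 is conventionally a "large" effect
```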
What is sample size determined by?
- Pooled standard deviation
- Z-score of a normal distribution at 1-alpha and 1-beta
- Cohen’s d
The 2 types of power analysis
- A priori
- Sensitivity
Define A Priori Power
Calculation to determine sample size needed to achieve given power.
Define Sensitivity Power
Calculation to determine the effect size given the sample size. Can be paired or unpaired.
Define Fisher’s Analysis of Variance
A statistical method that separates observed variance data into different components to use for additional tests. A one-way ANOVA is used for three or more groups of data, to gain information about the relationship between the dependent and independent variables.
Types of variables for ANOVA
The dependent variable is continuous and normally distributed; the independent variable(s) are discrete (categorical).
ANOVA Assumptions
- Normal distribution
- Equal variance among groups
- Independent observations
- Random sampling
ANOVA Hypotheses
H0: μ1 = μ2 = μ3
Ha: at least two of the means significantly differ
How does ANOVA compute the p value?
By use of the F ratio
What is the F ratio in ANOVA?
The F statistic is a ratio of two different measures of variance for the data. If the null hypothesis is true (i.e., the population means of the groups are identical) then these are both estimates of the overall population variance and the ratio will be around 1.
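A hand-rolled sketch of the F ratio for three made-up groups of five values each (this mirrors the between-group/within-group decomposition, not any specific software's output):

```python
import statistics

groups = [
    [4.2, 4.8, 4.5, 4.9, 4.6],   # made-up data: 3 groups, n = 5 each
    [5.1, 5.6, 5.3, 5.8, 5.2],
    [4.4, 4.7, 4.3, 4.9, 4.7],
]
k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = statistics.mean(x for g in groups for x in g)

ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

# df-between = k - 1, df-within = n - k; F >> 1 suggests real group differences
f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
```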
Repeated Measures ANOVA Assumptions
- Normal distribution
- Sphericity: equal variance among differences between treatment levels
- Dependent observations
- Random sampling
Why do we care about multiple comparisons?
Multiple comparisons increase the chances of a type I error
Multiple comparisons among treatment means…
Designed to take the multiple comparison problem into account and provide significance values re-defined appropriately based on the specifics of your study.
Students T-test multiple comparisons
- a priori or pairwise, alpha inflation
- Avoid unless you’re only doing one, a priori determined comparison
Fisher’s LSD multiple comparisons
- Pairwise post hoc, alpha inflation, not available in R cmdr
- Only conduct if the ANOVA p-value is significant
Tukey HSD multiple comparisons
- Pairwise post hoc, beta inflation
- Available in R commander; most common
Dunnett method multiple comparisons
- a priori, control vs. treatments only, not available in R cmdr
Define Bartlett’s Test
Bartlett's test of homogeneity of variances identifies whether a continuous or interval-level dependent variable has equal variances across two or more groups of a categorical independent variable. It tests the null hypothesis of no difference in variances between the groups.
Define Levene’s Test
Used to test if samples have equal variances. Equal variances across samples is called homogeneity of variance. Some statistical tests, for example ANOVA, assume that variances are equal across groups or samples. The Levene test can be used to verify that assumption.
Define Welch’s Correction
Welch's correction adjusts the t-test or ANOVA for unequal variance among groups, keeping the significance level (the probability of rejecting a true null hypothesis, a Type I error) close to its nominal value.
Why don’t we always use Welch’s correction?
- Conservative results for large sample sizes
- Inflated results for small sample sizes
Define Mauchly’s Test for Sphericity
Tests whether or not the assumption of sphericity is met in a repeated measures ANOVA. Sphericity refers to the condition where the variances of the differences between all combinations of related groups are equal.
ε: severity of departure from sphericity
Define Greenhouse-Geisser
Used to assess the change in a continuous outcome with three or more observations across time or within-subjects. In most cases, the assumption of sphericity is violated for this type of within-subjects analysis and the Greenhouse-Geisser correction is robust to the violation.
Define Huynh-Feldt
A correction for violations of sphericity.
Do I use Greenhouse-Geisser or Huynh Feldt?
Generally, the recommendation is to use the Greenhouse-Geisser correction, especially if estimated epsilon (ε) is less than 0.75. However, some statisticians recommend using the Huynh-Feldt correction if estimated epsilon (ε) is greater than 0.75.
Define Sphericity
Equal variance among differences between treatment levels
In a data set with three different groups of 5 patients each and each group receives a different treatment, why are there two degrees of freedom for this test, what do they refer to?
There are two degrees of freedom for the ANOVA because the test statistic, the F ratio, involves separate calculations in the numerator and denominator of the ratio, each with its own degrees of freedom. For the independent ANOVA, the degrees of freedom are df-between groups and df-within groups. If we had conducted a rmANOVA, the degrees of freedom would refer to the df-between groups and the df-error term.
Explain briefly why one-way ANOVAs and rmANOVAs can produce different results despite having the same dataset.
These tests produce different results because the F-ratio is calculated differently. Remember that our estimate of variance within groups lies in the denominator of the F-ratio. With a repeated measures ANOVA we use the sum of squares error term, which is obtained by removing the variance within subjects (sum of squares subjects) from the variance within groups (sum of squares within).
I am interested in investigating the effects of regular administration of metformin on levels of glycosylated hemoglobin (HbA1C) in obese marmoset monkeys. I measure HbA1C in 10 obese marmosets before exposure to metformin and again after 3 months of regular metformin administration.
What is the null hypothesis?
The null hypothesis is that there will be no effects of regular administration of metformin on levels of glycosylated hemoglobin in obese marmoset monkeys.
I am interested in investigating the effects of regular administration of metformin on levels of glycosylated hemoglobin (HbA1C) in obese marmoset monkeys. I measure HbA1C in 10 obese marmosets before exposure to metformin and again after 3 months of regular metformin administration.
What threats to internal validity might we be concerned with?
Assuming there is no control group based on the information stated, this is a one group pre-posttest design: a pretest in obese marmoset monkeys, then the administration of metformin, and finally a posttest 3 months after metformin administration. This means there are no selection or mortality issues; however, this design does not control for history, maturation, testing, instrumentation, or regression to the mean, which are threats to internal validity.
I am interested in investigating the effects of regular administration of metformin on levels of glycosylated hemoglobin (HbA1C) in obese marmoset monkeys. I measure HbA1C in 10 obese marmosets before exposure to metformin and again after 3 months of regular metformin administration.
What statistical test is appropriate for the experiment outlined and why?
This requires a paired samples t-test because one group of individuals is evaluated twice, so the HbA1C comparison is between two measurements within the same subject.
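A sketch of that paired comparison with hypothetical HbA1C values (%) for the same 10 animals; the test reduces to a one-sample t-test on the within-subject differences:

```python
import math
import statistics

# Hypothetical HbA1C (%) for the same 10 marmosets, before and after metformin
before = [7.9, 8.4, 8.1, 7.6, 8.8, 8.2, 7.7, 8.5, 8.0, 8.3]
after  = [7.2, 7.9, 7.8, 7.1, 8.1, 7.6, 7.4, 7.8, 7.5, 7.7]

diffs = [a - b for a, b in zip(after, before)]
sem_d = statistics.stdev(diffs) / math.sqrt(len(diffs))
t = statistics.mean(diffs) / sem_d   # compare to t with df = n - 1 = 9
# A large negative t here would indicate HbA1C fell after treatment
```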