Midterm Flashcards
What type of variable is temperature (degrees F)?
Interval
What type of variable is femur length (cm)?
Ratio
What type of variable is metastases occurrence (yes or no)?
Nominal
What type of variable is pain intensity (1-10)?
Ordinal
Which measure best describes the variation within your sample?
Standard deviation
Which of the following is (are) true of a normal distribution?
- It has a central tendency
- About 68% of its variation is found within one standard deviation from the mean
You have twenty mice in your lab: 10 male (8 with white fur, 2 with brown fur) and 10 female mice (3 with white fur, 7 with brown fur). You suspect fur color might be an X-linked trait (i.e. a relationship between sex and fur color).
State the null hypothesis.
The null hypothesis is that fur color is not an X-linked trait (i.e. there is no relationship between sex and fur color).
I believe amphetamine will increase anxiety in a certain transgenic mouse model. I measure anxiety as the time (in sec) it takes the mouse to move into the central portion of an open field arena with lower movement times interpreted as less anxiety. I measure movement time in one group of mice following amphetamine injection and a second group of mice following injection of a control substance.
What is the null hypothesis?
The null hypothesis is that amphetamine does not increase anxiety in this transgenic mouse model, i.e. the movement times of amphetamine-injected mice will not differ from those of control-injected mice.
I test the null hypothesis that a ginseng pre-treatment does not affect alcohol-induced loss of balance in comparison to a control group. I conclude that ginseng has no effect, when it actually does.
Identify which type of error (Type I or Type II) or non-error being made here.
Type II, false negative
I test the null hypothesis that there is no association between eye color and genetic sex (XX, XY, etc.). I conclude that there is no relationship between eye color and sex when in fact there is none.
Identify which type of error (Type I or Type II) or non-error being made here.
No error, fail to reject true null hypothesis
During a drug screening for professional athletes, the null hypothesis is that there is no drug use. A drug test detects the presence of cannabinoids in the urine, and the athlete admits to using marijuana.
Identify which type of error (Type I or Type II) or non-error being made here.
No error, reject false null hypothesis
As a juror, my null hypothesis is that the defendant is innocent until proven guilty. I conclude that the individual on trial is guilty when in fact she is not.
Identify which type of error (Type I or Type II) or non-error being made here.
Type I, false positive
When is it appropriate to use a one-tailed test vs. a two-tailed test?
Apply a one-tailed test when your hypothesis specifies a single direction (e.g. an increase only); apply a two-tailed test when a difference in either direction matters. A two-tailed test is appropriate when the null states a parameter equals a value and the alternative states it does not equal that value.
In relation to statistical hypothesis testing, define alpha (α).
Alpha is the level of significance: the probability of a Type I error (a false positive, i.e. rejecting a true null hypothesis). It equals 1 minus the confidence level.
In relation to statistical hypothesis testing, define beta (β).
Beta is the probability of a Type II error (a false negative, i.e. failing to reject a false null hypothesis). 1 − β is the power of the test.
In relation to statistical hypothesis testing, what is a Bonferroni correction? How is it related to alpha and beta?
A Bonferroni correction is used when multiple statistical tests are run simultaneously: the per-test alpha is divided by the number of tests. The correction decreases alpha and increases beta.
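As a minimal sketch (the p-values and cutoff below are made up, not from any card), the correction simply divides alpha by the number of tests:

```python
# Hypothetical p-values from m = 3 simultaneous tests (made-up numbers)
alpha = 0.05
p_values = [0.004, 0.030, 0.020]
m = len(p_values)

alpha_adj = alpha / m                            # Bonferroni-corrected threshold
significant = [p < alpha_adj for p in p_values]  # only the first survives (~0.0167)
```

Note the trade-off: a p-value of 0.030 would be significant at the uncorrected alpha but not at the corrected one, which is exactly how beta (missed effects) goes up as alpha comes down.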
Define probability
Probability is a mathematical tool used to study randomness. It deals with the chance (the likelihood) of an event occurring.
Define descriptive studies
Descriptive statistics are used to describe or summarize the characteristics of a sample or data set, such as a variable’s mean, standard deviation, or frequency. Also known as “what is” or correlations.
Define experimental studies
Experimental studies are ones where researchers introduce an intervention and study the effects. Experimental studies are usually randomized, meaning the subjects are grouped by chance. Also known as “what if” or causation. Does the treatment affect the observation?
Define correlation vs causation
A correlation is a statistical indicator of the relationship between variables. Causation means that changes in one variable brings about changes in the other; there is a cause-and-effect relationship between variables.
Define experimental validity
Experimental validity refers to the degree to which a study's design controls the variables that influence both the results of the research and their generalizability to the population at large.
Define internal validity
Internal validity is defined as the extent to which the observed results represent the truth in the population we are studying and, thus, are not due to methodological errors. Is the effect actually due to the manipulation?
Define external validity
External validity is the extent to which you can generalize the findings of a study to other situations, people, settings and measures. Is the effect unique to your experiment or generalizable?
List the threats to internal validity
- History (unrelated event occurring between 2 measures ex: power outage in the animal’s enclosure, stresses them out)
- Maturation (processes within subjects which act as a function of the passage of time ex: First measurement occurs just after birth, second taken months later)
- Testing (effects measuring study outcomes in participants ex: Post-test scores improves due to exposure to pre-test, not treatment)
- Instrumentation (changes in the instrument, observers, or scorers which may produce changes in outcomes ex: PI measures respiration rate immediately after putting a mouse in the chamber, research assistant waits 5 minutes then records respiration rate)
- Statistical regression (regression to the mean ex: Measuring pain levels before and after treatment)
- Selection of subjects (the biases which may result in selection of comparison groups ex: Control group consists of older males, treatment is younger females)
- Experimental mortality (the loss of subjects ex: Human subjects leave study, mice die)
Define regression to the mean
RTM refers to the simple fact that if one sample of a random variable is extreme, the next sampling of the same random variable is likely to be closer to its mean. RTM is thus a useful concept to consider when designing any scientific experiment, data analysis, or test, which intentionally selects the “most extreme” events - it indicates that follow-up checks may be useful in order to avoid jumping to false conclusions about these events; they may be “genuine” extreme events, a completely meaningless selection due to statistical noise, or a mix of the two cases.
How do you achieve experimental validity?
- Formulate a specific question in advance.
- Have a control group (internal).
- Randomized (block) design (internal).
- Replication (external).
One-Shot Case Study Design
A single group studied only once, often pre-experimental. Lacks a control group and has virtually no internal validity.
One Group Pre-Posttest Design
Pretest, treatment, posttest. No selection or mortality issues but does not control for history, maturation, testing, instrumentation or RTM.
Static Group Comparison Design
One control group, one treatment group with uneven selection. Not randomized.
Pre-Test Post-Test Control Group Design
Randomized control and treatment groups with pre- and posttests. Does not control for interaction of testing.
Four Group Design
Randomized control and treatment groups, one pair given a pretest and one pair not; all four groups receive a posttest.
Post-Test Control Group Design
Randomly assign subjects to control and treatment groups. Controls for internal validity issues and effect of testing.
Time-Series Design
Many observations over time.
Factorial Design
Studying the effects of two or more factors and their interactions simultaneously.
Nested Designs
Two factors (A and B), B is nested within A.
1 Standard Deviation
Fits 68.3% of the sample distribution
2 Standard Deviations
Fits 95.4% of the sample distribution
Central Limit Theorem
If we repeatedly take independent random samples of size n from any population, then when n is large (>30) the distribution of the sample means will approach a normal distribution even if the population itself is not normally distributed. This allows us to make probability statements about the possible range of values around the sample mean.
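A quick simulation sketch of the theorem (the population, seed, and sample counts are arbitrary): sample means drawn from a skewed population still cluster tightly around the population mean.

```python
import random
import statistics

random.seed(0)
# A deliberately non-normal (right-skewed, exponential) population
population = [random.expovariate(1.0) for _ in range(10_000)]

# Repeatedly draw independent samples of size n = 40 (> 30), keep each mean
sample_means = [statistics.mean(random.sample(population, 40))
                for _ in range(1_000)]

# The sample means spread far less than the raw data and center on the
# population mean, approaching a normal distribution as the CLT predicts
```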
Define Nominal Variable
Categorical variable, its value cannot be ranked.
ex: sex (male, female)
Define Ordinal Variable
Qualitative variable, its values can be ranked.
ex: aggression (weak, moderate, strong)
Define Interval Variable
The values of the variable can be ranked, and the differences of the values show the distances between the values. This scale does not have a true zero point.
ex: temperature
Define Ratio Variable
The differences of the values show the distances between the values and also the ratio of values is defined, as the variable has a true zero point.
ex: height
Define Binomial Test
A binomial test uses sample data to determine if the population proportion of one level in a binary (or dichotomous) variable equals a specific claimed value.
- Observed vs expected
- Has 2 possible outcomes
- Requires expected probability, number of ‘successful’ trials, total trials
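A pure-Python sketch of an exact two-sided binomial test (the counts — 9 "successes" in 12 trials against an expected probability of 0.5 — are hypothetical):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_test_two_sided(k, n, p=0.5):
    """Sum the probabilities of every outcome no more likely than k."""
    p_obs = binom_pmf(k, n, p)
    return sum(binom_pmf(i, n, p) for i in range(n + 1)
               if binom_pmf(i, n, p) <= p_obs + 1e-12)

# Observed vs. expected: 9 successes in 12 trials, expected probability 0.5
pval = binomial_test_two_sided(9, 12, 0.5)   # ~0.146, fail to reject at 0.05
```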
Define Fisher’s Exact Test
Statistical test used to determine if there are nonrandom associations between two categorical variables. Used in the analysis of contingency tables.
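A sketch of Fisher's exact test on a 2x2 contingency table, applied to the fur-color example from the mouse card earlier in this deck (males: 8 white, 2 brown; females: 3 white, 7 brown). The implementation sums hypergeometric probabilities over all tables no more likely than the observed one:

```python
from math import comb

def hypergeom_p(a, row1, row2, col1):
    """Probability of a 2x2 table with fixed margins; a is the top-left cell."""
    return comb(row1, a) * comb(row2, col1 - a) / comb(row1 + row2, col1)

def fisher_exact_two_sided(a, b, c, d):
    row1, row2, col1 = a + b, c + d, a + c
    p_obs = hypergeom_p(a, row1, row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(hypergeom_p(x, row1, row2, col1) for x in range(lo, hi + 1)
               if hypergeom_p(x, row1, row2, col1) <= p_obs + 1e-12)

# Males: 8 white, 2 brown; females: 3 white, 7 brown
pval = fisher_exact_two_sided(8, 2, 3, 7)    # ~0.070, fail to reject at 0.05
```

With these counts the association between sex and fur color is suggestive but does not reach the conventional 0.05 threshold.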
Measures of central tendency
Mean, median and mode
Measures of variation
Interquartile range, range (diff between min & max), variance, standard deviation, standard error of the mean, confidence intervals
Define Inferential Statistics
Statistical method that deduces from a small but representative sample the characteristics of a bigger population.
Define Confidence Interval
A range of values so defined that there is a specified probability that the value of a parameter lies within it. Measured as a percentage and expressed as a range of values.
Define Bootstrapping
A resampling technique used to estimate statistics on a population by sampling a dataset with replacement. It can be used to estimate summary statistics such as the mean or standard deviation.
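A small sketch of a percentile bootstrap for the mean (the data values and seed are made up):

```python
import random
import statistics

random.seed(1)
data = [4.1, 5.0, 3.8, 6.2, 5.5, 4.9, 5.1, 4.4]   # made-up sample

# Resample the data with replacement; each resample yields one bootstrap mean
boot_means = sorted(statistics.mean(random.choices(data, k=len(data)))
                    for _ in range(2_000))

# Percentile 95% confidence interval for the mean (2.5th / 97.5th percentiles)
ci = (boot_means[49], boot_means[1949])
```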
Define standard deviation
Variation around the sample mean.
Define standard error of the mean
How accurately your data estimates the population mean.
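A small sketch (made-up sample) showing that the SEM is just the sample standard deviation divided by the square root of the sample size:

```python
import math
import statistics

sample = [2.0, 3.0, 5.0, 6.0]                 # made-up data, n = 4
sd = statistics.stdev(sample)                  # sample SD (n - 1 denominator)
sem = sd / math.sqrt(len(sample))              # standard error of the mean
# Unlike the SD, the SEM shrinks as n grows: a larger sample estimates the
# population mean more accurately
```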
95% Confidence Interval
Sampling error around population mean. Decreases with sample size. Provides information about the magnitude of an effect.
Type I Error
Alpha: a false positive, i.e. rejecting a true null hypothesis. Increases with multiple comparisons.
Type II Error
Beta, a false negative, failing to reject a false null hypothesis.
P-value
Measures the strength of evidence against the null hypothesis. Does not measure size of an effect.
Define T-Test
Statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another.
Define One Sample T-Test
The null hypothesis compares sample mean to theoretical mean. Assumes normal distribution, random sample, independent observations. Calculates p-value using t-statistic.
Cons: Single group studied only once, often pre-experimental, lack of true control, no internal validity
Define T-Value
Calculated as the difference between the sample mean and the theoretical population mean, divided by the standard error of the mean.
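That definition can be sketched directly for a one-sample test (the measurements and theoretical mean are hypothetical):

```python
import math
import statistics

sample = [5.1, 4.8, 5.5, 5.0, 4.9, 5.3]   # made-up measurements
mu0 = 4.5                                   # theoretical population mean

sem = statistics.stdev(sample) / math.sqrt(len(sample))
t = (statistics.mean(sample) - mu0) / sem   # compare to t with df = n - 1
```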
Define Degrees of Freedom
Maximum number of independent values. Sample size minus 1.
Define Independent T-Test
Null hypothesis compares sample means. Assumes normal distribution, random sample, independent observations, and equal variance among populations. Calculates p-value with t-statistic.
Define Dependent (Paired) T-Test
The null hypothesis compares paired sample means. Each datum from one group is a priori matched in some way with one datum from the other group. Assumes the pairwise differences are normally distributed, random sampling, and dependent (paired) observations. Calculates p-value with t-statistic.
Define Power
The probability of rejecting a false null hypothesis (1-beta)
Define Effect Size
Represents the strength of the relationship between population variables. It indicates the practical significance of a research outcome.
Define Cohen’s d
The effect size based on difference between means relative to their deviations.
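A sketch of Cohen's d for two made-up groups, using the pooled standard deviation in the denominator:

```python
import math
import statistics

group_a = [10.2, 11.1, 9.8, 10.5, 10.9]    # made-up control values
group_b = [12.0, 12.8, 11.5, 12.3, 12.9]   # made-up treatment values

def cohens_d(x, y):
    """Difference between means divided by the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x) +
                  (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(y) - statistics.mean(x)) / math.sqrt(pooled_var)

d = cohens_d(group_a, group_b)   # d > 0.8 is conventionally a "large" effect
```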
What is sample size determined by?
- Pooled standard deviation
- Z-score of a normal distribution at 1-alpha and 1-beta
- Cohen’s d
The 2 types of power analysis
- A priori
- Sensitivity
Define A Priori Power
Calculation to determine sample size needed to achieve given power.
Define Sensitivity Power
Calculation to determine the effect size given the sample size. Can be paired or unpaired.
Define Fisher’s Analysis of Variance
A statistical method that separates observed variance data into different components to use for additional tests. A one-way ANOVA is used for three or more groups of data, to gain information about the relationship between the dependent and independent variables.
Types of variables for ANOVA
The dependent variable is continuous and normally distributed; the independent variable(s) are discrete (categorical).
ANOVA Assumptions
- Normal distribution
- Equal variance among groups
- Independent observations
- Random sampling
ANOVA Hypotheses
H0: μ1 = μ2 = μ3
Ha: at least two of the means significantly differ
How does ANOVA compute the p value?
By use of the F ratio
What is the F ratio in ANOVA?
The F statistic is a ratio of two different measures of variance for the data. If the null hypothesis is true (i.e., the population means of the groups are identical) then these are both estimates of the overall population variance and the ratio will be around 1.
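A hand-rolled sketch of the F ratio for three made-up groups of five values each (this mirrors the between-group/within-group decomposition, not any specific software's output):

```python
import statistics

groups = [
    [4.2, 4.8, 4.5, 4.9, 4.6],   # made-up data: 3 groups, n = 5 each
    [5.1, 5.6, 5.3, 5.8, 5.2],
    [4.4, 4.7, 4.3, 4.9, 4.7],
]
k = len(groups)
n = sum(len(g) for g in groups)
grand_mean = statistics.mean(x for g in groups for x in g)

ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

# df-between = k - 1, df-within = n - k; F >> 1 suggests real group differences
f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
```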
Repeated Measures ANOVA Assumptions
- Normal distribution
- Sphericity: equal variance among differences between treatment levels
- Dependent observations
- Random sampling
Why do we care about multiple comparisons?
Multiple comparisons increase the chances of a type I error
Multiple comparisons among treatment means…
Designed to take the multiple comparison problem into account and provide significance values re-defined appropriately based on the specifics of your study.
Students T-test multiple comparisons
- a priori or pairwise, alpha inflation
- Avoid unless you’re only doing one, a priori determined comparison
Fisher’s LSD multiple comparisons
- Pairwise post hoc, alpha inflation, not available in R cmdr
- Only conduct if the ANOVA p-value is significant
Tukey HSD multiple comparisons
- Pairwise post hoc, beta inflation
- Available in R commander; most common
Dunnett method multiple comparisons
- a priori, control vs. treatments only, not available in R cmdr
Define Bartlett’s Test
Bartlett's test of homogeneity of variances identifies whether a continuous or interval-level dependent variable has equal variances across two or more groups of a categorical independent variable. It tests the null hypothesis of no difference in variances between the groups.
Define Levene’s Test
Used to test if samples have equal variances. Equal variances across samples is called homogeneity of variance. Some statistical tests, for example ANOVA, assume that variances are equal across groups or samples. The Levene test can be used to verify that assumption.
Define Welch’s Correction
Welch's correction adjusts the t-test or ANOVA for unequal variance among groups, keeping the significance level (the probability of rejecting a true null hypothesis, a Type I error) close to its nominal value.
Why don’t we always use Welch’s correction?
- Conservative results for large sample sizes
- Inflated results for small sample sizes
Define Mauchly’s Test for Sphericity
Tests whether or not the assumption of sphericity is met in a repeated measures ANOVA. Sphericity refers to the condition where the variances of the differences between all combinations of related groups are equal.
ε: severity of departure from sphericity
Define Greenhouse-Geisser
Used to assess the change in a continuous outcome with three or more observations across time or within-subjects. In most cases, the assumption of sphericity is violated for this type of within-subjects analysis and the Greenhouse-Geisser correction is robust to the violation.
Define Huynh-Feldt
A correction for violations of sphericity.
Do I use Greenhouse-Geisser or Huynh Feldt?
Generally, the recommendation is to use the Greenhouse-Geisser correction, especially if estimated epsilon (ε) is less than 0.75. However, some statisticians recommend using the Huynh-Feldt correction if estimated epsilon (ε) is greater than 0.75.
Define Sphericity
Equal variance among differences between treatment levels
In a data set with three different groups of 5 patients each and each group receives a different treatment, why are there two degrees of freedom for this test, what do they refer to?
There are two degrees of freedom for the ANOVA because the test statistic, the F ratio, involves separate calculations in the numerator and denominator of the ratio, each with its own degrees of freedom. For the independent ANOVA, the degrees of freedom are df-between groups and df-within groups. If we had conducted a rmANOVA, the degrees of freedom would refer to the df-between groups and the df-error term.
Explain briefly why one-way ANOVAs and rmANOVAs can produce different results despite having the same dataset.
These tests produce different results because the F-ratio is calculated differently. Remember that our estimate of variance within groups lies in the denominator of the F-ratio. With a repeated measures ANOVA we use the sum of squares error term, which is obtained by removing the variance within subjects (sum of squares subjects) from the variance within groups (sum of squares within).
I am interested in investigating the effects of regular administration of metformin on levels of glycosylated hemoglobin (HbA1C) in obese marmoset monkeys. I measure HbA1C in 10 obese marmosets before exposure to metformin and again after 3 months of regular metformin administration.
What is the null hypothesis?
The null hypothesis is that there will be no effects of regular administration of metformin on levels of glycosylated hemoglobin in obese marmoset monkeys.
I am interested in investigating the effects of regular administration of metformin on levels of glycosylated hemoglobin (HbA1C) in obese marmoset monkeys. I measure HbA1C in 10 obese marmosets before exposure to metformin and again after 3 months of regular metformin administration.
What threats to internal validity might we be concerned with?
Assuming there is no control group based on the information stated, this is a one group pre-posttest design: a pretest in obese marmoset monkeys, then the administration of metformin, and finally a posttest 3 months after metformin administration. This means there are no selection or mortality issues; however, this design does not control for history, maturation, testing, instrumentation, or regression to the mean, which are threats to internal validity.
I am interested in investigating the effects of regular administration of metformin on levels of glycosylated hemoglobin (HbA1C) in obese marmoset monkeys. I measure HbA1C in 10 obese marmosets before exposure to metformin and again after 3 months of regular metformin administration.
What statistical test is appropriate for the experiment outlined and why?
This requires a paired samples t-test because one group of individuals is evaluated twice, so the HbA1C comparison is between two measurements within the same subject.
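A sketch of that paired comparison with hypothetical HbA1C values (%) for the same 10 animals; the test reduces to a one-sample t-test on the within-subject differences:

```python
import math
import statistics

# Hypothetical HbA1C (%) for the same 10 marmosets, before and after metformin
before = [7.9, 8.4, 8.1, 7.6, 8.8, 8.2, 7.7, 8.5, 8.0, 8.3]
after  = [7.2, 7.9, 7.8, 7.1, 8.1, 7.6, 7.4, 7.8, 7.5, 7.7]

diffs = [a - b for a, b in zip(after, before)]
sem_d = statistics.stdev(diffs) / math.sqrt(len(diffs))
t = statistics.mean(diffs) / sem_d   # compare to t with df = n - 1 = 9
# A large negative t here would indicate HbA1C fell after treatment
```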