Week 9: Descriptive and Comparative Statistics Flashcards

Question

Numeric Variables: Measures of Variability (Spread, Dispersion)

Answer 1

- The range for a variable is the difference between the minimum (lowest) and the maximum (highest) values in the data set
- Quartiles mark the three values that divide a data set into four equal parts
- The interquartile range (IQR) captures the middle 50% of values for a numeric variable
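A minimal sketch of these three measures in Python, assuming a small made-up set of systolic blood pressure readings (the values and variable names are illustrative, not from the course material). Quartile conventions differ between textbooks and software; the "inclusive" method used here is only one of them.

```python
import statistics

# Hypothetical systolic blood pressure readings (mmHg), for illustration only.
readings = [110, 112, 115, 118, 120, 122, 125, 128, 130]

# Range: difference between the maximum and the minimum value.
value_range = max(readings) - min(readings)

# Quartiles: the three cut points that split the sorted data into four parts.
# method="inclusive" treats the minimum as the 0th percentile and the
# maximum as the 100th percentile (one common convention among several).
q1, q2, q3 = statistics.quantiles(readings, n=4, method="inclusive")

# IQR: width of the interval that holds the middle 50% of the values.
iqr = q3 - q1

print(f"range = {value_range}")
print(f"quartiles = {q1}, {q2}, {q3}")
print(f"IQR = {iqr}")
```

The range depends only on the two most extreme readings, while the IQR ignores the lowest and highest quarters of the data.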

Answer 2

A simple visual depiction of the data and an intuitive way to explore it

Answer 3

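The standard error of the mean (SEM) ADJUSTS FOR THE NUMBER OF OBSERVATIONS: it is the standard deviation divided by the square root of the sample size (SEM = SD / √n)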
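To see what adjusting for the number of observations means in practice, here is a small sketch using the usual formula, standard deviation divided by the square root of n. The function names and numbers are hypothetical, chosen only for illustration.

```python
import math
import statistics


def standard_error(values):
    """SEM from raw data: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


def sem_from_sd(sd, n):
    """Same formula when only the SD and the sample size are known."""
    return sd / math.sqrt(n)


# Same spread (SD = 10), different sample sizes (made-up numbers).
sem_small = sem_from_sd(10, 25)    # n = 25
sem_large = sem_from_sd(10, 100)   # n = 100: four times the data, half the SEM

print(sem_small, sem_large)
print(standard_error([2, 4, 4, 4, 5, 5, 7, 9]))
```

With the spread held fixed, quadrupling the sample size halves the standard error.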

Answer 4

- Provide information about the expected value of a measure in a source population based on the value of that measure in a study population
- A larger sample size will yield a narrower confidence interval
- A 95% confidence interval is usually reported for statistical estimates, which means that 5% of the time the confidence interval is expected to miss the true value of the measure in the source population
- Example: the mean systolic blood pressure of a sample is 120 mmHg; 95% CI: 110-130 mmHg
- We are 95% confident that the true mean is between 110 and 130 mmHg; intervals built this way are expected to miss the true mean 5% of the time, in which case it is either larger than 130 or smaller than 110
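The link between sample size and interval width can be sketched as follows. This assumes the normal approximation (mean ± z × SD / √n); small samples are often handled with a t distribution instead. The function name, mean, SD, and sample sizes are hypothetical.

```python
import math
from statistics import NormalDist


def confidence_interval(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin


# Hypothetical study: mean 120 mmHg, SD 20 mmHg, two possible sample sizes.
low_16, high_16 = confidence_interval(120, 20, 16)
low_64, high_64 = confidence_interval(120, 20, 64)

print(f"n = 16: {low_16:.1f} to {high_16:.1f}")
print(f"n = 64: {low_64:.1f} to {high_64:.1f}")
```

With n = 16 the interval is close to the 110-130 example above; with n = 64 it is about half as wide.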

Answer 5

COMPARING main factors between exposed and unexposed in cohort studies
- Average age of exposed = average age of unexposed
- % male in exposed = % male in unexposed
- Testing if randomization was effective in experimental studies
- Comparing the outcome status
- We can NOT just look at the calculated values (these are estimates from samples, subject to random sampling error)

Answer 6

- Techniques that use statistics from a random sample of a population to make evidence-based assumptions (inference) about the values of parameters in the population as a whole
- Decisions about parameters, based on information obtained from a sample, are made via hypothesis testing

Answer 7

- Techniques that use STATISTICS from a random sample of a population to make evidence-based assumptions (INFERENCE) about the values of PARAMETERS in the population as a whole
- Decisions about parameters, based on information obtained from a sample, are made via hypothesis testing

Answer 8

Aim: To test an explicit statement or a ‘hypothesis’ about a population parameter
The null hypothesis (H0): there is no difference between the two or more values being compared
The alternative hypothesis (Ha): there is a difference between the two or more populations being compared

Answer 9

1. Take a random sample from the population of interest
2. Set up two competing hypotheses (based on research questions): null and alternative
3. Use sample statistics (mean, frequency) to decide whether to support or reject the null, by calculating a TEST STATISTIC
4. Determine how likely the observed sample statistics would be if the null hypothesis were really true

Answer 10

1. Take a random sample from the population of interest

Answer 11

2. Set up two competing hypotheses (based on research questions)
Null hypothesis (H0): no effect, no difference between the sample and the original population
Alternative hypothesis (H1 or Ha): there is an effect (a difference)

Answer 12

3. Use sample statistics (mean, frequency) to decide whether to support or reject the null, by calculating a test statistic
Note: Tests (each with a specific formula) are developed for different types of data and research questions (Figures 30-12 to 30-15 of the textbook)
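As one illustration of a test statistic, the sketch below computes Welch's t statistic for the difference between two group means. This is only one of many possible tests and is not necessarily the one in the textbook figures; the function name and the ages are made up.

```python
import math
import statistics


def welch_t_statistic(group_a, group_b):
    """Difference in means divided by its estimated standard error."""
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    standard_error = math.sqrt(
        statistics.variance(group_a) / len(group_a)
        + statistics.variance(group_b) / len(group_b)
    )
    return mean_diff / standard_error


# Hypothetical ages (years) in an exposed and an unexposed group.
exposed_ages = [41, 42, 43, 44, 45]
unexposed_ages = [42, 43, 44, 45, 46]

t_stat = welch_t_statistic(exposed_ages, unexposed_ages)
print(f"t = {t_stat:.2f}")
```

A statistic near zero means the observed difference is small relative to the sampling error; the further it is from zero, the less compatible the data are with the null.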

Answer 13

4. Determine how likely the observed sample statistics would be if the null hypothesis were really true
How? The idea of the probability value (p-value)
- Introduced by Fisher to judge whether the observed sample supports the null
- Between 0.1 and 0.9: no reason to suspect the null is false
- <0.02: sufficiently strong evidence to conclude that the null does not reflect the state of nature and is unlikely to be true
- “The value for which P=0.05, or 1 in 20; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not."
- 0.05 is the convention commonly used in health research
The p-value measures how strongly the sample data agree with the null
- It is calculated from the observed data using a pertinent test statistic
- It is the probability that a sample will produce a value of the test statistic as extreme as or more extreme than the observed test statistic, in a universe in which we know that the null is true
- If p = 0.01: if the null is true in the real world (no difference), there is only a 1% chance that the data would show a difference at least as large as the one observed
- That is a small chance, so we can safely reject the null
- The significance level (α) is the p-value threshold below which the null hypothesis is rejected, usually 0.05 in health research
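The definition of the p-value can be sketched for the simple case where the test statistic follows a standard normal distribution when the null is true (a z test). That distributional assumption, the function names, and the example z values are illustrative and not taken from the course material.

```python
from statistics import NormalDist

ALPHA = 0.05  # conventional significance level in health research


def two_sided_p_value(z):
    """Probability of a statistic at least as extreme as z if the null is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def reject_null(p_value, alpha=ALPHA):
    """Reject the null when the p-value falls below the significance level."""
    return p_value < alpha


# Two hypothetical test statistics: one unremarkable, one far from zero.
for z in (1.0, 2.576):
    p = two_sided_p_value(z)
    print(f"z = {z}: p = {p:.4f}, reject null: {reject_null(p)}")
```

A statistic of 1.0 gives a p-value of about 0.32, so there is no reason to doubt the null. A statistic of 2.576 gives about 0.01, which is below 0.05, so the null is rejected.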