Week 9 Descriptive and Comparative Statistics Flashcards

Question

Shape of the Histogram Normal Distribution

Answer 1

A positively skewed distribution looks like a P lying on its back: the peak is on the left and the long tail extends to the right.
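This is a minimal sketch of a numeric check for the "P on its back" shape. In a positively skewed sample, the long right tail pulls the mean above the median, and the moment coefficient of skewness is positive. The data and the helper name `moment_skewness` are hypothetical. The formula used is the simple population-moment version g1 = m3 / m2^1.5, which is only one of several skewness definitions.

```python
import statistics

def moment_skewness(data):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data: most values are small, with one long tail to the right
data = [1, 1, 2, 2, 2, 3, 3, 4, 10]
print(round(statistics.fmean(data), 2), statistics.median(data))  # 3.11 2
print(moment_skewness(data) > 0)  # True: positive skew
```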

Answer 2

-The range for a variable is the difference between the maximum (highest) and the minimum (lowest) values in the data set
-Quartiles are the three values that divide a data set into four equal parts
-The interquartile range (IQR = Q3 - Q1) captures the middle 50% of values for a numeric variable
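This is a minimal sketch of how the range, quartiles, and IQR can be computed with Python's `statistics` module. It relies on the default "exclusive" method of `statistics.quantiles`. Textbooks and other software use several slightly different quartile rules, so their values may differ a little. The ages and the function name `spread_summary` are hypothetical.

```python
import statistics

def spread_summary(data):
    """Range, quartiles (Q1, Q2, Q3) and interquartile range for a numeric variable."""
    q1, q2, q3 = statistics.quantiles(data, n=4)  # three cut points -> four equal parts
    return {
        "range": max(data) - min(data),
        "quartiles": (q1, q2, q3),
        "iqr": q3 - q1,  # spread of the middle 50% of values
    }

ages = [23, 25, 31, 35, 38, 42, 47, 52, 60, 71]  # hypothetical ages
print(spread_summary(ages))
# {'range': 48, 'quartiles': (29.5, 40.0, 54.0), 'iqr': 24.5}
```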

Answer 3

A simple visual depiction of the data and an intuitive way to explore it
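The question for this card is not shown. Assuming it refers to a histogram, this minimal text-only sketch shows how one is built: values are grouped into equal-width bins, and each bin's count is drawn as a bar. The function name `text_histogram`, the bin width, and the data are all hypothetical.

```python
from collections import Counter

def text_histogram(data, bin_width):
    """Print one bar of '*' per bin and return {bin lower edge: count}.

    Each bin covers [lower, lower + bin_width); bins with no values are skipped
    in this sketch.
    """
    counts = Counter((x // bin_width) * bin_width for x in data)
    for lower in sorted(counts):
        print(f"{lower:>4}-{lower + bin_width:<4} {'*' * counts[lower]}")
    return dict(counts)

values = [12, 15, 18, 21, 22, 25, 27, 31, 44]  # hypothetical measurements
text_histogram(values, 10)
```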

Answer 4

-The extent of deviation from the average value of that variable in the data set
-Calculated by adding together the squares of the differences between each observation and the mean. For a population, divide this sum by the total number of observations (N) using the population mean (µ). For a sample, divide by n - 1 using the sample mean (x̄)
-The standard deviation (σ for a population, s for a sample) is the square root of the variance
-The standard error of the mean adjusts for the number of observations in the data set. It divides the variance by the number of observations and then takes the square root: SE = √(s²/n) = s/√n
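This is a minimal sketch of the calculations described above, using hypothetical systolic blood pressure readings. It returns the population variance (divide by N), the sample variance (divide by n - 1), the sample standard deviation, and the standard error of the mean. The function name `dispersion` is hypothetical. The results can be cross-checked against Python's `statistics.pvariance` and `statistics.variance`.

```python
import math
import statistics

def dispersion(data):
    """Return (population variance, sample variance, sample SD, standard error of the mean)."""
    n = len(data)
    mean = statistics.fmean(data)
    ss = sum((x - mean) ** 2 for x in data)  # sum of squared deviations from the mean
    pop_var = ss / n                # divide by N: data treated as the whole population
    sample_var = ss / (n - 1)       # divide by n - 1: data treated as a sample
    sd = math.sqrt(sample_var)      # standard deviation = square root of the variance
    se = math.sqrt(sample_var / n)  # standard error = sqrt(variance / n) = sd / sqrt(n)
    return pop_var, sample_var, sd, se

sbp = [110, 118, 120, 122, 130]  # hypothetical readings (mmHg)
pop_var, sample_var, sd, se = dispersion(sbp)
print(pop_var, sample_var, round(sd, 2), round(se, 2))  # 41.6 52.0 7.21 3.22
```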

Answer 5

-About 68% of the area (population) lies within μ ± 1σ
-About 95% of the area lies within μ ± 2σ (exactly 95% lies within μ ± 1.96σ)
-About 99.7% of the area lies within μ ± 3σ
If μ = 20 and σ = 5, then about 68% of subjects are measured between 15 (20 - 5) and 25 (20 + 5). The probability of observing a value:
-between 15 and 25 ≈ 0.68
-between 10 and 30 ≈ 0.95
-between 5 and 35 ≈ 0.997
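The card's own example (μ = 20, σ = 5) can be checked with Python's `statistics.NormalDist` (Python 3.8+). This sketch computes the area under the normal curve within 1, 2, and 3 standard deviations of the mean.

```python
from statistics import NormalDist

dist = NormalDist(mu=20, sigma=5)  # the card's example: mean 20, SD 5

for k in (1, 2, 3):
    lower = dist.mean - k * dist.stdev
    upper = dist.mean + k * dist.stdev
    prob = dist.cdf(upper) - dist.cdf(lower)  # area between lower and upper
    print(f"P({lower:g} < X < {upper:g}) = {prob:.4f}")
# P(15 < X < 25) = 0.6827
# P(10 < X < 30) = 0.9545
# P(5 < X < 35) = 0.9973
```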

Answer 6

-Provide information about the expected value of a measure in a source population based on the value of that measure in a study population
--A larger sample size yields a narrower confidence interval
-A 95% confidence interval is usually reported for statistical estimates. This means that if the study were repeated many times, about 5% of the intervals constructed this way would be expected to miss the true value of the measure in the source population
--Example: the mean systolic blood pressure of a sample is 120 mmHg; 95% CI: 110-130 mmHg
-We are 95% confident that the true average lies between 110 and 130 mmHg. Strictly, the 95% describes the procedure: about 95% of intervals built this way contain the true mean, so any single interval may or may not contain it
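This is a minimal sketch of a 95% confidence interval for a mean, using the normal approximation mean ± z × SD/√n, with z ≈ 1.96 taken from `statistics.NormalDist`. For small samples, a t-based multiplier is more appropriate. The standard deviation (25 mmHg) and sample sizes are hypothetical. They were chosen so that n = 25 gives roughly the card's 110-130 interval, and n = 100 shows the interval narrowing as the sample grows.

```python
import math
from statistics import NormalDist

def mean_ci(mean, sd, n, level=0.95):
    """Normal-approximation confidence interval for a mean (hypothetical helper)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * sd / math.sqrt(n)         # z times the standard error
    return mean - half_width, mean + half_width

print([round(v, 1) for v in mean_ci(120, 25, 25)])   # [110.2, 129.8]
print([round(v, 1) for v in mean_ci(120, 25, 100)])  # [115.1, 124.9]: larger n, narrower CI
```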

Answer 7

-Comparing main factors between the exposed and unexposed groups in cohort studies (e.g., average age of exposed = average age of unexposed; % male in exposed = % male in unexposed)
-Testing whether randomization was effective in experimental studies
-Comparing the outcome status
We can NOT just look at the calculated values: these are estimates from samples and are subject to random sampling error
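This minimal sketch shows why the calculated values alone are not enough. Two groups are drawn from the *same* hypothetical population (mean 50, SD 10) using `statistics.NormalDist.samples`. Their sample means still differ, purely because of random sampling error. The population parameters, group sizes, and seeds are all hypothetical.

```python
import statistics
from statistics import NormalDist

# Hypothetical population: both groups come from it, so the true difference is 0
population = NormalDist(mu=50, sigma=10)

exposed = population.samples(100, seed=1)
unexposed = population.samples(100, seed=2)

diff = statistics.fmean(exposed) - statistics.fmean(unexposed)
print(f"Observed difference in means: {diff:.2f}")  # nonzero despite no true difference
```

A formal test is what decides whether a difference like this is larger than sampling error alone would plausibly produce.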

Answer 8

Techniques that use statistics from a random sample of a population to draw evidence-based conclusions (inferences) about the values of parameters in the population as a whole. Decisions about population parameters based on information obtained from a sample are made through hypothesis testing

Answer 9

1. Take a random sample from the population of interest
2. Set up two competing hypotheses (based on the research question):
-Null hypothesis (H0): no effect; no difference between the sample and the original population
-Alternative hypothesis (H1 or Ha): there is an effect (a difference)
3. Use sample statistics (mean, frequency) to decide whether to reject or fail to reject the null, by calculating a test statistic
Note: Tests (with specific formulas) have been developed for different types of data and research questions (Figures 30-12 to 30-15 of the textbook)
4. Determine what sample statistics we would expect to observe if the null hypothesis were really true. How?
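Steps 2 to 4 can be sketched, under simplifying assumptions, as a one-sample z test. The test assumes the population standard deviation σ is known and the sample mean is approximately normally distributed. The function name and all numbers are hypothetical: H0 says the population mean is 120 mmHg, with σ = 15, n = 36, and an observed sample mean of 125. Other data types and research questions call for other tests.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Two-sided one-sample z test (assumes sigma is known); returns (z, p)."""
    se = sigma / math.sqrt(n)         # standard error of the mean under H0
    z = (sample_mean - mu0) / se      # test statistic
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided

z, p = one_sample_z_test(sample_mean=125, mu0=120, sigma=15, n=36)
print(f"z = {z:.2f}, p = {p:.4f}")  # z = 2.00, p = 0.0455
```

Here p is below 0.05, so H0 would be rejected at α = 0.05.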

Answer 10

Introduced by Fisher to determine whether the observed sample supports the null
-Between 0.1 and 0.9: no reason to suspect the null is false
-Below 0.02: sufficiently strong evidence to conclude that the null does not reflect the state of nature (it is unlikely to be true)
-"The value for which P = 0.05, or 1 in 20, is 1.96 or nearly 2; it is convenient to take this point as a limit in judging whether a deviation is to be considered significant or not."
-0.05 is the convention commonly used in health research
The p value measures how compatible the sample data are with the null
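For a two-sided test on a normally distributed statistic, the 0.05 convention corresponds to a standard normal deviate of about 1.96, which leaves 2.5% of the area in each tail. This minimal sketch checks that link in both directions with `statistics.NormalDist`.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, SD 1

z_crit = std_normal.inv_cdf(0.975)          # cut-off leaving 2.5% in each tail
p_at_196 = 2 * (1 - std_normal.cdf(1.96))   # two-sided p value when |z| = 1.96
print(round(z_crit, 2), round(p_at_196, 3))  # 1.96 0.05
```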

Answer 11

Is calculated from the observed data using a pertinent test statistic
-The probability of obtaining a value of the test statistic as extreme as, or more extreme than, the observed value in a universe in which we know the null is true
-Example: p = 0.01 means that if the null were true in the real world (no difference), there would be only a 1% chance of obtaining data showing a difference at least as large as the one observed. Because this chance is small, we reject the null
-The significance level (α) is the cut-off for the p value: the null hypothesis is rejected when p falls below α, usually 0.05 in health research
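The definition above can be illustrated directly by simulation. This minimal sketch draws many samples from a world in which the null is true, then counts how often the sample mean lands at least as far from μ0 as the observed mean did. All numbers are hypothetical: μ0 = 120, σ = 15, n = 36, and an observed mean of 125. The function name and the number of repetitions are illustrative choices, and the estimate is only approximate.

```python
import random

def simulated_p_value(observed_mean, mu0, sigma, n, reps=10_000, seed=42):
    """Share of samples drawn under H0 whose mean is at least as far from mu0
    as the observed mean (two-sided)."""
    rng = random.Random(seed)
    observed_gap = abs(observed_mean - mu0)
    as_extreme = 0
    for _ in range(reps):
        # One sample of size n from the null population N(mu0, sigma)
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        if abs(sum(sample) / n - mu0) >= observed_gap:
            as_extreme += 1
    return as_extreme / reps

p = simulated_p_value(observed_mean=125, mu0=120, sigma=15, n=36)
print(p)  # close to the normal-theory value of about 0.046
```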

Answer 12

Assumes the variables being examined follow particular distributions. Inferential methods are based on these distributions (mostly the normal distribution)
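As one example of a parametric statistic, this minimal sketch computes the Welch two-sample t statistic. It is built from group means and variances and is interpreted with the t distribution under a normality assumption. Only the statistic is computed here. Turning it into a p value requires the t distribution, which Python's standard library does not provide (SciPy's `scipy.stats.ttest_ind` with `equal_var=False` does). The data and the name `welch_t` are hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Welch two-sample t statistic: (mean_a - mean_b) / sqrt(s_a^2/n_a + s_b^2/n_b)."""
    var_a = statistics.variance(a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(b)
    se = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se

group_a = [1, 2, 3, 4, 5]  # hypothetical scores
group_b = [2, 3, 4, 5, 6]
print(welch_t(group_a, group_b))  # -1.0
```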

Answer 13

Does not make assumptions about the distributions of responses. Nonparametric tests are used for ranked variables and when the distribution of a ratio or interval variable is non-normal
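A common nonparametric test, the Mann-Whitney U test, uses only the ordering of the values. This minimal sketch counts, over all pairs, how often a value from group `a` exceeds a value from group `b`, with ties counted as 0.5. This pair count equals the U statistic from the rank-sum formula. Replacing one value with an extreme outlier leaves U unchanged, whereas a mean-based statistic would shift. The data and the name `mann_whitney_u` are hypothetical; SciPy's `scipy.stats.mannwhitneyu` computes the full test, including the p value.

```python
def mann_whitney_u(a, b):
    """U statistic for group a: number of (x in a, y in b) pairs with x > y, ties counted as 0.5."""
    u = 0.0
    for x in a:
        for y in b:
            if x > y:
                u += 1
            elif x == y:
                u += 0.5
    return u

group_a = [1, 2, 3, 4, 5]            # hypothetical scores
group_b = [2, 3, 4, 5, 6]
group_b_outlier = [2, 3, 4, 5, 600]  # largest value replaced by an extreme outlier

print(mann_whitney_u(group_a, group_b))          # 8.0
print(mann_whitney_u(group_a, group_b_outlier))  # 8.0 (the ordering is unchanged)
```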


(38 cards)