Applied Statistics Flashcards

Question

What is the correlation coefficient?

Answer 1

A numerical value describing the strength of the linear association between two continuous variables. It expresses both the magnitude (0 to 1) and the direction of the correlation (positive or negative), so r ranges from -1 to +1.
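A minimal Python sketch of Pearson's r, computed from first principles as the sum of products of paired deviations divided by the square root of the product of the sums of squared deviations. The function name `pearson_r` and the example data are illustrative assumptions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [xi - mean_x for xi in x]
    dy = [yi - mean_y for yi in y]
    # Numerator: sum of products of paired deviations from the means
    num = sum(a * b for a, b in zip(dx, dy))
    # Scale by the spread of each variable so that r lies between -1 and +1
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den

print(pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # moderate positive correlation, about 0.77
print(pearson_r([1, 2, 3], [6, 4, 2]))              # perfect negative correlation, -1.0
```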

Answer 2

The coefficient of determination (r^2) is the square of the correlation coefficient. If r = 0.7, the coefficient of determination is 0.49: 49% of the variation in one variable can be explained by its linear relationship with the other, and 51% is due to other factors.
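To illustrate why r^2 is read as "variation explained", the sketch below (an illustration, not from the source) fits an ordinary least-squares line and checks that r^2 equals 1 - SS_res / SS_tot, the fraction of the total squared variation in y that the line removes. The function name and data are hypothetical.

```python
def r_squared_two_ways(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)  # total sum of squares of y
    # Route 1: square the correlation coefficient
    r2_from_r = sxy ** 2 / (sxx * syy)
    # Route 2: least-squares line y = intercept + slope * x
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r2_from_fit = 1 - ss_res / syy
    return r2_from_r, r2_from_fit

print(r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]))  # both about 0.6
```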

Answer 3

1. The cause must precede the effect.
2. If the cause occurs, the effect should occur.
3. If the cause does not occur, the effect should not occur.
Correlation does not imply causality.

Answer 4

1. Correlation does not imply causation. 2. A lack of (linear) correlation does not mean that the variables are unrelated; they may still be related in a non-linear way.
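A quick check of point 2, using a hypothetical helper `pearson_r` (redefined here so the block runs on its own): y = x^2 over a symmetric range is completely determined by x, yet Pearson's r is 0 because the relationship is not linear.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

x = [-2, -1, 0, 1, 2]
y = [v ** 2 for v in x]  # perfect, but U-shaped, dependence on x
print(pearson_r(x, y))   # 0.0: no linear correlation
```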

Answer 5

Ordinal data is categorical data with a set order, but the intervals between the categories are not known. Continuous data is not categorical: it lies on an increasing or decreasing scale with known intervals between values.

Answer 6

Pearson's (r) correlation coefficient: relates two continuous variables (assumes a linear relationship).
Spearman's (rho) correlation coefficient: relates two ordinal variables, OR one continuous and one ranked variable.
Kendall's (tau) correlation coefficient: relates two ordinal (ordered categorical) variables.
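Spearman's rho is Pearson's r computed on the ranks of the data, with tied values sharing their average rank. A minimal sketch under that definition; `rank` and `spearman_rho` are hypothetical names. With a monotonic but non-linear relationship, rho is 1 while Pearson's r is below 1.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def rank(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks

def spearman_rho(x, y):
    return pearson_r(rank(x), rank(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]                      # monotonic but not linear
print(pearson_r(x, y), spearman_rho(x, y))   # r below 1, rho equal to 1
```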

Answer 7

r (the correlation coefficient)

Answer 8

r^2: this is the coefficient of determination. A coefficient of determination of 0.49 means that 49% of the variation in one variable can be explained by its relationship with the other, and therefore 51% is explained by other factors.

Answer 9

These plots quantify the agreement between 2 readings.

Answer 10

1. Comparing the means (and finding no significant difference). 2. Using the correlation coefficient, which is a measure of association, not agreement. Bland-Altman plots can be used instead to measure agreement between two readings.
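A small illustration of point 2 with hypothetical data: method B always reads exactly double method A, so the two methods are perfectly correlated (r = 1) yet clearly do not agree, which the mean difference exposes.

```python
from math import sqrt

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

method_a = [10, 20, 30, 40, 50]
method_b = [2 * a for a in method_a]  # systematic disagreement

r = pearson_r(method_a, method_b)
mean_difference = sum(a - b for a, b in zip(method_a, method_b)) / len(method_a)
print(r, mean_difference)  # r = 1.0 (perfect association), mean A - B = -30.0 (poor agreement)
```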

Answer 11

Comparing a measurement by a new device/monitor against the gold standard

Answer 12

Y-axis --> difference between the methods (A - B)
X-axis --> average of the methods, (A + B)/2
Interpretation:
1. The mean of the differences (A - B) is the relative bias.
2. The SD of the differences is the estimate of the (random) error; the 95% limits of agreement are bias ± 1.96 SD.
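A minimal sketch of the numbers behind a Bland-Altman plot, assuming paired readings from two methods; the data and the name `bland_altman` are hypothetical. The 95% limits of agreement use the conventional bias ± 1.96 SD of the differences.

```python
from statistics import fmean, stdev

def bland_altman(a, b):
    diffs = [x - y for x, y in zip(a, b)]          # y-axis values
    means = [(x + y) / 2 for x, y in zip(a, b)]    # x-axis values
    bias = fmean(diffs)                            # mean difference = relative bias
    sd = stdev(diffs)                              # sample SD of the differences
    limits = (bias - 1.96 * sd, bias + 1.96 * sd)  # 95% limits of agreement
    return means, diffs, bias, sd, limits

means, diffs, bias, sd, limits = bland_altman([100, 102, 98, 105, 110],
                                              [98, 101, 99, 101, 106])
print(bias, round(sd, 3), [round(v, 2) for v in limits])
```

Each (mean, difference) pair is one point on the plot; the bias and the two limits are the horizontal reference lines.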

Answer 13

CATEGORICAL
- Ordinal (ordered)
- Nominal (unordered)
CONTINUOUS (constant, computable differences between values)
- Interval scale
- Ratio scale

Answer 14

The number of times (N) or proportion (%) of times a variable (data item) has been observed to occur.

Answer 15

Mean: the average. Median: the middle value when the data are ordered. Mode: the most common value.
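Python's standard `statistics` module provides all three measures. In the hypothetical right-skewed data below, the mean (5) is pulled above the median (4).

```python
from statistics import mean, median, mode

data = [2, 3, 3, 5, 7, 10]
print(mean(data))    # 5   (sum 30 / 6 values)
print(median(data))  # 4.0 (average of the two middle values, 3 and 5)
print(mode(data))    # 3   (occurs twice)
```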

Answer 16

Tests of the shape of the distribution (tests of skewness and kurtosis) are used to test whether the data are normally distributed.

Answer 17

By box-and-whisker plots:
1. Range: minimum to maximum value within the fences.
2. Interquartile range (IQR): 25th to 75th percentile.
3. Quartiles: the data divided into four equal groups of 25%.
Fences are calculated as follows:
Lower fence = Q1 - 1.5 x IQR
Upper fence = Q3 + 1.5 x IQR
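A minimal sketch of the fence calculation using `statistics.quantiles`. Software packages use different quartile conventions; `method="inclusive"` is an assumption here, and other methods give slightly different Q1 and Q3. The data are hypothetical.

```python
from statistics import quantiles

def box_plot_fences(data):
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # points beyond the fences are usually drawn individually as outliers
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return q1, median, q3, lower_fence, upper_fence, outliers

print(box_plot_fences([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]))
# (3.5, 6.0, 8.5, -4.0, 16.0, [100])
```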

Answer 18

Mean (mu for the population, x-bar for a sample) and standard deviation (sigma for the population, S for a sample)

Answer 19

X = mean, S = standard deviation
1. Calculate the mean, X.
2. Subtract the mean from each data point: (x - X).
3. Square each result so that all differences are positive: (x - X)^2.
4. Sum all the squared differences: SUM [(x - X)^2].
5. Divide the result by n - 1 --> gives you the (sample) variance.
6. Take the square root of the variance and you get the standard deviation.
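The six steps written out as a short Python sketch (the function name `sample_sd` is hypothetical) and checked against the standard library's `statistics.stdev`, which also divides by n - 1.

```python
from math import sqrt
from statistics import stdev

def sample_sd(data):
    n = len(data)
    mean = sum(data) / n                    # step 1
    deviations = [x - mean for x in data]   # step 2
    squared = [d ** 2 for d in deviations]  # step 3
    total = sum(squared)                    # step 4
    variance = total / (n - 1)              # step 5 (sample variance)
    return sqrt(variance)                   # step 6

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_sd(data), stdev(data))  # both about 2.138 (variance = 32 / 7)
```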

Answer 20

In a normal distribution, the SD describes how values are distributed around the mean:
1 SD: about 68% of people fall within ±1 SD of the mean
1.96 SD: 95% of people fall within ±1.96 SD
2 SD: 95.4% of people fall within ±2 SD
3 SD: 99.7% of people fall within ±3 SD
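These percentages can be checked with `statistics.NormalDist` (Python 3.8+): the proportion within ±k SD of the mean is the standard normal CDF at k minus the CDF at -k. The helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 1.96, 2, 3):
    print(k, round(100 * proportion_within(k), 1))  # 68.3, 95.0, 95.4, 99.7 (%)
```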

Answer 21

If the study were repeated, you would get different patients and hence different results. You can estimate this sampling error by calculating the standard error of the mean: SE = SD / √n

Answer 22

This is the range of values within which the true population mean is likely to lie. 95% CI = X - 1.96 (SE) to X + 1.96 (SE), where SE = SD / √n. Thus the SE becomes smaller with increasing sample size, and as the SE gets smaller, the 95% CI gets narrower, indicating greater precision in the estimate. i.e. the larger the sample size --> the smaller the SE --> the more precise the result --> the narrower the confidence interval.
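A minimal sketch of the calculation with a hypothetical function name and numbers, using the 1.96 multiplier from the card (a normal approximation; for small samples a t-distribution multiplier is usually used instead). Quadrupling n halves the SE and so halves the width of the CI.

```python
from math import sqrt

def ci95(mean, sd, n):
    se = sd / sqrt(n)  # standard error of the mean
    return se, (mean - 1.96 * se, mean + 1.96 * se)

print(ci95(100, 15, 25))   # SE 3.0, CI about (94.12, 105.88)
print(ci95(100, 15, 100))  # SE 1.5, CI about (97.06, 102.94)
```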

Answer 23

The standard deviation (and reference range) describes the amount of variability between individuals within a single sample The standard error of the mean (and confidence interval) measure the precision with which a population value (e.g. mean (mu)) is estimated by a single sample.

Answer 24

It is the number of standard deviations that a value (x) is above or below the mean Z = (observed value - population mean)/(population SD) Z = (x - mu) / (sigma)
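A one-line sketch of the z-score with hypothetical example values, plus `NormalDist` (Python standard library) to convert it into the proportion of the population lying below that value.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    return (x - mu) / sigma

z = z_score(130, mu=100, sigma=15)
print(z)                    # 2.0: two SDs above the mean
print(NormalDist().cdf(z))  # about 0.977 of the population lies below this value
```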

Answer 25

There is no difference between the groups: H0: X = mu. We assume that the groups being compared are drawn from the same population, and hence that the population parameters mu and sigma are known.

Answer 26

There is a difference between the groups: H1: X ≠ mu.

Answer 27

The p-value is the probability of obtaining the observed result (or one more extreme) by chance if H0 is true. In other words, it is the chance of getting the reported study result when the null hypothesis is actually true. The smaller the p-value, the stronger the evidence against the null hypothesis.

Answer 28

This means that there is a 1 in 20 chance of obtaining the study result by chance if the null hypothesis is actually true. At exactly p = 0.05 the result does not meet the usual threshold (p < 0.05), so the null hypothesis is not rejected. A p-value of less than 5% is statistically significant, meaning that there is less than a 5% chance of obtaining the study result by chance if the null hypothesis is actually true.

Answer 29

- A false positive.
- The null hypothesis is incorrectly rejected (there really is no treatment effect, but the study finds one).
- The alpha value determines the risk of this happening. With alpha = 0.05 (the threshold against which the p-value is compared), there is a 5% chance of making a type 1 error.
p-value: the probability of the observed result arising by chance alone (if H0 is actually true).
alpha value: the chance that the null hypothesis is incorrectly rejected (a false positive, a type 1 error).
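A small simulation (an illustration, not from the source) of what alpha = 0.05 means. Both groups are drawn from the same normal population, so H0 is true, and a two-sided z-test with known sigma is applied many times; about 5% of runs still reject H0, i.e. are type 1 errors. The group size, number of simulations and seed are arbitrary assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def simulated_type1_rate(n_per_group=20, sigma=1.0, alpha=0.05, n_sims=20000, seed=1):
    rng = random.Random(seed)                     # fixed seed for reproducibility
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    se_diff = sigma * sqrt(2 / n_per_group)       # SE of the difference in means
    false_positives = 0
    for _ in range(n_sims):
        # H0 is true: both groups come from the same population
        group_a = [rng.gauss(0, sigma) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, sigma) for _ in range(n_per_group)]
        z = (fmean(group_a) - fmean(group_b)) / se_diff
        if abs(z) > z_crit:                       # a "significant" result: a type 1 error
            false_positives += 1
    return false_positives / n_sims

print(simulated_type1_rate())  # close to 0.05
```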

Answer 30

- This is a false negative.
- The null hypothesis is incorrectly accepted (there is a treatment effect, but the study finds none).
- Beta determines the risk of this happening (power = 1 - beta).
- Beta is usually set at 0.2 (power 0.8), so there is a 20% chance of making a type 2 error.

Answer 31

The power of a statistical test is the probability of CORRECTLY REJECTING the null hypothesis when it is false, i.e. the chance of the study detecting a true effect. You can use the power to calculate a sufficient sample size and avoid the risk of performing a pointless negative study. Power = 1 - false negative rate = 1 - beta. Power is normally set at 80% (i.e. a 20% chance of a false negative result).

Answer 32

1. Alpha value: the level of significance (normally 0.05).
2. Beta value: the type 2 error rate (normally 0.2, i.e. power = 1 - beta = 0.8).
3. The statistical test you plan to use.
4. The variance of the population (the greater the variance, the larger the sample size required).
5. The effect size (the smaller the effect size, the larger the sample size required).
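As an illustration of how these inputs combine, the sketch below uses the standard normal-approximation formula for comparing two means: n per group = 2 (z(1 - alpha/2) + z(1 - beta))^2 sigma^2 / delta^2, where delta is the effect size. The formula choice, function name and example values are assumptions; a real study would use the formula for the test it plans to run.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(alpha, beta, sigma, delta):
    """Approximate sample size per group for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(1 - beta)        # about 0.84 for beta = 0.2 (power 80%)
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return ceil(n)                                 # round up to whole patients

print(n_per_group(alpha=0.05, beta=0.2, sigma=10, delta=5))    # 63 per group
print(n_per_group(alpha=0.05, beta=0.2, sigma=20, delta=5))    # larger variance -> more patients
print(n_per_group(alpha=0.05, beta=0.2, sigma=10, delta=2.5))  # smaller effect -> more patients
```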

Answer 33

STATISTICAL SIGNIFICANCE: the likelihood that the results obtained were not due to chance. Data that do not reach statistical significance are too weak to support any conclusion.
CLINICAL SIGNIFICANCE: the practical importance of a treatment effect. Clinical significance implies that the difference in effectiveness between treatments is clinically important, and that clinical practice may change if such a difference is seen.
Statistical significance is used to inform clinical significance.

Answer 34

Only the primary outcome can change practice, and only if the study findings are both statistically and clinically significant.

Answer 35

Secondary outcomes are only hypothesis-generating. They need further investigation to ensure that any finding was not due to chance.

Answer 36

S = √[Σ(x - X)^2 / (n - 1)], where x = observation, X = mean, n = total number of observations
Chi squared (X^2) = Σ(Oi - Ei)^2 / Ei, where Oi = observed value, Ei = expected value

Answer 37

The chi-squared test can be used to test the 'goodness of fit' between observed and expected data. It is used for categorical data in the way that significance tests with p-values are used for quantitative data. Interpretation: calculated chi squared > chi-squared critical value (p = 0.05) --> reject the null hypothesis. Calculated chi squared < chi-squared critical value (p = 0.05) --> do not reject the null hypothesis.
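A minimal goodness-of-fit sketch using the chi-squared formula, Σ(Oi - Ei)^2 / Ei. The critical value 3.841 is the chi-squared value for p = 0.05 with 1 degree of freedom (two categories); the coin-toss counts are hypothetical.

```python
def chi_squared(observed, expected):
    # sum over categories of (O - E)^2 / E
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

CRITICAL_VALUE_DF1_P05 = 3.841  # chi-squared critical value, 1 degree of freedom, p = 0.05

observed = [60, 40]  # e.g. heads, tails in 100 tosses
expected = [50, 50]  # fair-coin expectation under H0
stat = chi_squared(observed, expected)
print(stat)          # 4.0
if stat > CRITICAL_VALUE_DF1_P05:
    print("Reject the null hypothesis")
else:
    print("Do not reject the null hypothesis")
```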

Answer 38

A measure of the robustness (or fragility) of the results of a clinical trial. The fragility index is the minimum number of patients whose outcome would have to change (from non-event to event) to convert a statistically significant result into a non-significant one (p no longer below 0.05). The larger the fragility index, the more robust the result.
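A minimal sketch of the commonly described calculation (the card gives no method, so these details are an assumption): in the group with fewer events, non-events are converted to events one at a time, recomputing a two-sided Fisher's exact test each time, until p is no longer below 0.05. Fisher's test is written out with `math.comb` so no third-party library is needed; function names and counts are hypothetical.

```python
from math import comb

def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x):
        # hypergeometric probability of x in the top-left cell with fixed margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    low, high = max(0, col1 - row2), min(row1, col1)
    # sum all tables at least as unlikely as the observed one (tolerance for float ties)
    return sum(p for p in map(table_prob, range(low, high + 1))
               if p <= p_observed * (1 + 1e-7))

def fragility_index(events_1, total_1, events_2, total_2, alpha=0.05):
    """Returns 0 if not significant to begin with, None if significance is never lost."""
    if events_2 < events_1:  # make group 1 the group with fewer events
        events_1, total_1, events_2, total_2 = events_2, total_2, events_1, total_1
    p = fisher_exact_p(events_1, total_1 - events_1, events_2, total_2 - events_2)
    changed = 0
    while p < alpha:
        if events_1 == total_1:
            return None
        events_1 += 1  # convert one non-event into an event
        changed += 1
        p = fisher_exact_p(events_1, total_1 - events_1, events_2, total_2 - events_2)
    return changed

print(fisher_exact_p(3, 1, 1, 3))         # about 0.486 (34 / 70)
print(fragility_index(10, 100, 25, 100))  # patients whose outcome must change
```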

Answer 39

1. Intervention and control groups may be different at the start. 2. Intervention and control groups may become different as the study progresses. 3. Intervention and control groups may differ at the end of the study, independent of treatment.

Answer 40

Treatment and control patients differ in prognosis
- Therapy: randomisation; randomisation with stratification
- Harm: statistical adjustment for prognostic factors; matching

Answer 41

Placebo effect
- Therapy: blinding of patients
- Harm: objective outcomes (e.g. mortality)
Co-intervention
- Therapy: blinding of caregivers
- Harm: document treatment differences and statistically adjust
Bias in assessment
- Therapy: blinding of outcome assessors
- Harm: document treatment and statistically adjust

Answer 42

Loss to follow-up
- Therapy: ensure complete follow-up
- Harm: ensure complete follow-up
Stopping the study early because of a large effect
- Therapy: complete the study as initially planned
Omitting patients who did not receive their assigned treatment
- Therapy: include all patients in the arm to which they were randomised (intention-to-treat)

Answer 43

HIGH QUALITY RCTs
1a - Systematic review (with homogeneity) of RCTs
1b - Individual RCT (with narrow CI)
1c - All or none (all patients died before treatment became available, but now some survive on it; or some patients died before treatment became available, but now none die on it)
LOW QUALITY RCTs AND COHORT STUDIES
2a - Systematic review (with homogeneity) of cohort studies
2b - Individual cohort study (including low-quality RCT, e.g. < 80% follow-up)
2c - 'Outcomes' research or ecological studies
3a - Systematic review (with homogeneity) of case-control studies
3b - Individual case-control study
4 - Case series (and poor-quality cohort and case-control studies)
5 - Expert opinion, or based on physiology, bench research or first principles

Answer 44

To present the summary data of a meta-analysis.

Answer 45

X-axis: odds ratio
Y-axis: list of studies
Vertical line: line of no effect (odds ratio of 1.0)
Horizontal lines: confidence interval of each individual study
Square position: point estimate of the study's odds ratio
Square size: weight of the study according to the weighting rules of the meta-analysis (reflecting sample size and statistical power)
Diamond: combined result of the meta-analysis (its width shows the CI)
Results can be considered statistically significant if the CI of the combined result does not cross the line of no effect.
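For a single study row, the square and its horizontal line can be computed from a 2x2 table. The sketch below uses the common log-odds (Woolf) method for the 95% CI; that method, the function name and the counts are assumptions, not taken from the card. A CI that excludes 1.0 (the line of no effect) is statistically significant at the 5% level.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a, b, c, d):
    """a, b = events / non-events in the treatment group; c, d = the same in the control group."""
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR), Woolf method
    lower = exp(log(odds_ratio) - 1.96 * se_log_or)
    upper = exp(log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper

or_, lower, upper = odds_ratio_ci(10, 90, 25, 75)
crosses_no_effect = lower <= 1.0 <= upper
print(round(or_, 2), round(lower, 2), round(upper, 2), crosses_no_effect)
```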

Answer 46

Parametric tests
- Require a normal distribution
- Are more powerful (when their assumptions are met)
- Require a large sample size
Non-parametric tests
- Make no assumptions about the distribution of the data
- Are better suited to smaller sample sizes (n < 30)
- Have less power than parametric tests

Answer 47

A curve used to determine the cut-off point in continuously distributed data that best predicts the presence of an outcome. 1) Screening cut point 2) Diagnostic cut point 3) Optimal cut point

Answer 48

A data point at the extreme top left of the plot = a perfect test, i.e. 100% sensitivity and 100% specificity.
X-axis: 100 - specificity (%)
Y-axis: sensitivity
The further left on the plot, the more specific (fewest false positives).
The further up on the plot, the more sensitive (fewest false negatives).
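A minimal sketch of how an ROC curve is built: each candidate cut-off gives one (1 - specificity, sensitivity) point. To pick an "optimal" cut point it uses Youden's index (sensitivity + specificity - 1), which is one common choice but an assumption here; the scores and outcomes are hypothetical.

```python
def roc_points(scores, outcomes):
    """A test is called positive when score >= cut-off; outcomes are 1 (present) or 0 (absent)."""
    positives = sum(outcomes)
    negatives = len(outcomes) - positives
    points = []
    for cut in sorted(set(scores)):
        tp = sum(1 for s, o in zip(scores, outcomes) if s >= cut and o == 1)
        tn = sum(1 for s, o in zip(scores, outcomes) if s < cut and o == 0)
        sensitivity = tp / positives
        specificity = tn / negatives
        points.append((cut, 1 - specificity, sensitivity))  # (cut, x-axis, y-axis)
    return points

def youden_optimal_cut(points):
    # Youden's J = sensitivity + specificity - 1 = sensitivity - (1 - specificity)
    return max(points, key=lambda p: p[2] - p[1])[0]

scores = [1, 2, 3, 4, 5, 6, 7, 8]
outcomes = [0, 0, 1, 0, 0, 1, 1, 1]
points = roc_points(scores, outcomes)
print(points)
print(youden_optimal_cut(points))  # 6: sensitivity 0.75, specificity 1.0
```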

Answer 49

To determine an appropriate cut point for a test e.g. at what STOP-BANG score should you consider postoperative apnoea a clinical problem.

Answer 50

To present the time to an outcome in two different groups (patient cohorts). The utility of a survival plot is that it can indicate the time period during which patients are most at risk of the outcome (the steepest part of the curve).
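Survival plots are most often drawn with the Kaplan-Meier estimator (the card does not name a method, so this is an assumption). A minimal sketch: at each event time, survival is multiplied by (1 - events / number at risk), and censored patients leave the risk set without causing a step. The data and the function name are hypothetical.

```python
def kaplan_meier(times, events):
    """times: follow-up time per patient; events: 1 = outcome occurred, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []  # (time, survival probability) at each step of the curve
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = n_leaving = 0
        # group every patient whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored patients both leave the risk set
    return curve

print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

Running one curve per group and plotting them together gives the two-cohort comparison the card describes.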


(75 cards)