Statistics Flashcards

Question

Definition of hazard ratio

Answer 1

ratio of the hazard (instantaneous event rate) in the treatment group to that in the control group: the relative likelihood of an event occurring at any given point in time

Answer 2

statistical analysis method to predict a binary outcome, such as yes or no, based on existing independent variables

Answer 3

regression model that estimates relationship between one independent variable and one dependent variable using a straight line

Answer 4

hypothesis test to determine whether observed frequencies are significantly different from the frequencies expected if the null hypothesis were true (categorical variables)
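
To illustrate the chi-squared test, here is a minimal Python sketch of Pearson's chi-squared goodness-of-fit statistic, the sum of (O − E)²/E over all categories. The function name and die-roll counts are hypothetical, and only the statistic and its degrees of freedom are computed (no p-value lookup).

```python
def chi_squared_statistic(observed, expected):
    """Pearson's chi-squared statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die; a fair die (the null hypothesis) expects 10 per face
observed = [8, 12, 9, 11, 10, 10]
expected = [10] * 6

stat = chi_squared_statistic(observed, expected)
dof = len(observed) - 1  # goodness-of-fit: number of categories minus 1
print(f"chi-squared = {stat:.2f}, df = {dof}")
```

With 5 degrees of freedom the 5% critical value is about 11.07, so a statistic of 1.0 gives no evidence against the fair-die null hypothesis.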

Answer 5

hypothesis test to determine whether the means of two groups are significantly different from each other (continuous variables)

Answer 6

hypothesis test to determine whether the means of three or more groups are significantly different from each other (continuous variables)

Answer 7

hypothesis test to compare the survival distributions of two samples

Answer 8

curves of the probability of survival over time, compared between groups defined by categorical variables

Answer 9

survival analysis for both quantitative & categorical variables, which can simultaneously assess the effect of several risk factors on survival time

Answer 10

how closely 2 continuous variables move with each other

Answer 11

parametric: Pearson’s r; non-parametric: Spearman’s rank correlation coefficient (rho)
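
A minimal Python sketch of the two coefficients, using only the standard library. Spearman's rho is computed here as Pearson's r applied to the ranks (tied values get the average rank); the function names and data are hypothetical, chosen to be monotonic but not linear.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: y rises with x, but not along a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson r = {pearson_r(x, y):.3f}, Spearman rho = {spearman_rho(x, y):.3f}")
```

Here Pearson's r is about 0.98 because the relationship is not perfectly linear, while Spearman's rho is exactly 1 because the ranks agree perfectly.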

Answer 12

a graph showing the performance of a classification model at all classification thresholds. The curve plots two parameters: true positive rate and false positive rate

Answer 13

X axis: 1 − specificity (false positive rate); Y axis: sensitivity (true positive rate)
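
A minimal Python sketch of how the points on an ROC curve can be obtained, assuming a model that outputs a score per patient and a rule that calls a case positive when its score is at or above the threshold. Every distinct score is tried as a threshold; the function name, scores and labels are hypothetical.

```python
def roc_points(scores, labels):
    """(FPR, TPR) points for an ROC curve.

    A case is classified positive when its score >= threshold; every distinct
    score is tried as a threshold, from strictest to most lenient.
    labels: 1 = has the condition, 0 = does not.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))  # (1 - specificity, sensitivity)
    return points


# Hypothetical model scores and true labels for six patients
scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```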

Answer 14

X axis: study outcome (effect estimate), e.g. OR; Y axis: study precision, e.g. standard error

Answer 15

RR = [A/(A+B)] / [C/(C+D)] (risk of disease among all exposed vs risk of disease among all not exposed)
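
A minimal Python sketch of the RR formula, assuming the 2x2 layout the formula implies (rows: exposed/unexposed; columns: disease/no disease). The function name and counts are hypothetical.

```python
def relative_risk(a, b, c, d):
    """RR = [A/(A+B)] / [C/(C+D)] for a 2x2 table laid out as:

                 disease   no disease
    exposed         A          B
    unexposed       C          D
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Hypothetical cohort: 30/100 exposed and 10/100 unexposed develop the disease
print(f"RR = {relative_risk(30, 70, 10, 90):.2f}")  # RR = 3.00
```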

Answer 16

Prospective studies

Answer 17

Ratio of the odds of an outcome in those with a particular exposure to the odds of the outcome in those without it (the odds being the chance of something happening vs the chance of it not happening)

Answer 18

Case-control studies

Answer 19

OR = (A/C) / (B/D): odds of exposure in the cases vs odds of exposure in the controls. This equals (A/B) / (C/D) = AD/BC, i.e. the odds of getting the disease when exposed vs the odds of getting the disease when not exposed
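
A minimal Python sketch of the OR formula on the same assumed 2x2 layout (rows: exposed A, B / unexposed C, D; columns: disease, no disease), also computing the RR from the same hypothetical counts. With a common outcome (30% vs 10% risk) the two measures diverge noticeably.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2x2 table (rows: exposed A, B / unexposed C, D; columns: disease, no disease).

    (A/C) / (B/D): odds of exposure in cases vs odds of exposure in controls,
    which is algebraically the same as (A/B) / (C/D) and as AD / BC.
    """
    return (a / c) / (b / d)


# Hypothetical counts with a common outcome (30% vs 10% risk)
a, b, c, d = 30, 70, 10, 90
or_value = odds_ratio(a, b, c, d)
rr_value = (a / (a + b)) / (c / (c + d))
print(f"OR = {or_value:.2f}, RR = {rr_value:.2f}")  # OR = 3.86, RR = 3.00
```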

Answer 20

True; however, they are not the same thing, and they can differ substantially, especially when the outcome is common (they are close only when the outcome is rare)

Answer 21

Useful when the risk is not constant over time: it uses data from different time points, across which the risk may be changing

Answer 22

45% more likely to have outcome X

Answer 23

In those who had oral cancer, the odds of having chewed tobacco were 1.6 times the odds in those who did not have oral cancer.

Answer 24

An odds ratio of 1.6 means the odds of disease are 60% higher in exposed people, whereas a risk ratio of 1.6 means exposed people are 60% more likely to be diseased

Answer 25

At any particular point, group A is 21% less likely to have outcome X

Answer 26

Number of new cases of a disease within a specific period of time

Answer 27

Number of existing cases of a disease in a population at a given time

Answer 28

1/ARR: tells you how many people need to be treated with the intervention to prevent one outcome occurring
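
A minimal Python sketch of NNT = 1/ARR from trial counts. It uses exact fractions to avoid floating-point rounding, and rounds up to a whole person, which is the usual convention. The function name and trial numbers are hypothetical.

```python
import math
from fractions import Fraction


def number_needed_to_treat(control_events, control_total, treated_events, treated_total):
    """NNT = 1 / ARR, where ARR = control event rate - treated event rate.

    Fractions keep the arithmetic exact; the result is rounded up to a whole person.
    """
    control_rate = Fraction(control_events, control_total)
    treated_rate = Fraction(treated_events, treated_total)
    arr = control_rate - treated_rate
    if arr <= 0:
        raise ValueError("treatment shows no absolute risk reduction; NNT is undefined here")
    return math.ceil(1 / arr)


# Hypothetical trial: 20/100 control vs 15/100 treated patients have the outcome
print(number_needed_to_treat(20, 100, 15, 100))  # ARR = 0.05, so NNT = 20
```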

Answer 29

RRR = ARR / incidence in the control group, expressed as a %; equivalently RRR = 1 − RR, so an RR of 0.8 gives an RRR of 20%. Relative risk reduction (RRR) is the percentage decrease in risk in the group receiving the intervention vs the group that did not receive it (the control group). Absolute risk reduction (ARR) is the actual difference in risk between the treated and control groups.
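
A minimal Python sketch relating ARR, RR and RRR, assuming the risks in each group are already known. The function name and risks are hypothetical; the example reproduces the RR of 0.8 / RRR of 20% case.

```python
def risk_reductions(control_risk, treated_risk):
    """Return (ARR, RR, RRR) for a treated group compared with a control group."""
    arr = control_risk - treated_risk  # absolute risk reduction
    rr = treated_risk / control_risk   # relative risk (treated vs control)
    rrr = arr / control_risk           # relative risk reduction = ARR / control risk
    return arr, rr, rrr


# Hypothetical risks: 10% in the control group, 8% in the treated group
arr, rr, rrr = risk_reductions(control_risk=0.10, treated_risk=0.08)
print(f"ARR = {arr:.2%}, RR = {rr:.2f}, RRR = {rrr:.0%}")  # ARR = 2.00%, RR = 0.80, RRR = 20%
```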

Answer 30

bias, confounding, data dredging

Answer 31

Sample size too small; measurement variance too large

Answer 32

Probability of making a type II error (beta); power = 1 − beta and is conventionally set at 0.8 or higher, i.e. beta of 0.2 or less (alpha is the probability of making a type I error)

Answer 33

Increase sample size; increase effect size; increase measurement precision

Answer 34

Gives an accurate representation of the effect of the intervention, because only the people who actually completed the intervention as intended are included.

Answer 35

Susceptible to attrition bias and exclusion bias

Answer 36

More accurate reflection of results in clinical practice, because in practice patients do not always follow instructions/protocols; more generalisable

Answer 37

Does not give a true, accurate estimate of how well the drug works under optimal conditions; imputed values may be inaccurate

Answer 38

The assumption that any difference between experimental groups is due to chance

Answer 39

Worst-case scenario; hot deck imputation (fill in missing values from similar subjects with complete records); last observation carried forward
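
A minimal Python sketch of last observation carried forward, assuming one participant's repeated measurements are stored in visit order with None marking a missed visit. The function name and scores are hypothetical.

```python
def last_observation_carried_forward(measurements):
    """Replace each missing value (None) with the most recent observed value.

    A value missing before any observation has been made stays None.
    """
    filled = []
    last_seen = None
    for value in measurements:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical pain scores at five visits; the participant missed visits 2, 4 and 5
print(last_observation_carried_forward([7, None, 5, None, None]))  # [7, 7, 5, 5, 5]
```

Note that this assumes nothing changed after the last recorded visit, which is one reason imputed values may be inaccurate.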

Answer 40

The smaller the standard deviation, the smaller the sample size needed (a large sample matters less)

Answer 41

Paired t-test

Answer 42

One-way ANOVA

Answer 43

Independent t-test

Answer 44

One-way ANOVA

Answer 45

Wilcoxon signed-rank test

Answer 46

Friedman test

Answer 47

Mann-Whitney U test

Answer 48

Kruskal-Wallis test

Answer 49

data that are assumed to follow a normal distribution. When data sets are large enough, parametric statistical tests can often be used regardless of normality. Parametric tests are generally considered to have greater statistical power.

Answer 50

data that are not assumed to follow a normal distribution, e.g. data that are ordinal, ranked, or have outliers that cannot be removed.

Answer 51

Cox proportional hazards model, log-rank test or Wilcoxon two-sample test. The Cox model is the most widely used.

Answer 52

Data dredging means that some associations will crop up due to chance. Dredging: “cherry-picking of promising findings leading to a spurious excess of statistically significant results in published or unpublished literature”.

Answer 53

Log-rank test

Answer 54

Power is the ability to detect a given difference if that difference truly exists. You usually pick a clinically meaningful difference. To calculate the sample size needed for a given power, you need a population mean and standard deviation AND:
▪ the standard deviation of the test group
▪ the clinically meaningful difference for the test group
From these you can calculate the size of the sample you need for a certain power.
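
A minimal sketch of a sample-size calculation, assuming a comparison of two group means with a common standard deviation σ, a two-sided significance level α and the normal approximation: n per group = 2σ²(z₁₋α/₂ + z₁₋β)² / Δ², where Δ is the clinically meaningful difference. The function name and example numbers are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_group(sd, difference, alpha=0.05, power=0.80):
    """Participants per group for a two-sided, two-sample comparison of means.

    sd: assumed common standard deviation of the outcome
    difference: clinically meaningful difference to detect
    power: 1 - beta (probability of detecting the difference if it truly exists)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for power = 0.80
    n = 2 * (sd ** 2) * (z_alpha + z_beta) ** 2 / difference ** 2
    return math.ceil(n)  # round up to whole participants


print(sample_size_per_group(sd=10, difference=5))  # 63 per group
print(sample_size_per_group(sd=5, difference=5))   # 16 per group: smaller SD, fewer needed
```

Raising the requested power or shrinking the difference to detect both increase the required sample size.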

Answer 55

effect size is the magnitude of the difference between groups. The absolute effect size is the difference between the average, or mean, outcomes in two different intervention groups.

Answer 56

a type of qualitative data that groups variables into categories, e.g. hair colour

Answer 57

a kind of qualitative data that groups variables into ordered categories, e.g. income range or level of education

Answer 58

a data type measured along a scale on which each point is placed at an equal distance from the next, e.g. temperature in degrees, time in minutes

Answer 59

a form of quantitative (numeric) data, e.g. height, weight

Answer 60

Paired means that both samples consist of the same test subjects; unpaired means that the two samples consist of different test subjects

Answer 61

also known as the significance level: the probability of rejecting the null hypothesis when it is true (type I error, a false positive)

Answer 62

Inferential statistics allow you to test a hypothesis or assess whether your data is generalizable to the broader population

Answer 63

Correlation is a statistical measure that expresses the extent to which two variables change together at a constant rate.

Answer 64

a statistical technique that relates a dependent variable to one or more independent (explanatory) variables

Answer 65

Correlation and regression are techniques used to analyze the relationship between two quantitative variables. Correlation measures the strength of a linear relationship between two variables, whereas regression uses an equation to describe how the dependent variable changes with the independent variable.

Answer 66

degrees of freedom: the number of values involved in a statistical calculation that are free to vary; needed to ensure the statistical validity of chi-squared tests, t-tests, etc.

Answer 67

Statistical validity can be defined as the extent to which the conclusions drawn from a research study's statistical tests can be considered accurate and reliable

Answer 68

Accuracy is how close a given set of measurements (observations or readings) is to their true value

Answer 69

the agreement among repeated measurements of the same variable.

Answer 70

the term variance refers to a statistical measurement of the spread of the numbers in a data set: how far each number in the set is from the mean

Answer 71

a substance that has no therapeutic effect, used as a control in testing new drugs.

Answer 72

Participants are assessed before and after an intervention; analysis is within the same participant

Answer 73

A single-subject trial in which one individual is the sole unit of observation; identifies the optimal intervention for that individual (e.g. optimal dose)

Answer 74

Study that investigates the effects of multiple independent variables on an outcome measure (both separately and in combination)

Answer 75

Number of participants who need to take a medication/have an intervention (compared with the control) for one additional positive event to occur; equals 1/ARR

Answer 76

How well the test is able to detect those with the disease: True Positive (correctly detected with disease) / (True Positive + False Negative) (total with disease)

Answer 77

How well the test is able to rule out those without the disease: True Negative (correctly detected without disease) / (True Negative + False Positive) (total without disease)

Answer 78

The percentage of people who test positive who truly have the disease: True Positive (correctly detected with disease) / (True Positive + False Positive) (total who tested positive)

Answer 79

The percentage of people who test negative who truly do NOT have the disease: True Negative (correctly detected without disease) / (True Negative + False Negative) (total who tested negative)
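
A minimal Python sketch computing sensitivity, specificity, PPV and NPV from the four cells of a 2x2 diagnostic table. The function name and screening counts are hypothetical.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives / all with disease
        "specificity": tn / (tn + fp),  # true negatives / all without disease
        "ppv": tp / (tp + fp),          # true positives / all who tested positive
        "npv": tn / (tn + fn),          # true negatives / all who tested negative
    }


# Hypothetical screening results for 1,000 people, 100 of whom have the disease
metrics = diagnostic_metrics(tp=90, fp=30, fn=10, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

Sensitivity and specificity depend only on the test, whereas PPV and NPV also depend on how common the disease is in the tested population.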

Answer 80

derived statistic that tells us how many patients must receive a particular treatment for 1 additional patient to experience a particular adverse outcome. Lower NNT and higher NNH values are associated with a more favorable treatment profile

Answer 81

An outlier is an observation that lies an abnormal distance from other values in a random sample from a population: an extreme value that stands out greatly from the overall pattern of values in a dataset
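
One common rule of thumb for flagging outliers is Tukey's fences: values more than 1.5 × IQR below the first quartile or above the third quartile. A minimal Python sketch, assuming that rule (the card itself does not specify one); the function name and measurements are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (Python's default 'exclusive' method)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical measurements with one extreme value
print(iqr_outliers([10, 12, 11, 13, 12, 11, 50]))  # [50]
```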

(106 cards)