Overall Flashcards

Question

If A and B are statistically independent, what is P(AB) equal to?

Answer 1

P(AB) = P(A)P(B)

Answer 2

P(B|A)P(A) / P(B)

Answer 3

It incorporates any prior knowledge that a researcher might have about a hypothesis

Answer 4

A is the cause and B the effect

Answer 5

A number X(z) assigned to every outcome z of an experiment

Answer 6

The probability density function f(x) = dF(x)/dx

Answer 7

A discrete distribution function, where f(x) = the sum over i of P{X = x_i} delta(x - x_i), and delta is an impulse function

Answer 8

No, because P(X = x) = 0 when X is continuous. Instead, we calculate the probability that X lies in a small interval around x by integrating f(x) across that interval

Answer 9

The integral of xf(x)

Answer 10

The sum of x_i multiplied by P(X=x_i)

Answer 11

sigma squared = E[X^2] - E[X]^2
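
As a quick check of the identity above, here is a minimal sketch that computes the variance of a fair six-sided die two ways. The die is an illustrative choice, not something taken from the cards.

```python
from fractions import Fraction

# Fair six-sided die: each face has probability 1/6 (illustrative example).
outcomes = range(1, 7)
p = Fraction(1, 6)

mean = sum(x * p for x in outcomes)         # E[X]
mean_sq = sum(x * x * p for x in outcomes)  # E[X^2]
variance = mean_sq - mean ** 2              # E[X^2] - E[X]^2

# Same quantity from the definition E[(X - E[X])^2].
variance_def = sum((x - mean) ** 2 * p for x in outcomes)

print(mean, mean_sq, variance)  # 7/2 91/6 35/12
```

Both routes give 35/12, which is why the shortcut form is usually preferred for hand calculation.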

Answer 12

When the sample size tends to infinity

Answer 13

Continuous

Answer 14

Bernoulli, binomial and uniform (discrete)

Answer 15

Normal (Gaussian), Poisson, exponential, uniform (continuous)

Answer 16

Binomial distribution with a single trial (n=1)

Answer 17

A single experiment with outcome 0 or 1

Answer 18

The number of successes in a sequence of independent Bernoulli trials

Answer 19

n = number of trials, p = probability of success

Answer 20

X ~ Bernoulli(p)

Answer 21

X ~ B(n,p)

Answer 22

A finite number n of outcome values are equally likely to be observed

Answer 23

A continuous random variable that is equally likely to take any value between two bounds a and b

Answer 24

X ~ U(a,b), where a and b are the bounds (minimum and maximum values) with a < b

Answer 25

X ~ N(mu, sigma squared), where mu = mean and sigma squared = variance (sigma = standard deviation)

Answer 26

B(n,p) approx = N(np, np(1-p))

Answer 27

Both n*p and n*(1-p) are greater than 5
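
A rough sketch of how close the normal approximation to the binomial gets when this condition holds. The values of n, p and k are hypothetical, and the continuity correction (k + 0.5) is an added assumption that the cards do not mention.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 40, 0.5  # hypothetical: n*p = 20 and n*(1-p) = 20, both > 5
k = 15

# Exact binomial probability P(X <= k).
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Normal approximation N(np, np(1-p)), evaluated with a continuity correction.
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k + 0.5)

print(round(exact, 4), round(approx, 4))
```

With these values the two probabilities should agree to within about 0.01.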

Answer 28

A normal distribution with a mean of 0 and a standard deviation of 1

Answer 29

Z ~ N(0,1)

Answer 30

Stretched by the value of the standard deviation and translated by the mean value

Answer 31

Z = (X - mu) / sigma (standard deviation)

Answer 32

Capital phi(z)
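
A minimal sketch of standardisation, assuming a hypothetical X ~ N(100, 15 squared): converting x to z and looking up capital phi(z) gives the same probability as evaluating the original distribution directly.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # hypothetical mean and standard deviation
x = 130.0

z = (x - mu) / sigma                    # Z = (X - mu) / sigma
phi_z = NormalDist().cdf(z)             # capital phi(z) for Z ~ N(0,1)
direct = NormalDist(mu, sigma).cdf(x)   # P(X <= x) without standardising

print(z, round(phi_z, 4))  # 2.0 0.9772
```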

Answer 33

A plot of the sorted values from the data set against the expected value of the corresponding quantiles from the standard normal distribution

Answer 34

It is used to visually assess the normality of data i.e. it compares two probability distributions by plotting their quantiles against each other

Answer 35

The z-distribution (standard normal)

Answer 36

An approach to change the shape of a skewed distribution so that it becomes normal or nearly normal with power transformations

Answer 37

Mean (Y) = a * mu + b and variance (Y) = a squared * sigma squared

Answer 38

Mean(Y) = mu1 + mu2 + ... + mu_n and variance(Y) = sigma 1 squared + sigma 2 squared + ... + sigma n squared

Answer 39

Mean(Y) = mu1 - mu2 and variance(Y) = sigma 1 squared + sigma 2 squared

Answer 40

A normal distribution

Answer 41

X ~ Poisson(mu), where mu is the mean

Answer 42

mu (equal to the mean)

Answer 43

B(n,p) approx = Poisson (np)

Answer 44

If n is large (n>50) and p is small (p<0.05)
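
To illustrate the Poisson approximation to the binomial under this condition, the sketch below compares the two probability mass functions for hypothetical values n = 100 and p = 0.02.

```python
from math import comb, exp, factorial

n, p = 100, 0.02  # hypothetical: large n, small p
mu = n * p        # Poisson mean used in the approximation

def binom_pmf(k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)

# Largest absolute difference between the two pmfs for k = 0..10.
max_gap = max(abs(binom_pmf(k) - poisson_pmf(k)) for k in range(11))
print(round(max_gap, 4))
```

For these values the largest gap is well under 0.01.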

Answer 45

T ~ M(lambda), where lambda > 0 is called the rate parameter

Answer 46

The distance (can be any measure or units, eg time) between events in a Poisson point process

Answer 47

A model for the occurrence of events in continuous time. It is a counting process for events that appear to happen at a certain rate but completely at random

Answer 48

Events occur singly, the rate of occurrence of events remains constant and the incidence of future events is independent of the past

Answer 49

X ~ Poisson(lambda * t), which models the number of events at time t and T ~ M(lambda), which models the waiting time between events
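
The two models describe the same process from different angles. As a sketch with a hypothetical rate of 2 events per hour: the probability of seeing no events in a window of length t (Poisson) equals the probability that the waiting time exceeds t (exponential).

```python
from math import exp, factorial

lam = 2.0  # hypothetical rate: 2 events per hour
t = 0.5    # observation window in hours

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)

def exponential_cdf(w, rate):
    """P(T <= w) for T ~ M(rate)."""
    return 1 - exp(-rate * w)

p_no_events = poisson_pmf(0, lam * t)        # no events in [0, t]
p_wait_longer = 1 - exponential_cdf(t, lam)  # first event arrives after t

print(round(p_no_events, 4), round(p_wait_longer, 4))  # 0.3679 0.3679
```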

Answer 50

A random interval which contains the parameter being estimated with probability equal to the confidence level

Answer 51

The confidence intervals would be expected to include the true value on 95 occasions

Answer 52

The sample mean (x bar), the sample standard deviation (s), the number of items in the sample (n) and z, the (1-alpha/2) quantile of the standard normal distribution

Answer 53

100(1-alpha)%, where alpha is used to calculate the z in the confidence interval equation
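
Putting those ingredients together, a minimal sketch of a 100(1-alpha)% confidence interval for the mean, assuming the usual large-sample form x bar plus or minus z * s / sqrt(n). The sample values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical measurements.
sample = [4.9, 5.1, 5.0, 5.3, 4.7, 5.2, 5.0, 4.8, 5.1, 4.9]
alpha = 0.05  # gives a 95% confidence level

n = len(sample)
x_bar = mean(sample)                     # sample mean
s = stdev(sample)                        # sample standard deviation
z = NormalDist().inv_cdf(1 - alpha / 2)  # (1 - alpha/2) quantile, about 1.96

half_width = z * s / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

print(round(lower, 3), round(upper, 3))
```

For a sample this small, a t quantile with n-1 degrees of freedom would normally replace z; the z version is shown because it matches the card.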

Answer 54

The sum of the squares of independent standard normal random variables

Answer 55

W ~ chi-squared X^2(v), where v (the degrees of freedom) is both the mean and the number of independent standard normal random variables being summed

Answer 56

For the ratio of their sample variances S1^2/S2^2 with v1 and v2 degrees of freedom. This is for two independent samples with normal distribution and degrees of freedom v1 and v2

Answer 57

It can be either depending on the null hypothesis (eg two sided is required if the null is drug A = drug B but one sided if the null is drug A is not effective)

Answer 58

No, it can only be rejected/refuted

Answer 59

It provides evidence in favour of the alternative hypothesis

Answer 60

A hypothesis that predicts the direction of a relationship or difference between two variables. Also known as a one-tailed hypothesis

Answer 61

A one-tailed test looks for an increase or decrease in a parameter, whereas a two-tailed test looks for a change in parameter

Answer 62

If the null hypothesis is true, the significance level is the proportion of the repeated experiments in which the null hypothesis will be falsely rejected

Answer 63

Rejecting the null hypothesis when it is true (called a false positive)

Answer 64

Not rejecting the null hypothesis when it is false (false negative)

Answer 65

0.05 (5%) and alpha level

Answer 66

Gamma and 0.8

Answer 67

For a sample of size n with normal distribution with values for the mean and standard deviation. The t test can be used to test a null hypothesis: mu = mu_nought (value). (Compare paired samples)

Answer 68

n-1, where n is the sample size

Answer 69

n1 + n2 - 2, where n1 and n2 are the sample sizes

Answer 70

Calculate the test statistic (either 1 or 2 sample), determine the degrees of freedom, go to the correct row on the table that matches the degrees of freedom and determine which quantile the test statistic matches with

Answer 71

1 minus the quantile for one sided tests and double this for two sided

Answer 72

The probability of having observed our data or more extreme given the null hypothesis is true

Answer 73

Little evidence against the null hypothesis

Answer 74

Weak evidence against the null hypothesis

Answer 75

Moderate evidence against the null hypothesis

Answer 76

Strong evidence against the null hypothesis

Answer 77

It can sometimes be misinterpreted as meaning the probability of the null hypothesis being correct or the probability that the observed effect is not real

Answer 78

Research findings with p more than 0.05 sometimes do not get published

Answer 79

Researchers sometimes change their conclusions radically depending on which side of 0.05 the p value is

Answer 80

For two samples of sizes n1 and n2 with values for the means (x bar 1 and x bar 2) and standard deviations. The t test can be used to test a null hypothesis: mu1 = mu2 (the means of the two populations are equal) (compare two unrelated samples)

Answer 81

Variation in each population can be modelled by a normal distribution. Samples are independent. Population variances are equal (in practice, they differ by a factor of less than 3)
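
Under these assumptions, a sketch of the two-sample test statistic using a pooled variance, with n1 + n2 - 2 degrees of freedom. The data are hypothetical, and the pooled-variance form is the standard textbook version rather than something quoted from the cards.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical measurements from two unrelated groups.
group_1 = [5.1, 4.9, 5.6, 5.8, 5.3, 5.0]
group_2 = [4.4, 4.8, 4.6, 5.0, 4.3, 4.7]

n1, n2 = len(group_1), len(group_2)
df = n1 + n2 - 2  # degrees of freedom

# Pooled estimate of the common variance.
pooled_var = ((n1 - 1) * variance(group_1) + (n2 - 1) * variance(group_2)) / df

# Difference in sample means divided by its standard error.
t_stat = (mean(group_1) - mean(group_2)) / sqrt(pooled_var * (1 / n1 + 1 / n2))

print(df, round(t_stat, 2))
```

The statistic is then compared with the row of the t table for df degrees of freedom.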

Answer 82

A binomial distribution with n (number of samples) and p (probability of a success) (remember this can be approximated by a normal distribution)

Answer 83

The hypothesised difference

Answer 84

Normal distribution, provided n*p and n*(1-p) are both greater than 5

Answer 85

Two-tailed and it is the standard normal distribution (z distribution)

Answer 86

The study sample size required (sample size per group)

Answer 87

The standard deviation for the underlying population sigma, the hypothesised difference between two groups (d), the quantile values on the standard normal distribution table that relate to (1 - half the significance level) and the power
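
A sketch of how these inputs combine, assuming the commonly used formula n = 2 * sigma squared * (z_(1-alpha/2) + z_power) squared / d squared for the sample size per group. The handbook version may be written differently, and all numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist

sigma = 10.0  # hypothetical population standard deviation
d = 5.0       # hypothesised difference between the two groups
alpha = 0.05  # significance level (two-sided)
power = 0.8

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # quantile for 1 - alpha/2
z_power = NormalDist().inv_cdf(power)          # quantile for the power

# Sample size per group, rounded up to a whole subject.
n_per_group = ceil(2 * sigma**2 * (z_alpha + z_power) ** 2 / d**2)

print(n_per_group)  # 63
```

Halving d to 2.5 would roughly quadruple the required sample size, since n scales with 1 / d squared.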

Answer 88

To calculate the size of difference (d) that could be detected as statistically significant given the sample size per group

Answer 89

The study sample size required

Answer 90

The difference in means version has the standard deviation, whereas the difference in proportions version has pi nought, which is the average proportion of the two groups

Answer 91

The power is the probability of detecting a significant difference when one exists.

Answer 92

The process of determining the sample size for a research study to detect a significant difference in means or proportions

Answer 93

There is no assumption that the underlying distribution comes from a specific family

Answer 94

Wilcoxon signed-rank test

Answer 95

Mann-Whitney test

Answer 96

Data ranks

Answer 97

The data is skewed or the sample size is too small

Answer 98

n is the sample size after deletion and it should be 16 or above (in handbook)

Answer 99

They are uncorrelated and independent (and no assumption of normal distribution)

Answer 100

n_A and n_B are the respective samples and each sample size should be 8 or above (in handbook)

Answer 101

The chi-squared tests for goodness of fit of an observed distribution (of observed frequencies) to a theoretical one

Answer 102

p is the number of estimated parameters

Answer 103

Null: data is a good fit to the model. Alternative: the difference is too large (as squared so can't be negative)

Answer 104

At least 5
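
A minimal sketch of the goodness-of-fit statistic, the sum of (observed - expected) squared / expected, for hypothetical die rolls. The degrees of freedom are taken as (number of categories - 1 - p), which is the usual convention and is assumed here rather than quoted from the cards.

```python
# Hypothetical counts from 60 rolls of a die, tested against a fair-die model.
observed = [8, 12, 9, 11, 10, 10]
expected = [sum(observed) / len(observed)] * len(observed)  # 10 per face

# Every expected frequency should be at least 5 for the test to be reliable.
assert min(expected) >= 5

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

p_estimated = 0                         # no parameters estimated from the data
df = len(observed) - 1 - p_estimated    # degrees of freedom

print(round(chi_sq, 2), df)  # 1.0 5
```

The statistic is compared with the chi-squared table for df degrees of freedom; a small value such as this one gives no reason to reject the model.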

Answer 105

Student's t-test compares means between two groups, whereas ANOVA compares means across two or more groups

Answer 106

Statistical independence of cases within each group

Answer 107

Normality (the distribution in each group is normal) and equality of variances (homoscedasticity), so the variances in each group are assumed to be the same (they can differ by up to a factor of 3)

Answer 108

Parametric: one way ANOVA. Non-parametric: Kruskal-Wallis

Answer 109

Parametric: Two way ANOVA. Non-parametric: Friedman

Answer 110

Signal detection theory

Answer 111

To assess the performance of diagnostic tests

Answer 112

A decision threshold

Answer 113

Sensitivity

Answer 114

Specificity

Answer 115

Type I error and 1 - specificity

Answer 116

Type II error and 1 - sensitivity

Answer 117

Assuming a result above the threshold counts as positive, we can improve the sensitivity by moving the decision threshold to a lower value (less strict criteria for positive), or we can improve the specificity by moving the decision threshold to a higher value (more strict criteria for positive)

Answer 118

A graph of sensitivity against 1-specificity (type I error) so true positive rate against false positive rate

Answer 119

If the curve is closer to the left hand and top border of the ROC space

Answer 120

The area under the curve, with an area of 1 being a perfect test
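
A rough sketch of building an ROC curve and its area by sweeping the decision threshold. The scores are hypothetical, a score at or above the threshold is assumed to mean a positive test, and the area uses the trapezium rule.

```python
def roc_points(diseased, healthy):
    """(1 - specificity, sensitivity) at each threshold, from strict to lax."""
    thresholds = sorted(set(diseased + healthy), reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is positive
    for th in thresholds:
        tpr = sum(s >= th for s in diseased) / len(diseased)  # sensitivity
        fpr = sum(s >= th for s in healthy) / len(healthy)    # 1 - specificity
        points.append((fpr, tpr))
    return points

def area_under_curve(points):
    """Trapezium-rule area under consecutive (x, y) points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical test scores.
diseased = [0.9, 0.8, 0.6, 0.55]
healthy = [0.7, 0.5, 0.4, 0.3]

points = roc_points(diseased, healthy)
auc = area_under_curve(points)
print(auc)  # 0.875
```

An area of 0.5 would correspond to a test no better than chance.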

Answer 121

Positive predictive value and negative predictive value

Answer 122

How reliable is this positive result and no

Answer 123

For the probabilities P(disease|positive) and P(no disease|negative), which are the opposite way around to the sensitivity and specificity probabilities
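
To make the two directions of conditioning concrete, the sketch below computes sensitivity, specificity and both predictive values from hypothetical counts, and checks the positive predictive value against Bayes' theorem.

```python
# Hypothetical counts for a diagnostic test in 1000 subjects.
tp, fn = 90, 10   # diseased subjects: test positive / test negative
fp, tn = 50, 850  # healthy subjects: test positive / test negative

sensitivity = tp / (tp + fn)  # P(positive | disease)
specificity = tn / (tn + fp)  # P(negative | no disease)
ppv = tp / (tp + fp)          # P(disease | positive)
npv = tn / (tn + fn)          # P(no disease | negative)

# The same PPV from Bayes' theorem, using the prevalence in this sample.
prevalence = (tp + fn) / (tp + fn + fp + tn)
ppv_bayes = (sensitivity * prevalence) / (
    sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
)

print(round(sensitivity, 3), round(specificity, 3), round(ppv, 3), round(npv, 3))
```

Even with 90% sensitivity, fewer than two thirds of positive results here are true positives, because the disease is uncommon in the sample.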

Answer 124

If knowing the value of one of the variables tells you something about the value of the other

Answer 125

Linear correlation (parametric)

Answer 126

Monotonic correlation (non-parametric)

Answer 127

Ordinal data (as well as continuous) because it uses ranks instead of assumptions of normality

Answer 128

Null hypothesis is zero correlation, whereas the alternative is a 2-sided hypothesis (there is some sort of correlation)

Answer 129

T-distribution tables and n-2 degrees of freedom

Answer 130

In contingency tables (cross tabulation format) so one variable at the top and one on the left of the table

Answer 131

(row total multiplied by column total) divided by overall total

Answer 132

Chi squared test (null hypothesis of no association)

Answer 133

(number of rows -1) multiplied by (number of columns -1)
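
A minimal sketch of a chi squared test on a contingency table: expected counts from (row total * column total) / overall total, and degrees of freedom from (rows - 1) * (columns - 1). The 2 x 2 table is hypothetical.

```python
# Hypothetical 2 x 2 table: rows = exposed / not exposed,
# columns = disease / no disease.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected count in each cell under the null hypothesis.
expected = [[r * c / total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, round(chi_sq, 2), df)  # [[20.0, 20.0], [30.0, 30.0]] 16.67 1
```

With 1 degree of freedom the 5% critical value is 3.841, so a statistic this large would lead to rejecting the null hypothesis.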

Answer 134

Greater than one indicates an exposure to be harmful (increased risk), whereas less than one indicates a protective effect (decreased risk)

Answer 135

If both the exposure and disease are associated with a third variable (confounder)

Answer 136

Kappa and non-parametric

Answer 137

The sum of the matching terms on the contingency table (these should lie along the y=-x line, i.e. the main diagonal). For the percentage agreement, divide this number by the total

Answer 138

For each concordant pair on the contingency table, multiply the row and column totals and divide by the overall total. For the percentage agreement, sum these values together and divide by the total
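
Observed and expected agreement are usually combined into kappa = (observed - expected) / (1 - expected), both taken as proportions. That formula is the standard definition of Cohen's kappa rather than something stated on the cards, and the table below is hypothetical.

```python
# Hypothetical agreement table: rows = rater A, columns = rater B.
table = [[40, 10],
         [5, 45]]

total = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
k = len(table)

# Observed agreement: matching cells on the diagonal, as a proportion.
p_observed = sum(table[i][i] for i in range(k)) / total

# Expected agreement: (row total * column total) / total for each diagonal
# cell, summed and divided by the total.
p_expected = sum(row_totals[i] * col_totals[i] / total for i in range(k)) / total

kappa = (p_observed - p_expected) / (1 - p_expected)
print(p_observed, p_expected, round(kappa, 2))  # 0.85 0.5 0.7
```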

Answer 139

They assess the agreement between two methods measuring the same parameter by plotting the difference between the two measurements against the average measurement

Answer 140

It describes the change in effect (e.g. OR) caused by change in level of exposure

Answer 141

Method validation, quality improvement, service evaluation, audit, research

Answer 142

Integrated Research Application System (IRAS)

Answer 143

The collection of documentation (sponsor's file plus each investigator site file) needed to evaluate the study in terms of conduct, integrity of data and compliance

Answer 144

Data are collected on one or more groups of subjects purely from a non-interfering observer's point of view

Answer 145

The researcher deliberately influences the clinical management of the subjects in order to investigate the outcome

Answer 146

Subjects with the disease are identified and compared to those without but who are otherwise comparable (controls). The past history of the groups is examined to determine their exposure to a particular risk

Answer 147

Two groups are identified as one exposed and one not exposed to a risk. The groups are followed up over time and the occurrence of the disease in each group is identified

Answer 148

Rare diseases will need lots of subjects and may take a long time. Subjects might drop out. Might not be feasible or ethical

Answer 149

They do not rely on the accuracy of medical records

Answer 150

Surveys where the subjects are contacted once

Answer 151

Subjects are assessed before and after an intervention

Answer 152

Subjects receive both intervention and control treatments in a randomised manner with a washout period in between

Answer 153

Studies that investigate the effects of more than one variable on the outcome

Answer 154

Selection bias; randomization is a process to reduce the effect of this bias

Answer 155

Each patient has an equal chance of being allocated to each treatment

Answer 156

Subjects are randomly allocated to blocks which determine the order in which they receive the treatment

Answer 157

Subjects are first divided into subgroups according to a particular characteristic and randomization is balanced within the subgroups

Answer 158

It is done to reduce bias due to the observer's or subject's judgement

Answer 159

The subject does not know what treatment they are receiving

Answer 160

Both the subject and observer do not know which treatment is given

Answer 161

Subjects in the treatment arm might drop out of the study if they experience problems

Answer 162

Ensuring a proven method (e.g. lab test) is reliable within specific parameters

Answer 163

Making local changes to improve local service

Answer 164

An assessment of what standards new or existing services meet and how they are performing. It often goes hand in hand with innovation (evaluation-innovation cycle)

Answer 165

Is the service meeting a particular standard

Answer 166

Generating new generalisable knowledge; it requires formal approval. In a clinical context, it can introduce healthcare that is not standard of care

Answer 167

The organisation taking overall responsibility for ensuring that proportionate, effective arrangements are in place to set up, run and report a research project

Answer 168

The overall lead researcher for a research project, responsible for the overall conduct of a research project

Answer 169

They are responsible for the conduct at a research site with one PI per site

Answer 170

The organisation responsible for the management and oversight of the data

Answer 171

Identifying and addressing poorly designed research. Ensuring that the roles and responsibilities of all parties are agreed and recorded. Initiating a site.

Answer 172

Study design, methods of data collection and data analysis, sampling technique and sample size

Answer 173

Sponsorship, grant application, REC approval, HRA approval, other regulatory approvals, local NHS Trust approvals

Answer 174

To conduct an independent ethical review to ensure that participant safety is central and follows the principles of the Declaration of Helsinki

Answer 175

For efficacy (not ethical to keep going if you know it works), and for safety concerns (if you know it isn't working or might not be safe)

Answer 176

Can bias the results, systematic over-estimation of benefits of intervention when stopped for efficacy, and precision of estimates of effect sizes will be poorer

Answer 177

Dissemination of results, destruction of samples, archiving and update the public database

Answer 178

Approval confirming a study is compliant with applicable regulations and standards, including a favourable opinion from a REC, a clinical trials authorisation or any other relevant approvals (eg radiation)

Answer 179

A medical physics expert (MPE) to quantify the dose and risk, and a clinical radiation expert (CRE) to justify it

Answer 180

One or more other variables

Answer 181

The data pairs (x,y) are statistically independent

Answer 182

Find the maximum of the likelihood function

Answer 183

E[Y|x] = B0 + B1x1 + B2x2 + ..., where B = beta (the model doesn't have to be linear in x, but typically it is)

Answer 184

It is a measure of the fit of a linear model, where 1 implies a good fit and near 0 is a bad fit

Answer 185

The correlation coefficient

Answer 186

Normal distribution

Answer 187

Between 0 and 1 because it is predicting the probability of success

Answer 188

Bernoulli or binomial

Answer 189

Yes, they both look like B0 + B1x1 + B2x2 + ... (B = beta)

Answer 190

All means (a_i) are equal

Answer 191

The number of groups is fixed a priori as part of the design

Answer 192

Measuring the length of time for an event (such as death or failure) to occur. Also called 'time to event' analysis

Answer 193

1 (no events have occurred e.g. everyone's alive)

Answer 194

0 (all events possible have occurred e.g. everyone's dead)

Answer 195

The probability of the event happening at a time given it has not happened yet

Answer 196

The survival function is the probability of surviving at least to time t, whereas the hazard function is the conditional probability of dying at time t having survived to that time

Answer 197

When subjects drop out of the study due to causes other than the cause of interest

Answer 198

Subjects are recruited until a certain number of events have occurred, or subjects are followed up for a fixed period of time

Answer 199

It can introduce bias in the data. Ignoring censored data is wasteful, reduces the power of the study and leads to pessimistic estimation of survival

Answer 200

A graph of cumulative survival probabilities against time. It is a step-function, with each step down indicating one or more events; censored observations are usually marked on the curve but do not cause a step
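
A minimal sketch of the product-limit (Kaplan-Meier) calculation behind such a curve, assuming the standard estimator in which the survival probability is multiplied by (1 - events / number at risk) at each event time. Censored subjects leave the risk set without producing a drop. The follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct event time.

    events[i] is 1 if the event was observed at times[i], 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0  # everyone is event-free at time 0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects sharing the same time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored subjects both leave
    return curve

# Hypothetical follow-up times in months; 0 marks a censored subject.
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]

curve = kaplan_meier(times, events)
print([(t, round(s, 3)) for t, s in curve])
# [(2, 0.833), (3, 0.667), (5, 0.444), (9, 0.0)]
```

Note that the censored subject at 8 months produces no step but still reduces the number at risk before the final event.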

Answer 201

Intention-to-treat includes all participants in the statistics whether or not they withdrew. Per protocol only includes subjects that completed the study (not those that withdrew)

Answer 202

It can make the treatment look better than it is if the number of dropouts is large, and a large number of dropouts in the treatment group can indicate problems with the treatment

Answer 203

For each case, a control subject is selected, matched on the confounding variables

Answer 204

No association in a 1-1 matched case-control study

Answer 205

Identifies and critically appraises all research on a specific topic, and combines valid studies

Answer 206

Rigorous pooling of results, increase confidence from small studies, may eradicate bias, can be updated, identifies areas where more research is needed

Answer 207

Time consuming, expensive, may be affected by publication bias

(249 cards)