Overall Flashcards

1
Q

What are the two major types of data?

A

Categorical (qualitative) and metric (quantitative)

2
Q

What are the two subtypes of categorical (qualitative) data?

A

Nominal and ordinal

3
Q

What are the two subtypes of metric (quantitative) data?

A

Continuous and discrete

4
Q

What does nominal data relate to?

A

It is used to label variables without any order or quantitative value. It usually relates to named things and there are no units of measurement. Each value is allocated to a specific category

5
Q

What does ordinal data relate to?

A

The values can be meaningfully ordered and it is categorical because each value is assigned to a specific category

6
Q

What does discrete data relate to?

A

The values are distinct and can have units of measurement. The data take a countable set of values, usually integers

7
Q

What does continuous data relate to?

A

Values that result from measurement, which can be fractional and can have units of measurement

8
Q

In a box (and whisker) plot, what are the adjacent values (defined in this specific course)?

A

Furthest away from the median but still within 1.5 times the interquartile range

9
Q

In a box (and whisker) plot, what are the points outside the adjacent values?

A

Potential outliers

10
Q

What is the interquartile range?

A

The upper quartile (the value 3/4 of the way through the ordered data) minus the lower quartile (the value 1/4 of the way through)
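
A minimal Python sketch of the box plot quantities on the last three cards. The function name `box_plot_summary`, the sample data and the `inclusive` quantile method are assumptions for illustration. Quartile conventions differ between textbooks and software, and the 1.5 x IQR limits here are measured from the quartiles, which is a common convention rather than something the cards state.

```python
import statistics


def box_plot_summary(values):
    """Return the IQR, adjacent values and potential outliers of a data set."""
    data = sorted(values)
    # Assumed convention: "inclusive" quartiles; other methods differ slightly.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_limit = q1 - 1.5 * iqr   # assumed: limits measured from the quartiles
    high_limit = q3 + 1.5 * iqr
    inside = [x for x in data if low_limit <= x <= high_limit]
    outliers = [x for x in data if x < low_limit or x > high_limit]
    return {
        "iqr": iqr,
        "adjacent_values": (inside[0], inside[-1]),
        "potential_outliers": outliers,
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

With this made-up data the value 30 falls outside the upper limit, so it is flagged as a potential outlier and 9 becomes the upper adjacent value.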

11
Q

What is the sample standard deviation?

A

The square root of the following quantity: the sum of (each value minus the mean) squared, divided by (the sample size - 1)

12
Q

What are the residuals in the standard deviation equation?

A

Value minus the mean

13
Q

What is the variance when the sample standard deviation is s?

A

s squared
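
The last three cards can be checked numerically. This is a small sketch, assuming made-up data and a hypothetical helper name `sample_std`; it builds the residuals explicitly and compares the result with Python's standard `statistics` module.

```python
import math
import statistics


def sample_std(values):
    """Sample standard deviation: sqrt(sum of squared residuals / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    residuals = [x - mean for x in values]  # value minus the mean
    return math.sqrt(sum(r * r for r in residuals) / (n - 1))


data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative data only
s = sample_std(data)
variance = s ** 2  # the variance is s squared
print(s, variance)
```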

14
Q

What is skewness and how is it measured?

A

A measure of the asymmetry of a distribution. It is measured by a skewness coefficient, here taken to vary between -1 and +1 (some definitions, such as the moment-based coefficient, are not bounded in this way)

15
Q

What is the skewness coefficient for a symmetric distribution?

A

0

16
Q

What is the skewness coefficient for a distribution with the mean to the left of the mode (most values are larger values in the range, long tail to the left in the negative direction)

A

Closer to -1 (left or negative skew)

17
Q

What is the skewness coefficient for a distribution with the mean to the right of the mode (most values are smaller values in the range, long tail to the right in the positive direction)

A

Closer to +1 (right or positively skewed)

18
Q

Probability theory is based on set theory. What is contained in the set S (called the space)?

A

All sets are subsets of set S

19
Q

What is the null set?

A

The set that contains no elements

20
Q

For experimental events, what is an event represented by and what is an impossible event?

A

An event is a set and an impossible event is the null set

21
Q

If sets A and B are mutually exclusive, what is P(A+B) and the intersection of A and B?

A

P(A+B) = P(A)+P(B), and AB ={}

22
Q

The conditional probability of A given B is defined as: P(A|B) =

A

P(AB)/P(B)

23
Q

For conditional probability, should there be a causal or temporal relation between A and B?

A

Not necessarily: there may or may not be one

24
Q

What does it mean if conditional probability has no effect on the probability of an event P(A|B)=P(A)?

A

Events A and B are statistically independent

25
If A and B are statistically independent, what is P(AB) equal to?
P(AB) = P(A)P(B)
26
What is Bayes' theorem? P(A|B) = ?
P(B|A)P(A) / P(B)
27
What does Bayesian probability include?
It incorporates any prior knowledge that a researcher might have about a hypothesis
28
Why is Bayes' theorem also called the theorem of probability of causes?
Because it gives the probability of a cause A given that its effect B has been observed (A is the cause and B the effect)
29
What is a random variable (RV)?
A number X(z) assigned to every outcome z of an experiment
30
What is the cumulative distribution function F(x)?
P{X<=x}
31
If the cumulative distribution function F(x) is continuous, what is its derivative?
The probability density function f(x) = dF(x)/dx
32
If the cumulative distribution function is discrete, what is f(x)?
A discrete distribution function, f(x) = sum over i of P{X = x_i} delta(x - x_i), where delta is the impulse function
33
Can we calculate P(X=x) for a continuous cumulative distribution function and what do we do?
No because P(X=x) = 0 when continuous. We have to calculate the probability that X lies in a small interval around x by integrating f(x) across a small interval
34
What is the expectation or mean of a random variable when the cumulative distribution function is continuous?
The integral of x f(x) dx over the whole range of x
35
What is the expectation or mean of a random variable when the cumulative distribution function is discrete?
The sum of x_i multiplied by P(X=x_i)
36
What is the variance of a random variable in terms of the mean or expectation (E)?
sigma squared = E[X^2] - E[X]^2
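
As a worked check of the discrete mean and the variance identity above, here is a short Python sketch. The fair six-sided die and the helper name `mean_and_variance` are assumptions chosen for illustration; exact fractions are used so the results are not blurred by rounding.

```python
from fractions import Fraction


def mean_and_variance(pmf):
    """pmf maps each value x_i to P(X = x_i); returns (E[X], E[X^2] - E[X]^2)."""
    mean = sum(x * p for x, p in pmf.items())
    mean_of_squares = sum(x * x * p for x, p in pmf.items())
    return mean, mean_of_squares - mean ** 2


# Hypothetical example: a fair die, each face with probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}
mu, sigma_squared = mean_and_variance(die)
print(mu, sigma_squared)  # 7/2 35/12
```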
37
When do the sample-based approximations of mean and variance converge to the theoretical quantities?
When the sample size tends to infinity
38
Are probability mass functions for discrete or continuous variables?
Discrete
39
Are probability density functions for discrete or continuous variables?
Continuous
40
What are the three types of probability mass functions that this course deals with?
Bernoulli, binomial and uniform (discrete)
41
What are the four types of probability distribution functions that this course deals with?
Normal (Gaussian), Poisson, exponential and uniform (continuous). Note that the Poisson distribution is discrete, so strictly it has a probability mass function
42
The Bernoulli distribution is a special case of what type of distribution and what is the special case?
Binomial distribution with a single trial (n=1)
43
What is a Bernoulli trial?
A single experiment with outcome 0 or 1
44
For a Bernoulli distribution, what is the probability of X=1 (P(1))?
p
45
For a Bernoulli distribution, what is the probability of X=0 (P(0))?
1-p
46
What is the mean (expected value) of a Bernoulli random variable?
p
47
What is the variance of a Bernoulli random variable?
p(1-p)
48
Since the Bernoulli distribution is a special case of the binomial distribution, what could the binomial distribution be thought of as?
The number of successes in a sequence of independent Bernoulli trials
49
What are the parameters of the binomial distribution?
n = number of trials, p = probability of success
50
How are the Bernoulli probability distribution random variables denoted?
X ~ Bernoulli(p)
51
How are the Binomial probability distribution random variables denoted?
X ~ B(n,p)
52
What is the mean (expected value) of a binomially distributed random variable?
np
53
What is the variance of a binomially distributed random variable?
np(1-p)
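
The binomial mean np and variance np(1-p) can be confirmed directly from the probability mass function. This is a sketch with hypothetical function names and example values (n = 10, p = 0.3); setting n = 1 recovers the Bernoulli case.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_mean_and_variance(n, p):
    """Mean and variance computed by summing over the pmf."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    variance = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
    return mean, variance


mean, variance = binomial_mean_and_variance(10, 0.3)
print(mean, variance)            # approximately np = 3 and np(1-p) = 2.1
print(binomial_pmf(1, 1, 0.3))   # Bernoulli case: P(X = 1) = p
```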
54
What is the discrete uniform distribution?
A finite number n of outcome values are equally likely to be observed
55
What is the probability of every one of the n outcome values in a discrete uniform distribution?
1/n
56
What is the mean (expected value) of a discrete uniform distribution?
(n+1)/2, when the outcome values are the integers 1, 2, ..., n
57
What is the continuous uniform distribution?
A continuous random variable that is equally likely to take any value between two stated bounds a and b
58
How are the continuous uniform probability distribution random variables denoted and what do the parameters mean?
X ~ U(a,b), where a and b are the bounds (minimum and maximum values) with a < b
59
What is the probability density p(x) of a variable under the continuous uniform distribution?
1/(b-a) for x between a and b, and 0 elsewhere
60
What is the mean (expected value) of a random variable that follows the continuous uniform distribution?
(a+b)/2
61
How are the normal probability distribution random variables denoted and what do the parameters mean?
X ~ N(mu, sigma squared), where mu = mean and sigma squared = variance (sigma = standard deviation)
62
How can a binomial distribution be approximated by a normal distribution?
B(n,p) approx = N(np, np(1-p))
63
Under what circumstances can the normal approximation to the binomial distribution be used?
When n*p and n*(1-p) are both greater than 5
64
What is the standard normal distribution?
A normal distribution with a mean of 0 and a standard deviation of 1
65
How is a standard normal random variable denoted?
Z ~ N(0,1)
66
Every normal distribution is a version of standard normal distribution, whose domain is stretched by what factor and translated by what value?
Stretched by the value of the standard deviation and translated by the mean value
67
How do you convert the random variable X for the normal distribution to the random variable Z for the standard normal distribution?
Z = (X - mu) / sigma (standard deviation)
68
What denotes the cumulative distribution function (cdf) of the standard normal distribution P(Z<=z)?
Capital phi(z)
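
A brief Python sketch of standardising a normal variable and evaluating capital phi, using `statistics.NormalDist` from the standard library. The helper names and the example values (mean 100, standard deviation 15) are assumptions for illustration.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Convert X ~ N(mu, sigma^2) to the standard normal scale."""
    return (x - mu) / sigma


def phi(z):
    """Cumulative distribution function of Z ~ N(0, 1), i.e. P(Z <= z)."""
    return NormalDist().cdf(z)


z = z_score(130, mu=100, sigma=15)
print(z, phi(z))
```

Evaluating the cdf of N(100, 15^2) at 130 gives the same probability as `phi(z)`, which is the stretch-and-translate relationship described on the earlier card.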
69
What is the Q-Q plot (or quantile or normal probability plot)?
A plot of the sorted values from the data set against the expected value of the corresponding quantiles from the standard normal distribution
70
What is the normal probability plot (Q-Q plot) used for?
It is used to visually assess the normality of data i.e. it compares two probability distributions by plotting their quantiles against each other
71
If the two distributions being compared are similar, what line will the points on the Q-Q plot (normal probability plot) lie on?
y=x
72
What distribution does the normal probability plot use?
The z-distribution (standard normal)
73
What is the ladder of powers?
An approach to change the shape of a skewed distribution so that it becomes normal or nearly normal with power transformations
74
If X is a random variable with mean mu and variance sigma squared, what is the mean and variance of Y=aX+b?
Mean (Y) = a * mu + b and variance (Y) = a squared * sigma squared
75
If X1, X2,..., Xn are independent random variables with mean mu1... and variances sigma 1 squared...., what is the mean and variance of Y = X1 + X2 +... + Xn?
Mean(Y) = mu1 + mu2 + ... + mu_n and variance(Y) = sigma 1 squared + sigma 2 squared + ... + sigma n squared
76
If X1 and X2 are independent random variables with means mu1 and mu2 and variances sigma 1 squared..., what is the mean and variance of Y = X1 - X2?
Mean(Y) = mu1 - mu2 and variance(Y) = sigma 1 squared + sigma 2 squared
77
If independent random variables X1, X2, ... Xn are combined algebraically (eg Y=X1-X2 or Y=X1+X2+...+Xn) and all the X variables have normal distributions, what distribution does Y have?
A normal distribution
78
How are the Poisson probability distribution random variables denoted and what do the parameters mean?
X ~ Poisson(mu), where mu is the mean
79
What is the variance of a Poisson distribution random variable?
mu (equal to the mean)
80
How can the binomial distribution be approximated by the Poisson distribution?
B(n,p) approx = Poisson (np)
81
When can you use the Poisson approximation to the binomial distribution?
If n is large (n>50) and p is small (p<0.05)
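
To see how close the Poisson approximation is under these conditions, the two probability mass functions can be compared directly. This is a sketch with assumed example values (n = 100, p = 0.02) and hypothetical function names.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu ** k / factorial(k)


n, p = 100, 0.02  # n > 50 and p < 0.05, so the approximation applies
largest_gap = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in range(11)
)
print(largest_gap)
```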
82
How are the exponential probability distribution random variables denoted and what do the parameters mean?
T ~ M(lambda), where lambda is more than 0 and it is called the rate parameter
83
What is the exponential distribution for?
The distance (can be any measure or units, eg time) between events in a Poisson point process
84
What is the mean of the exponential distribution?
1/lambda
85
What is the Poisson process?
A model for the occurrence of events in continuous time. It is a counting process for events that appear to happen at a certain rate but completely at random
86
What are the assumptions of the Poisson process?
Events occur singly, the rate of occurrence of events remains constant and the incidence of future events is independent of the past
87
If a Poisson process is modelled with events that occur at random with a rate of lambda over a time t, what two random variables, with two different probability distributions, can be used?
X ~ Poisson(lambda * t), which models the number of events in an interval of length t, and T ~ M(lambda), which models the waiting time between events
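
The two views of a Poisson process are linked: seeing no events in time t is the same as waiting longer than t for the first event. The sketch below illustrates this in Python. The function names and the example rate are assumptions, and the exponential cumulative distribution function 1 - exp(-lambda * t) is a standard result that the cards do not state.

```python
from math import exp, factorial


def prob_event_count(k, rate, t):
    """P(X = k) for X ~ Poisson(rate * t): k events in an interval of length t."""
    mu = rate * t
    return exp(-mu) * mu ** k / factorial(k)


def prob_wait_at_most(t, rate):
    """P(T <= t) for the exponential waiting time T with the given rate."""
    return 1 - exp(-rate * t)


rate, t = 3.0, 0.5  # e.g. 3 events per hour, observed for half an hour
no_events = prob_event_count(0, rate, t)
wait_longer = 1 - prob_wait_at_most(t, rate)
print(no_events, wait_longer)  # the two probabilities agree
```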
88
What is a confidence interval?
A random interval which contains the parameter being estimated with the probability of the confidence level
89
If there is a 95% confidence interval and an experiment is repeated 100 times, what can we say about the results?
The confidence intervals would be expected to include the true value on 95 occasions
90
To calculate the confidence interval for the mean mu of a population, using a random sample of size n (large n), what values are required?
The sample mean (x bar), the sample standard deviation (s), the number of items in the sample (n) and z, the (1-alpha/2) quantile of the standard normal distribution
91
When calculating the confidence interval, the confidence level is required. What is the equation for the confidence level?
100(1-alpha)%, where alpha is used to calculate the z in the confidence interval equation
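
A minimal sketch of the large-sample confidence interval built from the quantities on the last two cards. It assumes the usual form, x bar plus or minus z * s / sqrt(n), which the cards imply but do not write out; the function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval(x_bar, s, n, alpha=0.05):
    """100(1 - alpha)% confidence interval for the population mean (large n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # the (1 - alpha/2) quantile
    half_width = z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width


lower, upper = mean_confidence_interval(x_bar=50.0, s=10.0, n=100)
print(lower, upper)  # roughly 48.04 and 51.96 for a 95% interval
```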
92
What is the chi-squared distribution?
The distribution of the sum of the squares of independent standard normal random variables
93
How are the chi-squared probability distribution random variables denoted and what do the parameters mean?
W ~ chi-squared(v), written X^2(v), where v is the number of independent standard normal variables summed (the degrees of freedom), which is also the mean
94
When is the F distribution used?
For the ratio of the sample variances S1^2/S2^2 of two independent samples from normal distributions, where v1 and v2 are the degrees of freedom of the two samples
95
Is the alternative hypothesis one or two sided?
It can be either depending on the null hypothesis (eg two sided is required if the null is drug A = drug B but one sided if the null is drug A is not effective)
96
Can a null hypothesis be accepted or proved?
No, it can only be rejected (refuted) or not rejected
97
What does rejecting the null hypothesis do?
It provides evidence in favour of the alternative hypothesis
98
What is a directional hypothesis?
A hypothesis that predicts the direction of a relationship or difference between two variables. Also known as a one-tailed hypothesis
99
What is the difference between a one- and two-tailed test in terms of the hypothesis?
A one-tailed test looks for an increase or decrease in a parameter, whereas a two-tailed test looks for a change in parameter
100
What is the significance level?
If the null hypothesis is true, the significance level is the proportion of the repeated experiments in which the null hypothesis will be falsely rejected
101
What is type I error?
Rejecting the null hypothesis when it is true (called a false positive)
102
What is type II error?
Not rejecting the null hypothesis when it is false (false negative)
103
What significance level is typically set and what can it be referred to as when considering type I error?
0.05 (5%) and alpha level
104
What symbol denotes type II error and what is it usually set to?
Beta, usually set to 0.2, which gives a power (1 - beta) of 0.8
105
When should Student's 1-sample t-test be used?
For a sample of size n from a normal distribution, with values for the sample mean and standard deviation. The t-test can be used to test a null hypothesis mu = mu_nought (a specified value). It can also be used to compare paired samples
106
What are the degrees of freedom of Student's 1-sample t-test?
n-1, where n is the sample size
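
A short Python sketch of the 1-sample test statistic and its degrees of freedom. It assumes the standard form t = (x bar - mu_nought) / (s / sqrt(n)), which the cards do not spell out; the function name and the data are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, mu_0):
    """Return (t statistic, degrees of freedom) for testing mu = mu_0."""
    n = len(values)
    standard_error = stdev(values) / sqrt(n)  # stdev uses the n - 1 divisor
    t = (mean(values) - mu_0) / standard_error
    return t, n - 1


t_stat, dof = one_sample_t([5.1, 4.9, 5.3, 5.2, 5.0], mu_0=5.0)
print(t_stat, dof)
```

The statistic is then looked up in the row of the t-distribution table for the returned degrees of freedom.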
107
What are the degrees of freedom of Student's 2-sample t-test?
n1 + n2 - 2, where n1 and n2 are the sample sizes
108
How do you use the t-distribution tables for a Student's t-test question?
Calculate the test statistic (either 1 or 2 sample), determine the degrees of freedom, go to the correct row on the table that matches the degrees of freedom and determine which quantile the test statistic matches with
109
After you get a quantile from the t-distribution tables, how do you determine the p value from this?
Subtract the quantile's probability from 1 for a one-sided test, and double the result for a two-sided test
110
What is the definition of the P value?
The probability of having observed our data or more extreme given the null hypothesis is true
111
How do you interpret a p value above 0.1?
Little evidence against the null hypothesis
112
How do you interpret a p value between 0.1 and 0.05?
Weak evidence against the null hypothesis
113
How do you interpret a p value between 0.01 and 0.05?
Moderate evidence against the null hypothesis
114
How do you interpret a p value below 0.01?
Strong evidence against the null hypothesis
115
Why is misinterpretation a problem with p values?
It can sometimes be misinterpreted as meaning the probability of the null hypothesis being correct or the probability that the observed effect is not real
116
Why is publication bias a problem with p values?
Research findings with p more than 0.05 sometimes do not get published
117
Why is over-reliance a problem with p values?
Researchers sometimes change their conclusions radically depending on which side of 0.05 the p value is
118
When should Student's 2-sample t-test be used?
For two samples of sizes n1 and n2 with values for the means (x bar 1 and x bar 2) and standard deviations. The t-test can be used to test the null hypothesis mu1 = mu2 (the population means are equal), i.e. to compare two unrelated samples
119
What assumptions have to be in place for Student's 2-sample t-test?
Variation in each population can be modelled by a normal distribution. The samples are independent. The population variances are equal (in practice, the sample variances differ by a factor of less than 3)
120
How are proportion data modelled and with what parameters?
A binomial distribution with n (number of trials, i.e. the sample size) and p (probability of a success). This can be approximated by a normal distribution
121
The test statistic for the difference in proportions includes d, what is this parameter?
The hypothesised difference
122
What assumption needs to be met for the differences in proportions test to be applied?
The normal approximation must hold, e.g. n*p and n*(1-p) must both be greater than 5
123
For a differences in proportions test, is the null hypothesis one or two tailed and what type of distribution does it have?
Two-tailed and it is the standard normal distribution (z distribution)
124
To detect a statistically significant difference between the means of two groups, what calculation needs to be done?
The study sample size required (sample size per group)
125
What parameters are used in the equation for the sample size for the difference in means?
The standard deviation for the underlying population sigma, the hypothesised difference between two groups (d), the quantile values on the standard normal distribution table that relate to (1 - half the significance level) and the power
126
The sample size equation for difference in means can be rearranged to determine what instead?
To calculate the size of difference (d) that could be detected as statistically significant given the sample size per group
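
The sample size calculation and its rearrangement can be sketched as follows. The cards list the parameters but not the equation, so this assumes the common two-group form n = 2 * sigma^2 * (z_(1 - alpha/2) + z_power)^2 / d^2; the course handbook may present it differently. All names and numbers are illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def _z_sum(alpha, power):
    """Sum of the two standard normal quantiles used in the formula."""
    return NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)


def sample_size_per_group(sigma, d, alpha=0.05, power=0.8):
    """Subjects needed per group to detect a difference in means of size d."""
    return ceil(2 * sigma ** 2 * _z_sum(alpha, power) ** 2 / d ** 2)


def detectable_difference(sigma, n, alpha=0.05, power=0.8):
    """Rearranged form: the difference detectable with n subjects per group."""
    return sigma * _z_sum(alpha, power) * sqrt(2 / n)


n_needed = sample_size_per_group(sigma=10, d=5)
print(n_needed, detectable_difference(sigma=10, n=n_needed))
```

Because the sample size is rounded up, the detectable difference for that n comes out slightly below the target d.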
127
To detect a statistically significant difference between the proportions of two groups, what calculation needs to be done?
The study sample size required
128
The equations for the sample size for difference in means and difference in proportions contain the same parameters except one difference in each, what is this difference?
The difference in means version has the standard deviation, whereas the difference in proportions version has pi nought, which is the average proportion of the two groups
129
The equations for the sample size for difference in means and difference in proportions include a quantile that relates to power. What is the power and what does a higher value of it mean?
The power is the probability of detecting a significant difference when one exists. A higher power means a lower chance of a type II error (missing a real difference)
130
What is power analysis?
The process of determining the sample size for a research study to detect a significant difference in means or proportions
131
When are non-parametric tests used?
When it cannot be assumed that the underlying distribution comes from a specific family
132
What is the non-parametric version of Student's 1-sample t-test to compare paired samples?
Wilcoxon signed rank test
133
What is the non-parametric version of Student's 2-sample t-test to compare two unrelated samples?
Mann-Whitney test
134
Instead of using actual values, what does the Wilcoxon signed rank test use?
Data ranks
135
In what cases could the Wilcoxon signed rank test be used instead of the 1-sample t-test?
The data is skewed or the sample size is too small
136
For the Wilcoxon signed rank test, the test statistic (W_+), the sum of the positive ranks, is approximately normally distributed. What is n in the equations and when is the approximation adequate?
n is the sample size after deletion of zero differences, and it should be 16 or above (in handbook)
137
What does the Mann-Whitney test assume about two samples?
They are uncorrelated and independent (and no assumption of normal distribution)
138
For the Mann-Whitney test, the test statistic (U_A), the sum of the ranks for sample A, is approximately normally distributed. What are n_A and n_B in the equations and when is the approximation adequate?
n_A and n_B are the respective sample sizes and each should be 8 or above (in handbook)
139
When is the chi-squared distribution used?
In the chi-squared test for goodness of fit of an observed distribution (of observed frequencies) to a theoretical one
140
What is p in the equation for the degrees of freedom (k-p-1) for the chi-squared goodness of fit test?
p is the number of estimated parameters
141
What is the null and alternative hypotheses for a chi-squared goodness of fit test?
Null: data is a good fit to the model. Alternative: the difference is too large (as squared so can't be negative)
142
Are chi-squared goodness of fit tests 1-sided or 2-sided?
1-sided
143
For a chi-squared goodness of fit test, what value should the expected frequency in each category be at least?
At least 5
144
How is Student's t-test a special case of an ANOVA?
Student's t-test compares the means of two groups, whereas ANOVA compares the means of two or more groups
145
What is an assumption for both parametric and non-parametric ANOVA?
Statistical independence of cases within each group
146
What additional assumptions are required for a parametric ANOVA?
Normality (the distribution in each group is normal) and equality of variances (homoscedasticity), so the variances of the groups are assumed to be the same (in practice they can differ by up to a factor of 3)
147
What are the parametric and non-parametric tests to compare more than two independent sets of observations (different groups)?
Parametric: one-way ANOVA. Non-parametric: Kruskal-Wallis
148
What are the parametric and non-parametric tests to compare more than two sets of observations on a single sample under different conditions?
Parametric: two-way ANOVA. Non-parametric: Friedman
149
Receiver operating characteristic (ROC) analysis is part of what theory?
Signal detection theory
150
What is our purpose for ROC (receiver operating characteristic) analysis?
To assess the performance of diagnostic tests
151
What is the value called where above it we consider the test to be abnormal and below we consider the test to be normal?
A decision threshold
152
What is the true positive rate also known as? This is the probability of a positive instance given that the disease is present
Sensitivity
153
What is the true negative rate also known as? This is the probability of a negative instance given that the disease is not present
Specificity
154
What is the false positive rate also known as and how does it relate to specificity? This is the probability of a positive instance given that the disease is not present
Type I error and 1 - specificity
155
What is the false negative rate also known as and how does this relate to sensitivity? This is the probability of a negative instance given that the disease is present
Type II error and 1 - sensitivity
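
The four rates on the cards above can be computed from the counts in a 2x2 table of test result against disease status. This is a sketch; the function name and the counts are made up for illustration.

```python
def diagnostic_rates(tp, fp, tn, fn):
    """Rates from counts of true/false positives and true/false negatives."""
    sensitivity = tp / (tp + fn)  # P(positive | disease present)
    specificity = tn / (tn + fp)  # P(negative | disease not present)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": 1 - specificity,
        "false_negative_rate": 1 - sensitivity,
    }


rates = diagnostic_rates(tp=90, fp=20, tn=80, fn=10)
print(rates)
```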
156
In what way is there a trade off between sensitivity and specificity in ROC analysis?
With values above the threshold classed as abnormal, we can improve the sensitivity by moving the decision threshold to a lower value (less strict criteria for positive), or we can improve the specificity by moving the decision threshold to a higher value (more strict criteria for positive)
157
What is a ROC curve?
A graph of sensitivity against 1-specificity (type I error) so true positive rate against false positive rate
158
What demonstrates an accurate test on a ROC curve?
If the curve is closer to the left hand and top border of the ROC space
159
What is a scalar measure of the performance of a test on a ROC curve?
The area under the curve, with an area of 1 being a perfect test
160
For ROC analysis, reliability is also known by what two names that are calculable?
Positive predictive value and negative predictive value
161
In ROC analysis, what is reliability and does accuracy directly imply reliability?
Reliability asks how far a positive (or negative) result can be trusted; no, accuracy does not directly imply reliability
162
In ROC analysis, an alternative definition of reliability uses Bayes theorem in what way?
For the probabilities P(disease|positive) and P(no disease|negative), which are conditioned the opposite way around to the sensitivity and specificity probabilities
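
Bayes' theorem turns sensitivity and specificity into the two predictive values once the prevalence, P(disease), is known. The sketch below assumes a hypothetical function name and example figures (a test that is 90% sensitive and 90% specific for a disease with 1% prevalence).

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (P(disease | positive), P(no disease | negative)) via Bayes."""
    # Total probability of a positive result, from diseased and healthy groups.
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    ppv = sensitivity * prevalence / p_positive
    npv = specificity * (1 - prevalence) / (1 - p_positive)
    return ppv, npv


ppv, npv = predictive_values(sensitivity=0.9, specificity=0.9, prevalence=0.01)
print(ppv, npv)
```

With these figures only about 1 in 12 positive results comes from someone with the disease, which illustrates why an accurate test is not automatically a reliable one.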
163
When are two variables said to be correlated?
If knowing the value of one of the variables tells you something about the value of the other
164
What type of correlation is measured by the Pearson correlation coefficient (R)?
Linear correlation (parametric)
165
What type of correlation is measured by the Spearman rank correlation coefficient (R_s)?
Monotonic correlation (non-parametric)
166
What type of data can Spearman's coefficient be used for that Pearson's cannot?
Ordinal data (as well as continuous) because it uses ranks instead of assumptions of normality
167
The test of correlation uses what hypotheses?
Null hypothesis is zero correlation, whereas the alternative is a 2-sided hypothesis (there is some sort of correlation)
168
What distribution tables and degrees of freedom are used in the test of correlation?
T-distribution tables and n-2 degrees of freedom
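
A sketch of the Pearson coefficient and the statistic used in the test of correlation. The statistic t = R * sqrt((n - 2) / (1 - R^2)) is the standard form and is assumed here, since the cards give only the table and degrees of freedom. Function names and data are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson (linear) correlation coefficient of paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def correlation_t(r, n):
    """Return (t statistic, degrees of freedom) for the null of zero correlation."""
    return r * sqrt((n - 2) / (1 - r * r)), n - 2


xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
t_stat, dof = correlation_t(r, len(xs))
print(r, t_stat, dof)
```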
169
To investigate the correlation between two categorical variables, how must the data be presented first?
In contingency tables (cross tabulation format) so one variable at the top and one on the left of the table
170
To test the correlation between categorical variables, how are the expected frequencies due to chance calculated?
(row total multiplied by column total) divided by overall total
171
To test the correlation between categorical variables, what type of test is performed? (it compares the expected frequencies due to chance with observed frequencies)
Chi squared test (null hypothesis of no correlation)
172
To test the correlation between categorical variables, what is the degrees of freedom of the chi-squared test?
(number of rows -1) multiplied by (number of columns -1)
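
The expected frequencies and degrees of freedom on the cards above can be combined into a short Python sketch. It assumes the usual test statistic, the sum of (observed - expected)^2 / expected over all cells, which the cards do not write out; the function name and the table are made up.

```python
def chi_squared_contingency(table):
    """Return (chi-squared statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency due to chance: row total * column total / total.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


chi2, dof = chi_squared_contingency([[10, 20], [20, 10]])
print(chi2, dof)
```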
173
Do small sample sizes lead to large or small confidence intervals?
Large
174
What does a relative risk or odds ratio greater or less than one indicate?
Greater than one indicates an exposure to be harmful (increased risk), whereas less than one indicates a protective effect (decreased risk)
175
When does confounding occur?
If both the exposure and disease are associated with a third variable (confounder)
176
What test is used to investigate the correlation between two ordinal variables and is it parametric or non-parametric?
Kappa and non-parametric
177
What is the number of observed agreements in the kappa test statistic equation?
The sum of the matching terms on the contingency table (these lie along the diagonal, the y=-x line). For the percentage agreement, divide this number by the total
178
For the kappa statistic, how do you calculate the expected agreements due to chance?
For each concordant pair on the contingency table, multiply the row and column totals and divide by the overall total. For the percentage agreement, sum these values together and divide by the total
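
The observed and expected agreements from the last two cards feed into the kappa statistic. The sketch below assumes the standard Cohen's kappa form, (observed - expected) / (1 - expected) with both written as proportions, which the cards do not state; the function name and table are illustrative.

```python
def cohen_kappa(table):
    """Kappa for a square contingency table of counts (rater A by rater B)."""
    size = len(table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Observed agreement: matching terms on the diagonal, as a proportion.
    observed = sum(table[i][i] for i in range(size)) / total
    # Expected agreement by chance: row total * column total / total per
    # diagonal cell, summed and divided by the total again.
    expected = sum(row_totals[i] * col_totals[i] / total for i in range(size)) / total
    return (observed - expected) / (1 - expected)


kappa = cohen_kappa([[20, 5], [10, 15]])
print(kappa)
```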
179
What are Bland-Altman plots?
They assess the agreement between two methods measuring the same parameter, by plotting the difference between the two measurements against their average
180
What is a dose-response relationship?
It describes the change in effect (e.g. OR) caused by change in level of exposure
181
What are the 5 ways in which we can improve under research governance?
Method validation, quality improvement, service evaluation, audit, research
182
What system is required to apply for permissions and approvals in healthcare research in the UK?
Integrated Research Application System (IRAS)
183
What is a Trial Master File (TMF)?
The collection of documentation (sponsor's file plus each investigator site file) needed to evaluate the study in terms of conduct, integrity of data and compliance
184
What are observational studies?
Data are collected on one or more groups of subjects purely from a non-interfering observer's point of view
185
What are experimental studies?
The researcher deliberately influences the clinical management of the subjects in order to investigate the outcome
186
What are case-control studies? This is a subtype of observational studies
Subjects with the disease are identified and compared to those without but who are otherwise comparable (controls). The past history of the groups is examined to determine their exposure to a particular risk
187
What are cohort (longitudinal) studies? This is a subtype of observational studies
Two groups are identified as one exposed and one not exposed to a risk. The groups are followed up over time and the occurrence of the disease in each group is identified
188
What are the disadvantages of cohort studies?
Rare diseases will need lots of subjects and may take a long time. Subjects might drop out. They might not be feasible or ethical
189
What are the advantages of cohort studies?
They do not rely on the accuracy of medical records
190
What are cross-sectional studies?
Surveys in which the subjects are contacted once
191
What are 'within subjects trial' studies?
Subjects are assessed before and after an intervention
192
What are cross-over trial studies?
Subjects receive both intervention and control treatments in a randomised manner with a washout period in between
193
What are multi-factorial design studies?
Studies that investigate the effects of more than one variable on the outcome
194
What is one of the main problems in clinical trials and how can it be reduced?
Selection bias; randomization is a process that reduces the effect of this bias
195
What is simple randomization?
Each patient has an equal chance of being allocated to any given treatment
196
What is block randomization?
Subjects are randomly allocated to blocks which determine the order in which they receive the treatment
197
What is stratified randomization?
Subjects are first divided into subgroups according to a particular characteristic and randomization is balanced within the subgroups
198
Why is blinding done in studies?
It is done to reduce bias due to the observer's or subject's judgement
199
What is a single blind study?
The subject does not know what treatment they are receiving
200
What is a double blind study?
Neither the subject nor the observer knows which treatment is given
201
Why might selection bias after randomization occur?
Subjects in the treatment arm might drop out of the study if they experience problems
202
What is method validation?
Ensuring a proven method (e.g. lab test) is reliable within specific parameters
203
What is quality improvement?
Making local changes to improve local service
204
What is service evaluation?
An assessment of what standards new or existing services meet and how they are performing. It often goes hand in hand with innovation (evaluation-innovation cycle)
205
What question is an audit addressing?
Is the service meeting a particular standard?
206
What is research?
Generating new generalisable knowledge; it requires formal approval. In a clinical context, it can introduce healthcare that is not standard of care
207
What is the sponsor in the UK policy framework for health and social care research?
The organisation taking overall responsibility for ensuring that proportionate, effective arrangements are in place to set up, run and report a research project
208
Who is the chief investigator in the UK policy framework for health and social care research?
The overall lead researcher for a research project, responsible for the overall conduct of a research project
209
Who is the principal investigator in the UK policy framework for health and social care research?
They are responsible for the conduct of the research at a research site; there is one PI per site
210
Who is the data controller in the UK policy framework for health and social care research?
The organisation responsible for the management and oversight of the data
211
What are some of the responsibilities of the sponsor of a research project?
Identifying and addressing poorly designed research, ensuring that the roles and responsibilities of all parties are agreed and recorded, and initiating sites
212
What should be in the study protocol when planning research?
Study design, methods of data collection and data analysis, sampling technique and sample size
213
What are the potential points on the timeline for research before it can start?
Sponsorship, grant application, REC approval, HRA approval, other regulatory approvals, local NHS Trust approvals
214
What is the purpose of a Research Ethics Committee (REC)?
To conduct an independent ethical review to ensure that participant safety is central and follows the principles of the Declaration of Helsinki
215
What are some reasons a trial might stop early?
For efficacy (not ethical to keep going if you know it works), and for futility or safety concerns (if you know it isn't working or might not be safe)
216
What are the statistical issues with stopping a trial early?
Can bias the results, systematic over-estimation of benefits of intervention when stopped for efficacy, and precision of estimates of effect sizes will be poorer
217
What should be done after a trial has ended?
Dissemination of results, destruction of samples, archiving, and updating the public database
218
What is HRA approval?
Approval confirming a study is compliant with applicable regulations and standards, including a favourable opinion from a REC, a clinical trials authorisation or any other relevant approvals (e.g. radiation)
219
Who has to review an IRAS form regarding radiation?
A medical physics expert (MPE) to quantify the dose and risk, and a clinical radiation expert (CRE) to justify it
220
In regression models, the expected value of one random variable is presumed to be dependent on what?
One or more other variables
221
What is typically assumed as part of the design of a regression model?
The data pairs (x,y) are statistically independent
222
What is one of the most common methods to estimate the beta parameters of a regression model?
Find the maximum of the likelihood function
223
What does E[Y|x] look like for linear regression?
E[Y|x] = B0 + B1*x1 + B2*x2 + ... (B = beta). The model is linear in the B parameters; it doesn't have to be linear in x, but typically it is
224
What is the coefficient of determination in regression analysis?
It is a measure of the fit of a linear model, where a value near 1 implies a good fit and a value near 0 implies a poor fit
225
What is the square root of the coefficient of determination?
The correlation coefficient (strictly its absolute value, for simple linear regression with one predictor)
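The regression cards above can be checked numerically. The sketch below assumes a single predictor and made-up data, and all function names are hypothetical. It fits E[Y|x] = B0 + B1*x by least squares, which coincides with the maximum-likelihood estimate when the errors are modelled as normal, then compares the coefficient of determination with the squared correlation coefficient.

```python
import math

def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for E[Y|x] = b0 + b1*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

def r_squared(xs, ys):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    b0, b1 = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up illustrative data
# The two printed values agree up to floating-point rounding.
print(r_squared(xs, ys), pearson_r(xs, ys) ** 2)
```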
226
What is the simplest probability distribution used to model Y in linear regression?
Normal distribution
227
What type of outcome does the dataset have in logistic regression?
Binary
228
What range of values can the result of a logistic regression model take and why?
Between 0 and 1 because it is predicting the probability of success
229
What probability distribution functions could the binary variable Y in logistic regression be modelled by?
Bernoulli or binomial
230
Does the RHS of the equation for logistic regression look the same as the RHS of the equation for linear regression?
Yes, they both look like B0 + B1*x1 + B2*x2 + ... (B = beta)
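A small sketch of why the logistic regression output stays between 0 and 1. It assumes a one-predictor model with made-up coefficients, and `logistic_probability` is a hypothetical name. The linear predictor has the same form as in linear regression; the logistic function then maps it to a probability.

```python
import math

def logistic_probability(x, b0, b1):
    """P(Y = 1 | x) for a one-predictor logistic model."""
    linear_predictor = b0 + b1 * x  # same form as the linear regression RHS
    return 1.0 / (1.0 + math.exp(-linear_predictor))

# Made-up coefficients; the output approaches 0 and 1 but stays between them.
for x in (-10, -1, 0, 1, 10):
    print(x, logistic_probability(x, b0=0.5, b1=1.2))
```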
231
What is the null hypothesis of ANOVA tests?
All means (a_i) are equal
232
Why is the term 'fixed effect' used for one-way fixed effect ANOVA?
The number of groups is fixed a priori as part of the design
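The null hypothesis of equal means is usually assessed with an F statistic, the ratio of between-group to within-group variability. The sketch below is a hand calculation on made-up data with a hypothetical function name. It returns only the statistic and its degrees of freedom, not a p-value.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variability of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

groups = [[5.1, 4.9, 5.3], [5.8, 6.1, 6.0], [5.0, 5.2, 4.8]]  # made-up data
print(one_way_anova_f(groups))
```

A large F suggests the group means differ by more than the within-group scatter would explain. When all group means are identical, F is 0.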
233
What is survival analysis and another name for it?
Measuring the length of time for an event (such as death or failure) to occur. Also called 'time to event' analysis
234
What is S(0) equal to in survival analysis and what does this mean?
1 (no events have occurred e.g. everyone's alive)
235
What is S(infinity) equal to in survival analysis and what does this mean?
0 (all events possible have occurred e.g. everyone's dead)
236
In survival analysis, is S(t1) or S(t2) larger if t2 > t1?
S(t1)
237
What is the hazard function in survival analysis?
The probability (per unit time) of the event happening at a given time, given that it has not happened yet
238
What is the difference between the survival function and hazard function?
The survival function is the probability of surviving at least to time t, whereas the hazard function is the conditional probability of dying at time t having survived to that time
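To make the distinction concrete, the sketch below assumes an exponential time-to-event model with a constant event rate. The model, the rate of 0.2 and the function names are illustrative assumptions, not part of the course material. S(t) starts at 1 and falls towards 0, while the hazard stays constant at the assumed rate.

```python
import math

def survival(t, rate):
    """S(t) for an exponential time-to-event model with constant event rate."""
    return math.exp(-rate * t)

def hazard(t, rate, dt=1e-6):
    """Numerical hazard: P(event in [t, t+dt) | survived to t) / dt."""
    s_now, s_next = survival(t, rate), survival(t + dt, rate)
    return (s_now - s_next) / (s_now * dt)

rate = 0.2  # assumed events per unit time
for t in (0, 1, 5, 20):
    print(t, round(survival(t, rate), 4), round(hazard(t, rate), 4))
```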
239
When does censoring occur?
When subjects drop out of the study due to causes other than the cause of interest
240
When could administrative censoring occur for example?
Subjects are recruited until a certain number of events have occurred, or subjects are followed up for a fixed period of time
241
What are the main problems with censoring?
It can introduce bias in the data. Ignoring censored data is wasteful, reduces the power of the study and leads to pessimistic estimates of survival
242
What is a Kaplan-Meier curve?
A graph of cumulative survival probabilities against time. It is a step function, with each step down indicating an event. Censored observations do not produce a step and are usually marked with ticks on the curve
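A minimal sketch of the Kaplan-Meier (product-limit) estimate behind the curve, using made-up follow-up data and a hypothetical function name. Censored subjects stay in the risk set until their censoring time, so their information is used rather than discarded.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Count everything that happens at this time point.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # the curve steps down only at event times
            curve.append((t, surv))
        at_risk -= deaths + censored  # censored subjects leave the risk set
    return curve

# Made-up data: events at times 2, 3 and 5; censoring at times 3 and 8.
print(kaplan_meier([2, 3, 3, 5, 8], [1, 0, 1, 1, 0]))
```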
243
What is the difference between intention-to-treat and per protocol analysis?
Intention-to-treat includes all participants in the statistics whether or not they withdrew. Per protocol only includes subjects that completed the study (not those that withdrew)
244
What is the problem with per protocol analysis?
It can make the treatment look better than it is if the number of dropouts is large, and a large number of dropouts in the treatment group can indicate problems with the treatment
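The effect of dropouts can be illustrated with made-up numbers. In this sketch, withdrawn subjects are counted as treatment failures under intention-to-treat, which is an assumption of the example rather than a rule from the course. The record layout and the function name are hypothetical.

```python
def success_rate(records, arm, per_protocol=False):
    """Proportion of successes in one arm.

    records      -- list of dicts with keys "arm", "completed", "success"
    per_protocol -- if True, drop subjects who did not complete the study
    """
    chosen = [r for r in records if r["arm"] == arm]
    if per_protocol:
        chosen = [r for r in chosen if r["completed"]]
    return sum(r["success"] for r in chosen) / len(chosen)

# Made-up data: 10 treated subjects, 4 of whom withdrew (counted as failures
# under intention-to-treat, an assumption of this sketch).
records = (
    [{"arm": "treatment", "completed": True, "success": True}] * 5
    + [{"arm": "treatment", "completed": True, "success": False}] * 1
    + [{"arm": "treatment", "completed": False, "success": False}] * 4
)
print(success_rate(records, "treatment"))                     # intention-to-treat
print(success_rate(records, "treatment", per_protocol=True))  # per protocol
```

With these numbers the per protocol success rate (5 of 6) is well above the intention-to-treat rate (5 of 10).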
245
What is 1-1 case-control matching?
For each case, a control subject is selected who is matched on the confounding variables
246
What is the null hypothesis of McNemar's test?
No association in a 1-1 matched case-control study
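A sketch of the McNemar statistic, which uses only the discordant pairs. It assumes the version without continuity correction and a chi-square approximation with 1 degree of freedom. The counts and the function name are made up for illustration.

```python
import math

def mcnemar_test(b, c):
    """McNemar chi-square statistic (no continuity correction) and p-value.

    b -- pairs where only the case was exposed
    c -- pairs where only the control was exposed
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

print(mcnemar_test(b=15, c=5))  # made-up discordant pair counts
```

Under the null hypothesis the two kinds of discordant pair are equally likely, so the statistic is small when b and c are similar.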
247
What is a systematic review?
Identifies and critically appraises all research on a specific topic, and combines valid studies
248
What are the advantages of systematic reviews?
Rigorous pooling of results, increases confidence from small studies, may eradicate bias, can be updated, identifies areas where more research is needed
249
What are the disadvantages of systematic reviews?
Time consuming, expensive, may be affected by publication bias
250