Overall Flashcards

1
Q

What are the two major types of data?

A

Categorical (qualitative) and metric (quantitative)

2
Q

What are the two subtypes of categorical (qualitative) data?

A

Nominal and ordinal

3
Q

What are the two subtypes of metric (quantitative) data?

A

Continuous and discrete

4
Q

What does nominal data relate to?

A

It is used to label variables without any order or quantitative value. It usually relates to named things, and there are no units of measurement. Each value is allocated to a specific category

5
Q

What does ordinal data relate to?

A

The values can be meaningfully ordered and it is categorical because each value is assigned to a specific category

6
Q

What does discrete data relate to?

A

The values are distinct and can have units of measurement. The data take a finite (or countable) set of values, typically integers

7
Q

What does continuous data relate to?

A

Numbers, possibly fractional, that result from measurement and can have units of measurement

8
Q

In a box (and whisker) plot, what are the adjacent values (defined in this specific course)?

A

Furthest away from the median but still within 1.5 times the interquartile range

9
Q

In a box (and whisker) plot, what are the points outside the adjacent values?

A

Potential outliers

10
Q

What is the interquartile range?

A

The upper quartile (the 3/4 quantile) minus the lower quartile (the 1/4 quantile)
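
The box plot quantities on the cards above can be sketched in Python as follows. This is a minimal illustration, not the course's method: the helper name box_plot_summary and the data are hypothetical, the quartiles use the "inclusive" convention (textbooks differ), and the 1.5 x IQR limit is measured from the quartiles, which is a common convention that the course may define differently.

```python
import statistics

def box_plot_summary(values):
    """Return quartiles, IQR, adjacent values and potential outliers."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; other conventions give slightly different values.
    q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumption: fences sit 1.5 * IQR beyond the quartiles.
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "adjacent": (inside[0], inside[-1]),  # most extreme points inside the fences
        "outliers": outliers,                 # points outside the adjacent values
    }

summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(summary)
```

Here the value 50 lies beyond the upper adjacent value, so it is flagged as a potential outlier.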

11
Q

What is the sample standard deviation?

A

The square root of [the sum of (each value minus the mean) squared, divided by (sample size - 1)], i.e. s = sqrt( sum of (x_i - x bar)^2 / (n - 1) )

12
Q

What are the residuals in the standard deviation equation?

A

Value minus the mean

13
Q

What is the variance when the sample standard deviation is s?

A

s squared
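
As a rough sketch of the last three cards, the residuals, sample variance and sample standard deviation can be computed step by step. The function name sample_variance and the data are made up for illustration.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared residuals divided by (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    residuals = [x - mean for x in values]  # value minus the mean
    return sum(r * r for r in residuals) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
s_squared = sample_variance(data)  # the variance, s squared
s = math.sqrt(s_squared)           # the sample standard deviation, s
print(s_squared, s)
```

The result should agree with the standard library's statistics.stdev, which also divides by n - 1.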

14
Q

What is skewness and how is it measured?

A

A measure of symmetry of a distribution and it is measured by the skewness coefficient that can vary between -1 and +1

15
Q

What is the skewness coefficient for a symmetric distribution?

A

0

16
Q

What is the skewness coefficient for a distribution with the mean to the left of the mode (most values are larger values in the range, long tail to the left in the negative direction)

A

Closer to -1 (left or negative skew)

17
Q

What is the skewness coefficient for a distribution with the mean to the right of the mode (most values are smaller values in the range, long tail to the right in the positive direction)

A

Closer to +1 (right or positively skewed)

18
Q

Probability theory is based on set theory. What is contained in set S (called the space)?

A

All sets are subsets of set S

19
Q

What is the null set?

A

The set that contains no elements

20
Q

For experimental events, what is an event represented by and what is an impossible event?

A

An event is a set and an impossible event is the null set

21
Q

If sets A and B are mutually exclusive, what is P(A+B) and the intersection of A and B?

A

P(A+B) = P(A)+P(B), and AB ={}

22
Q

The conditional probability of A given B is defined as: P(A|B) =

A

P(AB)/P(B)

23
Q

For conditional probability, should there be a causal or temporal relation between A and B?

A

No, there may or may not be one

24
Q

What does it mean if conditional probability has no effect on the probability of an event P(A|B)=P(A)?

A

Events A and B are statistically independent

25
Q

If A and B are statistically independent, what is P(AB) equal to?

A

P(AB) = P(A)P(B)

26
Q

What is Bayes’ theorem? P(A|B) = ?

A

P(B|A)P(A) / P(B)
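
A small numerical sketch of Bayes' theorem, under made-up numbers: A is "has the disease" with prior 0.01, and B is "tests positive". The function name posterior is hypothetical, and P(B) is obtained from the law of total probability, which the card does not state.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(B|A) P(A) / P(B)."""
    # Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b

# Illustrative values only: 1% prior, 90% true positive rate, 5% false positive rate.
p = posterior(prior_a=0.01, p_b_given_a=0.90, p_b_given_not_a=0.05)
print(round(p, 4))
```

With these numbers the posterior is about 0.15, far below the 0.90 true positive rate, because the prior is small.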

27
Q

What does Bayesian probability include?

A

It incorporates any prior knowledge that a researcher might have about a hypothesis

28
Q

Why is Bayes’ theorem also called the theorem of probability of causes?

A

A is the cause and B the effect

29
Q

What is a random variable (RV)?

A

A number X(z) assigned to every outcome z of an experiment

30
Q

What is the cumulative distribution function F(x)?

A

P{X<=x}

31
Q

If the cumulative distribution function F(x) is continuous, what is its derivative?

A

The probability density function f(x) = dF(x)/dx

32
Q

If the cumulative distribution function is discrete, what is f(x)?

A

A discrete distribution function, f(x) = the sum over i of P{X = x_i} delta(x - x_i), where delta is an impulse function

33
Q

Can we calculate P(X=x) for a continuous cumulative distribution function and what do we do?

A

No because P(X=x) = 0 when continuous. We have to calculate the probability that X lies in a small interval around x by integrating f(x) across a small interval

34
Q

What is the expectation or mean of a random variable when the cumulative distribution function is continuous?

A

The integral of xf(x)

35
Q

What is the expectation or mean of a random variable when the cumulative distribution function is discrete?

A

The sum of x_i multiplied by P(X=x_i)

36
Q

What is the variance of a random variable in terms of the mean or expectation (E)?

A

sigma squared = E[X^2] - E[X]^2
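
To make the discrete expectation and the variance identity concrete, here is a minimal sketch for a fair six-sided die. The helper name expectation is hypothetical; exact fractions are used so the results carry no rounding error.

```python
from fractions import Fraction

def expectation(pmf, g=lambda x: x):
    """Sum of g(x_i) * P(X = x_i) over all outcomes."""
    return sum(g(x) * p for x, p in pmf.items())

# Fair die: each face 1..6 has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

mean = expectation(die)                                   # E[X]
variance = expectation(die, lambda x: x * x) - mean ** 2  # E[X^2] - E[X]^2
print(mean, variance)
```

The mean comes out as 7/2 and the variance as 35/12.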

37
Q

When do the sample-based approximations of mean and variance converge to the theoretical quantities?

A

When the sample size tends to infinity

38
Q

Are probability mass functions for discrete or continuous variables?

A

Discrete

39
Q

Are probability density functions for discrete or continuous variables?

A

Continuous

40
Q

What are the three types of probability mass functions that this course deals with?

A

Bernoulli, binomial and uniform (discrete)

41
Q

What are the four types of probability distribution functions that this course deals with?

A

Normal (Gaussian), Poisson, exponential, uniform (continuous). Note that the Poisson distribution is discrete, so strictly it has a probability mass function

42
Q

The Bernoulli distribution is a special case of what type of distribution and what is the special case?

A

Binomial distribution with a single trial (n=1)

43
Q

What is a Bernoulli trial?

A

A single experiment with outcome 0 or 1

44
Q

For a Bernoulli distribution, what is the probability of X=1 (P(1))?

A

p

45
Q

For a Bernoulli distribution, what is the probability of X=0 (P(0))?

A

1-p

46
Q

What is the mean (expected value) of a Bernoulli random variable?

A

p

47
Q

What is the variance of a Bernoulli random variable?

A

p(1-p)

48
Q

Since the Bernoulli distribution is a special case of the binomial distribution, what could the binomial distribution be thought of as?

A

The number of successes in a sequence of independent Bernoulli trials

49
Q

What are the parameters of the binomial distribution?

A

n = number of trials, p = probability of success

50
Q

How are the Bernoulli probability distribution random variables denoted?

A

X ~ Bernoulli(p)

51
Q

How are the Binomial probability distribution random variables denoted?

A

X ~ B(n,p)

52
Q

What is the mean (expected value) of a binomially distributed random variable?

A

np

53
Q

What is the variance of a binomially distributed random variable?

A

np(1-p)
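
The binomial mean np and variance np(1-p) can be checked numerically from the probability mass function. This is only a sketch: the function name binomial_pmf and the values n = 10, p = 0.3 are chosen for illustration, and the PMF formula itself is the standard one rather than something quoted from the cards.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): the number of successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))                   # should equal n * p
variance = sum(k * k * pk for k, pk in enumerate(pmf)) - mean**2  # should equal n * p * (1 - p)
print(mean, variance)
```

Setting n = 1 reduces this to the Bernoulli case, with mean p and variance p(1-p).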

54
Q

What is the discrete uniform distribution?

A

A finite number n of outcome values are equally likely to be observed

55
Q

What is the probability of every one of the n outcome values in a discrete uniform distribution?

A

1/n

56
Q

What is the mean (expected value) of a discrete uniform distribution?

A

(n+1)/2, when the outcome values are the integers 1, 2, ..., n

57
Q

What is the continuous uniform distribution?

A

A continuous random variable that is equally likely to take any value between two stated bounds a and b

58
Q

How are the continuous uniform probability distribution random variables denoted and what do the parameters mean?

A

X ~ U(a,b), where a and b are the bounds (minimum and maximum values), with a < b

59
Q

What is the probability density p(x) of a variable under the continuous uniform distribution?

A

1/(b-a) for x between a and b (and 0 elsewhere)

60
Q

What is the mean (expected value) of a random variable that follows the continuous uniform distribution?

A

(a+b)/2

61
Q

How are the normal probability distribution random variables denoted and what do the parameters mean?

A

X ~ N(mu, sigma squared), where mu = mean and sigma squared = variance (sigma = standard deviation)

62
Q

How can a binomial distribution be approximated by a normal distribution?

A

B(n,p) approx = N(np, np(1-p))

63
Q

Under what circumstances can the normal approximation to the binomial distribution be used?

A

When both np > 5 and n(1-p) > 5

64
Q

What is the standard normal distribution?

A

A normal distribution with a mean of 0 and a standard deviation of 1

65
Q

How is a standard normal random variable denoted?

A

Z ~ N(0,1)

66
Q

Every normal distribution is a version of standard normal distribution, whose domain is stretched by what factor and translated by what value?

A

Stretched by the value of the standard deviation and translated by the mean value

67
Q

How do you convert the random variable X for the normal distribution to the random variable Z for the standard normal distribution?

A

Z = (X - mu) / sigma (standard deviation)

68
Q

What denotes the cumulative distribution function (cdf) of the standard normal distribution P(Z<=z)?

A

Capital phi(z)
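
The standardisation Z = (X - mu) / sigma and the cdf of Z can be tried out with the standard library's NormalDist class. The numbers (mu = 100, sigma = 15, x = 130) are illustrative only.

```python
from statistics import NormalDist

mu, sigma = 100.0, 15.0  # illustrative parameters of X ~ N(mu, sigma squared)
x = 130.0

z = (x - mu) / sigma                    # convert x to a standard normal value
phi_z = NormalDist().cdf(z)             # P(Z <= z) for Z ~ N(0, 1)
direct = NormalDist(mu, sigma).cdf(x)   # P(X <= x) computed without standardising
print(z, phi_z, direct)
```

Both routes give the same probability, which is why a single table for the standard normal distribution is enough.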

69
Q

What is the Q-Q plot (or quantile or normal probability plot)?

A

A plot of the sorted values from the data set against the expected value of the corresponding quantiles from the standard normal distribution

70
Q

What is the normal probability plot (Q-Q plot) used for?

A

It is used to visually assess the normality of data i.e. it compares two probability distributions by plotting their quantiles against each other

71
Q

If the two distributions being compared are similar, what line will the points on the Q-Q plot (normal probability plot) lie on?

A

y=x

72
Q

What distribution does the normal probability plot use?

A

The z-distribution (standard normal)

73
Q

What is the ladder of powers?

A

An approach to change the shape of a skewed distribution so that it becomes normal or nearly normal with power transformations

74
Q

If X is a random variable with mean mu and variance sigma squared, what is the mean and variance of Y=aX+b?

A

Mean (Y) = a * mu + b and variance (Y) = a squared * sigma squared
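
A quick numerical check of the rule for Y = aX + b, treating a small made-up data set as the whole population (hence pvariance). The values of a, b and x are arbitrary.

```python
import math
import statistics

a, b = 3, 2
x = [1, 2, 3, 4]
y = [a * xi + b for xi in x]  # apply Y = aX + b to every value

mean_x, var_x = statistics.mean(x), statistics.pvariance(x)
mean_y, var_y = statistics.mean(y), statistics.pvariance(y)

print(mean_y, a * mean_x + b)   # the mean is scaled and shifted
print(var_y, a**2 * var_x)      # the variance is scaled by a squared; b has no effect
```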

75
Q

If X1, X2,…, Xn are independent random variables with mean mu1… and variances sigma 1 squared…., what is the mean and variance of Y = X1 + X2 +… + Xn?

A

Mean(Y) = mu1 + mu2 +…+ mu_n and variance (Y) = sigma 1squared + sigma 2squared +…+ sigma n squared

76
Q

If X1 and X2 are independent random variables with means mu1 and mu2 and variances sigma 1 squared and sigma 2 squared, what is the mean and variance of Y = X1 - X2?

A

Mean(Y) = mu1 - mu2 and variance(Y) = sigma 1 squared + sigma 2 squared

77
Q

If independent random variables X1, X2, …, Xn are combined algebraically (eg Y=X1-X2 or Y=X1+X2+…+Xn) and all the X variables have normal distributions, what distribution does Y have?

A

A normal distribution

78
Q

How are the poisson probability distribution random variables denoted and what do the parameters mean?

A

X ~ Poisson(mu), where mu is the mean

79
Q

What is the variance of a poisson distribution random variable?

A

mu (= to the mean)

80
Q

How can the binomial distribution be approximated by the poisson distribution?

A

B(n,p) approx = Poisson (np)

81
Q

When can you use the Poisson approximation to the binomial distribution?

A

If n is large (n>50) and p is small (p<0.05)
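
To see how close the Poisson approximation is when n is large and p is small, the two probability mass functions can be compared directly. This is a sketch with illustrative values n = 100 and p = 0.02; the function names are hypothetical and the PMF formulas are the standard ones.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)

n, p = 100, 0.02  # n > 50 and p < 0.05, so the approximation should be reasonable
max_gap = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in range(11)
)
print(max_gap)
```

The largest gap over k = 0..10 is well under 0.01 for these values.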

82
Q

How are the exponential probability distribution random variables denoted and what do the parameters mean?

A

T ~ M(lambda), where lambda is more than 0 and it is called the rate parameter

83
Q

What is the exponential distribution for?

A

The distance (can be any measure or units, eg time) between events in a Poisson point process

84
Q

What is the mean of the exponential distribution?

A

1/lambda

85
Q

What is the Poisson process?

A

A model for the occurrence of events in continuous time. It is a counting process for events that appear to happen at a certain rate but completely at random

86
Q

What are the assumptions of the poisson process?

A

Events occur singly, the rate of occurrence of events remains constant and the incidence of future events is independent of the past

87
Q

If a Poisson process is modelled with events that occur at random with a rate of lambda over a time t, what two random variables, with two different probability distributions, can be used?

A

X ~ Poisson(lambda * t), which models the number of events in a time interval of length t, and T ~ M(lambda), which models the waiting time between events
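
The two views of a Poisson process are consistent: the probability of no events in a time t equals the probability that the waiting time exceeds t. A minimal sketch, with hypothetical function names and an illustrative rate of 2 events per unit time:

```python
from math import exp, factorial, isclose

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return exp(-mu) * mu**k / factorial(k)

def exponential_survival(t, lam):
    """P(T > t) for an exponential waiting time with rate lam."""
    return exp(-lam * t)

lam, t = 2.0, 0.5  # illustrative rate and time window

p_no_events = poisson_pmf(0, lam * t)         # count view: zero events in time t
p_wait_longer = exponential_survival(t, lam)  # waiting-time view: first event after t
print(p_no_events, p_wait_longer)
```

Both values equal e to the power -1, about 0.368.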

88
Q

What is a confidence interval?

A

A random interval which contains the parameter being estimated with the probability of the confidence level

89
Q

If there is a 95% confidence interval and an experiment is repeated 100 times, what can we say about the results?

A

The confidence intervals would be expected to include the true value on 95 occasions

90
Q

To calculate the confidence interval for the mean mu of a population, using a random sample of size n (large n), what values are required?

A

The sample mean (x bar), the sample standard deviation (s), the number of items in the sample (n) and z, the (1-alpha/2) quantile of the standard normal distribution

91
Q

When calculating the confidence interval, the confidence level is required. What is the equation for the confidence level?

A

100(1-alpha)%, where alpha is used to calculate the z in the confidence interval equation
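
The cards list the ingredients but not the equation itself. A common large-sample form is x bar plus or minus z * s / sqrt(n), and the sketch below assumes that form; the function name and the numbers are illustrative, and the course handbook may present it differently.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, s, n, alpha=0.05):
    """Large-sample 100(1 - alpha)% confidence interval for the population mean."""
    # z is the (1 - alpha/2) quantile of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * s / sqrt(n)  # assumed formula: z * s / sqrt(n)
    return x_bar - half_width, x_bar + half_width

lower, upper = mean_confidence_interval(x_bar=50.0, s=10.0, n=100)
print(round(lower, 2), round(upper, 2))
```

With alpha = 0.05 the quantile z is about 1.96, giving an interval of roughly 48.04 to 51.96.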

92
Q

What is the chi-squared distribution?

A

The distribution of the sum of the squares of independent standard normal random variables

93
Q

How are the chi-squared probability distribution random variables denoted and what do the parameters mean?

A

W ~ chi-squared(v), where v (the degrees of freedom) is the number of independent standard normal variables being squared and summed; v is also the mean of the distribution

94
Q

When is the F distribution used?

A

For two independent samples from normal distributions, it is used for the ratio of their sample variances S1^2/S2^2, with v1 and v2 degrees of freedom

95
Q

Is the alternative hypothesis one or two sided?

A

It can be either depending on the null hypothesis (eg two sided is required if the null is drug A = drug B but one sided if the null is drug A is not effective)

96
Q

Can a null hypothesis be accepted or proved?

A

No it can only be rejected/refuted

97
Q

What does rejecting the null hypothesis do?

A

It provides evidence in favour of the alternative hypothesis

98
Q

What is a directional hypothesis?

A

A hypothesis that predicts the direction of a relationship or difference between two variables. Also known as a one-tailed hypothesis

99
Q

What is the difference between a one- and two-tailed test in terms of the hypothesis?

A

A one-tailed test looks for an increase or decrease in a parameter, whereas a two-tailed test looks for a change in parameter

100
Q

What is the significance level?

A

If the null hypothesis is true, the significance level is the proportion of the repeated experiments in which the null hypothesis will be falsely rejected

101
Q

What is type I error?

A

Rejecting the null hypothesis when it is true (called a false positive)

102
Q

What is type II error?

A

Not rejecting the null hypothesis when it is false (false negative)

103
Q

What significance level is typically set and what can it be referred to as when considering type I error?

A

0.05 (5%) and alpha level

104
Q

What symbol denotes type II error and what is it usually set to?

A

Beta, usually set to 0.2. The value 0.8 is the corresponding power (1 - beta)

105
Q

When should the Student's 1-sample t-test be used?

A

For a sample of size n from a normal distribution, with values for the sample mean and standard deviation. The t-test can be used to test a null hypothesis mu = mu_nought (a stated value). It is also used to compare paired samples

106
Q

What are the degrees of freedom of a Student's 1-sample t-test?

A

n-1, where n is the sample size
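
The cards do not give the test statistic, so the sketch below assumes the usual form t = (x bar - mu_nought) / (s / sqrt(n)). The function name and data are made up for illustration.

```python
from math import sqrt
import statistics

def one_sample_t(values, mu_0):
    """Return the 1-sample t statistic and its degrees of freedom."""
    n = len(values)
    x_bar = statistics.mean(values)
    s = statistics.stdev(values)             # sample standard deviation (n - 1 divisor)
    t = (x_bar - mu_0) / (s / sqrt(n))       # assumed standard form of the statistic
    return t, n - 1

t_stat, dof = one_sample_t([5.1, 4.9, 5.6, 5.8, 6.0], mu_0=5.0)
print(round(t_stat, 3), dof)
```

The statistic would then be looked up in the t-distribution table on the row for n - 1 degrees of freedom.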

107
Q

What are the degrees of freedom of a Student's 2-sample t-test?

A

n1 + n2 - 2, where n1 and n2 are the sample sizes

108
Q

How do you use the t-distribution tables for a Student's t-test question?

A

Calculate the test statistic (either 1 or 2 sample), determine the degrees of freedom, go to the correct row on the table that matches the degrees of freedom and determine which quantile the test statistic matches with

109
Q

After you get a quantile from the t-distribution tables, how do you determine the p value from this?

A

1 minus the quantile for one sided tests and double this for two sided

110
Q

What is the definition of the P value?

A

The probability of having observed our data or more extreme given the null hypothesis is true

111
Q

How do you interpret a p value above 0.1?

A

Little evidence against the null hypothesis

112
Q

How do you interpret a p value between 0.1 and 0.05?

A

Weak evidence against the null hypothesis

113
Q

How do you interpret a p value between 0.01 and 0.05?

A

Moderate evidence against the null hypothesis

114
Q

How do you interpret a p value below 0.01?

A

Strong evidence against the null hypothesis

115
Q

Why is misinterpretation a problem with p values?

A

It can sometimes be misinterpreted as meaning the probability of the null hypothesis being correct or the probability that the observed effect is not real

116
Q

Why is publication bias a problem with p values?

A

Research findings with p more than 0.05 sometimes do not get published

117
Q

Why is over-reliance a problem with p values?

A

Researchers sometimes change their conclusions radically depending on which side of 0.05 the p value is

118
Q

When should the Student's 2-sample t-test be used?

A

For two samples of sizes n1 and n2, with values for the sample means (x bar 1 and x bar 2) and standard deviations. The t-test can be used to test the null hypothesis that the two population means are equal (mu1 = mu2). It is used to compare two unrelated samples

119
Q

What assumptions have to be in place for a Student's 2-sample t-test?

A

Variation in each population can be modelled by a normal distribution. Samples are independent. Population variances are equal (differ by a factor of < 3)

120
Q

How are proportion data modelled and with what parameters?

A

A binomial distribution with n (number of trials, i.e. the sample size) and p (probability of a success) (remember this can be approximated by a normal distribution)

121
Q

The test statistic for the difference in proportions includes d, what is this parameter?

A

The hypothesised difference

122
Q

What assumption needs to be met for the differences in proportions test to be applied?

A

The normal approximation must be valid, i.e. both np > 5 and n(1-p) > 5

123
Q

For a differences in proportions test, is the null hypothesis one or two tailed and what type of distribution does it have?

A

Two-tailed and it is the standard normal distribution (z distribution)

124
Q

To detect a statistically significant difference between the means of two groups, what calculation needs to be done?

A

The study sample size required (sample size per group)

125
Q

What parameters are used in the equation for the sample size for the difference in means?

A

The standard deviation for the underlying population sigma, the hypothesised difference between two groups (d), the quantile values on the standard normal distribution table that relate to (1 - half the significance level) and the power

126
Q

The sample size equation for difference in means can be rearranged to determine what instead?

A

To calculate the size of difference (d) that could be detected as statistically significant given the sample size per group

127
Q

To detect a statistically significant difference between the proportions of two groups, what calculation needs to be done?

A

The study sample size required

128
Q

The equations for the sample size for difference in means and difference in proportions contain the same parameters except one difference in each, what is this difference?

A

The difference in means version has the standard deviation, whereas the difference in proportions version has pi nought, which is the average proportion of the two groups

129
Q

The equations for the sample size for difference in means and difference in proportions include a quantile that relates to power. What is the power and what does a higher value of it mean?

A

The power is the probability of detecting a significant difference when one exists. A higher power means a lower probability of a type II error

130
Q

What is power analysis?

A

The process of determining the sample size for a research study to detect a significant difference in means or proportions

131
Q

When are non-parametric tests used?

A

When no assumption is made that the underlying distribution comes from a specific family

132
Q

What is the non-parametric version of the 1-sample Student's t-test to compare paired samples?

A

Wilcoxon signed rank test

133
Q

What is the non-parametric version of the 2-sample Student's t-test to compare two unrelated samples?

A

Mann-Whitney test

134
Q

Instead of using actual values, what does the Wilcoxon signed rank test use?

A

Data ranks

135
Q

In what cases could the Wilcoxon signed rank test be used instead of the 1-sample t-test?

A

The data is skewed or the sample size is too small

136
Q

For the Wilcoxon signed rank test, the test statistic (W_+) is the sum of the positive ranks, which is approximately normally distributed. What is n in the equations and when is the approximation adequate?

A

n is the sample size after deletion and it should be 16 or above (in handbook)

137
Q

What does the Mann-Whitney test assume about two samples?

A

They are uncorrelated and independent (and no assumption of normal distribution)

138
Q

For the Mann-Whitney test, the test statistic (U_A) is the sum of ranks for sample A, which is approximately normally distributed. What are n_A and n_B in the equations and when is the approximation adequate?

A

n_A and n_B are the respective sample sizes and each should be 8 or above (in handbook)

139
Q

When is the chi-squared distribution used?

A

In chi-squared tests for the goodness of fit of an observed distribution (of observed frequencies) to a theoretical one

140
Q

What is p in the equation for the degrees of freedom (k-p-1) for the chi-squared goodness of fit test?

A

p is the number of estimated parameters

141
Q

What is the null and alternative hypotheses for a chi-squared goodness of fit test?

A

Null: data is a good fit to the model. Alternative: the difference is too large (as squared so can’t be negative)

142
Q

Are chi-squared goodness of fit tests 1-sided or 2-sided?

143
Q

For a chi-squared goodness of fit test, what value should the expected frequency in each category be at least?

A

At least 5

144
Q

How is the Student's t-test a special case of an ANOVA?

A

The Student's t-test compares the means of two groups, whereas ANOVA compares the means of two or more groups

145
Q

What is an assumption for both parametric and non-parametric ANOVA?

A

Statistical independence of cases within each group

146
Q

What additional assumptions are required for a parametric ANOVA?

A

Normality (the distribution in each group is normal) and equality of variances (homoscedasticity), so the variances in each group are assumed to be the same (they can differ by a factor of 3)

147
Q

What are the parametric and non-parametric tests to compare more than two sets of observations on the same sample?

A

Parametric: one-way ANOVA. Non-parametric: Kruskal-Wallis

148
Q

What are the parametric and non-parametric tests to compare more than two sets of observations on a single sample under different conditions?

A

Parametric: two-way ANOVA. Non-parametric: Friedman

149
Q

Receiver operating characteristic (ROC) analysis is part of what theory?

A

Signal detection theory

150
Q

What is our purpose for ROC (receiver operating characteristic) analysis?

A

To assess the performance of diagnostic tests

151
Q

What is the value called where above it we consider the test to be abnormal and below we consider the test to be normal?

A

A decision threshold

152
Q

What is the true positive rate also known as? This is the probability of a positive instance given that the disease is present

A

Sensitivity

153
Q

What is the true negative rate also known as? This is the probability of a negative instance given that the disease is not present

A

Specificity

154
Q

What is the false positive rate also known as and how does it relate to specificity? This is the probability of a positive instance given that the disease is not present

A

Type I error and 1 - specificity

155
Q

What is the false negative rate also known as and how does this relate to sensitivity? This is the probability of a negative instance given that the disease is present

A

Type II error and 1 - sensitivity

156
Q

In what way is there a trade off between sensitivity and specificity in ROC analysis?

A

Assuming values above the threshold are classed as abnormal, we can improve the sensitivity by moving the decision threshold to a lower value (less strict criteria for positive), or we can improve the specificity by moving the decision threshold to a higher value (more strict criteria for positive)

157
Q

What is a ROC curve?

A

A graph of sensitivity against 1-specificity (type I error) so true positive rate against false positive rate

158
Q

What demonstrates an accurate test on a ROC curve?

A

If the curve is closer to the left hand and top border of the ROC space

159
Q

What is a scalar measure of the performance of a test on a ROC curve?

A

The area under the curve, with an area of 1 being a perfect test

160
Q

For ROC analysis, reliability is also known by what two names that are calculable?

A

Positive predictive value and negative predictive value

161
Q

In ROC analysis, what is reliability and does accuracy directly imply reliability?

A

Reliability asks how far a given positive (or negative) result can be trusted. No, accuracy does not directly imply reliability

162
Q

In ROC analysis, an alternative definition of reliability uses Bayes theorem in what way?

A

It gives the probabilities P(disease|positive) and P(no disease|negative), which condition the opposite way around to the sensitivity and specificity probabilities
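
The ROC cards above can be tied together with a small sketch that computes sensitivity, specificity and the two predictive values from a 2x2 table of counts. The function name and the counts are invented for illustration.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Compute test performance measures from true/false positive/negative counts."""
    return {
        "sensitivity": tp / (tp + fn),  # P(positive | disease), true positive rate
        "specificity": tn / (tn + fp),  # P(negative | no disease), true negative rate
        "ppv": tp / (tp + fp),          # P(disease | positive), positive predictive value
        "npv": tn / (tn + fn),          # P(no disease | negative), negative predictive value
    }

# Illustrative counts: 100 diseased and 900 healthy people.
metrics = diagnostic_metrics(tp=90, fn=10, fp=50, tn=850)
for name, value in metrics.items():
    print(name, round(value, 3))
```

In this example the sensitivity is 0.9 but the positive predictive value is only about 0.64, which illustrates why an accurate test is not automatically a reliable one.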

163
Q

When are two variables said to be correlated?

A

If knowing the value of one of the variables tells you something about the value of the other

164
Q

What type of correlation is measured by the Pearson correlation coefficient (R)?

A

Linear correlation (parametric)

165
Q

What type of correlation is measured by the Spearman rank correlation coefficient (R_s)?

A

Monotonic correlation (non-parametric)

166
Q

What type of data can Spearman's coefficient be used for that Pearson's cannot?

A

Ordinal data (as well as continuous) because it uses ranks instead of assumptions of normality

167
Q

The test of correlation uses what hypotheses?

A

Null hypothesis is zero correlation, whereas the alternative is a 2-sided hypothesis (there is some sort of correlation)

168
Q

What distribution tables and degrees of freedom are used in the test of correlation?

A

T-distribution tables and n-2 degrees of freedom

169
Q

To investigate the correlation between two categorical variables, how must the data be presented first?

A

In contingency tables (cross tabulation format) so one variable at the top and one on the left of the table

170
Q

To test the correlation between categorical variables, how are the expected frequencies due to chance calculated?

A

(row total multiplied by column total) divided by overall total

171
Q

To test the correlation between categorical variables, what type of test is performed? (it compares the expected frequencies due to chance with observed frequencies)

A

Chi squared test (null hypothesis of no correlation)

172
Q

To test the correlation between categorical variables, what is the degrees of freedom of the chi-squared test?

A

(number of rows -1) multiplied by (number of columns -1)
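
The expected frequencies, the chi-squared statistic and its degrees of freedom can be sketched together as follows. The function name and the 2 x 2 table of counts are made up for illustration.

```python
def chi_squared(observed):
    """Return (expected frequencies, chi-squared statistic, degrees of freedom).

    observed is a contingency table given as a list of rows of counts.
    """
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    overall_total = sum(row_totals)

    # Expected count due to chance: row total * column total / overall total
    expected = [
        [r * c / overall_total for c in column_totals] for r in row_totals
    ]

    # Sum of (observed - expected)^2 / expected over every cell
    statistic = sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(observed, expected)
        for o, e in zip(observed_row, expected_row)
    )

    degrees_of_freedom = (len(row_totals) - 1) * (len(column_totals) - 1)
    return expected, statistic, degrees_of_freedom


# Hypothetical 2 x 2 table of counts
expected, statistic, df = chi_squared([[10, 20], [30, 40]])
print(expected)             # [[12.0, 18.0], [28.0, 42.0]]
print(round(statistic, 3))  # 0.794
print(df)                   # 1
```

The statistic is then compared against chi-squared tables with the computed degrees of freedom.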

173
Q

Do small sample sizes lead to large or small confidence intervals?

A

Large (wide) confidence intervals, because the standard error increases as the sample size decreases

174
Q

What does a relative risk or odds ratio greater or less than one indicate?

A

Greater than one indicates an exposure to be harmful (increased risk), whereas less than one indicates a protective effect (decreased risk)
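
A small sketch of how the relative risk and odds ratio could be computed from a 2 x 2 table. The cell labels a, b, c, d and the example counts are assumptions for illustration.

```python
def relative_risk(a, b, c, d):
    """Risk in the exposed divided by risk in the unexposed.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a, b, c, d):
    """Odds of disease in the exposed divided by odds in the unexposed."""
    return (a * d) / (b * c)


# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed develop the disease
print(round(relative_risk(20, 80, 10, 90), 2))  # 2.0
print(round(odds_ratio(20, 80, 10, 90), 2))     # 2.25
```

Both values are greater than one here, so in this made-up example the exposure would be read as harmful.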

175
Q

When does confounding occur?

A

If both the exposure and disease are associated with a third variable (confounder)

176
Q

What test is used to investigate the correlation between two ordinal variables and is it parametric or non-parametric?

A

Kappa and non-parametric

177
Q

What is the number of observed agreements in the kappa test statistic equation?

A

The sum of the matching terms on the contingency table (these lie along the main diagonal, i.e. the y = -x line). For the percentage agreement, divide this number by the overall total

178
Q

For the kappa statistic, how do you calculate the expected agreements due to chance?

A

For each concordant (diagonal) cell on the contingency table, multiply the row and column totals and divide by the overall total. For the expected percentage agreement, sum these values and divide by the overall total
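
The observed and expected agreements can be combined into the kappa statistic as sketched below. The cards do not state the final equation, so the standard form kappa = (p_o - p_e) / (1 - p_e) is assumed here; the function name and the table of counts are hypothetical.

```python
def kappa(table):
    """Return (observed agreement, expected agreement, kappa) as proportions.

    table is a square contingency table (list of rows) where rows are
    rater 1's categories and columns are rater 2's categories.
    """
    size = len(table)
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    overall_total = sum(row_totals)

    # Observed agreements: sum of the diagonal cells
    observed = sum(table[i][i] for i in range(size))

    # Expected agreements due to chance, one term per diagonal cell
    expected = sum(
        row_totals[i] * column_totals[i] / overall_total for i in range(size)
    )

    p_observed = observed / overall_total
    p_expected = expected / overall_total
    return p_observed, p_expected, (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings of 50 subjects by two raters
p_o, p_e, k = kappa([[20, 5], [10, 15]])
print(round(p_o, 2), round(p_e, 2), round(k, 2))  # 0.7 0.5 0.4
```

A kappa of 1 would mean perfect agreement, and a kappa of 0 would mean agreement no better than chance.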

179
Q

What are Bland-Altman plots?

A

They assess the agreement between two methods that measure the same parameter by plotting the difference between the two measurements against their average
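
A sketch of the quantities behind such a plot. The mean difference (bias) and the limits of agreement at bias plus or minus 1.96 standard deviations are a common addition to the plot, but they are an assumption here and are not stated in the cards; the function name and the readings are made up.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (averages, differences, bias, (lower limit, upper limit)).

    averages go on the x-axis and differences on the y-axis of the plot.
    """
    differences = [a - b for a, b in zip(method_a, method_b)]
    averages = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    bias = mean(differences)          # mean difference between the methods
    spread = stdev(differences)       # sample standard deviation
    limits = (bias - 1.96 * spread, bias + 1.96 * spread)
    return averages, differences, bias, limits


# Hypothetical paired readings from two measurement methods
averages, differences, bias, limits = bland_altman([10, 12, 14, 16], [9, 13, 13, 17])
print(averages)     # [9.5, 12.5, 13.5, 16.5]
print(differences)  # [1, -1, 1, -1]
print(bias)         # 0
```

Points scattered evenly around a bias near zero, within narrow limits, suggest that the two methods agree well.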

180
Q

What is a dose-response relationship?

A

It describes the change in effect (e.g. OR) caused by change in level of exposure

181
Q

What are the 5 ways in which we can improve under research governance?

A

Method validation, quality improvement, service evaluation, audit, research

182
Q

What system is required to apply for permissions and approvals in healthcare research in the UK?

A

Integrated Research Application System (IRAS)

183
Q

What is a Trial Master File (TMF)?

A

The collection of documentation (sponsor’s file plus each investigator site file) needed to evaluate the study in terms of conduct, integrity of data and compliance

184
Q

What are observational studies?

A

Data are collected on one or more groups of subjects purely from a non-interfering observer's point of view

185
Q

What are experimental studies?

A

The researcher deliberately influences the clinical management of the subjects in order to investigate the outcome

186
Q

What are case-control studies? This is a subtype of observational studies

A

Subjects with the disease are identified and compared to those without but who are otherwise comparable (controls). The past history of the groups is examined to determine their exposure to a particular risk

187
Q

What are cohort (longitudinal) studies? This is a subtype of observational studies

A

Two groups are identified: one exposed to a risk and one not exposed. The groups are followed up over time and the occurrence of the disease in each group is identified

188
Q

What are the disadvantages of cohort studies?

A

Rare diseases will need lots of subjects and may take a long time. Subjects might drop out. The study might not be feasible or ethical

189
Q

What are the advantages of cohort studies?

A

They do not rely on the accuracy of medical records

190
Q

What are cross-sectional studies?

A

Surveys in which the subjects are contacted once

191
Q

What are ‘within subjects trial’ studies?

A

Subjects are assessed before and after an intervention

192
Q

What are cross-over trial studies?

A

Subjects receive both intervention and control treatments in a randomised manner with a washout period in between

193
Q

What are multi-factorial design studies?

A

Studies that investigate the effects of more than one variable on the outcome

194
Q

What is one of the main problems in clinical trials and how can it be reduced?

A

Selection bias; randomization is a process that reduces the effect of this bias

195
Q

What is simple randomization?

A

Each patient has an equal chance of being allocated to each treatment

196
Q

What is block randomization?

A

Subjects are randomly allocated to blocks which determine the order in which they receive the treatment

197
Q

What is stratified randomization?

A

Subjects are first divided into subgroups according to a particular characteristic and randomization is balanced within the subgroups
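
The three randomization schemes can be sketched in Python as below. This is only one possible reading of the definitions above: block randomization is implemented as permuted blocks (each block holds equal numbers of every arm in a random order), and stratified randomization applies that scheme separately within each subgroup. The function names, arm labels, block size and strata are all assumptions for illustration.

```python
import random


def simple_randomization(n, rng, arms=("A", "B")):
    """Each subject is allocated independently with equal chance per arm."""
    return [rng.choice(arms) for _ in range(n)]


def block_randomization(n, rng, arms=("A", "B"), block_size=4):
    """Permuted blocks: every full block contains each arm equally often."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    allocation = []
    while len(allocation) < n:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)  # random order of treatments within the block
        allocation.extend(block)
    return allocation[:n]  # the last block may be cut short


def stratified_randomization(strata, rng, arms=("A", "B"), block_size=4):
    """Block randomization carried out separately within each stratum.

    strata holds one subgroup label per subject, in recruitment order.
    """
    positions_by_stratum = {}
    for position, stratum in enumerate(strata):
        positions_by_stratum.setdefault(stratum, []).append(position)

    allocation = [None] * len(strata)
    for positions in positions_by_stratum.values():
        arms_for_stratum = block_randomization(len(positions), rng, arms, block_size)
        for position, arm in zip(positions, arms_for_stratum):
            allocation[position] = arm
    return allocation


rng = random.Random(42)  # fixed seed so the sketch is repeatable
print(simple_randomization(8, rng))
print(block_randomization(8, rng))
print(stratified_randomization(["M", "F"] * 4, rng))
```

Simple randomization can leave the arms unequal by chance, whereas the block and stratified versions keep the arms balanced overall and within each subgroup respectively.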

198
Q

Why is blinding done in studies?

A

It is done to reduce bias due to the observer’s or subject’s judgement

199
Q

What is a single blind study?

A

The subject does not know what treatment they are receiving

200
Q

What is a double blind study?

A

Neither the subject nor the observer knows which treatment is given

201
Q

Why might selection bias after randomization occur?

A

Subjects in the treatment arm might drop out of the study if they experience problems

202
Q

What is method validation?

A

Ensuring a proven method (e.g. lab test) is reliable within specific parameters

203
Q

What is quality improvement?

A

Making local changes to improve local service

204
Q

What is service evaluation?

A

An assessment of what standards new or existing services meet and how they are performing. It often goes hand in hand with innovation (evaluation-innovation cycle)

205
Q

What question is an audit addressing?

A

Is the service meeting a particular standard

206
Q

What is research?

A

The generation of new, generalisable knowledge, which requires formal approval. In a clinical context, it can introduce healthcare that is not the standard of care

207
Q

What is the sponsor in the UK policy framework for health and social care research?

A

The organisation taking overall responsibility for ensuring that proportionate, effective arrangements are in place to set up, run and report a research project

208
Q

Who is the chief investigator in the UK policy framework for health and social care research?

A

The overall lead researcher for a research project, responsible for the overall conduct of a research project

209
Q

Who is the principal investigator in the UK policy framework for health and social care research?

A

They are responsible for the conduct at a research site with one PI per site

210
Q

Who is the data controller in the UK policy framework for health and social care research?

A

The organisation responsible for the management and oversight of the data

211
Q

What are some of the responsibilities of the sponsor of a research project?

A

Identifying and addressing poorly designed research, ensuring that the roles and responsibilities of all parties are agreed and recorded, and initiating sites

212
Q

What should be in the study protocol when planning research?

A

Study design, methods of data collection and data analysis, sample technique and sample size

213
Q

What are the potential points on the timeline for research before it can start?

A

Sponsorship, grant application, REC approval, HRA approval, other regulatory approvals, local NHS Trust approvals

214
Q

What is the purpose of a Research Ethics Committee (REC)?

A

To conduct an independent ethical review to ensure that participant safety is central and follows the principles of the Declaration of Helsinki

215
Q

What are some reasons a trial might stop early?

A

For efficacy (not ethical to keep going if you know it works), and for safety concerns (if you know it isn’t working or might not be safe)

216
Q

What are the statistical issues with stopping a trial early?

A

Can bias the results, systematic over-estimation of benefits of intervention when stopped for efficacy, and precision of estimates of effect sizes will be poorer

217
Q

What should be done after a trial has ended?

A

Dissemination of results, destruction of samples, archiving and updating the public database

218
Q

What is HRA approval?

A

Approval confirming a study is compliant with applicable regulations and standards, including a favourable opinion from a REC, a clinical trials authorisation or any other relevant approvals (e.g. radiation)

219
Q

Who has to review an IRAS form regarding radiation?

A

A medical physics expert (MPE) to quantify the dose and risk, and a clinical radiation expert (CRE) to justify it