Overall Flashcards

1
Q

What are the two major types of data?

A

Categorical (qualitative) and metric (quantitative)

2
Q

What are the two subtypes of categorical (qualitative) data?

A

Nominal and ordinal

3
Q

What are the two subtypes of metric (quantitative) data?

A

Continuous and discrete

4
Q

What does nominal data relate to?

A

It labels variables without any order or quantitative value. It usually relates to named things and has no units of measurement. Each value is allocated to a specific category

5
Q

What does ordinal data relate to?

A

The values can be meaningfully ordered and it is categorical because each value is assigned to a specific category

6
Q

What does discrete data relate to?

A

The values are distinct and can have units of measurement. The data take a finite number of values, which are integers (counts)

7
Q

What does continuous data relate to?

A

Fractional numbers that result from measurement and they can have units of measurement

8
Q

In a box (and whisker) plot, what are the adjacent values (defined in this specific course)?

A

Furthest away from the median but still within 1.5 times the interquartile range

9
Q

In a box (and whisker) plot, what are the points outside the adjacent values?

A

Potential outliers

10
Q

What is the interquartile range?

A

The upper quartile (the 3/4 quantile) minus the lower quartile (the 1/4 quantile)
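
As a rough illustration of the box plot cards, the sketch below computes the interquartile range, the adjacent values and the potential outliers for a made-up data set. It assumes the common convention that the 1.5 x IQR limits are measured from the quartiles, and it uses the default quartile method of Python's statistics module; the course's own convention may differ, and the function name box_plot_summary is hypothetical.

```python
import statistics

def box_plot_summary(data):
    """Return the IQR, the adjacent values and the potential outliers."""
    # Quartile convention is an assumption (statistics.quantiles default method).
    q1, _median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    # Assumed limits: 1.5 * IQR beyond each quartile.
    lower_limit = q1 - 1.5 * iqr
    upper_limit = q3 + 1.5 * iqr
    inside = [x for x in data if lower_limit <= x <= upper_limit]
    outliers = [x for x in data if x < lower_limit or x > upper_limit]
    # Adjacent values: the most extreme points still inside the limits.
    return iqr, (min(inside), max(inside)), outliers

# Made-up data with one obviously extreme value.
iqr, adjacent, outliers = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
print(iqr, adjacent, outliers)
```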

11
Q

What is the sample standard deviation?

A

The square root of the sum of (each value minus the sample mean) squared, where the sum is divided by (n - 1) and n is the sample size: s = sqrt( sum of (x_i - x bar)^2 / (n - 1) )
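
A minimal sketch of this calculation, using made-up data. The function name sample_std is hypothetical; the result is compared with statistics.stdev from the Python standard library, which uses the same n - 1 divisor.

```python
import math
import statistics

def sample_std(values):
    """Sample standard deviation with the n - 1 divisor."""
    n = len(values)
    mean = sum(values) / n
    # Deviation of each value from the mean, squared.
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
s = sample_std(data)
variance = s ** 2                 # the sample variance is s squared
print(s, statistics.stdev(data), variance)
```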

12
Q

What are the residuals in the standard deviation equation?

A

Value minus the mean

13
Q

What is the variance when the sample standard deviation is s?

A

s squared

14
Q

What is skewness and how is it measured?

A

A measure of the asymmetry of a distribution. It is measured by the skewness coefficient, which can vary between -1 and +1

15
Q

What is the skewness coefficient for a symmetric distribution?

A

0

16
Q

What is the skewness coefficient for a distribution with the mean to the left of the mode (most values lie in the upper part of the range, with a long tail to the left, in the negative direction)?

A

Closer to -1 (left or negative skew)

17
Q

What is the skewness coefficient for a distribution with the mean to the right of the mode (most values lie in the lower part of the range, with a long tail to the right, in the positive direction)?

A

Closer to +1 (right or positively skewed)

18
Q

Probability theory is based on set theory. What is contained in the set S (called the space)?

A

All sets are subsets of set S

19
Q

What is the null set?

A

The set that contains no elements

20
Q

For experimental events, what is an event represented by and what is an impossible event?

A

An event is a set and an impossible event is the null set

21
Q

If sets A and B are mutually exclusive, what is P(A+B) and the intersection of A and B?

A

P(A+B) = P(A)+P(B), and AB ={}

22
Q

The conditional probability of A given B is defined as: P(A|B) =

A

P(AB)/P(B)

23
Q

For conditional probability, should there be a causal or temporal relation between A and B?

A

Not necessarily. A and B may or may not be causally or temporally related

24
Q

What does it mean if conditional probability has no effect on the probability of an event P(A|B)=P(A)?

A

Events A and B are statistically independent

25
Q

If A and B are statistically independent, what is P(AB) equal to?

A

P(AB) = P(A)P(B)
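
The conditional probability and independence cards can be checked by enumerating a small sample space. The sketch below uses two fair dice, with A = "first die is even" and B = "the sum is 7"; the events and helper names are illustrative choices, not taken from the course.

```python
from fractions import Fraction
from itertools import product

# The space S: all 36 equally likely outcomes of two fair dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a predicate on an outcome."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))

def a(outcome):
    return outcome[0] % 2 == 0            # first die is even

def b(outcome):
    return outcome[0] + outcome[1] == 7   # the sum is 7

def a_and_b(outcome):
    return a(outcome) and b(outcome)      # the intersection AB

p_a, p_b, p_ab = prob(a), prob(b), prob(a_and_b)
p_a_given_b = p_ab / p_b                  # P(A|B) = P(AB) / P(B)
print(p_a, p_b, p_ab, p_a_given_b)
```

Here P(A|B) equals P(A) and P(AB) equals P(A)P(B), so these two events are statistically independent.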

26
Q

What is Bayes’ theorem? P(A|B) = ?

A

P(B|A)P(A) / P(B)
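
A small worked sketch of Bayes' theorem, with A as a hypothetical condition and B as a positive test result. All numbers are invented for illustration. P(B) is expanded with the law of total probability, P(B) = P(B|A)P(A) + P(B|not A)P(not A), which is not stated on the card.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) from Bayes' theorem, with P(B) from total probability."""
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b

# Invented numbers: 1% prior, 95% true-positive rate, 5% false-positive rate.
p = posterior(prior_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
print(round(p, 4))
```

Even with an accurate test, the posterior stays low here because the prior P(A) is small.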

27
Q

What does Bayesian probability include?

A

It incorporates any prior knowledge that a researcher might have about a hypothesis

28
Q

Why is Bayes’ theorem also called the theorem of probability of causes?

A

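Because A is the cause and B the effect: the theorem gives the probability of the cause A given that the effect B has been observed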

29
Q

What is a random variable (RV)?

A

A number X(z) assigned to every outcome z of an experiment

30
Q

What is the cumulative distribution function F(x)?

A

P{X<=x}

31
Q

If the cumulative distribution function F(x) is continuous, what is its derivative?

A

The probability density function f(x) = dF(x)/dx

32
Q

If the cumulative distribution function is discrete, what is f(x)?

A

A discrete distribution function, where f(x) = the sum of P{X=x_i}delta{x-x_i}, where delta is an impulse function

33
Q

Can we calculate P(X=x) for a continuous cumulative distribution function and what do we do?

A

No because P(X=x) = 0 when continuous. We have to calculate the probability that X lies in a small interval around x by integrating f(x) across a small interval
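
To illustrate integrating f(x) over a small interval, the sketch below uses an exponential density purely as an example, with a hypothetical rate of 2. It integrates numerically with the midpoint rule and compares the result with F(b) - F(a).

```python
import math

RATE = 2.0  # hypothetical rate parameter, chosen for illustration

def density(x):
    """Example density f(x) = RATE * exp(-RATE * x) for x >= 0."""
    return RATE * math.exp(-RATE * x)

def cdf(x):
    """Matching cumulative distribution function F(x)."""
    return 1 - math.exp(-RATE * x)

def interval_probability(lo, hi, steps=1000):
    """Midpoint-rule integral of the density from lo to hi."""
    width = (hi - lo) / steps
    return sum(density(lo + (i + 0.5) * width) for i in range(steps)) * width

p_interval = interval_probability(1.0, 1.1)    # P(1.0 <= X <= 1.1)
p_point = interval_probability(1.0, 1.0)       # zero-width interval
print(p_interval, cdf(1.1) - cdf(1.0), p_point)
```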

34
Q

What is the expectation or mean of a random variable when the cumulative distribution function is continuous?

A

The integral of x f(x) dx over all values of x

35
Q

What is the expectation or mean of a random variable when the cumulative distribution function is discrete?

A

The sum of x_i multiplied by P(X=x_i)

36
Q

What is the variance of a random variable in terms of the mean or expectation (E)?

A

sigma squared = E[X^2] - (E[X])^2
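
As a quick check of the discrete expectation and variance formulas, the sketch below works them out exactly for a fair six-sided die. The die is an illustrative choice, not an example from the course.

```python
from fractions import Fraction

# Probability mass function of a fair die: P(X = x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())                # E[X]
mean_of_square = sum(x ** 2 * p for x, p in pmf.items()) # E[X^2]
variance = mean_of_square - mean ** 2                    # E[X^2] - (E[X])^2
print(mean, mean_of_square, variance)
```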

37
Q

When do the sample-based approximations of mean and variance converge to the theoretical quantities?

A

When the sample size tends to infinity

38
Q

Are probability mass functions for discrete or continuous variables?

A

Discrete

39
Q

Are probability density functions for discrete or continuous variables?

A

Continuous

40
Q

What are the three types of probability mass functions that this course deals with?

A

Bernoulli, binomial and uniform (discrete)

41
Q

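What are the four types of probability density functions that this course deals with?

A

Normal (Gaussian), Poisson, exponential and uniform (continuous). Note that the Poisson distribution is discrete, so strictly speaking it has a probability mass function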

42
Q

The Bernoulli distribution is a special case of what type of distribution and what is the special case?

A

Binomial distribution with a single trial (n=1)

43
Q

What is a Bernoulli trial?

A

A single experiment with outcome 0 or 1

44
Q

For a Bernoulli distribution, what is the probability of X=1 (P(1))?

A

p

45
Q

For a Bernoulli distribution, what is the probability of X=0 (P(0))?

A

1-p

46
Q

What is the mean (expected value) of a Bernoulli random variable?

A

p

47
Q

What is the variance of a Bernoulli random variable?

A

p(1-p)

48
Q

Since the Bernoulli distribution is a special case of the binomial distribution, what could the binomial distribution be thought of as?

A

The number of successes in a sequence of independent Bernoulli trials

49
Q

What are the parameters of the binomial distribution?

A

n = number of trials, p = probability of success

50
Q

How are the Bernoulli probability distribution random variables denoted?

A

X ~ Bernoulli(p)

51
Q

How are the Binomial probability distribution random variables denoted?

A

X ~ B(n,p)

52
Q

What is the mean (expected value) of a binomially distributed random variable?

A

np

53
Q

What is the variance of a binomially distributed random variable?

A

np(1-p)
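
The Bernoulli and binomial results above can be verified numerically. The sketch below builds the binomial probability mass function from math.comb and checks that its mean and variance match np and np(1-p); the values n = 10 and p = 0.3 are arbitrary.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.3  # arbitrary illustrative parameters
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

mean = sum(k * q for k, q in enumerate(pmf))
variance = sum(k ** 2 * q for k, q in enumerate(pmf)) - mean ** 2

# With a single trial (n = 1) the binomial reduces to Bernoulli(p).
bernoulli_p1 = binomial_pmf(1, 1, p)  # P(1) = p
bernoulli_p0 = binomial_pmf(0, 1, p)  # P(0) = 1 - p
print(mean, variance, bernoulli_p1, bernoulli_p0)
```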

54
Q

What is the discrete uniform distribution?

A

A finite number n of outcome values are equally likely to be observed

55
Q

What is the probability of every one of the n outcome values in a discrete uniform distribution?

A

1/n

56
Q

What is the mean (expected value) of a discrete uniform distribution?

A

(n+1)/2, when the outcome values are 1, 2, ..., n

57
Q

What is the continuous uniform distribution?

A

The distribution of a continuous random variable that is equally likely to take any value between two stated bounds a and b

58
Q

How are the continuous uniform probability distribution random variables denoted and what do the parameters mean?

A

X ~ U(a,b), where a and b are the bounds (minimum and maximum values), with a < b

59
Q

What is the probability density p(x) of a variable under the continuous uniform distribution?

A

1/(b-a) for x between a and b, and 0 otherwise

60
Q

What is the mean (expected value) of a random variable that follows the continuous uniform distribution?

A

(a+b)/2

61
Q

How are the normal probability distribution random variables denoted and what do the parameters mean?

A

X ~ N(mu, sigma squared), where mu = mean and sigma squared = variance (sigma = standard deviation)

62
Q

How can a binomial distribution be approximated by a normal distribution?

A

B(n,p) approx = N(np, np(1-p))

63
Q

Under what circumstances can the normal approximation to the binomial distribution be used?

A

When both np > 5 and n(1-p) > 5
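
A rough numerical check of the normal approximation, comparing an exact binomial probability with the value from N(np, np(1-p)). The continuity correction (evaluating the normal cdf at k + 0.5) is an addition that does not appear on the cards, and the parameters are arbitrary.

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Approximate P(X <= k) using N(np, np(1-p))."""
    if not (n * p > 5 and n * (1 - p) > 5):
        raise ValueError("np and n(1-p) must both exceed 5")
    approx = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
    return approx.cdf(k + 0.5)  # continuity correction (assumption, not from the cards)

exact = binomial_cdf(55, 100, 0.5)
approx = normal_approx_cdf(55, 100, 0.5)
print(exact, approx)
```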

64
Q

What is the standard normal distribution?

A

A normal distribution with a mean of 0 and a standard deviation of 1

65
Q

How is a standard normal random variable denoted?

A

Z ~ N(0,1)

66
Q

Every normal distribution is a version of standard normal distribution, whose domain is stretched by what factor and translated by what value?

A

Stretched by the value of the standard deviation and translated by the mean value

67
Q

How do you convert the random variable X for the normal distribution to the random variable Z for the standard normal distribution?

A

Z = (X - mu) / sigma (standard deviation)

68
Q

What denotes the cumulative distribution function (cdf) of the standard normal distribution P(Z<=z)?

A

Capital phi(z)
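
A short sketch of standardising and then looking up the cdf, using NormalDist from the Python standard library in place of printed tables. The values mu = 100, sigma = 15 and x = 130 are made up.

```python
from statistics import NormalDist

def standardise(x, mu, sigma):
    """Z = (X - mu) / sigma."""
    return (x - mu) / sigma

def phi(z):
    """Cdf of the standard normal distribution, P(Z <= z)."""
    return NormalDist().cdf(z)

z = standardise(130, mu=100, sigma=15)
p = phi(z)  # P(X <= 130) for X ~ N(100, 15 squared)
print(z, p)
```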

69
Q

What is the Q-Q plot (or quantile or normal probability plot)?

A

A plot of the sorted values from the data set against the expected value of the corresponding quantiles from the standard normal distribution

70
Q

What is the normal probability plot (Q-Q plot) used for?

A

It is used to visually assess the normality of data i.e. it compares two probability distributions by plotting their quantiles against each other

71
Q

If the two distributions being compared are similar, what line will the points on the Q-Q plot (normal probability plot) lie on?

A

y=x

72
Q

What distribution does the normal probability plot use?

A

The z-distribution (standard normal)
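
The points of a normal probability plot can be computed without any plotting library. The sketch below assumes plotting positions of (i + 0.5)/n for the theoretical quantiles and standardises the data first so that y = x is the reference line; both choices are assumptions and other conventions exist. The data are invented.

```python
from statistics import NormalDist, mean, stdev

def qq_points(data):
    """Pairs of (standard normal quantile, standardised sorted value)."""
    n = len(data)
    m, s = mean(data), stdev(data)
    # Standardise so that normal data should fall near the line y = x.
    standardised = sorted((x - m) / s for x in data)
    # Assumed plotting positions: (i + 0.5) / n for i = 0 .. n-1.
    theoretical = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, standardised))

points = qq_points([4.1, 5.3, 4.8, 6.0, 5.1, 4.5, 5.6])
for expected, observed in points:
    print(round(expected, 3), round(observed, 3))
```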

73
Q

What is the ladder of powers?

A

An approach to change the shape of a skewed distribution so that it becomes normal or nearly normal with power transformations

74
Q

If X is a random variable with mean mu and variance sigma squared, what is the mean and variance of Y=aX+b?

A

Mean (Y) = a * mu + b and variance (Y) = a squared * sigma squared

75
Q

If X1, X2, …, Xn are independent random variables with means mu1, …, mu_n and variances sigma1 squared, …, sigma_n squared, what are the mean and variance of Y = X1 + X2 + … + Xn?

A

Mean(Y) = mu1 + mu2 + … + mu_n and variance(Y) = sigma1 squared + sigma2 squared + … + sigma_n squared

76
Q

If X1 and X2 are independent random variables with means mu1 and mu2 and variances sigma1 squared and sigma2 squared, what are the mean and variance of Y = X1 - X2?

A

Mean(Y) = mu1 - mu2 and variance(Y) = sigma 1 squared + sigma 2 squared
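
These three rules can be confirmed exactly on a small example. The sketch below uses fair dice (an illustrative choice) and enumerates the distributions of 2X + 3, X1 + X2 and X1 - X2; the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

die = {x: Fraction(1, 6) for x in range(1, 7)}  # fair die pmf

def mean_var(pmf):
    """Exact mean and variance of a discrete distribution."""
    mean = sum(x * p for x, p in pmf.items())
    return mean, sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

def transform(pmf_x, func):
    """Pmf of func(X)."""
    out = {}
    for x, p in pmf_x.items():
        out[func(x)] = out.get(func(x), 0) + p
    return out

def combine(pmf_x, pmf_y, func):
    """Pmf of func(X, Y) for independent X and Y."""
    out = {}
    for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
        out[func(x, y)] = out.get(func(x, y), 0) + px * py
    return out

mu, var = mean_var(die)
mean_lin, var_lin = mean_var(transform(die, lambda x: 2 * x + 3))      # Y = aX + b
mean_sum, var_sum = mean_var(combine(die, die, lambda x, y: x + y))    # Y = X1 + X2
mean_diff, var_diff = mean_var(combine(die, die, lambda x, y: x - y))  # Y = X1 - X2
print(mean_lin, var_lin, mean_sum, var_sum, mean_diff, var_diff)
```

Note that the variances add for the difference as well as for the sum.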

77
Q

If independent random variables X1, X2, … Xn are combined algebraically (eg Y=X1-X2 or Y=X1+X2+…+Xn) and all the X variables have normal distributions, what distribution does Y have?

A

A normal distribution

78
Q

How are the poisson probability distribution random variables denoted and what do the parameters mean?

A

X ~ Poisson(mu), where mu is the mean

79
Q

What is the variance of a poisson distribution random variable?

A

mu (= to the mean)

80
Q

How can the binomial distribution be approximated by the poisson distribution?

A

B(n,p) approx = Poisson (np)

81
Q

When can you use the Poisson approximation to the binomial distribution?

A

If n is large (n>50) and p is small (p<0.05)
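
A quick numerical comparison of B(n,p) with Poisson(np) for parameters that meet the stated conditions. The choice n = 100 and p = 0.02 is arbitrary.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

n, p = 100, 0.02  # n > 50 and p < 0.05
largest_gap = max(
    abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p)) for k in range(11)
)
print(largest_gap)
```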

82
Q

How are the exponential probability distribution random variables denoted and what do the parameters mean?

A

T ~ M(lambda), where lambda is more than 0 and it is called the rate parameter

83
Q

What is the exponential distribution for?

A

The distance (can be any measure or units, eg time) between events in a Poisson point process

84
Q

What is the mean of the exponential distribution?

A

1/lambda

85
Q

What is the Poisson process?

A

A model for the occurrence of events in continuous time. It is a counting process for events that appear to happen at a certain rate but completely at random

86
Q

What are the assumptions of the poisson process?

A

Events occur singly, the rate of occurrence of events remains constant and the incidence of future events is independent of the past

87
Q

If a Poisson process is modelled with events that occur at random at a rate of lambda over a time t, what two random variables, with two different probability distributions, can be used?

A

X ~ Poisson(lambda * t), which models the number of events in a time interval of length t, and T ~ M(lambda), which models the waiting time between events
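
The two views of the process are consistent: no events in a time t (a Poisson count of 0) is the same event as a waiting time longer than t. The sketch below checks this numerically, using the exponential survival function P(T > t) = exp(-lambda * t); the rate and time values are arbitrary.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu)."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

def exponential_survival(t, rate):
    """P(T > t) for an exponential waiting time with the given rate."""
    return math.exp(-rate * t)

rate, t = 3.0, 0.5  # arbitrary: 3 events per unit time, window of 0.5
p_no_events = poisson_pmf(0, rate * t)         # count view
p_wait_longer = exponential_survival(t, rate)  # waiting-time view
mean_wait = 1 / rate                           # mean of the exponential
print(p_no_events, p_wait_longer, mean_wait)
```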

88
Q

What is a confidence interval?

A

A random interval which contains the parameter being estimated with the probability of the confidence level

89
Q

If there is a 95% confidence interval and an experiment is repeated 100 times, what can we say about the results?

A

About 95 of the 100 confidence intervals would be expected to include the true value

90
Q

To calculate the confidence interval for the mean mu of a population, using a random sample of size n (large n), what values are required?

A

The sample mean (x bar), the sample standard deviation (s), the number of items in the sample (n) and z, the (1-alpha/2) quantile of the standard normal distribution

91
Q

When calculating the confidence interval, the confidence level is required. What is the equation for the confidence level?

A

100(1-alpha)%, where alpha is used to calculate the z in the confidence interval equation
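
A minimal sketch of the large-sample interval built from these four quantities, assuming the usual form x bar plus or minus z * s / sqrt(n). The course's exact equation is not shown on the cards, so treat the formula as an assumption; the input values are invented.

```python
import math
from statistics import NormalDist

def confidence_interval(x_bar, s, n, alpha=0.05):
    """100(1 - alpha)% interval for the mean (assumed large-sample form)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # the (1 - alpha/2) quantile
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Invented sample summary: mean 50, standard deviation 10, n = 100.
low, high = confidence_interval(x_bar=50.0, s=10.0, n=100)
print(round(low, 2), round(high, 2))
```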

92
Q

What is the chi-squared distribution?

A

The distribution of the sum of the squares of independent standard normal random variables

93
Q

How are the chi-squared probability distribution random variables denoted and what do the parameters mean?

A

W ~ chi-squared(v), where v is the number of independent standard normal variables being summed (the degrees of freedom), which is also the mean
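
A simulation sketch of this definition: each draw sums the squares of v independent standard normal values, and the sample mean of many draws should land near v. The seed, v = 3 and the number of draws are arbitrary choices.

```python
import random

def chi_squared_draw(v, rng):
    """One draw: sum of squares of v independent N(0, 1) values."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(v))

rng = random.Random(0)  # fixed seed so the run is repeatable
v = 3
draws = [chi_squared_draw(v, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean)  # expected to be close to v
```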

94
Q

When is the F distribution used?

A

For the ratio of the sample variances S1^2/S2^2 of two independent samples from normal distributions, where the samples have v1 and v2 degrees of freedom

95
Q

Is the alternative hypothesis one or two sided?

A

It can be either depending on the null hypothesis (eg two sided is required if the null is drug A = drug B but one sided if the null is drug A is not effective)

96
Q

Can a null hypothesis be accepted or proved?

A

No it can only be rejected/refuted

97
Q

What does rejecting the null hypothesis do?

A

It provides evidence in favour of the alternative hypothesis

98
Q

What is a directional hypothesis?

A

A hypothesis that predicts the direction of a relationship or difference between two variables. Also known as a one-tailed hypothesis

99
Q

What is the difference between a one- and two-tailed test in terms of the hypothesis?

A

A one-tailed test looks for an increase or decrease in a parameter, whereas a two-tailed test looks for a change in parameter

100
Q

What is the significance level?

A

If the null hypothesis is true, the significance level is the proportion of the repeated experiments in which the null hypothesis will be falsely rejected

101
Q

What is type I error?

A

Rejecting the null hypothesis when it is true (called a false positive)

102
Q

What is type II error?

A

Not rejecting the null hypothesis when it is false (false negative)

103
Q

What significance level is typically set and what can it be referred to as when considering type I error?

A

0.05 (5%) and alpha level

104
Q

What symbol denotes type II error and what is it usually set to?

A

Beta, usually set to 0.2, which corresponds to a power (1 - beta) of 0.8

105
Q

When should Student's 1-sample t-test be used?

A

For a sample of size n from a normal distribution, with known values for the sample mean and standard deviation. The t-test can be used to test the null hypothesis mu = mu_nought (a specified value)

106
Q

What are the degrees of freedom of Student's 1-sample t-test?

A

n-1, where n is the sample size
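
A sketch of the 1-sample test statistic and its degrees of freedom, assuming the standard form t = (x bar - mu_nought) / (s / sqrt(n)), which is not written out on the cards. The p value would then come from t-distribution tables; the data are invented.

```python
import math
import statistics

def one_sample_t(sample, mu_0):
    """Test statistic and degrees of freedom for H0: mu = mu_0."""
    n = len(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    t = (statistics.mean(sample) - mu_0) / standard_error
    return t, n - 1

t_stat, df = one_sample_t([1, 2, 3, 4, 5], mu_0=2)
print(t_stat, df)
```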

107
Q

What are the degrees of freedom of Student's 2-sample t-test?

A

n1 +n2 - 2, where n1 and n2 are the sample sizes

108
Q

How do you use the t-distribution tables for a Student's t-test question?

A

Calculate the test statistic (either 1 or 2 sample), determine the degrees of freedom, go to the correct row on the table that matches the degrees of freedom and determine which quantile the test statistic matches with

109
Q

After you get a quantile from the t-distribution tables, how do you determine the p value from this?

A

The p value is 1 minus the quantile for a one-sided test; double this for a two-sided test

110
Q

What is the definition of the P value?

A

The probability of observing data at least as extreme as ours, given that the null hypothesis is true

111
Q

How do you interpret a p value above 0.1?

A

Little evidence against the null hypothesis

112
Q

How do you interpret a p value between 0.05 and 0.1?

A

Weak evidence against the null hypothesis

113
Q

How do you interpret a p value between 0.01 and 0.05?

A

Moderate evidence against the null hypothesis

114
Q

How do you interpret a p value below 0.01?

A

Strong evidence against the null hypothesis

115
Q

Why is misinterpretation a problem with p values?

A

It can sometimes be misinterpreted as meaning the probability of the null hypothesis being correct or the probability that the observed effect is not real

116
Q

Why is publication bias a problem with p values?

A

Research findings with p more than 0.05 sometimes do not get published

117
Q

Why is over-reliance a problem with p values?

A

Researchers sometimes change their conclusions radically depending on which side of 0.05 the p value is

118
Q

When should Student's 2-sample t-test be used?

A

For two samples of sizes n1 and n2, with values for the sample means (x bar 1 and x bar 2) and standard deviations. The t-test can be used to test the null hypothesis mu1 = mu2 (the means of the two populations are equal)

119
Q

What assumptions have to be in place for Student's 2-sample t-test?

A

Variation in each population can be modelled by a normal distribution. Samples are independent. Population variances are equal (differ by a factor of < 3)
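
A sketch of the 2-sample statistic under these assumptions, using the standard pooled-variance form, which is assumed here because the cards do not show the equation. It also includes a rough check of the "factor of < 3" rule for the variances. The data and function names are invented.

```python
import math
import statistics

def variances_comparable(a, b, factor=3):
    """Rough equal-variance check: sample variances differ by less than `factor`."""
    v1, v2 = statistics.variance(a), statistics.variance(b)
    return max(v1, v2) / min(v1, v2) < factor

def two_sample_t(a, b):
    """Pooled-variance test statistic and degrees of freedom for H0: mu1 = mu2."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)
    # Pooled variance: weighted average of the two sample variances.
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

group_1 = [1, 2, 3, 4, 5]
group_2 = [2, 3, 4, 5, 6]
ok = variances_comparable(group_1, group_2)
t_stat, df = two_sample_t(group_1, group_2)
print(ok, t_stat, df)
```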

120
Q

How are proportion data modelled and with what parameters?

A

A binomial distribution with parameters n (the sample size, i.e. the number of trials) and p (the probability of a success). Remember that this can be approximated by a normal distribution

121
Q

The test statistic for the difference in proportions includes d, what is this parameter?

A

The hypothesised difference

122
Q

What assumption needs to be met for the differences in proportions test to be applied?

A

The normal approximation to the binomial distribution must hold, i.e. both np > 5 and n(1-p) > 5

123
Q

For a difference in proportions test, is the test one- or two-tailed and what distribution does the test statistic follow?

A

Two-tailed and it is the standard normal distribution (z distribution)

124
Q

To detect a statistically significant difference between the means of two groups, what calculation needs to be done?

A

The study sample size required

125
Q

What parameters are used in the equation for the sample size for the difference in means?

A

The standard deviation of the underlying population (sigma), the hypothesised difference between the two groups (d), and two quantiles from the standard normal distribution table: one for (1 - half the significance level) and one for the power

126
Q

The sample size equation for difference in means can be rearranged to determine what instead?

A

To calculate the size of difference (d) that could be detected as statistically significant given the sample size
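
A sketch of both directions of this calculation. It assumes a common textbook form, n per group = 2 * sigma^2 * (z for 1 - alpha/2 + z for power)^2 / d^2; the course's own equation may differ in detail. The function names and input values are hypothetical.

```python
import math
from statistics import NormalDist

def quantile_sum(alpha, power):
    """Sum of the standard normal quantiles for 1 - alpha/2 and for the power."""
    return NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)

def sample_size_per_group(sigma, d, alpha=0.05, power=0.8):
    """Assumed formula: n = 2 * (sigma * (z_alpha + z_power) / d) ** 2, rounded up."""
    return math.ceil(2 * (sigma * quantile_sum(alpha, power) / d) ** 2)

def detectable_difference(sigma, n, alpha=0.05, power=0.8):
    """The same formula rearranged for d, given n per group."""
    return sigma * quantile_sum(alpha, power) * math.sqrt(2 / n)

n_required = sample_size_per_group(sigma=10, d=5)
d_detectable = detectable_difference(sigma=10, n=n_required)
print(n_required, round(d_detectable, 3))
```

Because n is rounded up, the detectable difference at that sample size comes out slightly smaller than the d that was asked for.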

127
Q

To detect a statistically significant difference between the proportions of two groups, what calculation needs to be done?

A

The study sample size required

128
Q

The equations for the sample size for difference in means and difference in proportions contain the same parameters except one difference in each, what is this difference?

A

The difference in means version has the standard deviation, whereas the difference in proportions version has pi nought, which is the average proportion of the two groups

129
Q

The equations for the sample size for difference in means and difference in proportions include a quantile that relates to power. What is the power and what does a higher value of it mean?

A

The power is the probability of detecting a significant difference when one exists. A higher power means a lower probability of a type II error (missing a real difference)

130
Q

What is power analysis?

A

The process of determining the sample size a research study needs in order to detect a significant difference in means or proportions

131
Q
A