Statistics Flashcards

1
Q

What does an ANOVA assume

A
  • variable is Normally distributed in each group in the population (or sample size is large and variable not too skewed)
  • standard deviation is similar across groups
  • participants (observations) are independent across groups – i.e., NOT paired/matched
2
Q

If proportion = 1. Odds = ?

A

Infinity

3
Q

What is meant by ‘interquartile range’

A

The interquartile range spans the values between the lower quartile (25th percentile) and the upper quartile (75th percentile), that is the middle 50% of observations.

The interquartile range is used to quantify variation (dispersion) or the amount of spread of the scores.
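
As a rough illustration of the definition above, the sketch below computes the lower quartile, the upper quartile and the interquartile range with Python's standard library. The function name `interquartile_range` and the example scores are made up for illustration. Quartile conventions differ between software packages, so other tools may report slightly different values.

```python
from statistics import quantiles


def interquartile_range(values):
    """Return (lower quartile, upper quartile, IQR) for a list of numbers."""
    # n=4 cuts the data into quarters. The "inclusive" method is only one of
    # several common quartile conventions (an assumption of this sketch).
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q3, q3 - q1


# Illustrative scores, not taken from any real study.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, q3, iqr = interquartile_range(scores)
print(q1, q3, iqr)
```

The middle 50% of the scores lie between the two quartiles, and the IQR is the width of that band.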

4
Q

Describe a Mann-Whitney Test

A

A non-parametric test for comparing a quantitative variable between two independent groups. Provides an IQR for each group, and p-value.

5
Q

What is a ‘p-value’?

A

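The p-value quantifies the extent to which the sample estimate contradicts the null hypothesis: it is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true. It can take values between 0 and 1, and smaller values indicate stronger evidence against the null hypothesis.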

6
Q

What is a Paired T-Test

A

Confidence interval & hypothesis test for mean difference between two paired groups.

7
Q

What is meant by ‘median’

A

The median (also referred to as the 50th percentile) is the value below which 50% of the observations lie (and above which 50% of the observations lie). It quantifies average (or centrality) in the data.

8
Q

If R² = 1 then…

A

all the variation is explained

9
Q

In a linear regression, what is ‘b’

A

the slope

10
Q

How would you calculate the relative risk

A

the risk in one group divided by the risk in the other
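
A minimal sketch of this calculation, assuming the counts come from a simple two-group study. The function names and the counts are hypothetical and only serve to show the arithmetic.

```python
def risk(cases, total):
    """Proportion of people in one group who have the disease."""
    return cases / total


def relative_risk(cases_exposed, total_exposed, cases_unexposed, total_unexposed):
    """Risk in the exposed group divided by risk in the non-exposed group."""
    return risk(cases_exposed, total_exposed) / risk(cases_unexposed, total_unexposed)


# Illustrative counts: 20 of 100 exposed and 10 of 100 non-exposed have the disease.
rr = relative_risk(20, 100, 10, 100)
print(rr)
```

A relative risk above 1 means the disease is more common in the first group, and a value below 1 means it is less common.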

11
Q

What does a paired t-test assume

A

It assumes that the within-pair differences on the variable are Normally distributed (or that the sample size is large and the within-pair differences are not too skewed).

12
Q

If proportion = 0.5. Odds = ?

A

1

13
Q

Give an example of when participants might be matched/paired

A
  • participants paired on some criteria (e.g., gender, age) before randomly allocating one member of each pair to each of two trial arms under comparison
  • measurements taken before and after an intervention is administered on all study participants; compare before (control) and after (intervention) conditions
14
Q

How would you calculate the lower bound of range?

A

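lower bound of range = mean – 1.96 x standard deviation (if the variable is Normally distributed, about 95% of observations fall between this value and mean + 1.96 x standard deviation)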

15
Q

What are non-parametric methods

A

They analyse the rank ordering in the data rather than the actual scores themselves.

They do not compare the mean between groups, rather they compare the entire distribution, and only provide p-values, not confidence intervals.

16
Q

When would Fisher’s exact test be used instead of the Chi-squared test?

A

– fewer than 20 participants or

– between 20 and 39 participants and the expected value in at least one cell is less than 5
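
The rule on this card can be sketched as a small helper. This is only an illustration of the rule as stated here, not an official decision procedure. The names `expected_counts` and `choose_test` are hypothetical, and the expected value of a cell is taken to be row total x column total / overall total.

```python
def expected_counts(table):
    """Expected cell counts for a contingency table if there is no association."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def choose_test(table):
    """Apply the card's rule of thumb to a table of observed counts."""
    n = sum(sum(row) for row in table)
    smallest_expected = min(min(row) for row in expected_counts(table))
    if n < 20:
        return "Fisher's exact test"
    if n <= 39 and smallest_expected < 5:
        return "Fisher's exact test"
    return "Chi-squared test"


# Illustrative 2x2 tables (rows = groups, columns = outcome yes/no).
print(choose_test([[3, 5], [4, 6]]))      # 18 participants
print(choose_test([[1, 14], [7, 8]]))     # 30 participants, one expected count of 4
print(choose_test([[20, 30], [25, 25]]))  # 100 participants
```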

17
Q

Give two parametric methods for comparing groups

A

ANOVA and T test

18
Q

Give two ways correlation can be summarised i.e. graphically and numerically

A
  • graphically: using scatterplots
  • numerically: using correlation coefficients

19
Q

True or False:

if r > 0, as one variable increases the other increases

A

true

20
Q

Define ‘true negative’

A

do not have the disease and correctly test negative

21
Q

Describe the Chi-squared test

A

A parametric method that makes distributional assumptions for contingency tables

22
Q

How would you calculate NPV

A

TN/(FN+TN)

23
Q

How would you calculate the risk

A

It is calculated by dividing the number of people who have the disease by the total number of people.

24
Q

How would you calculate PPV

A

TP/(TP + FP)

25
Q

What does a Box and Whiskers plot show?

A
Graph indicates:
•	median
•	lower quartile
•	upper quartile
•	range that contains most values
•	outliers – extreme observations with very low or very high values
26
Q

What does a Two-Sample Unpaired T-Test assume

A
  1. Variable is normally distributed in each group in the population (or the sample size is large and the variable is not too skewed)
  2. Standard deviation is similar in the two groups
  3. Participants (observations) are independent between groups – i.e., NOT paired
27
Q

In a scatterplot, which axis is the outcome variable plotted on?

A

y axis/vertical

28
Q

Describe least squares estimation

A

minimises the sum of the (vertical) squared distances between actual outcome scores and the line – line of best fit
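
To make the idea concrete, here is a minimal sketch for a single predictor, using the closed-form least squares solution. The names `fit_line` and `sum_squared_residuals` and the data are hypothetical; `a` and `b` follow the intercept and slope notation used on the other cards.

```python
def fit_line(x, y):
    """Return (a, b) for the least squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b


def sum_squared_residuals(x, y, a, b):
    """Sum of squared vertical distances between the scores and the line."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_line(x, y)
print(a, b, sum_squared_residuals(x, y, a, b))
```

Any other choice of intercept or slope gives a larger sum of squared residuals for the same data, which is what makes this the line of best fit.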

29
Q

What is meant by standard error?

A

The precision with which the true population parameter (the mean) is estimated.

The smaller the standard error the more precise the sample estimate is of the true mean.

30
Q

Describe a Wilcoxon Signed Ranks Test

A

A non-parametric test for comparing a quantitative variable between two paired groups. Provides an IQR for each group, and p-value.

31
Q

Define ‘null hypothesis’

A

The default position of no effect or no difference (‘the most boring truth imaginable’), not necessarily what you think the truth is.

32
Q

Define ‘correlation coefficient’

A

a number that quantifies the strength (and direction) of the association between two variables

33
Q

How is proportion calculated

A

number of participants in a category/total number of participants

34
Q

Describe a bar chart

A

graph where the heights of rectangular bars are used to indicate the number (or proportion or percentage) of participants that are in each category

35
Q

Describe a histogram

A

Graph where the heights of rectangular bars (or bins) are used to indicate the (relative) frequency with which values in specific ranges occur.

Unlike bar charts (which are used for categorical data) they have no gaps between the bins.

36
Q

What is a repeated measures ANOVA

A

Hypothesis test for comparing the mean across three or more paired (matched) groups. It provides a global p-value comparing the mean across all groups

37
Q

True or False: In linear regression the predictor is often assumed to be a potential cause of the outcome

A

True

38
Q

Define ‘correlation’

A

the association between two variables

39
Q

How is number needed to treat calculated

A

1/risk difference
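
A short worked sketch of this formula, assuming the risk difference is the risk in the control group minus the risk in the treated group. The function name and the example risks are hypothetical.

```python
def number_needed_to_treat(risk_control, risk_treated):
    """Return 1 / risk difference for two groups."""
    risk_difference = risk_control - risk_treated
    if risk_difference == 0:
        # With no difference in risk, no finite number of patients yields a benefit.
        raise ValueError("risk difference is zero, so NNT is undefined")
    # The absolute value is used so the result is a positive count of people
    # (an assumption of this sketch).
    return 1 / abs(risk_difference)


# Illustrative risks: 50% in the control arm, 25% in the treated arm.
nnt = number_needed_to_treat(0.5, 0.25)
print(nnt)
```

In practice the result is usually rounded up to a whole number of people.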

40
Q

Describe a scatterplot

A

graph used to summarise the relationship between two quantitative variables on two axes. Each participant is represented on the scatterplot using a symbol such as a dot (●) or cross (×).

The position of the dot on the vertical axis (y axis) indicates the score on one variable and the position on the horizontal axis (x axis) indicates the score on the other variable.

41
Q

What is meant by odds and odds ratio?

A

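The odds quantify how common a binary characteristic is in a single group: the number of participants with the characteristic divided by the number without it (equivalently, proportion/(1 – proportion)).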

The odds ratio is the ratio of the odds in one group to the odds in another group.

It’s calculated to compare the odds between groups
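
The relationship between proportion, odds and odds ratio can be sketched as follows. The function names are hypothetical. The special case for a proportion of 1 reflects the card that gives the odds as infinity, and the sketch assumes the comparison group's odds are not zero.

```python
import math


def odds_from_proportion(p):
    """Convert a proportion (between 0 and 1) to odds."""
    if p == 1:
        return math.inf  # everyone has the characteristic, nobody is without it
    return p / (1 - p)


def odds_ratio(p_group1, p_group2):
    """Odds in group 1 divided by odds in group 2 (group 2 odds must be non-zero)."""
    return odds_from_proportion(p_group1) / odds_from_proportion(p_group2)


print(odds_from_proportion(0))    # proportion 0
print(odds_from_proportion(0.5))  # proportion 0.5
print(odds_from_proportion(1))    # proportion 1
print(odds_ratio(0.5, 0.2))       # illustrative proportions only
```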

42
Q

Describe a Dotplot graph

A

Like a histogram that is turned back to front and flipped on its side. Each observation is represented as a dot (●).

The length of the “bars” indicates how common the value is.

43
Q

What type of variable is Body Temperature (choose from binary; nominal; ordinal; and quantitative)?

A

Quantitative

44
Q

Define ‘true positive’

A

have the disease and correctly test positive

45
Q

What is meant by risk and relative risk?

A

The risk is the proportion of people in a single group who have a disease.

The relative risk is used to compare the risk between two groups

46
Q

Which 2 tests can compare proportions for a binary variable between two independent groups?

A

Chi-squared test or Fisher’s exact test

47
Q

Define ‘R squared’

A

the proportion of the variation in one variable that is explained by another variable
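
One way to see this definition at work is to fit a straight line and compare the leftover (residual) variation with the total variation in the outcome. The sketch below assumes a single predictor and a least squares line. The function name and the data are hypothetical.

```python
def r_squared(x, y):
    """Proportion of the variation in y explained by a least squares line on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Variation left over after fitting the line.
    ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Total variation of the outcome around its mean.
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total


# All points on a line: all the variation is explained.
print(r_squared([1, 2, 3], [2, 4, 6]))
# No linear trend at all: no variation is explained.
print(r_squared([1, 2, 3, 4], [1, 2, 2, 1]))
```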

48
Q

When should non parametric methods be used

A

when the assumptions that underlie parametric methods for independent groups do not hold, specifically where:
• variable is skewed (and sample size is small)
• standard deviation differs markedly across groups
• variable is more ordinal (categorical) than quantitative

49
Q

What is number needed to treat

A

This is the number of people that need to receive the intervention before 1 person benefits from it.

50
Q

True or false:

if p >= 0.05 then do not reject the null hypothesis

A

True

51
Q

What is meant by ‘mean’ and how is it calculated?

A

The mean quantifies the average for the quantitative variables.
It is calculated for a given variable as the sum of the values divided by the total number of values

52
Q

Describe ‘normal distribution’

A

A mathematically defined theoretical distribution characterised by a symmetrical bell-shaped curve.

53
Q

Define distribution (of variables)

A

Distribution refers to the different values that occur and the frequency with which they occur for a given variable.

54
Q

How can categorical data be summarised

A

Either numerical or graphical representation of:
• frequency
  o actual number in each category
• relative frequency
  o proportion of the total in each category
  o percentage of the total in each category

55
Q

Describe a Friedman Test

A

A non-parametric test for comparing a quantitative variable across three or more paired (matched) groups. Provides an IQR for each group, and p-value.

56
Q

What are near perfect tests called

A

Reference standards

57
Q

If a correlation is non-monotonic i.e. U shaped, which correlation coefficient can be used

A

Neither (the Pearson and Spearman coefficients are both unsuitable for a non-monotonic relationship)

58
Q

How would you calculate the upper bound of range?

A

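upper bound of range = mean + 1.96 x standard deviation (if the variable is Normally distributed, about 95% of observations fall between mean – 1.96 x standard deviation and this value)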
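
Both bounds can be computed together. This is a minimal sketch assuming the sample standard deviation (dividing by n - 1) is the one intended. The function name and the scores are hypothetical.

```python
from statistics import mean, stdev


def reference_range(values):
    """Return (lower, upper) = mean -/+ 1.96 x standard deviation."""
    m = mean(values)
    sd = stdev(values)  # sample standard deviation, an assumption of this sketch
    return m - 1.96 * sd, m + 1.96 * sd


# Illustrative scores only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = reference_range(scores)
print(lower, upper)
```

The two bounds are symmetric around the mean, so the range is only a sensible summary when the variable is roughly Normally distributed.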

59
Q

Why is the estimate from the sample almost never the same as the true population parameter?

A
  • sample is only a subset of the population
  • there is variability across people
  • sample is not necessarily representative
60
Q

Describe positively skewed distribution

A

most observations bunched at the lower values with a longer tail at the higher values

61
Q

If R² = 0 then…

A

no variation is explained

62
Q

When the assumptions of the paired t-test are not satisfied what alternative test can be used?

A

Wilcoxon signed-rank test can be used as an alternative to the paired t-test when the assumptions for the latter are not satisfied. This test compares the distribution between the first and second measurements.

63
Q

Why would there be uncertainty about a TRUE answer from using a population sample?

A
  • variability (differences) between people in what you are trying to measure
  • the sample is only a subset of the population and is not perfectly representative of it
64
Q

Define ‘linear regression’

A

Estimating a mathematical equation that describes the linear relationship between a quantitative outcome and a quantitative predictor

65
Q

What is an ANOVA test

A

Hypothesis test for comparing the mean across three or more independent groups.

It provides a global p-value comparing the mean across all groups

66
Q

True or false:

if p <= 0.05 then do not reject the null hypothesis

A

False

67
Q

What is meant by ‘correlated’

A

Correlated means the scores on one variable are associated with (or predicted by) scores on the other

68
Q

If the analysis of variance test is significant at the 5% level (i.e., p<0.05), what could you do

A

you may want to compare the groups to each other using pairwise comparisons

69
Q

How would you calculate sensitivity

A

TP/(TP+FN)

70
Q

Define ‘pearson correlation coefficient’

A

a measure of the correlation between two quantitative variables that have a linear relationship.

71
Q

Describe Fisher’s exact test

A

the non-parametric alternative to the Chi-squared test to be used for contingency tables

72
Q

What is a Two-Sample Unpaired T-Test

A

Provides a confidence interval and hypothesis test (including a p-value) for the difference in means between two independent groups

73
Q

If proportion = 0. Odds = ?

A

0

74
Q

What is a confidence interval

A

The confidence interval is the range of values within which we can be confident, to a stated level (usually 95%), that the true value of the parameter of interest lies in the population

75
Q

Define relative risk

A

proportion with disease in exposed group/proportion with disease in non-exposed group

76
Q

What is a binary/dichotomous variable?

A

categorical with 2 categories e.g. mortality status: alive versus dead

77
Q

Describe a Kruskal-Wallis Test

A

A non-parametric test for comparing a quantitative variable across three or more independent groups. Provides an IQR for each group, and p-value.

78
Q

What is meant by a ‘two-tailed, unpaired t-test’?

A

An unpaired (or two-sample) t-test is used to compare means between 2 independent groups.

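It tests the hypothesis that the mean is the same in the populations from which the participants in each group were drawn. ‘Two-tailed’ means that a difference in either direction counts as evidence against this hypothesis.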

79
Q

What do PPV and NPV depend on, in addition to sensitivity and specificity?

A

the prevalence of the disease

80
Q

Define ‘variation’

A

how far apart the values are from each other

81
Q

Define outcome and exposure

A

The exposure defines the groups, e.g. intervention category in a trial.

The outcome is the binary variable being compared, e.g. the disease/disorder category of interest.

82
Q

How would you calculate specificity

A

TN/(FP + TN)
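
The four diagnostic accuracy formulas on these cards (sensitivity, specificity, PPV and NPV) can be collected in one helper. This is a sketch with made-up counts. The function name `diagnostic_summary` is hypothetical, and TP, FP, FN and TN are the counts of true positives, false positives, false negatives and true negatives.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Compute the four accuracy measures from the cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),  # diseased people who test positive
        "specificity": tn / (fp + tn),  # non-diseased people who test negative
        "ppv": tp / (tp + fp),          # positive tests that are correct
        "npv": tn / (fn + tn),          # negative tests that are correct
    }


# Illustrative counts: 100 people with the disease, 900 without.
summary = diagnostic_summary(tp=80, fp=30, fn=20, tn=870)
for name, value in summary.items():
    print(name, round(value, 3))
```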

83
Q

What does ‘SD’ stand for and what does it mean?

A

SD stands for standard deviation.

It quantifies the variation in the scores for the quantitative variables.

It can be interpreted as the average difference between the scores and the mean
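
The sketch below shows one common way to compute it, assuming the sample version that divides by n - 1. Strictly speaking it is the square root of the average squared difference from the mean, which is why it is only roughly an "average difference". The function name and the scores are hypothetical.

```python
import math


def sample_sd(values):
    """Sample standard deviation: root of the summed squared deviations / (n - 1)."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = [(v - m) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))


# Illustrative scores only; their mean is 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
sd = sample_sd(scores)
print(round(sd, 3))
```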

84
Q

What is a type II error in hypothesis testing?

A

The null hypothesis is not rejected when it is in fact false.

This can happen when the study is not large (powerful) enough to reach significance.

85
Q

The more common the disease, the ——– the PPV

A

greater
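
This dependence on prevalence can be checked numerically. The sketch below is an illustration using Bayes' rule, with a hypothetical function name and made-up test characteristics. It computes PPV from sensitivity, specificity and prevalence.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value for a test applied at a given disease prevalence."""
    # Fraction of the whole population that is diseased and tests positive.
    true_positive = sensitivity * prevalence
    # Fraction of the whole population that is healthy but tests positive.
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


# Same test (90% sensitivity and specificity) at increasing prevalence.
for prevalence in (0.01, 0.10, 0.50):
    print(prevalence, round(ppv(0.9, 0.9, prevalence), 3))
```

With the test itself held fixed, the PPV rises as the disease becomes more common.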

86
Q

Define ‘average’

A

the value which characterises the middle of the distribution

87
Q

Define ‘false positive’

A

do not have the disease but incorrectly test positive

88
Q

Define ‘symmetry’

A

for each person who has a score below the average, there is a corresponding person with a score the same distance above the average

89
Q

What are the 4 assumptions of a linear regression

A

1) outcome is quantitative
2) relationship between the outcome and quantitative predictor is linear
3) residuals are Normally distributed (or the sample size is large)
4) constant variance (homoscedasticity)

90
Q

Define ‘false negative’

A

have the disease but incorrectly test negative

91
Q

What is the general set threshold for rejecting or not rejecting the null hypothesis

A

p-value of 0.05 (or less)

92
Q

In a linear regression, what is ‘a’

A

constant or intercept

93
Q

What type of variable is Age, in 10 year intervals (choose from binary; nominal; ordinal; and quantitative)?

A

Ordinal

94
Q

What conditions must the data satisfy for ‘two-tailed, unpaired t-tests’ to be valid?

A

The assumptions made by the t-test are that the distribution of the variables is Normal in each of the groups and the standard deviation is approximately the same in each group.

95
Q

How are odds calculated

A

number of participants in category of interest/number of participants in other category

96
Q

Define ‘population’

A

full set of units that we are interested in

97
Q

What type of variable is Marital Status (choose from binary; nominal; ordinal; and quantitative)?

A

Nominal

98
Q

What does a repeated measures ANOVA assume?

A
  • the difference scores between any two groups are Normally distributed in the population (or sample size is large and difference scores not too skewed)
  • the standard deviation of the difference scores when comparing any two groups should be similar (“sphericity” assumption)
99
Q

What type of variable is hypertension status, yes/no (choose from binary; nominal; ordinal; and quantitative)?

A

Nominal/Binary

100
Q

Define ‘spearman correlation coefficient’

A

a measure of the correlation between 2 quantitative variables, calculated from the ranks of the scores. The relationship doesn’t have to be linear, but it should be monotonic.
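
To illustrate the difference between the two coefficients, the sketch below computes Pearson's coefficient on the raw scores and Spearman's coefficient as Pearson's coefficient on the ranks (tied scores share their average rank). All function names and data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Rank the values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Monotonic but curved relationship (y = x cubed).
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(pearson(x, y), spearman(x, y))

# U-shaped (non-monotonic) relationship (y = x squared).
u_x = [-2, -1, 0, 1, 2]
u_y = [v ** 2 for v in u_x]
print(pearson(u_x, u_y), spearman(u_x, u_y))
```

For the curved but monotonic data, Spearman's coefficient reaches 1 while Pearson's stays below 1. For the U-shaped data both coefficients are 0, even though the two variables are clearly related.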