Quantitative Flashcards

1
Q

Define an observational study.

A

A study that does not include an intervention or experiment, only observation of natural relationships between factors and outcomes.

2
Q

What are some types of observational studies?

A

cross-sectional, longitudinal, case-control, cohort and survey studies.

3
Q

Describe a cross-sectional study.

A

A study that looks at a population or sample at a single point in time.

4
Q

Describe a longitudinal study.

A

A study that takes repeated measurements of the same participants over an extended period of time.

5
Q

Describe a case-control study.

A

A study that compares people who have an outcome (cases) with people who do not (controls) and looks back at their previous exposures. Also known as a ‘retrospective study’.

6
Q

Describe a cohort study.

A

Almost the opposite of a case-control study: it follows a population grouped by exposure over time to see whether or not an outcome develops.

7
Q

Describe a survey study.

A

A study that uses surveys to collect data from participants. Particularly useful for collecting data from a geographically widespread population.

8
Q

Define an interventional study.

A

A study in which researchers manipulate a variable (the intervention) to determine its effect on a specific population. Also known as an experimental study.

9
Q

What are some types of interventional studies?

A

randomised control trials, pre-post studies, and non-randomised control trials.

10
Q

Describe a randomised control trial.

A

A trial in which subjects are randomly assigned to one of two (or more) groups, such as an experimental group and a control group. The outcomes of the groups are then compared.

11
Q

What are some features of a well-designed RCT?

A

a large enough sample to allow generalisation of results

concealed randomisation of the subjects to each group

both groups are treated identically by researchers, apart from the intervention itself

analysis is focused on the research question

12
Q

Describe a pre-post study.

A

A study that measures the occurrence of an outcome before and again after a particular intervention is implemented.

13
Q

Why is a pre-post study not as strong as an RCT?

A

Pre-post studies suffer from poor internal validity: without a control group, they cannot account for other variables that may be responsible for the change in outcome, as an RCT can.

14
Q

Describe a non-randomised trial.

A

Similar to an RCT in that there is an intervention group and a control group; however, participants are not randomly assigned to these groups.

15
Q

Why is a non-randomised trial not considered a strong study design?

A

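They can suffer from bias: without randomisation, the groups may differ systematically at baseline (selection bias), so a difference in outcome may be due to confounding rather than the intervention.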

16
Q

What is a variable?

A

an attribute or characteristic that can vary between individuals or objects.

17
Q

What are some different types of variables?

A

numeric (discrete or continuous), categorical (nominal or ordinal).

18
Q

What is a numeric variable?

A

a variable that has a measurable value described by a number

19
Q

What is the difference between a discrete and continuous variable?

A

a discrete variable takes only whole-number values (e.g. 1 child), whereas a continuous variable can take any value within a range (e.g. 55.4 kg).

20
Q

What is a categorical variable?

A

A variable whose values fall into groups or categories (e.g. race, sex, age group).

21
Q

What is the difference between a nominal and an ordinal variable?

A

nominal variables have no natural order (e.g. gender), whereas ordinal variables can be ordered (e.g. satisfaction with treatment, where 1 = not satisfied, 2 = slightly satisfied, 3 = moderately satisfied and 4 = very satisfied).

22
Q

What is the difference between an interval scale and a ratio scale?

A

A ratio scale has a true zero point (e.g. weight, height), whereas an interval scale has an arbitrary zero point (e.g. temperature in degrees Celsius).

23
Q

When might we see a bimodal distribution of data on a histogram?

A

When there are two distributions mixed together, e.g. heights of males and females on the same histogram.

24
Q

When the distribution of data on a histogram is skewed, which direction do we name this for?

A

the direction of the tail: a distribution with a tail to the right is skewed to the right, or positively skewed

25
Q

When would we use a density plot over a histogram to visualise data?

A

When we need a better understanding of the shape of the distribution. A histogram's appearance can change depending on how many ‘bins’ are chosen.

It is also possible to overlay density plots, which makes it easier to compare two groups of data.

26
Q

When might histograms be chosen over density plots?

A

If the data must be visualised by hand: density plots are difficult to draw and need software to produce.

27
Q

What is the difference between mean and median?

A

The mean is the arithmetic average of a data set (the sum of the values divided by the number of values), whereas the median is the middle value when the data are sorted.

28
Q

What is the median also known as?

A

the 50th percentile or the 0.5 quantile

29
Q

How do we calculate a 5-number summary for a data set?

A

By finding the minimum, first quartile, median, third quartile and maximum.

minimum= smallest number
1st quartile= median of the values below the median
median= middle number
3rd quartile= median of the values above the median
maximum= largest number
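
The steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming the median-of-halves rule described on this card; the function name `five_number_summary` and the sample data are made up for the example, and statistical packages may use slightly different quartile conventions.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values positioned below the median
    upper = data[half + n % 2:]    # values positioned above the median
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data set with seven observations
print(five_number_summary([7, 1, 3, 9, 5, 11, 13]))  # (1, 3, 7, 11, 13)
```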

30
Q

What is the interquartile range?

A

The distance between the first and third quartiles (Q3 minus Q1).

It spans the middle 50% of the data.

31
Q

What is the p quantile?

A

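the value below which a proportion p of the data falls (e.g. the 0.5 quantile is the median, and the 1 quantile is the 100th percentile or the maximum).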

32
Q

What kind of distribution is more likely to have observations flagged as unusual or as outliers in a box & whisker plot?

A

a skewed distribution.

33
Q

What data is used in a box & whiskers plot?

A

a 5-number summary

34
Q

What is an outlier?

A

a data value that does not seem to match the overall distribution observed

it could be either a genuine observation or a data entry error, which is why outliers are marked in SPSS for review

35
Q

When might we see a number flagged as a possible outlier in a box & whisker plot?

A

if that number/data point is more than 1.5 times the interquartile range above the third quartile or below the first quartile.
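
As a rough sketch of the 1.5 × IQR rule, applied on both sides of the box, the snippet below computes the cut-off values and flags anything outside them. The helper names `outlier_fences` and `flag_outliers` and the data are hypothetical; the quartiles are passed in rather than computed, since software may calculate them in slightly different ways.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cut-offs: k * IQR below Q1 and above Q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(values, q1, q3):
    """Return the values that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

# Hypothetical data with Q1 = 3 and Q3 = 11 (median-of-halves rule)
data = [1, 3, 5, 7, 9, 11, 40]
print(outlier_fences(3, 11))       # (-9.0, 23.0)
print(flag_outliers(data, 3, 11))  # [40]
```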

36
Q

True or false: a skewed data set is more likely to have possible outliers flagged on a box & whisker plot.

A

True

37
Q

What type of variables are most likely going to be visualised using bar charts?

A

categorical (nominal or ordinal)

38
Q

Why should pie charts be avoided for visualising categorical data?

A

they are less reliable for interpretation: it is harder to compare angles and areas by eye than to compare the lengths of bars

39
Q

If data distribution is skewed, the mean will be pulled in which direction?

A

towards the tail

40
Q

What does it mean to say that the mean/average is susceptible to the presence of outliers?

A

It means that outliers can pull the mean towards them, so that it no longer represents the centre of most of the data.
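
A small illustration of this, using Python's standard `statistics` module and made-up values: adding one extreme observation shifts the mean a lot but barely moves the median.

```python
from statistics import mean, median

# Hypothetical values; the last list adds a single extreme observation
values = [30, 32, 35, 38, 40]
with_outlier = values + [400]

print(mean(values), median(values))              # 35 35
print(mean(with_outlier), median(with_outlier))  # mean is about 95.8, median is 36.5
```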

41
Q

What are some advantages of using a mean rather than median to discuss a data set?

A

the mean tends to be more powerful than the median because it takes into account every piece of data

the mean has a rich theory behind it (through the central limit theorem), which makes it very useful in practice

42
Q

What are some disadvantages of using a mean rather than median to discuss a data set?

A

the mean does not carry meaningful quantitative information for data gathered from nominal or ordinal scales

the mean is sensitive to extreme values

43
Q

What is variance?

A

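a measure of how far the observations deviate from the mean, calculated as the average of the squared deviations from the mean (the standard deviation is its square root)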
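
To make the calculation concrete, here is a minimal sketch of the sample variance, assuming the usual n − 1 divisor for sample data. The function name `sample_variance` and the data are illustrative only; the result is checked against the standard library's `statistics.variance`.

```python
from math import sqrt
from statistics import mean, variance

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical observations, mean = 5
var = sample_variance(data)       # 32 / 7
print(var, sqrt(var))             # variance and standard deviation
print(variance(data))             # same value from the standard library
```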

44
Q

What is the 68 - 95 - 99.7 rule?

A

For any normal distribution, the area within 1 standard deviation of the mean is approximately 68%, the area within 2 standard deviations of the mean is approximately 95% and the area within 3 standard deviations of the mean is approximately 99.7%.

This rule is used to make statements about data that has a normal distribution, e.g. what range of values would include 95% of subjects.
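
The percentages in the rule can be checked with the standard library's `statistics.NormalDist`. This is only a sketch; the helper name `area_within` and the height figures (mean 170 cm, standard deviation 10 cm) are assumptions made for the example.

```python
from statistics import NormalDist

def area_within(k):
    """Area under the standard normal curve within k standard deviations of the mean."""
    z = NormalDist()  # mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))

# Hypothetical heights: mean 170 cm, SD 10 cm, so about 95% lie within 170 +/- 2 * 10
mean_height, sd_height = 170, 10
print(mean_height - 2 * sd_height, mean_height + 2 * sd_height)  # 150 190
```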

45
Q

How can we tell if data has a normal distribution?

A

data that looks symmetrical and bell-shaped on a histogram

data whose points fall close to the straight line on a normal quantile plot

46
Q

Why is Normal distribution our friend?

A

We can use it to:

Describe the distribution of observations, such as height.

Describe the distribution of statistics, such as the sample mean.

48
Q

What is the Student’s t-test and when would we use it?

A

a test to determine if there is a significant difference between the means of two groups or populations. It is typically used when the sample sizes are small and the population standard deviation is unknown, so it has to be estimated from the sample.

49
Q

What is a t-value?

A

the ratio of the difference between the means of two groups to the variation within the groups (the standard error of the difference).

a larger t-value (in absolute terms) suggests a larger difference between the means relative to that variation, and a smaller probability that the difference is due to chance.

50
Q

What kind of distribution and data is a t test suitable for?

A

continuous data that are approximately normally distributed

51
Q

What are the two types of t-tests? What are they used for?

A

independent samples t-test: used to compare the means of two independent groups

paired (dependent) samples t-test: used to compare the means of related groups (typically based on before-and-after measurements or matched subjects)
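
For the paired case, the test works on the before-and-after differences within each subject. The sketch below is a minimal illustration with made-up scores; `paired_t` is a hypothetical helper, not a library function, and it returns only the t-value and degrees of freedom (n − 1 pairs), not a p-value.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """t-value and degrees of freedom for a paired-samples t-test."""
    diffs = [a - b for b, a in zip(before, after)]  # one difference per subject
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))      # mean difference / its standard error
    return t, n - 1

# Hypothetical before-and-after scores for four subjects
t, df = paired_t([10, 12, 9, 11], [12, 15, 10, 13])
print(round(t, 3), df)
```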

52
Q

How do we calculate the degrees of freedom (df) in a t-test?

A

for an independent samples (pooled) t-test: sample 1 size + sample 2 size - 2

for a paired samples t-test: number of pairs - 1

53
Q

What are pooled t-tests appropriate for? What about a Welch t-test?

A

types of independent two-sample t-tests

the pooled t-test is used if the two populations being compared have equal variances (supported by a Levene’s test with a result that is not significant)

the Welch t-test is used if the two populations being compared do not have equal variances (indicated by a Levene’s test with a result that is significant)
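
The difference between the two versions shows up in how the standard error and degrees of freedom are calculated. The following is a sketch only: `pooled_t` and `welch_t` are hypothetical helper names, the data are made up, and the Welch degrees of freedom use the standard Welch-Satterthwaite approximation.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Independent-samples t-test assuming equal variances."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df

def welch_t(a, b):
    """Independent-samples t-test without assuming equal variances."""
    n1, n2 = len(a), len(b)
    v1, v2 = variance(a) / n1, variance(b) / n2
    t = (mean(a) - mean(b)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

group_a = [5, 6, 7, 8, 9]   # hypothetical measurements
group_b = [1, 2, 3, 4, 5]
print(pooled_t(group_a, group_b))
print(welch_t(group_a, group_b))
```

With equal group sizes and equal sample variances, as in this example, the two versions give the same t-value and degrees of freedom; they diverge as the variances or group sizes become more unequal.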

54
Q

What are type 1 and type 2 errors?

A

type 1 error (also known as a false positive) is the error of rejecting the null hypothesis when it is actually true.

type 2 error (also known as a false negative) is the error of not rejecting the null hypothesis even though it is false

55
Q

What is central limit theorem and how is it useful?

A

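if a sample is sufficiently large (sample size >20) and the observations are independent, the distribution of the sample mean will be approximately normal, even if the data themselves are not normally distributed

it is useful because it allows us to use tests that assume a normal distribution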
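
A quick simulation can illustrate the idea. This sketch draws repeated samples from a strongly skewed (exponential) distribution and looks at the means of those samples; the function name `sample_means`, the sample size of 30, and the number of repetitions are arbitrary choices for the example.

```python
import random
from statistics import mean, stdev

random.seed(0)  # fixed seed so the illustration is repeatable

def sample_means(n, reps=2000):
    """Means of `reps` independent samples of size n from a skewed distribution."""
    return [mean([random.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

means = sample_means(30)
# The population has mean 1 and SD 1, so the sample means should centre on 1
# with a spread of roughly 1 / sqrt(30), which is about 0.18
print(round(mean(means), 2), round(stdev(means), 2))
```

Plotting `means` as a histogram would show a roughly symmetrical, bell-shaped pattern even though the individual observations are heavily right-skewed.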

56
Q

When are non-parametric tests used?

A

also known as distribution-free tests; used when data is not normally distributed and the central limit theorem does not apply (small sample size)

‘free from parameters’: a t-test is a parametric test because it estimates parameters (e.g. population means) using statistics

57
Q

What are some non-parametric tests? When are each appropriate?

A

The Mann-Whitney U test (equivalent to the Wilcoxon Rank Sum test) is used to compare two independent groups

The Kruskal-Wallis test is used to compare more than two independent groups (better than ANOVA when there are outliers or the data is ordinal)

The chi-square test is used to test the association between two categorical variables

58
Q

How are non-parametric tests protected from outliers?

A

by ranking the values (i.e. each observation is replaced by its rank rather than its actual value)
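
To see why ranking helps, consider the sketch below. `average_ranks` is a hypothetical helper that gives tied values the average of their ranks, which is a common convention; the data are made up. An extreme value of 400 simply becomes the highest rank, no matter how far it is from the rest.

```python
def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover any run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

print(average_ranks([12, 15, 11, 400]))  # [2.0, 3.0, 1.0, 4.0]
print(average_ranks([1, 2, 2, 3]))       # [1.0, 2.5, 2.5, 4.0]
```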

59
Q

When is an ANOVA test used? (Analysis of Variance) What assumptions are made to use this kind of test?

A

to compare means between three or more groups

data is normally distributed within each group, observations are independent, and variances between groups are equal

60
Q

What kind of data is a chi-square test suitable for?

A

categorical

61
Q

What is a chi-square test for?

A

it is used to determine whether there is a significant association (dependence) between two categorical variables, or whether they are independent

e.g. ‘is there a significant relationship between gender and voting preference?’

62
Q

What value does a chi-square test give us?

A

a chi-square statistic, which can then be compared with a critical value (or converted to a p-value and compared with the significance level) to determine whether the observed association is likely to have occurred by chance or is significant

63
Q

What are the degrees of freedom (df) in a chi-square test?

A

(number of rows - 1) x (number of columns - 1)

64
Q

What is the difference between the critical value and p-value in a chi-square test?

A

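the critical value is a cut-off taken from the chi-square distribution for the chosen significance level and degrees of freedom, whereas the p-value is the probability of obtaining a chi-square statistic at least as large as the one observed if there were really no association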

65
Q

How do we know if a chi-square test has told us the data is significant?

A

the chi-square statistic will exceed the critical value, or the p-value will be below the chosen significance level (typically 0.05).
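
The calculation behind these cards can be sketched as follows. This is a minimal illustration with hypothetical counts: `chi_square` is a made-up helper that works out the expected counts, the statistic, and the degrees of freedom for a table of observed counts, without any continuity correction. The critical value 3.841 is the standard cut-off for 1 degree of freedom at the 0.05 level.

```python
def chi_square(table):
    """Chi-square statistic and degrees of freedom for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Hypothetical 2 x 2 table: rows = gender, columns = voting preference
observed = [[30, 20],
            [20, 30]]
stat, df = chi_square(observed)
print(stat, df)      # 4.0 1
print(stat > 3.841)  # True: exceeds the critical value for df = 1 at the 0.05 level
```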

66
Q

What is a Pearson Correlation coefficient and what does it tell us?

A

denoted by r

a unit-less measure that ranges from -1 to +1

tells us whether there is a strong, moderate or weak, positive or negative linear correlation between two sets of data (it only measures linear relationships, so a non-linear relationship can give a value close to 0)

e.g. a scatterplot with points close to a straight line moving upwards has a strong, positive correlation coefficient, whereas a scatterplot with points close to a straight line moving downwards has a strong, negative correlation coefficient.
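
A minimal sketch of how r is calculated, using made-up data and a hypothetical helper `pearson_r`. The last example shows a perfect but non-linear (U-shaped) relationship, for which r comes out as 0 because r only captures linear association.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))         # perfect upward line
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))         # perfect downward line
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])) # U-shaped: no linear correlation
```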

67
Q

What is an R-squared value?

A

the Pearson correlation coefficient squared (for a linear relationship between two variables).

the proportion of the total variability in the outcome that is explained by the model (or by the differences between groups)

so how much of the variability can be explained by the data in question (e.g. how much variability in SPPB score can be attributed to levels of physical activity)

calculated as the sum of squares between groups divided by the total sum of squares, and presented as a proportion or percentage (e.g. 0.4 or 40%).

68
Q

What is a Spearman correlation coefficient and what does it tell us?

A

denoted as Spearman’s rho

tells us about the correlation between data just like a Pearson correlation coefficient, but it first ranks the observations in each variable separately (much like other non-parametric methods, this protects against outliers)

useful for when data is ordinal or there are outliers
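
As a rough sketch, Spearman's rho can be computed from the differences between the two sets of ranks. The helpers `ranks` and `spearman_rho` are hypothetical, the data are made up, and this shortcut formula assumes there are no tied values in either variable.

```python
def ranks(values):
    """Rank values from 1 (smallest). Assumes there are no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman's rho via the rank-difference formula (valid when there are no ties)."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Hypothetical data: y always increases with x, but the last value is extreme
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 1000]
print(spearman_rho(x, y))  # 1.0, because the ranks agree perfectly
```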

69
Q

What is linear regression and what does it tell us?

A

like correlation, it is a method used to describe the relationship between a dependent variable and one or more independent variables

aims to find the straight line that best fits the data points, which can then be used to predict the value of the dependent variable from the values of the independent variable

assumptions: independent observations, a linear association, normally distributed variability around the line, and constant variability around the line
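
For a single independent variable, the best-fitting line can be found with the least-squares formulas sketched below. The helper names `fit_line` and `r_squared` and the data are made up for illustration; `r_squared` reports the proportion of variability in y that the fitted line explains.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx

def r_squared(x, y, slope, intercept):
    """Proportion of the variability in y explained by the fitted line."""
    my = mean(y)
    # Residual = observed value minus the value predicted by the line
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable
slope, intercept = fit_line(x, y)
print(round(slope, 2), round(intercept, 2))           # 0.6 2.2
print(round(r_squared(x, y, slope, intercept), 2))    # 0.6
print(round(intercept + slope * 6, 2))                # predicted y when x = 6
```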

70
Q

come back to residuals???

A