Lecture 0 - Introduction Flashcards

1
Q

Data can be collected in various ways:
* Cross-sectionally
* Prospectively
* Retrospectively

Explain these terms.

A
  • Cross-sectionally: data collected at one point in time.
  • Prospectively: subjects are followed forward in time, with measurements taken at baseline and at later follow-up points.
  • Retrospectively: the outcome has already occurred and been assessed, and the study looks back in time to find determinants of that outcome.
2
Q

Types of data are:
* Binary
* Categorical
* Continuous
* Time-to-event (i.e. survival data)

Explain what time-to-event data is.

A

Time-to-event data is the time until a specific event occurs, such as time to death, time to recurrence after treatment, or time to employment.

3
Q

Data can be summarized numerically. Measures of central tendency are:
* mean
* median
* mode

Explain these measures.

A
  • mean: the average, i.e. the sum of all values divided by the number of values.
  • median: the middle value when the data are ranked from low to high.
  • mode: the value (‘score’) that occurs most frequently.
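As a minimal sketch of these three measures, the example below uses Python's standard `statistics` module on a small made-up list of scores (the data are an assumption for illustration, not from the lecture).

```python
import statistics

# Made-up example scores (hypothetical data, for illustration only).
scores = [4, 8, 6, 5, 3, 8, 8]

mean_score = statistics.mean(scores)      # sum of values / number of values
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequently occurring value

print("mean:", mean_score, "median:", median_score, "mode:", mode_score)
```

With an even number of values, `statistics.median` returns the average of the two middle values.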
4
Q

The distribution of data can be depicted as a curve. Many variables are approximately normally distributed, but not all. The curve can be:
* Symmetrical
* Positively skewed
* Negatively skewed

Explain these terms and how the mean, median and mode relate to each other in each case. Also give an example of data that is typically symmetrical, positively skewed or negatively skewed.

A
  • Symmetrical: the mean is equal to the median (and, for a single-peaked distribution such as the normal distribution, to the mode). Example: height.
  • Positively skewed: the distribution has a long tail to the right. There is a high frequency of low ‘scores’ and a low frequency of high ‘scores’. Typically mean > median > mode. Example: house prices or income.
  • Negatively skewed: the distribution has a long tail to the left. There is a low frequency of low ‘scores’ and a high frequency of high ‘scores’. Typically mode > median > mean. Example: retirement age.
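The ordering of mean, median and mode under positive skew can be checked on a small example. The sketch below uses made-up income-like values (an assumption, chosen so that one large value creates a right tail).

```python
import statistics

# Hypothetical right-skewed data: mostly low values plus one very high value.
incomes = [1, 2, 2, 2, 3, 3, 4, 15]

skew_mean = statistics.mean(incomes)      # pulled upward by the large value
skew_median = statistics.median(incomes)  # less affected by the large value
skew_mode = statistics.mode(incomes)      # sits at the peak of low values

print("mean:", skew_mean, "median:", skew_median, "mode:", skew_mode)
print("mean > median > mode:", skew_mean > skew_median > skew_mode)
```

Mirroring the data (a few very low values among mostly high ones) would reverse the ordering, as in the negatively skewed case.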
5
Q

The extent of variability (i.e. measures of spread) within a data set can be calculated with:
* Standard deviation (SD)
* Variance
* Range
* Interquartile range (IQR)

Describe how variance, range and IQR are calculated.

A
  • Variance: SD^2
  • Range: maximum - minimum
  • IQR: Q3 - Q1
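These three calculations can be sketched with Python's standard `statistics` module. The data are made up, and the quartiles use the module's default method; other software may use a different quartile convention and give slightly different Q1 and Q3 values.

```python
import statistics

# Made-up example data (hypothetical, for illustration only).
data = [2, 4, 4, 5, 7, 9, 11]

sd = statistics.stdev(data)           # sample standard deviation
variance = statistics.variance(data)  # sample variance, equal to SD squared
data_range = max(data) - min(data)    # maximum - minimum

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median) and Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                         # interquartile range

print("SD:", sd, "variance:", variance)
print("range:", data_range, "IQR:", iqr)
```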
6
Q

What information can you find in a boxplot?

A
  • The minimum
  • Q1
  • The median
  • Q3
  • The maximum
7
Q

Which values are reported in medical articles when:
* data is symmetrically distributed
* data is skewed

A
  • Symmetrically distributed data: mean and SD
  • Skewed data: median and IQR
8
Q

What is the central limit theorem?

A

The distribution of sample means approximates a normal distribution as the sample size gets larger, regardless of the population’s distribution. Sample sizes equal to or greater than 30 are often considered sufficient for the central limit theorem to hold.
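A small simulation can make this concrete. The sketch below assumes an exponential population (strongly right-skewed, with mean 1 and SD 1), draws many samples of size 30, and looks at the distribution of their means. The population, the number of repetitions and the seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

SAMPLE_SIZE = 30   # the rule-of-thumb sample size mentioned above
N_SAMPLES = 2000   # number of repeated samples (arbitrary choice)

def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, SD 1)."""
    return statistics.mean([random.expovariate(1.0) for _ in range(n)])

sample_means = [sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = 1.0 / SAMPLE_SIZE ** 0.5  # population SD / sqrt(n)

print("mean of sample means:", round(mean_of_means, 3))
print("SD of sample means:", round(sd_of_means, 3))
print("expected standard error:", round(expected_se, 3))
```

The sample means cluster around the population mean, and their spread is close to the population SD divided by the square root of the sample size, even though the population itself is far from normal.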

9
Q

What is the Pearson correlation coefficient?

A

A measure of linear correlation between two sets of data. It is a number between -1 and +1 that measures the strength and direction of the relationship/correlation between two variables.

10
Q

What does it mean when the Pearson correlation coefficient is:
* +1
* 0
* -1

A
  • +1: perfect positive linear association
  • 0: no linear association
  • -1: perfect negative linear association
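The three cases can be reproduced with a short hand-written implementation of the coefficient. The function name `pearson_r` and the data are assumptions for illustration. The last example shows that a coefficient of 0 only rules out a linear association: y depends perfectly on x there, but not linearly.

```python
import math

def pearson_r(x, y):
    """Pearson correlation: covariance scaled by the spread of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
r_up = pearson_r(x, [2, 4, 6, 8, 10])    # y rises linearly with x
r_down = pearson_r(x, [10, 8, 6, 4, 2])  # y falls linearly with x

# y = x squared on a symmetric range: a perfect but non-linear relationship.
x_sym = [-2, -1, 0, 1, 2]
r_curve = pearson_r(x_sym, [v ** 2 for v in x_sym])

print(r_up, r_down, r_curve)
```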
11
Q

What is the goal of inferential statistics?

A

To draw conclusions about the population beyond your data sample, using effect sizes, confidence intervals and hypothesis testing.

12
Q

Which measure is used to construct the confidence interval for a mean or proportion?

A

The standard error, which quantifies the uncertainty of an observed estimate or effect.

13
Q

What do the following terms describe:
* Sensitivity
* Specificity
* Positive predictive value (PPV)
* Negative predictive value (NPV)

A
  • Sensitivity: the probability that the test is positive in someone who truly has the condition (true positives / all subjects with the condition).
  • Specificity: the probability that the test is negative in someone who truly does not have the condition (true negatives / all subjects without the condition).
  • Positive predictive value (PPV): the proportion of positive test results that are true positives.
  • Negative predictive value (NPV): the proportion of negative test results that are true negatives.
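All four measures come from the same 2x2 table of test result against true disease status. The sketch below uses made-up counts (an assumption for illustration) to show which cells enter each calculation.

```python
# Hypothetical 2x2 table counts (made up for illustration).
tp = 90   # true positives:  condition present, test positive
fn = 10   # false negatives: condition present, test negative
fp = 30   # false positives: condition absent,  test positive
tn = 170  # true negatives:  condition absent,  test negative

sensitivity = tp / (tp + fn)  # among subjects with the condition
specificity = tn / (tn + fp)  # among subjects without the condition
ppv = tp / (tp + fp)          # among positive test results
npv = tn / (tn + fn)          # among negative test results

print("sensitivity:", sensitivity, "specificity:", specificity)
print("PPV:", ppv, "NPV:", round(npv, 3))
```

Sensitivity and specificity are calculated within the true disease groups, while PPV and NPV are calculated within the test result groups, which is why the predictive values change with the prevalence of the condition.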
14
Q

Let’s say that the sensitivity of a specific test is 0.756 and the positive predictive value is 50.0%.
Describe how you would calculate the 95% CIs for the sensitivity and PPV.

A
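  • Sensitivity: 0.756 +/- 1.96 x square root(0.756 x 0.244 / n), where n is the number of subjects who have the condition.
  • PPV: 0.5 +/- 1.96 x square root(0.5 x 0.5 / n), where n is the number of positive test results.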
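The calculation can be sketched as a small function that applies the normal approximation, proportion +/- 1.96 x standard error. The card does not give the sample sizes, so the values of n below (45 subjects with the condition, 68 positive test results) are purely hypothetical, as is the function name `proportion_ci`.

```python
import math

def proportion_ci(p, n, z=1.96):
    """Normal-approximation CI for a proportion p observed in n subjects."""
    se = math.sqrt(p * (1 - p) / n)  # standard error of the proportion
    lower = max(0.0, p - z * se)     # a proportion cannot go below 0
    upper = min(1.0, p + z * se)     # or above 1
    return lower, upper

# Hypothetical denominators: the original card does not state them.
n_with_condition = 45  # denominator for sensitivity
n_test_positive = 68   # denominator for PPV

sens_lower, sens_upper = proportion_ci(0.756, n_with_condition)
ppv_lower, ppv_upper = proportion_ci(0.5, n_test_positive)

print("sensitivity 95% CI:", round(sens_lower, 3), "to", round(sens_upper, 3))
print("PPV 95% CI:", round(ppv_lower, 3), "to", round(ppv_upper, 3))
```

A larger n gives a smaller standard error and therefore a narrower interval. For small samples or proportions near 0 or 1, this approximation is known to perform poorly and other interval methods are usually preferred.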