Clinical Data Analysis Flashcards

1
Q

What is ontology?

A
2
Q

What is epistemology?

A
3
Q

What are the 2 main types of stats?

A

-Descriptive
-Inferential

4
Q

What do descriptive stats do?

A

Summarising large data sets

5
Q

What are the 4 types of variables?

A

-Dependent
-Independent
-Confounding/extraneous
-Control

6
Q

What is the dependent variable?

A

Variable measured e.g., heart rate

7
Q

What is the independent variable?

A

Variable manipulated e.g., drug dosage

8
Q

What are confounding variables?

A

Variables that have a hidden effect on your DV

9
Q

What are control variables?

A

Variables held constant

10
Q

Can a variable be an IV, DV, confounding or control variable in different scenarios?

A

Yes - they are NOT exclusive - context influences this

11
Q

What are the 4 types of data?

A

-Nominal
-Ordinal
-Interval
-Ratio

12
Q

What is meant by data either being:
-Discrete
or
-Continuous

A

-Discrete = data takes only certain separate values (e.g., whole-number counts)
-Continuous = data can take any value within a range

13
Q

What is nominal data?

A

-Categorical
-Discrete
-Mutually exclusive groups
-No intrinsic order
e.g., blood groups

14
Q

What is ordinal data?

A

-Discrete
-Intrinsically ordered
e.g., places in a race

15
Q

What is interval data?

A

-Discrete or continuous
-Intrinsically ordered
-Difference between values is meaningful & consistent
e.g., temperature in °C

16
Q

What is ratio data?

A

-Interval - with an absolute zero point
-Can use ratios
e.g., length

17
Q

What are measures of central tendency?

A

-Difference in scales = differences in representation
–> each measure makes assumptions about the nature of the data

18
Q

What are the 3 types of measures of central tendency?

A

-Mean/average
-Median
-Mode
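
The three measures can be sketched with Python's standard `statistics` module. The heart-rate values below are hypothetical and chosen only so that one extreme value pulls the mean above the median.

```python
import statistics

# Hypothetical resting heart rates (beats per minute), for illustration only.
heart_rates = [60, 62, 62, 65, 70, 71, 100]

mean_hr = statistics.mean(heart_rates)      # sum of values / number of values
median_hr = statistics.median(heart_rates)  # middle value once sorted
mode_hr = statistics.mode(heart_rates)      # most frequent value

print(mean_hr, median_hr, mode_hr)
```

Here the single high reading (100) raises the mean but leaves the median and mode unchanged.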

19
Q

What are the 4 types of measures of dispersion (spread of distribution)?

A

-Standard deviation
-Variance (squared deviation from the mean)
-Quantiles: interquartile range, percentiles
-Range
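
A minimal sketch of variance, standard deviation and range, assuming the values are treated as a whole population (so the `p`-prefixed functions are used; `statistics.variance` and `statistics.stdev` would give the sample versions). The data are hypothetical.

```python
import statistics

# Hypothetical data set, treated as a complete population (an assumption).
values = [2, 4, 4, 4, 5, 5, 7, 9]

variance = statistics.pvariance(values)   # mean squared deviation from the mean
sd = statistics.pstdev(values)            # square root of the variance
value_range = max(values) - min(values)   # largest value minus smallest value

print(variance, sd, value_range)
```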

20
Q

For a normal distribution curve - what would it look like & what % of the data values lie within 1SD, 2SDs & 3SDs of the mean?

A

-Bell shaped curve

*68.27% of values lie within 1SD of mean (i.e., ~34.1% on each side)
*95.45% of values lie within 2SDs of mean (i.e., ~47.7% on each side)
*99.73% of values lie within 3SDs of mean (i.e., ~49.9% on each side)
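
These percentages can be checked numerically. The sketch below assumes an ideal normal distribution and uses the standard identity that the fraction within k SDs of the mean is erf(k/√2); the helper name `fraction_within` is made up for this example.

```python
import math

def fraction_within(k: float) -> float:
    """Fraction of an ideal normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    within = fraction_within(k)
    # By symmetry, half of this fraction lies on each side of the mean.
    print(f"{k} SD: {within:.2%} in total, {within / 2:.2%} on each side")
```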

21
Q

When would you use standard deviation for a data set?

A

-When data has normal distribution
–> as SD assumes data symmetry

22
Q

Which measure of central tendency can be used (along with SD) for normal distribution data sets?

A

Mean

23
Q

When would you use box plots - i.e., interquartile range?

A

When data is skewed (positively or negatively) –> as there is no symmetry about the mean, so the SD percentages (e.g., 99.73% of data within 3SDs of mean) don’t apply
-Look at the median (50th percentile) & the middle 50% of the data, which lies between the 25th & 75th percentiles

24
Q

Which measure of central tendency can be used (along with IQR) for skewed data?

A

Median

25
Q

What is IQR?

A

The difference between the 75th & 25th percentiles of the data (IQR = Q3 - Q1)
Q1 = 25th percentile (25% of data values lie below it)
Q2 = 50th percentile (median)
Q3 = 75th percentile (75% of data values lie below it)
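
A small sketch of the quartiles and IQR using `statistics.quantiles`. The lengths of stay are hypothetical, and the cut points depend on the interpolation method: Python's default is "exclusive", and other software may give slightly different quartiles.

```python
import statistics

# Hypothetical lengths of hospital stay in days, with one extreme value (30).
stays = [1, 2, 3, 4, 5, 6, 30]

# n=4 splits the data into quarters and returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = statistics.quantiles(stays, n=4)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

The extreme value of 30 days does not move the quartiles, which is why the median and IQR suit skewed data.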

26
Q

Label this box and whisker plot or dot plot (v. similar) with:
-IQR
-Mean
-Median
-Range

A
27
Q

Describe which measures of central tendency pair up with which measures of dispersion?

A

-Mean –> Standard deviation

-Median –> Quantiles i.e., Interquartile range

-Mode –> very uncommon to see a measure of dispersion with it - the mode is used for categorical data (mutually exclusive groups - where spread is meaningless)

28
Q

Summarise which types of data can have different calculations done on them.

A
29
Q

Why is showing the distribution of data important?

A

-So can eyeball the data shape
–> to choose best data representation

30
Q

Describe normal distribution.

A

-Data is symmetrical around mean
-99.73% of data lies within 3SDs of mean

31
Q

Describe skewed distribution.

A

-Loss of data symmetry
-Skew = refers to which end the tail of data lies (where data is pulled to)
+ve skew = spread is pulled to right by larger values (tail end on right)
-ve skew = spread is pulled to left by lower values (tail end on left)

32
Q

What is kurtosis?

A

-Measure of ‘tailedness’ of data
-Loss of bell shape

33
Q

What is leptokurtic distribution?

A

High peak around mean
+ve kurtosis

34
Q

What is platykurtic distribution?

A

Fairly flat distribution - low peak - quite uniform
-ve kurtosis

35
Q

What is bimodal distribution?

A

2 peaks shown in data

36
Q

How is probability/risk calculated?

A

p + q = 1 (p = probability of the event occurring, q = probability of it not occurring)

37
Q

How is the odds of having a disease calculated?

A

Odds = p / q
q = 1 - p
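
A minimal sketch of turning a probability into odds, assuming p is the probability of having the disease and q = 1 - p. The function name and the 20% figure are hypothetical.

```python
def odds_from_probability(p: float) -> float:
    """Convert a probability p into odds p / q, where q = 1 - p."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    q = 1 - p
    return p / q

# Hypothetical example: a 20% risk corresponds to odds of about 0.25 (1 to 4).
print(odds_from_probability(0.2))
```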

38
Q

What are the types of rate & risks?

A

-Absolute risk
-Attributable risk (risk difference, excess risk)
-Number needed to treat

39
Q

What is absolute risk?

A

Prevalence (current no. cases) & incidence rates (no. new cases)

40
Q

What is attributable risk?

A

-Difference between 2 probabilities
-Exposed vs unexposed groups
+ve intervention = AR dec (p placebo - p drug)
-ve intervention = AR inc (p smoking - p non-smoking)

41
Q

What is number needed to treat?

A

-No. patients you need to treat to prevent 1 additional bad outcome (or harm reversed)

NNT = 1 / attributable risk reduction (ARR)

42
Q

Calculate absolute risk.

A

*Vaccine, hospitalisation = 20/428 = 0.047
*Vaccine, no hospitalisation = 408/428 = 0.953

*No Vaccine, hospitalisation = 42/438 = 0.096
*No Vaccine, no hospitalisation = 396/438 = 0.904
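
The same four risks can be reproduced from the counts in the answer above (20 of 428 vaccinated and 42 of 438 unvaccinated patients hospitalised). The dictionary layout and function name are assumptions made for this sketch.

```python
# Counts taken from the worked answer: outcome counts per group.
vaccine = {"hospitalised": 20, "not_hospitalised": 408}
no_vaccine = {"hospitalised": 42, "not_hospitalised": 396}

def absolute_risk(group: dict, outcome: str) -> float:
    """Absolute risk = number with the outcome / total number in the group."""
    return group[outcome] / sum(group.values())

for name, group in (("Vaccine", vaccine), ("No vaccine", no_vaccine)):
    for outcome in group:
        print(f"{name}, {outcome}: {absolute_risk(group, outcome):.3f}")
```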

43
Q

Calculate attributable risk (risk difference).

A

*Hospitalisation = no vaccine risk – vaccine risk = 0.096 – 0.047 = 0.049
*No Hospitalisation = No vaccine risk – vaccine risk = 0.904 – 0.953 = -0.049

44
Q

Calculate number needed to treat

A

NNT = 1/ARR = 1/0.049 = 20.408
-So need to treat 21 people
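
A short sketch of the risk difference and NNT using the hospitalisation counts from the vaccine example (20/428 vs 42/438). It works from unrounded risks, so the NNT comes out near 20.3 rather than 20.408; either way it rounds up to 21 whole patients.

```python
import math

# Hospitalisation risks from the vaccine example, left unrounded.
risk_vaccine = 20 / 428
risk_no_vaccine = 42 / 438

# Risk difference: unexposed (no vaccine) risk minus exposed (vaccine) risk.
arr = risk_no_vaccine - risk_vaccine
nnt = 1 / arr

# Patients come in whole numbers, so round up.
patients_to_treat = math.ceil(nnt)

print(round(arr, 3), round(nnt, 2), patients_to_treat)
```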

45
Q

Calculate relative risk.

A

= Absolute risk (vaccine) / absolute risk (no vaccine)
= (20/428) / (42/438)
= 0.487
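
A minimal check of the relative risk, assuming it is taken as the vaccine group's hospitalisation risk divided by the no-vaccine group's risk, using the counts from the vaccine example.

```python
# Hospitalisation risks from the vaccine example, left unrounded.
risk_vaccine = 20 / 428
risk_no_vaccine = 42 / 438

# Relative risk < 1 means hospitalisation is less likely in the vaccine group.
relative_risk = risk_vaccine / risk_no_vaccine

print(round(relative_risk, 3))
```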

46
Q

Calculate odds ratio.

A

Vaccine, hospitalisation odds = 0.049
No Vaccine, hospitalisation odds = 0.106

Odds ratio = 0.049/0.106 = 0.462
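
The odds above follow from dividing hospitalised by non-hospitalised counts within each group (20 vs 408 with the vaccine, 42 vs 396 without). A small sketch under that reading:

```python
# Odds = number with the outcome / number without it, within each group.
odds_vaccine = 20 / 408
odds_no_vaccine = 42 / 396

# Odds ratio: odds in the vaccine group relative to the no-vaccine group.
odds_ratio = odds_vaccine / odds_no_vaccine

print(round(odds_vaccine, 3), round(odds_no_vaccine, 3), round(odds_ratio, 3))
```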

47
Q

Calculate the Sensitivity i.e., the percentage of patients with the condition (gastric cancer) whom the test correctly identifies as having it. This is the percentage of true positives.

A

True positives = 20. Total number of disease present observations = 25.
20/25 = 80%

48
Q

Calculate the Specificity i.e., the percentage of patients without the condition whom the test correctly identifies as not having it. This is the percentage of true negatives.

A

True negatives = 45. Total number of disease absent observations = 75.
45/75 = 60%
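
Both calculations can be sketched from the counts in the two answers. The false negative and false positive counts are not stated on the cards; they are derived here from the totals (25 - 20 and 75 - 45), which is an assumption of this example.

```python
# Counts from the answers above; FN and FP are derived from the stated totals.
true_positives = 20
false_negatives = 25 - true_positives   # disease present, test negative
true_negatives = 45
false_positives = 75 - true_negatives   # disease absent, test positive

# Sensitivity: share of diseased patients the test correctly flags.
sensitivity = true_positives / (true_positives + false_negatives)
# Specificity: share of disease-free patients the test correctly clears.
specificity = true_negatives / (true_negatives + false_positives)

print(f"Sensitivity: {sensitivity:.0%}, Specificity: {specificity:.0%}")
```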