Midterms Flashcards

1
Q

A normally distributed sample:

A

Will follow a straight diagonal line in the Q-Q plot.

2
Q

In a Randomised Controlled Trial, if the risk of the outcome in the intervention and control groups is the same, then:

A

The relative risk is equal to 1
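
As a minimal sketch of this card, the relative risk can be computed from the event counts in each arm. The function name and the trial counts below are hypothetical, chosen only for illustration.

```python
def relative_risk(events_intervention, n_intervention, events_control, n_control):
    """Risk in the intervention group divided by risk in the control group."""
    risk_intervention = events_intervention / n_intervention
    risk_control = events_control / n_control
    return risk_intervention / risk_control

# Hypothetical trial: 20 of 100 patients have the outcome in each arm.
rr_equal = relative_risk(20, 100, 20, 100)

# Hypothetical trial: 10 of 100 in the intervention arm vs 20 of 100 controls.
rr_lower = relative_risk(10, 100, 20, 100)

print(rr_equal, rr_lower)
```

When the two risks are identical the ratio is 1; a ratio below 1 means the outcome was less frequent in the intervention group.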

3
Q

In the electrocardiogram (ECG) of a 30-year-old male patient you see an ST segment elevation, a finding which is very commonly found in acute myocardial infarction (AMI). Would you think the patient suffers from AMI?

A

I would consider this finding in the context of the patient’s pre-test probability of having an AMI, given his risks, signs and symptoms.

4
Q

Which one of the following can be used to describe a categorical variable?

A

A bar plot

5
Q

In the United States during 2019 the top 1% earned a mean of $522,000, whereas the bottom 90% earned a mean of $39,000. If we take the income distribution for “all Americans” then:

A

The mean income would be much higher than the median.
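
A small sketch of why a right-skewed distribution pulls the mean above the median. The ten incomes below are made-up values (in thousands of dollars), not real US data.

```python
import statistics

# Hypothetical incomes in thousands of dollars:
# nine modest earners and one very high earner.
incomes = [25, 30, 32, 35, 38, 40, 42, 45, 50, 520]

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print("mean:", mean_income)
print("median:", median_income)
```

The single very high income shifts the mean strongly, while the median only depends on the middle of the ordered values.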

6
Q

In a two-sample t-test, if the p-value is lower than 0.05:

A
7
Q

A high observed chi-square statistic means that:

A

The observed counts in the contingency table are very different to those we would expect if the variables were not associated.
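
The statistic can be sketched as the sum of (observed - expected)^2 / expected over all cells, where the expected counts come from the row and column totals. The function name and both tables below are hypothetical.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the variables were not associated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical table where observed counts match the expected counts exactly.
stat_none = chi_square_statistic([[20, 30], [40, 60]])

# Hypothetical table where the counts are far from what independence predicts.
stat_strong = chi_square_statistic([[45, 5], [10, 90]])

print(stat_none, stat_strong)
```

The first table gives a statistic of zero; the second gives a large value because every cell is far from its expected count.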

8
Q

Reproducible research:

A

Is a basic principle of good research.

9
Q

The normal distribution:

A

Includes 95% of the total probability within ±1.96 standard deviations of its mean.
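
This figure can be checked numerically. The sketch below uses `NormalDist` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability mass between -1.96 and +1.96 standard deviations.
prob_within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

# The z value that leaves 2.5% in the upper tail.
z_975 = standard_normal.inv_cdf(0.975)

print(round(prob_within, 4), round(z_975, 2))
```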

10
Q

In a cohort study of factory workers, we wish to examine whether occupational exposure to mineral dust is associated with lung capacity (Forced Vital Capacity [FVC], in litres). For that we can do a:

A

two-sample t-test

11
Q

When analysing data:

A

If some values are missing, we may not assume they are equal to zero.

12
Q

We can examine whether a sample follows a normal distribution using:

A

a Q-Q plot

13
Q

As a measure of location, the median

A

is more robust against skewness and outliers, compared to the mean

14
Q

A 95% Confidence interval for a mean:

A

Will contain the population mean with 95% confidence: 95% of the intervals built this way from repeated samples would contain it.

15
Q

The null hypothesis for a two-sample t-test states that:

A

The two population means are equal, and any difference in sample means is due to random error.

16
Q

A box and whisker plot:

A

Is appropriate for summarizing continuous variables.

17
Q

The following is an example of a continuous numeric variable:

A

Kidney function (eGFR, estimated glomerular filtration rate)

18
Q

Should all medical students know statistics?

A

yes, because statistics is how most medical knowledge is generated

19
Q

The central limit theorem states that:

A

The sample mean will approximately follow a normal distribution for large enough samples, even if the underlying variable is not normally distributed in the population.

20
Q

If the p-value is higher than 0.05 then:

A

We have failed to reject the null hypothesis

21
Q

The population mean and population variance

A

Are unknown, and are being estimated

22
Q

A histogram

A

Is visually distinct from a bar plot, as it has no space between bars

23
Q

The following is an example of an ordinal variable:

A

Socio-economic status

24
Q

In a randomised controlled trial of a new inhaler for asthma, we wish to compare how many patients achieved asthma remission in the intervention and control groups. For that we can do:

A
25
Q

In a random population sample of people living in the city, we wish to examine whether living within a kilometre of a green space (e.g. a park or forest) is associated with a reduced risk of obesity (body mass index >30 kg/m^2). For that we can do a:

A
26
Q

In a randomised controlled trial of a new treatment against a disease, if the odds ratio is lower than 1 and the 95% confidence interval for the odds ratio does not include 1 then:

A

the result is statistically significant

27
Q

In a small random population sample of 25 people under treatment for hypertension, we wish to examine whether old age (defined as ≥65 years, vs <65 years) is associated with abnormal blood pressure (systolic blood pressure > 140 mmHg or diastolic blood pressure > 90 mmHg). For that we can do a:

A

Fisher's exact test
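
For a 2x2 table the test can be sketched directly from the hypergeometric distribution. This is an illustrative implementation, not the routine of any particular statistics package; the function name and the 25-person table are hypothetical.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = table_probability(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Add up all tables that are at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )

# Hypothetical counts: rows = age >=65 / <65, columns = abnormal / normal BP.
p_value = fisher_exact_two_sided(8, 2, 5, 10)
print(p_value)
```

Because every possible table is enumerated exactly, the method does not rely on a large-sample approximation, which is why it suits small samples like this one.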

28
Q

Which of the following is true about the R statistical environment?

A

Missing values in R are represented by the NA special value.

29
Q

The t-distribution

A

Has fatter tails than the standard normal distribution

30
Q

A smaller standard error

A

can be achieved by taking a larger sample

31
Q

In a cross-sectional study of patients with laboratory-confirmed influenza, we wish to examine whether age (in years) is associated with influenza type (type A vs type B). For that we can do a:

A

Mann-Whitney test

32
Q

A probability distribution

A

Lets us know how likely each possible value of a variable is

33
Q

In a random population sample of people living in Larnaca province, we wish to examine whether living within 5 kilometres of the airport is associated with hearing loss (vs normal hearing). For that we can do a:

A
34
Q

Odds is

A

The probability of an event occurring, divided by the probability of it not occurring
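
A minimal sketch of this definition, together with the odds ratio built from it. The function names and the probabilities used are hypothetical.

```python
def odds(probability):
    """Probability of the event divided by the probability of no event."""
    return probability / (1 - probability)

def odds_ratio(p_intervention, p_control):
    """Odds in the intervention group divided by odds in the control group."""
    return odds(p_intervention) / odds(p_control)

# A probability of 0.2 corresponds to odds of 0.25 (1 to 4).
odds_one_in_five = odds(0.2)

# Hypothetical risks: 10% in the intervention group, 20% in the control group.
ratio = odds_ratio(0.1, 0.2)

print(odds_one_in_five, ratio)
```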

35
Q

In a random population sample of individuals unvaccinated against COVID-19, we wish to examine whether a past diagnosed COVID-19 infection is associated with the level of neutralising antibodies against the SARS-CoV-2 virus (in log AU/ml units). For that we can do a:

A

Mann-Whitney test

36
Q

In a rectangular dataset, a variable:

A
37
Q

Fisher's exact test:

A

Should be used instead of the chi-square test if any cell in the contingency table has an expected count of 5 or lower.

38
Q

The “null value” for the difference between two means:

A

is zero

39
Q

The two-sample t-test:

A

Tests whether the population means underlying two different samples are equal.

40
Q

In a small randomized controlled trial of a new antidiabetic medication, we wish to compare fasting glucose levels (in mg/dl) between the intervention and control groups. For that we can do a:

A

two-sample t-test

41
Q

What is the sample mean?

A

The sample mean is known, i.e. measured

42
Q

What is the population mean?

A

The population mean is unknown and is being estimated.

43
Q

How is clinical research done?

A
  1. Study design.
  2. Data collection.
  3. Data processing.
  4. Data analysis.
  5. Output results.
44
Q

Unit of observation

A

The unit that is described by the data. Usually the patient.

45
Q

Observation (record)

A

The rows of the table. A set of values that refer to a particular unit of observation.

46
Q

Variable (field)

A

The columns of the table. A set of values of the same type that reflect a particular characteristic of the units of observation. Has a name.

47
Q

Primary key

A

The observation ID. A variable that uniquely defines a unit of observation.

48
Q

Nominal variables:

A
  • Blood type
  • Occupation
  • Sex
  • Race
49
Q

Ordinal variables:

A
  • Likert scales (e.g. satisfaction level, socioeconomic status, etc.): Very poor/ poor/ fair/ good/ very good
  • Educational level: Primary school/ high school/ college/ postgraduate.
50
Q

Continuous variables:

A
  • Weight (kg)
  • Body Mass Index (kg/m^2)
  • Blood pressure (mmHg)
  • Blood cholesterol (mg/dl)
  • Survival time (years)
51
Q

Discrete variable:

A
  • Number of children
  • Number of deaths
  • Number of asthma attacks
52
Q

Indexing

A

Selecting elements from a vector or other object.

53
Q

Frequency table

A

For ordinal variables it's the same: we just adhere to the ordering. In addition, we can cumulate frequencies. We can even tabulate discrete numeric variables, provided the number of categories is small, and also cumulate them (since numbers have an ordering).
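
A frequency table with cumulative counts can be sketched as follows. The category labels and responses are made up; the only point is that the table follows the ordering of the levels.

```python
from collections import Counter
from itertools import accumulate

# Ordered levels of a hypothetical ordinal variable.
levels = ["poor", "fair", "good"]
responses = ["good", "fair", "poor", "good", "fair", "good", "fair", "fair"]

counts = Counter(responses)
frequencies = [counts[level] for level in levels]
# Running totals only make sense because the levels are ordered.
cumulative = list(accumulate(frequencies))

for level, freq, cum in zip(levels, frequencies, cumulative):
    print(f"{level:5} {freq:3} {cum:3}")
```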

54
Q

Pie charts

A
  • Illustrate relative frequencies (up to 100%)
  • Require few categories and relatively large differences, so use sparingly
  • Color gradient often used to indicate ordering, vs different colors for nominal variables
55
Q

Bar plots

A
  • A better way to illustrate categorical variables
  • Can be stacked, proportional, horizontal or vertical, etc.
  • Bars should be same width, with space in-between.
56
Q

Histograms

A
  • Show the distribution of grouped (binned) numeric variables
  • Same bin width, NO space in between (vs bar plot)
  • To show that this is a continuous numeric variable
57
Q

Mean

A

The sum of the values divided by the number of values: (Σ x)/n

58
Q

Median

A

Splits the values in the middle – into a lower and a higher half.

59
Q

Quantile

A

splits the values by a certain proportion:
- E.g. 10th percentile is the value that separates the lower 10% from the higher 90%
- Median is the 50% quantile
- “Quartiles”: 1st (25%), 2nd (50%), 3rd (75%)
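
The mean, median and quartiles from the last three cards can be computed with Python's standard `statistics` module. The blood pressure values are hypothetical, and note that different software packages use slightly different conventions for interpolating quantiles.

```python
import statistics

# Hypothetical systolic blood pressures (mmHg).
values = [110, 115, 120, 125, 130, 135, 140, 145, 150]

mean_value = statistics.mean(values)      # sum of values / number of values
median_value = statistics.median(values)  # middle of the ordered values

# n=4 returns the three cut points that split the data into quarters.
q1, q2, q3 = statistics.quantiles(values, n=4)

print(mean_value, median_value, (q1, q2, q3))
```

The second quartile coincides with the median, as the card states.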

60
Q

Variance:

A

The “expectation” (average) of the squared deviation of a variable from its mean: Var(X) = E[(X − μ)^2].
Standard deviation: the square root of the variance.
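
A short sketch of both quantities on made-up data. It also shows the sample variance, which divides by n - 1 rather than n; that distinction is not on the card and is included here only as context.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

# Population variance: average squared deviation from the mean (divide by n).
population_variance = statistics.pvariance(values)
population_sd = statistics.pstdev(values)

# Sample variance: same sum of squares, divided by n - 1.
sample_variance = statistics.variance(values)

print(population_variance, population_sd, sample_variance)
```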

61
Q

Range

A

The difference between the highest and lowest value

62
Q

Box plots

A

Middle line: the median.
Hinges: the 1st and 3rd quartiles.
Whiskers: usually extend up to 1.5 × IQR beyond the hinges (alternatives: the full range, 2nd/98th percentiles, etc.).
Outliers: any values beyond the whiskers.
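
The elements of a box plot can be sketched numerically, assuming the common 1.5 × IQR rule for the whiskers. The data are hypothetical, and the quartile values depend on the interpolation convention used by `statistics.quantiles`.

```python
import statistics

# Hypothetical systolic blood pressures (mmHg) with one extreme value.
values = [110, 115, 120, 125, 130, 135, 140, 145, 150, 210]

q1, median, q3 = statistics.quantiles(values, n=4)  # hinges and median
iqr = q3 - q1                                       # interquartile range

# Whiskers reach at most 1.5 x IQR beyond the hinges.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]
print(q1, median, q3, outliers)
```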

63
Q

Probability distribution

A

A mathematical function that gives the probabilities of occurrence of the different possible values of a “random variable”.
- Random variable = a variable whose values depend on outcomes of a random phenomenon or experiment.

64
Q

Discrete probability distribution:

A

For categorical or discrete numeric variables
- Probability of occurrence, for each value

65
Q

Continuous probability distribution:

A

For continuous numeric variables
- Probability density or cumulative probability for each value

66
Q

The normal distribution

A

Described by just two parameters: μ and σ.
We can “standardize” a normally distributed variable by subtracting its mean and dividing by its SD, i.e. convert it to a z-score. This shows where a value is placed in the standard normal distribution.

A symmetric unimodal distribution: mean = median = mode (all equal to 0 in the standard normal distribution). For any range of values, we can calculate its probability of occurrence (area under the curve).
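
Standardizing a value and finding the probability of a range can be sketched with `NormalDist` from the standard library. The height distribution (mean 170 cm, SD 10 cm) is a hypothetical example.

```python
from statistics import NormalDist

# Hypothetical normally distributed variable: height in cm.
height = NormalDist(mu=170, sigma=10)

# z-score: subtract the mean and divide by the SD.
z = (185 - height.mean) / height.stdev

# Probability of a range = area under the curve between its limits.
prob_160_to_180 = height.cdf(180) - height.cdf(160)

print(z, round(prob_160_to_180, 4))
```

The range 160 to 180 cm is the mean plus or minus one SD, which covers roughly 68% of the probability.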

67
Q

Why is the normal distribution important?

A
  1. Many variables follow a normal distribution. Variables such as height, weight, blood pressure, most physiological measurements, course grades, etc.
  2. The “central limit theorem”
    The means of random samples from any distribution will approximately follow a normal distribution, even if the underlying variable is NOT normally distributed in the population. The standard deviation (SD) of this distribution of sample means is called the standard error (SE).
68
Q

Q-Q plots

A

Plots sample quantiles against the corresponding quantiles of the standard normal distribution. If the variable follows a normal distribution, the points should lie on a straight diagonal line (the line y = x if the sample has been standardized).

69
Q

Standard error:

A

The standard deviation of the sample mean. It is inversely proportional to the square root of the sample size.
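
In formula form SE = SD / sqrt(n). A minimal sketch with hypothetical numbers shows that quadrupling the sample size halves the standard error.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

se_n25 = standard_error(sd=10, n=25)
se_n100 = standard_error(sd=10, n=100)

print(se_n25, se_n100)
```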

70
Q

Central limit theorem:

A

The sample mean will approximately follow a normal distribution, even if the underlying variable is NOT normally distributed in the population. E.g. coin flips.
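
The coin flip example can be simulated. A single flip is far from normal (it is only 0 or 1), yet the means of many samples cluster around 0.5 with a spread close to the theoretical standard error. The sample size, number of repetitions and seed are arbitrary choices for this sketch.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

def mean_of_flips(n_flips):
    """Proportion of heads in n_flips fair coin flips (heads = 1, tails = 0)."""
    return statistics.mean(random.randint(0, 1) for _ in range(n_flips))

# 2000 samples of 100 flips each.
sample_means = [mean_of_flips(100) for _ in range(2000)]

centre = statistics.mean(sample_means)
spread = statistics.stdev(sample_means)

# Theoretical SE for 100 fair flips: 0.5 / sqrt(100) = 0.05.
within = sum(abs(m - 0.5) <= 1.96 * 0.05 for m in sample_means) / len(sample_means)

print(centre, spread, within)
```

Roughly 95% of the sample means fall within 1.96 standard errors of 0.5, as a normal distribution would predict.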

71
Q

95% Confidence interval:

A

A fundamental consequence of the central limit theorem.
To create a 95% confidence interval for the mean of any numeric variable, we add and subtract 1.96 SEs from its mean. This works because of the central limit theorem, and because in the standard normal distribution ±1.96 contains 95% of the total probability. This applies to large samples, however. For smaller samples there is some uncertainty (imprecision) about the sample standard deviation and thus the standard error.
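
A sketch of the large-sample calculation described above. The summary statistics (mean 120, SD 15, n = 100) are hypothetical; for small samples a t-distribution multiplier would replace 1.96.

```python
import math

def confidence_interval_95(mean, sd, n):
    """Large-sample 95% CI for a mean: mean +/- 1.96 standard errors."""
    se = sd / math.sqrt(n)
    return mean - 1.96 * se, mean + 1.96 * se

lower, upper = confidence_interval_95(mean=120, sd=15, n=100)
print(round(lower, 2), round(upper, 2))
```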

72
Q

Random error:

A

Any difference between sample mean and population mean that is attributable to the sampling.

73
Q

Two-sample t-test

A

The two-sample t-test is used to determine if two population means are equal. A common application is to test if a new process or treatment is superior to a current process or treatment.
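
The test statistic can be sketched as the difference in sample means divided by its standard error. The version below is Welch's form, which does not assume equal variances; the card does not say which variant is meant, so this is an assumption. The glucose values are made up.

```python
import math
import statistics

def welch_t_statistic(sample_a, sample_b):
    """Difference in sample means divided by its standard error (Welch)."""
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator).
    var_a = statistics.variance(sample_a)
    var_b = statistics.variance(sample_b)
    se_difference = math.sqrt(var_a / len(sample_a) + var_b / len(sample_b))
    return (mean_a - mean_b) / se_difference

# Hypothetical fasting glucose values (mmol/l) in two groups.
control = [5.1, 4.9, 5.0, 5.2, 4.8]
intervention = [4.1, 3.9, 4.0, 4.2, 3.8]

t_statistic = welch_t_statistic(control, intervention)
print(t_statistic)
```

A t statistic far from zero means the observed difference is large relative to its random error; the p-value is then read from the t-distribution.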

74
Q

Wilcoxon-Mann-Whitney test (or Mann-Whitney test)

A

A non-parametric test. Assesses whether the two samples come from the same distribution. The Mann-Whitney test is the non-parametric “brother” of the two-sample t-test.
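
The idea behind the test can be sketched through its U statistic, which only uses the ordering of the values, not their size. The function name and the data are hypothetical, and the p-value calculation is left out.

```python
def mann_whitney_u(sample_a, sample_b):
    """U statistic for sample_a: number of (a, b) pairs with a > b; ties count 0.5."""
    u = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Hypothetical samples with no overlap at all.
low = [1, 2, 3]
high = [4, 5, 6]

u_low = mann_whitney_u(low, high)
u_high = mann_whitney_u(high, low)
print(u_low, u_high)
```

The two U values always add up to the number of pairs (here 3 × 3 = 9). If the samples came from the same distribution, each would be close to half of that.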

75
Q

Hypothesis testing:

A

Calculating a 95% CI for the mean difference is a good way to indicate its overall precision: we get a range of plausible values for the difference. Often, though, we are just interested in choosing between two alternatives:

Null hypothesis (H0): both population means are the same, μ1 = μ2, and any difference in sample means is due to random error.

Alternative hypothesis (H1): the population means are actually different, μ1 ≠ μ2, and that is the cause of the difference in sample means.

This is called hypothesis testing.

76
Q

P-value

A

The p-value is the probability of getting this or a more extreme result if the null hypothesis is true.
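
A sketch of this definition for a standardized test statistic, assuming the large-sample normal approximation (a t-test would use the t-distribution instead). The function name is hypothetical.

```python
from statistics import NormalDist

def two_sided_p_value(z):
    """Probability of a result at least as extreme as z, in either direction,
    if the null hypothesis is true (normal approximation)."""
    upper_tail = 1 - NormalDist().cdf(abs(z))
    return 2 * upper_tail

p_at_cutoff = two_sided_p_value(1.96)
p_no_difference = two_sided_p_value(0.0)

print(round(p_at_cutoff, 3), p_no_difference)
```

A statistic of 1.96 sits right at the conventional 0.05 threshold, which matches the ±1.96 used for 95% confidence intervals.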