Statistics Flashcards

1
Q

Normal distribution

A

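Describes the probability of observing a given value (or range of values) of a variable in a population.
Bell-shaped and symmetric around the mean; the mean, median and mode are equal.
Doesn’t change with a change in sample size (it describes the population).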

2
Q

What portion of the population is represented by mean +/- 1 standard deviation

A

Mean +/- 1 standard deviation = about 68% of the population

3
Q

What portion of the population is represented by mean +/- 2 standard deviations

A

Mean +/- 2 standard deviations = about 95% of the population (strictly, mean +/- 1.96 SD covers 95%)

4
Q

What portion of the population is represented by mean +/- 3 standard deviations

A

99.7%
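The 68/95/99.7 figures can be checked with a minimal Python sketch (standard library only). For any normal distribution, the proportion lying within k SDs of the mean is erf(k/√2), whatever the mean and SD; the function name is illustrative.

```python
import math

def proportion_within(k: float) -> float:
    """Proportion of a normal population lying within mean +/- k SD."""
    # P(-k < Z < k) for a standard normal Z equals erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2, 3):
    print(f"mean +/- {k} SD: {proportion_within(k):.1%}")
```

This prints roughly 68.3%, 95.0%, 95.4% and 99.7%, which is why 1.96 (not 2) SD is used for an exact 95%.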

5
Q

What can you construct using the normal distribution

A

Reference interval (e.g. a 95% reference interval = mean +/- 1.96 SD)
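A minimal Python sketch of a 95% reference interval, assuming the measurements are normally distributed. The function name and the "healthy volunteer" values are hypothetical; a real reference interval would need a much larger sample.

```python
import statistics

def reference_interval(values, z=1.96):
    """Approximate reference interval: mean +/- z * SD (assumes normally distributed values)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 denominator)
    return mean - z * sd, mean + z * sd

# Hypothetical measurements from healthy volunteers
healthy = [4.1, 4.5, 4.8, 5.0, 5.2, 5.5, 5.9]
low, high = reference_interval(healthy)
print(f"95% reference interval: {low:.2f} to {high:.2f}")
```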

6
Q

Standard error

A

Not a measure of variability in the population.

It is the standard deviation of the sampling distribution. It measures the precision of the estimate (i.e. how reliable our sample mean is as an estimate of the population mean).

Inversely proportional to the square root of the sample size (it gets smaller as the sample size increases).

Standard error: quantifies the variation in means from multiple sets of measurements.

Standard deviation: quantifies the variation within a set of measurements.

7
Q

Define variance and standard deviation

A

Range, standard deviation and variance all measure the spread or variability of a data set, i.e. dispersion.

Range = largest value - smallest value

Variance and standard deviation are closely related: variance is the SD squared.
Variance = the average of the squared differences from the mean. Because the differences are squared, large deviations (outliers) have a big influence on it.

Standard deviation = square root of variance.
A measure of how spread out the numbers are. Because it is in the same units as the data, the SD is easier to interpret as a typical distance from the mean.
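A short Python sketch using the standard `statistics` module shows range, variance and SD side by side. The data are illustrative. Note that the module distinguishes population variance (divide by n) from sample variance (divide by n - 1, used when estimating from a sample).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)       # 9 - 2 = 7
pop_var = statistics.pvariance(data)     # average squared deviation from the mean (divide by n)
pop_sd = statistics.pstdev(data)         # square root of the population variance
sample_var = statistics.variance(data)   # divide by n - 1 when estimating from a sample
sample_sd = statistics.stdev(data)       # square root of the sample variance

print(data_range, pop_var, pop_sd, sample_var, sample_sd)
```

Here the mean is 5, the squared deviations sum to 32, so the population variance is 32/8 = 4 and the SD is 2.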

8
Q

Define sensitivity

A

Proportion of people with the disease who test positive
Probability of a positive test given you have the disease

9
Q

Define specificity

A

Proportion of people without the disease who test negative
Probability of a negative test given you don’t have the disease

10
Q

Define PPV

A

Of those with a positive test, the proportion who have the disease.
Not an intrinsic property of the test itself; it is influenced by the prevalence of the disease.
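The effect of prevalence on PPV can be shown with a minimal Python sketch based on Bayes' theorem. The test characteristics (90% sensitive, 95% specific) and the prevalences are hypothetical.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from test characteristics and disease prevalence (Bayes' theorem)."""
    true_pos = sensitivity * prevalence                # P(test + and disease)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(test + and no disease)
    return true_pos / (true_pos + false_pos)

# The same hypothetical test applied in populations with different prevalence
for prev in (0.01, 0.10, 0.50):
    print(f"prevalence {prev:.0%}: PPV = {ppv(0.90, 0.95, prev):.1%}")
```

With the test unchanged, PPV rises from about 15% at 1% prevalence to about 67% at 10% and about 95% at 50%.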

11
Q

Define NPV

A

Of those with a negative test, the proportion who don’t have the disease

12
Q

Define accuracy

A

How close a set of measurements (observations or readings) is to the true value.

13
Q

What is a continuous diagnostic test

A

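A test whose result is a measurement on a continuous scale.

We decide where to put the cut-off between a positive and a negative result (somewhat arbitrary).
If we alter the cut-off, we change the sensitivity and specificity: as one increases, the other decreases.

Raising the cut-off (when higher values indicate disease) increases specificity and decreases sensitivity.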
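The cut-off trade-off can be sketched in Python, assuming a result at or above the cut-off counts as positive (i.e. higher values indicate disease). The function name and test values are hypothetical.

```python
def sens_spec_at_cutoff(diseased, healthy, cutoff):
    """Classify values >= cutoff as positive (assumes higher values indicate disease)."""
    sensitivity = sum(v >= cutoff for v in diseased) / len(diseased)
    specificity = sum(v < cutoff for v in healthy) / len(healthy)
    return sensitivity, specificity

# Hypothetical test results for people with and without the disease
diseased = [5, 6, 7, 8, 9]
healthy = [2, 3, 4, 5, 6]
for cutoff in (5, 7):
    print(cutoff, sens_spec_at_cutoff(diseased, healthy, cutoff))
```

Raising the cut-off from 5 to 7 moves sensitivity from 1.0 to 0.6 and specificity from 0.6 to 1.0.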

14
Q

What is a sampling distribution

A

The distribution of a statistic (e.g. the sample mean), considered as a random variable, when it is derived from repeated random samples of size n.
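A sampling distribution can be approximated by simulation. This Python sketch assumes a hypothetical normal population (mean 100, SD 15), draws 2000 random samples of size 25 and records each sample mean; the SD of those means approximates the standard error.

```python
import random
import statistics

random.seed(1)
POP_MEAN, POP_SD, n = 100, 15, 25  # hypothetical population and sample size

# Draw many random samples of size n and record each sample's mean
sample_means = [
    statistics.mean(random.gauss(POP_MEAN, POP_SD) for _ in range(n))
    for _ in range(2000)
]

print(statistics.mean(sample_means))   # close to the population mean, 100
print(statistics.stdev(sample_means))  # close to 15 / sqrt(25) = 3
```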

15
Q

How do you calculate the SE

A

SE = Standard deviation / square root of sample size
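The formula as a one-line Python function (the name is illustrative). The second call shows the square-root relationship: quadrupling the sample size only halves the SE.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """SE of the mean = SD / sqrt(n)."""
    return sd / math.sqrt(n)

print(standard_error(10, 25))   # 2.0
print(standard_error(10, 100))  # 1.0
```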

16
Q

If there is no true difference between populations, what is the mean of the difference between sample means

A

0

17
Q

What is the P value

A

The probability of observing a difference in the sample means this large or larger just by chance, if there were no true difference (i.e. if the null hypothesis were true)
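The definition can be made concrete with an exact permutation test, used here purely as an illustration (t-tests are more common in practice). If there is no true difference, the group labels are interchangeable, so the p-value is the fraction of all possible relabellings that give a difference at least as large as the one observed. The data are hypothetical.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test for a difference in means.

    Under the null hypothesis every split of the pooled data into groups
    of the original sizes is equally likely.
    """
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(group_a)):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Count splits giving a difference "this large or larger"
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

print(permutation_p_value([1, 2, 3, 4, 5], [11, 12, 13, 14, 15]))
```

Only 2 of the 252 possible splits are as extreme as the observed one, so p = 2/252, about 0.008.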

18
Q

Define the null hypothesis

A

No true difference.

19
Q

Define the alternative hypothesis

A

True difference

20
Q

What are the steps in a hypothesis test

A
  1. Set up a null and an alternative hypothesis
  2. Set the significance level, alpha (usually 0.05)
  3. Calculate the probability of the observed effect (or a more extreme one) under the assumption that the null hypothesis is true (the p-value)
  4. If the data are too unusual to be consistent with the null hypothesis, conclude that it is not true: p < 0.05, reject the null hypothesis; p ≥ 0.05, do not reject the null hypothesis

(We never "accept" the null hypothesis or "reject" the alternative; we only reject or fail to reject the null.)
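The four steps can be traced in a minimal Python sketch of a one-sample z-test. It assumes the population SD is known (a simplification; a t-test is usually used when the SD is estimated from the sample), and the function names and example numbers are hypothetical.

```python
import math

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z."""
    return math.erfc(abs(z) / math.sqrt(2))

def one_sample_z_test(sample_mean, null_mean, sd, n, alpha=0.05):
    # Step 1: H0: population mean = null_mean; H1: population mean != null_mean
    # Step 2: significance level alpha (0.05 by default)
    se = sd / math.sqrt(n)
    z = (sample_mean - null_mean) / se
    # Step 3: probability of a result this extreme if H0 were true
    p = two_sided_p_from_z(z)
    # Step 4: reject H0 only if the data are too unusual under H0
    decision = "reject H0" if p < alpha else "do not reject H0"
    return z, p, decision

# Hypothetical study: sample mean 103, null value 100, known SD 15, n = 100
print(one_sample_z_test(103, 100, 15, 100))
```

Here z = 2.0 and p is about 0.0455, so H0 is rejected at alpha = 0.05.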

21
Q

What is type 1 error

A

The error of concluding there is a difference when there is not (false positive)

22
Q

What is type 2 error

A

The error of concluding there is no difference when in fact there is (false negative)
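Both error types can be seen by simulating many studies. This Python sketch assumes a z-test of H0: mean = 0 with a known SD of 1 and samples of size 25; the true mean of 0.5 in the second scenario is hypothetical.

```python
import math
import random

def z_test_rejects(true_mean, n, alpha=0.05):
    """Simulate one study: sample n values from N(true_mean, 1) and test H0: mean = 0."""
    sample_mean = sum(random.gauss(true_mean, 1) for _ in range(n)) / n
    z = sample_mean / (1 / math.sqrt(n))   # known SD of 1 assumed
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
    return p < alpha

random.seed(42)
N_STUDIES, n = 4000, 25

# H0 true (no difference): every rejection is a type 1 error
type1_rate = sum(z_test_rejects(0.0, n) for _ in range(N_STUDIES)) / N_STUDIES
# H0 false (true mean 0.5): every failure to reject is a type 2 error
type2_rate = sum(not z_test_rejects(0.5, n) for _ in range(N_STUDIES)) / N_STUDIES

print(f"type 1 error rate ~ {type1_rate:.3f}")
print(f"type 2 error rate ~ {type2_rate:.3f}")
```

The type 1 rate settles near alpha (0.05); for these settings the theoretical type 2 rate is about 0.29.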

23
Q

A statistically significant difference could mean either

A

There is a true difference, OR there is no true difference and this study just observed unusual results by chance (Type 1 error, false positive)

24
Q

No statistically significant difference could mean either

A

There is no difference
OR
There is a true difference but we did not detect it with this study (Type 2 error, false negative)

25
Q

What is continuous data

A

Numerical data that can take any value within a range (limited only by the precision of measurement)
e.g. height, weight, serum bilirubin

26
Q

What is paired data

A

Two sets of numbers in which the same variable has been measured on the same subjects, usually at two different times or under two different conditions,
e.g. before and after a treatment.

In parallel-group clinical trials, unpaired t tests are used because patients are randomised into separate groups (e.g. type of anaesthetic used for an operation).

Paired t tests remove subject-to-subject variation, because each subject acts as their own control.
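A paired t test works on the within-subject differences. This Python sketch computes the t statistic and degrees of freedom; the function name and the before/after values are hypothetical.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """t = mean difference / (SD of differences / sqrt(n)), with n - 1 degrees of freedom."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

# Hypothetical measurements on the same five patients before and after treatment
before = [10, 12, 9, 11, 13]
after = [8, 11, 7, 10, 11]
t, df = paired_t_statistic(before, after)
print(f"t = {t:.2f} on {df} degrees of freedom")  # t = 6.53 on 4 degrees of freedom
```

Because each difference is taken within one patient, the large patient-to-patient spread in baseline values does not enter the SE.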

27
Q

What are ordinal scales

A

Have mutually exclusive classes, but there is an order between them.

i.e. can be ranked or ordered, falling between two extremes

Can be given as frequencies

Calculate the mean only with caution, because the intervals between categories are not necessarily equal.

28
Q

What is nominal data

A

Also known as categorical or qualitative.

Consists of classifying the observations into mutually exclusive classes.

i.e. can be put into categories, but no specific hierarchy exists

e.g. sex, colour

Can be given as frequencies

The mean cannot be calculated.

29
Q

What is central tendency, what are the two measures commonly used, and what are the limitations of each

A

Central tendency
- The average, or typical value, of a set of observations
- The most commonly used measures are the mean and the median; the mode is used less frequently.

  • Median: Middle value when the observations are ranked. Less affected by outliers and skewed data.
  • Mean: Arithmetic mean
    Best used when observations are symmetrical (i.e. evenly distributed); not as good when there are outliers, which can skew it.
  • Mode: Most frequent value
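The outlier limitation of the mean can be shown with Python's `statistics` module; the values are illustrative.

```python
import statistics

values = [3, 4, 5, 6, 7]
with_outlier = values + [100]  # one extreme value added

print(statistics.mean(values), statistics.median(values))              # 5 5
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # mean jumps to about 20.8, median only to 5.5
print(statistics.mode([1, 2, 2, 3, 3, 3]))                             # 3
```
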
30
Q

How is nominal data (and most ordinal data) best expressed

A

Series of relative frequencies
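A minimal Python sketch of relative frequencies using `collections.Counter`; the function name and the eye-colour data are hypothetical.

```python
from collections import Counter

def relative_frequencies(observations):
    """Proportion of observations falling into each category."""
    counts = Counter(observations)
    total = len(observations)
    return {category: count / total for category, count in counts.items()}

# Hypothetical nominal data (eye colour)
print(relative_frequencies(["brown", "blue", "brown", "green", "brown"]))
# {'brown': 0.6, 'blue': 0.2, 'green': 0.2}
```
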

31
Q

Left tailed test

A

A left-tailed test is used when the alternative hypothesis states that the true value of the parameter specified in the null hypothesis is less than the null hypothesis claims.

Critical value will be negative.

32
Q

What is a right tailed test

A

A right-tailed test is used when the alternative hypothesis states that the true value of the parameter specified in the null hypothesis is greater than the null hypothesis claims.

The critical value (a threshold used to determine whether or not to reject the null hypothesis) will always be positive.

The direction of the test is indicated in the alternative hypothesis, not in the null hypothesis.

33
Q

Two tailed test

A

Non-directional
Used when the alternative hypothesis states that the population parameter is DIFFERENT from (either greater or less than) the hypothesised value.
Has two critical values, one in each tail.
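Critical values for left-, right- and two-tailed z-tests can be obtained from `statistics.NormalDist` (Python 3.8+). The sketch assumes a standard normal test statistic and alpha = 0.05.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution

left_critical = z.inv_cdf(alpha)       # reject H0 if test statistic < this (negative)
right_critical = z.inv_cdf(1 - alpha)  # reject H0 if test statistic > this (positive)
two_tailed = z.inv_cdf(1 - alpha / 2)  # reject H0 if |test statistic| > this

print(f"left-tailed:  {left_critical:.3f}")   # -1.645
print(f"right-tailed: {right_critical:.3f}")  # 1.645
print(f"two-tailed:   +/-{two_tailed:.3f}")   # +/-1.960
```

Splitting alpha between two tails is why the two-tailed critical value (1.96) is further from zero than the one-tailed value (1.645).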

34
Q

What is a measure of skewness

A

The closer the mean and median are together, the more symmetrical the distribution.

We can get a crude measure of skewness by subtracting the median from the mean: a positive value suggests positive (right) skew, a negative value negative (left) skew.

35
Q

What are measures of dispersion (including ones used in parametric and non-parametric data)

A

They describe how the observations are spread around the central measure.

For parametric data, the SD describes the dispersion of values around the mean.

For non-parametric data, percentiles are used to describe the spread of values around the median.

36
Q

How do you describe dispersion in non-normal data

A

Interquartile range.

It is defined as the difference between the 75th and 25th percentiles of the data.
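A minimal Python sketch of the IQR using `statistics.quantiles` (Python 3.8+), which returns the three quartile cut points when n=4. The function name and data are illustrative; note that statistical packages use slightly different percentile definitions, so IQR values can differ a little between programs.

```python
import statistics

def interquartile_range(values):
    """IQR = 75th percentile - 25th percentile."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return q3 - q1

print(interquartile_range([1, 2, 3, 4, 5, 6, 7, 8]))  # 4.5
```
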

37
Q

Correlation coefficient

A

The degree of association between two variables,
expressed on a scale from -1 to +1
-1 = perfect negative correlation
0 = no correlation
+1 = perfect positive correlation

The correlation coefficient describes association only; it carries no implication of cause and effect.

It is best to regard correlation as a type of investigative analysis that suggests areas for further research, rather than as a way of testing hypotheses.
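Pearson's coefficient can be written directly from its definition: the covariance of x and y scaled by both spreads, which keeps it between -1 and +1. A minimal Python sketch with illustrative data:

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient: sum of cross-products scaled by both sums of squares."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
print(pearson_r(x, [2, 4, 6, 8, 10]))  # 1.0  (perfect positive)
print(pearson_r(x, [10, 8, 6, 4, 2]))  # -1.0 (perfect negative)
```
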

38
Q

When should you question the use of standard deviation

A

A standard deviation that is greater than one-half of the value of the mean should raise questions about the adequacy of the standard deviation as a summary statistic.

39
Q

How do we represent the probability of making a type 1 and type 2 error

A

Alpha (α) = probability of a type 1 error
Beta (β) = probability of a type 2 error

40
Q

What is multiple testing for statistical significance

A

Any situation that involves the simultaneous testing of more than one hypothesis

41
Q

What is the power of a test

A

Expressed by the statistic 1 - beta
Reflects the ability to reject the null hypothesis when it is false, i.e. to detect a true difference (usually set between 70% and 90%)
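Power can be approximated for a simple case. This Python sketch assumes a two-sided, two-sample z-test for a difference in means with a known, equal SD in both groups; the function name and the example numbers (difference 5, SD 10, 64 per group) are hypothetical.

```python
import math
from statistics import NormalDist

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided, two-sample z-test for a difference in means."""
    z = NormalDist()
    se = sd * math.sqrt(2 / n_per_group)  # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Probability the test statistic falls beyond either critical value when the true difference is delta
    return z.cdf(delta / se - z_crit) + z.cdf(-delta / se - z_crit)

print(f"{power_two_sample(delta=5, sd=10, n_per_group=64):.2f}")  # 0.81
```
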

42
Q

What is delta

A

The difference in response rates between the groups that would be of biological or clinical interest.
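Delta, alpha and power together determine the sample size. This Python sketch uses the common normal-approximation formula for comparing two proportions, n = (z_α/2 + z_β)² [p1(1 - p1) + p2(1 - p2)] / delta², without a continuity correction; the response rates are hypothetical.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect delta = p1 - p2 (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = z.inv_cdf(power)           # power = 1 - beta
    delta = p1 - p2
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

# Hypothetical: detect an improvement in response rate from 30% to 50%
print(n_per_group(0.50, 0.30))  # 91
```

Halving delta roughly quadruples the required sample size, since delta appears squared in the denominator.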

43
Q

Concordance

A

Agreement between measurements refers to the degree of concordance between two (or more) sets of measurements.

Statistical methods to test agreement are used to assess inter-rater variability or to decide whether one technique for measuring a variable can substitute for another.

It is evaluated by tests such as Kendall’s tau.

i.e. measurements made by two (sometimes more than two) different observers, or by two different techniques, produce similar results.

44
Q

Impact of multiple comparisons and how to control for them

A

Multiple comparisons
- The more statistical tests you do, the more likely it is you’ll get a false-positive result.
- The Bonferroni correction, the Šidák correction, Holm’s procedure or Tukey’s procedure can be used to correct for multiple comparisons
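A minimal Python sketch of the problem and two of the corrections. The family-wise error formula assumes independent tests with every null hypothesis true; the p-values are hypothetical. Holm's step-down procedure rejects at least as many hypotheses as Bonferroni while controlling the same error rate.

```python
def family_wise_error(alpha, k):
    """Chance of at least one false positive across k independent tests when all nulls are true."""
    return 1 - (1 - alpha) ** k

def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each test with p < alpha / number of tests."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm step-down: compare the i-th smallest p (i from 0) with alpha / (m - i); stop at the first failure."""
    m = len(p_values)
    reject = [False] * m
    for rank, idx in enumerate(sorted(range(m), key=lambda i: p_values[i])):
        if p_values[idx] < alpha / (m - rank):
            reject[idx] = True
        else:
            break
    return reject

print(f"{family_wise_error(0.05, 20):.2f}")  # 0.64
p = [0.01, 0.02, 0.04, 0.005]
print(bonferroni(p))  # [True, False, False, True]
print(holm(p))        # [True, True, True, True]
```
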

45
Q

What is positive correlation

A

As one variable increases, the other also tends to increase.

46
Q

What is negative correlation

A

As one variable increases, the other tends to decrease.

47
Q

Tests used for parametric and non-parametric correlation

A

Parametric - Pearson
Non-parametric - Spearman
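Spearman's coefficient is Pearson's coefficient calculated on the ranks of the data (tied values share the average of their ranks). This Python sketch uses illustrative data where y rises with x but not in a straight line, so Spearman gives 1 while Pearson is below 1.

```python
import math

def ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # positions i..j share the mean rank
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y))

def spearman_rho(x, y):
    """Spearman's rank correlation = Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # increases with x, but not linearly
print(round(pearson_r(x, y), 3))  # below 1
print(spearman_rho(x, y))         # 1.0
```
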

48
Q

Linear Regression

A

The regression equation (y = a + bx) describes how much y changes for any given change in x, and can be used to draw a regression line on a scatter diagram.
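The slope and intercept of the regression line can be found by ordinary least squares. A minimal Python sketch with illustrative data (the function name is hypothetical):

```python
def least_squares_line(x, y):
    """Fit y = a + b*x by ordinary least squares; returns (intercept a, slope b)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    intercept = my - slope * mx  # the line passes through (mean x, mean y)
    return intercept, slope

a, b = least_squares_line([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"y = {a:.1f} + {b:.1f}x")  # y = 2.2 + 0.6x
```
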

49
Q

Multiple linear regression

A

Multiple linear regression is used to estimate the relationship between two or more independent variables and one dependent variable. You can use multiple linear regression when you want to know:

How strong the relationship is between two or more independent variables and one dependent variable (e.g. how rainfall, temperature, and amount of fertilizer added affect crop growth).

The value of the dependent variable at a certain value of the independent variables (e.g. the expected yield of a crop at certain levels of rainfall, temperature, and fertilizer addition).
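Multiple linear regression extends least squares to several independent variables by solving the normal equations (XᵀX)b = Xᵀy. This standard-library Python sketch is illustrative only (real analyses would use a statistics package); the data are synthetic, built so that y = 1 + 2·x1 + 3·x2 exactly, so the fitted coefficients should come back as 1, 2 and 3.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def multiple_linear_regression(predictors, y):
    """predictors: one list per independent variable. Returns [intercept, b1, b2, ...]."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]  # column of 1s for the intercept
    p = len(X[0])
    xtx = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)] for i in range(p)]
    xty = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    return solve(xtx, xty)

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
y = [9, 8, 19, 18, 26]  # synthetic: y = 1 + 2*x1 + 3*x2
coefficients = multiple_linear_regression([x1, x2], y)
print([round(c, 6) for c in coefficients])  # [1.0, 2.0, 3.0]
```
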