Statistics Flashcards

1
Q

What is a variable?

A

Measure of any single characteristic

Can be assigned a number or category

Can be discrete or continuous

2
Q

What is a discrete variable?

A

A numeric variable for which we can list the possible values

Can be qualitative or descriptive

Can be dichotomous or polychotomous

Can be nominal or ordinal

3
Q

What are continuous variables?

A

Can be quantitative or numerical

Truly measurable

4
Q

What are parametric statistical tests?

A

Used for continuous variables

Make assumptions about the frequency distribution of the data

More powerful than non-parametric tests

5
Q

What are non-parametric tests?

A

Used for discrete variables

Make no assumptions about the frequency distribution of the data

6
Q

What is the normal distribution, and how does it look on a graph?

A

A continuous probability distribution whose density curve is bell-shaped

Data cluster around a central value

The curve is symmetric

The curve is neither too peaked nor too flat

7
Q

What are the parameters of the normal distribution of a population?

A

Central tendency:
Average or arithmetic mean - μp = ∑x/n

Dispersion:
Standard deviation -
σp = √[∑(μp - x)^2/n]

8
Q

What are the parameters of the normal distribution of a sample?

A

Central tendency:
- Average or arithmetic mean:
μs = ∑x/n

Dispersion:
- Standard deviation:
σs = √[∑(μs - x)^2/(n-1)]

9
Q

How is standard deviation derived?

A
  1. Subtract each observation from the mean to get its deviation: (μ-x)
  2. Sum of squares:
    SS = ∑(μ-x)^2
  3. Mean square (variance):
    MS = ∑((μ-x)^2)/n
  4. Standard deviation:
    σ = √[∑((μ-x)^2)/n]
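
The four steps can be traced in a few lines of Python. This is a minimal sketch: the function name, the `sample` flag (which switches the divisor to n - 1) and the data values are illustrative assumptions, not part of the original card.

```python
import math

def standard_deviation(values, sample=False):
    """Deviations -> sum of squares -> mean square -> square root."""
    n = len(values)
    mean = sum(values) / n
    deviations = [mean - x for x in values]       # step 1: (mean - x)
    ss = sum(dev ** 2 for dev in deviations)      # step 2: sum of squares
    ms = ss / (n - 1 if sample else n)            # step 3: mean square (variance)
    return math.sqrt(ms)                          # step 4: standard deviation

data = [2, 4, 4, 4, 5, 5, 7, 9]                   # made-up values
population_sd = standard_deviation(data)
sample_sd = standard_deviation(data, sample=True)
print(population_sd, sample_sd)
```

Dividing by n - 1 instead of n gives the slightly larger sample standard deviation.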
10
Q

How are the mean and standard deviation shown in publications?

A

Preferred: mean (standard deviation), i.e. μs (σs)

Often written as mean ± standard deviation (μs ± σs)

Strictly, ± should be confined to confidence intervals

11
Q

What is the coefficient of variation (CV) and its equation?

A

CV is the standard deviation expressed as a percentage of the mean

CV = (σ/μ) × 100
σ = standard deviation
μ = mean (population or sample)
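
As a quick illustration, the CV can be computed with Python's standard `statistics` module. The measurements are made up, and the sketch assumes the sample standard deviation (n - 1 divisor) is the one wanted.

```python
import statistics

data = [48.0, 50.0, 52.0]           # made-up measurements
mean = statistics.mean(data)
sd = statistics.stdev(data)         # sample standard deviation (n - 1)
cv_percent = sd / mean * 100        # CV = (sd / mean) x 100
print(f"CV = {cv_percent:.1f}%")
```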

12
Q

What is the equation of normal distribution?

A

The normal distribution is defined completely by its mean and standard deviation

f(x) = [1/(σ√(2π))] e^(-(x-μ)^2/(2σ^2))

f(x) = frequency (probability density) of a particular value of x

σ = standard deviation

e = base of the natural logarithm (≈ 2.718)

μ = mean
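
A minimal sketch of the density function in Python, to show how the mean and standard deviation fully determine the curve. The function name `normal_pdf` is an illustrative choice.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution with mean mu and SD sigma at x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

peak = normal_pdf(0.0, mu=0.0, sigma=1.0)     # highest point, at the mean
left = normal_pdf(-1.0, mu=0.0, sigma=1.0)
right = normal_pdf(1.0, mu=0.0, sigma=1.0)    # equals `left` by symmetry
print(peak, left, right)
```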

13
Q

How do you convert a member of the original distribution to a member of the standard normal distribution? (formula)

A

z = (x - μp)/σp

z = a value of the standard normal distribution

x = your original observation

μp = established population mean

σp = established population standard deviation

Measured x is atypical if |z| ≥ 1.96
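
The conversion and the 1.96 cut-off can be sketched as follows. The function names and the example population (mean 100, standard deviation 15) are assumptions for illustration only.

```python
def z_score(x, mu_p, sigma_p):
    """Convert an observation to a value of the standard normal distribution."""
    return (x - mu_p) / sigma_p

def is_atypical(x, mu_p, sigma_p, cutoff=1.96):
    """True when the observation lies at least `cutoff` SDs from the mean."""
    return abs(z_score(x, mu_p, sigma_p)) >= cutoff

z = z_score(130.0, mu_p=100.0, sigma_p=15.0)
print(z, is_atypical(130.0, 100.0, 15.0), is_atypical(110.0, 100.0, 15.0))
```

The absolute value is taken so that observations far below the mean are flagged as well as those far above it.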

14
Q

Name some properties of normal distribution

A

Most visual functions follow a normal (Gaussian) distribution.

The density of observed values is greatest near the centre of the
distribution.

Outliers, at the edge (or tails) of the distribution, are relatively rare.

The normal distribution allows mathematical prediction of the chance
of a particular data value occurring.

Parametric statistical tests assume normal distribution.

15
Q

What does a red bell-shaped distribution curve depict?

A

Normal distribution, F(x), of individual values

Its dispersion is represented by σ

16
Q

What does a blue bell-shaped curve depict?

A

Normal distribution of sample means, F(x*) or F(mean)

Its dispersion is represented by σ/√n

where n is the size of the sample.

17
Q

What is the central limit theorem?

A

Sample means from a normal distribution of individual values
are themselves normally distributed.

Means of samples from non-normal distributions will also be approximately normally
distributed as long as the samples are large enough.
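
A small simulation can make the theorem concrete. The sketch below draws samples from a uniform (clearly non-normal) distribution; the sample size of 30, the 2000 repetitions and the fixed seed are arbitrary assumptions chosen for illustration.

```python
import math
import random
import statistics

random.seed(0)        # fixed seed so the run is repeatable
n = 30                # observations per sample (assumed "large enough")
sample_means = [
    statistics.mean([random.random() for _ in range(n)])
    for _ in range(2000)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
# Uniform(0, 1) has mean 0.5 and standard deviation sqrt(1/12).
predicted_sd = math.sqrt(1 / 12) / math.sqrt(n)   # sigma / sqrt(n)
print(mean_of_means, sd_of_means, predicted_sd)
```

The sample means cluster around the population mean, and their spread should be close to σ/√n.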

18
Q

What is the four-step process to calculate the confidence limits for a sample mean?

A
  1. There is a 95% chance that an individual observation,
    belonging to a normal distribution, lies between:
    µp ± 1.96σp    (1)
  2. There is a 95% chance that the sample mean, belonging to
    data showing a normal distribution, lies between:
    µp ± 1.96σp/√n    (2)
  3. Rearrangement of equation (2) shows that there is a 95% chance
    that the population mean lies between:
    µs ± 1.96σp/√n    (3)
  4. Because σp is usually unknown, it is estimated by σs and 1.96 is replaced
    by the t value for n - 1 degrees of freedom, so the 95% confidence limits of a sample mean equal:
    µs ± t(σs/√n)    (4)
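
Equation (4) can be sketched in Python as below. The function name and data are hypothetical, and the t value is passed in by hand (2.262 is the tabulated two-sided 95% value for 9 degrees of freedom) because the standard library has no t-distribution lookup.

```python
import math
import statistics

def confidence_limits(sample, t_value):
    """Return (lower, upper) limits: mean +/- t * (sd / sqrt(n))."""
    n = len(sample)
    mean = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean
    return mean - t_value * sem, mean + t_value * sem

sample = [4.1, 4.5, 3.9, 4.3, 4.7, 4.0, 4.4, 4.2, 4.6, 4.3]   # made-up data
T_95_DF9 = 2.262      # assumed: taken from a t table for n - 1 = 9
lower, upper = confidence_limits(sample, T_95_DF9)
print(lower, upper)
```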
19
Q

What is standard deviation?

A

The variability of a single set of individual measurements.

20
Q

What is the standard error of the mean?

A

The degree of error associated with using a sample mean to estimate the population mean (σ/√n).

21
Q

What are confidence limits?

A

The limits within which the population mean is expected to fall with a stated probability (e.g. 95%).

22
Q

When a distribution curve is not a symmetric bell shape, what terms are used instead?

A

Positive skew

Negative skew

23
Q

What is a positive skew?

A

Tail is pulled out towards the upper end of the distribution

Curve peak is to the left

Mean is higher than the median

24
Q

What is a negative skew?

A

Tail is pulled out towards the lower end of the distribution

Curve peak is to the right

Mean is lower than the median

25
Q

What is kurtosis?

A

The degree to which a curve is peaked or flat

The greater the kurtosis, the sharper the peak

26
Q

What is negative kurtosis?

A

Compared to normal distribution

  • Flat-topped
  • Fewer observations near the mean
27
Q

What is positive kurtosis?

A

Compared to normal distribution

  • Peaked
  • More observations near the mean
28
Q

What are the two other measures of central tendency?

A

Mode

Median

29
Q

What is mode?

A

Value of variable 𝒙 with the highest frequency.

The maximum point of the curve.

The most common value.

Can be more than one mode.

30
Q

What is the Median?

A

The middle value of 𝒙 when all values are listed in order.

Used for non-parametric tests.

31
Q

Give alternative measures of dispersion.

A

Percentiles

Interquartile range

32
Q

What is a percentile?

A

Percentage of observations below a particular value when ranked in ascending order.

The 50th percentile represents the median.

33
Q

What is a range?

A

An alternate measure of dispersion.

Difference between the largest score and the smallest score
of a distribution

Indicates where any value may lie, including outliers.

34
Q

What is the interquartile range?

A

The numerical distance between the 25th and 75th percentiles.

Includes the middle 50% of the distribution, centred on the median value.

Applicable to data that is not normally distributed.
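
For illustration, quartiles and the interquartile range can be obtained with `statistics.quantiles` (Python 3.8+). The data are made up, and the sketch relies on that function's default "exclusive" interpolation method; other quartile conventions can give slightly different values.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]                  # made-up values
q1, median, q3 = statistics.quantiles(data, n=4)    # 25th, 50th, 75th percentiles
iqr = q3 - q1                                       # interquartile range
print(q1, median, q3, iqr)
```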

35
Q

Explain how the assessment of risk is carried out for a cohort study

A

Risk factor   Disease present   Disease absent   Total
Present       a                 b                a + b
Absent        c                 d                c + d
Total         a + c             b + d            a + b + c + d

Incidence = new occurrences of disease in population at risk during study.

  • Incidence in exposed group = a / (a + b)
  • Incidence in unexposed group = c / (c + d)

Relative risk (RR) = ratio of incidence in exposed group to incidence in unexposed group
- RR = [a / (a + b)]/[c / (c + d)]

95% confidence limits
- Standard error of ln(RR) = √[1/a - 1/(a + b) + 1/c - 1/(c + d)]
- Lower limit = RR∙e^(-1.96∙standard error)
- Upper limit = RR∙e^(1.96∙standard error)
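
A hedged sketch of the relative-risk calculation for a 2 × 2 cohort table. The function name and counts are invented for illustration, and the interval uses the log method with the standard error of ln(RR) = √[1/a - 1/(a + b) + 1/c - 1/(c + d)].

```python
import math

def relative_risk(a, b, c, d, z=1.96):
    """Return (RR, lower, upper) for exposed (a, b) and unexposed (c, d) rows."""
    incidence_exposed = a / (a + b)
    incidence_unexposed = c / (c + d)
    rr = incidence_exposed / incidence_unexposed
    # Standard error of ln(RR), log method (assumption stated in the lead-in).
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * math.exp(-z * se_log_rr), rr * math.exp(z * se_log_rr)

rr, rr_lower, rr_upper = relative_risk(30, 70, 10, 90)   # made-up counts
print(rr, rr_lower, rr_upper)
```

If the interval excludes 1, the difference in incidence between the groups is unlikely to be due to chance alone at the 5% level.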

36
Q

Show the equations used for case control studies

A

Risk factor   Case (disease)   Control (no disease)   Total
Present       a                b                      a + b
Absent        c                d                      c + d
Total         a + c            b + d                  a + b + c + d

Prevalence = proportion of population with disease at one point in time.

Odds ratio (OR) = odds of exposure in cases divided by odds of exposure in controls
- OR = (a / c)/(b / d) = (ad) / (bc)

95% confidence limits
- Standard error of ln(OR) = √[1/a + 1/b + 1/c + 1/d]
- Lower limit = OR∙e^(-1.96∙standard error)
- Upper limit = OR∙e^(1.96∙standard error)
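
The odds ratio and its 95% limits can be sketched the same way. The function name and the counts are hypothetical; the interval is built on the log scale, using √[1/a + 1/b + 1/c + 1/d] as the standard error of ln(OR).

```python
import math

def odds_ratio(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) for a case-control 2 x 2 table."""
    ratio = (a * d) / (b * c)                           # (a / c) / (b / d)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ratio, ratio * math.exp(-z * se_log_or), ratio * math.exp(z * se_log_or)

odds, or_lower, or_upper = odds_ratio(40, 20, 60, 80)   # made-up counts
print(odds, or_lower, or_upper)
```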

37
Q

What is validity, what are its related terms, and how is it assessed?

A

Related terms: comparability, accuracy

Ability to correctly measure what is supposed to be measured

Assessed using: Bland-Altman, limits of agreement, kappa

38
Q

What is discriminative ability, and how is it assessed?

A

Ability to distinguish normal from abnormal

Assessed using: sensitivity, specificity, likelihood ratios, predictive values and ROC (receiver operating
characteristic) curves

39
Q

What is repeatability, what are its related terms and conditions, and how is it assessed?

A

Related terms: precision, reliability, variability, reproducibility

Consistency of repeated measurements

  • Repeatability conditions: measurements repeated with the same method on the same patients in the same clinic
    by the same clinician using the same equipment
  • Reproducibility conditions: measurements repeated with the same method on the same patients but in
    different clinics by different clinicians using different equipment

Assessed using: Bland-Altman plots, coefficient of repeatability, kappa

40
Q

Show the evaluation of false-positive and false negative errors

A

Test result   Disease present   Disease absent   Total
Positive      a                 b                a + b
Negative      c                 d                c + d
Total         a + c             b + d            a + b + c + d

b = false positive test result

c = false negative test result

a + c = all people with disease

b + d = all people without disease

False-positive error rate (type I or alpha)

  • data indicate that disease is present when it is absent
  • b / (b + d)

False-negative error rate (type II or beta)

  • data indicate that disease is absent when it is present
  • c / (a + c)
41
Q

What is sensitivity?

A

Ability to detect disease when it is present

a / (a + c)

a = true positive test result
d = true negative test result
a + c = all people with disease
b + d = all people without disease

Depends on minimum false-negative errors

Sensitivity + false negative error rate = 1

‘SnNOut’ – with a highly Sensitive test, a Negative result rules Out disease

42
Q

What is specificity?

A

Ability to indicate no disease when none is present.

d / (b + d)

a = true positive test result
d = true negative test result
a + c = all people with disease
b + d = all people without disease

Depends on minimum false-positive errors

Specificity + false positive error rate = 1

‘SpPIn’ – with a highly Specific test, a Positive result rules In disease

43
Q

Explain the evaluation of tests using Bayes’ theorem: predictive values, pre- and post-test probability

A

Test result   Disease present   Disease absent   Total
Positive      a                 b                a + b
Negative      c                 d                c + d
Total         a + c             b + d            a + b + c + d

a = true positive test result
d = true negative test result
a + c = all people with disease
a + b = all people with a positive test result
c + d = all people with a negative test result
a + b + c + d = all people tested

Pre-test or prior probability (prevalence)
- (a + c) / (a + b + c + d)

Positive predictive value
- Probability of having disease if the test result is positive
- a / (a + b)
- Is Bayes’ theorem rewritten
- Is post-test or posterior probability if the test result is positive

Negative predictive value
- Probability of not having disease if the test result is negative
- d / (c + d)
- 1 – negative predictive value is post-test or posterior probability if the test result is negative: effectively c / (c + d)

Predictive values
- Depend on prevalence
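
The link between the 2 × 2 table and Bayes’ theorem, and the dependence on prevalence, can be sketched as follows. The function names and counts are illustrative assumptions.

```python
def predictive_values(a, b, c, d):
    """Return (prevalence, PPV, NPV) straight from the 2 x 2 table."""
    prevalence = (a + c) / (a + b + c + d)
    return prevalence, a / (a + b), d / (c + d)

def ppv_from_bayes(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)

prevalence, ppv, npv = predictive_values(90, 30, 10, 170)    # made-up counts
ppv_bayes = ppv_from_bayes(0.9, 0.85, prevalence)            # same test, same prevalence
ppv_rare = ppv_from_bayes(0.9, 0.85, 0.01)                   # same test, rare disease
print(ppv, ppv_bayes, ppv_rare)
```

With identical sensitivity and specificity, the positive predictive value falls sharply when the disease is rare.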

44
Q

Explain the evaluation of tests using likelihood ratios, pre- and post-test odds

A

Test result   Disease present   Disease absent   Total
Positive      a                 b                a + b
Negative      c                 d                c + d
Total         a + c             b + d            a + b + c + d

a = true positive test result
d = true negative test result
a + c = all people with disease
a + b = all people with a positive test result
c + d = all people with a negative test result
a + b + c + d = all people tested

Positive likelihood ratio
- Ratio of sensitivity to false-positive error rate
- [a / (a + c)] / [b / (b + d)] = (a / b) / [(a + c) / (b + d)]
- The larger the value, the better the test

Negative likelihood ratio
- Ratio of false-negative error rate to specificity
- [c / (a + c)] / [d / (b + d)] = (c / d) / [(a + c) / (b + d)]
- The lower the value, the better the test

Likelihood ratios
- Independent of prevalence

Pre-test odds
- Odds of disease before the test
- (a + c) / (b + d)
- Also equals pretest probability / (1-pretest probability)

Post-test odds
- Odds of having disease after having been tested positive
- Pretest odds multiplied by positive likelihood ratio
- [(a + c) / (b + d)] × {(a / b) / [(a + c) / (b + d)]} = a / b
- Post-test probability = post-test odds / (post-test odds + 1)
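
The chain from pre-test odds to post-test probability can be followed numerically. This is a sketch with invented counts and a hypothetical function name.

```python
def likelihood_ratios(a, b, c, d):
    """Return (positive LR, negative LR) from the 2 x 2 table."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return sensitivity / (1 - specificity), (1 - sensitivity) / specificity

a, b, c, d = 90, 30, 10, 170                       # made-up counts
lr_positive, lr_negative = likelihood_ratios(a, b, c, d)
pre_test_odds = (a + c) / (b + d)
post_test_odds = pre_test_odds * lr_positive       # simplifies to a / b
post_test_probability = post_test_odds / (post_test_odds + 1)
print(lr_positive, lr_negative, post_test_odds, post_test_probability)
```

The resulting post-test probability equals the positive predictive value a / (a + b), as the algebra on the card implies.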

45
Q

What are ROC curves and what do they show?

A

Plot of sensitivity (true-positive rate) against 1 - specificity (false-positive rate) across test thresholds.

ROC analysis is considered a reliable method
for evaluating diagnostic ability.

It shows the trade-off between sensitivity
and specificity.
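
A minimal sketch of how the points of an ROC curve arise: each threshold applied to a test score gives one sensitivity/specificity pair. The scores, thresholds and function name are invented, and a score at or above the threshold is assumed to count as a positive result.

```python
def roc_points(scores_diseased, scores_healthy, thresholds):
    """Return (1 - specificity, sensitivity) pairs, one per threshold."""
    points = []
    for t in thresholds:
        sensitivity = sum(s >= t for s in scores_diseased) / len(scores_diseased)
        specificity = sum(s < t for s in scores_healthy) / len(scores_healthy)
        points.append((1 - specificity, sensitivity))
    return points

diseased = [6, 7, 8, 9]            # made-up test scores, disease present
healthy = [3, 4, 5, 7]             # made-up test scores, disease absent
points = roc_points(diseased, healthy, thresholds=[10, 8, 6, 3])
print(points)
```

Lowering the threshold raises sensitivity but lowers specificity, which is the trade-off the curve displays.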

46
Q

Explain the Kappa evaluation

A

Used to test against a gold standard

Test 1     Test 2 positive   Test 2 negative   Total
Positive   a                 b                 a + b
Negative   c                 d                 c + d
Total      a + c             b + d             a + b + c + d

Observed agreement (Ao) = a + d

Maximum possible agreement (N) = a + b + c + d

Overall percentage agreement = 100 × [(a + d) / (a + b + c + d)]

Cell a agreement expected by chance = [(a + b)(a + c)]/(a + b + c + d)

Cell d agreement expected by chance = [(c + d)(b + d)]/(a + b + c + d)

Total agreement expected by chance (Ac) = cell a agreement expected by chance + cell d agreement expected by chance

Intraclass Kappa = (Ao - Ac)/(N - Ac)
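
The kappa steps can be followed with a short sketch. The function name and the counts are made up for illustration.

```python
def kappa(a, b, c, d):
    """Agreement between two tests beyond that expected by chance."""
    n = a + b + c + d                       # maximum possible agreement (N)
    observed = a + d                        # observed agreement (Ao)
    chance_a = (a + b) * (a + c) / n        # cell a agreement expected by chance
    chance_d = (c + d) * (b + d) / n        # cell d agreement expected by chance
    chance = chance_a + chance_d            # total agreement by chance (Ac)
    return (observed - chance) / (n - chance)

k = kappa(40, 10, 5, 45)                    # made-up counts
print(k)
```

A value of 1 indicates perfect agreement, and a value of 0 indicates agreement no better than chance.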
