Stats Concepts Flashcards

1
Q

What is a variable?

A

A feature of individual units within a study (e.g. people); something that we can observe or measure.

2
Q

What are the four things an outcome could be?

A

• An observation at one moment in time (e.g. attained weight of a baby at 6 months; mortality status, dead/alive)
• A time to an event that may or may not happen (e.g. time to death)
• A count, independent of time (e.g. number of measles cases)
• A rate, dependent on time (e.g. number of deaths per 1000 person-years)

3
Q

What is binary data?

A

Categorical (not numerical) data which only has 2 alternatives/options - an example would be dead/alive

4
Q

What is nominal data?

A

Categorical (not numerical) data which has more than 2 alternatives/options but which has no natural order - classic examples are ethnic groups or blood type

5
Q

What is ordinal data?

A

Categorical (not numerical) data which has more than 2 alternatives/options and a natural order to it; for example, hypotensive/normal/borderline/hypertensive

6
Q

What is discrete data?

A

Numerical data (quantitative) that is a count - for example, number of measles cases

7
Q

What is continuous data?

A

Numerical data (quantitative) where there is an infinite number of values the data can take - for example, blood pressure, weight, age

8
Q

What is positively skewed data?

A

Data whose frequency distribution has a long tail to the right (towards higher values) on the x axis

9
Q

What is negatively skewed data?

A

Data whose frequency distribution has a long tail to the left (towards lower values) on the x axis
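One practical symptom of skew is that the mean gets pulled towards the long tail while the median stays put. The sketch below uses invented numbers to show this. The data and the helper name mean_minus_median are hypothetical, and the sign of (mean - median) is only a rough heuristic, not a formal definition of skewness.

```python
import statistics

# Hypothetical data with a long right tail (one large value)
right_skewed = [1, 2, 2, 3, 3, 3, 4, 15]
# Mirror image of the same data, giving a long left tail
left_skewed = [-v for v in right_skewed]

def mean_minus_median(data):
    """Rough skew indicator: positive suggests right skew, negative suggests left skew."""
    return statistics.mean(data) - statistics.median(data)

print(mean_minus_median(right_skewed))  # positive: mean pulled to the right
print(mean_minus_median(left_skewed))   # negative: mean pulled to the left
```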

10
Q

What are some examples of continuous probability distributions?

A

Normal distribution, t-distribution, F-distribution, chi-squared distribution

11
Q

What are some examples of discrete probability distributions?

A

Binomial, Poisson, discrete uniform

12
Q

What is meant by the range of data?

A

Simply the highest score minus the lowest score; it describes the spread of scores seen in your sample

13
Q

What is VARIANCE?

A

The average squared deviation from the mean

14
Q

What does variance tell us?

A

On average, how spread out the scores are around the mean (in squared units)

15
Q

What is STANDARD DEVIATION?

A

It is the square root of the variance
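As a rough illustration of the range, variance and standard deviation cards, the sketch below uses Python's standard statistics module on made-up scores. The data and variable names are hypothetical. Note that pvariance divides by n (matching "average squared deviation"), whereas variance divides by n - 1, which is the usual choice when the data are a sample.

```python
import statistics

# Hypothetical scores, invented purely for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(scores) - min(scores)         # highest minus lowest score
pop_variance = statistics.pvariance(scores)    # mean squared deviation (divides by n)
pop_sd = statistics.pstdev(scores)             # square root of pop_variance
sample_variance = statistics.variance(scores)  # divides by n - 1 instead of n
sample_sd = statistics.stdev(scores)           # square root of sample_variance

print(data_range, pop_variance, pop_sd)
print(sample_variance, sample_sd)
```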

16
Q

What happens to the mean in the context of a skewed distribution?

A

It no longer gives a good impression of the central tendency of the observations, because it is pulled towards the long tail

17
Q

What is a more appropriate measure than the mean for skewed distributions?

A

When data are skewed, it is more appropriate to use the median and interquartile range (25th to 75th percentile) as your descriptive statistics. If data are positively skewed, a log transformation may also help
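A minimal sketch of these summaries, using invented positively skewed data. The values and variable names are hypothetical, and the "inclusive" quantile method is just one of several conventions for computing quartiles. The back-transformed mean of the logs is the geometric mean.

```python
import math
import statistics

# Hypothetical positively skewed data (e.g. length of stay in days)
values = [1, 2, 2, 3, 3, 4, 5, 8, 20, 60]

mean = statistics.mean(values)      # pulled upwards by the long right tail
median = statistics.median(values)  # robust to the extreme values

# quantiles(n=4) returns the three cut points: 25th, 50th and 75th percentiles
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# A log transformation pulls in the right tail (all values must be > 0)
log_values = [math.log(v) for v in values]
geometric_mean = math.exp(statistics.mean(log_values))

print(mean, median, (q1, q3), iqr, geometric_mean)
```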

18
Q

What is the only time the mean is appropriate?

A

When the data are (approximately) normally distributed, i.e. roughly symmetric without extreme outliers

19
Q

What are population values?

A

The true values of a measure in a population. They define the population.
• Mean, μ
• standard deviation, σ

20
Q

What is a sample statistic?

A

The value of the measure in a sample of the population. It is calculated from the observations in the sample
• Sample mean, x̄
• Sample standard deviation (SD), s

21
Q

What is sampling error?

A

Information gained from a single sample is the “best estimate” of what is true in the population
• In truth, the sample statistic may be somewhat larger or smaller than the true population value (i.e. uncertainty)
• This is due to sampling error

22
Q

What is standard error?

A

It is a measure of the precision of the sample estimate. It indicates how far from the true (but unknown) population value the sample estimate is likely to be; basically, how large an error we are likely to be making.
The standard error of the mean is calculated as: SE = SD / √n = s / √n
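The formula can be sketched in a few lines. The function name standard_error and the sample values are hypothetical, chosen only to show that the SE shrinks as n grows (quadrupling n halves the SE).

```python
import math
import statistics

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

# Hypothetical sample, invented for illustration
sample = [4.1, 5.0, 5.6, 6.2, 4.8, 5.3, 5.9, 4.7, 5.1]
s = statistics.stdev(sample)           # sample SD (divides by n - 1)
se = standard_error(s, len(sample))

# Same SD, larger sample: the estimate becomes more precise
se_n25 = standard_error(10, 25)
se_n100 = standard_error(10, 100)

print(se, se_n25, se_n100)
```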

23
Q

What are the two main methods of statistical inference?

A

• Hypothesis testing (significance testing)
• Estimation (confidence intervals)
24
Q

What does low SD indicate?

A

Data points are close to the mean

25
Q

What does high SD indicate?

A

Data points are far from the mean

26
Q

95% of values lie within how many SDs of the mean (for normally distributed data)?

A

2 (1.96 to be precise)
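The 1.96 figure can be checked against the standard normal distribution. This is a small sketch using statistics.NormalDist from the Python standard library; the variable names are arbitrary, and the result only applies if the data are normally distributed.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Proportion of values within 1.96 SDs and within 2 SDs of the mean
within_196 = z.cdf(1.96) - z.cdf(-1.96)
within_2 = z.cdf(2) - z.cdf(-2)

# Going the other way: the cut-off that leaves 2.5% in each tail
z_for_95 = z.inv_cdf(0.975)

print(within_196, within_2, z_for_95)
```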

27
Q

How do you calculate a CI range of values?

A

For 'large' samples, 95% of effect estimates fall between (true effect - 1.96 × SE) and (true effect + 1.96 × SE). So, you would calculate a 95% CI as:
Lower value = MEAN - (1.96 × SE)
Upper value = MEAN + (1.96 × SE)
95% CI = (lower value to upper value)
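A worked sketch of this calculation from summary statistics. The function name ci95_from_summary and the numbers are hypothetical; the 1.96 multiplier is the large-sample approximation, and small samples would normally use a t-distribution multiplier instead.

```python
import math

def ci95_from_summary(mean, sd, n):
    """Large-sample 95% CI for a mean: mean -/+ 1.96 * SE, where SE = SD / sqrt(n)."""
    se = sd / math.sqrt(n)
    lower = mean - 1.96 * se
    upper = mean + 1.96 * se
    return lower, upper

# Hypothetical example: mean 120, SD 15, n = 100, so SE = 1.5
lower, upper = ci95_from_summary(120, 15, 100)
print(f"95% CI ({lower:.2f} to {upper:.2f})")
```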

28
Q

Describe the relationship between p values and confidence intervals

A

If the ‘no effect’ value falls outside the 95% CI then the result is statistically significant at the 5% level (p < 0.05)
• Confidence intervals and P-values present complementary information
• Confidence intervals show the range within which the true treatment effect is likely to lie
• P-values measure the strength of the evidence against a hypothesis of particular interest: the null hypothesis

29
Q

How is a CI determined?

A

A CI is determined by the standard error (SE), a measure that combines the SD and the sample size, n: SE = SD / √n

30
Q

What is the definition of a CI?

A

A confidence interval provides a range of plausible values for the POPULATION mean, not the SAMPLE mean

31
Q

What does a p-value measure?

A

The strength of the evidence against a hypothesis of particular interest: the null hypothesis

32
Q

What is the definition of incidence?

A

The number of NEW cases of a disease/condition arising in a population at risk of developing the disease/condition during a specified time period

33
Q

How is cumulative incidence different from incidence rate?

A

Incidence rate uses person-time (the sum of the disease-free time at risk) as the denominator; cumulative incidence uses the number of people at risk of developing the disease/condition at the start of the specified time period

34
Q

What is the definition of cumulative incidence?

A

• The cumulative incidence or risk of a disease is the probability that the disease occurs during a specified time period.
• Equivalently, cumulative incidence can be defined as the proportion (often expressed as a percentage) of the at-risk population in which the disease occurs during a specified time period.

35
Q

What is the question that cumulative incidence answers?

A

“What is the probability or chance that an individual develops the outcome in a defined period of time?”

36
Q

When do we use the incidence rate (sometimes called incidence density) instead of simply the incidence?

A

When not all people are observed for the full time period, or not all are at risk for the full time period, we need to consider “person-time” at risk and report the incidence rate (also sometimes called incidence density)

37
Q

What question does incidence rate answer?

A

“At what rate are new cases of the disease occurring within the at-risk population?”
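The two incidence measures differ only in the denominator, which a small invented cohort can make concrete. Everything below (the cohort, the follow-up times, the variable names) is hypothetical. With unequal follow-up like this, the cumulative incidence is only a crude figure, which is why the rate is usually reported instead.

```python
# Hypothetical cohort: (years of disease-free follow-up, developed disease?)
cohort = [
    (5.0, False),
    (2.0, True),
    (5.0, False),
    (3.5, True),
    (1.0, False),  # lost to follow-up after 1 year
    (5.0, False),
    (4.5, True),
    (5.0, False),
]

new_cases = sum(1 for _, diseased in cohort if diseased)
people_at_risk = len(cohort)                      # denominator for cumulative incidence
person_years = sum(years for years, _ in cohort)  # denominator for incidence rate

cumulative_incidence = new_cases / people_at_risk  # a proportion (risk)
incidence_rate = new_cases / person_years          # cases per person-year
rate_per_1000_py = incidence_rate * 1000           # cases per 1000 person-years

print(cumulative_incidence, incidence_rate, rate_per_1000_py)
```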

38
Q

What is the definition of prevalence, and how is it calculated?

A

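Prevalence is the proportion of a population that has the disease/condition at a given point in time.
Prevalence = Number of cases observed at time t / Total number of individuals at time t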
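As a quick arithmetic check of the prevalence formula, here is a sketch with made-up numbers. The function name prevalence and the counts are hypothetical.

```python
def prevalence(cases_at_t, total_at_t):
    """Point prevalence: existing cases divided by total individuals at time t."""
    if total_at_t <= 0:
        raise ValueError("total_at_t must be positive")
    return cases_at_t / total_at_t

# Hypothetical survey: 30 existing cases among 1200 people on the survey date
p = prevalence(30, 1200)
print(f"{p:.3f} ({p * 100:.1f}%)")
```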

39
Q

What question does prevalence answer?

A

“What fraction of the group is affected at this moment in time?”

40
Q

What is the definition of a p value?

A

A p value is the probability of having observed our data (or more extreme data) given that the null hypothesis is true.
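To make "our data or more extreme data" concrete, the sketch below computes an exact one-sided p-value for a binomial outcome. The scenario (9 successes out of 10 under a null probability of 0.5) and the function name binomial_p_value are hypothetical, and a one-sided test is just one possible choice.

```python
from math import comb

def binomial_p_value(k, n, p_null=0.5):
    """One-sided p-value: P(X >= k) when X ~ Binomial(n, p_null)."""
    return sum(
        comb(n, i) * p_null**i * (1 - p_null) ** (n - i)
        for i in range(k, n + 1)
    )

# Probability of seeing 9 or more successes in 10 trials if the null (p = 0.5) is true
p = binomial_p_value(9, 10)
print(p)
```

A small value here means data at least this extreme would rarely arise if the null hypothesis were true; it is not the probability that the null hypothesis is true.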