Descriptive and Inferential Statistics Flashcards

1
Q

What is descriptive statistics?

A

Describes the dataset actually observed, e.g. a whole population. The goal is to summarise the raw data numerically and graphically, without extending conclusions beyond the observed dataset.

2
Q

What is inferential statistics?

A

The goal is to make inferences beyond your data: to infer something about a population from a smaller subset of it, the sample. This means estimating a population parameter from a sample statistic, or testing a hypothesis.

3
Q

What is the arithmetic mean?

A

Add all the values and divide by the number of values. Sensitive to outliers.

4
Q

What is the geometric mean?

A

Multiply the n values together and take the nth root of the product (values must be positive). Less affected by large outliers than the arithmetic mean.
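
As a rough illustration, the two means can be compared on a small made-up dataset with one large outlier. The numbers and variable names below are invented for this sketch.

```python
import math

# Made-up values with one large outlier (illustrative only).
values = [2, 4, 8, 1000]
n = len(values)

# Arithmetic mean: add all values and divide by the number of values.
arithmetic = sum(values) / n

# Geometric mean: multiply the values and take the nth root.
geometric = math.prod(values) ** (1 / n)

print(arithmetic)  # 253.5, dragged upwards by the outlier
print(geometric)   # about 15.9, much closer to the bulk of the data
```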

5
Q

What is the weighted mean?

A

Multiply each value by its ‘weight’, add the products together and divide by the sum of all the weights.
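
A minimal sketch of the calculation, using hypothetical scores and weights chosen only for illustration:

```python
# Hypothetical scores and their weights (e.g. the last score counts double).
values = [60, 80, 90]
weights = [1, 1, 2]

# Multiply each value by its weight, add up, divide by the sum of the weights.
weighted_mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)

print(weighted_mean)  # 80.0
```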

6
Q

What measure of centrality is best for normally distributed data?

A

Mean, median or mode (all three coincide when the data are normally distributed; the mean is the one usually reported)

7
Q

What measure of centrality is best for negatively or positively skewed data?

A

Median (the 3 measures of centrality will not coincide; the mean is pulled towards the long tail)
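
A small made-up example of positively skewed data shows how the measures of centrality separate. The values are invented for illustration.

```python
import statistics

# Made-up positively skewed data: most values are small, one is very large.
values = [20, 22, 25, 25, 30, 200]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

print(mean)    # about 53.7, pulled towards the long tail
print(median)  # 25.0
print(mode)    # 25
```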

8
Q

What measure of central tendency would we use for categorical data?

A

Mode

9
Q

Define variation

A

A measure of spread: how far, on average, an observation is from the mean. The variance is the average squared distance from the mean.

10
Q

How do you calculate variation?

A

Subtract the mean from each value, then square the result. Then work out the arithmetic mean of these squared differences.
‘sum of squared differences from the mean’, divided by the number of values
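
The steps can be sketched as follows, on a made-up dataset and dividing by the number of values (the population form of the calculation):

```python
import math

# Made-up data for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

mean = sum(values) / len(values)                 # 5.0
squared_diffs = [(x - mean) ** 2 for x in values]
variance = sum(squared_diffs) / len(values)      # 32 / 8 = 4.0
sd = math.sqrt(variance)                         # 2.0

print(variance, sd)
```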

11
Q

What is standard deviation?

A

Square root of variance.

Larger SD = wider spread of data

12
Q

Why and how do we adjust the variance equation?

A

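When the variance is estimated from a sample, divide by n-1 instead of n.

Dividing by n tends to underestimate the population variance; dividing by n-1 brings the estimate closer to the true population variance.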
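
A short sketch comparing the two divisors on a made-up sample. The standard-library functions `statistics.pvariance` (divides by n) and `statistics.variance` (divides by n-1) are used only as a cross-check.

```python
import statistics

# Made-up sample for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)
mean = sum(sample) / n

sum_sq = sum((x - mean) ** 2 for x in sample)   # 32.0

divide_by_n = sum_sq / n                # 4.0
divide_by_n_minus_1 = sum_sq / (n - 1)  # about 4.57, always the larger of the two

print(divide_by_n, statistics.pvariance(sample))
print(divide_by_n_minus_1, statistics.variance(sample))
```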

13
Q

Can we use variance and sd for all types of data?

A

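They can be calculated for any numerical data, but they only summarise the spread well when the data are roughly symmetric, e.g. normally distributed.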

14
Q

What measures of variation can we use for skewed data?

A

Quartiles and the interquartile range, often displayed as a box plot
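
As a sketch, quartiles can be computed with `statistics.quantiles` from the Python standard library. The data are made up, and other software may use a slightly different quartile convention.

```python
import statistics

# Made-up skewed data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]

# n=4 splits the data into four equal groups and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # interquartile range: spread of the middle half of the data

print(q1, q2, q3, iqr)
```

The extreme value 50 widens the range of the data but does not enter the interquartile range, which is why quartiles suit skewed data.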

15
Q

What is the empirical rule?

A

States that, for normally distributed data, about 68.27% of data values lie within +/- 1 sd of the mean

95.45% within +/- 2 sd
99.73% within +/- 3 sd
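
These percentages can be checked numerically. The sketch below relies on the standard result that, for a normal distribution, the proportion within k sd of the mean is erf(k / sqrt(2)); the helper name `within_k_sd` is invented for this example.

```python
import math

def within_k_sd(k):
    """Proportion of a normal distribution lying within k sd of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(100 * within_k_sd(k), 2))
```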
16
Q

What is the standard error and how do we calculate it?

A

A value that tells us the precision of a sample-based estimate, such as the sample mean.
SE = sd / square root of n, where n is the sample size
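
A minimal sketch of the calculation on a made-up sample, assuming the sd is the sample sd (n-1 divisor):

```python
import math
import statistics

# Made-up sample for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

sd = statistics.stdev(sample)   # sample sd, using the n-1 divisor
se = sd / math.sqrt(n)          # standard error of the mean

print(se)  # about 0.76
```

Because n is in the denominator, a larger sample gives a smaller standard error, i.e. a more precise estimate.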

17
Q

How would you represent continuous data?

A

histograms, box-plots, normal plots

18
Q

What is a confidence interval?

A

A confidence interval describes the amount of uncertainty associated with a sample estimate of a population parameter. It describes the margin of error either side of our point estimate.
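
As an illustration, an approximate 95% confidence interval for a mean can be sketched as point estimate +/- 1.96 standard errors. This assumes a normal approximation (for small samples a t multiplier would usually be used instead), and the data are made up.

```python
import math
import statistics

# Made-up sample for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(sample)

mean = statistics.mean(sample)                 # point estimate
se = statistics.stdev(sample) / math.sqrt(n)   # standard error of the mean

# Multiplier for 95% confidence under the normal approximation (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)

margin = z * se                    # margin of error either side of the estimate
lower, upper = mean - margin, mean + margin

print(lower, upper)
```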

19
Q

What is a type 1 error?

A

A false positive - rejecting a null hypothesis that is actually true. ‘optimist’

20
Q

What is a type 2 error?

A

A false negative - failing to reject a null hypothesis that is actually false. ‘pessimist’

21
Q

What does a p-value represent?

A

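The probability of observing data at least as extreme as the sample if the null hypothesis is true. It evaluates how compatible the sample data are with the null hypothesis. High p = data are likely with a true null. Low p = data are unlikely with a true null.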
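
A sketch of a two-sided p-value from a one-sample z-test. Every number here (null mean, known sd, sample mean, sample size) is hypothetical and chosen to keep the arithmetic simple.

```python
import math
import statistics

# Hypothetical set-up: the null hypothesis says the population mean is 100,
# with a known population sd of 15.
null_mean = 100
population_sd = 15
sample_mean = 106
n = 25

# How many standard errors the sample mean lies from the null value.
z = (sample_mean - null_mean) / (population_sd / math.sqrt(n))   # 2.0

# Two-sided p-value: probability of a result at least this extreme
# in either direction, assuming the null hypothesis is true.
p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))

print(z, p_value)  # p is about 0.046: such data are unlikely with a true null
```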

22
Q

In estimation statistics what measures of interest are there?

A

Mean, prevalence of a disease (proportion), regression line, relative risk (RR), odds ratio (OR)

23
Q

What are the two categories of estimation?

A

Point: a single-value statistic, e.g. an estimated mean or proportion.
Interval: defined by two numbers, between which the population parameter is estimated to lie with a high degree of probability, e.g. a confidence interval.