4. Data, Distribution, Populations, and Samples Flashcards

1
Q

What is a categorical variable?

A

Individuals are classified into one of several categories

2
Q

What are different types of categorical variables?

A

Binary variables: Only two categories e.g. female/not female,
yes/no

Ordinal variables: >2 categories and they are ordered
e.g. pain (none/mild/moderate/severe)

Nominal variables: Neither binary nor ordinal (more than two categories with no natural order)
e.g. ethnicity (Caucasian, Asian, Black)

3
Q

What is a binary variable?

A

Only two categories, e.g. female/not female, yes/no

4
Q

What is an ordinal variable?

A

More than two categories, and they are ordered

e.g. pain (none/mild/moderate/severe)

5
Q

What is a nominal variable?

A

Neither binary nor ordinal: more than two categories with no natural order

e.g. ethnicity (Caucasian, Asian, Black)

6
Q

What is a numerical variable?

A

Measured on a number scale

7
Q

What are the different types of numerical variables?

A

Discrete variables: can take only distinct, separate values
e.g. age in years, number of transfusions, parity
Continuous variables: can take any value within a certain range
e.g. blood pressure, head circumference

8
Q

What is a discrete variable?

A

Can take only distinct, separate values

e.g. age in years, number of transfusions, parity

9
Q

What is a continuous variable?

A

Any value within a certain range

e.g. blood pressure, head circumference

10
Q

What is important about collecting data?

A

Collect each variable in its most informative form, because you will never be able to return to the point of collection. Variables can be transformed into less detailed types later on (e.g. continuous blood pressure grouped into categories), but not the other way round.

11
Q

What are the different ways of summarising categorical data?

A

Proportion = Number in the category divided by the total

Percentage = proportion x 100

Rate = number with the event divided by the number of people or the total time at risk (e.g. person-years)

Odds = number in the category / number not in that category
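The four summaries above can be sketched in Python as follows. The function names `summarise_category` and `event_rate`, and the counts (12 of 40 patients in the category, 5 events over 250 person-years), are hypothetical and chosen only to show the arithmetic.

```python
def summarise_category(n_in_category: int, total: int) -> dict:
    """Proportion, percentage and odds for one category (hypothetical helper)."""
    proportion = n_in_category / total
    percentage = 100 * n_in_category / total  # same as proportion x 100
    odds = n_in_category / (total - n_in_category)  # in the category vs not in it
    return {"proportion": proportion, "percentage": percentage, "odds": odds}


def event_rate(n_events: int, person_time: float) -> float:
    """Number of events per unit of person-time (hypothetical helper)."""
    return n_events / person_time


summary = summarise_category(12, 40)
print(summary)              # proportion 0.3, percentage 30.0, odds 12/28
print(event_rate(5, 250))   # 0.02 events per person-year
```

Note that the odds (12/28) differ from the proportion (12/40) because the denominator excludes the people in the category.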

12
Q

What is the mean?

A

Most widely known measure of centre

= sum of all measurements/total number of measurements
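A minimal sketch of the formula above, assuming the measurements are held in a Python list; `sample_mean` is a hypothetical name, and the result is compared with the standard library's `statistics.mean`.

```python
from statistics import mean as stats_mean
from typing import Sequence


def sample_mean(measurements: Sequence[float]) -> float:
    """Sum of all measurements divided by the number of measurements."""
    if not measurements:
        raise ValueError("need at least one measurement")
    return sum(measurements) / len(measurements)


data = [2, 4, 6, 8]
print(sample_mean(data))  # 5.0
print(stats_mean(data))   # the standard library gives the same value
```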

13
Q

What is the median?

A

The value that falls halfway along the ordered data (frequency distribution), i.e. the 50th centile

14
Q

How do you find the median with an odd or an even number of values?

A

Odd: the value exactly in the middle of the ordered data

Even: the average of the two middle values
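The odd/even rule can be sketched as below; `sample_median` is a hypothetical name, and the result is checked against Python's `statistics.median`, which applies the same rule.

```python
from statistics import median as stats_median
from typing import Sequence


def sample_median(values: Sequence[float]) -> float:
    """Median of the values: sort first, then pick the middle."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average of the two middle values


print(sample_median([7, 1, 3]))     # 3
print(sample_median([7, 1, 3, 5]))  # 4.0
```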

15
Q

When data are symmetric, how does this affect the mean and median?

A

They will be the same (in a sample, approximately the same).

16
Q

What is the main weakness of the mean as a summary measure?

A

The mean is highly influenced by a single extreme value. In a skewed distribution, extreme values pull the mean towards the tail, so it may not accurately represent the centre of the data set.
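A small illustration using Python's `statistics` module; the two data sets are invented for the example and differ only in their last value.

```python
from statistics import mean, median

typical = [4, 5, 5, 6, 6, 7]
with_outlier = [4, 5, 5, 6, 6, 60]  # last value replaced by one extreme value

print(mean(typical), median(typical))            # 5.5 5.5
print(mean(with_outlier), median(with_outlier))  # mean jumps to about 14.3, median stays 5.5
```

One extreme value moves the mean far from the bulk of the data, while the median does not change.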

17
Q

What is the aim when summarising numeric measurements?

A

Aim: to summarise the data as well as possible

Median = representative of the centre of the data, even when the distribution is skewed

Whereas the mean is only representative if the distribution of the data is symmetric

Mean = every measurement is directly involved in its calculation, so it is very sensitive to changes in the data and heavily influenced by outlying measurements

18
Q

Describe the range and its positives and negatives.

A

Range: difference between the largest and the smallest values of the distribution

It ignores the bulk of the data

By definition, it depends on the two most extreme (and hence possibly ‘odd’) values
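The dependence on the two most extreme values can be shown with a short sketch; `data_range` is a hypothetical name and the data are invented.

```python
def data_range(values):
    """Difference between the largest and the smallest value."""
    return max(values) - min(values)


typical = [4, 5, 5, 6, 6, 7]
with_outlier = [4, 5, 5, 6, 6, 60]  # one extreme value

print(data_range(typical))       # 3
print(data_range(with_outlier))  # 56
```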

19
Q

Describe the IQR and its + and -.

A

Inter-quartile range: the range within which the middle 50% of the sample values lie (between the 25th and 75th centiles)

More representative of the majority of the data

Does not depend on the oddest or most extreme values, as the range does

More stable summary measure
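A sketch using `statistics.quantiles`, which returns the three quartile cut points when `n=4`. The data are invented, and `interquartile_range` is a hypothetical name. Different software packages use slightly different quartile methods (this uses the module's default, `method="exclusive"`), so exact values may vary between packages.

```python
from statistics import quantiles


def interquartile_range(values):
    """Return (Q1, Q3, Q3 - Q1) using the statistics module's default quartile method."""
    q1, _, q3 = quantiles(values, n=4)
    return q1, q3, q3 - q1


data = [4, 5, 5, 6, 6, 7, 8, 9]
extreme = [4, 5, 5, 6, 6, 7, 8, 90]  # largest value made extreme

print(interquartile_range(data))     # (5.0, 7.75, 2.75)
print(interquartile_range(extreme))  # (5.0, 7.75, 2.75): unchanged
print(max(data) - min(data), max(extreme) - min(extreme))  # 5 86: the range changes
```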

20
Q

What is variance?

A

A measure of spread. It is based on the deviations of the observations from the mean (each deviation is squared, so positive and negative deviations do not cancel out).

21
Q

How is the variance (s²) calculated?

A

Sum of squared deviations from the mean / (n − 1), i.e. s² = Σ(x − x̄)² / (n − 1)
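The formula can be written out step by step as below; `sample_variance` is a hypothetical name, the data are invented, and the result is compared with `statistics.variance`, which also divides by n − 1.

```python
from statistics import variance


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    xbar = sum(values) / n
    squared_deviations = [(x - xbar) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, squared deviations sum to 32
print(sample_variance(data))  # 32 / 7, about 4.571
print(variance(data))         # standard library gives the same value
```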

22
Q

What is the standard deviation?

A

Roughly, the typical distance of the observations from the mean, in the same units as the data.

23
Q

How can the SD be calculated?

A

Square root of variance.
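A minimal check that the SD is the square root of the variance, using Python's `statistics` and `math` modules on an invented data set.

```python
import math
from statistics import stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]
sd_from_variance = math.sqrt(variance(data))

print(sd_from_variance)  # about 2.138
print(stdev(data))       # same value from the standard library
```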

24
Q

Describe the SD and its + and -.

A

Standard deviation is more sensitive to changes in the data than the range and inter-quartile range

Standard deviation = more powerful summary measure of the spread of the data, as it makes more comprehensive use of the entire dataset

If the mean is NOT a meaningful summary of the centre of the data, then the same follows for the standard deviation, as it is based on distances from the mean

25
Q

If data are symmetrically distributed, how should they be summarised?

A

1. Mean

2. Variance or SD

26
Q

How should skewed data be summarised?

A

1. Median

2. IQR

27
Q

What is the normal distribution?

A

A symmetrical, bell-shaped distribution that is completely defined by only two parameters: the mean and the standard deviation.

The mean is always at the centre of the distribution (at its peak)

The standard deviation describes how spread out the distribution is around the mean

28
Q

In a normal distribution, within how many SDs of the mean do 95% of values lie?

A

95% of values lie within ±1.96 SD of the mean

29
Q

In a normal distribution, within how many SDs of the mean do 68% of values lie?

A

68% of values lie within ±1 SD of the mean
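Both percentages can be checked with `statistics.NormalDist` by taking the difference of the cumulative distribution function at the two limits. The blood-pressure mean and SD (120 and 15) are hypothetical, included only to show that the same proportions hold for any mean and SD.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1
within_1_sd = z.cdf(1) - z.cdf(-1)
within_196_sd = z.cdf(1.96) - z.cdf(-1.96)

print(round(within_1_sd, 3))    # 0.683
print(round(within_196_sd, 3))  # 0.95

# Hypothetical systolic blood pressure distribution: mean 120, SD 15
bp = NormalDist(mu=120, sigma=15)
share = bp.cdf(120 + 1.96 * 15) - bp.cdf(120 - 1.96 * 15)
print(round(share, 3))  # 0.95
```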

30
Q

How can you check whether data could be normally distributed?

A

Formally, a statistical test can assess whether the data could have come from a normal distribution. A quick informal check is to calculate the range mean ± 1.96 SD, which would contain about 95% of values if the data were normal:

Mean − 1.96 SD
Mean + 1.96 SD

If the lower limit is infeasible (e.g. a negative value for a measurement that cannot be negative), the data are unlikely to be normally distributed.
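The informal check can be sketched as follows. The function names, the `smallest_possible` parameter and both data sets (hospital length of stay in days, body weight in kg) are hypothetical. This is only a rough feasibility check, not a formal normality test.

```python
from statistics import mean, stdev


def approx_95_range(values):
    """Mean ± 1.96 SD: where about 95% of values would lie if the data were normal."""
    m, s = mean(values), stdev(values)
    return m - 1.96 * s, m + 1.96 * s


def lower_limit_feasible(values, smallest_possible=0.0):
    """False if mean - 1.96 SD falls below the smallest value the measurement can take."""
    lower, _ = approx_95_range(values)
    return lower >= smallest_possible


length_of_stay_days = [1, 1, 2, 2, 2, 3, 3, 4, 5, 14]  # hypothetical, right-skewed
weights_kg = [70, 72, 75, 78, 80]                      # hypothetical, roughly symmetric

print(approx_95_range(length_of_stay_days))       # lower limit is negative
print(lower_limit_feasible(length_of_stay_days))  # False: unlikely to be normal
print(lower_limit_feasible(weights_kg))           # True
```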

31
Q

What is the target population?

A

The population that is ideal for meeting the measurement objectives

32
Q

What is the survey population?

A

The target population
modified to take into account practical
constraints

33
Q

What is the standard error?

A

It measures how precisely the population mean is estimated by the sample mean.

34
Q

How is the standard error calculated?

A

The standard error of the mean is calculated by dividing the standard deviation of the measurements by the square root of the sample size: SE = SD / √n.
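A minimal sketch of SE = SD / √n using Python's standard library; `standard_error` is a hypothetical name and the data are invented.

```python
import math
from statistics import stdev


def standard_error(values):
    """Standard error of the mean: sample SD divided by the square root of n."""
    return stdev(values) / math.sqrt(len(values))


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_error(data))  # about 0.756
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.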

35
Q

How do we know the population SD when we only have a sample?

A

We do not usually know the standard deviation of the population whose mean we are trying to estimate.

We use the sample standard deviation to approximate the population standard deviation.

This is acceptable if the sample is not small (> 20) and the data are approximately normally distributed

36
Q

What is a confidence interval?

A

A range of values defined so that there is a specified probability that the value of a parameter lies within it.

The interval (sample mean ± 1.96 SE) will contain the population mean for 95% of random samples.

The ends of the interval are known as the confidence limits.
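A sketch of the 95% interval (sample mean ± 1.96 SE) using Python's standard library; `ci95` is a hypothetical name. The eight-value data set is far smaller than the > 20 recommended above and is used only to show the arithmetic; for small samples, a value from the t distribution usually replaces 1.96.

```python
import math
from statistics import mean, stdev


def ci95(values):
    """Approximate 95% confidence interval for the mean: mean ± 1.96 SE."""
    m = mean(values)
    se = stdev(values) / math.sqrt(len(values))
    return m - 1.96 * se, m + 1.96 * se


data = [2, 4, 4, 4, 5, 5, 7, 9]
lower, upper = ci95(data)
print(round(lower, 2), round(upper, 2))  # 3.52 6.48
```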

37
Q

What is the benefit of a CI?

A

They attach a level of precision to a sample estimate to help interpretation.