4. Data, Distribution, Populations, and Samples Flashcards by Jessica Abel

What is a categorical variable?

A variable that classifies individuals into one of several categories

What are different types of categorical variables?

Binary variables: Only two categories e.g. female/not female,
yes/no

Ordinal variables: More than two categories, and they are ordered
e.g. pain (none/mild/moderate/severe)

Nominal variables: More than two categories with no natural order (neither binary nor ordinal)
e.g. ethnicity (Caucasian, Asian, Black)

What is a binary variable?

Only two categories, e.g. female/not female, yes/no

What is an ordinal variable?

More than two categories, and they are ordered

e.g. pain (none/mild/moderate/severe)

What is a nominal variable?

More than two categories with no natural order (neither binary nor ordinal)

e.g. ethnicity (Caucasian, Asian, Black)

What is a numerical variable?

Measured on a number scale

What are the different types of numerical variables?

Discrete variables: A countable set of distinct values
e.g. age in years, number of transfusions, parity

Continuous variables: Any value within a certain range
e.g. blood pressure, head circumference

What is a discrete variable?

A countable set of distinct values

e.g. age in years, number of transfusions, parity

What is a continuous variable?

Any value within a certain range

e.g. blood pressure, head circumference

What is important about collecting data?

That you collect it in its most informative form, as you will never be able to return to the point of collection. Data variables can be transformed into less detailed types later on (for example, a numerical variable grouped into categories), but not the other way round.
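As a minimal sketch of why the most informative form matters, the snippet below collapses exact ages into ordered groups. The ages, the cut-offs, and the `age_group` helper are all hypothetical, chosen only for illustration.

```python
# Hypothetical ages in years, recorded in their most informative (numerical) form.
ages = [23, 37, 41, 58, 64, 72]


def age_group(age):
    """Collapse an exact age into an ordered category (illustrative cut-offs)."""
    if age < 40:
        return "under 40"
    if age < 65:
        return "40-64"
    return "65 and over"


# Numerical -> ordinal is always possible after collection...
groups = [age_group(a) for a in ages]
print(groups)

# ...but the exact ages cannot be recovered from the groups alone.
```

Had only the groups been collected, any later analysis needing exact age (a mean age, say) would not be possible.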

What are the different ways of summarising categorical data?

Proportion = number in the category / total number

Percentage = proportion x 100

Rate = number with the event / number of people or amount of time

Odds = number in the category / number not in that category
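The four summaries above can be sketched with a few lines of arithmetic. The counts and the person-years of follow-up are made-up figures, used only to show the calculations.

```python
# Hypothetical counts: 20 of 80 patients experience the event.
with_event = 20
total = 80

proportion = with_event / total               # 20 / 80
percentage = proportion * 100                 # proportion x 100
odds = with_event / (total - with_event)      # 20 with / 60 without

# Rate: events per unit of people or time. Here the denominator is an
# assumed 400 person-years of follow-up.
person_years = 400
rate = with_event / person_years

print(proportion, percentage, odds, rate)
```

Note that the proportion uses everyone in the denominator, while the odds use only those not in the category, so the two diverge as the category becomes more common.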

What is the Mean?

Most widely known measure of centre

= sum of all measurements/total number of measurements

What is the median?

The value that falls halfway along the frequency distribution (the 50th centile)

How do you find the median for an odd or an even number of values?

Odd - the value exactly in the middle of the ordered data

Even - the average of the middle two values
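A short sketch of the mean and of both median cases, using Python's standard `statistics` module. The data values are invented for illustration.

```python
from statistics import mean, median

odd_values = [7, 1, 5, 3, 9]    # ordered: 1, 3, 5, 7, 9
even_values = [7, 1, 5, 3]      # ordered: 1, 3, 5, 7

# Mean = sum of all measurements / total number of measurements.
odd_mean = mean(odd_values)         # 25 / 5

# Odd count: the single middle value of the ordered data.
odd_median = median(odd_values)     # middle value is 5

# Even count: the average of the two middle values.
even_median = median(even_values)   # (3 + 5) / 2

print(odd_mean, odd_median, even_median)
```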

When data are symmetric, how will this affect the mean and median?

They will be the same (or very nearly the same in a real sample).

What is the drawback of the mean as a summary measure?

The mean is highly influenced by a single extreme value. In a skewed distribution, the extreme values pull the mean towards them, so it may not accurately represent the data set.
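The effect of one extreme value can be seen in a small made-up example: adding a single outlier moves the mean a long way but barely moves the median.

```python
from statistics import mean, median

values = [4, 5, 5, 6, 7]          # hypothetical measurements
with_outlier = values + [60]      # the same data plus one extreme value

mean_before = mean(values)              # 27 / 5
mean_after = mean(with_outlier)         # 87 / 6
median_before = median(values)          # middle value of 4, 5, 5, 6, 7
median_after = median(with_outlier)     # average of 5 and 6

print(mean_before, mean_after)
print(median_before, median_after)
```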

What is the aim when summarising numeric measurements?

Aim: to best summarise the data

- Median: always representative of the centre of the data
- Mean: only representative if the distribution of the data is symmetric
- Mean: each measurement is directly involved in its calculation, so it is very sensitive to changes in the data and heavily influenced by outlying measurements

Describe the range and its positives and negatives.

- Range: the difference between the largest and the smallest values of the distribution
- It ignores the bulk of the data
- By definition, it depends on the two most extreme (and hence possibly ‘odd’) values

Describe the IQR and its positives and negatives.

Inter-quartile range: the range within which the middle 50% of the sample values lie (from the 25th to the 75th centile)

- More representative of the majority of the data
- Does not depend on the oddest or most extreme values like the range does
- More stable summary measure

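To compare the range and the inter-quartile range, here is a minimal sketch using `statistics.quantiles` (Python 3.8 or later). The data are invented, and `method="inclusive"` is one of several quartile conventions; other conventions can give slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical ordered sample with one extreme value (30).
data = [2, 4, 4, 5, 6, 7, 8, 9, 30]

# Range: depends only on the two most extreme values.
data_range = max(data) - min(data)

# Quartiles: n=4 returns the three cut points (25th, 50th, 75th centiles).
q1, q2, q3 = quantiles(data, n=4, method="inclusive")

# IQR: the spread of the middle 50% of the data.
iqr = q3 - q1

print(data_range, q1, q3, iqr)
```

Replacing the 30 with a 10 would shrink the range considerably while leaving the IQR unchanged, which is the stability the card describes.
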
What is variance?

A measure of variation. It summarises how far the observations deviate from the mean, based on the squared deviations.

How is variance (s²) calculated?

Sum of squared deviations from the mean / (n - 1)

What is the standard deviation?

The typical (roughly average) deviation of the observations from the mean, in the same units as the data.

How can the SD be calculated?

Square root of variance.
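The variance and standard deviation calculations can be sketched step by step and checked against the standard library. The data values are made up for illustration; `statistics.variance` and `statistics.stdev` both use the sample (n - 1) form.

```python
from math import sqrt
from statistics import mean, stdev, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical measurements

m = mean(data)                                  # 40 / 8
squared_devs = [(x - m) ** 2 for x in data]     # squared deviations from the mean

# Sample variance: sum of squared deviations divided by (n - 1).
s2 = sum(squared_devs) / (len(data) - 1)        # 32 / 7

# Standard deviation: square root of the variance.
sd = sqrt(s2)

print(s2, variance(data))   # the manual and library values should agree
print(sd, stdev(data))
```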

Describe the SD and its positives and negatives.

- Standard deviation is more sensitive to changes in the data than the range and inter-quartile range
- Standard deviation is a more powerful summary measure of the spread of the data, as it makes more comprehensive use of the entire dataset
- If the mean is NOT a meaningful summary of the centre of the data, then the same follows for the standard deviation, as this is based on distances from the mean

If data are symmetrically distributed, how should they be summarised?

1. Mean
2. Variance or SD

How should skewed data be represented?

1. Median
2. IQR

What is the normal distribution?

It is fully defined by only two parameters: the mean and the standard deviation. It is an even, symmetrical distribution of data.

- The mean is always at the centre of the distribution (at its peak)
- The standard deviation describes how spread out the distribution is around the mean

Within how many SDs of the mean do 95% of values lie in a normal distribution?

95% of values will lie within ±1.96 SD of the mean

Within how many SDs of the mean do 68% of values lie in a normal distribution?

68% of values will lie within ±1 SD of the mean
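These two figures can be checked numerically with `statistics.NormalDist` (Python 3.8 or later). This is a sketch using the standard normal distribution (mean 0, SD 1), so the distances are already in SD units.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Probability of lying within 1 SD of the mean.
within_1_sd = z.cdf(1) - z.cdf(-1)

# Probability of lying within 1.96 SD of the mean.
within_1_96_sd = z.cdf(1.96) - z.cdf(-1.96)

print(round(within_1_sd, 3), round(within_1_96_sd, 3))
```

The results are approximately 0.68 and 0.95, matching the two cards.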

How can one test normality with data?

Statistically test whether the data could have come from a normal distribution. A quick check is to calculate the range mean ± 1.96 SD (from mean - 1.96 × SD to mean + 1.96 × SD), within which about 95% of values should fall if the data are normal. If the lower limit is infeasible (for example, a negative value for a measurement that cannot be negative), then the data are not normal.
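The quick check described above might look like the sketch below. The `reference_range` helper and the summary figures (a mean of 5 and SD of 4 for a measurement that cannot be negative, such as days in hospital) are hypothetical, and this is a rough feasibility check rather than a formal test of normality.

```python
def reference_range(mean, sd):
    """Return (mean - 1.96 * SD, mean + 1.96 * SD)."""
    return mean - 1.96 * sd, mean + 1.96 * sd


# Hypothetical summary of a measurement that cannot fall below zero.
lower, upper = reference_range(mean=5.0, sd=4.0)

# Assumption: zero is the smallest feasible value for this variable.
plausibly_normal = lower >= 0

print(lower, upper, plausibly_normal)
```

Here the lower limit is negative, which is infeasible for this measurement, so the data are unlikely to be normally distributed.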

What is the target population?

The population that is ideal for meeting the measurement objectives

What is the survey population?

The target population modified to take into account practical constraints

What is the standard error?

It measures how precisely the population mean is estimated by the sample mean.

How is the standard error calculated?

Standard error of the mean is calculated by dividing the standard deviation of the measurements by the square root of the sample size.
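As a small sketch of this calculation, the hypothetical `standard_error` helper below divides an SD by the square root of the sample size. The SD of 10 and the sample sizes are made-up figures.

```python
from math import sqrt


def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / sqrt(n)


se_small = standard_error(sd=10.0, n=25)    # 10 / 5
se_large = standard_error(sd=10.0, n=100)   # 10 / 10

print(se_small, se_large)
```

Quadrupling the sample size halves the standard error, so larger samples estimate the population mean more precisely.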

How do we know the SD for populations when testing on samples?

We do not usually know the standard deviation of the population whose mean we are trying to estimate.

- We use the sample standard deviation to approximate the population standard deviation.
- This is OK if the sample is not small (> 20) and the sample is approximately normally distributed.

What is a confidence interval?

A range of values so defined that there is a specified probability that the value of a parameter lies within it. The interval (sample mean ± 1.96 × SE) will contain the population mean for 95% of random samples. The ends of the interval are known as the confidence limits.
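The 95% interval can be sketched from summary statistics as follows. The `confidence_interval_95` helper and the figures (sample mean 120, SD 10, n = 25) are hypothetical, and the 1.96 multiplier assumes the normal approximation described in these cards.

```python
from math import sqrt


def confidence_interval_95(sample_mean, sd, n):
    """Return the 95% confidence limits: sample mean +/- 1.96 x SE."""
    se = sd / sqrt(n)             # standard error of the mean
    margin = 1.96 * se
    return sample_mean - margin, sample_mean + margin


lower_limit, upper_limit = confidence_interval_95(sample_mean=120.0, sd=10.0, n=25)

print(lower_limit, upper_limit)
```

With these figures the SE is 2, so the limits are 120 ± 3.92.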

What is the benefit of a CI?

It attaches a level of precision to a sample estimate, which helps with interpretation.

(37 cards)