Stats Flashcards
A characteristic of an individual measured or recorded in a study. What is this describing?
A variable
What is the difference between qualitative and quantitative variables?
- Categorical (or qualitative) variables arise when an individual falls into a category. They can be subdivided into:
(a) Nominal categorical variables, which have no ordering, e.g. sex (male/female), blood group (A/B/AB/O).
(b) Ordinal categorical variables, which have an ordering, e.g. pain (mild/moderate/severe), breast cancer stage (1, 2, 3, 4).
- Quantitative (or interval scale) variables arise when a response is measured on a scale, e.g. height (in cm), temperature (in °C), blood pressure (in mmHg).
Which one of the following variables is nominal categorical:
a) Number of episodes of disease in a patient over a year.
b) Serum bilirubin level.
c) Blood group (O/A/B/AB).
d) Severity of haemophilia (mild/moderate/severe).
e) Reduction in blood pressure following antihypertensive treatment.
C
What does a frequency distribution show?
A frequency distribution shows the frequency (or count) of the occurrence of different values of a variable, and may be presented either as a table or as a graph (called a bar chart).
What is relative frequency?
Relative frequency is the frequency expressed as a proportion (or percentage) of the total frequency.
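As a rough illustration of the two cards above, the sketch below tabulates a frequency distribution and the matching relative frequencies. The `blood_groups` data are made up for the example and do not come from any study.

```python
from collections import Counter

# Hypothetical sample of blood groups (illustrative data only).
blood_groups = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]

# Frequency distribution: the count of each distinct value.
frequency = Counter(blood_groups)

# Relative frequency: each count as a proportion of the total frequency.
total = sum(frequency.values())
relative_frequency = {group: count / total for group, count in frequency.items()}

for group in sorted(frequency):
    print(f"{group:>2}  count={frequency[group]}  proportion={relative_frequency[group]:.2f}")
```

The proportions always sum to 1 (or 100% when expressed as percentages).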
What is the mean and how is it calculated?
The mean (average) is the most widely used measure of location. It is the sum of all the observations divided by the total number of observations.
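The calculation can be sketched in a few lines. The `heights_cm` values and the function name `mean` are illustrative assumptions.

```python
# Hypothetical heights in cm (illustrative data only).
heights_cm = [160, 172, 168, 181, 175]

def mean(values):
    """Sum of all the observations divided by the number of observations."""
    return sum(values) / len(values)

print(mean(heights_cm))
```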
What is the median?
The median is the middle value when a sample is arranged in increasing order.
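A minimal sketch of finding the median follows. The card does not say what to do with an even number of observations; averaging the two middle values is the usual convention and is an assumption here. The data are made up.

```python
def median(values):
    """Middle value of the sample once it is sorted in increasing order."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    # Assumption: for an even count, average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

# Hypothetical data: the second list replaces the largest value with an outlier.
symmetric = [21, 23, 25, 27, 29]
with_outlier = [21, 23, 25, 27, 290]

print(median(symmetric), sum(symmetric) / len(symmetric))
print(median(with_outlier), sum(with_outlier) / len(with_outlier))
```

The median is the same for both lists, while the mean is pulled upwards by the single large value.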
What is the range?
The range is the difference between the largest and smallest observations in the sample.
What is the problem with using the range?
It is severely affected by outlying observations.
What is the interquartile range?
The interquartile range is the difference between the third and first quartiles.
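The sketch below contrasts the range with the interquartile range on made-up data, with and without an outlier. Quartile conventions differ between textbooks and software. This example uses the default method of Python's `statistics.quantiles`, which may not match the convention the course uses.

```python
import statistics

def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

def interquartile_range(values):
    """Third quartile minus first quartile (default 'exclusive' method)."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical data: the second list replaces the largest value with an outlier.
data = [1, 2, 3, 4, 5, 6, 7]
with_outlier = [1, 2, 3, 4, 5, 6, 70]

print(value_range(data), interquartile_range(data))
print(value_range(with_outlier), interquartile_range(with_outlier))
```

The range jumps when the outlier is introduced, whereas the interquartile range is unchanged for these data.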
What is the variance?
The variance (s²) is approximately the arithmetic mean of the squared deviations of the values from their mean. A deviation is the distance of an observation from the mean.
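A sketch of the calculation, assuming that "approximately" refers to the usual sample variance, in which the sum of squared deviations is divided by n - 1 rather than n. The data are made up.

```python
def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    # Assumption: divisor n - 1 (sample variance), not n.
    return sum(squared_deviations) / (n - 1)

# Hypothetical observations with mean 5 (illustrative data only).
observations = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(observations))
```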
The mean of a set of values:
a) Is a useful summary measure for a nominal categorical variable.
b) Coincides with the median if the distribution of the data is symmetrical.
c) Is always greater than the median.
d) Cannot be calculated if the data set contains both positive and negative values.
e) Is a useful summary measure of location if the data are skewed.
B
What is the standard deviation?
The standard deviation is the square root of the variance. It has the advantage of being in the original scale of measurement, and is therefore used in preference to the variance.
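The relationship can be checked with Python's standard library, as a sketch on made-up blood pressure readings. `statistics.variance` and `statistics.stdev` both use the n - 1 divisor.

```python
import math
import statistics

# Hypothetical systolic blood pressures in mmHg (illustrative data only).
values_mmhg = [120, 135, 128, 142, 130]

variance = statistics.variance(values_mmhg)  # units: mmHg squared
std_dev = math.sqrt(variance)                # units: mmHg, same as the raw data

print(variance, std_dev)
```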
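What is the coefficient of variation?
The coefficient of variation is the standard deviation expressed as a proportion (or percentage) of the mean. It provides a measure of variation which is independent of the unit of measurement and hence can be used to compare the variation of variables measured on different scales.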
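To illustrate unit independence, the sketch below assumes the usual definition (standard deviation divided by the mean, as a percentage). It applies it to the same made-up heights recorded in cm and in m.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical heights (illustrative data only), in two different units.
heights_cm = [160, 172, 168, 181, 175]
heights_m = [h / 100 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(cv_cm, cv_m)
```

The two values agree up to floating-point rounding, even though the standard deviations themselves differ by a factor of 100.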
The median:
a) Is a useful measure of the spread of the data.
b) Is a useful summary measure when the data are skewed.
c) Is always less than the mean when the data are skewed.
d) Can be distorted by outliers.
e) Is equal to the 66th percentile.
B
What is the significance of the mean and standard deviation in a normal distribution?
The mean determines where the distribution is centred on the x-axis. The standard deviation determines the width of the distribution: the larger the standard deviation, the wider and flatter the distribution.
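A small sketch using `statistics.NormalDist` from the Python standard library shows both effects. The parameter values are arbitrary choices for illustration.

```python
from statistics import NormalDist

narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)     # same centre, larger standard deviation
shifted = NormalDist(mu=3, sigma=1)  # same spread, centre moved to the right

# Height of each curve at its own mean (the peak of the distribution).
print(narrow.pdf(0), wide.pdf(0), shifted.pdf(3))
```

Doubling the standard deviation halves the peak height, while changing the mean moves the peak without altering its height.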
What does the z value represent in a normal distribution?
Values of a variable are sometimes converted to Z-scores. A Z-score is the number of standard deviations that a value lies above or below the mean.
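The conversion can be sketched as follows. The function name `z_score` and the example mean and standard deviation are illustrative assumptions.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

# Hypothetical population of heights with mean 170 cm and standard deviation 10 cm.
print(z_score(190, mean=170, std_dev=10))  # two standard deviations above the mean
print(z_score(155, mean=170, std_dev=10))  # one and a half below the mean
```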
What is the definition of probability?
Using the relative frequency definition, the probability of an event of interest occurring in an experiment is the proportion of times the event of interest occurs (its relative frequency) when the experiment is repeated a large number of times.
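The relative frequency definition can be illustrated by simulation. The sketch below uses a die roll as a made-up experiment, and the helper names are hypothetical. It estimates the probability of rolling a six by repeating the experiment many times.

```python
import random

def estimate_probability(event, experiment, repetitions):
    """Proportion of repetitions in which the event of interest occurs."""
    occurrences = sum(1 for _ in range(repetitions) if event(experiment()))
    return occurrences / repetitions

rng = random.Random(42)  # fixed seed so the run is reproducible

def roll_die():
    return rng.randint(1, 6)

def is_six(outcome):
    return outcome == 6

estimate = estimate_probability(is_six, roll_die, repetitions=100_000)
print(estimate)  # should be close to 1/6
```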
Which one of the following variables is ordinal categorical:
a) Number of episodes of disease in a patient over a year
b) Serum bilirubin level
c) Blood group (O/A/B/AB)
d) Severity of haemophilia (mild/moderate/severe)
e) Reduction in blood pressure following antihypertensive treatment
d
Which one of the following variables is interval scale:
a) Height in cm.
b) Ethnic group.
c) Social class (I/II/III-N/III-M/IV/V).
d) Age categorised as young, middle-aged or old.
e) Blood group.
a
Which one of the following statements is true:
a) A nominal variable has categories that can be ordered in some way.
b) Quantitative data arises when an individual falls into categories.
c) Categorical and quantitative data are presented in exactly the same way.
d) A categorical variable can be either nominal or ordinal.
e) Nominal data are usually measurements made on a scale.
d
A histogram:
a) Can be used to display any type of variable.
b) Is the same as a bar chart but there are larger gaps between the bars.
c) Contains bars, with the height of each bar being proportional to the frequency of the observations in the range specified by the bar.
d) Can be used instead of a pie chart to display categorical data.
e) Is used to show the relationship between two variables.
c
A bar chart:
a) Is used to display categorical data.
b) Should be drawn without gaps between the bars.
c) Can only be used to display data which have a symmetrical distribution.
d) Can be used to display any type of variable.
e) Is used to show the relationship between two variables.
a
Which one of the following statements is true? The standard deviation:
a) Is a measure of location.
b) Has the same units of measurement as the raw data.
c) Is a measure of spread which is equal to the range.
d) Is unaffected by outliers.
e) Is an appropriate measure of spread for skewed data.
b
What is a ‘population’?
A population is any collection of individuals (or measurements made on those individuals) in which we are interested.
What is the population distribution?
The frequency distribution of a variable in the population is referred to as the population distribution.
What are population parameters?
Summary values (e.g. means, proportions) calculated in populations are referred to as population parameters.
What is a sample?
A sample is any subset of a population and is ideally selected to be representative of the population.
What makes a sample random?
A random sample is one chosen in such a way that each member of the population has the same chance of selection.
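A simple random sample can be sketched with `random.sample`, which draws without replacement and gives every member the same chance of selection. The population of 100 numbered individuals is hypothetical.

```python
import random

# Hypothetical population: individuals numbered 1 to 100.
population = list(range(1, 101))

rng = random.Random(0)  # fixed seed so the run is reproducible
sample = rng.sample(population, k=10)  # 10 distinct members, equal chance for each

print(sorted(sample))
```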