Statistics Flashcards
Describe the different types of data
- Quantitative - A measurement that is either discrete or continuous (discrete data are whole numbers, e.g. counts, whereas continuous measurements can take any value within a range, e.g. height)
- Qualitative - Objects are classified into groups, which are either ordinal or nominal (in ordinal data the groups have a natural order, whereas in nominal data there is no order to the groups). Categorical data with only two values is binary.
What is a stratified sample?
Used in a study where certain categories need to be represented. The population is divided into strata and a random sample is chosen from each of these.
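A minimal Python sketch of stratified sampling, assuming a simple random sample of equal size is taken from each stratum. The function name `stratified_sample`, the `people` list and the age groups are hypothetical and only for illustration:

```python
import random

def stratified_sample(population, stratum_of, per_stratum, seed=0):
    """Draw a simple random sample of size per_stratum from each stratum."""
    rng = random.Random(seed)
    # Divide the population into strata
    strata = {}
    for item in population:
        strata.setdefault(stratum_of(item), []).append(item)
    # Sample at random within each stratum
    return {
        name: rng.sample(members, min(per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical population of (id, age group) pairs
people = [(i, "under 40" if i % 3 else "40 and over") for i in range(30)]
sample = stratified_sample(people, lambda person: person[1], per_stratum=4)
```

In practice the number drawn from each stratum is often made proportional to the size of the stratum rather than fixed.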
What type of data do pie charts and bar graphs usually represent?
Categorical variables
Which graphs are used to visualise the distribution of continuous data?
Histograms
Stem and leaf plots
Box and whisker plots
What are scatter plots used to visualise?
The relationship between two variables
Why are there no gaps between the bars on a histogram?
The data they represent is continuous, whereas in bar charts it is categorical or discrete
What is the total area of the columns equal to in a relative frequency histogram?
1
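A short Python check of this, assuming each column's height is its relative frequency divided by the bin width (so that area, not height, represents relative frequency). The data and bin edges are made up:

```python
data = [1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.6]  # hypothetical measurements
edges = [1, 2, 3, 4, 5]                          # hypothetical bin edges

total_area = 0.0
for low, high in zip(edges, edges[1:]):
    count = sum(low <= x < high for x in data)
    relative_frequency = count / len(data)
    height = relative_frequency / (high - low)   # column height (density)
    total_area += height * (high - low)          # column area

print(total_area)
```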
What do scatter plots represent?
The relationship between two quantitative variables
How do you calculate the strength of the relationship between the two variables in a scatter plot?
By calculating the correlation coefficient
What is the line fitted to a scatter plot called?
Regression line
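As an illustration, the sketch below computes Pearson's correlation coefficient and a least-squares regression line by hand. The cards do not say which correlation coefficient or fitting method is meant, so Pearson's r and ordinary least squares are assumptions, and the data are made up:

```python
from math import sqrt

def pearson_r_and_line(xs, ys):
    """Return (r, slope, intercept) for paired observations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)             # strength of the linear relationship
    slope = sxy / sxx                     # least-squares slope
    intercept = mean_y - slope * mean_x   # line passes through the two means
    return r, slope, intercept

xs = [1, 2, 3, 4, 5]                      # hypothetical data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r, slope, intercept = pearson_r_and_line(xs, ys)
```

r close to +1 or -1 indicates a strong linear relationship; r close to 0 indicates a weak one.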
Why is the mean not useful in skewed data? What would be a better estimate of this?
The mean is very sensitive to outliers, so it is pulled towards the tail. The median is a better estimate because it is not sensitive to outliers.
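A small Python example of the effect, using made-up values with a single large outlier:

```python
from statistics import mean, median

values = [0, 1, 2, 2, 3, 4, 40]   # hypothetical data, one extreme value

average = mean(values)            # pulled upwards by the outlier
middle = median(values)           # unaffected by how extreme the outlier is
print(average, middle)
```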
What is the adjustment that must be made when calculating sample variance to make an unbiased estimate of population value?
Denominator must be n-1, not n
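Python's standard `statistics` module provides both versions, which makes the difference easy to see. The data below are made up:

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]        # hypothetical sample, mean 5

divide_by_n = pvariance(data)          # sum of squared deviations / n
divide_by_n_minus_1 = variance(data)   # sum of squared deviations / (n - 1)
print(divide_by_n, divide_by_n_minus_1)
```

The n-1 version is always slightly larger, and the gap shrinks as the sample size grows.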
What does a larger standard deviation tell you about the spread of the data?
Large SD = Wide spread of data
What does standard deviation measure?
The spread of the data around the mean
What does positively skewed mean?
Most values lie towards the bottom end of the range with a tail to the right (larger end of the range)
Give two measurements in healthcare that are most often positively skewed?
Units of alcohol drunk or number of cigarettes smoked.
What does negatively skewed mean?
Most values lie towards the upper end of the range with a tail to the left.
If you get a coefficient of skewness of 0 what does this mean?
Data distribution is symmetrical
If you get a coefficient of skewness of 1 what does this mean?
Positive skew
If you get a coefficient of skewness of -1 what does this mean?
Negative skew
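The cards do not say which coefficient of skewness is meant. As one possibility, the sketch below uses the moment coefficient (third central moment divided by the 1.5 power of the second central moment), applied to made-up data:

```python
def skewness(data):
    """Moment coefficient of skewness (an assumed choice of coefficient)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5

symmetric = skewness([1, 2, 3, 4, 5])          # zero
right_tail = skewness([1, 1, 2, 2, 3, 10])     # positive
left_tail = skewness([1, 8, 9, 9, 10, 10])     # negative
```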
When do you use the normal distribution?
Continuous variables such as lengths, heights and weights.
When do you use the binomial distribution?
Binary data such as alive and dead, male and female
When do you use the Poisson distribution?
Rare events and events occurring at random intervals of time and space.
What are the characteristics of the normal distribution?
- Bell shaped
- Single central peak
- Symmetrical
- Equal mean, median and mode
- Continuous
- Takes values between -ve infinity and + infinity
What is the mean and standard deviation of the standard normal distribution?
Mean 0
Standard Deviation 1
Describe how you would standardise any normally distributed variable?
Subtracting the mean and dividing by the standard deviation:
((Any of the data values) - Mean) / Standard deviation
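The same formula as a short Python sketch, using made-up data and the population standard deviation for simplicity:

```python
from statistics import mean, pstdev

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
m = mean(values)
sd = pstdev(values)

# Subtract the mean, then divide by the standard deviation
z_scores = [(x - m) / sd for x in values]
```

After standardising, the values have mean 0 and standard deviation 1.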
Why do we standardise normal data?
- To allow us to compare data
- To perform more advanced statistical tests
- If 0 is in the centre the centiles are easier to calculate
- There is only one table of probabilities for normal data
What is the area under the normal probability density function curve?
1
How do you calculate the 95% reference ranges of a set of normally distributed data?
mean +/- 1.96 x Standard Deviation
If a population is believed to have a normal distribution with a mean (μ) and a KNOWN standard deviation (σ) then where are 95% of the data values expected to lie?
Mean +/- 1.96 x Standard Deviation
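A Python sketch showing where the 1.96 comes from (the 97.5th centile of the standard normal distribution) and how the range is calculated. The heights are made up, and the sample standard deviation is used as an assumption:

```python
from statistics import NormalDist, mean, stdev

# 2.5% of the standard normal distribution lies above this value
z = NormalDist().inv_cdf(0.975)   # approximately 1.96

heights = [162, 168, 171, 165, 174, 169, 166, 172]   # hypothetical, in cm
m, sd = mean(heights), stdev(heights)

lower = m - z * sd
upper = m + z * sd
print(round(lower, 1), round(upper, 1))
```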
What formal statistical test measures how close the data is to a normal distribution?
Shapiro-Wilk test
Give examples of ways of transforming data and in what circumstance you would use each one?
- Logarithmic: Variances are proportional to the mean, fairly skewed data
- Square root: Fairly skewed, counts
- Reciprocal: Highly skewed data
- Cube root: Data relating to volumes
- Logit: Proportions
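Four of these transformations written as Python functions, as a minimal sketch. The natural logarithm is assumed for the logarithmic transformation, and the input values must be valid for each function (positive for log and reciprocal, non-negative for square root, strictly between 0 and 1 for logit):

```python
from math import log, sqrt

def log_transform(x):
    return log(x)              # x must be > 0

def sqrt_transform(x):
    return sqrt(x)             # x must be >= 0, e.g. counts

def reciprocal_transform(x):
    return 1 / x               # x must not be 0

def logit_transform(p):
    return log(p / (1 - p))    # p is a proportion, 0 < p < 1

skewed = [1, 2, 4, 8, 64]      # hypothetical right-skewed data
logged = [log_transform(x) for x in skewed]
```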
How do you make standard deviation of the sample unbiased compared to the population standard deviation?
Calculate it with denominator n - 1, not n
What is a confidence interval?
The range we would expect, given a certain level of confidence, to include the population parameter
Which is wider: a 95% or a 99% confidence interval?
99%
What is standard error?
The standard deviation of the sample mean, i.e. how much the sample mean is expected to vary from sample to sample
How do you calculate standard error?
Standard deviation / square root of the sample size (n)
If you assume that the sample mean is approximately normally distributed, where would you expect 95% of sample means to lie?
Within 1.96 standard errors of the population mean. This gives the 95% confidence interval: sample mean +/- 1.96 x standard error of the sample mean
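A Python sketch of the standard error and a 95% confidence interval for the mean, assuming the normal approximation so that the multiplier is 1.96. The sample values are made up:

```python
from math import sqrt
from statistics import mean, stdev

sample = [4.8, 5.1, 5.4, 4.9, 5.3, 5.0, 5.2, 5.1]   # hypothetical sample
n = len(sample)

sample_mean = mean(sample)
standard_error = stdev(sample) / sqrt(n)   # SD / square root of n

ci_lower = sample_mean - 1.96 * standard_error
ci_upper = sample_mean + 1.96 * standard_error
print(round(ci_lower, 3), round(ci_upper, 3))
```

With a sample this small and an unknown population standard deviation, a multiplier from the t distribution would normally replace 1.96, giving a slightly wider interval.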
When do you use the t distribution?
When estimating the mean in normally distributed populations when the sample size is small and the population standard deviation is unknown.
Why do we do a hypothesis test?
To assess the validity of a claim about a population parameter.
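In the t distribution, at what value of the test statistic do you reject the null hypothesis?
When it is more extreme (+ve or -ve) than the critical value for the relevant degrees of freedom. At the 5% level this approaches 1.96 for large samples and is larger for small samples.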
What is a type 1 error?
Rejecting a true null hypothesis
What is a type 2 error?
Failing to reject (accepting) a false null hypothesis
What does the level of significance mean in a study in relation to errors?
The level of significance = the probability of making a Type 1 error. Usually this is set at 5% (95% confidence level)
What is the general accepted risk of making a type 2 error?
0.2 (20%)
How do you reduce the risk of a type 2 error?
Increasing sample size
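To illustrate, the sketch below approximates the probability of a type 2 error for a two-sided one-sample z-test at different sample sizes. The choice of test, the function name `type2_error`, and the effect size and standard deviation are all assumptions for illustration; the small contribution from the opposite tail is ignored:

```python
from math import sqrt
from statistics import NormalDist

def type2_error(effect, sd, n, alpha=0.05):
    """Approximate type 2 error for a two-sided one-sample z-test."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)   # about 1.96 when alpha = 0.05
    # Power: probability that the test statistic exceeds the critical value
    power = std_normal.cdf(effect * sqrt(n) / sd - z_crit)
    return 1 - power

# Hypothetical true difference of 5 units with a standard deviation of 20
for n in (25, 50, 100):
    print(n, round(type2_error(effect=5, sd=20, n=n), 3))
```

The printed probability falls as n rises, because a larger sample gives a smaller standard error.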