Interpreting Data Flashcards
What are the two types of data
- Qualitative
- Quantitative
Describe what qualitative data splits into
Qualitative data splits into nominal (unordered) and ordinal (ordered, e.g. short, medium, tall)
- Nominal data is then split into binary (yes/no questions) and categorical (e.g. different colours)
What is the name of the data that is unordered in qualitative data
nominal
What is the name of the data that is ordered in qualitative data
ordinal
What is quantitative data split into
- Discrete (e.g. 10 graduates: whole numbers only)
- Continuous (e.g. length in cm: doesn’t have to be a whole number)
What are two ways in which you can summarise data
- Measure of location
- Measure of spread
What makes up the measure of location
- Median = Middle value when the values are ordered from smallest to largest
- Mode = the most common value
- Mean = average = sum of all of the values divided by the number of values
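A minimal Python sketch of the three measures of location, using a small made-up data set for illustration:

```python
import statistics

values = [2, 3, 3, 5, 7, 10]   # hypothetical data

mean = sum(values) / len(values)     # sum of all values / number of values
median = statistics.median(values)   # middle value once sorted (even n: average of the two middle values)
mode = statistics.mode(values)       # most common value

print(mean, median, mode)   # 5.0 4.0 3
```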
What makes up the measure of spread
- standard deviation
- interquartile range
When is it better to use the median over the mean
- Use the median to avoid the influence of outliers (very large or very small values, which may be errors in the data)
- Also use it when the data is skewed
When is it better to use the interquartile range over standard deviation
- Use the interquartile range in order to avoid the influence of outliers
- Also use it when the data is skewed
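To see why the median and IQR are preferred when there are outliers, this sketch (hypothetical data, Python 3.8+ for `statistics.quantiles`) adds one extreme value and compares the summaries:

```python
import statistics

def summarise(data):
    q1, _, q3 = statistics.quantiles(data, n=4)   # lower and upper quartiles
    return {
        "mean": round(statistics.mean(data), 2),
        "median": statistics.median(data),
        "sd": round(statistics.stdev(data), 2),   # sample SD (divides by n - 1)
        "iqr": q3 - q1,
    }

typical = [4, 5, 5, 6, 6, 7]     # hypothetical measurements
with_outlier = typical + [60]    # one very large, possibly incorrect, value

print(summarise(typical))
print(summarise(with_outlier))
```

The mean jumps from 5.5 to about 13.3 and the SD from about 1 to about 21, while the median only moves from 5.5 to 6 and the IQR from 1.5 to 2.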
How do you work out the interquartile range
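The IQR is the range between the 25th percentile (lower quartile) and the 75th percentile (upper quartile)
e.g. for 1, 2, 3, 4, 5, 6, 7 the lower quartile is 2 and the upper quartile is 6, so the interquartile range (IQR) is 2 to 6 (a width of 4)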
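The same quartiles can be checked with Python's `statistics.quantiles` (Python 3.8+). Its default `'exclusive'` method happens to match the hand calculation for this data; other quartile methods (e.g. `method='inclusive'`) can give slightly different values for small samples.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7]
q1, q2, q3 = statistics.quantiles(values, n=4)   # 25th, 50th and 75th percentiles
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {q3 - q1}")  # Q1 = 2.0, Q3 = 6.0, IQR = 4.0
```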
How do you work out the standard deviation
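- Work out the mean
- From each value subtract the mean and square the result
- Add up all of the squared differences
- Divide by N (the number of participants); when estimating the SD from a sample, n − 1 is often used instead
- Take the square root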
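The steps above as a short Python sketch with hypothetical data. `population_sd` follows the card (dividing by N); the standard library's `statistics.pstdev` does the same, while `statistics.stdev` divides by n − 1 for a sample estimate.

```python
import math
import statistics

def population_sd(values):
    mean = sum(values) / len(values)                    # 1. work out the mean
    squared_diffs = [(x - mean) ** 2 for x in values]   # 2. subtract the mean and square
    variance = sum(squared_diffs) / len(values)         # 3. add up, then divide by N
    return math.sqrt(variance)                          # 4. square root

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data
print(population_sd(data))        # 2.0
print(statistics.pstdev(data))    # 2.0 (also divides by N)
print(statistics.stdev(data))     # slightly larger: divides by n - 1
```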
Why are AFP (alpha-fetoprotein) levels important
If you aren’t pregnant, an AFP test can help to diagnose and monitor certain liver conditions, such as liver cancer, cirrhosis, and hepatitis.
What is an antenatal thyroid screening test
- A test of the mother's thyroid function during pregnancy, so that thyroid problems can be picked up and treated early, reducing the risk of harm to the baby
What is another name for the Gaussian distribution
normal distribution
What two things is the normal distribution determined by
- Normal distribution is determined only by the mean and standard deviation
What happens to the normal distribution curve if you change the mean
- The curve shifts along the x-axis but keeps the same shape and height: if the mean decreases it moves to the left, and if it increases it moves to the right
What happens to the normal distribution curve if you change the standard deviation
- The height and width of the curve change, but the area under the curve stays the same (always 1)
- As the standard deviation increases the curve becomes lower and wider (flatter); as it decreases the curve becomes taller and narrower
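These effects can be checked with Python's `statistics.NormalDist` (Python 3.8+). The means and SDs below are arbitrary illustrative values; the peak height of a normal curve is 1 / (SD × √(2π)), so doubling the SD halves the height.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)
shifted = NormalDist(mu=5, sigma=1)   # larger mean: same curve moved to the right
wider = NormalDist(mu=0, sigma=2)     # larger SD: lower, flatter curve

# Height of each curve at its mean (the peak)
print(standard.pdf(0), shifted.pdf(5))   # identical heights
print(wider.pdf(0))                      # half the height of the standard curve

# Total area is still 1: essentially all of it lies within 10 SDs of the mean
print(wider.cdf(20) - wider.cdf(-20))
```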
What are the characteristics of the Gaussian distribution
• A constant proportion of values will lie within any specified number of Standard Deviations above or below the mean
How many standard deviations either side of the mean correspond to the
- 99% range
- 95% range
- 90% range
99% range (0.5th to 99.5th centile) = mean ± 2.58 SDs
95% range (2.5th to 97.5th centile) = mean ± 1.96 SDs
90% range (5th to 95th centile) = mean ± 1.64 SDs
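The multipliers above come from the standard normal distribution. A small Python sketch (Python 3.8+) recovers them with `NormalDist().inv_cdf`, which returns the value below which a given proportion of the distribution lies:

```python
from statistics import NormalDist

z = NormalDist()   # standard normal: mean 0, SD 1
for coverage in (0.99, 0.95, 0.90):
    tail = (1 - coverage) / 2              # proportion left in each tail
    multiplier = z.inv_cdf(1 - tail)       # e.g. the 97.5th centile for a 95% range
    print(f"{coverage:.0%} range: mean ± {multiplier:.2f} SDs")
# 99% range: mean ± 2.58 SDs
# 95% range: mean ± 1.96 SDs
# 90% range: mean ± 1.64 SDs
```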
How do you calculate the 95% range (2.5th to 97.5th centile)
Mean ± 1.96 × standard deviation
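A minimal sketch of the 95% range for individual values, assuming the data are Gaussian; the mean of 100 and SD of 15 are hypothetical:

```python
def range_95(mean, sd):
    """95% range for individual values, assuming a Gaussian distribution."""
    half_width = 1.96 * sd
    return mean - half_width, mean + half_width

low, high = range_95(mean=100, sd=15)          # hypothetical values
print(f"95% range: {low:.1f} to {high:.1f}")   # 95% range: 70.6 to 129.4
```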
What are statistics used for
- Statistics use data from our sample to tell us something about the population
What does the population contain
The population contains the true mean (the value we are trying to estimate from our sample)
What happens if the sample size is large enough
If the sample size isn’t too small, the distribution of the sample mean will be approximately Gaussian, even if the individual values are not (the Central Limit Theorem)
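A small simulation sketch of this idea. The choices here are assumptions for illustration: individual values are drawn from a skewed exponential distribution (mean 1, SD 1), each sample has 50 values, and 2,000 samples are taken. The sample means cluster around 1 with an SD close to 1/√50, and about 95% of them fall within 1.96 SDs of the centre, as expected for a Gaussian.

```python
import random
import statistics

random.seed(1)   # fixed seed so the run is repeatable

def sample_mean(n):
    # Individual values come from a skewed (exponential) distribution
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

means = [sample_mean(50) for _ in range(2000)]

centre = statistics.mean(means)   # close to the population mean of 1
se = statistics.stdev(means)      # close to 1 / sqrt(50), about 0.14
within = sum(abs(m - centre) <= 1.96 * se for m in means) / len(means)
print(centre, se, within)         # within is roughly 0.95
```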
What is the standard deviation of the sample mean called
The standard error of the mean (SEM)
What is standard error
The standard error is a measure of the statistical accuracy of an estimate
What is the standard error of the mean
- The standard error of the mean is the standard deviation of the distribution of all possible sample means
How do you work out the standard error of the mean
SE = standard deviation / √(sample size)
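A one-line Python version of the formula, with a hypothetical SD of 12. Because the sample size is under a square root, making the sample four times larger halves the standard error.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

print(standard_error(sd=12, n=36))    # 2.0
print(standard_error(sd=12, n=144))   # 1.0 (four times the sample size, half the SE)
```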
How do you work out a confidence interval
95% confidence interval = sample mean +- 1.96 x standard error
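A sketch of the 95% confidence interval from summary statistics (mean 78 kg, SD 12 kg, n = 64 are hypothetical). The 1.96 multiplier assumes a reasonably large sample; for small samples a multiplier from the t-distribution is usually used instead.

```python
import math

def ci_95(sample_mean, sd, n):
    """95% confidence interval for a mean: sample mean ± 1.96 × SE."""
    se = sd / math.sqrt(n)   # standard error of the mean
    return sample_mean - 1.96 * se, sample_mean + 1.96 * se

low, high = ci_95(sample_mean=78, sd=12, n=64)   # hypothetical summary statistics
print(f"95% CI: {low:.2f} to {high:.2f} kg")     # 95% CI: 75.06 to 80.94 kg
```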
Define the confidence interval
a range of values so defined that there is a specified probability that the value of a parameter lies within it.
How would you write about a confidence interval in an example
IN THE POPULATION we are 95% confident that the mean weight could be as low as 75 kg or as high as 81 kg
When do we use standard deviation
- use standard deviation for ranges (for individual values)
When do we use standard errors
- use standard error for confidence intervals (for means)
What happens as the sample size increases
- As the sample size increases the 95% confidence interval gets narrower, because the standard error gets smaller
- The estimate becomes more precise, so we can be more confident in it
Describe the different types of correlation and their numbers
- R = 0 - no correlation
- R = 1 – perfect positive correlation
- R = -1 – perfect negative correlation
Define the correlation coefficient
a number between +1 and −1 calculated so as to represent the linear interdependence of two variables or sets of data.
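A minimal sketch of Pearson's correlation coefficient (the usual linear correlation coefficient) on hypothetical data. The last example has an obvious U-shaped pattern but r = 0, because r only measures linear association.

```python
import math

def pearson_r(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # how x and y vary together
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))          # 1.0: perfect positive
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))          # -1.0: perfect negative
print(pearson_r([1, 2, 3, 4, 5], [2, 1, 0, 1, 2]))    # about 0: no linear correlation
```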
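What is the linear regression equation
y = a + bx, where:
- y = outcome (dependent variable)
- x = predictor (independent variable)
- a = intercept (the point at which the line crosses the y-axis, i.e. the value of y when x = 0)
- b = slope (how much y changes for a one-unit increase in x)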
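The card does not say how a and b are found; the standard approach is least squares, sketched here on hypothetical data that lie exactly on y = 1 + 2x:

```python
def fit_line(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) in y = a + bx."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx   # the fitted line passes through (mean x, mean y)
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])   # hypothetical data
print(f"y = {a} + {b}x")                      # y = 1.0 + 2.0x
```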
What is the dependent variable
a variable (often denoted by y ) whose value depends on that of another
What is the independent variable
a variable (often denoted by x ) whose variation does not depend on that of another
Why do you want to know if the result is statistically significant
- An observed sample difference between groups might be due to chance
- We want to know whether a result is statistically significant i.e. unlikely to be due to chance
How do you determine if the result is statistically significant
• To determine whether an observed difference was due to chance we look at confidence intervals and p-values
How do you work out the confidence interval for the difference between two groups
95% CI = mean difference ± 1.96 × SE of mean difference
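A sketch of the confidence interval for a difference in means, assuming two independent groups and large samples, so that the SE of the difference is √(SE₁² + SE₂²). All group values are hypothetical.

```python
import math

def diff_ci_95(mean1, sd1, n1, mean2, sd2, n2):
    se1 = sd1 / math.sqrt(n1)
    se2 = sd2 / math.sqrt(n2)
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)   # SE of the difference (independent groups)
    diff = mean1 - mean2
    return diff, diff - 1.96 * se_diff, diff + 1.96 * se_diff

diff, low, high = diff_ci_95(80, 10, 100, 76, 10, 100)   # hypothetical groups
print(f"difference = {diff}, 95% CI {low:.2f} to {high:.2f}")
```

Here the interval runs from about 1.23 to 6.77, so it does not include 0.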
What is a P value
A p-value is the probability of observing a result as extreme as, or more extreme than, the sample result if the null hypothesis (e.g. no difference in the population) is true
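A sketch of a two-sided p-value for a z-test, assuming the test statistic (e.g. mean difference / SE of the difference) follows a standard normal distribution when the null hypothesis is true:

```python
from statistics import NormalDist

def two_sided_p(z):
    """P(result at least as extreme as z, in either direction) under the null hypothesis."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(two_sided_p(1.96), 3))   # 0.05
print(two_sided_p(4 / 2 ** 0.5))     # hypothetical difference of 4 with SE of sqrt(2): well below 0.05
```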
When is a confidence interval for a difference statistically significant
- When it doesn’t include 0: there is evidence of a difference in the population
- If the confidence interval includes 0 then there might not be a difference (not statistically significant)
When is a P value statistically significant
When p < 0.05 (the conventional 5% significance level)
When can P values be calculated
When there is a comparison
- 2 means – are they different i.e. is their difference different from 0?
- Association – are the observed results different from those expected
- Regression – is the slope different from 0?
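Where does the P value come from
From a statistical test suited to the data, e.g. a chi-squared test for an association between categorical variables. If a chi-squared test gives P = 0.002, we can be confident there is an association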
What is the chi-squared test used for
Testing for an association between categorical variables
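A sketch of the chi-squared test for a 2×2 table of hypothetical counts. It assumes 1 degree of freedom and no continuity correction (some software, e.g. SciPy's `chi2_contingency`, applies a Yates correction to 2×2 tables by default, giving a slightly different answer). With 1 degree of freedom the chi-squared statistic is the square of a standard normal variable, which is how the p-value is obtained here.

```python
import math
from statistics import NormalDist

def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value for a 2x2 table (1 df, no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total   # count expected if no association
            chi2 += (observed - expected) ** 2 / expected
    p = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))   # P(chi-squared with 1 df >= chi2)
    return chi2, p

chi2, p = chi_squared_2x2([[10, 20], [20, 10]])   # hypothetical counts
print(round(chi2, 2), round(p, 4))
```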
What is a t-test used for
Comparing the means of a continuous variable between two groups
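A sketch of the two-sample t statistic, assuming equal variances in both groups (pooled SD) and hypothetical data. Python's standard library has no t-distribution, so the p-value is not computed here; it would come from t tables with the returned degrees of freedom, or from a library function such as SciPy's `scipy.stats.ttest_ind`.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Two-sample t statistic assuming equal variances; returns (t, degrees of freedom)."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)   # sample variances
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))                # SE of the mean difference
    t = (statistics.mean(group1) - statistics.mean(group2)) / se_diff
    return t, n1 + n2 - 2

t, df = pooled_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])   # hypothetical groups
print(t, df)   # -2.0 8
```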