Biostatistics Flashcards
What do we use representative samples for?
To make inferences about the population of interest, because investigating the entire population is too costly and difficult.
How do you calculate a percentage?
100 × number with characteristic / total number
How do you calculate a proportion?
number with characteristic / total number
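A minimal sketch of the percentage and proportion formulas in Python. The counts used here (12 people with the characteristic out of 48) are made-up illustration values, not data from the course.

```python
# Hypothetical counts, chosen only for illustration.
number_with_characteristic = 12
total_number = 48

# Proportion: a value between 0 and 1.
proportion = number_with_characteristic / total_number

# Percentage: the same quantity on a 0-100 scale.
percentage = 100 * proportion

print(proportion, percentage)
```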
How do you find the mean?
Sum of all values / total number of observations
What is the mean?
Average value
Define categorical variables
Variables whose values fall into distinct categories. Each observation belongs to one category or another, with no in-between values.
e.g. eye colour
Define continuous variables
Variables that are measured and can take any value within a range.
e.g. height
Which graph best shows a continuous variable?
Histogram
X-axis: value
Y-axis: percentage or number of people in that range.
What is the standard deviation?
The measure of the spread of data.
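The mean and standard deviation can be computed with Python's standard library, as in the sketch below. The heights are invented for illustration, and `statistics.stdev` is the sample standard deviation (it divides by n − 1), which is an assumption about which version of the SD is intended.

```python
import statistics

# Hypothetical heights in cm, invented for illustration.
heights = [160, 165, 170, 175, 180]

# Mean: sum of all values / total number of observations.
mean_height = sum(heights) / len(heights)

# Sample standard deviation (n - 1 in the denominator).
sd_height = statistics.stdev(heights)

print(mean_height, sd_height)
```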
What are the 2 types of error?
> Uncertainty (random error): makes our answers uncertain; unavoidable
> Bias: moves our answers away from the truth; avoidable
What is an example of uncertainty?
Variability, which is caused by taking a sample and by measuring things imperfectly.
How is bias avoided?
Taking a random sample from the whole population
What are terminologies?
Terms that describe the centre and spread of a distribution.
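What is the relationship between the SD of the sampling distribution (the standard error) and sample size?
The larger the sample size, the smaller the standard error, so sample means vary less from sample to sample. The standard deviation of the data themselves does not shrink systematically as the sample grows.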
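A small simulation can illustrate how the spread of sample means shrinks as the sample size grows. This is only a sketch: the population (normal, mean 170, SD 10), the sample sizes, and the helper name `spread_of_sample_means` are all assumptions chosen for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable


def spread_of_sample_means(sample_size, repeats=1000):
    """Return the standard deviation of the means of many random samples."""
    means = []
    for _ in range(repeats):
        # Hypothetical population: normal with mean 170 and SD 10.
        sample = [random.gauss(170, 10) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
    return statistics.stdev(means)


spread_small = spread_of_sample_means(10)    # samples of 10
spread_large = spread_of_sample_means(100)   # samples of 100

print(spread_small, spread_large)
```

The second number should come out clearly smaller than the first: larger samples give sample means that cluster more tightly around the population mean.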
What distribution does sampling distribution follow?
Normal: a symmetric, bell-shaped curve
What percentage of sample means lie within ±1.96 standard errors of the population mean?
95%
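The 95% figure can be checked against the standard normal distribution. The sketch below uses `NormalDist` from Python's `statistics` module to find the share of a normal curve lying within 1.96 standard deviations of its centre.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve between -1.96 and +1.96.
share_within = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)

print(round(share_within, 4))
```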
What is the equation for the regression line?
y = a + b × x
y = what we are trying to determine
a = intercept
b = slope
x = the variable used to predict y
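The card gives the form of the line but not how a and b are found. The sketch below assumes ordinary least squares, the usual method, and uses invented data that lie exactly on a line so the result is easy to check by hand.

```python
# Hypothetical paired observations, invented for illustration.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# Slope b: how x and y vary together, divided by how much x varies.
numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
denominator = sum((x - mean_x) ** 2 for x in xs)
b = numerator / denominator

# Intercept a: makes the line pass through (mean_x, mean_y).
a = mean_y - b * mean_x

# Use the fitted line y = a + b * x to predict y at a new x.
predicted = a + b * 6

print(a, b, predicted)
```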
What sample size is usually large enough for the sampling distribution to be treated as normal, so that the standard error can be used?
About 30 or more (a rule of thumb)
How is the standard error calculated?
SE=s/√n
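A minimal sketch of the standard error formula, where s is the sample standard deviation and n the sample size. The measurements are invented for illustration.

```python
import math
import statistics

# Hypothetical measurements, invented for illustration.
sample = [4.1, 5.0, 5.6, 6.2, 4.8, 5.3, 5.9, 4.7]

n = len(sample)                 # sample size
s = statistics.stdev(sample)    # sample standard deviation
se = s / math.sqrt(n)           # standard error of the mean

print(s, se)
```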
How is a 95% confidence interval interpreted?
If sampling were repeated many times, about 95% of the resulting intervals would contain the true population mean.
What is the relationship between sample size and confidence intervals?
As the sample size increases, the confidence interval becomes narrower.
The equation for confidence intervals
CI=Estimate±1.96×SE
The confidence interval equation for a mean
x̄±1.96×s/√n
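A worked sketch of the confidence interval for a mean. The summary values (sample mean 170, SD 12, n = 36) and the function name `mean_confidence_interval` are hypothetical, chosen so the arithmetic is easy to follow.

```python
import math


def mean_confidence_interval(sample_mean, sample_sd, n, z=1.96):
    """Return (lower, upper) limits of the interval mean ± z × s/√n."""
    se = sample_sd / math.sqrt(n)   # standard error of the mean
    margin = z * se
    return sample_mean - margin, sample_mean + margin


# Hypothetical summary statistics: SE = 12 / √36 = 2, margin = 1.96 × 2.
lower, upper = mean_confidence_interval(170, 12, 36)

print(lower, upper)
```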
What is the statement used for confidence intervals?
“We are 95% confident that the true population mean lies between the lower and upper confidence limits”
OR
“Our data support a true population mean that is between the lower and upper confidence limits”
CI formula for proportions
Proportion±1.96×SE
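The card does not say how the SE of a proportion is calculated. The sketch below assumes the standard large-sample formula SE = √(p(1 − p)/n), and the counts (20 out of 100) are invented for illustration.

```python
import math

# Hypothetical counts, invented for illustration.
number_with_characteristic = 20
total_number = 100

p = number_with_characteristic / total_number

# Assumed large-sample standard error of a proportion.
se = math.sqrt(p * (1 - p) / total_number)

lower = p - 1.96 * se
upper = p + 1.96 * se

print(lower, upper)
```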
What does the 25th percentile represent in box plots?
25% of sample data is below and 75% above this point
What does the median represent in box plots?
50% of sample data is below and 50% above this point
What does the 75th percentile represent in box plots?
75% of sample data is below and 25% above this point
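The three box plot landmarks can be computed with `statistics.quantiles`. This is a sketch on invented data. Different software uses slightly different rules for interpolating percentiles, so values from other packages may differ a little.

```python
import statistics

# Hypothetical data, invented for illustration.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# n=4 splits the data into quarters and returns the three cut points:
# the 25th percentile, the median, and the 75th percentile.
q1, q2, q3 = statistics.quantiles(values, n=4)

print(q1, q2, q3)
```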