Lecture 12, 13 and 14 - Biostatistics Flashcards
Define sample
A sample is a group taken from the overall population, which we use to make estimates and generalisations about the population
The sample has to be representative of the population
If the sampling method gives us an unrepresentative sample, the results will not reflect the true population value.
Results from a sample are reported with a margin of error and a confidence interval
Define population
The entire group of people or things that we want information about. Results based on the whole population are a true representation, not an estimate.
General overview of the process of sampling
Use a representative sample of the population to make conclusions about the population.
Uses a smaller sample group to represent the population
Involves summarising data using tables and graphs as well as making inferences
Census
Collecting data from every member of the population rather than from a sample. Testing the whole population is time-consuming and expensive.
Statistics deals with
uncertainty
Why do we take a sample from a population?
Because collecting data from the whole population is difficult and very costly
Proportion
Proportion = number with characteristic / total number
Percent
Percent = 100% x number with characteristic / total number
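A minimal sketch of the two formulas above. The function names and the left-handedness numbers are made up for illustration only.

```python
def proportion(count_with_characteristic, total):
    """Fraction of the group that has the characteristic (between 0 and 1)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return count_with_characteristic / total


def percent(count_with_characteristic, total):
    """Same quantity as the proportion, expressed out of 100."""
    return 100 * proportion(count_with_characteristic, total)


# Hypothetical sample: 12 of 48 people are left-handed.
print(proportion(12, 48))  # proportion
print(percent(12, 48))     # percent
```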
True proportion/ true population value
The true population value is the value we would get if we could measure the entire population
Increasing sample size…
Gives us more certainty in our results: estimates from larger samples tend to be closer to the true population value
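One way to see this is a small simulation. This is only a sketch: the true proportion (0.3), the sample sizes and the number of repeats are arbitrary assumptions, not values from the lectures. It repeatedly draws samples and measures how much the sample proportions vary.

```python
import random
import statistics


def spread_of_sample_proportions(true_p, n, repeats, rng):
    """Draw `repeats` samples of size n; return the SD of the sample proportions."""
    estimates = []
    for _ in range(repeats):
        # Each person has the characteristic with probability true_p.
        hits = sum(1 for _ in range(n) if rng.random() < true_p)
        estimates.append(hits / n)
    return statistics.pstdev(estimates)


rng = random.Random(0)  # fixed seed so the run is repeatable
small = spread_of_sample_proportions(0.3, 20, 200, rng)
large = spread_of_sample_proportions(0.3, 1000, 200, rng)
print("spread with n=20:  ", small)
print("spread with n=1000:", large)
```

The estimates from the larger samples should vary much less from one sample to the next.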
Categorical variables
A categorical variable has values that you can put into a countable number of distinct groups based on a characteristic.
For example - eye colour, stages of cancer, the colour of M&Ms
What can we summarise categorical variables as?
Summarise these types of variables by the number in each category and the percent (or proportion)
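A short sketch of that summary, using a made-up list of eye colours as the data.

```python
from collections import Counter

# Hypothetical sample of eight people.
eye_colours = ["brown", "blue", "brown", "green", "brown", "blue", "brown", "hazel"]

counts = Counter(eye_colours)
total = len(eye_colours)

# For each category: (number in category, percent of the sample).
summary = {category: (n, 100 * n / total) for category, n in counts.items()}

for category, (n, pct) in sorted(summary.items()):
    print(f"{category}: n={n}, {pct:.1f}%")
```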
Continuous variables
A continuous variable can take any value within a range.
For example - height, weight, age and blood pressure
Mean
A measure of central tendency. To find the mean, add up the values in the data set and then divide by the number of values that you added.
Mean = sum of all observations / total number of observations
Central value of a discrete set of numbers
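A quick worked example of the mean formula. The heights are invented, and `statistics.mean` from the Python standard library is shown only as a cross-check.

```python
import statistics

# Hypothetical heights in cm.
heights_cm = [160, 172, 168, 181, 169]

# Sum of all observations divided by the number of observations.
manual_mean = sum(heights_cm) / len(heights_cm)

print(manual_mean)
print(statistics.mean(heights_cm))  # same value from the standard library
```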
What does sampling look like for a continuous variable?
A histogram is often used as it shows the distribution/spread of data.
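A rough text-only sketch of what a histogram does: group the values into equal-width bins and count how many fall in each. The heights, bin start and bin width are assumptions chosen for the example.

```python
def histogram_counts(values, bin_start, bin_width, n_bins):
    """Count how many values fall into each equal-width bin."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - bin_start) // bin_width)
        if 0 <= index < n_bins:  # values outside the bins are ignored
            counts[index] += 1
    return counts


# Hypothetical heights in cm.
heights_cm = [152.5, 158.0, 161.2, 163.8, 165.0, 166.4,
              168.9, 170.1, 171.5, 174.3, 179.0, 186.2]

counts = histogram_counts(heights_cm, bin_start=150, bin_width=10, n_bins=4)

for i, c in enumerate(counts):
    low = 150 + i * 10
    print(f"{low}-{low + 10} cm: {'#' * c}")
```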
How to present categorical…
If categorical, we can present proportions or percentages
How to present continuous …
If continuous, we usually want to know where the centre is (central tendency) and how spread out the data is. Often you use the mean (central tendency) and standard deviation (spread or variability) for this.
Standard deviation
Standard deviation is a number that tells us how spread out the measurements for a group are from the average (mean), or expected value. A low standard deviation means that most of the numbers are close to the average. A high standard deviation means that the numbers are more spread out.
The spread of a distribution is determined by the standard deviation
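A small illustration with two made-up data sets that share the same mean but differ in spread. It uses the sample standard deviation (`statistics.stdev`, dividing by n - 1), which is an assumption here since the card does not say which version is meant.

```python
import statistics

# Two hypothetical data sets, both with mean 10.
tight = [9, 10, 10, 10, 11]   # values close to the mean
wide = [2, 6, 10, 14, 18]     # values far from the mean

sd_tight = statistics.stdev(tight)  # sample SD (n - 1 in the denominator)
sd_wide = statistics.stdev(wide)

print("tight:", statistics.mean(tight), sd_tight)
print("wide: ", statistics.mean(wide), sd_wide)
```

Same centre, but the second data set has a much larger standard deviation.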
The main purpose of collecting a sample is …
to make an inference
Parameters
Measures that describe a population, such as the mean, median and IQR
What is bias and how do you avoid it?
Sampling bias is where there is a preference towards one group over others being selected for the sample
An unbiased sample is taken at random, with no preference for certain groups in the population, so everyone has a fair chance of being chosen for the study
A biased sample is not representative: it has too many people from a particular group within the population, or a group is completely excluded
To avoid bias, the sampling method must give everyone a chance of being included in the sample, so that it is fair and representative of the population
Simple random sampling
The basic sampling technique where we select a group of subjects (a sample) for study from a larger group (a population). Each individual is chosen entirely by chance and each member of the population has an equal chance of being included in the sample.
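A minimal sketch of simple random sampling, assuming the population is just a list of ID numbers 1 to 100 (hypothetical). `random.sample` picks without replacement, so every member has an equal chance and nobody is chosen twice.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Choose n members entirely by chance, without replacement."""
    rng = random.Random(seed)  # seed only makes the example repeatable
    return rng.sample(population, n)


population = list(range(1, 101))  # hypothetical IDs 1..100
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```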
Systematic sampling
Systematic sampling is a statistical method in which elements are selected at a regular interval from an ordered sampling frame, for example every kth element after a random starting point. The sampling frame is a list of all those within a population who can be sampled, and may include individuals, households or institutions.
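A sketch of one common way to do systematic sampling: pick a random starting point, then take every kth element of the ordered frame. The frame of IDs 1 to 100 and the function name are assumptions for illustration.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Take n elements at a fixed interval k from an ordered sampling frame."""
    if n <= 0 or n > len(frame):
        raise ValueError("n must be between 1 and the size of the frame")
    k = len(frame) // n                        # sampling interval
    start = random.Random(seed).randrange(k)   # random start within the first interval
    return [frame[start + i * k] for i in range(n)]


frame = list(range(1, 101))  # hypothetical ordered list of IDs 1..100
chosen = systematic_sample(frame, 10, seed=1)
print(chosen)
```

With 100 IDs and a sample of 10, the interval is 10, so the chosen IDs are evenly spaced through the list.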