Why do we need statistics? Flashcards
What is data?
Observations that have been collected e.g. measurements
What is a population?
The COMPLETE collection of subjects studied
What is a sample?
A SUBCOLLECTION of subjects from the population
What is random sampling?
Sampling so that each subject in the population is EQUALLY likely to be selected
The sample mean may be biased if subjects are not sampled randomly from the population
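A minimal sketch of simple random sampling, assuming a made-up population of 100 subject IDs (the names `population` and `sample` are illustrative only). It draws without replacement so that every subject has the same chance of being selected.

```python
import random

# Hypothetical population: subject IDs 1..100 (illustrative data only).
population = list(range(1, 101))

# Fixed seed so the sketch is repeatable; a real study would not hard-code it.
rng = random.Random(0)

# random.sample draws without replacement, each subject equally likely.
sample = rng.sample(population, k=10)

print(sample)
```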
What is replication?
Repeats of the experiments to identify the degree of variation and confidence in results
What are the two different types of data?
Quantitative data: Numerical data which can be counted (discrete) or measured (continuous) e.g. temperature, height, lengths
Qualitative data: Data which can be separated into different categories and are distinguished by some non-numeric characteristics
What is the difference between nominal data and ordinal data?
Nominal: CANNOT be ordered e.g. yes or no, different colours, male or female
Ordinal: CAN be ordered e.g. short, medium, tall or none, few, many
What is the difference between observational studies and experimental studies?
Observational studies: Monitoring only. There is no attempt to modify the subjects being studied, so the results can tell us about correlations
Experimental studies: Apply one or more treatments and observe their effect
In general, it is easier to infer causation from an experimental study, whereas observational studies tend to reveal only correlations
What is a parameter?
Measurement describing some characteristic of a POPULATION (what we would ideally like to know)
μ= population mean
What is a statistic?
Measurement describing some characteristic of a SAMPLE, used as an estimate of the parameter (what we find out from the study)
x̅= Sample mean
Population parameter is estimated by the sample statistic
What is a sampling error?
The difference between the statistic and the parameter
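A short sketch of the parameter, statistic and sampling error together, assuming a simulated population of heights (the numbers are made up for illustration). Here the population mean μ is known only because the whole population is simulated; in a real study it would not be.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1000 simulated heights in cm (illustrative only).
population = [rng.gauss(170, 10) for _ in range(1000)]

# Parameter: mean of the COMPLETE population.
mu = statistics.mean(population)

# Statistic: mean of a random sample of 25 subjects.
sample = rng.sample(population, k=25)
x_bar = statistics.mean(sample)

# Sampling error: difference between the statistic and the parameter.
sampling_error = x_bar - mu

print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}, sampling error = {sampling_error:.2f}")
```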
What are histograms used for?
X axis: Classes of data
Y axis: Frequency of the data, or proportion of the data, in each class
Proportion: Number of observations in the class / Total number of observations. The heights of the classes will then sum to 1
Histograms show the variation in the data and can be used to visualise continuous data
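A minimal sketch of how the histogram heights are obtained, assuming made-up measurements and a class width of 1 (both are illustrative choices). It counts the frequency in each class, then divides by the total number of observations to get heights that sum to 1.

```python
import math
from collections import Counter

# Hypothetical measurements (illustrative values only).
data = [1.2, 1.9, 2.4, 2.5, 3.1, 3.3, 3.8, 4.0, 4.6, 5.5]
class_width = 1.0  # assumed class width

# Label each class by its lower edge, e.g. 2.4 falls in the class starting at 2.0.
counts = Counter(math.floor(x / class_width) * class_width for x in data)

# Frequency per class, ordered along the X axis.
frequencies = dict(sorted(counts.items()))

# Height as a share of the data: frequency / total number of observations.
proportions = {edge: n / len(data) for edge, n in frequencies.items()}

for edge, n in frequencies.items():
    print(f"[{edge}, {edge + class_width}): frequency {n}, share {proportions[edge]}")
```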
What are the different ways of measuring the centre of the data?
Mean
Median: Middle value of the data when arranged in order. It is robust to outliers (extreme values that do not fit the trend)
Mode: Value that occurs most frequently
Bimodal: Two different values with the same greatest frequency
Multimodal: If more than two values occur with the same greatest frequency
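A quick sketch of the three measures of centre using Python's standard `statistics` module, on made-up data. Adding one extreme value shows why the median is described as robust: the mean moves a lot, the median barely changes.

```python
import statistics

# Hypothetical data (illustrative values only).
data = [2, 3, 3, 4, 5]
with_outlier = data + [100]  # same data plus one extreme value

mean_plain = statistics.mean(data)              # 17 / 5
median_plain = statistics.median(data)          # middle value of the ordered data
mode_plain = statistics.mode(data)              # most frequent value

mean_outlier = statistics.mean(with_outlier)      # pulled towards the extreme value
median_outlier = statistics.median(with_outlier)  # average of the two middle values

# multimode returns every value tied for the greatest frequency.
bimodal = statistics.multimode([1, 1, 2, 2, 3])

print(mean_plain, median_plain, mode_plain)
print(mean_outlier, median_outlier)
print(bimodal)
```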
What are the different ways of measuring similarity or dissimilarity in the data?
Range: The difference between the smallest and largest value in the data
Sample variance: Measure of the average squared distance of the data points from the sample mean
How do you measure the sample variance?
s² = Σ(xᵢ - x̅)² / (n - 1)
Subtract the sample mean from each data value, square each difference, add the squares together and divide by the number of data values - 1
s² is a STATISTIC; high values = high dissimilarity in the data
σ² is a PARAMETER: if all members of the population are measured then it is the population variance, and you divide by n instead of n - 1
s² can estimate σ²
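A small worked sketch of the range and the two variances, assuming made-up data. It computes the sum of squared deviations by hand, divides by n - 1 for s² and by n for σ², and compares the results with `statistics.variance` and `statistics.pvariance` from the standard library.

```python
import statistics

# Hypothetical data (illustrative values only); the mean is 40 / 8 = 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
mean = sum(data) / n

# Range: largest value minus smallest value.
data_range = max(data) - min(data)

# Sum of squared differences between each value and the mean.
sum_sq = sum((x - mean) ** 2 for x in data)

s2 = sum_sq / (n - 1)   # sample variance: divide by n - 1
sigma2 = sum_sq / n     # population variance: divide by n

# The standard library gives the same values.
s2_lib = statistics.variance(data)
sigma2_lib = statistics.pvariance(data)

print(data_range, s2, sigma2)
```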