Why do we need statistics? Flashcards
What is data?
Observations that have been collected e.g. measurements
What is a population?
The COMPLETE collection of subjects studied
What is a sample?
A SUBCOLLECTION of subjects from the population
What is random sampling?
Sampling so that each subject in the population is EQUALLY likely to be selected
Sample mean will be biased if subjects are not sampled randomly from the population
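A minimal Python sketch of random sampling. The population of subject IDs 1 to 100 and the seed are made up for illustration; `random.sample` draws without replacement, so each subject is equally likely to be selected.

```python
import random

# Hypothetical population: subject IDs 1..100 (invented for this example).
population = list(range(1, 101))

# Fixed seed (an arbitrary choice) so the draw is repeatable.
rng = random.Random(42)

# Draw 10 distinct subjects; every subject has the same chance of selection.
sample = rng.sample(population, k=10)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print("sample:", sample)
print("sample mean:", sample_mean, "population mean:", population_mean)
```

A non-random alternative such as `population[:10]` (only the first ten IDs) would always give a sample mean of 5.5, far below the population mean of 50.5, which is the kind of bias the card describes.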
What is replication?
Repeats of the experiment, used to identify the degree of variation in the results and how much confidence we can have in them
What are the two different types of data?
Quantitative data: Numerical data which can be counted or measured, either discrete or continuous e.g. temperature, height, length
Qualitative data: Data which can be separated into different categories and are distinguished by some non-numeric characteristics
What is the difference between nominal data and ordinal data?
Nominal: CANNOT be ordered e.g. yes or no, different colours, male or female
Ordinal: CAN be ordered e.g. short, medium, tall or none, few, many
What is the difference between observational studies and experimental studies?
Observational studies: The subjects are only monitored; there is no attempt to modify them. This allows us to say something about correlation
Experimental: Apply one or more treatments and observe their effect
In general, it is easier to infer causation from an experimental study, whereas observational studies tend to reveal only correlations
What is a parameter?
Measurement describing some characteristic of a POPULATION (what we would ideally like to know)
μ= population mean
What is a statistic?
Measurement describing some characteristic of a SAMPLE, used as an estimate of the parameter (what we find out from the study)
x̅= Sample mean
Population parameter is estimated by the sample statistic
What is a sampling error?
The difference between the statistic and the parameter
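A small Python sketch of parameter, statistic and sampling error. Both lists are invented numbers, and in practice the population would not be fully measured, so μ (and therefore the sampling error) is usually unknown.

```python
from statistics import mean

# Hypothetical population in which every member has been measured.
population = [4, 8, 6, 5, 3, 7, 9, 6]
# Hypothetical sample: a subcollection of the population.
sample = [4, 6, 7, 9]

mu = mean(population)        # parameter: population mean
x_bar = mean(sample)         # statistic: sample mean, used to estimate mu
sampling_error = x_bar - mu  # difference between the statistic and the parameter

print(mu, x_bar, sampling_error)  # 6 6.5 0.5
```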
What are histograms used for?
X axis: Classes of data
Y axis: Frequency of the data, or proportion of the data, in each class
Proportion: Number of observations in the class / Total number of observations, so the heights of the bars sum to 1
Histograms show the variation in the data and can be used to visualise continuous data
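As a rough illustration, the frequencies and proportions behind a histogram can be computed by hand in Python. The heights and the class width of 10 are assumptions chosen for the example, not values from these notes.

```python
from collections import Counter

# Hypothetical continuous data: heights in cm.
heights_cm = [152, 158, 161, 163, 165, 167, 168, 171, 174, 183]
class_width = 10  # assumed class width

# Label each class by its lower edge, e.g. 150 stands for 150-159.
counts = Counter((h // class_width) * class_width for h in heights_cm)

n = len(heights_cm)
# Proportion = observations in the class / total number of observations.
proportions = {lower: counts[lower] / n for lower in sorted(counts)}

# Text histogram: one '#' per observation, proportion in brackets.
for lower, p in proportions.items():
    print(f"{lower}-{lower + class_width - 1}: {'#' * counts[lower]} ({p:.1f})")
```

The proportions here are 0.2, 0.5, 0.2 and 0.1, which sum to 1 (up to floating-point rounding).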
What are the different ways of measuring the centre of the data?
Mean
Median: Middle value of the data when arranged in order. It is robust to outliers (extreme values that do not fit the rest of the data)
Mode: Value that occurs most frequently
Bimodal: Two different values with the same greatest frequency
Multimodal: More than two values occur with the same greatest frequency
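A short Python sketch comparing the three measures of centre, using the standard `statistics` module (`multimode` needs Python 3.8 or later). The data are invented, with 40 included as a deliberate outlier.

```python
from statistics import mean, median, multimode

# Hypothetical data; 40 is an outlier that does not fit the rest.
data = [2, 3, 3, 4, 5, 5, 6, 40]

centre_mean = mean(data)      # pulled upwards by the outlier
centre_median = median(data)  # middle of the ordered data, robust to the outlier
modes = multimode(data)       # every value tied for the greatest frequency

print(centre_mean, centre_median, modes)  # 8.5 4.5 [3, 5]
```

The mean (8.5) is larger than all but one of the values, while the median (4.5) stays near the bulk of the data. Two values share the greatest frequency, so this data set is bimodal.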
What are the different ways of measuring similarity or dissimilarity in the data?
Range: The difference between the smallest and largest value in the data
Sample variance: Measure of the average squared distance of the data points from the sample mean
How do you measure the sample variance?
s² = Σ(xᵢ - x̅)² / (n - 1)
Subtract the sample mean from each data value, square each difference, add the squares together and divide by the number of data values - 1
s² is a STATISTIC; high values = high dissimilarity in the data
σ² is a PARAMETER: if all members of the population are measured, the result is the population variance, and you divide by n instead of n - 1
s² can estimate σ²
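The calculation can be checked with a short Python sketch. The data are invented; `statistics.variance` divides by n - 1 (sample variance) and `statistics.pvariance` divides by n (population variance).

```python
import math
from statistics import mean, variance, pvariance

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)
x_bar = mean(data)

# Sum of squared differences from the sample mean.
sum_sq = sum((x - x_bar) ** 2 for x in data)

s2_manual = sum_sq / (n - 1)   # sample variance: divide by n - 1
sigma2_manual = sum_sq / n     # population variance: divide by n

# The library functions give the same answers.
assert math.isclose(s2_manual, variance(data))
assert math.isclose(sigma2_manual, pvariance(data))

print(s2_manual, sigma2_manual)
```

Here the squared differences sum to 32, so s² = 32/7 (about 4.57) and, if these eight values were the whole population, σ² = 32/8 = 4.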
What is the sample standard deviation?
Square root of the sample variance
Sample variance has the squared units of the data (e.g. cm²), whereas the sample standard deviation has the same units as the data (e.g. cm)
s = Sample standard deviation, s² = Sample variance
σ = Population standard deviation, σ² = Population variance
Rule of thumb: For approximately normally distributed data, about 95% of the data should lie within 2 standard deviations of the mean
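The rule of thumb can be tried out on simulated data. This sketch assumes normally distributed heights with mean 170 and standard deviation 8 (invented values) and a fixed seed; the rule is only approximate and need not hold for strongly skewed data.

```python
import random
from statistics import mean, stdev

# Simulated data: 1000 draws from an assumed normal distribution.
rng = random.Random(0)  # arbitrary fixed seed for repeatability
data = [rng.gauss(170, 8) for _ in range(1000)]

x_bar = mean(data)
s = stdev(data)  # sample standard deviation (square root of the sample variance)

# Count the observations within 2 standard deviations of the mean.
within = sum(1 for x in data if abs(x - x_bar) <= 2 * s)
fraction = within / len(data)

print(f"mean={x_bar:.1f}, s={s:.1f}, fraction within 2 s = {fraction:.3f}")
```

The printed fraction should be close to 0.95.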
What does statistical inference make use of?
Information from a sample to draw conclusions about the population from which the sample was taken
Problem: Wish to know something about a population of interest, but cannot examine every member of the population
How can you increase the likelihood of the sample mean being a better estimate?
1) Increase sample size
2) The variation in the observed data is low
What test statistic would you use to test a claim that the population mean is μ, given the mean x̅ of a sample?
t statistic
If the data really were generated from a distribution with mean μ, then the test statistic follows the t-distribution, so we expect t to be small (positive or negative)
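These notes do not give the formula, so the sketch below assumes the standard one-sample t statistic, t = (x̅ - μ) / (s / √n), with n - 1 degrees of freedom. The sample values and the claimed mean of 12 are invented for illustration.

```python
import math
from statistics import mean, stdev

# Hypothetical sample and claimed population mean.
sample = [12, 15, 11, 14, 13, 16, 10, 13]
mu_claimed = 12

n = len(sample)
x_bar = mean(sample)  # sample mean
s = stdev(sample)     # sample standard deviation

# Assumed formula: standard one-sample t statistic.
t = (x_bar - mu_claimed) / (s / math.sqrt(n))
degrees_of_freedom = n - 1

print(f"x_bar={x_bar}, s={s}, t={t:.2f}, df={degrees_of_freedom}")
```

Here x̅ = 13 and s = 2, so t = 1 / (2 / √8) ≈ 1.41. Whether that counts as "small" is judged by comparing it with the t-distribution with 7 degrees of freedom.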
What do parametric and non-parametric tests assume?
Parametric: Some aspect of the data has a normal distribution
Non-parametric: Usually do not make such strict assumptions about the shape of the distribution of the data. They are usually less powerful: less likely to detect patterns in the data, and therefore less likely to reject the null hypothesis even if it is false