Descriptive and Inferential Statistics Flashcards
Difference between descriptive and inferential statistics
Descriptive statistics - Methods for organising and summarising a set of data that help to describe the attributes of a group or population.
Inferential statistics - Statistical methods used to draw conclusions from a sample and make inferences to the entire population.
Differentiate between the three types of variables
- Nominal (or categorical)
» with categories that are not ordered
» e.g. gender, race, smoking status, blood group
- Ordinal
» with categories that are ordered
» e.g. cancer stages, pain rating, Likert scale data
- Continuous (or interval)
» with real values that reflect order and relative magnitude
» e.g. age, height, weight
Appropriate way to describe nominal data numerically and graphically
Numerically summarised as frequency (n) and proportion (%).
Graphically presented as a pie chart or bar chart.
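As a rough illustration of the numerical summary on this card, the sketch below tabulates frequency (n) and proportion (%) for a nominal variable. The blood-group values and the `frequency_table` helper are made up for the example.

```python
from collections import Counter

# Hypothetical blood-group observations (illustrative data only)
blood_groups = ["A", "O", "B", "O", "AB", "O", "A", "O", "B", "A"]

def frequency_table(values):
    """Return {category: (n, percent)} for a nominal variable."""
    counts = Counter(values)
    total = len(values)
    return {category: (n, 100 * n / total) for category, n in counts.items()}

table = frequency_table(blood_groups)
for category, (n, pct) in sorted(table.items()):
    print(f"{category}: n = {n} ({pct:.1f}%)")
```

The same table is what a bar chart or pie chart of the variable would display.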
Appropriate way to describe ordinal data numerically and graphically
a) For most ordinal data:
- Numerically summarised as frequency (n) and proportion (%).
- Graphically, can be presented as pie chart, bar chart.
b) For Likert scale data:
- Numerically summarised as frequency (n) and proportion (%).
- Graphically, can be presented as pie chart, bar chart.
OR
- Numerically summarised as median and interquartile range (IQR)
- Graphically presented as box plot (box-and-whiskers plot).
Appropriate way to describe continuous data numerically and graphically
- Numerically, summarised as a measure of central tendency (e.g. mean, median) together with a measure of variability (e.g. standard deviation, IQR).
- Graphically, can be presented as a histogram, box plot.
- Normally distributed continuous data are numerically summarised as mean and SD, while non-normally distributed continuous data are numerically summarised as median and IQR.
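A minimal sketch of both summaries, using Python's standard `statistics` module on made-up ages. The single large value (62) is included on purpose to show why the median and IQR are preferred for skewed data: it pulls the mean upwards but leaves the median unchanged.

```python
import statistics

# Hypothetical ages in years (illustrative data only); 62 is an outlier
ages = [23, 25, 27, 29, 31, 33, 35, 37, 62]

mean = statistics.mean(ages)
sd = statistics.stdev(ages)                  # sample SD (n - 1 denominator)
median = statistics.median(ages)
q1, _, q3 = statistics.quantiles(ages, n=4)  # quartiles (default 'exclusive' method)
iqr = q3 - q1

print(f"mean = {mean:.1f}, SD = {sd:.1f}")
print(f"median = {median}, IQR = {iqr} (Q1 = {q1}, Q3 = {q3})")
```

Different software uses slightly different quartile conventions, so the IQR reported by another package may differ a little from this one.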
Difference between parameter estimation and hypothesis testing
Parameter Estimation
- Seeks to estimate the value of a population parameter from sample data.
- E.g. By how much does this new drug reduce blood pressure?
- Methods: Point estimate, interval estimate
Hypothesis Testing
- Seeks to assess a claim about the population using the limited evidence in a sample drawn from it.
- E.g. Does this new drug reduce blood pressure?
- Methods: Null hypothesis, alternative hypothesis
What are the different components of parameter estimation?
- Sampling distribution of the mean
- Central Limit Theorem
- Point Estimate
- Interval estimate (confidence interval, CI)
Sampling distribution of the mean
- Repeated random samples of size n are taken
- The mean is computed for each sample
- The means of all random samples are used as the data
- The mean of the sample means is equal to the population mean (μ)
- The standard deviation of the sample means is equal to the population standard deviation (σ) divided by the square root of the sample size (σ/√n). This quantity is known as the standard error of the mean (SEM).
» Quantifies the variability of the sample mean values
» Used to estimate the precision or reliability of a sample mean, and is used in the calculation of confidence intervals
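The steps on this card can be sketched as a small simulation. The population values (systolic blood pressure with μ = 120 and σ = 15), the sample size and the number of repetitions are all assumptions chosen for illustration; the point is only that the mean of the sample means lands near μ and their standard deviation lands near σ/√n.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

MU, SIGMA = 120.0, 15.0  # hypothetical population mean and SD (mmHg)
n = 25                   # size of each random sample
n_samples = 5_000        # number of repeated samples

# Take repeated random samples and compute the mean of each one
sample_means = [
    statistics.fmean([random.gauss(MU, SIGMA) for _ in range(n)])
    for _ in range(n_samples)
]

# Treat the sample means themselves as the data
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
sem = SIGMA / n ** 0.5   # theoretical standard error of the mean

print(f"mean of sample means = {mean_of_means:.2f} (population mean = {MU})")
print(f"SD of sample means   = {sd_of_means:.2f} (sigma / sqrt(n) = {sem:.2f})")
```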
Central limit theorem
For sufficiently large sample sizes, the sampling distribution of the mean is approximately normally distributed, even if the underlying distribution of the individual observations is not.
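One way to see this is to draw samples from a clearly non-normal distribution and watch the skewness of the sample means shrink towards 0 (the skewness of a normal distribution) as the sample size grows. The exponential distribution, the sample sizes and the `skewness` helper below are illustrative choices, not part of the original card.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def skewness(values):
    """Population skewness: mean of the cubed standardised values."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

def simulate_sample_means(n, n_samples=4000):
    """Means of repeated samples of size n from a skewed (exponential) distribution."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(n_samples)
    ]

# Skewness of the sampling distribution of the mean for increasing n
skew_by_n = {n: skewness(simulate_sample_means(n)) for n in (1, 5, 50)}

for n, skew in skew_by_n.items():
    print(f"n = {n:>2}: skewness of sample means = {skew:.2f}")
```

With n = 1 the "sample means" are just the raw observations, so the skewness is that of the underlying distribution.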
Point estimate
- Involves the use of sample data to calculate a single number
- E.g. sample mean (x̄) to estimate population mean (μ)
Interval estimate (confidence interval, CI)
- Provides a range of reasonable values that are intended to contain the parameter of interest (e.g. population mean (μ))
- 95% CI: if data collection and analysis were repeated many times, about 95% of the resulting CIs would contain the true value of the parameter.
- Provides information on the precision of the point estimate.
- Width of CI is influenced by 3 factors:
» Confidence level (e.g. 90%, 95%, 99%)
» Sample size (n)
» Standard deviation (σ)
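The effect of the three factors can be checked with a short sketch. It assumes a normal (z) approximation for the CI of a mean, which is reasonable for large samples; a t-based interval would be the more usual choice for small ones. The `mean_ci` helper and the blood pressure figures are hypothetical.

```python
from statistics import NormalDist

def mean_ci(sample_mean, sd, n, level=0.95):
    """Normal-approximation CI for a mean: sample_mean ± z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    margin = z * sd / n ** 0.5
    return sample_mean - margin, sample_mean + margin

def width(ci):
    low, high = ci
    return high - low

# Hypothetical baseline: mean 120 mmHg, SD 15 mmHg, n = 100, 95% confidence
baseline = mean_ci(120, 15, 100)
print("baseline 95% CI:", tuple(round(x, 2) for x in baseline))

# Change one factor at a time and compare the widths
print("99% level :", round(width(mean_ci(120, 15, 100, level=0.99)), 2))  # wider
print("n = 400   :", round(width(mean_ci(120, 15, 400)), 2))              # narrower
print("SD = 30   :", round(width(mean_ci(120, 30, 100)), 2))              # wider
```

A higher confidence level or a larger standard deviation widens the interval, while a larger sample size narrows it.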
The narrower the 95% confidence interval, the ____ precise the point estimate.
more
Principles of hypothesis testing
- Null hypothesis (H0): no difference or no relationship or no effect
- Alternative hypothesis (H1): there is a difference or relationship or effect
- Statistical decisions based on p-value:
» p < 0.05 leads to rejection of the null hypothesis and acceptance of the alternative hypothesis. Result is statistically significant at significance level of 0.05.
» p ≥ 0.05 leads to retention of the null hypothesis. Result is not statistically significant at significance level of 0.05.
- The smaller the p-value, the stronger the evidence against H0.
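The decision rule above can be sketched with a simple two-sided test of H0: mean change = 0. The example uses a large-sample normal (z) approximation with the sample SD, which is an assumption made to keep the code dependency-free; a t-test would normally be used for a sample of this size. The blood pressure changes and the `z_test_p_value` helper are made up for illustration.

```python
from statistics import NormalDist, fmean, stdev

# Hypothetical changes in systolic blood pressure (mmHg) after taking a new drug
changes = [-8, -5, -12, 3, -7, -10, -2, -6, -9, -4,
           1, -11, -3, -7, -5, -8, -6, -2, -9, -4,
           -10, -1, -7, -5, -3, -8, -6, -12, 2, -4]

def z_test_p_value(sample, null_mean=0.0):
    """Two-sided p-value for H0: population mean == null_mean (normal approximation)."""
    n = len(sample)
    standard_error = stdev(sample) / n ** 0.5
    z = (fmean(sample) - null_mean) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

ALPHA = 0.05  # significance level
p_value = z_test_p_value(changes)

if p_value < ALPHA:
    decision = "reject H0 (statistically significant)"
else:
    decision = "retain H0 (not statistically significant)"

print(f"mean change = {fmean(changes):.1f} mmHg, p = {p_value:.3g}: {decision}")
```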
What is:
- Type I error
- Type II error
- Statistical power
- Type I error: an error that occurs when a null hypothesis is rejected even though it is actually true (a false positive).
- Type II error: an error that occurs when one fails to reject a null hypothesis that is actually false (a false negative).
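- Statistical power: the probability of correctly rejecting a null hypothesis that is actually false (1 − β, where β is the probability of a Type II error).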
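The quantities on this card can be estimated by simulation: run the same test many times when H0 is true (every rejection is a Type I error) and many times when H0 is false (every non-rejection is a Type II error, and the rejection rate estimates the power). The effect size, sample size, number of trials and the z-approximation test below are assumptions chosen purely for illustration.

```python
import random
from statistics import NormalDist, fmean, stdev

random.seed(1)  # fixed seed so the sketch is reproducible

def rejects_h0(sample, alpha=0.05):
    """Two-sided test of H0: mean == 0 using a large-sample normal approximation."""
    n = len(sample)
    z = fmean(sample) / (stdev(sample) / n ** 0.5)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value < alpha

def rejection_rate(true_mean, n=50, trials=2000):
    """Fraction of simulated studies in which H0 is rejected."""
    rejections = sum(
        rejects_h0([random.gauss(true_mean, 1.0) for _ in range(n)])
        for _ in range(trials)
    )
    return rejections / trials

type_i_rate = rejection_rate(true_mean=0.0)  # H0 is true: rejections are Type I errors
power = rejection_rate(true_mean=0.5)        # H0 is false: rejections are correct
type_ii_rate = 1 - power                     # H0 is false: failures to reject

print(f"Type I error rate  ~ {type_i_rate:.3f} (close to alpha = 0.05)")
print(f"Power              ~ {power:.3f}")
print(f"Type II error rate ~ {type_ii_rate:.3f}")
```

In this setup the Type I error rate stays near the chosen significance level, while the power depends on the true effect size and the sample size.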
Why is confidence interval more informative than p-value?
- CI is more informative than p-value, as CI provides information on:
» Precision of the point estimate (e.g. mean difference, odds ratio)
» Statistical significance (whether the CI excludes the null value, e.g. 0 for a mean difference or 1 for an odds ratio)