Quanti - Descriptive Statistics Flashcards
What is statistics?
“the practice or science of collecting and analysing numerical data in
large quantities, especially to make inferences on a population based on
a representative sample.”
It helps us turn data into information that can be interpreted, understood and used to improve evidence-based healthcare.
What are the 2 broad classifications of statistics?
Descriptive and inferential statistics
What are descriptive statistics?
To describe a population or sample through numbers, graphs and tables
(COLLECT, SUMMARIZE, DESCRIBE).
What are inferential statistics?
To draw meaningful inferences/conclusions about
the population based on data collected from a sample
(INTERPRET, GENERALIZE, PREDICT).
How are continuous variables usually presented?
- Measure of central tendency
- Measure of dispersion/variability
How are categorical variables usually presented?
Measure of frequency
What kinds of statistical methods are used for inferential statistics?
- Hypothesis testing
- Regression analysis
What do the following terms mean?
- Population
- Parameter
- Variable
- Sample
- Statistic
Population: The entire set of individuals, objects or events of interest
Parameter: Numerical characteristic of a population (e.g. population mean)
Variable: Characteristic that is being measured
Sample: Subset of a population
Statistic: Numerical characteristic of a sample (e.g. sample mean)
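To make the five terms concrete, here is a minimal sketch in Python. The heights are made-up values and the sample size of 50 is an arbitrary choice for illustration, not taken from the slides.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Variable: height in cm (the characteristic being measured).
# Population: hypothetical heights of 1,000 people (made-up values).
population = [150 + (i * 7) % 40 for i in range(1000)]

# Parameter: numerical characteristic of the whole population.
population_mean = mean(population)

# Sample: a subset of the population (here 50 people drawn at random).
sample = random.sample(population, 50)

# Statistic: numerical characteristic of the sample.
sample_mean = mean(sample)

print(f"parameter (population mean): {population_mean:.1f}")
print(f"statistic (sample mean):     {sample_mean:.1f}")
```

The statistic will usually be close to, but not exactly equal to, the parameter.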
Descriptive statistics for categorical variables - Measure of frequency
How are they presented?
- Frequency (%):
e.g. Males 70 (70%); females 30 (30%)
- Cross-tabulation (present frequency in a table form)
- Use of pictogram
- Use of pie, bar, column, line, scatter, all sorts of charts
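A frequency (%) summary like the example above can be sketched in Python as follows. The list `sexes` is a hypothetical data set built to match the 70/30 example.

```python
from collections import Counter

# Hypothetical sample matching the example: 70 males and 30 females.
sexes = ["male"] * 70 + ["female"] * 30

counts = Counter(sexes)
total = sum(counts.values())

# For each category, store (frequency, percentage of the total).
freq_table = {category: (n, 100 * n / total) for category, n in counts.items()}

for category, (n, pct) in freq_table.items():
    print(f"{category}: {n} ({pct:.0f}%)")
```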
Descriptive statistics for continuous variables - Measure of central tendency
Usually, mean, median and mode are used.
Can be illustrated with a histogram (like a bar chart with no gaps between the bars) or a boxplot
What is mean?
- arithmetic average of a set of values
- more suitable for symmetric distribution
- OFTEN REPORTED WITH STANDARD DEVIATION
e.g. mean (SD)
Formula:
Mean = (sum of all values) / (total number of values, n)
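As a quick check of the formula, here is a small Python sketch. The five values are made up for illustration.

```python
from statistics import mean

values = [4, 8, 6, 5, 7]  # hypothetical data

# Formula from the card: add all values, then divide by how many there are.
manual_mean = sum(values) / len(values)

# The standard library gives the same result.
library_mean = mean(values)

print(manual_mean)   # 6.0
print(library_mean)  # 6
```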
What is median?
- middle value of a data set when arranged in ascending or descending order
- more suitable for SKEWED distribution
- OFTEN REPORTED WITH INTERQUARTILE RANGE
e.g. median (IQR)
Formula:
If n (frequency) = 9 (odd),
Median: ((9 + 1)/2)th term = 5th term
If n (frequency) = 10 (even),
Median: (5th term + 6th term)/2
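The two cases (odd and even n) can be sketched in Python as below. The helper `median_by_position` and both data sets are illustrative assumptions; `statistics.median` is used only as a cross-check.

```python
from statistics import median

def median_by_position(values):
    """Median via the positional rule on the card."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # Odd n: take the ((n + 1) / 2)-th term (1-based position).
        return ordered[(n + 1) // 2 - 1]
    # Even n: average the (n / 2)-th and (n / 2 + 1)-th terms.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

odd_data = [2, 3, 4, 4, 5, 6, 6, 6, 7]   # n = 9, so the 5th term is the median
even_data = odd_data + [9]               # n = 10, so average the 5th and 6th terms

print(median_by_position(odd_data))   # 5
print(median_by_position(even_data))  # 5.5
```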
What is mode?
value that occurs most frequently in a data set
Example:
Data set {2, 3, 4, 4, 5, 6, 6, 6, 7}.
Mode = 6
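The same example can be checked with Python's standard library. The second data set, with two modes, is a made-up addition to show that a data set can have more than one mode.

```python
from statistics import mode, multimode

data = [2, 3, 4, 4, 5, 6, 6, 6, 7]  # data set from the card
print(mode(data))  # 6

# Hypothetical data set where two values tie for most frequent.
two_modes = [1, 1, 2, 2, 3]
print(multimode(two_modes))  # [1, 2]
```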
What is a normal distribution?
A symmetric, bell-shaped distribution in which the mean, median and mode coincide at one point on the x-axis. (see slide 12)
What is a left skewed distribution?
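The long tail is on the left side of the x-axis, so the mean is pulled to the left of the median and mode (typically mean < median < mode). (see slide 15)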
What is a right skewed distribution?
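The long tail is on the right side of the x-axis, so the mean is pulled to the right of the median and mode (typically mode < median < mean). (see slide 15)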
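A small Python sketch of how a long tail pulls the mean away from the median. Both data sets are made up; the left-skewed one is simply the mirror image of the right-skewed one.

```python
from statistics import mean, median

# Hypothetical right-skewed data: one large value forms a long right tail.
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]

# Mirror image of the data above, which gives a long left tail.
left_skewed = [21 - x for x in right_skewed]

print(mean(right_skewed), median(right_skewed))  # mean is above the median
print(mean(left_skewed), median(left_skewed))    # mean is below the median
```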
What is the normality of distribution?
- Many naturally occurring variables are approximately normally distributed (bell curve)
e.g. height, weight
- Central limit theorem: if the sample size is large enough (rule of thumb: n > 30), the distribution of the sample mean is approximately normal, even when the data themselves are not
- Main property: mean, median and mode are the same (symmetrical curve)
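The central limit theorem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the exponential distribution, the seed, and the 2,000 repetitions are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, median, stdev

random.seed(42)  # fixed seed so the sketch is repeatable

SAMPLE_SIZE = 30   # rule-of-thumb sample size
N_SAMPLES = 2000   # number of repeated samples (arbitrary)

def draw_sample_mean():
    # Exponential data (rate 1) are strongly right-skewed, with mean 1 and SD 1.
    sample = [random.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    return mean(sample)

sample_means = [draw_sample_mean() for _ in range(N_SAMPLES)]

# The sample means cluster around the population mean (1), with a spread
# close to SD / sqrt(n), and their mean and median nearly coincide.
print(f"mean of sample means:   {mean(sample_means):.3f}")
print(f"median of sample means: {median(sample_means):.3f}")
print(f"SD of sample means:     {stdev(sample_means):.3f}")
print(f"SD / sqrt(n):           {1 / sqrt(SAMPLE_SIZE):.3f}")
```

Although the raw data are skewed, the sample means form a roughly symmetric, bell-shaped distribution.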
What is skewness?
- lack of symmetry in a distribution (a normal distribution has no skew)
- tells us the direction of variability
- different from variance, which tells us the magnitude of variability
What is variance?
- average of squared differences of each data point from the mean (in squared units of the mean)
- if variance, σ² = 0: all data values are the same
- Small variance = data are close to the mean and to each other
- Large variance = data are far from the mean and from each other
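A worked sketch of the definition in Python, using made-up data. It follows the card's "average of squared differences", which is the population variance (`statistics.pvariance`). Note that the sample variance (`statistics.variance`) divides by n - 1 instead of n.

```python
from statistics import mean, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

# Average of the squared differences of each data point from the mean.
m = mean(data)
manual_variance = sum((x - m) ** 2 for x in data) / len(data)

print(manual_variance)   # 4.0
print(pvariance(data))   # 4

# When all values are the same, the variance is 0.
print(pvariance([3, 3, 3]))  # 0
```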
What is standard deviation?
square root of variance (same unit as mean)
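In Python this is a one-line step from the variance. The data are made up, and the population formulas (`pvariance`, `pstdev`) are assumed here. For a sample, `statistics.stdev` would be the usual choice.

```python
from math import sqrt
from statistics import mean, pstdev, pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# Standard deviation is the square root of the variance.
sd_from_variance = sqrt(pvariance(data))
sd_direct = pstdev(data)

# Reported in the usual "mean (SD)" format.
print(f"{mean(data):.1f} ({sd_direct:.1f})")  # 5.0 (2.0)
```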
What is the 68-95-99.7 rule?
a statistical principle that applies to bell-shaped, normal distributions
tells you where most of your values lie in a normal distribution:
~68% of values are within 1 SD from the mean
~95% of values are within 2 SD from the mean
~99.7% of values are within 3 SD from the mean
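The three percentages can be checked against a standard normal distribution (mean 0, SD 1) with Python's `statistics.NormalDist`. The helper name `share_within` is an illustrative assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of values within k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.2%}")
```

The same proportions hold for any normal distribution, because they depend only on the distance from the mean measured in SD units.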