Lecture 2: descriptive statistics Flashcards
What are the goals of analysis?
1- To summarise data from a sample included in an experiment or observational study
2- To test hypotheses and make inferences about the larger population from which a sample was drawn
What are the types of statistical analysis?
Descriptive statistics
Inferential statistics
What is descriptive statistics?
Methods used to summarise or describe the main features of a collection of data
Describes the characteristics of a sample
What is Inferential statistics?
Methods used to make inferences from the sample to the larger population
What are the methods of descriptive statistics?
Graphical techniques (diagrams): histograms, box-and-whisker plots, scatterplots, bar charts, pie charts
Numerical techniques (summary statistics): mean, standard deviation, range, median, inter-quartile range (IQR), mode, frequencies, percentages (incl. incidence, prevalence, risk, odds)
What types of diagrams are used for numerical data in descriptive statistics?
–Histogram
–Box-and-whisker plots (boxplots) for comparison by a categorical variable (e.g. sex)
–Scatterplots – relationship between two interval variables
What types of diagrams are used for categorical data in descriptive statistics?
–Bar charts, pie charts
–Clustered or stacked bar charts for comparison by a second categorical variable
What types of diagrams are used for numerical data?
- Histogram (for continuous data)
- Box-and-whisker plots (boxplots)
- Scatterplots
What are the characteristics of a Normal distribution?
–Symmetrical and bell-shaped
–Exactly half of the values are to the left of the center and the other half to the right
What are the characteristics of a Skewed distribution?
–Asymmetric distribution
–Right or positive skew – extreme values to the right
–Left or negative skew – extreme values to the left
What is the 5-number summary used in a box-and-whisker plot?
–Minimum (Min)
–1st quartile (Q1)
–Median (Q2)
–3rd quartile (Q3)
–Maximum (Max)
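As a minimal sketch (the deck itself contains no code), the 5-number summary can be computed in Python with the standard `statistics.quantiles` function. The function name `five_number_summary` and the sample values are hypothetical, and the choice of the "inclusive" quartile method is an assumption: software packages use slightly different quartile rules, so Q1 and Q3 may differ a little elsewhere.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return Min, Q1, Median, Q3 and Max for a list of numbers."""
    ordered = sorted(values)
    # Assumption: quartiles use the "inclusive" interpolation method;
    # other software may use a different rule and give slightly different Q1/Q3
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return {"Min": ordered[0], "Q1": q1, "Median": q2, "Q3": q3, "Max": ordered[-1]}

data = [7, 2, 9, 4, 20, 5, 8, 4, 6]  # hypothetical sample
print(five_number_summary(data))
# {'Min': 2, 'Q1': 4.0, 'Median': 6.0, 'Q3': 8.0, 'Max': 20}
```

The box of a boxplot spans Q1 to Q3 with a line at the median; the whiskers extend towards Min and Max.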
What are boxplots useful for?
Comparing groups
What are scatterplots useful for?
Showing the relationship (correlation) between two numerical variables
What are the Diagrams used for Categorical data (and Quantitative discrete)?
- Bar charts
- Clustered or stacked bar charts
- Pie charts
What is the simplest way to present data?
Using frequencies (counts) or percentages
What are the different ways you can display frequencies and percentages?
Table
Bar chart
Frequency distribution
Pie chart
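A minimal Python sketch of the frequency and percentage table that underlies these displays, using the standard `collections.Counter`. The function name `frequency_table` and the `sex` data are hypothetical, for illustration only.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(category, n, 100 * n / total) for category, n in counts.most_common()]

# Hypothetical example data: sex of 8 participants
sex = ["F", "M", "F", "F", "M", "F", "M", "F"]
for category, n, pct in frequency_table(sex):
    print(f"{category}: {n} ({pct:.1f}%)")
# F: 5 (62.5%)
# M: 3 (37.5%)
```

The same counts or percentages can then be shown as a table, bar chart or pie chart.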
What are the two preferred methods for numerical summaries in Descriptive Statistics?
- Measures of central tendency
- Measures of dispersion/spread
What are Measures of central tendency?
Also known as averages
Used to identify the “centre” around which data are distributed.
–Mean: arithmetic average
–Median: middle value of a data set
–Mode: most frequently occurring value
What is the Mean?
Arithmetic average
Mean = sum of data points / number of data points
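The formula above translates directly into a short Python function; this is a minimal sketch with hypothetical values (Python's standard `statistics.mean` gives the same result).

```python
def mean(values):
    # Mean = sum of data points / number of data points
    return sum(values) / len(values)

print(mean([2, 4, 4, 5, 7]))  # 4.4  (22 / 5)
```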
What is the Median?
Middle value of a data set
Divides the data into 2 equal sets
- If there is an odd number of values, the median is the middle value
- If there is an even number of values, the median is the average of the 2 middle values
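A minimal Python sketch of the two median rules above (odd versus even number of values), using hypothetical data; the standard `statistics.median` follows the same rules.

```python
def median(values):
    ordered = sorted(values)  # the median requires the data in order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: take the middle one
        return ordered[middle]
    # Even number of values: average the two middle ones
    return (ordered[middle - 1] + ordered[middle]) / 2

print(median([7, 1, 3]))      # 3    (ordered: 1, 3, 7)
print(median([7, 1, 3, 10]))  # 5.0  (ordered: 1, 3, 7, 10 -> (3 + 7) / 2)
```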
What is the Mode?
Most frequently occurring value
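A minimal Python sketch of finding the mode by counting values with `collections.Counter`. The function name `modes` is hypothetical; it returns every tied value because a data set can have more than one mode (Python 3.8+ also offers `statistics.multimode` for this).

```python
from collections import Counter

def modes(values):
    counts = Counter(values)
    highest = max(counts.values())
    # A data set can have more than one mode, so return all tied values
    return [value for value, n in counts.items() if n == highest]

print(modes([2, 4, 4, 5, 7]))  # [4]
print(modes([1, 1, 2, 2, 3]))  # [1, 2]
```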
What do numerical descriptive statistics measure?
Measures of central tendency
–Mean
–Median
–Mode
What determines the choice of summary measure?
The distribution of the data
In a symmetric distribution, how do the mean and median compare?
They are the same
If the median and mean are different, what does this indicate?
The data are skewed
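A small Python illustration of why a mean/median gap signals skew: one extreme value pulls the mean towards the tail but barely moves the median. The variable name and values (`length_of_stay_days`) are hypothetical.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is extreme
length_of_stay_days = [2, 3, 3, 4, 4, 5, 30]

print(round(mean(length_of_stay_days), 2))  # 7.29, pulled towards the extreme value
print(median(length_of_stay_days))          # 4, the middle value is unaffected
```

Here the mean is greater than the median, which is the typical pattern for a right (positive) skew.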
What are Measures of variability/dispersion?
The spread of the distribution - how widely the observations are spread out around the measure of central tendency
What are the commonly used measures of dispersion to indicate how spread-out the data is?
–Range (min, max)
–Interquartile range, IQR (the 25th to 75th percentiles)
–Standard deviation, SD (measure of variability around the mean)
What is the Range?
The difference between the highest and lowest value
Max - min
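A minimal Python sketch of Max - Min on hypothetical data; the function is named `value_range` (a hypothetical name) because `range` is already a Python built-in.

```python
def value_range(values):
    # Range = Max - Min
    return max(values) - min(values)

print(value_range([7, 2, 9, 4, 20, 5, 8, 4, 6]))  # 18  (20 - 2)
```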
What is the disadvantage of the range?
It is not very representative, because it depends only on the two most extreme values (including any outliers)
What is the interquartile range (IQR)?
Splits ordered data into 4 quartiles and measures the range covered by the middle 50% of the distribution.
IQR = Q3 - Q1
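A minimal Python sketch of IQR = Q3 - Q1 using the standard `statistics.quantiles`. The function name `iqr` and the data are hypothetical, and the "inclusive" quartile method is an assumption; other quartile conventions can give slightly different values.

```python
from statistics import quantiles

def iqr(values):
    # Assumption: "inclusive" quartile method; other conventions can differ slightly
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

print(iqr([7, 2, 9, 4, 20, 5, 8, 4, 6]))  # 4.0  (Q3 = 8.0, Q1 = 4.0)
print(iqr([7, 2, 9, 4, 10, 5, 8, 4, 6]))  # 4.0  (extreme value 20 replaced by 10)
```

Unlike the range, the IQR is unchanged when the extreme value 20 is replaced, which is why it suits skewed data.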
What is the standard deviation?
Roughly the average distance of the data points from the sample mean; formally, the square root of the variance
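The "average distance from the mean" idea is usually computed as the sample standard deviation: the square root of the sum of squared deviations divided by n - 1. A minimal Python sketch with hypothetical data (the function name `sample_sd` is hypothetical), cross-checked against the standard `statistics.stdev`:

```python
import math
import statistics

def sample_sd(values):
    n = len(values)
    m = sum(values) / n  # sample mean
    # Sample variance: sum of squared deviations from the mean, divided by n - 1
    variance = sum((x - m) ** 2 for x in values) / (n - 1)
    # The SD is the square root of the variance, so it is in the data's own units
    return math.sqrt(variance)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, mean = 5
print(sample_sd(data))           # about 2.14
print(statistics.stdev(data))    # standard-library equivalent
```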
What is the Empirical rule?
For data with a symmetric, bell-shaped (approximately normal) distribution:
•About 68% of the sample data fall within ± 1 SD of the mean
•About 95% of the sample data fall within ± 2 SD of the mean
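Where the 68% and 95% figures come from can be checked in Python: for an exactly normal distribution, the proportion within ± k SD of the mean is erf(k / √2). This minimal sketch uses the standard `math.erf`; the function name `proportion_within` is hypothetical.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within ± k SD of the mean."""
    # For a standard normal Z, P(-k < Z < k) = erf(k / sqrt(2))
    return math.erf(k / math.sqrt(2))

for k in (1, 2):
    print(f"within ±{k} SD: {proportion_within(k):.1%}")
# within ±1 SD: 68.3%
# within ±2 SD: 95.4%
```

The 95% in the rule is a rounded figure; exactly 95% of a normal distribution lies within about ± 1.96 SD.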
Empirical rule example
•At a 1-year weight-loss review meeting, the mean weight lost by patients was 10kg
•The standard deviation of the group was calculated to be 2.5kg
Therefore:
± 1 SD
= 10kg ± 2.5kg
= 7.5kg and 12.5kg
Therefore, assuming weight loss is approximately normally distributed, about 68% of patients lost between 7.5kg and 12.5kg
± 2 SD
= 10kg ± 2(2.5kg)
= 5kg and 15kg
Therefore, about 95% of patients lost between 5kg and 15kg
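The arithmetic in this example (mean ± k × SD) can be sketched in Python; the function name `sd_interval` is hypothetical, and the interpretation as 68% / 95% of patients still assumes weight loss is approximately normally distributed.

```python
def sd_interval(mean, sd, k):
    """Return the interval mean ± k*SD as (lower, upper)."""
    return (mean - k * sd, mean + k * sd)

mean_loss_kg = 10.0
sd_kg = 2.5
print(sd_interval(mean_loss_kg, sd_kg, 1))  # (7.5, 12.5)  about 68% of patients
print(sd_interval(mean_loss_kg, sd_kg, 2))  # (5.0, 15.0)  about 95% of patients
```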
What kind of distribution is summarised by the mean and standard deviation?
Normal (symmetric) distribution
What kind of distribution is summarised by the median and interquartile range?
Skewed distribution
–Left skew – extreme values on the left
–Right skew – extreme values on the right