Lecture 2: descriptive statistics Flashcards
What are the goals of analysis?
1- To summarise data from a sample included in an experiment or observational study
2- To test hypotheses and make inferences about the larger population from which the sample was drawn
What are the types of statistical analysis?
Descriptive statistics
Inferential statistics
What is descriptive statistics?
Methods used to summarise or describe the main features of a collection of data
Describe the characteristics of a sample
What is Inferential statistics?
Methods used to make inferences from the sample to the larger population
What are the methods of Descriptive statistics?
Graphical techniques- Diagrams : Histograms, box-and-whisker plots, scatterplots, bar charts, pie charts
Numerical techniques- Summary Statistics: Mean, standard deviation, range, median, inter-quartile range (IQR), mode, frequencies, percentages (incl. incidence, prevalence, risk, odds)
What type of Diagrams are used for Numerical data in Descriptive Statistics?
–Histogram
–Box-and-whisker plots (boxplots) for comparison by a categorical variable (e.g. sex)
–Scatterplots – relationship between two interval variables
What type of Diagrams are used for Categorical data in Descriptive Statistics?
–Bar charts, pie charts
–Clustered or stacked bar charts for comparison by a second categorical variable
What type of Diagrams are used for Numerical data?
- Histogram (for continuous data)
- Box-and-whisker plots (boxplots)
- Scatterplots
What are the characteristics of a Normal distribution?
–Symmetrical or bell-shaped
–Exactly half of the values are to the left of the center and the other half to the right
What are the characteristics of a Skewed distribution?
–Asymmetric distribution
–Right or positive skew – extreme values to the right
–Left or negative skew – extreme values to the left
What is the 5-number summary used in a Box-and-whisker plot?
–Minimum (Min)
–1st Quartile (Q1)
–Median (Q2)
–3rd Quartile (Q3)
–Maximum (Max)
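As a rough illustration of the 5-number summary, the sketch below computes it with Python's standard `statistics` module. The function name `five_number_summary` and the sample values are made up for this example, and the quartile convention (`method="inclusive"`) is an assumption: other software may use a slightly different rule and give slightly different Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    ordered = sorted(values)
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    # method="inclusive" interpolates between the smallest and largest
    # observed values; this convention is an assumption of the sketch.
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, q2, q3, ordered[-1])

# Made-up example data
summary = five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6])
print(summary)
```

The box of a boxplot spans Q1 to Q3, with a line at the median.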
Boxplots are useful for?
Comparing groups
Scatter plots are useful for?
Showing the relationship (correlation) between two numerical variables
What are the Diagrams used for Categorical data (and Quantitative discrete)?
- Bar charts
- Clustered or stacked bar charts
- Pie charts
What is the simplest way to present data?
By using Frequencies (counts) or percentages
What are the different ways you can display frequencies and percentages?
Table
Bar chart
Frequency distribution
Pie chart
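A frequency table can be sketched in a few lines of Python. The helper `frequency_table` and the blood-group values are hypothetical, chosen only to show counts next to percentages.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, count, percentage) rows, most frequent first."""
    counts = Counter(values)
    total = len(values)
    return [(category, n, 100 * n / total)
            for category, n in counts.most_common()]

# Made-up categorical data: blood group of 10 participants
blood_groups = ["A", "B", "A", "O", "A", "B", "A", "O", "B", "A"]
table = frequency_table(blood_groups)

for category, n, pct in table:
    # Left-align the category, right-align count and percentage
    print(f"{category:<10}{n:>5}{pct:>8.1f}%")
```

The same counts could feed a bar chart or pie chart; the table is simply the most direct display.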
What are the two preferred methods for numerical summaries in Descriptive Statistics?
- Measures of central tendency
- Measures of dispersion/spread
What are Measures of central tendency?
Also known as averages
Used to identify the “centre” around which data are distributed.
–Mean: arithmetic average
–Median: middle value of a data set
–Mode: most frequently occurring value
What is the Mean?
Arithmetic average
Mean = sum of data points / number of data points
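The formula above translates directly into code. This is a minimal sketch with made-up values; the function name `mean` is just a label for the example.

```python
def mean(values):
    """Arithmetic average: sum of the data points divided by their count."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

# Made-up example data
heights_cm = [160, 170, 180]
print(mean(heights_cm))  # 170.0
```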
What is the Median?
Middle value of a data set
Divides the data into 2 equal sets
- If there is an odd # of elements, median is the middle number
- If there is an even # of elements, median is the average of 2 middle numbers
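The odd and even cases can be sketched as follows. The function name `median` and the example values are assumptions for illustration only.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if even)."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)  # the data must be sorted first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of elements: take the single middle value
        return ordered[mid]
    # Even number of elements: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```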
What is the Mode?
Most frequently occurring value
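A small sketch of finding the mode by counting how often each value occurs. The helper `modes` is hypothetical; it returns a list because a data set can have more than one most frequent value.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    return sorted(value for value, n in counts.items() if n == highest)

print(modes(["A", "O", "A", "B"]))  # ['A']
print(modes([1, 2, 2, 3, 3]))       # [2, 3]
```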
What do Numerical descriptive statistics measure?
Measures of central tendency
–Mean
–Median
–Mode
The choice of summary measure is determined by?
The distribution of the data
In a symmetric distribution, mean and median?
Are the same (or very close, in sample data)
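To see why the distribution matters when choosing a summary measure, the sketch below compares the mean and median of two made-up data sets, one symmetric and one right-skewed. The helper `centre` is a name chosen for this example.

```python
import statistics

def centre(values):
    """Return (mean, median) of the data."""
    return statistics.mean(values), statistics.median(values)

# Made-up data sets
symmetric = [1, 2, 3, 4, 5]      # evenly spread around 3
right_skewed = [1, 2, 3, 4, 50]  # one extreme value to the right

print(centre(symmetric))     # mean and median are both 3
print(centre(right_skewed))  # mean is 12, median is still 3
```

The single extreme value pulls the mean toward the tail while the median stays put, which is why the median is often preferred for skewed data.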