Research Skills 6: Introduction to analysing data Flashcards
Inspecting and Plotting your data
- ALWAYS start by looking at your raw data before calculating statistics.
- Look at the actual numbers.
- Check for obvious mistakes, missing values and outliers.
- Think about the best graphical representation for your data.
- Graph the results and look at them.
Statistics only describe and summarise your data. They are no substitute for the results themselves.
Summarising Data: Descriptive Statistics
You can use descriptive statistics (e.g. the average) to summarise your data.
MEASURES OF “AVERAGE” (“Measures of central tendency”)
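- Mean
“Common average”, “arithmetic mean”: the sum of all the observations divided by the number of observations.
- Median
The middle value. Put all the observations in order of size and find the middle value: the value that has as many observations larger than it as smaller than it. (With an even number of observations, take the average of the two middle values.)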
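The two definitions can be sketched in Python. This is a minimal illustration with made-up numbers; the function names `mean` and `median` are our own, and for an even number of observations the sketch averages the two middle values, which is the usual convention.

```python
# A minimal sketch: the mean and the median computed by hand,
# following the definitions above (the data values are made up).

def mean(values):
    """Arithmetic mean: sum of the observations divided by how many there are."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered observations.

    With an even number of observations there are two middle values,
    so their average is returned (the usual convention).
    """
    ordered = sorted(values)          # put the observations in order of size
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd n: a single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the two middle values

data = [4, 8, 5, 7, 6]
print(mean(data))    # 6.0
print(median(data))  # 6
```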
Disadvantages of descriptive stats: mean vs median
The mean is strongly affected by outliers.
The median is much less sensitive to outliers and to skewed distributions.
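A quick illustration using Python's standard `statistics` module and made-up numbers: replacing one value with an extreme outlier drags the mean a long way but leaves the median unchanged.

```python
import statistics

# Hypothetical measurements; the second list replaces 8 with an extreme value (an outlier).
clean = [4, 5, 6, 7, 8]
with_outlier = [4, 5, 6, 7, 80]

print(statistics.mean(clean), statistics.median(clean))                # mean = 6, median = 6
print(statistics.mean(with_outlier), statistics.median(with_outlier))  # mean = 20.4, median = 6
```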
Name two measures of spread
1. Range
2. Standard Deviation
What is the Range?
- The spread from the smallest to the largest value (largest minus smallest)
- But it only tells you about the largest and smallest values, nothing about the spread of all the other observations.
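A minimal sketch of that limitation: the two hypothetical datasets below share the same smallest and largest values, and therefore the same range, even though their middle values are spread quite differently. `data_range` is an illustrative helper name, not a library function.

```python
# Two made-up datasets with identical extremes but different inner spread.
a = [1, 5, 5, 5, 9]   # middle values bunched together
b = [1, 2, 5, 8, 9]   # middle values spread out

def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

print(data_range(a), data_range(b))  # 8 8
```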
What is Standard Deviation?
- A mathematical measure of the spread of data around the mean.
- Notice that SD is a measure of the spread
- It does not show the actual spread or range
- ± 1 SD around the mean will include a lot of the data (about 68% if the data are normally distributed)
- ± 2 SD around the mean will include most of the data (about 95% if the data are normally distributed)
- But some results will be even further out
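The ±1 SD and ±2 SD rules of thumb can be checked on simulated data. This sketch assumes normally distributed measurements (mean 50, s.d. 10, chosen arbitrarily) and uses `statistics.stdev`, the sample standard deviation; the exact fractions printed will vary slightly with the seed.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable
# Simulated, normally distributed measurements (assumption: mean 50, s.d. 10).
data = [random.gauss(50, 10) for _ in range(10_000)]

m = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (divides by n - 1)

def fraction_within(values, centre, half_width):
    """Fraction of observations lying within centre ± half_width."""
    return sum(abs(x - centre) <= half_width for x in values) / len(values)

print(f"mean = {m:.1f}, s.d. = {sd:.1f}")
print(f"within ±1 s.d.: {fraction_within(data, m, sd):.2f}")           # roughly 0.68
print(f"within ±2 s.d.: {fraction_within(data, m, 2 * sd):.2f}")       # roughly 0.95
print(f"outside ±2 s.d.: {1 - fraction_within(data, m, 2 * sd):.2f}")  # roughly 0.05
```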
What is Variance?
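The average of the squared differences from the mean, or equivalently the square of the standard deviation (SD²). (For a sample, the sum of squared differences is usually divided by n − 1 rather than n.)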
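A worked check of the definition on a small made-up dataset. The sketch computes the average squared difference from the mean by hand, compares it with `statistics.pvariance` and `statistics.pstdev`, and also shows the sample variance (dividing by n − 1), which is what `statistics.variance` and `statistics.stdev` use.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example values

m = statistics.mean(data)                      # the mean is 5
squared_diffs = [(x - m) ** 2 for x in data]   # squared differences from the mean

pop_var = sum(squared_diffs) / len(data)           # "average" of the squared differences
sample_var = sum(squared_diffs) / (len(data) - 1)  # sample version, divides by n - 1

print(pop_var, statistics.pvariance(data))          # both equal 4
print(math.sqrt(pop_var), statistics.pstdev(data))  # s.d. = sqrt(variance), both equal 2
print(sample_var, statistics.variance(data))        # both equal 32/7 (about 4.57)
```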
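What is the Interquartile Range?
Put the data in order and divide it into four equal groups: the top 25%, next 25%, next 25% and bottom 25%.
The interquartile range covers the middle two groups (the middle 50% of the data, from the first quartile to the third quartile). Used by population scientists with large datasets; not useful with small numbers of observations.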
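A sketch of the quartile split using `statistics.quantiles` (Python 3.8+) on simulated data. Software packages use slightly different conventions for calculating quartiles; `method="inclusive"` here is an arbitrary choice, and the simulated values (mean 170, s.d. 10) are purely illustrative.

```python
import random
import statistics

random.seed(5)
# Simulated large dataset (assumption: normal, mean 170, s.d. 10).
data = [random.gauss(170, 10) for _ in range(5000)]

# quantiles(..., n=4) returns the three cut points that split the data into four 25% groups.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # width of the middle two groups

middle = sum(q1 <= x <= q3 for x in data) / len(data)
print(f"Q1 = {q1:.1f}, median = {q2:.1f}, Q3 = {q3:.1f}, IQR = {iqr:.1f}")
print(f"fraction of observations between Q1 and Q3: {middle:.2f}")  # about 0.50
```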
Graphing Data: what are the two types of data?
1. Numerical (quantitative) data
2. Categorical data
What is the Standard Error of the Mean (s.e.m.)?
This does not measure the spread of the data. It measures our confidence in the estimate of the mean. It is calculated as s.d./√n, where n is the number of observations.
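A minimal sketch of the s.d./√n calculation, using made-up measurements and the sample standard deviation from Python's `statistics` module:

```python
import math
import statistics

data = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9]  # made-up measurements

sd = statistics.stdev(data)       # spread of the individual observations
sem = sd / math.sqrt(len(data))   # uncertainty in the estimate of the mean

print(f"mean = {statistics.mean(data):.2f}, s.d. = {sd:.2f}, s.e.m. = {sem:.2f}")
```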
Standard Deviation vs SEM
- Standard deviation is a measure of the spread in your data. As you collect more data, the spread stays about the same: the s.d. changes only slightly.
- The s.e.m. is a measure of the uncertainty in your estimate of the mean, and is used to build confidence intervals for it.
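A simulation sketch of this contrast, assuming observations drawn from a normal population with mean 100 and s.d. 15 (both arbitrary). As n grows, the s.d. stays near 15 while the s.e.m. keeps shrinking, roughly by a factor of √10 for each tenfold increase in n.

```python
import math
import random
import statistics

random.seed(2)  # repeatable simulation
population_sd = 15  # assumption: the population s.d.

results = {}
for n in (10, 100, 1000, 10000):
    sample = [random.gauss(100, population_sd) for _ in range(n)]
    sd = statistics.stdev(sample)   # stays roughly constant as n grows
    sem = sd / math.sqrt(n)         # shrinks as n grows
    results[n] = (sd, sem)
    print(f"n = {n:>5}: s.d. = {sd:5.2f}, s.e.m. = {sem:5.3f}")
```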
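What is a disadvantage of the s.e.m.?
- Using ± 2 s.e.m. as a 95% confidence interval only works for large numbers of observations (more than about 20–30)
- For small numbers of observations, the s.e.m. is too optimistic: ± 2 s.e.m. gives an interval that is too narrow
- You can find the correct confidence intervals using the t-distribution (the distribution used in a t-test)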
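A sketch of the small-sample correction: for n observations, a 95% confidence interval for the mean uses a multiplier from Student's t-distribution with n − 1 degrees of freedom instead of 2. Python's standard library has no t-distribution, so the critical values below are hard-coded from standard t-tables for a few degrees of freedom only; `ci95` and the sample values are hypothetical.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t-distribution for a few
# degrees of freedom (df = n - 1), copied from standard t-tables.
T_CRIT_95 = {2: 4.303, 4: 2.776, 9: 2.262, 29: 2.045}

def ci95(values):
    """Return (mean ± 2 s.e.m., t-based 95% interval) for a small sample.

    Only sample sizes whose df appear in T_CRIT_95 are supported.
    """
    n = len(values)
    m = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    approx = (m - 2 * sem, m + 2 * sem)
    t = T_CRIT_95[n - 1]
    t_based = (m - t * sem, m + t * sem)
    return approx, t_based

small = [5.1, 4.7, 5.6, 4.9, 5.3]  # hypothetical sample of 5 observations (df = 4)
approx, t_based = ci95(small)
print("mean ± 2 s.e.m.:", [round(x, 2) for x in approx])
print("t-based 95% CI: ", [round(x, 2) for x in t_based])  # wider, because 2.776 > 2
```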
What are the properties of the Normal Distribution?
Among the properties of the normal distribution:
- it is symmetrical about the mean
- it extends to + and to – infinity
- however, ~95% of observations lie within ± 2 standard deviations of the mean
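The ~95% figure can be checked with `statistics.NormalDist` from Python's standard library (3.8+), which gives the cumulative probability of a normal distribution. Because the distribution is symmetrical, the probability of falling within ±k s.d. of the mean is cdf(k) − cdf(−k).

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, s.d. 1

for k in (1, 2):
    inside = z.cdf(k) - z.cdf(-k)  # probability of lying within ±k s.d. of the mean
    print(f"within ±{k} s.d.: {inside:.4f}")
# within ±1 s.d.: 0.6827
# within ±2 s.d.: 0.9545
```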
What is the Normal Distribution?
The “normal distribution” is a particular mathematical distribution with two parameters, the mean and the standard deviation.
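For reference, the standard density function of a normal distribution with mean μ and standard deviation σ shows that these two parameters are all that is needed to define it:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
```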
What is the Central Limit Theorem?
- If a variable is affected by a lot of different, independent random factors
- Each has a small effect
- And their effects are additive
- The variable's distribution will approximate a normal distribution
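A simulation sketch of the idea under assumed conditions: each hypothetical measurement is the sum of 50 small, independent effects, each uniformly distributed between −1 and 1 (a flat, clearly non-normal shape). The sums nonetheless behave like normal data, with about 95% falling within ±2 s.d. of the mean.

```python
import random
import statistics

random.seed(3)  # repeatable simulation

def one_measurement(n_factors=50):
    """Hypothetical variable: the sum of many small, independent random effects.

    Each effect is uniform on [-1, 1], which on its own is far from normal.
    """
    return sum(random.uniform(-1, 1) for _ in range(n_factors))

values = [one_measurement() for _ in range(5000)]
m = statistics.mean(values)
sd = statistics.stdev(values)

within_2sd = sum(abs(v - m) <= 2 * sd for v in values) / len(values)
print(f"fraction within ±2 s.d.: {within_2sd:.3f}")  # close to the normal value of ~0.95
```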
Summary I - Own your data
- Look at the raw data
- Plot the raw data
- Think about what the data mean
Summary II - Descriptive Statistics
- Measures of average
- Mean: the most convenient measure for mathematical calculations
- Median: sometimes gives a more sensible answer when there are outliers, or a skewed distribution
- Measures of spread
- Range (only tells you about the smallest and largest observations)
- Standard deviation (s.d.) (more useful measure of overall spread)
- Variance (= s.d.²)
- Interquartile range (only useful if large number of observations)
Summary III - Error Bars
- Error bars could mean anything (e.g. s.d., s.e.m. or a confidence interval)
- So they must be defined in the figure legend
- S.D. error bars are a measure of the spread of the data
- S.E.M. error bars are an indication of your confidence in the estimate of the mean
Summary IV - Standard Error of the Mean
- S.e.m. is used to build a confidence interval for the mean
- We can be ~68% confident that the “true” mean lies within ± 1 s.e.m. of the experimental mean
- And ~95% confident that the “true” mean lies within approx. ± 2 s.e.m. of the experimental mean
- As the number of observations gets larger the s.e.m. gets smaller
- Our confidence in the estimate of the mean is higher
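These confidence statements can be checked by simulation. This sketch assumes a known population (mean 20, s.d. 5), repeats an experiment with n = 30 observations 2,000 times, and counts how often the true mean falls within ±1 and ±2 s.e.m. of each experimental mean. With n = 30 the coverage comes out close to, though slightly below, 68% and 95%, which is the small-sample effect the t-distribution corrects for.

```python
import math
import random
import statistics

random.seed(4)  # repeatable simulation
TRUE_MEAN, TRUE_SD, N = 20.0, 5.0, 30   # assumed population and sample size
EXPERIMENTS = 2000

hits_1, hits_2 = 0, 0
for _ in range(EXPERIMENTS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    m = statistics.mean(sample)
    sem = statistics.stdev(sample) / math.sqrt(N)
    hits_1 += abs(m - TRUE_MEAN) <= sem       # true mean within ±1 s.e.m.?
    hits_2 += abs(m - TRUE_MEAN) <= 2 * sem   # true mean within ±2 s.e.m.?

print(f"±1 s.e.m. contains the true mean in {hits_1 / EXPERIMENTS:.0%} of experiments")  # about 68%
print(f"±2 s.e.m. contains the true mean in {hits_2 / EXPERIMENTS:.0%} of experiments")  # about 95%
```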