Research Skills 6: Introduction to Analysing Data Flashcards

1
Q

Inspecting and Plotting your data

A
  • ALWAYS start by looking at your raw data before calculating statistics.
  • Look at the actual numbers.
  • Check for obvious mistakes, missing values and outliers.
  • Think about the best graphical representation for your data
  • Graph the results and look at them.

Statistics only summarise and describe your data. They are not a substitute for looking at the results themselves.
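
A minimal Python sketch of that first look, using made-up numbers. It lists missing values and shows the sorted observations, so that mistakes and outliers stand out before any statistic is calculated. The function name `inspect_raw` and the data are hypothetical.

```python
import math

def inspect_raw(values):
    """Print a first look at raw data before calculating any statistics."""
    # Treat None and NaN as missing values
    missing = [i for i, v in enumerate(values)
               if v is None or (isinstance(v, float) and math.isnan(v))]
    missing_set = set(missing)
    present = sorted(v for i, v in enumerate(values) if i not in missing_set)
    print("number of entries:", len(values))
    print("missing at positions:", missing)
    # Sorting puts the smallest and largest values at the ends, where typos show up
    print("sorted values:", present)
    return missing, present

# Hypothetical measurements: one missing entry and one suspicious value (41.0)
missing, ordered = inspect_raw([4.1, 3.9, None, 4.3, 41.0, 4.0])
```

Here the 41.0 stands out at once. It might be a typing error for 4.1, but only checking the original records can tell.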

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

Summarising Data: Descriptive Statistics

A

You can use descriptive statistics (e.g. the average) to summarise your data.

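MEASURES OF “AVERAGE” (“Measures of central tendency”)

-Mean
“Common average”, “arithmetic mean”: add up all the observations and divide by the number of observations.

-Median
The middle value. Put all the observations in order of size and find the middle value: the value which has the same number of observations larger than it as smaller than it. (With an even number of observations, take the mean of the two middle values.)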
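
The two definitions translate directly into code. This is a minimal Python sketch with made-up data; the even-count rule for the median (averaging the two middle values) is the usual convention and is also what Python's `statistics.median` does.

```python
import statistics

def mean(values):
    # Arithmetic mean: add up all observations, divide by how many there are
    return sum(values) / len(values)

def median(values):
    # Put the observations in order of size
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: one value has equal numbers of observations above and below it
        return ordered[mid]
    # Even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [3, 7, 5, 9, 6]   # hypothetical observations
print(mean(data), median(data))  # 6.0 6
# Cross-check against the standard library
print(statistics.mean(data), statistics.median(data))
```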

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

Disadvantages of descriptive stats

A

The mean is strongly affected by outliers.

The median is insensitive to outliers and to skewed distributions: it takes no account of how extreme the largest and smallest values are.
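
A quick way to see the difference is to add one extreme value to a small dataset. The numbers below are made up for illustration; the sketch uses Python's `statistics` module.

```python
import statistics

clean = [10, 12, 11, 9, 13]     # hypothetical observations
with_outlier = clean + [100]    # the same data plus one extreme value

for label, values in (("clean", clean), ("with outlier", with_outlier)):
    print(label, "mean:", round(statistics.mean(values), 2),
          "median:", statistics.median(values))
```

The mean jumps from 11 to about 25.8, while the median only moves from 11 to 11.5.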

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

Name two measures of spread

A
  1. Range
  2. Standard deviation

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

What is the Range?

A
  • From the smallest to the largest value
  • But it only tells you about the largest and smallest values, nothing about the spread of all the other observations.
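
To see why the range says so little, compare two made-up datasets in this short Python sketch (the helper name `data_range` is hypothetical).

```python
def data_range(values):
    # Range: from the smallest to the largest observation
    lo, hi = min(values), max(values)
    return lo, hi, hi - lo

a = [5, 6, 6, 7, 7, 7, 8, 8, 9]   # values cluster around 7
b = [5, 9, 9, 9, 9, 5, 5, 5, 9]   # every value sits at one of the extremes
print(data_range(a))  # (5, 9, 4)
print(data_range(b))  # (5, 9, 4)
```

Both datasets have the same range, 5 to 9, even though their spreads look very different.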

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

What is Standard Deviation?

A
  • A mathematical measure of the spread of data around the mean.
  • Notice that the SD is a measure of the spread
  • It does not show the actual spread or range
  • ± 1 SD around the mean will include a lot of the data (about 68% for normally distributed data)
  • ± 2 SD around the mean will include most of the data (about 95% for normally distributed data)
  • But some results will be even further out
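
A minimal Python sketch, using a small made-up dataset, that computes the SD and counts how many observations fall within ± 1 and ± 2 SD of the mean. It uses `statistics.pstdev` (dividing by n); `statistics.stdev` would give the sample SD (dividing by n − 1). The helper `fraction_within` is hypothetical.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # small made-up dataset
m = statistics.mean(data)          # 5
sd = statistics.pstdev(data)       # 2.0 (divides by n; statistics.stdev divides by n - 1)

def fraction_within(values, centre, width):
    # Share of observations no further than `width` from `centre`
    inside = [v for v in values if abs(v - centre) <= width]
    return len(inside) / len(values)

print(fraction_within(data, m, 1 * sd))  # 0.75
print(fraction_within(data, m, 2 * sd))  # 1.0
```

With only eight observations the percentages are coarse; the 68% and 95% figures apply to normally distributed data.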
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What is Variance?

A

The average of the squared differences from the mean, or the square of the standard deviation (SD²).
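
The definition can be checked step by step in Python with the same kind of made-up data. This sketch divides by n; the sample variance (`statistics.variance`) divides by n − 1 instead.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up observations
m = sum(data) / len(data)

# Average of the squared differences from the mean
variance = sum((x - m) ** 2 for x in data) / len(data)
sd = math.sqrt(variance)          # the SD is the square root of the variance
print(variance, sd)               # 4.0 2.0

# Same result as the standard library's population variance
print(variance == statistics.pvariance(data))  # True
```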

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What is the Interquartile Range?

A
Put the data in order of size and divide them into the
top 25%
next 25%
next 25%
bottom 25%

The interquartile range covers the middle two groups, i.e. the middle 50% of the observations. Used by population scientists with large datasets. Not useful with small numbers of observations.
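
One way to get the quartiles in Python is `statistics.quantiles` (Python 3.8+). The data below are hypothetical, and the default `method="exclusive"` is only one of several conventions for placing the cut points.

```python
import statistics

data = list(range(1, 13))   # hypothetical: the numbers 1 to 12
# Three cut points split the ordered data into four groups of 25%
q1, q2, q3 = statistics.quantiles(data, n=4)   # default method="exclusive"
iqr = q3 - q1                                  # spread of the middle 50%
print(q1, q2, q3, iqr)  # 3.25 6.5 9.75 6.5
```

Other conventions (e.g. `method="inclusive"`) give slightly different quartiles. The differences are largest when there are few observations, which is one reason the interquartile range is less useful for small samples.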

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

Graphing Data: what are the two types of data?

A
  1. Numerical (quantitative) data
  2. Categorical data

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

What is the Standard Error of the Mean (s.e.m.)?

A

This does not measure the spread of the data. It measures our confidence in the estimate of the mean. It is calculated as s.e.m. = s.d. / √n, where n is the number of observations.
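
The s.e.m. is the standard deviation divided by the square root of the number of observations. This minimal Python sketch uses the sample SD (`statistics.stdev`, dividing by n − 1), which is the usual choice here. The helper name `sem` and the data are hypothetical.

```python
import math
import statistics

def sem(values):
    # s.e.m. = sample standard deviation divided by the square root of n
    return statistics.stdev(values) / math.sqrt(len(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]   # made-up observations
print(round(sem(data), 3))        # 0.756
```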

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

Standard Deviation vs SEM

A
  • Standard deviation is a measure of the spread in your data. As you get more data the spread will stay about the same: the s.d. will change only slightly.
  • S.e.m. is a measure of the uncertainty in your estimate of the mean, and is used to give a confidence interval. It gets smaller as you collect more data.
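
A small simulation shows the contrast. The sketch below draws hypothetical measurements from a normal distribution with mean 50 and s.d. 10 (arbitrary choices) for increasing sample sizes. The s.d. settles near 10, while the s.e.m. keeps shrinking.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable
results = {}
for n in (10, 100, 1000, 10000):
    # Hypothetical measurements drawn from a normal distribution (mean 50, s.d. 10)
    sample = [random.gauss(50, 10) for _ in range(n)]
    sd = statistics.stdev(sample)
    results[n] = (sd, sd / math.sqrt(n))   # (s.d., s.e.m.)
    print(f"n={n:>5}  s.d.={results[n][0]:6.2f}  s.e.m.={results[n][1]:6.3f}")
```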
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is a disadvantage of SEM?

A
  • This only works for large numbers of observations (>20-30)
  • For small numbers of observations, the s.e.m. is too optimistic
  • You can find the true confidence intervals using the t-distribution (the same distribution used in a t-test)
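
The sketch below compares a 95% interval built from the t-distribution with the rough "± 2 s.e.m." rule for five made-up observations. The critical values in `T_95` are copied from standard t tables for a handful of degrees of freedom only. That small table and the helper `interval_95` are assumptions made for illustration, not a general implementation.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, from standard tables,
# keyed by degrees of freedom (n - 1). Only a few entries are included here.
T_95 = {4: 2.776, 9: 2.262, 19: 2.093, 29: 2.045}

def interval_95(values):
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values) / math.sqrt(n)   # s.e.m.
    t = T_95[n - 1]          # KeyError if n - 1 is not in the small table above
    t_interval = (m - t * s, m + t * s)
    rough_interval = (m - 2 * s, m + 2 * s)       # the "± 2 s.e.m." rule of thumb
    return t_interval, rough_interval

t_ci, rough_ci = interval_95([4.8, 5.1, 5.0, 4.7, 5.4])  # five made-up observations
print(t_ci, rough_ci)
```

With 5 observations the t-based interval is about 2.776 / 2 ≈ 1.4 times as wide as ± 2 s.e.m. By 30 observations (2.045) the two almost agree.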
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are properties of Normal Distribution?

A
Among the properties of the normal distribution:
- it is symmetrical about the mean
- it extends to + and to – infinity
- however, ~95% of observations lie within ± 2 standard deviations of the mean
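
Each property can be checked numerically with Python's `statistics.NormalDist` (Python 3.8+). This sketch uses the standard normal distribution (mean 0, s.d. 1); any other mean and s.d. give the same proportions.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)   # standard normal: mean 0, s.d. 1

# Symmetrical about the mean: the two tails beyond ±1.5 s.d. have equal area
print(z.cdf(-1.5), 1 - z.cdf(1.5))

# Extends to ± infinity: the density is tiny but still positive 10 s.d. out
print(z.pdf(10) > 0)   # True

within_1 = z.cdf(1) - z.cdf(-1)   # share within ± 1 s.d.
within_2 = z.cdf(2) - z.cdf(-2)   # share within ± 2 s.d.
print(round(within_1, 3), round(within_2, 3))  # 0.683 0.954
```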
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What is Normal Distribution?

A

The “normal distribution” is a particular mathematical distribution with two parameters, the mean and the standard deviation.
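
For reference, the standard probability density of the normal distribution shows how the two parameters, the mean μ and the standard deviation σ, fix the whole curve. This is the textbook formula, not something taken from the card:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```

μ sets where the peak sits, and σ sets how wide the bell is.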

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What is the Central Limit Theorem?

A
  • If a variable is affected by a lot of different random factors
  • Each has a small effect
  • And their effects are additive
  • Then its distribution will approximate a normal distribution
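
A small simulation can make this concrete. The Python sketch below uses arbitrary, hypothetical settings: each "measurement" is the sum of 50 independent random factors, and each factor adds a small, flat (uniform) effect. A single uniform factor is not bell-shaped, yet the sums come out close to normal, with about 95% of them within ± 2 s.d. of their mean.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

def measurement(n_factors=50):
    # Hypothetical: each factor adds a small, flat (uniform) random effect in [-0.1, 0.1]
    return sum(random.uniform(-0.1, 0.1) for _ in range(n_factors))

values = [measurement() for _ in range(10000)]
m = statistics.mean(values)
sd = statistics.pstdev(values)
share_within_2sd = sum(abs(v - m) <= 2 * sd for v in values) / len(values)
print(round(share_within_2sd, 3))   # close to the normal distribution's ~0.95
```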
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

Summary I - Own your data

A
  1. Look at the raw data
  2. Plot the raw data
  3. Think about what the data mean
17
Q

Summary II - Descriptive Statistics

A
  1. Measures of average
  • Mean: the most useful for mathematical calculations
  • Median: sometimes gives a more sensible answer when there are outliers or a skewed distribution
  2. Measures of spread
  • Range (only tells you about the smallest and largest observations)
  • Standard deviation (s.d.) (a more useful measure of overall spread)
  • Variance (= s.d.²)
  • Interquartile range (only useful with a large number of observations)
18
Q

Summary III - Error Bars

A
  1. Error bars could represent anything (e.g. s.d. or s.e.m.)
  2. So what they represent must be defined in the figure legend
  3. S.D. error bars are a measure of the spread of the data
  4. S.E.M. error bars are an indication of your confidence in the estimate of the mean
19
Q

Summary IV - Standard Error of the Mean

A
  1. S.e.m. gives an approximate confidence interval
  • We can be ~68% confident that the “true” mean lies within ± 1 s.e.m. of the experimental mean
  • And ~95% confident that the “true” mean lies within approx. ± 2 s.e.m. of the experimental mean
  2. As the number of observations gets larger, the s.e.m. gets smaller
  3. So our confidence in the estimate of the mean is higher
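
These percentages can be checked by repeating a hypothetical experiment many times and counting how often the true mean lands within ± 1 and ± 2 s.e.m. of each experimental mean. In this Python sketch the true mean (100), s.d. (15), sample size (30) and number of repeats are arbitrary choices. Expect coverage a little below 68% and 95% at n = 30, for the same reason the s.e.m. is too optimistic with small samples.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed so the run is repeatable
TRUE_MEAN, TRUE_SD, N, REPEATS = 100, 15, 30, 4000   # hypothetical settings

hits_1, hits_2 = 0, 0
for _ in range(REPEATS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    m = statistics.mean(sample)
    s = statistics.stdev(sample) / math.sqrt(N)   # s.e.m. of this experiment
    hits_1 += abs(m - TRUE_MEAN) <= s             # true mean within ± 1 s.e.m.?
    hits_2 += abs(m - TRUE_MEAN) <= 2 * s         # true mean within ± 2 s.e.m.?

coverage_1 = hits_1 / REPEATS
coverage_2 = hits_2 / REPEATS
print(round(coverage_1, 3), round(coverage_2, 3))
```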