lecture 2 - descriptive (summary) statistics Flashcards

1
Q

what is central tendency?

A

An average: the simplest way to summarise data, i.e. it shows where most of the scores lie.
The most common measure of central tendency is the mean.

2
Q

mean

A

Mean = sum of all scores / number of scores
x̄ = ∑X / N
∑X means ‘add up’ (total, or ‘sum’) all of X
Use x̄ to signify the mean of a sample
E.g. the mean height of people in this class
Use µ to signify the mean of a whole population
E.g. the mean height of everybody in Cardiff uni.
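
As a minimal sketch of the formula above, the snippet below computes a sample mean in Python. The heights are made-up illustration values, not data from the lecture.

```python
# Hypothetical sample of heights in cm (illustrative values only).
heights = [160, 172, 168, 181, 169]

# x̄ = ∑X / N: add up all the scores, then divide by the number of scores.
sample_mean = sum(heights) / len(heights)
print(sample_mean)
```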

3
Q

calculating means

A

The mean is unique and very sensitive to the data: change any single score and the mean changes too.
The mean only makes sense with interval or ratio measurement, because adding scores up requires equal intervals.
The arithmetic mean need not be a value that actually occurs on the scale the data were taken from (e.g. a mean of 2.5 on a whole-number scale).
The mean alone doesn’t fully describe the data.

4
Q

mode

A

The most frequent score. It is not unique: two or more scores can share the same, highest frequency (called multimodal data). The mode is not very sensitive to changes in the data.
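
A short sketch of finding every mode in Python, using `statistics.multimode` from the standard library (Python 3.8+). The scores are hypothetical and chosen so that two values tie for the highest frequency.

```python
from statistics import multimode

# Hypothetical scores: 2 and 3 both occur twice, more often than any other score.
scores = [1, 2, 2, 3, 3, 4]

# multimode returns all scores that share the highest frequency,
# in the order they first appear in the data.
modes = multimode(scores)
print(modes)
```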

5
Q

median

A

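The middle score: the score above and below which 50% of the data points lie. It is at the (N + 1)/2 position when the scores are arranged in order; with an even number of scores this position falls between two scores, and the median is their average. The median is unique but not very sensitive to changes in the data.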
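
The (N + 1)/2 rule can be sketched in Python as follows. The function name `median_by_position` is a hypothetical helper for illustration, and it assumes that with an even number of scores the median is the average of the two scores either side of the (N + 1)/2 position.

```python
import math

def median_by_position(scores):
    """Find the median via the (N + 1)/2 position in the ordered scores."""
    ordered = sorted(scores)
    n = len(ordered)
    position = (n + 1) / 2                      # 1-based position in the ordered list
    lower = ordered[math.floor(position) - 1]   # score at or just below the position
    upper = ordered[math.ceil(position) - 1]    # score at or just above the position
    return (lower + upper) / 2                  # identical scores when N is odd

odd_median = median_by_position([7, 1, 3, 5, 9])       # position 3 of 1,3,5,7,9
even_median = median_by_position([1, 1, 2, 3, 3, 4])   # position 3.5, between 2 and 3
```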

6
Q

central tendency and measurement

A

Because of its link to inferential statistics and the normal distribution (more later in the course), the mean is the most common measure of central tendency for interval and ratio scales.
But the median and mode are also fine, depending on what information is being conveyed.
With ordinal scales the mean can’t be used, so the median is most common (the mode can also be used).
With nominal scales neither the median nor the mean can be used, so the mode is the most common measure of central tendency.

7
Q

variability

A

The degree of ‘spread’ about an average
Mode - no associated measure of variability
Median - interquartile range
Mean - variance and standard deviation

8
Q

interquartile range

A

The difference between the 1st quartile (the score with 25% of the data below it and 75% above) and the 3rd quartile (the score with 75% of the data below it and 25% above), so it spans the middle 50% of the data.
* For small samples it can be easier to take the median and then find the middle of the lower & upper halves.
* Order data (low to high)
7,1,2,6,3,4,6,3,4,5,1,8 becomes 1,1,2,3,3,4,4,5,6,6,7,8
* Divide data into two groups using median (“median split”)
1,1,2,3,3,4,4,5,6,6,7,8 becomes 1,1,2,3,3,4 and 4,5,6,6,7,8
* Find median of lower-rank group: 1st quartile (Q1)
Median of 1,1,2,3,3,4 is 2.5
* Find median of high-rank group: 3rd quartile (Q3)
Median of 4,5,6,6,7,8 is 6
* Interquartile range = Q3 – Q1
IQR is 6 – 2.5 = 3.5
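
The median-split steps above can be sketched in Python, using the same twelve scores. The function name `iqr_median_split` is a hypothetical helper. It assumes at least two scores and, when N is odd, leaves the median itself out of both halves.

```python
from statistics import median

def iqr_median_split(scores):
    """Return (Q1, Q3, IQR) using the median-split method."""
    ordered = sorted(scores)        # order data, low to high
    half = len(ordered) // 2        # size of each half (median excluded if N is odd)
    lower = ordered[:half]          # low-rank group
    upper = ordered[-half:]         # high-rank group
    q1 = median(lower)              # 1st quartile
    q3 = median(upper)              # 3rd quartile
    return q1, q3, q3 - q1

q1, q3, iqr = iqr_median_split([7, 1, 2, 6, 3, 4, 6, 3, 4, 5, 1, 8])
print(q1, q3, iqr)
```

Statistical packages often use interpolation rules for quartiles instead, so their results can differ slightly from this hand method.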

9
Q

variance

A

The average squared deviation.
The variance is the average squared distance of scores from the mean. It is the sum of squares divided by the number of scores. It tells us how widely dispersed the scores are around the mean.
On average, how far away from the mean is each score?
Two steps:
1 - Find out how far each score is from the mean, i.e. how deviant is the score?
2 - What is the average deviation?
* Finding the average deviation works like any other average.
* Add up all the deviation scores and divide by the number of scores
* Problem: the sum of any set of deviation scores X - x̄ is ZERO
* Solution: square the deviation scores so that none are negative
* The average squared deviation is the variance
* Divide the sum of your squared deviation scores by the total number of scores
So, on average, the scores are spread out 5.06 squared therapy sessions around the mean (the units are squared because the deviations were squared)
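
The steps can be traced in a few lines of Python. The scores below are hypothetical (they are not the therapy-session data behind the 5.06 figure), and the sketch divides by the number of scores, as described on this card.

```python
# Hypothetical scores, chosen so the mean is a whole number.
scores = [2, 4, 4, 6, 9]
n = len(scores)
mean = sum(scores) / n                       # 25 / 5 = 5.0

# Step 1: how far is each score from the mean?
deviations = [x - mean for x in scores]      # -3, -1, -1, 1, 4

# Problem: the raw deviations always sum to zero.
deviation_total = sum(deviations)

# Solution: square the deviations, then average them.
squared_deviations = [d ** 2 for d in deviations]   # 9, 1, 1, 1, 16
variance = sum(squared_deviations) / n               # 28 / 5

print(deviation_total, variance)
```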

10
Q

standard deviation

A

The standard deviation is the square root of the variance. It is the variance converted back to the original units of measurement of the scores used to compute it. Large standard deviations relative to the mean suggest data are widely spread around the mean, whereas small standard deviations suggest data are closely packed around the mean.
* Need to get the scores back into the original units
* Take the square root of the variance
* The square root of the variance is called the standard deviation
* The standard deviation is a measure of the average deviation of the scores from the mean, in the original units of the scale

11
Q

estimating the population SD from a sample

A

The sample mean is a good estimate of the population mean.
But the SD computed from a sample with the population formula (dividing by N) is not the best estimate of the population SD.
Let’s look at an example:
Population is 1, 2, 3
Population mean = 2, Population SD = 0.816
Take 3 samples of size 1 from the population: 1, 2 and 3.
Sample means are 1, 2 and 3, so the average of the sample means is 2, the same as the population mean.
Sample SDs are all 0. This underestimates the population SD!
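
The same point can be checked numerically. The sketch below is an extension of the card's example, not part of it: it assumes samples of size 2 drawn with replacement from the population 1, 2, 3, and compares variances rather than SDs. Averaged over all possible samples, dividing by N gives a value that is too small, while dividing by N - 1 recovers the population variance.

```python
from itertools import product
from statistics import mean, pvariance, variance

population = [1, 2, 3]
population_variance = pvariance(population)        # 2/3, i.e. SD of about 0.816

# Every possible sample of size 2, drawn with replacement (9 samples).
samples = list(product(population, repeat=2))

# Average sample variance when dividing by N (population formula).
avg_var_divide_n = mean(pvariance(s) for s in samples)

# Average sample variance when dividing by N - 1 (sample formula).
avg_var_divide_n_minus_1 = mean(variance(s) for s in samples)

print(population_variance, avg_var_divide_n, avg_var_divide_n_minus_1)
```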

12
Q

population variance

A

σ²x = ∑(X - µ)² / N
where
X are the data
µ is the population mean
N is the number of data points that make up the population

13
Q

sample variance

A

s²x = ∑(X - x̄)² / (N - 1)
where
X are the data
x̄ is the sample mean
N is the number of data points that make up the sample
Divide by N - 1 to get the sample variance, as this gives an unbiased estimate of the population variance.
∑(X - x̄)² is sometimes known as the sum of squares (SS)
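
A small sketch contrasting the two divisors on one set of hypothetical scores. Python's standard library happens to provide both: `statistics.pvariance` divides by N and `statistics.variance` divides by N - 1, so they serve as a cross-check on the hand calculation.

```python
from statistics import pvariance, variance

scores = [2, 4, 4, 6, 9]          # hypothetical sample
n = len(scores)
sample_mean = sum(scores) / n     # 5.0

# Sum of squares (SS): squared deviations from the mean, added up.
ss = sum((x - sample_mean) ** 2 for x in scores)    # 28.0

variance_divide_n = ss / n              # population formula
variance_divide_n_minus_1 = ss / (n - 1)    # sample formula, unbiased estimate

print(variance_divide_n, pvariance(scores))
print(variance_divide_n_minus_1, variance(scores))
```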

14
Q

standard deviation - population and sample

A
  • Variance has ‘squared’ units
    • if data are heights in metres, variance is in metres squared (i.e. it is an area, not a height!)
  • So, take the square root
    • the new measure of variability now has the same units as the original data
      The square root of the variance is the standard deviation
15
Q

population standard deviation

A

σx = √(σ²x)

16
Q

sample standard deviation

A

sx = √(s²x)

17
Q

sample variance and sd

A
  • Calculate the sum of squares as before.
  • Divide the sum of squares by the number in the sample minus 1 (i.e. N – 1) to get the sample variance.
  • Take the square root of the variance to get the sample standard deviation.
18
Q

standard error of the mean

A

A related measure is the “Standard Error of the Mean” (or SEM). This is the SD divided by the square root of the number of data points in the sample.
It is an estimate of the variability of the distribution of sample means taken from the population.
The SEM is another way to represent variability related to a mean.
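
A minimal sketch of the SEM calculation. It assumes "the SD" here means the sample standard deviation (the N - 1 version, which `statistics.stdev` computes), and the scores are hypothetical.

```python
import math
from statistics import stdev

scores = [2, 4, 4, 6, 9]      # hypothetical sample
n = len(scores)

sd = stdev(scores)            # sample SD, dividing the sum of squares by N - 1
sem = sd / math.sqrt(n)       # SEM = SD / sqrt(N)

print(sd, sem)
```

Because N appears in the denominator, the SEM shrinks as the sample gets larger, even when the SD stays the same.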

19
Q

feature of frequency distributions

A

useful for assessing properties of the distribution of scores

20
Q

kurtosis

A

Refers to the degree to which scores cluster at the ends (tails) of the distribution, which tends to show in how pointy the distribution is. In a normal distribution the values of skew and kurtosis are 0. If a distribution has values of skew or kurtosis above or below 0, this indicates a deviation from normal. Some sources report kurtosis without subtracting 3, in which case a normal distribution has a kurtosis of 3 rather than 0.
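
To make the 0-versus-3 convention concrete, here is a sketch using one common moment-based definition of skew and kurtosis. This is an assumption for illustration: statistical packages often apply small-sample corrections, so their values may differ. The function name is hypothetical, and it assumes the scores are not all identical.

```python
def skew_and_excess_kurtosis(scores):
    """Moment-based skew and excess kurtosis (both 0 for a normal distribution)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in scores) / n   # third central moment
    m4 = sum((x - mean) ** 4 for x in scores) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3   # without the "- 3", a normal distribution gives 3
    return skew, excess_kurtosis

# Hypothetical flat, symmetric scores: no skew, negative kurtosis.
flat_scores = [1, 2, 3, 4, 5]
skew, kurt = skew_and_excess_kurtosis(flat_scores)
print(skew, kurt)
```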

21
Q

a distribution with positive kurtosis

A

Has many scores in the tails (a heavy-tailed distribution) and is pointy. This is called a leptokurtic distribution.

22
Q

distribution with negative kurtosis

A

Relatively thin in the tails and tends to be flatter than normal. This distribution is called platykurtic.

23
Q

how to calculate the centre of a frequency distribution or central tendency

A

using 3 measures - the mean, mode and median

24
Q

how to spot the mode on a graph

A

The tallest bar. A distribution with two highest bars is bimodal; data sets with more than two modes are multimodal.

25
Q

range

A

Looks at the dispersion of scores: take the largest score and subtract the smallest score from it.

26
Q

IQR

A

The range of the middle 50% of scores. The advantage of the interquartile range is that it isn’t affected by extreme scores at either end of the distribution. However, the problem with it is that you lose a lot of data (half of it, in fact).

27
Q

quartiles

A

the three values that split the sorted data into 4 equal parts. first calculate the median which is also called the second quartile, which splits our data into two equal parts. The lower quartile is the median of the lower half of the data and the upper quartile is the median of the upper half of the data. As a rule of thumb the median is not included in the two halves when they are split. Like the median, if each half of the data had an even number of values in it, then the upper and lower quartiles would be the average of two values in the data set. Once we have worked out the values of the quartiles, we can calculate the interquartile range, which is the difference between the upper and lower quartile. Quartiles are quantiles that split the data into four equal parts, but there are other quantiles such as percentiles (points that split the data into 100 equal parts), noniles (points that split the data into nine equal parts) and so on.

28
Q

quantiles

A

Quantiles are values that split a data set into equal portions

29
Q

sum of squared errors (SS)

A

∑( xi - x̄)²

30
Q

sum of squares

A

We can use the sum of squares as an indicator of the total dispersion, or total deviance of scores from the mean. The problem with using the total is that its size will depend on how many scores we have in the data. The sum of squared errors is the total amount of error in the mean. The errors/deviances are squared before adding them up.
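
A brief sketch of why the total depends on how many scores there are. The data are hypothetical: the second set simply repeats the first, so the spread around the mean is unchanged, yet the sum of squares doubles. Dividing by the number of scores removes that dependence.

```python
def sum_of_squares(scores):
    """SS: squared deviations from the mean, added up."""
    mean = sum(scores) / len(scores)
    return sum((x - mean) ** 2 for x in scores)

small = [1, 2, 3]
large = small * 2                 # same scores repeated: 1, 2, 3, 1, 2, 3

ss_small = sum_of_squares(small)  # 2.0
ss_large = sum_of_squares(large)  # 4.0, twice as big

# The average squared deviation is the same for both data sets.
avg_small = ss_small / len(small)
avg_large = ss_large / len(large)
print(ss_small, ss_large, avg_small, avg_large)
```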

31
Q

population and samples

A

Scientists are usually interested in finding results that apply to an entire population of entities. Psychologists cannot collect data from every human being, so we collect data from a smaller subset of the population, known as the sample.