Descriptive Statistics Flashcards

Familiarity with basic descriptive stats

1
Q

What type of variable are rating scales?

A

Ordinal - the gaps between numbers are not meaningful, e.g. the gap between 1 and 2 may be very different from the gap between 2 and 3, so it may take a lot more to move between one pair than the other

2
Q

How can rating scales such as Likert scales be used?

A

Numbers can be aggregated across many questions and the resulting totals treated as INTERVAL data - we cannot say one person is “twice as…” using this data, but we can say a person is “three points higher than…”, for example

3
Q

What is a key difference between interval and ratio data?

A

Ratio data has an absolute zero e.g. temperature in Celsius is interval data while weight is ratio data

4
Q

What are the 2 main types of descriptive statistics used?

A

Measures of central tendency - describes central position of a frequency distribution; MEAN, MEDIAN or MODE
Measures of spread - Describes how spread out scores are; RANGE, QUARTILES, ABSOLUTE DEVIATION, VARIANCE, SD

5
Q

Is it possible for the same data to be treated as both ordinal and ratio?

A

Yes - data can be transformed from one level to another to allow different types of testing, but this can only happen “downwards”, e.g. going from ratio down to ordinal. In simple terms, we can make detailed data simpler but we cannot do the opposite

6
Q

What are 3 possible advantages of simplifying data?

A

Easier reading/understanding of data
Decisions/actions based on data become clearer e.g. pass or fail
Sweeping generalisations - classifying people into comparable groups

7
Q

What are inferential statistics useful for?

A

When it isn’t practical to measure every member of a population - use a representative sample and make generalisations
Methods are an ESTIMATION of population parameters as sampling errors mean that a given sample will never be fully representative

8
Q

What is ordinal data?

A

Variation along a continuum, difference between numbers NOT meaningful
Can only say bigger or smaller than, no direct ratio comparisons e.g. twice as big as

9
Q

What is interval data?

A

Variation along a continuum where the difference between numbers is meaningful/equal/fixed
No true zero though so ratios between numbers are not meaningful

10
Q

Compare and contrast descriptive and inferential statistics

A

Descriptive - measurements certain but cannot be generalised

Inferential - measurements can be generalised but purely estimation

11
Q

When converting interval to ordinal data, how do you treat data which is the same e.g. if three participants have the same score?

A

Rather than saying, for example, that these three people are “equal second”, we would take the median of the second, third and fourth rank and that would be the designated rank
e.g. if three people score the second highest score, the median would be 3 and there would be no Rank 2 in this particular set of ordinal data
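
A minimal sketch of this tie-handling rule, assuming ranks run from 1 for the highest score downwards. The function name `rank_with_ties` and the scores are hypothetical, chosen only to reproduce the "three people share the second-highest score" case.

```python
from statistics import median

def rank_with_ties(scores):
    """Rank scores from highest (rank 1) to lowest.

    Tied scores all receive the median of the rank positions they occupy.
    """
    ordered = sorted(scores, reverse=True)
    positions = {}
    for position, score in enumerate(ordered, start=1):
        positions.setdefault(score, []).append(position)
    return [median(positions[score]) for score in scores]

# Hypothetical scores: three participants tie on the second-highest score.
scores = [90, 80, 80, 80, 70]
ranks = rank_with_ties(scores)
print(ranks)  # ranks 2, 3 and 4 are shared, so each tied score gets rank 3
```

With these scores nobody is given rank 2 or rank 4, matching the example in the card.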

12
Q

Why can numbers used as measures be misleading?

A

e.g. TEMPERATURE - a temperature scale will appear as an interval scale, but the EXPERIENCE of heat change is arguably better considered as ordinal. An increase of 3 degrees in a room at 14 degrees will likely be more noticeable than the same change at 30 degrees, i.e. the gaps between the numbers are not equally meaningful because it takes more to get between one pair of numbers than another
e.g. “PLASTIC INTERVALS”/QUASI-INTERVAL SCALES - attitude scales, for example, have meaningless intervals along the scale, but the familiarity of the 0-10 number scale misleadingly makes them look equal

13
Q

What is a good rule of thumb for how to treat different measurement scales?

A

Using a published, standardised scale –> treat data as interval
Using an unstandardized invented scale –> safer to treat as ordinal
This means using different descriptive stats to summarise the data

14
Q

How can we change interval data into nominal?

A

Median split method
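
A minimal sketch of a median split, assuming scores strictly above the median are labelled "high" and everything else "low". How to treat scores exactly equal to the median is a convention the card does not specify, and the function name and scores are hypothetical.

```python
from statistics import median

def median_split(scores):
    """Turn interval scores into two nominal categories around the median."""
    cut = median(scores)
    # Assumption: a score equal to the median goes into the "low" group.
    return ["high" if score > cut else "low" for score in scores]

# Hypothetical interval-level scores.
scores = [12, 15, 9, 20, 17, 11]
groups = median_split(scores)
print(median(scores), groups)
```

The detail of how far each score sits from the median is lost, which is why this conversion only works "downwards".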

15
Q

What is meant by a quasi-interval scale?

A

Numerically appearing intervals on the scale do not measure equal amounts of construct

16
Q

In practice, how do we use a truly interval scale?

A

e.g. when using a tape measure and someone is 175cm tall, we can only truthfully assert that they are 174.5-175.5cm tall (exact limits determined by tape measure or by convenience)

17
Q

When can we create frequency histograms?

A

When data is ordinal/interval/ratio NOT nominal i.e. the data needs to be able to be meaningfully ordered
(For nominal we would use bar charts, leaving spaces between the bars)

18
Q

How and why do we group data for a frequency histogram?

A

When we have a large range of scores and individual data points are time consuming and uninformative

1) Decide group size - usually between 5 and 10; we are aiming for a size that gives us fewer than 10 groups in total
2) Start the interval with a multiple of 5 or 10 - we can try our group size possibilities here, e.g. if we have scores ranging from 55-99 we can try an interval of 5 and divide the inclusive range of scores (99 - 55 + 1 = 45) by 5 –> gives us 9 groups
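
The grouping steps can be sketched as below, assuming whole-number scores. The function name `class_intervals` is hypothetical; the 55-99 range and width of 5 come from the example in the card.

```python
import math

def class_intervals(lowest, highest, width):
    """Return (lower, upper) limits for equal-width groups of whole-number scores."""
    # Start the first interval on a multiple of the group width.
    start = (lowest // width) * width
    # Inclusive range: both end values are counted.
    inclusive_range = highest - start + 1
    n_groups = math.ceil(inclusive_range / width)
    return [(start + i * width, start + (i + 1) * width - 1) for i in range(n_groups)]

intervals = class_intervals(55, 99, 5)
print(len(intervals), intervals[0], intervals[-1])
```

If the result had more than about 10 groups, a larger width (e.g. 10) would be tried instead.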

19
Q

What are the 4 key features of histograms?

A

Columns equal width
Column area proportional to frequency represented + all columns sum to total area
No space between columns
All categories represented even if empty

20
Q

What do positively and negatively skewed uni-modal frequency distributions look like?

A
POSITIVE = tail extends to the right of the peak (the most frequent value)
NEGATIVE = tail extends to the left of the peak (the most frequent value)
21
Q

What are three advantages of mode as a measure of central tendency?

A

1) Only measure useful for nominal data
2) Also useful for discrete measurement scales e.g. number of legs
3) Only logical measure for bi-modal data

22
Q

What are three disadvantages of mode?

A

1) Depends on how the frequency data are grouped, i.e. if we change a group size, this changes the distribution and thus changes the mode
2) Doesn’t account for range of data
3) Not very useful for small data sets where several data values may be equally frequent

23
Q

What are 2 advantages of median?

A

1) Unaffected by extreme scores/skewed distribution so more representative of group values than mean or mode
2) Useful for ordinal, interval and ratio data

24
Q

What are 2 disadvantages of median?

A

1) No simple equation to calculate exact value so trickier to work with
2) Doesn’t account for exact distances between values and can be unrepresentative in small samples, e.g. if our sample values are 2, 3, 5, 8, 9, 112 –> median is 6.5 (midway between 5 and 8)

25
Q

What are 4 advantages of using the mean?

A

1) Easily manipulated algebraically
2) Most stable measure i.e. most consistent when replicated
3) Powerful in estimating parameters (parametric tests)
4) Most sensitive and accurate - accounts for distances between values

26
Q

What are 4 disadvantages of using the mean?

A

1) Influenced by outliers
2) Value may not actually exist within data set
3) Only appropriate for interval or ratio data, not nominal or ordinal
4) Poor measure for bi-modal data and discrete measurement scales e.g. number of children

27
Q

What is a “trimmed mean”?

A

Mean of the least extreme 95% of data i.e. remove top and bottom 2.5%
This is helpful when worried about outliers/skew
Need to keep a note of what was removed and make it transparent in any publications
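
A rough sketch of a trimmed mean, assuming the number of values dropped from each end is rounded down. The function name, the `percent_each_tail` parameter and the data are hypothetical. The removed values are returned alongside the mean so they can be reported.

```python
from statistics import mean

def trimmed_mean(values, percent_each_tail=2.5):
    """Mean after dropping the given percentage of values from each end."""
    ordered = sorted(values)
    k = int(len(ordered) * percent_each_tail / 100)  # values dropped per tail
    kept = ordered[k:len(ordered) - k]
    removed = ordered[:k] + ordered[len(ordered) - k:]
    return mean(kept), removed

# Hypothetical data: 39 ordinary scores plus one extreme outlier.
values = list(range(1, 40)) + [1000]
result, removed = trimmed_mean(values)
print(mean(values), result, removed)
```

Here a single extreme score drags the ordinary mean far above the trimmed one.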

28
Q

Why is the range not particularly useful?

A

Highly influenced by presence of outliers and doesn’t give any idea of distribution BETWEEN the extremes i.e. how close the data are to the mean

29
Q

What is the inter-quartile range?

A

The range of the middle 50% of the data (Q3 - Q1): the top and bottom quarters are removed to avoid the outlier problem, so it concentrates on the central grouping
Useful for ordinal data

30
Q

When can IQR be helpful?

A

As an alternative to a trimmed mean, for identifying and removing outliers individually
e.g. if Q1 = 24 and Q3 = 34, our IQR is 10
We consider a value an outlier if it lies more than 1.5 x IQR above Q3 or below Q1, so any value below 9 (24 - 15) or above 49 (34 + 15) in this example can be dismissed as an outlier
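
The 1.5 x IQR rule can be sketched as follows, using the Q1 and Q3 values from the example. The function name `outlier_fences` and the list of values being screened are hypothetical.

```python
def outlier_fences(q1, q3, multiplier=1.5):
    """Return the (lower, upper) cut-offs beyond which a value counts as an outlier."""
    iqr = q3 - q1
    return q1 - multiplier * iqr, q3 + multiplier * iqr

lower, upper = outlier_fences(24, 34)

# Hypothetical values to screen against the fences.
values = [5, 20, 30, 48, 60]
outliers = [v for v in values if v < lower or v > upper]
print(lower, upper, outliers)
```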

31
Q

How does standard deviation differ from mean deviation?

A

1) Mean deviation cannot be used to estimate population parameters but SD can
2) Mean deviation ignores the negative signs of the deviations, whereas SD squares the deviations (which also removes the signs)

32
Q

Define Variance

A

The expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of (random) numbers are spread out from their average value.

33
Q

Distinguish between variance and SD

A

The standard deviation is the square root of the variance. The standard deviation is expressed in the same units as the mean is, whereas the variance is expressed in squared units, but for looking at a distribution, you can use either just so long as you are clear about what you are using

34
Q

Distinguish between SD and SE

A

The standard deviation (SD) measures the amount of variability or dispersion of a set of data around its mean, while the standard error of the mean (SEM) measures how far the sample mean is likely to be from the true population mean. For any sample with more than one value, the SEM (SD divided by the square root of N) is smaller than the SD
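
A small illustration of the difference, assuming the usual formula SEM = SD / sqrt(N). The sample values are hypothetical.

```python
import math
from statistics import stdev

# Hypothetical sample of eight scores.
sample = [4, 8, 6, 5, 3, 7, 9, 6]

sd = stdev(sample)                    # spread of individual scores (N - 1 in the denominator)
sem = sd / math.sqrt(len(sample))     # expected spread of the sample mean
print(round(sd, 3), round(sem, 3))
```

Because the SD is divided by sqrt(N), the SEM shrinks as the sample gets larger while the SD does not.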

35
Q

When should IQR be used as a dispersion measure?

A

Where median used as central tendency

Where population skewed so ordinal level preferred

36
Q

When should SD/variance be used as dispersion measures?

A

Interval level data and above

Associated with use of mean as central tendency

37
Q

What is the general rule of thumb with rounding?

A

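Round to one decimal place below the original intervals, e.g. report statistics to one decimal place when the data were recorded as whole numbers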

38
Q

How is range calculated?

A

(Top value of data set - bottom value) + 1

39
Q

What is the semi-interquartile range?

A

Half IQR

40
Q

What is mean deviation?

How is it calculated?

A

The mean of all absolute deviation values in a data set, ignoring negative signs

Find mean value –> subtract mean from each value in data set to obtain set of deviations –> add all these deviations up ignoring negative signs –> divide result by N
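
The calculation steps can be sketched as below. The function name and the data are hypothetical.

```python
from statistics import mean

def mean_deviation(values):
    """Average absolute distance of each value from the mean."""
    centre = mean(values)
    # abs() is what "ignoring negative signs" means in practice.
    return sum(abs(v - centre) for v in values) / len(values)

# Hypothetical data: mean is 6, absolute deviations are 4, 2, 0, 2, 4.
values = [2, 4, 6, 8, 10]
print(mean_deviation(values))
```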

41
Q

What is meant by “degrees of freedom”?

A

If we know the mean of a sample, all values making up the sample are free to vary except one: once N - 1 values are fixed, the last one is determined, so df = N - 1

42
Q

Why is sample variance almost always smaller than population variance?

A

Think of normal distributions and how most values fall close to the mean
The sample is most likely to draw individuals from this region
This is why we divide by N-1 for variance calculations - dividing by a smaller number means the resulting estimate for population variance will be larger, compensating to some extent for the error
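
A quick sketch of the effect of the divisor, computed by hand rather than with a library so the N versus N - 1 step is visible. The function name and the sample are hypothetical.

```python
def variance_estimates(sample):
    """Return (sum of squares / N, sum of squares / (N - 1)) for a sample."""
    n = len(sample)
    sample_mean = sum(sample) / n
    sum_of_squares = sum((x - sample_mean) ** 2 for x in sample)
    return sum_of_squares / n, sum_of_squares / (n - 1)

# Hypothetical sample: mean is 5, sum of squared deviations is 32.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
divided_by_n, divided_by_n_minus_1 = variance_estimates(sample)
print(divided_by_n, divided_by_n_minus_1)
```

Dividing by N - 1 always gives the larger figure, and the gap narrows as N grows.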

43
Q

Bearing in mind that, for a normal distribution, values of mean, median and mode are very often the same, suggest why the mean may be higher than the median in a particular sample

A

Remember that the mean is affected by outliers while the median is not, so a few extreme high scores (a positive skew) will pull the mean above the median

44
Q

List the measure(s) of central tendency and dispersion used for nominal, ordinal and interval level data

A

NOMINAL - Mode, variation ratio (variance cannot be calculated for category labels)
ORDINAL - Median, IQR/SIQR
INTERVAL - Mean, Range/SD/variance

45
Q

When may we get a negatively skewed distribution?

A

When a test is particularly easy (the ceiling effect)
Most people score near the top, leaving a tail of a few extreme low scores below the median, which pulls the mean towards the tail (below the median)

46
Q

What is the central tendency like for skewed distributions?

A

NEGATIVE - Mode>Median>Mean
POSITIVE - Mean>Median>Mode
Median is always in the middle and mode is always at highest point
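
A small check of these orderings on two made-up data sets. Both lists are hypothetical and deliberately skewed; real data will not always order this neatly.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed scores (long tail of high values).
positive = [1, 2, 2, 2, 3, 3, 4, 5, 9, 14]
# Hypothetical negatively skewed scores (long tail of low values).
negative = [1, 6, 10, 11, 12, 12, 13, 13, 13, 14]

for label, data in (("POSITIVE", positive), ("NEGATIVE", negative)):
    print(label, "mean:", mean(data), "median:", median(data), "mode:", mode(data))
```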