Topic 2 Flashcards

1
Q

What is the purpose of descriptive statistics?

A

To describe or summarise the overall pattern of data

2
Q

How do you describe numerical data?

A

The three S’s - shape, centre and spread (plus outliers)

3
Q

How do you describe categorical data?

A

Table of frequencies or proportions
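
As a minimal sketch of such a table, the snippet below counts each category and converts the counts to proportions. The `responses` list is hypothetical data made up for illustration.

```python
from collections import Counter

# Hypothetical categorical data (not from the flashcards)
responses = ["red", "blue", "red", "green", "red", "blue"]

# Frequencies: how many times each category occurs
counts = Counter(responses)

# Proportions: each frequency divided by the total number of observations
total = sum(counts.values())
proportions = {category: count / total for category, count in counts.items()}

for category, count in counts.most_common():
    print(f"{category:<6} {count:>3} {proportions[category]:.2f}")
```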

4
Q

What is a symmetrical shape?

A

The left and right sides are mirror images of each other; the shape is often, but not always, bell-shaped

5
Q

What is a shape that is skewed to the left?

A

The left tail extends further out than the right tail

6
Q

What is a shape that is skewed to the right?

A

The right tail extends further out than the left tail

7
Q

What is a symmetrical, bimodal shape?

A

Symmetrical with two peaks

8
Q

What is a symmetrical, uniform shape?

A

Symmetrical and flat

9
Q

What is an outlier? What may be the cause of them?

A

Observations that deviate from the overall pattern of the distribution. They may be caused by natural variation or by measurement error.

10
Q

What are numerical summaries for centre or location? (3)

A

Mode, median, mean

11
Q

What are numerical summaries for spread? (3)

A

Range, inter-quartile range (IQR), standard deviation

12
Q

What is mode?

A

The most common value or peak of data

13
Q

What is median?

A

The middle; the value that divides an ordered data set into two equal halves

14
Q

For what types of variables would you find the median?

A

Ordinal, discrete and continuous

15
Q

What is mean?

A

The average of the data, found by adding all values and dividing by the number of cases

16
Q

What does the ‘x bar’ symbol represent?

A

Mean

17
Q

Is mean or median resistant to outliers/skewness and why?

A

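The median, because it depends only on the middle position of the ordered data. The mean uses every value, so it is pulled towards outliers and towards the long tail of a skewed distribution.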
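
A small illustration, using made-up numbers, of how one extreme value moves the mean far more than the median:

```python
from statistics import mean, median

# Hypothetical data: five typical values, then the same set plus one extreme value
values = [2, 3, 4, 5, 6]
with_outlier = values + [100]

# Without the outlier the mean and median agree (both 4)
print(mean(values), median(values))

# With the outlier the mean jumps to 20, while the median only moves to 4.5
print(mean(with_outlier), median(with_outlier))
```

The second data set is also skewed to the right, and its mean is larger than its median.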

18
Q

Mean ? median in symmetrical data?

A

Mean = median (approximately equal when the data is only roughly symmetrical)

19
Q

Mean ? median in skewed left data?

A

Mean < median

20
Q

Mean ? median in skewed right data?

A

Mean > median

21
Q

What is the ‘range’ of data?

A

The difference between the largest and smallest values in the data set

22
Q

What are the first, second and third quartiles?

A

Q1 - 25% of data below Q1
Q2 - 50% of data below Q2 - aka the median
Q3 - 75% of data below Q3

23
Q

How do you calculate quartiles? (4 steps)

A
  1. Arrange data from lowest to highest
  2. Calculate the median (M)
  3. Calculate Q1 - median of the first half of data (excluding M)
  4. Calculate Q3 - median of the second half of the data (excluding M)
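
The four steps above can be sketched in Python as follows. The function name `quartiles` and the sample data are illustrative only, and the function assumes at least two values. Statistical software often uses other quartile conventions, so its results may differ slightly from this method.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, M, Q3) using the four-step method (M excluded from each half)."""
    ordered = sorted(data)            # Step 1: arrange from lowest to highest
    n = len(ordered)
    m = median(ordered)               # Step 2: the median M
    half = n // 2
    lower = ordered[:half]            # first half of the data
    # When n is odd, M is an actual data value and is left out of both halves
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    q1 = median(lower)                # Step 3: median of the first half
    q3 = median(upper)                # Step 4: median of the second half
    return q1, m, q3

odd_result = quartiles([7, 1, 3, 9, 5, 11, 13])
even_result = quartiles([1, 2, 3, 4, 5, 6, 7, 8])
print(odd_result, even_result)
```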
24
Q

How do you find the interquartile range (IQR)?

A

IQR = Q3 - Q1

25
Q

What is the 1.5IQR rule used for?

A

A criterion used to identify outliers

26
Q

How do you find the lower threshold to identify any low outliers?

A

Q1 - 1.5 × IQR (values below this threshold are flagged as low outliers)

27
Q

How do you find the upper threshold to identify any high outliers?

A

Q3 + 1.5 × IQR (values above this threshold are flagged as high outliers)
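
A minimal sketch of the 1.5 × IQR rule, taking Q1 and Q3 as already calculated. The function names and the data are hypothetical, chosen only for illustration.

```python
def outlier_thresholds(q1, q3):
    """Return the (lower, upper) thresholds from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Return the values lying below the lower or above the upper threshold."""
    low, high = outlier_thresholds(q1, q3)
    return [x for x in data if x < low or x > high]

# Hypothetical ordered data with Q1 = 3 and Q3 = 11, so IQR = 8
data = [1, 3, 5, 7, 9, 11, 40]
low, high = outlier_thresholds(3, 11)
flagged = find_outliers(data, 3, 11)
print(low, high, flagged)
```

Here the thresholds are -9 and 23, so only the value 40 is flagged.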

28
Q

What is the 5-number summary?

A

Summary of the minimum, Q1, median, Q3 and maximum

29
Q

What two values are represented by the sides of a box on a boxplot?

A

Q1 and Q3

30
Q

What does the line in a box of a boxplot indicate?

A

The median

31
Q

What value is s squared?

A

Variance

32
Q

What do you do to the value of variance to get the standard deviation?

A

Find the square root of variance
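
A short check of this relationship, assuming that s squared means the sample variance (dividing by n - 1), which is what Python's `statistics.variance` computes. The data is made up for illustration.

```python
from math import sqrt
from statistics import stdev, variance

# Hypothetical data with mean 5 and squared deviations summing to 32
data = [2, 4, 4, 4, 5, 5, 7, 9]

s_squared = variance(data)   # sample variance: 32 / (8 - 1)
s = sqrt(s_squared)          # standard deviation is the square root of the variance

print(round(s_squared, 3), round(s, 3), round(stdev(data), 3))
```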

33
Q

What does a small standard deviation imply?

A

The data is concentrated around the mean

34
Q

What does a large standard deviation imply?

A

The data is widely spread around the mean

35
Q

Is standard deviation or IQR used more commonly? Which is resistant to outliers and which is sensitive?

A

Standard deviation is used more commonly; however, it is sensitive to outliers. IQR is resistant to outliers.

36
Q

What measure of centre is used for symmetrical data?

A

Mean

37
Q

What measure of spread is used for symmetrical data?

A

Standard deviation

38
Q

What measures of centre are used for data that is skewed or with outliers?

A

Median (preferred, because it is resistant to outliers and skewness); the mean may be reported alongside it

39
Q

What measures of spread are used for data that is skewed or with outliers?

A

IQR (preferred, because it is resistant to outliers); the standard deviation may be reported alongside it

40
Q

What graphs are used with one categorical and one numerical variable? (3)

A
  • Side-by-side histograms
  • Side-by-side boxplots
41
Q

What graph is used with two numerical variables?

A

Scatterplot

42
Q

What descriptive statistic is used with two numerical variables?

A

Correlation coefficient - r

43
Q

What does a response variable measure/record? On which axis is it plotted?

A

A response variable measures the outcome of a study. It is plotted on the y-axis

44
Q

What does an explanatory variable measure/record? On which axis is it plotted?

A

An explanatory variable explains the changes in the response variable. It is plotted on the x-axis

45
Q

What is an independent variable compared to a dependent variable?

A

A variable that can be controlled to determine the value of a dependent variable

46
Q

What are some synonymous terms for independent variable? (6)

A
  • Explanatory variable
  • Predictor variable
  • Controlled variable
  • Regressor
  • Manipulated variable
  • Input variable
47
Q

What are some synonymous terms for dependent variable? (6)

A
  • Outcome variable
  • Response variable
  • Measured variable
  • Regressand
  • Observed variable
  • Output variable
48
Q

Does correlation always imply causation?

A

No

49
Q

What graphs would be used for a continuous Y variable and a categorical X variable? (2)

A
  • Side-by-side boxplots
  • Vertically aligned histograms
50
Q

What graph would be used for a continuous Y variable and a continuous X variable?

A

Scatterplot

51
Q

What graph would be used for a categorical Y variable and a categorical X variable?

A

Clustered bar chart

52
Q

What is the correlation coefficient a measure of?

A

It is a measurement of the strength of the linear relationship between two continuous variables, X and Y
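
One common way to compute r is the Pearson formula, sketched below. The function name `pearson_r` and the data are illustrative assumptions, and the function expects paired lists in which neither variable is constant.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for paired observations xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
r_up = pearson_r(xs, [2 * x + 1 for x in xs])    # points exactly on a rising line
r_down = pearson_r(xs, [-3 * x for x in xs])     # points exactly on a falling line
r_curve = pearson_r(xs, [x ** 2 for x in xs])    # a perfect curve, but not linear
print(r_up, r_down, r_curve)
```

The rising line gives r = 1 and the falling line gives r = -1. The curved data gives r = 0 even though Y is completely determined by X, because r measures only the linear part of a relationship.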

53
Q

With what graph do you always use the correlation coefficient?

A

Scatterplot

54
Q

If the correlation coefficient r > 0, what does this mean for the linear relationship between X and Y?

A

r > 0 means as X increases, Y tends to increase

55
Q

If the correlation coefficient r < 0, what does this mean for the linear relationship?

A

r < 0 means as X increases, Y tends to decrease

56
Q

If r=0, what does this mean?

A

There is no linear relationship between X and Y, although there could be some other kind of relationship (for example, a curved one)

57
Q

What values of r indicate a stronger linear relationship?

A

The closer r is to 1 or -1, the stronger the linear relationship

58
Q

What would the graph show if r = -1 or r = 1?

A

The observations lie exactly on a line, with no scatter

59
Q

Is r sensitive to outliers?

A

Yes

60
Q

Can r be used for curved relationships?

A

No

61
Q

Does r (correlation) distinguish between a predictor variable and a response variable?

A

No

62
Q

What four questions should be asked of a scatterplot?

A
  • Does a relationship exist between the two variables?
  • What is its form? (linear, curved etc.)
  • Is it increasing or decreasing?
  • How strong is the relationship? (Correlation coefficient r)
63
Q

What descriptive statistics can be used for one categorical variable?

A

Frequency table

64
Q

What graphs can be used with two categorical variables? (2)

A
  • Clustered bar chart
  • Stacked bar chart
65
Q

What three questions should be asked of clustered/stacked bar charts?

A
  • Does a relationship exist between the two variables?
  • Is it increasing or decreasing?
  • How strong is the relationship?