1. Summary Statistics Flashcards

1
Q

What is exploratory data analysis (EDA)?

A

The part of statistics concerned with taking a first look at some data.

2
Q

What are two aspects of EDA?

A
  • summary statistics
  • data visualisation
3
Q

What is meant by summary statistics?

A

Calculating numbers that briefly summarise the data,
e.g. the central values of the data, how spread out the data is, or the relationship between two variables.

4
Q

What is meant by data visualisation?

A

Drawing a picture based on the data to show its shape (centrality and spread), or the relationship between two variables.

5
Q

What important questions should you ask about data before calculating summary statistics or drawing a plot?

A
  • What is the data? What variables were measured and how, how many datapoints are there, etc.
  • How was the data collected? Is it a sample or the whole population?
  • Are there any outliers?
  • Are there any ethical or privacy issues? Should the data be confidential?
6
Q

What are the two types of summary statistic?

A
  • Statistics of centrality, which tell us where the middle of the data is
  • Statistics of spread, which tell us how far the data typically spreads out from the middle
7
Q

What is the mode of a dataset x?

A

The most common value among the datapoints $x_i$.
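
As a rough illustration of the definition above, the mode can be found by counting how often each value occurs. This is a minimal sketch: the helper name `modes` and the example data are made up for illustration, and returning every tied value is an assumption, since the card does not say what to do when several values are equally common.

```python
from collections import Counter

def modes(x):
    """Return every value that occurs most often in x (assumes x is non-empty)."""
    counts = Counter(x)                 # value -> number of occurrences
    top = max(counts.values())          # the highest count seen
    return [value for value, count in counts.items() if count == top]

data = [2, 3, 3, 5, 7, 3, 5]            # illustrative data, not from the course
print(modes(data))                      # [3]
print(modes([1, 1, 2, 2]))              # [1, 2] (a tie: both values are modes)
```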

8
Q

What is the median of a dataset x?

A

If the data is ordered as $x_{(1)} \le x_{(2)} \le \dots \le x_{(n)}$, the median is the central value in the ordered list:
- if n is odd, this is $x_{((n+1)/2)}$
- if n is even, this is $\frac{1}{2}\left(x_{(n/2)} + x_{(n/2+1)}\right)$
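
The odd and even cases above can be sketched in a few lines. The function name `median` and the sample lists are illustrative only; note that Python lists are 0-indexed, so the card's 1-based position (n+1)/2 becomes index n // 2.

```python
def median(x):
    """Median of a non-empty list, following the odd/even rule on the card."""
    ordered = sorted(x)                 # put the data in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # n odd: the single central value (1-based position (n + 1) / 2)
        return ordered[mid]
    # n even: average the two central values (1-based positions n/2 and n/2 + 1)
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```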

9
Q

What is the mean of a dataset x?

A

$\bar{x} = \frac{1}{n}(x_1 + x_2 + \dots + x_n)$

$= \frac{1}{n}\sum_{i=1}^{n} x_i$
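
For a concrete check of the formula, the mean is just the sum of the datapoints divided by how many there are. The function name `mean` and the data below are illustrative assumptions.

```python
def mean(x):
    """Arithmetic mean: add up the datapoints and divide by n (assumes n >= 1)."""
    return sum(x) / len(x)

data = [2, 4, 6, 8]   # illustrative data
print(mean(data))     # 5.0
```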

10
Q

What quantile is the median?

A

q(1/2)

11
Q

What quantile is the maximum?

A

q(1)

12
Q

What quantile is the minimum?

A

q(0)

13
Q

What is the lower quartile?

A

q(1/4)

14
Q

What is the upper quartile?

A

q(3/4)
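
The quantile cards (median, maximum, minimum, lower and upper quartile) can be tied together with one function q(p). The cards do not say how q(p) is defined between datapoints, so the sketch below assumes one common convention: linear interpolation between the ordered values at position (n - 1)p. Other courses and libraries use different conventions, so quartile values may differ slightly. The name `quantile` and the data are illustrative.

```python
def quantile(x, p):
    """q(p) for 0 <= p <= 1, ASSUMING linear interpolation between ordered values."""
    ordered = sorted(x)
    n = len(ordered)
    pos = (n - 1) * p                   # fractional 0-based position in the ordered list
    lower = int(pos)                    # index of the value at or below that position
    upper = min(lower + 1, n - 1)       # next index, clamped at the last value
    frac = pos - lower                  # how far we are between the two values
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

data = [1, 2, 3, 4, 5]                  # illustrative data
print(quantile(data, 0))                # 1   (minimum, q(0))
print(quantile(data, 1 / 4))            # 2.0 (lower quartile, q(1/4))
print(quantile(data, 1 / 2))            # 3.0 (median, q(1/2))
print(quantile(data, 3 / 4))            # 4.0 (upper quartile, q(3/4))
print(quantile(data, 1))                # 5   (maximum, q(1))
```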

15
Q

What is meant by the number of distinct observations?

A

The number of different datapoints we have after removing any repeats
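
In code, removing repeats amounts to converting the data to a set. A small sketch with made-up data:

```python
data = [2, 3, 3, 5, 7, 3, 5]   # illustrative data with repeated values
distinct = set(data)           # a set keeps each value only once

print(len(data))               # 7 observations in total
print(len(distinct))           # 4 distinct observations: 2, 3, 5, 7
```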

16
Q

What is the interquartile range?

A

The difference between the upper and lower quartiles: q(3/4) - q(1/4)
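
A quick way to compute this is with the standard library's `statistics.quantiles`, which returns the three quartile cut points when called with `n=4`. The choice of `method="inclusive"` is an assumption made for this sketch; the cards do not fix a quantile convention, and other conventions can give a slightly different answer. The helper name `iqr` and the data are illustrative.

```python
import statistics

def iqr(x):
    """Interquartile range: upper quartile minus lower quartile."""
    # n=4 asks for quartiles; the middle cut point is the median and is unused here.
    lower, _median, upper = statistics.quantiles(x, n=4, method="inclusive")
    return upper - lower

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]   # illustrative data
print(iqr(data))                     # 4.0 (upper quartile 7.0 minus lower quartile 3.0)
```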

17
Q

What is the sample variance?

A

$s_x^2 = \frac{1}{n-1}\left((x_1 - \bar{x})^2 + \dots + (x_n - \bar{x})^2\right)$

$= \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$, where $\bar{x}$ is the sample mean

18
Q

What is the standard deviation?

A

The square root of the sample variance: $s_x = \sqrt{s_x^2}$
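
As a small check of this relationship, the standard library offers both quantities: `statistics.variance` (sample variance, n - 1 divisor) and `statistics.stdev` (sample standard deviation). The data below is made up for illustration.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]             # illustrative data

variance = statistics.variance(data)        # sample variance (divides by n - 1)
std_dev = math.sqrt(variance)               # standard deviation as its square root

# The library's own sample standard deviation should agree up to rounding.
print(math.isclose(std_dev, statistics.stdev(data)))  # True
```

One practical reason to report the standard deviation is that it is in the same units as the data, whereas the variance is in squared units.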

19
Q

What is the computational formula for sample variance?

A

$s_x^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)$, where $\bar{x}$ is the sample mean
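
For reference, here is a short derivation of why the computational form agrees with the definition of the sample variance. It expands the square and uses the fact that the datapoints sum to n times the sample mean, written here as $\bar{x}$. Note that the term $n\bar{x}^2$ is subtracted once, outside the sum.

```latex
\begin{align*}
\sum_{i=1}^{n} (x_i - \bar{x})^2
  &= \sum_{i=1}^{n} x_i^2 - 2\bar{x}\sum_{i=1}^{n} x_i + n\bar{x}^2 \\
  &= \sum_{i=1}^{n} x_i^2 - 2n\bar{x}^2 + n\bar{x}^2
     && \text{since } \sum_{i=1}^{n} x_i = n\bar{x} \\
  &= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2 .
\end{align*}
```

Dividing both sides by n - 1 gives the computational formula.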