Data representation & Regression and Correlation Flashcards

1
Q

What do we need on a box plot?

A
  1. Minimum and maximum values (excluding any outliers)
  2. Median
  3. Upper and lower quartile (Q1 and Q3)
  4. Outliers represented as crosses
2
Q

How do we find outlier boundaries?

A

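Lower boundary: Q1 − 1.5 × IQR
Upper boundary: Q3 + 1.5 × IQR
where IQR = Q3 − Q1. Values below the lower boundary or above the upper boundary are outliers.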
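
The 1.5 × IQR rule can be sketched in Python as below. This is a minimal illustration, not an exam-board method: the function names are hypothetical, and the quartiles are passed in rather than calculated because textbooks and exam boards use different quartile conventions. The example quartiles (11 and 19) were worked out by hand for the made-up data set.

```python
def outlier_boundaries(q1: float, q3: float) -> tuple[float, float]:
    """Return (lower, upper) outlier boundaries using the 1.5 x IQR rule."""
    iqr = q3 - q1  # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data: list[float], q1: float, q3: float) -> list[float]:
    """Return the values that lie outside the outlier boundaries."""
    lower, upper = outlier_boundaries(q1, q3)
    return [x for x in data if x < lower or x > upper]


# Hypothetical data set with Q1 = 11 and Q3 = 19 (found by hand)
data = [2, 10, 12, 13, 15, 16, 18, 20, 35]
print(outlier_boundaries(11, 19))   # (-1.0, 31.0)
print(find_outliers(data, 11, 19))  # [35]
```

Here IQR = 8, so the boundaries are 11 − 12 = −1 and 19 + 12 = 31, and only 35 is an outlier.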

3
Q

What kind of data do we use for histograms?

A

Continuous data

4
Q

What equation do we need to remember with histograms?

A

k × frequency = class width × frequency density
(the area of each bar is proportional to its frequency)

5
Q

How do we find frequency density with a histogram?

A

Frequency density = frequency ÷ class width
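
A minimal Python sketch of this calculation, assuming the grouped data is given as a list of class boundaries plus one frequency per class (the function name and the data are hypothetical):

```python
def frequency_densities(boundaries, frequencies):
    """Return frequency / class width for each class.

    boundaries: class boundaries [b0, b1, ..., bn] for n classes
    frequencies: the n class frequencies
    """
    if len(boundaries) != len(frequencies) + 1:
        raise ValueError("need exactly one more boundary than frequencies")
    return [
        f / (upper - lower)  # class width = upper boundary - lower boundary
        for lower, upper, f in zip(boundaries, boundaries[1:], frequencies)
    ]


# Hypothetical classes 0-10, 10-15 and 15-30 with frequencies 20, 15 and 9
print(frequency_densities([0, 10, 15, 30], [20, 15, 9]))  # [2.0, 3.0, 0.6]
```

The narrow 10-15 class gets the tallest bar even though its frequency is not the largest, which is why histograms plot frequency density rather than frequency.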

6
Q

What is the area of a histogram equal to?

A

k × frequency
(k is a constant that you must work out from the information in the question)
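
A typical question gives the size of one bar and the frequency it represents, so k can be found and then used for other bars. The sketch below assumes bar sizes measured in cm on the page; the function names and numbers are hypothetical.

```python
def find_k(bar_area: float, frequency: float) -> float:
    """Area = k x frequency, so k = area / frequency."""
    return bar_area / frequency


def frequency_from_area(bar_area: float, k: float) -> float:
    """Rearrange area = k x frequency to get frequency = area / k."""
    return bar_area / k


# Hypothetical question: a bar 2 cm wide and 5 cm tall represents 40 people
k = find_k(2 * 5, 40)                 # 10 / 40 = 0.25 cm^2 per person
# Another bar is 3 cm wide and 4 cm tall: how many people does it represent?
print(frequency_from_area(3 * 4, k))  # 48.0
```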

7
Q

What do we do when we are comparing sets of data?

A
  1. Compare a measure of location (e.g. median, mean)
  2. Compare a measure of spread (e.g. interquartile range, variance, standard deviation)
  3. Put them into context
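
A minimal Python sketch of the first two steps, using the standard `statistics` module. The reaction-time data is made up, and `pstdev` (which divides by n) is an assumption chosen to match σ = √(Sxx / n); the function name `summarise` is hypothetical.

```python
import statistics


def summarise(data):
    """Return measures of location (mean, median) and spread (standard deviation)."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        # pstdev divides by n (assumed to match sigma = sqrt(Sxx / n))
        "sd": statistics.pstdev(data),
    }


# Hypothetical reaction times (ms) for two groups
group_a = [220, 240, 250, 260, 280]
group_b = [200, 230, 250, 270, 300]

for name, data in [("A", group_a), ("B", group_b)]:
    s = summarise(data)
    print(name, s["mean"], s["median"], round(s["sd"], 2))
```

Both groups have a mean and median of 250 ms, but group B has the larger standard deviation. In context, that would read as: on average the groups react equally quickly, but group B's reaction times are more variable.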
8
Q

What is the correlation coefficient?

A

A value, r, that measures the strength and direction (positive or negative) of the linear correlation between two variables shown on a scatter graph.
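
A minimal Python sketch, assuming the coefficient meant here is the product moment correlation coefficient (PMCC), r = Sxy / √(Sxx × Syy). The function name and data are hypothetical.

```python
import math


def pmcc(xs, ys):
    """Product moment correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(pmcc([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0 (perfect positive correlation)
print(pmcc([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0 (perfect negative correlation)
```

The sign of r comes from Sxy, and dividing by √(Sxx × Syy) scales the result so that it always lies between −1 and 1.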

9
Q

What values is the correlation coefficient between?

A

−1 ≤ r ≤ 1

10
Q

What value is r when there is no correlation and what would that look like on a scatter graph?

A

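r = 0 (or very close to 0). On a scatter graph the points are spread randomly, with no upward or downward trend.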

11
Q

What would a scatter graph with a positive correlation look like?

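A

The points follow a trend from bottom left to top right: as one variable increases, the other tends to increase (r > 0).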
12
Q

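What would a scatter graph with a negative correlation look like?

A

The points follow a trend from top left to bottom right: as one variable increases, the other tends to decrease (r < 0).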
13
Q

What is a regression line and what is its formula?

A

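The line of best fit through the data (the least squares regression line).
It comes in the form y = mx + c, often written y = a + bx in statistics, where b is the gradient and a is the y-intercept.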
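
A minimal Python sketch of the least squares regression line of y on x, written as y = a + bx (the same as y = mx + c with b = m and a = c). It assumes the standard results b = Sxy / Sxx and a = ȳ − b x̄; the function name and data are hypothetical.

```python
def regression_line(xs, ys):
    """Least squares regression line of y on x, returned as (a, b) for y = a + bx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # gradient
    a = mean_y - b * mean_x    # y-intercept: the line passes through (mean_x, mean_y)
    return a, b


# Hypothetical data lying exactly on y = 1 + 2x
a, b = regression_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(f"y = {a} + {b}x")  # y = 1.0 + 2.0x
```

Real data will not lie exactly on a line, but the same formulas give the line that minimises the sum of the squared vertical distances from the points.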

14
Q

What is interpolation?

A

Estimating inside the data range

15
Q

What is extrapolation?

A

Estimating outside the data range
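
A minimal Python sketch that makes an estimate from a regression line y = a + bx and labels it as interpolation or extrapolation by checking whether x lies inside the range of the original x data. The function name, the line y = 1 + 2x and the data range are hypothetical.

```python
def estimate(x, a, b, x_data):
    """Estimate y = a + bx and say whether it is interpolation or extrapolation."""
    y = a + b * x
    if min(x_data) <= x <= max(x_data):
        kind = "interpolation"  # inside the data range
    else:
        kind = "extrapolation"  # outside the data range
    return y, kind


# Hypothetical line y = 1 + 2x fitted to x values from 1 to 5
x_data = [1, 2, 3, 4, 5]
print(estimate(2.5, 1, 2, x_data))  # (6.0, 'interpolation')
print(estimate(10, 1, 2, x_data))   # (21, 'extrapolation')
```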

16
Q

Why is interpolation better than extrapolation?

A

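Because it is more reliable: the trend has only been observed within the data range, and there is no evidence that it continues outside it.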

17
Q

What are the advantages and disadvantages of using box plots?

A

+Highlights outliers
+Makes it easy to compare data sets
-Data is grouped into only 4 categories so some detailed analysis isn’t possible

18
Q

What are the advantages and disadvantages of using histograms?

A

+Clearly shows shape of distribution
-Doesn’t always highlight outliers
-Not easy to estimate Q1, Q2 and Q3

19
Q

What are the advantages and disadvantages of using cumulative frequency curves?

A

+Easy to find the five-number summary (Q1, median, etc.)
-Doesn’t always highlight outliers
-If interval boundaries aren’t shown, the degree of detail is not clear.

20
Q

What does it mean when a box plot is positively skewed and what does this look like?

A

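The median is closer to the LQ than the UQ (Q2 − Q1 < Q3 − Q2), so the section of the box from the median to the UQ is longer than the section from the LQ to the median.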

21
Q

What does it mean when a box plot is negatively skewed and what does this look like?

A

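The median is closer to the UQ than the LQ (Q2 − Q1 > Q3 − Q2), so the section of the box from the LQ to the median is longer than the section from the median to the UQ.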
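
A minimal Python sketch of the quartile comparison used in the last two cards. It assumes Q1, Q2 (median) and Q3 are already known; the function name and values are hypothetical.

```python
def box_plot_skew(q1, q2, q3):
    """Describe skew by comparing the median's distance to each quartile."""
    lower_gap = q2 - q1  # median minus lower quartile
    upper_gap = q3 - q2  # upper quartile minus median
    if lower_gap < upper_gap:
        return "positive skew"  # median closer to the LQ
    if lower_gap > upper_gap:
        return "negative skew"  # median closer to the UQ
    return "symmetrical"


print(box_plot_skew(10, 12, 20))  # positive skew
print(box_plot_skew(10, 18, 20))  # negative skew
print(box_plot_skew(10, 15, 20))  # symmetrical
```

With real data the two gaps are rarely exactly equal, so "symmetrical" is better read as "the gaps are approximately equal".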