Representations Of Data Flashcards

1
Q

The number of caravans on Seaview caravan site on each night in August last year is summarised as follows: the least number of caravans was 10.
The maximum number of caravans on this site was 64.
The three quartiles for this site were 33, 41 and 52 respectively.

During a month, the least number of caravans on Northcliffe caravan site was 31.
The maximum number of caravans on this site on any night that month was 72.
The three quartiles for this site were 38, 45 and 52 respectively.

On graph paper and using the same scale, draw box plots to represent the data for both caravan sites.
You may assume that there are no outliers.

A

Seaview:

  • Lowest = 10
  • LQ = 33
  • Median = 41
  • UQ = 52
  • Highest = 64

Northcliffe:

  • Lowest = 31
  • LQ = 38
  • Median = 45
  • UQ = 52
  • Highest = 72
2
Q

Compare and contrast these two box plots.

( The number of caravans on Seaview caravan site on each night in August last year is summarised as follows: the least number of caravans was 10.
The maximum number of caravans on this site was 64.
The three quartiles for this site were 33, 41 and 52 respectively.

During a month, the least number of caravans on Northcliffe caravan site was 31.
The maximum number of caravans on this site on any night that month was 72.
The three quartiles for this site were 38, 45 and 52 respectively. )

A
  • Median of Northcliffe ( 45 ) is greater than median of Seaview ( 41 )
  • ( Compare medians )
  • Upper quartiles are the same ( both 52 )
  • ( Compare quartiles ) ( Either upper or lower one )
  • IQR of Northcliffe ( 14 ) is less than Seaview’s IQR ( 19 )
  • ( Compare IQRs )
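( The comparison above can be checked with a minimal Python sketch. The dictionary keys and the helper names `iqr` and `value_range` are made up for illustration; the numbers are the ones quoted in the question. )

```python
# Five-number summaries quoted in the question.
seaview = {"min": 10, "lq": 33, "median": 41, "uq": 52, "max": 64}
northcliffe = {"min": 31, "lq": 38, "median": 45, "uq": 52, "max": 72}

def iqr(summary):
    """Interquartile range: upper quartile minus lower quartile."""
    return summary["uq"] - summary["lq"]

def value_range(summary):
    """Range: highest value minus lowest value."""
    return summary["max"] - summary["min"]

for name, s in (("Seaview", seaview), ("Northcliffe", northcliffe)):
    print(name, "median:", s["median"], "IQR:", iqr(s), "range:", value_range(s))
```

( Seaview has the larger IQR and the larger range, so its nightly numbers are more spread out. )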
3
Q

Give an interpretation to the upper quartiles of these two distributions.

( The number of caravans on Seaview caravan site on each night in August last year is summarised as follows: the least number of caravans was 10.
The maximum number of caravans on this site was 64.
The three quartiles for this site were 33, 41 and 52 respectively.

During a month, the least number of caravans on Northcliffe caravan site was 31.
The maximum number of caravans on this site on any night that month was 72.
The three quartiles for this site were 38, 45 and 52 respectively. )

A
  • On 75% of the nights that month, both sites had no more than 52 caravans
  • ( Interpret the graph with contents )
4
Q

Aeroplanes fly from City A to City B.
Over a long period of time the number of minutes delay in take-off from City A was recorded.
The minimum delay was 5 minutes and the maximum delay was 63 minutes.
A quarter of all delays were at most 12 minutes, half were at most 17 minutes and 75% were at most 28 minutes.
Only one of the delays was longer than 45 minutes.

An outlier is an observation that falls more than 1.5 x ( interquartile range ) above the upper quartile or more than 1.5 x ( interquartile range ) below the lower quartile.

On graph paper, draw a box plot to represent these data.

A
  • IQR = 28 - 12
  • IQR = 16

Upper outlier boundary:

  • UQ + 1.5 x ( IQR )
  • 28 + 1.5 x ( 16 )
  • Upper boundary = 52
  • 63 > 52, so 63 is an outlier

Lower outlier boundary:

  • LQ - 1.5 x ( IQR )
  • 12 - 1.5 x ( 16 )
  • Lower boundary = - 12
  • The minimum delay is 5, so there are no low outliers

Boxplot:

  • Lowest = 5
  • LQ = 12
  • Median = 17
  • UQ = 28
  • Upper whisker ends at 52 ( the upper outlier boundary )
  • 63 is plotted as an “ X ” mark
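( The boundary arithmetic above, as a minimal Python sketch. The function names `outlier_fences` and `is_outlier` are illustrative, not from any library; the 1.5 multiplier is the one given in the question. )

```python
def outlier_fences(lq, uq, k=1.5):
    """Return the (lower, upper) boundaries; values outside them are outliers."""
    iqr = uq - lq
    return lq - k * iqr, uq + k * iqr

def is_outlier(value, lq, uq):
    """True if the value lies beyond either boundary."""
    lower, upper = outlier_fences(lq, uq)
    return value < lower or value > upper

lower, upper = outlier_fences(lq=12, uq=28)
print(lower, upper)            # -12.0 52.0
print(is_outlier(63, 12, 28))  # True
print(is_outlier(5, 12, 28))   # False
```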
5
Q

Suggest how the distribution might be interpreted by a passenger who frequently flies from City A to City B.

( Aeroplanes fly from City A to City B.
Over a long period of time the number of minutes delay in take-off from City A was recorded.
The minimum delay was 5 minutes and the maximum delay was 63 minutes.
A quarter of all delays were at most 12 minutes, half were at most 17 minutes and 75% were at most 28 minutes.
Only one of the delays was longer than 45 minutes. )
( Box plot:

  • Lowest = 5
  • LQ = 12
  • Median = 17
  • UQ = 28
  • Upper whisker ends at 52
  • 63 is plotted as an “ X ” mark )
A
  • Most delays are short ( 75% are at most 28 minutes ), so a frequent passenger should find them acceptable
6
Q

Describe the main features and uses of a box plot.

A
  • Maximum value
  • Median value
  • Outliers
  • Allows comparisons
7
Q

Children from schools A and B took part in a fun run for charity.
The times, to the nearest minute, taken by the children from school A are summarised in Figure 1.

( Figure 1 shows a boxplot, with values; lowest = 20, LQ = 25, median = 30, UQ = 37, highest = 50 and we have two “ X “ plots, showing outliers, at 53 and 57 )
( Boxplot is labelled School A and the x - axis shows time )

Write down the time by which 75% of the children in school A had completed the run.

A
  • 37
8
Q

State the name given to this value.

( Children from schools A and B took part in a fun run for charity.
The times, to the nearest minute, taken by the children from school A are summarised in Figure 1.

( Figure 1 shows a boxplot, with values; lowest = 20, LQ = 25, median = 30, UQ = 37, highest = 50 and we have two “ X “ plots, showing outliers, at 53 and 57 )
( Boxplot is labelled School A and the x - axis shows time ) )
( Value is 37 )

A
  • Upper quartile
9
Q

Explain what you understand by the two crosses ( x ) on Figure 1.
( Two crosses on a boxplot )
( Children from schools A and B took part in a fun run for charity.
The times, to the nearest minute, taken by the children from school A are summarised in Figure 1.

( Figure 1 shows a boxplot, with values; lowest = 20, LQ = 25, median = 30, UQ = 37, highest = 50 and we have two “ X “ plots, showing outliers, at 53 and 57 )
( Boxplot is labelled School A and the x - axis shows time ) )

A
  • Outliers: these observations are very different from the other observations, so they need to be treated with caution
  • The two children probably took much longer than the rest ( 53 and 57 minutes )
10
Q

For school B the least time taken by any of the children was 25 minutes and the longest time was 55 minutes.
The three quartiles were 30, 37 and 50 respectively.

On graph paper, draw a box plot to represent the data from school B.

A

School B:

  • Lowest = 25
  • LQ = 30
  • Median = 37
  • UQ = 50
  • Highest = 55
11
Q

Compare and contrast these two box plots.
( School A:

  • lowest = 20
  • LQ = 25
  • Median = 30
  • UQ = 37
  • Highest = 50
  • And we have two “ X “ plots, showing outliers, at 53 and 57 )

( School B:

  • Lowest = 25
  • LQ = 30
  • Median = 37
  • UQ = 50
  • Highest = 55 )
A
  • Children from School A generally took less time
  • ( Comparing medians: 30 for A, 37 for B )
  • 50% of B took no more than 37 minutes, compared with 75% of A
  • ( Median of B equals the upper quartile of A )
  • A has outliers
  • Upper quartile of A is less than B’s
12
Q

A teacher recorded, to the nearest hour, the time spent watching television during a particular week by each child in a random sample.
The times were summarised in a grouped frequency table and represented by a histogram.

One of the classes in the grouped frequency distribution was 20 - 29 and its associated frequency was 9. 
On the histogram the height of the rectangle representing that class was 3.6 cm and the width was 2 cm.

Give a reason to support the use of a histogram to represent these data.

A
  • Time is a continuous variable
13
Q

Write down the underlying feature associated with each of the bars in a histogram.

A
  • Area of the bar is proportional to the frequency
14
Q

Show that on this histogram each child was represented by 0.8 cm^2.

( A teacher recorded, to the nearest hour, the time spent watching television during a particular week by each child in a random sample.
The times were summarised in a grouped frequency table and represented by a histogram.

One of the classes in the grouped frequency distribution was 20 - 29 and its associated frequency was 9. 
On the histogram the height of the rectangle representing that class was 3.6 cm and the width was 2 cm. )
A
  • ( Area of the bar is proportional to the frequency )
  • ( Height x Width = frequency x area per child )
  • 3.6 x 2 = 9 x y
  • y = 0.8
  • So each child is represented by 0.8 cm^2
15
Q

The total area under the histogram was 24 cm^2.

Find the total number of children in the group.

( A teacher recorded, to the nearest hour, the time spent watching television during a particular week by each child in a random sample.
The times were summarised in a grouped frequency table and represented by a histogram.

One of the classes in the grouped frequency distribution was 20 - 29 and its associated frequency was 9. 
On the histogram the height of the rectangle representing that class was 3.6 cm and the width was 2 cm. )
A
  • 24 = 0.8 x y
  • ( Total area of the histogram = area per child x total number of children )
  • y = 30
  • Total number of children = 30
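( The two histogram steps, area per child and then total children, can be sketched in Python as below. The variable names are illustrative; the measurements are the ones given in the question. )

```python
# Measurements for the 20 - 29 class, taken from the question.
bar_height_cm = 3.6
bar_width_cm = 2.0
bar_frequency = 9

# Area is proportional to frequency, so divide the bar's area by its frequency.
area_per_child = bar_height_cm * bar_width_cm / bar_frequency

# The whole histogram has an area of 24 cm^2.
total_area_cm2 = 24.0
total_children = round(total_area_cm2 / area_per_child)

print(f"area per child = {area_per_child:.1f} cm^2")
print("total children =", total_children)
```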
16
Q

The box plot shows a summary of the weights of the luggage, in kg, for each musician in an orchestra on an overseas tour.

( A figure shows a boxplot, labelled weight on its x - axis )
( Lowest = 25, LQ = 36, Median = 45, UQ = 54, highest = 70 and we have an outlier at 85 )

The airline’s recommended weight limit for each musician’s luggage was 45 kg.

Given that none of the musicians’ luggage weighed exactly 45 kg,
state the proportion of the musicians whose luggage was below the recommended weight limit.

A
  • 1 / 2 ( As 45 kg is the median )
17
Q

A quarter of the musicians had to pay a charge for taking heavy luggage.
State the smallest weight for which the charge was made.

( A figure shows a boxplot, labelled weight on its x - axis )
( Lowest = 25, LQ = 36, Median = 45, UQ = 54, highest = 70 and we have an outlier at 85 )

A
  • 54
  • ( 1 / 4 of them had to pay for heavy luggage, so the charge applies from the upper quartile upwards )
  • ( So the smallest weight that is charged is the UQ )
18
Q

Explain what you understand by the “ x “ on the box plot in Figure 1, and suggest an instrument that the owner of this luggage might play.

( A figure shows a boxplot, labelled weight on its x - axis )
( Lowest = 25, LQ = 36, Median = 45, UQ = 54, highest = 70 and we have an outlier at 85 )

A
  • The “ x ” is an outlier
  • The musician could have been carrying a drum set

19
Q

Figure 2 shows a histogram for the variable t which represents the time taken, in minutes, by a group of people to swim 500 m.

( Histogram shows 5 bars )
( First bar covers the class 5 - 10 and has a FD of 2 )
( Second bar covers the class 10 - 14 and has a FD of 4 )
( Third bar covers the class 14 - 18 and has a FD of 6 )
( Fourth bar covers the class 18 - 25 and has a FD of 5 )
( Fifth bar covers the class 25 - 40 and has a FD of 1 )

Copy and complete the frequency table for t.
( Finding the frequency of the fourth and fifth bars )

A

Fourth:

  • ( Frequency = cw x FD )
  • 7 x 5 = 35

Fifth:

  • 15 x 1 = 15
20
Q

Estimate the number of people who took longer than 20 minutes to swim 500 m.

( Histogram shows 5 bars )
( First bar covers the class 5 - 10 and has a FD of 2 )
( Second bar covers the class 10 - 14 and has a FD of 4 )
( Third bar covers the class 14 - 18 and has a FD of 6 )
( Fourth bar covers the class 18 - 25 and has a FD of 5 )
( Fifth bar covers the class 25 - 40 and has a FD of 1 )

( Value of 20 is in the fourth bar )

A

Finding the area of the fourth bar above 20:

  • 5 x 5
  • ( ( 25 - 20 ) x FD )
  • Area = 25

Area above 20:

  • 25 + 15 = 40
  • ( Area of the remaining part of the fourth bar + area of the fifth bar )
  • Number of people who took more than 20 minutes = 40
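( A minimal Python sketch of reading frequencies off the histogram and estimating the number above 20 minutes. It assumes, as the working above does, that one unit of area stands for one person. The names `bars`, `frequency` and `estimate_above` are illustrative. )

```python
# (lower bound, upper bound, frequency density) for each bar in Figure 2.
bars = [(5, 10, 2), (10, 14, 4), (14, 18, 6), (18, 25, 5), (25, 40, 1)]

def frequency(lower, upper, density):
    """Frequency = class width x frequency density."""
    return (upper - lower) * density

def estimate_above(bars, threshold):
    """Add up the area of every bar that lies to the right of the threshold."""
    total = 0
    for lower, upper, density in bars:
        # Width of the part of this bar that is above the threshold (0 if none).
        overlap = max(0, upper - max(lower, threshold))
        total += overlap * density
    return total

frequencies = [frequency(*bar) for bar in bars]
print(frequencies)               # [10, 16, 24, 35, 15]
print(estimate_above(bars, 20))  # 40
```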
21
Q

Find an estimate of the mean time taken.

( Histogram shows 5 bars )
( First bar covers the class 5 - 10 and has a FD of 2, F = 10 )
( Second bar covers the class 10 - 14 and has a FD of 4, F = 16 )
( Third bar covers the class 14 - 18 and has a FD of 6, F = 24 )
( Fourth bar covers the class 18 - 25 and has a FD of 5, F = 35 )
( Fifth bar covers the class 25 - 40 and has a FD of 1, F = 15 )

A
  • Mean = Sum of ( midpoint x frequency ) / total frequency
  • Mean = ( ( 7.5 x 10 ) + ( 12 x 16 ) + ( 16 x 24 ) + ( 21.5 x 35 ) + ( 32.5 x 15 ) ) / ( 10 + 16 + 24 + 35 + 15 )
  • Mean = 1891 / 100
  • Mean = 18.91
22
Q

Find an estimate for the standard deviation of t

( Histogram shows 5 bars )
( First bar covers the class 5 - 10 and has a FD of 2, F = 10 )
( Second bar covers the class 10 - 14 and has a FD of 4, F = 16 )
( Third bar covers the class 14 - 18 and has a FD of 6, F = 24 )
( Fourth bar covers the class 18 - 25 and has a FD of 5, F = 35 )
( Fifth bar covers the class 25 - 40 and has a FD of 1, F = 15 )

A
  • SD = root( Sum of Fx^2 / Sum of F - ( Sum of Fx / Sum of F )^2 )
  • Sum of Fx^2 = Sum of ( midpoint^2 x frequency )
  • Sum of Fx^2 = ( 7.5^2 x 10 ) + ( 12^2 x 16 ) + ( 16^2 x 24 ) + ( 21.5^2 x 35 ) + ( 32.5^2 x 15 )
  • = 41033
  • Sum of Fx = Sum of ( midpoint x frequency )
  • Sum of Fx = ( 7.5 x 10 ) + ( 12 x 16 ) + ( 16 x 24 ) + ( 21.5 x 35 ) + ( 32.5 x 15 )
  • = 1891
  • Sum of F = 100
  • SD = root( 41033 / 100 - ( 1891 / 100 )^2 )
  • SD = 7.26
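( The mean and standard deviation estimates can be reproduced with a short Python sketch. The function name `grouped_mean_sd` is illustrative. It divides by the total frequency, matching the formula on this card, rather than by n - 1. )

```python
import math

# Class midpoints and frequencies for the swimming times.
midpoints = [7.5, 12, 16, 21.5, 32.5]
frequencies = [10, 16, 24, 35, 15]

def grouped_mean_sd(midpoints, frequencies):
    """Estimate mean and SD from grouped data using class midpoints."""
    n = sum(frequencies)
    sum_fx = sum(f * x for x, f in zip(midpoints, frequencies))
    sum_fx2 = sum(f * x * x for x, f in zip(midpoints, frequencies))
    mean = sum_fx / n
    variance = sum_fx2 / n - mean ** 2
    return mean, math.sqrt(variance)

mean, sd = grouped_mean_sd(midpoints, frequencies)
print(f"mean = {mean:.2f}, sd = {sd:.2f}")
```

( Both values are estimates, because every observation in a class is treated as if it sat at the class midpoint. )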
23
Q

Find the median and quartiles for t.

A
  • ( Use of interpolation )

Median:

  • ( Total frequency / 2 )
  • 100 / 2 = 50
  • ( So we look for the 50th value in the CF column: 10, 26, 50, 85, 100 )
  • ( This value lies in the 14 - 18 class, which has a frequency of 24 )
  • Interpolated value = lower bound of class + ( position - CF before the class ) / class frequency x cw
  • Median = 14 + ( 50 - 26 ) / 24 x 4
  • = 18
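( A minimal Python sketch of the interpolation rule. The card works only the median; the sketch applies the same rule at positions n / 4 and 3n / 4 as one way to get the quartiles. The function name `interpolate` and the list names are illustrative. )

```python
def interpolate(boundaries, frequencies, position):
    """Linear interpolation for the value at a cumulative position."""
    cumulative = 0  # cumulative frequency before the current class
    for i, freq in enumerate(frequencies):
        if cumulative + freq >= position:
            lower, upper = boundaries[i], boundaries[i + 1]
            return lower + (position - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("position exceeds total frequency")

# Class boundaries and frequencies for the swimming times.
boundaries = [5, 10, 14, 18, 25, 40]
frequencies = [10, 16, 24, 35, 15]
n = sum(frequencies)

median = interpolate(boundaries, frequencies, n / 2)
lq = interpolate(boundaries, frequencies, n / 4)
uq = interpolate(boundaries, frequencies, 3 * n / 4)
print(f"LQ = {lq:.2f}, median = {median:.2f}, UQ = {uq:.2f}")
```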
24
Q

The histogram shows the time taken, to the nearest minute, for 140 runners to complete a fun run.

( Histogram shows 8 bars )
( “ Frequency density “ against “ time “ )
( cw on graph = 1, 1, 4, 2, 3, 5, 3 and 12 consecutively )
( FD on graph = 6, 7, 2, 6, 5.5, 2, 1.5 and 0.5 consecutively )

Use the histogram to calculate the number of runners who took between 78.5 and 90.5 minutes to complete the fun run.

A
  • ( Area = cw x FD )
  • Area for 78.5 to 90.5 = ( 90.5 - 78.5 ) x 0.5
  • = 6
  • Total number of runners = 140
  • Total area of the histogram = ( 1 x 6 ) + ( 1 x 7 ) + ( 4 x 2 ) + ( 2 x 6 ) + ( 3 x 5.5 ) + ( 5 x 2 ) + ( 3 x 1.5 ) + ( 12 x 0.5 )
  • = 70
  • ( So each unit of area represents 140 / 70 = 2 runners )
  • 6 x 140 / 70 = 12
  • = 12 runners
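( The scaling step above, sketched in Python. The list names are illustrative; the widths and frequency densities are the ones read from the histogram. )

```python
# Class widths and frequency densities of the 8 bars, in order.
widths = [1, 1, 4, 2, 3, 5, 3, 12]
densities = [6, 7, 2, 6, 5.5, 2, 1.5, 0.5]
total_runners = 140

# Area of each bar, then the number of runners one unit of area stands for.
areas = [w * d for w, d in zip(widths, densities)]
runners_per_unit_area = total_runners / sum(areas)

# The 78.5 to 90.5 class is the last bar (width 12, FD 0.5).
last_bar_runners = areas[-1] * runners_per_unit_area
print(sum(areas), runners_per_unit_area, last_bar_runners)  # 70.0 2.0 12.0
```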
25
Q

In a study of how students use their mobile telephones, the phone usage of a random sample of 11 students was examined for a particular week.

The total length of calls, y minutes, for the 11 students were

17, 23, 35, 36, 51, 53, 54, 55, 60, 77, 110

Find the median and quartiles for these data.

A
  • 17, 23, 35, 36, 51, 53, 54, 55, 60, 77, 110
  • Median = 11 / 2 = 5.5, so round up to the 6th value
  • Median = 53
  • LQ = 11 / 4 = 2.75, so round up to the 3rd value
  • LQ = 35
  • UQ = 11 / 4 x 3 = 8.25, so round up to the 9th value
  • UQ = 60
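( A minimal Python sketch of the position rule used above. The card only shows the case where the position is not a whole number and is rounded up. The whole-number branch, which averages that value and the next, is a commonly taught convention and is an assumption here. The function name `value_at` is illustrative. )

```python
import math

def value_at(sorted_data, position):
    """Value at a 1-based position in sorted data.

    Non-integer positions are rounded up, as on the card.
    Whole-number positions average that value and the next (assumed convention).
    """
    if position == int(position):
        k = int(position)
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    return sorted_data[math.ceil(position) - 1]

calls = [17, 23, 35, 36, 51, 53, 54, 55, 60, 77, 110]
n = len(calls)

lq = value_at(calls, n / 4)
median = value_at(calls, n / 2)
uq = value_at(calls, 3 * n / 4)
print(lq, median, uq)  # 35 53 60
```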
26
Q

A value that is greater than Q3 + 1.5 × ( Q3 – Q1 ) or smaller than Q1 - 1.5 × ( Q3 – Q1 ) is defined as an outlier.

Show that 110 is the only outlier

( LQ = 35 )
( UQ = 60 )

A

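Upper outlier boundary:

  • 60 + 1.5 x ( 60 - 35 )
  • = 97.5
  • 110 > 97.5, so 110 is an outlier
  • 77 < 97.5, so no other value is a high outlier

Lower outlier boundary:

  • 35 - 1.5 x ( 60 - 35 )
  • = - 2.5
  • The lowest value is 17, so there are no low outliers
  • So 110 is the only outlier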
27
Q

Draw a box plot for these data indicating clearly the position of the outlier.

( Lowest value = 17 )
( LQ = 35 )
( Median = 53 )
( UQ = 60 )
( Upper whisker ends at 97.5, the upper outlier boundary )
( Outlier = 110 )
A

( Drawn diagram )

28
Q

In a shopping survey a random sample of 104 teenagers were asked how many hours, to the nearest hour, they spent shopping in the last month.
The results are summarised in the table below.

Number of hours    Midpoint    Frequency
0 - 5              2.75        20
6 - 7              6.5         16
8 - 10             9           18
11 - 15            13          25
16 - 25            20.5        15
26 - 50            38          10

A histogram was drawn and the group ( 8 - 10 ) hours was represented by a rectangle that was 1.5 cm wide and 3 cm high.

Calculate the width and height of the rectangle representing the group ( 16 - 25 ) hours.

A
  • ( First find the class boundaries, as the original classes have gaps, e.g. 0 - 5 and 6 - 7 leave a gap of 1 )
  • Class boundaries = - 0.5, 5.5, 7.5, 10.5, 15.5, 25.5, 50.5

8 - 10:

  • cw = 3 ( 10.5 - 7.5 )
  • FD = F / cw
  • FD = 18 / 3 = 6

CW : W

  • 3 : 1.5
  • ( CW for 16 - 25 = 25.5 - 15.5 = 10 )
  • 1 : 0.5
  • 10 : 5 cm

FD : H

  • 6 : 3
  • ( FD for 16 - 25 = 15 / 10 = 1.5 )
  • 1 : 1 / 2
  • 1.5 : 0.75 cm

16 - 25:

  • Width = 5 cm
  • Height = 0.75 cm
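( The two ratios above, class width to drawn width and frequency density to drawn height, can be sketched in Python. The function `scaled_rectangle` and its parameter names are illustrative. )

```python
def scaled_rectangle(ref_cw, ref_freq, ref_width_cm, ref_height_cm, cw, freq):
    """Width and height in cm of a bar, scaled from a reference bar."""
    # Horizontal scale: cm drawn per hour of class width.
    cm_per_hour = ref_width_cm / ref_cw
    # Vertical scale: cm drawn per unit of frequency density.
    ref_fd = ref_freq / ref_cw
    cm_per_fd = ref_height_cm / ref_fd
    return cw * cm_per_hour, (freq / cw) * cm_per_fd

# Reference bar: 8 - 10 hours (cw 3, frequency 18), drawn 1.5 cm wide and 3 cm high.
# Target bar: 16 - 25 hours (cw 10, frequency 15).
width, height = scaled_rectangle(3, 18, 1.5, 3, cw=10, freq=15)
print(width, height)  # 5.0 0.75
```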
29
Q

Use linear interpolation to estimate the median and interquartile range.

( Number of hours    Midpoint    Frequency
0 - 5              2.75        20
6 - 7              6.5         16
8 - 10             9           18
11 - 15            13          25
16 - 25            20.5        15
26 - 50            38          10 )

( Class boundaries = - 0.5, 5.5, 7.5, 10.5, 15.5, 25.5, 50.5 )

A
  • CF = 20, 36, 54, 79, 94, 104

Median:

  • 104 / 2 = 52nd value
  • Lies in the 8 - 10 class
  • Interpolated value = lower bound of class + ( position - CF before the class ) / class frequency x cw
  • Median = 7.5 + ( 52 - 36 ) / 18 x 3
  • = 10.16666667
  • = 10.2

LQ:

  • 104 / 4 = 26th value
  • Lies in the 6 - 7 class
  • Interpolated value = lower bound of class + ( position - CF before the class ) / class frequency x cw
  • LQ = 5.5 + ( 26 - 20 ) / 16 x 2
  • = 6.25
  • = 6.3 ( to 1 d.p. )

UQ:

  • 104 / 4 x 3 = 78th value
  • Lies in the 11 - 15 class
  • Interpolated value = lower bound of class + ( position - CF before the class ) / class frequency x cw
  • UQ = 10.5 + ( 78 - 54 ) / 25 x 5
  • = 15.3
  • IQR = 15.3 - 6.25 = 9.05
30
Q

Estimate the mean and standard deviation of the number of hours spent shopping.

( Number of hours    Midpoint    Frequency
0 - 5              2.75        20
6 - 7              6.5         16
8 - 10             9           18
11 - 15            13          25
16 - 25            20.5        15
26 - 50            38          10 )

A
  • Mean = Sum of ( frequency x midpoint ) / total frequency
  • Mean = ( ( 20 x 2.75 ) + ( 16 x 6.5 ) + ( 18 x 9 ) + ( 25 x 13 ) + ( 15 x 20.5 ) + ( 10 x 38 ) ) / 104
  • Mean = 12.82211538
  • Mean = 12.8
  • SD = root( Sum of Fx^2 / Sum of F - ( Sum of Fx / Sum of F )^2 )

Sum of Fx^2:

  • ( Fx^2 = frequency x midpoint^2 )
  • Sum of Fx^2 = ( 20 x 2.75^2 ) + ( 16 x 6.5^2 ) + ( 18 x 9^2 ) + ( 25 x 13^2 ) + ( 15 x 20.5^2 ) + ( 10 x 38^2 )
  • Sum of Fx^2 = 27254

Sum of Fx:

  • Sum of Fx = ( 20 x 2.75 ) + ( 16 x 6.5 ) + ( 18 x 9 ) + ( 25 x 13 ) + ( 15 x 20.5 ) + ( 10 x 38 )
  • Sum of Fx = 1333.5
  • SD = root( 27254 / 104 - ( 1333.5 / 104 )^2 )
  • SD = 9.881854551
  • SD = 9.88
31
Q

State, giving a reason, which average and measure of dispersion you would recommend to use to summarise these data.

( Mean = 12.8 )
( SD = 9.88 )

A
  • Use median and IQR
  • These measures are not affected by extreme values, and the data are skewed ( mean 12.8 is greater than median 10.2 )

32
Q

The company claims that for 75 % of the months, the amount received per month is greater than £10 000. Comment on this claim, giving a reason for your answer.

( UQ = 14 )
( LQ = 7 )
( in 1000’s )

A
  • Not true
  • The LQ is £7 000, so 75% of the months are above £7 000, not £10 000