Representation of Data Flashcards

1
Q

Outliers

A

An outlier is an extreme value that lies outside the overall pattern of the data.

There are different ways of identifying outliers, depending on the nature of the data and the calculations you have already made.

2
Q

Outliers: definition involving quartiles

A
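
A common convention, assumed here because the card does not state one, is to treat a value as an outlier if it lies more than 1.5 × the interquartile range below the lower quartile or above the upper quartile. The sketch below illustrates that rule only; the function name `iqr_outliers`, the multiplier `k` and the marks are made up for illustration, and `statistics.quantiles` uses one of several quartile conventions, so hand-calculated quartiles may differ slightly.

```python
from statistics import quantiles

def iqr_outliers(data, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles.

    k = 1.5 is an assumed convention, not something fixed by the card.
    """
    q1, _median, q3 = quantiles(data, n=4)  # lower quartile, median, upper quartile
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

# Hypothetical test marks
marks = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]
print(iqr_outliers(marks))  # [45]
```

Here the quartiles are 14.75 and 19.25, so the fences sit at 8 and 26 and only the mark of 45 falls outside them.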
3
Q

Outliers: definition involving mean and standard deviation

A
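
A common convention, assumed here because the card does not state one, is to treat a value as an outlier if it lies more than 2 standard deviations from the mean. The sketch below illustrates that rule only; the function name `sd_outliers`, the multiplier `k` and the marks are made up for illustration, and the sample standard deviation (divisor n - 1) is an assumption too.

```python
from statistics import mean, stdev

def sd_outliers(data, k=2.0):
    """Return the values lying more than k standard deviations from the mean.

    k = 2 is an assumed convention, not something fixed by the card.
    """
    m = mean(data)
    s = stdev(data)  # sample standard deviation (divisor n - 1)
    return [x for x in data if abs(x - m) > k * s]

# Hypothetical test marks
marks = [12, 14, 15, 15, 16, 17, 18, 19, 20, 45]
print(sd_outliers(marks))  # [45]
```

The mean is 19.1 and the standard deviation is about 9.4, so only values further than roughly 18.8 from the mean are flagged.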
4
Q

Cleaning the data

A

Sometimes outliers are legitimate values that are still correct.

However, there are occasions when an outlier is the result of an error and should be removed from the data, because keeping it would be misleading. These data values are known as anomalies.

Anomalies can be due to experimental error or recording error, or they may be data values that are not relevant to the investigation.

The process of removing anomalies from a data set is known as cleaning the data.

Data values should only be removed if that can be justified, not just because they do not fit the pattern of the data.

5
Q

Box and whisker plots

A

The median and quartiles can be displayed graphically by means of a box and whisker plot, or box plot. This gives an extremely useful summary of the data, and can be used to compare sets of data. It clearly shows the location and spread of a set of data.

In this diagram, a box is drawn from the lower quartile to the upper quartile (i.e. it represents the middle 50% of the data), and a line is drawn inside the box to show the position of the median. Whiskers extend from the box to the lowest value and to the highest value.
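
The five values a box plot is drawn from (lowest value, lower quartile, median, upper quartile, highest value) can be computed as in the sketch below. The function name `five_number_summary` and the marks are illustrative assumptions, and `statistics.quantiles` follows one of several quartile conventions, so values worked out by hand may differ slightly.

```python
from statistics import quantiles

def five_number_summary(data):
    """Return (lowest, lower quartile, median, upper quartile, highest)."""
    q1, median, q3 = quantiles(data, n=4)
    return min(data), q1, median, q3, max(data)

# Hypothetical test marks
marks = [35, 42, 48, 51, 55, 58, 62, 67, 73]
print(five_number_summary(marks))  # (35, 45.0, 55.0, 64.5, 73)
```

The box would run from 45 to 64.5 with the median line at 55, and the whiskers would reach out to 35 and 73.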

6
Q

Box and whisker plots example

A

Solution

The ranges of marks are similar, but class A has a smaller interquartile range than class B, which suggests that the middle 50% of the marks are less spread out for class A.

The median and quartiles for class A are higher than those for class B, so on average class A did slightly better on the test.

7
Q

Cumulative frequency tables and curves

A

Cumulative frequency curves enable us to estimate how many of the items of data fall below any particular value. For large data sets, they are also used to estimate medians, quartiles and percentiles for the data.

For grouped data, cumulative frequencies must be plotted against the upper class boundaries.
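
As a rough illustration of both points, the sketch below builds the plotting points (upper class boundary, cumulative frequency) and estimates the median and lower quartile. The class data and the function names `cumulative_points` and `estimate_value` are made up for illustration, and the estimate joins the points with straight lines (linear interpolation) instead of reading from a smooth curve, which is an assumption.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 4), (10, 20, 10), (20, 30, 16), (30, 40, 7), (40, 50, 3)]

def cumulative_points(classes):
    """Points (upper class boundary, cumulative frequency) for plotting."""
    uppers = [upper for _, upper, _ in classes]
    totals = accumulate(freq for _, _, freq in classes)
    return list(zip(uppers, totals))

def estimate_value(classes, fraction):
    """Estimate the value below which the given fraction of the data lies,
    using linear interpolation inside the class that contains it."""
    total = sum(freq for _, _, freq in classes)
    target = fraction * total
    below = 0  # cumulative frequency up to the start of the current class
    for lower, upper, freq in classes:
        if freq > 0 and below + freq >= target:
            return lower + (target - below) / freq * (upper - lower)
        below += freq
    return classes[-1][1]

print(cumulative_points(classes))    # [(10, 4), (20, 14), (30, 30), (40, 37), (50, 40)]
print(estimate_value(classes, 0.5))  # median estimate: 23.75
print(estimate_value(classes, 0.25)) # lower quartile estimate: 16.0
```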

8
Q

Percentiles

A

75% of the data lies below the upper quartile. 25% of the data lies below the lower quartile. This concept can be generalised to give the value below which any percentage of the data lies. These values are called percentiles.

For example, the 10th percentile is the value below which 10% of the data lies.
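
For raw (ungrouped) data, a percentile can be sketched with the standard library as below. The helper name `percentile` and the data are illustrative assumptions, and `statistics.quantiles` uses one particular interpolation convention, so other methods can give slightly different values for small data sets.

```python
from statistics import quantiles

def percentile(data, p):
    """Return the p-th percentile (p a whole number from 1 to 99)."""
    cut_points = quantiles(data, n=100)  # 99 cut points: 1st to 99th percentile
    return cut_points[p - 1]

# Hypothetical data: the whole numbers 1 to 99
data = list(range(1, 100))
print(percentile(data, 10))  # 10.0
print(percentile(data, 75))  # 75.0 (the upper quartile)
```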

9
Q

Histograms

A

A histogram is a suitable diagram for displaying grouped continuous data. It gives a good overall view of how the data is distributed: it shows the general shape, a rough location and how spread out the data is.

10
Q

Histograms - unequal class width

A

The picture given by the second diagram is very different from the first. This is caused by the data being grouped into unequal classes, which makes it look as though there is more data at the larger values when that is not the case.

This diagram is not useful as it gives a distorted picture of the data.

11
Q

Frequency density

A

To overcome this problem, we need to take the unequal class widths into account so that the diagram still gives an accurate impression of the overall distribution of the data.

In a histogram, the area of each bar is proportional to the frequency of its class.

To achieve this, calculate the frequency density by dividing the frequency by the class width, and use it as the height of the bar.
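
The calculation can be sketched as follows. The class data and the function name `frequency_densities` are made up for illustration; each bar's height is its frequency density, so that height × width gives back the frequency.

```python
# Hypothetical grouped data with unequal class widths:
# (lower boundary, upper boundary, frequency)
classes = [(0, 10, 8), (10, 20, 12), (20, 40, 16), (40, 80, 8)]

def frequency_densities(classes):
    """Frequency density = frequency / class width, one value per class."""
    return [freq / (upper - lower) for lower, upper, freq in classes]

densities = frequency_densities(classes)
print(densities)  # [0.8, 1.2, 0.8, 0.2]

# Check: the area of each bar (density x width) recovers the frequency.
for (lower, upper, freq), density in zip(classes, densities):
    assert abs(density * (upper - lower) - freq) < 1e-9
```

The first and last classes have the same frequency (8), but the last class is four times as wide, so its bar is only a quarter of the height.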
