Statistics 3 Flashcards

1
Q

What is an outlier?

A

An extreme value that lies outside the overall pattern of data.

2
Q

What is an anomaly?

A

An outlier that is not a legitimate value and cannot be correct. Because it is a clear error, it should be removed from the data set; keeping such a value in would be misleading.

3
Q

What is the process of removing anomalies from the data set?

A

The process of cleaning the data.

4
Q

How can anomalies arise?

A

-Experimental error
-Recording error
-Data value irrelevant to investigation.

5
Q

Why are boxplots used?

A

To represent important features of the data: the median, the quartiles, the range and any outliers.

6
Q

Advantage of boxplots?

A

They can be used to compare 2 sets of data.
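
As a rough illustration of such a comparison, the sketch below computes the five values a boxplot is usually drawn from (minimum, lower quartile, median, upper quartile, maximum) for two data sets. The data values and the function name `five_number_summary` are made up for the example, and the quartile convention (`method="inclusive"`) is an assumption; other textbooks and exam boards may use a slightly different rule.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) for a list of numbers."""
    # Quartile conventions vary; "inclusive" interpolation is assumed here.
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (min(values), q1, q2, q3, max(values))

# Hypothetical test scores for two classes
class_a = [1, 2, 3, 4, 5]
class_b = [2, 4, 6, 8, 10]

summary_a = five_number_summary(class_a)
summary_b = five_number_summary(class_b)

for name, s in (("A", summary_a), ("B", summary_b)):
    print(f"Class {name}: min={s[0]} Q1={s[1]} median={s[2]} Q3={s[3]} max={s[4]}")
```

Placing the two summaries side by side shows at a glance which set has the higher median and which has the wider spread, which is what two boxplots on a common scale show.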

7
Q

Why are histograms used?

A

They are representations of grouped continuous data, giving a good picture of the nature of the data set.

8
Q

What 3 features of a data set do histograms show?

A

-Rough location of data
-General shape
-Spread of data

9
Q

What does the area of a bar in a histogram represent? What is the advantage of this?

A

The area is proportional to the frequency of the class.

It allows for a representation of grouped data with uneven class intervals.

10
Q

How can we calculate the height of a bar in a histogram (i.e. its frequency density)?

A

Freq. density = k (scale factor) × frequency / class width

11
Q

If k=1, what is the calculation for the freq. density of a class of the data?

A

Frequency density = frequency / class width.
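
A minimal worked sketch of this calculation, assuming hypothetical classes with uneven widths. The function name `frequency_density` and the data are invented for illustration. It also checks that bar area (height × width) recovers the frequency when k = 1.

```python
def frequency_density(frequency, low, high, k=1.0):
    """Height of a histogram bar: k * frequency / class width."""
    width = high - low
    if width <= 0:
        raise ValueError("class width must be positive")
    return k * frequency / width

# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 6)]

densities = [frequency_density(freq, low, high) for low, high, freq in classes]

# With k = 1, area = height * width should equal the class frequency
areas = [d * (high - low) for d, (low, high, _) in zip(densities, classes)]

for (low, high, freq), d, a in zip(classes, densities, areas):
    print(f"{low}-{high}: frequency={freq}, density={d:.2f}, area={a:.2f}")
```

Note that the narrow 10-15 class has the tallest bar even though its width is smallest, because the height reflects frequency per unit of class width rather than raw frequency.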

12
Q

How can we create a frequency polygon from a histogram?

A

By joining the midpoint of the top of each bar to the next with straight lines.
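
The points being joined can be sketched as follows, taking each bar's height as frequency / class width (k = 1 is assumed). The classes and the function name `polygon_points` are hypothetical, chosen only to illustrate the idea.

```python
def polygon_points(classes):
    """Return (class midpoint, bar height) for each class, left to right."""
    points = []
    for low, high, freq in classes:
        midpoint = (low + high) / 2      # middle of the top of the bar
        height = freq / (high - low)     # frequency density with k = 1 (assumed)
        points.append((midpoint, height))
    return points

# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 6)]
points = polygon_points(classes)

# Joining consecutive points with straight lines gives the polygon
for (x1, y1), (x2, y2) in zip(points, points[1:]):
    print(f"line from ({x1}, {y1}) to ({x2}, {y2})")
```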

13
Q

What can we comment on in a comparison of 2 data sets?

A

-Measure of location (e.g. mean/median)
-Measure of spread (e.g. standard deviation/IQR)
-Outliers (presence? Number?)
-Range
-Skewness

14
Q

Which statistics should be compared if the data set contains extreme values?

A

The median and IQR are more appropriate statistics to compare, because they are less affected by extreme values than the mean and standard deviation.
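
A small numerical sketch of why this is so: adding one extreme value to a made-up data set moves the mean and standard deviation a long way, while the median and IQR barely change. The data are hypothetical, and the quartile rule (`method="inclusive"`) is an assumed convention.

```python
import statistics

def summary(values):
    """Return mean, standard deviation, median and IQR of the values."""
    # Quartile conventions vary; "inclusive" interpolation is assumed here.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

# Hypothetical data, then the same data with one extreme value added
data = [12, 14, 15, 16, 18, 19, 21]
with_extreme = data + [95]

before = summary(data)
after = summary(with_extreme)

for key in ("mean", "sd", "median", "iqr"):
    print(f"{key}: {before[key]:.2f} -> {after[key]:.2f}")
```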

15
Q

What are stem and leaf diagrams used for?

A

They are a method of organising numerical data according to the values of the sampling units.

16
Q

What must be done before the stem-leaf diagram is collated?

A

-The data values are sorted into ascending order.
-A key is formulated to decide how each value is divided into the stem and the leaf.

17
Q

How is the data usually collated?

A

-First digit(s) in the stem column
-Last digit in the leaf column
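
The steps above (sort the values, choose a key, split each value into stem and leaf) can be sketched as follows. This assumes non-negative whole numbers with the last digit as the leaf; the function names and data are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by stem (leading digits) with the last digit as leaf."""
    stems = defaultdict(list)
    for value in sorted(values):          # ascending order first
        stem, leaf = divmod(value, 10)    # e.g. 23 -> stem 2, leaf 3
        stems[stem].append(leaf)
    return dict(stems)

def render(stems):
    """Format the diagram as text, one row per stem, with a key."""
    rows = []
    for stem, leaves in sorted(stems.items()):
        rows.append(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
    rows.append("Key: 2 | 3 means 23")
    return "\n".join(rows)

# Hypothetical data values
data = [23, 31, 25, 47, 38, 31, 20]
diagram = stem_and_leaf(data)
print(render(diagram))
```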

18
Q

Disadvantage of stem and leaf diagrams?

A

The 2 data sets must be closely related for the diagram to be an effective comparison of both sets.