Statistics 3 Flashcards

Question 1

Q

What is an outlier?

Answer

A

An extreme value that lies outside the overall pattern of data.

Question 2

Q

What is an anomaly?

Answer

A

An outlier that is not a legitimate value and cannot be correct. Being a clear error, it should be removed from the data set, since keeping such a value in would be misleading.

Question 3

Q

What is the process of removing anomalies from the data set?

Answer

A

The process of cleaning the data.

Question 4

Q

How can anomalies arise?

Answer

A

-Experimental error
-Recording error
-Data value irrelevant to investigation.

Question 5

Q

Why are boxplots used?

Answer

A

To represent important features of the data, such as the median, the quartiles, the spread and any outliers.
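As a rough illustration, the features a boxplot usually displays are the minimum, lower quartile, median, upper quartile and maximum. The sketch below computes them for a made-up data set using Python's standard `statistics` module. The data values and the helper name `five_number_summary` are assumptions for illustration only, and the quartile convention used by `statistics.quantiles` may differ slightly from the one taught in a given course.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    ordered = sorted(values)
    # quantiles(n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, q2, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, q2, q3, ordered[-1]

# Hypothetical data, not taken from the flashcards.
data = [3, 5, 7, 8, 9, 11, 14]
summary = five_number_summary(data)
print(summary)
```

The box of the boxplot would span the lower to upper quartile, with a line at the median and whiskers reaching towards the minimum and maximum.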

Question 6

Q

Advantage of boxplots?

Answer

A

They can be used to compare 2 sets of data.

Question 7

Q

Why are histograms used?

Answer

A

They are representations of grouped continuous data, giving a good picture of the nature of the data set.

Question 8

Q

What 3 features of the data do histograms show?

Answer

A

-Rough location of data
-General shape
-Spread of data

Question 9

Q

What is the area of a bar in a histogram proportional to? What is the advantage of this?

Answer

A

The area is proportional to the frequency of the class.

It allows for a representation of grouped data with uneven class intervals.

Question 10

Q

How can we calculate the height of a bar in a histogram (i.e. its frequency density)?

Answer

A

Freq. density = k (scale factor) x frequency / class width

Question 11

Q

If k=1, what is the calculation for the freq. density of a class of the data?

Answer

A

Frequency density= Frequency/class width.
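The k = 1 calculation can be sketched in a few lines of Python. The class boundaries and frequencies below are invented for illustration, and `frequency_density` is a hypothetical helper name, not something from the flashcards.

```python
def frequency_density(lower, upper, frequency):
    """Frequency density with k = 1: frequency divided by class width."""
    return frequency / (upper - lower)

# Hypothetical grouped data with uneven class intervals:
# (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 80, 8)]

densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]
for (lo, hi, f), d in zip(classes, densities):
    print(f"{lo}-{hi}: frequency {f}, frequency density {d}")
```

Multiplying each density by its class width gives back the frequency, which is why the area of each bar (not its height) represents the frequency when the class intervals are uneven.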

Question 12

Q

How can we create a frequency polygon from the histogram?

Answer

A

By joining the midpoints of the tops of the bars with straight lines.
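As a small sketch of this, the points to join are (class midpoint, bar height) for each bar. The example below assumes bar height is the frequency density with k = 1, and the grouped data and the name `polygon_points` are made up for illustration.

```python
def polygon_points(classes):
    """Return (class midpoint, bar height) for each class.

    Each class is (lower bound, upper bound, frequency); the bar height is
    taken to be frequency / class width (an assumption: k = 1).
    """
    return [((lo + hi) / 2, f / (hi - lo)) for lo, hi, f in classes]

# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 20, 12), (20, 40, 16), (40, 80, 8)]

points = polygon_points(classes)
for x, y in points:
    print(f"midpoint {x}, height {y}")
```

Joining these points in order with straight line segments gives the polygon.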

Question 13

Q

What can we comment on in a comparison of 2 data sets?

Answer

A

-Measure of location (e.g. mean/median)
-Measure of spread (e.g. standard deviation/IQR)
-Outliers (presence? Number?)
-Range
-Skewness

Question 14

Q

Which statistics should be compared if the data set contains extreme values?

Answer

A

The median and IQR are more appropriate statistics to compare, as they are less affected by extreme values than the mean and standard deviation.
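The effect of an extreme value can be seen in a quick Python sketch. The two data sets below are invented: the second is the first with one value replaced by an obvious extreme value. Quartiles come from the standard library's `statistics.quantiles`, whose convention may differ slightly from the one used in a given course.

```python
import statistics

def iqr(values):
    """Interquartile range using statistics.quantiles (default method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical data: 'extreme' replaces the largest value 21 with 210.
clean = [12, 14, 15, 15, 16, 17, 18, 18, 19, 20, 21]
extreme = [12, 14, 15, 15, 16, 17, 18, 18, 19, 20, 210]

for name, data in (("clean", clean), ("extreme", extreme)):
    print(
        name,
        "mean:", round(statistics.mean(data), 2),
        "st. dev.:", round(statistics.stdev(data), 2),
        "median:", statistics.median(data),
        "IQR:", iqr(data),
    )
```

In this example the median and IQR are identical for both sets, while the mean and standard deviation change considerably once the extreme value is included.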

Question 15

Q

What are stem and leaf diagrams used for?

Answer

A

They are a method of organising numerical data based on the values of the sampling units.

Question 16

Q

What must be done before the stem-leaf diagram is collated?

Answer

A

-The data values are ordered in ascending order.
-A key is formulated to decide how each value is divided into the stem and the leaf.

Question 17

Q

How is the data usually collated?

Answer

A

-First digit(s) in the stem column
-Last digit in the leaf column
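The steps on the last two cards (order the values, decide on a key, then split each value into stem and leaf) can be sketched as follows. This is a minimal sketch that assumes non-negative whole-number data, with the last digit as the leaf and the remaining digits as the stem. The data and the name `stem_and_leaf` are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its ordered leaves (last digit).

    Assumes non-negative integers.
    """
    rows = defaultdict(list)
    for value in sorted(values):          # order the data first
        rows[value // 10].append(value % 10)
    return dict(rows)

# Hypothetical data, not taken from the flashcards.
data = [23, 31, 27, 45, 38, 22, 41, 36, 29, 34]
diagram = stem_and_leaf(data)

print("Key: 2 | 3 means 23")
for stem in sorted(diagram):
    leaves = " ".join(str(leaf) for leaf in diagram[stem])
    print(f"{stem} | {leaves}")
```

Each printed row shows one stem followed by its leaves in ascending order, and the key line tells the reader how to recombine them.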

Question 18

Q

Disadvantage of stem and leaf diagrams?

Answer

A

The 2 data sets must be closely related for the diagram to be an effective comparison of both sets.

(18 cards)