Statistics 3 Flashcards
What is an outlier?
An extreme value that lies outside the overall pattern of data.
What is an anomaly?
An outlier that is not a legitimate value and cannot be correct. Being a clear error, it should be removed from the data set, as keeping it in would be misleading.
What is the process of removing anomalies from the data set?
The process of cleaning the data.
How can anomalies arise?
-Experimental error
-Recording error
-Data value irrelevant to investigation.
Why are boxplots used?
To represent important features of the data, such as the median, quartiles, range and any outliers.
Advantage of boxplots?
They can be used to compare 2 sets of data.
Why are histograms used?
They are representations of grouped continuous data, giving a good picture of the nature of the data set.
What 3 features of graphs do histograms show?
-Rough location of data
-General shape
-Spread of data
What does the area of a bar in a histogram represent? What is the advantage of this?
It is proportional to the frequency of the class.
This allows grouped data with uneven class intervals to be represented fairly.
How can we calculate the height of a bar in a histogram (ie. its frequency density)?
Freq. density = k (scale factor) x frequency / class width
If k=1, what is the calculation for the freq. density of a class of the data?
Frequency density= Frequency/class width.
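As a rough illustration of the k=1 case, the sketch below computes frequency densities for some made-up classes with uneven widths. The class boundaries and frequencies are hypothetical, chosen only to show that each bar's area (height x width) gives back the class frequency.

```python
# Hypothetical grouped data: (lower bound, upper bound, frequency).
classes = [(0, 10, 5), (10, 15, 10), (15, 20, 8), (20, 40, 4)]


def frequency_density(lower, upper, frequency):
    """Bar height for k = 1: frequency divided by class width."""
    return frequency / (upper - lower)


densities = [frequency_density(lo, hi, f) for lo, hi, f in classes]

# Area of each bar = height * class width, which recovers the frequency.
areas = [d * (hi - lo) for d, (lo, hi, _) in zip(densities, classes)]

for (lo, hi, f), d in zip(classes, densities):
    print(f"{lo}-{hi}: frequency {f}, frequency density {d}")
```

Note that the widest class (20-40) has the shortest bar even though its frequency is not the smallest, which is why heights alone should not be read as frequencies.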
How can we create a polygon from the histogram?
By joining the middle of the top of each bar with straight lines.
What can we comment on in a comparison of 2 data sets?
-Measure of location (e.g. mean/median)
-Measure of spread (e.g. standard deviation/IQR)
-Outliers (presence? Number?)
-Range
-Skewness
If the data set contains extreme values?
The median and IQR are more appropriate statistics to compare, as they are less affected by extreme values than the mean and standard deviation.
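To see why, the sketch below adds a single extreme value to a small made-up data set and compares how far each statistic moves. The data are hypothetical, and the quartiles come from Python's `statistics.quantiles` with its default method, which may not match the quartile convention used in a particular course.

```python
import statistics

# Hypothetical data set, then the same data with one extreme value added.
data = [12, 14, 15, 15, 16, 17, 18]
with_extreme = data + [95]


def summary(values):
    """Return location and spread statistics for a list of numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "iqr": q3 - q1,
    }


before = summary(data)
after = summary(with_extreme)

for name in before:
    print(f"{name}: {before[name]:.2f} -> {after[name]:.2f}")
```

In this example the mean and standard deviation shift far more than the median and IQR do.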
What are stem and leaf diagrams used for?
They are a method of organising numerical data based on the values of the sampling units.
What must be done before the stem-leaf diagram is collated?
-The data values are ordered in ascending order.
-A key is formulated to decide how each value is divided into the stem and the leaf.
How is the data usually collated?
-First digit(s) in the stem column
-Last digit in the leaf column
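The steps above can be sketched in code. This is a minimal illustration that assumes non-negative whole numbers and a key in which the leaf is the units digit; the function name `stem_and_leaf` and the data values are made up for the example.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group non-negative integers into {stem: [leaves]}, leaf = last digit."""
    diagram = defaultdict(list)
    for value in sorted(values):  # order the data in ascending order first
        stem, leaf = divmod(value, 10)  # first digit(s) and last digit
        diagram[stem].append(leaf)
    return dict(diagram)


# Hypothetical data values.
data = [23, 31, 27, 45, 38, 29, 31, 40]
diagram = stem_and_leaf(data)

print("Key: 2 | 3 means 23")
for stem in sorted(diagram):
    leaves = " ".join(str(leaf) for leaf in diagram[stem])
    print(f"{stem} | {leaves}")
```

A different key (for example, a leaf representing tenths) would need a different split than `divmod(value, 10)`.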
Disadvantage of stem and leaf diagrams?
The 2 data sets must be closely related for the diagram to be an effective comparison of both sets.