Statistics 2 Flashcards
What is a measure of location?
Single value that describes a position within a data set.
If this value is describing the centre of the data?
This is a measure of central tendency.
Mode/modal class?
Value/class that occurs most often.
Median?
Middle value when all values ordered.
Mean calculated by?
Sum of the data values (∑x) / number of data values (n).
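The three measures above can be checked on a small data set. This is a minimal sketch using Python's standard `statistics` module; the data values are made up for illustration.

```python
from statistics import mean, median, multimode

# Made-up data, for illustration only.
data = [2, 3, 3, 5, 7, 10]

modes = multimode(data)   # most frequent value(s), returned as a list
middle = median(data)     # average of the two middle values when n is even
average = mean(data)      # sum of the values / number of values

print(modes, middle, average)
```

`multimode` is used rather than `mode` so that bimodal data returns both modes.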
When is the mode an appropriate measure?
-Qualitative/quantitative data
-Single mode/bimodal data.
When is the mode inappropriate?
Each value only occurs once.
Median usage?
Quantitative data only.
Advantage of median vs mean?
Not affected by extreme values, so can be used in data with such values.
Mean usage?
-Uses every piece of data, so it reflects the whole data set.
-Used for quantitative data only.
-Is affected by extreme values.
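To see why the mean is affected by extreme values while the median is not, the sketch below replaces the largest value in a made-up data set with an extreme one and compares both measures.

```python
from statistics import mean, median

# Made-up data; the second list swaps the largest value for an extreme one.
typical = [3, 4, 5, 6, 7]
with_extreme = [3, 4, 5, 6, 70]

mean_before, median_before = mean(typical), median(typical)
mean_after, median_after = mean(with_extreme), median(with_extreme)

print("mean:  ", mean_before, "->", mean_after)      # mean moves a long way
print("median:", median_before, "->", median_after)  # median is unchanged
```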
For data values of a frequency table, mean calculated by?
Sum of (data value x frequency) / sum of frequencies, i.e. ∑xf / ∑f.
(For grouped data, x is the midpoint of each class.)
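As a worked example of ∑xf / ∑f, here is a short sketch with an invented frequency table (the values and frequencies are assumptions, not from any real data set).

```python
# Invented frequency table: value x occurs f times.
values = [0, 1, 2, 3]
freqs = [4, 7, 6, 3]

sum_xf = sum(x * f for x, f in zip(values, freqs))  # 0 + 7 + 12 + 9 = 28
sum_f = sum(freqs)                                  # 20 data values in total

freq_mean = sum_xf / sum_f
print(freq_mean)
```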
Median?
-Describes the middle of the data set, splitting it into two halves that each contain 50% of the data.
For effects on measure by a new data value, how is this evaluated?
Compare the new data value with the current value of the measure: if it's larger, the measure increases, etc.
Lower quartile?
1/4 of the way through the data set.
Upper quartile?
3/4 of the way through the data set.
Percentiles?
-Split data into 100 parts.
(e.g. 10th percentile is 10/100 (1/10) of the way through the data).
Calculate lower quartile for discrete data?
-n/4
-If integer, lower quartile halfway between this data point + next above.
-If not integer, round up and utilise this data point.
Upper quartile of discrete data?
-3/4 of n
-If integer, upper quartile halfway between this data point + one above.
-If not integer, round up and utilise this data point.
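The quartile rules above for discrete data can be sketched as a small function. The name `discrete_quartile` and the data are hypothetical; the function simply follows the n/4 and 3n/4 rule on these cards, which is one of several quartile conventions in use.

```python
def discrete_quartile(data, k):
    """Quartile by the flashcard rule: k=1 for lower, k=3 for upper."""
    ordered = sorted(data)
    n = len(ordered)
    # Position is k*n/4; divmod tells us whether it is a whole number.
    whole, remainder = divmod(k * n, 4)
    if remainder == 0:
        # Integer position: halfway between this data point and the next one above.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Not an integer: round up. 1-based position whole+1 is 0-based index whole.
    return ordered[whole]

eight = [1, 3, 4, 6, 7, 9, 10, 12]       # n = 8, n/4 = 2 (integer)
nine = [1, 3, 4, 6, 7, 9, 10, 12, 15]    # n = 9, n/4 = 2.25 (round up to 3)

print(discrete_quartile(eight, 1), discrete_quartile(eight, 3))
print(discrete_quartile(nine, 1), discrete_quartile(nine, 3))
```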
If data is presented in a grouped frequency table, how can medians, quartiles and percentiles be estimated?
Using linear interpolation.
Why is an assumption involved in linear interpolation?
Assumed that data values are evenly distributed within each class/range.
Lower + upper quartile + median in grouped continuous/cumulative frequency data calculation?
Q1: (n/4)th value
Q2: (n/2)th value
Q3: (3n/4)th value
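A possible sketch of linear interpolation for grouped data, using the positions n/4, n/2 and 3n/4 listed above. The function name `grouped_quantile` and the class table are invented for illustration, and the calculation assumes values are evenly spread within each class.

```python
def grouped_quantile(classes, fraction):
    """Estimate the value a given fraction of the way through grouped data.

    classes: ordered list of (lower_bound, upper_bound, frequency).
    """
    n = sum(freq for _, _, freq in classes)
    position = fraction * n           # e.g. n/4 for Q1, n/2 for Q2
    cumulative = 0                    # cumulative frequency before this class
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= position:
            # Go the same proportion through the class width as through its frequency.
            return lower + (position - cumulative) / freq * (upper - lower)
        cumulative += freq
    raise ValueError("fraction must be between 0 and 1")

# Invented table with n = 30.
table = [(0, 10, 5), (10, 20, 12), (20, 30, 8), (30, 40, 5)]

q1 = grouped_quantile(table, 0.25)
q2 = grouped_quantile(table, 0.5)
q3 = grouped_quantile(table, 0.75)
print(q1, q2, q3)
```

The same function gives a percentile estimate, for example `grouped_quantile(table, 0.1)` for the 10th percentile.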
Measure of spread?
Measure of how spread out data is.
Range?
Difference between largest and smallest values of data set.
IQR?
Difference between the upper quartile and lower quartile, Q3-Q1.
Range qualities?
-Spans all the data values, but is calculated from only the largest and smallest.
-Can be affected by extreme values.
IQR qualities?
-Not affected by extreme values.
-Only considers spread of the middle 50% of the data.
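The contrast between the range and the IQR can be illustrated with a short sketch. The helper names and data are made up, and the quartiles follow the n/4 rule for discrete data (halfway to the next point if the position is an integer, otherwise round up).

```python
def quartile(ordered, k):
    # k*n/4 rule for already-sorted discrete data (k=1 lower, k=3 upper).
    whole, remainder = divmod(k * len(ordered), 4)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]

def spread(data):
    ordered = sorted(data)
    data_range = ordered[-1] - ordered[0]               # largest - smallest
    iqr = quartile(ordered, 3) - quartile(ordered, 1)   # Q3 - Q1
    return data_range, iqr

# Made-up data; the second list has one extreme value.
print(spread([2, 4, 5, 7, 8, 9, 11, 12]))
print(spread([2, 4, 5, 7, 8, 9, 11, 120]))
```

In this example the extreme value changes the range a great deal but leaves the IQR unchanged.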
If one data set has a higher IQR than another, what is the difference in data?
Higher IQR = greater variability in the middle 50% of the data (it's more variable).
Inter-percentile range?
Difference between values of two given percentiles.
Frequently used IPR?
10th to 90th percentile range, as it is not affected by extreme values but still covers the spread of the middle 80% of the data.
If asked to interpret the meaning of a value (like the IPR etc.)?
Just use the meaning of the measure (e.g. for the median, 50% of the data is larger and 50% is smaller).
Variance?
-Another measure of spread of data.
-Uses the fact that each data point deviates from the mean by (value - mean), i.e. (x - x-bar).
Variance known as?
Mean of the squares minus the square of the mean.
Variance eqns.
Refer to book.
Standard deviation?
Square root of variance.
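A short sketch of "mean of the squares minus the square of the mean", with the standard deviation as its square root. The data is invented, and the result is cross-checked against `statistics.pvariance`, which divides by n (the population form assumed here).

```python
import math
from statistics import pvariance

# Invented data with n = 8.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean_of_squares = sum(x * x for x in data) / n   # ∑x² / n
square_of_mean = (sum(data) / n) ** 2            # (∑x / n)²

variance = mean_of_squares - square_of_mean
std_dev = math.sqrt(variance)

print(variance, std_dev)
print(pvariance(data))   # same variance, computed from the deviations (x - mean)
```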
Variance eqns for grouped frequency table.
Refer to book.
If data in a grouped frequency table?
You can calculate estimates for the variance and standard deviation of the data, using the midpoint of each class interval as the x value in the calculations.
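For grouped data, a minimal sketch of the estimate might look like this. The class table is invented, and each class is represented by its midpoint, so the results are estimates rather than exact values.

```python
import math

# Invented grouped table: (lower_bound, upper_bound, frequency).
classes = [(0, 10, 5), (10, 20, 12), (20, 30, 8), (30, 40, 5)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [freq for _, _, freq in classes]

sum_f = sum(freqs)                                          # ∑f
sum_fx = sum(f * x for f, x in zip(freqs, midpoints))       # ∑fx
sum_fx2 = sum(f * x * x for f, x in zip(freqs, midpoints))  # ∑fx²

est_mean = sum_fx / sum_f
est_variance = sum_fx2 / sum_f - est_mean ** 2   # mean of squares - square of mean
est_sd = math.sqrt(est_variance)

print(est_mean, est_variance, est_sd)
```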
Coding?
Way of simplifying statistical calculations.
Coding formula?
y = (x - a)/b
Why are values coded?
When coded, they create a new data value which is easier to work with.
For the mean of coded data:
y-bar (coded mean) = (x-bar - a)/b, where x-bar is the original mean.
For the mean of original data?
x-bar = b × y-bar + a
For the standard deviation of the coded data?
sigma-y (coded standard deviation) = sigma-x (original standard deviation) / b
Standard deviation of original data?
sigma-x = b × sigma-y.
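The coding relationships can be checked numerically. In this sketch the data and the choices a = 1000 and b = 10 are assumptions for illustration; `pstdev` is the divide-by-n standard deviation from Python's `statistics` module.

```python
import math
from statistics import mean, pstdev

# Invented data and coding constants.
x = [1010, 1020, 1030, 1040, 1050]
a, b = 1000, 10

y = [(value - a) / b for value in x]   # coded values: 1.0, 2.0, 3.0, 4.0, 5.0

# Recover the original mean and standard deviation from the coded ones.
mean_x_from_code = b * mean(y) + a     # x-bar = b * y-bar + a
sd_x_from_code = b * pstdev(y)         # sigma-x = b * sigma-y

print(mean_x_from_code, mean(x))
print(sd_x_from_code, pstdev(x))
```

Subtracting a shifts the mean but not the spread, which is why a does not appear in the standard deviation formula.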
If asked to formulate a code where values are decreased by 20% (for example), how is this interpreted?
Like a normal code: a 20% decrease means multiplying by 0.8.
0.8(… etc.)