Numerical Measures Flashcards by Henry Crichton-Allen

What is the formula for the mean of a data set?

(Σx)/n where x represents the values of data and n is the number of values
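
As a minimal sketch of this formula in Python (the data values are made up for illustration):

```python
# Hypothetical data values, chosen only for illustration
data = [4, 8, 15, 16, 23, 42]

n = len(data)          # n: the number of values
mean = sum(data) / n   # (Σx)/n

print(mean)  # 18.0
```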

How is the median value found for non-grouped data?

The median is the ((n+1)/2)th value of the ordered data, where n is the number of values
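
The (n+1)/2 position rule can be sketched in Python as follows. The function name and the sample lists are hypothetical, and the handling of a position ending in .5 (averaging the two neighbouring values) is an assumption of this sketch.

```python
def median_ungrouped(values):
    """Median of non-grouped data using the (n+1)/2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2            # 1-based position in the ordered list
    if position.is_integer():
        return ordered[int(position) - 1]
    # Position ends in .5: take the mean of the values either side of it
    lower = ordered[int(position) - 1]
    upper = ordered[int(position)]
    return (lower + upper) / 2

print(median_ungrouped([7, 1, 5, 3, 9]))      # 5
print(median_ungrouped([7, 1, 5, 3, 9, 11]))  # 6.0
```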

How is the median value found for grouped data?

The median is the (n/2)th value, where n is the total frequency

(If you get an n^th value ending in .5, work out the mean of the value in front and the value behind to get the value the median corresponds to)

What is the mode?

The mode is the most common value in a data set.

Note: for grouped data there is a modal class, which is the class with the highest frequency (the class taken to contain the modal value).
Also note that not all samples have a mode.
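
A short Python sketch of finding the mode, using the standard library's `collections.Counter`. The function name `modes` is hypothetical, and treating "every value occurs once" as "no mode" is an assumption made for this example.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # treated here as "no mode" (an assumption of this sketch)
    return [value for value, count in counts.items() if count == highest]

print(modes([2, 3, 3, 5, 7]))   # [3]
print(modes([1, 1, 2, 2, 3]))   # [1, 2]
print(modes([1, 2, 3]))         # []
```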

When working out the mean, median, mode or quartiles what information do you need?

You need a cumulative frequency column (this is what you use to locate the median and the quartiles)

What is the range?

The range is the difference between the highest and lowest value in a data set

What is the lower quartile?

The data value at the 25th percentile of the sample.

For non-grouped data, the position of the value that represents the lower quartile is found by 0.25(n+1), where n is the total number of values (the final cumulative frequency)

For grouped data, the position of the value that represents the lower quartile is found by 0.25(n), where n is the total frequency (the final cumulative frequency)

(if the position ends in .5, work out the mean of the value in front and the value behind to get the value the lower quartile corresponds to)

What is the interquartile range?

The difference between the upper and lower quartiles

(Q₃ - Q₁)

(this is a value)

What is the upper quartile?

The data value at the 75th percentile of the sample.

For non-grouped data, the position of the value that represents the upper quartile is found by 0.75(n+1), where n is the total number of values (the final cumulative frequency)

For grouped data, the position of the value that represents the upper quartile is found by 0.75(n), where n is the total frequency (the final cumulative frequency)

(if the position ends in .5, work out the mean of the value in front and the value behind to get the value the upper quartile corresponds to)
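
The quartile positions for non-grouped data, and the interquartile range Q₃ - Q₁, can be sketched in Python as below. The helper `value_at` and the data are hypothetical. The cards only describe whole-number positions and positions ending in .5, so the sketch deliberately raises an error for any other position rather than guessing a rule.

```python
def value_at(ordered, position):
    """Value at a 1-based position; a position ending in .5 is averaged."""
    if position.is_integer():
        return ordered[int(position) - 1]
    if position % 1 == 0.5:
        return (ordered[int(position) - 1] + ordered[int(position)]) / 2
    raise ValueError("no rule is given on the cards for this position")

# Hypothetical non-grouped data with n = 11 values
data = sorted([12, 3, 7, 9, 15, 1, 20, 5, 18, 11, 14])
n = len(data)

q1 = value_at(data, 0.25 * (n + 1))   # position 3
q3 = value_at(data, 0.75 * (n + 1))   # position 9
iqr = q3 - q1

print(q1, q3, iqr)  # 5 15 10
```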

How are variance and standard deviation related?

Variance = (Standard Deviation)²

What actually is variance?

The mean of the squared distances of the data points from the mean; it therefore represents the spread of the data

How is variance found?

Find the mean of the data points
Calculate the difference between each data point and the mean value (write this as a new list of values)
Square each difference in your new list
Find the sum of the squared differences
Divide this sum by the number of data points (n)
Write the final answer as the relevant unit squared
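
A step-by-step Python sketch of this calculation, using made-up data. It finishes by dividing the sum of squared differences by n, which matches the S_xx/n equation given on the variance equation card.

```python
data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values

n = len(data)
mean = sum(data) / n                           # mean of the data points: 5.0
differences = [x - mean for x in data]         # difference of each point from the mean
squared = [d ** 2 for d in differences]        # square each difference
sum_of_squares = sum(squared)                  # sum of squared differences: 32.0
variance = sum_of_squares / n                  # divide by n: 4.0
standard_deviation = variance ** 0.5           # square root of the variance: 2.0

print(variance, standard_deviation)  # 4.0 2.0
```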

How do you find the variance, SD, median and quartiles with your calculator?

MENU (6)
1-Variable (1)
Enter data and frequency
AC
OPTN
1-Variable Calc (2)

For grouped data, find the midpoint of the class and put it into the calculator

Note: enter the frequency, not the cumulative frequency, in the calculator

How do you deal with grouped data when inputting into the calculator to find numerical measures?

Use the midpoint of each class as the value to input
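
As an illustration of the midpoint idea, here is a small Python sketch that estimates the mean of grouped data from class midpoints and frequencies. The classes and frequencies are hypothetical.

```python
# Hypothetical grouped data: (lower boundary, upper boundary, frequency)
classes = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]

midpoints = [(low + high) / 2 for low, high, _ in classes]    # [5.0, 15.0, 25.0]
frequencies = [freq for _, _, freq in classes]

n = sum(frequencies)                                           # total frequency: 10
sum_fx = sum(m * f for m, f in zip(midpoints, frequencies))    # 140.0
estimated_mean = sum_fx / n                                    # 14.0

print(estimated_mean)  # 14.0
```

The result is only an estimate, because each value in a class is treated as if it sat at the class midpoint.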

What is grouped and non-grouped data?

Grouped data refers to data given in class intervals (e.g. 10-20)

Non-grouped data refers to individual pieces of data (e.g. 6, 24, 69, 420)

How can you convert grouped data into non-grouped data?

Write out the heading of the group as many times as the frequency states

(e.g. a group of 3 people with 4 cats each becomes 4, 4, 4)

Note this also works in reverse
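
A Python sketch of the conversion in both directions. The "3 people with 4 cats each" row comes from the card; the second row (2 people with 1 cat each) is a hypothetical addition to make the example less trivial.

```python
from collections import Counter

# value -> frequency
table = {4: 3, 1: 2}

# Write out each value as many times as its frequency states
values = [value for value, freq in table.items() for _ in range(freq)]
print(values)  # [4, 4, 4, 1, 1]

# In reverse: count how often each value appears
rebuilt = dict(Counter(values))
print(rebuilt == table)  # True
```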

When there are gaps in a continuous grouped data set (lengths 0-9, 10-19, 20-29), what do you always do first?

Adjust the class boundaries to the values at which a measurement would stop rounding into that class

(0-9, 10-19, 20-29) becomes (0-9.5, 9.5-19.5, 19.5-29.5)

Then find the midpoint column
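
A rough Python sketch of closing the gaps and then finding the midpoints, using the lengths classes from the question. It assumes every gap is the same size and that the first lower boundary stays at 0 (lengths cannot be negative); both are assumptions of this example.

```python
# Class limits as printed in the question
classes = [(0, 9), (10, 19), (20, 29)]

gap = classes[1][0] - classes[0][1]   # 10 - 9 = 1
half = gap / 2                        # each boundary moves by half the gap: 0.5

adjusted = []
for i, (low, high) in enumerate(classes):
    new_low = low if i == 0 else low - half   # first lower boundary left at 0
    new_high = high + half
    adjusted.append((new_low, new_high))

midpoints = [(low + high) / 2 for low, high in adjusted]

print(adjusted)   # [(0, 9.5), (9.5, 19.5), (19.5, 29.5)]
print(midpoints)  # [4.75, 14.5, 24.5]
```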

When there are gaps in discrete grouped data sets (ages 0-5, 6-10, 11-15), what do you always do first?

Adjust class widths so that the final value of the width is the first value of the next width

(0-5, 6-10, 11-15, …) becomes (0-6, 6-11, 11-16, etc.)

Then find the midpoint column

What is continuous data?

Data which can take any value within a range (e.g. girth, length and height)

What is discrete data?

Data which can be counted and can only take specific, separate values (e.g. numbers of sausages, boys or pens)

What is the ‘formula’ for linear interpolation?

(UB-LB)/(UF-LF) = (Q-LB)/(N-LF)

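Here LB and UB are the lower and upper boundaries of the class, LF and UF are the cumulative frequencies at those boundaries, Q is the median or quartile being found and N is its position. The equation states that the ratio of the boundary range (UB - LB) to the frequency range (UF - LF) is the same as the ratio of (Q - LB) to (N - LF)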
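
Rearranging the equation for Q gives Q = LB + (N - LF) x (UB - LB)/(UF - LF). A minimal Python sketch of that rearrangement follows; the function name and the numbers passed in are hypothetical.

```python
def interpolate(lb, ub, lf, uf, n):
    """Solve (UB-LB)/(UF-LF) = (Q-LB)/(N-LF) for Q."""
    return lb + (n - lf) * (ub - lb) / (uf - lf)

# Hypothetical class with boundaries 9.5 and 19.5, cumulative frequency 8
# at the lower boundary and 20 at the upper boundary, and N = 15
q = interpolate(lb=9.5, ub=19.5, lf=8, uf=20, n=15)

print(round(q, 2))  # 15.33
```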

State an assumption of linear interpolation

Data is evenly spread within the boundaries

How do you find ‘N’ in linear interpolation?

For the median, N is (total frequency / 2)
For the LQ, N is (total frequency / 4)
For the UQ, N is 3 x (total frequency / 4)
(The total frequency is the final value in the cumulative frequency column)

What are the steps of linear interpolation?

– Adjust class boundaries of grouped data for any gaps
– Add a cumulative frequency column
– Sub the total frequency (the final cumulative frequency) into the relevant equation (median / quartile) to find N
– Find the class in which this value of N falls
– Draw interpolation diagram
– Find UB and LB by reading the boundaries of that class
– Find UF and LF by reading the cumulative frequency at either end of the class
– Sub these values including N into the equation and solve for Q
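
The steps above can be sketched end to end in Python. The grouped table is hypothetical and is assumed to have its gaps already closed; the function name `interpolated_quantile` is also an invention for this example.

```python
def interpolated_quantile(classes, fraction):
    """Estimate a quantile of grouped data by linear interpolation.

    classes: list of (lower boundary, upper boundary, frequency)
    fraction: 0.5 for the median, 0.25 for the LQ, 0.75 for the UQ
    """
    total = sum(freq for _, _, freq in classes)
    n = fraction * total                  # position N
    lf = 0                                # cumulative frequency below the class
    for lb, ub, freq in classes:
        uf = lf + freq                    # cumulative frequency at the top of the class
        if n <= uf:                       # N falls in this class
            return lb + (n - lf) * (ub - lb) / (uf - lf)
        lf = uf
    raise ValueError("fraction must be between 0 and 1")

# Hypothetical table, boundaries already adjusted for gaps
table = [(0, 9.5, 4), (9.5, 19.5, 10), (19.5, 29.5, 6)]

median = interpolated_quantile(table, 0.5)    # N = 10, second class
lq = interpolated_quantile(table, 0.25)       # N = 5, second class
uq = interpolated_quantile(table, 0.75)       # N = 15, third class

print(median, lq)  # 15.5 10.5
```

This relies on the assumption that the data is evenly spread within each class.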

What is the equation for standard deviation as coded data?

S_y = S_x / b, where S_y is the standard deviation of the coded data, S_x is the standard deviation of the original data and b is a constant

What are the advantages and disadvantages of using the median as a measure of location?

Advantages:
- Not affected by outliers
- Can be used for non-numerical data that can be put in order
Disadvantages:
- Does not use all data
- Is not always an observed data value (it can be the mean of two values or an interpolated value)

How do you draw a linear interpolation diagram?

- Draw a horizontal straight line
- Draw 3 vertical lines: one at each end of your line and one in between
- Write the lower and upper boundaries on the top at each end, and Q above the line in between
- Write the lower and upper cumulative frequencies on the bottom at each end, and the value of N (calculated from the frequency equation initially) below the line in between
- Solve for Q using the equation

True or false: the data given in linear interpolation questions is always grouped

True. You will never be given non-grouped data

What is the equation for variance?

S_xx/n or (Σx²)/n - x̄²

What is the equation for standard deviation?

√(S_xx/n) or √((Σx²)/n - x̄²)
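
A quick Python check that the two forms agree, taking S_xx to mean Σ(x - x̄)² (the usual definition, stated here as an assumption since the cards do not define it). The data is made up.

```python
from math import isclose, sqrt

data = [1, 2, 3, 4, 5]   # hypothetical values
n = len(data)
mean = sum(data) / n                                        # 3.0

s_xx = sum((x - mean) ** 2 for x in data)                   # Σ(x - x̄)² = 10.0
variance_a = s_xx / n                                       # S_xx/n
variance_b = sum(x ** 2 for x in data) / n - mean ** 2      # (Σx²)/n - x̄²

print(variance_a, variance_b)            # 2.0 2.0
print(isclose(sqrt(variance_a), sqrt(variance_b)))  # True
```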

What is the general equation for coded data?

y = (x-a) / b where y is the coded data value, x is the original data and a and b are constants

What is the equation to find the coded mean?

ȳ = (x̄ - a)/b, where ȳ is the coded mean, x̄ is the original mean and a and b are constants
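
A small Python sketch that codes a data set with y = (x - a)/b and checks the coded mean and coded standard deviation against the formulas on these cards. The data and the constants a and b are hypothetical; `statistics.pstdev` is used because it divides by n, matching S_xx/n.

```python
from math import isclose
from statistics import mean, pstdev

a, b = 100, 10                         # hypothetical coding constants
x = [110, 120, 130, 140, 150]          # hypothetical original data
y = [(value - a) / b for value in x]   # coded data: [1.0, 2.0, 3.0, 4.0, 5.0]

# Coded mean: ȳ = (x̄ - a)/b
print(mean(y), (mean(x) - a) / b)           # 3.0 3.0

# Coded standard deviation: S_y = S_x / b (a has no effect)
print(isclose(pstdev(y), pstdev(x) / b))    # True
```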

What are the advantages and disadvantages of using the mode as a measure of location?

Advantages:
- Not affected by an outlier
- Useful for non-numerical data
Disadvantages:
- Does not use all data
- May be multiple modes

What are advantages and disadvantages of using the mean as a measure of location?

Advantages:
- Large data set makes outliers negligible
- Uses all data values
Disadvantages:
- Affected by outliers in small data sets

When you have discrete data with gaps do you amend the gaps or not?

You do not amend the gaps. You only amend gaps in continuous grouped data

What are the advantages and disadvantages of using the range as a measure of spread?

Advantages:
- Reflects the full data set
Disadvantages:
- Affected by outliers

What are the advantages and disadvantages of using the interquartile range as a measure of spread?

Advantages:
- Not affected by outliers
Disadvantages:
- Does not reflect the full data set

What are the advantages and disadvantages of using the standard deviation as a measure of spread?

Advantages:
- Outliers are negligible in large data sets
Disadvantages:
- Outliers have a big impact on small data sets

What does sigma (Σ) notation express?

Sigma (Σ) means 'the sum of'. For example, sigma x (Σx) means the sum of all the values of x
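
In Python, Σ corresponds to the built-in `sum`. A short illustration with made-up values, which also shows that Σx² and (Σx)² are different things:

```python
x = [1, 2, 3]   # hypothetical values

sum_x = sum(x)                               # Σx
sum_x_squared = sum(v ** 2 for v in x)       # Σx²: square each value, then add
square_of_sum = sum(x) ** 2                  # (Σx)²: add first, then square

print(sum_x, sum_x_squared, square_of_sum)   # 6 14 36
```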

What does standard deviation actually mean?

A measure of how far the data points typically are from the mean, and therefore represents the spread of the data. It is the square root of the variance

Does addition/subtraction (when coding data) affect the mean and standard deviation? (skip this card)

Adding or subtracting will affect the mean of the data but not the standard deviation. This is because all data points have increased/decreased by the same value and so the distance from the mean is no different. The mean will change by the same value as the addition or subtraction

Does multiplication/division (when coding data) affect the mean and standard deviation?

Multiplying or dividing affects both the mean and the standard deviation. Both change by the same factor as the multiplication or division

What is the mean?

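The mean is the sum of all data divided by the number of pieces of data. The same calculation is used for grouped and non-grouped data, but for grouped data each class midpoint is multiplied by its frequency, so the result is an estimate.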

Numerical Measures Flashcards

(44 cards)