paper 2 Flashcards
NORMAL DISTRIBUTION
What are the features of a normal distribution curve?
- a bell shape curve
- a single peak
- symmetrical about the mean
- 50% of the data is above the mean and 50% is below it
- most of the data is within 1 s.d. of the mean
What proportions of the sample is at which point?
68% = within 1 s.d. of the mean (µ + 1∂ and µ - 1∂)
95% = within 2 s.d. of the mean (µ + 2∂ and µ - 2∂)
99.7% = within 3 s.d. of the mean (µ + 3∂ and µ - 3∂)
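The three percentages above can be checked numerically. This is a minimal sketch using Python's standard library; the helper name `within` is made up for illustration.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a normal distribution within k s.d. of the mean."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} s.d.: {within(k):.4f}")
# within 1 s.d.: 0.6827
# within 2 s.d.: 0.9545
# within 3 s.d.: 0.9973
```

The same proportions hold for any mean and standard deviation, which is why the rule can be quoted without knowing µ or ∂.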
What are the notations for normal distribution?
µ = mean
∂ = standard deviation (more usually written σ)
What is the notation for a random variable X that is normally distributed?
X ~ N (µ,∂²)
(the second number is the VARIANCE, not the standard deviation)
What does ∂² mean?
∂² is the standard deviation squared and is called VARIANCE
What are the rules for probability of normal distribution?
If X<15 LEAVE IT
(the sign is pointing at X (less than))
If X>15 subtract it from 1
(the sign is pointing at the number (greater than))
- if the standardised (z) value is negative then DO THE OPPOSITE
How do you calculate the probability of normal distribution on a calculator?
- Press the MENU button and press 7
- Press ‘2: Normal CD’
- If X<15 (less than) put:
LOWER = -10000000
UPPER = 15 (or whatever number it is)
- If X>15 (greater than) put:
LOWER = 15
UPPER = 10000000
What are some other points about finding ND on the calculator?
- calculator value is ALWAYS LESS THAN
- the same rules still apply about whether to subtract from 1 or not
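The "leave it" and "subtract from 1" rules can be sketched in Python. The mean of 20 and standard deviation of 5 are assumed values chosen only for illustration; they do not come from the cards.

```python
from statistics import NormalDist

# Hypothetical distribution for illustration: mean 20, standard deviation 5.
X = NormalDist(mu=20, sigma=5)

p_less = X.cdf(15)         # P(X < 15): the "less than" area, left as it is
p_greater = 1 - X.cdf(15)  # P(X > 15): subtract the "less than" area from 1

print(round(p_less, 4))     # 0.1587
print(round(p_greater, 4))  # 0.8413
```

The two areas always add up to 1, which is a quick check on any answer.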
INVERSE DISTRIBUTION
What is the inverse normal?
INVERSE NORMAL = working backwards from an area to find the boundary value
Area = probability/percentile
(e.g. 95% = 0.95
Area = 0.95 )
How do you find inverse normal on a calculator?
- Press the MENU button and press 7
- Press ‘3: Inverse Normal’
- Then input the area (to the left of the boundary), the standard deviation and the mean
EXAMPLE QUESTION OF INVERSE NORMAL
X~N (25,4)
- Find ‘a’ given that P(X < a) = 0.27
- Input the information into the calculator:
Area : 0.27
∂ : 2 (the square root of the variance, 4)
µ : 25
- XInv = 23.77, so write a = 23.77
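The calculator answer in this example can be reproduced with a short sketch. It assumes Python's `statistics.NormalDist`, whose `inv_cdf` takes the area to the left of the boundary, the same input the calculator asks for.

```python
from statistics import NormalDist

# X ~ N(25, 4): the variance is 4, so the standard deviation is 2.
X = NormalDist(mu=25, sigma=2)

# Boundary value with an area of 0.27 to its left.
a = X.inv_cdf(0.27)
print(round(a, 2))  # 23.77
```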
EXAMPLE QUESTION 2 OF INVERSE DISTRIBUTION
X~N (25,4)
P(24
1. Find P(X<24) with NORMAL distribution:
LOWER: -100000
UPPER: 24
∂: 2
µ: 25
P = 0.30854 (the area on the left of the 24 boundary)
- Find P(X
CONFIDENCE INTERVALS
What will CONFIDENCE be based on?
THE SIZE OF THE SAMPLE = the larger the size of the sample, the closer the estimate is likely to be to the true population mean
THE VARIANCE = If readings are generally more varied then the estimate will be less reliable
How do you calculate the standard error? What is it?
Standard error = ∂/√n
Standard error is how different the population mean is likely to be from a sample mean
(How different the population mean is from the point estimate)
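As a rough illustration of the standard error formula, the sketch below uses a standard deviation of 4 with two sample sizes. The function name and the numbers are assumptions for the example only.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by root n."""
    return sd / sqrt(n)

print(standard_error(4, 16))  # 1.0
print(standard_error(4, 64))  # 0.5
```

Making the sample four times larger halves the standard error, so the sample mean is likely to sit closer to the population mean.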
What is the formula for confidence intervals?
x̅ ± 1.96 ∂/√n
(with µ in middle)
x̅ = sample mean
n = sample size
∂ = population standard deviation
How does this formula look written out in full?
x̅ - 1.96 ∂/√n < µ < x̅ + 1.96 ∂/√n = 95%
(the numbers will change based on your level of confidence)
- = lower confidence limit
+ = upper confidence limit
What are the decimal numbers that are substituted into the formula for different confidence intervals?
90% = 1.645
95% = 1.96
98% = 2.326
99% = 2.576
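These multipliers come from the standard normal distribution: each one leaves half of the leftover percentage in each tail. A minimal sketch, assuming Python's `statistics.NormalDist` (the helper name `multiplier` is hypothetical), gives them to three decimal places, so they may differ slightly from rounded table values.

```python
from statistics import NormalDist

def multiplier(confidence):
    """z value that leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(f"{level:.0%}: {multiplier(level):.3f}")
# 90%: 1.645
# 95%: 1.960
# 98%: 2.326
# 99%: 2.576
```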
EXAMPLE CONFIDENCE INTERVALS QUESTION
A sample of 16 fish has a mean length of 28cm. The standard deviation of the length is 4cm. Find a 95% confidence interval for the mean length of the fish in the population.
- x̅ = 28
n = 16
∂ = 4
CI = 95%
- UPPER = 28 + 1.96 (4/√16)
= 29.96
LOWER = 28 - 1.96 (4/√16)
= 26.04
- Confidence interval = 26.04 < µ < 29.96
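The fish example can be checked with a few lines of Python. This is only a sketch of the formula x̅ ± z ∂/√n; the function name `confidence_interval` is made up for illustration.

```python
from math import sqrt

def confidence_interval(sample_mean, sd, n, z=1.96):
    """Return (lower, upper) limits: sample_mean plus or minus z * sd / sqrt(n)."""
    margin = z * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Fish example: mean 28, standard deviation 4, sample size 16, 95% confidence.
lower, upper = confidence_interval(28, 4, 16)
print(round(lower, 2), round(upper, 2))  # 26.04 29.96
```

For a different confidence level, pass a different multiplier as `z`.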
What does PMCC stand for?
Product moment correlation coefficient
How is the PMCC notated?
It is usually notated with the letter ‘r’
What is the letter ‘r’ (PMCC)?
- r is a number between -1 and 1 inclusive
(-1 ≤ r ≤ 1)
+1 = perfect positive correlation
-1 = perfect negative correlation
0 = no correlation
How do you calculate the PMCC on the calculator?
- Press the MENU button and press 6 (statistics)
- Press ‘2: a+bx’
- input all the x and y data points into the table
- press option (OPTN)
- press ‘4: regression calculation’
- use ‘r’ for the PMCC
Can the ‘r’ value be affected by outliers?
Yes it can: a single extreme point can change r a lot
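To see what the calculator is working out, and how much one outlier can change r, here is a minimal sketch of the PMCC calculation. The data values are made up for illustration.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]         # lies exactly on a straight line
y_outlier = [2, 4, 6, 8, 0]  # same data with the last value replaced by an outlier

print(round(pmcc(x, y), 3))          # 1.0
print(round(pmcc(x, y_outlier), 3))  # 0.0
```

Changing one point out of five takes r from perfect positive correlation to no correlation.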
What is the equation for the regression line?
y = a + bx
a = y-intercept
b = gradient
(substitute the letters from the question into the formula as well as the numbers e.g. if the letters were w and l the equation would be w = a + bl)
How do you calculate the regression line on the calculator?
- Press the MENU button and press 6 (statistics)
- Press ‘2: a+bx’
- input all the x and y data points into the table
- press option (OPTN)
- press ‘4: regression calculation’
- use ‘a’ and ‘b’ for regression line
What do you need to do when answering the question?
- write the a and the b value
- substitute these numbers into the formula
- then answer the question by drawing the line or explaining what it shows
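The a and b values the calculator gives come from a least-squares fit. The sketch below shows one way to compute them and then substitute into y = a + bx; the data and the function name are assumptions for illustration.

```python
def regression_line(xs, ys):
    """Least-squares regression line y = a + bx; returns (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx            # gradient
    a = mean_y - b * mean_x  # y-intercept
    return a, b

# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = regression_line(x, y)
print(f"y = {a:.1f} + {b:.1f}x")  # y = 2.2 + 0.6x
```

Substituting a value of x into the printed equation gives a predicted y, which is usually what the question goes on to ask for.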
MEAN AND STANDARD DEVIATION
How is mean represented and worked out with listed data and frequency?
x̅ = ∑fx / ∑f
x = individual data entries f = frequency
How is mean represented and worked out with grouped data and frequency?
x̅ = ∑fx / ∑f
x = grouped data MIDPOINTS f = frequency
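A short sketch of the ∑fx / ∑f calculation for grouped data follows. The groups and frequencies are made up for illustration.

```python
# Made-up grouped data: (lower bound, upper bound, frequency).
groups = [(0, 10, 3), (10, 20, 5), (20, 30, 2)]

# Use the midpoint of each group as x.
midpoints = [(low + high) / 2 for low, high, _ in groups]  # 5, 15, 25
freqs = [f for _, _, f in groups]

sum_fx = sum(f * x for f, x in zip(freqs, midpoints))  # 15 + 75 + 50 = 140
sum_f = sum(freqs)                                     # 10

mean = sum_fx / sum_f
print(mean)  # 14.0
```

Because midpoints stand in for the real values, the result is an estimate of the mean rather than an exact figure.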
What is the advantage of the mean?
It is the most used average and uses every item of data.
What is the disadvantage of the mean?
It might not be representative if there is an extreme value (affected by outliers)
What is standard deviation?
- A measure of SPREAD that uses all of the data
- a HIGHER s.d. means that the data is MORE SPREAD OUT (and the opposite if it is low)
What is the advantage of using standard deviation?
It uses all of the data
What is the disadvantage of using standard deviation?
It takes longer to calculate and is therefore time consuming
How do you calculate the standard deviation of a set of LISTED data?
- find the mean of the data
- Square all of the values SEPARATELY then add them together (this gives ∑x²)
- use the formula:
s.d. = √( ∑x²/n - x̅² )
∑x² = sum of the squared values n = the number of values x̅² = mean squared
- get s.d.
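The steps above can be sketched in Python and compared with the library function `statistics.pstdev`, which gives the population standard deviation. The data values are made up for illustration.

```python
from math import sqrt
from statistics import pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up listed data

n = len(data)                        # 8
mean = sum(data) / n                 # 40 / 8 = 5
sum_x2 = sum(x ** 2 for x in data)   # 232

# Mean of the squares minus the square of the mean, then square root.
sd = sqrt(sum_x2 / n - mean ** 2)    # sqrt(29 - 25)
print(sd)            # 2.0
print(pstdev(data))  # 2.0
```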
How to find the standard deviation of grouped data?
- find the MIDPOINTS of the groups (use these as x)
- find the mean of the data (x̅ = ∑fx / ∑f)
- SQUARE each midpoint then multiply by the FREQUENCY (fx²)
- add all of these values up (∑fx²)
- use formula:
s.d. = √( ∑fx²/∑f - x̅² )
∑fx² = value from above ∑f = sum of frequency x̅² = mean squared
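For grouped data the same idea applies, with each midpoint squared before it is multiplied by its frequency. A minimal sketch with made-up groups:

```python
from math import sqrt

# Made-up grouped data: (midpoint, frequency).
table = [(5, 3), (15, 5), (25, 2)]

sum_f = sum(f for _, f in table)            # 10
sum_fx = sum(f * x for x, f in table)       # 140
sum_fx2 = sum(f * x ** 2 for x, f in table) # 75 + 1125 + 1250 = 2450

mean = sum_fx / sum_f                       # 14.0
sd = sqrt(sum_fx2 / sum_f - mean ** 2)      # sqrt(245 - 196)
print(sd)  # 7.0
```

Note that ∑fx² means "square x, then multiply by f", not "multiply f by x, then square".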
What is the variance?
Standard deviation squared (∂²)
How do you calculate standard deviation on a calculator? (listed data)
- Press the MENU button and press 6 (statistics)
- Press ‘1: 1-variable’
- Then press ‘SHIFT’ ‘MENU’, go down a page and press ‘3: statistics’
- press (2 : OFF) to turn the frequency column off
- input your data
- then press option (OPTN)
- then press ‘3: 1-variable calc’
- find ∂x for standard deviation
How do you calculate standard deviation on a calculator? (grouped data)
- Press the MENU button and press 6 (statistics)
- Press ‘1: 1-variable’
- Then press ‘SHIFT’ ‘MENU’, go down a page and press ‘3: statistics’
- press (1 : ON) to turn the frequency column on
- input your data (for x input the MIDPOINTS and enter the frequencies)
- then press option (OPTN)
- then press ‘3: 1-variable calc’
- find ∂x for standard deviation
What factors do you need to look out for when doing critical analysis?
- Is there any data to back up statements made?
- Use of vague or emotive language.
- Has the writer assumed too much either about the subject matter or the reader's knowledge?
- How big is the sample, and is it proportionate to the research that they are doing?
- (if a graph) does it have axes / are the axes misleading?
- Is it showing what it is meant to?
- Are there errors in the data?
- Is it even possible?
- Are the scales distorting the data?
- Is it the best type of graph?
What is the rule for outliers?
AN OUTLIER = an extreme value
- generally, a value is an outlier if it is more than 1.5 x IQR below the lower quartile or above the upper quartile
What is an example of an outlier question?
IQR = 7 UQ = 22 LQ = 15
7 x 1.5 = 10.5
Upper boundary = 22 + 10.5 = 32.5
Lower boundary = 15 - 10.5 = 4.5
Any value above 32.5 or below 4.5 is an outlier
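The outlier boundaries in this example can be sketched as a small function. The function name and the list of test values are made up for illustration.

```python
def outlier_boundaries(lq, uq):
    """Return (lower, upper) boundaries: 1.5 x IQR beyond each quartile."""
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr

lower, upper = outlier_boundaries(15, 22)
print(lower, upper)  # 4.5 32.5

# Made-up values to test against the boundaries.
values = [3, 12, 18, 25, 40]
outliers = [v for v in values if v < lower or v > upper]
print(outliers)  # [3, 40]
```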
What is a point estimate?
- a single value, calculated from sample data, that serves as a “best guess” or “best estimate” of an unknown population parameter (e.g. the mean)
- point estimation is the process of finding this approximate value from random samples of the population
- the mean of a sample is called a ‘point estimate’ for the mean of the population
How do you calculate point estimate?
- A point estimate of the mean of a population is determined by calculating the mean of a sample drawn from the population.
- The calculation of the mean is the sum of all sample values divided by the number of values.
How do you increase the accuracy of a point estimate?
The accuracy of the point estimate is likely to be improved by increasing the sample size
What is the equation for standardising?
z = (N - µ) / ∂
N = number in probability µ = mean ∂ = standard deviation
- first find the z value from the numbers (e.g. 0.45)
- you then look this value up in the statistical tables to find the area
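As a sketch of standardising, the code below reuses the earlier example X ~ N(25, 4) with the boundary 24. The standard normal `cdf` call plays the role of the statistical tables; the function name `standardise` is made up for illustration.

```python
from statistics import NormalDist

def standardise(x, mu, sd):
    """Convert a value x to a z value: (x - mean) / standard deviation."""
    return (x - mu) / sd

# X ~ N(25, 4), so the standard deviation is 2.
z = standardise(24, 25, 2)
print(z)  # -0.5

# Area to the left of z on the standard normal curve, i.e. P(X < 24).
p = NormalDist().cdf(z)
print(round(p, 5))  # 0.30854
```

This matches the 0.30854 found with the calculator's normal distribution mode for the same boundary.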