S3 Flashcards
How do you find Sxx from a list of numbers?
Square each number and add the squares together. Then subtract the mean squared multiplied by the number of numbers: Sxx = Σx² − n(mean)².
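A minimal Python sketch of the Sxx recipe above. The function name and the data are made up for illustration.

```python
def sxx(values):
    """Return Sxx: the sum of the squares minus n times the mean squared."""
    n = len(values)
    mean = sum(values) / n
    return sum(x * x for x in values) - n * mean ** 2

data = [2, 4, 4, 6]  # made-up numbers, mean is 4
print(sxx(data))     # 72 - 4 * 16 = 8.0
```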
What does PDF stand for?
Probability density function
How would you show that a statement is consistent with f(x) being a PDF?
Integrate f(x) over its range and the result will be 1.
f(x) is never negative, so negative probabilities won't occur.
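Both checks can be sketched numerically. This uses the example PDF f(x) = x/50 on 0 to 10 that appears in a later card; the midpoint-rule helper `integrate` is an illustrative assumption, and in an exam you would integrate by hand.

```python
def integrate(g, a, b, steps=10_000):
    """Approximate the integral of g from a to b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def pdf(x):
    return x / 50  # example PDF, valid for 0 <= x <= 10

total_area = integrate(pdf, 0, 10)
# Spot-check f(x) >= 0 at evenly spaced points across the range.
never_negative = all(pdf(10 * i / 1000) >= 0 for i in range(1001))

print(round(total_area, 6), never_negative)  # 1.0 True
```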
How do you find the mean of a continuous random variable from the PDF?
∫xf(x) dx between the given limits. This is the expectation of X, E(X).
How do you find the variance of a continuous random variable from the PDF?
∫x²f(x) dx between the limits gives E(X²)
Subtract [E(X)]²
Var(X) = E(X²) − [E(X)]². State this explicitly.
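As a rough numerical illustration of the mean and variance cards, again with the example PDF f(x) = x/50 on 0 to 10 from a later card. The `integrate` helper is an assumed midpoint-rule approximation, not part of the method itself.

```python
def integrate(g, a, b, steps=10_000):
    """Approximate the integral of g from a to b with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def pdf(x):
    return x / 50  # example PDF, valid for 0 <= x <= 10

mean = integrate(lambda x: x * pdf(x), 0, 10)                # E(X)
mean_of_square = integrate(lambda x: x ** 2 * pdf(x), 0, 10)  # E(X^2)
variance = mean_of_square - mean ** 2                         # E(X^2) - [E(X)]^2

print(round(mean, 3), round(variance, 3))  # 6.667 5.556
```

The exact values are E(X) = 20/3 and Var(X) = 50 − 400/9 = 50/9.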
If you found the mean and variance of a random variable X, what would the mean and variance of Y be if Y = 4X − 1?
Multiply the mean by 4 and subtract 1
Multiply variance by 4²
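The same rule for any Y = aX + b, as a small sketch. The starting mean of 10 and variance of 3 are made-up numbers.

```python
def linear_transform(mean_x, var_x, a, b):
    """Mean and variance of Y = a*X + b, given the mean and variance of X."""
    # Adding b shifts the mean only; multiplying by a scales the variance by a squared.
    return a * mean_x + b, a ** 2 * var_x

mean_y, var_y = linear_transform(10, 3, 4, -1)  # Y = 4X - 1
print(mean_y, var_y)  # 39 48
```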
How do you find the probability of a random variable being between two limits when given a PDF?
Integrate the PDF between the two limits.
How is mode found with a PDF?
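It is the value of x where f(x) is highest.
Differentiate and solve f′(x) = 0 to find it, but check the ends of the range in case the highest point is there.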
How is the median found from a PDF?
The integral of f(x) between m and one of the limits is 0.5
What is a uniform distribution?
How are the mean and variance found?
Looks rectangular when drawn
Mean is (a+b)/2
Variance = (1/12)(b-a)²
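The two uniform formulas as a short sketch, with made-up limits a = 2 and b = 8.

```python
def uniform_mean_variance(a, b):
    """Mean and variance of a uniform distribution on the interval a to b."""
    return (a + b) / 2, (b - a) ** 2 / 12

mean, variance = uniform_mean_variance(2, 8)
print(mean, variance)  # 5.0 3.0
```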
If given a PDF of f(x), how would you find the CDF?
The CDF, F(x), is the integral of the PDF from the lower limit of its range (often 0) up to x
What do you have to say when writing down a PDF?
f(x) =………..
Give the limits of the function
Say that it is 0 otherwise
What do you assume when doing a chi squared test?
Observed frequencies are approximately normally distributed about the expected frequencies.
When are groups put together with chi² tests?
To make sure all groups have expected frequencies greater than 5
How are the degrees of freedom found with a chi² test?
Degrees of freedom = no. of classes − no. of estimated parameters − 1
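A hedged sketch of grouping followed by the degrees-of-freedom count. It only merges classes at the upper tail, which is an assumption about where the small expected frequencies sit. The frequencies and function names are invented for illustration.

```python
def merge_small_tail(observed, expected, minimum=5):
    """Merge classes from the upper tail until the last expected frequency exceeds minimum."""
    obs, exp = list(observed), list(expected)
    while len(exp) > 1 and exp[-1] <= minimum:
        last_obs, last_exp = obs.pop(), exp.pop()
        obs[-1] += last_obs
        exp[-1] += last_exp
    return obs, exp

def degrees_of_freedom(classes, estimated_parameters):
    return classes - estimated_parameters - 1

observed = [10, 18, 12, 6, 3, 1]            # made-up observed frequencies
expected = [9.0, 17.5, 13.0, 6.5, 3.0, 1.0]  # made-up expected frequencies

obs, exp = merge_small_tail(observed, expected)
dof = degrees_of_freedom(len(exp), estimated_parameters=1)
print(obs, exp, dof)  # [10, 18, 12, 10] [9.0, 17.5, 13.0, 10.5] 2
```

Count the classes after grouping, not before, when working out the degrees of freedom.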
What is equal to what with a Poisson distribution?
Mean and variance
What is assumed for a Poisson distribution?
Events take place randomly, independently and at a constant overall mean rate.
What do you assume for a binomial distribution?
Events are random and independent.
The mean is np
What are confidence intervals a measure of?
In repeated sampling, 90% of the intervals generated in this way (for a 90% confidence interval) would contain the true population mean.
How are confidence limits found?
Sample mean ± K*(σ÷√(size of sample))
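A small sketch of the confidence-limit formula. The sample mean, σ and n are made-up values, and K = 1.96 is the usual value for a 95% interval.

```python
from math import sqrt

def confidence_interval(sample_mean, sigma, n, k):
    """Return (lower, upper) limits: sample mean plus or minus k * sigma / sqrt(n)."""
    half_width = k * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

lower, upper = confidence_interval(sample_mean=100, sigma=5, n=25, k=1.96)
print((round(lower, 2), round(upper, 2)))  # (98.04, 101.96)
```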
What is the difference between s and σ?
s is the sample SD
σ is the population SD
When finding confidence limits you often have to approximate σ with s.
When n≥50 it is a good approximation.
What is the standard error of the mean?
The standard deviation of the sample means is σ÷√n
What does the central limit theorem say?
For a sample of size n drawn from a distribution with mean μ and variance σ², the distribution of the sample mean is approximately N(μ, σ²÷n) for sufficiently large values of n (25+).
When do you use 1.645, 1.96, and 2.576 values of K?
When you already know the parent population SD σ
If you estimate σ from the sample, you use the t distribution and not the normal provided the parent population is normally distributed. Thus you use the values of k further up the table.
If you had a large sample and you wanted to produce confidence intervals but had to estimate the parent population SD (σ) what values of K would you use?
If n is large enough (50+) then confidence intervals worked out using the normal distribution will be accurate enough and you can use the values of k at the bottom of the table.
When conducting a t test with paired data, what is done with significance levels?
When looking for the critical value in the table for a one-tailed test, look in the column with double the significance level given in the question. If it is a two-tailed test, just use the given level.
This is because the tables are made to give each tail a probability of 0.5p%.
Assumptions for the Wilcoxon test
The data is symmetrically distributed about the mean/median
When conducting a Wilcoxon test, what is done with any data that is equal to the median?
Removed from sample.
What are W₊ and W₋?
W₊ is the sum of the ranks of the values above the median.
W₋ is the sum of the ranks of the values below the median.
How do you check W₋ and W₊ are correct?
W₋+W₊ = 0.5n(n+1)
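A sketch of the W₊ and W₋ calculation with the check above. The data and the hypothesised median of 13 are invented, and giving tied differences the average of their ranks is an assumed convention that these cards do not cover.

```python
def wilcoxon_w(values, median):
    """Return (W+, W-, n) for a Wilcoxon test of a hypothesised median."""
    # Subtract the median being tested; values equal to it are removed from the sample.
    diffs = [v - median for v in values if v != median]
    ordered = sorted(diffs, key=abs)  # rank by size, ignoring sign
    w_plus = w_minus = 0.0
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        average_rank = (i + 1 + j) / 2  # ties share the mean of ranks i+1 .. j
        for d in ordered[i:j]:
            if d > 0:
                w_plus += average_rank
            else:
                w_minus += average_rank
        i = j
    return w_plus, w_minus, len(diffs)

data = [14, 18, 10, 21, 11, 19.5]  # made-up sample
w_plus, w_minus, n = wilcoxon_w(data, median=13)
print(w_plus, w_minus)                        # 16.0 5.0
print(w_plus + w_minus == 0.5 * n * (n + 1))  # True
```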
When conducting a Wilcoxon test, when are W₊ and W₋ used?
If it is a two-tailed test, use the smaller of W₋ and W₊
If it is a one-tailed test and you are testing whether the true median is less than the believed median, use W₊
Where are the critical values found for the Wilcoxon test?
Page 30 of the formula booklet.
For the Wilcoxon test, how do you compare W and the critical value?
If W is less than or equal to critical value, reject H0 and accept H1
When do you use the normal distribution?
When the sample is large
When the sample is small, the distribution is normal and the population variance is known.
When do you use the Wilcoxon test?
When a sample is small and nothing is known about the distribution of the background population.
When do you use the t-distribution ?
The sample is small, the distribution is normal and the population variance isn’t known.
What is opportunity sampling?
Sampling which selects from those that are (easily) available
If a PDF is f(x) = x/50 for 0 ≤ x ≤ 10, how would you find E(3X+4)?
Multiply (3x+4) by x/50 to get (1/50)(3x² + 4x) and integrate between the limits.
How would you find E(X²)?
Multiply the PDF by x² and integrate over the range.
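Worked through for this PDF, the integral comes out as follows. The last line is a cross-check using E(3X+4) = 3E(X) + 4.

```latex
\begin{align*}
E(3X+4) &= \int_0^{10} (3x+4)\,\frac{x}{50}\,dx
         = \frac{1}{50}\int_0^{10} \left(3x^2 + 4x\right) dx \\
        &= \frac{1}{50}\Big[x^3 + 2x^2\Big]_0^{10}
         = \frac{1000 + 200}{50} = 24 \\
E(X) &= \int_0^{10} \frac{x^2}{50}\,dx = \frac{1000}{150} = \frac{20}{3},
\qquad 3 \cdot \frac{20}{3} + 4 = 24
\end{align*}
```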
What is subtracted from what with a Wilcoxon test?
Each value − the median being tested
If, when doing a Wilcoxon test, the critical value is equal to the W value, what do you do?
Significant. Accept H1
Why is paired data appropriate on occasion?
To remove differences between groups, people, authorities
What is the significance level?
The significance level, also denoted as alpha or α, is the probability of rejecting the null hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% risk of concluding that a difference exists when there is no actual difference.
If the duration of an event is Normally distributed, N(μ, σ²), what distribution would the mean duration of three such events have?
It would be Normally distributed and the mean would be the same, μ.
The variance of the total time is 3σ², but you then have to divide by 9 to get the variance of the mean, which is σ²/3.
You divide by 9 and not 3 because 9 = 3², and constants are squared when they are taken outside a variance.
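Written out as a short derivation, assuming the three durations X₁, X₂, X₃ are independent (the card does not state this, but the 3σ² total relies on it):

```latex
\begin{align*}
\operatorname{Var}(X_1 + X_2 + X_3) &= \sigma^2 + \sigma^2 + \sigma^2 = 3\sigma^2 \\
\operatorname{Var}\!\left(\frac{X_1 + X_2 + X_3}{3}\right)
  &= \left(\frac{1}{3}\right)^{2} \cdot 3\sigma^2 = \frac{3\sigma^2}{9} = \frac{\sigma^2}{3} \\
\bar{X} &\sim N\!\left(\mu, \frac{\sigma^2}{3}\right)
\end{align*}
```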
Give two reasons why an investigator might need to take a sample in order to obtain information about a population.
For example, need to take a sample because the population might be too large for it to be sensible to take a complete census.
Because the sampling process might be destructive
State two requirements of a sample.
Sample should be unbiased
Sample should be representative (of the population)
Data should not be distorted by the act of sampling
Data should be relevant
Discuss briefly the advantage of the sampling being random.
A random sample … enables proper statistical inference to be undertaken …… because we know the probability basis on which it has been selected
How is the median found from a CDF?
Solve the equation F(m) = 0.5
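A numerical sketch of solving F(m) = 0.5 by bisection. It uses F(x) = x²/100, which is the CDF of the example PDF f(x) = x/50 on 0 to 10 from an earlier card. The solver name is hypothetical, and by hand you would solve m²/100 = 0.5 directly.

```python
def cdf(x):
    return x ** 2 / 100  # F(x) for f(x) = x/50 on 0 <= x <= 10

def solve_median(F, low, high, tolerance=1e-9):
    """Find m with F(m) = 0.5 by bisection, assuming F increases from low to high."""
    while high - low > tolerance:
        mid = (low + high) / 2
        if F(mid) < 0.5:
            low = mid
        else:
            high = mid
    return (low + high) / 2

median = solve_median(cdf, 0, 10)
print(round(median, 4))  # 7.0711, which is the square root of 50
```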
In what form is a confidence interval given?
If you found it was 99 to 101 you would write it (99,101)
What do you assume when using a t test?
The sample is taken from a normally distributed population.
How would you conduct a cluster sample?
Identify clusters which are capable of representing the population as a whole
Choose a random sample of clusters
Randomly sample within the chosen clusters.
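The three steps could be sketched like this. The school names, pupil labels, sample sizes and the `cluster_sample` function are all hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Pick clusters at random, then sample at random within each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random sample of clusters
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

# Step 1: identify clusters (here, four made-up schools of 30 pupils each).
schools = {
    f"School {letter}": [f"{letter}{i}" for i in range(1, 31)]
    for letter in "ABCD"
}

# Steps 2 and 3: choose 2 schools at random, then 5 pupils from each.
sample = cluster_sample(schools, n_clusters=2, n_per_cluster=5, seed=1)
for school, pupils in sample.items():
    print(school, pupils)
```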
What is a simple random sample?
A simple random sample is one where every sample of the required size has an equal chance of being chosen.
Why might you conduct a cluster sample over a simple random sample ?
If you were investigating school children, a simple random sample would mean talking to a few pupils at lots of different schools, which is more work than treating a few schools as clusters and sampling within them.
What is stratified sampling and why might you do it?
There are identifiable subgroups or strata that might exhibit different characteristics.
Each stratum is randomly sampled.
Use it to obtain a representative sample.
Can get information on the individual strata.