Statistics 3 Flashcards
Define probability
A quantitative description of the likelihood of an event occurring within a dataset
What are the three ways of calculating probability?
Subjective: Judgement based on individual opinion and experiences
Theoretical: Calculation based on reasoning using knowledge of the specific dataset characteristics
Experimental: Measurement based on observed outcomes (how often the event actually occurs across repeated observations)
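The theoretical and experimental approaches can be contrasted with a small sketch. It assumes a fair six-sided die and a simulated experiment; the function names, the seed and the number of trials are illustrative choices, not part of the original cards.

```python
import random
from fractions import Fraction

def theoretical_probability(favourable, sample_space):
    """Theoretical: favourable outcomes divided by all equally likely outcomes."""
    return Fraction(len(favourable), len(sample_space))

def experimental_probability(event, trials, rng):
    """Experimental: relative frequency of the event across repeated die rolls."""
    hits = sum(1 for _ in range(trials) if event(rng.randint(1, 6)))
    return hits / trials

die = [1, 2, 3, 4, 5, 6]                      # assumed fair die
p_theory = theoretical_probability([5], die)  # reasoning from the die's known faces

rng = random.Random(42)                       # fixed seed so the run is repeatable
p_experiment = experimental_probability(lambda roll: roll == 5, 60_000, rng)

print(p_theory, round(p_experiment, 3))
```

With enough trials the experimental figure should settle close to the theoretical one, which is why the two approaches are usually expected to agree for well-understood processes.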
What is sample space?
All the outcomes that could occur within a distribution
What does it mean if events are mutually exclusive?
Zero probability of the two events occurring together (e.g. on a single roll of a die there is no way of getting both a 5 and a 3)
What does it mean if events are independent?
Events that have no influence on each other
What should you typically assume about events?
That they are independent, although this will not always be the case
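Sample space, mutual exclusivity and independence can all be checked by enumeration. The sketch below assumes two fair dice; the helper `prob` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair dice: all 36 equally likely (first, second) pairs.
sample_space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event as favourable outcomes over the whole sample space."""
    favourable = [outcome for outcome in sample_space if event(outcome)]
    return Fraction(len(favourable), len(sample_space))

# Mutually exclusive: the first die cannot show a 5 and a 3 on the same roll.
p_both_on_one_die = prob(lambda o: o[0] == 5 and o[0] == 3)

# Independent: the first die showing 5 has no influence on the second showing 3,
# so the joint probability equals the product of the individual probabilities.
p_a = prob(lambda o: o[0] == 5)
p_b = prob(lambda o: o[1] == 3)
p_a_and_b = prob(lambda o: o[0] == 5 and o[1] == 3)

print(p_both_on_one_die, p_a, p_b, p_a_and_b)
```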
In a normal distribution what does the area under a curve represent?
The probability of different events occurring within the dataset; the total area under the curve is 1
In a normal distribution what does the curve represent and what is underneath?
The curve is the probability density function, and the area underneath it contains all the associated probabilities
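As a rough illustration of area under the curve as probability, the sketch below uses Python's `statistics.NormalDist`. The mean of 100 and standard deviation of 15 are made-up values, not taken from the cards.

```python
from statistics import NormalDist

# Hypothetical normal distribution (mean 100, standard deviation 15).
dist = NormalDist(mu=100, sigma=15)

# The cumulative distribution function gives the area under the curve
# to the left of a value, i.e. the probability of falling below it.
p_below_115 = dist.cdf(115)

# The area between two values is the probability of landing in that range.
p_between_85_and_115 = dist.cdf(115) - dist.cdf(85)

print(round(p_below_115, 4), round(p_between_85_and_115, 4))
```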
Why can't you directly compare different normal distributions?
Because they typically have different means and standard deviations
What do you need to do to compare different normal distributions?
Standardisation
What is the standardisation formula?
z = (X - μ) / σ, where X is the observed value, μ is the mean and σ is the standard deviation
What does the standardisation formula do to the mean and standard deviation of normal distributions?
Mean: makes it 0
Standard deviation: makes it 1
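A minimal sketch of standardisation, assuming a small made-up list of heights and the population standard deviation (`pstdev`); it subtracts the mean from each value and divides by the standard deviation, then checks the resulting mean and standard deviation.

```python
from statistics import mean, pstdev

def standardise(values):
    """Convert raw values to z-scores: subtract the mean, divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption of this sketch)
    return [(x - mu) / sigma for x in values]

heights_cm = [150, 160, 170, 180, 190]  # hypothetical data
z_scores = standardise(heights_cm)

# After standardising, the values are centred on 0 with a spread of 1.
print(round(mean(z_scores), 6), round(pstdev(z_scores), 6))
```

Once two datasets are both expressed as z-scores they share the same scale, which is what makes them comparable.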
What happens as you move away from or towards the mean?
The share (mass) of the dataset that is captured increases or decreases, which in turn changes the associated probability
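The change in captured mass can be sketched on the standard normal distribution (mean 0, standard deviation 1). The function name `mass_within` is illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

def mass_within(k):
    """Share of the distribution lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Widening the band around the mean captures more of the dataset.
for k in (1, 2, 3):
    print(k, round(mass_within(k), 4))
```

This is the familiar pattern of roughly 68%, 95% and 99.7% of values lying within one, two and three standard deviations of the mean.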
What is hypothesis testing?
Testing whether a hypothesis about the outcomes of a dataset is statistically significant or not
What is the initial hypothesis called and how is it phrased?
The null hypothesis; it is always negatively phrased (e.g. "there is no statistically significant...")
What is the alternative hypothesis and when is it formed?
The alternative to the null hypothesis (i.e. that there is a statistically significant difference or relationship). It is stated alongside the null before the test is carried out, and is accepted if the test leads you to reject the null
What are 3 examples of null hypotheses? No statistically significant…
difference between sample and population
difference between samples a, b, c
relationship between variables a, b, c
How is the significance threshold determined?
By you: you decide what level of confidence is suitable for concluding that a result did not occur due to chance
What are 4 typical levels of significance?
90%, 95%, 99%, 99.9% (equivalent to p-value thresholds of 0.10, 0.05, 0.01 and 0.001)
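To connect these levels to the standardised normal curve, the sketch below converts each one into a critical z-value. It assumes a two-tailed test, which the cards do not specify, and the function name is hypothetical.

```python
from statistics import NormalDist

def two_tailed_critical_z(confidence):
    """z-value beyond which a result is significant at the given confidence level.

    Assumes a two-tailed test, so the remaining probability (alpha)
    is split equally between both tails of the standard normal curve.
    """
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99, 0.999):
    print(f"{level:.1%}: |z| > {two_tailed_critical_z(level):.3f}")
```

A stricter level pushes the critical value further from the mean, so a result has to be more extreme before it counts as significant.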
What does the result mean with regard to the statistical significance threshold?
If the result does not reach the significance threshold, you cannot rule out that it occurred due to chance, so it is not statistically significant. If it does surpass the threshold, you can be confident (at that level) that it did not occur due to chance and it is a statistically significant result
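One way to picture the decision is a one-sample z-test comparing a sample mean with a known population. This is only a sketch: the choice of test, the function name and all the numbers are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed z-test of 'no difference between sample and population'.

    Returns the z-score, the p-value and whether the result is
    statistically significant at the chosen threshold (alpha).
    """
    z = (sample_mean - pop_mean) / (pop_sd / sqrt(n))
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value, p_value < alpha

# Hypothetical figures: population mean 100, sd 15; sample of 100 with mean 103.
z, p_value, significant_95 = one_sample_z_test(103, 100, 15, 100, alpha=0.05)
_, _, significant_99 = one_sample_z_test(103, 100, 15, 100, alpha=0.01)

print(round(z, 2), round(p_value, 4), significant_95, significant_99)
```

The same result can pass a 95% threshold and fail a 99% one, which is why the threshold should be chosen before looking at the outcome.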
Why is it crucial to test for normality in a distribution?
To determine whether to carry out parametric or non-parametric data analysis
What are the three ways of determining normality in a distribution?
Observation
Q-Q plot
K-S test
What is the Q-Q plot?
A plot of the observed values against the values expected under a normal distribution; the data are likely to be normal if the points are evenly scattered and lie close to the drawn straight line
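The points on such a plot can be computed by hand, as a rough sketch. It assumes the common plotting positions (i - 0.5) / n and a normal distribution fitted from the sample mean and standard deviation; statistical packages may use slightly different conventions, and the sample data are made up.

```python
from statistics import NormalDist, mean, stdev

def qq_points(values):
    """Pair each sorted observation with the value expected under a fitted normal."""
    ordered = sorted(values)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    # Plotting positions (i - 0.5) / n are one common convention (an assumption here).
    expected = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(expected, ordered))

sample = [4.1, 5.0, 5.2, 5.9, 6.3, 4.7, 5.5, 5.1]  # hypothetical measurements
points = qq_points(sample)

for expected, observed in points:
    print(round(expected, 2), observed)
```

If the data are close to normal, plotting these pairs gives points that hug the line where expected equals observed.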
What is the K-S test?
The Kolmogorov-Smirnov test: an empirical analysis, supported by observation, of whether a distribution is normally distributed
What does the K-S test involve?
Comparing the measured/observed distribution against a theorised normal distribution of that dataset.
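The comparison can be sketched as the largest gap between the observed cumulative distribution and the cumulative distribution of a normal curve fitted to the same data. This is an illustration only: it assumes the theorised distribution uses the sample mean and standard deviation, it does not compute a significance value, and the function name and data are hypothetical.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic(values):
    """Largest gap between the observed and the theorised normal cumulative distribution."""
    ordered = sorted(values)
    n = len(ordered)
    theorised = NormalDist(mean(ordered), stdev(ordered))
    largest_gap = 0.0
    for i, x in enumerate(ordered, start=1):
        expected = theorised.cdf(x)
        # The observed cumulative share steps from (i - 1) / n to i / n at x,
        # so check the gap on both sides of the step.
        largest_gap = max(largest_gap, i / n - expected, expected - (i - 1) / n)
    return largest_gap

# Made-up samples: one shaped like a normal curve, one heavily skewed.
near_normal = [NormalDist().inv_cdf((i - 0.5) / 20) for i in range(1, 21)]
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 50]

print(round(ks_statistic(near_normal), 3), round(ks_statistic(skewed), 3))
```

A small gap means the observed distribution resembles the theorised normal one; a large gap means it does not.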
What does it mean visually if the observed distribution resembles that of the theorised distribution?
That it is probably normally distributed
Why is the K-S test similar to hypothesis testing?
Because you use the result to accept either the null or the alternative hypothesis
What is the null hypothesis and alternative hypothesis for a k-s test?
Null: there is no statistically significant difference between the observed dataset and the theorised dataset i.e. it resembles the normal distribution
Alternative: there is a statistically significant difference i.e. it does not resemble the normal distribution
How do you analyse a K-S test in SPSS?
Analyse > non-parametric tests > legacy dialogs > one sample K-S > tick “normal” box
What is the product of the K-S test on SPSS?
A table: the column titled “sig.” gives the significance (p) value, which enables you to determine whether to accept the null or alternative hypothesis
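The decision made from the “sig.” value can be written out as a small sketch. The function name, the default threshold of 0.05 and the example values are assumptions for illustration; the value itself would be read off the SPSS table.

```python
def interpret_ks_sig(sig, alpha=0.05):
    """Turn a K-S significance value into a decision about normality."""
    if sig < alpha:
        # Statistically significant difference from the theorised normal distribution.
        return "accept alternative: not normal, use non-parametric analysis"
    # No statistically significant difference from the theorised normal distribution.
    return "accept null: resembles normal, parametric analysis is reasonable"

# Hypothetical significance values read from an output table.
print(interpret_ks_sig(0.200))
print(interpret_ks_sig(0.003))
```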
What is the importance of the “statistic” column in the SPSS table produced for K-S?
Nothing