Lecture 6 Flashcards
If P is bigger than 0.05, is it significant?
No - it needs to be smaller than 0.05. A p-value below 0.05 means that, if the null hypothesis were true, results this extreme would occur less than 5% of the time, so we reject the null.
What does the P value represent?
The probability of obtaining results at least as extreme as those observed, assuming the null hypothesis is true (it is not the probability that the null hypothesis is true)
What are the 4 types of probability distributions?
1) Normal Distribution
2) Log-normal distribution
3) Binomial distribution
4) Poisson distribution
What are probability distributions important for?
- Event probabilities (p values) and outliers
- Confidence in uncertainty
What are the properties of Log-normal distribution?
- Where logarithm (mathematical function) is normally distributed
- Asymmetrical with a right skew (positive skew)
What is log-normal distribution common in?
- Biology
- Natural events
- Finance
- Human reaction times
What transformation yields normal data from log-normal distributions?
log-transformation
What data is binomial distribution used for? and define it!
Binary data - data with only 2 possible values, recorded as 0 or 1
What is the key parameter in binary data?
Probability of success (p) - often expressed as a %
- ranges from 0 to 1; if p = 0.5, there are equal numbers of 0s and 1s
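As an illustrative sketch (not from the lecture), the binomial probability of a given number of successes follows directly from the parameters n (number of trials) and p (probability of success):

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent binary trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# With p = 0.5, 0s and 1s are equally likely:
# e.g. the chance of exactly 5 heads in 10 coin flips
print(round(binomial_pmf(5, 10, 0.5), 4))  # 0.2461
```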
What are the properties of Poisson distribution
- Asymmetrical (related to binomial)
- Single parameter: lambda
Define ‘lambda’
Expected number of events, based on average rate and interval size
What are the key parameters in poisson distribution?
Lambda
What does poisson distribution show?
The probability that a given number of events occur independently in a fixed time/space interval.
What type of data does poisson distribution show?
Count data - key parameter = number of events counted
Define count data (poisson distribution)
Count of events in a fixed period of time/space - only whole numbers are possible. For instance, eye blinks per minute. The distribution is asymmetrical when the count is very low.
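A quick sketch (not from the lecture) of the Poisson probability mass function, showing the asymmetry at low lambda described above:

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of counting exactly k events, given expected count lam (lambda)."""
    return lam**k * exp(-lam) / factorial(k)

# With a low lambda the distribution is clearly asymmetrical:
# mass bunches near zero with a long right tail
probs = [round(poisson_pmf(k, 1.0), 3) for k in range(5)]
print(probs)  # [0.368, 0.368, 0.184, 0.061, 0.015]
```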
What are the criteria for choosing the appropriate probability density function?
Type of data (e.g. binary) and match to the shape of data distribution
What is the name of the t-test equivalent for binomial data?
binomial test
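As a sketch of the idea behind an exact binomial test (a one-sided version, assuming a null success probability of 0.5; in practice you would use SPSS or a stats library rather than hand-rolling this):

```python
from math import comb

def binom_test_one_sided(k: int, n: int, p0: float = 0.5) -> float:
    """P(observing k or more successes in n trials) if the null p = p0 is true."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

# 9 successes out of 10 under a fair-coin null
p = binom_test_one_sided(9, 10)
print(round(p, 4))  # 0.0107 - below 0.05, so we reject the null
```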
What stats test is used for the Poisson distribution?
Exact rate ratio test
If data is asymmetric, what should your confidence intervals be?
Asymmetric as well
What % is 3 sigma equivalent to?
99.7%
What confidence interval is most commonly used?
95%
What is the 95% confidence interval based on?
Approximately 2 SD either side of the mean (2 sigma); the exact value is 1.96 SD
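The sigma coverage figures above can be checked with the standard normal distribution (an illustrative sketch, not part of the lecture):

```python
from statistics import NormalDist

nd = NormalDist()  # standard normal: mean 0, SD 1
# proportion of a normal distribution within 1, 2 and 3 SD of the mean
for sigma in (1, 2, 3):
    coverage = nd.cdf(sigma) - nd.cdf(-sigma)
    print(f"{sigma} sigma: {coverage:.1%}")  # 68.3%, 95.4%, 99.7%
# 2 sigma gives ~95.4%; the exact 95% interval uses 1.96 SD
print(round(nd.inv_cdf(0.975), 2))  # 1.96
```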
What are data values outside of confidence intervals called?
Outliers
What are the methods of dealing with outliers?
- Exclude them
- Transform data
- Run analyses with and without outliers to see if they are having an effect on your data
What are the purposes of data transformations?
To match the shape of data distribution to a known, plausible pdf, giving us new data
What are the benefits of data transformations?
- Doesn't change the order of the data (e.g. the largest value remains the largest)
- Makes data better for analysis
When would you use log transformation?
When data has a right-hand skew and could be log-normal
How do log transformations reduce skew?
Reduces larger values more than smaller ones
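A small numeric sketch of this effect (the reaction-time values are hypothetical, not from the lecture):

```python
from math import log10

# Right-skewed reaction times in ms; 2000 is a large right-tail value
raw = [200, 250, 300, 400, 2000]
logged = [round(log10(x), 2) for x in raw]
print(logged)  # [2.3, 2.4, 2.48, 2.6, 3.3]
# The gap from 400 to 2000 shrinks from 1600 ms to ~0.7 log units,
# while the gaps between the small values barely change - skew is reduced.
```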
How do you do a log transformation on SPSS?
Transform - Compute Variable - LG10(name of variable)
Why would you use a Z-score transformation?
To normalise the scale of a distribution to mean = 0, SD = 1. Each value becomes the number of standard deviations it lies from the mean.
Whats a benefit of z-score transformations?
Doesn't affect the shape of the distribution
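A minimal sketch of the z-score calculation (example scores are hypothetical):

```python
from statistics import mean, stdev

def z_scores(data):
    """Standardise values: subtract the mean, divide by the SD."""
    m, s = mean(data), stdev(data)
    return [(x - m) / s for x in data]

scores = [10, 12, 14, 16, 18]
z = [round(v, 2) for v in z_scores(scores)]
print(z)  # [-1.26, -0.63, 0.0, 0.63, 1.26] - mean 0, SD 1, same shape as before
```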
How do you do a z-score transformation on SPSS?
Analyse - descriptive statistics- enter variables - select ‘save standardised values’
What is the process of a rank transformation?
Assigns 1 to the lowest value, 2 to the next, and so on, giving tied values the same rank
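The ranking process above can be sketched as follows (ties are given the average of the ranks they span, as SPSS does by default; the data values are hypothetical):

```python
def rank_transform(data):
    """Assign rank 1 to the lowest value; tied values share the average rank."""
    sorted_vals = sorted(data)
    avg_rank = {}
    for v in set(data):
        # 1-based positions of this value in the sorted list, averaged for ties
        positions = [i + 1 for i, x in enumerate(sorted_vals) if x == v]
        avg_rank[v] = sum(positions) / len(positions)
    return [avg_rank[x] for x in data]

# The extreme value 500 simply becomes rank 5, taming the skew
print(rank_transform([30, 10, 20, 20, 500]))  # [4.0, 1.0, 2.5, 2.5, 5.0]
```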
what is a benefit of a rank transformation?
Deals with heavily skewed/ otherwise difficult data
How do you do a rank transformation on SPSS?
Transform - Rank Cases - enter variables
What is the 1st step of data cleaning?
Check for reasonable values, e.g. height shouldn't contain 0 or 390
What is the 2nd step of data cleaning?
Check for floor/ ceiling effects - where data points bunch up around lowest/ highest possible value
What is the 3rd step of data cleaning?
Check distribution shape, apply transformations if necessary, and deal with outliers.