Week 4 Flashcards
Can a median be obtained for nominal variables?
No. The median cannot be obtained for nominal variables; it can be obtained only for ordered variables: ordinal, interval, ratio
What types of data can the mode be used for?
The mode can be used for all types of variables, and is often used for nominal and ordinal variables, e.g. the most frequent answer was ‘extremely satisfied.’
Can there be multiple modes?
Yes
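A minimal Python sketch of a data set with more than one mode, using `statistics.multimode` from the standard library. The survey answers are made up for illustration.

```python
from statistics import multimode

# Hypothetical survey answers (made up for illustration).
answers = ["satisfied", "neutral", "satisfied", "neutral", "dissatisfied"]

# multimode returns every value that ties for the highest frequency,
# in the order each value first appears in the data.
modes = multimode(answers)
print(modes)  # ['satisfied', 'neutral']
```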
For which types of variables is the mean defined?
Ratio and interval
Difference between mean and median
The mean depends on the actual values, while the median depends only on their order, e.g. one extreme outlier can hugely affect the mean but not the median
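A quick Python check of this, assuming made-up values in which the largest value is swapped for an extreme outlier:

```python
from statistics import mean, median

# Made-up values; the second list replaces 5 with an extreme outlier.
values = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 500]

mean_before, median_before = mean(values), median(values)
mean_after, median_after = mean(with_outlier), median(with_outlier)

# The mean jumps from 3 to 102, while the median stays at 3.
print(mean_before, median_before)
print(mean_after, median_after)
```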
How would you go about creating equal sections of data (use coins)?
Divide the coins into sections containing the same number of data points and report where the sections are located
The cut-off points dividing these sections are called quantiles
E.g. if there were 200 coins: 20 sections of 10 coins each
When there are 4 sections, they are called _____, and the median is the ____
quartiles (1st-3rd)
2nd quartile
When there are 100 sections, they are ______ (1st-99th) and the median is the ___
percentiles
50th percentile
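A short Python sketch of quartiles and percentiles, assuming made-up data (the values 1 to 11) and the default interpolation method of `statistics.quantiles`. Other software may use slightly different conventions for where the cut points fall.

```python
from statistics import median, quantiles

# Made-up data: the values 1..11.
data = list(range(1, 12))

# n=4 asks for the 3 cut points that split the data into 4 equal sections.
quartiles = quantiles(data, n=4)
print(quartiles)  # [3.0, 6.0, 9.0]

# n=100 gives the 99 percentile cut points; index 49 is the 50th percentile.
percentiles = quantiles(data, n=100)

# The 2nd quartile, the 50th percentile and the median all coincide.
print(quartiles[1], percentiles[49], median(data))
```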
What is the 2nd moment?
Sum of (distance from mean)^2 over all data points / number of data points
This is the variance
Analogy: how hard it would be to spin the coins around the mean (moment of inertia)
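The 2nd moment can be computed by hand in a few lines of Python. This sketch uses made-up values and divides by the number of data points, as on the card, which matches `statistics.pvariance` (the sample variance, `statistics.variance`, divides by n - 1 instead).

```python
from statistics import pvariance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

mean = sum(data) / len(data)

# Average squared distance from the mean = 2nd moment = variance.
second_moment = sum((x - mean) ** 2 for x in data) / len(data)

print(second_moment)    # 4.0
print(pvariance(data))  # the standard library gives the same value
```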
What does a small variance mean?
The data have a small spread
They are concentrated more towards the mean
What is the standard deviation?
The square root of the variance is called the standard deviation (SD)
The standard distance from the mean
What do z scores enable?
Fair comparisons of deviations, because each distance from the mean is expressed in units of SD: z = (value - mean) / SD
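A small Python sketch of why z scores make comparisons fair. The exam names, means and SDs are hypothetical, and `z_score` is a helper defined here for illustration.

```python
def z_score(value, mean, sd):
    """Deviation from the mean, expressed in units of SD."""
    return (value - mean) / sd

# Hypothetical exam results, made up for illustration.
# Both scores are 10 marks above their exam's mean.
z_maths = z_score(70, mean=60, sd=5)
z_history = z_score(80, mean=70, sd=10)

print(z_maths, z_history)  # 2.0 1.0
```

The raw deviations are identical, but the maths score is the more unusual one because that exam has the smaller spread.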
What does skewness measure?
The degree of asymmetry
Skewness and the 3rd moment
3rd moment = sum of (signed distance from mean)^3 over all data points / no. of data points
To make it dimensionless, this is divided by SD^3, i.e. skewness = 3rd moment / SD^3
What do zero and high skewness mean?
Zero skewness means data are symmetrically distributed, high skewness means distribution is highly asymmetrical.
What do positive and negative skewness indicate?
Positive/negative skewness indicates the direction in which the data are skewed: positive means a longer tail on the right, negative a longer tail on the left (see graphs for reference)
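A minimal Python sketch of skewness as the 3rd moment divided by SD^3. The data sets are made up, and `central_moment` and `skewness` are helper names chosen here for illustration.

```python
def central_moment(data, k):
    """Average of (x - mean)^k over all data points."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def skewness(data):
    sd = central_moment(data, 2) ** 0.5  # SD = square root of the variance
    return central_moment(data, 3) / sd ** 3

symmetric = [1, 2, 3, 4, 5]            # made-up, symmetric about 3
right_tail = [1, 1, 2, 2, 3, 10]       # made-up, long tail on the right
left_tail = [-x for x in right_tail]   # mirror image, long tail on the left

print(skewness(symmetric))   # zero: symmetric data
print(skewness(right_tail))  # positive
print(skewness(left_tail))   # negative
```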
What does kurtosis measure?
Kurtosis measures the sharpness of the distribution's peak
Kurtosis and the 4th moment
4th moment = sum of (distance from mean)^4 over all data points / no. of data points
To make it dimensionless, this is divided by SD^4, i.e. kurtosis = 4th moment / SD^4
What do we do to the kurtosis?
Kurtosis is always positive, but we normally subtract 3 (the kurtosis of the normal distribution). This is called the excess kurtosis
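A minimal Python sketch of excess kurtosis, assuming made-up data sets. `central_moment` and `excess_kurtosis` are helper names chosen here for illustration; the code uses variance^2 in place of SD^4, which is the same quantity.

```python
def central_moment(data, k):
    """Average of (x - mean)^k over all data points."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)

def excess_kurtosis(data):
    variance = central_moment(data, 2)
    kurtosis = central_moment(data, 4) / variance ** 2  # SD^4 == variance^2
    return kurtosis - 3  # subtract the kurtosis of the normal distribution

flat = [1, 2, 3, 4, 5]                      # made-up, evenly spread
peaked = [-5, 0, 0, 0, 0, 0, 0, 0, 0, 5]    # made-up, sharp peak with two extremes

print(excess_kurtosis(flat))    # negative (about -1.3)
print(excess_kurtosis(peaked))  # positive (2.0)
```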
Reasons for outliers
Can be due to inaccuracies in data processing, problems with the methodology (e.g. measurements, instruments, participants not following instructions), or an actual extreme value from an unusual participant
2 ways of measuring an outlier
- Based on the z score: outlier if the z score is more than 3 or less than –3, i.e. when the distance from the mean is more than 3 times the SD
- Based on the IQR (the width between the 1st and 3rd quartiles): outlier if the value is more than 1.5 IQR above the 3rd quartile or more than 1.5 IQR below the 1st quartile
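Both rules can be sketched in Python as follows. The data are made up, the function names are chosen here for illustration, and using the population SD (`pstdev`) and the default quartile method of `statistics.quantiles` are assumptions.

```python
from statistics import mean, pstdev, quantiles

# Made-up measurements with one extreme value (120).
data = [48, 49, 50, 50, 51, 52, 49, 50, 51, 50,
        48, 52, 50, 49, 51, 50, 50, 49, 51, 120]

def z_score_outliers(values, threshold=3):
    """Values whose distance from the mean exceeds threshold * SD."""
    m, sd = mean(values), pstdev(values)
    return [x for x in values if abs(x - m) / sd > threshold]

def iqr_outliers(values, k=1.5):
    """Values more than k * IQR below the 1st or above the 3rd quartile."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return [x for x in values if x < q1 - k * iqr or x > q3 + k * iqr]

print(z_score_outliers(data))  # [120]
print(iqr_outliers(data))      # [120]
```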
Cumulative probability
Using the binomial distribution, you can also calculate the probability of getting a number of heads within a certain range, e.g. what is the probability you get no more than 3 heads in 10 tosses?
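The example on this card can be worked out in Python, assuming a fair coin (p = 0.5). `binomial_pmf` and `binomial_cdf` are helper names chosen here for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n tosses."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_cdf(k, n, p):
    """Cumulative probability of k heads or fewer in n tosses."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

# No more than 3 heads in 10 tosses of a fair coin (assumed p = 0.5).
p_at_most_3 = binomial_cdf(3, n=10, p=0.5)
print(p_at_most_3)  # 0.171875, i.e. (1 + 10 + 45 + 120) / 1024
```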
What is the IQR?
The interquartile range: the width between the 1st and 3rd quartiles
Two tailed cumulative probability
Sometimes you may want to check the probability that a data point deviates from the centre or mean by at least a certain amount, in either direction.
In this case, you need to add the cumulative probabilities at both ends. This is called the two-tailed probability
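A minimal Python sketch of a two-tailed probability for coin tosses, assuming a fair coin and a made-up observation of 8 heads in 10 tosses. `binomial_pmf` is a helper name chosen here for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n tosses."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 10, 0.5          # 10 tosses of a fair coin (assumed)
observed = 8            # made-up observation
centre = n * p          # expected number of heads: 5.0
deviation = abs(observed - centre)

# Add up both tails: outcomes at least as far from the centre as observed,
# here 0, 1, 2 heads (lower tail) and 8, 9, 10 heads (upper tail).
two_tailed = sum(
    binomial_pmf(k, n, p)
    for k in range(n + 1)
    if abs(k - centre) >= deviation
)
print(two_tailed)  # 0.109375, i.e. 2 * (1 + 10 + 45) / 1024
```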
What do you need to use to describe the distribution of a continuous variable?
A continuous distribution
In a continuous distribution, what indicates the probability?
The area under the distribution within a given range indicates the probability of a value falling in that range
Y axis on continuous distribution
Probability density
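A short Python sketch of "area under the curve = probability". The normal distribution is used only as an assumed example of a continuous distribution; the range and step count are arbitrary choices.

```python
from statistics import NormalDist

# Assumed example: a normal distribution with mean 0 and SD 1.
dist = NormalDist(mu=0, sigma=1)

# Approximate the area under the density between a and b with thin
# rectangles (midpoint rule).
a, b = -1.0, 1.0
steps = 10_000
width = (b - a) / steps
area = sum(dist.pdf(a + (i + 0.5) * width) for i in range(steps)) * width

# The same probability from the cumulative distribution function.
exact = dist.cdf(b) - dist.cdf(a)
print(area, exact)  # both about 0.6827

# The y axis is a density, not a probability: it can exceed 1.
peak_density = NormalDist(mu=0, sigma=0.1).pdf(0)
print(peak_density)  # about 3.99
```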