Statistics Chapter 3 (and a bit of 2) (Variability and Association) Flashcards
Learn about association and variability
What is the range in statistics?
The range is the difference between the largest and the smallest observation.
It is strongly affected by outliers.
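A minimal sketch of the range and its sensitivity to a single outlier. The sample values and the helper name `value_range` are made up for illustration.

```python
# Hypothetical sample; the values are invented for illustration only.
data = [4, 7, 8, 10, 12]
data_with_outlier = data + [95]  # same sample plus one extreme value


def value_range(values):
    """Range = largest observation minus smallest observation."""
    return max(values) - min(values)


print(value_range(data))               # 8
print(value_range(data_with_outlier))  # 91
```

One extreme observation changes the range from 8 to 91, although the other five values are unchanged.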
What is a deviation and how is it measured?
The deviation of an observation x is the difference between x and the sample mean $\bar{x}$.
The formula is: deviation = $x - \bar{x}$ (observation value - sample mean).
It can be positive or negative.
The sum of all deviations is always 0.
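A small sketch, using an invented sample, of how the deviations are computed and why they cancel out:

```python
# Hypothetical sample; the values are invented for illustration only.
data = [2, 4, 6, 8, 10]

x_bar = sum(data) / len(data)           # sample mean: 6.0
deviations = [x - x_bar for x in data]  # observation value - sample mean

print(deviations)       # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(sum(deviations))  # 0.0
```

The negative deviations (observations below the mean) exactly offset the positive ones.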
What is variance and how is it measured?
What is its relation to deviation?
Variance is approximately the average of the squared deviations.
Because it is expressed in squared units, its value has no direct interpretation.
It is calculated as:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
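The formula can be traced step by step on a small invented sample. As a cross-check, this sketch compares the result with `statistics.variance` from the Python standard library, which also divides by n - 1.

```python
from statistics import variance

# Hypothetical sample; the values are invented for illustration only.
data = [2, 4, 6, 8, 10]

n = len(data)
x_bar = sum(data) / n                                  # sample mean: 6.0
squared_deviations = [(x - x_bar) ** 2 for x in data]  # 16, 4, 0, 4, 16
s_squared = sum(squared_deviations) / (n - 1)          # 40 / 4

print(s_squared)                    # 10.0
print(s_squared == variance(data))  # True
```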
What is the standard deviation and how is it measured?
How is it related to variance?
The standard deviation is the square root of the variance.
It describes the typical distance of an observation from the mean.
It is denoted s.
What does it mean if s = 0?
What is a disadvantage of s?
If s = 0, all observations have the same value.
A disadvantage of s is that it is strongly affected by outliers, because it is based on the mean.
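A short sketch, with invented samples, of the three points above: s is the square root of the variance, s is 0 when every observation is identical, and one outlier inflates s considerably.

```python
from math import sqrt
from statistics import stdev, variance

# Hypothetical samples; the values are invented for illustration only.
data = [2, 4, 6, 8, 10]
constant = [5, 5, 5, 5]          # every observation has the same value
with_outlier = [2, 4, 6, 8, 100]  # last value replaced by an outlier

# s is the square root of the sample variance s^2.
print(abs(stdev(data) - sqrt(variance(data))) < 1e-9)  # True

# All observations equal, so every deviation is 0 and s is 0.
print(stdev(constant) == 0)  # True

# A single outlier pulls the mean and inflates s.
print(round(stdev(data), 2), round(stdev(with_outlier), 2))
```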
What is the empirical rule?
The empirical rule states that in a bell-shaped distribution
- about 68% of all observations fall within 1 standard deviation s of the mean, that is, between $\bar{x} - s$ and $\bar{x} + s$.
- about 95% of all observations fall within 2 standard deviations of the mean.
- almost 100% of all observations fall within 3 standard deviations of the mean.
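The rule can be checked roughly on simulated bell-shaped data. This is only a sketch: the data are drawn with `random.gauss`, and the mean of 100, standard deviation of 15, sample size, and seed are arbitrary assumptions. The shares should land close to, not exactly on, 68%, 95%, and almost 100%.

```python
import random
from statistics import mean, stdev

random.seed(0)  # arbitrary seed so the simulation is repeatable

# Simulated bell-shaped sample (assumed mean 100, standard deviation 15).
data = [random.gauss(100, 15) for _ in range(10_000)]

x_bar = mean(data)
s = stdev(data)


def share_within(k):
    """Proportion of observations within k standard deviations of the mean."""
    lower, upper = x_bar - k * s, x_bar + k * s
    return sum(lower <= x <= upper for x in data) / len(data)


for k in (1, 2, 3):
    print(f"within {k} s: {share_within(k):.1%}")
```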
What are quartiles and how do you find them?
Quartiles divide the ordered data into 4 parts, each containing about 25% of the observations, so the first part holds the lowest 25%, etc.
The median is always the second quartile (Q2), as it is the 50th percentile. The first quartile (Q1) is the median of the observations below the median, and the third quartile (Q3) is the median of the observations above the median.
These quartiles tell you about the shape of the distribution.
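A minimal sketch of the "median of each half" procedure. The function name `quartiles` and the sample are invented, and the sketch assumes that for an odd number of observations the median itself is left out of both halves. Textbooks and software differ on this detail, so other tools may report slightly different quartiles.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median of the lower and upper half.

    Assumes at least two observations. For an odd count, the middle
    observation is excluded from both halves (one common convention).
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return median(lower_half), median(ordered), median(upper_half)


# Hypothetical sample; the values are invented for illustration only.
data = [1, 3, 4, 6, 7, 9, 11, 12, 15]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)  # Q1 = 3.5, median = 7, Q3 = 11.5
```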
What is the interquartile range IQR?
The interquartile range IQR is the difference between the third and the first quartile (Q3 - Q1), so it spans the middle half of the data.
It is resistant to outliers, as it does not depend on the observations in the lowest or the highest quarter of the data.
How do you find out if a value is an outlier?
An observation is a potential outlier if it falls more than 1.5 x IQR below the first quartile or more than 1.5 x IQR above the third quartile.
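The 1.5 x IQR criterion can be sketched as follows. The helper names and the sample are invented, and Q1 and Q3 are found as the medians of the lower and upper half of the ordered data (with the middle observation excluded for an odd count), which is one convention among several.

```python
from statistics import median


def q1_q3(values):
    """Q1 and Q3 as the medians of the lower and upper half (needs >= 2 values)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])


def iqr_outliers(values):
    """Observations more than 1.5 x IQR below Q1 or above Q3."""
    q1, q3 = q1_q3(values)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]


# Hypothetical sample; the values are invented for illustration only.
data = [1, 3, 4, 6, 7, 9, 11, 12, 40]
print(q1_q3(data))         # (3.5, 11.5), so IQR = 8 and the fences are -8.5 and 23.5
print(iqr_outliers(data))  # [40]
```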
Another way is the z-score, which gives the number of standard deviations an observation lies from the mean. If an observation is more than 3 standard deviations from the mean (z below -3 or above 3), it can be considered an outlier.
$z = \frac{\text{observation} - \text{mean}}{\text{standard deviation}}$
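A brief sketch of the z-score check on an invented sample of 21 values, where one observation lies far from the rest. The function name `z_scores` is hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """z = (observation - mean) / standard deviation, for every observation."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]


# Hypothetical sample: 20 values close to 50 plus one extreme value of 80.
data = [48, 49, 50, 51, 52] * 4 + [80]

flagged = [x for x, z in zip(data, z_scores(data)) if abs(z) > 3]
print(flagged)  # [80]
```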
What is a box plot and what does it contain?
A box plot is a graphical display of data that shows the median and the variability.
It shows the first and third quartiles as a box, the median as a line inside the box, and lines extending from the box to the minimum and maximum values (except outliers, which are marked separately).
What are the advantages and disadvantages of a box plot compared to a histogram?
A box plot does not show mounds and gaps in the data the way a histogram does, but it clearly identifies potential outliers.
What are the explanatory and response variables?
The explanatory variable is the variable that is used to predict or explain the outcome, and which you can change in an experiment (independent variable).
The response variable is the outcome variable that you want to study (dependent variable).
Define association
Two variables are associated if a particular value of one variable is more likely to occur with certain values of the other variable.
How do you summarize the association of two categorical variables?
You use a contingency table, which displays the number of observations for each combination of categories of the two variables. You can then use bar graphs to plot conditional proportions, which show the proportions of the response variable's categories at each level of the explanatory variable.
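A minimal sketch of a contingency table and its conditional proportions. The groups "A" and "B", the outcomes "yes" and "no", and all counts are invented for illustration.

```python
from collections import Counter

# Hypothetical observations as (explanatory category, response category) pairs.
observations = (
    [("A", "yes")] * 30 + [("A", "no")] * 10
    + [("B", "yes")] * 15 + [("B", "no")] * 45
)

# Contingency table: count of each combination of categories.
counts = Counter(observations)

# Total number of observations at each level of the explanatory variable.
group_totals = Counter(group for group, _ in observations)

# Conditional proportions: share of each response within one explanatory level.
conditional = {
    (group, response): count / group_totals[group]
    for (group, response), count in counts.items()
}

for (group, response), proportion in sorted(conditional.items()):
    print(group, response, proportion)
```

Here 75% of group A but only 25% of group B answered "yes". The conditional proportions differ between the groups, which indicates an association in this made-up data.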
How do you summarize the association between two quantitative variables? Explain the associations
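A common way is the scatterplot, where the x-axis represents the explanatory variable and the y-axis the response variable.
The association is positive if higher values of one variable tend to occur with higher values of the other, and negative if higher values of one variable tend to occur with lower values of the other.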