Lecture 9 - Statistics 1 Flashcards
Nominal
Response categories cannot be placed in a specific order – impossible to judge ‘distance’ between categories.
Ordinal
Response categories (values) can be placed in rank order – distance between categories cannot be measured mathematically
If lots of categories, we sometimes treat them as continuous for analysis purposes.
Quantitative
Responses measured on a continuous scale with rank order – assuming uniform distance (same interval) between responses.
Treated as continuous.
Mean
- Average
- Denoted by 𝑥̅ (or x-bar).
- Take the sum (∑) of all values of a variable (x₁, x₂,…, xₙ) in a sample and divide it by the number of observations (n).
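As a minimal sketch of the calculation described above, the following Python snippet sums the values and divides by n. The ten values are made up for illustration (chosen so that the mean comes out at 28); they are not the lecture's data.

```python
# Hypothetical sample of n = 10 observations (not the lecture's data).
values = [22, 24, 25, 26, 27, 28, 29, 30, 33, 36]

n = len(values)        # number of observations
total = sum(values)    # the sum of x1, x2, ..., xn
x_bar = total / n      # the sample mean, x-bar

print(x_bar)  # 28.0
```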
Median
The observation in the middle when we rank all observations from lowest to highest.
If we have an even number of observations, take the mid-point between the two middle values.
Appropriate for both interval and ordinal variables, but not nominal variables.
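The ranking rule above, including the even-numbered case, can be sketched in Python as follows. The function name `median` and the example lists are illustrative assumptions, not part of the lecture.

```python
def median(values):
    """Return the middle observation after ranking from lowest to highest."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of observations: a single middle value exists.
        return ordered[mid]
    # Even number of observations: mid-point between the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([3, 1, 2]))     # 2
print(median([4, 1, 3, 2]))  # 2.5
```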
Mode
- The value that occurs most frequently.
- If two values occur equally frequently, and more often than any other value, the distribution is called bimodal, i.e. there are two modes.
- Appropriate for interval, ordinal, and nominal variables.
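One way to find the mode or modes in Python is `statistics.multimode` (available from Python 3.8), which returns every value tied for the highest frequency. The two small data sets below are hypothetical; the second shows that the mode also works for nominal data.

```python
from statistics import multimode

# Hypothetical data: one numeric variable, one nominal variable.
scores = [1, 2, 2, 3, 3, 4]
colours = ["red", "blue", "red", "green", "blue"]

score_modes = multimode(scores)    # values tied for most frequent
colour_modes = multimode(colours)

print(score_modes)   # [2, 3]  -> two modes, i.e. bimodal
print(colour_modes)  # ['red', 'blue']
```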
Median vs Mean
The mean is heavily influenced by outliers (observations that have extreme values), and where there are strong outliers, the median might be a better measure of central tendency, or of a ‘typical observation’.
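A small sketch of this effect, using made-up values: adding one extreme observation moves the mean a long way while the median barely changes.

```python
from statistics import mean, median

# Hypothetical values, e.g. incomes in thousands.
incomes = [20, 22, 25, 27, 30]
with_outlier = incomes + [500]   # add one extreme observation

# Without the outlier: mean is about 24.8, median is 25.
print(mean(incomes), median(incomes))

# With the outlier: mean jumps to 104, median only moves to 26.
print(mean(with_outlier), median(with_outlier))
```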
Range
- Measure of dispersion
- Largest value - smallest value
- Very sensitive to outliers; may not represent the spread of the majority of the data
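The sensitivity of the range to a single extreme value can be sketched as follows. The data are hypothetical.

```python
# Hypothetical sample of 10 observations.
data = [22, 24, 25, 26, 27, 28, 29, 30, 33, 36]
data_range = max(data) - min(data)   # largest value - smallest value

# The same sample with one extreme observation added.
with_outlier = data + [95]
outlier_range = max(with_outlier) - min(with_outlier)

print(data_range)     # 14
print(outlier_range)  # 73
```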
Percentiles
- Percentiles divide the distribution into 100ths
- The first percentile is the value below which the first 1% of the data fall, the second percentile is the value below which the first 2% fall, etc.
- The median is the 50th percentile.
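As a rough illustration, Python's `statistics.quantiles` with `n=100` returns the 99 cut points that split the data into 100ths. Software packages use slightly different interpolation conventions for percentiles, so other tools may give slightly different values; the data here are hypothetical.

```python
from statistics import median, quantiles

# Hypothetical sample of 10 observations.
data = [22, 24, 25, 26, 27, 28, 29, 30, 33, 36]

# 99 cut points: percentiles[0] is the 1st percentile,
# percentiles[49] is the 50th percentile, and so on.
percentiles = quantiles(data, n=100)
p50 = percentiles[49]

print(p50, median(data))  # the 50th percentile equals the median
```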
Quartiles
- Quartiles divide the data into quarters, and give more information than the range.
- Often presented as a boxplot
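A sketch of the quartile cut points, again with `statistics.quantiles` and hypothetical data. The five numbers collected at the end (minimum, Q1, median, Q3, maximum) are the values a basic boxplot typically draws; the exact quartile values depend on the interpolation convention used.

```python
from statistics import quantiles

# Hypothetical sample of 10 observations.
data = [22, 24, 25, 26, 27, 28, 29, 30, 33, 36]

# Three cut points split the ranked data into four quarters.
q1, q2, q3 = quantiles(data, n=4)

# Values a basic boxplot is usually built from.
five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)
```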
Variance
- Deviations from the mean (i.e. the ‘typical observation’)
- Think about the example of 10 respondents again and the distance from the mean of 28 for each respondent.
- We want a measure that summarizes all these differences from the mean.
- Try adding the distances. What is their sum?
- Not very informative! The signed deviations above and below the mean cancel out, so their sum is always zero.
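The exercise can be sketched in Python. The ten ages are invented so that their mean is 28 (the lecture's own data are not shown here). The card stops before giving a fix; squaring the deviations before averaging is the standard definition of variance, and it is assumed here that the lecture continues that way. Dividing by n - 1 gives the sample variance; some courses divide by n instead.

```python
from statistics import variance

# Hypothetical ages of 10 respondents, chosen so the mean is 28.
ages = [22, 24, 25, 26, 27, 28, 29, 30, 33, 36]

mean_age = sum(ages) / len(ages)
deviations = [x - mean_age for x in ages]

# Positive and negative deviations cancel out.
sum_of_deviations = sum(deviations)
print(sum_of_deviations)  # 0.0

# Squaring removes the signs, so the deviations no longer cancel.
squared = [d ** 2 for d in deviations]
sample_variance = sum(squared) / (len(ages) - 1)   # assumes the n - 1 divisor

print(sample_variance, variance(ages))  # both about 17.78
```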
Frequencies
- Useful for categorical data
- How many observations in each category
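Counting observations per category can be sketched with `collections.Counter`. The survey answers are made up for illustration.

```python
from collections import Counter

# Hypothetical answers from 10 respondents to a categorical question.
answers = ["yes", "no", "no", "yes", "undecided",
           "no", "yes", "no", "no", "undecided"]

frequencies = Counter(answers)   # how many observations in each category

for category, count in frequencies.items():
    print(category, count)
```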
Relative frequencies
Can be represented with a bar graph showing the relative frequency distribution.
The height of the bar shows the frequency or relative frequency in that category.
Bars are separated to emphasize that the variable is categorical.
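A rough text-only sketch of a relative frequency distribution: each count is divided by the number of observations, and a row of `#` characters stands in for the bar. The answers are hypothetical and the bar scale (20 characters = 100%) is an arbitrary choice.

```python
from collections import Counter

# Hypothetical answers from 10 respondents.
answers = ["yes", "no", "no", "yes", "undecided",
           "no", "yes", "no", "no", "undecided"]

n = len(answers)
# Relative frequency = count in the category / total number of observations.
relative = {category: count / n for category, count in Counter(answers).items()}

for category, share in relative.items():
    bar = "#" * round(share * 20)   # bar length proportional to the share
    print(f"{category:<10} {bar} {share:.2f}")
```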
Proportions
Proportions are particularly helpful when we have a dichotomous/dummy variable.
- Code the values of this variable as 0 = ‘no’, 1 = ‘yes’
Imagine our 10 respondents: 0, 1, 0, 0, 1, 1, 0, 0, 0, 1
In this case, the proportion is a special case of the mean: add up all 10 given values and divide by 10 (number of respondents) = 0.4 = proportion of respondents that answered “Yes”.
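The calculation for the 10 respondents above, written out in Python (the variable names are illustrative):

```python
# The 10 responses from the card, coded 0 = 'no', 1 = 'yes'.
responses = [0, 1, 0, 0, 1, 1, 0, 0, 0, 1]

# The mean of a 0/1 variable is the proportion of 1s.
proportion_yes = sum(responses) / len(responses)

print(proportion_yes)  # 0.4
```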
Dichotomous/dummy variables
Nominal variables with only 2 categories (yes/no; male/female; true/false).
Histograms
Frequency distributions for quantitative variables.
Values of the variable on the x (horizontal) axis and how often each value occurs on the y (vertical) axis.
As the sample size increases, the sample distribution looks more like the population distribution.
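The last point can be illustrated with a small simulation. It assumes a hypothetical population, a fair six-sided die where every face has probability 1/6, and compares how far the sample's relative frequencies are from that population distribution for a small and a large sample. The seed and sample sizes are arbitrary choices.

```python
import random
from collections import Counter

rng = random.Random(0)  # fixed seed so the run is repeatable

# Assumed population distribution: a fair die, each face has probability 1/6.
population = {face: 1 / 6 for face in range(1, 7)}


def max_gap(sample_size):
    """Largest difference between sample and population relative frequencies."""
    rolls = [rng.randint(1, 6) for _ in range(sample_size)]
    counts = Counter(rolls)
    return max(abs(counts[face] / sample_size - p)
               for face, p in population.items())


gap_small = max_gap(10)        # small sample: a rough picture
gap_large = max_gap(100_000)   # large sample: close to the population

print(gap_small, gap_large)
```

The gap for the large sample should be much smaller than for the small one, which is the sense in which the sample distribution comes to resemble the population distribution.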