Variables, Data and Statistics Flashcards
What are variables?
A feature (characteristic) of the population that is of interest
What kinds of qualitative data are there?
- Nominal
- Eye colour, job
- Ordinal (inherent order)
- Rank teaching as poor/fair/good/very good
- Order needs to be preserved
- Age bands (18-25, 26-30)
What kinds of quantitative data are there?
- Discrete (count)
- In almost every case, whole numbers
- Number of people
- Age (as of last birthday)
- Bar charts
- Continuous (interval)
- Things we’ve measured
- Height, weight, exam marks, incomes
- Age (exact)
- Histograms
- Ratio data
- Data that have all the characteristics of continuous data but also have a true zero point
What is the notation for population average?
µ
What is the notation for population variance?
σ²
What is the notation for population standard deviation?
σ
What is the notation for sample average?
X̅
What is the notation for sample variance?
s²
What is the notation for sample standard deviation?
s
What is the notation for sample correlation?
r
What is the notation for sample proportion?
p̂
What is the notation for population proportion?
p
Which measure of centre is best?
- Mean is the most commonly used, but it is sensitive to extreme values
- If the data are skewed or extreme values are present, the median is better because it is robust to outliers (e.g. real estate prices)
- Mode is generally best for categorical data (ratings, etc.)
Describe the mean median relationship
- If symmetric, mean = median
- If positive skew, mean > median
- If negative skew, mean < median
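A small sketch of the mean/median relationship, using made-up data sets (the numbers are illustrative assumptions, not from the notes):

```python
from statistics import mean, median

# Hypothetical data sets chosen only to illustrate each shape.
symmetric = [1, 2, 3, 4, 5]
positive_skew = [1, 2, 3, 4, 20]    # one large value pulls the mean up
negative_skew = [-14, 2, 3, 4, 5]   # one small value pulls the mean down

for name, data in [
    ("symmetric", symmetric),
    ("positive skew", positive_skew),
    ("negative skew", negative_skew),
]:
    # The median stays at 3 in every case; only the mean moves.
    print(f"{name}: mean={mean(data)}, median={median(data)}")
```

The extreme value drags the mean toward the tail while the median stays put, which is why the median is preferred for skewed data.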
What is the coefficient of variation?
- A measure of spread of data as a proportion of the level of the centre
- Equal to the standard deviation divided by the mean (multiplied by 100% when reported as a percentage)
- Called cv
- cv = s/X̅
- Sometimes multiplied by 100 and reported as a percentage
- Unitless (not in the same units as the data)
- Especially useful when comparing two or more sets of data that are measured in different units
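A minimal sketch of cv = s/X̅, assuming two invented samples measured in different units (the function name and the data are hypothetical):

```python
from statistics import mean, stdev

def coefficient_of_variation(data):
    """Sample standard deviation divided by the sample mean (unitless)."""
    return stdev(data) / mean(data)

# Hypothetical samples in different units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_heights = coefficient_of_variation(heights_cm)
cv_weights = coefficient_of_variation(weights_kg)

# Both samples have the same standard deviation, but relative to their
# means the weights are more spread out.
print(f"cv heights = {cv_heights:.1%}")
print(f"cv weights = {cv_weights:.1%}")
```

Because the units cancel, the two cv values can be compared directly even though one sample is in centimetres and the other in kilograms.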
How are percentiles found?
- Location of the Pth percentile (in the data sorted from smallest to largest) = ((number of data points + 1) x P)/100
- Lp = ((n+1)P)/100
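A sketch of the location formula Lp = ((n+1)P)/100. The notes only give the location; linear interpolation between neighbouring values when Lp is not a whole number is an assumption here, as are the function names and the data:

```python
def percentile_location(n, p):
    """1-based position of the Pth percentile in sorted data: (n + 1) * P / 100."""
    return (n + 1) * p / 100

def percentile_value(data, p):
    """Value at that location, interpolating linearly (an assumption)."""
    ordered = sorted(data)
    n = len(ordered)
    # Clamp to the valid positions 1..n for very small or large P (assumption).
    loc = min(max(percentile_location(n, p), 1), n)
    lower = int(loc)        # whole part of the position
    frac = loc - lower      # fractional part, used for interpolation
    if lower == n:
        return ordered[-1]
    return ordered[lower - 1] + frac * (ordered[lower] - ordered[lower - 1])

data = [18, 2, 14, 6, 10, 4, 16, 8, 12]   # hypothetical, n = 9

print(percentile_location(len(data), 50))  # position of the median
print(percentile_value(data, 25))
print(percentile_value(data, 50))
print(percentile_value(data, 75))
```

With n = 9 the 50th percentile sits at position 5, and the 25th at position 2.5, halfway between the 2nd and 3rd sorted values.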
What is a covariance?
- Measures the direction of the linear relationship between X and Y
- Sign indicates direction of slope (negative vs positive relationship), but magnitude of covariance is dependent on units of measurement (so cannot indicate strength of relationship) - doesn’t tell if big or small
Describe what covariances tell us
- If cov>0, then as X increases, Y increases; as X decreases, Y decreases
- If cov<0, then as X increases, Y decreases; as X decreases, Y increases
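A sketch of the sign and unit behaviour of covariance, assuming the usual sample formula (sum of products of deviations divided by n - 1) and invented data:

```python
from statistics import mean

def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical data: study hours, exam marks, and number of errors.
hours = [1, 2, 3, 4, 5]
marks = [52, 55, 61, 64, 68]
errors = [9, 7, 6, 4, 2]

cov_marks = sample_covariance(hours, marks)    # positive: move together
cov_errors = sample_covariance(hours, errors)  # negative: move oppositely

# Re-express hours as minutes: same relationship, different magnitude.
minutes = [h * 60 for h in hours]
cov_marks_minutes = sample_covariance(minutes, marks)

print(cov_marks, cov_errors, cov_marks_minutes)
```

Changing hours to minutes multiplies the covariance by 60 without changing the relationship, which is why its size alone says nothing about strength.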
What is the coefficient of correlation?
- Measures both the strength and direction of the linear relationship between X & Y
- Is bounded between -1 and +1; a value of -1 or +1 indicates a perfect linear relationship
- If covariance = 0 , correlation = 0 - no linear relationship
- Units cancel out
- Correlation ≠ Causation
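A sketch of the sample correlation, assuming the standard definition r = cov(X, Y) / (sX × sY) (the notes do not state the formula; the data and function names are hypothetical):

```python
from statistics import mean, stdev

def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)

def sample_correlation(xs, ys):
    """Covariance divided by the product of the sample standard deviations."""
    return sample_covariance(xs, ys) / (stdev(xs) * stdev(ys))

# Hypothetical data: study time and exam marks.
hours = [1, 2, 3, 4, 5]
minutes = [h * 60 for h in hours]   # same quantity in different units
marks = [52, 55, 61, 64, 68]

r_hours = sample_correlation(hours, marks)
r_minutes = sample_correlation(minutes, marks)

# The units cancel, so r is the same whichever unit is used for time.
print(round(r_hours, 4), round(r_minutes, 4))

# An exactly linear relationship gives r = 1 (up to rounding error).
print(sample_correlation([1, 2, 3], [2, 4, 6]))
```

Unlike covariance, r does not change when the units change, so its magnitude can be read as strength.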