Statistics Theory L7 = Statistics Basics Flashcards
Basic principles of statistics? (4)
- PSDI diagram has statistical population parameters (μ, σ, β, δ) & sample parameters (x bar, s, b, cursive s).
- Not only do we want the estimate, but we also want a measure of how good that estimate is.
- We’re forced to look at a subset of the population, because the population is large & sometimes vaguely defined.
- What is the consequence of this “subsettedness” on our ability to get a clear view of the population of interest?
Scales of measurement? (4)
- Ratio.
- Interval.
- Ordinal.
- Nominal.
Ratio attributes? (3)
- Ratios are meaningful.
- Zero means absence.
- Can be continuous data, or discrete/integer.
Eg of how ratios are meaningful?
20 elephants = 0.5 x 40 elephants.
Interval attributes? (3)
- Differences are meaningful.
- Zero is arbitrary (compare degrees C & F).
- Can be continuous data, or discrete/integer.
Eg of how differences can be meaningful?
30 degrees - 10 degrees = 20 degrees.
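A minimal sketch of why differences, but not ratios, are meaningful on an interval scale. The helper `c_to_f` and the two temperatures are illustrative assumptions, not part of the original notes.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

low_c, high_c = 10.0, 30.0
low_f, high_f = c_to_f(low_c), c_to_f(high_c)  # 50.0 and 86.0

# Differences are meaningful: the gap only rescales by the unit factor 9/5.
diff_c = high_c - low_c  # 20.0
diff_f = high_f - low_f  # 36.0, i.e. 20 * 9/5

# Ratios are not: they change with the arbitrary zero point.
ratio_c = high_c / low_c  # 3.0
ratio_f = high_f / low_f  # 1.72

print(diff_c, diff_f, ratio_c, ratio_f)
```

The same two temperatures give a ratio of 3 in C but 1.72 in F, so "three times as hot" has no meaning on this scale.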
Ordinal attributes? (2)
- Are categorical data.
- Order of categories is meaningful.
Eg of Ordinal?
Education level: primary, secondary, undergraduate, postgraduate.
Nominal attributes? (2)
- Are also categorical data.
- There’s no inherent order.
Egs of Nominal scale of measurement? (2)
- Colour.
- Sex.
Why does the type of data matter? (3)
- Affects the type of analysis we can do.
- Affects interpretation of the analysis.
- As we go down in scale (ratio -> nominal), the data contain less information. Therefore, where possible, we want to stick to higher scales.
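A rough sketch of how the scale limits the analysis: ordered categories support a median, while unordered categories support only a mode. The response lists below are made-up illustrative data, and the `rank` mapping is a hypothetical helper.

```python
import statistics

# Ordinal: categories listed in their meaningful order.
levels = ["primary", "secondary", "undergraduate", "postgraduate"]
rank = {level: i for i, level in enumerate(levels)}

responses = ["secondary", "primary", "postgraduate", "secondary", "undergraduate"]

# The order lets us find a middle category; median_low always returns
# an observed rank, so it maps back to a real category.
median_rank = statistics.median_low(rank[r] for r in responses)
median_level = levels[median_rank]

# Nominal: no order, so the most frequent category (mode) is the only
# location-type summary available.
colours = ["red", "blue", "red", "green"]
modal_colour = statistics.mode(colours)

print(median_level, modal_colour)
```

A mean of the ranks is deliberately not computed: the gaps between education levels are not known to be equal, so averaging them would assume interval-scale information the data do not contain.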
Population?
= the entire collection of entities about which we want to make an inference or draw a conclusion.
Sample?
= subset of the population, drawn because we can’t measure all entities in the population.
Simple random sample?
= a sample of n entities drawn from a population so that each entity has the same chance of selection.
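One way to draw a simple random sample, sketched with Python's standard library. The population of 100 entity IDs, the sample size, and the seed are all assumptions for illustration.

```python
import random

# Hypothetical population: 100 entities identified by the numbers 1..100.
population = list(range(1, 101))

# A fixed seed is used only so the draw is repeatable.
rng = random.Random(42)

n = 10
# Sampling without replacement; every entity has the same chance of selection.
sample = rng.sample(population, n)

print(sample)
```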
Parameter?
= the true value of something in the population that we want to know about (usually Greek letters).
Parametric statistics?
= statistical methods/models that focus on estimating parameters.
Non-parametric statistics?
= statistical methods/models that don’t focus on estimating parameters.
Statistic?
= any value calculated from sample data (usually Latin letters).
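A small sketch of the parameter/statistic distinction. The simulated population (10,000 values, mean 50, sd 10) is an assumption; in practice the population is not fully observable, which is the reason for sampling.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed only for repeatability

# Hypothetical population that we pretend to know in full.
population = [rng.gauss(50, 10) for _ in range(10_000)]

# Parameters: true values in the population (Greek letters).
mu = statistics.fmean(population)      # population mean
sigma = statistics.pstdev(population)  # population standard deviation

# Statistics: values calculated from sample data (Latin letters).
sample = rng.sample(population, 30)
x_bar = statistics.fmean(sample)  # sample mean, an estimate of mu
s = statistics.stdev(sample)      # sample sd (n - 1 divisor), an estimate of sigma

print(f"mu={mu:.2f} x_bar={x_bar:.2f} sigma={sigma:.2f} s={s:.2f}")
```

The sample values will generally sit near, but not exactly on, the population values.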
Statistics?
= tools used to make conclusions/inferences about an unknown population from a known sample.
Probability?
= tools used to make conclusions/inferences about an unknown sample from a known population.
Model?
= an approximation/simplification of reality for the purpose of improving understanding.
Statistical model?
= mathematical expression to summarise the relationship between a response (Y) variable & one or more explanatory (X) variables.
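As an illustration only, one common form of statistical model is a straight line, Y = b0 + b1 X, fitted by least squares. The notes do not specify this form, so the choice of model and the data below are assumptions.

```python
# Made-up illustrative data: one explanatory (X) and one response (Y) variable.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Least-squares estimates of the slope (b1) and intercept (b0).
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_mean - b1 * x_mean

print(f"Y = {b0:.2f} + {b1:.2f} X")
```

Here b0 and b1 are statistics calculated from the sample; they estimate the unknown population parameters of the relationship.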
Kinds of measurement? (2)
- Measures of central tendency/location.
- Measures of dispersion.
Measures of central tendency/location (& related shape descriptors)? (7)
- Mean.
- Median.
- Quantile (percentile).
- Mode.
- Symmetrical, unimodal distribution.
- Skew.
- Kurtosis.
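A minimal sketch of the first four items, using Python's standard `statistics` module. The data set is made up, and it includes one large value to show how the mean and median can separate.

```python
import statistics

# Made-up illustrative data with one large value (21).
data = [2, 3, 3, 5, 7, 8, 21]

mean = statistics.mean(data)      # 7: pulled upwards by the large value
median = statistics.median(data)  # 5: the middle value of the sorted data
mode = statistics.mode(data)      # 3: the most frequent value

# Quartiles (the 25th, 50th and 75th percentiles), default "exclusive" method.
quartiles = statistics.quantiles(data, n=4)  # [3.0, 5.0, 8.0]

print(mean, median, mode, quartiles)
```

In a symmetrical, unimodal distribution the mean, median and mode coincide. Here the mean exceeds the median, which is the usual sign of right (positive) skew.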