1. Data and Models Flashcards
Summarising numerical data, summarising attribute data, fitting a model
Population
Definition
-a collection of individuals/items of interest
Sample
Definition
-the subset of the population for which observations are available
Variable/Variate
Definition
-a quantity or attribute whose value varies between individuals
Observation
Definition
-a recorded value of a variate for an individual
Data
Definition
-a collection of observations
Statistic
Definition
-a function of the data
Summarising Numerical Data
min and max
-the minimum and maximum values of the data
Summarising Numerical Data
Measures of Location
- summary statistics which try to capture the location of the centre of the sample
1) sample mean
2) mode
3) median
Summarising Numerical Data
Sample Mean
-the sample mean or sample average of x1,…,xn∈R is given by:
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
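A minimal Python sketch of the sample-mean formula. `sample_mean` is a hypothetical helper written for these notes and the data values are made up; the standard library's `statistics.mean` is printed alongside as a cross-check.

```python
from statistics import mean


def sample_mean(xs):
    """Return (1/n) * sum(x_i) for a non-empty list of numbers.

    Hypothetical helper written for these notes, not a library function.
    """
    n = len(xs)
    if n == 0:
        raise ValueError("the sample mean is undefined for an empty sample")
    return sum(xs) / n


data = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]  # illustrative values only
print(sample_mean(data))  # 5.0
print(mean(data))         # standard-library check: 5.0
```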
Summarising Numerical Data
Mode
- the mode of a sample x1,…,xn is the value of the variate which occurs most frequently
- if two or more values share the highest frequency, the mode is not unique
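A small sketch that returns every mode, so a non-unique mode shows up as more than one value. `modes` is a hypothetical helper; it relies on `collections.Counter` for the frequency counts and also works for attribute (non-numeric) data as long as the values can be sorted.

```python
from collections import Counter


def modes(xs):
    """Return every value that occurs with the highest frequency, sorted.

    Hypothetical helper; a single-element result means the mode is unique.
    """
    if not xs:
        raise ValueError("the mode is undefined for an empty sample")
    counts = Counter(xs)           # value -> number of occurrences
    top = max(counts.values())     # highest frequency in the sample
    # Several values can tie for the highest frequency, so keep them all
    return sorted(v for v, c in counts.items() if c == top)


print(modes([1, 2, 2, 3]))     # [2]     unique mode
print(modes([1, 1, 2, 2, 3]))  # [1, 2]  mode is not unique
```

Python 3.8+ also provides `statistics.multimode`, which returns the tied values in the order they are first encountered rather than sorted.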
Summarising Numerical Data
Median
-a median of x1,…,xn∈R is any number m∈R such that:
a) at least half of the observations are less than or equal to m
AND
b) at least half of the observations are greater than or equal to m
-if the number of observations is odd, there is a unique median: the middle value of the sorted data
-if the number of observations is even, any value in the closed interval between the two middle values is a median; we usually choose the midpoint
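The definition and the midpoint convention can be checked side by side in a short sketch. Both `is_median` (a direct test of conditions a and b) and `midpoint_median` are hypothetical helpers; `statistics.median` from the standard library also takes the average of the two middle values when n is even, so it is printed for comparison.

```python
from statistics import median as stdlib_median


def is_median(m, xs):
    """Check the two-part definition: at least half of the observations are
    <= m AND at least half are >= m (hypothetical helper)."""
    n = len(xs)
    at_or_below = sum(1 for x in xs if x <= m)
    at_or_above = sum(1 for x in xs if x >= m)
    # "at least half" means count >= n/2, written as 2*count >= n to avoid fractions
    return 2 * at_or_below >= n and 2 * at_or_above >= n


def midpoint_median(xs):
    """Return the median, taking the midpoint of the two middle values when
    n is even (hypothetical helper)."""
    if not xs:
        raise ValueError("the median is undefined for an empty sample")
    s = sorted(xs)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                  # odd n: the middle value is the unique median
    return (s[mid - 1] + s[mid]) / 2   # even n: midpoint of the two middle values


odd, even = [3, 1, 2], [4, 1, 3, 2]
print(midpoint_median(odd), midpoint_median(even))                   # 2 2.5
print(is_median(2, even), is_median(3, even), is_median(3.5, even))  # True True False
print(stdlib_median(even))                                           # 2.5
```

For the even sample every value in [2, 3] passes `is_median`, while for the odd sample only 2 does, matching the two cases above.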
Summarising Numerical Data
Measures of Spread
- statistics which characterise the spread of the sample
1) range
2) sample variance
3) sample standard deviation
4) interquartile and semi-interquartile range
Summarising Numerical Data
Range
-the range of a sample of numeric observations x1,…,xn∈R is the interval:
$[\min_i x_i, \max_i x_i]$
-i.e. the smallest interval which contains all the data
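A minimal sketch returning the range as an interval, built from the min and max of the data. `sample_range` is a hypothetical helper with made-up data. Some textbooks instead call the single number max minus min the range, so the sketch prints the interval's width as well.

```python
def sample_range(xs):
    """Return the range as the interval (min, max): the smallest interval
    containing every observation (hypothetical helper)."""
    if not xs:
        raise ValueError("the range is undefined for an empty sample")
    return min(xs), max(xs)


lo, hi = sample_range([3.1, -2.0, 7.5, 0.0])
print((lo, hi))  # (-2.0, 7.5)
print(hi - lo)   # 9.5  width of the interval
```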
Summarising Numerical Data
Sample Variance
-the sample variance of x1,…,xn∈R is given by:
$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
- where $\bar{x}$ is the sample mean
- the sample variance is almost the average squared distance between the observations and the sample mean; the only difference is that the denominator is n-1 rather than n
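A short sketch of the n-1 formula, contrasted with dividing by n. `sample_variance` is a hypothetical helper and the data are illustrative; `statistics.variance` (denominator n-1) and `statistics.pvariance` (denominator n) are the standard-library counterparts.

```python
from statistics import pvariance, variance


def sample_variance(xs):
    """Return sum((x_i - mean)^2) / (n - 1) (hypothetical helper)."""
    n = len(xs)
    if n < 2:
        # With n = 1 the denominator n - 1 would be zero
        raise ValueError("the sample variance needs at least two observations")
    xbar = sum(xs) / n
    squared_dev = sum((x - xbar) ** 2 for x in xs)  # sum of squared distances to the mean
    return squared_dev / (n - 1)


data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(sample_variance(data))  # 2.5  (10 / 4)
print(variance(data))         # 2.5  stdlib sample variance, also divides by n - 1
print(pvariance(data))        # 2.0  (10 / 5) divides by n instead
```

Here the squared deviations sum to 10, so the only difference between the two versions is whether that total is divided by 4 or by 5.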
Summarising Numerical Data
Sample Standard Deviation
- sample standard deviation is the square root of the sample variance
- large values of sx indicate that the samples are spread out, while small values of sx indicate that the samples are concentrated around the sample mean
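A sketch comparing two made-up samples with the same mean (10) but different spread. `sample_sd` is a hypothetical helper taking the square root of the n-1 sample variance; `statistics.stdev` uses the same denominator and is printed as a check.

```python
import math
from statistics import stdev


def sample_sd(xs):
    """Return the square root of the sample variance (denominator n - 1).

    Hypothetical helper, not a library function.
    """
    n = len(xs)
    if n < 2:
        raise ValueError("the sample standard deviation needs at least two observations")
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))


spread_out = [0.0, 10.0, 20.0]     # mean 10, observations far from it
concentrated = [9.0, 10.0, 11.0]   # mean 10, observations close to it
print(sample_sd(spread_out), sample_sd(concentrated))  # 10.0 1.0
print(stdev(spread_out))                               # 10.0
```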