Introduction Flashcards
What does a geographical data frame comprise?
Cases + geographical references + variables (attributes/measurements about the cases)
Salary in £s per week is an example of which level of measurement?
Ratio
What are the three broad purposes of data analysis?
- Description/exploration
- Probabilistic inference and confirmation
- Modelling relationships
What measures of central tendency can be used to describe ratio data?
Mode, median, arithmetic mean, geometric mean
What measures of spread can be used to describe ordinal data, and which cannot?
The percentage of cases in the modal category and the IQR can both be used, but the standard deviation and the coefficient of variation cannot.
What is the variance, and how does its definition explain its low resistance?
The average of the squared deviations from the mean - squaring exaggerates large deviations, so outliers have a disproportionate effect.
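A minimal sketch of why the variance is not resistant, assuming the population form (dividing by n rather than n - 1, which the card does not specify). The values and the helper name population_variance are made up for illustration.

```python
from statistics import median


def population_variance(xs):
    """Average of the squared deviations from the mean (divides by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


# Hypothetical batch of values, then the same batch with one outlier added.
values = [10, 12, 11, 13, 12]
with_outlier = values + [40]

print(population_variance(values))        # small spread
print(population_variance(with_outlier))  # inflated by the single outlier
print(median(values), median(with_outlier))  # the median is unchanged
```

One extreme value multiplies the variance many times over, while a resistant measure such as the median does not move.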
By dividing the standard deviation by the mean and multiplying the result by 100, what do I get?
The coefficient of variation - a dimensionless measure that gives relative spread in comparison to the mean.
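A short sketch of the calculation, assuming the population standard deviation (the card does not say which form is meant). The heights are hypothetical; the same values in centimetres and metres give the same result, which is what dimensionless means in practice.

```python
from statistics import mean, pstdev


def coefficient_of_variation(xs):
    """Standard deviation divided by the mean, expressed as a percentage."""
    return pstdev(xs) / mean(xs) * 100


# Hypothetical heights recorded in two different units.
heights_cm = [150, 160, 170, 180]
heights_m = [1.5, 1.6, 1.7, 1.8]

print(coefficient_of_variation(heights_cm))
print(coefficient_of_variation(heights_m))  # same value: the units cancel
```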
How do you standardize values?
Subtract the mean and then divide by the standard deviation.
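As an illustration, standardization can be sketched as below, assuming the population standard deviation. The function name standardize and the data are hypothetical. Standardized values always have a mean of 0 and a standard deviation of 1.

```python
from statistics import mean, pstdev


def standardize(xs):
    """Subtract the mean from each value, then divide by the standard deviation."""
    m = mean(xs)
    s = pstdev(xs)
    return [(x - m) / s for x in xs]


# Hypothetical data with mean 5 and population standard deviation 2.
data = [2, 4, 4, 4, 5, 5, 7, 9]
z = standardize(data)
print(z)
```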
What is the coefficient of variation particularly useful for?
Comparing distributions with very different means and comparing variables measured in different units.
Boxplots are good for comparing multiple batches of data, but what do they show about each batch?
The middle (median), the extremes, the IQR and any outliers.
What equation(s) can be used to find outliers on a boxplot?
> UQ + 1.5 * IQR
< LQ - 1.5 * IQR
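The two fences can be sketched in code as follows. Quartile conventions differ between textbooks and software; this example assumes the "inclusive" method of Python's statistics.quantiles, so the fences may differ slightly from a hand calculation. The function name and data are hypothetical.

```python
from statistics import quantiles


def boxplot_outliers(xs):
    """Return values beyond 1.5 * IQR above the upper or below the lower quartile."""
    lq, _, uq = quantiles(xs, n=4, method="inclusive")
    iqr = uq - lq
    lower_fence = lq - 1.5 * iqr
    upper_fence = uq + 1.5 * iqr
    return [x for x in xs if x < lower_fence or x > upper_fence]


# Hypothetical batch with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
print(boxplot_outliers(data))
```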
What do stem and leaf plots enable us to see about the data?
The frequency distribution and overall shape of the data, the centre of the data and marked deviations.
What is important about the sum of squared deviations from the mean?
It will be less than from any other number. (The sum of absolute deviations is smallest about the median, not the mean.)
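A quick numerical check on made-up data: the sum of squared deviations is smallest when taken about the mean, whereas the sum of absolute deviations is smallest about the median. The helper names are hypothetical.

```python
from statistics import mean, median


def sum_squared_dev(xs, centre):
    """Sum of squared deviations of xs about a chosen centre."""
    return sum((x - centre) ** 2 for x in xs)


def sum_abs_dev(xs, centre):
    """Sum of absolute deviations of xs about a chosen centre."""
    return sum(abs(x - centre) for x in xs)


# Hypothetical skewed batch: mean and median differ.
data = [1, 2, 3, 4, 20]
m, md = mean(data), median(data)

print(sum_squared_dev(data, m), sum_squared_dev(data, md))  # smaller about the mean
print(sum_abs_dev(data, m), sum_abs_dev(data, md))          # smaller about the median
```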
How can the arithmetic mean be applied to nominal data?
Binary variables can be coded as 0 and 1, with the mean giving us the proportion of cases in the ‘1’ class.
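A small sketch of the idea, using a hypothetical urban/rural variable where "urban" is coded 1 and everything else 0.

```python
from statistics import mean

# Hypothetical nominal variable with two categories.
responses = ["urban", "rural", "urban", "urban"]

# Code the category of interest as 1 and the other as 0.
coded = [1 if r == "urban" else 0 for r in responses]

# The mean of the 0/1 codes is the proportion of cases in the '1' class.
proportion_urban = mean(coded)
print(proportion_urban)
```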
Describe the difference between inferential, explanatory and relational statistics.
Inferential - going beyond the data to say something about a population.
Relational - why two events coincide.
Explanatory - what caused an event.