Introduction Flashcards
What does a geographical data frame comprise?
Cases + geographical references + variables (attributes/measurements about the cases)
Salary in £s per week is an example of which measurement?
Ratio
What are the three broad purposes of data analysis?
- Description/exploration
- Probabilistic inference and confirmation
- Modelling relationships
What measures of central tendency can be used to describe ratio data?
Mode, median, arithmetic mean and geometric mean.
What measures of spread can be used to describe ordinal data, and which cannot?
% in the mode and IQR can both be used, but standard deviation and the coefficient of variation cannot.
What is the variance, and why does this explain its low resistance?
The average of the squared deviations from the mean. Because each deviation is squared, outliers have a disproportionately large effect.
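A minimal Python sketch of why squaring makes the variance sensitive to outliers. The data values are made up for illustration, and the function divides by n (the population form, matching "average of the squared deviations"); a sample variance would divide by n - 1 instead.

```python
def variance(values):
    """Population variance: the mean of the squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

batch = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical batch of data
with_outlier = batch + [40]        # the same batch plus one extreme value

print(variance(batch))         # variance without the outlier
print(variance(with_outlier))  # a single outlier inflates the variance sharply
```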
By dividing the standard deviation by the mean, and multiplying the outcome by 100 what do I get?
The coefficient of variation - a dimensionless measure that gives relative spread in comparison to the mean.
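A short Python sketch of the coefficient of variation, assuming the population standard deviation is used. The heights are hypothetical; recording them in centimetres or metres gives the same result, which is what "dimensionless" means in practice.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the mean."""
    return pstdev(values) / mean(values) * 100

heights_cm = [150, 160, 170, 180, 190]      # hypothetical heights
heights_m = [h / 100 for h in heights_cm]   # same heights in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(cv_cm, cv_m)  # the two values agree apart from floating-point rounding
```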
How do you standardize values?
Subtract the mean and then divide by the standard deviation.
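As an illustration, standardizing might look like this in Python. The data are made up, and the population standard deviation is assumed.

```python
from statistics import mean, pstdev

def standardize(values):
    """Convert each value to a z-score: (value - mean) / standard deviation."""
    m = mean(values)
    s = pstdev(values)
    return [(v - m) / s for v in values]

# Hypothetical data with mean 5 and standard deviation 2
z_scores = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z_scores)
```

The standardized values have a mean of 0 and a standard deviation of 1.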
What is the coefficient of variation particularly useful for?
Comparing distributions with very different means and comparing variables measured in different units.
Boxplots are good for comparing multiple batches of data, but what do they show about each batch?
The middle (median), the extremes, the IQR and any outliers.
What equation(s) can be used to find outliers on a boxplot?
Any value > UQ + 1.5*IQR
Any value < LQ - 1.5*IQR
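A sketch of the 1.5*IQR rule in Python. Quartile conventions differ between textbooks and software; this version assumes the "inclusive" method of `statistics.quantiles`, and the data are hypothetical.

```python
from statistics import quantiles

def boxplot_outliers(values):
    """Return the values lying beyond 1.5 * IQR from the quartiles."""
    lq, _median, uq = quantiles(values, n=4, method="inclusive")
    iqr = uq - lq
    lower_fence = lq - 1.5 * iqr
    upper_fence = uq + 1.5 * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]

data = [1, 2, 3, 4, 5, 6, 7, 8, 50]   # hypothetical batch
print(boxplot_outliers(data))
```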
What do stem and leaf plots enable us to see about the data?
The frequency distribution and overall shape of the data, the centre of the data and marked deviations.
What is important about the sum of squared deviations from the mean?
It is smaller than the sum of squared deviations from any other number. (For absolute deviations, the median has this property.)
How can the arithmetic mean be applied to nominal data?
Binary variables can be coded as 0 and 1, and the mean then gives the proportion of cases in the ‘1’ class.
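For example, with a hypothetical 0/1 variable recording whether each case owns a car:

```python
# 1 = owns a car, 0 = does not (made-up data for illustration)
owns_car = [1, 0, 1, 1, 0, 0, 1, 1]

# The arithmetic mean of a 0/1 variable is the proportion of 1s
proportion_owning = sum(owns_car) / len(owns_car)
print(proportion_owning)
```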
Describe the difference between inferential, explanatory and relational statistics.
Inferential - going beyond the data to say something about a population.
Explanatory - what caused an event.
Relational - why two events coincide.
How do you calculate the standard deviation?
The square root of the variance.
When calculating the trimmed mean, what is g? And how can g:0 and g:1 be otherwise described?
g = robustness factor.
g:0 is the arithmetic mean (no values trimmed) whilst g:1 represents guarding against 2 outliers.
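A minimal Python sketch of a trimmed mean, assuming g is the number of values dropped from each end of the sorted data (so g:1 removes two values in total). The function name and data are illustrative only.

```python
def trimmed_mean(values, g):
    """Mean after dropping the g smallest and g largest values."""
    ordered = sorted(values)
    if 2 * g >= len(ordered):
        raise ValueError("g trims away every value")
    kept = ordered[g:len(ordered) - g]
    return sum(kept) / len(kept)

data = [1, 4, 5, 6, 7, 100]    # hypothetical batch with an extreme value at each end
print(trimmed_mean(data, 0))   # g:0 is the ordinary arithmetic mean
print(trimmed_mean(data, 1))   # g:1 drops the values 1 and 100
```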
For data with a positive skew, the median, mode and mean occur in which order?
Mode < Median < Mean
Starting with x^3 as a transformation for negatively skewed data, write out the ladder of power.
x^3, x^2, x, sqrt(x), ln(x), -1/sqrt(x), -1/x, -1/x^2
What skew is found in most distributions and what transformation is used to correct this?
Positive - a log transformation corrects this.
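A rough Python illustration of a log transformation removing positive skew. The skewness measure used here (third central moment divided by the cubed standard deviation) is one common choice, not necessarily the one the course uses, and the data are made up.

```python
from math import log

def skewness(values):
    """Moment-based skewness: positive for a long right tail, 0 if symmetric."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

raw = [1, 10, 100, 1000, 10000]   # hypothetical, strongly right-skewed
logged = [log(v) for v in raw]    # natural log of each value

print(skewness(raw))      # clearly positive
print(skewness(logged))   # approximately zero: the logged values are symmetric
```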
If an outlier is a mistake, you can correct or omit it. If not, what two steps should you follow?
Try a transformation. If there is still no apparent reason, do a with/without analysis.
What does the multiplication theorem state?
That for independent events, the probability of them occurring together is given by the product of their individual probabilities.
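The theorem can be checked by brute force on a small example. This sketch assumes two fair dice and two events chosen purely for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

def probability(event):
    """Proportion of outcomes for which the event holds."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

p_first_six = probability(lambda o: o[0] == 6)
p_second_even = probability(lambda o: o[1] % 2 == 0)
p_both = probability(lambda o: o[0] == 6 and o[1] % 2 == 0)

# For independent events the joint probability equals the product
print(p_both, p_first_six * p_second_even)
```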
A permutation is a set of objects in a given order, whereas in a combination the order does not matter. How can we find the number of possible combinations of a sample?
C(population size, sample size), e.g. C(3,2) = 3!/(2!(3-2)!) = 3
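In Python (3.8 or later) the same count can be obtained with `math.comb`, and `itertools.combinations` lists the samples themselves. The three-item population below is hypothetical.

```python
from itertools import combinations
from math import comb, factorial, perm

population = ["A", "B", "C"]   # hypothetical population of size 3
sample_size = 2

n_combinations = comb(len(population), sample_size)   # order ignored
n_permutations = perm(len(population), sample_size)   # order counted

# The factorial formula gives the same number of combinations
by_formula = factorial(3) // (factorial(2) * factorial(3 - 2))

samples = list(combinations(population, sample_size))
print(n_combinations, n_permutations, samples)
```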
What is probabilistic inference?
Making an inference from a sample to a population and calculating the chances of this being wrong.
After stating H1, collecting representative data and stating H0, what are stages 4, 5 and 6 in hypothesis testing?
4 - specify significance level
5 - choose statistical test
6 - calculate test statistic
What does a p value tell you?
The probability of getting a result at least as extreme as the one observed, by chance alone, if H0 is true.
Under what circumstance can we reject H0 for a given p value?
When p<alpha
What does a sampling distribution show?
All possible values of the test statistic that could be obtained under H0, together with their probabilities.
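A small worked sketch tying together the sampling distribution, the p value and alpha. It assumes a hypothetical experiment (10 coin flips, 9 heads), H0 that the coin is fair, and a one-sided test; none of these details come from the cards themselves.

```python
from math import comb

n_flips = 10          # hypothetical experiment
observed_heads = 9    # hypothetical result
alpha = 0.05          # chosen significance level

# Sampling distribution under H0 (fair coin): P(k heads) for k = 0..n_flips
sampling_distribution = [comb(n_flips, k) / 2 ** n_flips
                         for k in range(n_flips + 1)]

# One-sided p value: probability of a result at least as extreme as observed
p_value = sum(sampling_distribution[observed_heads:])

reject_h0 = p_value < alpha
print(p_value, reject_h0)
```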
Linear association between two variables is measured by what unit free indicator of both strength and direction?
Pearson's correlation coefficient, r. Where r = -1 there is a perfect negative linear correlation, where r = 0 there is no linear correlation, and where r = 1 there is a perfect positive linear correlation.
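A from-scratch Python sketch of r, written out to show what the coefficient measures. The data are invented; the last example shows that r = 0 rules out only a linear relationship.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's r: co-variation of x and y scaled by their individual spreads."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r_positive = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])   # perfect positive line
r_negative = pearson_r([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])   # perfect negative line
r_curve = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])     # y = x^2, no linear trend
print(r_positive, r_negative, r_curve)
```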
What is ‘error’?
The deviation of observed points from their expected (predicted) values.
To make all deviations positive, we use the sum of the squared error. How do you calculate this?
Sum of (y - ŷ)^2, where y is the observed value and ŷ (y-hat) is the predicted value.
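A minimal Python sketch of the calculation. The observed and predicted values are hypothetical and stand in for the output of some fitted model.

```python
def sum_squared_error(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))

observed = [3, 5, 7, 10]         # hypothetical y values
predicted = [2.5, 5, 7.5, 9]     # hypothetical predictions (y-hat)

sse = sum_squared_error(observed, predicted)
print(sse)
```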