Introduction Flashcards

1
Q

What does a geographical data frame comprise?

A

Cases + geographical references + variables (attributes/measurements about the cases)

2
Q

Salary in £s per week is an example of which level of measurement?

A

Ratio

3
Q

What are the three broad purposes of data analysis?

A
  • Description/exploration
  • Probabilistic inference and confirmation
  • Modelling relationships
4
Q

What measures of central tendency can be used to describe ratio data?

A

Mode, median, arithmetic mean and geometric mean.
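
As an illustration, the four measures can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up ratio data; the values and variable names are assumptions, not part of the cards.

```python
import statistics

# Made-up ratio-scale values (all positive, as the geometric mean requires).
values = [1, 2, 4, 4, 8]

mode = statistics.mode(values)                      # most frequent value
median = statistics.median(values)                  # middle value when sorted
arithmetic_mean = statistics.fmean(values)          # sum divided by count
geometric_mean = statistics.geometric_mean(values)  # n-th root of the product

print(mode, median, arithmetic_mean, geometric_mean)
```

The geometric mean is only defined here for positive values, which is one reason it suits ratio data.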

5
Q

What measures of spread can be used to describe ordinal data, and which cannot?

A

The percentage in the mode and the IQR can both be used, but the standard deviation and the coefficient of variation cannot.

6
Q

What is the variance, and why does this explain its low resistance?

A

The average of the squared deviations from the mean. Squaring magnifies large deviations, so any outliers have a disproportionate effect.
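
A small sketch of why squaring makes the variance sensitive to outliers. It assumes the population form (dividing by n, as in "the average of the squared deviations"); the sample form divides by n - 1 instead. The data are made up.

```python
def variance(values):
    """Population variance: the mean of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

batch = [2, 4, 4, 4, 5, 5, 7, 9]    # made-up values
with_outlier = batch[:-1] + [90]    # same batch with the 9 replaced by 90

var_clean = variance(batch)
var_outlier = variance(with_outlier)

print(var_clean, var_outlier)
```

Changing a single value inflates the variance by more than two orders of magnitude in this example.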

7
Q

By dividing the standard deviation by the mean, and multiplying the outcome by 100 what do I get?

A

The coefficient of variation - a dimensionless measure that gives relative spread in comparison to the mean.
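
The calculation can be sketched as follows. The helper name and the data are illustrative assumptions, and the population standard deviation is used. Expressing the same heights in centimetres and in metres gives the same result, which is what "dimensionless" means in practice.

```python
import statistics

def coefficient_of_variation(values):
    """Standard deviation divided by the mean, expressed as a percentage."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)   # population standard deviation
    return sd / mean * 100

heights_cm = [150, 160, 170, 180, 190]      # made-up values
heights_m = [h / 100 for h in heights_cm]   # same data in different units

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(round(cv_cm, 2), round(cv_m, 2))
```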

8
Q

How do you standardize values?

A

Subtract the mean and then divide by the standard deviation.
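
A minimal sketch of standardization on made-up data. The function name is an assumption, and the population standard deviation is used; a course may use the sample form instead.

```python
import statistics

def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)   # population standard deviation
    return [(x - mean) / sd for x in values]

z_scores = standardize([2, 4, 4, 4, 5, 5, 7, 9])   # mean 5, sd 2

print(z_scores)
```

The standardized values have a mean of 0 and a standard deviation of 1.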

9
Q

What is the coefficient of variation particularly useful for?

A

Comparing distributions with very different means and comparing variables measured in different units.

10
Q

Boxplots are good for comparing multiple batches of data, but what do they show about each batch?

A

The middle, the extremes, the IQR and any outliers.

11
Q

What equation(s) can be used to find outliers on a boxplot?

A

Value > UQ + 1.5*IQR (high outlier)

Value < LQ - 1.5*IQR (low outlier)
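
The two fences can be sketched in code as below. Quartile conventions vary between textbooks and software; this example assumes the "inclusive" method of `statistics.quantiles`, and the data and function name are made up.

```python
import statistics

def boxplot_outliers(values):
    """Return values beyond the 1.5 * IQR fences around the quartiles."""
    # Assumption: quartiles from the 'inclusive' method; other methods
    # can give slightly different LQ and UQ values.
    lq, _median, uq = statistics.quantiles(values, n=4, method="inclusive")
    iqr = uq - lq
    lower_fence = lq - 1.5 * iqr
    upper_fence = uq + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

batch = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]   # made-up values
outliers = boxplot_outliers(batch)

print(outliers)
```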

12
Q

What do stem and leaf plots enable us to see about the data?

A

The frequency distribution and overall shape of the data, the centre of the data and marked deviations.

13
Q

What is important about the sum of squared deviations from the mean?

A

It will be less than the sum of squared deviations from any other number.

14
Q

How can the arithmetic mean be applied to nominal data?

A

Binary variables can be coded as 0 and 1, with the mean giving us the proportion of cases in the ‘1’ class.
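
For instance, with a made-up binary variable (the variable name and values are assumptions for illustration):

```python
# 1 = household owns a car, 0 = household does not (made-up data)
owns_car = [1, 0, 1, 1, 0, 0, 1, 1]

# The arithmetic mean of a 0/1 variable is the proportion of 1s.
proportion_with_car = sum(owns_car) / len(owns_car)

print(proportion_with_car)
```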

15
Q

Describe the difference between inferential, explanatory and relational statistics.

A

Inferential - go beyond the data to say something about a population.
Relational - why two events coincide.
Explanatory - what caused an event.

16
Q

How do you calculate the standard deviation?

A

The square root of the variance.

17
Q

When calculating the trimmed mean, what is g? And how can g:0 and g:1 be otherwise described?

A

g = robustness factor.

g:0 is the arithmetic mean (no values trimmed) whilst g:1 represents guarding against 2 outliers.
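
A sketch of the trimmed mean, assuming g is the number of values removed from each end of the sorted data (so g = 1 drops the lowest and the highest value). That reading is an assumption based on the card, and the data and function name are made up.

```python
def trimmed_mean(values, g):
    """Mean after dropping the g smallest and g largest values."""
    if g < 0 or 2 * g >= len(values):
        raise ValueError("g must be non-negative and leave at least one value")
    ordered = sorted(values)
    kept = ordered[g:len(ordered) - g]
    return sum(kept) / len(kept)

batch = [2, 4, 4, 4, 5, 5, 7, 90]   # made-up values with one high outlier

mean_g0 = trimmed_mean(batch, 0)    # ordinary arithmetic mean
mean_g1 = trimmed_mean(batch, 1)    # drops 2 and 90

print(mean_g0, mean_g1)
```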

18
Q

For data with a positive skew, the median, mode and mean occur in which order?

A

Mode, then median, then mean (mode < median < mean).

19
Q

Starting with x^3 as a transformation for negatively skewed data, write out the ladder of powers.

A

x^3, x^2, x, sqrt(x), ln(x), -1/sqrt(x), -1/x, -1/x^2

20
Q

What skew is found in most distributions and what transformation is used to correct this?

A

Positive skew - a log transformation corrects this.
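
A rough sketch of the effect on made-up, positively skewed data. Skewness is measured here with the moment coefficient (third central moment divided by the cubed standard deviation), which is one common choice and an assumption of this example. In this small sample the log reduces the skew rather than removing it entirely.

```python
import math

def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 3, 4, 5, 10, 100]        # made-up values with a long right tail
logged = [math.log(x) for x in raw]   # natural log of each value

skew_raw = skewness(raw)
skew_logged = skewness(logged)

print(skew_raw, skew_logged)
```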

21
Q

If an outlier is a mistake, you can correct or omit it. If not, what two steps should you follow?

A

Try a transformation. If there is still no apparent reason, do a with/without analysis.

22
Q

What does the multiplication theorem state?

A

That for independent events, the probability of them occurring together is given by the product of their individual probabilities.
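
As a check on a simple case, the sketch below compares the product rule with a direct count of outcomes for two fair dice. The dice example is an assumption chosen for illustration.

```python
from fractions import Fraction
from itertools import product

# Probability of a six on one fair die.
p_six = Fraction(1, 6)

# Multiplication theorem: the two rolls are independent, so multiply.
p_by_product = p_six * p_six

# Direct check: enumerate all 36 equally likely outcomes of two dice.
outcomes = list(product(range(1, 7), repeat=2))
double_sixes = sum(1 for a, b in outcomes if a == 6 and b == 6)
p_by_counting = Fraction(double_sixes, len(outcomes))

print(p_by_product, p_by_counting)
```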

23
Q

A permutation is a set of objects in a given order, whereas in a combination the order does not matter. How can we find the number of possible combinations of a sample?

A

C(population size, sample size) = n!/(k!(n-k)!), where n is the population size and k is the sample size. For example, C(3,2) = 3!/(2!(3-2)!) = 3.
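
The C(3,2) example can be checked in Python, counting unordered selections (combinations) alongside ordered ones (permutations). The population labels are made up.

```python
import math
from itertools import combinations, permutations

population = ["A", "B", "C"]   # made-up population of size 3
sample_size = 2

n_combinations = math.comb(len(population), sample_size)   # order ignored
n_permutations = math.perm(len(population), sample_size)   # order counted

# Listing them out shows that AB and BA count as one combination
# but as two permutations.
listed_combinations = list(combinations(population, sample_size))
listed_permutations = list(permutations(population, sample_size))

print(n_combinations, listed_combinations)
print(n_permutations, listed_permutations)
```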

24
Q

What is probabilistic inference?

A

Making an inference from a sample to a population and calculating the chances of this being wrong.

25
Q

After stating H1, collecting representative data and stating H0, what are stages 4, 5 and 6 in hypothesis testing?

A

4 - specify the significance level
5 - choose the statistical test
6 - calculate the test statistic

26
Q

What does a p value tell you?

A

The probability of getting the observed value (or one more extreme) by chance if H0 is true.

27
Q

Under what circumstance can we reject H0 for a given p value?

A

When p is less than the chosen significance level.

28
Q

What does a sampling distribution show?

A

All possible results that can be obtained under H0.

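
A small worked sketch, assuming a made-up experiment: a coin is flipped 10 times, H0 says the coin is fair, and 9 heads are observed. The sampling distribution lists every possible number of heads with its probability under H0, and a one-sided p value (an assumption of this example) sums the probabilities of results at least as extreme as the one observed.

```python
from fractions import Fraction
from math import comb

N_FLIPS = 10   # made-up experiment: 10 flips of a coin assumed fair under H0

# Sampling distribution under H0: probability of each possible number of heads.
sampling_distribution = {
    heads: Fraction(comb(N_FLIPS, heads), 2 ** N_FLIPS)
    for heads in range(N_FLIPS + 1)
}

observed_heads = 9

# One-sided p value: chance of 9 or more heads if H0 is true.
p_value = sum(
    p for heads, p in sampling_distribution.items() if heads >= observed_heads
)

print(p_value, float(p_value))
```

With a significance level of 0.05, this p value (about 0.011) would lead to rejecting H0.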
29
Q

Linear association between two variables is measured by what unit-free indicator of both strength and direction?

A

Pearson's correlation coefficient, r. Where r = -1 there is perfect negative linear correlation, where r = 0 there is no linear correlation, and where r = 1 there is perfect positive linear correlation.

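
A minimal sketch of the calculation on made-up data, written out by hand rather than with a library call. The function name and values are assumptions for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r: covariation of x and y scaled by their individual variation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
r_rising = pearson_r(xs, [2, 4, 6, 8, 10])    # points on a rising straight line
r_falling = pearson_r(xs, [10, 8, 6, 4, 2])   # points on a falling straight line

print(r_rising, r_falling)
```

Because both numerator and denominator carry the same units, they cancel and r is unit free.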
30
Q

What is 'error'?

A

The deviation of each point from its expected value.

31
Q

To make all deviations positive, we use the sum of the squared errors. How do you calculate this?

A

Sum of (y - ŷ)^2, where y is the observed value and ŷ is the expected value.
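
The calculation can be sketched as follows; the observed and expected values are made up, and the function name is an assumption.

```python
def sum_squared_error(observed, expected):
    """Sum of squared differences between observed and expected values."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(observed, expected))

observed = [3, 5, 7, 10]   # made-up y values
expected = [2, 5, 8, 8]    # made-up expected values (y-hat)

sse = sum_squared_error(observed, expected)   # 1 + 0 + 1 + 4

print(sse)
```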