Introduction Flashcards

1
Q

What does a geographical data frame comprise?

A

Cases + geographical references + variables (attributes/measurements about the cases)

2
Q

Salary in £s per week is an example of which measurement?

A

Ratio

3
Q

What are the three broad purposes of data analysis?

A
  • Description/exploration
  • Probabilistic inference and confirmation
  • Modelling relationships
4
Q

What measures of central tendency can be used to describe ratio data?

A

Mode, median, arithmetic mean and geometric mean.
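
As a rough illustration, all four measures can be computed with Python's standard `statistics` module. The sample `values` below are invented for the example and are not from the course material.

```python
from statistics import geometric_mean, mean, median, mode

# Hypothetical ratio-scale data (made up for illustration).
values = [1, 2, 2, 4, 8]

summary = {
    "mode": mode(values),                      # most frequent value
    "median": median(values),                  # middle value when sorted
    "arithmetic mean": mean(values),           # sum divided by count
    "geometric mean": geometric_mean(values),  # nth root of the product
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

The geometric mean only makes sense for positive values, which is one reason it is reserved for ratio data.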

5
Q

What measures of spread can be used to describe ordinal data, and which cannot?

A

The percentage in the mode and the IQR can both be used, but the standard deviation and the coefficient of variation cannot.

6
Q

What is the variance, and how does its definition explain its low resistance?

A

The average of the squared deviations from the mean. Squaring magnifies large deviations, so outliers have a disproportionate effect.
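
A minimal sketch of this effect, assuming the population form of the variance (dividing by n; the sample form divides by n - 1). The data and the helper name `mean_squared_deviation` are invented for the example.

```python
from statistics import median, pvariance

def mean_squared_deviation(data):
    """Population variance: the mean of the squared deviations from the mean."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

values = [10, 12, 11, 13, 12]   # hypothetical batch
with_outlier = values + [60]    # same batch plus one extreme value

var_clean = mean_squared_deviation(values)
var_outlier = mean_squared_deviation(with_outlier)

print("variance without outlier:", var_clean)
print("variance with outlier:   ", var_outlier)
# The median, a resistant measure, barely reacts to the same outlier.
print("medians:", median(values), median(with_outlier))
```

A single extreme value inflates the variance by orders of magnitude, while the median stays where it was.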

7
Q

If I divide the standard deviation by the mean and multiply the result by 100, what do I get?

A

The coefficient of variation - a dimensionless measure that gives relative spread in comparison to the mean.
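
The calculation can be sketched as follows. The function name and the height data are hypothetical, and the population standard deviation (`pstdev`) is an assumption; a course may use the sample version instead.

```python
from statistics import mean, pstdev

def coefficient_of_variation(data):
    """Standard deviation expressed as a percentage of the mean."""
    return pstdev(data) / mean(data) * 100

# The same hypothetical heights recorded in two different units.
heights_cm = [150, 160, 170, 180, 190]
heights_m = [h / 100 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(round(cv_cm, 2), round(cv_m, 2))
```

Because the units cancel, both versions give the same percentage, which is what "dimensionless" means in practice.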

8
Q

How do you standardize values?

A

Subtract the mean and then divide by the standard deviation.
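
A short sketch of the two steps, using invented data and assuming the population standard deviation:

```python
from statistics import mean, pstdev

def standardize(data):
    """Convert each value to a z-score: (value - mean) / standard deviation."""
    m = mean(data)
    s = pstdev(data)
    return [(x - m) / s for x in data]

raw = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values: mean 5, sd 2
z_scores = standardize(raw)

print(z_scores)
```

The standardized values have a mean of 0 and a standard deviation of 1, so each z-score reads as "how many standard deviations from the mean".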

9
Q

What is the coefficient of variation particularly useful for?

A

Comparing distributions with very different means and comparing variables measured in different units.

10
Q

Boxplots are good for comparing multiple batches of data, but what do they show about each batch?

A

The middle (median), the extremes, the IQR and any outliers.

11
Q

What equation(s) can be used to find outliers on a boxplot?

A

A value is an outlier if it is:

> UQ + 1.5*IQR, or

< LQ - 1.5*IQR
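
A hedged sketch of these two fences in Python. The data are invented, and `statistics.quantiles` uses one of several quartile conventions, so the exact fence values may differ slightly from a hand calculation or from other software.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]   # hypothetical batch

# quantiles(..., n=4) returns the three quartile cut points.
lq, _median, uq = quantiles(data, n=4)
iqr = uq - lq

lower_fence = lq - 1.5 * iqr
upper_fence = uq + 1.5 * iqr

outliers = [x for x in data if x < lower_fence or x > upper_fence]
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```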

12
Q

What do stem and leaf plots enable us to see about the data?

A

The frequency distribution and overall shape of the data, the centre of the data and marked deviations.

13
Q

What is important about the sum of absolute deviations from the mean?

A

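It is not the minimum: the sum of absolute deviations is smallest about the median. It is the sum of squared deviations that is smaller about the mean than about any other number.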

14
Q

How can the arithmetic mean be applied to nominal data?

A

Binary variables can be split into categories of 0 and 1, with the mean giving us the proportion of data in the ‘1’ class.
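
For instance, with an invented housing-tenure variable coded as 0/1:

```python
from statistics import mean

# Hypothetical nominal variable with two categories.
tenure = ["owner", "renter", "owner", "owner", "renter"]

# Code the category of interest as 1 and the other as 0.
is_owner = [1 if t == "owner" else 0 for t in tenure]

proportion_owners = mean(is_owner)
print(proportion_owners)   # the share of cases in the '1' class
```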

15
Q

Describe the difference between inferential, explanatory and relational statistics.

A

Inferential - going beyond the data to say something about a population.
Relational - why two events coincide.
Explanatory - what caused an event.

16
Q

How do you calculate the standard deviation?

A

The square root of the variance.

17
Q

When calculating the trimmed mean, what is g? And how can g:0 and g:1 be otherwise described?

A

g = robustness factor.

g:0 is the arithmetic mean (no values trimmed) whilst g:1 represents guarding against 2 outliers.
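
One way to sketch this, assuming g means "drop the g smallest and g largest values before averaging" (consistent with g:1 guarding against two outliers). The function name and data are hypothetical.

```python
def trimmed_mean(data, g):
    """Mean of the data after removing the g lowest and g highest values."""
    if g < 0 or 2 * g >= len(data):
        raise ValueError("g must leave at least one value to average")
    ordered = sorted(data)
    kept = ordered[g:len(ordered) - g]
    return sum(kept) / len(kept)

batch = [3, 5, 4, 6, 100, 1]   # hypothetical batch with one high outlier

print(trimmed_mean(batch, 0))  # g:0, the ordinary arithmetic mean
print(trimmed_mean(batch, 1))  # g:1, lowest and highest values removed
```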

18
Q

For data with a positive skew, the median, mode and mean occur in which order?

A

Mode < Median < Mean

19
Q

Starting with x^3 as a transformation for negatively skewed data, write out the ladder of power.

A

x^3, x^2, x, sqrt(x), ln(x), -1/sqrt(x), -1/x, -1/x^2

20
Q

What skew is found in most distributions and what transformation is used to correct this?

A

Positive skew. A log transformation usually corrects it.
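
A small sketch of how moving down the ladder of powers reduces positive skew. The data are invented, and the `skewness` helper uses the moment coefficient of skewness, which is one common definition among several.

```python
from math import log, sqrt

def skewness(data):
    """Moment coefficient of skewness: third central moment / sd cubed."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n
    m3 = sum((x - m) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

raw = [1, 2, 2, 3, 4, 5, 8, 15, 40, 120]   # hypothetical, long right tail

skew_raw = skewness(raw)
skew_sqrt = skewness([sqrt(x) for x in raw])
skew_log = skewness([log(x) for x in raw])

print(round(skew_raw, 2), round(skew_sqrt, 2), round(skew_log, 2))
```

Each step down the ladder pulls in the right tail a little more, so the skewness shrinks from the raw values to the square roots to the logs.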

21
Q

If an outlier is a mistake, you can correct or omit it. If not, what two steps should you follow?

A

Try a transformation. If there is still no apparent reason, do a with/without analysis.

22
Q

What does the multiplication theorem state?

A

That for independent events, the probability of them occurring together is given by the product of their individual probabilities.
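
As a check on the rule, the sketch below enumerates every outcome of two dice (an example chosen here, not taken from the cards) and compares the counted joint probability with the product of the individual probabilities.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two dice.
outcomes = list(product(range(1, 7), repeat=2))

def probability(event):
    """Share of outcomes for which the event function returns True."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

p_first_is_six = probability(lambda o: o[0] == 6)
p_second_is_even = probability(lambda o: o[1] % 2 == 0)
p_both = probability(lambda o: o[0] == 6 and o[1] % 2 == 0)

print(p_first_is_six, p_second_is_even, p_both)
```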

23
Q

A permutation is a set of objects in a given order, whereas in a combination the order does not matter. How can we find the number of possible combinations of a sample?

A

C(population size, sample size) = n!/(k!(n-k)!), e.g. C(3,2) = 3!/(2!(3-2)!) = 3
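
The count can be checked by listing the samples directly. The three-item population below is a made-up illustration of the C(3,2) case.

```python
from itertools import combinations, permutations
from math import comb, factorial

population = ["A", "B", "C"]   # hypothetical population of 3
n, k = len(population), 2      # draw samples of size 2

by_formula = factorial(n) // (factorial(k) * factorial(n - k))
unordered_samples = list(combinations(population, k))
ordered_samples = list(permutations(population, k))

print(by_formula, comb(n, k))   # formula and built-in agree
print(unordered_samples)        # AB, AC, BC
print(len(ordered_samples))     # more results once order is counted
```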

24
Q

What is probabilistic inference?

A

Making an inference from a sample to a population and calculating the chances of this being wrong.

25
Q

After stating H1, collecting representative data and stating H0, what are stages 4,5 and 6 in hypothesis testing?

A

4 - specify significance level
5 - choose statistical test
6 - calculate test statistic

26
Q

What does a p value tell you?

A

The probability of getting a result at least as extreme as the one observed, if H0 is true.

27
Q

Under what circumstance can we reject H0 for a given p value?

A

When p<alpha

28
Q

What does a sampling distribution show?

A

All possible results that can be obtained under H0, and how probable each one is.
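
The last three cards can be tied together with a small worked case. Everything here is an assumption made for illustration: a coin tossed 10 times, 9 heads observed, H0 "the coin is fair", a one-sided test, and alpha = 0.05.

```python
from fractions import Fraction
from math import comb

n_tosses = 10
observed_heads = 9
alpha = Fraction(5, 100)

# Sampling distribution under H0: probability of each possible head count.
sampling_distribution = {
    heads: Fraction(comb(n_tosses, heads), 2 ** n_tosses)
    for heads in range(n_tosses + 1)
}

# One-sided p value: chance of a result at least as extreme as observed.
p_value = sum(
    prob for heads, prob in sampling_distribution.items()
    if heads >= observed_heads
)

reject_h0 = p_value < alpha
print(p_value, float(p_value), reject_h0)
```

The p value is simply the tail of the sampling distribution at or beyond the observed result, and H0 is rejected only when that tail is smaller than alpha.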

29
Q

Linear association between two variables is measured by what unit free indicator of both strength and direction?

A

Pearson's correlation coefficient, r. Where r = -1 there is perfect negative correlation, where r = 0 there is no linear correlation, and where r = 1 there is perfect positive correlation.
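
A minimal sketch of the calculation, written out by hand rather than relying on a library routine. The function name and the three small data sets are invented to hit the values -1, 0 and 1.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Covariation of x and y divided by the product of their spreads."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]    # exact upward straight line
falling = [-3 * x + 4 for x in xs]  # exact downward straight line
curved = [x ** 2 for x in xs]       # perfect but non-linear relationship

r_rising = pearson_r(xs, rising)
r_falling = pearson_r(xs, falling)
r_curved = pearson_r(xs, curved)
print(r_rising, r_falling, r_curved)
```

The curved case shows why r = 0 only rules out a linear association: y depends entirely on x there, yet r is zero.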

30
Q

What is ‘error’?

A

The deviation of points from their expected value.

31
Q

To make all deviations positive, we use the sum of squared errors. How do you calculate this?

A

Sum of (y - ŷ)^2, where ŷ is the expected (predicted) value of y.
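
A short sketch with invented observed and predicted values, chosen so that the raw errors cancel out while the squared errors do not:

```python
observed = [1, 5, 5, 9]    # hypothetical y values
predicted = [2, 4, 6, 8]   # hypothetical expected values (y-hat)

errors = [y - y_hat for y, y_hat in zip(observed, predicted)]
sum_of_errors = sum(errors)
sum_of_squared_errors = sum(e ** 2 for e in errors)

print(errors)                  # positive and negative deviations
print(sum_of_errors)           # these cancel each other out
print(sum_of_squared_errors)   # squaring makes every term positive
```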