Revision Flashcards

1
Q

4 parts of statistics -

What is descriptive statistics?

A

Summarising data usefully

2
Q

4 parts of statistics -

What is inferential statistics?

A

(Interpolation) Measured data telling us things about unmeasured data

3
Q

4 parts of statistics -

What is significance statistics?

A

Are the patterns in the data collected, or the results of the analysis, meaningful rather than just due to chance?

Assessed using the p-value

4
Q

4 parts of statistics -

What is prediction statistics?

A

(Extrapolation) What does the data we have lead us to expect in different situations?

5
Q

Probability distributions - what’s on the x and y axis?

A
X-axis = outcomes 
Y-axis = probability
6
Q

State the characteristics of the nominal measurement scale (10% of marks)

A

Nominal data are where individuals have been categorised.
An example -
- Data on first languages of students on the geography course.
- There’s no inherent order to these categories.
- Each individual takes only one value of a nominal variable (one first language).

7
Q

State the characteristics of the ordinal measurement scale (10% of marks)

A

Individuals are ranked according to a criterion, or placed into ordered categories
There is no standard value for the difference between ranks; they are just 1st, 2nd, 3rd and so on.
Example - top Welsh Universities of 2019
1st = Swansea University
2nd = Aberystwyth University
3rd = Bangor University
4th = Cardiff University

8
Q

State the characteristics of the interval measurement scale (10% of marks)

A

Numerical measurement data with an arbitrary origin (zero does not mean "none")
Examples include temperature in degrees Celsius or Fahrenheit, and pH values in a lake.
Values can go below zero, and ratios are not meaningful (20 °C is not twice as hot as 10 °C).

9
Q

State the characteristics of the ratio measurement scale (10% of marks)

A

Numerical measurement data with a meaningful origin, where 0 means none of the quantity.
Examples include lengths and counts, e.g. metres or number of people.
Ratios are meaningful: doubling a value gives twice as much
(2 metres doubled is 4 metres)

10
Q

What are data?

  • what’s a population
A

The whole body of individuals in which we are interested

11
Q

What are data?

  • what are the individuals?
A

The members of the population
E.g. the towns in a country

Each row corresponds to an individual
Each column corresponds to a variable

12
Q

What are data?

  • what are the variables?
A

The characteristics measured for each individual, e.g. the number of schools, the population, or the number of cars per household in each town

Each row corresponds to an individual
Each column corresponds to a variable

13
Q

What are data?

  • what is a sample?
A

A collection of individuals drawn from a population.

It is rarely practical to obtain data for a whole population.

14
Q

What are the measures of central tendency

A

Mean, median and mode
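
As a quick check of these three measures, here is a minimal sketch using Python's standard `statistics` module. The `cars` values are made up for illustration and are not from the course.

```python
import statistics

# Hypothetical data: cars per household in seven households (made-up values).
cars = [1, 2, 2, 3, 4, 5, 11]

mean_cars = statistics.mean(cars)      # sum of values / number of observations
median_cars = statistics.median(cars)  # middle value of the sorted data
mode_cars = statistics.mode(cars)      # value with the highest frequency

print(mean_cars, median_cars, mode_cars)
```

Here the mean is pulled upwards by the single large value, while the median and mode are not.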

15
Q

What are the measures of dispersion

A
(Simple) range 
Inter-quartile range 
Standard deviation (and variance)
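Skewness (strictly a measure of the shape of the distribution)
Kurtosis (strictly a measure of the shape of the distribution)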
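
The first three measures can be sketched with Python's standard `statistics` module. The `data` values are made up, and note that `statistics.quantiles` uses one particular quartile convention (other software may give slightly different quartiles).

```python
import statistics

# Hypothetical set of eight measurements (made-up values).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Simple range: largest value minus smallest value.
simple_range = max(data) - min(data)

# Quartiles: with n=4, quantiles() returns the three cut points [Q1, Q2, Q3].
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1  # inter-quartile range: spread of the middle half of the data

# Population versions (divide by n).
pop_variance = statistics.pvariance(data)
pop_sd = statistics.pstdev(data)

# Sample versions (divide by n - 1), used when the data are a sample.
sample_variance = statistics.variance(data)
sample_sd = statistics.stdev(data)

print(simple_range, iqr, pop_variance, pop_sd, sample_sd)
```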
16
Q

What is the mean

A

The sum of values in a data set divided by the number of observations

17
Q

What is the median

A

The middle observation / average of the two middle observations

18
Q

What is the mode

A

The value that has the highest frequency

19
Q

What is the range

A

The difference between the smallest and largest values in a dataset

20
Q

What is the inter-quartile range?

A

The difference between the upper quartile (Q3) and the lower quartile (Q1) of the ranked values in a dataset, i.e. the spread of the middle 50% of values

21
Q

What’s the standard deviation

A

It measures the dispersion around the mean

22
Q

What is the variance

A

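Square of the standard deviation (so it is expressed in squared units of the variable)

To compare the spread of variables measured in different units, the coefficient of variation (standard deviation divided by the mean) is usually more suitable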

23
Q

What is the Skewness

A

Indicates how a dataset is distributed about the central value - how symmetrical is the distribution?
Helps to decide whether the data are suitable for a parametric test or not

24
Q

What is the kurtosis

A

Measures the extent to which data are concentrated in one part of the frequency distribution - how peaky is the distribution?

25
Q

Explain what the relationship between the mean, median and mode of a dataset reveals about its skewness. (20% of marks)

A

If mean > median > mode, then the skew is positive (to the right)

If mean = median = mode, then the skewness = 0 (AKA symmetrical)

If mean < median < mode, then the skew is negative (to the left)
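
A minimal sketch of this rule, using the moment coefficient of skewness (one common definition, chosen here as an assumption). The three datasets are made up so that each case appears once.

```python
import statistics

def skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (central moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

right_skewed = [1, 2, 2, 3, 4, 5, 11]      # long tail of high values
symmetric = [1, 2, 3, 3, 3, 4, 5]          # mirror image about 3
left_skewed = [-v for v in right_skewed]   # mirror of the first dataset

for name, values in [("right", right_skewed),
                     ("symmetric", symmetric),
                     ("left", left_skewed)]:
    print(name,
          statistics.mean(values),
          statistics.median(values),
          statistics.mode(values),
          round(skewness(values), 3))
```

The ordering of mean, median and mode is a rule of thumb; it holds for these datasets but can fail for unusual distributions.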

26
Q

What are the different types of kurtosis?

A

Positive excess kurtosis = leptokurtic (taller, narrower) (excess kurtosis > 0)

Zero excess kurtosis = mesokurtic (normal distribution) (excess kurtosis = 0)

Negative excess kurtosis = platykurtic (lower, wider) (excess kurtosis < 0)
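
The three labels can be illustrated with a small sketch that computes excess kurtosis (fourth central moment divided by the squared variance, minus 3, so that a normal distribution scores 0). The function names and both datasets are hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (central moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3

def kurtosis_label(k, tolerance=1e-9):
    """Map an excess kurtosis value to its name."""
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]                       # evenly spread values
peaky = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]    # tight centre, two extreme values

for name, values in [("flat", flat), ("peaky", peaky)]:
    k = excess_kurtosis(values)
    print(name, round(k, 2), kurtosis_label(k))
```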

27
Q

Explain the common elements of statistical tests

A
  • a question about the data
  • a hypothesis: H1 (alternative) or H0 (null hypothesis, the default)
  • tests may be one-tailed or two-tailed
  • the test gives us a significance level for the answer
  • allows us to say how confident we are in the result
28
Q

What is significance?

A

Significance (the p-value) is the probability of getting a result at least as extreme as the one observed if the null hypothesis were true, i.e. by chance alone
It is not the probability that the null hypothesis is true
In other words: if the null hypothesis were true, how likely would the observed outcome be?
Therefore we want the p-value to be small if we want to reject the null hypothesis
Typically we want it to be less than 0.05 or even 0.01
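
To make the "how likely would the observed outcome be?" question concrete, here is a small sketch with a made-up experiment: a coin is flipped 10 times and lands heads 9 times, and the null hypothesis is that the coin is fair. The function name `binomial_tail` is hypothetical.

```python
from math import comb

def binomial_tail(n, k, p=0.5):
    """P(X >= k) when X counts successes in n trials with success chance p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

n_flips, n_heads = 10, 9  # hypothetical experiment

# One-tailed: chance of 9 or more heads if the coin is fair.
p_one_tailed = binomial_tail(n_flips, n_heads)

# Two-tailed: equally extreme results in either direction.
# Doubling works here because the fair-coin distribution is symmetric.
p_two_tailed = min(1.0, 2 * p_one_tailed)

print(round(p_one_tailed, 4), round(p_two_tailed, 4))
print("reject H0 at 0.05:", p_two_tailed < 0.05)
```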

29
Q

What is confidence?

A

The complement of the significance level, expressed as a %: how sure we are that the result isn’t just due to chance.
Can be calculated by subtracting the significance level from 1 and multiplying by 100.
Start at 0.05 significance (95% confidence), then tighten to 0.01 significance (99% confidence) if possible

30
Q

Name two tests that can be run as 1-tailed tests …

A

T-test

Correlation

31
Q

Why are 1-tailed tests useful

A

Useful if we expect a pattern in a particular direction at the outset (before analysing the data)
All of the significance level sits in one tail, so the test has more power to detect an effect in that direction

32
Q

What are parametric tests?

Give examples

A
Tests that assume the variables have normal distributions
More powerful than non-parametric tests
Examples -
T-test 
Correlation 
Regression
33
Q

What is a normal distribution?

A

Applies to many quantities where values are clustered around a mean value
Many variables in geography are (assumed to be) normally-distributed
This is when you use parametric statistical tests

34
Q

What percentage of a normally distributed data set lies within 1, 2 or 3 standard deviations of the mean?

A

About 68% within 1 SD
About 95% within 2 SD
About 99.7% within 3 SD
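
These percentages can be checked with `statistics.NormalDist` from Python's standard library. The helper name `share_within` is hypothetical; the sketch assumes a perfectly normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    # Prints roughly 68.3, 95.4 and 99.7 percent.
    print(k, round(share_within(k) * 100, 1))
```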

35
Q

When designing research … what questions?

A

What are your research questions?
What’s the population?
What are the variables?
What type of data? (Nominal, ordinal, scale (interval or ratio))
Always start with scale data first if possible
What statistical tests are appropriate?

36
Q

What statistical test is nominal data limited to?

A

Chi-squared test

37
Q

Hypotheses - what are the null and alternative hypotheses, and which one should we always assume is correct?

A
Null hypothesis (H0) = no significant difference 
Alternative hypothesis (H1) = there is a significant difference 
Always assume the null hypothesis is correct until the evidence lets us reject it
38
Q

What is accuracy?

A

A measure of how close the result of the experiment is to the true value (lack of bias) - therefore it is a measure of correctness of the result

39
Q

What is precision?

A

A measure of how well the result has been determined, without reference to its agreement with the true value - it is a measure of the reproducibility of the result

40
Q

When does bias arise?

A

When the sampling method over- or under-represents particular characteristics of the population.

41
Q

Fundamentals of sampling -
For the individuals in a population, sampling should be equal and independent.
This means:

A
  • All individuals have the same chance of inclusion
  • The inclusion of a given individual should not affect the chance of selection of any other individual

42
Q

What are the non-parametric test equivalents of the parametric Student’s t-test?

A

Mann-Whitney U test
Kruskal-Wallis test (for more than two groups)
Chi-squared (for counts in categories)

43
Q

What’s the standard error of a mean?

A

Quantifies the probable relationship between the sample mean and the population mean
Quantifies the width of the sampling distribution
If we measure a particular sample mean, how close to the population mean is that likely to be?
Equivalent to the standard deviation of the sampling distribution
- The standard error is the sampling distribution’s own special version of the standard deviation
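
In practice the standard error of the mean is usually estimated from a single sample as the sample standard deviation divided by the square root of the sample size. A minimal sketch with made-up values:

```python
import math
import statistics

# Hypothetical sample of eight measurements (made-up values).
sample = [2, 4, 4, 4, 5, 5, 7, 9]

sample_sd = statistics.stdev(sample)                 # divides by n - 1
standard_error = sample_sd / math.sqrt(len(sample))  # SE = s / sqrt(n)

print(round(sample_sd, 3), round(standard_error, 3))
```

Because of the square root, quadrupling the sample size only halves the standard error.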

44
Q

What 2 statistical tests could you use for correlation?

A

Pearson’s product-moment correlation coefficient (for interval and ratio data)

Spearman’s rank correlation coefficient (for ordinal data)
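
The link between the two coefficients can be sketched directly: Spearman's coefficient is Pearson's coefficient applied to the ranks of the data. The helper names and the data are hypothetical, and the simple `ranks` function assumes there are no tied values.

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) upwards; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]  # rises with x, but not in a straight line

print(round(pearson(x, y), 3), round(spearman(x, y), 3))
```

The relationship is perfectly monotonic, so Spearman's coefficient is 1, while Pearson's is slightly below 1 because the relationship is not linear.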

45
Q

What does ‘there is a relationship between x and y’ mean?

A

As x increases, y increases

46
Q

What does ‘there is an association between x and y’ mean?

A

As x changes, y changes

47
Q

What does ‘there is a positive / negative correlation between x and y’ mean?

A

x and y are positively / negatively correlated

48
Q

What are the two types of t-test?

A

1) comparing the means of sample and population (allows us to calculate using an estimate of the population mean)
2) comparing means of two samples

49
Q

Which statistician devised the Student’s t-test?

A

William Sealy Gosset (1876-1937)

Employed by Guinness

50
Q

For the t-test sample vs population comparison, how do we estimate the standard deviation?

A

Use the sample standard deviation

51
Q

What does the T-statistic represent?

A

The difference between the means, scaled by an estimate of the standard error
Gives us a measure of the overlap between the two samples
Can be positive or negative
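
A minimal sketch of both kinds of t-statistic, with made-up data. The two-sample version uses Welch's form (separate variance for each sample), which is an assumption here; a pooled-variance version is also common.

```python
import math
import statistics

def one_sample_t(sample, population_mean):
    """(sample mean - population mean) / estimated standard error."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - population_mean) / se

def two_sample_t(a, b):
    """Difference between two sample means / standard error of that difference (Welch)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se

site_a = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical measurements
site_b = [4, 6, 6, 6, 7, 7, 9, 11]    # same spread, mean two units higher

t_one = one_sample_t(site_a, 3)       # positive: sample mean above 3
t_two = two_sample_t(site_a, site_b)  # negative: site_a mean below site_b mean

print(round(t_one, 3), round(t_two, 3))
```

Turning a t-statistic into a p-value needs the t-distribution (from tables or a statistics library), which this sketch leaves out.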

52
Q

What is a P-P or a Q-Q plot?

A

A plot of the quantiles (Q-Q) or cumulative proportions (P-P) of a variable’s distribution against the quantiles (or proportions) of any of a number of test distributions

53
Q

What are probability plots used to determine?

A

Whether the distribution of a variable matches a given distribution
If the selected variable matches the test distribution, the points cluster around a straight line

54
Q

What is a confidence interval?

A

Calculated from the standard error
It’s the range of values in which we are confident the true population mean lies
Imagine we want to be 95% confident the true mean lies within our confidence interval. This means there must only be a 5% chance that it lies outside the interval.
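
A rough sketch of a 95% confidence interval for a mean, built as mean ± z × standard error. It uses the normal critical value (about 1.96) for simplicity; for a small sample like this made-up one, a t-distribution critical value would normally be used and would give a wider interval.

```python
import math
import statistics

# Hypothetical sample of eight measurements (made-up values).
sample = [2, 4, 4, 4, 5, 5, 7, 9]
confidence = 0.95

mean = statistics.mean(sample)
se = statistics.stdev(sample) / math.sqrt(len(sample))

# Leave (1 - confidence) / 2 of the probability in each tail.
z = statistics.NormalDist().inv_cdf(1 - (1 - confidence) / 2)

lower, upper = mean - z * se, mean + z * se
print(round(z, 2), round(lower, 2), round(upper, 2))
```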

55
Q

What’s a type 1 error?

A

If we conclude there is a relationship / pattern / presence where none exists (false positive)

56
Q

What is a type 2 error?

A

If we conclude there is no relationship / pattern / presence where in fact one does exist (false negative)

57
Q

Is a type 1 or type 2 error worse?

A

We would prefer to make a type 2 error than a type 1 error (dependent on the scenario)

58
Q

Give an example of a bad type 2 error

A

‘Love Canal’ (a former landfill site)
Land contamination
It was concluded there was no contamination when in fact there was.
The site held 22,000 tonnes of chemical waste. Heavy rain in 1977 brought chemicals to the surface and turned the new neighbourhood into a toxic waste area.

59
Q

Give an example of a bad type 1 error

A

‘Andrew Wakefield’
Link between MMR vaccine and autism in children
Type 1 error - inferred a causal relationship between receiving a vaccine and developing autism, when no relationship exists.

60
Q

How do you prevent type 1 and type 2 errors?

A

Don’t use statistical tests blind - challenge and test them for robustness
Think about how big a sample you need before collecting data

61
Q

Which axes do the independent and dependent variables go on?

A
Independent = x-axis (distance from smelter)
Dependent = y-axis (acidity - the acidity of the lake depends on distance from the smelter)
62
Q

What are the two main tests for correlation?

A

Pearson’s product-moment correlation coefficient (interval / ratio data)

Spearman’s rank correlation coefficient (ordinal data)

63
Q

What is a misleading statistic?

A

Misuse of numerical data (whether purposeful or not)

Results in misleading the reader

64
Q

What’s the difference between purposeful and selective bias?

A

Purposeful bias deliberately influences data by omission or adjustment. Selective bias is deliberately sampling a certain demographic and/or misrepresenting the sample

65
Q

What is data fishing?

A

Large volumes of data are analysed to look for relationships and potential correlations
Undertaken without an initial hypothesis, which is what makes it a misuse

66
Q

What does correlation tell us?

A

If there is a statistically significant relationship between two variables x and y

67
Q

What is a residual?

A

The regression line is unlikely to pass through all of the observed values
For any particular value of x, the predicted value of the dependent variable (ŷ) will differ from the observed value (y)
These deviations (y - ŷ) are known as residuals from the regression
The smaller the residuals, the better the fit

In short: the difference between the data points and the line (actual and predicted values of y)
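
A minimal sketch of residuals from a least-squares line, with made-up data. Residuals are taken here as observed minus predicted, which is the usual convention but an assumption as far as this deck is concerned; flipping the sign does not change their sizes.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

x_values = [1, 2, 3, 4]   # hypothetical independent variable
y_values = [2, 4, 5, 8]   # hypothetical dependent variable

slope, intercept = fit_line(x_values, y_values)
predicted = [intercept + slope * x for x in x_values]

# Residual = observed y minus predicted y, one per data point.
residuals = [y - y_hat for y, y_hat in zip(y_values, predicted)]

print(round(slope, 2), round(intercept, 2))
print([round(r, 2) for r in residuals])
```

With an intercept in the model, least-squares residuals always sum to zero, so their sizes (not their total) indicate the quality of fit.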

68
Q

Homoscedasticity ….

What are the differences between homoscedastic and heteroscedastic data?

A

Homoscedastic = variation around the fitted line is the same at all points along it

Heteroscedastic = variation around the fitted line varies

69
Q

What’s the difference between correlation and regression?

A

Correlation tests the relationship between two variables

Regression requires us to identify independent and dependent variables

70
Q

What is the F-statistic used for in regression?

A

To determine if the overall fit is significant
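
For a simple regression with one independent variable, the F-statistic compares the variation explained by the line with the variation left in the residuals. The sketch below is one way to compute it, with made-up data and a hypothetical function name.

```python
def regression_f_statistic(x, y):
    """F = (explained sum of squares / 1) / (residual sum of squares / (n - 2))."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n

    # Least-squares line.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    predicted = [intercept + slope * a for a in x]

    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_residual = sum((b - p) ** 2 for b, p in zip(y, predicted))
    ss_regression = ss_total - ss_residual

    df_regression = 1      # one independent variable
    df_residual = n - 2    # n points minus two fitted parameters
    return (ss_regression / df_regression) / (ss_residual / df_residual)

x_values = [1, 2, 3, 4]   # hypothetical independent variable
y_values = [2, 4, 5, 8]   # hypothetical dependent variable

f_stat = regression_f_statistic(x_values, y_values)
print(round(f_stat, 2))
```

A large F suggests the line explains much more variation than it leaves unexplained; judging significance still requires comparing F with the F-distribution for (1, n - 2) degrees of freedom, which is not done here.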