Revision Flashcards

Question

Explain what the relationship between the mean, median and mode of a dataset reveals about its skewness. (20% of marks)

Answer 1

- If mean > median > mode, the skew is positive (to the right)
- If mean = median = mode, the skewness is 0 (i.e. symmetrical)
- If mean < median < mode, the skew is negative (to the left)
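A minimal Python sketch of this rule of thumb, assuming a made-up right-skewed sample. The data and the `skew_direction` helper are illustrative, not part of the cards, and the comparison is only a quick guide rather than a formal skewness measure.

```python
from statistics import mean, median, mode

def skew_direction(values):
    """Classify skew by comparing mean and median (rule of thumb only)."""
    m, med = mean(values), median(values)
    if m > med:
        return "positive"
    if m < med:
        return "negative"
    return "zero"

# Hypothetical sample with a long tail of large values.
sample = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

print("mean:", mean(sample), "median:", median(sample), "mode:", mode(sample))
print("skew:", skew_direction(sample))
```

For this sample the mean is 5, the median is 3 and the mode is 2, so mean > median > mode and the skew is positive.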

Answer 2

- Positive kurtosis = leptokurtic (taller, narrower) (kurtosis > 0)
- Zero kurtosis = mesokurtic (normal distribution) (kurtosis = 0)
- Negative kurtosis = platykurtic (lower, wider) (kurtosis < 0)
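A rough sketch of how the sign could be checked, assuming the simple moment-based definition of excess kurtosis (fourth central moment divided by the squared variance, minus 3). Statistics packages often apply a small-sample correction, so their values may differ slightly. The function name and both samples are made up for illustration.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3

flat = [1, 2, 3, 4, 5]                      # evenly spread values
peaked = [0, 0, 0, 0, 0, 0, 0, 0, -5, 5]    # tight centre, heavy tails

print(excess_kurtosis(flat))    # negative, so platykurtic
print(excess_kurtosis(peaked))  # positive, so leptokurtic
```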

Answer 3

- a question about the data
- hypothesis H1 or H0 (default = null hypothesis)
- tests may be one-tailed or two-tailed
- the test gives us a significance level for the answer
- allows us to say how confident we are in the result

Answer 4

Significance (the p-value) answers the question: if the null hypothesis were true, how likely would an outcome at least as extreme as the observed one be? It is often loosely described as the probability that the result is due to chance.
We therefore want the p-value to be small if we are to reject the null hypothesis.
Typically we want it to be less than 0.05, or even 0.01.
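A small sketch of turning a test statistic into a p-value, assuming the statistic follows a standard normal distribution under the null hypothesis (a z-test). The function name and the example value are illustrative only.

```python
from statistics import NormalDist

def p_value_from_z(z, two_tailed=True):
    """P-value for a z statistic, assuming a standard normal null distribution."""
    tail = 1 - NormalDist().cdf(abs(z))  # area beyond |z| in one tail
    return 2 * tail if two_tailed else tail

z = 1.96  # hypothetical test statistic
print(f"two-tailed p = {p_value_from_z(z):.3f}")
print(f"one-tailed p = {p_value_from_z(z, two_tailed=False):.3f}")
```

A z of about 1.96 sits right at the conventional 0.05 threshold for a two-tailed test.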

Answer 5

The probability that the result isn’t due to chance, expressed as a %.
Can be calculated by subtracting the significance from 1 and multiplying by 100.
Start at 0.05 significance (95% confidence), then go down to 0.01 significance (99% confidence) if possible.

Answer 6

T-test | Correlation

Answer 7

Useful if we think we can see a pattern in the data at the outset.
It’s a more stringent test, therefore more useful.

Answer 8

- Variables with normal distributions
- More powerful than non-parametric tests
- Examples: t-test, correlation, regression

Answer 9

Applies to many quantities where values are clustered around a mean value.
Many variables in geography are (assumed to be) normally distributed.
This is when you use parametric statistical tests.

Answer 10

- 68% within 1 SD
- 95% within 2 SD
- 99.7% within 3 SD
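These percentages can be checked against the standard normal distribution. A minimal sketch using Python's standard library (the `share_within` helper is an illustrative name):

```python
from statistics import NormalDist

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```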

Answer 11

- What are your research questions?
- What’s the population?
- What are the variables?
- What type of data? (Nominal, ordinal, scale (interval or ratio))
- Always start with scale data first if possible
- What statistical tests are appropriate?

Answer 12

Chi squared??

Answer 13

- Null hypothesis (H0) = no significant difference
- Alternative hypothesis (H1) = there is a significant difference
- Always assume the null hypothesis is correct

Answer 14

A measure of how close the result of the experiment is to the true value (lack of bias) - therefore it is a measure of correctness of the result

Answer 15

A measure of how well the result has been determined, without reference to its agreement with the true value - it is a measure of the reproducibility of the result

Answer 16

When the sampling method over- or under-represents particular characteristics of the population.

Answer 17

- All individuals have the same chance of inclusion
- The inclusion of a given individual should not affect the chance of selection of any other individual

Answer 18

- Mann-Whitney U test
- Kruskal-Wallis test
- Chi-squared test

Answer 19

- Quantifies the probable relationship between the sample mean and the population mean
- Quantifies the width of the sampling distribution
- If we measure a particular sample mean, how close to the population mean is that likely to be?
- Equivalent to the standard deviation of the sampling distribution: the standard error is the sampling distribution's own special version of the standard deviation
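A short sketch of the usual estimate of the standard error of the mean, the sample standard deviation divided by the square root of the sample size. The sample values and function name are made up for illustration.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
print(f"standard error = {standard_error(sample):.3f}")
```

Larger samples give a smaller standard error, so the sample mean is likely to sit closer to the population mean.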

Answer 20

- Pearson's product-moment correlation coefficient (for interval and ratio data)
- Spearman's rank correlation coefficient (for ordinal data)
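A plain-Python sketch of both coefficients, assuming Spearman's coefficient is computed as Pearson's coefficient applied to the ranks, with tied values sharing their average rank. All function names and the example data are illustrative.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman_rho(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Hypothetical data: y rises with x, but not in a straight line.
xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]
print(f"Pearson r    = {pearson_r(xs, ys):.3f}")
print(f"Spearman rho = {spearman_rho(xs, ys):.3f}")
```

Because the relationship here is monotonic but curved, Spearman's coefficient is exactly 1 while Pearson's is slightly below 1.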

Answer 21

As x increases, y increases

Answer 22

As x changes, y changes

Answer 23

x and y are positively / negatively correlated

Answer 24

1) comparing the means of sample and population (allows us to calculate using an estimate of the population mean)
2) comparing means of two samples

Answer 25

William Sealy Gosset (1876-1937) | Employed by Guinness

Answer 26

Use the sample

Answer 27

- The difference between the means, scaled by an estimate of the standard error
- Gives us a measure of the overlap between the two samples
- Can be positive or negative
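A minimal sketch of this statistic for two independent samples, assuming the unpooled (Welch) estimate of the standard error. The cards do not say which variant is intended, so treat that choice, the function name and the data as assumptions.

```python
from math import sqrt
from statistics import mean, variance

def t_statistic(a, b):
    """Difference between sample means divided by the estimated standard error."""
    # Unpooled (Welch) standard error of the difference between the means.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical samples.
site_a = [5, 6, 7, 8, 9]
site_b = [1, 2, 3, 4, 5]
print(t_statistic(site_a, site_b))  # positive: site_a has the larger mean
print(t_statistic(site_b, site_a))  # same size, opposite sign
```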

Answer 28

Plot of the quantiles (or proportions) of a variable's distribution against the quantiles (or proportions) of any of a number of test distributions

Answer 29

Whether the distribution of a variable matches a given distribution.
If the selected variable matches the test distribution, the points cluster around a straight line.

Answer 30

- Calculated from the standard error
- It’s the range of values in which we are confident the true population mean lies
- Imagine we want to be 95% confident the true mean lies within our confidence interval. This means there must only be a 5% chance that it lies outside the interval.
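A rough sketch of building such an interval from the standard error, assuming a normal approximation (a z critical value of about 1.96 for 95%). For small samples a t critical value would usually be more appropriate. The function name and data are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    se = stdev(sample) / sqrt(len(sample))          # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    centre = mean(sample)
    return centre - z * se, centre + z * se

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical measurements
low, high = confidence_interval(sample)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Asking for 99% confidence gives a wider interval, since less probability is allowed to fall outside it.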

Answer 31

If we conclude there is a relationship / pattern / presence where none exists (false positive)

Answer 32

If we conclude there is no relationship / pattern / presence where in fact one does exist (false negative)

Answer 33

We would rather make a type 2 error than a type 1 error (depending on the scenario)

Answer 34

'Love Canal': land contamination.
Concluded there was no contamination when there was in fact contamination.
Former landfill site holding 22,000 tonnes of chemical waste. Heavy rain in 1977 caused chemicals to come to the surface and turned the new neighbourhood into a toxic waste area.

Answer 35

'Andrew Wakefield': link between MMR vaccine and autism in children.
Type 1 error: inferred a causal relationship between receiving a vaccine and developing autism, when no relationship exists.

Answer 36

- Don’t use statistical tests blind: challenge and test them for robustness
- Think about how big a sample you need before collecting data

Answer 37

- Independent = x-axis (distance from smelter)
- Dependent = y-axis (acidity: the acidity of the lake depends on distance from the smelter)

Answer 38

- Pearson's product-moment correlation coefficient (interval / ratio data)
- Spearman's rank correlation coefficient (ordinal data)

Answer 39

Misuse of numerical data (whether purposeful or not) | Results in misleading the reader

Answer 40

Purposeful bias deliberately influences data by omission or adjustment.
Selective bias is deliberately sampling a certain demographic and/or misrepresenting the sample.

Answer 41

Large volumes of data are analysed to explore relationships and potential correlations.
Undertaken without an initial hypothesis, which makes it misuse.

Answer 42

If there is a statistically significant relationship between two variables x and y

Answer 43

- It is unlikely that the regression line will pass through all of the observed values
- The predicted value of the dependent variable (ŷ) will differ from the observed value (y) for any particular value of x
- These deviations (y - ŷ) are known as residuals from the regression
- The smaller the residuals, the better the fit
- In short: the difference between the line and the data points (predicted and actual values of y)
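A small sketch of fitting a straight line by ordinary least squares and listing the residuals, taking each residual as observed minus predicted. The function names and the four data points are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

def residuals(xs, ys):
    """Observed y minus predicted y for each data point."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical observations that do not lie exactly on a line.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print([round(r, 2) for r in residuals(xs, ys)])
```

With a least squares fit that includes an intercept, the residuals sum to zero, so the size of individual residuals (not their total) indicates how well the line fits.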

Answer 44

- Homoscedastic = variation around the fitted line is the same at all points along it
- Heteroscedastic = variation around the fitted line varies

Answer 45

Correlation tests relationship between two variables, | Regression requires us to identify independent and dependent variables

Answer 46

To determine if the overall fit is significant

(70 cards)