Revision Flashcards
4 parts of statistics -
What is descriptive statistics?
Summarising data usefully
4 parts of statistics -
What is inference statistics?
(Interpolation) Measured data telling us things about unmeasured data
4 parts of statistics -
What is significance statistics?
Are the data collected, or the analysis made, meaningful?
Assessed using the p-value
4 parts of statistics -
What is prediction statistics?
(Extrapolation) What does the data we have lead us to expect in different situations?
Probability distributions - what’s on the x and y axes?
X-axis = outcomes; Y-axis = probability
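Worked example (a minimal sketch, not part of the original cards): the distribution of the total of two fair dice, chosen here purely as an illustration. The outcomes (totals 2 to 12) would go on the x-axis and their probabilities on the y-axis.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely rolls give each total.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# x-axis: outcomes (totals 2..12); y-axis: probability of each outcome.
distribution = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, prob in distribution.items():
    # Crude text bar chart: one '#' per roll that gives this total.
    print(f"{total:>2} {'#' * counts[total]} {prob}")

# The probabilities of all possible outcomes add up to 1.
total_probability = sum(distribution.values())
```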
State the characteristics of the nominal measurement scale (10% of marks)
Nominal data are where individuals have been categorised.
An example -
- Data on first languages of students on the geography course.
- There’s no inherent order to these categories.
- Each individual takes exactly one value of a nominal variable (one first language).
State the characteristics of the ordinal measurement scale (10% of marks)
Individuals are ranked according to a criterion, or placed into ordered categories.
There is no standard value for the difference between ranks; they are just 1st, 2nd and 3rd.
Example - top Welsh Universities of 2019
1st = Swansea University
2nd = Aberystwyth University
3rd = Bangor University
4th = Cardiff University
State the characteristics of the interval measurement scale (10% of marks)
Numerical measurement data with an arbitrary origin: zero does not mean "none".
Examples include temperatures in degrees Celsius or Fahrenheit, and pH values in a lake.
Differences between values are meaningful but ratios are not, and values on these scales can go below zero.
State the characteristics of the ratio measurement scale (10% of marks)
Numerical measurement data with a meaningful origin, where 0 means none of the quantity.
Examples include lengths and counts, e.g. metres or number of people.
Ratios are meaningful: doubling an amount gives twice as much (2 m doubled is 4 m).
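Worked example (a sketch with made-up numbers): why ratios make sense on a ratio scale but not on an interval scale. The variable names are illustrative only; the Celsius-to-kelvin offset of 273.15 is the standard conversion.

```python
# Ratio scale: zero means "none", so ratios are meaningful.
length_a_m, length_b_m = 2.0, 4.0
length_ratio = length_b_m / length_a_m  # 4 m really is twice 2 m

# Interval scale: 0 degrees Celsius is an arbitrary origin,
# so the naive ratio is misleading.
temp_a_c, temp_b_c = 10.0, 20.0
naive_ratio = temp_b_c / temp_a_c  # 2.0, but 20 C is not "twice as hot" as 10 C

# Converting to kelvin (a ratio scale) gives the physically meaningful ratio.
kelvin_ratio = (temp_b_c + 273.15) / (temp_a_c + 273.15)

# Differences are still meaningful on an interval scale.
temp_difference = temp_b_c - temp_a_c

print(length_ratio, naive_ratio, round(kelvin_ratio, 3), temp_difference)
```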
What are data?
- what is a population?
The whole body of individuals in which we are interested
What are data?
- what are the individuals?
The members of the population, i.e. the things on which measurements are made
E.g. the towns in a country
Each row corresponds to an individual
Each column corresponds to a variable
What are data?
- what are the variables?
Variables are the characteristics measured for each individual, e.g. the number of schools, the population, or the number of cars per household in each town
Each row corresponds to an individual
Each column corresponds to a variable
What are data?
- what is a sample?
A collection of individuals drawn from a population.
It is rarely practical to obtain data for a whole population.
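Worked example (a sketch only): a tiny data table with a sample drawn from it. The towns and all their values are hypothetical, invented for illustration. Each dictionary is a row (an individual) and each key is a column (a variable).

```python
import random

# Population: every town we are interested in (hypothetical values).
population = [
    {"town": "A", "schools": 4, "residents": 12000},
    {"town": "B", "schools": 9, "residents": 31000},
    {"town": "C", "schools": 2, "residents": 5500},
    {"town": "D", "schools": 6, "residents": 18000},
    {"town": "E", "schools": 3, "residents": 9000},
    {"town": "F", "schools": 7, "residents": 24000},
]

# Sample: a collection of individuals drawn at random from the population.
random.seed(0)  # fixed seed so the draw is repeatable
sample = random.sample(population, k=3)

# A statistic computed from the sample estimates the population value.
sample_mean_schools = sum(t["schools"] for t in sample) / len(sample)
population_mean_schools = sum(t["schools"] for t in population) / len(population)

print([t["town"] for t in sample], sample_mean_schools, population_mean_schools)
```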
What are the measures of central tendency?
Mean, median and mode
What are the measures of dispersion?
(Simple) range, inter-quartile range, standard deviation (and variance), skewness, kurtosis
What is the mean?
The sum of the values in a dataset divided by the number of observations
What is the median?
The middle observation of the ranked data, or the average of the two middle observations when the number of observations is even
What is the mode?
The value that has the highest frequency
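Worked example (a minimal sketch using Python's standard `statistics` module on a made-up dataset): the three measures of central tendency.

```python
import statistics

values = [2, 3, 3, 5, 7, 10]  # made-up dataset

mean = statistics.mean(values)      # sum of values / number of observations
median = statistics.median(values)  # even count, so average of the two middle values
mode = statistics.mode(values)      # the most frequent value

print(mean, median, mode)
```

Here the sum is 30 over 6 observations, the two middle values are 3 and 5, and 3 occurs twice, so the mean is 5, the median is 4 and the mode is 3.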
What is the range?
The difference between the smallest and largest values in a dataset
What is the inter-quartile range?
The difference between the upper quartile (75th percentile) and the lower quartile (25th percentile) of the ranked values, i.e. the spread of the middle 50% of a dataset
What’s the standard deviation?
It measures the dispersion of the values around the mean; it is the square root of the variance
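What is the variance?
The square of the standard deviation, based on the squared deviations from the mean
It is expressed in squared units, so to compare variables measured in different units the standard deviation is usually scaled by the mean instead (the coefficient of variation)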
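Worked example (a sketch on a made-up dataset, using the standard `statistics` module): range, inter-quartile range, variance and standard deviation. Two assumptions to note: quartile conventions differ between textbooks and software, and this uses the module's default method; the variance here is the sample variance (dividing by n - 1).

```python
import statistics

values = list(range(1, 12))  # made-up dataset: 1, 2, ..., 11

# Range: largest value minus smallest value.
value_range = max(values) - min(values)

# Quartiles split the ranked data into four parts; IQR = Q3 - Q1.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Sample variance (divides by n - 1) and its square root.
variance = statistics.variance(values)
std_dev = statistics.stdev(values)

print(value_range, iqr, variance, round(std_dev, 3))
```

Squaring `std_dev` recovers `variance`, which is the relationship stated on the variance card.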
What is the skewness?
Indicates how a dataset is distributed about the central value - how symmetrical is the distribution?
Helps to decide whether the data are suitable for a parametric test
What is the kurtosis?
Measures the extent to which data are concentrated in one part of the frequency distribution - how peaky is the distribution?
Explain what the relationship between the mean, median and mode of a dataset reveals about its skewness. (20% of marks)
If mean > median > mode, then the skew is positive (to the right)
If mean = median = mode, then the skewness = 0 (AKA symmetrical)
If mean < median < mode, then the skew is negative (to the left)
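Worked example (a sketch only): the three rules above written as a small function. `skew_direction` is a hypothetical helper name, the datasets are made up, and the ordering of mean, median and mode is a rule of thumb rather than a formal skewness coefficient.

```python
import statistics

def skew_direction(values):
    """Classify skew from the ordering of mean, median and mode."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)
    if mean > median > mode:
        return "positive"     # long tail to the right
    if mean < median < mode:
        return "negative"     # long tail to the left
    if mean == median == mode:
        return "symmetrical"  # skewness = 0
    return "unclear"          # the rule of thumb does not decide

right_tailed = [2, 3, 3, 5, 7, 10]   # mean 5 > median 4 > mode 3
balanced = [1, 2, 3, 3, 3, 4, 5]     # mean = median = mode = 3
left_tailed = [1, 4, 6, 8, 8, 9]     # mean 6 < median 7 < mode 8

for data in (right_tailed, balanced, left_tailed):
    print(data, skew_direction(data))
```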
What are the different types of kurtosis?
Positive kurtosis = leptokurtic (taller, narrower) (excess kurtosis > 0)
Zero kurtosis = mesokurtic (normal distribution) (excess kurtosis = 0)
Negative kurtosis = platykurtic (lower, wider) (excess kurtosis < 0)
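Worked example (a sketch under stated assumptions): classifying kurtosis for two made-up datasets. It assumes the "excess" convention, in which 3 is subtracted so that a normal distribution scores 0, and uses the simple population formula (fourth central moment divided by the squared variance). Statistics packages often apply sample-size corrections, so their values can differ slightly. The function names are illustrative.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # variance (2nd central moment)
    m4 = sum((x - mean) ** 4 for x in values) / n  # 4th central moment
    return m4 / m2 ** 2 - 3

def kurtosis_type(values):
    k = excess_kurtosis(values)
    if k > 0:
        return "leptokurtic"  # taller, narrower
    if k < 0:
        return "platykurtic"  # lower, wider
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]        # evenly spread values
peaked = [0] * 8 + [-5, 5]    # most values at the centre, two far out

print(kurtosis_type(flat), kurtosis_type(peaked))
```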
Explain the common elements of statistical tests
- a question about the data
- a null hypothesis H0 (the default) and an alternative hypothesis H1
- tests may be one-tailed or two-tailed
- the test gives us a p-value (significance) for the result
- this allows us to say how confident we are in the result
What is significance?
Significance (the p-value) is the probability of getting a result at least as extreme as the one observed if the null hypothesis were true, i.e. by chance alone
It is not the probability that the null hypothesis is true
In other words: if the null hypothesis were true, how likely would the observed outcome be?
Therefore we want the p-value to be small if we are to reject the null hypothesis
Typically we want it to be less than 0.05, or even 0.01
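Worked example (a sketch, not taken from the original cards): an exact binomial test for a coin, chosen as the simplest illustration. H0 is "the coin is fair"; the observation of 9 heads in 10 flips is made up. The p-value adds up the probabilities, under H0, of every outcome at least as extreme as the one observed.

```python
from math import comb

def binomial_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n flips if P(heads) = p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n_flips, n_heads = 10, 9  # made-up observation

# One-tailed (H1: the coin favours heads): 9 or more heads.
p_one_tailed = sum(binomial_pmf(k, n_flips) for k in range(n_heads, n_flips + 1))

# Two-tailed (H1: the coin is biased either way): outcomes at least as far
# from the expected 5 heads in either direction (0, 1, 9 or 10 heads).
expected = n_flips * 0.5
p_two_tailed = sum(
    binomial_pmf(k, n_flips)
    for k in range(n_flips + 1)
    if abs(k - expected) >= abs(n_heads - expected)
)

alpha = 0.05  # chosen significance threshold
reject_null = p_two_tailed < alpha

print(p_one_tailed, p_two_tailed, reject_null)
```

The one-tailed p-value is 11/1024 (about 0.011) and the two-tailed p-value is 22/1024 (about 0.021), so H0 would be rejected at the 0.05 level but not at the 0.01 level.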