Statistics Flashcards

1
Q

What value should you use if there is a trace amount of rainfall?

A

You should treat it as 0.025mm

2
Q

International locations have more/less data than those in the UK.

A

less - they have limited data

3
Q

How do you clean the data? (4)

A
  • Missing data (n/a or -) should be removed (e.g. left out when calculating the mean)
  • A value is assigned to trace
  • Find and exclude any anomalies due to errors
  • Make sure all values are given to the same number of decimal places/significant figures (generally already done)
4
Q

What order are the UK cities in?

A

From south to north they are in alphabetical order, apart from Heathrow and Hurn (which are switched around):

Camborne, Hurn, Heathrow, Leeming, Leuchars

5
Q

As we move further north, during May to October, the maximum hours of sunshine ___

A

increases

6
Q

How is daily maximum relative humidity presented?

A

Percentages given to the nearest integer

7
Q

Above what percentage of daily maximum relative humidity do you get mist and fog?

A

95%

8
Q

What is daily mean windspeed measured in?

A

Knots to the nearest integer

9
Q

1 knot = ?

A

1.15mph

10
Q

What is daily mean direction measured in?

A
  • Degrees clockwise from the north (like bearings) rounded to the nearest 10°
  • Cardinal directions
11
Q

Wind/gust/cardinal direction refers to the direction the wind is blowing ____.

A

from

12
Q

Beaufort conversion daily mean windspeed

A

A discrete 13 point scale from 0 (calm) to 12 (hurricane)

In the LDS, there is light from 1 – 3, moderate at 4 and fresh at 5.

13
Q

less windy to more windy on the Beaufort scale

A

light, moderate, fresh

14
Q

daily maximum gust

A

The highest instantaneous windspeed recorded, measured in knots

15
Q

daily maximum gust direction

A

The direction of the maximum gust of wind recorded

16
Q

pressure units

A

hPa (hectopascals)

17
Q

1hPa = ?

2 conversions

A

100 Pa or 1 millibar

18
Q

Approximate low pressure

A

Below about 990 to 1000 hPa

19
Q

Approximate average air pressure (at sea level)

A

1013 hPa

20
Q

Approximate high pressure

A

1025 hPa

21
Q

visibility

A

The greatest distance that an object can be seen and recognized in daylight

22
Q

visibility units

A

Dm (decametres)

23
Q

1 Dm = ?

A

10 m

24
Q

What qualitative data is found in the large data set?

A

Beaufort scale

25
Q

Calculate the lower quartile for:
9, 9, 10, 11, 12, 12, 12, 13, 14

A

Q1 position = 9 × 1/4 = 2.25, which rounds up to the 3rd position
Q1 = 10

If the position is not a whole number, you round up to the next position, even if it’s only 0.25 over

26
Q

Calculate the upper quartile for:
7, 9, 9, 10, 10, 11, 12, 12, 12, 13, 14, 14

A

Q3 position = 12 × 3/4 = 9, a whole number, so use the 9.5th position (the mean of the 9th and 10th values)
Q3 = (12 + 13) / 2 = 12.5

When the position is a whole number, you take the midpoint of that value and the next one
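
The two quartile rules on these cards can be sketched in Python. This is only an illustration of the rule as stated on the cards: the function name `quartile` is made up for the example, and the 12-value list is chosen so that its 9th and 10th values are 12 and 13, matching the working above.

```python
import math

def quartile(sorted_values, fraction):
    """Quartile by the flashcard rule; fraction is 0.25 for Q1 or 0.75 for Q3."""
    n = len(sorted_values)
    position = n * fraction  # 1-based position in the ordered list
    if position == int(position):
        # Whole number: take the midpoint of this value and the next one
        k = int(position)
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    # Not a whole number: round up to the next position
    k = math.ceil(position)
    return sorted_values[k - 1]

nine_values = [9, 9, 10, 11, 12, 12, 12, 13, 14]
twelve_values = [7, 9, 9, 10, 10, 11, 12, 12, 12, 13, 14, 14]  # illustrative list

q1 = quartile(nine_values, 0.25)    # position 2.25, so the 3rd value
q3 = quartile(twelve_values, 0.75)  # position 9, so the mean of the 9th and 10th values
print(q1, q3)
```

Other textbooks and software use different quartile conventions, so a calculator or spreadsheet may give slightly different values.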

27
Q

interpercentile range

A

the difference between the values for two given percentiles

28
Q

Describe a correlation vs. interpret a correlation between variables

A

Describe = weak/strong positive/negative or no correlation
Interpret = as (e.g. the rainfall) increases, the (e.g. sunshine hours) decreases, but you must be specific to the question

29
Q

Bivariate data

A

Data which has pairs of values for two variables

30
Q

How do you represent bivariate data?

A

On a scatter diagram

31
Q

Regression line

A

The straight line that minimises the sum of the squares of the vertical distances of each data point from it

Another name for the least squares regression line

Essentially it’s the ‘best’ line of best fit
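
As a rough sketch of what "least squares" means in practice, the gradient and intercept can be computed from the summary quantities Sxy and Sxx. The data points and the function name `least_squares_line` below are made up for illustration; they do not come from the large data set.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the regression line y = a + b*x of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxx and Sxy: sums of squared and cross deviations from the means
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = s_xy / s_xx          # gradient
    a = mean_y - b * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical bivariate data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares_line(xs, ys)
print(f"y = {a:.1f} + {b:.1f}x")
```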

32
Q

The equation of the regression line of GNP (y) on energy consumption (x) is y = 225 + 12.9x.
An economist uses this regression equation to estimate the energy consumption of a country with a Gross National Product of 3500.
Give one reason why this may not be a valid estimate.

A

The regression equation should only be used to predict a value of GNP (y) given the energy consumption (x), not to predict x from y.

33
Q

In maths, can you extrapolate data?

A

No - you can only make valid estimates for values of x within the range of the data set

34
Q

standard deviation definition

A

A measure of spread: roughly the typical distance of the data values from the mean. It is the square root of the variance.

35
Q

The data (represented by x) is coded using the formula y = (x - a) / b
How do you get from the coded data’s mean to the original data’s mean?

A

Original mean = coded mean × b + a

36
Q

The data (represented by x) is coded using the formula y = (x - a) / b
How do you get from the coded data’s standard deviation to the original data’s standard deviation?

A

Original standard deviation = coded standard deviation × b (the shift a does not affect the spread)
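
A quick numerical check of both coding rules, using Python's standard `statistics` module. The values of a and b and the data list are hypothetical, picked only to keep the arithmetic easy.

```python
from statistics import mean, pstdev

a, b = 100, 5                      # hypothetical coding constants
x = [105, 110, 120, 125, 140]      # hypothetical original data
y = [(xi - a) / b for xi in x]     # coded data: y = (x - a) / b

# Undo the coding on the summary statistics
recovered_mean = mean(y) * b + a   # mean is shifted and scaled
recovered_sd = pstdev(y) * b       # standard deviation is only scaled

print(recovered_mean, mean(x))
print(recovered_sd, pstdev(x))
```

`pstdev` divides by n, which matches the standard deviation formula used on these cards.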

37
Q

You are given a grouped frequency table showing the time taken for children to finish a race. How do you calculate an estimate for the standard deviation of the children’s race times?

A
  • Fix the class widths (if needed)
  • Calculate the total frequency (Σf)
  • Find the midpoints (x)
  • Multiply the midpoints by the frequency (fx)
  • Square the midpoints and multiply by the frequency (fx²)
  • Calculate the totals Σfx and Σfx²
  • Calculate the standard deviation:
    σ = √( Σfx² / Σf − (Σfx / Σf)² )

You can use the table function in your calculator to speed things up
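
The steps above can be sketched in Python as follows. The grouped table is invented for the example (it is not from any exam question), and each class is written as (lower bound, upper bound, frequency).

```python
import math

# Hypothetical grouped frequency table: (lower bound, upper bound, frequency)
classes = [(10, 20, 4), (20, 30, 10), (30, 40, 6)]

total_f = sum(f for _, _, f in classes)                      # total frequency
midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]

sum_fx = sum(f * x for f, x in zip(freqs, midpoints))        # total of f * x
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, midpoints))  # total of f * x squared

estimated_mean = sum_fx / total_f
estimated_sd = math.sqrt(sum_fx2 / total_f - estimated_mean ** 2)
print(estimated_mean, estimated_sd)
```

The result is only an estimate because every value in a class is treated as if it sat at the midpoint.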

38
Q

Advantage of box plots

A

Can easily compare multiple different groups

39
Q

Disadvantage of box plots

A

Doesn’t show trends as easily as e.g. scatter graphs

40
Q

cleaning the data

A

removing anomalies from a data set

41
Q

experiment

A

A repeatable process that gives rise to a number of outcomes

42
Q

event

A

A collection of one or more outcomes

43
Q

another way of saying “not A”

A

complement of A

44
Q

mutually exclusive events

A

When events have no outcomes in common; they can’t happen at the same time so the circles don’t overlap in Venn diagrams

45
Q

independent events

A

When one event has no effect on another; the probability of A occurring is the same whether or not B happens

46
Q

How do you calculate P(A or B) for mutually exclusive events?

A

P(A or B) = P(A) + P(B)

47
Q

How do you calculate P(A and B) for independent events?

A

P(A and B) = P(A) × P(B).
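
Both rules can be checked on a small example. The sketch below assumes a fair six-sided die and events chosen purely for illustration; `prob` is a helper defined here, not a library function.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die (assumed)

def prob(event):
    """Probability of an event (a set of outcomes) with equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

# Independent events: A = even number, B = at most 4
A = {2, 4, 6}
B = {1, 2, 3, 4}
independent = prob(A & B) == prob(A) * prob(B)

# Mutually exclusive events: C = roll a 1, D = roll a 2 (no shared outcomes)
C = {1}
D = {2}
mutually_exclusive = len(C & D) == 0
addition_rule_holds = prob(C | D) == prob(C) + prob(D)

print(independent, mutually_exclusive, addition_rule_holds)
```

Note that A and B overlap, so they are independent without being mutually exclusive.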

48
Q

A sample of 10 children is taken. 4 children have a height between 80 and 90cm. Estimate how many have a height between 80 and 85cm, and state one assumption you made.

A

5/10 * 4 = 2
The children’s heights are uniformly distributed in the 80 < h < 90cm class.

49
Q

P(X = x) = 2 / x², x = 2, 3, 4
Explain how you know that Marie’s function does not describe a probability distribution.

A

The sum of the probabilities does not equal 1: 2/4 + 2/9 + 2/16 = 61/72.
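
A short check of that sum using exact fractions, as a sketch of how you might verify any proposed probability function:

```python
from fractions import Fraction

# P(X = x) = 2 / x^2 for x = 2, 3, 4
probabilities = {x: Fraction(2, x ** 2) for x in (2, 3, 4)}
total = sum(probabilities.values())

# A valid probability distribution must sum to exactly 1
is_valid_distribution = total == 1
print(total, is_valid_distribution)
```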

50
Q

When can you model X with a binomial distribution? What needs to happen? (4)

A
  • There must be a fixed number of trials (n)
  • There must be exactly two possible outcomes (success and failure)
  • There must be a constant probability of success
  • Each trial is independent of one another

These are assumptions you make when modelling a binomial distribution.

51
Q

probability mass function

A

A function over the sample space of a discrete random variable which gives the probability that X is equal to a certain value. Can be presented as a function, table or graph

written as e.g. P(X = x) = 1/6

52
Q

probability distribution

A

A function that describes the probability of any outcome in the sample space.

It can be represented as a function, table or diagram.

53
Q

test statistic (+ example)

A

The result of an experiment or the statistic that is calculated from the sample e.g. the number of heads out of 10 trials

54
Q

population parameter

A

A numerical property of the whole population that the hypothesis is about, e.g. the probability p of success in a binomial model

55
Q

hypothesis

A

A statement made about the value of a population parameter

56
Q

A researcher asks some people whether they shop with their own carrier bag. 17 out of 25 people sampled said they do.
They want to test, at the 5% significance level, whether over 60% of shoppers try to be sustainable by using their own carrier bag.

Explain the condition under which the null hypothesis would be rejected.

A

H0: p = 0.6
H1: p > 0.6

The null hypothesis would be rejected when the probability of 17 or more people from a sample of 25 using their own carrier bag is less than 0.05, given that p = 0.6.
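
The condition can be checked numerically. The sketch below works out the tail probability directly from the binomial formula with `math.comb`; the helper names `binom_pmf` and `upper_tail` are made up for this example.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def upper_tail(x, n, p):
    """P(X >= x) for X ~ B(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

n, p0 = 25, 0.6   # sample size and the value of p under H0
observed = 17     # number who use their own bag
alpha = 0.05      # significance level

p_value = upper_tail(observed, n, p0)  # P(X >= 17) assuming H0 is true
reject_h0 = p_value < alpha
print(round(p_value, 4), reject_h0)
```

With these figures the tail probability comes out well above 0.05, so H0 would not be rejected.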

57
Q

critical value

A

The first value to fall inside of the critical region

58
Q

significance level

A

The probability (usually given as a percentage) of rejecting the null hypothesis, when in fact it is true

59
Q

actual significance level

A

The probability of the test statistic falling within the critical region, given that H0 is true

60
Q

How does the actual significance level differ from the tested significance level (threshold probability)?

A

They are the same for continuous data but may differ for discrete data
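
To see why they can differ for discrete data, here is a sketch that finds the critical value and actual significance level for a one-tailed (upper) test with X ~ B(25, 0.6) at the 5% level. The choice of n, p and the helper names are assumptions made for the illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def upper_tail(x, n, p):
    """P(X >= x) for X ~ B(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

n, p0, alpha = 25, 0.6, 0.05  # illustrative test: H0: p = 0.6, H1: p > 0.6

# Critical value: the smallest c with P(X >= c) at or below the significance level
critical_value = next(c for c in range(n + 1) if upper_tail(c, n, p0) <= alpha)

# Actual significance level: the probability of landing in the critical region under H0
actual_significance = upper_tail(critical_value, n, p0)
print(critical_value, round(actual_significance, 4))
```

Because X only takes whole-number values, the tail probability jumps from above 5% to below it in one step, so the actual level ends up smaller than the 5% threshold.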

61
Q

How can you find which tail a test statistic lies in a two-tailed test?

A

X ~ B(n, p)
n × p is the expected value of X
If the observed value x < np, then you consider P(X ≤ x)
If the observed value x > np, then you consider P(X ≥ x)
Compare this probability with half the significance level

62
Q

You can find which tail a test statistic lies in a two-tailed test, so you don’t have to test both tails. Explain why this works.

for understanding

A

X ~ B(n, p)
n × p is the expected value of X

If the observed value x < np, then you consider P(X ≤ x)
This means that the thing being tested occurred less often than expected, so you need to find the lower critical value to see if the test statistic is low enough to fall within the lower critical region or not. (The upper critical region is somewhere off in the far distance - the observed value is nowhere near it)

If the observed value x > np, then you consider P(X ≥ x)
This means that the thing being tested occurred more often than expected, so you need to find the upper critical value to see if the test statistic is high enough to fall within the upper critical region or not. (The lower critical region is somewhere off in the far distance - the observed value is nowhere near it)
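
A minimal sketch of this tail-picking idea for a two-tailed binomial test. The numbers (n = 20, p = 0.5, an observed value of 4, 5% significance) and the function names are hypothetical, and treating x = np as the upper tail is an arbitrary choice made here.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def tail_probability(x, n, p):
    """Pick the tail that the observed value x lies in and return its probability."""
    expected = n * p
    if x < expected:
        # Fewer successes than expected: lower tail, P(X <= x)
        return "lower", sum(binom_pmf(k, n, p) for k in range(0, x + 1))
    # At least as many as expected: upper tail, P(X >= x)
    return "upper", sum(binom_pmf(k, n, p) for k in range(x, n + 1))

n, p0, alpha = 20, 0.5, 0.05   # hypothetical two-tailed test
tail, probability = tail_probability(4, n, p0)

# Two-tailed test: each tail gets half of the significance level
significant = probability < alpha / 2
print(tail, round(probability, 4), significant)
```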