Statistics Flashcards

(62 cards)

1
Q

What value should you use if there is a trace amount of rainfall?

A

You should treat it as 0.025mm

2
Q

International locations have more/less data than those in the UK.

A

less - they have limited data

3
Q

How do you clean the data? (4)

A
  • Missing data (n/a or -) should be removed (e.g. excluded when calculating the mean)
  • A value is assigned to trace
  • Find and exclude any anomalies due to errors
  • Make sure all values are given to the same number of decimal places/significant figures (generally already done)
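
The cleaning steps above can be sketched in Python. This is only an illustration: the marker "tr" for a trace reading and the sample readings are assumptions, and 0.025 mm is the trace value given on the first card.

```python
TRACE_VALUE = 0.025  # mm, the value the deck assigns to a trace of rainfall

def clean_rainfall(raw):
    """Return numeric rainfall values, dropping missing entries."""
    cleaned = []
    for entry in raw:
        text = str(entry).strip().lower()
        if text in ("n/a", "-", ""):
            continue  # missing data is removed, so it is not used in the mean
        if text == "tr":  # assumed marker for a trace amount
            cleaned.append(TRACE_VALUE)
        else:
            cleaned.append(float(text))
    return cleaned

sample = ["0.4", "tr", "n/a", "2.1", "-", "0"]  # made-up readings
values = clean_rainfall(sample)
mean_rainfall = sum(values) / len(values)
```

Anomalies caused by recording errors still need to be judged by eye; the sketch only handles missing and trace entries.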
4
Q

What order are the UK cities in?

A

From south to north they are in alphabetical order, apart from Heathrow and Hurn (which are switched around)

Camborne, Hurn, Heathrow, Leeming, Leuchars

5
Q

As we move further north, during May to October, the maximum hours of sunshine ___

A

increases

6
Q

How is daily maximum relative humidity presented?

A

Percentages given to the nearest integer

7
Q

Above what percentage of daily maximum relative humidity do you get mist and fog?

A

95%

8
Q

What is daily mean windspeed measured in?

A

Knots to the nearest integer

9
Q

1 knot = ?

A

1.15mph

10
Q

What is daily mean direction measured in?

A
  • Degrees clockwise from the north (like bearings) rounded to the nearest 10°
  • Cardinal directions
11
Q

Wind/gust/cardinal direction refers to the direction the wind is blowing ____.

A

from

12
Q

Beaufort conversion daily mean windspeed

A

A discrete 13 point scale from 0 (calm) to 12 (hurricane)

In the LDS, there is light from 1 – 3, moderate at 4 and fresh at 5.

13
Q

less windy to more windy on the Beaufort scale

A

light, moderate, fresh

14
Q

daily maximum gust

A

The highest instantaneous windspeed recorded, measured in knots

15
Q

daily maximum gust direction

A

The direction of the maximum gust of wind recorded

16
Q

pressure units

A

hPa (hectopascals)

17
Q

1hPa = ?

2 conversions

A

100 Pa or 1 millibar

18
Q

Approximate low pressure

A

Below about 990 to 1000 hPa

19
Q

Approximate average air pressure (at sea level)

A

1013 hPa

20
Q

Approximate high pressure

A

1025 hPa

21
Q

visibility

A

The greatest distance that an object can be seen and recognized in daylight

22
Q

visibility units

A

Dm (decametres)

23
Q

1 Dm = ?

A

10 m
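
The unit conversions on the cards above (knots, hPa and Dm) can be collected into a few helper functions. The function names are made up for illustration, and 1.15 is the approximate knots-to-mph factor quoted in the deck.

```python
KNOT_TO_MPH = 1.15  # approximate factor from the deck
HPA_TO_PA = 100     # 1 hPa = 100 Pa (also 1 millibar)
DM_TO_M = 10        # 1 decametre = 10 m

def knots_to_mph(knots):
    return knots * KNOT_TO_MPH

def hpa_to_pa(hpa):
    return hpa * HPA_TO_PA

def decametres_to_metres(dm):
    return dm * DM_TO_M

wind_mph = knots_to_mph(10)                   # a 10 knot daily mean windspeed
pressure_pa = hpa_to_pa(1013)                 # average sea-level pressure
visibility_m = decametres_to_metres(2500)     # a visibility of 2500 Dm
```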

24
Q

What qualitative data is found in the large data set?

A

Beaufort scale

25
Calculate the lower quartile for: 9, 9, 10, 11, 12, 12, 12, 13, 14
Q1 position = 9 × 1/4 = 2.25, which rounds up to the 3rd position, so Q1 = 10. **You round up to the next position, even if it's only 0.25**
26
Calculate the upper quartile for: 7, 9, 9, 10, 10, 11, 12, 12, 12, 13, 14, 14, 15, 16, 16
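There are 15 values, so Q3 position = 15 × 3/4 = 11.25, which rounds up to the 12th position, so Q3 = 14. **You round up to the next position. If the position is a whole number (e.g. 12 × 3/4 = 9 for 12 values), use the 9.5th position, i.e. the mean of the 9th and 10th values**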
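
A minimal sketch of the position rule used on these quartile cards, assuming the data is already sorted. The function name `quartile` and the 12-value list are hypothetical; the 9-value list is the one from the lower quartile card.

```python
import math

def quartile(sorted_data, numerator, denominator):
    """Quartile by the 'round up the position' rule for discrete data."""
    n = len(sorted_data)
    if (n * numerator) % denominator == 0:
        # Whole-number position k: take the mean of the k-th and (k+1)-th values
        k = n * numerator // denominator
        return (sorted_data[k - 1] + sorted_data[k]) / 2
    # Otherwise round the position up to the next whole number
    k = math.ceil(n * numerator / denominator)
    return sorted_data[k - 1]

nine_values = [9, 9, 10, 11, 12, 12, 12, 13, 14]
q1 = quartile(nine_values, 1, 4)     # position 2.25 -> 3rd value

twelve_values = [2, 4, 4, 5, 7, 8, 9, 10, 12, 13, 15, 18]  # made-up data
q3 = quartile(twelve_values, 3, 4)   # position 9 -> mean of 9th and 10th values
```

Here `q1` is 10 and `q3` is 12.5.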
27
interpercentile range
the difference between the values for two given percentiles
28
Describe a correlation vs. interpret a correlation between variables
Describe = weak/strong positive/negative or no correlation. Interpret = as (e.g. the rainfall) increases, (e.g. the sunshine hours) decreases, **but you must be specific to the question**
29
Bivariate data
Data which has pairs of values for two variables
30
How do you represent bivariate data?
On a scatter diagram
31
Regression line
The least squares regression line: the straight line that minimises the sum of the squares of the vertical distances of each data point from it. Essentially it's the 'best' line of best fit.
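
To make the least squares idea concrete, here is a short sketch that computes the line y = a + bx from the standard formulas b = Sxy / Sxx and a = ȳ − b x̄. The data points are invented and lie exactly on y = 1 + 2x, so the result is easy to check by hand.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the regression line of y on x, y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx           # gradient
    a = mean_y - b * mean_x   # intercept: the line passes through (mean_x, mean_y)
    return a, b

xs = [1, 2, 3, 4, 5]      # made-up explanatory values
ys = [3, 5, 7, 9, 11]     # made-up responses, exactly y = 1 + 2x
a, b = least_squares_line(xs, ys)
```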
32
The equation of the regression line of GNP (y) on energy consumption (x) is y = 225 + 12.9x. An economist uses this regression equation to estimate the energy consumption of a country with a Gross National Product of 3500. Give one reason why this may not be a valid estimate.
The regression line of y on x should only be used to predict a value of GNP (y) given the energy consumption (x), not to predict x from y.
33
In maths, can you extrapolate data?
No - you can only make valid estimates for values of x within the range of the data set
34
standard deviation definition
The average variability in the data set; i.e. how spread apart the data is from the mean.
35
The data (represented by x) is coded using the formula y = (x - a) / b How do you get from the coded data's mean to the original data's mean?
Original mean = coded mean (of y) × b + a
36
The data (represented by x) is coded using the formula y = (x - a) / b How do you get from the coded data's standard deviation to the original data's standard deviation?
Original standard deviation = coded standard deviation (of y) × b
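
The two coding rules can be checked numerically. In this sketch the values of a and b and the data are made up; `statistics.pstdev` is used because it divides by n, like the standard deviation formula used in this deck.

```python
import statistics

a, b = 100, 5                             # hypothetical coding constants
original = [105, 110, 120, 125, 140]      # made-up data (x)
coded = [(x - a) / b for x in original]   # y = (x - a) / b

coded_mean = statistics.fmean(coded)
coded_sd = statistics.pstdev(coded)

# Undo the coding: the mean is scaled and shifted, the spread is only scaled
recovered_mean = coded_mean * b + a
recovered_sd = coded_sd * b
```

Adding or subtracting a shifts every value equally, so it changes the mean but not the spread, which is why a does not appear in the standard deviation rule.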
37
You are given a grouped frequency table showing the time taken for children to finish a race. How do you calculate an estimate for the standard deviation of the length of the children's races?
  • Fix the class widths (if needed)
  • Calculate the total frequency (Σf)
  • Find the midpoints (x)
  • Multiply the midpoints by the frequency (fx)
  • Square the midpoints and multiply by the frequency (fx²)
  • Calculate the totals Σfx and Σfx²
  • Calculate the standard deviation: √( Σfx² / Σf − (Σfx / Σf)² )
You can use the table function in your calculator to speed things up
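
The steps above can be followed in a few lines of Python. The grouped frequency table here is invented for illustration, with each row given as (lower bound, upper bound, frequency).

```python
import math

# Hypothetical race times in seconds: (lower bound, upper bound, frequency)
table = [(10, 20, 4), (20, 30, 10), (30, 40, 6)]

total_f = sum(f for _, _, f in table)                              # Σf
sum_fx = sum(f * (lo + hi) / 2 for lo, hi, f in table)             # Σfx, x = midpoint
sum_fx2 = sum(f * ((lo + hi) / 2) ** 2 for lo, hi, f in table)     # Σfx²

mean = sum_fx / total_f
variance = sum_fx2 / total_f - mean ** 2
sd = math.sqrt(variance)
```

For this table Σf = 20, Σfx = 520 and Σfx² = 14500, giving a mean of 26 and a standard deviation of 7. It is only an estimate because every value in a class is treated as if it sat at the midpoint.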
38
Advantage of box plots
Can easily compare multiple different groups
39
Disadvantage of box plots
Doesn't show trends as easily as e.g. scatter graphs
40
cleaning the data
removing anomalies from a data set
41
experiment
A repeatable process that gives lots of outcomes
42
event
A collection of one or more outcomes
43
another way of saying "not A"
complement of A
44
mutually exclusive events
When events have no outcomes in common; they can't happen at the same time so the circles don't overlap in Venn diagrams
45
independent events
When one event has no effect on another; the probability of A occurring is the same whether or not B happens
46
How can you calculate mutually exclusive events?
P(A or B) = P(A) + P(B)
47
How can you calculate independent events?
P(A and B) = P(A) × P(B).
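
Both rules can be checked on a simple example. This sketch assumes a fair six-sided die; the events A, B and C are chosen for illustration only.

```python
from fractions import Fraction

outcomes = set(range(1, 7))  # sample space for one roll of a fair die

def prob(event):
    """Probability of an event (a set of outcomes), all outcomes equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {1, 2}       # roll a 1 or a 2
B = {5, 6}       # roll a 5 or a 6
C = {2, 4, 6}    # roll an even number

# A and B share no outcomes, so they are mutually exclusive
mutually_exclusive = len(A & B) == 0
addition_rule_holds = prob(A | B) == prob(A) + prob(B)

# A and C are independent if P(A and C) = P(A) × P(C)
independent = prob(A & C) == prob(A) * prob(C)
```

Note that A and C overlap (they share the outcome 2), so they are not mutually exclusive, yet they are independent: 1/6 = 1/3 × 1/2.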
48
A sample of 10 children is taken. 4 children have a height between 80 and 90cm. Estimate how many have a height between 80 and 85cm, and state one assumption you made.
5/10 × 4 = 2. Assumption: the children's heights are **uniformly** distributed in the 80 < h < 90 cm class.
49
P(X = x) = 2 / x², x = 2, 3, 4 Explain how you know that Marie’s function does not describe a probability distribution.
The sum of the probabilities does not equal 1 (2/4 + 2/9 + 2/16 = 61/72).
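
The check can be done exactly with fractions. This sketch assumes the function on the card is P(X = x) = 2 / x² (reading "x2" as x squared).

```python
from fractions import Fraction

# Assumed reading of the card: P(X = x) = 2 / x^2 for x = 2, 3, 4
probabilities = {x: Fraction(2, x ** 2) for x in (2, 3, 4)}

total = sum(probabilities.values())
is_valid_distribution = total == 1
```

The total is 61/72, so the probabilities do not sum to 1.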
50
When can you model X with a binomial distribution? What needs to happen? (4)
  • There must be a fixed number of trials (n)
  • Each trial must have exactly two possible outcomes (success or failure)
  • There must be a constant probability of success (p)
  • Each trial is independent of the others
These are assumptions you make when modelling with a binomial distribution.
51
probability mass function
A function over the sample space of a **discrete** random variable which gives the probability that X is equal to a certain value. Can be presented as a function, table or graph written as e.g. P(X = x) = 1/6
52
probability distribution
A function that describes the probability of any outcome in the sample space. It can be represented as a function, table or diagram.
53
test statistic (+ example)
The result of an experiment or the statistic that is calculated from the sample e.g. the number of heads out of 10 trials
54
population parameter
The value describing the whole population that the hypothesis is about, e.g. the probability p of something occurring
55
hypothesis
A statement made about the value of a population parameter
56
A researcher asks some people whether they shop with their own carrier bag. 17 out of 25 people sampled said they do. They want to test, at the 5% significance level, whether over 60% of shoppers try to be sustainable by using their own carrier bag. Explain the condition under which the null hypothesis would be rejected.
H0: p = 0.6, H1: p > 0.6. The null hypothesis would be rejected if the probability of 17 or more people in a sample of 25 using their own carrier bag is less than 0.05, **given that p = 0.6**.
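
The condition on this card can be evaluated directly. This is a sketch using only the standard library; the helper names `binomial_pmf` and `upper_tail` are made up for illustration.

```python
from math import comb

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def upper_tail(n, p, k):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(binomial_pmf(n, p, i) for i in range(k, n + 1))

# Under H0: p = 0.6, find the probability of 17 or more out of 25
p_value = upper_tail(25, 0.6, 17)
reject_h0 = p_value < 0.05
```

For these numbers the probability comes out above 0.05, so `reject_h0` is False: there is insufficient evidence that more than 60% of shoppers use their own bag.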
57
critical value
The first value to fall inside of the critical region
58
significance level
The probability (usually given as a percentage) of rejecting the null hypothesis, when in fact it is true
59
actual significance level
The probability of the test statistic falling within the critical region, given that H0 is true
60
How does the actual significance level differ from the tested significance level (threshold probability)?
They are the same for continuous data but may differ for discrete data
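
A sketch of how the critical value and the actual significance level come out for a discrete test. It reuses n = 25, p = 0.6 and the 5% upper-tail test from the carrier bag card as an example; the helper names are hypothetical.

```python
from math import comb

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def upper_tail(n, p, k):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(binomial_pmf(n, p, i) for i in range(k, n + 1))

n, p, significance = 25, 0.6, 0.05

# Critical value: the first value whose upper-tail probability is at or below 5%
critical_value = next(k for k in range(n + 1) if upper_tail(n, p, k) <= significance)

# Actual significance level: P(X in critical region | H0 true)
actual_significance = upper_tail(n, p, critical_value)
```

Because X only takes whole-number values, the tail probability jumps from above 5% to below 5% between consecutive values, so the actual significance level is usually smaller than the tested level rather than equal to it.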
61
How can you find which tail a test statistic lies in a two-tailed test?
X ~ B(n, p), and np is the expected value of X (the expected number of successes). If the observed value **x** < np, consider the lower tail P(X ≤ x). If **x** > np, consider the upper tail P(X ≥ x).
62
You can find which tail a test statistic lies in a two-tailed test, so you don't have to test both tails. Explain why this works. (For understanding)
X ~ B(n, p), and np is the expected value of X (the expected number of successes).
If the observed value **x** < np, consider P(X ≤ x). The thing being tested occurred less often than expected, so you need the lower critical value to see whether the test statistic is low enough to fall within the lower critical region. (The upper critical region is somewhere off in the far distance; the result is far too low to reach it.)
If **x** > np, consider P(X ≥ x). The thing being tested occurred more often than expected, so you need the upper critical value to see whether the test statistic is high enough to fall within the upper critical region. (The lower critical region is somewhere off in the far distance; the result is far too high to reach it.)
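
The tail-picking idea can be sketched as a small function that compares the observed value x with np and returns the matching tail probability. The function name, the example values n = 20, p = 0.5, x = 5 and the 5% level are all assumptions for illustration.

```python
from math import comb

def binomial_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def relevant_tail(n, p, x):
    """Pick the tail on the same side of np as the observed value x."""
    expected = n * p
    if x < expected:
        # Fewer successes than expected: lower tail, P(X <= x)
        return "lower", sum(binomial_pmf(n, p, i) for i in range(0, x + 1))
    # At least as many successes as expected: upper tail, P(X >= x)
    return "upper", sum(binomial_pmf(n, p, i) for i in range(x, n + 1))

tail, probability = relevant_tail(20, 0.5, 5)

# In a two-tailed test at 5%, each tail gets half the significance level
significant = probability < 0.05 / 2
```

Here np = 10 and x = 5, so the lower tail is used and P(X ≤ 5) is compared with 0.025.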