Stats Flashcards

1
Q

Definition of a population

A

The complete set of items of interest, with every element included

2
Q

Definition of a sample

A

A subset of the population, selected to represent the whole

3
Q

Definition of a census

A

Every member of the population is surveyed. Censuses are rare as they are expensive and labour intensive

4
Q

Definition of a sampling frame

A

A list of all the sampling units in the population, each individually named or numbered (eg a school register listing every student)

5
Q

Random sampling methods

A

1) Simple random sample (eg random number generator or names drawn out of a hat)
2) Systematic sample: all elements are numbered, the 1st element is chosen at random, then every nth element after it is chosen (eg pick a random student from 1-10, then every 10th student after this)
3) Stratified sample: proportions of groups in the population are matched in the sample (eg with 100 boys and 100 girls, a sample of 10 uses 5 random boys and 5 random girls; with 103 boys and 107 girls, a sample of 20 uses 9.8 boys (round to 10) and 10.2 girls (round to 10))
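
A minimal Python sketch of the three random methods above. The function names, the fixed seed and the frame of students numbered 1 to 100 are assumptions made for illustration only.

```python
import random

def simple_random_sample(frame, k, rng):
    # Every unit in the sampling frame has an equal chance of selection.
    return rng.sample(frame, k)

def systematic_sample(frame, step, rng):
    # Random start among the first `step` units, then every step-th unit.
    start = rng.randrange(step)
    return frame[start::step]

def stratified_allocation(strata_sizes, sample_size):
    # Share the sample between strata in proportion to their size,
    # rounding each share to a whole number of units.
    total = sum(strata_sizes.values())
    return {name: round(size / total * sample_size)
            for name, size in strata_sizes.items()}

rng = random.Random(0)          # fixed seed only so the sketch is repeatable
students = list(range(1, 101))  # hypothetical frame: students numbered 1 to 100

print(simple_random_sample(students, 5, rng))
print(systematic_sample(students, 10, rng))
print(stratified_allocation({"boys": 103, "girls": 107}, 20))
```

Rounding each stratum separately can leave the total one unit above or below the intended sample size, so the shares may need a manual adjustment.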

6
Q

Non-random sampling methods

A

1) Opportunity sampling: whoever is present

2) Quota sampling: a set number is needed from each group, and each quota is filled on a first come, first served basis until it is full

7
Q

Risk of non-random sampling

A

Risk of bias/ lack of equal representation

8
Q

Qualitative data

A

Non-numerical data eg colours, tv shows, etc

9
Q

Quantitative data

A

Numerical data eg height, number of siblings, etc

10
Q

Discrete data

A

Type of quantitative data.
Can only take certain values, usually whole numbers.
A few exceptions, eg shoe sizes can have half sizes
Usually comes from counting

11
Q

Continuous data

A

Type of quantitative data.
Can take any decimal value in a certain range.
Usually comes from measuring

12
Q

What is n/a replaced by in data

A

n/a is ignored

13
Q

What is tr (trace) replaced by in data

A

0

14
Q

Advantages of simple random sampling vs disadvantages

A

+ve:

1) Free of bias
2) Easy and cheap for small samples
3) Each sampling unit has a known and equal chance of selection

-ve:

1) Requires a sampling frame
2) Unsuitable for larger samples/ populations as it is time consuming, disruptive and expensive

15
Q

Advantages of systematic sampling vs disadvantages

A

+ve:

1) Simple and quick
2) Suitable for larger samples/ populations

-ve:

1) Sampling frame required
2) Can introduce bias if the sampling frame is not random

16
Q

Advantages of stratified sampling vs disadvantages

A

+ve:

1) Sample accurately reflects population structure
2) Guarantees representation of all groups within a sample/population

-ve:

1) Population must be clearly classified into distinct groups
2) Selection within each stratum can be time consuming/ expensive and requires a sampling frame

17
Q

Advantages of quota sampling vs disadvantages

A

+ve:

1) Allows even small samples to be representative of population
2) No sampling frame required
3) Quick, easy and inexpensive
4) Allows for easy comparison between groups in a population

-ve:

1) Non-random so can introduce bias
2) Population must have separate, distinct groups
3) Dividing the population into groups can be costly or inaccurate
4) Non-responses are not recorded as such
5) Increasing scope of study/ number of groups adds time and expense

18
Q

Advantages of opportunity sampling vs disadvantages

A

+ve:

1) Easy to carry out
2) Inexpensive

-ve:

1) Unlikely to provide representative results
2) Highly dependent on individual researcher

19
Q

variance=

A

standard deviation^2

20
Q

variance=

A

Σfx^2/n - mean^2 (where n = Σf)
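
A short Python sketch of the frequency-table version of this formula. The function name and the small table of values and frequencies are made up for illustration.

```python
def variance_from_frequencies(values, freqs):
    # n is the total frequency (sum of f).
    n = sum(freqs)
    # mean = (sum of f*x) / n
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    # (sum of f*x^2) / n
    mean_of_squares = sum(f * x * x for x, f in zip(values, freqs)) / n
    # variance = mean of the squares minus the square of the mean
    return mean_of_squares - mean ** 2

values = [1, 2, 3]  # hypothetical data values
freqs = [1, 2, 1]   # hypothetical frequencies
variance = variance_from_frequencies(values, freqs)
print(variance)          # 0.5
print(variance ** 0.5)   # standard deviation = square root of the variance
```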

21
Q

How is standard deviation affected by coding

A

Affected by x and ÷, however not affected by + or -

22
Q

How is mean affected by coding

A

Affected by both x and ÷, as well as + and -
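
A quick Python check of how coding affects the mean and the standard deviation. It assumes a coding of the form y = (x - a) / b; the data and the values of a and b are made up.

```python
from statistics import mean, pstdev

x = [10, 20, 30, 40]  # hypothetical raw data
a, b = 10, 5          # hypothetical coding constants
y = [(value - a) / b for value in x]  # coded data: y = (x - a) / b

# The mean follows the whole coding: subtract a, then divide by b.
print(mean(x), mean(y))      # mean(y) equals (mean(x) - a) / b

# The standard deviation only follows the divide: subtracting a has no effect.
print(pstdev(x), pstdev(y))  # pstdev(y) equals pstdev(x) / b
```

To undo the coding, the mean needs both steps reversed (multiply by b, then add a), while the standard deviation only needs multiplying by b.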

23
Q

Units for daily max temp

A

Degrees Celsius

24
Q

Units for daily total rainfall

A

millimetres (mm). If the total amount of rainfall collected is less than 0.05 mm, it is referred to as a trace of rain

25
Q

Units for daily total sunshine

A

given in hours and to one decimal place

26
Q

Units for Daily Maximum Relative Humidity

A

Values for this are recorded as percentages (%). Relative humidities above 95% are associated with mist and fog

27
Q

Units for daily mean windspeed and daily max gust speed

A

The daily mean windspeed is given in knots (1 knot is about 1.15 mph). The windspeeds are also categorised according to the Beaufort scale

28
Q

Units for Daily Mean Wind Direction and daily max gust direction

A

The value is given in degrees relative to true north

29
Q

Units for cloud cover

A

It is measured in eighths. The technical unit used in this case is called oktas.
0 oktas indicates a completely clear sky, while 8 oktas indicates complete overcast.

30
Q

Units for visibility

A

Decametres (Dm), where 1 Dm = 10 m (greatest horizontal distance at which an object can be seen)

31
Q

Units for pressure

A

hectopascals (hPa).

32
Q

Warmest/ highest temp places in large data set

A

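Jacksonville (24.8)
Beijing (22.6)
Both are in the northern hemisphere and closer to the equator than the UK stations, and the data covers May to October, which is their summer

Not Perth, as it is in the southern hemisphere, where May to October is winter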

33
Q

Warmest locations in UK from large data set

A

Based on mean temperature: Heathrow (15.6), Hurn (14.1), Camborne (13.6)

34
Q

Coldest location

A

Perth (15.2) outside UK

Leuchars (12.2) including UK

35
Q

Most rainfall

A

Based on mean rainfall:
Camborne, UK - 2.8mm
Jacksonville, World wide -

36
Q

Driest places

A

Heathrow, UK - 1.8mm

Heathrow, World wide - 1.8mm (If just B, J & P then Beijing at 2.1mm)

37
Q

Interpolation eqn

A

Lower bound + (how far through group/group frequency) x class width
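
A small Python sketch of this interpolation, using a made-up grouped table. The function name and parameters are assumptions; "how far through group" is taken to mean the target position minus the cumulative frequency before the class.

```python
def interpolate(lower_bound, class_width, class_freq, cum_freq_before, position):
    # How far through the group the target position lies, as a count of items.
    how_far = position - cum_freq_before
    # Lower bound + (how far through group / group frequency) x class width
    return lower_bound + how_far / class_freq * class_width

# Hypothetical example: 40 items in total, so the median is the 20th value.
# The median class is 20-30 with frequency 10, and 15 items lie below it.
median = interpolate(lower_bound=20, class_width=10, class_freq=10,
                     cum_freq_before=15, position=20)
print(median)  # 25.0
```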

38
Q

Histogram frequency eqn

A
Frequency = class width x frequency density
Area of each bar represents the frequency of the group
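
A brief Python sketch of the same relationship, rearranged to find the bar heights. The classes and frequencies are invented for illustration.

```python
def frequency_density(frequency, lower, upper):
    # Rearranged from: frequency = class width x frequency density
    return frequency / (upper - lower)

# Hypothetical classes with unequal widths: (lower, upper, frequency)
classes = [(0, 10, 5), (10, 30, 20), (30, 40, 8)]

for lower, upper, frequency in classes:
    width = upper - lower
    density = frequency_density(frequency, lower, upper)
    # Bar area (width x height) gives back the frequency of the group.
    print(lower, upper, density, width * density)
```
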
39
Q

When does a product moment correlation coefficient suggest there is a linear relationship between 2 variables

A

When r is close to 1 (+ve linear) or close to -1 (-ve linear)

40
Q

P(AnB) for mutually exclusive events=

A

0, as mutually exclusive events cannot both happen (for these events P(AuB) = P(A) + P(B))

41
Q

Rules to check if 2 events are statistically independent (2 rules)

A

1) P(AnB) = P(A) x P(B)

2) P(A|B) = P(A)
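
A minimal Python check of both rules on one roll of a fair die. The events chosen and the helper names are assumptions for illustration.

```python
from fractions import Fraction

DIE = frozenset(range(1, 7))  # equally likely outcomes of one fair die roll

def prob(event):
    # Probability of an event when every outcome is equally likely.
    return Fraction(len(event & DIE), len(DIE))

def is_independent(a, b):
    # Rule 1: P(A and B) = P(A) x P(B)
    return prob(a & b) == prob(a) * prob(b)

def conditional(a, b):
    # P(A given B) = P(A and B) / P(B)
    return prob(a & b) / prob(b)

even = {2, 4, 6}
at_most_four = {1, 2, 3, 4}
at_most_three = {1, 2, 3}

print(is_independent(even, at_most_four))            # rule 1 holds
print(conditional(even, at_most_four) == prob(even)) # rule 2 holds
print(is_independent(even, at_most_three))           # rule 1 fails
```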

42
Q

n vs u meaning in probability

A
P(XnY) = prob of X and Y (intersection of X and Y)
P(XuY) = prob of X or Y (union of X and Y)
43
Q

How to find lower quartile for discrete data

A

The (n/4)th data value

If n/4 is not a whole number, round up to the next whole observation. If n/4 is a whole number, use the midpoint of that value and the next one
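
A Python sketch of this rule for a list of discrete values. Taking the midpoint when n/4 is a whole number is the usual A-level convention and is treated as an assumption here; the data are made up.

```python
import math

def lower_quartile(data):
    values = sorted(data)
    position = len(values) / 4  # n/4
    if position.is_integer():
        # Whole number: midpoint of this observation and the next one (assumed convention).
        k = int(position)
        return (values[k - 1] + values[k]) / 2
    # Not a whole number: round up to the next observation.
    return values[math.ceil(position) - 1]

print(lower_quartile([1, 3, 5, 7, 9, 11, 13, 15, 17, 19]))  # n/4 = 2.5, so 3rd value
print(lower_quartile([1, 2, 3, 4, 5, 6, 7, 8]))             # n/4 = 2, so midpoint of 2nd and 3rd
```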

44
Q

Discrete uniform distribution meaning

A

When probability of each potential outcome is equal

45
Q

P(X=r) for X~B(n,p)

A

nCr x p^r x (1-p)^(n-r)
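
The same formula as a short Python function, using `math.comb` for nCr. The values of n, p and r in the example are made up.

```python
from math import comb

def binomial_pmf(r, n, p):
    # P(X = r) = nCr x p^r x (1 - p)^(n - r)
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

# Hypothetical example: X ~ B(5, 0.5), probability of exactly 2 successes.
print(binomial_pmf(2, 5, 0.5))  # 0.3125

# The probabilities over r = 0, 1, ..., n add up to 1 (up to rounding).
print(sum(binomial_pmf(r, 5, 0.5) for r in range(6)))
```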

46
Q

P(A’)=

A

1- P(A)

47
Q

If mutually exclusive P(AuB)=

A

P(A) + P(B)

48
Q

P(A|B)=

A

P(AnB)/ P(B)

49
Q

Addition rule of P(AuB)

A

P(AuB)= P(A) + P(B) - P(AnB)

50
Q

Addition rule of P(AnB)

A

P(AnB) = P(A) + P(B) - P(AuB)
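
A small Python check of the addition rule in both arrangements, plus the conditional probability formula, on one roll of a fair die. The events A and B are made up for illustration.

```python
from fractions import Fraction

SAMPLE_SPACE = frozenset(range(1, 7))  # one roll of a fair die (assumed example)

def p(event):
    # Probability of an event when all outcomes are equally likely.
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

A = {2, 4, 6}  # even number
B = {4, 5, 6}  # at least 4

union = p(A) + p(B) - p(A & B)         # P(AuB) = P(A) + P(B) - P(AnB)
intersection = p(A) + p(B) - p(A | B)  # P(AnB) = P(A) + P(B) - P(AuB)
conditional = p(A & B) / p(B)          # P(A|B) = P(AnB) / P(B)

print(union, intersection, conditional)
```

In the code, `A & B` is the set intersection (n) and `A | B` is the set union (u).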

51
Q

Normal distribution eqn

A

X~N(mean, variance)

52
Q

Standard normal substitution for Z (used to find unknown values for mean/ standard deviation)

A

Z= (X- mean) / standard deviation
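
A hedged Python sketch of the substitution, using `statistics.NormalDist` from the standard library. The numbers are made up. Note that `NormalDist` takes the standard deviation, not the variance used in the N(mean, variance) notation.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    # Z = (X - mean) / standard deviation
    return (x - mean) / sd

# Hypothetical example: X ~ N(50, 16), so the standard deviation is 4.
mean, variance = 50, 16
sd = variance ** 0.5
z = z_score(58, mean, sd)

# P(X < 58) equals P(Z < z) on the standard normal distribution.
p_direct = NormalDist(mean, sd).cdf(58)
p_standard = NormalDist().cdf(z)
print(z, p_direct, p_standard)

# Finding an unknown standard deviation: if the mean is 50 and P(X < 58) = 0.975,
# the inverse normal gives z, then sd = (58 - mean) / z.
z_975 = NormalDist().inv_cdf(0.975)
unknown_sd = (58 - mean) / z_975
print(z_975, unknown_sd)
```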

53
Q

Which way must the inequality face for inverse normal distribution

A

< (same as binomial)

54
Q

Median of a normal distribution

A

for a normal distribution, mean = median = mode.

55
Q

What is the sign of the area for inverse normal on calc function

A

Uses prob less than observed value

56
Q

How many decimal places to use for Z values in a normal dist question (matching how they are given in the table of z values)

A

give to 4dp

57
Q

How to tell if data has a +ve or -ve skew

A

+ve skew: mean > median
-ve skew: mean < median

+ve skew: Q3-Q2 > Q2-Q1
-ve skew: Q2-Q1 > Q3-Q2

58
Q

Formula for coefficient of skewness

A

Coefficient of skewness= 3(mean-median) / standard deviation
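
A short Python sketch of this coefficient and of the quartile comparison for skew. Using the population standard deviation (`pstdev`) is an assumption, and the data sets are made up.

```python
from statistics import mean, median, pstdev

def skewness_coefficient(data):
    # 3(mean - median) / standard deviation
    return 3 * (mean(data) - median(data)) / pstdev(data)

def quartile_skew(q1, q2, q3):
    # Compare the gaps on either side of the median.
    if q3 - q2 > q2 - q1:
        return "positive"
    if q2 - q1 > q3 - q2:
        return "negative"
    return "symmetrical"

print(skewness_coefficient([1, 2, 2, 3, 10]))  # mean > median, so positive
print(skewness_coefficient([1, 8, 9, 9, 10]))  # mean < median, so negative
print(quartile_skew(10, 12, 20))               # Q3-Q2 > Q2-Q1
```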

59
Q

What skew is required for a normal distribution

A

No skew/ slight skew/ almost symmetrical distribution

60
Q

Sxx=

A

Σx^2 - ((Σx)^2 / n) = variance x n (given in formula book)

61
Q

Outlier eqns

A

Any value GREATER than: Q3 + k(Q3-Q1)
Any value LOWER than: Q1 - k(Q3-Q1)
If k is not given use k=1.5
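
The two boundaries as a minimal Python sketch. The function names, quartiles and data are made up for illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    # Lower fence: Q1 - k(Q3 - Q1); upper fence: Q3 + k(Q3 - Q1)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3, k=1.5):
    low, high = outlier_fences(q1, q3, k)
    # Only values strictly beyond a fence count as outliers.
    return [x for x in data if x < low or x > high]

print(outlier_fences(10, 20))                        # (-5.0, 35.0)
print(find_outliers([-6, 0, 12, 35, 36], 10, 20))    # [-6, 36]
```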

62
Q

In a question where it says to use log

A

use log_10 (use log button on calc)

63
Q

Q3=

A

(3n/4)th value

64
Q

Q1=

A

(n/4)th value

65
Q

ȳ (y with a bar above) =

A

mean of y

66
Q

Sx =

A

A different version of st dev: the sample standard deviation, which divides by n - 1 instead of n

67
Q

Sxx=

A

Variance x n

68
Q

standard dev =

A

square root of (Sxx/n)
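
A short Python sketch linking Sxx, the variance and the standard deviation. The data set is made up.

```python
import math

def sxx(data):
    # Sxx = (sum of x^2) - (sum of x)^2 / n
    n = len(data)
    return sum(x * x for x in data) - sum(data) ** 2 / n

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data
n = len(data)

variance = sxx(data) / n   # so Sxx = variance x n
sd = math.sqrt(variance)   # standard deviation = square root of (Sxx / n)
print(sxx(data), variance, sd)  # 32.0 4.0 2.0
```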

69
Q

If y= ax^n for constants a and n then

A

log(y) = log(a) + nlog(x)

70
Q

If y= kb^x for constants k and b then

A

log(y) = log(k) + xlog(b)
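
A numerical check in Python of the two log forms, using log base 10. The constants and x values are made up. For y = ax^n, plotting log(y) against log(x) gives a straight line with gradient n and intercept log(a). For y = kb^x, plotting log(y) against x gives a straight line with gradient log(b) and intercept log(k).

```python
import math

def power_law_log(a, n, x):
    # For y = a x^n: log(y) = log(a) + n log(x)
    return math.log10(a) + n * math.log10(x)

def exponential_log(k, b, x):
    # For y = k b^x: log(y) = log(k) + x log(b)
    return math.log10(k) + x * math.log10(b)

# Hypothetical constants
a, n = 3.0, 2.0
k, b = 5.0, 1.5
x = 4.0

print(math.log10(a * x ** n), power_law_log(a, n, x))    # both sides agree
print(math.log10(k * b ** x), exponential_log(k, b, x))  # both sides agree
```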

71
Q

Reasons to use a histogram

A

1) Continuous data

2) Unequal class widths