Year 1 - Statistics Flashcards

1
Q

1.1 What is a census?

A

A census observes or measures every member of a population

2
Q

1.1 What is a sample?

A

A selection of observations taken from a subset of the population which is used to find out information about the population as a whole.

3
Q

1.1 What are the advantages and disadvantages of a census?

A

Adv- Should give a completely accurate result

Disadv- Time-consuming and expensive; cannot be used when the testing process destroys the item; hard to process because of the large quantity of data

4
Q

1.1 What are the advantages and disadvantages of a sample?

A

Adv- Less time-consuming and cheaper than a census, fewer people have to respond, less data to process

Disadv- Less accurate than a census; the sample may not be large enough to give information about small sub-groups of the population

5
Q

1.2 What are the three methods of random sampling?

A

Simple random sampling - every member has an equal chance of being selected

Systematic sampling - required elements chosen at regular intervals from an ordered list

Stratified sampling - population is divided into mutually exclusive groups (e.g. males & females) and a random sample is taken from each
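A minimal Python sketch of the three methods, using only the standard `random` module. The function names, the population of 100 labelled members and the male/female strata are hypothetical examples. The systematic version assumes the interval k is the population size divided by the sample size (rounded down), and the stratified version assumes each stratum's sample size is its proportional share, rounded to the nearest whole number.

```python
import random

def simple_random_sample(population, n, rng=random):
    # Every member has an equal chance of being selected (no repeats)
    return rng.sample(population, n)

def systematic_sample(population, n, rng=random):
    # Take every k-th member of the ordered list, starting at a random point in the first k
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng=random):
    # strata: dict mapping group name -> list of members (groups are mutually exclusive)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Each stratum's sample size is proportional to its share of the population
        size = round(n * len(members) / total)
        sample[name] = rng.sample(members, size)
    return sample

population = list(range(1, 101))  # hypothetical population labelled 1 to 100
print(simple_random_sample(population, 5))
print(systematic_sample(population, 10))
print(stratified_sample({"male": list(range(60)), "female": list(range(60, 100))}, 10))
```

With 60 males and 40 females, a stratified sample of 10 always contains 6 males and 4 females, which is the "reflects population structure" advantage.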

6
Q

1.2 What are the advantages and disadvantages of simple random sampling?

A

Adv- No bias, easy & cheap to do for small samples, each sampling unit has an equal chance

Disadv- Not suitable when the population is large, as it is time-consuming; a sampling frame is needed

7
Q

1.2 What are the advantages and disadvantages of systematic sampling?

A

Adv- Simple & quick, suitable for large samples & populations

Disadv- Sampling frame needed, can introduce bias if sampling frame is not random

8
Q

1.2 What are the advantages and disadvantages of stratified sampling?

A

Adv- Accurately reflects population structure, guarantees proportional representation of groups

Disadv- Population must be classified into distinct groups (strata); selection within each stratum suffers from the same disadvantages as simple random sampling

9
Q

1.3 What is quota sampling?

A

A researcher selects a sample that reflects the characteristics of the whole population

10
Q

1.3 What is opportunity sampling?

A

Taking the sample from people who are available at the time of the study and who fit the criteria of the study

11
Q

1.3 What are the advantages and disadvantages of quota sampling?

A

Adv- Allows a small sample to be representative of the population, no sampling frame required, quick, easy and cheap, allows easy comparison between different groups

Disadv- Non-random sampling can introduce bias; the population must be divided into groups, which can be costly or inaccurate; increasing the scope of the study increases the number of groups, which adds time

12
Q

1.3 What are the advantages and disadvantages of opportunity sampling?

A

Adv- Easy, cheap

Disadv- Unlikely to be representative, highly dependent on the individual researcher

13
Q

1.4 What is the difference between qualitative and quantitative data?

A

Qualitative - non-numerical observations

Quantitative - numerical observations

14
Q

1.4 What is the difference between discrete and continuous data?

A

Discrete - A variable that can only take specific values in a range e.g. shoe size

Continuous - A variable that can take any value in a range e.g. time

15
Q

1.5 What are the 8 locations (weather stations) in the large data set?

A

Leuchars, Leeming, Heathrow, Hurn, Camborne, Beijing, Jacksonville, Perth

16
Q

1.5 What are the following measured in? Daily mean temp, daily total rainfall, daily total sunshine, daily mean wind direction and windspeed, daily max gust, daily max relative humidity, daily cloud cover, daily mean visibility, daily mean pressure

A

Daily mean temp - degrees Celsius (1dp)
Daily total rainfall - mm (1dp)
Daily total sunshine - hours, to the nearest tenth of an hour (1dp)
Daily mean wind direction - Cardinal directions
Daily mean windspeed - Knots (1kn = 1.15mph)
Daily max gust - knots
Daily max relative humidity - percentage of air saturation (%)
Daily cloud cover - oktas (eighths of the sky covered)
Daily mean visibility - Decametres (Dm)
Daily mean pressure - Hectopascals (hPa)

17
Q

1.5 What time periods are used in the Large Data Set?

A

May-October 1987 & 2015

18
Q

2.1 What is the formula you can use to calculate the mean from a set of data?

A

x̄ = (Σx)/n where x bar is the mean, x is each data value, and n is the number of data values

19
Q

2.1 What is the formula you can use to calculate the mean from a frequency table?

A

x̄ = (Σxf)/(Σf) where x bar is the mean, x is each data value, and f is each frequency
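Both mean formulas translate directly into Python. The function names, the data list and the frequency table below are made-up examples.

```python
def mean(data):
    # x̄ = Σx / n
    return sum(data) / len(data)

def mean_from_frequency_table(values, frequencies):
    # x̄ = Σxf / Σf
    return sum(x * f for x, f in zip(values, frequencies)) / sum(frequencies)

print(mean([2, 4, 4, 5]))                              # 15/4 = 3.75
print(mean_from_frequency_table([1, 2, 3], [2, 5, 3]))  # 21/10 = 2.1
```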

20
Q

2.2 How do you find the upper and lower quartiles for discrete data?

A

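LQ: divide n by 4. If this is a whole number, the LQ is halfway between this data point and the one above; if it is a decimal, round up to find the position of the LQ

UQ: find 3/4 of n. If this is a whole number, the UQ is halfway between this data point and the one above; if it is a decimal, round up to find the position of the UQ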
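The quartile rule can be sketched in Python as below. The helper names and both data sets are hypothetical; `Fraction` is used so that the check "is the position a whole number?" is exact rather than subject to floating-point rounding. Passing the fraction 1/2 applies the same rule to find the median.

```python
import math
from fractions import Fraction

def quartile(data, fraction):
    # Position in the ordered data: fraction × n
    values = sorted(data)
    position = len(values) * Fraction(fraction)
    if position.denominator == 1:
        # Whole number: halfway between this data point and the one above
        k = int(position)
        return (values[k - 1] + values[k]) / 2
    # Decimal: round up to get the position
    return values[math.ceil(position) - 1]

def lower_quartile(data):
    return quartile(data, Fraction(1, 4))

def upper_quartile(data):
    return quartile(data, Fraction(3, 4))

even = [1, 3, 4, 6, 7, 8, 10, 12]  # n = 8, so n/4 = 2 is a whole number
odd = [2, 3, 5, 7, 8, 9, 11]       # n = 7, so n/4 = 1.75 is a decimal
print(lower_quartile(even), upper_quartile(even))  # 3.5 9.0
print(lower_quartile(odd), upper_quartile(odd))    # 3 9
print(upper_quartile(odd) - lower_quartile(odd))   # interquartile range: 6
```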

21
Q

2.2 What is interpolation used for and how do you do it?

A

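Used to estimate the median, quartiles or percentiles from a grouped frequency table, assuming the data values are distributed evenly within each class

Median = LB + ((n - a)/(b - a)) × class width, where LB is the lower class boundary of the class containing the median, n is the position of the median (half the total frequency), a and b are the cumulative frequencies at the lower and upper boundaries of that class, and class width is the width of that class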

22
Q

2.3 What are the range, IQR, and interpercentile range?

A

Range - difference between largest and smallest values

IQR - difference between upper and lower quartiles

Interpercentile range - difference between two given percentiles

23
Q

2.4 Give the formula for variance

A

((Σx^2)/n)-((Σx)/n)^2

24
Q

2.4 Give the formula for standard deviation

A

sqrt(((Σx^2)/n)-((Σx)/n)^2)
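The variance and standard deviation formulas as a Python sketch (function names and data are hypothetical). This is the population form with divisor n, which is what `statistics.pvariance` and `statistics.pstdev` in the standard library also compute.

```python
import math

def variance(data):
    # σ² = Σx²/n − (Σx/n)²
    n = len(data)
    return sum(x * x for x in data) / n - (sum(data) / n) ** 2

def standard_deviation(data):
    # σ = √variance
    return math.sqrt(variance(data))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data: mean 5, Σx² = 232
print(variance(data))            # 232/8 − 5² = 4.0
print(standard_deviation(data))  # 2.0
```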

25
Q

2.4 What is the formula for variance and standard deviation in a frequency table?

A

Variance:
((Σfx^2)/Σf)-((Σfx)/Σf)^2

Standard deviation:
sqrt(((Σfx^2)/Σf)-((Σfx)/Σf)^2)
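For a frequency table, each sum is weighted by its frequency. A minimal Python sketch with a hypothetical table:

```python
import math

def frequency_variance(values, frequencies):
    total = sum(frequencies)                                        # Σf
    sum_fx = sum(f * x for x, f in zip(values, frequencies))        # Σfx
    sum_fx2 = sum(f * x * x for x, f in zip(values, frequencies))   # Σfx²
    return sum_fx2 / total - (sum_fx / total) ** 2

def frequency_standard_deviation(values, frequencies):
    return math.sqrt(frequency_variance(values, frequencies))

values = [1, 2, 3]       # hypothetical frequency table
frequencies = [2, 5, 3]  # Σf = 10, Σfx = 21, Σfx² = 49
print(frequency_variance(values, frequencies))            # 49/10 − 2.1², approximately 0.49
print(frequency_standard_deviation(values, frequencies))  # approximately 0.7
```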

26
Q

2.5 If data is coded, what happens to the mean and the standard deviation when uncoded?

A

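Mean: affected by every coding operation (+, −, ×, ÷), so all of them must be reversed when uncoding

Standard deviation: affected only by × and ÷; adding or subtracting a constant does not change the spread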
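A Python sketch of uncoding, assuming coding of the form y = (x − a)/b, so that x = a + by. The constants a = 100 and b = 2 and the raw data are hypothetical. The check compares the uncoded values with the mean and standard deviation worked out directly from the raw data using the standard `statistics` module.

```python
import statistics

def uncode(coded_mean, coded_sd, a, b):
    # Assumes the coding y = (x - a) / b, so that x = a + b*y
    mean_x = a + b * coded_mean  # the mean is changed by both the +/− and the ×/÷ steps
    sd_x = abs(b) * coded_sd     # the standard deviation is only changed by the ×/÷ step
    return mean_x, sd_x

x = [102, 104, 106, 110]      # hypothetical raw data
a, b = 100, 2                 # hypothetical coding constants
y = [(v - a) / b for v in x]  # coded data: [1.0, 2.0, 3.0, 5.0]
mean_x, sd_x = uncode(statistics.mean(y), statistics.pstdev(y), a, b)
print(mean_x, sd_x)
```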

27
Q

3.2 What are the 5 aspects of a box plot?

A

Lowest value, Lower Quartile (Q1), Median (Q2), Upper Quartile (Q3), Highest value

28
Q

3.3 Describe how to plot a cumulative frequency graph

A
  • Add another column to the frequency table labelled ‘cumulative frequency’
  • Plot cumulative frequency on the y-axis and the measurement on the x-axis
  • Plot the first point at a cumulative frequency of 0 at the lower bound of the first class (this is the origin when that bound is 0)
  • Plot each point at the upper bound of its class
  • Join up points with a curve
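The points for a cumulative frequency graph can be generated as below: a running total of the frequencies, plotted at each class's upper bound, starting from a cumulative frequency of 0 at the lower bound of the first class (the origin when that bound is 0). The function name and table are hypothetical.

```python
from itertools import accumulate

def cumulative_frequency_points(classes):
    # classes: (lower_bound, upper_bound, frequency) tuples in ascending order
    first_lower = classes[0][0]
    points = [(first_lower, 0)]  # cumulative frequency is 0 at the lower bound of the first class
    running_totals = accumulate(f for _, _, f in classes)
    for (_, upper, _), cf in zip(classes, running_totals):
        points.append((upper, cf))  # each point goes at the upper bound of its class
    return points

table = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]  # hypothetical grouped data
print(cumulative_frequency_points(table))
# [(0, 0), (10, 4), (20, 14), (30, 20)]
```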
29
Q

3.4 What kind of data is a histogram used to present?

A

Grouped continuous data

30
Q

3.4 What is the formula for frequency density?

A

Frequency density = frequency/class width
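A quick Python check of the formula with hypothetical classes of unequal width. Multiplying each density back by its class width recovers the frequency, which is why the area of each histogram bar is proportional to the frequency.

```python
def frequency_densities(classes):
    # Frequency density = frequency / class width
    return [f / (upper - lower) for lower, upper, f in classes]

table = [(0, 5, 10), (5, 10, 20), (10, 20, 15)]  # hypothetical unequal-width classes
densities = frequency_densities(table)
print(densities)  # [2.0, 4.0, 1.5]

# Bar area = frequency density × class width, which gives back the frequency
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, table)]
print(areas)  # [10.0, 20.0, 15.0]
```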

31
Q

3.4 How do you draw a frequency polygon?

A

Join the midpoints of the tops of the histogram bars with straight lines

32
Q

3.4 What is the relationship between the area of each bar in a histogram and the frequency?

A

Area of each bar is proportional to the frequency

33
Q

3.5 What can you comment on when asked to compare data sets? How does the comparison differ for box plots and frequency tables?

A

Boxplots:
Compare the medians (location) and the IQRs (spread), and describe what these values show about the location and spread of the data

Frequency tables:
Compare the means (location) and the standard deviations (spread), and describe what these values show about the location and spread of the data

34
Q

4.1 What is bivariate data?

A

Data that has pairs of values for two variables

35
Q

4.1 What do the x and y axes of a scatter diagram represent?

A

x-axis - independent variable, also known as explanatory variable

y-axis - dependent variable, also known as response variable

36
Q

4.1 What are the five ways you can describe a correlation?

A

Strong positive, weak positive, no correlation, weak negative, strong negative

37
Q

4.1 What is a causal relationship between two variables?

A

When a change in one variable causes a change in the other

38
Q

4.2 What is the equation for a regression line on a scatter diagram? What are the conditions to use it?

A

y = a + bx

If you know the value of the independent variable (x), you can use the regression line to make a prediction of the dependent variable

Only make predictions for the dependent variable NOT the independent variable

39
Q

4.2 What is the difference between interpolation and extrapolation?

A

Interpolation - using values from within the range of the given data; usually reliable

Extrapolation - using values from outside the range of the given data; usually unreliable
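The cards give the form y = a + bx but not how a and b are found. The sketch below uses the standard least squares formulas b = Sxy/Sxx and a = ȳ − bx̄. The function names and data are hypothetical, and making `predict_y` refuse values outside the data range is a design choice to reflect that extrapolation is unreliable.

```python
def least_squares_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sxy: how x and y vary together; Sxx: how x varies
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # gradient
    a = mean_y - b * mean_x  # intercept: the line passes through (x̄, ȳ)
    return a, b

def predict_y(a, b, x, xs):
    # Only predict the dependent variable y from the independent variable x,
    # and only within the range of the data (interpolation)
    if not min(xs) <= x <= max(xs):
        raise ValueError("x is outside the range of the data: extrapolation is unreliable")
    return a + b * x

xs = [1, 2, 3, 4]  # hypothetical independent variable
ys = [3, 5, 7, 9]  # hypothetical dependent variable (exactly y = 1 + 2x)
a, b = least_squares_line(xs, ys)
print(a, b)                      # 1.0 2.0
print(predict_y(a, b, 2.5, xs))  # 6.0
```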

40
Q

5.1 What is a sample space?

A

The set of all possible outcomes

41
Q

5.2 What is the notation for the intersection of a Venn diagram describing events A and B?

A

A ∩ B, the region in both A and B ("A and B")

42
Q

5.2 What is the notation for the union of a Venn diagram describing events A and B?

A

A ∪ B, the region in A, B or both ("A or B")

43
Q

5.2 What is the notation used when describing a section of a Venn diagram that is ‘Not A’?

A

A', the complement of A (everything outside A)

44
Q

5.3 What does it mean when two events are mutually exclusive?

A

They cannot happen at the same time

45
Q

5.3 What does it mean when two events are independent?

A

When one event has no effect on the probability of the other

46
Q

5.3 What is the multiplication rule for independent events?

A

P(A and B) = P(A) x P(B)

47
Q

5.4 What is another way of calculating P(At least one head) when flipping a coin twice?

A

P(At least one head) = 1 - P(Both Tails)
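Both rules can be checked by listing the sample space for two fair coin flips and counting equally likely outcomes. A small Python sketch (the `probability` helper is hypothetical); `Fraction` keeps the answers exact.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))  # sample space: HH, HT, TH, TT

def probability(event):
    # Each outcome of two fair coins is equally likely
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

p_at_least_one_head = probability(lambda o: "H" in o)
p_both_tails = probability(lambda o: o == ("T", "T"))
p_first_head = probability(lambda o: o[0] == "H")
p_second_head = probability(lambda o: o[1] == "H")
p_both_heads = probability(lambda o: o == ("H", "H"))

print(p_at_least_one_head, 1 - p_both_tails)      # 3/4 3/4
print(p_both_heads, p_first_head * p_second_head)  # 1/4 1/4 (independent flips)
```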

48
Q

6.1 What is a random variable?

A

A variable whose value depends on the outcome of a random event

49
Q

6.1 What is a sample space?

A

The range of values a random variable can take

50
Q

6.1 What is a probability distribution?

A

It fully describes the probability of any outcome in the sample space

51
Q

6.1 What is the probability mass function of rolling a fair six-sided dice?

A

P(X=x) = 1/6, x=1, 2, 3, 4, 5, 6

52
Q

6.1 What is a discrete uniform distribution? Give an example

A

When all the probabilities in a distribution are the same, e.g. rolling a fair six-sided dice

53
Q

6.1 For a random variable, X, how can you write that all the probabilities of all outcomes of an event add up to 1?

A

ΣP(X=x) = 1 for all x
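The fair dice example can be written out as a probability mass function in Python, using `Fraction` for exact values. The names are illustrative only.

```python
from fractions import Fraction

# Probability mass function of a fair six-sided dice: P(X = x) = 1/6 for x = 1, ..., 6
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# Discrete uniform: every outcome in the sample space has the same probability
is_uniform = len(set(pmf.values())) == 1

# ΣP(X = x) = 1 over all x
total = sum(pmf.values())
print(is_uniform, total)  # True 1
```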

54
Q

6.2 What do you need in order to model X with a binomial distribution?

A
  • A fixed number of trials, n
  • Two possible outcomes (success and failure)
  • Fixed probability of success, p
  • Trials are independent of each other
55
Q

6.2 How do you write a binomial distribution?

A

X ~ B(n, p), where n is the number of trials and p is the probability of success

56
Q

6.2 If a random variable X has the binomial distribution X~B(n, p), what is its probability mass function?

A

P(X=r) = nCr x p^r x (1-p)^(n-r)
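The probability mass function can be coded with `math.comb` (available from Python 3.8) for nCr. Passing p as a `Fraction` keeps the result exact; the example values are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    # P(X = r) = nCr × p^r × (1 − p)^(n − r)
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# X ~ B(4, 1/2): probability of exactly 2 successes
print(binomial_pmf(2, 4, Fraction(1, 2)))  # 6 × 1/4 × 1/4 = 3/8
```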

57
Q

6.3 What is a cumulative probability function for a random variable X?

A

Gives the sum of all the individual probabilities up to and including the given X value, P(X≤x)
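A sketch of the cumulative probability for a binomial random variable: add the individual probabilities up to and including x. The complement rule P(X ≥ x) = 1 − P(X ≤ x − 1) is included because tables and calculators usually give P(X ≤ x). Function names and values are hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    # P(X = r) = nCr × p^r × (1 − p)^(n − r)
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def binomial_cdf(x, n, p):
    # P(X ≤ x): add the individual probabilities for r = 0, 1, ..., x
    return sum(binomial_pmf(r, n, p) for r in range(x + 1))

def binomial_at_least(x, n, p):
    # P(X ≥ x) = 1 − P(X ≤ x − 1)
    return 1 - binomial_cdf(x - 1, n, p)

p = Fraction(1, 2)
print(binomial_cdf(1, 4, p))       # (1 + 4)/16 = 5/16
print(binomial_at_least(3, 4, p))  # (4 + 1)/16 = 5/16
```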

58
Q

7.1 Define Test Statistic

A

The statistic that is calculated from the sample, e.g. no. of passes, no. of heads

59
Q

7.1 Define Null Hypothesis

A

H(0): The hypothesis you assume to be correct

60
Q

7.1 Define Alternative Hypothesis

A

H(1): tells you about the parameter if your assumption is shown to be wrong

61
Q

7.1 What is a one-tailed test?

A

H(1): p<… or H(1): p>…

62
Q

7.1 What is a two-tailed test?

A

H(1): p≠…

63
Q

7.1 How do you carry out a hypothesis test?

A
  • You assume the null hypothesis is true
  • Then consider how likely the observed value of the test statistic is to occur
  • If this probability is less than a given threshold (the significance level), you reject the null hypothesis
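These steps can be sketched as a one-tailed binomial test in Python. The scenario (a coin giving 15 heads in 20 flips, tested at the 5% level with H(0): p = 0.5 and H(1): p > 0.5) and the function names are hypothetical. The code assumes H(0) is true, finds the probability of a result at least as extreme as the one observed, and compares it with the significance level.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(r, n, p):
    # P(X = r) = nCr × p^r × (1 − p)^(n − r)
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def upper_tail_test(observed, n, p0, significance):
    # One-tailed test of H0: p = p0 against H1: p > p0, with X ~ B(n, p0) under H0.
    # Probability of a result at least as extreme as the one observed, assuming H0 is true
    p_value = sum(binomial_pmf(r, n, p0) for r in range(observed, n + 1))
    # Reject H0 if this probability is below the significance level
    return p_value, p_value < significance

# Hypothetical example: a coin gives 15 heads in 20 flips; test at the 5% level
p_value, reject = upper_tail_test(15, 20, Fraction(1, 2), Fraction(5, 100))
print(float(p_value), reject)
```

Here P(X ≥ 15) is about 0.021, which is below 0.05, so there is evidence to reject H(0). With 13 heads the probability is about 0.13 and H(0) would not be rejected.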