Year 1 - Statistics Flashcards

Question 1

Q

1.1 What is a census?

Answer

A

A census observes or measures every member of a population

Question 2

Q

1.1 What is a sample?

Answer

A

A selection of observations taken from a subset of the population which is used to find out information about the population as a whole.

Question 3

Q

1.1 What are the advantages and disadvantages of a census?

Answer

A

Adv- Completely accurate result

Disadv- Time consuming, expensive, cannot be used when testing process destroys item, hard to process as large quantities of data

Question 4

Q

1.1 What are the advantages and disadvantages of a sample?

Answer

A

Adv- time-efficient, fewer people have to respond, less data to process

Disadv- Not as accurate, sample not large enough to give information about sub-groups of the population

Question 5

Q

1.2 What are the three methods of random sampling?

Answer

A

Simple random sampling - every member has an equal chance of being selected

Systematic sampling - required elements chosen at regular intervals from an ordered list

Stratified sampling - population is divided into mutually exclusive groups (e.g. males & females) and a random sample is taken from each
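
Illustrative sketch (not part of the original card): one way the three methods could be coded. The population of 100 numbered IDs, the two group names, and the function names are all made up for the example, and the seed is fixed only so the output is repeatable.

```python
import random

def simple_random_sample(population, n, seed=0):
    # Every member has an equal chance of being selected.
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=0):
    # Take every k-th element of an ordered list,
    # starting from a random position among the first k.
    k = len(population) // n
    rng = random.Random(seed)
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n, seed=0):
    # strata maps a group name to the list of its members.
    # Each group contributes in proportion to its size.
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)
        sample[name] = rng.sample(members, size)
    return sample

# Hypothetical sampling frame: IDs 1 to 100, split 60/40 into two groups.
population = list(range(1, 101))
strata = {"group_a": list(range(1, 61)), "group_b": list(range(61, 101))}

srs = simple_random_sample(population, 10)
sys_sample = systematic_sample(population, 10)
strat = stratified_sample(strata, 10)
```

With a sample of 10 from a 60/40 split, the stratified sample takes 6 from group_a and 4 from group_b.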

Question 6

Q

1.2 What are the advantages and disadvantages of simple random sampling?

Answer

A

Adv- No bias, easy & cheap to do for small samples, each sampling unit has an equal chance

Disadv - Not suitable when population is large as time-consuming, sampling frame is needed

Question 7

Q

1.2 What are the advantages and disadvantages of systematic sampling?

Answer

A

Adv- Simple & quick, suitable for large samples & populations

Disadv- Sampling frame needed, can introduce bias if sampling frame is not random

Question 8

Q

1.2 What are the advantages and disadvantages of stratified sampling?

Answer

A

Adv- Accurately reflects population structure, guarantees proportional representation of groups

Disadv- Population must be clearly classified into distinct groups (strata), selection within each stratum suffers from the same disadvantages as simple random sampling

Question 9

Q

1.3 What is quota sampling?

Answer

A

A researcher selects a sample that reflects the characteristics of the whole population

Question 10

Q

1.3 What is opportunity sampling?

Answer

A

Taking the sample from people who are available at the time of the study and who fit the criteria of the study

Question 11

Q

1.3 What are the advantages and disadvantages of quota sampling?

Answer

A

Adv- Allows a small sample to be representative of the population, no sampling frame, quick, easy, cheap, easy comparison between different groups

Disadv- Non-random sampling can introduce bias, population is divided into groups - costly/inaccurate, increasing scope of study increases no. of groups - time-consuming

Question 12

Q

1.3 What are the advantages and disadvantages of opportunity sampling?

Answer

A

Adv- Easy, cheap

Disadv - Unlikely to be representative, highly dependent on individual researcher

Question 13

Q

1.4 What is the difference between qualitative and quantitative data?

Answer

A

Qualitative - non-numerical observations

Quantitative - numerical observations

Question 14

Q

1.4 What is the difference between discrete and continuous data?

Answer

A

Discrete - A variable that can only take specific values in a range e.g. shoe size

Continuous - A variable that can take any value in a range e.g. time

Question 15

Q

1.5 What are the 8 cities in the large data set?

Answer

A

Leuchars, Leeming, Heathrow, Hurn, Camborne, Beijing, Jacksonville, Perth

Question 16

Q

1.5 What are the following measured in? Daily mean temp, daily total rainfall, daily total sunshine, daily mean wind direction and windspeed, daily max gust, daily max relative humidity, daily cloud cover, daily mean visibility, daily mean pressure

Answer

A

Daily mean temp - degrees Celsius (1dp)
Daily total rainfall - mm (1dp)
Daily total sunshine - tenth of an hour
Daily mean wind direction - Cardinal directions
Daily mean windspeed - Knots (1kn = 1.15mph)
Daily max gust - knots
Daily max relative humidity - percentage of air saturation (%)
Daily cloud cover - oktas (eighths of the sky covered)
Daily mean visibility - Decametres (Dm)
Daily mean pressure - Hectopascals (hPa)

Question 17

Q

1.5 What time periods are used in the Large Data Set?

Answer

A

May-October 1987 & 2015

Question 18

Q

2.1 What is the formula you can use to calculate the mean from a set of data?

Answer

A

x̄ = (Σx)/n where x bar is the mean, x is each data value, and n is the number of data values

Question 19

Q

2.1 What is the formula you can use to calculate the mean from a frequency table?

Answer

A

x̄ = (Σxf)/(Σf) where x bar is the mean, x is each data value, and f is each frequency
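
Worked sketch (not part of the original card): the frequency-table formula applied to a small made-up table. The values and frequencies are invented for illustration.

```python
# Hypothetical frequency table: x is the data value, f is its frequency.
values = [0, 1, 2, 3]
freqs = [4, 7, 6, 3]

sum_xf = sum(x * f for x, f in zip(values, freqs))  # Σxf = 28
total_f = sum(freqs)                                # Σf = 20
mean = sum_xf / total_f                             # x̄ = 28 / 20
```

Here Σxf = 0 + 7 + 12 + 9 = 28 and Σf = 20, so the mean is 1.4.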

Question 20

Q

2.2 How do you find the upper and lower quartiles for discrete data?

Answer

A

LQ: find n/4. If it is a whole number, the LQ is halfway between this data point and the one above. If it is a decimal, round up and pick that data point

UQ: find 3n/4. If it is a whole number, the UQ is halfway between this data point and the one above. If it is a decimal, round up and pick that data point
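
Illustrative sketch (not part of the original card): the rule above written as a function. The function name and the two data sets are made up for the example, and positions are counted from 1 as on the card.

```python
import math

def quartile_discrete(data, fraction):
    # fraction is 0.25 for the lower quartile and 0.75 for the upper quartile.
    ordered = sorted(data)
    position = len(ordered) * fraction
    if position == int(position):
        k = int(position)
        # Whole number: halfway between the k-th value and the one above.
        return (ordered[k - 1] + ordered[k]) / 2
    # Decimal: round up and take that data point.
    return ordered[math.ceil(position) - 1]

even_case = [2, 5, 7, 8, 10, 12, 13, 15]   # n = 8, so n/4 = 2 and 3n/4 = 6
odd_case = [3, 4, 6, 9, 11, 14, 20]        # n = 7, so n/4 = 1.75 and 3n/4 = 5.25

lq_even = quartile_discrete(even_case, 0.25)
uq_even = quartile_discrete(even_case, 0.75)
lq_odd = quartile_discrete(odd_case, 0.25)
uq_odd = quartile_discrete(odd_case, 0.75)
```

For n = 8 the quartiles fall between data points (6 and 12.5). For n = 7 the positions round up to the 2nd and 6th values (4 and 14).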

Question 21

Q

2.2 What is interpolation used for and how do you do it?

Answer

A

Used to find the median, quartiles, or percentiles of a grouped frequency table, assuming data values are distributed evenly within each class

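Median = LB + ((n-a)/(b-a) x class width) where LB is the lower class boundary of the median class, n is the position of the median (half the total frequency), a is the cumulative frequency at the lower class boundary and b is the cumulative frequency at the upper class boundary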
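
Illustrative sketch (not part of the original card): linear interpolation on a made-up grouped frequency table. The function name and the class data are assumptions for the example. The same function gives quartiles or percentiles if a different position is passed in.

```python
def interpolated_value(classes, position):
    # classes is a list of (lower_bound, upper_bound, frequency) tuples.
    # position is the cumulative frequency of the value wanted,
    # e.g. half the total frequency for the median.
    cumulative = 0
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= position:
            a = cumulative          # cumulative frequency at the lower boundary
            b = cumulative + freq   # cumulative frequency at the upper boundary
            return lower + (position - a) / (b - a) * (upper - lower)
        cumulative += freq
    raise ValueError("position exceeds the total frequency")

# Hypothetical table with total frequency 20.
classes = [(0, 10, 5), (10, 20, 10), (20, 30, 5)]
total = sum(freq for _, _, freq in classes)

median = interpolated_value(classes, total / 2)
lower_quartile = interpolated_value(classes, total / 4)
upper_quartile = interpolated_value(classes, 3 * total / 4)
```

The median position is 10, which lies in the 10 to 20 class, so the median is 10 + ((10 - 5)/(15 - 5)) x 10 = 15.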

Question 22

Q

2.3 What is the range, IQR, and interpercentile range?

Answer

A

Range - difference between largest and smallest values

IQR - difference between upper and lower quartiles

Interpercentile range - difference between two given percentiles

Question 23

Q

2.4 Give the formula for variance

Answer

A

((Σx^2)/n)-((Σx)/n)^2

Question 24

Q

2.4 Give the formula for standard deviation

Answer

A

sqrt(((Σx^2)/n)-((Σx)/n)^2)

Question 25

Q

2.4 What is the formula for variance and standard deviation in a frequency table?

Answer

A

Variance:
((Σfx^2)/Σf)-((Σfx)/Σf)^2

Standard deviation:
sqrt(((Σfx^2)/Σf)-((Σfx)/Σf)^2)
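
Worked sketch (not part of the original card): the two formulas applied to a made-up frequency table, then checked against Python's statistics module on the expanded data. The table values are invented for illustration.

```python
import math
import statistics

def variance_from_table(values, freqs):
    # (Σfx^2 / Σf) - (Σfx / Σf)^2
    total_f = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return sum_fx2 / total_f - (sum_fx / total_f) ** 2

# Hypothetical table: Σf = 10, Σfx = 25, Σfx^2 = 73.
values = [1, 2, 3, 4]
freqs = [2, 3, 3, 2]

variance = variance_from_table(values, freqs)   # 7.3 - 2.5^2
std_dev = math.sqrt(variance)

# Cross-check: expand the table into raw data and use the population variance.
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
check = statistics.pvariance(expanded)
```

The variance is 73/10 - (25/10)^2 = 1.05, so the standard deviation is about 1.02.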

Question 26

Q

2.5 If data is coded, what happens to the mean and the standard deviation when uncoded?

Answer

A

Mean: will be affected by any operations (+-x/)

Standard deviation: will only be affected by x or /
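
Illustrative sketch (not part of the original card): assuming the coding y = (x - a)/b, the mean is uncoded with both steps but the standard deviation only with the multiplication. The data and the constants a = 100 and b = 10 are made up for the example.

```python
import statistics

x = [110, 120, 130, 140, 150]   # hypothetical original data
a, b = 100, 10                  # assumed coding: y = (x - a) / b
y = [(xi - a) / b for xi in x]  # coded data: 1, 2, 3, 4, 5

mean_y = statistics.mean(y)
sd_y = statistics.pstdev(y)

mean_x = b * mean_y + a   # mean: undo the division and the subtraction
sd_x = b * sd_y           # standard deviation: undo the division only
```

Uncoding gives a mean of 130 and a standard deviation of about 14.1, matching the values calculated directly from x.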

Question 27

Q

3.2 What are the 5 aspects of a box plot?

Answer

A

Range, Interquartile Range, Lower Quartile (Q1), Median (Q2), Upper Quartile (Q3)

Question 28

Q

3.3 Describe how to plot a cumulative frequency graph

Answer

A

Add another column to the frequency table labelled ‘cumulative frequency’
Plot cumulative frequency on the y-axis and the measurement on the x-axis
Plot the first point at the lower bound of the first class, with a cumulative frequency of 0
Plot each point at the upper bound for each range
Join up points with a curve

Question 29

Q

3.4 What kind of data is a histogram used to present?

Answer

A

Continuous data

Question 30

Q

3.4 What is the formula for frequency density?

Answer

A

Frequency density = frequency/class width
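
Worked sketch (not part of the original card): frequency densities for a made-up table with unequal class widths. The class boundaries and frequencies are invented for illustration.

```python
# Hypothetical classes as (lower_bound, upper_bound, frequency).
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 6)]

densities = [freq / (upper - lower) for lower, upper, freq in classes]

# Bar area = frequency density x class width, which recovers the frequency.
areas = [d * (upper - lower) for d, (lower, upper, _) in zip(densities, classes)]
```

The densities are 0.5, 2.0 and 0.4, so the narrow middle class gives the tallest bar even though the bar areas match the frequencies 5, 10 and 6.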

Question 31

Q

3.4 How do you draw a frequency polygon?

Answer

A

Join up the middle of the top of each bar of a histogram with a straight line

Question 32

Q

3.4 What is the relationship between the area of each bar in a histogram and the frequency?

Answer

A

Area of each bar is proportional to the frequency

Question 33

Q

3.5 What can you comment on when asked to compare data sets? Give the difference between a boxplot and a frequency table

Answer

A

Boxplots:
Compare the median and the IQR - describe what effect these values have on the spread/location of data

Frequency tables:
Compare the mean and the standard deviation - describe what effect these values have on the spread/location of data

Question 34

Q

4.1 What is bivariate data?

Answer

A

Data that has pairs of values for two variables

Question 35

Q

4.1 What do the x and y axes of a scatter diagram represent?

Answer

A

x-axis - independent variable, also known as explanatory variable

y-axis - dependent variable, also known as response variable

Question 36

Q

4.1 What are the five ways that you can describe a correlation?

Answer

A

Strong positive, weak positive, no correlation, weak negative, strong negative

Question 37

Q

4.1 What is a causal relationship between two variables?

Answer

A

When a change in one variable causes a change in the other

Question 38

Q

4.2 What is the equation for a regression line on a scatter diagram? What are the conditions to use it?

Answer

A

y = a + bx

If you know the value of the independent variable (x), you can use the regression line to make a prediction of the dependent variable

Only make predictions for the dependent variable NOT the independent variable

Question 39

Q

4.2 What is the difference between interpolation and extrapolation?

Answer

A

Interpolation - using values from within the given range of data, is usually accurate

Extrapolation - using values from outside the given range of data, is usually inaccurate
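
Illustrative sketch (not part of the original card): predicting y from a regression line y = a + bx and labelling the prediction by whether x lies inside the range of the data. The coefficients, the data range, and the function name are all made up for the example.

```python
def predict(a, b, x, x_min, x_max):
    # Predict the dependent variable y from the independent variable x.
    y = a + b * x
    # Inside the observed x range is interpolation, outside is extrapolation.
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y, kind

# Hypothetical regression line y = 2 + 0.5x fitted to data with 10 <= x <= 30.
a, b = 2.0, 0.5
x_min, x_max = 10, 30

inside = predict(a, b, 20, x_min, x_max)
outside = predict(a, b, 50, x_min, x_max)
```

The prediction at x = 20 is an interpolation. The prediction at x = 50 is an extrapolation and should be treated with caution.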

Question 40

Q

5.1 What is a sample space?

Answer

A

The set of all possible outcomes

Question 41

Q

5.2 What is the notation for the intersection of a Venn diagram describing events A and B?

Answer

A

A ∩ B

Question 42

Q

5.2 What is the notation for the union of a Venn diagram describing events A and B?

Answer

A

A ∪ B

Question 43

Q

5.2 What is the notation used when describing a section of a Venn diagram that is ‘Not A’?

Answer

A

A′

Question 44

Q

5.3 What does it mean when two events are mutually exclusive?

Answer

A

They cannot happen at the same time

Question 45

Q

5.3 What does it mean when two events are independent?

Answer

A

When one event has no effect on the probability of the other

Question 46

Q

5.3 What is the multiplication rule for independent events?

Answer

A

P(A and B) = P(A) x P(B)

Question 47

Q

5.4 What is another way of calculating P(At least one head) when flipping a coin twice?

Answer

A

P(At least one head) = 1 - P(Both Tails)
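
Worked sketch (not part of the original card): listing the sample space for two flips of a fair coin and checking that the direct count and the complement give the same answer. Variable names are made up for the example.

```python
from fractions import Fraction
from itertools import product

# Sample space for two flips: HH, HT, TH, TT (equally likely).
sample_space = list(product("HT", repeat=2))

# Direct count of outcomes with at least one head.
p_direct = Fraction(sum(1 for outcome in sample_space if "H" in outcome),
                    len(sample_space))

# Complement: 1 - P(both tails).
p_both_tails = Fraction(sum(1 for outcome in sample_space if outcome == ("T", "T")),
                        len(sample_space))
p_complement = 1 - p_both_tails
```

Both routes give 3/4.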

Question 48

Q

6.1 What is a random variable?

Answer

A

A variable whose value depends on the outcome of a random event

Question 49

Q

6.1 What is a sample space?

Answer

A

The range of values a random variable can take

Question 50

Q

6.1 What is a probability distribution?

Answer

A

It fully describes the probability of any outcome in the sample space

Question 51

Q

6.1 What is the probability mass function of rolling a fair six-sided dice?

Answer

A

P(X=x) = 1/6, x=1, 2, 3, 4, 5, 6

Question 52

Q

6.1 What is a discrete uniform distribution? Give an example

Answer

A

When all the probabilities in a distribution are the same, e.g. rolling a fair six-sided dice

Question 53

Q

6.1 For a random variable, X, how can you write that all the probabilities of all outcomes of an event add up to 1?

Answer

A

ΣP(X=x) = 1 for all x
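
Worked sketch (not part of the original card): checking the condition for the fair six-sided dice from the earlier card, using exact fractions. The dictionary name is made up for the example.

```python
from fractions import Fraction

# Probability mass function of a fair six-sided dice: P(X=x) = 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

total_probability = sum(pmf.values())
```

The six probabilities of 1/6 add up to exactly 1.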

Question 54

Q

6.2 What do you need in order to model X with a binomial distribution?

Answer

A

A fixed number of trials, n
Two possible outcomes (success and failure)
Fixed probability of success, p
Trials are independent of each other

Question 55

Q

6.2 How do you write a binomial distribution?

Answer

A

X~B(n, p)

Question 56

Q

6.2 If a random variable X has the binomial distribution X~B(n, p), what is its probability mass function?

Answer

A

P(X=r) = nCr x p^r x (1-p)^(n-r)
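
Worked sketch (not part of the original card): the probability mass function written as a function. The function name and the example distribution X~B(10, 0.5) are chosen for illustration.

```python
from math import comb

def binomial_pmf(r, n, p):
    # P(X=r) = nCr x p^r x (1-p)^(n-r)
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Hypothetical example: X~B(10, 0.5), probability of exactly 3 successes.
p_three = binomial_pmf(3, 10, 0.5)

# The probabilities over r = 0, 1, ..., n add up to 1.
total = sum(binomial_pmf(r, 10, 0.5) for r in range(11))
```

Here P(X=3) = 120 x 0.5^10 = 0.1171875.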

Question 57

Q

6.3 What is a cumulative probability function for a random variable X?

Answer

A

Gives the sum of all the individual probabilities up to and including the given X value, P(X≤x)
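
Worked sketch (not part of the original card): a cumulative probability for a binomial random variable, found by adding the individual probabilities. The function names and the example X~B(10, 0.5) are chosen for illustration.

```python
from math import comb

def binomial_pmf(r, n, p):
    # P(X=r) for X~B(n, p)
    return comb(n, r) * p**r * (1 - p) ** (n - r)

def binomial_cdf(x, n, p):
    # P(X≤x): sum of P(X=r) for r = 0, 1, ..., x
    return sum(binomial_pmf(r, n, p) for r in range(x + 1))

# Hypothetical example: X~B(10, 0.5), probability of at most 3 successes.
p_at_most_three = binomial_cdf(3, 10, 0.5)

# P(X≥4) follows from the complement.
p_at_least_four = 1 - p_at_most_three
```

Here P(X≤3) = (1 + 10 + 45 + 120)/1024 = 0.171875.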

Question 58

Q

7.1 Define Test Statistic

Answer

A

The statistic that is calculated from the sample, e.g. no. of passes, no. of heads

Question 59

Q

7.1 Define Null Hypothesis

Answer

A

H(0): The hypothesis you assume to be correct

Question 60

Q

7.1 Define Alternative Hypothesis

Answer

A

H(1): tells you about the parameter if your assumption is shown to be wrong

Question 61

Q

7.1 What is a one-tailed test?

Answer

A

H(1): p<… or H(1): p>…

Question 62

Q

7.1 What is a two-tailed test?

Answer

A

H(1): p≠…

Question 63

Q

7.1 How do you carry out a hypothesis test?

Answer

A

You assume the null hypothesis is true
Then consider how likely the observed value of the test statistic was to occur.
If the likelihood is less than a given threshold (significance level) then you reject the null hypothesis
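
Illustrative sketch (not part of the original card): the three steps applied to a made-up one-tailed test. The scenario (a coin flipped 20 times, 15 heads observed, H(0): p = 0.5, H(1): p > 0.5, 5% significance level) and all names are assumptions for the example.

```python
from math import comb

def binomial_pmf(r, n, p):
    # P(X=r) for X~B(n, p)
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Step 1: assume the null hypothesis is true, so X~B(20, 0.5).
n, p_null = 20, 0.5
observed = 15               # test statistic: number of heads
significance_level = 0.05

# Step 2: how likely is a result at least as extreme as the one observed?
p_value = sum(binomial_pmf(r, n, p_null) for r in range(observed, n + 1))

# Step 3: reject H(0) if that probability is below the significance level.
reject_null = p_value < significance_level
```

Here P(X≥15) is about 0.0207, which is less than 0.05, so the null hypothesis is rejected in this made-up scenario.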