Year 1 - Statistics Flashcards
1.1 What is a census?
A census observes or measures every member of a population
1.1 What is a sample?
A selection of observations taken from a subset of the population which is used to find out information about the population as a whole.
1.1 What are the advantages and disadvantages of a census?
Adv- Completely accurate result
Disadv- Time consuming, expensive, cannot be used when testing process destroys item, hard to process as large quantities of data
1.1 What are the advantages and disadvantages of a sample?
Adv- time-efficient, fewer people have to respond, less data to process
Disadv- Not as accurate, sample may not be large enough to give information about small sub-groups of the population
1.2 What are the three methods of random sampling?
Simple random sampling - every member has an equal chance of being selected
Systematic sampling - required elements chosen at regular intervals from an ordered list
Stratified sampling - population is divided into mutually exclusive groups (e.g. males & females) and a random sample is taken from each
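The three methods above can be sketched in Python. This is only an illustration: the function names, the seed parameter and the example strata are assumptions, not part of the course material.

```python
import random

def simple_random_sample(population, k, seed=None):
    # Every member has an equal chance of being selected.
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, k, seed=None):
    # Pick a random start, then take every step-th item from the ordered list.
    rng = random.Random(seed)
    step = len(population) // k
    start = rng.randrange(step)
    return [population[start + i * step] for i in range(k)]

def stratified_sample(strata, k, seed=None):
    # Take a random sample from each stratum, sized in proportion to the stratum.
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(k * len(members) / total)
        sample[name] = rng.sample(members, size)
    return sample

population = list(range(100))
strata = {"group A": population[:60], "group B": population[60:]}
print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))
print(stratified_sample(strata, 10, seed=1))
```

With 60 members in one stratum and 40 in the other, a stratified sample of 10 takes 6 and 4 respectively.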
1.2 What are the advantages and disadvantages of simple random sampling?
Adv- No bias, easy & cheap to do for small samples, each sampling unit has an equal chance
Disadv - Not suitable when the population or sample is large, as it is time-consuming; a sampling frame is needed
1.2 What are the advantages and disadvantages of systematic sampling?
Adv- Simple & quick, suitable for large samples & populations
Disadv- Sampling frame needed, can introduce bias if sampling frame is not random
1.2 What are the advantages and disadvantages of stratified sampling?
Adv- Accurately reflects population structure, guarantees proportional representation of groups
Disadv- Population must be clearly classified into distinct groups (strata), selection within each stratum suffers from the same disadvantages as simple random sampling
1.3 What is quota sampling?
A researcher selects a sample that reflects the characteristics of the whole population
1.3 What is opportunity sampling?
Taking the sample from people who are available at the time of the study and who fit the criteria of the study
1.3 What are the advantages and disadvantages of quota sampling?
Adv- Allows a small sample to be representative of the population, no sampling frame, quick, easy, cheap, easy comparison between different groups
Disadv- Non-random sampling can introduce bias, dividing the population into groups can be costly or inaccurate, increasing the scope of the study increases the number of groups, which is time-consuming
1.3 What are the advantages and disadvantages of opportunity sampling?
Adv- Easy, cheap
Disadv - Unlikely to be representative, highly dependent on individual researcher
1.4 What is the difference between qualitative and quantitative data?
Qualitative - non-numerical observations
Quantitative - numerical observations
1.4 What is the difference between discrete and continuous data?
Discrete - A variable that can only take specific values in a range e.g. shoe size
Continuous - A variable that can take any value in a range e.g. time
1.5 What are the 8 locations (weather stations) in the large data set?
Leuchars, Leeming, Heathrow, Hurn, Camborne, Beijing, Jacksonville, Perth
1.5 What are the following measured in? Daily mean temp, daily total rainfall, daily total sunshine, daily mean wind direction and windspeed, daily max gust, daily max relative humidity, daily cloud cover, daily mean visibility, daily mean pressure
Daily mean temp - degrees Celsius (1dp)
Daily total rainfall - mm (1dp)
Daily total sunshine - hours (to the nearest tenth of an hour)
Daily mean wind direction - Cardinal directions
Daily mean windspeed - Knots (1kn = 1.15mph)
Daily max gust - knots
Daily max relative humidity - percentage of air saturation (%)
Daily cloud cover - oktas (eighths of the sky covered)
Daily mean visibility - Decametres (Dm)
Daily mean pressure - Hectopascals (hPa)
1.5 What time periods are used in the Large Data Set?
May-October 1987 & 2015
2.1 What is the formula you can use to calculate the mean from a set of data?
x̄ = (Σx)/n where x bar is the mean, x is each data value, and n is the number of data values
2.1 What is the formula you can use to calculate the mean from a frequency table?
x̄ = (Σxf)/(Σf) where x bar is the mean, x is each data value, and f is each frequency
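A minimal sketch of the frequency-table formula, using made-up values and frequencies (the function name is an assumption for illustration):

```python
def mean_from_frequency_table(values, frequencies):
    # Mean = (sum of x*f) / (sum of f)
    total = sum(x * f for x, f in zip(values, frequencies))
    return total / sum(frequencies)

# Values 1, 2, 3 occurring 2, 3 and 5 times: (2 + 6 + 15) / 10
print(mean_from_frequency_table([1, 2, 3], [2, 3, 5]))
```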
2.2 How do you find the upper and lower quartiles for discrete data?
LQ: divide n by 4; if this is a whole number, the LQ is halfway between this data point and the one above; if not, round up and take that data point
UQ: find 3/4 of n; if this is a whole number, the UQ is halfway between this data point and the one above; if not, round up and take that data point
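The quartile rule can be sketched as follows, taking "between" to mean the midpoint of the two data points. The function name and the example data are illustrative assumptions.

```python
def quartile(data, quarters):
    # quarters = 1 for the lower quartile, 3 for the upper quartile.
    ordered = sorted(data)
    whole, remainder = divmod(len(ordered) * quarters, 4)
    if remainder == 0:
        # Whole number: halfway between this data point and the one above.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # Not a whole number: round up (position whole + 1 is index whole).
    return ordered[whole]

eight = [1, 3, 4, 5, 7, 8, 9, 10]   # n/4 = 2 and 3n/4 = 6 are whole numbers
seven = [2, 4, 6, 8, 10, 12, 14]    # n/4 = 1.75 and 3n/4 = 5.25 round up
print(quartile(eight, 1), quartile(eight, 3))
print(quartile(seven, 1), quartile(seven, 3))
```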
2.2 What is interpolation used for and how do you do it?
Used to find the median, quartiles, or percentiles of a grouped frequency table, assuming data values are distributed evenly within each class
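Median = LB + ((n-a)/(b-a) x class width) where LB is the lower bound of the class containing the median, n is the position of the median (half the total frequency), a is the cumulative frequency at the lower bound and b is the cumulative frequency at the upper bound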
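A sketch of interpolation for a grouped frequency table, assuming values are spread evenly within the class. The parameter names and the example class are made up for illustration.

```python
def interpolate(lower_bound, class_width, cum_freq_below, cum_freq_at_upper, position):
    # Fraction of the way through the class at which the required position falls.
    fraction = (position - cum_freq_below) / (cum_freq_at_upper - cum_freq_below)
    return lower_bound + fraction * class_width

# 40 values in total, so the median is at position 20.
# It lies in the class 20 to 30, where cumulative frequency rises from 12 to 30.
print(interpolate(lower_bound=20, class_width=10,
                  cum_freq_below=12, cum_freq_at_upper=30, position=20))
```

The same function gives quartiles or percentiles by changing the position, for example position 10 for the lower quartile of 40 values, using the class that contains that position.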
2.3 What is the range, IQR, and interpercentile range?
Range - difference between largest and smallest values
IQR - difference between upper and lower quartiles
Interpercentile range - difference between two given percentiles
2.4 Give the formula for variance
((Σx^2)/n)-((Σx)/n)^2
2.4 Give the formula for standard deviation
sqrt(((Σx^2)/n)-((Σx)/n)^2)
2.4 What is the formula for variance and standard deviation in a frequency table?
Variance:
((Σfx^2)/Σf)-((Σfx)/Σf)^2
Standard deviation:
sqrt(((Σfx^2)/Σf)-((Σfx)/Σf)^2)
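Both versions of the formula ("mean of the squares minus the square of the mean") can be sketched with one function. The function names and data are illustrative assumptions; raw data is treated as a frequency table where every frequency is 1.

```python
import math

def variance(values, frequencies=None):
    if frequencies is None:
        frequencies = [1] * len(values)
    n = sum(frequencies)
    mean_of_squares = sum(f * x**2 for x, f in zip(values, frequencies)) / n
    mean = sum(f * x for x, f in zip(values, frequencies)) / n
    return mean_of_squares - mean**2

def standard_deviation(values, frequencies=None):
    return math.sqrt(variance(values, frequencies))

raw = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(raw), standard_deviation(raw))
# The same data written as a frequency table gives the same answers.
print(variance([2, 4, 5, 7, 9], [1, 3, 2, 1, 1]))
```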
2.5 If data is coded, what happens to the mean and the standard deviation when uncoded?
Mean: affected by every operation in the coding (+, -, x, /), so reverse all of them
Standard deviation: affected only by x or /; adding or subtracting a constant does not change it
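As a check on the rule, this sketch assumes the coding y = (x - a)/b with a = 100 and b = 10 (values chosen for illustration) and recovers the original mean and standard deviation from the coded ones.

```python
import statistics

def uncode(coded_mean, coded_sd, a, b):
    # Coding was y = (x - a) / b, so x = b*y + a.
    # The mean reverses both operations; the standard deviation only the scaling.
    return b * coded_mean + a, abs(b) * coded_sd

x = [110, 120, 130, 140, 150]
y = [(value - 100) / 10 for value in x]

mean_x, sd_x = uncode(statistics.mean(y), statistics.pstdev(y), a=100, b=10)
print(mean_x, sd_x)
print(statistics.mean(x), statistics.pstdev(x))  # same values from the raw data
```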
3.2 What are the 5 aspects of a box plot?
Range, Interquartile Range, Lower Quartile (Q1), Median (Q2), Upper Quartile (Q3)
3.3 Describe how to plot a cumulative frequency graph
- Add another column to the frequency table labelled ‘cumulative frequency’
- Plot cumulative frequency on the y-axis and the measurement on the x-axis
- Plot the first point at the lower bound of the first class with a cumulative frequency of 0
- Plot each point at the upper bound for each range
- Join up points with a curve
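The points to plot can be worked out as in this sketch. The class boundaries and frequencies are made up, and the first class starts at 0 so the curve begins at the origin.

```python
from itertools import accumulate

def cumulative_frequency_points(class_bounds, frequencies):
    # Start at zero cumulative frequency at the lower bound of the first class,
    # then plot each running total at the upper bound of its class.
    points = [(class_bounds[0][0], 0)]
    running_totals = accumulate(frequencies)
    for (_, upper), total in zip(class_bounds, running_totals):
        points.append((upper, total))
    return points

bounds = [(0, 10), (10, 20), (20, 30)]
print(cumulative_frequency_points(bounds, [4, 9, 7]))
```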
3.4 What kind of data is a histogram used to present?
Continuous data
3.4 What is the formula for frequency density?
Frequency density = frequency/class width
3.4 How do you draw a frequency polygon?
Join up the middle of the top of each bar of a histogram with straight lines
3.4 What is the relationship between the area of each bar in a histogram and the frequency?
Area of each bar is proportional to the frequency
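A short sketch linking frequency density to bar area, using made-up classes of unequal width. When bar height is the frequency density itself, the area of each bar equals the frequency (a constant of proportionality of 1).

```python
def frequency_density(frequency, class_width):
    return frequency / class_width

classes = [((0, 4), 12), ((4, 6), 10), ((6, 12), 18)]  # ((lower, upper), frequency)
for (lower, upper), frequency in classes:
    width = upper - lower
    height = frequency_density(frequency, width)
    print(lower, upper, height, height * width)  # the last column matches the frequency
```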
3.5 What can you comment on when asked to compare data sets? Give the difference between a boxplot and a frequency table
Boxplots:
Compare the median and the IQR - describe what effect these values have on the spread/location of data
Frequency tables:
Compare the mean and the standard deviation - describe what effect these values have on the spread/location of data
4.1 What is bivariate data?
Data that has pairs of values for two variables
4.1 What do the x and y axes of a scatter diagram represent?
x-axis - independent variable, also known as explanatory variable
y-axis - dependent variable, also known as response variable
4.1 What are the five ways you can describe a correlation?
Strong positive, weak positive, no correlation, weak negative, strong negative
4.1 What is a causal relationship between two variables?
When a change in one variable causes a change in the other
4.2 What is the equation for a regression line on a scatter diagram? What are the conditions to use it?
y = a + bx
If you know the value of the independent variable (x), you can use the regression line to make a prediction of the dependent variable
Only make predictions for the dependent variable NOT the independent variable
4.2 What is the difference between interpolation and extrapolation?
Interpolation - using values from within the given range of data, is usually accurate
Extrapolation - using values from outside the given range of data, is usually inaccurate
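A sketch of predicting from a regression line y = a + bx while flagging whether the prediction is interpolation or extrapolation. The coefficients and the data range are made up for illustration.

```python
def predict(a, b, x, x_min, x_max):
    # Predict the dependent variable y from the independent variable x.
    y = a + b * x
    # Interpolation if x lies within the range of the original data.
    is_interpolation = x_min <= x <= x_max
    return y, is_interpolation

# Assumed line y = 2 + 0.5x fitted to data with x between 0 and 20.
print(predict(2.0, 0.5, 10, x_min=0, x_max=20))  # inside the data range
print(predict(2.0, 0.5, 30, x_min=0, x_max=20))  # outside the range: treat with caution
```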
5.1 What is a sample space?
The set of all possible outcomes
5.2 What is the notation for the intersection of a Venn diagram describing events A and B?
A ∩ B
5.2 What is the notation for the union of a Venn diagram describing events A and B?
A ∪ B
5.2 What is the notation used when describing a section of a Venn diagram that is ‘Not A’?
A′
5.3 What does it mean when two events are mutually exclusive?
They cannot happen at the same time
5.3 What does it mean when two events are independent?
When one event has no effect on the other
5.3 What is the multiplication rule for independent events?
P(A and B) = P(A) x P(B)
5.4 What is another way of calculating P(At least one head) when flipping a coin twice?
P(At least one head) = 1 - P(Both Tails)
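Listing the sample space for two coin flips shows that the direct count and the "1 minus the complement" method agree. This is a small illustrative check, not part of the course material.

```python
from fractions import Fraction
from itertools import product

# Sample space for two flips: HH, HT, TH, TT (all equally likely).
outcomes = list(product("HT", repeat=2))

p_direct = Fraction(sum("H" in outcome for outcome in outcomes), len(outcomes))
p_both_tails = Fraction(sum(outcome == ("T", "T") for outcome in outcomes), len(outcomes))
p_complement = 1 - p_both_tails

print(p_direct, p_complement)
```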
6.1 What is a random variable?
A variable whose value depends on the outcome of a random event
6.1 What is a sample space?
The range of values a random variable can take
6.1 What is a probability distribution?
It fully describes the probability of any outcome in the sample space
6.1 What is the probability mass function of rolling a fair six-sided dice?
P(X=x) = 1/6, x=1, 2, 3, 4, 5, 6
6.1 What is a discrete uniform distribution? Give an example
When all the probabilities in a distribution are the same, e.g. rolling a fair six-sided dice
6.1 For a random variable, X, how can you write that all the probabilities of all outcomes of an event add up to 1?
ΣP(X=x) = 1 for all x
6.2 What do you need in order to model X with a binomial distribution?
- A fixed number of trials, n
- Two possible outcomes (success and failure)
- Fixed probability of success, p
- Trials are independent of each other
6.2 How do you write a binomial distribution?
X~B(n, p)
6.2 If a random variable X has the binomial distribution X~B(n, p), what is its probability mass function?
P(X=r) = nCr x p^r x (1-p)^(n-r)
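The probability mass function translates directly into Python using math.comb for nCr. The function name and the example values n = 4, p = 0.5 are assumptions for illustration.

```python
from math import comb

def binomial_pmf(r, n, p):
    # P(X = r) = nCr x p^r x (1 - p)^(n - r)
    return comb(n, r) * p**r * (1 - p)**(n - r)

# X ~ B(4, 0.5): probability of exactly 2 successes
print(binomial_pmf(2, 4, 0.5))
# The probabilities over r = 0, ..., n add up to 1
print(sum(binomial_pmf(r, 4, 0.5) for r in range(5)))
```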
6.3 What is a cumulative probability function for a random variable X?
Gives the sum of all the individual probabilities up to and including the given X value, P(X≤x)
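For a binomial random variable, the cumulative probability can be sketched as a sum of individual probabilities. The function name and example values are illustrative; in an exam this value would normally come from tables or a calculator.

```python
from math import comb

def binomial_cdf(x, n, p):
    # P(X <= x): add up P(X = r) for r = 0, 1, ..., x
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x + 1))

# X ~ B(4, 0.5): P(X <= 1) = P(X = 0) + P(X = 1)
print(binomial_cdf(1, 4, 0.5))
```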
7.1 Define Test Statistic
The statistic that is calculated from the sample, e.g. no. of passes, no. of heads
7.1 Define Null Hypothesis
H(0): The hypothesis you assume to be correct
7.1 Define Alternative Hypothesis
H(1): tells you about the parameter if your assumption is shown to be wrong
7.1 What is a one-tailed test?
H(1): p<… or H(1): p>…
7.1 What is a two-tailed test?
H(1): p≠…
7.1 How do you carry out a hypothesis test?
- You assume the null hypothesis is true
- Then consider how likely the observed value of the test statistic was to occur.
- If the likelihood is less than a given threshold (significance level) then you reject the null hypothesis
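The steps above can be sketched for a one-tailed binomial test. The scenario is assumed for illustration: H(0): p = 0.5, H(1): p > 0.5, with 15 successes observed in 20 trials and a 5% significance level.

```python
from math import comb

def upper_tail_test(observed, n, p0, significance):
    # Assume H(0) is true, so X ~ B(n, p0).
    # Find how likely a value at least as large as the observed one is.
    p_value = sum(comb(n, r) * p0**r * (1 - p0)**(n - r)
                  for r in range(observed, n + 1))
    # Reject H(0) if that likelihood is below the significance level.
    return p_value, p_value < significance

p_value, reject = upper_tail_test(observed=15, n=20, p0=0.5, significance=0.05)
print(p_value, reject)
```

For H(1): p < ... the sum would instead run from 0 up to the observed value, and for a two-tailed test the significance level is usually halved in each tail.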