Stats Year 1 Flashcards
What are the 5 things that must be in a hypothesis test
- Null hypothesis
- Alternative hypothesis
- Test statistic
Define the significance level
The probability of incorrectly rejecting the null hypothesis
Define a population
The whole set of items that are of interest
Define census
Observes or measures every member of a population
Define sample
A selection of observations taken from a subset of the population which is used to find out information about the population as a whole
Define sampling units
Individual units of a population
What is a sampling frame
Sampling units of a population that are individually named or numbered to form a list
What is an advantage of a census
It should give a completely accurate result
What are the 3 disadvantages of a census
- Time consuming and expensive
- Cannot be used when the testing process destroys the item
- Hard to process a large quantity of data
What are the 3 advantages of using a sample
- Less time consuming and expensive than a census
- Fewer people have to respond
- Less data to process than in a census
What are the 2 disadvantages of using a sample
- The data may not be as accurate
- The sample may not be large enough to give information about small sub-groups of the population
What is a simple random sample
Where every sample has an equal chance of being selected
How do you carry out a simple random sample
- Allocate each person or thing in the sampling frame a unique number
- Use a random number generator to select the required number of units
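The two steps above can be sketched in Python. This is only an illustration: the function name, the example sampling frame and the seed are made up, and numbering units from 1 is an assumption.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size distinct units so every possible sample is equally likely."""
    rng = random.Random(seed)
    # Step 1: allocate each unit in the sampling frame a unique number (1, 2, 3, ...).
    numbered = dict(enumerate(sampling_frame, start=1))
    # Step 2: randomly generate distinct numbers and pick the matching units.
    chosen_numbers = rng.sample(sorted(numbered), sample_size)
    return [numbered[n] for n in chosen_numbers]

# Hypothetical sampling frame of six named people.
frame = ["Ann", "Ben", "Cat", "Dev", "Eve", "Fay"]
sample = simple_random_sample(frame, 3, seed=1)
```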
What is systematic sampling
Required elements are chosen at regular intervals from an ordered list, e.g. every 5th item is selected
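A rough Python sketch of systematic sampling. The function name is hypothetical, and two things are assumptions not stated on the card: the starting point is chosen at random within the first interval, and the list length is a multiple of the sample size.

```python
import random

def systematic_sample(ordered_list, sample_size, seed=None):
    """Take elements at regular intervals from an ordered list."""
    rng = random.Random(seed)
    # Interval between chosen elements (assumes the length divides evenly).
    interval = len(ordered_list) // sample_size
    # Assumption: start at a random position within the first interval.
    start = rng.randrange(interval)
    return [ordered_list[start + i * interval] for i in range(sample_size)]

# 20 numbered items, sample of 4, so every 5th item is selected.
items = list(range(1, 21))
picked = systematic_sample(items, 4, seed=0)
```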
What is stratified sampling
The population is divided into mutually exclusive strata (males and females for example) and a random sample is taken from each
What is the equation to calculate how many people/things should be sampled from each stratum in stratified sampling
(Number in stratum / number in population) x overall sample size
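A worked example of the stratum formula, written as a small Python sketch. The function name and the figures (120 males and 80 females, overall sample of 50) are invented for illustration.

```python
def stratum_sample_size(number_in_stratum, number_in_population, overall_sample_size):
    """(Number in stratum / number in population) x overall sample size."""
    return number_in_stratum / number_in_population * overall_sample_size

# Made-up population: 120 males and 80 females, overall sample of 50.
population = 120 + 80
males_in_sample = stratum_sample_size(120, population, 50)    # 30 males
females_in_sample = stratum_sample_size(80, population, 50)   # 20 females
```

If the formula does not give a whole number, the result has to be rounded so that the stratum sizes still add up to the overall sample size.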
What are the 3 advantages of simple random sampling
- Free of bias
- Easy and cheap to implement for small populations and small samples
- Each sampling unit has a known and equal chance of selection
What are the 2 disadvantages of simple random sampling
- Not suitable when the population size or the sample size is large
- A sampling frame is needed
What are the 2 advantages of systematic sampling
- Simple and quick to use
- Suitable for large samples and large populations
What are the 2 disadvantages of systematic sampling
- A sampling frame is needed
- It can introduce bias if the sampling frame is not random
What are the 2 advantages of stratified sampling
- Sample accurately reflects the population structure
- Guarantees proportional representation of groups within a population
What are the 2 disadvantages of stratified sampling
- Population must be clearly classified into distinct strata
- Selection within each stratum suffers from the same disadvantages as simple random sampling
What is quota sampling
An interviewer or researcher selects a sample that reflects the characteristics of the whole population
What is opportunity sampling
Consists of taking the sample from people who are available at the time the study is carried out and who fit the criteria you are looking for
What are the 4 advantages of quota sampling
- Allows a small sample to still be representative of the population
- No sampling frame required
- Quick, easy and inexpensive
- Allows for easy comparison between different groups within a population
What are the 4 disadvantages of quota sampling
- Non-random sampling can introduce bias
- Population must be divided into groups, which can be costly or inaccurate
- Increasing scope of study increases number of groups, which adds time and expense
- Non-responses are not recorded as such
What are the 2 advantages of opportunity sampling
- Easy to carry out
- Inexpensive
What are the 2 disadvantages of opportunity sampling
- Unlikely to provide a representative sample
- Highly dependent on individual researcher
What are variables that are associated with numerical observations called
Quantitative variables
What are variables associated with non-numerical observations called
Qualitative variables
What is a continuous variable
A variable that can take any value in a given range
What is a discrete variable
A variable that can take only specific values in a given range
What is the mode
The value that occurs most often
What is the median
The middle value when the data are put in order
How do you calculate the variance
Mean of the squares minus the square of the mean
How do you calculate standard deviation from the variance
Square root it
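The variance and standard deviation cards can be checked with a short Python sketch. The function names and the data set are made up for illustration.

```python
import math

def variance(data):
    """Mean of the squares minus the square of the mean."""
    n = len(data)
    mean = sum(data) / n
    mean_of_squares = sum(x * x for x in data) / n
    return mean_of_squares - mean ** 2

def standard_deviation(data):
    """Square root of the variance."""
    return math.sqrt(variance(data))

# Made-up data: mean = 5, mean of squares = 29, so variance = 29 - 25 = 4.
data = [2, 4, 4, 4, 5, 5, 7, 9]
var = variance(data)            # 4.0
sd = standard_deviation(data)   # 2.0
```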
What are the 3 common definitions of an outlier if not stated in the question
- Greater than upper quartile + 1.5(IQR)
- Less than lower quartile - 1.5(IQR)
- Outside the mean plus or minus 2 standard deviations
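The outlier rules can be sketched in Python as follows. The function names and the example quartiles are hypothetical, and a question may state a different rule, in which case use that one instead.

```python
def iqr_fences(lower_quartile, upper_quartile):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = upper_quartile - lower_quartile
    return lower_quartile - 1.5 * iqr, upper_quartile + 1.5 * iqr

def is_outlier_iqr(value, lower_quartile, upper_quartile):
    """True if the value lies beyond either fence."""
    low, high = iqr_fences(lower_quartile, upper_quartile)
    return value < low or value > high

def is_outlier_sd(value, mean, standard_deviation):
    """True if the value is more than 2 standard deviations from the mean."""
    return abs(value - mean) > 2 * standard_deviation

# Made-up quartiles: LQ = 10, UQ = 20, so IQR = 10 and the fences are -5 and 35.
fences = iqr_fences(10, 20)
```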
What is cleaning the data
Process of removing anomalies from a data set
How do you calculate the frequency density
Frequency/ class width
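A one-line check of the frequency density formula in Python. The function name and the class used (10 to 15 with frequency 20) are made up for illustration.

```python
def frequency_density(frequency, lower_bound, upper_bound):
    """Frequency divided by class width."""
    class_width = upper_bound - lower_bound
    return frequency / class_width

# Made-up class 10 to 15 with frequency 20: class width 5, so density 4.
density = frequency_density(20, 10, 15)
```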
When comparing data what 2 things must you compare
- A measure of location
- A measure of spread
What things are measures of location
- Mode
- Mean
- Median
- Quartiles
- Percentiles
What things are measures of spread
- Range
- Interquartile range
- Variance
- Standard deviation
What are the 5 possible correlation descriptions
- Strong negative correlation
- Weak negative correlation
- No correlation
- Weak positive correlation
- Strong positive correlation
Is the explanatory variable dependent or independent
Independent
Where does the explanatory variable go, x or y axis
x axis
Is the response variable dependent or independent
Dependent
Which axis should you plot the response variable on
Y-axis
What type of relationships do the variables have if a change in one causes a change in the other
Causal relationship
What is the equation of a regression line
y=a+bx
What is an experiment
A repeatable process that gives rise to a number of outcomes
What is an event
A collection of one or more outcomes
What is a sample space
The set of all possible outcomes
What is the term used to describe when events have no outcomes in common
Mutually exclusive
For mutually exclusive events, how do you calculate P(A or B)
P(A)+P(B)
For independent events, how do you calculate P(A and B)
P(A) x P(B)
How do you check whether events are independent
Check whether P(A and B) = P(A) x P(B)
How do you check whether events are mutually exclusive
Check whether P(A or B) = P(A) + P(B), which is the same as P(A and B) = 0
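Both checks can be sketched in Python. Mutually exclusive events have no outcomes in common, so P(A and B) = 0; independent events satisfy P(A and B) = P(A) x P(B). The function names and the small tolerance for rounding are assumptions made for this illustration.

```python
import math

def are_independent(p_a, p_b, p_a_and_b, tolerance=1e-9):
    """Independent if P(A and B) equals P(A) x P(B)."""
    return math.isclose(p_a_and_b, p_a * p_b, abs_tol=tolerance)

def are_mutually_exclusive(p_a_and_b, tolerance=1e-9):
    """Mutually exclusive if A and B cannot both happen, so P(A and B) = 0."""
    return math.isclose(p_a_and_b, 0.0, abs_tol=tolerance)

def p_a_or_b_mutually_exclusive(p_a, p_b):
    """For mutually exclusive events only: P(A or B) = P(A) + P(B)."""
    return p_a + p_b

# Made-up probabilities: P(A) = 0.5, P(B) = 0.4, P(A and B) = 0.2.
independent = are_independent(0.5, 0.4, 0.2)    # True, since 0.5 x 0.4 = 0.2
exclusive = are_mutually_exclusive(0.2)         # False, since P(A and B) is not 0
```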
When using the binomial cumulative distribution (CD) function on your calculator, what must you remember it is calculating
The probability of being less than or equal to the number entered (the x), i.e. P(X ≤ x)
What is the basic notation for a binomial distribution
X ~ B(n, p)
- n = number of trials
- p = probability of success on each trial
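To see what a cumulative binomial calculation does for X ~ B(n, p), here is a minimal Python sketch. The function names are hypothetical and are not the calculator's own; the example uses n = 4 and p = 0.5.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): adds up P(X = 0), P(X = 1), ..., P(X = x)."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

# X ~ B(4, 0.5): P(X <= 2) = (1 + 4 + 6) / 16 = 0.6875
p_at_most_2 = binomial_cdf(2, 4, 0.5)
# "At least 3" is not given directly, so use 1 - P(X <= 2) = 0.3125
p_at_least_3 = 1 - binomial_cdf(2, 4, 0.5)
```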
What is the null hypothesis
The hypothesis that you assume to be correct
What is the alternative hypothesis
Tells you about the parameter if your assumption is shown to be wrong
When dealing with a 2-tailed test, what happens to the significance level
It is split in half: half of the percentage is given to the lower tail and the other half is given to the upper tail
e.g. for a 5% significance level 2.5% is given to the upper tail and 2.5% is given to the lower tail
What is the rejection/ critical region
The region of the probability distribution which, if the test statistic falls within it, would cause you to reject the null hypothesis
What is the critical value
The first value to fall inside the rejection region
What is the actual significance level
The probability of incorrectly rejecting the null hypothesis that the critical region actually gives. It is rarely exactly equal to the significance level given in the question
- Say the question gives a 5% significance level and the critical region you find has a probability of 4.4%: the actual significance level is 4.4%
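A sketch of finding the critical value and actual significance level for a one-tailed (lower tail) binomial test. The function names and the example X ~ B(20, 0.5) at the 5% level are assumptions chosen for illustration, not taken from a particular question.

```python
from math import comb

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x + 1))

def lower_tail_critical_value(n, p, significance_level):
    """Largest c with P(X <= c) <= significance level (None if there is none)."""
    critical_value = None
    for c in range(n + 1):
        if binomial_cdf(c, n, p) > significance_level:
            break
        critical_value = c
    return critical_value

# Under the null hypothesis X ~ B(20, 0.5), tested at the 5% level.
critical_value = lower_tail_critical_value(20, 0.5, 0.05)   # 5
# Actual significance level = P(X <= 5), roughly 2.07%, not exactly 5%.
actual_significance = binomial_cdf(critical_value, 20, 0.5)
```

Here the critical region is X ≤ 5, because P(X ≤ 6) is about 5.77%, which is above 5%.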
How many rejection regions are there in a 2-tailed test
2, one at each end of the distribution
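A sketch of the two rejection regions in a 2-tailed binomial test, with half the significance level in each tail. The function names and the example X ~ B(20, 0.5) at the 5% level are made up for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def two_tailed_critical_values(n, p, significance_level):
    """Return (lower, upper): reject if X <= lower or X >= upper."""
    tail_level = significance_level / 2   # half the level goes to each tail
    lower, cumulative = None, 0.0
    for x in range(n + 1):                # build up P(X <= x) from the bottom
        cumulative += binomial_pmf(x, n, p)
        if cumulative > tail_level:
            break
        lower = x
    upper, cumulative = None, 0.0
    for x in range(n, -1, -1):            # build up P(X >= x) from the top
        cumulative += binomial_pmf(x, n, p)
        if cumulative > tail_level:
            break
        upper = x
    return lower, upper

# X ~ B(20, 0.5) at the 5% level: 2.5% in each tail.
lower, upper = two_tailed_critical_values(20, 0.5, 0.05)   # (5, 15)
```

In this example the rejection regions are X ≤ 5 and X ≥ 15, giving an actual significance level of about 4.14% in total.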