Sampling And Distributions Flashcards
Advantages of sampling
Cheap and quick
Disadvantages of sampling
Can be biased and inaccurate
Advantages of a census
Gives a completely accurate result
Disadvantages of a census
Time consuming and expensive
What is a sampling frame?
A list of all the sampling units in the population, from which the sample is chosen
What is a sampling unit?
Each individual thing in the population
What is the population mean?
It is a parameter
What is the sample mean?
It is a statistic
How does a random sample eliminate bias?
Everyone has an equal chance of being chosen
Every subset of size n of the population must be possible
Every possible sample of size n must be equally likely to occur
Simple random sampling
Each thing has an equal chance of being selected
Each element of the sampling frame is assigned a number
Advantages:
Free of bias
Easy to use
Everyone has an equal chance of being chosen
Disadvantages:
May not be representative, purely by chance
Not suitable for large populations
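A minimal Python sketch of simple random sampling, assuming a hypothetical sampling frame of 50 numbered elements and a sample size of 5 (both values are illustrative, not from the cards):

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Return n distinct elements of the sampling frame, chosen without replacement."""
    rng = random.Random(seed)  # the seed only makes this sketch repeatable
    # random.sample makes every subset of size n equally likely
    return rng.sample(frame, n)

# Hypothetical sampling frame: elements numbered 1 to 50
frame = list(range(1, 51))
sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```

Each element is selected by its number, so the sample can only be as good as the numbered sampling frame.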
Systematic sampling
Required elements are chosen at regular intervals from an ordered list, e.g. take every kth element, where k = population size ÷ sample size, starting at a random element from 1 to k
Advantages:
Simple
Suitable for large samples
Disadvantages:
Needs a sampling frame, and can introduce bias if the frame is not randomly ordered
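A rough Python sketch of systematic sampling, assuming the usual convention that the interval k is the population size divided by the sample size and that the starting point is chosen at random. The frame of 100 elements and the sample size of 10 are illustrative assumptions:

```python
import random

def systematic_sample(frame, n, seed=None):
    """Take every k-th element of an ordered list, from a random start."""
    rng = random.Random(seed)
    k = len(frame) // n        # sampling interval (assumed: population size // sample size)
    start = rng.randrange(k)   # random starting index between 0 and k - 1
    return frame[start::k][:n]

# Hypothetical ordered sampling frame: elements numbered 1 to 100
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=2)
print(sample)
```

Only the starting point is random, so any pattern in the ordering of the list carries straight through to the sample.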
Stratified sampling
Population is divided into strata and a simple random sample is carried out in each group
Number sampled from each stratum:
(number in stratum ÷ number in population) × overall sample size
Used when sample is large and population naturally divides into groups
Advantages:
Can give more accurate estimates
Reflects the population structure
Disadvantages:
Selection within each stratum has the same problems as any simple random sample
If strata are not clearly defined they may overlap
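A short Python sketch of proportional stratified sampling, assuming each stratum receives (stratum size ÷ population size) × overall sample size elements, rounded to the nearest whole number. The two year groups and their sizes are hypothetical:

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Take a simple random sample from each stratum, in proportion to its size."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # proportional allocation, rounded to a whole number of elements
        n_stratum = round(len(members) / population_size * total_n)
        sample[name] = rng.sample(members, n_stratum)
    return sample

# Hypothetical population that divides naturally into two strata
strata = {
    "Year 12": [f"Y12-{i}" for i in range(60)],
    "Year 13": [f"Y13-{i}" for i in range(40)],
}
sample = stratified_sample(strata, 10, seed=3)
print({name: len(chosen) for name, chosen in sample.items()})
```

With awkward stratum sizes the rounded counts may not add up exactly to the overall sample size, so in practice they may need adjusting by hand.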
Disproportionate stratified sampling
After the population has been divided into strata, the sample size for each stratum is sometimes chosen to be equal, regardless of the strata proportions
Quota sampling
Population is divided into strata. The sample size for each stratum can be found using known proportions, as in stratified sampling, or the proportions can be estimated
An opportunity sample is taken to ‘fill’ the required quota
Once a quota is filled, ignore anyone else who falls into it
Advantages:
A chance for proportional representation
Fairly easy and cheap
Disadvantages:
A random sample is not taken to fill each quota
As it is non-random, it can easily be biased
Could take a long time to fill every quota if many of the people encountered belong to quotas that are already filled
Opportunity sample
Ask people walking past to take part
Advantages:
Quick, cheap and easy
Disadvantages:
Can be very biased due to personal preference
People may not want to take part
Cluster sampling
Split the population into clusters
Number each cluster from 1 to n
Use a random number generator (RNG) to select a sample of cluster numbers and choose the corresponding clusters
Number all the subjects in a chosen cluster, starting from 1
Use the RNG to select a small sample of subject numbers from that cluster and choose the corresponding subjects
Repeat for all the chosen clusters
Advantages:
Not a simple random sample, but the random selection of clusters and subjects helps guard against bias
If an equivalent simple random sample of the population would require a lot of travel or work, this provides an easier, more time-efficient way to sample
Disadvantages:
Not a simple random sample, so be aware of bias
Samples taken may not be representative of the whole population
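The two-stage procedure for cluster sampling can be sketched in Python as follows. The six schools, their 30 subjects each, and the choice of 2 clusters with 5 subjects per cluster are hypothetical values chosen only for illustration:

```python
import random

def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Randomly choose clusters, then randomly choose subjects within each one."""
    rng = random.Random(seed)
    # Stage 1: select a sample of clusters
    chosen_clusters = rng.sample(sorted(clusters), n_clusters)
    # Stage 2: select a small sample of subjects from each chosen cluster
    return {
        name: rng.sample(clusters[name], n_per_cluster)
        for name in chosen_clusters
    }

# Hypothetical population split into six clusters of 30 subjects
clusters = {f"School {c}": [f"{c}{i}" for i in range(1, 31)] for c in "ABCDEF"}
sample = cluster_sample(clusters, 2, 5, seed=4)
print(sample)
```

Subjects in clusters that are not selected have no chance of appearing, which is why the result may not represent the whole population.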
Snowball sampling
Primary data sources are found and then asked to identify other subjects who are relevant to the sample, e.g. a drug user can be asked to identify another drug user
Advantages:
Useful when subjects possess rare or hard-to-identify characteristics and so cannot be easily found
Disadvantages:
Non-random, and only as accurate as the referrals from the initial subjects
Could be time consuming
Judgmental sampling
(Used when a quick sample is required)
The researcher's own judgement is used to select the sample. For example, a snap election is called and a TV political commentator needs a quick sample of opinions from the general public
Advantages:
Can be quick and convenient
Can be cheap to do
Disadvantages:
Researcher is using their own judgement to generate a sample therefore there is a high chance of bias
How to estimate the mode from a histogram?
Probability notation
A random variable has three things associated with it…
1) the outcomes
2) the probability function
3) parameters - values we cannot control, but do not change across different outcomes
Advantages of probability function
Can have a rule/expression based on the outcome
Particularly for continuous random variables, it would be impossible to list the probability for every outcome
Advantages of distribution
The probability for each outcome is more explicit
Cumulative distribution function (CDF)
The CDF is written F(x), where F(x) = P(X ≤ x)
Expected value E(X)
It represents the mean outcome we would expect if we were to do the experiment multiple times
It is a population parameter
The mean of the squares
Population variance Var(X)
Mean of the squares minus the square of the mean: Var(X) = E(X^2) - [E(X)]^2
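As a worked check of "mean of the squares minus the square of the mean", here is a small Python sketch using a fair six-sided die as an illustrative distribution (the die is an assumption, not from the cards). Fractions are used so the results are exact:

```python
from fractions import Fraction

# Probability distribution of a fair six-sided die (illustrative example)
distribution = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(dist):
    """E(X): sum of outcome times probability."""
    return sum(x * p for x, p in dist.items())

def mean_of_squares(dist):
    """E(X^2): sum of outcome squared times probability."""
    return sum(x**2 * p for x, p in dist.items())

def variance(dist):
    """Var(X): mean of the squares minus the square of the mean."""
    return mean_of_squares(dist) - expected_value(dist) ** 2

print(expected_value(distribution))   # 7/2
print(mean_of_squares(distribution))  # 91/6
print(variance(distribution))         # 35/12
```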
Uniform/rectangular distribution notation
Probability density function (p.d.f)
Area of rectangular distribution
Height of rectangular distribution
E(X) for rectangular distribution
Var(X) for rectangular distribution
E(X^2) for rectangular distribution
Quartiles in a rectangular distribution
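The standard results for a rectangular distribution can be collected in a short Python sketch. The symbols a and b (the lower and upper ends of the interval) and the example interval from 2 to 10 are assumptions introduced here for illustration:

```python
from fractions import Fraction

def uniform_summary(a, b):
    """Standard results for a continuous uniform distribution on the interval a to b."""
    a, b = Fraction(a), Fraction(b)
    width = b - a
    height = 1 / width                     # constant value of the p.d.f.
    return {
        "height": height,
        "area": width * height,            # total probability, always 1
        "mean": (a + b) / 2,               # E(X)
        "variance": width**2 / 12,         # Var(X)
        "mean_of_squares": (a * a + a * b + b * b) / 3,  # E(X^2)
        # the quartiles split the interval into four equal widths
        "quartiles": (a + width / 4, a + width / 2, a + 3 * width / 4),
    }

summary = uniform_summary(2, 10)
for name, value in summary.items():
    print(name, value)
```

The results agree with Var(X) = E(X^2) - [E(X)]^2: for this interval, 124/3 - 6^2 = 16/3.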
Mode in a continuous distribution
The mode is the value where the pdf is greatest (the peak or turning point)
The conditions for binomial distribution
Fixed number of trials
Probability of success and failure must be constant
All events must be independent
Only two possible outcomes on each trial (success or failure)
The Bernoulli distribution
The simplest distribution
Models an experiment with two outcomes, success or failure
A sequence of Bernoulli trials is a Bernoulli process
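The binomial distribution counts the number of successes in a fixed number of Bernoulli trials (the Bernoulli distribution is the binomial distribution with one trial)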
Binomial distribution
The binomial formula
The binomial distribution formula
The binomial coefficient
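A minimal Python sketch of the standard binomial probability P(X = r) = nCr × p^r × (1 - p)^(n - r), using the standard-library function math.comb for the binomial coefficient. The values n = 5, r = 2 and p = 0.5 are illustrative assumptions:

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) for X ~ B(n, p): r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Binomial coefficient: the number of ways to choose 2 successes from 5 trials
print(comb(5, 2))               # 10
# Probability of exactly 2 successes in 5 trials when p = 0.5
print(binomial_pmf(2, 5, 0.5))  # 0.3125
```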
Comment on the suitability of using a binomial distribution
Consider the 4 conditions for binomial
Is it suitable?
Greater than probabilities for binomial CD
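Calculators and tables usually give the cumulative probability P(X ≤ k), so "greater than" probabilities are found by subtracting from 1. A short Python sketch, with hypothetical function names and illustrative values n = 5 and p = 0.5:

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p), the value a binomial CD function returns."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

def prob_greater_than(k, n, p):
    """P(X > k) = 1 - P(X <= k)."""
    return 1 - binomial_cdf(k, n, p)

def prob_at_least(k, n, p):
    """P(X >= k) = 1 - P(X <= k - 1)."""
    return 1 - binomial_cdf(k - 1, n, p)

print(prob_greater_than(2, 5, 0.5))  # P(X > 2)
print(prob_at_least(2, 5, 0.5))      # P(X >= 2)
```

The common slip is in the "at least" case, where the cumulative probability must stop at k - 1 rather than k.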
Expected value for binomial
np
Number of trials × probability of success
Variance for binomial
np(1 - p)
Number of trials × probability of success × (1 - probability of success)
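The two shortcut results can be checked against the definitions of E(X) and Var(X) by summing over the whole distribution. This sketch assumes illustrative values n = 10 and p = 1/4, and uses fractions so the comparison is exact:

```python
from fractions import Fraction
from math import comb

# Illustrative parameters (assumptions, not from the cards)
n, p = 10, Fraction(1, 4)

# Full probability distribution of X ~ B(n, p)
pmf = {r: comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)}

# Mean and variance straight from the definitions
mean = sum(r * prob for r, prob in pmf.items())
var = sum(r**2 * prob for r, prob in pmf.items()) - mean**2

print(mean, n * p)           # 5/2 5/2
print(var, n * p * (1 - p))  # 15/8 15/8
```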
Structuring a hypothesis test
A hypothesis is a statement about a given population and its parameters
In a given situation in your enquiry, the ‘norm’ is known as the null hypothesis. This is a statement that what is currently believed about the population parameter is true
The null hypothesis is denoted by H0
Stages of a hypothesis test
1) state the null hypothesis
2) state the alternative hypothesis
3) make your assumptions
4) prove/disprove your hypothesis
5) conclude
State the null hypothesis
H0: p =
State the alternative hypothesis
An alternative hypothesis is denoted H1 and will be either
One tail (p > or p < the value in H0)
Or
Two tail (p ≠ the value in H0)
Make your assumptions
After you have stated your hypotheses, you must make a statement that you are assuming the null hypothesis is true
Proving / disproving your hypothesis
You are looking to either reject or not reject the null hypothesis
We are aiming to find the probability of the observed outcome, or a more extreme one, occurring if the null hypothesis is true
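The stages of a one-tail binomial hypothesis test can be sketched in Python. Every number here is an illustrative assumption rather than something from the cards: 20 trials, H0: p = 0.5, H1: p > 0.5, 15 observed successes and a 5% significance level:

```python
from math import comb

def binomial_upper_tail(x, n, p):
    """P(X >= x) for X ~ B(n, p)."""
    return sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(x, n + 1))

# Illustrative values (assumptions, not from the cards)
n = 20               # number of trials
p_null = 0.5         # H0: p = 0.5
observed = 15        # observed number of successes
significance = 0.05  # 5% significance level, with H1: p > 0.5 (one tail)

# Assume H0 is true and find the chance of a result at least this extreme
p_value = binomial_upper_tail(observed, n, p_null)
reject_h0 = p_value < significance

print(round(p_value, 4))
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Under these assumed values the probability is about 0.0207, which is below 0.05, so H0 would be rejected. The conclusion should still be worded in the context of the original question.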