Statistics Flashcards
Population
The whole set of items that are of interest
Sample
A subset of the population intended to represent the population
Sampling unit
Each individual item in the population that can be sampled
Sampling frame
Sampling units of a population are individually named or numbered to form a list e.g. DVLA list of drivers and car registrations, patients at a doctor's surgery
Census
Data collected from the entire population
Census advantages
Should give completely accurate results
Census disadvantages
- Time consuming & expensive
- Cannot be used when testing destroys the item
Sampling advantages
- Cheaper, quicker
- Less data to process
Sampling disadvantages
- Data may not be accurate
- Sample may not be large enough to represent small sub-groups
How do you carry out simple random sampling?
In a sampling frame, each item is assigned a number. Use a random number generator, or "lottery sampling", to select the required number of items
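A minimal Python sketch of the random number generator approach. The function name `simple_random_sample`, the `seed` argument and the 50-patient frame are made up for illustration.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Pick n distinct items so that every item is equally likely to be chosen."""
    rng = random.Random(seed)  # the seed only makes the example repeatable
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: 50 numbered patients
frame = [f"patient_{i}" for i in range(1, 51)]
sample = simple_random_sample(frame, 5, seed=1)
```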
Simple random sampling advantages
- Bias free
- Easy + cheap
- Each sampling unit has a known and equal chance of being selected
Simple random sampling disadvantages
- Unsuitable for large population sizes
- Sampling frame needed
- Can introduce bias if the sampling frame is not random
How do you carry out systematic sampling?
- Required elements are chosen at regular intervals in an ordered list
- Take every kth element where: k = population size (N) / sample size (n)
- Start at a random item between 1 and k
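The steps above can be sketched in Python as follows. The function name `systematic_sample` and the 100-item frame are hypothetical. The sketch assumes that when N / n is not a whole number, the interval is rounded down.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Take every k-th item, starting at a random position within the first k."""
    k = len(ordered_list) // n      # k = N / n (rounded down if not exact)
    rng = random.Random(seed)       # the seed only makes the example repeatable
    start = rng.randint(1, k)       # random start between 1 and k inclusive
    # Positions on the card are 1-based, so subtract 1 for Python indexing
    return [ordered_list[start - 1 + i * k] for i in range(n)]

frame = list(range(1, 101))         # hypothetical frame of 100 numbered items
sample = systematic_sample(frame, 10, seed=2)
```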
Systematic sampling advantages
- Simple and easy to use
- Suitable for large samples
Systematic sampling disadvantages
- Sampling frame needed
- Can introduce bias if sampling frame is not random
How do you carry out stratified sampling?
Divide the population into strata (groups) and carry out a simple random sample in each stratum
- Number sampled from each stratum = (number in stratum / population size (N)) x overall sample size (n)
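A short Python sketch of proportional allocation, assuming each stratum contributes (number in stratum / N) x n items, rounded to the nearest whole number. The function name `stratified_sample` and the school data are made up for illustration.

```python
import random

def stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to the list of items in that stratum."""
    rng = random.Random(seed)
    N = sum(len(items) for items in strata.values())  # population size
    sample = {}
    for name, items in strata.items():
        # Proportional share of the overall sample; rounding may shift the total slightly
        size = round(len(items) / N * n)
        sample[name] = rng.sample(items, size)        # simple random sample within the stratum
    return sample

# Hypothetical school: 60 pupils in year 12 and 40 in year 13
strata = {"year12": list(range(60)), "year13": list(range(60, 100))}
sample = stratified_sample(strata, 10, seed=3)
```

With these numbers, 6 pupils come from year 12 and 4 from year 13.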
Stratified sampling advantages
- Reflects population structure
- Guarantees proportional representation of groups within population
Stratified sampling disadvantages
- Population must be clearly classified into distinct strata
- Selection within each stratum suffers from same disadvantages as simple random sampling
How do you carry out quota sampling?
- Population is divided into groups according to characteristics
- A quota of items in each group is set to reflect the proportion in the whole population
- Interviewer selects the actual sampling unit
Quota sampling advantages
- Allows small sample to still be representative of population
- No sampling frame required
- Relatively easy & inexpensive
- Allows for easy comparison between different groups of a population
Quota sampling disadvantages
- Non-random sampling can introduce bias
- Population must be divided into groups which can be costly or inaccurate
- Increasing scope (further investigation) of the study increases number of groups adding time and expense
- Non-responses are not recorded
How do you carry out opportunity sampling?
Sample taken from people who are available at the time of the study and who meet the criteria
Opportunity sampling advantages
- Easy to carry out and inexpensive
Opportunity sampling disadvantages
- Unlikely to provide a representative sample
- Highly dependent on the individual researcher
Quantitative values
Variables or data associated with numerical values e.g. shoe size
Qualitative values
Variables or data associated with non-numerical values e.g. hair colour
Continuous variable
A variable that can take any value within a given range e.g. time: 1 second, 1.1 seconds, 1.01 seconds
Discrete variable
A variable that can only take specific values in a given range e.g. number of children; you cannot have 4.69 children
What values do daily total sunshine, daily mean wind speed and daily maximum gust take for the first two weeks of May 1987 in the UK?
n/a
Value for trace
0.025 mm when a number is needed (a trace is rainfall of less than 0.05 mm)
Leuchars (UK)
- Sheltered by the Ochil hills on the west and exposed to the North Sea on the east
- Most northerly of the UK weather stations listed here, so it has the lowest average temperatures
Leeming (UK)
- Situated between the Yorkshire Dales to the west and the North York Moors to the east
- Sheltered location leads to a dry, almost semi-arid climate
Heathrow airport (UK)
- Far from the city centre so temperatures aren't raised by the urban heat island effect
- Below-average rainfall for Britain
- Hotter summer temperatures due to its proximity to continental Europe and its southerly latitude
Hurn (UK)
- 6 km from the south coast of England
- Has rainfall well below the national average
Camborne (UK)
- Mildest and sunniest climate in UK
- In some places it is subtropical because of its southern location and the warm water of the Gulf Stream
- Presence of sea moderates extreme temperatures
- Extreme rainfall is not uncommon
Beijing (INT)
- Shielded by mountains to the north and west
- It has a humid continental climate
- East Asian monsoon causes humid summers
- Siberian anticyclone causes cold, windy and dry winters
Jacksonville (INT)
- Humid subtropical climate
- Winters are typically mild and sunny as it is low lying on the coast
- Summers are usually hot, very humid and prone to thunderstorms
- High humidity makes high heat common in the summer
Perth (INT)
- Hot summer Mediterranean climate
- Winters are generally cool and wet
- Summer months are hot, dry and sunny
- Summer rainfall usually caused by short thunderstorms or decaying tropical cyclones
Daily mean temperature (°C)
Average of the hourly temperature readings during a 24 hr period
Daily total rainfall
Includes solid precipitation such as snow and hail, which is melted before being included in any measurements
Daily total sunshine
Recorded to the nearest tenth of an hour
Daily mean wind direction and wind speed
- Measured in knots
- Averaged over 24 hours
- Mean wind directions are given as bearings and as cardinal (compass) directions
- Data for mean wind speed categorised according to the Beaufort scale
Daily maximum gust
- Measured in knots
- Highest instantaneous wind speed recorded
- Direction is also recorded
Daily maximum relative humidity
- Percentage of air saturation with water vapour
- Humidities above 95% give rise to misty and foggy conditions
Daily mean cloud cover
- Measured in oktas
- Eighths of the sky covered in clouds
Daily mean visibility
- Measured in decametres (1 decametre = 10 m)
- Greatest horizontal distance at which an object can be seen in daylight
Daily mean pressure
Measured in hectopascals (hPa)
Measures of location
Single values which describe a position in a data set
Variance
The average squared distance of each value from the mean
Standard deviation
Square root of the variance: (variance)^(1/2)
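A small Python sketch of the two definitions above, assuming the variance divides by the number of values, as "average squared distance" suggests. The data values are made up.

```python
import math

def variance(values):
    """Average squared distance of each value from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(variance(values))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values with mean 5
```

For this data the squared distances from the mean sum to 32, so the variance is 32 / 8 = 4 and the standard deviation is 2.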
Outlier
An extreme value that goes outside the overall pattern of the data
Cleaning the data
Process of removing anomalies from a data set
What do you compare between two sets of data?
Measure of location and measure of spread
Equation for frequency density
Frequency / class width
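A brief Python sketch of the formula, as used for the bar heights in a histogram. The class boundaries and frequencies are invented for illustration.

```python
def frequency_density(frequency, lower, upper):
    """Frequency divided by class width."""
    return frequency / (upper - lower)

# Hypothetical grouped data: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 6)]
densities = [frequency_density(f, lower, upper) for lower, upper, f in classes]
```

The middle class has the smallest width but the greatest frequency density (10 / 5 = 2), so it would be the tallest bar.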
What is an experiment?
A repeatable process that gives rise to a number of outcomes
What is an event?
A set of one or more of these outcomes (often represented by a capital letter)
What is a sample space?
Set of all the possible outcomes
Mutually exclusive events
Events that cannot happen at the same time, so they share no intersection in a Venn diagram
The event A and B
The intersection of both A and B, it represents the event that both A and B occur
The event A or B
Also known as the union of A and B, it represents the event that A or B, or both, occur
The event not A
Also known as the complement of A representing the event that A does not occur
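The event cards above can be illustrated with Python sets. The experiment (one roll of a fair six-sided die) and the events A and B are made up, and `probability` is a hypothetical helper that assumes equally likely outcomes.

```python
# One roll of a fair six-sided die (a made-up experiment)
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}                  # event "even number"
B = {4, 5, 6}                  # event "greater than 3"

A_and_B = A & B                # intersection: both A and B occur
A_or_B = A | B                 # union: A or B, or both, occur
not_A = sample_space - A       # complement: A does not occur

def probability(event):
    """Probability of an event when all outcomes are equally likely."""
    return len(event) / len(sample_space)
```

A and not A are mutually exclusive here: `A & not_A` is the empty set.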
You can model a random variable with binomial distribution if:
- There are a fixed number of trials
- There are two possible outcomes: "success" or "failure"
- There is a fixed probability of success
- The trials are independent of each other
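When all four conditions hold, probabilities can be computed from the binomial formula. The sketch below assumes X ~ B(n, p) with n trials and success probability p. The function names are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ B(n, p)."""
    return sum(binomial_pmf(r, n, p) for r in range(x + 1))

# Made-up example: 10 tosses of a fair coin, probability of exactly 5 heads
p_five_heads = binomial_pmf(5, 10, 0.5)
```

Here `p_five_heads` equals 252 / 1024, about 0.246.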
Test statistic
Evidence from the sample in a hypothesis test
Null hypothesis
The default position that nothing has changed, assumed true unless the evidence shows otherwise
- In binomial hypothesis testing it always states that the probability equals a specific value (H0: p = a)
Alternative hypothesis
There is some change in the population parameter
When do you use a one tailed test?
If the alternative hypothesis is H1: p > a (probability greater than expected) or H1: p < a (probability less than expected)
When do you use a two tailed test?
If the alternative hypothesis is H1: p ≠ a (probability is different from expected)
Critical region
The values for which you would reject the null hypothesis
Critical value
The first value to fall inside the critical region
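A hedged Python sketch of finding the critical value for a one-tailed binomial test with H1: p > a. It assumes the critical region is the set of values c for which P(X >= c) is at or below the significance level. The function names and the example numbers are made up.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def upper_critical_value(n, p, alpha):
    """Smallest c with P(X >= c) <= alpha under H0, or None if no such c exists."""
    tail = 0.0  # running value of P(X >= c)
    for c in range(n, -1, -1):
        tail += binomial_pmf(c, n, p)
        if tail > alpha:
            # c is the largest value NOT in the critical region
            return c + 1 if c < n else None
    return 0  # only reached when alpha >= 1, so every value is in the region

# Made-up test: n = 10 trials, H0: p = 0.5, H1: p > 0.5, 5% significance level
critical_value = upper_critical_value(10, 0.5, 0.05)
critical_region = list(range(critical_value, 11))
```

With these numbers the critical region is X >= 9, since P(X >= 9) is about 0.0107 while P(X >= 8) is about 0.0547, which exceeds 0.05.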