Statistics 1 Flashcards
Define population
Whole set of items of interest
Define census
Observation/measure of every member of the population
Name of a numerical measure calculated from the whole population (e.g. from a census)?
Parameter
Define sample
Selection of observations taken from a subset of the population, used to find out information about the population as a whole
Name of a numerical measure calculated from a sample?
Statistic
Advantage of a census?
-Completely accurate result obtained (i.e. everyone’s views recorded), giving a true measure of the population.
Disadvantages of a census?
-Time-consuming, labour-intensive and expensive
-Hard to contact whole population if applicable.
-Not used when testing involves the destruction of the item
-Hard to process large quantity of data
Advantages of a sample?
-Less time-consuming, labour-intensive and expensive
-If applicable, easier to contact the people selected than the whole population
-Fewer people required to respond
-Less data to be processed
Disadvantages of a sample
-Data could be inaccurate
-Sample may not be large enough to give information about small sub-groups of the population
Correlation between a sample size and the validity of conclusions of the processed data.
Larger size of sample usually increases the validity of the conclusions of the processed data.
Unless using non-random sampling, requirement of sample?
To be random.
What does the size of a sample depend on.
-Accuracy required
-Resources available
Why is larger sample typically more accurate.
Larger proportion of data examined, more likely to be representative of population.
If population is very varied (heterogeneous)?
Size of sample required would be larger than that of a uniform (homogeneous) population.
Different samples can…
Lead to different conclusions due to the natural variation of a population.
Define sampling units.
The individual units of a population available for sampling.
Define sampling frame.
Where sampling units are individually named/numbered to form a list.
Criteria (generally) for representative sampling?
-Usage of random sampling method
-Typically, large sample size.
What is a biased sample?
One that does not accurately reflect the population, and perhaps favours one part of the population over another.
How can you assess if a sample could be biased?
-Sample excludes certain people, e.g. by age, gender, interests (a survey about sweets held outside a sweet shop) or habits (a survey about sport held at a sports centre).
-Sometimes, a small sample is likely to be biased.
If a sample is biased, what then can occur?
A sample unrepresentative of a population can lead to a sampling error.
Conclusion of data, on whole/average =x. Use data to agree/disagree with statement.
Steps
-Mean of data?
-Median of data?
-Presence of anomalies?
-Thus, mean/median better
(mean affected, median not)
-Hence, validity of data…
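A rough sketch of the steps above in Python. The readings are hypothetical, and the 1.5 x IQR rule and the "median of each half" quartile convention are assumptions, not something the card specifies.

```python
from statistics import mean, median

def summarise(data):
    """Return the mean, the median and any values flagged as possible anomalies."""
    ordered = sorted(data)
    n = len(ordered)
    # Quartiles taken as the medians of the lower and upper halves (assumed convention)
    q1 = median(ordered[: n // 2])
    q3 = median(ordered[(n + 1) // 2 :])
    iqr = q3 - q1
    # Assumed rule: a value is anomalous if it lies more than 1.5 x IQR beyond a quartile
    anomalies = [x for x in ordered if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]
    return mean(ordered), median(ordered), anomalies

temps = [14, 15, 15, 16, 16, 17, 40]  # hypothetical readings with one extreme value
m, med, odd = summarise(temps)
print(m, med, odd)
```

Here the single extreme reading pulls the mean well above most of the data while the median stays put, so the median would be the better average to quote.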
Define random sampling.
Where every member of the population has an equal chance of being selected for sampling (each sampling unit chosen by chance for sampling).
Thus, the sample performed under the methods of random sampling should be…
More representative of the population.
Benefit of random sampling as a whole.
It helps to eradicate the bias from sampling.
What are the 3 types of random sampling.
-Simple random
-Systematic
-Stratified
Discuss the method undergone of simple random sampling.
-Requirement of sampling frame.
-Utilisation of random number function of calculator or “lottery sampling”.
-Lottery sampling is where the members of the sampling frame are placed in a hat/other appropriate item, and then the required number of “tickets” are drawn from this object.
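A minimal sketch of simple random sampling in Python, assuming the sampling frame is a plain list. The frame contents, the function name and the seed are hypothetical.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct sampling units from the frame, each equally likely."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(frame, n)  # sampling without replacement, like drawing tickets from a hat

frame = [f"Student {i}" for i in range(1, 51)]  # hypothetical sampling frame of 50 units
sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```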
What are the advantages of simple random sampling.
-No bias
-Easy and cheap for small populations and small samples
-Each sampling unit has known and equal chance of selection
Disadvantages of simple random sampling.
-Not suitable with large population/sample size
-Requirement of a sampling frame.
-Only random if sampling frame is random.
Discuss the method undergone in systematic sampling.
-Required elements chosen at regular intervals from sampling frame.
-Interval x = number of sampling units in the frame / required sample size.
-1st unit chosen at random from positions 1 to x; call its position n. Succeeding units are then chosen at regular intervals (n + x, n + 2x etc.) from the sampling frame.
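A short sketch of systematic sampling in Python, assuming the frame is a list and the required sample size is no larger than the frame. The names and the numbered frame are hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Pick a random start within the first interval, then take every k-th unit."""
    k = len(frame) // n          # interval = frame size / sample size (rounded down)
    rng = random.Random(seed)
    start = rng.randrange(k)     # random starting index from 0 to k - 1
    return [frame[start + i * k] for i in range(n)]

frame = list(range(1, 101))      # hypothetical frame numbered 1 to 100
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```

With 100 units and a sample of 10, the interval is 10, so the chosen numbers are always 10 apart.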
Describe the advantages of systematic sampling.
-Simple and quick
-Suitable for large populations and sample sizes.
Describe the disadvantages of systematic sampling.
-Requirement of sampling frame
-If the 1st person chosen is not randomised, bias can be introduced into the sampling.
-Only random if sampling frame random
Possible limitation of systematic sampling?
If the sampling frame happens to contain a repeating pattern that lines up with the interval, the selected units may not represent all sub-groups of the population.
Discuss the method undergone to perform stratified sampling.
-Population divided into mutually exclusive strata, and a random sample is taken from each stratum.
-The PROPORTION of each stratum in the sample should equal its proportion in the population.
What is the equation that decides how many to sample from a stratum so that its proportion of the sample matches its proportion of the population.
No. sampled in a stratum = (no. in the stratum / no. in the overall population) x overall sample size
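A small worked sketch of proportional allocation: the number sampled from a stratum is (number in the stratum / number in the population) x overall sample size. The year-group figures and the function name are hypothetical.

```python
def stratum_sample_sizes(strata_sizes, sample_size):
    """Number to sample from each stratum, in proportion to its share of the population."""
    total = sum(strata_sizes.values())
    # Rounded to whole units; with awkward numbers the rounded values
    # may not add up exactly to sample_size and need adjusting by hand.
    return {name: round(size * sample_size / total)
            for name, size in strata_sizes.items()}

year_groups = {"Year 12": 120, "Year 13": 80}  # hypothetical population of 200
sizes = stratum_sample_sizes(year_groups, 20)
print(sizes)
```

Year 12 makes up 120/200 of the population, so it supplies 12 of the 20 sampled; Year 13 supplies the other 8.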
What are the advantages of stratified sampling.
-Sample accurately reflects the population structure
-Guarantees proportional representation of groups within the population
What are the disadvantages of stratified sampling.
-Population required to be classified into distinct strata.
-Selection process within each stratum is not suitable for large population/sample sizes
-Requirement of sampling frame.
-Only random if sampling frame random
Chance of being selected in a stratified sample.
-Assumed that each member has equal chance of being selected due to system of random sampling.
Then:
chance = number selected from the member's stratum x 1/number in that stratum.
What are the 2 types of non-random sampling.
-Quota sampling
-Opportunity/Convenience sampling.
Describe quota sampling.
-Interviewer/researcher selects a sample to try to reflect the characteristics of a population.
-Population divided into groups according to the given characteristic, with the size of each group determining the proportion of the sample that will have that specific characteristic.
-Interviewer meets people, assesses group, and subsequently allocates them into the appropriate quota
-This continues until all the quotas are filled.
What occurs if a person refuses to be interviewed/person fits into quota already filled?
Simply ignored and researcher/interviewer moves onto next person.
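The quota-filling procedure can be sketched as a loop over the people met. Everything here is hypothetical: the names, the two groups, and the function name. It only illustrates the order-of-meeting logic.

```python
def quota_sample(people, quotas):
    """people: (name, group, willing) tuples in the order met; quotas: group -> number needed."""
    chosen = {group: [] for group in quotas}
    for name, group, willing in people:
        # Stop as soon as every quota is full
        if all(len(chosen[g]) == quotas[g] for g in quotas):
            break
        # Refusals and people whose quota is already full are simply skipped
        if not willing or group not in quotas or len(chosen[group]) >= quotas[group]:
            continue
        chosen[group].append(name)
    return chosen

passers_by = [
    ("A", "under 30", True), ("B", "under 30", False), ("C", "30 and over", True),
    ("D", "under 30", True), ("E", "under 30", True), ("F", "30 and over", True),
]
result = quota_sample(passers_by, {"under 30": 2, "30 and over": 1})
print(result)
```

B refuses and is skipped; E and F are never interviewed because both quotas are full by then.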
What are the advantages of quota sampling.
-Allows a small sample to still be representative of the population so field work can be done quickly.
-Not requiring a sampling frame.
-Quick, administration easy, inexpensive.
-Allows for easy comparison between the different groups of a population.
What are the disadvantages of quota sampling.
-Methods of non-random sampling, with judgement of interviewer, can introduce bias
-Population must be divided into groups, which can be costly or inaccurate.
-Increasing the scope of the study increases the number of groups, hence increasing the time and expense.
-Non-responses are not recorded as such.
-Not possible to estimate sampling errors
(due to lack of randomness)
-Difficulties of defining controls e.g. social class
What is the method of opportunity sampling.
-Taking the sample from people who are available at the time of sampling (e.g. the first n people seen) and who fit the criteria being researched.
Advantages of opportunity sampling.
-Easy to carry out
-Inexpensive.
Disadvantages of opportunity sampling.
-Unlikely that the sampling is representative of the population
-Is highly dependent on the individual researcher
How can non-random sampling data be made to be more representative.
Depends on the context of the question: aim to make the sample less biased and more representative. Increasing the sample size is usually a valid suggestion, as is removing a source of bias (e.g. preventing the exclusion of certain people).
What is qualitative data/variables.
Variables/data associated with non-numerical (i.e. categorical) observations, being descriptive.
Quantitative?
Variables/data associated with numerical observations, being numerical.
What is a continuous variable.
A variable that can take any value of a given range, data “measured”
What is a discrete variable.
-Variable that can only take specific values in a given range, data “counted”.
What is sometimes helpful to be done for large data?
To display it in frequency tables/as grouped data.
Discuss the features of a grouped frequency table.
-Specific data values not shown
-Groups called classes
-Class boundaries are the maximum and minimum values that belong to each class
-Midpoint= Average of class boundaries.
-Class width= Upper class boundary- lower class boundary (difference of both)
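A brief sketch of the two calculations above, using a hypothetical class with boundaries 10 and 20. The function names are illustrative only.

```python
def class_midpoint(lower, upper):
    """Midpoint = average of the two class boundaries."""
    return (lower + upper) / 2

def class_width(lower, upper):
    """Class width = upper class boundary - lower class boundary."""
    return upper - lower

lower, upper = 10, 20  # hypothetical class boundaries
print(class_midpoint(lower, upper), class_width(lower, upper))
```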
Why is data usually grouped?
For the purpose of comparison.
If asked to compare 2 groups of data, 2 things looked at?
-Averages: usually the mean; the median if there are anomalies; the mode if appropriate.
-General spread (and skewness) of the data, and hence whether the mean or median is more useful.
What are the 5 UK weather stations.
-Camborne
-Hurn
-Heathrow
-Leeming
-Leuchars
What are the coastal weather stations.
-Camborne
-Hurn
-Leuchars
What are the inland weather stations.
-Heathrow
-Leeming
What is the westernmost UK weather station.
Camborne.
What is the northernmost UK weather station?
Leuchars
What is the southernmost UK weather station?
Camborne
What is the easternmost UK weather station?
Heathrow
What UK weather station is the closest to the Isle of Wight.
Hurn
What is the 2nd most northernmost weather station.
Leeming.
What is the only non-English (Scottish) weather station.
Leuchars.
What are the 3 international weather stations.
-Jacksonville, USA
-Beijing, China
-Perth, Australia.
What are the coastal international weather stations.
-Jacksonville
-Perth
What is the inland international weather station.
-Beijing
What is the northernmost international station?
Beijing
What is the southernmost (overall + international) weather station.
Perth
What is the westernmost (overall +international) weather station.
Jacksonville
What is the easternmost (overall + international) weather station.
Beijing
What is the northernmost overall weather station.
Leuchars
What is the only weather station of the Southern Hemisphere.
Perth
What is the daily mean temperature.
Average of hourly readings of temperature in a 24-hour period.
Where is the temperature recorded.
1.25m from ground level, by thermometers in a louvred screen above short grass.
What is the units of daily mean temperature.
Degrees Celsius
When is the daily mean temperature recorded.
0900 to 0900 GMT.
What is the daily mean temp range for the UK weather stations?
Camborne: 10-20
Heathrow: 8-29
Hurn: 6-24
Leeming: 4-23
Leuchars: 4-19
What is the daily mean temp. range for the international weather stations?
Beijing: 8-33
Jacksonville: 15-31
Perth: 8-25
What is daily total rainfall measured in.
mm
What, as well as rain are measured? How are they made recordable?
-Solid precipitation like hail/snow
-They are melted before being measured.
What are amounts less than 0.05mm recorded as.
Tr (trace)
How should trace values be written numerically in calculations using them?
0
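A minimal sketch of treating trace values as 0 before doing a calculation. The readings are hypothetical and the helper name is illustrative.

```python
def rainfall_to_number(value):
    """Convert a rainfall entry to a number, treating 'tr' (trace) as 0."""
    text = str(value).strip().lower()
    if text == "tr":
        return 0.0
    return float(text)

readings = ["0.4", "tr", "2.6", "tr", "0"]  # hypothetical daily totals in mm
total = sum(rainfall_to_number(r) for r in readings)
print(total)
```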
When is daily total rainfall measured.
0900 to 0900 GMT.
What is daily total sunshine.
-Number of hours of sunshine, i.e. the time for which solar radiation exceeds a threshold.
What is it recorded in.
Nearest tenth of an hour.
What is daily mean windspeed recorded in.
Knots (nautical miles per hour)
1 knot=1.15 mph
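The conversion above as a one-line helper, using the approximate factor of 1.15 given on the card. The function and constant names are illustrative.

```python
MPH_PER_KNOT = 1.15  # approximate factor, as stated on the card

def knots_to_mph(knots):
    """Convert a speed in knots to miles per hour (approximate)."""
    return knots * MPH_PER_KNOT

print(knots_to_mph(10))
```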
When is daily mean wind direction recorded.
Average of data recorded from 0000 to 2400 GMT
How are wind directions (meaned) recorded.
-As bearings/cardinal (compass) direction, rounded to the nearest 10 degrees.
What scale is the daily mean windspeed also categorised by.
The Beaufort scale.
What instrument measures daily mean windspeed.
An anemometer.
Where is daily mean windspeed recorded from.
Averaged data recorded from 10m above ground level.
What is the scale and term for windspeed less than 1 knot.
Scale=0
Term=Calm
What is the scale and term for windspeed of 1-10 knots.
Scale=1-3
Term=Light
What is the scale and term for windspeed of 11-16 knots.
Scale=4
Term=Moderate
What is the scale and term for windspeeds of 17-21 knots.
Scale=5
Term=Fresh
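The four bands above can be sketched as a lookup. This assumes the thresholds are in knots (the unit windspeed is recorded in) and that speeds are whole numbers; bands beyond "Fresh" are not covered by these cards, so the function returns None for them.

```python
def beaufort_term(speed_knots):
    """Return the descriptive term for a mean windspeed given in whole knots."""
    if speed_knots < 1:
        return "Calm"      # scale 0
    if speed_knots <= 10:
        return "Light"     # scale 1-3
    if speed_knots <= 16:
        return "Moderate"  # scale 4
    if speed_knots <= 21:
        return "Fresh"     # scale 5
    return None            # stronger winds are outside the bands listed here

print(beaufort_term(7), beaufort_term(12))
```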
What are the ONLY measurements conducted by international weather stations.
-Daily mean temperature
-Daily total rainfall
-Daily mean pressure
-Daily mean windspeed
What is daily maximum gust.
-Highest instantaneous windspeed recorded.
-Direction from which the maximum gust blows is also recorded.
What is the units of daily maximum gust.
Knots
What is daily maximum relative humidity.
How close air is to becoming saturated by water vapour
-Given as a percentage of air saturation with water vapour.
What conditions give rise to relative humidities of more than 95%?
Foggy/misty conditions.
How is daily mean cloud cover measured.
Oktas= eighths of sky covered by cloud.
What is the daily mean visibility.
Greatest horizontal distance at which an object can be seen in daylight.
What would the nightly mean visibility be, hence.
Greatest horizontal distance at which an object could be seen if the general illumination were raised to daylight level.
What is daily mean visibility measured in.
Decametres
What is daily mean pressure recorded in.
Hectopascals (hPa).
At what level is daily mean pressure recorded.
Mean sea level.
When was data recorded from the specific weather stations to create the large data set.
-May-October 1987
-May-October 2015
How are missing values indicated.
N/a (not available)
Typical tendency of coastal locations.
Increased wind.
Typical tendency of northern locations.
Decreased temperature
Wettest city of 2015?
Jacksonville
Windiest month of 2015?
May
Wettest UK city of 2015?
Camborne
Warmest UK location on average of 2015?
Heathrow
Wettest month of 2015?
August
Importance of 1987.
-Great Storm occurred
-On the night of 15-16 October.
Importance of 2015.
-Heathrow Airport affected by heavy rains of 26 August
Temperature of international weather stations in ascending order of 2015.
-Perth (coldest)
-Beijing
-Jacksonville
2015 importance for UK?
30 June, temperatures above 30 degrees Celsius recorded.
Importance for knowledge on place of measurements.
-Consistency
-Comparison
Type of data collection that the Large Data Set is.
Secondary
Importance of Heathrow geographically in terms of windspeed.
Windspeeds tend to be high, possibly due to the effect of aircraft arrivals and departures at the airport.