Statistics As Flashcards
Population
The whole set of items that are of interest
Census
Observes/Measures every member of a population
Sample
A selection of observations taken from a subset of the population which is used to find out information about the population as a whole
Ad, Disad: Census
Ad: It should give a completely accurate result
Dis: Time consuming and expensive
Cannot be used when the testing process destroys the item
Hard to process a large quantity of data
Ad, Disad: Sample
Ad: Less time consuming and expensive than a census
Fewer people have to respond
Less data to process than a census
Dis: The data may not be as accurate
The sample may not be large enough to give information about small subgroups of the population
Sampling units
Individual units of a population
Sampling frame
Sampling units of a population that are named/numbered to form a list
What are the 3 methods of random sampling?
1) Simple random
2) Systematic
3) Stratified
What does random sampling do?
Gives every member of the population an equal chance of being selected, so the sample should be representative of the population. It also helps to remove bias from the sample.
How do you carry out simple random sampling?
Need a sampling frame. Each unit is allocated a unique number and a selection of these numbers is chosen at random
What are the ways of picking a random unit in simple random sampling?
1) Generating random numbers (using calculator, computer etc.)
2) Lottery sampling - The members of the sampling frame could be written on tickets and placed into a ‘hat’. The required number of tickets is then drawn out
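A minimal sketch of the random-number approach in Python, assuming the sampling frame is held as a list. The function name simple_random_sample, the seed parameter and the "Student" frame are illustrative assumptions, not part of the original notes.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size units from the frame, each equally likely."""
    rng = random.Random(seed)  # optional seed makes the draw repeatable
    # Allocate each unit a unique number (its position in the frame).
    numbers = range(len(sampling_frame))
    # Choose a selection of those numbers at random, without replacement.
    chosen = rng.sample(numbers, sample_size)
    return [sampling_frame[i] for i in chosen]

# Hypothetical sampling frame of 100 named units.
frame = [f"Student {i}" for i in range(1, 101)]
sample = simple_random_sample(frame, 20, seed=1)
print(sample)
```

Drawing without replacement mirrors the lottery method: once a ticket leaves the hat it cannot be drawn again.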
Systematic sampling
The required elements are chosen at regular intervals from an ordered list. E.g.
Sample size: 20
Population: 100
100/20 = 5. Every 5th person is picked.
The first person is chosen at random from the first 5 in the list
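The worked example above (a sample of 20 from a population of 100) could be sketched in Python as follows. The function name systematic_sample and the numbered population are assumptions made for illustration.

```python
import random

def systematic_sample(ordered_list, sample_size, seed=None):
    """Take every k-th unit from an ordered list, starting at a random point."""
    rng = random.Random(seed)
    interval = len(ordered_list) // sample_size  # e.g. 100 // 20 = 5
    start = rng.randrange(interval)  # random start within the first interval
    # Step through the list at regular intervals; trim in case the
    # population size is not an exact multiple of the sample size.
    return ordered_list[start::interval][:sample_size]

population = list(range(1, 101))  # hypothetical units numbered 1 to 100
chosen = systematic_sample(population, 20, seed=0)
print(chosen)
```

Only the starting point is random; every later pick is fixed by the interval, which is why an ordered pattern in the list can bias the sample.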
Stratified sampling
The population is divided into mutually exclusive strata and a random sample is taken from each
The proportion of each stratum in the sample should be the same as its proportion in the population.
No. sampled in stratum = (No. in stratum / No. in population) * Overall sample size
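The allocation formula can be checked with a short Python sketch. The year-group figures and the function name stratum_sample_sizes are made up for illustration.

```python
def stratum_sample_sizes(stratum_sizes, overall_sample_size):
    """Apply: (no. in stratum / no. in population) * overall sample size."""
    population = sum(stratum_sizes.values())
    # Rounding to whole units can make the total differ slightly from
    # the overall sample size when the proportions are not exact.
    return {
        name: round(size * overall_sample_size / population)
        for name, size in stratum_sizes.items()
    }

# Hypothetical population of 200 split into two strata.
year_groups = {"Year 12": 120, "Year 13": 80}
allocation = stratum_sample_sizes(year_groups, 50)
print(allocation)  # {'Year 12': 30, 'Year 13': 20}
```

Here Year 12 makes up 120/200 = 60% of the population, so it receives 60% of the 50 places.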
Ad, Disad: Simple random sampling
Ad: Free of bias
Easy and cheap to implement for small populations and small samples
Each sampling unit has a known and equal chance of selection
Dis: Not suitable when the pop. size or sample size is large
A sampling frame is needed
Ad, Disad: Systematic sampling
Ad: Simple and quick to use
Suitable for large samples and populations
Dis: A sampling frame is needed
It can introduce bias if the sampling frame is not randomly ordered
Ad, Disad: Stratified sampling
Ad: Sample accurately reflects the population structure
Guarantees proportional representation of groups within a population
Dis: Population must be clearly classified into distinct strata
Selection within each stratum suffers from the same disad. as simple random sampling
Two types of non-random sampling:
1) Quota sampling
2) Opportunity sampling
Quota sampling
A researcher selects a sample that reflects the characteristics of the whole population
The population is divided into groups according to a given characteristic. The size of each group determines the proportion of the sample that should have that characteristic
As an interviewer, you would meet people, assess their group and then allocate them to the appropriate quota
This continues until all quotas have been filled. If a person refuses to be interviewed or the quota is full then you ignore them
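The interviewing procedure described above might be sketched like this in Python. The data layout (name, group, whether they agree to be interviewed), the quota sizes and the function name quota_sample are all assumptions for illustration.

```python
def quota_sample(people_met, quotas):
    """Fill each group's quota from people in the order they are met."""
    remaining = dict(quotas)  # places still to fill in each group
    sample = []
    for name, group, agrees in people_met:
        if all(left == 0 for left in remaining.values()):
            break  # every quota has been filled
        if not agrees or remaining.get(group, 0) == 0:
            continue  # refusal or full quota: the person is ignored
        sample.append((name, group))
        remaining[group] -= 1
    return sample

# Hypothetical people, listed in the order the interviewer meets them.
people = [
    ("A", "under 30", True),
    ("B", "under 30", True),
    ("C", "under 30", True),      # quota already full, ignored
    ("D", "30 and over", False),  # refuses, ignored
    ("E", "30 and over", True),
    ("F", "30 and over", True),   # never reached, quotas are full
]
result = quota_sample(people, {"under 30": 2, "30 and over": 1})
print(result)
```

Note that the people who are ignored leave no trace in the result, which is the "non-responses are not recorded" weakness of the method.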
Opportunity sampling
Taking the sample from people who are available at the time of the study and who fit the criteria you are looking for
Ad, Disad: Quota sampling
Ad: Allows a small sample to still be representative of the population
No sampling frame required
Quick, easy and inexpensive
Allows for easy comparison between different groups within a population
Dis: Non-random samples can introduce bias
Population must be divided into groups, which can be costly or inaccurate
Increasing the scope of the study increases the number of groups, which adds time and expense
Non-responses are not recorded as such
Ad, Disad: Opportunity sampling
Ad: Easy to carry out
Inexpensive
Dis: Unlikely to provide a representative sample
Highly dependent on the individual researcher
Quantitative data
Numerical data
Qualitative data
Non-numerical
Continuous
Take any value in a given range
Discrete
Only specific values
Explain briefly what you understand by
(i) a statistical experiment [1]
A test/investigation adopted for collecting data to provide evidence for or against a hypothesis
Explain briefly what you understand by
(ii) an event. [1]
A subset of the possible outcomes of an experiment
State one advantage and one disadvantage of a statistical model. [2]
Ad: Quick and cheap; parameters can be varied to make predictions
Dis: Does not replicate real-world situation in every detail
Define Hypothesis Test
A statistical test that is used to determine whether there is enough evidence in a sample of data to infer that a certain condition is true for the entire population
What are the advantages and disadvantages of using the median over the mean?
Ad: The median is not affected by extreme values, so it is used when the data contains them
Dis: The median does not use every piece of data. The mean does, so it reflects the whole dataset, but it is affected by extreme values
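A small Python illustration of how one extreme value moves the mean far more than the median. The dataset is made up for the purpose.

```python
from statistics import mean, median

data = [2, 3, 3, 4, 5]        # hypothetical dataset
with_extreme = data + [100]   # same data plus one extreme value

print(mean(data), median(data))                  # 3.4 3
print(mean(with_extreme), median(with_extreme))  # 19.5 3.5
```

Adding the single value 100 raises the mean from 3.4 to 19.5, while the median only moves from 3 to 3.5.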
Describe how to find the lower quartile for discrete data. n = number of data points
Divide n by 4. If this is a whole number, the lower quartile is halfway between this data point and the one above. If it is not a whole number, round up and pick this data point
Describe how to find the upper quartile for discrete data
n = number of data points
Find 3/4 of n. If this is a whole number, the upper quartile is halfway between this data point and the one above. If it is not a whole number, round up and pick this data point
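The lower and upper quartile rules for discrete data can be written as one Python function. This is a sketch of the rule as stated on these cards (other textbooks and software use slightly different conventions); the function name quartile and the datasets are illustrative.

```python
def quartile(data, k):
    """Quartile of discrete data: k=1 for lower, k=3 for upper."""
    values = sorted(data)
    n = len(values)
    # Work out k*n/4 with integers to avoid floating-point issues.
    whole, remainder = divmod(k * n, 4)
    if remainder == 0:
        # Whole number: halfway between this data point and the one above.
        # Data points are counted from 1, list indices from 0.
        return (values[whole - 1] + values[whole]) / 2
    # Not a whole number: round up and pick that data point.
    return values[whole]

eight = [1, 3, 4, 6, 7, 9, 10, 12]      # n = 8, so n/4 = 2 and 3n/4 = 6
nine = [1, 3, 4, 6, 7, 9, 10, 12, 15]   # n = 9, so n/4 = 2.25 and 3n/4 = 6.75

print(quartile(eight, 1), quartile(eight, 3))  # 3.5 9.5
print(quartile(nine, 1), quartile(nine, 3))    # 4 10
```

With n = 8 both positions are whole numbers, so each quartile is a midpoint; with n = 9 the positions round up to the 3rd and 7th data points.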
Finding quartiles in data:
What do you assume when you use interpolation?
That the data values are evenly distributed within each class
How do you find the quartiles for grouped continuous data, or data presented in a cumulative frequency table?
Q1 = (n/4)th data point
Q2 = (n/2)th data point
Q3 = (3n/4)th data point
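For grouped data these positions are usually located by linear interpolation within the class that contains them, assuming values are evenly spread across each class. A minimal Python sketch follows; the class boundaries, frequencies and the function name interpolate are hypothetical.

```python
def interpolate(classes, position):
    """Estimate the value at a given position in grouped data.

    classes: (lower boundary, upper boundary, frequency) in ascending order.
    position: e.g. n/4 for Q1, n/2 for Q2, 3n/4 for Q3.
    """
    cumulative = 0  # cumulative frequency before the current class
    for lower, upper, freq in classes:
        if freq > 0 and cumulative + freq >= position:
            # Fraction of the way through this class, scaled by its width.
            return lower + (position - cumulative) * (upper - lower) / freq
        cumulative += freq
    raise ValueError("position is beyond the total frequency")

# Hypothetical grouped data with n = 20.
grouped = [(0, 10, 4), (10, 20, 10), (20, 30, 6)]
n = sum(freq for _, _, freq in grouped)

q1 = interpolate(grouped, n / 4)      # 5th data point  -> 11.0
q2 = interpolate(grouped, n / 2)      # 10th data point -> 16.0
q3 = interpolate(grouped, 3 * n / 4)  # 15th data point
print(q1, q2, q3)
```

For Q1, four values lie below 10, so the 5th value is 1/10 of the way through the 10 to 20 class, giving 11.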
What are alternative phrases for measures of spread?
Measures of dispersion
Measures of variation
What is range?
The difference between the largest and smallest values in the dataset