Sampling and Representing Data Flashcards
Define a population
The whole set of items that are of interest to the study
What is a census?
Study that observes or measures every member of a population
What is a sample?
Selection of observations taken from a subset of the population to find out information about the population as a whole
Advantages of a census?
Gives a completely accurate result/representation
Disadvantages of a census?
Time consuming and expensive
Expensive to process and store so much data
Advantages of a sample?
Less time consuming; far cheaper
Less data to process/store
Disadvantages of a sample?
May not be large enough to give information about small subgroups
Not as accurate as a census
How to execute a simple random sample?
Give each item in the sampling frame an identifying number.
Use a random number generator (or draw lots) to choose numbers, then go back to the sampling frame and select the items whose identifying numbers correspond to the numbers drawn.
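A minimal Python sketch of the steps above, assuming the sampling frame is held as a Python list. The function name `simple_random_sample` and its `seed` parameter are hypothetical, for illustration only. `random.Random.sample` draws distinct numbers, so this is sampling without replacement.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Choose n items from the sampling frame, each equally likely.

    Items are numbered 1..N by their position in the frame; n distinct
    numbers are generated at random and the matching items are selected.
    """
    rng = random.Random(seed)
    numbers = rng.sample(range(1, len(sampling_frame) + 1), n)  # distinct random numbers
    return [sampling_frame[k - 1] for k in numbers]  # item k is at index k - 1
```

For example, `simple_random_sample([f"student_{i}" for i in range(1, 101)], 10)` picks 10 of 100 numbered students.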
Advantages of simple random sampling?
- Bias free
- Cheap and easy to do for small populations and small samples
- Each item has a calculable and equal chance of selection
Disadvantages of simple random sampling?
- Unsuitable for large population sizes
- Sampling frame (list of entities) required
How to carry out a systematic sample?
Required elements are selected at regular intervals in an ordered list.
Take every kth element, where k = population size ÷ sample size,
starting with a randomly chosen item between 1 and k.
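A sketch of systematic sampling in Python, assuming the ordered list is a Python list and that the population size is a multiple of the sample size. If it is not, this sketch rounds k down, which is an assumption rather than a rule from the card. `systematic_sample` is a hypothetical name.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Take every kth item, k = N // n, starting at a random item between 1 and k."""
    N = len(ordered_list)
    k = N // n  # sampling interval (rounded down if N is not a multiple of n)
    start = random.Random(seed).randint(1, k)  # random start, 1..k inclusive
    # 1-based positions start, start + k, start + 2k, ... (n of them)
    return [ordered_list[start - 1 + i * k] for i in range(n)]
```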
Advantages of systematic sampling?
- Quick and simple
- Suitable for large populations/samples
Disadvantages of systematic sampling?
- Sampling frame required
- Bias risk if sampling frame is not random
How to execute a stratified sample?
Divide population into strata (groups) and carry out a simple random sample in each group.
Use the same sampling fraction (sample size ÷ population size) in each stratum, so:
number sampled from a stratum = (stratum size ÷ population size) × sample size
Use with large samples where pop. naturally divides into subgroups
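A minimal sketch of stratified sampling, assuming each stratum is given as a list inside a dictionary (the name `stratified_sample` and this data layout are hypothetical). Each stratum's share is rounded to the nearest whole number, so in some cases the stratum sizes may not add up exactly to the requested sample size.

```python
import random

def stratified_sample(strata, n, seed=None):
    """strata: dict mapping stratum name -> list of its members.

    Applies the same sampling fraction n / N to every stratum, then takes a
    simple random sample of that size within each stratum.
    """
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    result = {}
    for name, members in strata.items():
        # round() sends exact halves to the nearest even number
        size = round(len(members) * n / N)
        result[name] = rng.sample(members, size)
    return result
```

For example, with 120 Year 12 and 80 Year 13 students and a sample of 20, this takes 12 from Year 12 and 8 from Year 13.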
Advantages of stratified sampling?
- Reflects pop. structure
- Guarantees proportional representation of the given groups within the population
Disadvantages of stratified sampling?
- Pop. must be defined by clearly distinct strata
- Selection from each stratum suffers the same pitfalls as simple random sampling
How to execute quota sampling?
Divide a pop. into groups according to a characteristic.
A quota of items in each group is set to try to reflect that group's proportion in the whole pop.
The interviewer selects the actual sampling units.
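The sketch below, with hypothetical names `quota_sample` and `group_of`, illustrates why quota sampling is non-random: people are taken in the order the interviewer happens to meet them until each quota is full. It assumes the quotas have already been set from the groups' proportions.

```python
def quota_sample(arrivals, group_of, quotas):
    """Fill quotas from people in the order the interviewer approaches them.

    arrivals: iterable of people, in the order they are met
    group_of: function returning the group (characteristic) of a person
    quotas:   dict mapping group -> number of people required
    """
    remaining = dict(quotas)
    chosen = []
    for person in arrivals:
        g = group_of(person)
        if remaining.get(g, 0) > 0:  # this group's quota is not yet full
            chosen.append(person)
            remaining[g] -= 1
        if not any(remaining.values()):  # every quota filled: stop
            break
    return chosen
```

People skipped because their group's quota was already full are simply not recorded, which matches the bias risks listed below.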
Advantages of quota sampling?
- Lets small sample still be representative of pop.
- No sampling frame needed
- Quick, easy, inexpensive
- Allows easy comparison between groups in a pop.
Disadvantages of quota sampling?
- Risk of bias
- Pop. must be divided into groups, which can be costly or inaccurate
- Increasing the scope of the study increases cost and time
- Non-responses are not recorded.
How to run an opportunity/convenience sample?
Take a sample of people who are available at the time the study is carried out and who fit the criteria
Advantages of an opportunity/convenience sample?
- Easy
- Cheap
Disadvantages of an opportunity/convenience sample?
- Unlikely to provide a representative sample
- Highly dependent on the individual researcher
What is an outlier?
An extreme value, commonly defined as any value more than 1.5 × IQR above the upper quartile or below the lower quartile
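A short Python check for outliers using the 1.5 × IQR rule. It assumes quartiles from `statistics.quantiles` (Python 3.8+), whose default `'exclusive'` method may give slightly different quartiles from the hand method in a particular textbook or calculator, so borderline values could be classified differently. `find_outliers` is a hypothetical name.

```python
import statistics

def find_outliers(data):
    """Return the values more than 1.5 * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]
```

For `[1, 2, 3, 4, 5, 6, 7, 8, 100]` the quartiles are 2.5 and 7.5, so the fences are -5 and 15 and only 100 is flagged.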
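What are the features of a box plot?
- A box from the lower quartile to the upper quartile, with a line across it at the median
- Whiskers from the box to the lowest and highest values that are not outliers
- Outliers plotted individually, usually as crosses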
How do you draw a cumulative frequency diagram?
Work down the table, adding each row's frequency to the running total of all the rows above it, until the whole table is done.
Plot the upper class boundary of each row against its cumulative frequency and join the points with a smooth curve.
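A sketch of building the cumulative frequency column and the points to plot, assuming the table is a list of `(lower_bound, upper_bound, frequency)` tuples (a hypothetical layout). Starting the curve at the first lower bound with cumulative frequency 0 is a common convention that the card does not state.

```python
from itertools import accumulate

def cumulative_frequency_points(classes):
    """classes: list of (lower_bound, upper_bound, frequency) tuples, in order.

    Returns the (upper_bound, cumulative_frequency) points to plot.
    """
    running_totals = list(accumulate(f for _, _, f in classes))
    # Assumed convention: the curve starts at (first lower bound, 0)
    points = [(classes[0][0], 0)]
    # Each row's upper bound is plotted against its cumulative frequency
    points += [(upper, cf) for (_, upper, _), cf in zip(classes, running_totals)]
    return points
```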
How do you use a cumulative frequency diagram?
You can divide the total cumulative frequency into quarters to find the quartiles and the IQR. Percentiles are found the same way: work out the desired percentage of the total frequency, draw across from this value on the y-axis to the curve, then draw straight down and read off the corresponding value on the x-axis (the independent variable scale).
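Reading values off the diagram can be approximated in code by interpolating between the plotted points. This sketch joins the points with straight lines (a cumulative frequency polygon) instead of a smooth curve, so its readings are estimates that may differ slightly from a hand-drawn curve. `read_cumulative_frequency` is a hypothetical helper.

```python
def read_cumulative_frequency(points, target_cf):
    """Estimate the x-value whose cumulative frequency is target_cf.

    points: (x, cumulative_frequency) pairs in increasing order, as plotted.
    Straight lines between points stand in for the hand-drawn curve.
    """
    for (x0, c0), (x1, c1) in zip(points, points[1:]):
        if c0 <= target_cf <= c1 and c1 > c0:
            # go across to the line segment, then straight down to the x-axis
            return x0 + (target_cf - c0) / (c1 - c0) * (x1 - x0)
    raise ValueError("target_cf is outside the range of the diagram")
```

With total frequency `n`, reading at `n / 4`, `n / 2` and `3 * n / 4` estimates Q1, the median and Q3, and reading at `0.9 * n` estimates the 90th percentile.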
What can be represented on a histogram?
Continuous data given in a grouped frequency table
How do you calculate the height of each bar on a histogram?
The height of each bar is known as the frequency density. Bars are drawn so that:
area of bar = k × frequency
k = 1 is the easiest value to use, because if k = 1, frequency density = frequency ÷ class width
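The formula can be applied to a whole table at once. This sketch assumes classes are `(lower_bound, upper_bound, frequency)` tuples; `frequency_densities` is a hypothetical name, and `k` defaults to 1 so the result is the frequency density.

```python
def frequency_densities(classes, k=1):
    """Bar heights for a histogram.

    Since area of bar = class width * height = k * frequency,
    height = k * frequency / class width (the frequency density when k = 1).
    """
    return [k * f / (upper - lower) for lower, upper, f in classes]
```

For classes 0 to 10 (f = 20), 10 to 15 (f = 15) and 15 to 35 (f = 10), the frequency densities are 2, 3 and 0.5.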
How do you form a frequency polygon?
Join the midpoints of the tops of the bars in a histogram with straight lines.
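A sketch of finding the points for a frequency polygon, assuming the same hypothetical `(lower_bound, upper_bound, frequency)` layout: each point sits at the class midpoint, at the height of its bar (the frequency density, taking k = 1). `frequency_polygon_points` is a hypothetical name.

```python
def frequency_polygon_points(classes):
    """Return (class midpoint, frequency density) for each bar, i.e. the
    middle of the top of each histogram bar, ready to join with straight lines."""
    return [((lower + upper) / 2, f / (upper - lower)) for lower, upper, f in classes]
```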
How do you figure out how many elements satisfy a value range that crosses multiple boxes in a histogram?
Find how many bars the interval spans and deal with each bar separately. For bars fully inside the range, the number of items is just the frequency for that class (class width × frequency density).
For a bar only partly inside the range, work out what percentage of its class width is needed, then take the same percentage of its area (frequency). Repeat for every bar in the range and add the results together.
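The method above amounts to assuming the data are spread evenly within each class. A minimal Python sketch under that assumption, with the hypothetical name `estimate_count` and classes given as `(lower_bound, upper_bound, frequency)` tuples:

```python
def estimate_count(classes, a, b):
    """Estimate how many items lie between a and b from grouped data.

    Assumes items are spread evenly within each class, so the fraction of a
    bar's width inside [a, b] gives the same fraction of its frequency (area).
    """
    total = 0.0
    for lower, upper, f in classes:
        overlap = min(b, upper) - max(a, lower)  # width of this bar inside [a, b]
        if overlap > 0:
            total += f * overlap / (upper - lower)
    return total
```

For classes 0 to 10 (f = 20), 10 to 15 (f = 15) and 15 to 35 (f = 10), the range 5 to 25 gives 10 + 15 + 5 = 30.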
How do you find the height and width of boxes in histograms when diagram has been scaled up but data is same?
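Pick one bar whose class width and frequency you know, and find its width and area on the scaled diagram. Work out separate scale factors for width (drawn width ÷ class width) and area (drawn area ÷ frequency), then scale each bar accordingly: drawn width = class width × width S.F., and drawn height = (frequency × area S.F.) ÷ drawn width.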
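A worked sketch of the scale-factor method, assuming one reference bar whose class width, frequency and drawn dimensions are all known. The function `scaled_bar` and its parameter names are hypothetical.

```python
def scaled_bar(ref_class_width, ref_frequency, ref_drawn_width, ref_drawn_height,
               class_width, frequency):
    """Drawn width and height of a bar, found from one known reference bar."""
    width_sf = ref_drawn_width / ref_class_width                    # drawn width per unit of class width
    area_sf = (ref_drawn_width * ref_drawn_height) / ref_frequency  # drawn area per item
    drawn_width = class_width * width_sf
    drawn_height = frequency * area_sf / drawn_width                # height = area / width
    return drawn_width, drawn_height
```

For example, if a class of width 10 with frequency 30 is drawn 2 cm wide and 3 cm tall, then a class of width 5 with frequency 20 is drawn 1 cm wide and 4 cm tall.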
How can you compare datasets?
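Compare a measure of location and a measure of spread: either the mean and standard deviation, or the median and IQR. If a dataset contains extreme values, the median and IQR are more suitable, because unlike the mean and standard deviation they are not affected by those values.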
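A sketch comparing two datasets both ways, assuming plain Python lists. The hypothetical helper `summarise` uses `statistics.pstdev` (divisor n); swap in `statistics.stdev` if a sample standard deviation with divisor n - 1 is wanted. Quartiles come from `statistics.quantiles` with its default method, so they may differ slightly from hand calculations.

```python
import statistics

def summarise(data):
    """Location and spread measured both ways: mean/SD and median/IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return {
        "mean": statistics.mean(data),
        "sd": statistics.pstdev(data),  # divisor n; statistics.stdev uses n - 1
        "median": statistics.median(data),
        "iqr": q3 - q1,
    }
```

With `a = [10, 12, 13, 14, 15, 16, 90]` and `b = [11, 12, 13, 14, 15, 16, 17]`, both have median 14 and IQR 4, but the single extreme value 90 pulls the mean of `a` up to about 24.3 against 14 for `b`.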