Collecting And Interpreting Data Flashcards
What is a population?
A population is the whole set of items that are of interest
What is a sample?
A sample is some subset of the population intended to represent the population
What is a sampling unit?
Each individual in the population that can be sampled is a sampling unit
What is a sampling frame?
Sampling units of a population can be individually named or numbered to form a list called the sampling frame
What is a census?
Data collected from every sampling unit is known as a census
What are the advantages and disadvantages of a census?
Advantage: It should give completely accurate results
Disadvantages: Time-consuming, expensive, impossible if testing destroys the item, lots of data to process
What are the advantages and disadvantages of a sample?
Advantages: Quick, Cheap, Less data to process
Disadvantages: May be less accurate, may miss out small subgroups
How do you carry out simple random sampling?
Every possible sample of a given size has an equal chance of being picked.
In the sampling frame, each item is assigned an identifying number, and a random number generator or “lottery sampling” is used to pick the specific sampling units
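A minimal sketch of the random number generator approach, assuming the sampling frame is a Python list of numbered units. The function name `simple_random_sample`, the `seed` parameter and the frame of 100 units are illustrative assumptions, not part of the flashcards.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Pick sample_size distinct units; every possible sample is equally likely."""
    rng = random.Random(seed)  # seed only makes this sketch repeatable
    return rng.sample(sampling_frame, sample_size)

# Hypothetical sampling frame: units numbered 1 to 100
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=1)
print(sample)
```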
What are the advantages and disadvantages of simple random sampling?
Advantages: Unbiased, Easy, Cheap, Equal Chance of Selection
Disadvantages: Impractical with a large population, Sampling frame needed
How to carry out systematic sampling?
The required elements are chosen at regular intervals from an ordered list: every $k$th element, where
$k = \frac{\text{population size}}{\text{sample size}}$
starting at a random item between 1 and $k$
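The procedure can be sketched as follows, assuming the ordered list is a Python list. The function name `systematic_sample` is hypothetical, and using whole-number division for the interval when the sizes do not divide evenly is an assumption of this sketch.

```python
import random

def systematic_sample(sampling_frame, sample_size, seed=None):
    """Take every k-th unit, starting at a random position between 1 and k."""
    rng = random.Random(seed)  # seed only makes this sketch repeatable
    k = len(sampling_frame) // sample_size  # interval (assumption: round down)
    start = rng.randint(1, k)  # 1-based start position, inclusive of k
    positions = range(start, len(sampling_frame) + 1, k)
    return [sampling_frame[p - 1] for p in positions][:sample_size]

# Hypothetical ordered frame: units numbered 1 to 100, so k = 10
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```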
What are the advantages and disadvantages of systematic sampling?
Advantages: Simple, Quick, Can do large samples
Disadvantages: Sampling frame needed, Can introduce bias if the frame is not randomly ordered
How to carry out stratified sampling?
The population is divided into groups (strata) and simple random sampling is carried out within each one. The same proportion is sampled from each stratum:
$\frac{\text{sample size}}{\text{population size}}$
Used when the sample is large and the population naturally falls into groups
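A sketch of proportional allocation, assuming the strata are held in a dictionary mapping a stratum name to its list of units. The function name `stratified_sample` and the year-group data are made up for illustration. Rounding each stratum's share can leave the total slightly different from the requested sample size.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Sample the same proportion (sample size / population size) from each stratum."""
    rng = random.Random(seed)  # seed only makes this sketch repeatable
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Number to take from this stratum, rounded to a whole unit
        n_stratum = round(len(units) * sample_size / population_size)
        sample[name] = rng.sample(units, n_stratum)
    return sample

# Hypothetical population of 100: 60 in one stratum, 40 in the other
strata = {"Year 12": list(range(60)), "Year 13": list(range(60, 100))}
result = stratified_sample(strata, 20, seed=1)
print({name: len(units) for name, units in result.items()})
```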
What are the advantages and disadvantages of stratified sampling?
Advantages: Reflects population structure, Proportional representation of strata in population
Disadvantages: Population must be clearly classified into distinct strata, Same disadvantages as simple random sampling
How to carry out quota sampling?
Divide the population into groups according to the characteristics of interest, with a quota for each group that reflects its proportion of the population. The interviewer selects the sampling units to fill each quota
What are the advantages and disadvantages of quota sampling?
Advantages: Small sample represents population, No sampling frame, Quick, Easy, Cheap, Easy comparison
Disadvantages: Possible bias, Costly or inaccurate to make groups, More groups is more expense, Non-response not recorded
How to carry out opportunity sampling?
The sample is taken from people who are available at the time and who meet the criteria
What are the advantages and disadvantages of opportunity sampling?
Advantages: Easy, Cheap
Disadvantages: Unlikely to be representative sample, Dependent on researcher (bias)
What does x bar mean?
X bar is the mean of the values in the variable ‘x’
What are measures of location? Examples?
Measures of location are single values which describe a position in a data set (Maximum, Minimum, Quartile, Percentile)
What are the measures of central tendency? Examples?
Within the measures of location, you have the measures of central tendency, which describe where the centre of the data is (Mean, Median, Mode)
What are the measures of spread? Examples?
Measures of spread describe how the data is spread out (Range, IQR, Variance, Standard Deviation)
What is the equation for x bar?
$\bar{x} = \frac{\sum x}{n}$, the total of the $x$ values divided by the number of $x$ values
What is the x bar for a frequency table?
$\bar{x} = \frac{\sum fx}{\sum f}$, where $f$ is the frequency of each value $x$
What is the equation for x bar of grouped frequency?
$\bar{x} = \frac{\sum fx}{\sum f}$, where $x$ is the midpoint of each class and $f$ is its frequency
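The two frequency-table means can be sketched as below. The function names and example tables are illustrative assumptions, and the grouped version assumes each class is represented by its midpoint, so the result is an estimate of the mean.

```python
def frequency_mean(values, frequencies):
    """Mean from a frequency table: sum of f*x divided by sum of f."""
    total = sum(f * x for x, f in zip(values, frequencies))
    return total / sum(frequencies)

def grouped_mean(class_bounds, frequencies):
    """Estimated mean for grouped data, using each class midpoint as x."""
    midpoints = [(lower + upper) / 2 for lower, upper in class_bounds]
    return frequency_mean(midpoints, frequencies)

# Hypothetical tables
table_mean = frequency_mean([1, 2, 3], [2, 3, 5])  # (2 + 6 + 15) / 10
classes = [(0, 10), (10, 20), (20, 30)]
estimate = grouped_mean(classes, [2, 5, 3])  # midpoints 5, 15, 25
print(table_mean, estimate)
```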
What is the equation for standard deviation?
Standard Deviation $= \sqrt{\frac{\sum x^2 - n\bar{x}^2}{n-1}}$
How do you calculate the median for a list and grouped data?
To find the median, work out the position
$\frac{n}{2}$, where $n$ is the number of values
If this is a decimal, round up and take that item; if it is whole, take the value halfway between this item and the one after
With grouped data, use the same position and then linear interpolation
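For a plain list, the rule above can be sketched like this. The function name `median_of_list` is hypothetical, and the data is sorted first because the positions refer to the ordered values.

```python
import math

def median_of_list(data):
    """Median using the n/2 position rule on the ordered data."""
    ordered = sorted(data)
    position = len(ordered) / 2
    if position != int(position):
        # Decimal position: round up and take that item (positions are 1-based)
        return ordered[math.ceil(position) - 1]
    # Whole position: halfway between this item and the one after
    k = int(position)
    return (ordered[k - 1] + ordered[k]) / 2

odd_median = median_of_list([3, 1, 2])      # position 1.5, so the 2nd item
even_median = median_of_list([1, 2, 3, 4])  # position 2, so between 2nd and 3rd
print(odd_median, even_median)
```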
What is the method behind linear interpolation?
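Find how far into the median class the median position lies: subtract the cumulative frequency before the class from the median position and divide by the class frequency. Multiply this fraction by the class width and add the result to the lower bound of the class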
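A sketch of linear interpolation for the median of grouped data, assuming the median position is $n/2$ and that values are spread evenly within each class. The function name `grouped_median` and the example table are illustrative assumptions.

```python
def grouped_median(class_bounds, frequencies):
    """Estimate the median of grouped data by linear interpolation."""
    target = sum(frequencies) / 2  # median position n/2
    cumulative = 0                 # cumulative frequency before the current class
    for (lower, upper), f in zip(class_bounds, frequencies):
        if f > 0 and cumulative + f >= target:
            # Fraction of the way through the median class
            fraction = (target - cumulative) / f
            return lower + fraction * (upper - lower)
        cumulative += f
    raise ValueError("frequency table is empty")

# Hypothetical table: n = 20, position 10 falls in the 10 to 20 class
classes = [(0, 10), (10, 20), (20, 30)]
median_estimate = grouped_median(classes, [4, 10, 6])
print(median_estimate)  # 10 + (10 - 4) / 10 * 10
```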
What is the formula for the Lower and Upper Quartiles?
UQ $= \frac{3(n+1)}{4}$th value, LQ $= \frac{n+1}{4}$th value
What is a formula to find an outlier using the IQR?
Data which lies more than $1.5 \times \text{IQR}$ below the lower quartile or above the upper quartile
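The IQR rule can be sketched as follows. The function name `iqr_outliers` and the data are made up for illustration, and taking the quartiles from Python's `statistics.quantiles` with its default method is an assumption of this sketch; other quartile conventions can give slightly different fences.

```python
import statistics

def iqr_outliers(data):
    """Return values more than 1.5 * IQR beyond the lower or upper quartile."""
    lq, _, uq = statistics.quantiles(data, n=4)  # quartiles, default method
    iqr = uq - lq
    lower_fence = lq - 1.5 * iqr
    upper_fence = uq + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

# Hypothetical data with one unusually large value
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30]
outliers = iqr_outliers(data)
print(outliers)
```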
What is the formula for an outlier using standard deviation?
Data which lies more than $2 \times$ the standard deviation above or below the mean
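A sketch of the standard deviation rule, assuming the sample standard deviation (divisor $n-1$) is the one intended. The function name `sd_outliers`, the `multiplier` parameter and the data are illustrative assumptions.

```python
import statistics

def sd_outliers(data, multiplier=2):
    """Return values more than multiplier standard deviations from the mean."""
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    return [x for x in data if abs(x - mean) > multiplier * sd]

# Hypothetical data with one unusually large value
data = [10, 12, 11, 13, 12, 11, 30]
outliers = sd_outliers(data)
print(outliers)
```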
What are variance and standard deviation?
Variance is a measure of how spread out the values in a data set are about the mean. Standard deviation is the square root of the variance
What is the formula for variance?
Variance $= \frac{\sum x^2 - n\bar{x}^2}{n-1}$
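As a check on the formula, here is a minimal sketch that computes the variance from $\sum x^2$, $n$ and $\bar{x}$, then takes the square root for the standard deviation. The function names and data are illustrative assumptions; the comparison with `statistics.variance` is included only to show that the two agree.

```python
import math
import statistics

def sample_variance(data):
    """Variance as (sum of x^2 - n * mean^2) / (n - 1)."""
    n = len(data)
    mean = sum(data) / n
    sum_of_squares = sum(x * x for x in data)
    return (sum_of_squares - n * mean ** 2) / (n - 1)

def sample_standard_deviation(data):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(sample_variance(data))

# Hypothetical data: n = 8, mean = 5, sum of squares = 232
data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = sample_variance(data)  # (232 - 8 * 25) / 7
print(variance, statistics.variance(data))
```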
What is a causal relationship?
Two variables have a causal relationship if a change in one variable directly causes change in the other
What is the relationship between extrapolating and reliability?
Extrapolating outside the range of data compromises reliability
What is cluster sampling?
The population is divided into smaller groups known as clusters, and simple random sampling is used to pick entire clusters rather than individual sampling units. Unlike the strata in stratified sampling, the groups are not defined by a shared characteristic
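A sketch of picking whole clusters at random, assuming the clusters are held in a dictionary mapping a cluster name to its list of units. The function name `cluster_sample` and the street data are made up for illustration.

```python
import random

def cluster_sample(clusters, number_of_clusters, seed=None):
    """Pick whole clusters by simple random sampling of the cluster names."""
    rng = random.Random(seed)  # seed only makes this sketch repeatable
    chosen = rng.sample(sorted(clusters), number_of_clusters)
    # Every unit in a chosen cluster is included in the sample
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: households grouped by street
clusters = {
    "Street A": [1, 2, 3],
    "Street B": [4, 5],
    "Street C": [6, 7, 8, 9],
    "Street D": [10],
}
picked = cluster_sample(clusters, 2, seed=1)
print(picked)
```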
What are the advantages and disadvantages of cluster sampling?
Advantages: Simple manual process, Cheap, Quick, Allows for increasing sample size
Disadvantages: Less precise, Leads to over/underrepresentation causing bias, Does not represent population structure well
What does it mean if a set of data has a negative skew, has a positive skew, or is symmetrical?
A negative skew is where the mean lies before (is less than) the median
A positive skew is where the median lies before (is less than) the mean
A symmetrical distribution is where the mean and median lie together
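The comparison of mean and median can be sketched as below. The function name `describe_skew` and the data sets are illustrative assumptions, and this mean-versus-median test is only the rule of thumb from the card, not a full measure of skewness.

```python
import statistics

def describe_skew(data):
    """Classify skew by comparing the mean with the median."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    if mean < median:
        return "negative"
    if mean > median:
        return "positive"
    return "symmetrical"

# Hypothetical data sets
print(describe_skew([1, 2, 3, 4, 20]))     # mean 6, median 3
print(describe_skew([1, 17, 18, 19, 20]))  # mean 15, median 18
print(describe_skew([1, 2, 3, 4, 5]))      # mean 3, median 3
```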