Data Collection, Sampling and Descriptive Statistics Flashcards
Data Collection Techniques (5)
Observations
Tests and assessments
Surveys
Document analysis (published articles)
Interviews
Techniques can be combined (mixed methods)
Types of Data (2)
Primary: data that you collected
Secondary: data that someone else collected
Secondary Data Disadvantages (6)
May be out of date (limited by time)
May not have been collected long enough to detect trends.
May be missing info on some observations
May be incomplete
No control over data quality
Some values may have been estimated rather than measured
Secondary Data Advantages (4)
Saves time
Saves money
Easily accessible
Makes collaboration easier, e.g. multicenter studies of rare diseases.
Primary Data Disadvantages (4)
Can be expensive to collect
Selecting the population or sample can be difficult
Difficulty recruiting participants
The instrument must be pretested to check for measurement bias.
Probabilistic Sampling Methods (4)
Simple random
Stratified random
Systematic random
Clustered random
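The four probabilistic methods above can be sketched in Python as follows. This is an illustration only, assuming the sampling frame is a plain list (or a dict of lists for strata and clusters) and that the sample size is no larger than the population. The function names and the patient and clinic data are hypothetical; only the standard `random` module is used.

```python
import random

def simple_random(population, n, rng):
    # Every unit has the same chance of selection; drawn without replacement.
    return rng.sample(population, n)

def systematic_random(population, n, rng):
    # Sampling interval k = N // n, a random start within the first
    # interval, then every k-th unit from the list (assumes n <= N).
    k = len(population) // n
    start = rng.randrange(k)
    return population[start::k][:n]

def stratified_random(strata, n_per_stratum, rng):
    # Split into subgroups (strata) first, then draw a simple random
    # sample of the same size inside each stratum.
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, n_per_stratum))
    return sample

def cluster_random(clusters, n_clusters, rng):
    # Randomly pick whole clusters and keep every unit inside them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for name in chosen for unit in clusters[name]]

rng = random.Random(1)              # seeded so the demo is repeatable
patients = list(range(1, 101))      # hypothetical patient IDs 1..100
print(simple_random(patients, 5, rng))
print(systematic_random(patients, 5, rng))
print(stratified_random({"female": patients[:50], "male": patients[50:]}, 3, rng))
print(cluster_random({"clinic_A": [1, 2, 3], "clinic_B": [4, 5], "clinic_C": [6, 7, 8]}, 2, rng))
```

The key difference shows up in the last two functions: stratified sampling draws individuals from every subgroup, while cluster sampling draws whole groups and keeps everyone in them.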
Non-Probabilistic Sampling Methods (3)
Convenience
Purposive
Snowball
What are Descriptive Statistics used for? (3)
To summarize data, describe data and present data.
Types of Descriptive Statistics (4)
- Measures of frequency: count, percent and frequency (how often an observation occurs).
- Measures of central tendency: mean, median and mode (describe the middle or typical value, locating the centre of the distribution).
- Measures of dispersion or variability: range, variance, standard deviation (how far observations typically lie from the mean) and interquartile range.
- Measures of position and rank: percentile ranks and quartiles.
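A short Python sketch of the frequency, dispersion and position measures listed above, using only the standard library (`collections.Counter` and the `statistics` module). The scores are hypothetical. Note that `statistics.variance` and `statistics.stdev` give the sample versions (dividing by n - 1), and `statistics.quantiles` uses its default 'exclusive' method; other textbooks and software use other quartile conventions and can give slightly different values.

```python
from collections import Counter
import statistics

scores = [2, 3, 3, 5, 7, 7, 7, 9]   # hypothetical test scores

# Measures of frequency: count and percent for each value
counts = Counter(scores)
percents = {value: 100 * count / len(scores) for value, count in counts.items()}

# Measures of dispersion
variance = statistics.variance(scores)   # sample variance (divides by n - 1)
sd = statistics.stdev(scores)            # sample standard deviation

# Measures of position: quartiles, then the interquartile range
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(counts, percents)
print(variance, sd)
print(q1, q2, q3, iqr)
```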
Mean
Average
Mean = (Y1+Y2+…+Yn)/n
Y: variable
Y1: 1st observation of variable Y
Yn: last observation of variable Y
n: number of observations in sample
The mean is sensitive to outliers, so it is a poor measure of central tendency when extreme values are present.
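The mean formula translates directly into code. This sketch uses hypothetical income data to show the outlier effect: adding a single extreme value pulls the mean from 35.0 up to about 95.8, even though five of the six observations are between 30 and 40.

```python
def mean(values):
    # Mean = (Y1 + Y2 + ... + Yn) / n
    return sum(values) / len(values)

incomes = [30, 32, 35, 38, 40]      # hypothetical incomes in thousands
with_outlier = incomes + [400]      # add one extreme value

print(mean(incomes))                # 35.0
print(mean(with_outlier))
```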
Median
Arrange all values in rank order. The median is the value that splits the data set into two equal halves. It is the same as the 50th percentile.
If there is an even number of observations, the median is the average of the two middle values.
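A minimal Python version of the median rule, covering both the odd and even cases. It reuses the same hypothetical income data as a mean example would, to show that one outlier barely moves the median (35 becomes 36.5). The standard library's `statistics.median` does the same job.

```python
def median(values):
    ordered = sorted(values)        # put the values in rank order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]         # odd n: the single middle value
    # even n: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

incomes = [30, 32, 35, 38, 40]      # hypothetical incomes in thousands
with_outlier = incomes + [400]      # add one extreme value

print(median(incomes))              # 35
print(median(with_outlier))         # 36.5
```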
Mode
Observation with the highest frequency.
A data set can have more than one mode, e.g. bimodal (2 modes).
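A small sketch that returns every mode, so a bimodal data set yields two values. It counts frequencies with `collections.Counter`. Since Python 3.8, `statistics.multimode` provides the same behaviour. One assumption to note: if every value occurs once, this returns all values, whereas some textbooks say such a data set has no mode.

```python
from collections import Counter

def modes(values):
    counts = Counter(values)                 # frequency of each value
    highest = max(counts.values())
    # Keep every value that reaches the highest frequency
    return [value for value, count in counts.items() if count == highest]

print(modes([1, 2, 2, 3]))       # [2]
print(modes([1, 1, 2, 3, 3]))    # [1, 3]  (bimodal)
```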
Finding the mean from a histogram (bar chart with class intervals)
You cannot find the exact mean from class intervals alone, because the individual values are unknown. You can estimate it by treating every observation in a class as if it were equal to the midpoint of that interval.
Estimated mean = Σ(frequency × midpoint) / Σ frequency
Multiply each class's frequency by its midpoint, add up all of these products, and divide the sum by the total frequency.
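The grouped-data estimate can be sketched like this, assuming each class is stored as a hypothetical `(lower, upper, frequency)` tuple and the midpoint is taken as (lower + upper) / 2. For the example classes, Σ(f × m) = 5×5 + 8×15 + 7×25 = 320 and Σf = 20, giving an estimated mean of 16.0.

```python
def grouped_mean(classes):
    # classes: list of (lower, upper, frequency) tuples (hypothetical layout)
    total_fx = 0.0      # running sum of frequency x midpoint
    total_f = 0         # running sum of frequencies
    for lower, upper, freq in classes:
        midpoint = (lower + upper) / 2
        total_fx += freq * midpoint
        total_f += freq
    return total_fx / total_f

classes = [(0, 10, 5), (10, 20, 8), (20, 30, 7)]   # hypothetical histogram
print(grouped_mean(classes))                       # 16.0
```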
Range
Difference between the lowest value and the highest value in a dataset.
Range = maximum value - minimum value
Strongly affected by outliers, because it depends only on the two most extreme values.
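The range formula in Python, using the built-in `max` and `min`. The hypothetical data show the outlier sensitivity: one extreme value stretches the range from 10 to 370.

```python
def data_range(values):
    # Range = maximum value - minimum value
    return max(values) - min(values)

incomes = [30, 32, 35, 38, 40]      # hypothetical incomes in thousands
with_outlier = incomes + [400]      # add one extreme value

print(data_range(incomes))          # 10
print(data_range(with_outlier))     # 370
```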
Percentile
Percentile rank = (C + 0.5 × f) / N × 100%
C: number of observations below the observation of interest.
f: frequency of the observation of interest (how many times it occurs).
N: total number of observations.
If the observation of interest occurs more than once (ties), C counts only the values strictly below it; all the tied copies are counted in f.
A percentile rank is not the same as a percentage score. The 100th percentile corresponds to the highest score and the 0th percentile to the lowest.
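A minimal sketch of the percentile-rank formula, reading it as (C + 0.5 × f) / N × 100. The function name and scores are hypothetical. For the score 7, C = 4 (the values 2, 3, 3, 5), f = 3 and N = 8, so the rank is (4 + 1.5) / 8 × 100 = 68.75.

```python
def percentile_rank(values, x):
    n = len(values)
    c = sum(1 for v in values if v < x)     # C: observations strictly below x
    f = sum(1 for v in values if v == x)    # f: how often x itself occurs
    return (c + 0.5 * f) / n * 100

scores = [2, 3, 3, 5, 7, 7, 7, 9]           # hypothetical test scores
print(percentile_rank(scores, 7))           # 68.75
print(percentile_rank(scores, 9))           # 93.75
```

Because only half of the score's own frequency is counted, this formula never returns exactly 0 or 100; the top score here gets 93.75.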