Introduction to statistics - Data Flashcards
Data collection techniques
- Observations
- Tests and assessments
- Surveys
- Document analysis (published articles)
- Interviews
What are Secondary data?
data that someone else has collected
What are Primary data?
data that you collect.
Secondary Data Sources
- Country & local city departments, e.g., Ministère-Direction de la Santé, STATEC (statistics portal of Luxembourg)
- Hospitals, clinics & schools
- Research institutions e.g., Luxembourg Institute of Health
- Journal Articles
- International institutions e.g., OECD
Secondary data disadvantages
- May be out of date for what you want to analyze.
- May not have been collected long enough for detecting trends.
- There may be missing information on some observations
- May be incomplete
- You have no control over data quality
Secondary data advantages
- Saves time
- Saves money
- Easily Accessible
- Increases the feasibility of multicenter/ international collaboration
Challenges of primary data
- Can be expensive to collect
- Selection of population or sample
- Difficulty recruiting participants
- Pretesting/piloting the instrument to determine the presence or absence of measurement bias
Who/What do we collect data from?
- Patients
- Therapists
- Published research
- Electronic devices
- Public & Private organisations
- Anything that exists can be a source of data
Definition: Population
A population is a group (e.g. patients) whose members have something in common (e.g. back pain).
(-> Target population)
Definition: Sample
A sample is a smaller group with similar characteristics from within that population.
What’s sample bias?
When the sample shows a higher percentage of a particular characteristic, or of a type of person, than the percentage found in the population.
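A minimal sketch of how such a bias could be checked, assuming a made-up population and sample in which each person is recorded only by sex. The names `share`, `population` and `sample`, and all the numbers, are hypothetical:

```python
def share(group, value):
    """Return the percentage of members of `group` that equal `value`."""
    return 100 * group.count(value) / len(group)

# Hypothetical data: 50% women in the population, 80% women in the sample.
population = ["F"] * 500 + ["M"] * 500
sample = ["F"] * 40 + ["M"] * 10

population_share = share(population, "F")
sample_share = share(sample, "F")

# A positive difference means women are over-represented in the sample.
difference = sample_share - population_share
```

Here the sample contains 30 percentage points more women than the population, which suggests the sample does not represent the population well on this characteristic.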
What are the probabilistic sampling methods?
- simple random
- stratified random
- systematic random
- clustered random
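Three of these methods can be sketched roughly as follows, assuming a hypothetical population of 20 patient IDs split into two age strata. The function names, the strata and the fixed seed are illustrative choices, not part of the flashcards:

```python
import random

def simple_random_sample(population, n, seed=0):
    """Every unit has the same chance of being picked."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=0):
    """Pick every k-th unit after a random starting point."""
    rng = random.Random(seed)
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, fraction, seed=0):
    """Draw a simple random sample of the same fraction from each stratum."""
    rng = random.Random(seed)
    chosen = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        chosen.extend(rng.sample(members, size))
    return chosen

# Hypothetical population: patient IDs 1..20, split into two age groups.
patients = list(range(1, 21))
strata = {"under_50": patients[:12], "50_plus": patients[12:]}

simple = simple_random_sample(patients, 5)
systematic = systematic_sample(patients, 5)
stratified = stratified_sample(strata, 0.25)
```

Cluster sampling would instead pick whole groups (e.g. entire clinics) at random and include everyone in the chosen groups.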
Non probabilistic sampling methods?
- convenience
- purposive
- snowball
What are descriptive statistics used for?
- summarise data
- describe data
- present data
Types of descriptive statistics
- Measures of Frequency
- Measures of Central Tendency
- Measures of Dispersion or variability
- Measures of position & rank
Measures of Frequency
Count, Percent & Frequency
Shows how often an observation occurs.
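As a small illustration, counts and percentages can be computed with Python's `collections.Counter`. The pain ratings below are invented for the example:

```python
from collections import Counter

# Hypothetical observations: self-reported pain level of six patients.
pain_levels = ["mild", "severe", "mild", "moderate", "mild", "moderate"]

counts = Counter(pain_levels)   # how often each value occurs
total = len(pain_levels)
percents = {level: 100 * n / total for level, n in counts.items()}
```

`counts` holds the count of each pain level and `percents` the same information as a percentage of all observations.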
Measures of Central Tendency
Mean, Median & Mode
-> Locates the distribution by various points.
-> Shows the average or the most common score.
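The three measures can be computed with Python's standard `statistics` module. The scores below are made up for illustration:

```python
from statistics import mean, median, mode

# Hypothetical test scores.
scores = [2, 3, 3, 5, 7]

average = mean(scores)      # sum of the scores divided by how many there are
middle = median(scores)     # middle value once the scores are sorted
most_common = mode(scores)  # value that occurs most often
```

For these scores the mean is 4, while the median and the mode are both 3.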
Measures of Dispersion or variability
Range, Variance, Standard Deviation & Interquartile Range.
- Identifies the spread of scores by stating intervals
- Range = distance between the highest and lowest points in the dataset
- Variance or Standard Deviation = how far the observed scores deviate from the mean
- Shows how “spread out” the data are. It is helpful to know when data are spread out as this can affect the average.
- The Interquartile range is the difference between the third quartile and the first quartile.
Measures of position & rank
Percentile Ranks, Quartile
-> Describes how scores fall in relation to one another. Relies on standardized scores
-> Compares scores to a normalized score (e.g. a national norm)
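One common way to express position is the percentile rank. The sketch below assumes the definition "percentage of scores strictly below the given score"; other definitions exist, and the function name and data are hypothetical:

```python
def percentile_rank(scores, score):
    """Percentage of values in `scores` that are strictly below `score`.

    Assumption: this is only one of several definitions in use.
    """
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)

# Hypothetical exam scores of ten students.
exam_scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]

rank_of_80 = percentile_rank(exam_scores, 80)
```

Under this definition a score of 80 sits at the 50th percentile, because five of the ten scores are lower.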
Mean
What can affect the mean?
By outliers
-> can make the mean a bad measure of central tendency
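A short example with invented numbers shows the effect: adding one extreme value shifts the mean a lot, while the median stays where it was.

```python
from statistics import mean, median

# Hypothetical waiting times in minutes.
typical = [4, 5, 5, 6, 5]
with_outlier = typical + [50]   # one extreme observation added

mean_before = mean(typical)
mean_after = mean(with_outlier)
median_before = median(typical)
median_after = median(with_outlier)
```

The mean jumps from 5 to 12.5, a value larger than every observation except the outlier, whereas the median remains 5.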
What’s the median?
When data are listed in order, the median is the point at which 50% of the cases are above and 50% below it.
This is the same as the 50th percentile.
What’s the mode?
The observation with the highest frequency
What’s the range?
The range is the difference between the lowest value and the highest value in a dataset
Range = (maximum value – minimum value)
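For instance, with a hypothetical list of patient ages:

```python
# Hypothetical patient ages.
ages = [34, 51, 27, 45, 62]

value_range = max(ages) - min(ages)   # 62 - 27
```

The range here is 35 years.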
What’s Q1?
=1st Quartile
= the value occupying the ¼ position of all values arranged in ranked order. Or median of the 1st half of all observations.
What’s Q3?
= 3rd quartile
= the value occupying the ¾ position of all values arranged in ranked order. Or the median of the 2nd half of all observations.
What’s IQR?
= Interquartile range = Q3 - Q1
What’s Q2?
= median
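The quartiles and the IQR can be sketched with the "median of each half" rule described above. One assumption is made here: when the number of observations is odd, the middle value is left out of both halves. Software packages use several slightly different conventions, so results may differ from this sketch. The function name and data are hypothetical:

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    Assumption: for an odd number of values, the middle value is
    excluded from both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[half:] if len(ordered) % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)

# Hypothetical dataset with eight observations.
data = [1, 3, 5, 7, 9, 11, 13, 15]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1
```

For this dataset Q1 is 4, Q2 (the median) is 8, Q3 is 12, and the IQR is 8.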
Variance
= a measure of how close together or far apart the values in a dataset are
->The larger the variance, the further the individual values are from the mean.
->The smaller the variance, the closer the individual values are to the mean.
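A minimal sketch, assuming the population variance (the average squared distance from the mean). The sample variance divides by n - 1 instead of n. Both datasets are invented and share the same mean of 5:

```python
from statistics import pvariance

def population_variance(values):
    """Average of the squared distances between each value and the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

# Two hypothetical datasets with the same mean (5) but different spread.
close_together = [4, 5, 6]
far_apart = [1, 5, 9]

var_close = population_variance(close_together)
var_far = population_variance(far_apart)

# The standard library gives the same result.
var_close_stdlib = pvariance(close_together)
```

The values that lie further from the mean give the larger variance. The standard deviation is the square root of the variance, which brings the measure back to the original units.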