Describing data Flashcards
What is a random sample?
In a random sample, every individual in a population has the same chance of being selected, and the selection of individuals is independent.
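As a minimal sketch of the definition above, the snippet below draws a simple random sample in Python. The population of ID numbers and the sample size are made up for illustration, and sampling is done without replacement, which is an assumption rather than part of the card.

```python
import random

# Hypothetical population: ID numbers 1..100 (made up for illustration).
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed so the draw is repeatable

# random.sample draws without replacement; every individual has the same
# chance of ending up in the sample.
sample = rng.sample(population, k=10)
print(sample)
```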
What is a sample of convenience?
A sample of convenience is a collection of individuals easily available to a researcher, but it is not usually a random sample.
A sample of convenience can lead to volunteer bias, which is a systematic discrepancy in a quantity between the pool of volunteers and the population.
What is sampling error?
Sampling error is the chance difference between an estimate describing a sample and the corresponding parameter of the whole population.
What is sampling bias?
Bias is a systematic discrepancy between an estimate and the population quantity.
What are precision and accuracy?
Precision: The spread of an estimate between samples due to sampling error; less spread means higher precision.
Accuracy: How close an estimate is, on average, to the true population value; sampling bias reduces accuracy.
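The difference between sampling error (which limits precision) and sampling bias (which limits accuracy) can be illustrated with a small simulation. This is only a sketch: the population, the sample size, and the "upper half only" biased scheme are all invented for the example.

```python
import random
import statistics

# Hypothetical population: the integers 1..100 (made up for illustration).
population = list(range(1, 101))
true_mean = statistics.mean(population)  # 50.5

rng = random.Random(0)  # fixed seed so the run is repeatable

def sample_means(pool, k=10, repeats=1000):
    """Mean of each of `repeats` random samples of size k drawn from pool."""
    return [statistics.mean(rng.sample(pool, k)) for _ in range(repeats)]

# Random sampling from the whole population: estimates scatter around the
# true mean (sampling error) but are not systematically off.
random_estimates = sample_means(population)

# Biased sampling: only the upper half of the population can be selected,
# so every estimate is too high (sampling bias).
biased_estimates = sample_means([y for y in population if y > 50])

print("true mean:", true_mean)
print("random: average estimate", statistics.mean(random_estimates),
      "spread", statistics.stdev(random_estimates))
print("biased: average estimate", statistics.mean(biased_estimates),
      "spread", statistics.stdev(biased_estimates))
```

The spread of each set of estimates reflects precision, while the gap between the average estimate and the true mean reflects accuracy.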
What is a frequency distribution and a probability density function?
The frequency distribution describes the number of times each value of a variable occurs in a sample
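A probability distribution converts these counts into probabilities by dividing each count by the total, so that the probabilities sum to 1. For a continuous variable it is described by a probability density function, where probability is the area under the curve.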
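A short sketch of turning a frequency distribution into relative frequencies (estimated probabilities), using made-up values:

```python
from collections import Counter

# Made-up sample of a discrete variable.
values = [2, 3, 3, 4, 4, 4, 5]

# Frequency distribution: how many times each value occurs.
freq = Counter(values)

# Relative frequencies: each count divided by the sample size.
n = len(values)
rel_freq = {v: c / n for v, c in sorted(freq.items())}

for v in rel_freq:
    print(v, freq[v], round(rel_freq[v], 3))
```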
What are the main measures of central tendency?
Mean: The sum of values divided by sample size.
Median: The middle measurement.
Mode: The most common measurement
How do you calculate the mean, median and mode?
Mean:
$\bar{Y} = \frac{\sum_{i=1}^{n} Y_i}{n}$
Median: Middle value of the sorted data
- Even n: the average of the two middle values
- Odd n: the single middle value
Mode: Most common value
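The calculations above can be sketched with Python's standard `statistics` module. The two small data sets are made up so that one has an odd and one an even number of values.

```python
import statistics

odd = [3, 1, 4, 1, 5]       # n = 5 (made-up data)
even = [3, 1, 4, 1, 5, 9]   # n = 6 (made-up data)

# Mean: sum of the values divided by the sample size.
mean_odd = sum(odd) / len(odd)

# Median: the data are sorted first; with even n the two middle values
# are averaged.
median_odd = statistics.median(odd)
median_even = statistics.median(even)

# Mode: the most common value.
mode_odd = statistics.mode(odd)

print(mean_odd, median_odd, median_even, mode_odd)
```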
What are the measures of spread of a distribution?
Range: max-min
Variance ($s^2$): $s^2 = \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}$, where $\bar{Y}$ is the sample mean
Standard deviation ($s$): Square root of the variance
Interquartile range: Third quartile - first quartile (this can be depicted as the box in a box plot)
Be careful when calculating the SD and variance from a frequency table: multiply each squared deviation $(Y_i - \bar{Y})^2$ by the frequency of $Y_i$ before summing, and use the total number of observations as $n$.
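A sketch of these measures of spread on made-up data, including the frequency-table version of the variance. The quartiles use the default method of `statistics.quantiles`; quartile conventions differ between textbooks and software, so the IQR may not match a hand calculation exactly.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample
n = len(data)
mean = sum(data) / n

data_range = max(data) - min(data)

# Sample variance: sum of squared deviations divided by n - 1.
variance = sum((y - mean) ** 2 for y in data) / (n - 1)
sd = math.sqrt(variance)

# Interquartile range: third quartile minus first quartile.
q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# The same data as a frequency table {value: frequency}.
freq_table = {2: 1, 4: 3, 5: 2, 7: 1, 9: 1}
n_ft = sum(freq_table.values())
mean_ft = sum(y * f for y, f in freq_table.items()) / n_ft
# Each squared deviation is weighted by its frequency before summing.
variance_ft = sum(f * (y - mean_ft) ** 2 for y, f in freq_table.items()) / (n_ft - 1)

print(data_range, variance, sd, iqr, variance_ft)
```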
Coefficient of variation
A relative measure of spread: the standard deviation expressed as a percentage of the mean
$CV = \frac{s}{\bar{Y}} \times 100\%$
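Because the CV is relative to the mean, rescaling the data (for example, changing units) leaves it unchanged. A small sketch with made-up data; the helper name `coefficient_of_variation` is hypothetical:

```python
import math
import statistics

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the sample mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

data_kg = [2, 4, 4, 4, 5, 5, 7, 9]      # made-up measurements in kg
data_g = [y * 1000 for y in data_kg]    # the same measurements in g

cv_kg = coefficient_of_variation(data_kg)
cv_g = coefficient_of_variation(data_g)

print(cv_kg, cv_g)
```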
When to use the mean or median/ SD or IQR
Median/IQR: data with extreme values (outliers) or a skewed distribution, especially with a small sample size
Mean/ SD: large sample size and normally distributed data
Cumulative frequency distribution
This distribution can be used to show percentiles or quantiles
The percentile of a measurement specifies the percentage of observations less than or equal to it. The quantile of a measurement specifies the fraction of observations less than or equal to it.
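Following the definition above, the quantile and percentile of a measurement can be sketched as the fraction (or percentage) of observations less than or equal to it. The data and the helper name `fraction_at_or_below` are made up for illustration.

```python
def fraction_at_or_below(values, x):
    """Fraction of observations less than or equal to x."""
    return sum(v <= x for v in values) / len(values)

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up sample

quantile_of_5 = fraction_at_or_below(values, 5)   # 6 of 8 values are <= 5
percentile_of_5 = 100 * quantile_of_5

print(quantile_of_5, percentile_of_5)
```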
Proportions
Proportions are descriptive statistics used for categorical data
$\hat{p} = \frac{\text{number in category}}{n}$
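A minimal sketch of computing sample proportions for a categorical variable, using made-up categories:

```python
from collections import Counter

# Made-up categorical observations.
colours = ["red", "blue", "red", "green", "red", "blue"]

counts = Counter(colours)
n = len(colours)

# Proportion in each category: number in category divided by n.
p_hat = {category: count / n for category, count in counts.items()}

print(p_hat)
```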