Chapter 1 - Statistics, Data, and Statistical Thinking Flashcards
Statistics
science of data
Descriptive Statistics
Describing sets of data using numerical and graphical methods to explore data patterns
Inferential Statistics
Drawing conclusions about a population based on data from a sample (requires distinguishing between population and sample)
units
rows
variables
columns
Parameter of interest
what your data is focused on
numerical value that you want to draw conclusions about from a sample of data; typically a characteristic of a population
ex: average income
experimental (observational) unit
object (person, thing, transaction, event) upon which we collect data
ex: individual students/health workers whom we collect data from
Variable
numbers or characteristics that can be counted or measured
characteristic of the experimental units
Population
A set of experimental units that we are interested in studying
Sample
a subset of the units of the relevant population
Statistical Inference
estimation, prediction, or other generalization about a population based on information contained in a sample
Parameter
number that describes the whole population
any quantity computed from the observations in the population
ex: population mean (average) µ (mu)
average length of a butterfly
Statistic
any quantity computed from the observations in the sample
ex: sample mean x̄ (x bar)
ex: the average income for a sample drawn from the U.S. is a sample statistic.
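A minimal sketch of the parameter vs. statistic distinction. The income values are made up for illustration, and the names mu and x_bar are just labels chosen here:

```python
import random
import statistics

# Hypothetical population: incomes (in thousands of dollars) for 10 units.
population = [32, 45, 51, 38, 60, 47, 55, 41, 36, 65]

# Parameter: computed from every observation in the population.
mu = statistics.mean(population)

# Statistic: computed only from the observations in a sample.
rng = random.Random(1)  # fixed seed so the run is repeatable
sample = rng.sample(population, 4)
x_bar = statistics.mean(sample)

print("population mean (mu):", mu)
print("sample mean (x bar):", x_bar)
```

The parameter stays the same no matter which sample is drawn; the statistic depends on the sample.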
Measure of reliability (measure of uncertainty)
statement (usually quantified) about the degree of uncertainty associated with a statistical inference
Four Elements of Descriptive Statistical Problems
- The population or sample of interest
- One or more variables that are to be investigated
- Tables, graphs, or numerical summary tools
- Identification of patterns in the data
Five Elements of Inferential Statistical Problems
- The population of interest
- One or more variables that are to be investigated
- A sample
- The inference about the population based on information contained in the sample
- A measure of reliability for the inference
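A rough sketch of the last two elements (an inference plus a measure of reliability). The test scores are made up, and using the standard error of the mean as the reliability measure is an assumption made here; the cards do not name a specific measure:

```python
import math
import statistics

# Hypothetical sample of test scores drawn from a larger population.
sample = [72, 85, 90, 66, 78, 88, 95, 70]

# The inference: the sample mean used as an estimate of the population mean.
x_bar = statistics.mean(sample)

# One possible measure of reliability (an assumption, not from the cards):
# the standard error of the mean, s / sqrt(n).
s = statistics.stdev(sample)
standard_error = s / math.sqrt(len(sample))

print("estimate of population mean:", x_bar)
print("standard error of the estimate:", round(standard_error, 2))
```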
Quantitative data (numerical data)
data that can be counted or measured in numeric values
ex: temperature
unemployment rate
test scores
number of female executives
Qualitative data
measurements that cannot be recorded on a natural numerical scale; they can only be classified into one of a group of categories
ex: zip codes
“yes/no”
Representative sample
exhibits characteristics typical of those possessed by the population of interest
Stratification
splitting the population into strata, then taking a random subsample from each stratum
SUB-SAMPLE EACH GROUP
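A minimal sketch of stratification. The strata labels ("undergrad", "grad"), the unit ids, and the helper name stratified_sample are all hypothetical:

```python
import random

# Hypothetical population: (unit id, stratum label) pairs.
population = [(i, "undergrad") for i in range(60)] + [(i, "grad") for i in range(60, 100)]

def stratified_sample(units, per_stratum, rng):
    # Split the population into strata keyed by label.
    strata = {}
    for unit_id, label in units:
        strata.setdefault(label, []).append(unit_id)
    # Draw a simple random subsample from each stratum.
    return {label: rng.sample(members, per_stratum) for label, members in strata.items()}

rng = random.Random(0)  # fixed seed so the run is repeatable
picked = stratified_sample(population, per_stratum=5, rng=rng)
print(picked)
```

Every stratum shows up in the result, which is the point of sub-sampling each group.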
cluster sample
a sample in which each population unit belongs to a cluster, and the clusters are sampled
SAMPLE GROUPS
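A minimal sketch of a cluster sample, assuming whole clusters are chosen at random and every unit in a chosen cluster is kept. The clusters (think city blocks) and the helper name cluster_sample are hypothetical:

```python
import random

# Hypothetical population: 8 clusters of 5 units each.
clusters = {block: [block * 5 + j for j in range(5)] for block in range(8)}

def cluster_sample(clusters, n_clusters, rng):
    # Randomly choose whole clusters, then keep every unit inside them.
    chosen = rng.sample(sorted(clusters), n_clusters)
    return chosen, [unit for c in chosen for unit in clusters[c]]

rng = random.Random(0)  # fixed seed so the run is repeatable
chosen_blocks, units = cluster_sample(clusters, n_clusters=2, rng=rng)
print("chosen clusters:", chosen_blocks)
print("sampled units:", units)
```

Compare with stratification: here only some groups appear in the sample, but those groups are taken whole.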
Systematic sample
every kth unit in the population
selected according to a random starting point but with a fixed, periodic interval
interval k = population size divided by desired sample size
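A minimal sketch of a systematic sample, taking the interval k as population size divided by desired sample size (rounded down). The helper name systematic_sample and the population of 100 numbered units are hypothetical, and the sample size is assumed to be no larger than the population:

```python
import random

def systematic_sample(units, sample_size, rng):
    # Interval k: population size divided by desired sample size (floored).
    k = len(units) // sample_size
    # Random starting point among the first k units.
    start = rng.randrange(k)
    # Take every kth unit from the starting point.
    return units[start::k][:sample_size]

population = list(range(1, 101))  # hypothetical units numbered 1..100
rng = random.Random(0)            # fixed seed so the run is repeatable
sample = systematic_sample(population, 10, rng)
print(sample)
```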
Randomization
- selection is fair
- protects against biased samples
- helps represent all the features of the population
Sampling variability (also called sampling error, though no mistake has been made)
sample-to-sample differences in the values of the variables
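A minimal sketch of sampling variability: several random samples are drawn from the same made-up population and the sample mean is computed for each. The population of the numbers 1 to 100 is hypothetical:

```python
import random
import statistics

population = list(range(1, 101))  # hypothetical population; its mean is 50.5
rng = random.Random(0)            # fixed seed so the run is repeatable

# Draw 5 separate samples of size 10 and compute each sample mean.
sample_means = [statistics.mean(rng.sample(population, 10)) for _ in range(5)]

print("population mean:", statistics.mean(population))
print("sample means:", sample_means)
```

The sample means generally differ from one another and from the population mean even though nothing was done wrong, which is why "error" here does not mean a mistake.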