Lecture 1 Flashcards
Statistics
the science of data that involves designing, collecting, classifying, summarizing, organizing, analyzing, and interpreting numerical and categorical information
what questions can statistics answer?
what kind of data, and how much, should be collected? how should we organize and summarize the data? how can we analyze the data and draw conclusions from it? how can we assess the strength of the conclusions and evaluate their uncertainty?
Descriptive statistics
utilizes numerical or graphical methods to explore data; includes the construction of graphs, charts, and tables, and the calculation of various descriptive measures such as averages, measures of variation, and percentiles
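A minimal Python sketch of a few descriptive measures (averages, measures of variation, percentiles), using only the standard-library `statistics` module (Python 3.8+ for `quantiles`). The scores and the helper name `describe` are hypothetical, chosen only for illustration.

```python
import statistics

# Hypothetical data: exam scores for a small class (made up for illustration)
scores = [72, 85, 90, 66, 85, 78, 94, 81]

def describe(data):
    """Return a few common descriptive measures for a list of numbers."""
    # quantiles(n=4) returns the three quartile cut points (25th, 50th, 75th percentiles)
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),      # average
        "median": statistics.median(data),  # middle value
        "stdev": statistics.stdev(data),    # sample standard deviation (measure of variation)
        "range": max(data) - min(data),     # a cruder measure of variation
        "q1": q1,                           # 25th percentile
        "q3": q3,                           # 75th percentile
    }

summary = describe(scores)
print(summary)
```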
inferential statistics
utilizes sample data to make estimates, decisions, predictions, or other generalizations about a large set of data
population
collection of all individuals or items under consideration in a statistical study
sample
part of the population from which information is collected
variable of interest
characteristic or property of an individual experimental unit. Often represented numerically by a measurement
statistical inference
an estimate, prediction, or other generalization about a population based on information contained in a sample
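A small simulation sketch of statistical inference, assuming a made-up population of 10,000 heights: the sample mean of 200 randomly chosen units is used to estimate the population mean, which in a real study would be unknown. The population values and seed are hypothetical.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible
# Hypothetical population: heights (cm) of 10,000 simulated adults
population = [rng.gauss(170, 10) for _ in range(10_000)]

# In a real study only the sample is observed; the population mean is unknown
sample = rng.sample(population, 200)
estimate = statistics.mean(sample)  # sample mean used as an estimate of the population mean

print(f"population mean (normally unknown): {statistics.mean(population):.2f}")
print(f"estimate from a sample of 200:      {estimate:.2f}")
```

The estimate will generally differ somewhat from the population mean; assessing the size of that difference is part of evaluating the uncertainty of an inference.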
experimental unit
an object (e.g., a person, thing, transaction, or event) about which data are collected
quantitative
measurements that are recorded on a naturally occurring numerical scale
qualitative or categorical
measurements that cannot be recorded on a naturally occurring numerical scale; they can only be classified into groups or categories
designed experiment
data collection method where the researcher exerts full control over the characteristics of the experimental units sampled. these experiments typically involve a group of experimental units that are assigned the treatment and an untreated (control) group
observational study
data collection method where the experimental units sampled are observed in their natural setting. No attempt is made to control the characteristics of the experimental units sampled
representative sample
exhibits characteristics typical of those possessed by the target population
simple random sample
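a sample of n experimental units selected from the population so that every different sample of size n has an equal chance of being chosen; as a result, each member of the population has an equal probability of selection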
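A minimal sketch of simple random sampling with Python's `random.sample`, which draws without replacement so every subset of size n is equally likely. The sampling frame of 10 IDs and the helper name `simple_random_sample` are hypothetical. The second half is an empirical check that each unit is picked about n/N of the time.

```python
import random
from collections import Counter

def simple_random_sample(frame, n, rng=random):
    """Select n distinct units from the sampling frame, without replacement."""
    return rng.sample(frame, n)

frame = list(range(1, 11))  # hypothetical sampling frame of 10 unit IDs
rng = random.Random(42)
print(simple_random_sample(frame, 3, rng))

# Empirical check: each unit should be selected about n/N = 3/10 of the time
trials = 20_000
counts = Counter()
for _ in range(trials):
    counts.update(simple_random_sample(frame, 3, rng))
print({unit: round(c / trials, 3) for unit, c in sorted(counts.items())})
```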
stratified random sample
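typically used when the experimental units associated with the population can be separated into two or more groups (strata), with units more similar within strata than across strata; a random sample is then selected from each stratum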
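A minimal sketch of stratified random sampling: a separate simple random sample is drawn within each stratum. The class-year strata, the 5% proportional allocation, and the helper name `stratified_random_sample` are all assumptions for illustration; other allocation rules are possible.

```python
import random

def stratified_random_sample(strata, n_per_stratum, rng=random):
    """Draw a separate simple random sample from each stratum.
    `strata` maps a stratum label to the list of units in that stratum."""
    return {label: rng.sample(units, n_per_stratum[label])
            for label, units in strata.items()}

# Hypothetical population of students split into strata by class year
strata = {
    "first-year": [f"F{i}" for i in range(400)],
    "sophomore": [f"S{i}" for i in range(300)],
    "junior": [f"J{i}" for i in range(200)],
    "senior": [f"R{i}" for i in range(100)],
}
# Assumed proportional allocation: sample 5% of each stratum
allocation = {label: len(units) // 20 for label, units in strata.items()}
sample = stratified_random_sample(strata, allocation, random.Random(7))
print({label: len(units) for label, units in sample.items()})
```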
cluster sample
used when it is more convenient and logical to sample natural groupings (clusters) of experimental units first, then collect data from all units within each cluster
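A minimal sketch of cluster sampling, assuming a hypothetical population of 12 city blocks (clusters) with 5 households each: whole clusters are chosen at random, then every unit in each chosen cluster is kept. The helper name `cluster_sample` is hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, rng=random):
    """Randomly select whole clusters, then keep every unit in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {label: clusters[label] for label in chosen}

# Hypothetical population: 12 city blocks (clusters), each with 5 households
clusters = {f"block{b}": [f"block{b}-house{h}" for h in range(5)] for b in range(12)}
sample = cluster_sample(clusters, 3, random.Random(3))
households = [unit for units in sample.values() for unit in units]
print(sorted(sample), len(households))  # 3 blocks, 15 households
```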
systematic sampling
involves selecting every kth experimental unit from a list of all units, usually starting from a randomly chosen unit among the first k
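A minimal sketch of systematic sampling, assuming a random start among the first k units (a common way to begin) and a hypothetical list of 100 unit IDs. The helper name `systematic_sample` is hypothetical; Python slicing `frame[start::k]` takes every kth element from the start position.

```python
import random

def systematic_sample(frame, k, rng=random):
    """Pick a random start among the first k units, then take every kth unit after it."""
    start = rng.randrange(k)  # random start index in 0..k-1
    return frame[start::k]

frame = list(range(1, 101))  # hypothetical list of 100 unit IDs
sample = systematic_sample(frame, 10, random.Random(5))
print(sample)  # ten IDs spaced exactly 10 apart
```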
efficiency
obtaining the needed information at the lowest cost in time and money
accuracy
an estimator is accurate if it is both unbiased and precise
bias
a systematic (non-random) difference between an estimator and the population parameter it is meant to estimate
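A simulation sketch of estimator bias, using a standard textbook example: the sample variance computed by dividing by n has expected value σ²(n-1)/n, so it systematically underestimates the population variance σ², while dividing by n-1 removes that bias. The normal population, σ = 2, n = 5, and the helper name `average_variance_estimates` are assumptions for illustration.

```python
import random

def average_variance_estimates(sigma, n, reps, rng):
    """Average two variance estimators over many samples of size n drawn from
    a normal population with standard deviation sigma (true variance sigma**2)."""
    divide_by_n = divide_by_n_minus_1 = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0, sigma) for _ in range(n)]
        mean = sum(sample) / n
        ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
        divide_by_n += ss / n
        divide_by_n_minus_1 += ss / (n - 1)
    return divide_by_n / reps, divide_by_n_minus_1 / reps

biased, unbiased = average_variance_estimates(sigma=2.0, n=5, reps=20_000, rng=random.Random(0))
print("true variance: 4.00")
print(f"average of divide-by-n estimates:     {biased:.2f}")    # tends to fall near 4 * 4/5 = 3.2
print(f"average of divide-by-(n-1) estimates: {unbiased:.2f}")  # tends to fall near 4
```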
in what ways can bias be introduced into an estimator?
measurement error, coverage error, nonresponse error, the estimator itself (MCNE)
selection bias
results when a subset of experimental units in the population has little or no chance of being selected for the sample
nonresponse bias
type of selection bias that results when data are not obtained from all experimental units in a sample
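A simulation sketch of nonresponse bias. The population of weekly work hours and the response model (people who work more hours are less likely to answer) are hypothetical assumptions, as is the helper name `response_probability`. Because responders differ systematically from nonresponders, the respondents' mean drifts away from the full-sample mean.

```python
import random
import statistics

rng = random.Random(11)
# Hypothetical population: weekly hours worked by 10,000 people
population = [rng.uniform(10, 70) for _ in range(10_000)]

def response_probability(hours):
    """Assumed response model: people who work more hours are less likely to answer."""
    return 0.9 - 0.01 * (hours - 10)  # 0.9 at 10 hours, 0.3 at 70 hours

sample = rng.sample(population, 1_000)
respondents = [h for h in sample if rng.random() < response_probability(h)]

print(f"population mean:   {statistics.mean(population):.1f}")
print(f"full-sample mean:  {statistics.mean(sample):.1f}")
print(f"respondents' mean: {statistics.mean(respondents):.1f}  (n = {len(respondents)})")
```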
measurement error
refers to inaccuracies in the values of the data collected. In surveys, the error may be due to ambiguous or leading questions and the interviewer’s effect on the respondent
data analysis path
formulate the research problem, define the population and sample, collect the data, perform a descriptive analysis, use appropriate statistical methods to solve the research problem, report the results
statistical thinking
involves applying rational thought and the science of statistics to critically assess data and inferences. Fundamental to the thought process is that variation exists in populations of data.
unethical statistical practice
includes introducing intentional bias into the sample or inferences