Data basics Flashcards
Case or Observational Unit
Each row in a dataset represents a single instance or entity being studied, called a case or observational unit. For example, in a loan table, each row represents a single loan.
Variables
Characteristics or attributes of each case or observational unit, represented as columns in a dataset. For instance, in a loan table, variables include loan amount, interest rate, borrower location, and income.
Data Matrix
A structured way to organize data, often used in spreadsheets, where each row corresponds to a unique case (observational unit) and each column corresponds to a variable.
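As a rough illustration, a data matrix can be held as a list of rows plus a list of column names. In this minimal sketch, each row is one case (a loan) and each column is one variable. The column names and values are hypothetical, made up only for illustration.

```python
# A tiny data matrix: each row is one case (a loan), each column is a variable.
# All values below are hypothetical and exist only to illustrate the layout.
columns = ["loan_amount", "interest_rate", "state", "income"]
rows = [
    [10000, 8.5, "NJ", 60000],
    [20000, 11.0, "CA", 95000],
    [15000, 6.0, "SC", 72000],
]

# One case is a full row; pairing it with the column names labels each value.
first_case = dict(zip(columns, rows[0]))

# One variable is a full column; pick the same position out of every row.
states = [row[columns.index("state")] for row in rows]

print(len(rows), "cases x", len(columns), "variables")  # 3 cases x 4 variables
print(first_case)
print(states)  # ['NJ', 'CA', 'SC']
```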
Numerical
A variable that can take a wide range of numerical values and allows meaningful mathematical operations such as addition, subtraction, or averaging. For example, unemployment rate or population count.
Discrete
A type of numerical variable that can take only distinct, separated values (often whole numbers), with jumps between them, such as population count (0, 1, 2, …).
Continuous
A type of numerical variable that can take any value within a range, allowing for decimals and fractions, such as the unemployment rate.
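A short sketch of the numerical cards above, using hypothetical values: averaging is meaningful for both discrete and continuous numerical variables, while a discrete variable such as a count takes only whole-number values.

```python
from statistics import mean

# Hypothetical values for illustration only.
population = [1200, 58000, 930]        # discrete: whole-number counts
unemployment_rate = [3.9, 5.25, 4.1]   # continuous: any value within a range

# Averaging is a meaningful operation for both numerical variables.
print(mean(population))
print(mean(unemployment_rate))

# A discrete count only takes whole-number values (its average need not be whole).
print(all(isinstance(p, int) for p in population))  # True
```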
Categorical
A variable where responses fall into distinct categories or groups. For example, the variable “state” with values such as AL, AK, or WY.
Level
The possible values a categorical variable can take. For instance, the variable “state” has levels like AL, AK, WY, etc.
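The levels of a categorical variable are simply the distinct values it takes across the cases. This minimal sketch assumes a hypothetical list of "state" values for six cases and also counts how many cases fall into each level.

```python
from collections import Counter

# Hypothetical values of the categorical variable "state" for six cases.
state = ["AL", "WY", "AK", "AL", "WY", "AL"]

# The levels are the distinct values the variable takes.
levels = sorted(set(state))
print(levels)  # ['AK', 'AL', 'WY']

# How many cases fall into each level.
counts = Counter(state)
print(counts["AL"])  # 3
```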
Ordinal
A categorical variable with levels that have a natural ordering. For example, median education level (below HS, HS diploma, some college, bachelor’s).
Nominal
A categorical variable with levels that do not have any inherent order. For example, telephone area codes or state names.
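One way to see the difference between the two cards above: for an ordinal variable, the order of the levels can be written down explicitly and used for sorting and comparison, whereas a nominal variable has no such order to record. The education levels come from the Ordinal card; the responses and the area codes are example values chosen for illustration.

```python
# Ordinal: the levels have a natural order, so we record it explicitly.
education_levels = ["below HS", "HS diploma", "some college", "bachelor's"]
rank = {level: i for i, level in enumerate(education_levels)}

# Hypothetical responses, sorted by the natural order of the levels.
responses = ["some college", "below HS", "bachelor's", "HS diploma"]
ordered = sorted(responses, key=rank.get)
print(ordered)  # ['below HS', 'HS diploma', 'some college', "bachelor's"]

# Comparisons between ordinal levels are meaningful.
print(rank["bachelor's"] > rank["HS diploma"])  # True

# Nominal: area codes are written with digits, but they are labels, not
# quantities. There is no natural order or meaningful average, so they are
# stored as strings and only compared for equality.
area_codes = ["212", "907", "307", "212"]
distinct_codes = set(area_codes)
print(len(distinct_codes))  # 3
```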
Explanatory Variable
A variable that is believed to influence or explain changes in another variable. For example, median household income in a study examining its effect on population change.
Response Variable
A variable that is affected or influenced by the explanatory variable. For example, population change in a study investigating the impact of median household income.
Observational Study
A study where researchers collect data without interfering with how the data arise, such as through surveys, reviewing records, or tracking a cohort. Such studies can reveal associations but generally cannot, on their own, establish causation.
Cohort
A group of similar individuals followed over time in a study, often to observe the development of certain outcomes, such as diseases.
Experiment
A study where researchers actively intervene to investigate a possible causal connection by assigning treatments to study participants and observing outcomes.
Randomized Experiment
An experiment where participants are randomly assigned to treatment groups, so that, on average, their characteristics (including ones the researchers did not measure) are balanced across the groups.
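Random assignment can be sketched as shuffling the participant list and splitting it into groups. The helper `randomly_assign` below is hypothetical, the participant IDs are made up, and the seed is there only to make the draw reproducible.

```python
import random

def randomly_assign(participants, seed=None):
    """Hypothetical helper: shuffle participants, then split them into two groups."""
    rng = random.Random(seed)
    shuffled = list(participants)   # copy so the caller's list is not modified
    rng.shuffle(shuffled)           # every ordering is equally likely
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

participants = [f"P{i}" for i in range(1, 11)]
groups = randomly_assign(participants, seed=1)
print(groups["treatment"])
print(groups["control"])
```

Because chance alone decides who lands in each group, no characteristic of the participants can systematically steer them toward one treatment.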
Placebo
An inactive (fake) treatment, such as a sugar pill, used in experiments to control for the psychological effects of receiving a treatment, which helps isolate the true effect of the actual treatment.
Data
Observations collected from sources such as field notes, surveys, and experiments, forming the foundation of a statistical investigation.
Statistics
The study of how to best collect, analyze, and draw conclusions from data.
Sample
A subset of the population, often a small fraction, used to represent and draw conclusions about the whole population.
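A minimal sketch of drawing a simple random sample without replacement, assuming a hypothetical population of 1,000 case IDs. The fixed seed only makes the draw reproducible.

```python
import random

# Hypothetical population of 1,000 case IDs.
population = list(range(1, 1001))

rng = random.Random(42)                  # fixed seed for a reproducible draw
sample = rng.sample(population, k=50)    # simple random sample, no repeats

print(len(sample), "of", len(population), "cases")  # 50 of 1000 cases
```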