Chapter 1, Intro to Data Flashcards
What is a summary statistic?
a single number summarizing a large amount of data
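As a minimal sketch, assuming a small hypothetical list of hourly pay rates, Python’s standard `statistics` module can reduce the whole list to one summary number. The mean and the median are two common examples.

```python
import statistics

# Hypothetical hourly pay rates (in dollars) for five workers
pay_rates = [15.50, 18.25, 22.00, 16.75, 19.50]

# The mean reduces all five values to a single summary number
mean_pay = statistics.mean(pay_rates)
# The median is another common summary statistic
median_pay = statistics.median(pay_rates)

print(f"mean: {mean_pay:.2f}, median: {median_pay:.2f}")  # mean: 18.40, median: 18.25
```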
What is a properly organized data set called, and what makes it “proper”?
A data matrix: each row corresponds to a unique case, and each column corresponds to a variable.
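A data matrix can be sketched in plain Python as a list of column names plus a list of rows. The names and values below are hypothetical. Each row is one case, and each position in a row holds the value of one variable.

```python
# Column names: each column is a variable
columns = ["name", "age", "education"]

# Each row is one case (observational unit); values are listed in column order
rows = [
    ("Ana", 34, "bachelor's"),
    ("Ben", 27, "high school"),
    ("Cai", 45, "master's"),
]

# Every row must hold exactly one value per variable for the matrix to be well-formed
assert all(len(row) == len(columns) for row in rows)

# Pull out a single variable (column) across all cases
age_index = columns.index("age")
ages = [row[age_index] for row in rows]
print(ages)  # [34, 27, 45]
```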
What is the formal name for a row?
case or observational unit
What do columns represent, and what is important to know about them?
characteristics, called variables (it is important to understand what each variable means, as well as its units of measurement)
What are the 2 main types of variables?
Numerical and Categorical
What are the 2 kinds of numerical variable?
Discrete and continuous
What are the 2 kinds of categorical variable?
Ordinal and nominal
What is a discrete numerical variable?
a numerical variable that can only take distinct, separated values (typically whole-number counts), e.g. population, since you can’t have half a person
What is a continuous numerical variable?
a numerical variable that can take any value within a range, including values between whole numbers, e.g. an hourly pay rate
What is an ordinal categorical variable?
a categorical variable that involves an ordering, e.g. educational level attained
What is a nominal categorical variable?
a categorical variable that doesn’t involve an ordering, e.g. color
What are the possible values of a categorical variable called?
levels
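As a minimal sketch with hypothetical data, the levels of a categorical variable are its distinct values. For a nominal variable any listing order is arbitrary. For an ordinal variable the order has to be supplied explicitly, here through a hypothetical `EDUCATION_ORDER` list, because alphabetical order would not match the real ordering.

```python
# Hypothetical observations of two categorical variables
colors = ["red", "blue", "red", "green", "blue"]          # nominal
education = ["high school", "master's", "bachelor's", "high school"]  # ordinal

# Nominal: levels have no natural order, so sorting alphabetically is just a convention
color_levels = sorted(set(colors))

# Ordinal: the meaningful order is stated explicitly
EDUCATION_ORDER = ["high school", "bachelor's", "master's"]
observed = set(education)
education_levels = [level for level in EDUCATION_ORDER if level in observed]

# With an explicit order, comparisons like "higher than" make sense
is_higher = EDUCATION_ORDER.index("master's") > EDUCATION_ORDER.index("high school")

print(color_levels)      # ['blue', 'green', 'red']
print(education_levels)  # ['high school', "bachelor's", "master's"]
print(is_higher)         # True
```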
What makes 2 variables “associated” or “dependent”?
When they show some connection with one another.
What is a scatterplot graph useful for?
Showing whether or not 2 variables are associated, as well as trends in the relationship
What is a positive correlation between 2 variables?
a relationship where, as one variable increases, the other tends to increase as well (and as one decreases, the other tends to decrease)
What is a negative correlation between 2 variables?
a relationship where, as one variable increases, the other tends to decrease (and as one decreases, the other tends to increase)
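The cards describe correlation only by its direction. One common way to quantify it, not covered in these cards, is the Pearson correlation coefficient r: it is positive for a positive correlation and negative for a negative one. The sketch below computes r by hand on hypothetical data; the function name `pearson_r` is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations from the means (unscaled covariance)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations (unscaled standard deviations)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# Hypothetical data: hours studied and hours of TV watched vs. exam score
hours_studied = [1, 2, 3, 4, 5]
hours_tv = [6, 5, 3, 2, 1]
exam_score = [55, 62, 70, 74, 85]

r_pos = pearson_r(hours_studied, exam_score)  # both rise together
r_neg = pearson_r(hours_tv, exam_score)       # one rises as the other falls

print(r_pos > 0)  # True: positive correlation
print(r_neg < 0)  # True: negative correlation
```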
What are independent variables?
variables that aren’t associated
What 2 terms describe variables when one variable might affect another?
explanatory and response: an explanatory variable might affect a response variable
What are the 2 primary types of data collection?
observational and experimental
What makes a study observational? And why use this method?
Researchers do not interfere directly with how the data arise (e.g. surveys, data collected from existing records, or following a cohort of similar individuals in studies of disease). Observational studies can provide evidence of an association between variables, but they cannot show a causal connection. They can, however, give rise to hypotheses to be checked using experiments.
Why use an experiment?
to investigate the possibility of a causal connection
What is a sample?
A subset of the population to be studied.
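As a minimal sketch, assuming a hypothetical population of 1,000 numbered residents, Python’s `random.sample` draws a subset without replacement. This is simple random sampling, one common way to choose a sample that these cards do not describe. The fixed seed only makes the example reproducible.

```python
import random

# Hypothetical population: ID numbers for 1,000 residents
population = list(range(1, 1001))

# Simple random sample: every resident has the same chance of being chosen
rng = random.Random(42)  # fixed seed so the example is reproducible
sample = rng.sample(population, k=50)

print(len(sample))  # 50
```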
Define anecdotal evidence. Why is it a problem?
Data collected in a haphazard fashion. It is a problem because such data may not be representative of the population.
What is a non-response rate and why is it important?
The non-response rate is the fraction of people selected for the sample who do not respond. A high non-response rate can skew the results, because the people who respond may differ from those who do not.
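The non-response rate is simple arithmetic: non-respondents divided by the number of people selected for the sample. The function name and survey figures below are hypothetical.

```python
def non_response_rate(num_sampled, num_responded):
    """Fraction of people selected for the sample who did not respond."""
    if num_sampled <= 0:
        raise ValueError("num_sampled must be positive")
    return (num_sampled - num_responded) / num_sampled

# Hypothetical survey: 1,200 people contacted, 300 responded
rate = non_response_rate(1200, 300)
print(f"{rate:.0%}")  # 75%
```

A rate this high would be a warning sign: the 300 respondents may not represent the 1,200 people originally sampled.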