Week 1 - Intro to Data Flashcards
what is a variable
a characteristic whose value can vary from one individual or observation to another
numerical variable
a variable whose values are numbers.
it can be split into two subtypes:
- discrete
- continuous
discrete numerical variable
these measure counts, like the number of objects in a collection, so they take separate, countable values
continuous numerical variable
these measure amounts like height or weight, which can take any value within a range rather than only separate, countable values
categorical variables
variables whose values are categories rather than numbers. they summarize qualitative information rather than quantitative information.
they can be split into two subtypes:
- nominal
- ordinal
nominal categorical variable
when there is no natural order to the different categories, e.g. eye color, gender, type of pet
ordinal categorical variable
when there is a natural order to the different categories, e.g. education level, military rank, degree classification
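A minimal Python sketch of the nominal/ordinal difference. The category values and the names `EDUCATION_ORDER` and `sort_ordinal` are made up for illustration. It shows that an ordinal variable can be sorted by its natural order, while for a nominal variable only counting the categories is meaningful.

```python
from collections import Counter

# Hypothetical category values, used only for illustration.
eye_colors = ["brown", "blue", "green", "blue"]        # nominal: no natural order
education = ["degree", "school", "masters", "school"]  # ordinal: has a natural order

# For an ordinal variable the natural order can be written down explicitly.
EDUCATION_ORDER = ["school", "degree", "masters"]

def sort_ordinal(values, order):
    """Sort values by their position in the given natural order."""
    rank = {level: i for i, level in enumerate(order)}
    return sorted(values, key=lambda v: rank[v])

sorted_education = sort_ordinal(education, EDUCATION_ORDER)
print(sorted_education)

# For a nominal variable there is no order to sort by, but we can count categories.
eye_color_counts = Counter(eye_colors)
print(eye_color_counts)
```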
what is sampling
selecting a subset of a population, called a sample, to study. the sample allows us to make an educated guess about the rest of the population
what is statistical inference
a technique that allows us to make generalizations about an entire population based on only a sample from that population.
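A rough sketch of the idea in Python, assuming a made-up population of heights (the numbers 170 and 10 and the sample size 200 are arbitrary choices, not from the course). The sample mean is used as an estimate of the population mean, which in practice we would not be able to compute directly.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 made-up heights in cm.
population = [rng.gauss(170, 10) for _ in range(10_000)]

# Take a sample of 200 individuals, chosen at random without replacement.
sample = rng.sample(population, 200)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # our estimate of it

print(f"sample mean:     {sample_mean:.1f}")
print(f"population mean: {population_mean:.1f}")
```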
representative sample
one that accurately reflects the relevant features of the larger population. if a sample isn’t representative of the wider population then it is biased
non-response bias
when the people who choose to respond to a study differ systematically from those who don't. because participation is voluntary, the very fact that someone responds can make them different from those who don't.
for example, the people with the strongest beliefs are often the most likely to respond
selection bias
when the sample participants are chosen in a way that systematically excludes or favors part of the population, e.g. a phone survey limits the respondents to people who have phones
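A small sketch of the phone survey example in Python. The incomes and phone ownership values below are invented purely for illustration. It compares the mean income of the whole population with the mean income of only those a phone survey could reach.

```python
import statistics

# Hypothetical population: (income, has_phone) pairs, made up for illustration.
population = [
    (12_000, False), (15_000, False), (18_000, True), (22_000, True),
    (25_000, False), (30_000, True), (40_000, True), (55_000, True),
]

all_incomes = [income for income, _ in population]
# A phone survey can only reach the people who have a phone.
phone_incomes = [income for income, has_phone in population if has_phone]

true_mean = statistics.mean(all_incomes)
phone_survey_mean = statistics.mean(phone_incomes)

print("mean income, whole population:", true_mean)
print("mean income, phone survey only:", phone_survey_mean)
```

In this made-up data the phone survey overestimates the mean income, because the people without phones tend to have lower incomes and are never sampled.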
simple random sample
when people are chosen at random from the population to participate, so that every member of the population has an equal chance of being selected
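One way to draw such a sample in Python is with the standard library's `random.sample`, which picks distinct items without replacement. The helper name `simple_random_sample` and the population of 100 participant IDs are assumptions for this sketch.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Return n distinct members of population, chosen at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population of 100 participant IDs.
population = list(range(1, 101))

# A seed is passed here only so the same sample is drawn each time the sketch runs.
chosen = simple_random_sample(population, 10, seed=42)
print(chosen)
```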
explanatory variable
the variable that is thought to influence or explain changes in another variable
e.g. smoking
response variable
the outcome that is thought to be affected by the explanatory variable
e.g. cancer