Intro to data science (wk 2) Flashcards
What is a population?
The complete set of objects of interest e.g. all UG students
What is a sample?
A subset of a given population e.g. a group of students in this module out of all UG students
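A minimal sketch of the population/sample distinction, using Python's standard random module. The student IDs, the population size of 1,000, and the sample size of 30 are all made-up values for illustration.

```python
import random

# Hypothetical population: ID numbers for 1,000 UG students (made-up data).
population = list(range(1, 1001))

# Seeded generator so the draw is repeatable.
rng = random.Random(0)

# A sample: 30 students drawn without replacement from the population.
sample = rng.sample(population, k=30)

print(len(population), len(sample))
# Every sampled student also belongs to the population.
print(set(sample) <= set(population))
```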
What is a variable?
A variable is a characteristic or measurement that can take on more than one value
What is an independent variable?
-The IV is the variable that is changed or manipulated.
-It is controlled or selected to determine its effect on an observed outcome
What is a dependent variable?
-The DV is the observed result of the IV being manipulated.
-It is something that (may) depend on the IV
Levels of IVs
-An IV can be composed of different categories
-These are called levels, conditions or treatments
-This is different from the number of IVs: you belong to only one level of each IV, but a study can have multiple IVs
What is a control variable?
-Variables that are kept constant to prevent them influencing the effect of the IV on the DV
-Critical for study design (e.g. recruitment criteria for participants)
What is nominal data (categorical)?
-Cannot be ordered, and the values cannot be added or subtracted (only the frequency of each category can be counted) e.g. country, gender, occupation
What is ordinal data?
-Can be ordered, but cannot be added or subtracted e.g. satisfaction rating, education level
What is interval data?
-Can be ordered, and the difference between two values can be measured, but you cannot compute a meaningful ratio between two values (no meaningful zero exists) e.g. exam date
What is ratio data?
-Has all the properties of interval data, and the ratio between two values is also meaningful (has a meaningful zero) e.g. distance, height, annual income
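A small sketch of the interval/ratio difference, using the exam date and distance examples from the cards. The specific dates and distances are hypothetical. Python's datetime.date happens to mirror the idea: subtracting two dates is allowed, but dividing one date by another is not.

```python
from datetime import date

# Interval data: hypothetical exam dates.
exam_a = date(2024, 5, 10)
exam_b = date(2024, 5, 20)

# The difference between two dates is meaningful.
gap_days = (exam_b - exam_a).days
print(gap_days)

# A ratio of two dates is not meaningful, and Python rejects it.
try:
    exam_b / exam_a
    ratio_allowed = True
except TypeError:
    ratio_allowed = False
print(ratio_allowed)

# Ratio data: hypothetical distances in km, with a meaningful zero.
distance_a_km = 5.0
distance_b_km = 10.0
ratio = distance_b_km / distance_a_km  # distance b is twice distance a
print(ratio)
```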
Which 2 data types are qualitative and categorical?
Nominal and ordinal
Which 2 data types are quantitative and continuous?
Interval and ratio
What is a histogram?
A histogram visualises how data is distributed by grouping values into bins and showing how many values fall in each bin
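A rough sketch of the counting behind a histogram, using only the standard library and printing text bars instead of drawing a plot. The exam scores and the bin width of 10 are made-up values for illustration.

```python
from collections import Counter

# Hypothetical exam scores (made-up data).
scores = [42, 55, 58, 61, 63, 67, 68, 71, 74, 79, 85, 91]
bin_width = 10  # assumed bin width

# Assign each score to the lower edge of its bin, then count per bin.
counts = Counter((score // bin_width) * bin_width for score in scores)

# Print one text bar per bin, lowest bin first.
for lower in sorted(counts):
    upper = lower + bin_width - 1
    print(f"{lower}-{upper}: {'#' * counts[lower]}")
```

A plotting library would draw the same counts as bars; the binning step is the part that defines the histogram.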
What are the 3 measures of central tendency?
Mode, median, mean
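A short sketch computing the three measures with Python's standard statistics module. The data values are made up for illustration.

```python
import statistics

# Hypothetical data set (made-up values).
data = [2, 3, 3, 5, 7, 10]

mode_value = statistics.mode(data)      # most frequent value
median_value = statistics.median(data)  # middle value (average of the two middle values here)
mean_value = statistics.mean(data)      # sum of the values divided by how many there are

print(mode_value, median_value, mean_value)
```

With these values the mode is 3, the median is 4 (halfway between 3 and 5), and the mean is 5, which shows that the three measures can differ on the same data.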