Problem 2 Flashcards
label
= a specific variable used in some data sets to distinguish the different cases
variable
= a characteristic of a case
- different cases can have different values of variables
categorical variable
= places a case into one of several groups/categories
- description of categories
- -> any numbers used are only codes and have no numerical meaning
categorical nominal
= unordered categories of equal rank (e.g. gender (male/female))
- -> 'naming'
- special case: dichotomous (only two categories)
categorical ordinal
= ordered categories (e.g. education level (low/average/high))
- -> 'ordering'
quantitative variable
= takes numerical values for which arithmetic operations such as adding and averaging make sense
quantitative interval
= meaningful numbers (e.g. IQ 50-150)
- because the distance between consecutive units is always equal (in principle)
- -> 'distance'
- numbers as we know them, with the same intervals between them
quantitative ratio
= an interval variable with an absolute zero point (e.g. age)
- distinguishing ratio from interval is not crucial
- -> 'rate'
case
= an object that is described by the data
examine a distribution
= look for the overall pattern: the shape, the centre and the spread
- -> if the distribution is not symmetric, it is either right skewed (positive) or left skewed (negative)
5-number summary
= minimum, Q1, median, Q3, maximum
- -> used to make a boxplot
- interquartile range (IQR) = distance between Q1 and Q3 (IQR = Q3 - Q1)
- identify possible outliers (1.5 × IQR rule) = every value outside the range from Q1 - 1.5 × IQR to Q3 + 1.5 × IQR is a possible outlier
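The 5-number summary and the 1.5 × IQR rule can be sketched in Python. This is only an illustration: it assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data (one common convention; software often uses other quartile rules and may give slightly different values). The function names and the data are made up for the example.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum); needs at least 2 values."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # values above the median position
    return data[0], median(lower), median(data), median(upper), data[-1]

def possible_outliers(values):
    """Return the values flagged by the 1.5 x IQR rule."""
    _, q1, _, q3, _ = five_number_summary(values)
    iqr = q3 - q1                 # IQR = Q3 - Q1
    low = q1 - 1.5 * iqr          # lower fence
    high = q3 + 1.5 * iqr         # upper fence
    return [v for v in values if v < low or v > high]

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]   # hypothetical data
print(five_number_summary(scores))
print(possible_outliers(scores))
```

For this data Q1 = 3 and Q3 = 8, so IQR = 5 and the fences are -4.5 and 15.5; only the value 40 falls outside them.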
measures of centre:
- mode = value with the highest frequency
- -> not very informative as a measure of centre; useful for seeing whether subgroups exist
- median = value with 50% of all scores above it and 50% below it (middle value)
- -> position of the median = (n + 1)/2
- (arithmetic) mean = centre of gravity of a distribution (average value)
- -> x̄ = (sum of xi)/n
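A short sketch of the three measures of centre using Python's standard `statistics` module; the data and variable names are made up for illustration.

```python
from statistics import mean, median, multimode

scores = [2, 3, 3, 5, 7, 10]   # hypothetical data, already sorted

modes = multimode(scores)      # all values with the highest frequency
mid = median(scores)           # middle value; average of the two middle values when n is even
avg = mean(scores)             # sum of all values divided by n
position = (len(scores) + 1) / 2   # position of the median in the sorted list

print(modes, mid, avg, position)
```

Here n = 6, so the median sits at position 3.5, halfway between the third and fourth values (3 and 5), which gives 4. The mean is 30/6 = 5 and the mode is 3.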
- z-scores = standardised values used to compare different distributions with each other on a common scale: z = (x - x̄)/Sx
- standard deviation = how much the data differ from the mean value
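The z-score formula can be tried out with a minimal Python sketch. It assumes Sx is the sample standard deviation (divisor n - 1), which is what `statistics.stdev` computes; the function name and the data are hypothetical.

```python
from statistics import mean, stdev

def z_scores(values):
    """Return z = (x - mean) / Sx for every x; needs at least 2 distinct values."""
    m = mean(values)
    s = stdev(values)          # sample standard deviation (assumption: divisor n - 1)
    return [(x - m) / s for x in values]

scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data with mean 5
z = z_scores(scores)
print(z)
```

A value equal to the mean gets z = 0, values above the mean get positive z-scores and values below it get negative ones. The z-scores themselves always have mean 0 and standard deviation 1, which is what makes different distributions comparable.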
- descriptive statistics = summary description of data by means of tables, graphs and characteristic measures
- inferential statistics = conclusions about a population based on a limited number of elements (= sample) from that population