Week 3 (Data Types) Flashcards
What are the two main data types?
Categorical or nominal: things that can be counted
Measurement: things that can be measured
What are categorical/nominal variables?
Discrete: a fixed number of possible categories
Labels can be names or numbers (e.g. vanilla, strawberry, chocolate; 1, 25, 18)
Only valid mathematical operation is counting
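As a minimal Python sketch of counting nominal data, the flavour labels from the card are used here as hypothetical observations. `collections.Counter` tallies how often each label occurs, and no other arithmetic on the labels is meaningful.

```python
from collections import Counter

# Nominal data: the labels have no order and no arithmetic meaning
flavours = ["vanilla", "chocolate", "strawberry", "chocolate", "vanilla", "chocolate"]

# Counting how often each label occurs is the only valid operation
counts = Counter(flavours)
print(counts["chocolate"])     # 3
print(counts.most_common(1))   # [('chocolate', 3)]
```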
Ordinal scales?
Discrete: a fixed number of possible values
Inherent order (ranks)
Some information about quantity
Steps between ranks may not be equal
Movement along the scale indicates a change in amount but doesn’t indicate how much change
Can’t meaningfully calculate means etc.
Interval scales?
Order and equal intervals
Continuous
Valid mathematical operations: addition and subtraction
No true zero
Zero doesn’t mean absence of the quantity (e.g. 0 °C is not an absence of temperature)
Ratio scales?
Order
Equal intervals
True zero = absence of the quantity
Physical quantities (mass, length, time)
Can calculate ratios of different values
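A minimal Python sketch contrasting interval and ratio scales, assuming temperature as the example: Celsius as an interval scale with an arbitrary zero, Kelvin as a ratio scale with a true zero. The function name `celsius_to_kelvin` is illustrative. Differences agree on both scales, but only the Kelvin ratio is meaningful.

```python
# Interval scale (Celsius): zero is arbitrary, so ratios are not meaningful.
# Ratio scale (Kelvin): zero means absence of thermal energy, so ratios are meaningful.

def celsius_to_kelvin(c: float) -> float:
    return c + 273.15

warm_c, cool_c = 20.0, 10.0

# Differences are preserved across both scales (equal intervals)
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not: 20 °C is not "twice as hot" as 10 °C
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_c} (Celsius) vs {diff_k:.2f} (Kelvin)")
print(f"ratio: {ratio_c:.2f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```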
Continuous variables?
Theoretically infinite resolution between minimum and maximum
Can be converted to discrete variables but not vice versa; conversion causes loss of information/precision
A construct can be continuous but the method of quantifying it may be discrete
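The loss of precision when converting continuous data to discrete categories can be sketched in Python as follows. The heights and the cut-offs in the hypothetical `to_category` helper are made up for illustration.

```python
# Hypothetical continuous measurements (heights in cm)
heights_cm = [158.2, 163.7, 171.4, 171.9, 184.6]

def to_category(h: float) -> str:
    # Hypothetical cut-offs, chosen only for illustration
    if h < 160:
        return "short"
    elif h < 180:
        return "medium"
    return "tall"

categories = [to_category(h) for h in heights_cm]
print(categories)  # ['short', 'medium', 'medium', 'medium', 'tall']

# 163.7, 171.4 and 171.9 all collapse to "medium": the original values
# cannot be recovered from the category, so precision is lost
```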
What are the 3 measures of central tendency?
Mean
Median
Mode
What is the mode?
The only measure of central tendency that can be used for categorical data (it can also be used for other data types)
Most commonly occurring value in a set
A sample can have more than one mode
Bimodal: two modal values
Multimodal: more than two modal values
If there are no values that occur more than once, there isn’t a mode for the data set
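A minimal Python sketch of finding the mode, following the card's convention that a data set with no repeated values has no mode. The standard library's `statistics.multimode` behaves differently here: when no value repeats it returns every value rather than an empty list, which is why the hypothetical `modes` helper below is written by hand.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value; empty list if nothing repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # No value occurs more than once, so the data set has no mode
        return []
    return [v for v, c in counts.items() if c == top]

print(modes(["vanilla", "chocolate", "chocolate"]))  # ['chocolate']
print(modes([1, 2, 2, 3, 3]))                        # [2, 3] (bimodal)
print(modes([1, 2, 3]))                              # []
```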
What is the median?
Order all scores by increasing value; the middle score is the median.
Same number of observations below and above the median
Odd sample size: the middle score
Even sample size: the average of the two middle scores
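The median rule above, sketched in Python. The standard library's `statistics.median` applies the same odd/even rule; the hand-written version is only meant to show the steps.

```python
def median(values):
    ordered = sorted(values)          # order scores by increasing value
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd: the single middle score
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: average of the two middle scores

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```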
What is the mean?
Most commonly used
Add all values and divide by the number of values
Value around which scores are distributed
An unbiased estimate of the population mean: on average it won’t systematically over- or underestimate it
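A minimal Python sketch of the mean, plus a small simulation illustrating what "unbiased" means. Averaged over many random samples from a hypothetical population (the integers 1 to 100), the sample means land close to the population mean. The population, sample size and number of repetitions are arbitrary choices.

```python
import random

def mean(values):
    if not values:
        raise ValueError("mean of empty data")
    return sum(values) / len(values)   # add all values, divide by how many there are

print(mean([4, 8, 6, 5, 7]))  # 6.0

# Unbiasedness sketch: average many sample means from a known population
random.seed(1)
population = list(range(1, 101))       # hypothetical population, mean 50.5
sample_means = [mean(random.sample(population, 10)) for _ in range(10_000)]

# The second value should be close to the first (50.5)
print(mean(population), mean(sample_means))
```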
What happens with extreme values?
If the outlier isn’t obvious, need to be careful about discarding
The median is unaffected by extreme end points, whereas the mean uses all of the data, so a single extreme value can pull it away from the bulk of the scores and make it unrepresentative
Outliers can sometimes be seen using visual inspection
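A small Python illustration of why the median resists extreme values while the mean does not, using hypothetical scores and one made-up outlier:

```python
from statistics import mean, median

scores = [12, 14, 15, 15, 16, 17]
with_outlier = scores + [95]   # one hypothetical extreme value

# The mean jumps from about 14.8 to about 26.3; the median stays at 15
print(round(mean(scores), 2), median(scores))
print(round(mean(with_outlier), 2), median(with_outlier))
```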
What measures are used to describe the spread (dispersion) of the data?
Range
Interquartile range
Sample standard deviation
What is range?
Maximum - minimum
Depends entirely on the two extreme scores; if either is an outlier, the range overestimates the variability in the data
Range tends to increase as sample size increases, because a bigger sample gives more opportunities to observe extreme values. Large samples give a good look and feel for the extremes
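A minimal Python sketch of the range and its dependence on sample size. The normally distributed samples (mean 100, SD 15) are a hypothetical choice. Because the data are random, the printed ranges vary from run to run, but larger samples tend to give larger ranges.

```python
import random

def data_range(values):
    return max(values) - min(values)   # maximum - minimum

print(data_range([3, 9, 4, 12, 7]))  # 9

random.seed(0)
for n in (5, 50, 500):
    # Hypothetical scores drawn from a normal distribution
    sample = [random.gauss(100, 15) for _ in range(n)]
    print(n, round(data_range(sample), 1))
```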
What are quartiles?
Divide the ordered data into 4 groups of equal size
Q1 lower quartile: 25% are below, 75% are above
Q3 upper quartile: 75% below, 25% above
Interquartile range (IQR): the difference between Q3 and Q1 (Q3 - Q1)
Shows how much spread there is in the middle 50% of scores
The bigger the IQR, the bigger the dispersion
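Quartiles and the IQR can be computed with Python's `statistics.quantiles`. Textbooks use several slightly different rules for locating quartiles; this sketch uses the function's default `"exclusive"` method, so hand calculations with another rule may give slightly different values. Replacing the largest score with an extreme value leaves the IQR unchanged, unlike the range.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7, 8]
q1, q2, q3 = quantiles(data, n=4)   # three cut points: Q1, median, Q3
iqr = q3 - q1
print(q1, q2, q3, iqr)  # 2.25 4.5 6.75 4.5

# Replace the largest score with an extreme value: the quartiles do not move
data_outlier = [1, 2, 3, 4, 5, 6, 7, 80]
q1_o, _, q3_o = quantiles(data_outlier, n=4)
print(q3_o - q1_o)  # 4.5
```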
Variance and standard deviation?
Variance is roughly the average of the squared differences from the mean
Calculate how far each score is from the mean: some are below and some are above, so the raw differences cancel out and sum to zero, which is why they are squared
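A minimal Python sketch of the variance and standard deviation steps, using hypothetical scores. It divides the summed squared deviations by n - 1 (the usual sample formula), which is why the variance is only "roughly" the average of the squared deviations. `statistics.variance` and `statistics.stdev` in the standard library use the same n - 1 formula.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical scores
n = len(scores)
m = sum(scores) / n                        # mean = 5.0

deviations = [x - m for x in scores]       # distance of each score from the mean
print(sum(deviations))                     # 0.0: positive and negative deviations cancel

squared = [d ** 2 for d in deviations]     # squaring makes every deviation non-negative
sample_variance = sum(squared) / (n - 1)   # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)     # back in the original units

print(sample_variance, sample_sd)
```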