Lecture 4 Flashcards
Summarizing Data Collected in the Sample
What are the 4 types of variables?
- dichotomous
- ordinal
- categorical
- continuous
What are the two pieces of information in every frequency distribution table for any type of variable?
frequency and relative frequency
How do you calculate relative frequency (%)?
RF = (frequency / total # of responses) x 100%
What two additional pieces of data are calculated for ordinal variables?
cumulative and cumulative relative frequency
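As a rough illustration of the four columns named so far, the sketch below builds a frequency distribution table for an ordinal variable. The responses and the category order are invented for the example, not taken from the lecture.

```python
from collections import Counter

# Hypothetical ordinal responses (made up for illustration).
responses = ["good", "fair", "good", "poor", "excellent", "good", "fair"]
order = ["poor", "fair", "good", "excellent"]  # assumed ordering of the levels

counts = Counter(responses)
total = len(responses)

table = []
cumulative = 0
for level in order:
    freq = counts[level]
    cumulative += freq  # running total only makes sense because levels are ordered
    table.append({
        "level": level,
        "freq": freq,
        "rel_freq": 100 * freq / total,            # RF = (frequency / total) x 100%
        "cum_freq": cumulative,
        "cum_rel_freq": 100 * cumulative / total,
    })

for row in table:
    print(row)
```

The last row's cumulative frequency should equal the total number of responses, and its cumulative relative frequency should be 100%.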
What are the two types of measurements of continuous variables?
central tendencies and variability
What are the three central tendency measurements?
mean, median, and mode
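A minimal sketch of the three measures using Python's standard `statistics` module. The scores are hypothetical, and the value 40 is included on purpose to show how an outlier affects each measure differently.

```python
import statistics

scores = [2, 3, 3, 4, 5, 6, 40]  # hypothetical data; 40 is a deliberate outlier

mean = statistics.mean(scores)      # sum of values / number of values
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequent value

print(mean, median, mode)
```

Here the mean is pulled up to 9 by the single large value, while the median (4) and mode (3) stay near the bulk of the data.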
What are the four variability measurements?
range, IQR, variance, and standard deviation (SD)
If data is missing, is that individual still counted in the total # of persons?
No, only participants who provided a value are accounted for in the denominator
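A small sketch of the denominator rule, assuming missing answers are recorded as `None` (that encoding and the data are assumptions for the example).

```python
# Hypothetical survey answers; None marks a participant who gave no value.
smoker = ["yes", "no", None, "no", "yes", None, "no", "no"]

answered = [s for s in smoker if s is not None]
n_enrolled = len(smoker)   # everyone in the sample
n_valid = len(answered)    # only those who provided a value

# The relative frequency uses n_valid, not n_enrolled, as the denominator.
rel_freq_yes = 100 * answered.count("yes") / n_valid
print(f"{rel_freq_yes:.1f}% of {n_valid} respondents (not of {n_enrolled})")
```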
What is mean denoted as?
x-bar
What are the advantages to median?
- not impacted by outliers
- good index of what is “typical” if distribution is skewed
- can be applied to some categorical data
- good for ordinal data
What is the main disadvantage of median?
does NOT take the data values themselves into account (only a position indicator)
What are the advantages of mode?
- can be applied to any variable type (even nominal)
- reflects an actual value from the data (easy to understand)
- useful when there are multiple common values
What are the disadvantages to mode?
- ignores the majority of other info regarding the distribution
- tends to vary between samples
- some samples do not have a mode
In a normal distribution, where are all the central tendencies found?
in the center of the bell-curve
In skewed distributions, where are the central tendencies found?
positive skew (tail to right) - mode < median < mean
negative skew (tail to left) - mean < median < mode
- in both cases, the mean is the measure pulled furthest off-center, toward the tail
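The ordering can be checked on small made-up samples. The sketch below uses one right-skewed list and its mirror image as the left-skewed case. Both datasets are hypothetical, and real samples will not always follow the ordering this neatly.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, a few are large.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image of the same sample, giving a left-skewed shape.
left_skewed = [16 - x for x in right_skewed]

def summarize(data):
    """Return (mode, median, mean) for a list of numbers."""
    return statistics.mode(data), statistics.median(data), statistics.mean(data)

r_mode, r_median, r_mean = summarize(right_skewed)
l_mode, l_median, l_mean = summarize(left_skewed)

print("right skew:", r_mode, r_median, r_mean)  # mode < median < mean
print("left skew: ", l_mode, l_median, l_mean)  # mean < median < mode
```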
Where are the outliers found in skewed curves?
in the tails, at the low-frequency ends of the curve
What are the two types of variability?
heterogeneous (high variability)
homogeneous (low variability)
What percentage of the variation does range represent?
100%
What is the advantage to range?
it’s easily understood
What are the disadvantages to range?
- only dependent on 2 scores (not all info taken into account)
- sensitive to outliers
- tends to vary between samples
- influenced by sample size
What percentage of the variation does the IQR represent?
50%
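A short sketch of both measures on hypothetical data. Quartile conventions differ between textbooks and software; this example assumes the default ("exclusive") method of Python's `statistics.quantiles`, which may not match the convention used in the lecture.

```python
import statistics

data = [1, 3, 5, 7, 9, 11, 13]  # hypothetical sorted sample

# Range: distance between the two extreme scores (covers 100% of the data).
data_range = max(data) - min(data)

# IQR: distance between the first and third quartiles (the middle 50%).
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print("range:", data_range)
print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
```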
What are the advantages to the IQR?
- reduced influence of outliers (only the middle 50% is looked at)
- uses more information than range
- helpful for evaluating/identifying outliers
- appropriate for ordinal measures
What are the disadvantages of the IQR?
- not easy to compute
- not well understood
- doesn’t take all values into account
What is deviance?
the spread of the scores (how different each score is from the center of a distribution)
What is the deviance equation?
deviance = xᵢ - X̄
What is variance?
indicates the total dispersion of scores from the MEAN
What is the variance equation?
variance = (sum of (deviance)^2) / (N - 1)
What needs to be done to the variance in order to obtain the standard deviation?
square root the value
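The chain from deviance to variance to standard deviation can be worked through step by step. This is a sketch on invented scores, using the sample formula with N - 1 in the denominator. The final check against the `statistics` module is only a sanity check.

```python
import math
import statistics

scores = [4, 8, 6, 5, 7]  # hypothetical sample
n = len(scores)

mean = sum(scores) / n                        # x-bar
deviations = [x - mean for x in scores]       # deviance of each score from the mean
sum_sq = sum(d ** 2 for d in deviations)      # sum of squared deviances
variance = sum_sq / (n - 1)                   # sample variance
sd = math.sqrt(variance)                      # standard deviation

print("mean:", mean)
print("deviations:", deviations)
print("variance:", variance)
print("SD:", round(sd, 3))

# Sanity check against the standard library's sample variance.
assert math.isclose(variance, statistics.variance(scores))
```

The raw deviances always sum to zero, which is why they are squared before being averaged.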
What is the interpretation of the standard deviation?
- high result = values are FAR off from the mean
- low result = values are not that far off (close to the mean)
In a normal distribution, past what point are values considered outliers?
anything beyond +/- 1.96 standard deviations
When outliers are NOT present, which central tendency and variability values are appropriate?
sample mean and the standard deviation
When outliers ARE present, which central tendency and variability values are appropriate?
median and interquartile range
What is the equation for calculating the upper and lower limits to determine mild outliers?
lower limit = Q1 - (1.5IQR)
upper limit = Q3 + (1.5IQR)
What is the equation for calculating the upper and lower limits to determine extreme outliers?
lower limit = Q1 - (3IQR)
upper limit = Q3 + (3IQR)
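A minimal sketch of the outlier limits: the lower limit subtracts a multiple of the IQR from Q1, and the upper limit adds the same multiple to Q3. The data and the helper names `outlier_limits` and `classify` are hypothetical, and the quartiles again assume the default method of `statistics.quantiles`, so other quartile conventions may give slightly different limits.

```python
import statistics

def outlier_limits(data, k):
    """Return (lower, upper) limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def classify(value, data):
    """Label a value as 'extreme', 'mild', or 'not an outlier'."""
    lo3, hi3 = outlier_limits(data, 3)      # extreme outlier limits
    lo15, hi15 = outlier_limits(data, 1.5)  # mild outlier limits
    if value < lo3 or value > hi3:
        return "extreme"
    if value < lo15 or value > hi15:
        return "mild"
    return "not an outlier"

data = [1, 3, 5, 7, 9, 11, 13, 40]  # hypothetical sample with one large value

mild = outlier_limits(data, 1.5)
extreme = outlier_limits(data, 3)
print("mild limits:", mild)
print("extreme limits:", extreme)
print("40 is:", classify(40, data))
print("13 is:", classify(13, data))
```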
What type of graph is used to represent dichotomous or categorical variables?
bar charts (frequencies and relative frequencies)
What type of graph is used to represent ordinal variables?
histograms (frequencies, relative frequencies, cumulative frequencies, and cumulative relative frequencies)
What type of graph is used to represent continuous variables?
boxplot (central tendencies and variabilities/ranges)