module 1 intro Flashcards
levels of measurement
categorical and scale
sometimes referred to as discrete data, where numbers are used to represent categories of information. qualitative in nature. the two types are nominal and ordinal data
categorical data
a numerical measurement of something in which the difference between values has meaning. quantitative in nature. referred to as continuous data. two types are interval and ratio
scale
data that is grouped in unordered categories. only the label holds value; the number assigned is meaningless. can be binary or non-binary
e.g. binary: yes or no. non-binary: blue eyes, green eyes, brown eyes, hazel eyes
nominal
data that is grouped in ordered categories. the number assigned to the category means something (its rank), but calculations cannot be performed on these numbers. referred to as ranked data
e.g. disagree, somewhat disagree, neutral, somewhat agree
ordinal
a numerical measurement on a scale where each point is placed at an equal distance from one another. there is no true zero
e.g. temperature in celsius: 0 degrees doesn’t mean absence of heat
interval
measurement of something where the numbers are not restricted to certain values and there is a true zero
e.g. amount of money in pocket: 0 = no money
ratio
true or false nominal data has no inherent order or ranking
true
true or false interval data has equal spacing between values but no true zero
true
true or false ordinal data can tell you how much more one value is than another
false
true or false ratio data can be used to calculate meaningful ratios (for example, twice as much)
true
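a small python sketch (not from the course notes) of why "twice as much" only works with a true zero. the two temperatures are made-up values. celsius is interval data, while kelvin has a true zero and so behaves as ratio data.

```python
# Two made-up temperatures in Celsius (interval scale, no true zero).
celsius_a, celsius_b = 10.0, 20.0

# The same temperatures in Kelvin (ratio scale, 0 K is a true zero).
kelvin_a = celsius_a + 273.15
kelvin_b = celsius_b + 273.15

# Differences are meaningful on both scales: 10 degrees apart either way.
difference_c = celsius_b - celsius_a
difference_k = kelvin_b - kelvin_a

# Ratios are only meaningful with a true zero.
naive_ratio = celsius_b / celsius_a   # looks like "twice as hot", but is not
true_ratio = kelvin_b / kelvin_a      # only about 3.5% more thermal energy scale

print(f"difference: {difference_c} C, {difference_k:.2f} K")
print(f"celsius ratio: {naive_ratio:.3f}, kelvin ratio: {true_ratio:.3f}")
```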
identify the level of measurement
type of pet
nominal
star rating at restaurant 1-5
ordinal
temperature in celsius
interval, equal intervals but no true zero, 0 doesn’t mean absence of heat
age in years
ratio
time of day on a clock
interval, no true zero
exam scores as percentages
ratio. percentage scores have a true zero: 0% = no points
happiness level rated as unhappy neutral happy
ordinal
relative frequency =
absolute frequency/ sum of all frequencies
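a minimal python sketch of the relative frequency formula, using made-up eye-colour observations (the data and variable names are assumptions for illustration only).

```python
from collections import Counter

# Hypothetical eye-colour observations (made-up data).
eye_colours = ["blue", "brown", "brown", "green", "brown", "blue", "hazel", "brown"]

absolute = Counter(eye_colours)   # absolute frequency of each category
total = sum(absolute.values())    # sum of all frequencies

# relative frequency = absolute frequency / sum of all frequencies
relative = {colour: count / total for colour, count in absolute.items()}

for colour, share in relative.items():
    print(f"{colour}: {absolute[colour]} of {total} = {share:.1%}")
```

the relative frequencies always add up to 1 (100%), which is a quick check that the counts were tallied correctly.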
measures of central tendency
mean median and mode
the sum of all values divided by the number of values in the data set. denoted as x̄ (x bar). also known as the average
mean
most sensitive to outliers
mean
best measure of central tendency for nominal data
mode
best measure of central tendency for ordinal data
median (the mode can also be used; the mean is only sometimes used, when the categories are treated as evenly spaced numbers)
best measure of central tendency for scale data
mean median or mode
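a short python sketch of the three measures of central tendency, using the standard library `statistics` module. the scores are made-up values, and the outlier of 40 is added only to illustrate how much more the mean moves than the median or mode.

```python
import statistics

scores = [2, 3, 3, 4, 5]        # made-up scale data
with_outlier = scores + [40]    # same data plus one extreme value

mean_before = statistics.mean(scores)
median_before = statistics.median(scores)
mode_before = statistics.mode(scores)

mean_after = statistics.mean(with_outlier)
median_after = statistics.median(with_outlier)
mode_after = statistics.mode(with_outlier)

print(f"without outlier: mean={mean_before}, median={median_before}, mode={mode_before}")
print(f"with outlier:    mean={mean_after}, median={median_after}, mode={mode_after}")
```

with these numbers the mean jumps from 3.4 to 9.5, while the median only moves from 3 to 3.5 and the mode stays at 3.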
the variability of a set of data is also referred to as ___
spread
two common ways of describing the spread of data
range and interquartile range (IQR)
range
a way of measuring the spread of data by describing the difference between the minimum and the maximum value in a data set.
range = maximum value - minimum value
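the range formula as a one-line python sketch, on made-up values.

```python
values = [4, 8, 15, 16, 23, 42]   # made-up data

# range = maximum value - minimum value
data_range = max(values) - min(values)

print(f"max={max(values)}, min={min(values)}, range={data_range}")
```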
interquartile range
a way of measuring the spread of data by dividing the data set into quartiles. by finding the quartiles you can identify the range of values that contains the middle 50% of the data set, centred on the median.
how to calculate interquartile range
order the data
calculate the median
calculate the median of the lower half (Q1)
calculate the median of the upper half (Q3)
IQR = Q3 - Q1
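the IQR steps can be sketched in python as follows. `iqr_by_halves` is a hypothetical helper name, the data is made up, and leaving the median out of both halves when the number of values is odd is an assumption (textbooks and software use several slightly different quartile conventions).

```python
import statistics

def iqr_by_halves(values):
    """Return (Q1, Q3, IQR) using the median-of-halves method."""
    ordered = sorted(values)                 # step 1: order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]                   # values below the median
    # For an odd count, skip the middle value (the median itself).
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    q1 = statistics.median(lower)            # median of the lower half
    q3 = statistics.median(upper)            # median of the upper half
    return q1, q3, q3 - q1                   # IQR = Q3 - Q1

data = [7, 1, 3, 9, 5, 11, 13]               # made-up data
q1, q3, iqr = iqr_by_halves(data)
print(f"median={statistics.median(data)}, Q1={q1}, Q3={q3}, IQR={iqr}")
```

for this data the ordered values are 1, 3, 5, 7, 9, 11, 13, so the median is 7, Q1 is 3, Q3 is 11 and the IQR is 8.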
the larger the IQR value the ____
more spread out the data is
how to calculate variance
find the distance between each data point and the mean (subtract the mean from each value), square each distance, then average the squared distances by dividing their sum by n - 1 (the degrees of freedom)
degrees of freedom
the number of values free to vary in a data set, total observations minus 1
n - 1
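a step-by-step python sketch of the sample variance with n - 1 degrees of freedom, on made-up values. it also shows why the deviations are squared: left unsquared, they sum to zero. the result is compared with `statistics.variance` from the standard library, which uses the same n - 1 divisor.

```python
import statistics

values = [2, 4, 4, 6, 9]                 # made-up data
n = len(values)
mean = sum(values) / n                   # x bar

# Distance of each value from the mean.
deviations = [x - mean for x in values]

# Unsquared deviations cancel out, so they cannot measure spread.
print(f"sum of deviations: {sum(deviations)}")

# Square each deviation, then divide the total by the degrees of freedom.
squared = [d ** 2 for d in deviations]
variance = sum(squared) / (n - 1)        # s squared

print(f"mean={mean}, variance={variance}")
print(f"statistics.variance gives {statistics.variance(values)}")
```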
standard deviation
a way to measure variability around the mean. calculated by taking the positive square root of the variance (s²). taking the positive square root of something that is squared cancels out the square, so the result is back in the original units
how do you calculate standard deviation
take the positive square root of the variance (s²)
what does a high standard deviation indicate?
the data points are more spread out from the mean. a high sd means more variability
true or false if all the data points in the set are the same the sd cannot be calculated
false. if all the data points are the same there is no spread, so the sd is 0
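a brief python sketch of the standard deviation as the positive square root of the variance, on made-up values. the second data set has identical values, to check that the sd can still be calculated and comes out as 0.

```python
import math
import statistics

values = [2, 4, 4, 6, 9]                   # made-up data
variance = statistics.variance(values)     # sample variance (n - 1 divisor)
sd = math.sqrt(variance)                   # positive square root of the variance

identical = [5, 5, 5, 5]                   # no spread at all
sd_identical = statistics.stdev(identical)

print(f"variance={variance}, sd={sd:.3f}")
print(f"sd of identical values={sd_identical}")
```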
a small standard deviation means the data values are clustered closely around the mean true or false
true
absolute number
the raw numbers collected during the data acquisition process
relative number
the absolute numbers shown as a proportion or percentage
true or false variance is measured in the same unit as the original data
false, variance is in squared units, not the same units as the data
true or false the standard deviation can be negative
false, it is always zero or positive never negative
true or false SD is a measure of central tendency
false, it is a measure of spread
why might a public researcher want to look at standard deviation in a survey response?
to understand how much variation or spread there is in the response. even if the mean is the same between two groups, the sd tells them how consistent or inconsistent people’s answers are.
if two datasets have the same mean but different standard deviations what does that tell you
it tells you that one dataset is more spread out than the other, even though they centre around the same average. the dataset with the larger sd has more variability, meaning individual data points tend to deviate more from the mean. the dataset with the smaller sd is more consistent and tightly clustered around the mean
what does a small sd mean and what does a large sd mean
a small sd means people gave very similar answers and a large sd means people gave a wide range of answers
why is SD important in real life
low sd means most people consistently liked it
high sd means some people loved it and some people hated it. this can inform conclusions, such as why there is so much variability
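a python sketch of two made-up groups of 1 to 5 ratings that share the same mean but have very different standard deviations. the group names and ratings are assumptions for illustration only.

```python
import statistics

# Made-up ratings on a 1-5 scale.
consistent_group = [2, 3, 3, 3, 3, 4]    # most people gave similar answers
divided_group = [1, 1, 1, 5, 5, 5]       # some loved it, some hated it

mean_consistent = statistics.mean(consistent_group)
mean_divided = statistics.mean(divided_group)

sd_consistent = statistics.stdev(consistent_group)
sd_divided = statistics.stdev(divided_group)

print(f"consistent group: mean={mean_consistent}, sd={sd_consistent:.2f}")
print(f"divided group:    mean={mean_divided}, sd={sd_divided:.2f}")
```

both groups average 3, so the mean alone would hide the disagreement. only the larger sd of the second group shows it.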
to avoid having positive and negative deviations cancel each other out when calculating variance on symmetrical data, use
squared deviations. variance (s²) squares each deviation from the mean, which makes all values positive so they don’t cancel each other out