Unit 1 Flashcards
what is the study of statistics?
the set of methods for obtaining, organizing, summarizing, presenting, and analyzing data
what is data?
a set of measurements or observations taken on a group of objects
ex. the responses given by the people taking the survey
what is a population?
the totality of individuals or units about which we want information
ex. all people in Manitoba taking the survey
what is a variable?
a characteristic or property of an individual or unit
what are some examples of variables?
hair color
height
your grade in this course
marital status
what is a sample?
a subset of units in a population that we examine in order to gather information about the population
what does categorical ordinal mean?
the categories have a natural order, and that order is meaningful
what are examples of categorical nominal?
gender (female or male)
marital status (married, widowed, divorced)
what are quantitative variables?
have values that are a count or are obtained by measurement
it makes sense to take the average
what are examples of quantitative variables?
distance run in 45 minutes
a measurement in cm (ex. height) of each student in this classroom
square footage of your house
what does the distribution of a variable tell us?
what values it takes and how often it takes on these values
what type of charts can we use with categorical variables?
bar charts
pie charts
what type of charts can we use with quantitative variables?
histograms
timeplots
what is the difference between a bar chart and a histogram?
in a bar chart the bars don’t touch
in a histogram the bars do touch
what do pie charts give us a visual representation of?
relative frequency
proportion of the observed values
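ex. a minimal Python sketch of how the relative frequencies behind a pie chart could be computed. The hair color observations are made up for illustration; relative frequency is taken to be frequency divided by the total number of observations.

```python
from collections import Counter

# Hypothetical observations of a categorical variable (made up for illustration).
hair_colors = ["brown", "black", "brown", "blonde", "brown", "black", "red", "brown"]

counts = Counter(hair_colors)  # frequency of each category
n = len(hair_colors)           # total number of observations

# Relative frequency = frequency / total number of observations.
relative_freq = {category: count / n for category, count in counts.items()}

for category, proportion in relative_freq.items():
    print(f"{category}: {proportion:.3f}")
```

Each slice of the pie would be sized by its proportion, and the proportions add up to 1.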
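what does a frequency distribution table look like?
it lists classes (intervals) of values and the number of observations in each class; ex. one could be built from these 30 ordered values: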
31 37 40 44 49 50 51 53 56 56
62 64 67 67 68 68 69 70 71 72
73 73 74 75 77 78 78 81 82 84
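ex. a minimal Python sketch of how the 30 values above could be grouped into a frequency distribution table. The class width of 10 (30-39, 40-49, ...) is an assumption; the card does not say which classes to use.

```python
from collections import Counter

data = [31, 37, 40, 44, 49, 50, 51, 53, 56, 56,
        62, 64, 67, 67, 68, 68, 69, 70, 71, 72,
        73, 73, 74, 75, 77, 78, 78, 81, 82, 84]

CLASS_WIDTH = 10  # assumption: classes 30-39, 40-49, ..., 80-89

# Map each value to the lower limit of its class, then count per class.
counts = Counter((x // CLASS_WIDTH) * CLASS_WIDTH for x in data)

frequency_table = [
    (f"{lower}-{lower + CLASS_WIDTH - 1}", counts[lower])
    for lower in sorted(counts)
]

print("class  frequency  relative frequency")
for label, freq in frequency_table:
    print(f"{label:>5}  {freq:>9}  {freq / len(data):>18.3f}")
```

The frequencies add up to the number of observations (30), and the relative frequencies add up to 1.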
what are the two types of quantitative variables?
continuous variable
discrete variable
what values can a continuous variable take?
any value within a given range
ex. weight and distance
what values can a discrete variable take?
only take a countable number of values
ex. number of children in a family and the number of days of rain in a month
what do we look for in a histogram? (5)
shape
any gaps
peaks (center)
spread (how variable the values of the data are)
outliers (observations that fall away from the overall pattern)
what are the 3 types of shapes a histogram can have?
approximately symmetric
skewed to the left
skewed to the right
why would we use a timeplot?
if we gather data that comes to us in a sequence over a period of time
what is a trend?
a persistent long-term rise or fall in a time series
what is seasonal variation?
a pattern that repeats itself at certain intervals
what do we use to measure the centre of our data?
we use a measure of central tendency
what are the two measures of central tendency?
mean
median
mean
the average value: the sum of the observations divided by the number of observations
median
in a set of ordered data the median is the value that splits the data into two equal parts
what is the sum of the deviations from the mean always equal to?
0
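ex. a minimal Python sketch checking that deviations from the mean sum to zero. The data values are made up for illustration, and exact fractions are used so that rounding error does not hide the result. The reason it always works: the sum of (x - x̄) equals the sum of x minus n times x̄, and n times x̄ is the sum of x.

```python
from fractions import Fraction

# Hypothetical data values (made up for illustration).
data = [3, 7, 8, 12, 15]

# Exact mean, so no floating-point rounding is involved.
mean = Fraction(sum(data), len(data))

# Deviation of each observation from the mean.
deviations = [x - mean for x in data]

total = sum(deviations)
print(deviations)
print(total)
```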
what is an outlier?
a point that falls far away from the majority of the data
is the median robust (resistant) or not robust (not resistant) to outliers?
median is robust to outliers (not affected)
mean is affected by outliers
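ex. a minimal Python sketch comparing the mean and median with and without an outlier. Both data sets are made up for illustration; the second one replaces the largest value with an extreme one.

```python
from statistics import mean, median

# Hypothetical data (made up for illustration).
data = [4, 5, 6, 7, 8]
with_outlier = [4, 5, 6, 7, 80]  # largest value replaced by an outlier

print("mean:  ", mean(data), "->", mean(with_outlier))
print("median:", median(data), "->", median(with_outlier))
```

The median stays at 6 in both cases, while the mean is pulled toward the outlier.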
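what is true of the mean and median in a symmetric distribution?
the mean and median are equal
(approximately equal when the distribution is only approximately symmetric)
what is true of the mean and median in a distribution skewed to the left?
the mean is typically less than the median
what is true of the mean and median in a distribution skewed to the right?
the mean is typically greater than the median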
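ex. a minimal Python sketch comparing the mean and median of the 30 values listed under the frequency distribution table card. Those values have a longer tail toward the small values, so they can serve as a left-skewed example; the comparison is a rule of thumb, not a formal test of skewness.

```python
from statistics import mean, median

data = [31, 37, 40, 44, 49, 50, 51, 53, 56, 56,
        62, 64, 67, 67, 68, 68, 69, 70, 71, 72,
        73, 73, 74, 75, 77, 78, 78, 81, 82, 84]

x_bar = mean(data)  # sum of the values (1920) divided by n (30)
m = median(data)    # average of the 15th and 16th ordered values

if x_bar < m:
    shape = "mean < median: consistent with left skew"
elif x_bar > m:
    shape = "mean > median: consistent with right skew"
else:
    shape = "mean = median: consistent with symmetry"

print(x_bar, m, shape)
```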
what is one way to measure spread?
by using the range
what is the range?
written as R
a measure of spread, calculated as maximum - minimum
what are characteristics about range? (3)
the larger the value of R the more variable the data are
R measures the length of the interval containing 100% of the data
Range is affected by outliers
what is IQR (interquartile range)?
Q3 - Q1, the length of the interval containing the middle 50% of the data
when do we use the five number summary?
when describing our distributions with numbers
what does the five number summary consist of?
minimum
first quartile (Q1)
median
third quartile (Q3)
maximum
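ex. a minimal Python sketch of a five number summary for the 30 values listed under the frequency distribution table card. The helper five_number_summary is hypothetical, and the quartile rule is an assumption: Q1 and Q3 are taken as the medians of the lower and upper halves of the ordered data (leaving out the middle value when n is odd). Other textbooks and software use slightly different quartile rules and may give slightly different answers.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]          # observations below the median position
    upper = ordered[half + n % 2:]  # observations above the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

data = [31, 37, 40, 44, 49, 50, 51, 53, 56, 56,
        62, 64, 67, 67, 68, 68, 69, 70, 71, 72,
        73, 73, 74, 75, 77, 78, 78, 81, 82, 84]

minimum, q1, med, q3, maximum = five_number_summary(data)
data_range = maximum - minimum  # R = maximum - minimum
iqr = q3 - q1                   # IQR = Q3 - Q1

print(minimum, q1, med, q3, maximum)
print("R =", data_range, "IQR =", iqr)
```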
what does a boxplot consist of?
a rectangle (box) drawn from Q1 to Q3 with a line at the median, and whiskers extending from the rectangle to the minimum and maximum values
what is the standard deviation?
written as s
measure of spread around the mean
what is the variance?
written as s^2
the square of the standard deviation
what are the degrees of freedom?
n - 1, the denominator in the formula for the variance s^2
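ex. a minimal Python sketch computing the variance and standard deviation by hand with n - 1 in the denominator, then comparing against the variance and stdev functions in Python's statistics module, which also divide by n - 1. The data values are made up for illustration.

```python
import math
from statistics import stdev, variance

# Hypothetical data values (made up for illustration).
data = [2, 4, 6, 8, 10]

n = len(data)
x_bar = sum(data) / n                                   # mean
sum_sq_dev = sum((x - x_bar) ** 2 for x in data)        # sum of squared deviations
s_squared = sum_sq_dev / (n - 1)                        # variance: divide by n - 1
s = math.sqrt(s_squared)                                # standard deviation

print("by hand:   s^2 =", s_squared, " s =", round(s, 4))
print("statistics: s^2 =", variance(data), " s =", round(stdev(data), 4))
```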
when do we use the standard deviation as a measure of spread?
when the mean (x̄) is the measure of centre
when does the standard deviation equal zero?
when there is no spread about the mean
is standard deviation affected by outliers?
yes!
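ex. a minimal Python sketch of two facts about the standard deviation: it is zero when every observation is the same, and a single outlier makes it much larger. All three data sets are made up for illustration.

```python
from statistics import stdev

# Hypothetical data sets (made up for illustration).
no_spread = [5, 5, 5, 5]         # every value equals the mean
typical = [4, 5, 6, 7, 8]
with_outlier = [4, 5, 6, 7, 80]  # largest value replaced by an outlier

s_none = stdev(no_spread)
s_typical = stdev(typical)
s_outlier = stdev(with_outlier)

print(s_none, round(s_typical, 3), round(s_outlier, 3))
```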