Chapter 2 Flashcards
What’s a variable?
characteristic of a person or thing that can be assigned a number or category
What is a categorical variable?
-places each person or thing into a category; the categories often have no obvious order
-blood type, gender, colors
What is a numeric variable?
takes numeric values, so observations can be ordered
What is a discrete numeric variable?
-no fractions, whole numbers
-number of children, length of a DNA sequence in base pairs
-number of classes
What is a continuous numeric variable?
-does have fractions
-weight of a baby, cholesterol concentration in blood sample, height
What is an observational unit?
-we sample n persons or things and may collect several variables on each
-each person or thing in the sample is an observational unit
How can a frequency distribution be displayed?
a table or even a bar chart
When making figures and comparing multiple figures what should you do?
-always label the axes, check the axes
What is relative frequency?
count divided by the sample size
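A minimal sketch of a frequency table with relative frequencies, assuming a small made-up sample of blood types (the data and variable names are illustrative only):

```python
from collections import Counter

# Hypothetical sample of blood types (made-up data for illustration only).
blood_types = ["O", "A", "O", "B", "A", "O", "AB", "O", "A", "O"]

counts = Counter(blood_types)  # frequency of each category
n = len(blood_types)           # sample size

# Relative frequency = count divided by the sample size.
relative_freq = {category: count / n for category, count in counts.items()}

for category in sorted(counts):
    print(f"{category:>2}  count={counts[category]}  "
          f"relative frequency={relative_freq[category]:.2f}")
```

The relative frequencies always add up to 1, which makes samples of different sizes easier to compare.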
CDC versus NYT figures
-the CDC figure shows a smooth transition over time but only two age groups, while the NYT figure shows a range of ages
Dotplot example
Histogram example
What is the area of one or several bars proportional to in a histogram?
the corresponding frequency
What decision do we have to make with continuous numeric variables?
how to group the data
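The grouping decision can be sketched as follows, assuming made-up birth weights in grams and a hypothetical helper `bin_counts`. The same data give a different picture depending on the class width chosen:

```python
# Hypothetical birth weights in grams (made-up data for illustration only).
weights = [2600, 3100, 3400, 2900, 3800, 3300, 4100, 3000, 3600, 3200]

def bin_counts(data, start, width, n_bins):
    """Count observations in n_bins equal-width classes.

    Class k covers start + k*width <= y < start + (k+1)*width.
    """
    counts = [0] * n_bins
    for y in data:
        k = (y - start) // width
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts

narrow = bin_counts(weights, start=2500, width=500, n_bins=4)
wide = bin_counts(weights, start=2000, width=1000, n_bins=3)

print(narrow)  # [2, 5, 2, 1]
print(wide)    # [2, 7, 1]
```

Narrow classes show more detail but can look noisy; wide classes are smoother but can hide the shape.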
What are the characteristics of a bell-shaped curve? (Gaussian or normal)
symmetric and unimodal
What does a bimodal figure look like?
two peaks (modes), e.g. the heights of males and females plotted together
What does an asymmetric graph that is skewed to the right look like?
What does an asymmetric graph that is skewed to the left look like?
What does an exponential figure look like and what is an example?
e.g. wait times
What is a statistic?
-a numeric measure calculated from sample data
What is the median?
-a measure of center: the value that most nearly lies in the middle of the sample
What is the mean?
the average: the sum of the observations divided by the sample size
What does it mean if a statistic is robust and are the mean and median robust?
-relatively unaffected by changes in a small portion of the data
-median is unchanged meaning it is robust
-mean changes so it is not robust
What is another measure of center?
trimmed mean
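A small sketch comparing the three measures of center when one observation is replaced by an extreme value. The data are made up, and `trimmed_mean` is a hypothetical helper that drops the k smallest and k largest observations before averaging (one common way to define a trimmed mean):

```python
import statistics

def trimmed_mean(data, k):
    """Mean after dropping the k smallest and k largest observations."""
    ys = sorted(data)
    if 2 * k >= len(ys):
        raise ValueError("k is too large for this sample")
    return statistics.mean(ys[k:len(ys) - k])

original = [2, 3, 4, 5, 6]
changed = [2, 3, 4, 5, 60]  # one observation replaced by an extreme value

for name, data in [("original", original), ("changed", changed)]:
    print(name,
          "mean:", statistics.mean(data),
          "median:", statistics.median(data),
          "trimmed mean:", trimmed_mean(data, 1))
```

The mean moves from 4 to 14.8, while the median and the trimmed mean both stay at 4.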
What are the characteristics of a box plot?
-the median splits the distribution into two parts (upper and lower)
-the quartiles split each of these parts in half
-the first quartile Q1 splits the lower, and
-the third quartile Q3 splits the upper
What is the interquartile range? (IQR)
the difference between the third and first quartiles
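A minimal sketch of the quartiles and IQR using the "median of each half" description above. It assumes that, for an odd sample size, the middle value is left out of both halves; this is one common convention, and software packages often use other rules. The helper name `quartiles` and the data are illustrative only:

```python
import statistics

def quartiles(data):
    """Return (Q1, median, Q3) by taking the median of each half."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    lower = ys[:half]
    # Assumption: with odd n, the middle value belongs to neither half.
    upper = ys[half + 1:] if n % 2 else ys[half:]
    return statistics.median(lower), statistics.median(ys), statistics.median(upper)

data = [7, 3, 9, 4, 12, 5, 8]  # sorted: 3 4 5 7 8 9 12
q1, median, q3 = quartiles(data)
iqr = q3 - q1

print(q1, median, q3, iqr)  # 4 7 9 5
```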
Boxplot (with no outliers) example
What does the boxplot quickly show?
center, spread of total distribution, spread of middle 50% distribution, and skewness
What is an outlier?
-any data point lower than the lower fence or higher than the upper fence is an outlier
-could be a mistake in measurement or in the experiment
What is the lower fence?
Q1 - (1.5 x IQR)
What is the upper fence?
Q3 + (1.5 x IQR)
How far do whiskers extend?
only to the smallest and largest data points that are not outliers
How are outliers identified in a boxplot?
dots (or other symbols)
Why treat outliers differently?
they may not be representative of the rest of the data and could be errors
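The fence, outlier, and whisker rules above can be sketched as follows. The data are made up, `boxplot_summary` is a hypothetical helper, and Q1 and Q3 are passed in rather than computed, since quartile conventions vary:

```python
def boxplot_summary(data, q1, q3):
    """Return fences, outliers, and whisker ends for a boxplot."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [y for y in data if y < lower_fence or y > upper_fence]
    # Whiskers reach only the most extreme points that are not outliers.
    inside = [y for y in data if lower_fence <= y <= upper_fence]
    whiskers = (min(inside), max(inside))
    return lower_fence, upper_fence, outliers, whiskers

data = [3, 4, 5, 7, 8, 9, 30]
# For this sample Q1 = 4 and Q3 = 9 (median of the lower and upper halves).
lower_fence, upper_fence, outliers, whiskers = boxplot_summary(data, q1=4, q3=9)

print(lower_fence, upper_fence)  # -3.5 16.5
print(outliers)                  # [30]
print(whiskers)                  # (3, 9)
```

Note that the upper whisker stops at 9, the largest non-outlier, not at the upper fence of 16.5.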
Violin plot (combine boxplot and histogram) example
How can you consider the relationship between multiple variables (multivariate data)?
stacked bar charts
Stacked relative frequency charts example
Side-by-side jittered dotplots example
Side-by-side boxplots
Scatterplot example
What are some examples of measures of center?
median, mean, trimmed mean
What are some measures of dispersion?
Range, IQR, Sample Standard Deviation
Is the range robust?
No
Is the IQR robust?
more robust than the range, though it can still shift slightly
For the deviations used in the sample standard deviation, what does their sum equal and what does their average equal?
both equal zero
What is the formula for the sample standard deviation?
s = sqrt( sum of (y_i - ȳ)^2 / (n - 1) ), where ȳ is the sample mean and n is the sample size
What is the unit of the sample standard deviation?
the same units as the observations
What is the sample variance?
s^2, the square of the sample standard deviation s
Is the standard deviation robust?
no, because it depends on the mean, which is not robust
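A short sketch of the deviations, sample variance, and sample standard deviation on made-up data. The variance is computed with the usual n - 1 denominator, and the result is checked against Python's `statistics.stdev`:

```python
import math
import statistics

data = [2, 4, 4, 6, 9]  # made-up observations
n = len(data)
y_bar = sum(data) / n   # sample mean: 5.0

# Deviations from the mean always sum (and so average) to zero.
deviations = [y - y_bar for y in data]
print(sum(deviations))  # 0.0

# Sample variance s^2: sum of squared deviations divided by n - 1.
variance = sum(d ** 2 for d in deviations) / (n - 1)
s = math.sqrt(variance)  # sample SD, in the same units as the data

print(variance)  # 7.0
print(math.isclose(s, statistics.stdev(data)))  # True
```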
For normal distributions, what percent of observations are within ±1 SD of the mean?
about 68%
For normal distributions, what percent of observations are within ±2 SD of the mean?
about 95%
For normal distributions, what percent of observations are within ±3 SD of the mean?
about 99.7%
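These three percentages can be checked with a quick sketch that uses the standard normal distribution from Python's `statistics.NormalDist`. The percentages apply to an ideal normal curve; real samples will only match approximately:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean.
coverage = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}

for k, p in coverage.items():
    print(f"within ±{k} SD: {p:.1%}")
```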
How does a linear transformation affect the median?
it doesn’t change the order of the data
-if we multiply the data by a number, the median is multiplied by that number
-if we add a number to the data, that number is added to the median
How does a linear transform affect Q1 and Q3?
same as the median
-if we multiply the data by a (positive) number, Q1 and Q3 are multiplied by that number
-if we add a number to the data, that number is added to Q1 and Q3
How does a linear transform affect the IQR?
-if we add a number to the data, the IQR does not change
-if we multiply the data by a number, the IQR is multiplied by the absolute value of that number
How does linear transformation affect the mean?
-same as the median
-it follows the transformation: if we add a number to the data, that number is added to the mean, and if we multiply the data by a number, the mean is multiplied by that number
How does a linear transform affect the SD?
-same as IQR
-adding or subtracting a number does not change it, but multiplying or dividing by a number multiplies or divides the SD by the absolute value of that number
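The linear transformation rules above can be checked with a sketch on made-up data, using the transformation y' = a*y + b with a = 2 and b = 10. The helper `summarize` is hypothetical, and the quartiles come from `statistics.quantiles`, whose default convention may differ from the median-of-halves rule, although the scaling behavior is the same:

```python
import statistics

def summarize(data):
    """Collect measures of center and dispersion for a sample."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "sd": statistics.stdev(data),
    }

data = [2, 4, 4, 6, 9]  # made-up observations
a, b = 2, 10            # linear transformation y' = a*y + b
transformed = [a * y + b for y in data]

before = summarize(data)
after = summarize(transformed)

for name in before:
    print(f"{name:>6}: {before[name]:.3f} -> {after[name]:.3f}")
```

The mean, median, Q1, and Q3 each become a*value + b, while the IQR and SD are only multiplied by a because adding b shifts every observation equally.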
What is the coefficient of variation and what is it a measure of?
SD/mean; measure of dispersion
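A brief sketch of the coefficient of variation on made-up weights, with a hypothetical helper `coefficient_of_variation`. Because the SD and the mean are both multiplied by the same factor when the units change, the ratio is unchanged:

```python
import statistics

def coefficient_of_variation(data):
    """Sample SD divided by the sample mean (a unit-free number)."""
    return statistics.stdev(data) / statistics.mean(data)

weights_kg = [2, 4, 4, 6, 9]                 # made-up observations in kg
weights_g = [1000 * y for y in weights_kg]   # same data in grams

cv_kg = coefficient_of_variation(weights_kg)
cv_g = coefficient_of_variation(weights_g)

print(f"{cv_kg:.3f} {cv_g:.3f}")  # 0.529 0.529
```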
How does scaling work in regards to nonlinear transformation like log?
-there is no simple scaling rule; take the log of every observation and recalculate the mean, median, and SD
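A small sketch of why the summary has to be recalculated after a log transformation, using made-up data and base-10 logs. The mean of the logged data is not the log of the original mean:

```python
import math
import statistics

data = [1, 10, 100]                   # made-up observations
logs = [math.log10(y) for y in data]  # transform every observation

mean_of_logs = statistics.mean(logs)             # mean of 0, 1, 2
log_of_mean = math.log10(statistics.mean(data))  # log10 of 37

print(f"mean of logs: {mean_of_logs:.3f}")  # 1.000
print(f"log of mean:  {log_of_mean:.3f}")   # 1.568

# The SD must also be recomputed from the transformed values.
sd_of_logs = statistics.stdev(logs)
print(f"SD of logs:   {sd_of_logs:.3f}")    # 1.000
```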
What is another name for the sample value?
statistic
What is another name for the population value?
parameter
What is the sample value and population value for a proportion?
sample value = p̂ (p hat)
population value = p
What is the sample value and population value for a mean?
sample value = ȳ (y bar)
population value = µ
What is the sample value and population value for a standard deviation?
sample value = s
population value = σ
What is statistical inference?
Drawing conclusions about a population based on observations from a sample