Module 3 Flashcards
What is a variable?
any measurable characteristic of an observation unit (ex. the number of Facebook posts that a person reads each day). It can vary among sampling units
What are the 3 pieces of information a variable contains?
- what the variable represents
- the measurement unit
- a description of the observation unit
What is a datum (plural: data)?
the value of a variable that you measure from an observation unit
Difference between numerical and categorical variables?
numerical: the data is numeric (has measurement units that indicate the scale)
categorical: the data is a qualitative description (has no measurement units)
What is a continuous numerical variable?
- can take on continuous numbers (any number, even fractional numbers)
What is a discrete numerical variable?
can only take on whole numbers (integers)
What is an ordinal categorical variable?
can take on qualitative values but where the values are from a ranked scale (ex. using a scale to describe how you’re feeling)
What is a nominal categorical variable?
can take on qualitative values but where the values do not have a particular order (ex. food)
How is categorical data characterized?
- counts: the number of sampling units in each category
- proportions: the share of the total sampling units in each category
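A minimal sketch of counts and proportions, assuming a made-up sample of 8 sampling units and a nominal variable (favourite food). The data and variable names are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical sample: favourite food reported by 8 sampling units (a nominal variable).
foods = ["pizza", "sushi", "pizza", "tacos", "pizza", "sushi", "tacos", "pizza"]

# Counts: the number of sampling units in each category.
counts = Counter(foods)

# Proportions: each count divided by the total number of sampling units.
total = sum(counts.values())
proportions = {category: n / total for category, n in counts.items()}

for category in counts:
    print(category, counts[category], proportions[category])
```

The proportions always sum to 1, which is part of why they are easier to compare across samples of different sizes than raw counts.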
Which is easier to understand when doing descriptive statistics: counts or proportions?
proportions
What do counts and proportions indicate?
the central tendency of categorical data
What is range, and what is it used to indicate?
- range is used to indicate dispersion
- describes the variation in the response variable
What are the two approaches used to determine descriptive statistics for numerical variables?
- means
- quartiles
What does the mean characterize?
the central tendency of a numeric variable
What is variance and how is it calculated?
- measure of amount of variation in sample
- calculated by summing the squared distance of each data point from the sample mean, then dividing by the number of data points (many textbooks divide by the number of data points minus 1 when working with a sample)
What is standard deviation?
- square root of variance
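The variance and standard deviation described above can be sketched as follows. The data values are hypothetical, and the calculation follows the card's wording (divide by the number of data points). The n - 1 version that many courses use for samples is shown alongside for comparison.

```python
import math

# Hypothetical sample of 8 measurements.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

n = len(data)
mean = sum(data) / n

# Squared distance of each data point from the sample mean.
squared_distances = [(x - mean) ** 2 for x in data]

# Variance as described on the card: sum of squared distances divided by n.
variance = sum(squared_distances) / n

# Common alternative for samples: divide by n - 1 instead of n.
sample_variance = sum(squared_distances) / (n - 1)

# Standard deviation is the square root of the variance.
std_dev = math.sqrt(variance)

print(mean, variance, std_dev)
```

Because the standard deviation is a square root of squared distances, it is in the same measurement units as the variable itself, which makes it easier to interpret than the variance.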
What are quartiles?
- specific values of the variable that divide your ranked data into four equal-sized groups
What is the central tendency given by?
the second quartile, also called the median (50% of the data lie above it and 50% below)
What is the interquartile range?
uses quartiles to describe dispersion in a numerical variable. It is the difference between the 3rd and 1st quartiles and gives the range of the innermost 50% of a numerical sample.
To calculate it, subtract the 1st quartile from the 3rd.
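A minimal sketch of quartiles, the median, and the interquartile range, assuming hypothetical data and Python's standard `statistics` module. Note that several conventions exist for computing quartiles; this sketch uses the "inclusive" method, and other methods or other software may give slightly different values for the same data.

```python
import statistics

# Hypothetical sample of 9 measurements.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 asks for the three cut points that split the ranked data into four groups.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Interquartile range: 3rd quartile minus 1st quartile.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

Here the second quartile `q2` matches `statistics.median(data)`, which is the central tendency when using quartiles.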
What are the pros of using quartiles?
- median and interquartile range are relatively robust to extreme values (not as affected by the extremes)
What are the cons of using quartiles?
the median and interquartile range can become variable for samples with a small number of observations (more sensitive)
Pros of using means?
the mean and standard deviation are more robust when there is a small number of observations
Cons of using means?
the mean and standard deviation are sensitive to extreme values
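The sensitivity to extreme values can be illustrated with a small comparison. This is a sketch under stated assumptions: both samples are made up, the helper `summarize` is a hypothetical name, quartiles use the "inclusive" convention, and the standard deviation uses `statistics.stdev` (which divides by n - 1).

```python
import statistics

def summarize(values):
    """Return mean/SD and median/IQR summaries for a list of numbers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),   # sample SD (divides by n - 1)
        "median": q2,
        "iqr": q3 - q1,
    }

# Two hypothetical samples that differ only in their largest value.
typical = [10, 12, 13, 14, 15, 16, 18]
with_extreme = [10, 12, 13, 14, 15, 16, 180]

summary_typical = summarize(typical)
summary_extreme = summarize(with_extreme)

for name in ("mean", "sd", "median", "iqr"):
    print(name, summary_typical[name], summary_extreme[name])
```

In this example the single extreme value pulls the mean and standard deviation upward, while the median and interquartile range stay the same.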
When is it better to use quartiles?
when characterizing numerical values, as long as the number of observations is not too small
What is effect size?
the change in the mean value of the response variable among groups. It allows us to evaluate whether the change in the response variable is meaningful for a particular study
What is effect size calculated among?
between treatment levels (one possibility)
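As a sketch of one way to compute an effect size between treatment levels, the example below takes the difference in mean response between two hypothetical groups. The group names and values are invented for illustration, and this simple difference in means is only one possible definition.

```python
import statistics

# Hypothetical response values for two treatment levels.
control = [4.0, 5.0, 6.0, 5.0]
treatment = [7.0, 8.0, 6.0, 7.0]

mean_control = statistics.mean(control)
mean_treatment = statistics.mean(treatment)

# Effect size here: the change in the mean response between treatment levels.
effect_size = mean_treatment - mean_control

print(mean_control, mean_treatment, effect_size)
```

Whether a difference of this size is meaningful depends on the study, which is why effect size is judged in context rather than against a fixed threshold.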