MODULE 3 - DESCRIPTIVE STATISTICS Flashcards
What is a variable?
any measurable characteristic of an observation unit
3 pieces of information a variable contains
- what the variable represents
- the measurement unit
- a description of the observation unit
what are numerical variables?
those where the data is numeric
what are categorical variables?
those where the data is a qualitative description
what are continuous numerical variables?
a variable that can take on continuous numbers
continuous numbers are those that can take on any value including fractional numbers
e.g., your weight is a continuous numerical variable because it can include fractions of a kilogram (e.g., 104.23 kg)
what are discrete numerical variables?
a variable that can only take on whole numbers (integers)
e.g., if you are counting the number of patients that arrive at the emergency room each day, you can only have integer values (e.g., 28 people)
what are ordinal categorical variables?
a variable that can take on qualitative values but where values are from a ranked scale
e.g., using a ranked scale of emojis (from sad to happy) to describe how you are feeling today
what are nominal categorical variables?
a variable that can take on qualitative values but where values do not have any particular order
e.g., type of food (no type ranks above another)
what is the data type for describing age?
continuous numerical
what is the data type for the description: child, teenager, adult?
ordinal categorical
what is the data type for the number of students in a class?
discrete numerical
what is the data type for the letter grade on your exam?
ordinal categorical
what is the data type for the percentage grade on your exam?
continuous numerical
what is a count?
the number of sampling units in each category
what are proportions?
the share of observations in your sample that fall into each category
what is a range?
the difference between the maximum and minimum values for numerical variables, or the difference between the maximum and minimum number of counts for categorical variables
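The count, proportion, and range definitions above can be sketched for a categorical variable as follows. This is a minimal illustration, not a prescribed method: the `grades` list is hypothetical data made up for the example.

```python
from collections import Counter

# Hypothetical sample: one letter grade per student (illustrative data only).
grades = ["A", "B", "B", "C", "A", "B", "D", "C", "B", "A"]

# Count: the number of sampling units in each category.
counts = Counter(grades)

# Proportion: the share of the total sample that falls into each category.
total = sum(counts.values())
proportions = {category: n / total for category, n in counts.items()}

# Range for a categorical variable: largest count minus smallest count.
count_range = max(counts.values()) - min(counts.values())

print(dict(counts))
print(proportions)
print(count_range)
```

The proportions always sum to 1, which is a quick check that every observation was assigned to a category.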
what is the mean?
the average value: the sum of all observations divided by the number of observations
what is a variance?
a measure of the amount of variation in your sample
how do you calculate variance?
- Calculate the mean for a sample
- Calculate the difference between each data point and the mean, then square that value
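- Sum the squares of the differences and divide by the number of observations/data points (this gives the population variance; the sample variance divides by the number of observations minus 1 instead)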
what is standard deviation?
the square root of variance
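The variance steps and the standard deviation definition above can be sketched as below. The function names and the `data` list are hypothetical, chosen only for illustration, and the code assumes a non-empty list of numbers. It divides by the number of observations, as the steps describe.

```python
import math

def population_variance(values):
    """Mean of the squared differences from the mean (divides by n)."""
    n = len(values)
    mean = sum(values) / n                             # step 1: the mean
    squared_diffs = [(x - mean) ** 2 for x in values]  # step 2: squared differences
    return sum(squared_diffs) / n                      # step 3: sum, then divide by n

def standard_deviation(values):
    """Square root of the variance."""
    return math.sqrt(population_variance(values))

# Hypothetical data, made up for the example.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

print(population_variance(data))  # 4.0
print(standard_deviation(data))   # 2.0
```

Python's standard library offers the same calculation as `statistics.pvariance` and `statistics.pstdev`; `statistics.variance` and `statistics.stdev` divide by n - 1 instead.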
what is a quartile?
one quarter of your sample when the values are ranked from lowest to highest
how to calculate quartiles?
1. Sort the data from lowest to highest value.
2. Find the 2nd quartile by splitting the data in half according to whether:
   - the sample has an odd number of observations, in which case the middle value of the dataset is the 2nd quartile
   - the sample has an even number of observations, in which case the average of the two values closest to the middle is the 2nd quartile
3. Find the 1st quartile by creating a subset of the data that is the lower-valued half of the observations, then use the rules in step 2 to find the middle value of that subset. The lower-valued subset is created according to whether:
   - the sample has an odd number of observations, in which case the lower-valued subset is all values less than or equal to the 2nd quartile (the subset includes the 2nd quartile)
   - the sample has an even number of observations, in which case the lower-valued subset is all values less than the 2nd quartile (the subset does not include the 2nd quartile)
4. Find the 3rd quartile by repeating step 3 for the upper-valued half of the observations.
what is the central quartile?
the median
what is dispersion?
describes how much variation there is in a sample
what is the interquartile range (IQR)?
the range between the 1st and 3rd quartiles
how to calculate the IQR?
subtract the 1st quartile from the 3rd quartile
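The quartile rules and the IQR subtraction described in this module can be sketched as below. This is one possible reading of those rules: the function names are hypothetical, the halves are split by position in the sorted list, and the input is assumed to be non-empty. Other software may use different quartile conventions and return slightly different values.

```python
def middle_value(sorted_values):
    """Middle value of sorted data: the middle item for an odd count,
    the average of the two middle items for an even count."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (q1, q2, q3) using the rules from the cards."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    q2 = middle_value(data)
    if n % 2 == 1:
        # Odd count: both halves include the 2nd quartile.
        lower, upper = data[: half + 1], data[half:]
    else:
        # Even count: the halves do not include the 2nd quartile.
        lower, upper = data[:half], data[half:]
    return middle_value(lower), q2, middle_value(upper)

def iqr(values):
    """Interquartile range: 3rd quartile minus 1st quartile."""
    q1, _, q3 = quartiles(values)
    return q3 - q1

# Data from the variance and IQR exercise in this module.
sample = [7.5, 8.6, 8.9, 8.5, 9.4, 10.7, 15.1]
print(quartiles(sample))
print(round(iqr(sample), 2))
```

Rounding the printed IQR hides small floating-point noise from the subtraction.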
pros and cons to quartiles
pros:
- The median and interquartile range are relatively robust to extreme values
cons:
- The median and interquartile range become quite variable for samples with a small number of observations
pros and cons to using mean
pros:
- The mean and standard deviation are more robust when there is a small number of observations in the sample
cons:
- The mean and standard deviation are sensitive to extreme values
Calculate the mean & median of the following data:
7.5 9.9 8.6 10.3 8.5 9.4 15.1
mean: 9.9
median: 9.4
Would the mean or median be a better descriptor of the ‘middle’ value for this set of data?
7.5 9.9 8.6 10.3 8.5 9.4 15.1
median (the extreme value 15.1 pulls the mean upward, while the median is barely affected)
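A quick way to see why the median describes the middle of this dataset better is to recompute both statistics with the extreme value removed. The sketch below uses Python's standard `statistics` module; dropping 15.1 is done purely for illustration and is not a recommendation to discard data.

```python
import statistics

# Data from the exercise above.
data = [7.5, 9.9, 8.6, 10.3, 8.5, 9.4, 15.1]

# Same data with the extreme value left out, for comparison only.
without_extreme = [x for x in data if x != 15.1]

mean_all = statistics.mean(data)
median_all = statistics.median(data)
mean_trimmed = statistics.mean(without_extreme)
median_trimmed = statistics.median(without_extreme)

print(round(mean_all, 2), round(median_all, 2))
print(round(mean_trimmed, 2), round(median_trimmed, 2))

# How far each statistic moves when the extreme value is removed.
print(round(mean_all - mean_trimmed, 2), round(median_all - median_trimmed, 2))
```

The mean shifts by roughly twice as much as the median, which is the sensitivity to extreme values that the pros and cons cards describe.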
Calculate the population variance & interquartile range (IQR) of the following data:
7.5 8.6 8.9 8.5 9.4 10.7 15.1
variance: 5.5
IQR: 1.5
Calculate the interquartile range (IQR) for the following set of numbers and indicate what range the answer lies within.
10.1, 18.6, 19.8, 15.7, 21.9, 12.9, 11.8, 26.0, 13.0, 12.9
5 < ANSWER < 7
Calculate the interquartile range (IQR) for the following set of data and indicate what range the answer lies within.
46.7, 18.7, 39.4, 7.2, 19.8, 42.1, 2.6, 17.1, 30.7, 21.9
19 < ANSWER < 23
what is effect size?
the change in mean value of the response variable among groups
2 ways to calculate effect size
- difference
- ratio
difference calculations
the differences in mean values among groups
ratio calculations
the ratio of mean values among groups
The rate of home ownership in Canada decreased from 46% in 2004 to 44% in 2011. What is the effect size as a difference between the years?
-2 percentage points (44% - 46%)
true or false: relative effect sizes have no units
true
In the United Kingdom, 56% of older adults (55+ years) get their news from the television whereas only 12% of youth (18-24 years) do. What is the relative effect size of youth compared to older adults?
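4.7 (56% / 12% ≈ 4.7, i.e., older adults are about 4.7 times as likely as youth to get their news from television; taken the other way, youth relative to older adults, the ratio is 12% / 56% ≈ 0.21)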
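The two effect-size calculations can be sketched with the figures from the exercises in this module. The function names are hypothetical; the point is only that a difference keeps the units of the variable, while a ratio has no units and depends on which group is used as the reference.

```python
def difference_effect(group_value, reference_value):
    """Effect size as a difference: group minus reference (keeps the units)."""
    return group_value - reference_value

def ratio_effect(group_value, reference_value):
    """Effect size as a ratio: group divided by reference (no units)."""
    return group_value / reference_value

# Home ownership in Canada (percent): 2011 compared to 2004.
print(difference_effect(44.0, 46.0))        # -2.0 percentage points

# Television as a news source in the United Kingdom (percent).
older_adults, youth = 56.0, 12.0
print(round(ratio_effect(older_adults, youth), 1))  # 4.7
print(round(ratio_effect(youth, older_adults), 2))  # 0.21
```

Swapping the reference group gives the reciprocal ratio, so it helps to state which group is the reference whenever a relative effect size is reported.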