Lecture 2: Descriptive Statistics and Data Visualization Flashcards
What are the 3 data measures in SPSS?
Scale (Ratio and Interval)
Ordinal
Nominal
What is scale measure?
used for interval and ratio variables (both considered continuous data)
What is ratio scale?
measurement scale uses the same interval between one measurement and the next, with a true 0 point.
- you can perform arithmetic, including ratios, on: height, weight, blood pressure, nutrient intakes
What is interval scale?
same as ratio scale but has an arbitrary zero point
- differences are meaningful but ratios are not (20 °C is not twice as warm as 10 °C): temperature in °C or °F, calendar time, etc.
What is ordinal data?
variable has two or more categories with an intrinsic ranking/order (temporal position, superiority), but the difference between each value is not necessarily known.
- education level, pain severity (mild/moderate/severe), etc.
What is nominal data?
variable has two or more categories without a natural order (categorical data)
- eye color, hair color, etc.
What are the 3 aspects of descriptive statistics?
- shape of the data distribution
- measures of central tendency
- dispersion (variability)
What is distribution?
function showing all the possible data values and how often they occur
What is normal distribution (Gaussian)?
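symmetric, bell-shaped distribution described by its mean and standard deviation
- useful because of the central limit theorem: averages of independently drawn observations become approximately normally distributed when the number of observations is sufficiently large.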
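The central limit theorem can be illustrated with a small simulation. This is only a sketch: the uniform source distribution, the sample size of 50, the 2000 repetitions, and the name `sample_mean` are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n independent draws from a uniform(0, 1) distribution."""
    return statistics.fmean([random.random() for _ in range(n)])

# 2000 sample means, each computed from 50 observations
means = [sample_mean(50) for _ in range(2000)]

center = statistics.fmean(means)  # expected to sit near 0.5, the uniform mean
spread = statistics.stdev(means)  # expected to sit near sqrt(1/12) / sqrt(50)
print(round(center, 2), round(spread, 3))
```

Although the individual draws are uniform (flat), a histogram of `means` would look roughly bell-shaped and centered near 0.5.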
What is a histogram?
a graphical representation of the distribution of numerical data, built by counting how many values fall into each bin. It is an estimate of the probability distribution of a continuous variable.
What is a standard normal curve?
has a mean of 0 and standard deviation of 1
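Any variable can be rescaled to have a mean of 0 and a standard deviation of 1 by converting its values to z-scores. The sketch below uses made-up heights and assumes the population SD is wanted. Note that rescaling changes the units only; it does not make skewed data normal.

```python
import statistics

heights_cm = [160.0, 165.0, 170.0, 175.0, 180.0]  # made-up values

mu = statistics.fmean(heights_cm)
sd = statistics.pstdev(heights_cm)  # population SD (divides by n)

# z-score: how many standard deviations a value lies from the mean
z_scores = [(x - mu) / sd for x in heights_cm]

print([round(z, 2) for z in z_scores])
```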
Why is the distribution of data important?
distribution of data can be a factor in deciding whether to use a parametric or non-parametric statistical test.
Parametric vs Non-parametric tests
Parametric test: has assumptions regarding the shape of the data distribution and its parameters (e.g. mean, SD)
Non-parametric test: has few/no assumptions regarding the shape of the data distribution or its parameters
Non-normal distributions
Positively skewed (skewed to the right)
Negatively skewed (skewed to the left)
Bimodal distribution
Uniform distribution
What are the measures of central tendency?
mode
median
mean
What is mode?
the most frequent value in a dataset
- not a very good indicator of central tendency (but the only measure available for a dataset containing nominal categorical data)
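A minimal sketch of finding the mode of nominal data by counting how often each category occurs (the eye colors are made up for illustration):

```python
from collections import Counter

eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]

counts = Counter(eye_colors)  # frequency of each category

# most_common(1) returns a list with the single most frequent (value, count) pair
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)  # brown 3
```

If two categories tie for the highest count, the data have more than one mode and this sketch reports only one of them.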
What is median?
central datum when all of the data are arranged in numerical order
- literal measure of central tendency
What is mean?
arithmetic average of the data
- the measure most influenced by outliers, so best used when data are normally distributed.
True/False
The mean and median are computable with categorical variables
False
Where are the mean, median, and mode located in normally distributed data?
all three of them are in the middle of the distribution and they will overlap on the highest point of the distribution
Where are the mean, median, and mode located on a positively skewed distribution?
mode will fall on the highest peak of this distribution (the most frequent observation)
the mean is pulled toward the long right tail, so it falls to the right of the peak
the median will be in between the mode and the mean
Where are the mean, median, and mode located on a negatively skewed distribution?
mode will fall on the highest peak of this distribution (the most frequent observation)
the mean is pulled toward the long left tail, so it falls to the left of the peak
the median will be in between the mode and the mean
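The ordering of mode, median, and mean under skew can be checked on a small made-up dataset. In this sketch the negatively skewed data are simply the mirror image of the positively skewed data; `summary` is a helper name chosen for illustration.

```python
import statistics

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]  # long tail to the right
left_skewed = [16 - x for x in right_skewed]    # mirror image: long tail to the left

def summary(data):
    """Return (mode, median, mean) for a list of numbers."""
    return statistics.mode(data), statistics.median(data), statistics.fmean(data)

print(summary(right_skewed))  # mode < median < mean
print(summary(left_skewed))   # mean < median < mode
```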
How to calculate the mean?
sum of all data divided by the sample size
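As a minimal sketch of this calculation (the function name `mean` and the data are illustrative):

```python
def mean(values):
    """Sum of all data divided by the sample size."""
    if not values:
        raise ValueError("mean of an empty dataset is undefined")
    return sum(values) / len(values)

intakes = [2, 4, 6, 8]  # made-up values
print(mean(intakes))  # 5.0
```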
What is the difference between population mean and sample mean?
population mean: the mean of every member of the population (e.g. all McGill students)
sample mean: the mean of a subset drawn to represent the population (e.g. 1000 McGill students)
How to calculate median?
- order the values of your variable from lowest to highest
- if there is an odd number of observations, the median is the middle value
- if there is an even number of observations, the median is the sum of the 2 middle values divided by 2
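The steps above can be sketched as follows (the function name `median` and the example values are illustrative):

```python
def median(values):
    """Middle value of the ordered data; average of the 2 middle values if n is even."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)  # order from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: mean of the 2 middle values

print(median([7, 1, 5]))     # 5
print(median([7, 1, 5, 3]))  # 4.0
```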
What is variability?
how spread out the data values are; also referred to as dispersion, spread, or scatter
What is range?
the simplest measure of dispersion (largest data value - smallest data value)
What is variance?
average of the squared differences from the mean
What is standard deviation?
degree to which individual values differ from the mean
- the square root of the variance
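A worked sketch of range, variance, and standard deviation on made-up data. Dividing by n gives the population variance described above; dividing by n - 1 gives the sample variance, which is the version usually reported when the data are a sample. Which one applies depends on the situation, so both are shown.

```python
import math

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
n = len(data)

data_range = max(data) - min(data)  # largest value - smallest value

mean_value = sum(data) / n
squared_diffs = [(x - mean_value) ** 2 for x in data]

population_variance = sum(squared_diffs) / n    # average squared difference
sample_variance = sum(squared_diffs) / (n - 1)  # divides by n - 1 instead of n

population_sd = math.sqrt(population_variance)  # SD is the square root of the variance
sample_sd = math.sqrt(sample_variance)

print(data_range, population_variance, population_sd)  # 7.0 4.0 2.0
```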
What are the measures of variability?
range
variance
standard deviation
standard error of the mean (SEM)
What is the standard error of the mean (SEM)?
estimate of how far the sample mean is likely to be from the population mean
- measures how accurately the sample mean represents the population mean
- a sample mean will almost always deviate from the population mean. The SEM estimates the typical size of this deviation (standard deviation divided by the square root of the sample size)
- a large SEM indicates an imprecise estimate
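The SEM is commonly computed as the sample standard deviation divided by the square root of the sample size. A minimal sketch with made-up data (the function name `standard_error` is illustrative):

```python
import math
import statistics

def standard_error(values):
    """Sample SD divided by the square root of the sample size."""
    return statistics.stdev(values) / math.sqrt(len(values))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
sem = standard_error(data)

# Same spread, four times as many observations: the SEM shrinks
sem_larger_sample = standard_error(data * 4)

print(round(sem, 3), round(sem_larger_sample, 3))
```

Because the sample size sits in the denominator, larger samples give a smaller SEM for the same spread in the data.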
Describe the descriptive statistics for a categorical variable
not appropriate to calculate mean or median but can calculate mode
- instead calculate proportions (percentages)
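A minimal sketch of summarizing a categorical variable with counts and percentages (the blood types are made up for illustration):

```python
from collections import Counter

blood_types = ["A", "B", "A", "O", "A", "B", "AB", "O"]

counts = Counter(blood_types)  # frequency of each category
total = len(blood_types)

# Percentage of observations falling in each category
percentages = {category: 100 * count / total for category, count in counts.items()}

for category, pct in percentages.items():
    print(f"{category}: {counts[category]} ({pct:.1f}%)")
```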
Bar chart
displays summary statistics for continuous variables according to some category(ies).
- helpful to compare means/medians across groups.
Pie chart
illustrates proportions for categorical data that total 100%
Histogram
displays the distribution of continuous data
- frequency displayed on y-axis
- helpful to evaluate shape of data distribution
- can add dimensions by stacking variables on x-axis
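The frequencies on the y-axis come from counting how many values fall into each bin. A minimal sketch with equal-width bins; the function name `histogram_counts`, the bin width, and the data are assumptions made for illustration.

```python
def histogram_counts(values, start, width, n_bins):
    """Count the values falling in each bin [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - start) // width)
        if 0 <= index < n_bins:  # values outside the bins are ignored
            counts[index] += 1
    return counts

values = [1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.4, 5.9]  # made-up data
counts = histogram_counts(values, start=1, width=1, n_bins=5)

# Text version of the histogram: one row per bin, one star per observation
for i, count in enumerate(counts):
    print(f"{1 + i}-{2 + i}: {'*' * count}")
```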
Boxplot
standardized format for displaying the minimum, maximum, median, quartiles, and interquartile range (IQR)
- SPSS displays outliers
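A sketch of the numbers behind a boxplot, using made-up data. Two assumptions are made here: quartiles are computed with the "inclusive" method of Python's `statistics.quantiles` (software packages differ in how they define quartiles), and outliers are flagged with the common rule of lying more than 1.5 × IQR beyond the quartiles.

```python
import statistics

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # made-up data with one extreme value

# Quartiles; method="inclusive" is one of several conventions
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

# Assumed outlier rule: more than 1.5 * IQR below Q1 or above Q3
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(min(values), q1, q2, q3, max(values), iqr, outliers)
```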
Scatterplot
displays values for two variables in a dataset (generally the independent variable on the x-axis and the dependent variable on the y-axis).
- helpful when assessing correlations and linear relationships
- can display two dependent variables using two different coloured circles