Lecture 2: Descriptive Statistics and Data Visualization Flashcards
What are the 3 data measures in SPSS?
Scale (Ratio and Interval)
Ordinal
Nominal
What is scale measure?
used for interval and ratio variables (both considered continuous data)
What is ratio scale?
a measurement scale with equal intervals between one measurement and the next, and a true zero point.
- ratios are meaningful, so all arithmetic applies: height, weight, blood pressure, nutrient intakes
What is interval scale?
same as ratio scale (equal intervals) but has an arbitrary zero point
- differences are meaningful but ratios are not (20 °C is not twice as warm as 10 °C): temperature in °C or °F, clock time, etc.
What is ordinal data?
variable has two or more categories with an intrinsic ranking/order (temporal position, superiority), but the difference between each value is not necessarily known.
- education level, pain severity (mild/moderate/severe), etc.
What is nominal data?
variable has two or more categories without a natural order (categorical data)
- eye color, hair color, etc.
What are the 3 aspects of descriptive statistics?
- shape of the data distribution
- measures of central tendency
- dispersion (variability)
What is distribution?
function showing all the possible data values and how often they occur
What is normal distribution (Gaussian)?
a symmetric, bell-shaped distribution. Useful because of the central limit theorem: averages of independently drawn observations become approximately normally distributed when the number of observations is sufficiently large.
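The central limit theorem can be illustrated with a small simulation. This is only a sketch: the uniform source distribution, the sample size of 30, the 5000 repetitions, and the name `sample_mean` are all choices made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

def sample_mean(n):
    """Mean of n independent draws from a uniform(0, 1) distribution."""
    return statistics.fmean([random.random() for _ in range(n)])

# The source data are flat (uniform), not bell-shaped,
# yet the means of repeated samples cluster around 0.5.
means = [sample_mean(30) for _ in range(5000)]

center = statistics.fmean(means)  # close to 0.5
spread = statistics.stdev(means)  # close to sqrt(1/12) / sqrt(30), about 0.053

print(round(center, 3), round(spread, 3))
```

Plotting `means` as a histogram would show the approximately normal shape.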
What is a histogram?
A histogram is a graphical representation of the distribution of numerical data, built by counting how many observations fall into each bin. It is an estimate of the probability distribution of a continuous variable.
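The counting step behind a histogram can be sketched in a few lines. The helper `histogram_counts`, the bin width of 10, and the height values are made up for illustration.

```python
from collections import Counter

def histogram_counts(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = Counter()
    for v in values:
        bin_start = (v // bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))

heights_cm = [152, 158, 161, 163, 165, 167, 168, 171, 174, 183]  # made-up data
counts = histogram_counts(heights_cm, 10)

# Text version of the histogram: one '#' per observation in the bin
for start, n in counts.items():
    print(f"[{start}, {start + 10}): {'#' * n}")
```

The choice of bin width changes the apparent shape, so it is worth trying more than one.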
What is a standard normal curve?
has a mean of 0 and standard deviation of 1
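Any set of scores can be put on this scale by converting each value to a z-score (subtract the mean, divide by the standard deviation). A minimal sketch, with a hypothetical helper `z_scores` and made-up scores; standardizing changes the mean and SD, not the shape of the distribution.

```python
import statistics

def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD of the values themselves
    return [(v - mean) / sd for v in values]

scores = [60, 70, 80, 90, 100]  # made-up data
z = z_scores(scores)

print([round(value, 2) for value in z])
```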
Why is the distribution of data important?
distribution of data can be a factor in deciding whether to use a parametric or non-parametric statistical test.
Parametric vs Non-parametric tests
Parametric test: makes assumptions about the shape of the data distribution and its parameters (e.g. mean, SD)
Non-parametric test: makes few/no assumptions about the shape of the data distribution or its parameters
Non-normal distributions
Positively skewed (skewed to the right)
Negatively skewed (skewed to the left)
Bimodal distribution
Uniform distribution
What are the measures of central tendency?
mode
median
mean
What is mode?
the most frequent value in a dataset
- not a very good indicator of central tendency (but the only method for a dataset containing nominal categorical data)
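As a quick illustration, the mode can be found for nominal data where a mean or median would make no sense. The eye-color values are made up; `statistics.mode` and `statistics.multimode` are from the Python standard library.

```python
import statistics

eye_colors = ["brown", "blue", "brown", "green", "brown", "blue"]  # made-up data
most_common = statistics.mode(eye_colors)

# When several values tie for most frequent, multimode returns all of them
tied = statistics.multimode(["a", "b", "a", "b", "c"])

print(most_common, tied)
```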
What is median?
central datum when all of the data are arranged in numerical order
- literal measure of central tendency
What is mean?
arithmetic average of the data
- most influenced by outliers, so best used when data are normally distributed.
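The sensitivity of the mean to outliers can be seen by adding one extreme value to a small dataset. The income figures below are invented for illustration.

```python
import statistics

incomes = [30, 32, 35, 36, 40]   # made-up values, in thousands
with_outlier = incomes + [400]   # one extreme observation added

mean_before = statistics.fmean(incomes)
median_before = statistics.median(incomes)
mean_after = statistics.fmean(with_outlier)
median_after = statistics.median(with_outlier)

# The mean jumps; the median barely moves
print(mean_before, median_before)
print(mean_after, median_after)
```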
True/False
The mean and median are computable with categorical variables
False
Where are the mean, median, and mode located in normally distributed data?
all three of them are in the middle of the distribution and coincide at its highest point
Where are the mean, median, and mode located on a positively skewed distribution?
mode will fall on the highest peak of this distribution (the most frequent observation)
the mean will fall furthest to the right, pulled toward the long right tail
the median will be in between the mode and the mean
Where are the mean, median, and mode located on a negatively skewed distribution?
mode will fall on the highest peak of this distribution (the most frequent observation)
the mean will fall furthest to the left, pulled toward the long left tail
the median will be in between the mode and the mean
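The orderings described for skewed distributions can be checked on small made-up datasets. This is a sketch of the typical pattern, which holds for most unimodal skewed data but is a rule of thumb rather than a guarantee.

```python
import statistics

def center_measures(values):
    """Return (mode, median, mean) for a list of numbers."""
    return (
        statistics.mode(values),
        statistics.median(values),
        statistics.fmean(values),
    )

# Long tail to the right (positively skewed), made-up data
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 8, 20]
# Mirror image: long tail to the left (negatively skewed)
left_skewed = [21 - v for v in right_skewed]

pos = center_measures(right_skewed)  # mode < median < mean
neg = center_measures(left_skewed)   # mean < median < mode

print(pos)
print(neg)
```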
How to calculate the mean?
sum of all data divided by the sample size
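Written as a formula, with a small made-up example (the symbols x̄ for the mean, n for the sample size, and x_i for the individual values are the usual notation, not taken from the lecture):

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad\text{e.g.}\qquad
\bar{x} = \frac{2 + 4 + 6 + 8}{4} = \frac{20}{4} = 5
```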
What is the difference between population mean and sample mean?
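Population: all McGill students; the population mean (μ) is computed from every member.
Sample: e.g. 1000 McGill students chosen to represent the population; the sample mean (x̄) is an estimate of the population mean.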
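The relationship between the two means can be sketched with a simulated population. Everything here is hypothetical: the population size of 40,000, the age values, and the normal distribution used to generate them are invented and are not real McGill data.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: 40,000 simulated student ages
population = [random.gauss(22, 3) for _ in range(40000)]

# Simple random sample of 1000 students, drawn without replacement
sample = random.sample(population, 1000)

population_mean = sum(population) / len(population)  # parameter (mu)
sample_mean = sum(sample) / len(sample)              # statistic (x-bar)

print(round(population_mean, 2), round(sample_mean, 2))
```

The sample mean lands close to the population mean but not exactly on it; a different random sample would give a slightly different estimate.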