Lecture 2: Descriptive Statistics and Data Visualization Flashcards

1
Q

What are the 3 measurement levels ("Measure" options) in SPSS?

A

Scale (Ratio and Interval)
Ordinal
Nominal

2
Q

What is scale measure?

A

used for interval and ratio variables (both considered continuous data)

3
Q

What is ratio scale?

A

a measurement scale with the same interval between one measurement and the next, and a true 0 point (zero means none of the quantity).
- all arithmetic, including ratios, is meaningful: e.g. height, weight, blood pressure, nutrient intakes

4
Q

What is interval scale?

A

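same as ratio scale (equal intervals) but with an arbitrary zero point

- you can add and subtract values, but ratios are not meaningful (20 °C is not twice as hot as 10 °C): e.g. temperature in °C or °F, clock time (time of day), etc.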
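A small worked check of the interval vs ratio distinction follows. It assumes Celsius as the interval scale and Kelvin (which has a true zero) as the ratio scale, and the two temperatures are hypothetical. Differences survive the change of zero point, but ratios do not.

```python
def celsius_to_kelvin(c: float) -> float:
    """Kelvin has a true zero, so it is a ratio scale; Celsius is an interval scale."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0  # hypothetical temperatures in degrees Celsius

# Differences are the same on both scales (the interval property).
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios depend on where zero is, so "twice as hot" is meaningless in Celsius.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"ratio: {ratio_c:.3f} in C vs {ratio_k:.3f} in K")
```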

5
Q

What is ordinal data?

A

variable has two or more categories with an intrinsic ranking/order (e.g. temporal position, superiority), but the difference between each value is not necessarily known.
- e.g. education level, Likert-scale ratings (strongly disagree to strongly agree), pain rated as mild/moderate/severe

6
Q

What is nominal data?

A

variable has two or more categories without a natural order (categorical data)
- eye color, hair color, gender, ethnicity, etc.

7
Q

What are the 3 aspects of descriptive statistics?

A
  • shape of the data distribution
  • measures of central tendency
  • dispersion (variability)
8
Q

What is distribution?

A

function showing all the possible data values and how often they occur
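The sketch below builds a frequency distribution (each observed value and how often it occurs) using `collections.Counter`. The data are hypothetical.

```python
from collections import Counter

# Hypothetical data: servings of fruit eaten yesterday by 12 people.
servings = [0, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 5]

# Pair each observed value with its frequency.
distribution = Counter(servings)

for value in sorted(distribution):
    count = distribution[value]
    print(f"{value}: {'*' * count} ({count})")
```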

9
Q

What is normal distribution (Gaussian)?

A

a symmetric, bell-shaped distribution; it is useful because of the central limit theorem: averages of a variable drawn independently become approximately normally distributed when the number of observations is sufficiently large.
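The simulation below illustrates the central limit theorem. It draws from a uniform (non-normal) distribution and collects many sample means. The sample size of 30, the 5000 repeats and the seed are arbitrary choices. The means should cluster around 0.5 with an SD near sqrt(1/12)/sqrt(30) ≈ 0.053, and about 68% of them should fall within 1 SD, as in a normal distribution.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is reproducible

def sample_mean(n: int) -> float:
    """Mean of n independent draws from a uniform(0, 1) distribution."""
    return statistics.fmean(random.random() for _ in range(n))

n = 30           # observations per sample (assumed "sufficiently large")
repeats = 5_000  # number of sample means collected

means = [sample_mean(n) for _ in range(repeats)]

mu = statistics.fmean(means)
sd = statistics.stdev(means)
# Share of sample means within 1 SD of their average (about 0.68 if normal).
within_1sd = sum(abs(m - mu) <= sd for m in means) / repeats

print(f"mean of means = {mu:.3f}, SD of means = {sd:.4f}, within 1 SD = {within_1sd:.2f}")
```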

10
Q

What is a histogram?

A

A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable.

11
Q

What is a standard normal curve?

A

a normal distribution with a mean of 0 and a standard deviation of 1
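Values are usually placed on the standard normal curve by converting them to z-scores, z = (x - mean) / SD. The sketch below uses hypothetical heights. Standardizing rescales the data to mean 0 and SD 1; it does not make non-normal data normal.

```python
import statistics

heights = [160.0, 165.0, 170.0, 175.0, 180.0]  # hypothetical heights in cm

mean = statistics.fmean(heights)
sd = statistics.stdev(heights)  # sample SD

# z = (x - mean) / SD expresses each value in SD units from the mean.
z_scores = [(x - mean) / sd for x in heights]

print([round(z, 3) for z in z_scores])
print(round(statistics.fmean(z_scores), 10), round(statistics.stdev(z_scores), 10))
```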

12
Q

Why is the distribution of data important?

A

distribution of data can be a factor in deciding whether to use a parametric or non-parametric statistical test.

13
Q

Parametric vs Non-parametric tests

A

Parametric tests: make assumptions about the shape of the data distribution and its parameters (e.g. mean, SD)

Non-parametric tests: make few or no assumptions about the shape of the data distribution or its parameters

14
Q

Non-normal distributions

A

Positively skewed (skewed to the right)
Negatively skewed (skewed to the left)
Bimodal distribution
Uniform distribution

15
Q

What are the measures of central tendency?

A

mode
median
mean

16
Q

What is mode?

A

the most frequent value in a dataset

- not a very good indicator of central tendency (but the only measure available for nominal categorical data)
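Python's `statistics` module can compute the mode directly on category labels, which is why it works for nominal data. The eye-color data below are hypothetical. `statistics.multimode` returns every most-frequent value when there is a tie.

```python
import statistics

# Hypothetical nominal data: eye color of 7 participants.
eye_color = ["brown", "blue", "brown", "green", "brown", "blue", "hazel"]

print(statistics.mode(eye_color))  # brown

# With a tie, multimode lists all most-frequent categories (in first-seen order).
print(statistics.multimode(["A", "B", "A", "B", "C"]))  # ['A', 'B']
```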

17
Q

What is median?

A

the central datum when all of the data are arranged in numerical order
- literally the middle of the data: half of the values lie below it and half above

18
Q

What is mean?

A

the arithmetic average of the data (a measure of central tendency)

- the measure most influenced by outliers, so it is best used when data are normally distributed.

19
Q

True/False

The mean and median are computable with categorical variables

A

False

20
Q

Where are the mean, median, and mode located on a normally distributed data?

A

all three of them are in the middle of the distribution and they will overlap on the highest point of the distribution

21
Q

Where are the mean, median, and mode located on a positively skewed distribution?

A

the mode falls at the highest peak of the distribution (the most frequent observation)
the mean is pulled to the right, toward the long tail
the median falls between the mode and the mean

22
Q

Where are the mean, median, and mode located on a negatively skewed distribution?

A

the mode falls at the highest peak of the distribution (the most frequent observation)
the mean is pulled to the left, toward the long tail
the median falls between the mode and the mean
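The check below uses a small hypothetical right-skewed dataset and its mirror image. In the first, one large value stretches the right tail; in the second, negating every value creates a long left tail. The order mode < median < mean (reversed for negative skew) is the typical pattern, though it is not guaranteed for every skewed dataset.

```python
import statistics

right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 10]  # hypothetical data, long right tail
left_skewed = [-x for x in right_skewed]      # mirror image, long left tail

def center_measures(data):
    """Return (mode, median, mean) for a list of numbers."""
    return statistics.mode(data), statistics.median(data), statistics.fmean(data)

for label, data in [("positive skew", right_skewed), ("negative skew", left_skewed)]:
    mode, median, mean = center_measures(data)
    print(f"{label}: mode={mode}, median={median}, mean={mean:.2f}")
# positive skew: mode=2, median=3, mean=3.56
# negative skew: mode=-2, median=-3, mean=-3.56
```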

23
Q

How to calculate the mean?

A

sum of all data divided by the sample size
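The rule on this card translates directly into code: sum the values and divide by the number of values (n). The weights are hypothetical.

```python
def mean(values: list[float]) -> float:
    """Arithmetic mean: sum of all values divided by the sample size n."""
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)

weights_kg = [62.0, 70.5, 58.0, 81.5]  # hypothetical sample
print(mean(weights_kg))  # 68.0
```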

24
Q

What is the difference between population mean and sample mean?

A

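population mean (μ): the mean of every member of the population, e.g. all McGill students

sample mean (x̄): the mean of a subset chosen to represent the population, e.g. 1000 McGill students; it is used to estimate μ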

25
Q

How to calculate median?

A
  • order the values of your variable from lowest to highest
  • if there is an odd number of observations, the median is the middle value
  • if there is an even number of observations, the median is the sum of the 2 middle values divided by 2
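The steps on this card can be sketched as a short function: sort the values, then take the middle value (odd n) or the average of the two middle values (even n).

```python
def median(values: list[float]) -> float:
    """Median: sort, then take the middle value or the mean of the two middle values."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)  # step 1: lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return float(ordered[mid])  # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average of the 2 middle values

print(median([7, 1, 3]))      # 3.0
print(median([7, 1, 3, 10]))  # 5.0
```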
26
Q

What is variability?

A

how spread out the data values are; also referred to as dispersion, spread, or scatter

27
Q

What is range?

A

the simplest measure of dispersion (largest data value minus smallest data value)
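Computing the range takes a single line. The blood pressure readings below are hypothetical. Because the range uses only the two most extreme values, one outlier can change it a lot.

```python
systolic_bp = [118, 124, 131, 109, 142, 127]  # hypothetical readings in mmHg

# Range = largest value - smallest value.
data_range = max(systolic_bp) - min(systolic_bp)
print(data_range)  # 33
```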

28
Q

What is variance?

A

the average of the squared differences from the mean (for a sample, the sum of squared differences is divided by n - 1 instead of n)

29
Q

What is standard deviation?

A

degree to which individual values differ from the mean

- the square root of the variance
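The check below applies "square root of the variance" to a hypothetical dataset. Taking the square root returns the spread to the data's original units.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data

# SD = sqrt(variance), for both the sample and the population versions.
sample_sd = math.sqrt(statistics.variance(data))
population_sd = math.sqrt(statistics.pvariance(data))

print(round(sample_sd, 4), population_sd)  # 2.1381 2.0
```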

30
Q

What are the measures of variability?

A

range
variance
standard deviation
standard error of the mean (SEM)

31
Q

What is the standard error of the mean (SEM)?

A

estimate of how far the sample mean is likely to be from the population mean

  • measures how precisely the sample mean represents the population mean
  • a sample mean will generally differ from the population mean; the SEM estimates the typical size of this deviation (SEM = SD / √n)
  • a large SEM indicates an imprecise estimate
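The sketch below computes SEM = SD / √n for a hypothetical sample of nutrient intakes. Because n sits under a square root, quadrupling the sample size roughly halves the SEM when the SD stays about the same.

```python
import math
import statistics

# Hypothetical sample: daily nutrient intake (mg/day) for 10 people.
intake = [210, 195, 230, 250, 205, 220, 240, 198, 215, 237]

sd = statistics.stdev(intake)      # sample SD
sem = sd / math.sqrt(len(intake))  # SEM = SD / sqrt(n)

print(f"mean = {statistics.fmean(intake):.1f}, SD = {sd:.2f}, SEM = {sem:.2f}")
```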
32
Q

Describe the descriptive statistics for categorical variables

A

not appropriate to calculate mean or median but can calculate mode
- instead calculate proportions (percentages)
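The sketch below summarizes a hypothetical nominal variable with counts and percentages. The percentages add up to 100, which is what a pie chart displays.

```python
from collections import Counter

# Hypothetical nominal variable: hair color of 8 participants.
hair_color = ["black", "brown", "brown", "blonde", "black", "brown", "red", "brown"]

counts = Counter(hair_color)
n = len(hair_color)
percentages = {category: 100 * count / n for category, count in counts.items()}

# Print categories from most to least frequent.
for category, pct in sorted(percentages.items(), key=lambda item: -item[1]):
    print(f"{category}: {counts[category]} ({pct:.1f}%)")
```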

33
Q

Bar chart

A

displays summary statistics for continuous variables according to some category(ies).
- helpful to compare means/medians across groups.

34
Q

Pie chart

A

illustrates proportions for categorical data that total 100%

35
Q

Histogram

A

displays the distribution of continuous data

  • frequency displayed on y-axis
  • helpful to evaluate shape of data distribution
  • can add dimensions by stacking variables on x-axis
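A histogram is built by sorting continuous values into equal-width bins and counting the values in each bin. The ages, the bin width of 5 and the starting edge of 15 below are all hypothetical choices, and the bar lengths show the frequencies.

```python
# Hypothetical continuous data: ages of 12 participants.
ages = [19, 21, 22, 24, 25, 25, 27, 29, 31, 34, 38, 45]

bin_width = 5
start = 15  # assumed lower edge of the first bin

# Count values falling in each bin [lower, lower + bin_width).
counts: dict[int, int] = {}
for age in ages:
    lower = start + ((age - start) // bin_width) * bin_width
    counts[lower] = counts.get(lower, 0) + 1

for lower in range(start, max(ages) + 1, bin_width):
    freq = counts.get(lower, 0)
    print(f"[{lower}, {lower + bin_width}): {'#' * freq}")
```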
36
Q

Boxplot

A

a standardized format for displaying the minimum, maximum, quartiles (the line inside the box is the median), and interquartile range (IQR)
- SPSS also flags outliers
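The sketch below computes the numbers a boxplot draws. It uses hypothetical data and assumes the common 1.5 × IQR rule (Tukey's rule) for flagging outliers. `statistics.quantiles` uses its default "exclusive" method, and SPSS may compute quartiles differently, so its values can differ slightly.

```python
import statistics

data = [3, 5, 6, 7, 8, 8, 9, 10, 11, 30]  # hypothetical data with one large value

q1, q2, q3 = statistics.quantiles(data, n=4)  # quartile cut points; q2 is the median
iqr = q3 - q1

# Assumed convention: points more than 1.5 * IQR beyond the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
inside = [x for x in data if lower_fence <= x <= upper_fence]

print(f"whiskers {min(inside)}-{max(inside)}, Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
print("outliers:", outliers)
```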

37
Q

Scatterplot

A

displays values for two variables in a dataset (generally the independent variable on the x-axis and the dependent variable on the y-axis).
  • helpful when assessing correlations and linear relationships
  • can display two dependent variables using two different coloured circles
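A scatterplot is often read alongside Pearson's correlation coefficient r, which measures the strength of the linear relationship on a scale from -1 to 1. The function below computes r from scratch. The fruit/vitamin C data are hypothetical, with x as the independent variable and y as the dependent variable.

```python
import math

def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation between x and y (-1 = perfect negative, 1 = perfect positive)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: fruit servings per day (x) vs vitamin C intake in mg (y).
servings = [0, 1, 2, 3, 4, 5]
vitamin_c = [20, 45, 70, 80, 115, 130]
print(round(pearson_r(servings, vitamin_c), 3))
```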