Lecture 2 - Intro to Stats Flashcards

1
Q

What are the 3 common scales of measurement for variables in medicine?

A
  • Nominal
  • Ordinal
  • Numerical
2
Q

Describe Nominal data

A
  • Simplest - data fits in categories (no actual order)
  • Often dichotomous or binary (yes/no or male/female)
  • Could be multiple categories like blood groups
  • We can just describe it - no way to rank it
  • Just use proportion or percentages
3
Q

What are nominal data also called?

A
  • Qualitative observations
  • Categorical observations

4
Q

Describe Ordinal data

A
  • Inherent order to the categories (ex. Cancer staging 0-4)
  • Summary statistic = median
  • Difference between 2 adjacent categories is not necessarily the same throughout the scale
5
Q

Describe Numerical data

A
  • Differences have meaning on a numerical scale
  • Also called quantitative observations

6
Q

What are the two types of numerical scales?

A
  • Continuous scale - has a value on a continuum (ex. age)
  • Discrete scale - values are integers (ex. # of fractures, # of medications)

7
Q

What summary statistics do you use for numerical data?

A

mean and SD

8
Q

What type of data:
Nominal, ordinal, or continuous?

Name

A

nominal

9
Q

What type of data:
Nominal, ordinal, or continuous?

Hair color

A

nominal

10
Q

What type of data:
Nominal, ordinal, or continuous?

Eye color

A

nominal

11
Q

What type of data:
Nominal, ordinal, or continuous?

Height

A

continuous

12
Q

What type of data:
Nominal, ordinal, or continuous?

Age

A

continuous

13
Q

What type of data:
Nominal, ordinal, or continuous?

Gender

A

nominal

14
Q

What are the 3 “Measures of Middle”?

A
  • Mean
  • Median
  • Mode
15
Q

What is the mean?

A
  • The arithmetic average: the sum of the observations divided by the number of observations
  • Used with numerical variables

16
Q

What is the median?

A

The median is the middle observation when the data are put in order (the 50th percentile)

17
Q

What is the mode?

A

The mode is the value that occurs most frequently

18
Q

Can data have more than 1 mode?

A

Yes - data with 2 modes has a bimodal distribution

ex. some diseases have 2 peaks
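
A minimal sketch of spotting two modes, assuming made-up ages at disease onset (the data and variable names are hypothetical, not from the lecture). It uses `multimode` from Python's standard `statistics` module, which returns every value tied for the highest count:

```python
from statistics import multimode

# Hypothetical ages at onset with one peak in early adulthood and one later in life.
ages_at_onset = [18, 20, 20, 22, 60, 65, 65, 70]

# multimode returns all values that share the highest frequency.
modes = multimode(ages_at_onset)
is_bimodal = len(modes) == 2

print(modes, is_bimodal)
```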

19
Q

If the data is not skewed, you can use ____ and ___.

A

mean and SD

20
Q

If the data is skewed, you should use ____ and ___.

A

median and IQR

21
Q

Negatively skewed is ____ skewed (outlying small values)

A

left

22
Q

Positively skewed is ____ skewed (outlying values are large)

A

right

23
Q

How do you know if something is right/positively skewed?

A

Mean > Median

24
Q

How do you know if something is left/negatively skewed?

A

Mean < Median
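
The mean-versus-median rule on these two cards can be checked on a small data set. This is only a sketch: the lengths of stay are invented for illustration, and comparing mean to median is a rule of thumb rather than a formal test of skewness.

```python
from statistics import mean, median

# Hypothetical lengths of hospital stay in days; one large value makes a right tail.
stays = [2, 3, 3, 4, 4, 5, 30]

stay_mean = mean(stays)      # pulled toward the large outlying value
stay_median = median(stays)  # middle observation, not affected by the tail

if stay_mean > stay_median:
    skew = "right (positive)"
elif stay_mean < stay_median:
    skew = "left (negative)"
else:
    skew = "no skew indicated"

print(stay_mean, stay_median, skew)
```

Because the data are skewed, median and IQR would be the summary to report here.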

25
Q

Use mean if data is _____

A

symmetric

26
Q

Use _____ for ordinal data or numerical data that is skewed

A

median

27
Q

What are some measures of spread?

A
  • Range
  • Standard deviation/variance
  • Coefficient of variation
  • Percentiles
  • Interquartile range
28
Q

What is the range?

A

difference between smallest and largest values

29
Q

How is variance related to standard deviation?

A

Variance is the SD squared - it is the statistic before the square root is taken (SD = square root of the variance)
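
A short sketch of the relationship, using made-up values. It assumes the sample versions (dividing by n - 1), which is what `variance` and `stdev` in Python's `statistics` module compute; the card itself does not say which version the lecture used.

```python
import math
from statistics import stdev, variance

# Hypothetical observations.
values = [4, 8, 6, 5, 7]

var = variance(values)  # sample variance (divides by n - 1): an assumption
sd = stdev(values)      # sample standard deviation

# The SD is the square root of the variance.
print(var, sd, math.isclose(sd, math.sqrt(var)))
```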

30
Q

What is the coefficient of variation?

A

Measure of relative spread

CoV = (SD / mean) x 100
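
A minimal sketch of the formula, with invented weights and a hypothetical helper name (`coefficient_of_variation`). It also illustrates why CoV is a relative measure: changing the units changes the SD and the mean by the same factor, so the CoV stays the same.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Return SD / mean x 100 (uses the sample SD, which is an assumption)."""
    return stdev(values) / mean(values) * 100

# Hypothetical weights in kilograms, then the same weights converted to pounds.
weights_kg = [60, 70, 80]
weights_lb = [w * 2.2 for w in weights_kg]

cv_kg = coefficient_of_variation(weights_kg)
cv_lb = coefficient_of_variation(weights_lb)

print(round(cv_kg, 2), round(cv_lb, 2))
```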

31
Q

What is a percentile?

A

It is the value at or below which a given percentage of the distribution falls

(median = 50th percentile)

32
Q

What is IQR?

A

interquartile range

IQR = Q3 - Q1

33
Q

What do you use SD with?

A

mean (with symmetrical data)

34
Q

What do you use percentiles and IQR with?

A

median (for ordinal data or skewed numerical data)

35
Q

List 4 ways we can express numerical data

A
  • Stem and leaf plots
  • Five number summary
  • Boxplots
  • Grouped Frequency Tables
36
Q

Why are stem and leaf plots useful?

A
  • Give some idea of the central tendency of the data
  • Help to see whether the data are skewed

37
Q

What is a 5 number summary and why is a 5 number summary useful?

A
  • Min
  • Q1
  • Median
  • Q3
  • Max

*Helps to show the location and spread of the data

38
Q

What is the formula given in lecture for finding a percentile?

A

Position = p(n+1), where p is the percentile as a proportion and n is the number of observations

So say you’re trying to find the 25th percentile out of 16 numbers, you would do:

(0.25)(17) = 4.25

You would round down and choose the 4th number in the ordered data.
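
The worked example above can be sketched in code. The function name and data are hypothetical, and the rounding-down step follows this card; other textbooks and software interpolate between neighbouring values instead, so their answers can differ.

```python
import math

def percentile_by_position(values, p):
    """Apply the p(n+1) rule and round the position down, as on the card."""
    ordered = sorted(values)
    position = p * (len(ordered) + 1)
    index = math.floor(position)              # 1-based position in ordered data
    index = min(max(index, 1), len(ordered))  # keep inside the data (assumption)
    return position, ordered[index - 1]

# 16 hypothetical observations: 10, 20, ..., 160.
data = list(range(10, 170, 10))

position, value = percentile_by_position(data, 0.25)
print(position, value)
```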

39
Q

Describe a box and whisker plot

A
  • Lower and upper hinges of the box are Q1 and Q3
  • Median is marked inside the box

40
Q

How can symmetry be interpreted from a box and whisker plot?

A
  • Hinges equidistant from median means that the data is symmetrical
  • If upper hinge is further away from the median, data are positively skewed
  • If lower hinge is further away, data are negatively skewed
41
Q

What do the whiskers represent?

A

the largest/smallest non-outlying values

42
Q

What are outliers identified with in a box and whisker plot?

A

asterisk

43
Q

What is the boundary for outliers?

A

Upper boundary = Q3 + (1.5)(IQR)

Lower boundary = Q1 - (1.5)(IQR)
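
A minimal sketch of the boundary calculation. The quartiles and observations are invented, and `outlier_fences` is a hypothetical helper; it applies the 1.5 x IQR rule above Q3 and, by the usual mirror-image convention, below Q1.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) outlier boundaries from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical quartiles and observations.
q1, q3 = 20, 30
lower, upper = outlier_fences(q1, q3)

observations = [4, 18, 22, 25, 29, 33, 48]
outliers = [x for x in observations if x < lower or x > upper]

print(lower, upper, outliers)
```

Values inside the boundaries are the ones the whiskers can extend to; the rest would be plotted individually.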

44
Q

Describe grouped frequency tables

A
  • Group observations on a variable into contiguous, non-overlapping (preferably equal-width) class intervals (bins)
  • Place each observation into only one bin
  • Tabulate frequency of observations in each bin
  • Can calculate relative frequency - proportion or percentage
  • Can also tabulate cumulative frequencies and cumulative relative frequencies
45
Q

Grouped frequency tables:

What does k represent?

A

the number of bins

46
Q

Grouped frequency tables:

What does w represent?

A

the width of each bin

47
Q

Grouped frequency tables:

What is the formula for determining the # of bins (k)?

A
k = the # of bins
n = the sample size

k = 1 + 3.322 x log10(n)
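
A quick sketch of the formula for a hypothetical sample of 100 observations. Rounding the result to a whole number of bins is an assumption; the card does not say how to round.

```python
import math

def number_of_bins(n):
    """k = 1 + 3.322 x log10(n), before any rounding."""
    return 1 + 3.322 * math.log10(n)

n = 100                    # hypothetical sample size
k_raw = number_of_bins(n)  # 1 + 3.322 x 2
k = round(k_raw)           # whole number of bins (rounding choice is an assumption)

print(k_raw, k)
```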

48
Q

Grouped frequency tables:

What is the formula for determining the width (w)?

A
w = width of bin
k = # of bins
R = range

w = R/k
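
A sketch of building a grouped frequency table from w = R/k. The ages, the choice of k = 4, and the function name are all hypothetical; putting the largest value into the last bin is one convention among several.

```python
def grouped_frequency_table(values, k):
    """Split values into k equal-width bins of width w = R / k.

    Returns w and rows of (lower, upper, frequency, relative freq, cumulative freq).
    Assumes the values are not all identical (otherwise R = 0).
    """
    low, high = min(values), max(values)
    w = (high - low) / k
    counts = [0] * k
    for v in values:
        # Each observation goes into exactly one bin; the maximum joins the last bin.
        i = min(int((v - low) / w), k - 1)
        counts[i] += 1

    n = len(values)
    rows, cumulative = [], 0
    for i, f in enumerate(counts):
        cumulative += f
        rows.append((low + i * w, low + (i + 1) * w, f, f / n, cumulative))
    return w, rows

# Hypothetical ages.
ages = [20, 22, 25, 31, 34, 38, 41, 45, 52, 60]
w, rows = grouped_frequency_table(ages, k=4)

for row in rows:
    print(row)
```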

49
Q

How is a frequency polygon created?

A

by linking the mid-points of successive bins

50
Q

How do you work backwards on a frequency polygon to find the mean?

A

Mean = Sum(f x mid) / Sum(f), where f = frequency of a bin and mid = midpoint of that bin

*not on formula sheet
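
A small sketch of the calculation with made-up midpoints and frequencies. The result is an estimate of the mean, since every observation in a bin is treated as sitting at the bin's midpoint.

```python
# Hypothetical bins: midpoints and the frequency of observations in each.
midpoints = [25, 35, 45, 55]
frequencies = [3, 3, 2, 2]

# Mean = Sum(f x mid) / Sum(f)
weighted_total = sum(f * m for f, m in zip(frequencies, midpoints))
grouped_mean = weighted_total / sum(frequencies)

print(grouped_mean)
```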

51
Q

How do you find the median from a frequency polygon?

A

Go to 50% on the cumulative relative frequency axis, look across to where it hits the line, then read off the value below that point

52
Q

How do sample size and population affect the probability distribution?

A

As sample size gets bigger and bin width decreases, the underlying distribution becomes clearer and you get a smoother curve