Week 2 Describing and Summarizing Data Flashcards

A, B, C, D

1
Q

Difference between risk and odds?

A

Risk: Number of events divided by the total sample size (a proportion).
Odds: Ratio of events to non-events.
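
A minimal sketch of the two definitions on this card. The function names and the sample counts (20 events among 100 people) are made up for illustration.

```python
def risk(events: int, total: int) -> float:
    """Proportion of the sample that experienced the event."""
    return events / total


def odds(events: int, total: int) -> float:
    """Ratio of events to non-events."""
    return events / (total - events)


# Hypothetical sample: 20 events among 100 people.
print(risk(20, 100))  # 0.2  (20 out of 100)
print(odds(20, 100))  # 0.25 (20 events to 80 non-events)
```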

2
Q

Why are risk and odds confused?

A

Both describe how likely an event is, but they are calculated differently. Risk is a proportion (events out of the total); odds are a ratio (events to non-events).

3
Q

Range of values for risk and odds?

A

Risk: 0 to 1.
Odds: 0 to infinity.
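
One way to see why the ranges differ is to convert between the two. The sketch below assumes the standard identities odds = risk / (1 - risk) and risk = odds / (1 + odds); the function names are illustrative only.

```python
def risk_to_odds(p: float) -> float:
    """Convert a risk (0 <= p < 1) to odds."""
    return p / (1 - p)


def odds_to_risk(o: float) -> float:
    """Convert odds (o >= 0) back to a risk."""
    return o / (1 + o)


# As risk approaches 1, the odds grow without bound.
for p in (0.1, 0.5, 0.9, 0.99):
    print(p, risk_to_odds(p))
```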

4
Q

What does risk describe?

A

Risk is the proportion of events in the total population.

5
Q

What does odds describe?

A

Odds is the ratio of events to non-events in the population.

6
Q

How is risk calculated?

A

Risk = Events ÷ Total sample size.

7
Q

When are odds and risk similar?

A

When events are rare (e.g., 1 death in 1000).
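
A quick check of the card's example, as a sketch. The second case (500 events in 1000) is a made-up contrast to show how far risk and odds drift apart when events are common.

```python
def risk_and_odds(events: int, total: int) -> tuple[float, float]:
    """Return (risk, odds) for the given counts."""
    return events / total, events / (total - events)


rare = risk_and_odds(1, 1000)      # risk 1/1000, odds 1/999: nearly identical
common = risk_and_odds(500, 1000)  # risk 0.5, odds 1.0: very different

print(rare)
print(common)
```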

8
Q

How are risk and odds expressed?

A

Risk: Proportion/percentage.
Odds: Ratio.

9
Q

Definitions of mean, median, and mode?

A

Mean: Average.
Median: Middle value.
Mode: Most frequent value.

10
Q

Properties of mean, median, and mode?

A

Mean: Sensitive to outliers.
Median: Largely unaffected by outliers.
Mode: Most common value.
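
These properties can be illustrated with Python's standard `statistics` module. The data values are invented; the point is only how each measure reacts when one extreme value is added.

```python
import statistics

values = [2, 3, 3, 4, 5]
with_outlier = values + [50]  # same data plus one extreme value

for label, data in (("original", values), ("with outlier", with_outlier)):
    print(
        label,
        "mean:", statistics.mean(data),
        "median:", statistics.median(data),
        "mode:", statistics.mode(data),
    )

# The mean jumps from 3.4 to about 11.17, while the median
# only moves from 3 to 3.5 and the mode stays at 3.
```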

11
Q

What is skewness?

A

Skewness measures the asymmetry of a data distribution.

12
Q

How do outliers affect skewness?

A

Outliers can cause positive or negative skewness by stretching the data in one direction.

13
Q

What are bar graphs, pie charts, and histograms used for?

A

Bar graphs: Categorical data.
Pie charts: Proportions.
Histograms: Frequency distributions.

14
Q

Effect of skewness on mean and median?

A

Positive skew: Mean > median.
Negative skew: Mean < median.
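
A small sketch of this rule of thumb using made-up data: one list with a long right tail, and its mirror image with a long left tail.

```python
import statistics

right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]      # long tail of large values
left_skewed = [21 - x for x in right_skewed]  # mirror image: long tail of small values

for name, data in (("right-skewed", right_skewed), ("left-skewed", left_skewed)):
    print(name, "mean:", statistics.mean(data), "median:", statistics.median(data))

# right-skewed: mean 4.75  > median 3.0
# left-skewed:  mean 16.25 < median 18.0
```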

15
Q

What is continuous data?

A

Data that can take many possible values on a numerical scale, so it can’t be summarized with a single proportion the way dichotomous data can.

16
Q

What is the mean?

A

The mean is the sum of all values divided by the number of values.

17
Q

What is the median?

A

The middle value of a data set when arranged from smallest to largest. If the number of values is even, the median is the average of the two middle values.

18
Q

How is the median calculated if the number of values is even?

A

Average the two middle values.

19
Q

What is the mode?

A

The value that appears most often in a data set.

19
Q

How is the median calculated if the number of values is odd?

A

It is the middle value of the ordered data set.
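
The odd and even cases can be sketched in one hand-written function. This is an illustration only; in practice `statistics.median` from the standard library does the same job.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average the middle pair


print(median([7, 1, 5]))     # 5   (sorted: 1, 5, 7)
print(median([7, 1, 5, 3]))  # 4.0 (sorted: 1, 3, 5, 7; average of 3 and 5)
```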

20
Q

Can a data set have more than one mode?

A

Yes, if multiple values occur most frequently.
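
For example, the standard library's `statistics.multimode` (Python 3.8+) returns every value tied for most frequent. The data below are made up.

```python
import statistics

bimodal = [1, 1, 2, 2, 3]  # 1 and 2 both appear twice
print(statistics.multimode(bimodal))  # [1, 2]
```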

21
Q

When is the mode not used?

A

For strictly continuous data, like blood pressure.

22
Q

What is the most common way to summarize continuous data?

23
Q

When would the median be used over the mean?

A

When the data set is skewed or contains outliers.

24
Q

What is the main drawback of using mode for continuous data?

A

With continuous data, exact values rarely repeat, so the mode is of limited use and is rarely reported.

25
Q

How are mean and median related?

A

Both provide central tendencies but are calculated differently: the mean is the average, and the median is the middle value.

26
Q

What does “spread” refer to in statistics?

A

Spread refers to how widely data are dispersed around a measure of central tendency (like the median or mean), showing the variability of the data.

27
Q

What is the range of a data set?

A

The range is the difference between the maximum and minimum values in a data set.

28
Q

What is the Inter-Quartile Range (IQR)?

A

The IQR is the difference between the 75th percentile (Q3) and the 25th percentile (Q1) of a data set.

29
Q

How is the IQR different from the range?

A

Unlike the range, which uses the minimum and maximum values, the IQR uses the 25th (Q1) and 75th (Q3) percentiles to measure spread.
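
A sketch comparing the two measures on invented data. Quartiles can be computed under several conventions; this example assumes the `"inclusive"` method of `statistics.quantiles`, so other software may report slightly different values for Q1 and Q3.

```python
import statistics


def value_range(values):
    """Maximum minus minimum."""
    return max(values) - min(values)


def iqr(values):
    """Q3 minus Q1, using the 'inclusive' quartile convention (an assumption)."""
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1


typical = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 90]  # largest value replaced by an extreme one

print(value_range(typical), iqr(typical))            # 8 4.0
print(value_range(with_outlier), iqr(with_outlier))  # 89 4.0
```

The single extreme value changes the range from 8 to 89 but leaves the IQR untouched.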

30
Q

What is variance in statistics?

A

Variance measures the average squared difference between each data point and the mean of the data set.

31
Q

How is standard deviation related to variance?

A

Standard deviation is the square root of the variance. It gives a measure of spread in the same units as the original data.

32
Q

What does a larger standard deviation indicate about a data set?

A

A larger standard deviation indicates that the data is more spread out and has greater variability.

33
Q

What are the three main measures of spread discussed in this content?

A

Range, Inter-Quartile Range (IQR), and Standard Deviation.

34
Q

How is the standard deviation different from the IQR and range?

A

Standard deviation is tied to the mean and summarizes how far data typically deviate from it, while the IQR and range describe spread using percentiles or extremes and are usually reported alongside the median.

35
Q

What are the steps to calculate the standard deviation?

A

Find the mean of the data set.
Subtract the mean from each data point to get the deviation of each value.
Square each deviation.
Sum all the squared deviations.
Divide the sum of squared deviations by the number of data points minus 1, to find the variance.
Take the square root of the variance to get the standard deviation.
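
The steps above can be sketched directly in Python. The function name and data are illustrative; the result is cross-checked against `statistics.stdev`, which also computes the sample standard deviation (dividing by n minus 1).

```python
import math
import statistics


def sample_std(values):
    n = len(values)
    mean = sum(values) / n                   # step 1: mean
    deviations = [x - mean for x in values]  # step 2: deviation of each value
    squared = [d ** 2 for d in deviations]   # step 3: square each deviation
    total = sum(squared)                     # step 4: sum of squared deviations
    variance = total / (n - 1)               # step 5: divide by n - 1
    return math.sqrt(variance)               # step 6: square root


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, sum of squared deviations 32
print(sample_std(data))          # about 2.14 (square root of 32/7)
print(statistics.stdev(data))    # library value for comparison
```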

36
Q

What are some types of graphical displays used for categorical data?

A

Bar graphs and pie charts.

37
Q

What are some types of graphical displays used for continuous data?

A

Dot plots, histograms, and box plots (box-and-whisker plots).

38
Q

What does a box-and-whisker plot display?

A

A box-and-whisker plot displays the distribution of data: the bottom and top of the box mark the first (Q1) and third (Q3) quartiles, the line inside the box marks the median, the whiskers extend to the most extreme values within 1.5 times the interquartile range (IQR) of the box (or to the minimum and maximum if nothing lies beyond that), and outliers are plotted individually.
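
The numbers behind such a plot can be computed without drawing anything. This sketch assumes the common 1.5 × IQR rule for whiskers and the `"inclusive"` quartile method of `statistics.quantiles`; plotting libraries may use slightly different quartile conventions. The function name and data are made up.

```python
import statistics


def box_plot_summary(values):
    ordered = sorted(values)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme points that are still inside the fences.
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(summary)
# Q1 = 3, median = 5, Q3 = 7, so the fences are -3 and 13:
# the whiskers run from 1 to 8 and 30 is drawn as an outlier.
```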

39
Q

What does skewness refer to in a data set?

A

Skewness refers to the asymmetry in the distribution of data around its mean.

40
Q

How is data described if it is “skewed to the right” (positively skewed)?

A

Data is “skewed to the right” if the majority of the data points are smaller than the mean, with more outliers on the right. This is more common in medicine.

41
Q

How is data described if it is “skewed to the left” (negatively skewed)?

A

Data is “skewed to the left” if the majority of the data points are larger than the mean, with more outliers on the left. This is less common.

42
Q

What does symmetric data imply about the relationship between the mean and median?

A

Symmetric data implies that the mean and median are equal (or approximately equal in a real sample).

43
Q

What happens to the mean and median in a right-skewed data set?

A

In a right-skewed data set, the mean is usually greater than the median.

44
Q

What happens to the mean and median in a left-skewed data set?

A

In a left-skewed data set, the mean is usually less than the median.

45
Q

What is the benefit of using graphical representations of data?

A

Graphical representations can often convey information more effectively than summary statistics, providing a clearer understanding of data patterns.

46
Q

How can skewness often be spotted in data?

A

Skewness can usually be spotted in histograms or box plots.

47
Q

What are the key characteristics represented in a box-and-whisker plot?

A

Key characteristics include the median, quartiles (Q1, Q3), interquartile range (IQR), whiskers, and outliers.