Statistics Flashcards

1
Q

Categorical Data

A

“observations or variables that are classified or categorized by means of labels or other descriptive terms. In the examples listed above, the AB blood type and the 5-point scale (ranging from ‘poor’ to ‘excellent’) are categorical measures” (6)

2
Q

Quantitative Data

A

“observations, counts and measures made by determining quantities or by assigning number values to the data” (6)

3
Q

Nominal Data

A

categorical data that “fall into categories that have no inherent order (ie, the categories serve simply as names). Examples of such categories are occupation, country of birth, language spoken at home, and blood type” (6)

4
Q

Dichotomous (Binary) Data

A

“a special kind of nominal data that fall into one of only 2 possible categories and are frequently used in health research. Examples of the categories used to describe dichotomous data are female/male, treatment/control, diseased/not diseased, and alive/dead” (6)

5
Q

Ordinal Data

A

“exhibit an inherent ranking (eg, from lowest to highest)” (6)

6
Q

Discrete Data

A

“whole numbers; their values are presented only as integers” (7). Examples are number of teeth with cavities, number of pregnancies, number of children in a family (7)

7
Q

Continuous Data

A

“include a full range of possible fractional values—that is, no matter how close any 2 values are to one another, other values always exist between them” (7). Examples are blood pressure, height, temperature, and age (7-8).

8
Q

True Zero Point

A

E.g. birth or Kelvin 0. “relative measures involving multiplication and division (eg, x is twice as big as y) can be performed only with continuous data that have a true zero point” (8)

9
Q

Ratio Data

A

“Continuous data with a true zero point” (8).

10
Q

Interval Data

A

Continuous data without a true zero point (8).

11
Q

Descriptive Statistics

A

“provide the most basic form of data summarization and are usually the starting point of any data analysis” (13-14). E.g., the heart rates (in bpm) of 40 medical writers and editors.

12
Q

Graphical Displays

A

“frequently used to summarize and present data” (14)

13
Q

Frequency Distribution

A

A graphical display that indicates how often each value occurs in the data set (14).

14
Q

Central Tendency

A

the data’s tendency to cluster near the central value (15).

15
Q

Variability/Dispersion/Spread

A

The spread of the data points (15).

16
Q

2 Needed Elements to Summarize Data

A

1) Measure of central tendency

2) Measure of variability (16).

17
Q

Mean (arithmetic)

A

The average value (16)

18
Q

Median

A

Middle value of a sequential set of data, or its 50th percentile: the value at which half of the data points are higher and half are lower (16)

19
Q

Mode

A

Most frequently occurring value in a set of data (16).

20
Q

3 Measures of Central Tendency

A

Mean, median, mode
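
A minimal sketch of the three measures, using Python's standard `statistics` module. The heart-rate values are hypothetical and chosen only for illustration; they are not data from the cited book.

```python
import statistics

# Hypothetical heart rates (bpm); illustrative values only.
heart_rates = [62, 70, 70, 74, 78, 80, 96]

mean_value = statistics.mean(heart_rates)      # arithmetic average
median_value = statistics.median(heart_rates)  # middle value of the sorted data
mode_value = statistics.mode(heart_rates)      # most frequently occurring value

print(mean_value, median_value, mode_value)
```

In this small example the single high value (96) pulls the mean above the median, which is one reason the median is preferred for skewed data.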

21
Q

Range

A

Difference between lowest and highest values of a data set (17). Usually written as minimum followed by maximum value.

22
Q

Interquartile range (IQR)

A

“the range between the data’s first quartile (25th percentile) and third quartile (75th percentile)—the middle 50% of data values” (17). Often reported with the median, allowing one to work out quartiles (17).

23
Q

Standard Deviation (SD)

A

“the average distance between each data point and the mean value of the distribution” (18). *“The SD should be used and reported with the mean only when the data are normally distributed (or nearly so)” (18).

24
Q

Standard Deviation Calculation

A
  1. “Calculate the mean.
  2. Determine the distance of each data value from the mean.
  3. Square each of these calculated differences (distances) from the mean to convert the negative values to positive values….(Note: If we don’t perform this step, then when we add these 40 differences, as we will do in the next step, we will obtain a total of zero because all of the positive differences will be exactly balanced by the negative differences.)
  4. Add these 40 squared differences (this yields a number called the sum of squares) and divide the total by the number of values minus 1…to obtain the average squared distance from the mean….This number is known as the variance. (The sum of the squares is divided by n-1 rather than by n because this slight increase in the average distance provides a more accurate estimate of the variability of the data.)
  5. Finally, find the square root of the variance…to determine the standard deviation” (18-19)
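
The five steps above can be sketched in Python as follows. The function name `sample_sd` and the data values are hypothetical, introduced only for illustration; the result is cross-checked against the standard library's `statistics.stdev`, which also divides by n-1.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation, following the five steps on the card."""
    n = len(values)
    mean = sum(values) / n                    # step 1: calculate the mean
    deviations = [v - mean for v in values]   # step 2: distance of each value from the mean
    squared = [d ** 2 for d in deviations]    # step 3: square each distance
    variance = sum(squared) / (n - 1)         # step 4: sum of squares divided by n - 1
    return math.sqrt(variance)                # step 5: square root of the variance

# Hypothetical heart rates (bpm); illustrative values only.
data = [72, 75, 78, 80, 85]
sd = sample_sd(data)

print(sd, statistics.stdev(data))
```

For these values the mean is 78, the sum of squares is 98, and the variance is 98 / 4 = 24.5, so the SD is the square root of 24.5.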
25
Q

3 Measures of Variability

A

Range, Interquartile Range, Standard Deviation
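
A minimal sketch computing all three measures with Python's standard library. The values are hypothetical. Note that quartile conventions differ between textbooks and software; this sketch simply uses the default method of `statistics.quantiles`, which is an assumption and may not match the book's method.

```python
import statistics

# Hypothetical heart rates (bpm); illustrative values only.
values = [62, 70, 70, 74, 78, 80, 96]

# Range: difference between the highest and lowest values.
data_range = max(values) - min(values)

# Quartiles: n=4 returns the three cut points (Q1, median, Q3).
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1  # width of the middle 50% of the data

# Sample standard deviation (divides the sum of squares by n - 1).
sd = statistics.stdev(values)

print(data_range, (q1, q3), iqr, sd)
```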

26
Q

3 Properties of Normal/Bell-Shaped/Gaussian Distributions

A

1) Mean, median, and mode are exactly equal in value.
2) Frequency distribution is symmetrical about its center (the mean).
3) “[T]he area under the curve can be precisely defined by the mean and standard deviation (SD), as follows:
a) ~68% of the data points fall within ± 1 SD of the mean,
b) ~95% of the data points fall within ± 2 SD of the mean,
c) ~99% of the data points fall within ± 3 SD of the mean” (19)
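
The percentages in property 3 can be checked numerically. This sketch uses `statistics.NormalDist` from Python's standard library; the helper name `share_within` is hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, SD 1.
standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution lying within ±k SD of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {share_within(k):.4f}")
```

The computed proportions are roughly 0.68, 0.95, and 0.997, in line with the rounded figures on the card.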

27
Q

Skewed distribution

A

Asymmetrical (non-normal/-Gaussian) distributions of data (19).

28
Q

Outlier

A

A data point far away from the center of the data distribution (20).

29
Q

3 Examples of Skewed Distributions

A

1) Laboratory values for which the clinically normal (ie, expected) range lies near zero (eg, serum bilirubin or urinary protein concentration)…because abnormal values can be quite high on one side of the distribution but can never be less than zero on the other side.
2) Birth weight
3) Number of days in a hospital stay (22).

30
Q

How to describe skewed distributions

A

Use the median and interquartile range (IQR) (22).

31
Q

Reporting consequence for most biological data not being normally distributed

A

The median and IQR should appear more frequently in medical literature than the mean and SD (22).

32
Q

2 Ways to Recognize Skewed Distributions from Mean and SD

A

1) Difference between mean and SD produces an impossible result (eg, negative tumor size)
2) Mean differs markedly from median (23-24)
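
Both warning signs can be checked from summary statistics. The sketch below is illustrative only: the function name, the `lower_bound` parameter, and the hospital-stay values are hypothetical, and what counts as a "marked" gap between mean and median is left to the reader's judgment.

```python
import statistics

def skew_warning_signs(values, lower_bound=0):
    """Report the two warning signs for data that cannot fall below lower_bound."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)
    return {
        # Sign 1: mean minus SD gives an impossible value (eg, negative days).
        "mean_minus_sd_below_bound": mean - sd < lower_bound,
        # Sign 2: size of the gap between mean and median.
        "mean_vs_median_gap": mean - median,
    }

# Hypothetical lengths of hospital stay (days), with one very long stay.
stays = [1, 1, 2, 2, 3, 3, 4, 30]
signs = skew_warning_signs(stays)

print(signs)
```

Here the single 30-day stay inflates both the mean and the SD, so the mean minus the SD falls below zero even though a stay cannot be negative.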

33
Q

Parametric Statistical Tests

A

Those that should be applied to normally distributed data (24)

34
Q

Nonparametric Statistical Tests

A

Those that should be applied to non-normally distributed data (24)

35
Q

How to Summarize Nominal Data

A

Using simple counts or percentages of categories; depicted with dot, pie, or bar charts (25).

36
Q

How to Summarize Ordinal Data

A

As for nominal data, using simple counts or percentages of categories; depicted with dot, pie, or bar charts (25).

37
Q

How to Summarize Discrete Data

A

Because they represent numeric counts, using measures of central tendency and dispersion (26).

38
Q

Census

A

Data from every individual in a population

39
Q

Standard Error of the Mean (SEM)

A

An estimate of the variability among sample means. It indicates how precisely a sample mean estimates the population mean, which the standard deviation of a single sample does not show (35-36).

40
Q

SEM Calculation

A

Standard deviation / [divided by] square root of sample size (36).
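
A minimal sketch of this calculation in Python. The function name and the data values are hypothetical, used only for illustration.

```python
import math
import statistics

def standard_error_of_mean(values):
    """SEM = sample standard deviation / square root of the sample size."""
    return statistics.stdev(values) / math.sqrt(len(values))

# Hypothetical heart rates (bpm); illustrative values only.
data = [72, 75, 78, 80, 85]
sem = standard_error_of_mean(data)

print(sem)
```

For these five values the sample variance is 24.5, so the SEM equals the square root of 24.5 / 5 = 4.9.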

41
Q

Importance of SEM

A

–“[P]erhaps the greatest value of the SEM is…that it enables us to determine the proportion of sample mean values that will be expected to fall within certain portions of the normal distribution (eg, 68% of all possible sample mean values will occur within ±1 SEM of the ‘true’ [population] mean value)” (36).

42
Q

How to report precision of mean estimate w/SEM

A

Eg, “The estimated mean (SEM) heart rate of the population is 80 (0.8) bpm” (36)

43
Q

In medical sciences, the preferred way of reporting precision of estimate

A

Mean with its 95% Confidence Interval (CI), which spans approximately the mean ± 2 SEMs.

44
Q

[95%] Confidence Interval (CI)

A

Along with the mean, the preferred way in medical sciences to report the precision of an estimate. The 95% CI spans approximately the mean ± 2 SEMs.

45
Q

How to reduce the width of the Confidence Interval

A

Increase the sample size (38).
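
A rough sketch of why a larger sample narrows the interval. It takes the cards' "2 SEMs" as the distance on either side of the mean, which is an approximation (the exact multiplier depends on sample size). The mean of 80 and SEM of 0.8 come from the reporting example on the earlier card; the SD of 8 and the sample sizes are hypothetical, as are the function names.

```python
import math

def sem_from_sd(sd, n):
    """SEM = SD / square root of the sample size."""
    return sd / math.sqrt(n)

def approx_ci_95(mean, sem):
    """Approximate 95% CI as mean ± 2 SEMs (rule of thumb, not exact)."""
    margin = 2 * sem
    return (mean - margin, mean + margin)

# Reporting example from the cards: mean 80 bpm, SEM 0.8 bpm.
low, high = approx_ci_95(80, 0.8)
print(low, high)

# Hypothetical SD of 8 bpm: quadrupling n halves the SEM and the CI width.
hypothetical_sd = 8
widths = {}
for n in (100, 400):
    ci_low, ci_high = approx_ci_95(80, sem_from_sd(hypothetical_sd, n))
    widths[n] = ci_high - ci_low
print(widths)
```

Because the SEM shrinks with the square root of the sample size, halving the interval width requires about four times as many observations.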

46
Q

Null Hypothesis

A

The hypothesis that the intervention has no effect on the data (45).

47
Q

Test Statistic

A

A description of how closely the observed data match the prediction of the null hypothesis. In the example, the number of SEMs between the observed mean heart rate difference and that predicted under the null hypothesis: (observed difference - null hypothesis difference) / SEM (47).

48
Q

One Key Takeaway of Statistics Book (via example)

A

Find the observed mean, SD, and SEM; see how much the observed mean differs from the mean (often 0) stated in the null hypothesis; divide that difference by the SEM to get the test statistic (the difference expressed in SEMs); consult a statistical (often t) table for the probability of a difference that large; and draw a conclusion based upon that probability (47).
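
The walkthrough above can be sketched in Python for paired data, where each value is one person's before/after difference. The differences and the function name are hypothetical. The sketch stops at the test statistic and degrees of freedom, since the final step on the card is a table lookup.

```python
import math
import statistics

def paired_test_statistic(differences, null_mean=0):
    """Return (t, degrees of freedom) for a set of paired differences."""
    n = len(differences)
    mean_diff = statistics.mean(differences)   # observed mean difference
    sd = statistics.stdev(differences)         # SD of the differences
    sem = sd / math.sqrt(n)                    # standard error of the mean
    t = (mean_diff - null_mean) / sem          # difference expressed in SEMs
    return t, n - 1

# Hypothetical changes in heart rate (bpm) for 8 people.
differences = [4, -2, 6, 3, 5, 1, 7, 0]
t_stat, df = paired_test_statistic(differences)

print(t_stat, df)
```

The resulting statistic would then be compared against a t table at the given degrees of freedom to obtain the P value.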

49
Q

P [ital] value

A

“[T]he probability that, if the null hypothesis is true, chance alone could have produced a result as extreme as the one observed” (47). I.e., the probability that this could have happened by chance.

50
Q

Common P value

A

.05: “As a matter of convention, researchers commonly set the P value (the level at which results will be classified as statistically significant) at .05. This means that they are willing to accept a 5% chance that they could wrongly conclude that an observed result was real when, in fact, it was due to chance” (47).

51
Q

Clinically Important

A

Demonstrating (in clinical trials, between the control and study group) a large enough difference to have a practical impact on the patient (47).

52
Q

Paired t [ital] Test

A

Performed on a “study [that] uses paired data[: for instance,] two measurements from the same person at different times” (48).

53
Q

Alpha (α) Level

A

The conventional p [ital] threshold of .05 (49).

54
Q

Common Features of Statistical Tests in Medical Science

A

“All of these tests determine the difference between the observed results and the results that would be expected according to the null hypothesis; they then standardize this difference between observed and expected results by dividing it by the applicable measure of variability (eg, the SEM)” (49).

55
Q

χ² (chi-square) Test

A

“used to assess the statistical significance of results obtained with categorical data” (49). [Mnemonic: “Chi” for “Categorical.”] Eg, a study of relapse/no relapse and percentages of individuals who fall into each camp (49).
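
A minimal sketch of the chi-square statistic for a relapse/no relapse table like the one mentioned above. The counts, group labels, and function name are hypothetical. The sketch uses the standard Pearson form (sum of (observed - expected)² / expected) without a continuity correction, which may differ from the book's worked example.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if group and outcome were unrelated (null hypothesis).
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected

    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows are treatment/control, columns are relapse/no relapse.
observed_counts = [
    [15, 35],
    [30, 20],
]
chi2, df = chi_square_statistic(observed_counts)

print(chi2, df)
```

As with the t test, the statistic is then looked up in a chi-square table at the given degrees of freedom to obtain the P value.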

56
Q

t [ital] Test

A

“used to determine the statistical significance of the difference between the means obtained from 2 (and only 2) independent samples” (49). [Mnemonic: “t” for “two”]

57
Q

Analysis of Variance (ANOVA) Test

A

“an extension of the t test…used to compare the means obtained from 3 or more groups” (50). N.b.: “The results of an ANOVA indicate only whether a statistically significant difference exists; they don’t indicate which group or groups are different from the others” (50).

58
Q

Parametric Statistical Tests

A

Tests involving “parameters (measurable characteristics) such as group means, SD, and variance.” The t [ital] [, paired t,] and ANOVA are parametric tests. N.b.: “researchers should use these parametric statistical tests only for data that are roughly normally distributed” (50).

59
Q

Nonparametric Statistical Tests

A

To be performed on study data that are not normally distributed (50).

60
Q

Wilcoxon Signed Rank Test

A

“[T]he nonparametric alternative to the paired t test” (50).

61
Q

Kruskal-Wallis Test

A

“[T]he nonparametric alternative to ANOVA” (50).

62
Q

Type I Error

A

“[M]istakenly concluding that there is an effect when in fact there isn’t one” (50)

63
Q

Type II Error

A

“[I]ncorrectly conclud[in]g that there is no effect, when in fact there really is one” (50).

64
Q

How to Present Statistical Significance and Clinical Importance

A

“many scientific publications expect authors to present both P values [for statistical significance] and [95%] confidence intervals [for clinical importance]” (51).