Statistics Flashcards
Categorical Data
“observations or variables that are classified or categorized by means of labels or other descriptive terms. In the examples listed above, the AB blood type and the 5-point scale (ranging from ‘poor’ to ‘excellent’) are categorical measures” (6)
Quantitative Data
“observations, counts and measures made by determining quantities or by assigning number values to the data” (6)
Nominal Data
Categorical data that “fall into categories that have no inherent order (ie, the categories serve simply as names). Examples of such categories are occupation, country of birth, language spoken at home, and blood type” (6)
Dichotomous (Binary) Data
“a special kind of nominal data that fall into one of only 2 possible categories and are frequently used in health research. Examples of the categories used to describe dichotomous data are female/male, treatment/control, diseased/not diseased, and alive/dead” (6)
Ordinal Data
“exhibit an inherent ranking (eg, from lowest to highest)” (6)
Discrete Data
“whole numbers; their values are presented only as integers” (7). Examples include the number of teeth with cavities, the number of pregnancies, and the number of children in a family (7).
Continuous Data
“include a full range of possible fractional values—that is, no matter how close any 2 values are to one another, other values always exist between them” (7). Examples are blood pressure, height, temperature, and age (7-8).
True Zero Point
For example, birth (age 0) or 0 K (absolute zero). A true zero matters because “relative measures involving multiplication and division (eg, x is twice as big as y) can be performed only with continuous data that have a true zero point” (8)
Ratio Data
“Continuous data with a true zero point” (8).
Interval Data
Continuous data without a true zero point (8).
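The difference between interval and ratio data can be checked with a little arithmetic. The Python sketch below uses two hypothetical temperatures (illustrative values, not from the source).

- On the Celsius scale, 0 °C is not a true zero, so dividing the two temperatures misleadingly suggests one is "twice" the other.
- On the Kelvin scale, 0 K is a true zero. Converted to kelvins, the same two temperatures differ by only about 3.5%.
- Differences between temperatures are meaningful on both scales.

```python
def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins; 0 K (absolute zero) is a true zero point."""
    return c + 273.15

# Hypothetical temperatures in degrees Celsius (illustrative only).
warm, cool = 20.0, 10.0

# Interval scale: 0 C is an arbitrary reference, so this ratio is not meaningful.
celsius_ratio = warm / cool

# Ratio scale: 0 K is a true zero, so dividing kelvin values is meaningful.
kelvin_ratio = celsius_to_kelvin(warm) / celsius_to_kelvin(cool)

# Differences are meaningful on both scales (a 10-degree gap either way).
celsius_diff = warm - cool
kelvin_diff = celsius_to_kelvin(warm) - celsius_to_kelvin(cool)

print(f"Celsius ratio: {celsius_ratio:.2f}")
print(f"Kelvin ratio:  {kelvin_ratio:.3f}")
print(f"Difference in C vs K: {celsius_diff:.1f} vs {kelvin_diff:.1f}")
```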
Descriptive Statistics
“provide the most basic form of data summarization and are usually the starting point of any data analysis” (13-14). For example, a summary of the heart rates (beats per minute, bpm) of 40 medical writers and editors.
Graphical Displays
“frequently used to summarize and present data” (14)
Frequency Distribution
A graphical display that indicates how often each value occurs in the data set (14).
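A frequency distribution starts as a count of how often each value occurs. This Python sketch tallies a hypothetical set of 5-point ratings (made-up values) with `collections.Counter`. It then prints a crude text bar chart, with one `#` per occurrence, as a stand-in for a real graph.

```python
from collections import Counter

# Hypothetical 5-point ratings (1 = poor ... 5 = excellent); not data from the source.
ratings = [3, 4, 4, 5, 2, 4, 3, 5, 4, 1, 3, 4]

def frequency_table(values):
    """Return (value, count) pairs sorted by value: a tabular frequency distribution."""
    return sorted(Counter(values).items())

for value, count in frequency_table(ratings):
    # Text "bar chart": one # per occurrence of the value.
    print(f"{value}: {'#' * count} ({count})")
```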
Central Tendency
The tendency of data values to cluster near a central value (15).
Variability/Dispersion/Spread
The spread of the data points (15).
2 Needed Elements to Summarize Data
1) Measure of central tendency
2) Measure of variability (16).
Mean (arithmetic)
The average value (16), calculated as the sum of all values divided by the number of values.
Median
Middle value of a sequentially ordered set of data, or its 50th percentile: the value at which half of the data points are higher and half are lower (16).
Mode
Most frequently occurring value in a set of data (16).
3 Measures of Central Tendency
Mean, median, mode
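The three measures of central tendency can be computed with Python's standard `statistics` module. The heart-rate values below are hypothetical, chosen so that the mean, median, and mode all differ.

The hypothetical helper `median_by_hand` mirrors the flashcard definition of the median:

- It sorts the values.
- For an odd count, it takes the single middle value.
- For an even count, it averages the two middle values.

```python
import statistics

# Hypothetical heart rates in beats per minute (illustrative only).
heart_rates = [60, 64, 64, 70, 72, 75, 80, 95]

mean_hr = statistics.mean(heart_rates)      # sum / count
median_hr = statistics.median(heart_rates)  # middle of the sorted values
mode_hr = statistics.mode(heart_rates)      # most frequent value

def median_by_hand(values):
    """Hypothetical helper: median from sorted values, per the definition above."""
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2:                   # odd count: the single middle value
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2  # even count: average of the two middle values

print(f"Mean {mean_hr}, median {median_hr}, mode {mode_hr}")
print(f"Median by hand: {median_by_hand(heart_rates)}")
```

The single high value (95) pulls the mean above the median, which is one reason the median is often preferred for skewed data.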
Range
Difference between the lowest and highest values of a data set (17). Usually reported as the minimum value followed by the maximum value.
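A minimal Python sketch of the range, using the same hypothetical heart rates. It reports the minimum and maximum, as the card suggests, and also the single difference between them. The variable is named `value_range` to avoid shadowing Python's built-in `range`.

```python
# Hypothetical heart rates in beats per minute (illustrative only).
heart_rates = [60, 64, 64, 70, 72, 75, 80, 95]

lowest, highest = min(heart_rates), max(heart_rates)
value_range = highest - lowest  # the range expressed as one number

print(f"Range: {lowest} to {highest} bpm (difference of {value_range} bpm)")
```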
Interquartile range (IQR)
“the range between the data’s first quartile (25th percentile) and third quartile (75th percentile)—the middle 50% of data values” (17). Often reported with the median, allowing one to work out the quartiles (17).
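A sketch of the IQR reported alongside the median, using Python's `statistics.quantiles` (Python 3.8+) on the same hypothetical heart rates.

Software packages use different quartile conventions. The `method="exclusive"` setting used here is the module's default, and another convention (eg, `method="inclusive"`) can give slightly different quartiles for the same data.

```python
import statistics

# Hypothetical heart rates in beats per minute (illustrative only).
heart_rates = [60, 64, 64, 70, 72, 75, 80, 95]

# n=4 returns the three cut points that split the data into quarters:
# Q1 (25th percentile), Q2 (median), Q3 (75th percentile).
q1, q2, q3 = statistics.quantiles(heart_rates, n=4, method="exclusive")
iqr = q3 - q1  # width of the middle 50% of values

print(f"Median {q2} (IQR {q1} to {q3}); IQR width {iqr}")
```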
Standard Deviation (SD)
“the average distance between each data point and the mean value of the distribution” (18). *“The SD should be used and reported with the mean only when the data are normally distributed (or nearly so)” (18).
Standard Deviation Calculation
- “Calculate the mean.
- Determine the distance of each data value from the mean.
- Square each of these calculated differences (distances) from the mean to convert the negative values to positive values….(Note: If we don’t perform this step, then when we add these 40 differences, as we will do in the next step, we will obtain a total of zero because all of the positive differences will be exactly balanced by the negative differences.)
- Add these 40 squared differences (this yields a number called the sum of squares) and divide the total by the number of values minus 1…to obtain the average squared distance from the mean….This number is known as the variance. (The sum of the squares is divided by n-1 rather than by n because this slight increase in the average distance provides a more accurate estimate of the variability of the data.)
- Finally, find the square root of the variance…to determine the standard deviation” (18-19)
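The steps above can be followed one line at a time in Python. This sketch uses 8 hypothetical heart rates instead of the source's 40 values.

The script does three things:

- It confirms the note in step 3 that the unsquared differences sum to zero.
- It computes the SD by hand, dividing by n − 1.
- It compares the result with `statistics.stdev`, which also divides by n − 1. The related `statistics.pstdev` divides by n instead.

```python
import math
import statistics

# Hypothetical heart rates in beats per minute (illustrative only).
heart_rates = [60, 64, 64, 70, 72, 75, 80, 95]

def sample_sd(values):
    """Sample standard deviation following the steps listed above."""
    n = len(values)
    mean = sum(values) / n                   # step 1: calculate the mean
    deviations = [x - mean for x in values]  # step 2: distance of each value from the mean
    squared = [d ** 2 for d in deviations]   # step 3: square to remove negative signs
    sum_of_squares = sum(squared)            # step 4: add the squared differences...
    variance = sum_of_squares / (n - 1)      # ...and divide by n - 1 to get the variance
    return math.sqrt(variance)               # step 5: square root of the variance

mean = sum(heart_rates) / len(heart_rates)
# Without squaring, positive and negative differences cancel (zero up to float rounding).
print("Sum of unsquared differences:", sum(x - mean for x in heart_rates))

sd = sample_sd(heart_rates)
print(f"SD by hand:       {sd:.3f}")
print(f"statistics.stdev: {statistics.stdev(heart_rates):.3f}")
```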
3 Measures of Variability
Range, Interquartile Range, Standard Deviation