Organizing, Displaying, and Describing Data Flashcards
What is a variable
- Any characteristic that can & does assume different values for different people, objects, or events being studied
What are the four measurement scales for variables
- Nominal
- Ordinal
- Interval
- Ratio
Describe nominal
- Numbers are simply used as a code to represent characteristics.
- There is no order to the categories.
- The assignment of numbers to categories is arbitrary
- Ex: gender or ethnicity
Describe ordinal
- Numbers represent categories that can be placed in a meaningful numerical order (e.g., from lowest to highest).
- There is no information regarding the size of the interval between the different values.
- The size of the interval may be different between the different categories.
- There is no “true” zero.
- Ex: pain scale where 1 = no pain, 2 = a little pain, 3 = some pain, 4 = a lot of pain
Describe interval
- Numbers can be placed in meaningful order.
- The intervals between the numbers are equal.
- It is possible to add and subtract across an interval scale.
- There is no true zero, so ratios cannot be calculated.
- Ex: Fahrenheit temp., SAT, or GRE
Describe ratio
- Numbers can be placed in meaningful order.
- The intervals between the numbers are equal.
- There is a “true” zero, determined by nature, which represents the absence of the phenomenon.
- Almost all biomedical measures (weight, pulse rate, and cholesterol level) are of ratio scale.
- Ex: weight, age, # of min. spent exercising, cholesterol level, or # of wks pregnant
What is the goal of displaying data
- To get a feeling for the distribution of the data
Define the parts of displaying data
- Central tendency: the typical or central values around which the data cluster
- Dispersion: how the values are spread out
- Shape and skewness: symmetry or asymmetry of the distribution of the values
- Outliers: unusual values that do not fit the pattern of the data
Describe frequency distributions
- A table that shows classes or intervals of data with a count of the number in each class. The frequency (f) of a class is the number of data points in the class.
Define class width
- The distance b/w lower (or upper) limits of consecutive classes
Define range
- The difference b/w the max and min data entries
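A minimal sketch of building a frequency distribution, using a made-up data set and a hypothetical helper name (neither is from the cards). It assumes integer data and rounds the class width up so that every value lands in exactly one class.

```python
# Hypothetical integer data set, for illustration only
data = [12, 15, 17, 21, 22, 25, 28, 30, 33, 38, 41, 45]

def frequency_distribution(values, num_classes):
    """Return (range, class width, list of (lower, upper, frequency))."""
    lo, hi = min(values), max(values)
    data_range = hi - lo                      # range = max - min
    width = data_range // num_classes + 1     # round up so all values are covered
    classes = []
    for i in range(num_classes):
        lower = lo + i * width                # lower limits are `width` apart
        upper = lower + width - 1             # integer data, so classes do not overlap
        f = sum(lower <= v <= upper for v in values)
        classes.append((lower, upper, f))
    return data_range, width, classes

data_range, width, table = frequency_distribution(data, 4)
print("range:", data_range, "class width:", width)
for lower, upper, f in table:
    print(f"{lower}-{upper}: f = {f}")
```

The frequencies add up to the number of entries, which is a quick check that the classes are mutually exclusive and exhaustive.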
Describe histograms
- A way of organizing the data in visual form
- Data have to be at least ordinal in scale
What are the rules for histogram construction
- The values of the variable being graphed are on the x-axis
- Class intervals are used (mutually exclusive, exhaustive, & even widths)
- The bars of the histogram touch
Describe a stem and leaf plot
- Each number is separated into a stem (usually the entry’s leftmost digits) and a leaf (usually the rightmost digit)
- Allows us to see the shape of the data as well as the actual values
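A small sketch of a stem and leaf plot, assuming non-negative whole-number data where the last digit is the leaf and the remaining digits are the stem. The scores and the function name are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 83 -> stem 8, leaf 3
        plot[stem].append(leaf)
    return dict(plot)

scores = [67, 72, 75, 78, 81, 83, 83, 89, 90, 94]  # hypothetical scores
plot = stem_and_leaf(scores)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))
# 6 | 7
# 7 | 2 5 8
# 8 | 1 3 3 9
# 9 | 0 4
```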
What is the advantage and disadvantage of using a graphical method for describing data
- Advantage: Its visual representation
- Disadvantage: Its unsuitability for making inferences (our main goal)
What are some graphical methods for describing data
- Frequency distribution table
- Histograms
- Stem and leaf plot
- Pie chart
- Scatter plot
- Time series chart
Describe the differences between mode, median, and mean
- Mode: most frequently recurring value (appropriate for nominal, ordinal, interval, & ratio data); if no entry is repeated then there is no mode
- Median: the value that is in the middle of the distribution (appropriate for ordinal, interval, & ratio data); the middle entry when all entries are put in order, & if there is an even # of entries take the mean of the 2 middle values
- Mean: the arithmetic average of the distribution (appropriate for interval & ratio data); sum of all values divided by total entries
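A short sketch of the three measures on a made-up data set. `median` and `mean` come from Python's standard `statistics` module; `modes` is a hypothetical helper written here so that a data set with no repeated entry reports no mode, matching the card above.

```python
from collections import Counter
from statistics import mean, median

def modes(values):
    """Return the most frequent value(s); an empty list if no value repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:                       # nothing repeats, so there is no mode
        return []
    return [v for v, c in counts.items() if c == top]

data = [2, 3, 3, 5, 7, 10]            # hypothetical data, even # of entries

print("mode:", modes(data))           # 3 occurs twice
print("median:", median(data))        # mean of the 2 middle values, 3 and 5
print("mean:", mean(data))            # 30 / 6
print("no mode:", modes([1, 2, 3]))   # no entry repeats
```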
Define an outlier
- A data entry that is far removed from the other entries in the data set
Comparing mean, median, and mode, which ones are affected by an outlier
- Mean is affected, while median and mode are largely not influenced by extreme values
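To illustrate, the sketch below replaces the largest value of a made-up data set with an extreme one and compares the three measures before and after. The numbers are illustrative only.

```python
from statistics import mean, median, mode

original = [10, 12, 12, 13, 15]        # hypothetical data
with_outlier = [10, 12, 12, 13, 150]   # largest value replaced by an outlier

for label, values in (("original", original), ("with outlier", with_outlier)):
    print(label, "mean:", mean(values), "median:", median(values), "mode:", mode(values))
```

In this example the mean jumps from 12.4 to 39.4, while the median and mode both stay at 12.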
Define midrange
- The average of the highest and lowest value in the data set
- Very easy to find but highly affected by the extreme values
Describe a weighted mean
- It’s the mean of a data set whose entries have varying weights
- Ex: homework is 30%, exams are 50%, and projects are 20% of your final grade
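A sketch of the weighted mean for the grading example above. The weights come from the card; the category scores (90, 80, 85) are made up for illustration.

```python
def weighted_mean(values, weights):
    """Sum of (value * weight) divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

scores = [90, 80, 85]     # hypothetical homework, exam, and project scores
weights = [30, 50, 20]    # percentages from the example

final_grade = weighted_mean(scores, weights)
print(final_grade)        # (2700 + 4000 + 1700) / 100 = 84.0
```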
What are the measures of dispersion and their goal
- Goal is to get a feeling for the spread of the data
- Range: difference b/w the highest & lowest value in a data set (appropriate for ordinal, interval, & ratio data)
- Interquartile range: difference b/w the third & first quartiles, i.e., the spread of the middle 50% of the data (appropriate for ordinal, interval, & ratio data)
- Standard deviation: roughly the average distance of the data points from the mean (appropriate for interval & ratio data)
Describe symmetrical distributions
- Data are evenly distributed about the center
- There is the same amount of data on the right & left side of the distribution
- Not all symmetrical distributions are “normal”
Describe skewed distributions
- Data are not evenly distributed about the center
- Can be “right skewed” or “left skewed”
Define deviation
- Difference b/w the entry & the mean of the data set
Guidelines for finding the sample standard deviation
1) Find the mean of the sample data set
2) Find the deviation of each entry
3) Square each deviation
4) Add to get the sum of squares
5) Divide by n-1 to get the sample variance
6) Find the square root of the variance to get the sample standard deviation
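The six steps above can be sketched directly in code. The data set and function name are made up for illustration.

```python
from math import sqrt

def sample_std_dev(values):
    n = len(values)
    mean = sum(values) / n                      # 1) mean of the sample
    deviations = [x - mean for x in values]     # 2) deviation of each entry
    squared = [d ** 2 for d in deviations]      # 3) square each deviation
    sum_of_squares = sum(squared)               # 4) sum of squares
    variance = sum_of_squares / (n - 1)         # 5) sample variance
    return sqrt(variance)                       # 6) sample standard deviation

data = [2, 4, 4, 4, 5, 5, 7, 9]                 # hypothetical sample, mean = 5
s = sample_std_dev(data)
print(round(s, 3))                              # sqrt(32 / 7), about 2.138
```

Python's standard library function `statistics.stdev` uses the same n-1 divisor and can serve as a cross-check.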
Describe the empirical rule for standard deviation
- For data with a (symmetric) bell-shaped distribution the standard deviation has the following characteristics
1) ~68% of the data lie within 1 standard deviation of the mean
2) ~95% of the data lie within 2 standard deviations of the mean
3) ~99.7% of the data lie within 3 standard deviations of the mean
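As a check on these percentages, the sketch below computes the area under an exactly normal curve within 1, 2, and 3 standard deviations of the mean, using `statistics.NormalDist` from the standard library (Python 3.8+). Real bell-shaped data will only match these figures approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```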
Describe standard error
- The values of a specific variable from a sample are an estimate of the entire population of individuals who might have been eligible for the study
- A measure of the precision of a sample in estimating the population parameter
- Dependent on sample size: larger the sample, the smaller the standard error
Standard error of the mean equation
- Standard deviation ÷ square root of (sample size)
- if sample greater than 60
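A minimal sketch of the standard error of the mean, showing how it shrinks as the sample grows. The standard deviation and sample sizes are made up for illustration.

```python
from math import sqrt

def standard_error(sd, n):
    """Standard deviation divided by the square root of the sample size."""
    return sd / sqrt(n)

sd = 10                               # hypothetical sample standard deviation
for n in (25, 100, 400):
    print(n, standard_error(sd, n))   # 2.0, 1.0, 0.5
```

Quadrupling the sample size halves the standard error.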
Describe confidence intervals
- Range of values which we can be confident includes the true value
- Defines the “inner zone” about the central index (mean, proportion or ratio)
- Describes variability in the sample from the mean or center
- Will find CI used in describing the difference b/w means or proportions when doing comparisons b/w groups
- Ex: 95% CI indicates that we are 95% confident that the population mean will fall within the range described
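A sketch of one common way to compute a 95% CI for a mean: mean ± z × standard error. This assumes a sample large enough for the normal approximation (the cards do not specify a method), and the summary numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(mean, sd, n, level=0.95):
    """Normal-approximation CI for a mean (an assumption; small samples often use t instead)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% CI
    margin = z * sd / sqrt(n)                   # z times the standard error
    return mean - margin, mean + margin

low, high = confidence_interval(mean=120, sd=15, n=100)   # hypothetical summary values
print(round(low, 2), round(high, 2))                      # 117.06 122.94
```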
Describe quartiles & percentiles
- Useful for comparing scores within one data set
- Ex: if a score is in the 80th percentile (P80) it means that 80% of all the scores fall at or below this score in the distribution & 20% of all the scores fall above this value
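A small sketch of that "at or below" definition, using made-up scores and a hypothetical helper name.

```python
def percentile_rank(scores, score):
    """Percent of all scores that fall at or below the given score."""
    at_or_below = sum(s <= score for s in scores)
    return 100 * at_or_below / len(scores)

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]   # hypothetical scores
print(percentile_rank(scores, 90))                   # 8 of 10 scores are <= 90, so 80.0
```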
Describe quartiles
- The 3 quartiles, Q1, Q2, and Q3 approximately divide an ordered data set into four equal parts
- Q1 is the median of the data below Q2
- Q2 is the median
- Q3 is the median of the data above Q2
Describe the interquartile range (IQR)
- The difference b/w the third and first quartiles
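A sketch of the quartiles and IQR using the "median of each half" definition from the cards above. The data set and function name are made up; other conventions (including `statistics.quantiles`) can give slightly different values.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3): Q2 is the median, Q1 and Q3 are the medians of each half."""
    s = sorted(values)
    mid = len(s) // 2
    lower = s[:mid]                                      # data below Q2
    upper = s[mid + 1:] if len(s) % 2 else s[mid:]       # data above Q2
    return median(lower), median(s), median(upper)

data = [1, 3, 4, 6, 7, 8, 10, 12, 15]                    # hypothetical data
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3, iqr)                                   # 3.5 7 11.0 7.5
```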
Define fractiles, percentiles, and deciles
- Fractiles: numbers that partition, or divide, an ordered data set
- Percentiles: they divide an ordered data set into 100 parts (there are 99 percentiles)
- Deciles: they divide an ordered data set into 10 parts (there are 9 deciles)
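As an illustration of the counts above, `statistics.quantiles` from the standard library (Python 3.8+) returns the cut points that split a data set into n parts, so there is always one fewer cut point than parts. The data here are made up.

```python
from statistics import quantiles

data = list(range(1, 201))            # hypothetical data: 1, 2, ..., 200

deciles = quantiles(data, n=10)       # 9 cut points -> 10 parts
percentiles = quantiles(data, n=100)  # 99 cut points -> 100 parts
quartile_cuts = quantiles(data, n=4)  # 3 cut points -> 4 parts

print(len(deciles), len(percentiles), len(quartile_cuts))   # 9 99 3
```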
Define hypothesis
- Statement about a population, where a certain parameter takes a particular numerical value or falls in a certain range of values.
Define null hypothesis (H0)
- “Innocent until proven guilty”
- Usually states that no difference b/w test groups really exists
- A fundamental concept in research is “rejecting” or “conceding” (failing to reject) the H0
Limitations of significance tests
- Statistical significance does not mean practical significance
- Significance tests don’t tell us about the size of the effect (like a CI does)
- Some tests may be “statistically significant” just by chance
- Be skeptical when you hear reports of new medical advances
- There may be no actual effect
- If an effect does exist, we may be seeing a sample outcome in the right-hand tail of the sampling distribution of possible sample effects, and the actual effect may be much weaker than reported.
Difference between confidence interval and P-value
- CI will give information about the size of the difference & the strength of the evidence
- P-value will tell you whether or not there is a statistically significant difference
Describe clinical importance
- A medical judgement, not a statistical one
- Clinicians should change practice only if they believe the study has definitively demonstrated a treatment difference and that the treatment difference is large enough to be clinically important.
Clinical importance depends on your knowledge of
- A range of possible treatments
- Their costs
- Their side effects