Module 3 Flashcards
when does the process of data analysis begin?
at the very start of any project, long before any data are collected or analyzed.
what are the principles of measurements?
- reliability
- validity
define reliability
the repeatability and consistency of a series of measurements
define validity
the accuracy of the measurement (whether it measures what it is intended to measure)
what does a descriptive technique do?
helps you find outliers or missing information that may harm your analysis and bias your interpretation
what is inferential statistics
allows you to make generalizations about populations from the sample that is collected
**to make inferences
what is descriptive statistics?
-used to organize, summarize, and report data
- can summarize large amounts of data in just a few numbers or a simple graphic display
- descriptive statistics= data reduction
what is the first step in any data analysis?
- describe the data
**describing the data is like performing the hx and physical exam on a patient before prescribing treatment and further testing
**descriptive statistics are the 'vital signs' of data
what does the level of measurement dictate?
-dictates what types of descriptive statistics can be performed
what are the types of categorical data?
- Nominal (names)
- gender, race, disease status
- Ordinal (ranked)
- satisfaction with care, pain, cancer stages
descriptive statistics for categorical data (3)
- Frequencies (counts) (n)
- Relative frequencies presented as percentages (%)
- graphic display
- bar chart for nominal
- histogram for ordinal; if a histogram is not available, use the bar chart
what does a frequency table do?
it takes a disorganized set of scores and groups together all individuals who have the same scores
relative frequency
measures the fraction of the entire group that is associated with each score
-> to compute the % associated with each score, first find the relative frequency, then multiply by 100
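A minimal sketch of a frequency table, using only the Python standard library. The function name `frequency_table` and the pain ratings are made up for illustration; they are not part of the module.

```python
from collections import Counter

def frequency_table(scores):
    """Return {score: (count, relative_frequency, percent)}."""
    counts = Counter(scores)          # groups identical scores together
    n = len(scores)
    table = {}
    for score, count in sorted(counts.items()):
        relative = count / n          # fraction of the whole group
        table[score] = (count, relative, relative * 100)
    return table

# Hypothetical ordinal data: self-reported pain for 8 patients
pain = ["mild", "severe", "mild", "moderate", "mild", "moderate", "mild", "severe"]
table = frequency_table(pain)

for score, (count, relative, percent) in table.items():
    print(f"{score:<10} n={count}  relative={relative:.2f}  percent={percent:.0f}%")
```

The relative frequencies always sum to 1, so the percentages sum to 100.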
what are types of continuous data?
- interval level
- ratio
what is interval level?
- equal intervals between each number on the scale
- no ‘true’ zero
what is ratio level?
- equal intervals between each number on the scale
- there is a true zero
description of continuous data (3)
- measures of central location (central tendency)
- measures of spread (dispersion)
- graphic display (histograms, stem-and-leaf plots, box-and-whisker plots)
what are the measures of central location
-mean: add all scores and divide by the number of scores
-median: 50% of the values are above the median and 50% are below the median
-mode: most frequently occurring value
-geometric mean
-midrange: the average of the smallest and largest values
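A short sketch of the five measures of central location, using Python's built-in `statistics` module. The scores are hypothetical. The midrange is computed here as (minimum + maximum) / 2, which is the usual definition but is not spelled out in the module notes.

```python
import statistics

values = [2, 4, 4, 8, 16]  # hypothetical scores

mean = statistics.mean(values)                  # sum of scores / number of scores
median = statistics.median(values)              # middle value of the sorted scores
mode = statistics.mode(values)                  # most frequently occurring score
geometric_mean = statistics.geometric_mean(values)  # n-th root of the product
midrange = (min(values) + max(values)) / 2      # assumed definition: (min + max) / 2

print(mean, median, mode, round(geometric_mean, 2), midrange)
```

In this example the single large score (16) pulls the mean above the median, which is the situation where the median is usually preferred.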
measures of central location: the decision of whether to use the mean or median is based on:
- the shape of the distribution (normal or asymmetrical)
- the presence or absence of outliers
measures of central location: mean is used when
- the data are normally distributed AND
- there are no outliers
measures of central location: median is used when
- the data are NOT normally distributed OR
- there are any outliers (even 1)
measures of spread (7)
- statistical range
- epidemiological range
- percentiles
- quartiles
- interquartile range (IQR)
- standard deviation (SD)
- coefficient of variation (CV)
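A rough sketch of percentiles, quartiles, IQR, and CV using the standard `statistics` module. The data are hypothetical. Quartile conventions differ between textbooks and software; this uses the default ("exclusive") method of `statistics.quantiles`, and the CV is taken as SD / mean x 100, which are assumptions rather than something the module notes specify.

```python
import statistics

data = [20, 25, 30, 35, 40, 45, 50]  # hypothetical values

# Quartiles: the three cut points that split the data into four parts
q1, q2, q3 = statistics.quantiles(data, n=4)

# Percentiles: 99 cut points; index 49 is the 50th percentile
percentiles = statistics.quantiles(data, n=100)
p50 = percentiles[49]

iqr = q3 - q1                        # interquartile range
sd = statistics.stdev(data)          # sample standard deviation
cv = sd / statistics.mean(data) * 100  # coefficient of variation, in percent

print(f"Q1={q1} Q2={q2} Q3={q3} IQR={iqr} CV={cv:.1f}%")
```

The 50th percentile, the second quartile, and the median are the same value.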
statistical range
- one number
- calculated as the largest value minus the smallest value
- if the largest value was 50 and the smallest was 20, the statistical range would be 30
epidemiological range
- two numbers
- both the minimum and the maximum values are reported
- if the largest value was 50 and the smallest was 20, the epidemiological range would be presented as 20, 50
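The two kinds of range can be sketched in a few lines of Python. The function names are illustrative, and the two middle values in the data are made up; only the minimum (20) and maximum (50) come from the cards above.

```python
def statistical_range(values):
    """One number: largest value minus smallest value."""
    return max(values) - min(values)

def epidemiological_range(values):
    """Two numbers: the minimum and the maximum."""
    return (min(values), max(values))

values = [20, 35, 42, 50]  # hypothetical data with min 20 and max 50

print(statistical_range(values))       # 30
print(epidemiological_range(values))   # (20, 50)
```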
standard deviation
a measure of the average distance between the observations and the mean
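"Average distance" is the intuition. The sketch below assumes the usual sample formula: the square root of the sum of squared deviations from the mean, divided by n - 1. It checks that formula against `statistics.stdev`. The data and the function name `sample_sd` are hypothetical.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation: sqrt(sum((x - mean)^2) / (n - 1))."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean = 5

sd = sample_sd(values)
print(round(sd, 3))                          # hand-computed SD
print(round(statistics.stdev(values), 3))    # same result from the standard library
```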
what are the most common values used to describe a set of data?
the mean and standard deviation
the most commonly used measures of spread are: (3)
- standard deviation
- interquartile range
- epidemiological range
what is the measure of spread determined by?
the choice of the measure of central tendency
data display for continuous data
- histogram
- stem and leaf plot: similar to a histogram
- shows the overall shape of the distribution
- shows individual data values
- box and whisker plot: conveys more info than both histogram and stem-and-leaf plot
- depicts the:
- overall distribution
- center of distribution
- quartiles
- outliers
**good for comparison across groups
what tests are used to assess normality?
- Shapiro-Wilk test
- Kolmogorov-Smirnov test
description of continuous data (4)
- frequency (number of observation)
- measures of central location
- measures of spread
- graphic display (histogram, stem and leaf plot, box and whisker plots)
how is the presence of outliers assessed?
with the box-and-whisker plot with Tukey fences
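A minimal sketch of Tukey fences, assuming the conventional multiplier of 1.5 x IQR (the cards do not state the multiplier). Values below Q1 - 1.5 x IQR or above Q3 + 1.5 x IQR are flagged as outliers. The data and the function name `tukey_outliers` are hypothetical, and the quartiles use the default method of `statistics.quantiles`, so other software may place the fences slightly differently.

```python
import statistics

def tukey_outliers(values, k=1.5):
    """Return (lower_fence, upper_fence, outliers) using Tukey's rule."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower = q1 - k * iqr   # lower fence
    upper = q3 + k * iqr   # upper fence
    outliers = [x for x in values if x < lower or x > upper]
    return lower, upper, outliers

values = [20, 25, 30, 35, 40, 45, 120]  # hypothetical data with one extreme value

lower, upper, outliers = tukey_outliers(values)
print(lower, upper, outliers)
```

Here Q1 is 25 and Q3 is 45, so the fences fall at -5 and 75, and only 120 lies outside them.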
normality of the distribution: p-value ≥ .05
the distribution is considered normal
normality of the distribution: p-value <.05
the distribution is not normal
what should be reported if the data follow a normal distribution and there are no outliers?
the mean, standard deviation, and epidemiological range should be reported
what should be reported if the data are not distributed normally or if there are outliers?
the median, interquartile range and epidemiological range should be reported
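The reporting rules above can be sketched as one small decision function. The function name, its parameters, and the example p-values are hypothetical. The p-value is assumed to come from a normality test such as Shapiro-Wilk, and the 0.05 cutoff is the one given in the cards.

```python
def summary_to_report(normality_p_value, has_outliers, alpha=0.05):
    """Pick the descriptive statistics to report for continuous data."""
    is_normal = normality_p_value >= alpha   # p >= .05: treated as normal
    if is_normal and not has_outliers:
        return ["mean", "standard deviation", "epidemiological range"]
    # Not normal OR at least one outlier
    return ["median", "interquartile range", "epidemiological range"]

print(summary_to_report(0.40, has_outliers=False))  # normal, no outliers
print(summary_to_report(0.40, has_outliers=True))   # normal, but an outlier
print(summary_to_report(0.01, has_outliers=False))  # not normal
```

The epidemiological range is reported in both cases; only the measure of center and its matching measure of spread change.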