Biostat Chapters 1-4 Flashcards
3 Basic Measurement Scales
- Categorical (nominal)- represent unordered categories
- Ordinal- Categories that can be put in order/ranked
- Quantitative (scale, continuous, interval, ratio)- represent meaningful numerical values for which arithmetic operations make sense
Arithmetic Average
A distribution’s gravitational center. This is where the distribution would balance if placed on a scale
Average
Refers to the center of a distribution. Can be measured by the arithmetic average or the median
Axis multiplier
Included on a stemplot to show the total value of the stem
Bias
An over- or underestimation of something (value, factor). A systematic error
Biostatistics
Broad range of activities that help us improve the intellectual content of data from biological, biomedical, and public health related studies. It is more than just a compilation of computational methods
Blinding
When the study subject is kept in the dark about the explanatory variable being used. Double blind is when the study subject and investigator are both kept in the dark. Triple blind is when the study subject, the investigator, and the statistician are all kept in the dark.
Cargo Cult Science
Appears to be scientific but does not follow the scientific method
Categorical Measurements
Places observations into classes or groups
Census
A survey that attempts to collect information on all individuals in the population
Chebychev’s rule
Applies to all data sets, regardless of their shape. Says at least three-fourths of the data points will lie within two standard deviations of the mean.
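A minimal sketch of checking Chebychev's rule on a skewed batch of numbers. The data set and the helper name `fraction_within` are made up for illustration, and the population SD (`pstdev`) is used here as an assumption; the sample SD is never smaller, so the guarantee holds with it too.

```python
import statistics

def fraction_within(data, k):
    """Return the fraction of data points within k standard deviations of the mean."""
    mean = statistics.fmean(data)
    sd = statistics.pstdev(data)  # population SD (assumption for this sketch)
    inside = [x for x in data if abs(x - mean) <= k * sd]
    return len(inside) / len(data)

# Hypothetical, strongly right-skewed data set
skewed = [1, 1, 1, 2, 2, 3, 4, 5, 8, 40]
share = fraction_within(skewed, 2)
print(share)          # fraction within two SDs of the mean
print(share >= 0.75)  # Chebychev's rule says this is always True
```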
Cluster Samples
Randomly selected larger units (clusters) consisting of smaller subunits.
Ex. Households
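A rough sketch of a cluster sample, assuming households are the clusters and every member of a chosen household enters the sample. The household names, member names, and the function `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Randomly pick whole clusters and keep every subunit inside them."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical households (clusters) and their members (subunits)
households = {
    "house_A": ["Ana", "Ben"],
    "house_B": ["Cy"],
    "house_C": ["Dee", "Eli", "Fay"],
    "house_D": ["Gus", "Hal"],
}
sample = cluster_sample(households, 2)
print(sample)
```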
Comparative Studies
Designed to quantify the relationship between variables
Complex Sampling Designs
- Cluster Sample
- Stratified Sample
- Multistage Sample
Confounding
When the effects of lurking variables are mixed up with the effect of the explanatory variable, distorting the observed relationship
Controlled Trial
A trial in which there are one or more control groups
Data tables
Form containing observations, variables and values
Degrees of freedom
The variance is a kind of average of the squared deviations: the sum of squares divided by “n-1” instead of “n”. The number “n-1” is the degrees of freedom of the variance. You lose one degree of freedom because the deviations sum to zero, so knowing n-1 of the deviations determines the last deviation.
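A small worked sketch of the idea, using a made-up data set: the variance is computed by hand with n-1 in the denominator, and the last deviation is recovered from the other n-1 because deviations from the mean sum to zero.

```python
import statistics

data = [4, 8, 6, 2]  # hypothetical observations
n = len(data)
mean = sum(data) / n

deviations = [x - mean for x in data]          # data point minus the mean
sum_of_squares = sum(d ** 2 for d in deviations)
variance = sum_of_squares / (n - 1)            # n-1 degrees of freedom

# Deviations sum to zero, so the last one is fixed by the first n-1.
last_from_others = -sum(deviations[:-1])

print(variance, statistics.variance(data))     # hand calculation vs. library
print(last_from_others, deviations[-1])
```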
Descriptive statistics
A set of summaries that describe the characteristics of a sample
Deviation
Data point minus the mean
Distributional shape
Describes the symmetry, direction of skew if asymmetric, modality (number of peaks), and kurtosis (steepness of peaks). Note: descriptors of shape are unreliable when data sets are small or moderate in size.
Equipoise
Balanced doubt between benefits and risks
Explanatory Variable (Independent Variable/Factor)
The exposure being investigated in a comparative study
Factors
Explanatory variables
Frequency Distributions
Tells us how often various values occur in a batch of numbers
Frequency tables
List frequencies (counts), relative frequencies (proportions), and cumulative frequencies (proportion of values up to and including the current value). Quantitative data may first need to be grouped into class intervals before tallying frequencies.
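A minimal sketch of building such a table for a small batch of numbers. The data and the function name `frequency_table` are illustrative assumptions; no grouping into class intervals is done here.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (value, count, relative frequency, cumulative relative frequency)."""
    counts = Counter(values)
    n = len(values)
    rows = []
    running = 0
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], counts[value] / n, running / n))
    return rows

# Hypothetical batch of numbers
table = frequency_table([1, 2, 2, 3, 3, 3, 4, 4])
for row in table:
    print(row)
```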
GIGO
“Garbage In, Garbage Out”
How are categorical variables displayed?
Bar or Pie chart
Imprecision
Inability to get the same result upon repetition
Infer
Deduce or conclude based on evidence
Inferential statistics
A set of statistical techniques that provide predictions about the population based on information in a sample from that population
Interaction
When multiple factors produce an effect that would otherwise not be predicted
Location
References the distribution’s center. Measured by mean and median
Lower outside value
Values below the lower fence
Lurking variables
Extraneous factors that influence the relationship between the explanatory variable and the response variable
Measurement
Assigned numbers or codes according to prior-set rules
Multistage sampling
Large units are chosen at random. Subunits are sampled in successive stages.
Nonexperimental Studies
Also called an “observational study”. The investigator merely classifies individuals as exposed or nonexposed without intervention
Nonresponse bias
Nonresponse bias is the bias that results when respondents differ in meaningful ways from nonrespondents.
Nonresponse is often a problem with mail surveys, where the response rate can be very low.
Occurs when a large percentage of individuals refuse to participate in a survey or cannot otherwise be contacted.
Objectivity
The intent to measure things as they are without shaping them to conform to a preconceived worldview
Observation
A unit upon which measurements are made (e.g. individuals). Data from observations are stored in rows of data tables
Placebo
An inert or innocuous intervention
Population
The entire group of people (or other units) about which information is wanted
Precision
Ability to be replicated
Probability Sample
A sample in which each member of the population has a known probability of entering the sample
Resistance
Ability to remain relatively consistent when outliers are present
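A quick illustration with made-up numbers: replacing one value with an outlier drags the mean a long way but leaves the median where it was, which is why the median is called resistant.

```python
import statistics

clean = [2, 3, 4, 5, 6]          # hypothetical data
with_outlier = [2, 3, 4, 5, 60]  # same data with one extreme value

print(statistics.mean(clean), statistics.median(clean))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```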
Response Variable
The outcome of a study
Sample
A subset of a population
Sampling frame
The actual set of units from which a sample has been drawn.
Sampling Independence
Independent samples are those samples selected from the same population, or different populations, which have no effect on one another. That is, no correlation exists between the samples.
Simple Random Sample (SRS)
A probability sample in which each population member has the same chance of entering the sample
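One way to sketch an SRS, assuming the population is a list of 100 hypothetical ID numbers. `random.sample` draws without replacement, so every member has the same chance of being selected; the seed is there only to make the sketch repeatable.

```python
import random

population = list(range(1, 101))  # hypothetical ID numbers 1..100
rng = random.Random(42)           # seed chosen arbitrarily
srs = rng.sample(population, 10)  # 10 members, drawn without replacement
print(sorted(srs))
```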
Spread
Refers to the extent to which values are dispersed.
Stem-and-leaf Plot
The stem on a stemplot represents a number line with “bins”. Each leaf represents the next significant digit of a value.
Excellent way to explore the shape, location and spread of distribution.
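A minimal sketch of a stemplot for non-negative two-digit integers, assuming the tens digit is the stem and the ones digit is the leaf (an axis multiplier of ×10). The data and the function name `stemplot` are made up for illustration.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines: tens digit as stem, ones digit as leaf."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves[stem].append(leaf)
    lines = []
    # Keep empty stems so gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem}|{digits}")
    return lines

# Hypothetical ages
plot = stemplot([21, 25, 28, 33, 36, 36, 52])
print("\n".join(plot))
```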
Stratified Random Sample
A sample that draws independent SRSs from within relatively homogeneous groups or “strata”.
Ex. Population divided into 5 year age blocks (0-4, 5-9, 10-14, etc.)
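A rough sketch of the idea, assuming the population has already been divided into age strata. The stratum members and the function name `stratified_sample` are hypothetical; an independent SRS is drawn inside each stratum.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=0):
    """Draw an independent simple random sample from each stratum."""
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    return {
        label: rng.sample(members, n_per_stratum)
        for label, members in strata.items()
    }

# Hypothetical population split into 5-year age blocks
strata = {
    "0-4": [f"person_a{i}" for i in range(20)],
    "5-9": [f"person_b{i}" for i in range(30)],
    "10-14": [f"person_c{i}" for i in range(25)],
}
picked = stratified_sample(strata, 5)
for label, members in picked.items():
    print(label, members)
```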
Surveys
Designed to quantify population characteristics
The Normal Rule
When the distribution is Normal:
68% of the data points will lie within one SD of the mean.
95% of the data points will lie within two standard deviations of the mean.
99.7% of the data points lie within three standard deviations of the mean.
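The three percentages can be checked against the standard Normal curve. This is a small sketch using Python's `statistics.NormalDist`; the helper name `share_within` is an illustrative assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Probability that a Normal value lies within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```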
The three measures of central location
Mean, Median, Mode
Treatment
A specific set of factors applied to study subjects
Trials
Experiment involving human subjects
Types of Comparative Studies
Experimental studies- permit the investigator to assign study exposures to study subjects
Observational studies- do not permit the assignment of exposures to study subjects
Types of Statistical Studies
- Surveys
- Comparative Studies
  - Experimental
  - Observational
Undercoverage
When some groups in the source population are left out or are underrepresented. Can cause bias
Upper outside value
Values above the upper fence
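These cards do not say how the fences are computed. A common convention, assumed here, places them 1.5 × IQR below the first quartile and above the third quartile. The sketch below flags lower and upper outside values under that assumption; quartile methods differ between textbooks and software, and the data and function name `outside_values` are made up.

```python
import statistics

def outside_values(data):
    """Return (lower fence, upper fence, lower outside values, upper outside values)."""
    # Quartile method is an assumption; other conventions give slightly different fences.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr  # assumed 1.5 x IQR rule
    upper_fence = q3 + 1.5 * iqr
    lower = [x for x in data if x < lower_fence]
    upper = [x for x in data if x > upper_fence]
    return lower_fence, upper_fence, lower, upper

# Hypothetical data with one low and one high extreme value
lower_fence, upper_fence, lower, upper = outside_values(
    [3, 20, 21, 22, 23, 24, 25, 26, 45]
)
print(lower_fence, upper_fence)
print(lower, upper)
```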
Valid
When something is unbiased. It is not over- or underestimated when compared to the true value
Validity
Ability to objectively identify the true nature of the observation
Values
Realized measurements. For example, the value for the variable “AGE” for observation #1 is “32”. Values are stored in table cells
Variables
Characteristics which are measured such as age, gender or disease status. Data from variables form columns of data tables
Volunteer Bias
Can occur because self-selected participants of a survey tend to be atypical of the population