Biostatistics Flashcards
What is Biostatistics defined as?
-the application of statistical theory in medicine, public health, or biology
Population
- the large group of all subjects of interest in a study
- usually so large that collecting data from the whole population is almost impossible
- instead, select a subset of subjects (a sample) from the population, then generalize findings from the sample to the target population
conceptual population
- the population viewed as persons (people/identities)
population in clinical setting?
- usually described as a population of measurements, such as weight, height, and blood pressure
Total population
- target population
- all of the subjects of interest in a study, about which the study wants to generalize the conclusion
- ex: all 12 year olds in US
Defined population
- a subpopulation confined by certain characteristic(s) such as demographics and geographical areas in the total population
- ex: 12 year olds in RUSD
Study population
- the group of individuals in a study
- in a clinical trial, all participants who meet the inclusion and exclusion criteria make up the study population
- ex: 12 year olds in a certain school in RUSD
Sample
- a subset of subjects selected from the study population, intended to be representative of the total population
- validity depends on how random and well rounded the sample group is
How do we estimate the required sample size?
1) pre-determined power (80/90%)
2) specific significance level
3) mean & variance of the primary outcome; can be approximated
4) the design of the study
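Sketch (not from the original cards): one common way the four inputs above combine, assuming a two-group parallel design comparing means with a two-sided test and a normal approximation. The function name and the example values (difference of 5, SD of 10) are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist

def sample_size_two_means(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed PER GROUP to detect a mean difference
    `delta`, given outcome standard deviation `sd` (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level
    z_beta = NormalDist().inv_cdf(power)           # pre-determined power
    n = 2 * ((z_alpha + z_beta) * sd / delta) ** 2
    return ceil(n)

# Hypothetical inputs: detect a 5-unit difference when SD is about 10.
n_80 = sample_size_two_means(delta=5.0, sd=10.0, power=0.80)
n_90 = sample_size_two_means(delta=5.0, sd=10.0, power=0.90)
print(n_80, n_90)
```

Higher power or a smaller expected difference both push the required sample size up.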
what is a statistic?
- a value calculated from a sample that describes a specific characteristic of the sample (e.g., mean, standard deviation)
- used to estimate the corresponding parameter of the study population, then generalized to the target population
Why do we use sampling instead of studying the whole population?
Money, Time, Practicality, and Accuracy
Probability sampling
- each subject has a known probability of being selected
1) simple random sampling
2) stratified random sampling
3) Systematic random sampling
4) Clustered random sampling
simple random sampling?
- samples are drawn purely by chance
- each member of the study population has an equal probability of being chosen
- ex: names of 25 employees being chosen out of a hat from a company of 250
stratified random sampling?
- stratify the study population into subgroups (strata), then take random samples from each subgroup
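Sketch (not from the original cards): simple and stratified random sampling using Python's standard `random` module. The 250-employee list mirrors the hat example; the two strata (day shift, night shift) and the 10% sampling fraction are made-up assumptions.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Every member has the same chance of being picked."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def stratified_random_sample(strata, fraction, seed=None):
    """Draw a simple random sample of the same fraction from each stratum.
    `strata` maps a stratum name to the list of its members."""
    rng = random.Random(seed)
    result = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))
        result[name] = rng.sample(members, k)
    return result

employees = [f"employee_{i}" for i in range(250)]
srs = simple_random_sample(employees, 25, seed=0)

# Hypothetical strata: 150 day-shift and 100 night-shift employees.
strata = {"day_shift": employees[:150], "night_shift": employees[150:]}
stratified = stratified_random_sample(strata, fraction=0.10, seed=0)
```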
Systematic random sampling
- type of probability sampling
- members from a larger population are selected according to a random starting point and a fixed, periodic interval
- the interval is calculated by dividing the population size by the desired sample size
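Sketch (not from the original cards): systematic sampling with a random start and a fixed interval. The function name and the population of 250 IDs are illustrative assumptions; the interval is rounded down when the division is not exact.

```python
import random

def systematic_sample(population, n, seed=None):
    """Pick every k-th member after a random starting point."""
    if n <= 0 or n > len(population):
        raise ValueError("n must be between 1 and the population size")
    rng = random.Random(seed)
    k = len(population) // n   # interval = population size / sample size
    start = rng.randrange(k)   # random starting point within the first interval
    return [population[start + i * k] for i in range(n)]

ids = list(range(250))
chosen = systematic_sample(ids, 25, seed=1)
```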
Clustered random sampling
- the researcher divides population into separate groups (clusters)
- a random sample of clusters is selected from the population
- the researcher conducts the analysis on data from the sampled clusters
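Sketch (not from the original cards): clustered sampling where whole clusters are drawn at random and every member of a chosen cluster is kept. The school names and student labels are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters; keep all members of each chosen one.
    `clusters` maps a cluster name to the list of its members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

# Hypothetical clusters: four schools with 30 students each.
schools = {
    f"school_{s}": [f"s{s}_student_{i}" for i in range(30)]
    for s in "ABCD"
}
sampled = cluster_sample(schools, 2, seed=3)
```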
non-probability sampling
- the probability of each subject being selected is unknown, so estimates are likely to be biased
1) voluntary samples
2) convenience samples
voluntary samples
- whoever self-selects into the sample
convenience samples
- whoever is convenient to be selected and/or investigated
- ex: staff members in med school recruited for some trials
Sampling errors
- random errors
- are unavoidable
- the differences between the sample & population, due to sampling randomness
Sampling errors: effect on the data?
- random error does not have consistent effects across the entire sample
- the errors tend to cancel out (their sum approaches zero) if the sample size is large enough
What does sampling error add? What does it not affect?
- random error adds variability to the data but doesn't affect the average performance of the samples
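Sketch (not from the original cards): a small simulation of sampling error. The population (10,000 simulated blood pressure values with mean 120 and SD 15) is made up for illustration. Individual sample means scatter around the population mean, but their average lands very close to it.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)

# Hypothetical population of measurements.
population = [rng.gauss(120, 15) for _ in range(10_000)]
pop_mean = mean(population)

# Draw many random samples of 50 and record each sample mean.
sample_means = [mean(rng.sample(population, 50)) for _ in range(2_000)]

spread = stdev(sample_means)                 # variability added by random error
avg_error = mean(sample_means) - pop_mean    # average error, close to zero
print(round(spread, 2), round(avg_error, 3))
```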
Non-sampling errors
- more serious; caused by mistakes made in the acquisition of data, inappropriate sample selection, or response biases
- can bias estimation
Random Samples vs. Randomization
- both rely on a random (probability-based) selection process
Random sampling
- determines who should be included in the samples;
- related to sampling procedure;
- affects external validity (generalizability)
Randomization or random assignment
- determines if sampled subjects should be in treatment or control group
- related to design operation
- affects internal validity
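Sketch (not from the original cards): random assignment of already-sampled subjects to two arms by shuffling and splitting. The subject labels and the equal split are assumptions; real trials often use blocked or stratified schemes instead.

```python
import random

def randomize(subjects, seed=None):
    """Randomly assign sampled subjects to treatment or control."""
    rng = random.Random(seed)
    shuffled = list(subjects)   # copy so the input order is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

subjects = [f"subject_{i}" for i in range(20)]
arms = randomize(subjects, seed=7)
```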
How can we make correct population inferences from the sample data we have?
- mean and variance are the cornerstones of statistical inference
- the sample values are used to estimate the corresponding values of the population
1) letters used for population parameters?
2) letters used for sample estimates?
1) Greek letters (e.g., μ, σ)
2) Roman (Latin) letters (e.g., x̄, s)
How do we get an unbiased estimate of the population variance?
- use n-1 instead of n in the denominator of the sample variance equation
- N (capital N) is the denominator used for the population variance
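Sketch (not from the original cards): the two denominators side by side, checked against Python's `statistics` module, where `variance` divides by n-1 and `pvariance` divides by N. The data values are made up.

```python
from statistics import mean, pvariance, variance

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical measurements
m = mean(data)
ss = sum((x - m) ** 2 for x in data)   # sum of squared deviations

sample_var = ss / (len(data) - 1)      # n-1: unbiased estimate from a sample
population_var = ss / len(data)        # N: when data ARE the whole population

# The standard library uses the same two conventions.
lib_sample_var = variance(data)
lib_population_var = pvariance(data)
```

Dividing by n-1 always gives a slightly larger value than dividing by N, and the gap shrinks as the sample grows.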
Standard deviation (SD)?
- measure of the variability of a measurement among the subjects in a population or in a sample
- can be estimated for both population & sample
- often unknown for pop. but can be estimated from SD of its samples
Standard Error (SE)?
- measure of the precision of the sample mean
- decreases w/ increasing sample size
- estimated only for sample.
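Sketch (not from the original cards): the standard error of the mean computed as SD / sqrt(n). The data and the fixed SD of 10 are made-up values used to show that SE shrinks as n grows while SD does not.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """SE of the sample mean = sample SD / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

def se_from_sd(sd, n):
    """Same formula when the SD is already known."""
    return sd / sqrt(n)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical measurements
se = standard_error(data)

se_n25 = se_from_sd(10.0, 25)    # SD 10 with 25 subjects
se_n100 = se_from_sd(10.0, 100)  # same SD with 100 subjects
```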
In general what is reported when summarizing the sampling variation?
- SD (standard deviation)
- SE is given for any statistical inference of mean
What is Outcome?
- a specific characteristic of interest being studied
- could be more than one
- also called endpoints
- can be described or quantified by different measurement scales
Measurement scales (4)?
1) Nominal
2) Ordinal
3) Interval
4) Ratio
Nominal
- Categorizing subjects by characteristics such as gender and ethnicity
- the categories have neither order nor ranking
- neither logical comparison (greater/less than) nor mathematical operations apply
Ordinal
- ranking subjects into ordered categories
- the ranks have an order, but the differences between ranks are not precise or equal
- ex: pain severity
Interval
- quantitative measurement with equal units; every one-unit difference is meaningful and constant
- doesn't have an absolute zero point
- ex: temperature in °C or °F
Ratio
- uses zero to represent the absence of value (absolute zero point)
- ex: height, age, weight
Qualitative data
- describe subjects by qualities; cannot measure subjects in precise quantities
- measured on the nominal and ordinal scales
Quantitative data
- describe subjects in quantities that can be measured on a scale or counted
- can be discrete or continuous
Discrete
- reflects a number obtained by counting; no decimals
- Zero is the minimum
- ex: number of children
Continuous
- reflects a measurement with decimal places; the recorded value often depends on the precision of the measuring device
What does probability provide?
- a quantitative description of the chances (of successful treatment outcomes) or likelihoods (of disease) associated with various outcomes
- provides a bridge between sample statistics and population parameters
what three components do you need to estimate probability?
1) outcome
2) sample space
3) an event
outcome?
- a possible result from an experiment
- ex: toss a coin and get heads or tails
Sample Space?
- the set of all possible outcomes of an experiment
- ex: a coin flip can be heads or tails
An Event?
- a subset of the sample space; can be defined by one or more outcomes (members of the sample space)
ex: toss coin can be H/T; but if toss coin more than once can be HHT or TTH etc
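Sketch (not from the original cards): outcome, sample space, and event for three coin tosses. It assumes a fair coin, so every outcome in the sample space is equally likely; the chosen event (exactly two heads) is an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

# Sample space: every possible outcome of tossing a coin three times.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Event: the subset of outcomes with exactly two heads.
event = [outcome for outcome in sample_space if outcome.count("H") == 2]

# With equally likely outcomes, P(event) = |event| / |sample space|.
p_event = Fraction(len(event), len(sample_space))
print(sample_space, event, p_event)
```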
Empirical Probability ?
- estimated as a proportion: (number of times the event of interest occurred) / (total number of events that could have been observed)
- probabilities in epidemiology are defined empirically
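Sketch (not from the original cards): empirical probability as a simple proportion. The counts (30 events among 1200 observed subjects) are invented for illustration.

```python
def empirical_probability(event_count, total_observed):
    """Proportion of observations in which the event of interest occurred."""
    if total_observed <= 0:
        raise ValueError("total_observed must be positive")
    return event_count / total_observed

# Hypothetical data: 30 of 1200 followed subjects developed the disease.
p_disease = empirical_probability(30, 1200)
print(p_disease)
```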
Marginal probability
- the probability of an event occurring
Conditional probability
- measure of the probability of an event occurring given that another event has occurred.
- important in medicine since disease is based on many factors
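Sketch (not from the original cards): marginal versus conditional probability from a 2x2 table of counts. The exposure/disease counts are hypothetical.

```python
# Hypothetical 2x2 table of counts: (exposure status, disease status).
counts = {
    ("exposed", "disease"): 40,
    ("exposed", "healthy"): 160,
    ("unexposed", "disease"): 20,
    ("unexposed", "healthy"): 780,
}
total = sum(counts.values())

# Marginal probability: P(disease), ignoring exposure.
n_disease = counts[("exposed", "disease")] + counts[("unexposed", "disease")]
p_disease = n_disease / total

# Conditional probability: P(disease | exposed) = P(disease and exposed) / P(exposed).
n_exposed = counts[("exposed", "disease")] + counts[("exposed", "healthy")]
p_disease_given_exposed = counts[("exposed", "disease")] / n_exposed

print(p_disease, p_disease_given_exposed)
```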
Prevalence
- expressed & reported as a percentage, per 1000, or per million
- measure of disease burden in the population but not a measure of risk
- is a snapshot in time, but can use different time scales
time scales used in prevalence calculations?
1) point prevalence (at a certain time point, most common)
2) period prevalence (during a certain time period)
3) lifetime (cumulative) prevalence (any time before a time point)
what does incidence rate measure?
1) new cases in a population over a given period
2) risk of developing a disease within a given period
3) how quickly new cases develop in the population
- expressed per 100, per 1000, or per 100 million
- also called absolute risk
how are prevalence and incidence rate related?
- Prevalence = Incidence × Disease Duration
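Sketch (not from the original cards): the relation above as a one-line calculation. The rates and durations are made up, and the relation is only approximate; it assumes a roughly steady state and a fairly rare disease.

```python
def approx_prevalence(incidence_rate, mean_duration_years):
    """Prevalence is approximately incidence x average disease duration."""
    return incidence_rate * mean_duration_years

# Hypothetical disease: 2 new cases per 1000 person-years, lasting 5 years on average.
baseline = approx_prevalence(0.002, 5)

# A treatment that prolongs survival to 8 years: incidence is unchanged,
# duration rises, so prevalence rises.
longer_survival = approx_prevalence(0.002, 8)

# A vaccine that halves incidence: duration is unchanged, prevalence falls.
with_vaccine = approx_prevalence(0.001, 5)
```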
How does a new treatment for lung cancer patients that prolongs survival affect prevalence, incidence & disease duration?
1) prevalence increased
2) incidence unchanged
3) duration increased
How does an effective AIDS vaccine, once created and approved, affect prevalence, incidence & disease duration?
1) prevalence decreased
2) incidence decreased
3) duration unchanged
How does a sensitive early-detection test, developed to diagnose cancer at earlier stages so that it is relatively easy to control, affect prevalence, incidence & disease duration?
1) prevalence increased
2) incidence increased
3) duration increased
Mortality Rate
- (number of deaths during a specific period) / (number of persons in the population during the period)
Case fatality rate
- is a %
- (number of deaths after disease onset or diagnosis) / (number of the persons diagnosed with the disease) x 100
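Sketch (not from the original cards): the mortality rate and case fatality rate as small functions. The counts are invented, and the `per` scaling argument is an assumption added for reporting convenience.

```python
def mortality_rate(deaths, population, per=1000):
    """Deaths during a period divided by the population during that period,
    scaled to `per` persons."""
    return per * deaths / population

def case_fatality_rate(deaths_among_cases, diagnosed_cases):
    """Percentage of diagnosed persons who die of the disease."""
    return 100 * deaths_among_cases / diagnosed_cases

# Hypothetical year: 50 deaths in a population of 100,000.
mr = mortality_rate(50, 100_000, per=100_000)

# Hypothetical outbreak: 15 deaths among 300 diagnosed cases.
cfr = case_fatality_rate(15, 300)
print(mr, cfr)
```

The two measures use different denominators: the whole population for the mortality rate, only diagnosed cases for the case fatality rate.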