Biostats. Flashcards by Mary Klauber

labels/names

nominal data

data with only 2 outcomes

ex: yes or no

dichotomous data

data that consists of names, labels or other nonnumerical data

categorical data

uses labels in an order

ex: poor, fair, excellent

ordinal data

data that can take any value within a range

ex: measured numbers such as height or weight

continuous data

values that are equally spaced, with no true zero point

ex: temperature in degrees Celsius

interval data

values that have an actual zero point

ex: blood alcohol level

ratio data

central tendency of continuous data is measured with…

mean, median, mode
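As a minimal sketch of the three measures, the snippet below uses Python's standard `statistics` module. The sample values and variable names are made up for illustration. The single large value pulls the mean above the median, which is the pattern seen with a right skew.

```python
import statistics

# Hypothetical sample: lengths of hospital stay in days (made-up values).
stay_days = [2, 3, 3, 4, 5, 6, 12]

mean_stay = statistics.mean(stay_days)      # arithmetic average
median_stay = statistics.median(stay_days)  # middle value of the sorted data
mode_stay = statistics.mode(stay_days)      # most frequent value

print("mean:", mean_stay, "median:", median_stay, "mode:", mode_stay)
```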

variation/spread of the data

dispersion

with a normal (Gaussian) distribution, how do the mean, median, and mode relate?

mean = mode = median

If data has a right/positive skew [tail is to the right], what does this mean?

Mean > Median

If data has a left/negative skew [tail is to the left], what does this mean?

Mean < Median

Measures of dispersion

range, variance, standard deviation

value below which a particular percent of scores or observations fall

percentiles

what does 95th percentile mean?

95% of values are below this number

range of the data from the 25th to the 75th percentile (the middle 50%)

interquartile range

why is interquartile range used?

helps to ignore outliers
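The sketch below illustrates why the interquartile range is resistant to outliers. It assumes the default quartile method of `statistics.quantiles` (other software may compute quartiles slightly differently); the data and the helper name `interquartile_range` are made up for illustration.

```python
import statistics

def interquartile_range(values):
    """Return Q3 - Q1 using the statistics module's default quartile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Made-up data: the second list swaps the largest value for an extreme outlier.
typical = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
with_outlier = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100]

for label, values in (("typical", typical), ("with outlier", with_outlier)):
    full_range = max(values) - min(values)
    print(label, "range:", full_range, "IQR:", interquartile_range(values))
```

The range changes sharply when the outlier is added, while the IQR stays the same for these two lists.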

average squared distance of the data points from the mean

variance

square root of variance

larger = more spread out

standard deviation
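A short sketch of how variance and standard deviation relate, computed step by step and then checked against the standard library. It assumes the sample (n - 1) form of the variance, which the cards do not specify; the data values are made up.

```python
import math
import statistics

values = [4, 8, 6, 5, 3, 7, 9]  # made-up sample

mean_value = statistics.mean(values)
squared_deviations = [(v - mean_value) ** 2 for v in values]

# Assumption: sample variance, so the sum is divided by n - 1 rather than n.
sample_variance = sum(squared_deviations) / (len(values) - 1)
sample_sd = math.sqrt(sample_variance)  # standard deviation = sqrt(variance)

# The library functions use the same n - 1 definition.
print(sample_variance, statistics.variance(values))
print(sample_sd, statistics.stdev(values))
```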

for skewed distributions, what are the best ways to summarize the data?

Median, range, interquartile range

For normal distributions, what are the best ways to summarize the data?

mean and standard deviation

with a normal distribution, how much of the data should be within 1 SD of the mean?
Within 2 SD of the mean?

1: 68.3%
2: 95%
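These percentages can be checked with `statistics.NormalDist` from the Python standard library. The sketch below uses a standard normal distribution (mean 0, SD 1); the variable names are illustrative. Two SDs cover slightly more than 95%, and exactly 95% corresponds to about 1.96 SD.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Area under the curve between -k and +k standard deviations.
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

# z value that leaves 2.5% in each tail, i.e. the middle 95%.
z_95 = standard_normal.inv_cdf(0.975)

print(round(within_1_sd, 4), round(within_2_sd, 4), round(z_95, 2))
```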

process of using data obtained from a sample to make estimates about the characteristics of a population

statistical inference

what is the basis of statistical inference?

random sampling

error that is due to chance and is not systematic

random error

with a large number of repeated samples, the distribution of the sample means approaches a normal distribution

central limit theorem

standard deviation of a sampling distribution

standard error

what effect on standard error does a larger sample size have?

larger sample size = smaller standard error
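A minimal sketch of why this holds, assuming the usual formula for the standard error of the mean, SE = SD / sqrt(n). The function name and the numbers are made up for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean for a sample of size n with standard deviation sd."""
    return sd / math.sqrt(n)

# Same spread (SD = 10), increasing sample sizes.
for n in (25, 100, 400):
    print(n, standard_error(10, n))
```

Quadrupling the sample size halves the standard error.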

95% of the sample means should be within how many units of standard error?

1.96

indicate how closely the sample estimate reflects the actual population value - 95% of sample means fall within 1.96 SE units of the population mean

confidence intervals
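One way to sketch a 95% confidence interval for a mean is mean ± 1.96 × SE. This uses the normal approximation from the cards; for small samples a t-distribution critical value is usually preferred, which is not shown here. The helper name `mean_ci_95` and the data are made up.

```python
import math
import statistics

def mean_ci_95(sample):
    """Approximate 95% CI for the mean using the normal critical value 1.96."""
    mean_value = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    margin = 1.96 * se
    return mean_value - margin, mean_value + margin

sample = [4, 8, 6, 5, 3, 7, 9]  # made-up observations
lower, upper = mean_ci_95(sample)
print(round(lower, 2), round(upper, 2))
```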

there is no difference in the outcome between variable groups

null hypothesis

there is a difference in the outcome between variable groups

alternative hypothesis

rejecting the null hypothesis when it is true | "you say there is a difference but there isn't" | false positive

Type I Error

the probability of making a Type I error | typically 0.05

alpha error

failing to reject the null hypothesis when there is a difference between groups | false negative

Type II Error/Beta Error

probability of correctly rejecting the null | 1 - beta

power

the greater the ability to NOT make a Type II error….

the larger the power

what increases the power of a study?

increased sample size | larger effect size | decreased variability in sample data | increased alpha

If p < alpha

reject the null hypothesis | there is statistical significance

If p > alpha

fail to reject the null | not statistically significant

compares the means of a normally distributed continuous variable between two groups; determines whether the two group means are significantly different

T-test
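A sketch of the test statistic for an independent two-sample t-test, assuming equal variances in the two groups (the pooled, Student's form). Converting t to a p-value requires the t distribution, which is not in the Python standard library and is left out here. The function name and data are made up.

```python
import math
import statistics

def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for a pooled-variance two-sample t-test."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_diff / se_diff, n_a + n_b - 2

t_value, df = pooled_t_statistic([5, 6, 7, 8, 9], [1, 2, 3, 4, 5])
print(t_value, df)
```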

what is the non-parametric version of a T-test?

Mann-Whitney U Test

1 way analysis of variance | compares distribution of continuous variables among more than 2 independent groups

ANOVA

Problem with ANOVA?

can determine there is statistical difference among groups but cannot tell which group is different

Nonparametric version of ANOVA

Kruskal Wallis Test

compares ranks between groups rather than means | uses the h-statistic

Kruskal Wallis Test

T-test performed on a repeated-measures, two-group design | same thing measured on a patient at two different times - pre and post assessments

paired T-test

what data is used with paired T-test?

dependent normally distributed continuous variables

what statistical test do you use with dependent but ordinal data?

Wilcoxon Matched-Pairs Signed-Ranks Test

scatterplots are an effective way to convey info for

2 continuous variables

this assesses linear relationships between 2 continuous variables

Pearson correlation coefficient

(+) Pearson correlation coefficient

positive linear relationship | as x increases y increases

in Pearson correlation coefficient, the closer r is to -1 or +1

the stronger the relationship

(-) Pearson correlation coefficient

negative linear relationship | as x increases y decreases

Pearson correlation coefficient = 0

no linear relationship
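The Pearson cards above can be sketched with a hand-written helper (the name `pearson_r` and the data are made up for illustration). The last example shows that r = 0 rules out only a linear relationship: y depends entirely on x there, but in a curved way.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance_sum = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance_sum / (spread_x * spread_y)

r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])        # y rises with x
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])        # y falls as x rises
r_curved = pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])  # y = x squared
print(r_positive, r_negative, r_curved)
```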

compares association between rankings of 2 variables (non-parametric) | rho

Spearman correlation coefficient

sum over all cells of (observed value - expected value)^2 / expected value

Chi-square analysis
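A sketch of the chi-square statistic for a contingency table, where each expected count is (row total × column total) / grand total and the per-cell terms are summed. The function name and the table are made up; turning the statistic into a p-value needs the chi-square distribution, which is not shown.

```python
def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over all cells of a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic

# Hypothetical 2x2 table: rows = exposed / unexposed, columns = event / no event.
observed_table = [[10, 20], [20, 10]]
chi2 = chi_square_statistic(observed_table)
print(chi2)
```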

if any expected cell count in a chi-square test is < 5, what test should you do instead?

Fisher's exact test

Used to check for a difference between two or more percentages or proportions of categorical outcomes

chi-square test

apparent relationship between two variables that is due to the presence of an unmeasured third variable

confounding

what ways can you account for confounding?

stratified analysis | multivariable analysis

measure of the relationship between two continuous variables - represented by scatterplots

correlation

formula that forms a line - used to quantify a change in y based on a change in x

linear regression
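A minimal sketch of simple linear regression by ordinary least squares, which is one common way to fit the line y = intercept + slope × x. The function name `fit_line` and the data are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that lie exactly on y = 1 + 2x.
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
```

The slope is the change in y for a one-unit change in x.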

3 types of multivariable analysis used to account for confounding

multiple linear regression | logistic regression | proportional hazards modeling

This looks at relationship between multiple independent variables and a single dependent variable

Multiple linear regression

proportion of variance in the dependent variable that is explained by the independent variable(s)

R^2

The closer R^2 is to 1 ….

the better the model

same idea as linear regression, with multiple continuous and/or categorical independent variables and a dichotomous outcome

logistic regression

likelihood that an outcome will occur based on changes in the variables

odds ratio

in logistic regression you evaluate the beta coefficient AND ___ for each independent variable

odds ratio

this looks at the relationship between multiple variables and the TIME to an event | How long does it take to get a certain outcome?

Cox proportional hazards analysis

independent variables in Cox proportional hazards analysis can be either

continuous or categorical

In Cox you evaluate the beta coefficient AND ___ for each independent variable

hazard ratio

compares the probability of an event occurring over a given time | chance of an event occurring in the treatment arm/chance in the control arm

hazard ratio

variation that occurs due to chance with random sampling - affects the study and control groups equally

random error

error that disproportionately affects one group

bias

bias that is introduced in the way in which participants are assigned to groups

selection bias

error that is due to differences in the way data is collected

measurement bias

participants do not accurately recall information

recall bias

Proper study design can control

confounding and limit bias

probability is the expression of ___ | = # of times an event occurs/total # of opportunities for occurrence

risk

ratio of probability an event occurs vs probability that anything else occurs

odds
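The difference between risk (probability) and odds can be sketched with two small helpers. The function names and the counts are made up for illustration.

```python
def risk(events, total):
    """Probability: events divided by all opportunities for the event."""
    return events / total

def odds(events, total):
    """Odds: events divided by non-events."""
    return events / (total - events)

# Hypothetical group: 20 of 100 people develop the outcome.
print(risk(20, 100), odds(20, 100))
```

For rare outcomes the two values are close; they diverge as the outcome becomes more common.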

multiplication law of probability

prob of A and B = (prob of A)*(prob of B), when A and B are independent

addition law of probability

prob of A or B = (prob of A) + (prob of B) - (prob of (A and B))
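Both laws can be checked by enumerating the outcomes of one roll of a fair die. The events chosen here (A = even number, B = greater than 4) are made up for illustration and happen to be independent, which the multiplication law in this simple form requires.

```python
from fractions import Fraction

outcomes = range(1, 7)  # faces of a fair six-sided die

def prob(event):
    """Exact probability of an event, given as a predicate on one outcome."""
    favourable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favourable, len(outcomes))

def is_even(outcome):
    return outcome % 2 == 0

def is_high(outcome):
    return outcome > 4

p_a = prob(is_even)
p_b = prob(is_high)
p_a_and_b = prob(lambda o: is_even(o) and is_high(o))
p_a_or_b = prob(lambda o: is_even(o) or is_high(o))

print(p_a_and_b == p_a * p_b)             # multiplication law (independent events)
print(p_a_or_b == p_a + p_b - p_a_and_b)  # addition law
```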

focuses on describing the distribution of health conditions

descriptive epidemiology

compares groups to test hypotheses regarding potential causes and contributing factors

analytic epidemiology

existing cases of a disease in a given population | = # with disease/total population

prevalence

number of people in a population who have a disease over a given time period

period prevalence

3 determinants of prevalence

incidence of disease | duration of disease | entry/exit of cases

in a stable population what is the formula for prevalence

incidence * duration
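A one-line worked example of this relationship, using made-up numbers and a hypothetical helper name. It assumes a stable population in which the disease is fairly uncommon, so the product is an approximation of prevalence.

```python
def approximate_prevalence(incidence_per_person_year, mean_duration_years):
    """Steady-state approximation: prevalence = incidence * average duration."""
    return incidence_per_person_year * mean_duration_years

# Hypothetical disease: 1 new case per 100 person-years, lasting 5 years on average.
prevalence = approximate_prevalence(0.01, 5)
print(prevalence)
```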

number of new cases of disease in a given population during a specific time period

incidence

proportion of at risk people who get the disease

attack rate

proportion of people diagnosed with a given condition who die due to that condition

case fatality ratio

proportion of all deaths in a given time period that are due to a specific condition

proportionate mortality

# of new events that occur during a time period/average population at risk

rate

the number of deaths per 1000

mortality rate

(total # of deaths/mid interval population) * 1000

crude mortality rate
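The formula above as a small sketch; the function name and the counts are made up for illustration.

```python
def crude_mortality_rate(total_deaths, mid_interval_population):
    """Deaths per 1000 population over the interval."""
    return total_deaths / mid_interval_population * 1000

# Hypothetical town: 50 deaths in a year, mid-year population of 10,000.
rate_per_1000 = crude_mortality_rate(50, 10_000)
print(rate_per_1000)
```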

types of observational epidemiological studies

cross sectional study | cohort study | case control study

types of experimental epidemiological studies

randomized controlled trial

describes prevalence of potential risk factors (exposures) and conditions (outcomes)

cross sectional study

strengths of cross sectional study

quick and not costly; useful for developing hypotheses

limits of cross sectional study

can't determine causal relationships between variables | late look bias (detected cases tend to be of longer duration)

group of people is followed over time to monitor the development of a disease or health condition | can be prospective or retrospective

cohort study

strengths of cohort study

useful for rare exposures | can study multiple outcomes | can measure risk of outcome in group

limits of cohort study

time and cost requirements | not as good for rare outcomes | loss to follow-up

ratio of incidence of disease in exposed persons to the risk in unexposed

relative risk

relative risk = 1

no difference in groups

relative risk > 1

exposed group has greater risk

relative risk < 1

exposed group has less risk

measure of how much disease is actually attributable to the risk factor

attributable risk
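Relative risk and attributable risk can both be sketched from the counts of a cohort study. The helper names and numbers are made up; attributable risk is taken here as the risk difference (risk in exposed minus risk in unexposed), which is one common definition.

```python
def cohort_risks(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Return (relative risk, attributable risk) from cohort counts."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    relative_risk = risk_exposed / risk_unexposed
    attributable_risk = risk_exposed - risk_unexposed  # risk difference
    return relative_risk, attributable_risk

# Hypothetical cohort: 30/100 exposed and 10/100 unexposed develop the disease.
rr, ar = cohort_risks(30, 100, 10, 100)
print(round(rr, 2), round(ar, 2))
```

Here the relative risk is above 1, so the exposed group has the greater risk.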

participants with a disease are compared to participants without a disease

case control study

strengths of case control study

useful for rare outcomes | able to study multiple exposures | less time and cost

limitations of case control study

not as good for rare exposures | potential for recall bias | challenges in selecting controls | can only ESTIMATE risk

estimates relative risk when disease is relatively uncommon

odds ratio
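A sketch of the odds ratio from a case control 2x2 table, computed as the odds of exposure among cases divided by the odds of exposure among controls. The function name and counts are made up for illustration.

```python
def odds_ratio(cases_exposed, cases_unexposed, controls_exposed, controls_unexposed):
    """Odds of exposure in cases divided by odds of exposure in controls."""
    odds_in_cases = cases_exposed / cases_unexposed
    odds_in_controls = controls_exposed / controls_unexposed
    return odds_in_cases / odds_in_controls

# Hypothetical study: 40 of 100 cases and 20 of 100 controls were exposed.
or_estimate = odds_ratio(40, 60, 20, 80)
print(round(or_estimate, 2))
```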

additional amount of disease that is present in a population because of the presence of a risk factor

population attributable risk

randomization minimizes

confounding

blinding minimizes

selection and measurement bias

# of patients who need to receive treatment to prevent one event from occurring

number needed to treat (NNT)
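NNT is commonly computed as 1 / absolute risk reduction, where the absolute risk reduction is the control event rate minus the treatment event rate. The sketch below assumes that formula and the convention of rounding up to a whole patient; the function name and trial counts are made up. Exact fractions avoid floating-point rounding surprises.

```python
import math
from fractions import Fraction

def number_needed_to_treat(control_events, control_total, treated_events, treated_total):
    """NNT = 1 / (control event rate - treated event rate), rounded up."""
    absolute_risk_reduction = (
        Fraction(control_events, control_total)
        - Fraction(treated_events, treated_total)
    )
    if absolute_risk_reduction <= 0:
        raise ValueError("treatment shows no risk reduction, so NNT is undefined here")
    return math.ceil(1 / absolute_risk_reduction)

# Hypothetical trial: 20% of controls vs 15% of treated patients have the event.
nnt = number_needed_to_treat(20, 100, 15, 100)
print(nnt)
```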

Biostats. Flashcards

(118 cards)